# Faro Shuffle retention

This post is about faro shuffles. A Faro shuffle is a perfect riffle shuffle of two halves of a deck of cards so that each card from each half is interlaced perfectly.
There are two kinds of Faro shuffles, the “in” and “out” Faro shuffle. We only consider the “out” Faro shuffle of even number of cards in our experiment, read more about it http://en.wikipedia.org/wiki/Faro_shuffle.

The interesting fact that I wish to visualize is that shuffling an ordinary stack of cards with consecutive (out-) faro shuffles require 8 shuffles to retain the original order of the stack. As a magician I always thought this was very interesting and I wanted to visualize how many shuffles that are needed for different stack sizes. So, I wrote a simple shuffle code in MATLAB.

The first thought I had was to use a transpose idea (using reshape) in Mathematica this could be written as

Flatten[Transpose[{deck[[firsthalf]], deck[[secondhalf]]}]]

where the vectors firsthalf and secondhalf are defined as the indices for the halves. We can also use Riffle directly ofcourse, but I wanted to use MATLAB and the transpose idea translated to MATLAB in a nice way. Let A = 1:52, we can write

A = [A(1:end/2), A(end/2+1:end)]’; (1)

given the number of elements in A is even.

The riffled vector is aquired after we flatten the matrix with A(:)

### Optimization

The way the temporary vectors in (1) is temporarily stored is not the fastest way to do this. I found that using reshape is actually faster, like

A = reshape(A, numel(A)/2, 2)’; (2)

As a comparison, the program took 10 seconds using (1) versus 6 seconds for (2). The reshape code was actually my first version of the riffle code.

On a side note, the first code (1) is slow, but with a very simple modification it could be made slightly faster by considering the transposed vector. If we use A as a row vector, the code will become about 20% faster. This is not much at all, but it is kind of interesting, why is this?

Consider the following timings

N_max=5000;
tic; for N=2:2:N_max; A = (1:N)’; B = [A(1:end/2), A(end/2+1:end)]’; B=B(:); end; toc

gives 3.13 seconds and

tic; for N=2:2:N_max; A = (1:N); B = [A(1:end/2); A(end/2+1:end)]; B=B(:); end; toc

gives 2.64 seconds.

The reason for the difference in timing is because of the extra transposes. If we
add a simple transpose to the faster code, we will tangent the performance of the slower. The shuffling (heh) of data using transposes is not very efficient as can be seen in a future post on my masters report on FFT (add link).

Just to reiterate, the final version of the riffle code is:

A = reshape(A, numel(A)/2, 2)’;
A = A(:);

I ran the code for 52*52 cards. I let the inner loop just faro shuffle over and over until the cards are in order again, and then I break and store the number of shuffles.

The visualization of the required shuffles can be seen below

An interesting point here is that a deck of 52 cards require 8 shuffles, compared to a deck consisting of 64 cards, which requires only 6 shuffles!

Now, we need to shuffle more cards, but it is only 12 cards extra, 6 cards in each pile. That is 6*6=36 “extra” cards to shuffle. I regard one card shuffle the interweaving of two cards. For practical reasons we let these 12 extra cards be anonymous.

So, in a sense we get away with doing less work by adding more work. The reason the 64 card deck is special is because it only has 2 as prime factors (2^6). For a 52 card deck we have 52 = 2^2*13 for instance.

This is -at first glance- counter-intuitive, but it is the same with the Fourier transform. If we wish to transform a near-prime signal we would be better off transforming the nearest $$2^n$$ signal instead. This seems like much more work, but because of arithmetic, we actually do less work.

As the last figure, I show how the number of shuffles depend on size of stack. We can see a linear “worst case scenario” for some sizes, which is interesting.