Search This Blog

Thursday, April 9, 2015

Constructing the Shuffle Algorithm

I recently needed to perform a shuffle operation on a set.  Shuffling seems very easy, but it actually can be difficult and fraught with errors.  Here's some info on how I went from the start to the finish of the development of my shuffle algorithm.

Like shuffling a deck of cards, a shuffle operation does not change the frequency of the values found in the set or list.  The shuffle operation can apply to a set or a list.  Recalling that a set consists of only unique values (no repeats) and a list consists of values and so, it can contain duplicates.  No matter what the source, the algorithm is the same.

The algorithm requires a random number generator (I will use Java):

Random random=new Random(System.currentTimeMillis());
int ranvalue=0;
int ranloc=0;
for(ranloc=0; ranloc<1000; ranloc++) {
  ranvalue=random.nextInt(1000);
}

I always set up a number generator as I show above, to properly seed the generator and then "spin" it 1000 times to get the random number generator properly set up.

Here is the basic starting setup and algorithm.
List<String> sourceList = new ArrayList<String>();
populate sourceList from the source.
List<String> shuffleList = new ArrayList<String>();
add the values from the sourceList to the shuffleList in a shuffled order.

This method:
shuffleList.addAll(sourceList)
would not be a good idea, the data are copied in the same order.

Using this method:

int numSource=sourceList.size();
for(ranloc=0; ranloc<numSource; ranloc++) {
  ranvalue=random.nextInt(numSource);
  shuffleList.add(sourceList.get(ranvalue);
}

This method would randomize the values, but depending on the size of the source list, there is a high likelyhood of the shuffleList containing the source data in different frequencies, as there is a high probablility of the random.nextInt call returning the same value in two calls. This method is flawed.

You could add a list of used values:

int numSource=sourceList.size();
List<Integer> usedLocations=new ArrayList<Integer>();
Integer usedLocation=null;
for(ranloc=0; ranloc<numSource; ranloc++) {
  ranvalue=random.nextInt(numSource);
  usedLocation=new Integer(ranvalue);
  while(usedLocations.contains(usedLocation) {
    ranvalue=random.nextInt(numSource);
    usedLocation=new Integer(ranvalue);
  }
  shuffleList.add(sourceList.get(ranvalue);
  usedLocations.add(usedLocation);
}

This will work, but can get into a very long spin of the while loop as the shuffleList fills up, looking for that one value that has not been filled in.

The better way is to use a use and delete method:

int numSource=sourceList.size();
List<Integer> unusedLocations=new ArrayList<Integer>();
Integer usedLocation=null;
int numLocations=numSource;
for(ranloc=0; ranloc<numSource; ranloc++) {
  usedLocation=new Integer(ranloc);
  unusedLocations.add(usedLocation);
}
for(ranloc=0; ranloc<numSource; ranloc++) {
  ranvalue=random.nextInt(numLocations);
  usedLocation=unusedLocations.get(ranvalue);
  ranvalue=usedLocations.intValue();
  shuffleList.add(sourceList.get(ranvalue);
  usedLocations.remove(usedLocation);
  numLocations=unusedLocations.size();
}

This method works properly and has a good speed.
Happy Shuffling!

No comments:

Post a Comment