PetaSort

Sorting a Petabyte of data

Location: Sanford, NC, United States

Friday, November 11, 2005

Sorting a Petabyte of data

Computer hardware has come a long way over the past generation. I now have a computer at home that can sort a petabyte of data in about a week. I say this somewhat tongue-in-cheek, because I am only emulating the input and output. All that has to be created is the index, because a file in memory can be in any order. I also take a shortcut by examining only the first four of each record's one hundred bytes; statistically, an index sorted on those four bytes needs only slight rearrangement to be fully sorted.
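To make the shortcut concrete, here is a minimal Python sketch of the idea. It is not the program behind the numbers in this post. The function names (`make_records`, `build_prefix_index`, `fix_up`) are made up for illustration, and the records are assumed to be uniformly random bytes. Under that assumption, a million records spread over 2^32 possible four-byte prefixes should produce only about a hundred colliding pairs, which is why the clean-up pass has so little to do.

```python
import random

RECORD_SIZE = 100  # bytes per record, as in the post
PREFIX_SIZE = 4    # only the first four bytes are examined


def make_records(count, seed=0):
    """Emulate input: `count` records of uniformly random bytes (an assumption)."""
    rng = random.Random(seed)
    return [
        bytes(rng.getrandbits(8) for _ in range(RECORD_SIZE))
        for _ in range(count)
    ]


def build_prefix_index(records):
    """Return record positions ordered by the first PREFIX_SIZE bytes only.

    The records themselves never move; only the index is sorted.
    """
    return sorted(range(len(records)), key=lambda i: records[i][:PREFIX_SIZE])


def fix_up(records, index):
    """Insertion-sort pass on full records to resolve prefix ties.

    On an index that is already nearly sorted this does very few swaps.
    """
    fixed = list(index)
    for i in range(1, len(fixed)):
        j = i
        while j > 0 and records[fixed[j - 1]] > records[fixed[j]]:
            fixed[j - 1], fixed[j] = fixed[j], fixed[j - 1]
            j -= 1
    return fixed


if __name__ == "__main__":
    records = make_records(10_000)  # small batch so the demo runs quickly
    index = fix_up(records, build_prefix_index(records))
    print("fully sorted:", [records[i] for i in index] == sorted(records))
```

Keys that are not random (names, dates, and so on) would share prefixes far more often, so the clean-up pass would have more work to do than this sketch suggests.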

I am using a realistic size of a million records at a time. I am not sorting an entire petabyte of data as one unit, but rather 1,000,000 records of 100 bytes each, 10,000 times. I display the statistics, then repeat the process 1,000 times. Do the math: 1,000,000 x 100 x 10,000 x 1,000 equals 1,000,000,000,000,000 bytes, or a petabyte. If you do all this in memory on a three-gigahertz computer with plenty of fast internal memory, it takes about a week.
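The arithmetic is easy to check in a few lines of Python. The variable names are only illustrative, and the per-sort figures simply take "about a week" at face value as exactly seven days, so treat them as rough estimates rather than measurements.

```python
RECORDS_PER_SORT = 1_000_000  # records sorted at a time
BYTES_PER_RECORD = 100
SORTS_PER_REPORT = 10_000     # sorts between each statistics display
REPORTS = 1_000               # how many times the whole process repeats

total_sorts = SORTS_PER_REPORT * REPORTS
total_bytes = RECORDS_PER_SORT * BYTES_PER_RECORD * total_sorts
assert total_bytes == 10**15  # one petabyte (decimal)

# Assumption: "about a week" is treated as exactly seven days.
SECONDS_PER_WEEK = 7 * 24 * 60 * 60

sorts_per_second = total_sorts / SECONDS_PER_WEEK  # roughly 16.5
ms_per_sort = 1000 / sorts_per_second              # roughly 60 ms per million records

print(f"total bytes:      {total_bytes:,}")
print(f"sorts per second: {sorts_per_second:.1f}")
print(f"ms per sort:      {ms_per_sort:.1f}")
```

In other words, the claim amounts to sorting a million-record batch in memory about every sixtieth of a second... or more precisely, about every 60 milliseconds, sustained for a week.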

The purpose of this blog is to explore the simplest and most elegant sorting solution for the future.