Sorting a Petabyte of data
Computer hardware has come a long way over the past generation. I now have at home a computer that can sort a petabyte of data in about a week. I say this somewhat tongue-in-cheek because I am only emulating the input and output. All that is required is to create the index, because a file in memory can be in any order. I also take a shortcut by comparing only the first four of each record's one hundred bytes; statistically, the result then needs only slight rearrangement to be fully sorted.
I am using a realistic size of a million records at a time. I am not sorting an entire petabyte of data as one unit, but rather 1,000,000 records of 100 bytes each, 10,000 times. I display the statistics, then repeat the process 1,000 times. Do the math: 1,000,000 × 100 × 10,000 × 1,000 = 1,000,000,000,000,000 bytes, or one petabyte. If you do all this in memory on a three gigahertz computer with plenty of fast internal memory, it takes about a week.
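The core idea above, sorting by building an index over fixed-width records while comparing only the first four bytes, can be sketched as follows. This is a minimal illustration, not the original program: the record count is scaled down for a quick run, and the buffer is filled with random bytes rather than real data.

```python
import os
import time

# Scaled-down parameters for a quick demo; the post uses 1,000,000
# records of 100 bytes, repeated 10,000 x 1,000 times.
NUM_RECORDS = 100_000
RECORD_SIZE = 100
KEY_SIZE = 4  # only the first four bytes are compared, as in the post

# Simulate the in-memory buffer of fixed-width records.
buffer = os.urandom(NUM_RECORDS * RECORD_SIZE)

def key_of(i: int) -> bytes:
    """Return the 4-byte sort key of record i."""
    start = i * RECORD_SIZE
    return buffer[start:start + KEY_SIZE]

# The records themselves never move: we only build an index (a
# permutation of record numbers) ordered by key, since a file in
# memory can be in any order.
start = time.perf_counter()
index = sorted(range(NUM_RECORDS), key=key_of)
elapsed = time.perf_counter() - start

# Reading the buffer through the index now yields records in key order.
assert all(key_of(index[i]) <= key_of(index[i + 1])
           for i in range(NUM_RECORDS - 1))
print(f"indexed {NUM_RECORDS:,} records in {elapsed:.3f}s")
```

Because only the index is rearranged, the 100-byte payloads are never copied; a final pass with full-key comparison would finish the "slight rearrangement" the shortcut leaves behind.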
The purpose of this blog is to explore the simplest and most elegant sorting solution for the future.
