Create a program that sorts a file stored on a computer hard drive. Assume that the input file is stored in the current directory and is named data.txt. The file data.txt is a text file. Each row consists of two 63-bit unsigned values: a key and a value. In the sorted output file data.out, the keys should be sorted in increasing order, and for each key only the minimum value associated with that key should be output.
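To make the required output concrete, here is a minimal in-memory sketch of the "sorted keys, minimum value per key" rule. It is only an illustration of the expected result on a small input, not a solution for the full file. The names Row and sort_min_per_key are hypothetical, and reading and writing the text files is left out.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical row type: (key, value). A 63-bit unsigned value fits in uint64_t.
using Row = std::pair<std::uint64_t, std::uint64_t>;

// Returns the rows sorted by key, keeping only the minimum value for each key.
std::vector<Row> sort_min_per_key(std::vector<Row> rows) {
    // Pairs compare lexicographically, so after sorting the first row
    // of every group of equal keys carries the smallest value.
    std::sort(rows.begin(), rows.end());

    // std::unique keeps the first element of each run of equal keys.
    auto last = std::unique(rows.begin(), rows.end(),
                            [](const Row& a, const Row& b) {
                                return a.first == b.first;
                            });
    rows.erase(last, rows.end());
    return rows;
}
```

For example, the rows (5, 9), (2, 7), (5, 3), (2, 8), (1, 4) would produce (1, 4), (2, 7), (5, 3).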
Use the Unix program /afs/ms.mff.cuni.cz/u/b/babkm5am/ds1/hw1/gen-data, available in the Rotunda laboratory, to generate the input file data.txt. The generator writes the data to the standard output, so just redirect the standard output to the file data.txt. Then measure the running time of your solution on one of the computers u-pl0 to u-pl15. Use the time utility to measure the time.
The generated input file data.txt is about 45 GB in size and the time limit for sorting is 35 minutes. A solution which does not finish within the time limit is considered incorrect. We will also test your program on different but similar data.
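An input of this size may not fit into main memory, in which case one common approach is an external merge sort: sort chunks that do fit, store them as sorted runs in temporary files, and then merge the runs. The assignment does not prescribe this technique, so the following is only a sketch of the merge step under that assumption. The runs are represented as in-memory vectors standing in for temporary files, and the names Row, Entry, EntryGreater and merge_runs are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

using Row = std::pair<std::uint64_t, std::uint64_t>;  // (key, value)

// One heap entry: a row plus the run it came from and its position there.
struct Entry {
    Row row;
    std::size_t run;
    std::size_t pos;
};

// Orders the priority queue as a min-heap on (key, value).
struct EntryGreater {
    bool operator()(const Entry& a, const Entry& b) const {
        return a.row > b.row;
    }
};

// Merges runs that are each sorted by (key, value) and keeps only the
// minimum value for every key. Vectors stand in for temporary files.
std::vector<Row> merge_runs(const std::vector<std::vector<Row>>& runs) {
    std::priority_queue<Entry, std::vector<Entry>, EntryGreater> heap;
    for (std::size_t r = 0; r < runs.size(); ++r) {
        if (!runs[r].empty()) heap.push(Entry{runs[r][0], r, 0});
    }

    std::vector<Row> out;
    while (!heap.empty()) {
        Entry e = heap.top();
        heap.pop();
        // Rows leave the heap in (key, value) order, so the first row
        // seen for a key has the minimum value; later ones are skipped.
        if (out.empty() || out.back().first != e.row.first) {
            out.push_back(e.row);
        }
        // Refill the heap from the run the popped row came from.
        if (e.pos + 1 < runs[e.run].size()) {
            heap.push(Entry{runs[e.run][e.pos + 1], e.run, e.pos + 1});
        }
    }
    return out;
}
```

In a real solution each run would be read through a buffered file reader and the output written incrementally to data.out instead of being collected in a vector.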
When running your program, save the input file data.txt, the output file data.out and all temporary files into /tmp. After finishing your work, delete all your files in /tmp so that others can run their programs as well. In particular, do not forget to remove the input and output files. If you do not delete your files, the machine may become unusable for others, as they may not have sufficient access rights to delete your files.
Be considerate. Always check whether someone else is using a particular computer. Use the commands who and top or htop. If you see that someone else is performing computations on a given machine, use another one or wait until he or she is done. Please report any bad behaviour or an unusable computer to ds1@kam.mff.cuni.cz. The lab administrator can clear the /tmp directory, if necessary.
Submit your solution before the deadline to ds1@kam.mff.cuni.cz along with the details specified below.
generates the whole input data, roughly 45 GB.
$ /afs/ms.mff.cuni.cz/u/b/babkm5am/ds1/hw1/gen-data -s XX --short >/tmp/data.txt
generates the data for debugging your program, roughly 3 GB.
The XX are the two last digits of your student ID.
When submitting by email, report the measured time in the following format.
The subject should be 'HW1-55973318' (i.e. HW1-<student number>).
Attach the source code file named sorter-55973318.cpp.
Name: Martin Babka
If you measure the time after 27th October, just submit a second similar email with the measured values and without the source code. If you cannot achieve the time limit but you are close, try using XX = 333.