Implement two types of hash tables and two universal families of hash functions. Then compare their performance on two datasets consisting of Insert and Find operations. Insert operations do not create duplicate keys: if an insertion would create a duplicate key, leave the table unchanged. Such operations are still counted in the performance statistics. The first dataset consists mainly of Insert operations (only 20 % are Finds); the second consists mainly of Find operations (80 % are Finds).
Implement linear probing exactly as presented in the course. Also implement the course version of cuckoo hashing, i.e. use a single table. Choose the size of the table to be M, the number described in the section Generating the data. To measure the performance of cuckoo hashing, use the generator parameter C, which bounds the maximal load factor.
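The duplicate-preserving Insert and the probe counting described above can be sketched for linear probing as follows. This is only an illustrative outline, not the prescribed implementation; all names (LPTable, lp_insert, lp_find) are made up, and the concrete hash function is passed in so that either universal family can be plugged in.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define EMPTY UINT32_MAX   /* sentinel marking an unused slot */

typedef struct {
    uint32_t *slots;   /* table of size m, EMPTY = unused */
    size_t m;          /* table size */
    size_t n;          /* number of stored keys */
    long probes;       /* total probes, for the performance statistics */
    uint32_t (*hash)(uint32_t x, size_t m);
} LPTable;

static LPTable *lp_create(size_t m, uint32_t (*hash)(uint32_t, size_t)) {
    LPTable *t = malloc(sizeof *t);
    t->slots = malloc(m * sizeof *t->slots);
    for (size_t i = 0; i < m; i++) t->slots[i] = EMPTY;
    t->m = m; t->n = 0; t->probes = 0; t->hash = hash;
    return t;
}

/* Returns 1 if the key was inserted, 0 if it was already present.
 * A duplicate insertion leaves the table unchanged but its probes
 * are still counted. Assumes the table never becomes completely full. */
static int lp_insert(LPTable *t, uint32_t x) {
    size_t i = t->hash(x, t->m);
    for (;;) {
        t->probes++;
        if (t->slots[i] == EMPTY) { t->slots[i] = x; t->n++; return 1; }
        if (t->slots[i] == x) return 0;      /* duplicate: no change */
        i = (i + 1) % t->m;                  /* linear probe sequence */
    }
}

/* Returns 1 if the key is present, 0 otherwise. */
static int lp_find(LPTable *t, uint32_t x) {
    size_t i = t->hash(x, t->m);
    for (;;) {
        t->probes++;
        if (t->slots[i] == EMPTY) return 0;
        if (t->slots[i] == x) return 1;
        i = (i + 1) % t->m;
    }
}
```

Counting every inspected slot in a single `probes` counter makes it easy to compute the average number of operations per Insert or Find for the graphs.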
Tabulation. Let x be an element of the universe U. Treating x as a bit string, we split it into c parts of equal length, i.e. x = x1x2...xc. For each part we randomly initialize a tabulation table T_i containing exactly 2^(u/c) entries, each a random bit string of length k, i.e. a value from the set {0, ..., m - 1}. Then h(x) = T1(x1) ⊕ T2(x2) ⊕ ... ⊕ Tc(xc). In other words, each part of the bit string x determines an index into its tabulation table, and the value of the hash function equals the XOR of the values stored at these indices. A fast implementation usually relies on a fixed value of c; choose c = 4.
Multiplicative function. Let x ∈ U and let a ∈ U - {0} be a number chosen uniformly at random from the universe. Then h(x) = ((a * x) mod |U|) / (|U| / m). For a 32-bit universe we can implement this function efficiently by letting the multiplication overflow (which computes the product mod 2^32) and replacing the division by a bit shift.
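With |U| = 2^32 and m = 2^k, the formula collapses to a single multiplication and shift, since unsigned 32-bit multiplication discards the overflow (i.e. computes the product mod 2^32) and dividing by 2^(32 - k) is a right shift. A sketch, assuming m is a power of two:

```c
#include <stdint.h>

/* Multiplicative hashing for a 32-bit universe and m = 2^k.
 * a is a random nonzero 32-bit multiplier; requires 1 <= k <= 32.
 * (a * x) wraps mod 2^32 by the rules of unsigned arithmetic, and
 * the shift by 32 - k keeps the top k bits of the product. */
static uint32_t mul_hash(uint32_t a, uint32_t x, unsigned k) {
    return (a * x) >> (32 - k);   /* result lies in {0, ..., 2^k - 1} */
}
```

The top bits of the product are kept because they are the ones mixed by the multiplication; keeping the bottom bits instead would give a much weaker function.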
For each dataset create two graphs, i.e. four graphs in total. In the first graph plot the average number of operations caused by an Insert or Find with respect to the load factor of the table. The second graph contains the average running time (scaled to milliseconds or microseconds according to your needs) of an Insert or Find with respect to the load factor of the table. The operation count should be determined as follows.
Write a report consisting of:
Use the program /afs/ms.mff.cuni.cz/u/b/babkm5am/ds1/hw5/hashgen in the Unix lab Rotunda. Run it with the following parameters:
-s XX, where XX are the last two digits of your student ID;
--short, for shorter data;
I, for a dataset mainly consisting of insertions;
F, for a dataset mainly consisting of finds;
C, restricts the load factor to at most 55 % for cuckoo hashing.
The generator produces a text output. Each row describes one operation:
# N M: start of a run for a set of N values hashed into a table of size M - delete the old table and create a new one. The size of the set can be used to determine the load factor needed to draw the graphs. The table size M remains the same for every run, whereas N determines the size of the set created by this run, i.e. the number of successful Insert operations.
I X: insertion of the key X;
F X: find the key X in the table.
A single invocation of the generator creates a sequence of runs with load factors ranging from 30 to 90 % (the maximal load factor is 55 % if you used the parameter C). For each load factor 4 runs are generated, and the results shown in the graphs should be averaged over these runs.
# 2 4
I 1
I 2
I 1
I 3
F 1
# 3 4
I 2
I 3
I 7
I 3
I 2
F 2
F 0
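The generator's rows can be consumed with a small line parser. A sketch in C, where the function name parse_line and the return convention are illustrative, not part of the assignment:

```c
#include <stdio.h>

/* Parse a single generator line. Returns the row type ('#', 'I' or 'F')
 * and fills a and b: a '#' row fills both (N and M), while 'I' and 'F'
 * rows fill only a (the key X). Returns 0 on a malformed line. */
static char parse_line(const char *line, long *a, long *b) {
    long n, m;
    if (sscanf(line, "# %ld %ld", &n, &m) == 2) { *a = n; *b = m; return '#'; }
    if (sscanf(line, "I %ld", &n) == 1) { *a = n; return 'I'; }
    if (sscanf(line, "F %ld", &n) == 1) { *a = n; return 'F'; }
    return 0;
}
```

In the main loop, a '#' row would then trigger deleting the old table and creating a fresh one of size M, while 'I' and 'F' rows drive the measured operations.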
Send an e-mail with a subject of the form 'HW5-55973318' (i.e. HW5-student ID) with the following attachments:
The body of your e-mail shall have the following form:
Name: Martin Babka
Pseudonym: Guybrush Threepwood
Tutorial section: Monday-12:20-odd