Bloom filters and Cassandra: why are they used, and why are keys hashed several times?

I read this: http://spyced.blogspot.com/2009/01/all-you-ever-wanted-to-know-about.html

My Questions:

1.) Is it correct that Cassandra uses the Bloom filter only to find out which SSTable (Sorted String Table) most likely contains the key? There may be several SSTables, and Cassandra does not know which one a key is in, so Bloom filters are used to avoid looking in every SSTable. Is this correct? (I am trying to understand how Cassandra works...)

2.) Why are keys hashed several times (as explained in the link above)? Is it correct that a key needs to be hashed with several different hash functions to get a better "random distribution" of the bits? If this is wrong, why does a key need to be hashed several times? Doesn't this cost CPU cycles? And once I have the output of several hash functions, what is done with the results: are they ANDed or XORed? Does this make any difference?
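To make question 2 concrete, here is a minimal sketch of a generic Bloom filter as it is usually described. It is not Cassandra's implementation: the class name `BloomFilter`, its parameters, and the choice to derive the k bit positions from a single SHA-1 digest (double hashing) are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; a sketch, not Cassandra's actual code."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the example readable; real filters pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, key):
        # Assumption: simulate k hash functions with double hashing,
        # position_i = (h1 + i * h2) mod num_bits, from one SHA-1 digest.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step non-zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        # Each hash result selects one bit, and that bit is set.
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # The key may be present only if every selected bit is set.
        # A single unset bit proves the key was never added.
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter(num_bits=1024, num_hashes=3)
bf.add("row-1")
print(bf.might_contain("row-1"))
```

In this sketch the hash outputs are not combined with each other: each one is used as an index into the bit array. Inserting sets all k bits, and a lookup checks that all k bits are set, so added keys are always reported as possibly present, while keys that were never added can occasionally be reported too (a false positive).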

3.) Using MD5, how big is the difference in Bloom filter false positives compared to SHA1 (which, according to the article, is randomly distributed)? Why is MD5 not randomly distributed?

Thanks very much!! Jens

asked by sbridges on 2 May 2011 at 00:40