What is a hash?
Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string. Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value. It is also used in many encryption algorithms.
What Makes A Good Hash Function
Most good hashing functions work by computing the remainder after dividing by the table size N.
This always gives a value between 0 and N-1 so it suitable but if N is a prime number then it is also excellent at scattering the data round the table. Of course, if you have a text value that you want to hash you first have to convert it into a suitable numeric value and a simple scheme like the one in the example will not do.
You need to produce a different numeric value for every possible text value and adding together the ASCII codes of the first two letters clearly doesn't work. A better method is to weight each of the ASCII codes by the position of the letter by multiplying by 1 for the first character, 10 for the second, 100 for the third and so on.. before adding them up to give a single value.
In general building a really good hash function is difficult and in most cases you need to find one that has good properties and has been well tested.
This brings us to the question of what makes a good hash function?
In general the set of things that you want to apply a hash function to is big, very big, but the set of results that you want to get back from the hash function is much smaller. That is, in hash(x)=y the range of x is usually large and the range of y is much smaller. For example, you might have x in the set of all possible 10-character words and y in the range 0 to 1023. The idea is that not all of the very large number of 10-character words will actually occur - usually less than the 1024 storage locations you might have set aside. You want the hash function to map the input words onto the 1024 possible values as evenly as possible - this minimizes the probability of a collision.
So a good hash function maps its input to its output in a way that is as unpredictable as possible i.e. in simple terms it is a repeatable random looking function.