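I. Key Points of This Section
- Introduction to hashing
- Open addressing
- Separate chaining
- Hash functions
- Hashing efficiency
- Hashing and external storage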
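II. Introduction
A hash table is a data structure that provides fast insertion and search: insertion, search, and deletion each take close to constant time, that is, O(1).
Disadvantages: hash tables are based on arrays, which are hard to expand once created, and the performance of some kinds of hash table degrades severely as the table becomes nearly full.
There is also no convenient way to visit the items in a hash table in any particular order (for example, from smallest to largest); if you need that capability, use a different data structure.
If you do not need to traverse the data in order and can predict the amount of data in advance, hash tables are unmatched in speed and ease of use.
Key idea: a key must be converted into an array index, and this is normally done by a hash function. For certain kinds of keys no hash function is needed, because the key values can be used directly as array indices.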
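Example 1: employee numbers as keys
1. Keys as indices
One option is a plain array in which each employee record occupies one cell, and the index of that cell is the employee's number.
2. Not always so orderly
An array-based database like this makes storing and retrieving data fast and very simple, but it works only when the keys are as well organized as sequential employee numbers.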
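As a minimal sketch of the keys-as-indices idea, the employee number can serve directly as the array subscript, so no hash function is involved. The `Employee` class, the `MAX_EMPLOYEES` limit, and the sample record are hypothetical and only serve the illustration.

```java
// Sketch: employee numbers used directly as array indices (no hash function).
// Employee, MAX_EMPLOYEES, and the sample data are illustrative assumptions.
class DirectIndexDemo {
    static class Employee {
        final int number;
        final String name;
        Employee(int number, String name) {
            this.number = number;
            this.name = name;
        }
    }

    static final int MAX_EMPLOYEES = 1000;
    private final Employee[] table = new Employee[MAX_EMPLOYEES];

    void insert(Employee e) { table[e.number] = e; }      // the key is the index
    Employee find(int number) { return table[number]; }   // null if no such record
    void delete(int number) { table[number] = null; }

    public static void main(String[] args) {
        DirectIndexDemo db = new DirectIndexDemo();
        db.insert(new Employee(72, "Sample Name"));
        System.out.println(db.find(72).name);      // prints "Sample Name"
        System.out.println(db.find(73) == null);   // prints "true"
    }
}
```

Every operation is a single array access, which is why this layout is so fast; it stops being practical once the keys are sparse or are not small integers.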
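Dictionary
The classic example of a hash table is a dictionary; another application is in compilers for high-level programming languages.
Given a word, how do we find its index?
1. Converting words to array indices
Computers use various character encodings. One is ASCII, in which a is 97, b is 98, and so on up to 122 for z.
Standard ASCII covers codes 0 to 127 (8-bit extensions run to 255), enough for letters, punctuation, and other characters. Since English has only 26 letters, we can design our own code instead: a=1, b=2, c=3, ..., z=26, with 0 for a space.
2. Adding the digits
To convert cats: c=3, a=1, t=20, s=19; adding them gives 3+1+20+19 = 43. For words of up to 10 letters the largest possible sum is 10*26 = 260, so this scheme can produce only 260 distinct indices.
To store 50,000 words, each of the 260 array cells would have to hold about 192 words (50,000 / 260), so lookups would no longer be fast.
3. Multiplying by powers
Just as 7564 = 7*1000 + 5*100 + 6*10 + 4*1 in the decimal system, a word can be broken into its letters, the letters converted to their numeric codes, and each code multiplied by the power of 27 for its position. For cats:
3*27^3 + 1*27^2 + 20*27^1 + 19*27^0 = 60337. This produces a unique number for every word, but the range of possible numbers becomes enormous.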
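The two conversion schemes can be compared side by side. The following is a small sketch, assuming lowercase words only and the a=1 ... z=26 code described above; the class and method names are made up for this example.

```java
// Sketch: two ways to turn a lowercase word into a number, using a=1 ... z=26.
class WordCodes {
    // Code of one letter under the a=1 ... z=26 scheme.
    static int letterCode(char c) {
        return c - 'a' + 1;
    }

    // Scheme 2 in the text: add the letter codes. Many words share a sum.
    static int sumOfCodes(String word) {
        int sum = 0;
        for (int i = 0; i < word.length(); i++)
            sum += letterCode(word.charAt(i));
        return sum;
    }

    // Scheme 3 in the text: treat the word as a base-27 number. The value is
    // unique per word but grows quickly (long only delays overflow).
    static long powersOf27(String word) {
        long value = 0;
        long power = 1;                                  // 27^0
        for (int i = word.length() - 1; i >= 0; i--) {
            value += letterCode(word.charAt(i)) * power;
            power *= 27;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(sumOfCodes("cats"));   // prints 43
        System.out.println(sumOfCodes("acts"));   // prints 43 as well
        System.out.println(powersOf27("cats"));   // prints 60337
    }
}
```

The words cats and acts contain the same letters, so their sums collide, while their base-27 values differ.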
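Hashing
We need a way to compress the huge range of integers produced by the multiply-by-powers scheme into a range that matches a reasonably sized array:
smallNumber = largeNumber % smallRange
This is a hash function: it hashes (converts) a number in a large range into a number in a smaller range, which is then used as the array index.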
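A minimal sketch of this compression step follows, assuming non-negative input values. The array size of 100 is an arbitrary value chosen to keep the arithmetic easy to check.

```java
// Sketch: compressing a large number into an array index with the modulo operator.
class ModuloHash {
    // Maps a non-negative largeNumber into the range 0 .. arraySize-1.
    static int hashFunc(long largeNumber, int arraySize) {
        return (int) (largeNumber % arraySize);
    }

    public static void main(String[] args) {
        System.out.println(hashFunc(60337, 100));   // prints 37 (the value for "cats")
        System.out.println(hashFunc(137, 100));     // prints 37 too: different input, same index
    }
}
```

Because many large numbers map to the same small one, two different keys can end up at the same index.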
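Collisions
A collision occurs when a word hashes to an array position that is already occupied by another word.
Solutions:
1. Make the array twice as large as the amount of data to be stored, so that about half of the cells are empty. When a collision occurs, search the array in some systematic way for an empty cell and put the word there, instead of at the index produced by the hash function. This is open addressing.
2. Create an array of linked lists of words, so that the array does not store the words directly. When a collision occurs, the new item is simply added to the list at that array index. This is separate chaining.
Open addressing:
1. Linear probing
In linear probing the search for an empty cell moves forward one cell at a time from the hashed index, wrapping around at the end of the array.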
// hash.java
// demonstrates hash table with linear probing
// to run this program: C:>java HashTableApp
import java.io.*;
class DataItem
{ // (could have more data)
private int iData; // data item (key)
//--------------------------------------------------------------
public DataItem(int ii) // constructor
{ iData = ii; }
//--------------------------------------------------------------
public int getKey()
{ return iData; }
//--------------------------------------------------------------
} // end class DataItem
class HashTable
{
private DataItem[] hashArray; // array holds hash table
private int arraySize;
private DataItem nonItem; // for deleted items
// -------------------------------------------------------------
public HashTable(int size) // constructor
{
arraySize = size;
hashArray = new DataItem[arraySize];
nonItem = new DataItem(-1); // deleted item key is -1
}
// -------------------------------------------------------------
public void displayTable()
{
System.out.print("Table: ");
for(int j=0; j<arraySize; j++)
{
if(hashArray[j] != null)
System.out.print(hashArray[j].getKey() + " ");
else
System.out.print("** ");
}
System.out.println("");
}
// -------------------------------------------------------------
public int hashFunc(int key)
{
return key % arraySize; // hash function
}
// -------------------------------------------------------------
public void insert(DataItem item) // insert a DataItem
// (assumes table not full)
{
int key = item.getKey(); // extract key
int hashVal = hashFunc(key); // hash the key
// until empty cell or -1,
while(hashArray[hashVal] != null &&
hashArray[hashVal].getKey() != -1)
{
++hashVal; // go to next cell
hashVal %= arraySize; // wraparound if necessary
}
hashArray[hashVal] = item; // insert item
} // end insert()
// -------------------------------------------------------------
public DataItem delete(int key) // delete a DataItem
{
int hashVal = hashFunc(key); // hash the key
while(hashArray[hashVal] != null) // until empty cell,
{ // found the key?
if(hashArray[hashVal].getKey() == key)
{
DataItem temp = hashArray[hashVal]; // save item
hashArray[hashVal] = nonItem; // delete item
return temp; // return item
}
++hashVal; // go to next cell
hashVal %= arraySize; // wraparound if necessary
}
return null; // can't find item
} // end delete()
// -------------------------------------------------------------
public DataItem find(int key) // find item with key
{
int hashVal = hashFunc(key); // hash the key
while(hashArray[hashVal] != null) // until empty cell,
{ // found the key?
if(hashArray[hashVal].getKey() == key)
return hashArray[hashVal]; // yes, return item
++hashVal; // go to next cell
hashVal %= arraySize; // wraparound if necessary
}
return null; // can't find item
}
// -------------------------------------------------------------
} // end class HashTable
class HashTableApp
{
public static void main(String[] args) throws IOException
{
DataItem aDataItem;
int aKey, size, n, keysPerCell;
// get sizes
System.out.print("Enter size of hash table: ");
size = getInt();
System.out.print("Enter initial number of items: ");
n = getInt();
keysPerCell = 10;
// make table
HashTable theHashTable = new HashTable(size);
for(int j=0; j<n; j++) // insert data
{
aKey = (int)(java.lang.Math.random() *
keysPerCell * size);
aDataItem = new DataItem(aKey);
theHashTable.insert(aDataItem);
}
while(true) // interact with user
{
System.out.print("Enter first letter of ");
System.out.print("show, insert, delete, or find: ");
char choice = getChar();
switch(choice)
{
case 's':
theHashTable.displayTable();
break;
case 'i':
System.out.print("Enter key value to insert: ");
aKey = getInt();
aDataItem = new DataItem(aKey);
theHashTable.insert(aDataItem);
break;
case 'd':
System.out.print("Enter key value to delete: ");
aKey = getInt();
theHashTable.delete(aKey);
break;
case 'f':
System.out.print("Enter key value to find: ");
aKey = getInt();
aDataItem = theHashTable.find(aKey);
if(aDataItem != null)
{
System.out.println("Found " + aKey);
}
else
System.out.println("Could not find " + aKey);
break;
default:
System.out.print("Invalid entry\n");
} // end switch
} // end while
} // end main()
//--------------------------------------------------------------
public static String getString() throws IOException
{
InputStreamReader isr = new InputStreamReader(System.in);
BufferedReader br = new BufferedReader(isr);
String s = br.readLine();
return s;
}
//--------------------------------------------------------------
public static char getChar() throws IOException
{
String s = getString();
return s.charAt(0);
}
//-------------------------------------------------------------
public static int getInt() throws IOException
{
String s = getString();
return Integer.parseInt(s);
}
//--------------------------------------------------------------
} // end class HashTableApp
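2. Quadratic probing
Linear probing suffers from clustering: runs of filled cells form, and the larger a cluster becomes, the faster it grows and the longer the probes needed to get past it.
Load factor: the ratio of the number of items in the table to the table size.
In quadratic probing the offset from the hashed index x is the square of the step number: x+1^2, x+2^2, x+3^2, ..., that is, x+1, x+4, x+9, ...
Secondary clustering: the clustering that quadratic probing itself produces, because all keys that hash to the same cell follow the same sequence of probes.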
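The notes give no listing for quadratic probing, so here is a hedged sketch of how it might look for non-negative int keys. The class name, the `indexOf` helper, and the two stated restrictions (prime table size, table at most half full) are assumptions of this example rather than something taken from the notes; deletion is left out.

```java
// Sketch of quadratic probing for non-negative int keys.
// Assumptions: the table size is prime and the table is kept at most half
// full, which guarantees that a probe reaches an empty cell.
class QuadraticProbeTable {
    private final Integer[] cells;   // null marks an empty cell
    private int count;               // number of stored keys

    QuadraticProbeTable(int primeSize) {
        cells = new Integer[primeSize];
    }

    // Load factor = items stored / table size.
    double loadFactor() {
        return (double) count / cells.length;
    }

    // Cell examined on a given step: (home + step^2) mod size.
    private int probe(int home, int step) {
        return (int) ((home + (long) step * step) % cells.length);
    }

    void insert(int key) {
        if (2 * (count + 1) > cells.length)
            throw new IllegalStateException("table would be more than half full");
        int home = key % cells.length;
        int step = 0;
        while (cells[probe(home, step)] != null)
            step++;                              // offsets 1, 4, 9, ...
        cells[probe(home, step)] = key;
        count++;
    }

    // Returns the index holding key, or -1 if the key is absent.
    int indexOf(int key) {
        int home = key % cells.length;
        for (int step = 0; step < cells.length; step++) {
            int index = probe(home, step);
            if (cells[index] == null) return -1;
            if (cells[index] == key) return index;
        }
        return -1;
    }

    public static void main(String[] args) {
        QuadraticProbeTable table = new QuadraticProbeTable(11);
        table.insert(3);    // home cell 3
        table.insert(14);   // cell 3 is taken, lands on 3 + 1 = 4
        table.insert(25);   // cells 3 and 4 are taken, lands on 3 + 4 = 7
        System.out.println(table.indexOf(25));   // prints 7
    }
}
```

All three keys hash to cell 3 and therefore follow the same probe sequence 3, 4, 7, ..., which is exactly the secondary clustering described above.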
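3. Double hashing
Hash the key a second time with a different hash function and use the result as the step size. For a given key the step size stays the same throughout a probe, but it differs from key to key.
The second hash function must meet two requirements:
- It must differ from the first hash function.
- It must never output 0 (otherwise there is no step, every probe lands on the same cell, and the algorithm loops forever).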
// hashDouble.java
// demonstrates hash table with double hashing
// to run this program: C:>java HashDoubleApp
import java.io.*;
class DataItem
{ // (could have more items)
private int iData; // data item (key)
//--------------------------------------------------------------
public DataItem(int ii) // constructor
{ iData = ii; }
//--------------------------------------------------------------
public int getKey()
{ return iData; }
//--------------------------------------------------------------
} // end class DataItem
class HashTable
{
private DataItem[] hashArray; // array is the hash table
private int arraySize;
private DataItem nonItem; // for deleted items
// -------------------------------------------------------------
HashTable(int size) // constructor
{
arraySize = size;
hashArray = new DataItem[arraySize];
nonItem = new DataItem(-1);
}
// -------------------------------------------------------------
public void displayTable()
{
System.out.print("Table: ");
for(int j=0; j<arraySize; j++)
{
if(hashArray[j] != null)
System.out.print(hashArray[j].getKey()+ " ");
else
System.out.print("** ");
}
System.out.println("");
}
// -------------------------------------------------------------
public int hashFunc1(int key)
{
return key % arraySize;
}
// -------------------------------------------------------------
public int hashFunc2(int key)
{
// non-zero, less than array size, different from hF1
// array size must be relatively prime to 5, 4, 3, and 2
return 5 - key % 5;
}
// -------------------------------------------------------------
// insert a DataItem
public void insert(int key, DataItem item)
// (assumes table not full)
{
int hashVal = hashFunc1(key); // hash the key
int stepSize = hashFunc2(key); // get step size
// until empty cell or -1
while(hashArray[hashVal] != null &&
hashArray[hashVal].getKey() != -1)
{
hashVal += stepSize; // add the step
hashVal %= arraySize; // for wraparound
}
hashArray[hashVal] = item; // insert item
} // end insert()
// -------------------------------------------------------------
public DataItem delete(int key) // delete a DataItem
{
int hashVal = hashFunc1(key); // hash the key
int stepSize = hashFunc2(key); // get step size
while(hashArray[hashVal] != null) // until empty cell,
{ // is correct hashVal?
if(hashArray[hashVal].getKey() == key)
{
DataItem temp = hashArray[hashVal]; // save item
hashArray[hashVal] = nonItem; // delete item
return temp; // return item
}
hashVal += stepSize; // add the step
hashVal %= arraySize; // for wraparound
}
return null; // can't find item
} // end delete()
// -------------------------------------------------------------
public DataItem find(int key) // find item with key
// (assumes table not full)
{
int hashVal = hashFunc1(key); // hash the key
int stepSize = hashFunc2(key); // get step size
while(hashArray[hashVal] != null) // until empty cell,
{ // is correct hashVal?
if(hashArray[hashVal].getKey() == key)
return hashArray[hashVal]; // yes, return item
hashVal += stepSize; // add the step
hashVal %= arraySize; // for wraparound
}
return null; // can't find item
}
// -------------------------------------------------------------
} // end class HashTable
class HashDoubleApp
{
public static void main(String[] args) throws IOException
{
int aKey;
DataItem aDataItem;
int size, n;
// get sizes
System.out.print("Enter size of hash table: ");
size = getInt();
System.out.print("Enter initial number of items: ");
n = getInt();
// make table
HashTable theHashTable = new HashTable(size);
for(int j=0; j<n; j++) // insert data
{
aKey = (int)(java.lang.Math.random() * 2 * size);
aDataItem = new DataItem(aKey);
theHashTable.insert(aKey, aDataItem);
}
while(true) // interact with user
{
System.out.print("Enter first letter of ");
System.out.print("show, insert, delete, or find: ");
char choice = getChar();
switch(choice)
{
case 's':
theHashTable.displayTable();
break;
case 'i':
System.out.print("Enter key value to insert: ");
aKey = getInt();
aDataItem = new DataItem(aKey);
theHashTable.insert(aKey, aDataItem);
break;
case 'd':
System.out.print("Enter key value to delete: ");
aKey = getInt();
theHashTable.delete(aKey);
break;
case 'f':
System.out.print("Enter key value to find: ");
aKey = getInt();
aDataItem = theHashTable.find(aKey);
if(aDataItem != null)
System.out.println("Found " + aKey);
else
System.out.println("Could not find " + aKey);
break;
default:
System.out.print("Invalid entry\n");
} // end switch
} // end while
} // end main()
//--------------------------------------------------------------
public static String getString() throws IOException
{
InputStreamReader isr = new InputStreamReader(System.in);
BufferedReader br = new BufferedReader(isr);
String s = br.readLine();
return s;
}
//--------------------------------------------------------------
public static char getChar() throws IOException
{
String s = getString();
return s.charAt(0);
}
//-------------------------------------------------------------
public static int getInt() throws IOException
{
String s = getString();
return Integer.parseInt(s);
}
//--------------------------------------------------------------
} // end class HashDoubleApp
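The comment in hashFunc2 requires the array size to be relatively prime to every possible step size. The following sketch, with a hypothetical helper named `cellsVisited`, counts how many distinct cells a probe sequence can reach, which is one way to see why that requirement matters.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: how many cells a fixed-step probe sequence can reach.
class ProbeCoverage {
    // Number of distinct cells reached from start by repeatedly adding step.
    static int cellsVisited(int start, int step, int arraySize) {
        Set<Integer> seen = new HashSet<>();
        int index = start;
        while (seen.add(index)) {                 // stop at the first repeated cell
            index = (index + step) % arraySize;
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(cellsVisited(0, 5, 15));   // prints 3  (cells 0, 5, 10)
        System.out.println(cellsVisited(0, 5, 13));   // prints 13 (every cell)
    }
}
```

With a table of size 15 and a step of 5, the probe cycles through only three cells, so insert() could loop forever even though the table has free space. Choosing a prime table size avoids this for every step from 1 to 5.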
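Separate chaining:
Each cell of the hash table holds a linked list; items whose keys hash to the same index are all inserted into that cell's list.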
// hashChain.java
// demonstrates hash table with separate chaining
// to run this program: C:>java HashChainApp
import java.io.*;
class Link
{ // (could be other items)
private int iData; // data item
public Link next; // next link in list
// -------------------------------------------------------------
public Link(int it) // constructor
{ iData= it; }
// -------------------------------------------------------------
public int getKey()
{ return iData; }
// -------------------------------------------------------------
public void displayLink() // display this link
{ System.out.print(iData + " "); }
} // end class Link
class SortedList
{
private Link first; // ref to first list item
// -------------------------------------------------------------
public SortedList() // constructor (no return type)
{ first = null; }
// -------------------------------------------------------------
public void insert(Link theLink) // insert link, in order
{
int key = theLink.getKey();
Link previous = null; // start at first
Link current = first;
// until end of list,
while( current != null && key > current.getKey() )
{ // or current > key,
previous = current;
current = current.next; // go to next item
}
if(previous==null) // if beginning of list,
first = theLink; // first --> new link
else // not at beginning,
previous.next = theLink; // prev --> new link
theLink.next = current; // new link --> current
} // end insert()
// -------------------------------------------------------------
public void delete(int key) // delete link
{ // (assumes non-empty list)
Link previous = null; // start at first
Link current = first;
// until end of list,
while( current != null && key != current.getKey() )
{ // or key == current,
previous = current;
current = current.next; // go to next link
}
// disconnect link
if(current==null) // key not found (or empty list),
return; // nothing to delete
if(previous==null) // if beginning of list
first = first.next; // delete first link
else // not at beginning
previous.next = current.next; // delete current link
} // end delete()
// -------------------------------------------------------------
public Link find(int key) // find link
{
Link current = first; // start at first
// until end of list,
while(current != null && current.getKey() <= key)
{ // or key too small,
if(current.getKey() == key) // is this the link?
return current; // found it, return link
current = current.next; // go to next item
}
return null; // didn't find it
} // end find()
// -------------------------------------------------------------
public void displayList()
{
System.out.print("List (first-->last): ");
Link current = first; // start at beginning of list
while(current != null) // until end of list,
{
current.displayLink(); // print data
current = current.next; // move to next link
}
System.out.println("");
}
} // end class SortedList
class HashTable
{
private SortedList[] hashArray; // array of lists
private int arraySize;
// -------------------------------------------------------------
public HashTable(int size) // constructor
{
arraySize = size;
hashArray = new SortedList[arraySize]; // create array
for(int j=0; j<arraySize; j++) // fill array
hashArray[j] = new SortedList(); // with lists
}
// -------------------------------------------------------------
public void displayTable()
{
for(int j=0; j<arraySize; j++) // for each cell,
{
System.out.print(j + ". "); // display cell number
hashArray[j].displayList(); // display list
}
}
// -------------------------------------------------------------
public int hashFunc(int key) // hash function
{
return key % arraySize;
}
// -------------------------------------------------------------
public void insert(Link theLink) // insert a link
{
int key = theLink.getKey();
int hashVal = hashFunc(key); // hash the key
hashArray[hashVal].insert(theLink); // insert at hashVal
} // end insert()
// -------------------------------------------------------------
public void delete(int key) // delete a link
{
int hashVal = hashFunc(key); // hash the key
hashArray[hashVal].delete(key); // delete link
} // end delete()
// -------------------------------------------------------------
public Link find(int key) // find link
{
int hashVal = hashFunc(key); // hash the key
Link theLink = hashArray[hashVal].find(key); // get link
return theLink; // return link
}
// -------------------------------------------------------------
} // end class HashTable
class HashChainApp
{
public static void main(String[] args) throws IOException
{
int aKey;
Link aDataItem;
int size, n, keysPerCell = 100;
// get sizes
System.out.print("Enter size of hash table: ");
size = getInt();
System.out.print("Enter initial number of items: ");
n = getInt();
// make table
HashTable theHashTable = new HashTable(size);
for(int j=0; j<n; j++) // insert data
{
aKey = (int)(java.lang.Math.random() *
keysPerCell * size);
aDataItem = new Link(aKey);
theHashTable.insert(aDataItem);
}
while(true) // interact with user
{
System.out.print("Enter first letter of ");
System.out.print("show, insert, delete, or find: ");
char choice = getChar();
switch(choice)
{
case 's':
theHashTable.displayTable();
break;
case 'i':
System.out.print("Enter key value to insert: ");
aKey = getInt();
aDataItem = new Link(aKey);
theHashTable.insert(aDataItem);
break;
case 'd':
System.out.print("Enter key value to delete: ");
aKey = getInt();
theHashTable.delete(aKey);
break;
case 'f':
System.out.print("Enter key value to find: ");
aKey = getInt();
aDataItem = theHashTable.find(aKey);
if(aDataItem != null)
System.out.println("Found " + aKey);
else
System.out.println("Could not find " + aKey);
break;
default:
System.out.print("Invalid entry\n");
} // end switch
} // end while
} // end main()
//--------------------------------------------------------------
public static String getString() throws IOException
{
InputStreamReader isr = new InputStreamReader(System.in);
BufferedReader br = new BufferedReader(isr);
String s = br.readLine();
return s;
}
//-------------------------------------------------------------
public static char getChar() throws IOException
{
String s = getString();
return s.charAt(0);
}
//-------------------------------------------------------------
public static int getInt() throws IOException
{
String s = getString();
return Integer.parseInt(s);
}
//--------------------------------------------------------------
} // end class HashChainApp
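Hash functions
1. Quick computation: a good hash function is cheap to compute, for example by using bit operations
2. Random keys: the keys are truly randomly distributed
3. Non-random keys
- Don't use non-data: leave out parts of the key that carry no information
- Use all the data: every part of the key that does carry information should contribute to the hash value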
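As one possible reading of the "bit operations" remark, the sketch below compares index computation by modulo with index computation by bit mask. The class and method names are hypothetical, and the mask variant rests on an assumption the notes do not state: the table size must be a power of two.

```java
// Sketch: two ways to reduce an int key to a table index.
class FastIndex {
    // General case: works for any positive table size. Math.floorMod keeps
    // the result non-negative even when the key is negative.
    static int byModulo(int key, int arraySize) {
        return Math.floorMod(key, arraySize);
    }

    // Bit-operation variant. ASSUMPTION: powerOfTwoSize is a power of two,
    // so (size - 1) is a mask of the low bits.
    static int byMask(int key, int powerOfTwoSize) {
        return key & (powerOfTwoSize - 1);
    }

    public static void main(String[] args) {
        System.out.println(byModulo(37, 16));   // prints 5
        System.out.println(byMask(37, 16));     // prints 5
        System.out.println(byMask(-7, 16));     // prints 9
    }
}
```

The mask keeps only the low bits of the key, so it relies on those bits being well distributed. For non-random keys that conflicts with "use all the data" unless the bits are mixed first.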
Hashing strings
Mathematical identity
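The notes do not spell out the identity. Assuming it is the nested-multiplication form known as Horner's method, for example 3*27^3 + 1*27^2 + 20*27 + 19 = ((3*27 + 1)*27 + 20)*27 + 19, a string hash can apply the modulo after every step so the running value stays small. The sketch below assumes lowercase words and the a=1 ... z=26 code; the class name is made up.

```java
// Sketch: hashing a lowercase word with Horner's method and a running modulo.
class StringHash {
    // Assumes arraySize is small enough that (arraySize * 27 + 26) fits in an int.
    static int hashFunc(String key, int arraySize) {
        int hashVal = 0;
        for (int j = 0; j < key.length(); j++) {
            int letter = key.charAt(j) - 'a' + 1;            // a=1 ... z=26
            hashVal = (hashVal * 27 + letter) % arraySize;   // Horner step, then mod
        }
        return hashVal;
    }

    public static void main(String[] args) {
        System.out.println(hashFunc("cats", 100));      // prints 37
        System.out.println(hashFunc("cats", 100000));   // prints 60337
    }
}
```

With an array size of 100 the result is 37, the same as 60337 % 100, but the full value 60337 is never computed. For long words that is what keeps the intermediate result from overflowing.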