Algs4 - Hash Tables and Symbol Table Applications

焦下鹿

Category: Data Structure and Algorithms

Article link: https://blog.csdn.net/qq_46105155/article/details/105321689

Hash Tables and Symbol Table Applications

Hash Tables

Save items in a key-indexed table (the index is a function of the key).
Issues:
- Computing the hash function.
- Equality test: a method for checking whether two keys are equal.
- Collision resolution: an algorithm and data structure to handle two keys that hash to the same array index.

Hash functions

Idealistic goal: Scramble the keys uniformly to produce a table index

Java’s hash code conventions
- All Java classes inherit a method hashCode(), which returns a 32-bit signed int.
- Requirement. If x.equals(y), then x.hashCode() == y.hashCode().
- Highly desirable. If !x.equals(y), then x.hashCode() != y.hashCode().
- Default implementation. An identity-based hash code, typically derived from the memory address of x.
- Customized implementations. Integer, Double, String, File, URL, Date, …
- User-defined types. Override hashCode() yourself (together with equals()).
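The two conventions above can be checked directly on library types. This is a small illustrative sketch (the class name HashContractDemo is hypothetical): two distinct String objects with the same contents must share a hash code, while two unequal Long values can still collide, because Long.hashCode() folds 64 bits into 32 with (int) (v ^ (v >>> 32)).

```java
// Checks the two hashCode() conventions on standard library types.
class HashContractDemo {
    // Requirement: equal objects must have equal hash codes.
    static boolean equalStringsShareHash() {
        String a = "hello";
        String b = new String("hel") + "lo";   // a distinct object with the same contents
        return a.equals(b) && a != b && a.hashCode() == b.hashCode();
    }

    // "Highly desirable" is not guaranteed: 0L and -1L are unequal Longs,
    // but (int) (v ^ (v >>> 32)) is 0 for both of them.
    static boolean unequalLongsCollide() {
        Long x = 0L;
        Long y = -1L;
        return !x.equals(y) && x.hashCode() == y.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(equalStringsShareHash());  // true
        System.out.println(unequalLongsCollide());    // true
    }
}
```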

Implementing hash code

Integer

public final class Integer {
    private final int value;
    ...
    public int hashCode() { return value; }
}

Boolean

public final class Boolean {
    private final boolean value;
    ...
    public int hashCode() {
        if (value) return 1231;
        else return 1237;
    }
}

Double

public final class Double {
    private final double value;
    ...
    public int hashCode() {
        long bits = doubleToLongBits(value);   // 64-bit IEEE 754 representation
        return (int) (bits ^ (bits >>> 32));
    }
}
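The Integer, Boolean and Double implementations above can be compared against the real wrapper classes. A minimal sketch (WrapperHashDemo and doubleHash are hypothetical names) that re-derives Double's formula and checks it against Double.hashCode():

```java
class WrapperHashDemo {
    // Re-implements Double's documented formula: XOR the high and low 32 bits of the bit pattern.
    static int doubleHash(double d) {
        long bits = Double.doubleToLongBits(d);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        System.out.println(Integer.valueOf(42).hashCode());  // 42: the value itself
        System.out.println(Boolean.TRUE.hashCode());         // 1231
        System.out.println(Boolean.FALSE.hashCode());        // 1237
        for (double d : new double[] {0.0, 1.5, -2.25, Math.PI}) {
            // the hand-written formula agrees with the library
            System.out.println(doubleHash(d) == Double.valueOf(d).hashCode());  // true
        }
    }
}
```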

String

public final class String {
    private final char[] s;
    ...
    public int hashCode() {
        int hash = 0;
        for (int i = 0; i < s.length; i++)   // s is a char[]: length is a field, not a method
            hash = s[i] + (31 * hash);       // Horner's method with base 31
        return hash;
    }
}
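The loop above is Horner's method for the polynomial s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], which is how java.lang.String documents its hashCode(). A sketch (hypothetical class StringHashDemo) that checks a hand-written version against the library; it uses charAt() because the real String does not expose its internal array:

```java
class StringHashDemo {
    // Horner's method with int arithmetic (overflow wraps around, as in String.hashCode()).
    static int hornerHash(String s) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++)
            hash = s.charAt(i) + (31 * hash);
        return hash;
    }

    public static void main(String[] args) {
        String[] words = {"", "call", "polygenelubricants", "Hash Tables"};
        for (String w : words)
            System.out.println(hornerHash(w) == w.hashCode());  // true for every word
        System.out.println("call".hashCode());                  // 3045982
    }
}
```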

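String is immutable: "changing" a string actually creates a new String object and assigns it to the reference (see Java basics). Immutability is also what allows String to compute its hash code once and cache it.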

User-defined types

public final class Transaction implements Comparable<Transaction> {
    private final String who;
    private final Date when;
    private final double amount;
    ...
    public int hashCode() {
        int hash = 17;    // nonzero constant
        hash = 31*hash + who.hashCode();  // for reference types, use hashCode().
        hash = 31*hash + when.hashCode();
        hash = 31*hash + ((Double) amount).hashCode(); // for primitive types, use hashCode() of wrapper type.
        return hash;
    }
}

Standard recipe for user-defined types
- Combine each significant field using the 31x + y rule.
- If a field is a primitive type, use the wrapper type's hashCode().
- If a field is null, use 0.
- If a field is a reference type, use its hashCode().
- If a field is an array, apply the rule to each entry, or use Arrays.deepHashCode().
In practice, this recipe works reasonably well and is used in the Java libraries.
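Applying the recipe to a type with a nullable field, a primitive field and an array field might look like the sketch below. Student and its fields are hypothetical, and courses is assumed non-null; equals() is written to agree with hashCode(), which is what the Requirement above demands.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical type: nullable reference field, primitive field, array field.
final class Student {
    private final String name;       // may be null
    private final int year;          // primitive field
    private final String[] courses;  // array field (assumed non-null)

    Student(String name, int year, String[] courses) {
        this.name = name;
        this.year = year;
        this.courses = courses.clone();   // defensive copy keeps the object immutable
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Student)) return false;
        Student that = (Student) other;
        return Objects.equals(name, that.name)
            && year == that.year
            && Arrays.deepEquals(courses, that.courses);
    }

    @Override
    public int hashCode() {
        int hash = 17;                                            // nonzero start
        hash = 31 * hash + (name == null ? 0 : name.hashCode());  // null field -> 0
        hash = 31 * hash + Integer.hashCode(year);                // primitive -> wrapper's hash
        hash = 31 * hash + Arrays.deepHashCode(courses);          // array -> combine entries
        return hash;
    }

    public static void main(String[] args) {
        Set<Student> set = new HashSet<>();
        set.add(new Student("Alice", 2, new String[] {"COS226", "MAT201"}));
        // a distinct but equal object is found because equals() and hashCode() agree
        System.out.println(set.contains(new Student("Alice", 2, new String[] {"COS226", "MAT201"})));  // true
        System.out.println(set.contains(new Student(null, 2, new String[] {})));                      // false
    }
}
```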

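Modular hashing. Convert the hash code (a 32-bit int that may be negative) into an array index between 0 and M-1. The correct code is:

private int hash(Key key) {
    // 0x7fffffff has every bit set except the leftmost one (the sign bit).
    // key.hashCode() & 0x7fffffff clears the sign bit, so the value is nonnegative
    // and the remainder is in the range [0, M-1].
    return (key.hashCode() & 0x7fffffff) % M;
}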
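Why mask with 0x7fffffff instead of calling Math.abs()? Math.abs(Integer.MIN_VALUE) overflows and returns Integer.MIN_VALUE itself, so a key whose hash code happens to be -2^31 would produce a negative index. A small sketch on raw int hash codes (ModularHashDemo, hash and buggyHash are hypothetical names):

```java
class ModularHashDemo {
    // The recommended conversion: clear the sign bit, then take the remainder.
    static int hash(int hashCode, int m) {
        return (hashCode & 0x7fffffff) % m;
    }

    // A tempting but buggy alternative: Math.abs(Integer.MIN_VALUE) is still negative,
    // and Java's % keeps the sign of the dividend.
    static int buggyHash(int hashCode, int m) {
        return Math.abs(hashCode) % m;
    }

    public static void main(String[] args) {
        int m = 97;
        int worst = Integer.MIN_VALUE;             // -2^31
        System.out.println(buggyHash(worst, m));   // -66: not a valid array index
        System.out.println(hash(worst, m));        // 0
        System.out.println(hash(-1, m));           // 65, i.e. 0x7fffffff % 97
    }
}
```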

Integer Literals Details
- Decimal: as usual, positive or negative (with a leading -).
- Octal: a leading 0 (zero) followed by one or more octal digits (0 to 7), e.g. 077, which is 63.
- Hexadecimal: 0x (or 0X) followed by one or more hexadecimal digits (0 to 9, a to f or A to F). For example, 0xCAFEBABEL is the long integer 3405691582. Like octal literals, hexadecimal literals can represent negative numbers: as an int, 0xFFFFFFFF is -1.
- For example, after int i1 = 0x7fffffff;, Integer.toBinaryString(i1) returns 31 consecutive 1s. The sign bit is 0, but toBinaryString() omits leading zeros, so that 0 is not printed.
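The literal forms above can be checked in a few lines. LiteralDemo is a hypothetical name; the printed values follow from Java's rules for integer literals.

```java
class LiteralDemo {
    public static void main(String[] args) {
        int octal = 077;              // 7*8 + 7
        int mask = 0x7fffffff;        // Integer.MAX_VALUE
        long cafe = 0xCAFEBABEL;      // the trailing L makes it a long literal
        int cafeInt = 0xCAFEBABE;     // same 32 bits as an int: sign bit set, so negative
        System.out.println(octal);                                  // 63
        System.out.println(mask == Integer.MAX_VALUE);              // true
        System.out.println(cafe);                                   // 3405691582
        System.out.println(cafeInt);                                // -889275714
        System.out.println(Integer.toBinaryString(mask).length());  // 31: leading zero omitted
    }
}
```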

Separate Chaining

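Use an array of M < N linked lists, where N is the total number of keys and M is the number of hash buckets (chains). A key always goes into chain hash(key), so a search only has to scan that one chain.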
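To see what "an array of M linked lists" means in numbers, this sketch counts how many keys land in each chain, using the same modular hash as hash() in SeparateChainingHashST. The keys "key0", "key1", ... are hypothetical stand-ins, and ChainLengthDemo is a hypothetical name. The average chain length is exactly N/M; under the uniform hashing assumption the longest chain is typically not far above that average.

```java
class ChainLengthDemo {
    // Counts how many of the n hypothetical keys "key0".."key{n-1}" land in each of m chains.
    static int[] chainLengths(int n, int m) {
        int[] length = new int[m];
        for (int k = 0; k < n; k++) {
            String key = "key" + k;
            int i = (key.hashCode() & 0x7fffffff) % m;   // modular hashing, as in hash()
            length[i]++;
        }
        return length;
    }

    public static void main(String[] args) {
        int n = 10_000, m = 97;
        int[] length = chainLengths(n, m);
        int min = Integer.MAX_VALUE, max = 0;
        for (int len : length) {
            min = Math.min(min, len);
            max = Math.max(max, len);
        }
        System.out.printf("N/M = %.1f, shortest chain = %d, longest chain = %d%n",
                          (double) n / m, min, max);
    }
}
```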

Implementation SeparateChainingHashST.java

public class SeparateChainingHashST<Key, Value> {
    private int M = 97;               // number of chains
    private Node[] st = new Node[M];  // array of chains

    private static class Node {
        private Object key;   // no generic array creation
        private Object val;   // (declare key and value of type Object)
        private Node next;

        Node(Object key, Object val, Node next) {
            this.key = key;
            this.val = val;
            this.next = next;
        }
    }

    private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % M;
    }

    @SuppressWarnings("unchecked")
    public Value get(Key key) {
        int i = hash(key);
        for (Node x = st[i]; x != null; x = x.next) {
            if (key.equals(x.key)) return (Value) x.val;  // cast back to Value type
        }
        return null;   // search miss
    }

    public void put(Key key, Value val) {
        int i = hash(key);
        for (Node x = st[i]; x != null; x = x.next) {
            if (key.equals(x.key)) { x.val = val; return; }  // key present: update value
        }
        st[i] = new Node(key, val, st[i]);  // key absent: add the k-v pair at the front of chain st[i]
    }
}

Linear Probing (Open addressing)

Open addressing. When a new key collides, find the next empty slot and put it there.
Linear probing process:
- Hash. Map the key to an integer i between 0 and M-1.
- Insert. Put it at table index i if free; if not, try i+1, i+2, etc. (wrapping around at the end).
- Search. Search table index i; if occupied but no match, try i+1, i+2, etc., until an empty slot is found (search miss).
- Note: the array size M must be greater than the number of key-value pairs N.

Implementation LinearProbingHashST

public class LinearProbingHashST<Key, Value> {
    private int M = 30001;
    @SuppressWarnings("unchecked")
    private Value[] vals = (Value[]) new Object[M];   // array doubling and halving code omitted
    @SuppressWarnings("unchecked")
    private Key[] keys = (Key[]) new Object[M];

    private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % M;   // as before
    }

    public void put(Key key, Value val) {
        int i;
        for (i = hash(key); keys[i] != null; i = (i + 1) % M)
            if (keys[i].equals(key))
                break;            // key already present: overwrite its value
        keys[i] = key;            // otherwise i is the first empty slot
        vals[i] = val;
    }

    public Value get(Key key) {
        for (int i = hash(key); keys[i] != null; i = (i + 1) % M)
            if (keys[i].equals(key))
                return vals[i];
        return null;              // reached an empty slot: search miss
    }
}

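Clustering. A cluster is a contiguous block of occupied slots. New keys are likely to hash into the middle of big clusters, which then grow even bigger.
Knuth's parking problem. Cars arrive at a one-way street with M parking spaces; each car wants a random space i and, if it is taken, tries i+1, i+2, etc. How far does a car have to drive? This is the same question as the cost of linear probing.
Analysis of linear probing
- M too large --> too many empty array entries.
- M too small --> search time blows up.
- Typical choice: alpha = N / M ~ 1/2. Then a search hit takes about 3/2 probes and a search miss (or insert) about 5/2 probes. (Knuth)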
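The 3/2 and 5/2 figures come from Knuth's estimates for load factor alpha = N/M: about 1/2 (1 + 1/(1 - alpha)) probes for a search hit and 1/2 (1 + 1/(1 - alpha)^2) probes for a search miss. A sketch that evaluates them (LinearProbingCost is a hypothetical name):

```java
class LinearProbingCost {
    // Knuth's approximations for linear probing, valid for 0 < alpha < 1.
    static double searchHit(double alpha) {
        return 0.5 * (1 + 1 / (1 - alpha));
    }

    static double searchMiss(double alpha) {
        return 0.5 * (1 + 1 / ((1 - alpha) * (1 - alpha)));
    }

    public static void main(String[] args) {
        for (double alpha : new double[] {0.25, 0.5, 0.75, 0.9}) {
            System.out.printf("alpha = %.2f  hit ~ %.2f  miss ~ %.2f%n",
                              alpha, searchHit(alpha), searchMiss(alpha));
        }
    }
}
```

At alpha = 0.5 these give 1.5 and 2.5; at alpha = 0.9 a search miss already costs about 50 probes, which is why M should not be too small.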

Hash Table Context

War story: algorithmic complexity attacks
- Surprising situations: denial-of-service attacks. A malicious adversary learns your hash function and causes a big pile-up in a single slot that grinds performance to a halt.
Algorithmic complexity attack on Java
- Goal. Find a family of strings with the same hash code.
- Solution. The base-31 hash code is part of Java's String API, so an attacker knows exactly how strings are hashed.
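Because the formula is public, colliding strings are easy to construct. "Aa" and "BB" both hash to 2112 (65*31 + 97 = 66*31 + 66), and since the hash of a concatenation depends only on the hashes and lengths of its parts, every string built from n such two-character blocks has the same hash code, giving 2^n colliding keys. A sketch (CollidingStrings is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.List;

class CollidingStrings {
    // Returns all 2^n strings made of n blocks, each block "Aa" or "BB".
    // All of them share one hash code under String.hashCode().
    static List<String> collisions(int n) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int round = 0; round < n; round++) {
            List<String> next = new ArrayList<>();
            for (String s : result) {
                next.add(s + "Aa");
                next.add(s + "BB");
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : collisions(3))
            System.out.println(s + "  " + s.hashCode());   // eight strings, one hash code
    }
}
```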
Diversion: one-way hash functions
- One-way hash function. “Hard” to find a key that will hash to a desired value (or two keys that hash to same value).
- Applications. Digital fingerprint, message digest, storing password.
- Caveat. Too expensive for use in ST implementations.
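For comparison, a cryptographic one-way hash is available in the JDK through java.security.MessageDigest, and every Java platform is required to support SHA-256. This sketch (DigestDemo is a hypothetical name) computes a hex digest; the 256-bit output takes far more computation than String.hashCode(), which illustrates the caveat above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestDemo {
    // Returns the SHA-256 digest of a string as lowercase hex.
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest)
                sb.append(String.format("%02x", b));   // two hex digits per byte
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);        // SHA-256 is mandatory, so not expected
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex("abc"));
        // ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad (standard test vector)
    }
}
```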
Hashing: variations on the theme
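- Two-probe hashing (separate-chaining variant): hash to two positions and insert the key in the shorter of the two chains.
- Double hashing (linear-probing variant): use linear probing, but skip a variable amount each time instead of just 1.
- Cuckoo hashing (linear-probing variant): hash each key to two positions and insert it into either one; if that slot is occupied, move the displaced key to its alternative position (and repeat).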
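Double hashing can be sketched as a small change to the linear-probing table: the probe step depends on the key. This is a minimal sketch under stated assumptions: M is prime, the step is 1 + (hashCode mod (M - 1)) so it is never 0 and eventually visits every slot, and the table never fills up (no resizing or deletion). DoubleHashingST is a hypothetical name.

```java
// Minimal double-hashing symbol table (sketch: no resizing, no deletion).
class DoubleHashingST<Key, Value> {
    private final int m;            // table size, assumed prime
    private final Object[] keys;
    private final Object[] vals;

    DoubleHashingST(int m) {
        this.m = m;
        keys = new Object[m];
        vals = new Object[m];
    }

    private int hash(Key key) {     // first probe position in [0, m-1]
        return (key.hashCode() & 0x7fffffff) % m;
    }

    private int step(Key key) {     // probe increment in [1, m-1], never 0
        return 1 + (key.hashCode() & 0x7fffffff) % (m - 1);
    }

    public void put(Key key, Value val) {
        int i = hash(key), s = step(key);
        while (keys[i] != null && !keys[i].equals(key))
            i = (i + s) % m;        // skip s slots instead of 1
        keys[i] = key;
        vals[i] = val;
    }

    @SuppressWarnings("unchecked")
    public Value get(Key key) {
        int i = hash(key), s = step(key);
        while (keys[i] != null) {
            if (keys[i].equals(key)) return (Value) vals[i];
            i = (i + s) % m;
        }
        return null;                // empty slot: search miss
    }

    public static void main(String[] args) {
        DoubleHashingST<String, Integer> st = new DoubleHashingST<>(97);
        String[] words = {"it", "was", "the", "best", "of", "times"};
        for (int k = 0; k < words.length; k++) st.put(words[k], k);
        System.out.println(st.get("best"));   // 3
        System.out.println(st.get("worst"));  // null
    }
}
```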

Hash tables vs. balanced search trees (in Java)

Hash tables
- Simpler to code
- No effective alternative for unordered keys.
- Faster for simple keys (a few arithmetic operations versus log N compares).
- Better system support in Java for strings (e.g., cached hash code).
Balanced search trees
- Stronger performance guarantee.
- Support for ordered ST operations.
- Easier to implement compareTo() correctly than equals() and hashCode().
Java includes both:
- Red-black BSTs: java.util.TreeMap, java.util.TreeSet.
- Hash tables: java.util.HashMap, java.util.HashSet, java.util.IdentityHashMap, java.util.LinkedHashSet
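The practical difference shows up in the API: java.util.TreeMap (a red-black BST) supports ordered operations such as firstKey(), floorKey(), ceilingKey() and headMap(), while java.util.HashMap supports only the unordered ones. A small sketch (OrderedOpsDemo is a hypothetical name):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class OrderedOpsDemo {
    public static void main(String[] args) {
        int[] keys = {50, 10, 40, 20, 30};
        TreeMap<Integer, String> tree = new TreeMap<>();   // red-black BST
        Map<Integer, String> hash = new HashMap<>();        // hash table
        for (int k : keys) {
            tree.put(k, "v" + k);
            hash.put(k, "v" + k);
        }
        // Both support get/put/containsKey:
        System.out.println(tree.get(30) + " " + hash.get(30));  // v30 v30
        // Only the BST supports ordered operations:
        System.out.println(tree.firstKey());             // 10
        System.out.println(tree.floorKey(35));           // 30: largest key <= 35
        System.out.println(tree.ceilingKey(35));         // 40: smallest key >= 35
        System.out.println(tree.headMap(30).keySet());   // [10, 20]: keys < 30
        System.out.println(tree.keySet());               // [10, 20, 30, 40, 50]: sorted
    }
}
```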

Symbol Table Applications

`Set` API

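Mathematical set: a collection of distinct keys.
Exception filter: read a list of words from a file, then print the words from standard input that are in the list (whitelist) or that are not in the list (blacklist).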
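An exception filter takes only a few lines with a hash-based set. In this sketch the class ExceptionFilter, the method filter() and the word lists are hypothetical; it uses java.util.HashSet instead of the algs4 SET so that it runs without the algs4 library.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ExceptionFilter {
    // Keeps the words that are in the list (whitelist = true) or not in it (whitelist = false).
    static List<String> filter(List<String> words, Set<String> list, boolean whitelist) {
        List<String> out = new ArrayList<>();
        for (String w : words)
            if (list.contains(w) == whitelist) out.add(w);
        return out;
    }

    public static void main(String[] args) {
        Set<String> list = new HashSet<>(List.of("was", "it", "the"));
        List<String> input = List.of("it", "was", "the", "best", "of", "times");
        System.out.println(filter(input, list, true));   // [it, was, the]
        System.out.println(filter(input, list, false));  // [best, of, times]
    }
}
```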

[Figure: symbol table applications (image not available)]

Dictionary lookup

Command-line arguments:
- a comma-separated value (CSV) file;
- the key field;
- the value field.
Pick any field as the key and any field as the value.
Ex 1. DNS lookup (domain name to IP address, or the reverse)
Ex 2. Amino acids (codon to amino acid)
Ex 3. Class list
LookupCSV.java

import edu.princeton.cs.algs4.In;
import edu.princeton.cs.algs4.ST;
import edu.princeton.cs.algs4.StdIn;
import edu.princeton.cs.algs4.StdOut;

public class LookupCSV {
    public static void main(String[] args) {
        // process input file
        In in = new In(args[0]);
        int keyField = Integer.parseInt(args[1]);
        int valField = Integer.parseInt(args[2]);

        // build symbol table
        ST<String, String> st = new ST<>();
        while (!in.isEmpty()) {
            String line = in.readLine();
            String[] tokens = line.split(",");
            String key = tokens[keyField];
            String val = tokens[valField];
            st.put(key, val);
        }

        // process lookup queries from standard input
        while (!StdIn.isEmpty()) {
            String s = StdIn.readString();
            if (!st.contains(s)) StdOut.println("Not found");
            else StdOut.println(st.get(s));
        }
    }
}
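LookupCSV needs the algs4 library and a data file. The same dictionary-lookup idea with java.util.HashMap and in-memory CSV lines is sketched below; CodonLookup and build() are hypothetical names, and the rows are illustrative ones in the spirit of Ex 2 (codon, abbreviation, full name).

```java
import java.util.HashMap;
import java.util.Map;

class CodonLookup {
    // Same idea as LookupCSV: any field as key, any field as value (later rows overwrite earlier ones).
    static Map<String, String> build(String[] lines, int keyField, int valField) {
        Map<String, String> st = new HashMap<>();
        for (String line : lines) {
            String[] tokens = line.split(",");
            st.put(tokens[keyField], tokens[valField]);
        }
        return st;
    }

    public static void main(String[] args) {
        // codon, 3-letter abbreviation, amino acid name
        String[] lines = {
            "TTT,Phe,Phenylalanine",
            "TTC,Phe,Phenylalanine",
            "ATG,Met,Methionine",
            "TGG,Trp,Tryptophan",
        };
        Map<String, String> byCodon = build(lines, 0, 2);   // key field 0, value field 2
        for (String query : new String[] {"ATG", "TGG", "XYZ"}) {
            String val = byCodon.get(query);
            System.out.println(val == null ? "Not found" : val);
        }
    }
}
```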

Indexing Clients

File indexing
- Goal. Index a PC (or the web). Given a list of files specified on the command line, create an index so that you can efficiently find all files containing a given query string.
- Solution. Key = query string; value = set of files containing that string.
- Book index: a similar client that indexes the words of an e-book.
- Implementation FileIndex.java

        /*****************************************************************************
        Compilation:  javac FileIndex.java
        Execution:    java FileIndex file1.txt file2.txt file3.txt ...
        Dependencies: ST.java SET.java In.java StdIn.java StdOut.java
        Data files:   https://algs4.cs.princeton.edu/35applications/ex1.txt
                        https://algs4.cs.princeton.edu/35applications/ex2.txt
                        https://algs4.cs.princeton.edu/35applications/ex3.txt
                        https://algs4.cs.princeton.edu/35applications/ex4.txt
        
        % java FileIndex ex*.txt
        age
            ex3.txt
            ex4.txt
        best
            ex1.txt
        was
            ex1.txt
            ex2.txt
            ex3.txt
            ex4.txt
        
        % java FileIndex *.txt
        
        % java FileIndex *.java
        
        *****************************************************************************/
import edu.princeton.cs.algs4.*;
import java.io.File;
public class FileIndex {
    public static void main(String[] args) {
        ST<String, SET<File>> st = new ST<String, SET<File>>();

        for (String filename : args) {
            // list of file names from command line
            File file = new File(filename);
            In in = new In(file);
            while (!in.isEmpty()) {
                String key = in.readString();
                if (!st.contains(key))
                    st.put(key, new SET<File>());
                SET<File> set = st.get(key);
                set.add(file);
            }
        }

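        // read queries from standard input; print the files that contain each query
        while (!StdIn.isEmpty()) {
            String query = StdIn.readString();
            if (st.contains(query)) {
                SET<File> set = st.get(query);
                for (File file : set)
                    StdOut.println("  " + file.getName());
            }
        }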
    }
}

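Concordance: similar to the file index, but for each occurrence of the query string it also prints the surrounding words (its context); see the lecture slides.
Index and inverted index: LookupIndex.java.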
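An inverted index swaps the roles of keys and values: if the index maps each key to a set of values, the inverted index maps each value to the set of keys that contain it. This sketch shows only that inversion step with java.util maps; it is not the code of LookupIndex.java, and InvertedIndex, invert() and the movie/performer data are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class InvertedIndex {
    // Given key -> set of values, builds the inverted map value -> set of keys.
    static Map<String, TreeSet<String>> invert(Map<String, TreeSet<String>> index) {
        Map<String, TreeSet<String>> inverted = new TreeMap<>();
        for (Map.Entry<String, TreeSet<String>> e : index.entrySet())
            for (String val : e.getValue())
                inverted.computeIfAbsent(val, k -> new TreeSet<>()).add(e.getKey());
        return inverted;
    }

    public static void main(String[] args) {
        // hypothetical data: movie -> performers
        Map<String, TreeSet<String>> index = new TreeMap<>();
        index.put("movie1", new TreeSet<>(List.of("alice", "bob")));
        index.put("movie2", new TreeSet<>(List.of("bob", "carol")));
        System.out.println(invert(index));
        // {alice=[movie1], bob=[movie1, movie2], carol=[movie2]}
    }
}
```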

Sparse vector

Symbol table representation of a sparse vector
- Key = index, value = entry.
- Efficient iterator.
- Space proportional to the number of nonzeros.
Implementation SparseVector.java

import edu.princeton.cs.algs4.StdOut;

import java.util.HashMap;

public class SparseVector {
    private HashMap<Integer, Double> v;  // hash table rather than BST: the order of the indices does not matter

    public SparseVector() { v = new HashMap<>(); }

    public void put(int i, double x) { v.put(i, x); }

    public double get(int i) {
        if (!v.containsKey(i)) return 0.0;
        else return v.get(i);
    }

    public Iterable<Integer> indices() {
        // return all nonzero indices
        return v.keySet();
    }

    public double dot(double[] that) {
        double sum = 0.0;
        for (int i : indices())
            sum += that[i]*this.get(i);
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {1, 0, 0, 0, 0, 1, 0, 2};
        SparseVector sv = new SparseVector();
        for (int i = 0; i < a.length; i++) {
            if (a[i] != 0) sv.put(i, a[i]);
        }

        double[] b = {1, 1, 1, 1, 1, 1, 1, 1};
        StdOut.println(sv.dot(b));   // 1 + 1 + 2 = 4.0
    }
}
