Algs4 - Hash Tables and Symbol Table Applications

Hash Tables and Symbol Table Applications

Hash Tables

  • Save items in a key-indexed table (index is a function of the key)
  • Issues.
    • Computing the hash function
    • Equality test: Method for checking whether two keys are equal.
    • Collision resolution: Algorithm and data structure to handle two keys that hash to the same array index

Hash functions

Idealistic goal: Scramble the keys uniformly to produce a table index

  • Java’s hash code conventions

    • All Java classes inherit a method hashCode(), which returns a 32-bit signed int.
    • Requirement. If x.equals(y), then x.hashCode() == y.hashCode().
    • Highly desirable. If !x.equals(y), then x.hashCode() != y.hashCode().
    • Default implementation. Typically derived from the memory address of x.
    • Customized implementation. Integer, Double, String, File, URL, Date, …
    • User defined
  • Implementing hash code

    • Integer
    public final class Integer {
        private final int value;
        ...
        public int hashCode() { return value; }
    }
    
    • Boolean
    public final class Boolean {
        private final boolean value;
        ...
        public int hashCode() {
            if (value) return 1231;
            else return 1237;
        }
    }
    
    • Double
    public final class Double {
        private final double value;
        ...
        public int hashCode() {
            long bits = doubleToLongBits(value);
            return (int) (bits ^ (bits >>> 32));
        }
    }
    
    • String
    public final class String {
        private final char[] s;
        ...
        public int hashCode() {
          int hash = 0;
          for (int i = 0; i < s.length; i++)   // s is a char[], so use the length field
              hash = s[i] + (31 * hash);
          return hash;
        }
    }
    
    • String is immutable: when the contents need to change, a new String object is created and assigned to the reference. See Java basics.
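
The String hash above is Horner's method applied to the characters with base 31. As a minimal sketch, the loop can be recomputed by hand and compared with `String.hashCode()`. The class name `HornerHash` is a hypothetical helper for illustration, not part of the JDK.

```java
// Hypothetical helper for illustration; not part of the JDK.
class HornerHash {
    // Horner's method: hash = s[0]*31^(L-1) + ... + s[L-1], with int overflow wrapping around.
    static int hornerHash(String s) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++)
            hash = s.charAt(i) + (31 * hash);
        return hash;
    }

    public static void main(String[] args) {
        String s = "call";
        System.out.println(hornerHash(s));                  // 3045982
        System.out.println(hornerHash(s) == s.hashCode());  // true
    }
}
```

For "call" the sum is 99·31³ + 97·31² + 108·31 + 108 = 3045982.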
  • User-defined types

    public final class Transaction implements Comparable<Transaction> {
        private final String who;
        private final Date when;
        private final double amount;
        ...
        public int hashCode() {
            int hash = 17;    // nonzero constant
            hash = 31*hash + who.hashCode();  // for reference types, use hashCode().
            hash = 31*hash + when.hashCode();
            hash = 31*hash + ((Double) amount).hashCode(); // for primitive types, use hashCode() of wrapper type.
            return hash;
        }
    }
    
  • Standard recipe for user-defined types

    • Combine each significant field using the 31x+y rule.
    • If field is primitive type, use wrapper type hashCode().
    • If field is null, use 0.
    • If field is a reference type, use hashCode().
    • If field is an array, apply to each entry, or use Arrays.deepHashCode()
  • In practice. This recipe works reasonably well; used in Java libraries.
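
The recipe can be sketched on a type that has one field of each kind it mentions. Everything here is an assumption made for illustration: the class `Reading` and its fields are hypothetical, and `Arrays.hashCode` is used for the array because the entries are primitives (the recipe's `Arrays.deepHashCode` is for arrays of objects).

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical type for illustration: one field of each kind named in the recipe.
final class Reading {
    private final String sensor;     // reference type, may be null
    private final long timestamp;    // primitive type
    private final double[] samples;  // array of primitives

    Reading(String sensor, long timestamp, double[] samples) {
        this.sensor = sensor;
        this.timestamp = timestamp;
        this.samples = samples.clone();
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Reading)) return false;
        Reading that = (Reading) other;
        return Objects.equals(sensor, that.sensor)
            && timestamp == that.timestamp
            && Arrays.equals(samples, that.samples);
    }

    @Override
    public int hashCode() {
        int hash = 17;                                                // nonzero constant
        hash = 31 * hash + (sensor == null ? 0 : sensor.hashCode()); // null field contributes 0
        hash = 31 * hash + Long.hashCode(timestamp);                  // wrapper-type hash of a primitive
        hash = 31 * hash + Arrays.hashCode(samples);                  // combines every array entry
        return hash;
    }
}
```

Using the same fields in equals() and hashCode() keeps the requirement that equal objects have equal hash codes.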

  • Modular hashing. The correct code is

    private int hash(Key key) {
        // 0x7fffffff has all 32 bits set to 1 except the first bit (the sign bit), which is 0.
        // key.hashCode() & 0x7fffffff clears the sign bit, so a negative hash code becomes nonnegative.
        return (key.hashCode() & 0x7fffffff) % M;
    }
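
To see why the mask is needed, here is a small sketch comparing three ways of reducing a hash code to a table index. The class and method names are hypothetical; the hash code is passed in directly so that the worst case, `Integer.MIN_VALUE`, can be tried.

```java
// Hypothetical helper for illustration.
class ModularHash {
    // Bug: the % operator keeps the sign of the left operand, so the index can be negative.
    static int naive(int h, int m)   { return h % m; }

    // Still a bug: Math.abs(Integer.MIN_VALUE) overflows and stays negative.
    static int withAbs(int h, int m) { return Math.abs(h) % m; }

    // Correct: clearing the sign bit always yields a value in [0, m).
    static int masked(int h, int m)  { return (h & 0x7fffffff) % m; }

    public static void main(String[] args) {
        int m = 97;
        int h = Integer.MIN_VALUE;
        System.out.println(naive(h, m) < 0);    // true
        System.out.println(withAbs(h, m) < 0);  // true
        System.out.println(masked(h, m));       // 0
    }
}
```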
    
  • Integer Literals Details

    • Decimal, as usual, positive or negative (-)
    • Octal number, using a leading 0 (zero) digit and one or more additional octal digits (digits between 0 and 7), eg. 077
    • Hexadecimal, using the form 0x (or 0X) followed by one or more hexadecimal digits (digits from 0 to 9, a to f or A to F). For example, 0xCAFEBABEL is the long integer 3405691582. Like octal numbers, hexadecimal literals may represent negative numbers.
    • For example. int i1 = 0x7fffffff; Integer.toBinaryString(i1) outputs 31 ones (1111111111111111111111111111111). Leading zeros are not printed, so the 0 sign bit does not appear.
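
The literal forms above can be checked with a few constants. This is only a sketch, and the class and field names are hypothetical.

```java
// Hypothetical constants for illustration.
class IntegerLiterals {
    static final int OCTAL = 077;               // octal 77 = decimal 63
    static final int MASK = 0x7fffffff;         // same value as Integer.MAX_VALUE
    static final int NEGATIVE_HEX = 0xCAFEBABE; // as an int the sign bit is set, so the value is negative
    static final long LONG_HEX = 0xCAFEBABEL;   // the trailing L makes it the long 3405691582

    public static void main(String[] args) {
        System.out.println(OCTAL);                                  // 63
        System.out.println(MASK == Integer.MAX_VALUE);              // true
        System.out.println(Integer.toBinaryString(MASK).length());  // 31
        System.out.println(NEGATIVE_HEX < 0);                       // true
        System.out.println(LONG_HEX);                               // 3405691582
    }
}
```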

Separate Chaining

  • Use an array of M < N linked lists. N is the total number of keys, M is the number of hash buckets (chains).
  • Implementation SeparateChainingHashST.java
    public class SeparateChainingHashST<Key, Value> {
        private int M = 97;               // number of chains
        private Node[] st = new Node[M];  // array of chains
        
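        private static class Node {
            private Object key;   // no generic array creation
            private Object val;   // (declare key and value of type Object)
            private Node next;
            // constructor used by put() to prepend a new node to a chain
            Node(Object key, Object val, Node next) { this.key = key; this.val = val; this.next = next; }
        }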
        private int hash(Key key) {
            return (key.hashCode() & 0x7fffffff) % M;
        }
        public Value get(Key key) {
            int i = hash(key);
            for (Node x = st[i]; x != null; x = x.next) {
                if (key.equals(x.key)) return (Value) x.val;  // cast to Value type
            }
            return null;
        }
        public void put(Key key, Value val) {
            int i = hash(key);
            for (Node x = st[i]; x != null; x = x.next) {
                if (key.equals(x.key)) { x.val = val; return; }
            }
            st[i] = new Node(key, val, st[i]); // add the k-v pair to st[i] chain.
        }
    } 
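
For experimenting, a self-contained variant of separate chaining might look like the sketch below. It is not the book's class: the name `ChainingDemo`, the generic `Node`, and the `size()` counter are assumptions made here so the example compiles on its own without casts in `get()`.

```java
// Sketch only: a fixed number of chains, no resizing, no delete.
class ChainingDemo<K, V> {
    private static final int M = 97;   // number of chains

    private static class Node<K, V> {
        final K key;
        V val;
        final Node<K, V> next;
        Node(K key, V val, Node<K, V> next) { this.key = key; this.val = val; this.next = next; }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] st = (Node<K, V>[]) new Node[M];  // array of chains
    private int n = 0;                                           // number of key-value pairs

    private int hash(K key) { return (key.hashCode() & 0x7fffffff) % M; }

    int size() { return n; }

    V get(K key) {
        for (Node<K, V> x = st[hash(key)]; x != null; x = x.next)
            if (key.equals(x.key)) return x.val;
        return null;   // search miss
    }

    void put(K key, V val) {
        int i = hash(key);
        for (Node<K, V> x = st[i]; x != null; x = x.next)
            if (key.equals(x.key)) { x.val = val; return; }  // key present: overwrite
        st[i] = new Node<>(key, val, st[i]);                 // key absent: prepend to chain i
        n++;
    }

    public static void main(String[] args) {
        ChainingDemo<String, Integer> st = new ChainingDemo<>();
        st.put("S", 0); st.put("E", 1); st.put("S", 2);
        System.out.println(st.get("S"));   // 2
        System.out.println(st.size());     // 2
    }
}
```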
    

Linear Probing (Open addressing)

  • Open addressing. When a new key collides, find next empty slot, and put it there.
  • Linear Probing process:
    • Hash. Map key to integer i between 0 and M-1
    • Insert. Put at table index i if free; if not try i+1, i+2 etc.
    • Search. Search table index i; if occupied but no match, try i+1, i+2 etc.
    • Note: Array size M must be greater than number of key-value pairs N.
  • Implementation LinearProbingHashST
    public class LinearProbingHashST<Key, Value> {
        private int M = 30001;
        private Value[] vals = (Value[]) new Object[M];   // array doubling and halving code omitted
        private Key[] keys = (Key[]) new Object[M];
    
        private int hash(Key key) { /* as before */ }
    
        public void put(Key key, Value val) {
          int i;   // declared outside the loop because it is used after the loop ends
          for (i = hash(key); keys[i] != null; i = (i+1) % M)
              if (keys[i].equals(key)) 
                  break;
          keys[i] = key;
          vals[i] = val;
        }
    
        public Value get(Key key) {
          for (int i = hash(key); keys[i] != null; i = (i+1) % M)
              if (keys[i].equals(key))
                  return vals[i];
          return null;
        }
    }
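
The implementation above omits the array doubling code. The following is a rough sketch of how doubling could be added so that the table never gets more than half full. The class name `ProbingDemo`, the initial size 4, and the resize threshold are assumptions for illustration; halving and delete are left out.

```java
// Sketch only: linear probing with array doubling; no delete, no halving.
class ProbingDemo<K, V> {
    private int m = 4;   // table size
    private int n = 0;   // number of key-value pairs
    private K[] keys;
    private V[] vals;

    @SuppressWarnings("unchecked")
    ProbingDemo() {
        keys = (K[]) new Object[m];
        vals = (V[]) new Object[m];
    }

    private int hash(K key) { return (key.hashCode() & 0x7fffffff) % m; }

    // Rehash every key into a table of the given capacity.
    @SuppressWarnings("unchecked")
    private void resize(int capacity) {
        K[] oldKeys = keys;
        V[] oldVals = vals;
        m = capacity;
        n = 0;
        keys = (K[]) new Object[m];
        vals = (V[]) new Object[m];
        for (int i = 0; i < oldKeys.length; i++)
            if (oldKeys[i] != null) put(oldKeys[i], oldVals[i]);
    }

    void put(K key, V val) {
        if (n >= m / 2) resize(2 * m);   // keep the load factor N/M at or below 1/2
        int i;
        for (i = hash(key); keys[i] != null; i = (i + 1) % m)
            if (keys[i].equals(key)) { vals[i] = val; return; }
        keys[i] = key;
        vals[i] = val;
        n++;
    }

    V get(K key) {
        for (int i = hash(key); keys[i] != null; i = (i + 1) % m)
            if (keys[i].equals(key)) return vals[i];
        return null;
    }

    int size() { return n; }
}
```

Because the table is at most half full, every probe sequence is guaranteed to reach an empty slot, so both loops terminate.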
    
  • Clustering: New keys likely to hash into middle of big clusters.
  • Knuth’s parking problem
  • Analysis of linear probing
    • M too large --> too many empty array entries
    • M too small --> search time blows up
    • Typical choice: alpha = N / M ~ (1/2), # probes for search hit is about 3/2, # probes for search miss is about 5/2. – Knuth
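
The 3/2 and 5/2 figures come from Knuth's approximations for linear probing under the uniform hashing assumption: about ½(1 + 1/(1−α)) probes for a search hit and about ½(1 + 1/(1−α)²) for a search miss or insert. A small sketch that evaluates them (class and method names are hypothetical):

```java
// Hypothetical helper that evaluates Knuth's linear-probing approximations.
class ProbeEstimates {
    // Expected probes for a search hit at load factor alpha = N / M.
    static double searchHit(double alpha) {
        return 0.5 * (1 + 1 / (1 - alpha));
    }

    // Expected probes for a search miss (or insert) at load factor alpha.
    static double searchMiss(double alpha) {
        double d = 1 - alpha;
        return 0.5 * (1 + 1 / (d * d));
    }

    public static void main(String[] args) {
        System.out.println(searchHit(0.5));   // 1.5
        System.out.println(searchMiss(0.5));  // 2.5
        // The cost grows quickly as the table fills up.
        for (double alpha : new double[] {0.75, 0.9})
            System.out.println(alpha + ": " + searchHit(alpha) + " / " + searchMiss(alpha));
    }
}
```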

Hash Table Context

  • War story: algorithmic complexity attacks
    • Surprising situation: denial-of-service attacks. A malicious adversary learns your hash function and causes a big pile-up in a single slot that grinds performance to a halt.
  • Algorithmic complexity attack on Java
    • Goal. Find a family of strings with the same hash code.
    • Solution. The base-31 hash code is part of Java’s String API, so colliding strings can be computed in advance.
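
As an illustration of such a family: "Aa" and "BB" both hash to 2112 (65·31 + 97 = 66·31 + 66), and because the base-31 hash of a concatenation depends only on the hashes and lengths of its parts, any string built from n of these two-character blocks collides with every other one. A minimal sketch, with a hypothetical class name:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical generator of 2^blocks strings that all share one hashCode().
class CollidingStrings {
    static List<String> generate(int blocks) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                next.add(prefix + "Aa");   // "Aa".hashCode() == 2112
                next.add(prefix + "BB");   // "BB".hashCode() == 2112
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : generate(3))
            System.out.println(s + " " + s.hashCode());   // 8 strings, one hash code
    }
}
```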
  • Diversion: one-way hash functions
    • One-way hash function. “Hard” to find a key that will hash to a desired value (or two keys that hash to same value).
    • Applications. Digital fingerprint, message digest, storing password.
    • Caveat. Too expensive for use in ST implementations.
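
As a sketch of a message digest, the JDK's `java.security.MessageDigest` can compute one. SHA-256 is chosen here only as an example of a one-way hash function (the notes do not name a specific one), and the class name `DigestDemo` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: hex-encoded SHA-256 digest of a string.
class DigestDemo {
    static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest)
                hex.append(String.format("%02x", b));   // two hex digits per byte
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex("abc"));
    }
}
```

The digest is 256 bits and takes far more work per key than the 31x+y rule, which is the caveat noted above.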
  • Hashing: variations on the theme
    • Two-probe hashing. (separate-chaining variant), hash to two positions, insert key in shorter of the two chains.
    • Double hashing. (linear-probing variant), use linear probing, but skip a variable amount, not just 1 each time.
    • Cuckoo hashing (linear-probing variant)

Hash tables vs. balanced search trees (in Java)

  • Hash tables
    • Simpler to code
    • No effective alternative for unordered keys.
    • Faster for simple keys (a few arithmetic ops versus logN compares).
    • Better system support in Java for strings (e.g., cached hash code).
  • Balanced search trees
    • Stronger performance guarantee.
    • Support for ordered ST operations.
    • Easier to implement compareTo() correctly than equals() and hashCode().
  • Java system includes both
    • Red-black BSTs: java.util.TreeMap, java.util.TreeSet.
    • Hash tables: java.util.HashMap, java.util.HashSet, java.util.IdentityHashMap, java.util.LinkedHashSet
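
A short sketch of the practical difference, using the two library classes named above: both support put/get, but only the tree-based map supports ordered operations. The class name and sample keys are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo contrasting java.util.HashMap and java.util.TreeMap.
class OrderedVsHashed {
    static TreeMap<String, Integer> sampleTree() {
        TreeMap<String, Integer> tree = new TreeMap<>();
        tree.put("pear", 3);
        tree.put("apple", 1);
        tree.put("fig", 2);
        return tree;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> tree = sampleTree();
        Map<String, Integer> hash = new HashMap<>(tree);

        System.out.println(hash.get("fig"));        // 2 (lookup works in both)
        System.out.println(tree.firstKey());        // apple (smallest key)
        System.out.println(tree.ceilingKey("b"));   // fig (smallest key >= "b")
        System.out.println(tree.headMap("g"));      // {apple=1, fig=2} (keys < "g")
    }
}
```

HashMap offers no counterpart to firstKey(), ceilingKey(), or headMap(), and its iteration order is unspecified.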

Symbol Table Applications

Set API

  • Mathematical set: A collection of distinct keys.

  • Exception filter, WhiteList or BlackList…

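
An exception filter can be sketched with `java.util.HashSet`: read a list of words into a set, then pass through only the input words that are in the set (whitelist) or not in it (blacklist). The class and parameter names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical whitelist/blacklist filter built on a set of keys.
class ExceptionFilterDemo {
    // keepListed = true behaves as a whitelist, false as a blacklist.
    static List<String> filter(List<String> words, Set<String> list, boolean keepListed) {
        List<String> out = new ArrayList<>();
        for (String word : words)
            if (list.contains(word) == keepListed)
                out.add(word);
        return out;
    }

    public static void main(String[] args) {
        Set<String> list = new HashSet<>(Arrays.asList("was", "it", "the", "of"));
        List<String> input = Arrays.asList("it", "was", "the", "best", "of", "times");
        System.out.println(filter(input, list, true));    // [it, was, the, of]
        System.out.println(filter(input, list, false));   // [best, times]
    }
}
```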

Dictionary lookup

  • Command-line arguments
    • A comma-separated value (CSV) file.
    • Key field, Value field
    • Pick any field as the key, and any field as the value.
  • Ex 1. DNS lookup
  • Ex 2. Amino acids
  • Ex 3. Class list
  • Implementation LookupCSV.java
import edu.princeton.cs.algs4.*;

public class LookupCSV {
    public static void main(String[] args) {
        // process input file
        In in = new In(args[0]);
        int keyField = Integer.parseInt(args[1]);
        int valField = Integer.parseInt(args[2]);

        // build symbol tables
        ST<String, String> st = new ST<>();
        while (!in.isEmpty()) {
            String line = in.readLine();
            String[] tokens = line.split(",");
            String key = tokens[keyField];
            String val = tokens[valField];
            st.put(key, val);
        }

        while(!StdIn.isEmpty()) {
            String s = StdIn.readString();
            if (!st.contains(s)) StdOut.println("Not found");
            else StdOut.println(st.get(s));
        }
    }
}

Indexing Clients

  • File indexing
    • Goal. Index a PC (or the web). Given a list of files specified, create an index so that you can efficiently find all files containing a given query string.

    • Solution. Key = query string; value = set of files containing that string.

    • Book index

    • Implementation FileIndex.java

        /*****************************************************************************
        Compilation:  javac FileIndex.java
        Execution:    java FileIndex file1.txt file2.txt file3.txt ...
        Dependencies: ST.java SET.java In.java StdIn.java StdOut.java
        Data files:   https://algs4.cs.princeton.edu/35applications/ex1.txt
                        https://algs4.cs.princeton.edu/35applications/ex2.txt
                        https://algs4.cs.princeton.edu/35applications/ex3.txt
                        https://algs4.cs.princeton.edu/35applications/ex4.txt
        
        % java FileIndex ex*.txt
        age
            ex3.txt
            ex4.txt
        best
            ex1.txt
        was
            ex1.txt
            ex2.txt
            ex3.txt
            ex4.txt
        
        % java FileIndex *.txt
        
        % java FileIndex *.java
        
        *****************************************************************************/
import edu.princeton.cs.algs4.*;
import java.io.File;
public class FileIndex {
    public static void main(String[] args) {
        ST<String, SET<File>> st = new ST<String, SET<File>>();

        for (String filename : args) {
            // list of file names from command line
            File file = new File(filename);
            In in = new In(file);
            while (!in.isEmpty()) {
                String key = in.readString();
                if (!st.contains(key))
                    st.put(key, new SET<File>());
                SET<File> set = st.get(key);
                set.add(file);
            }
        }

        while (!StdIn.isEmpty()) {
            String query = StdIn.readString();
            StdOut.println(st.get(query));
        }
    }
}
  • Concordance, similar to File index, see slides.
  • Index and inverted index. LookupIndex.java
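
The relationship between an index and its inverted index can be sketched with the standard collections: given key → list of values, build value → set of keys. This is not LookupIndex.java itself; the class name and the sample data are hypothetical, and TreeMap/TreeSet are used only to get a predictable print order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical demo: invert a key -> values index into a value -> keys index.
class InvertedIndexDemo {
    static Map<String, Set<String>> invert(Map<String, List<String>> index) {
        Map<String, Set<String>> inverted = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : index.entrySet())
            for (String value : entry.getValue())
                inverted.computeIfAbsent(value, k -> new TreeSet<>()).add(entry.getKey());
        return inverted;
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = new TreeMap<>();
        index.put("ex1.txt", Arrays.asList("best", "was"));
        index.put("ex2.txt", Arrays.asList("was"));
        System.out.println(invert(index));   // {best=[ex1.txt], was=[ex1.txt, ex2.txt]}
    }
}
```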

Sparse vector

  • Symbol-table representation of a vector
    • key = index, value = entry.
    • Efficient iterator
    • Space proportional to number of nonzeros.
  • Implementation SparseVector.java
import edu.princeton.cs.algs4.StdOut;

import java.util.HashMap;

public class SparseVector {
    private HashMap<Integer, Double> v;  // Use Hash Symbol Tables (HashST), because order not important

    public SparseVector() { v = new HashMap<>(); }

    public void put(int i, double x) { v.put(i, x); }

    public double get(int i) {
        if (!v.containsKey(i)) return 0.0;
        else return v.get(i);
    }

    public Iterable<Integer> indices() {
        // return all nonzero indices
        return v.keySet();
    }

    public double dot(double[] that) {
        double sum = 0.0;
        for (int i : indices())
            sum += that[i]*this.get(i);
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {1, 0, 0, 0, 0, 1, 0, 2};
        SparseVector sv = new SparseVector();
        for (int i = 0; i < a.length; i++) {
            if (a[i] != 0) sv.put(i, a[i]);
        }

        double[] b = {1, 1, 1, 1, 1, 1, 1 ,1};
        StdOut.println(sv.dot(b));
    }
}
