Algorithm Review (Python) - Sorting / List ADT / PQ ADT / Table ADT

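This article takes an in-depth look at algorithm analysis, including Big-O complexity and case analysis, with a particular focus on sorting algorithms (bubble, selection, insertion, merge and quick sort). It then details the linked-list abstract data types (ADTs), such as singly linked lists, stacks, queues, doubly linked lists and deques, and discusses their implementation in Python. It also covers the priority queue ADT, implemented with a binary max heap, and discusses hash tables as an efficient implementation of the table ADT. The concepts are reinforced through various practice problems.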


Pre-summary:
[Figure]

1. Algorithm Analysis

1.1 Big-O Complexity

[Figure]

1.2 Case Analysis

1.2.1 Rating Problems

[Figure]

n, k = map(int, input().split()) # constant time

total = 0
for i in range(k): # k times
    total += int(input()) # constant time

print((total - (n - k) * 3) / n, (total + (n - k) * 3) / n) # constant time

# -> O(k)

1.2.2 Laptop Sticker

[Figure]

wc, hc, ws, hs = map(int, input().split())

print(1 if (wc >= ws + 2 and hc >= hs + 2) else 0)

# -> O(1)

1.2.3 Left Beehind

[Figure]

while True: # O(TC) iterations, each O(1); with at most 15 test cases this is effectively O(1)
    x, y = map(int, input().split())
    if (x == 0 and y == 0):
        break
    elif (x + y == 13):
        print("Never speak again.")
    elif (x > y):
        print("To the convention.")
    elif (x == y):
        print("Undecided.")
    else:
        print("Left beehind.")
        
# -> O(1)

1.2.4 Seven Wonders

[Figure]

s = input() # string
t, c, g = 0, 0, 0 # constant time
for i in s: # loops through the characters in s
    if i == "T":
        t += 1
    elif i == "C":
        c += 1
    elif i == "G": # comparison, not assignment ("+=" here was a syntax error)
        g += 1

print(t*t + c*c + g*g + 7*min(t,c,g)) # constant time

# -> O(length_of_input_string) or O(n) with n = length_of_input_string

1.2.5 Prerequisites

[Figure]

while True:
    line = input() 
    if (line[0] == '0'):
        break
    k, m = map(int, line.split())
    courses = list(map(int, input().split()))
    valid = True
    for _ in range(m): # runs m times
        category = list(map(int, input().split()))
        c = category[0]
        r = category[1]
        for i in range(2, len(category)): # c iterations, one per course in this category
            if (category[i] in courses): # membership test on a list: O(k)
                r -= 1

        if (r > 0):
            valid = False

    print('yes' if valid else 'no')

# -> O(TC * m * c * k), where c is the number of courses listed in a category
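If `courses` is stored in a set instead of a list, each membership test drops from O(k) to O(1) on average, so the whole run becomes O(TC * m * c). A minimal sketch of the per-test-case check, written as a function so it does not read stdin; the function name `meets_requirements` and the (r, list-of-courses) input shape are illustrative assumptions:

```python
def meets_requirements(courses, categories):
    """courses: course numbers taken; categories: list of (r, category_courses) pairs."""
    taken = set(courses)  # built once in O(k); each lookup is then O(1) on average
    for r, category_courses in categories:
        count = sum(1 for course in category_courses if course in taken)
        if count < r:  # fewer than r required courses taken from this category
            return False
    return True

print('yes' if meets_requirements([123, 9876, 2222], [(2, [8888, 2222, 123])]) else 'no')  # -> yes
```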

2. Sorting

2.1 Bubble Sort - O(N^2)

repeatedly swap 2 consecutive (adjacent) numbers that are out of order

[Figure]

# Bubble Sort
A = [5, 4, 1, 2, 7, 3, 6]

def BubbleSort(A): # O(N^2) worst case (reverse sorted input), O(N) best case (sorted input)
    N = len(A)
    while N > 1: # at most n-1 passes
        swapped = False
        for i in range(N-1):
            if A[i] > A[i+1]:
                A[i], A[i+1] = A[i+1], A[i] # Python can swap variables like this
                swapped = True
        if not swapped: # optimization
            break
        N -= 1
    return A

print(A)
print(BubbleSort(A))

2.2 Selection Sort - O(N^2)

find the smallest (or largest) remaining number and swap it with the number at the current position
[Figure]

# Selection Sort
A = [5, 4, 1, 2, 7, 3, 6]

def SelectionSort(A): # O(N^2) for ALL cases...
    N = len(A)
    for L in range(N-1):
        smallest = A.index(min(A[L:N]), L) # search from index L so an equal value in the sorted prefix is never picked; BEWARE, finding the minimum of (N-L) items is O(N), not O(1)
        A[smallest], A[L] = A[L], A[smallest] # Python can swap variables like this
    return A

print(A)
print(SelectionSort(A))

2.3 Insertion Sort - O(N^2)

take each number in turn and insert it into the right place within the already sorted prefix
[Figure]

# Insertion Sort
A = [5, 4, 1, 2, 7, 3, 6]

def InsertionSort(A): # O(N^2) worst case (reverse sorted input), O(N) best case (sorted input)
    N = len(A)
    for i in range(1, N): # O(N)
        X = A[i] # X is the item to be inserted
        j = i-1
        while j >= 0 and A[j] > X: # 0 iterations on sorted input, up to i iterations on reverse-sorted input
            A[j+1] = A[j] # make a place for X
            j -= 1
        A[j+1] = X # index j+1 is the insertion point
    return A

print(A)
print(InsertionSort(A))

2.4 Merge Sort - O(N log N)

divide, sort, and merge

[Figure]

# Merge Sort
A = [5, 4, 1, 2, 7, 3, 6]

def MergeSort(A): # O(N log N) in ALL cases (best, average, worst) :)
    N = len(A)
    if N <= 1: # base case; also handles the empty list, which would otherwise recurse forever
        return A

    mid = N//2
    left = A[:mid] # from start to before mid
    right = A[mid:] # from mid to end
    MergeSort(left) 
    MergeSort(right)

    i = j = k = 0
    while i < len(left) and j < len(right): # both left and right not empty
        if left[i] <= right[j]:
            A[k] = left[i] # take from left
            i += 1
        else:
            A[k] = right[j] # take from right
            j += 1
        k += 1
    while i < len(left): # has leftover from left (right is empty)
        A[k] = left[i]
        k += 1
        i += 1
    while j < len(right): # has leftover from right (left is empty)
        A[k] = right[j]
        k += 1
        j += 1
    return A

print(A)
print(MergeSort(A))

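2.5 Quick Sort - O(N log N) on average (O(N^2) worst case)

using pivots: pick a pivot, partition the other items into those smaller than the pivot and those not smaller, then recursively sort both parts

[Figure]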
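This section has no code, so here is a minimal sketch in the same style as the sorts above. It assumes a Lomuto-style partition with a randomly chosen pivot, which is one common variant and not necessarily the exact one shown in the figure. Random pivots make the O(N^2) worst case unlikely on any fixed input, giving O(N log N) expected time.

```python
import random

def Partition(A, i, j):  # partition A[i..j] around a random pivot; return the pivot's final index
    r = random.randint(i, j)  # pick a random pivot and move it to the front
    A[i], A[r] = A[r], A[i]
    p = A[i]
    m = i  # invariant: A[i+1..m] < p and A[m+1..k-1] >= p
    for k in range(i + 1, j + 1):
        if A[k] < p:
            m += 1
            A[k], A[m] = A[m], A[k]
    A[i], A[m] = A[m], A[i]  # place the pivot between the two parts
    return m

def QuickSort(A, low=0, high=None):  # O(N log N) expected, O(N^2) worst case
    if high is None:
        high = len(A) - 1
    if low < high:
        m = Partition(A, low, high)
        QuickSort(A, low, m - 1)   # sort the items smaller than the pivot
        QuickSort(A, m + 1, high)  # sort the items >= the pivot
    return A

A = [5, 4, 1, 2, 7, 3, 6]
print(A)
print(QuickSort(A))  # -> [1, 2, 3, 4, 5, 6, 7]
```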

2.6 Wrap-up

2.6.1 Properties of Sorting

[Figure]

2.6.2 Comparison of Sorting Algorithms (within the scope of this module)

  • Sorted versus nearly sorted input for (Opt)imized Bubble Sort (which stops as soon as an inner-loop pass makes no swap) and Insertion Sort: distinguish O(N) from O(k × N), where k << N (k can be as big as N in the worst case, but not for this ‘nearly sorted’ input).
  • If the input is reverse sorted (descending) and the sorting algorithm sorts ascending by default, the algorithm will usually do the most work (not just in terms of comparisons, but also in the number of swaps).

[Figure]

Exercise 1:
[Figure]

# Python's built-in sort (Timsort) is O(N log N) and stable, so names with equal first two letters keep their input order
while True:
    num = int(input())
    if num == 0:
        break
    name = [input() for i in range(num)]
    print(*sorted(name, key = lambda x:x[0:2]), sep='\n')
    print()

Exercise 2:
[Figure]

n = int(input())
domj = sorted([input() for _ in range(n)])
kattis = sorted([input() for _ in range(n)])

i, j, cnt = 0, 0, 0
while i < n and j < n:
    if domj[i] == kattis[j]:
        i += 1
        j += 1
        cnt += 1
    elif domj[i] > kattis[j]:
        j += 1
    else:
        i += 1
        
print(cnt)

3. List ADT

3.1 List ADT operations

  • get(i): maybe a trivial operation, return a_i (0-based indexing),
  • search(v): decide whether item/data v exists in the list (and report its position/index) or not (and usually report a non-existing index, -1),
  • insert(i, v): insert item/data v specifically at position/index i in the list, potentially shifting the items at positions [i…N-1] one position to their right to make space,
  • remove(i): remove the item at position/index i in the list, potentially shifting the items at positions [i+1…N-1] one position to their left to close the gap.
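A minimal array-based sketch of these four operations on top of a Python list. The shifting is written out explicitly so the O(N) cost of insert(i, v) and remove(i) is visible (Python's own list.insert and list.pop(i) do the same shifting internally). The class name `ArrayList` is illustrative.

```python
class ArrayList:
    def __init__(self):
        self.A = []

    def get(self, i):  # O(1)
        return self.A[i]

    def search(self, v):  # O(N) linear scan; -1 if not found
        for i, item in enumerate(self.A):
            if item == v:
                return i
        return -1

    def insert(self, i, v):  # O(N): shift A[i..N-1] one position to the right
        self.A.append(None)  # grow by one slot
        for k in range(len(self.A) - 1, i, -1):
            self.A[k] = self.A[k - 1]
        self.A[i] = v

    def remove(self, i):  # O(N): shift A[i+1..N-1] one position to the left
        for k in range(i, len(self.A) - 1):
            self.A[k] = self.A[k + 1]
        self.A.pop()  # drop the now-duplicated last slot

L = ArrayList()
for v in [5, 7, 9]:
    L.insert(len(L.A), v)  # insert at the back
L.insert(1, 6)
print(L.A)            # -> [5, 6, 7, 9]
L.remove(0)
print(L.A)            # -> [6, 7, 9]
print(L.search(7))    # -> 1
```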

3.2 Linked List (LL)

3.2.1 LL and its ADT Implementation

# https://visualgo.net/en/list?slide=3-1
class Vertex: # we can use either C struct or C++/Java/Python class
    def __init__(self, data):
        self.item = data # the data is stored here, an integer in this example
        self.next = None

class SLL: # this is the version as shown in https://visualgo.net/en/list, SLL with both head and tail pointers
    def __init__(self):
        self.head = None
        self.tail = None

    def InsertAtHead(self, v):
        # https://visualgo.net/en/list?slide=3-8
        vtx = Vertex(v) # create new vertex vtx from item v
        vtx.next = self.head # link this new vertex to the (old) head vertex
        if self.head == None: # previously empty
            self.tail = vtx # the tail is also this vertex
        self.head = vtx # the new vertex becomes the new head

    def InsertAfterTail(self, v): # O(1) thanks to the tail pointer (O(N) without it, see the commented-out version below)
        if self.head == None:
            self.InsertAtHead(v)
        else:
            vtx = Vertex(v) # create new vertex vtx from item v
            ## slow O(N) version - if we don't use tail pointer
            # ptr = self.head # we have to start from head
            # while ptr.next != None: # while not tail, O(N)
            #     ptr = ptr.next # the pointers are pointing to the higher index
            ## now ptr is tail
            # ptr.next = vtx # link tail to this new vertex
            self.tail.next = vtx # link tail to this new vertex, O(1)
            self.tail = vtx # now update tail, O(1)

    def GetHead(self):
        if self.head == None: return None
        return self.head.item

    def GetTail(self):
        if self.tail == None: return None
        return self.tail.item

    def DeleteHead(self):
        # https://visualgo.net/en/list?slide=3-15   
        if self.head is None: return # avoid crashing when SLL is empty
        self.head = self.head.next # update the head pointer
        if self.head == None: # becomes empty
            self.tail = None
        # remarks: as nothing points to old head, Python's garbage collector will remove it

    # DeleteTail would be O(N) in a Singly Linked List even with a tail pointer: to update the tail pointer we must walk from head to find the new tail; not used


class Stack(): # not inheritance: a thin wrapper (composition) around a Python list
    def __init__(self):
        self.list = []
    def push(self, v):
        self.list.append(v) # uses the back side of the list
    def pop(self):
        return self.list.pop() # O(1)
    def top(self): # other people call it 'peek'
        return self.list[-1]


class Queue(SLL): # an example of class inheritance: Queue reuses SLL's methods
    def enqueue(self, v):
        self.InsertAfterTail(v) # O(1) with the tail pointer (a Python list would use append(v), also O(1))
    def dequeue(self):
        oldhead = self.GetHead()
        self.DeleteHead() # O(1); a Python list's pop(0) would be O(N)
        return oldhead # None if the queue was empty
    def front(self): # other people call it 'peek'
        return self.GetHead()
    def back(self):
        return self.GetTail()

3.2.2 Singly Linked List (SLL)

[Figure]

Question: Why is SLL by itself basically ‘not that useful’ compared to a Python list?

Answer: Whatever an SLL can do, we can emulate (slightly better) with a Python list. SLL's main strength is that its vertices do not need to be contiguous in memory, which is used more in the Stack ADT and especially the Queue ADT. Some applications that require fast insertion/deletion without needing to close the gap may need an SLL/DLL (e.g., a text editor such as Notepad, Microsoft Word, this PowerPoint…).

3.3 Stack - Last-In-First-Out (LIFO)

[Figure]
Using a Python list as a stack… 😮 (push/peek/pop = append/[-1]/pop)
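A quick illustration of that push/peek/pop mapping. All three operations work on the back of the list, so no items need to shift (append is amortized O(1)).

```python
stack = []
stack.append(1)         # push 1 -> [1]
stack.append(2)         # push 2 -> [1, 2]
stack.append(3)         # push 3 -> [1, 2, 3]
print(stack[-1])        # peek -> 3
print(stack.pop())      # pop  -> 3, stack is now [1, 2]
print(stack.pop())      # pop  -> 2, stack is now [1]
print(len(stack) == 0)  # empty? -> False
```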

3.4 Queue - First-In-First-Out (FIFO)

[Figure]
A Python list cannot easily be used as a Queue ADT, because pop(0) is costly, O(N)…
Instead, use the Python library again: collections.deque
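A short sketch of `collections.deque` used as a FIFO queue. `append` enqueues at the back and `popleft` dequeues from the front, both in O(1).

```python
from collections import deque

q = deque()
q.append('a')       # enqueue at the back
q.append('b')
q.append('c')
print(q[0])         # front/peek -> a
print(q.popleft())  # dequeue from the front in O(1) -> a
print(q.popleft())  # -> b
print(list(q))      # -> ['c']
```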

Question: Do you “agree” that SLL is one of the best data structures to implement this basic Queue ADT?

Answer: SLL allows dynamic expansion and shrinking and never needs to rearrange vertices. It has O(1) performance for the enqueue and dequeue operations that the Queue ADT needs. But we can also implement the Queue ADT with something other than an SLL and still get O(1) enqueue/dequeue. One option is an array with head and tail indices that loops (wraps) around, which helps but not overall if the queue grows/shrinks dynamically. Another is the 2 Stacks idea (Google this?).
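One way to realize the 2 Stacks idea, sketched with Python lists as the stacks (class and method names are illustrative). Enqueue pushes onto an inbox stack. Dequeue pops from an outbox stack, refilling it by moving everything over from the inbox only when it is empty. Each item is moved at most once, so both operations are amortized O(1).

```python
class TwoStackQueue:
    def __init__(self):
        self.inbox = []   # receives newly enqueued items
        self.outbox = []  # holds items in dequeue order (front of the queue at the end)

    def enqueue(self, v):  # O(1)
        self.inbox.append(v)

    def dequeue(self):  # amortized O(1); returns None if the queue is empty
        if not self.outbox:
            while self.inbox:  # reversing the inbox puts the oldest item on top
                self.outbox.append(self.inbox.pop())
        return self.outbox.pop() if self.outbox else None

q = TwoStackQueue()
for v in [1, 2, 3]:
    q.enqueue(v)
print(q.dequeue())  # -> 1
q.enqueue(4)
print(q.dequeue(), q.dequeue(), q.dequeue())  # -> 2 3 4
```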

3.5 Doubly Linked List (DLL)

[Figure]
Compared to SLL, DLL has both forward and backward pointers. This costs more memory per vertex, but it allows fast (O(1)) deletion of the tail.
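A minimal sketch of why the extra backward pointer matters (class names are hypothetical, mirroring the SLL code above). With `prev` links, DeleteTail jumps straight to the new tail in O(1) instead of walking from the head as an SLL would have to.

```python
class DVertex:
    def __init__(self, data):
        self.item = data
        self.next = None
        self.prev = None  # the extra backward pointer (more memory per vertex)

class DLL:
    def __init__(self):
        self.head = None
        self.tail = None

    def InsertAfterTail(self, v):  # O(1)
        vtx = DVertex(v)
        if self.tail is None:  # previously empty
            self.head = self.tail = vtx
        else:
            vtx.prev = self.tail
            self.tail.next = vtx
            self.tail = vtx

    def DeleteTail(self):  # O(1) thanks to the prev pointer; returns None if empty
        if self.tail is None:
            return None
        old = self.tail
        self.tail = old.prev  # new tail found without scanning from head
        if self.tail is None:  # list became empty
            self.head = None
        else:
            self.tail.next = None
        return old.item

    def ToList(self):  # O(N) forward traversal, for printing
        out, ptr = [], self.head
        while ptr is not None:
            out.append(ptr.item)
            ptr = ptr.next
        return out

dll = DLL()
for v in [1, 2, 3]:
    dll.InsertAfterTail(v)
print(dll.DeleteTail())  # -> 3
print(dll.ToList())      # -> [1, 2]
```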

3.6 Double-Ended Queue (Deque)

[Figure]
The Deque ADT restricts DLL operations to its two endpoints only.
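In Python, `collections.deque` provides exactly these endpoint operations, each in O(1):

```python
from collections import deque

d = deque([2, 3])
d.appendleft(1)     # insert at the front -> deque([1, 2, 3])
d.append(4)         # insert at the back  -> deque([1, 2, 3, 4])
print(d[0], d[-1])  # peek both ends -> 1 4
print(d.popleft())  # remove from the front -> 1
print(d.pop())      # remove from the back  -> 4
print(list(d))      # -> [2, 3]
```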

3.7 Wrap-up

3.7.1 Comparison of LL ADT

[Figure]
Question:
[Figure]
Answer:
[Figure]

3.7.2 Python Implementation

Q:
[Figure]
A:
[Figure]

3.8 Exercises

3.8.1 Integer Lists

[Figure]

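# with a plain queue or stack, each pop is O(1) but each reverse is O(n)
# a deque helps, but calling ls.reverse() for every 'R' is still O(n) per reverse
# instead, track which end is currently the front: 'R' just flips that flag in O(1),
# 'D' pops from the current front, and the list is reversed at most once, when printing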

from collections import deque

numberOfCases = int(input())
for _ in range(numberOfCases):
    p = input()
    n = int(input())
    ls = deque(input()[1:-1].split(','))
    # ls = [1,2,3,4] that supports deque operations
    head = 0 # 0 if head is on the left, 1 otherwise
    error = False
    
    for op in p: #handle the error
        if op == 'R':
            head = 1 - head
            # equivalently: head ^= 1
        elif op == 'D':
            if n == 0: 
                error = True
                print('error')
                break
            n -= 1
            if head == 1:
                ls.pop()
            else: # head == 0
                ls.popleft()
    if n != 0:
        print('[', end='')
        if head == 0:
            print(','.join(ls), end='')
        elif head == 1:
            print(','.join(reversed(ls)), end='')
        print(']')
    elif not error:
        print('[]')

3.8.2 Keystrokes

[Figure]

# another O(n) solution: use 2 SLLs or a DLL (each node has pointers to the next and previous nodes)

pw = input()
stack1 = []
stack2 = []

for ele in pw:
    if ele not in ['L', 'R', 'B']:
        # push lowercase letter into stack1
        stack1.append(ele)
    elif ele == 'L':
        # pop top element of stack1 to stack2
        stack2.append(stack1.pop())
    elif ele == 'R':
        # pop top element of stack2 to stack1
        stack1.append(stack2.pop())
    elif ele == 'B':
        # pop top element of stack1 
        stack1.pop()

for _ in range(len(stack2)): # O(N)
    # concatenate stack1 and reversed stack2
    stack1.append(stack2.pop()) # O(1)

res = ''.join(stack1) # O(N)

print(res)

3.8.3 Circuit Math

[Figure]

# Circuit math (postfix calculator) -> use stack

n = int(input())
values = [x == 'T' for x in input().split()] # truth values of inputs A, B, C, ... in that order
circuit = input().split()
# map each input letter to its truth value by letter, not by order of appearance in the circuit
relationship = {chr(ord('A') + j): values[j] for j in range(n)}
stack = [] # for real calculation

for i in circuit:
    if i in ['+', '*']:
        a = stack.pop()
        b = stack.pop()
        if i == '*': # AND gate
            stack.append(a and b)
        else: # i == '+', OR gate
            stack.append(a or b)
    elif i == '-': # NOT gate
        a = stack.pop()
        stack.append(not a)
    else:
        stack.append(relationship[i])

# the only item left in the stack is the result
print('T' if stack.pop() else 'F')

4. Priority Queue ADT

4.1 Priority Queue (PQ)

Priority Queue (PQ) Abstract Data Type (ADT) is similar to normal Queue ADT, but with these two major operations:

  • Enqueue(x): Put a new element (key) x into the PQ (in some order),
  • y = Dequeue(): Return an existing element y that has the highest priority (key) in the PQ; if there are ties, return the one that was inserted first, i.e., fall back to the First-In First-Out (FIFO) behavior of a normal Queue
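A sketch of this ADT on top of Python's `heapq` (a min heap), assuming a larger key means a higher priority. Each entry is stored as a `(-key, insertion_counter, item)` tuple: the negated key makes the largest key come out first, and the increasing counter breaks ties in FIFO order (and keeps `item` from ever being compared). The class name is illustrative.

```python
import heapq
import itertools

class PriorityQueue:
    def __init__(self):
        self.heap = []
        self.counter = itertools.count()  # 0, 1, 2, ...: insertion order for FIFO ties

    def Enqueue(self, key, item=None):  # O(log N)
        heapq.heappush(self.heap, (-key, next(self.counter), item))

    def Dequeue(self):  # O(log N): highest key; earliest-inserted among equal keys
        neg_key, _, item = heapq.heappop(self.heap)
        return -neg_key, item

pq = PriorityQueue()
pq.Enqueue(5, 'first five')
pq.Enqueue(9, 'nine')
pq.Enqueue(5, 'second five')
print(pq.Dequeue())  # -> (9, 'nine')
print(pq.Dequeue())  # -> (5, 'first five')
print(pq.Dequeue())  # -> (5, 'second five')
```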

4.2 Binary (Max) Heap

A Binary (Max) Heap is a complete binary tree that maintains the Max Heap property.

Complete Binary Tree: Every level in the binary tree, except possibly the last/lowest level, is completely filled, and all vertices in the last level are as far left as possible.

Binary Max Heap property: The parent of each vertex (except the root) contains a value greater than the value of that vertex. This is an easier-to-verify definition than the following alternative definition: The value of a vertex (except the leaf/leaves) must be greater than the value of its one (or two) child(ren).

Binary Heap is one possible data structure to model an efficient PQ ADT. In a PQ, each element has a “priority” and an element with higher priority is served before an element with lower priority (ties are broken with standard First-In First-Out (FIFO) rule as with normal Queue).

We don’t have to implement a Binary Heap manually every time… just use Python’s heapq module (note that heapq provides a min heap).

4.2.1 Insert(v) - O(log N)

[Figure]

4.2.2 ExtractMax()

[Figure]
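Although heapq covers everyday use, a manual max heap makes Insert(v) and ExtractMax() concrete. A minimal sketch, assuming the 1-based array layout used by VisuAlgo (index 0 unused, so the parent of i is i//2 and its children are 2i and 2i+1); the class name is illustrative.

```python
class BinaryMaxHeap:
    def __init__(self):
        self.A = [None]  # A[0] is a dummy so the root is at index 1

    def Insert(self, v):  # O(log N): append at the end, then shift up
        self.A.append(v)
        i = len(self.A) - 1
        while i > 1 and self.A[i // 2] < self.A[i]:  # parent is smaller: swap upward
            self.A[i], self.A[i // 2] = self.A[i // 2], self.A[i]
            i //= 2

    def ExtractMax(self):  # O(log N): move the last item to the root, then shift down
        top = self.A[1]  # assumes the heap is non-empty
        last = self.A.pop()
        if len(self.A) > 1:
            self.A[1] = last
            i, n = 1, len(self.A) - 1
            while 2 * i <= n:
                c = 2 * i  # left child
                if c + 1 <= n and self.A[c + 1] > self.A[c]:
                    c += 1  # the right child is larger
                if self.A[i] >= self.A[c]:
                    break  # Max Heap property restored
                self.A[i], self.A[c] = self.A[c], self.A[i]
                i = c
        return top

h = BinaryMaxHeap()
for v in [3, 8, 1, 9, 5]:
    h.Insert(v)
print([h.ExtractMax() for _ in range(5)])  # -> [9, 8, 5, 3, 1]
```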

4.2.3 HeapSort()

[Figure]
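HeapSort is Create(A) followed by N extractions. A minimal sketch using `heapq`, which is a min heap, so the items come out in ascending order directly. The O(N) heapify plus N pops of O(log N) each gives O(N log N) overall.

```python
import heapq

def HeapSort(A):  # O(N log N); returns a new ascending list
    heap = list(A)       # copy so the input is not modified
    heapq.heapify(heap)  # O(N) bottom-up Create
    return [heapq.heappop(heap) for _ in range(len(heap))]  # N extractions, O(log N) each

print(HeapSort([5, 4, 1, 2, 7, 3, 6]))  # -> [1, 2, 3, 4, 5, 6, 7]
```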

4.2.4 Create(A) - O(N log N)

[Figure]

4.2.5 Create(A) - O(N)

[Figure]
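The O(N) Create(A) keeps the array as-is and calls shift-down on every internal vertex, from the last one back to the root. The O(N log N) version instead calls Insert(v) N times. A minimal self-contained sketch of the O(N) version, using 0-based indexing here (children of i are 2i+1 and 2i+2); function names are illustrative.

```python
def shift_down(A, i, n):  # restore the Max Heap property below index i (0-based)
    while 2 * i + 1 < n:
        c = 2 * i + 1  # left child
        if c + 1 < n and A[c + 1] > A[c]:
            c += 1  # pick the larger child
        if A[i] >= A[c]:
            break
        A[i], A[c] = A[c], A[i]
        i = c

def CreateHeap(A):  # O(N) bottom-up heap construction, in place
    n = len(A)
    for i in range(n // 2 - 1, -1, -1):  # last internal vertex down to the root
        shift_down(A, i, n)
    return A

H = CreateHeap([1, 2, 3, 4, 5, 6, 7])
print(H)  # -> [7, 5, 6, 4, 2, 1, 3]
```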
Q:
What is the minimum and maximum number of comparisons between Binary Heap elements
required to construct a Binary (Max) Heap of arbitrary n elements using the O(n) Create(array)?

A:
[Figure]

Q:
Give an algorithm to find all vertices that have value > x in a Binary Max Heap of size n.
Your algorithm must run in O(k) time where k is the number of vertices in the output

A:
[Figure]
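A sketch of the standard pruned-traversal idea (written independently of the answer figure, which is not reproduced here). Start at the root. If a vertex's value is <= x, then by the Max Heap property its whole subtree is also <= x, so skip it. Each reported vertex causes at most two failed checks for its children, so the work is O(k), plus a single check at the root when k = 0. The function name is illustrative, and the heap uses the 1-based layout.

```python
def greater_than(A, x):
    """A: Binary Max Heap in 1-based layout (A[0] unused). Returns all values > x."""
    out = []
    stack = [1]  # start from the root
    n = len(A) - 1
    while stack:
        i = stack.pop()
        if i > n or A[i] <= x:
            continue  # prune: no vertex in the subtree rooted at i can be > x
        out.append(A[i])
        stack.append(2 * i)      # left child
        stack.append(2 * i + 1)  # right child
    return out

heap = [None, 90, 19, 36, 17, 3, 25, 1, 2, 7]
print(sorted(greater_than(heap, 18)))  # -> [19, 25, 36, 90]
```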
Q:
Show an easy way to convert a Binary Max Heap of a set of integers (as shown in VisuAlgo
https://visualgo.net/en/heap) into a Binary Min Heap (of the ‘same’ set of integers) without
changing the underlying data structure at all.
Hint: modify the data.

A:
Simple: just insert the negations of those integers into another empty Binary Max Heap. For example, 5 (max), 4, 3, 2, 1 will look like this when negated: -1 (max), -2, -3, -4, -5.
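The same negation trick works in the opposite direction, and it is how a max heap is usually obtained from Python's `heapq`, which only provides a min heap: push the negated values, and negate again on the way out.

```python
import heapq

values = [5, 4, 3, 2, 1]
min_heap = []
for v in values:
    heapq.heappush(min_heap, -v)  # store negations: the largest value becomes the smallest

print(-min_heap[0])              # peek the maximum -> 5
print(-heapq.heappop(min_heap))  # extract the maximum -> 5
print(-heapq.heappop(min_heap))  # -> 4
```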

4.3 Exercises

4.3.1 Numbers On a Tree

[Figure]

# just a reverse thinking compared to normal Binary Heap indexing

info = input().split()
H = int(info[0])
action = '' #no action
if len(info) == 2:
    action = info[1]

index = 1
for i in action:
    if i == 'L':
        index = index * 2
    elif i == 'R':
        index = index * 2 + 1
        
print(2**(H+1) - index)

4.3.2 Canvas Painting

[Figure]

#idea: reverse the order of thinking
#there are N groups, each contains one canvas
#merge two groups at a time, cost:sum of canvas sizes in those groups
#finally, merge those canvases into one group

#total time complexity: NlogN
#(push/pop  from pq: logN) * (number of push/pop: N)

import heapq

T = int(input())
for _ in range(T):
    N = int(input())
    pq = list(map(int, input().split()))
    heapq.heapify(pq) #true min PQ now, O(n)
    
    res = 0
    while (len(pq) > 1):
        a = heapq.heappop(pq) #the smallest one
        b = heapq.heappop(pq) #the second smallest one
        c = a + b #merge two smallest into one
        res += (a + b) 
        heapq.heappush(pq, c)
    print(res)

4.3.3 Kings of the Forest

[Figure]

#know the max strength of each year (pop from a max heap)
#unique year of entry, excluding 2011
import heapq # Min heap (not Max heap that we want) -> put negation into the heap

k, n = map(int, input().split())
mooses = []
order = []

for _ in range(n + k -1):
    mooses.append(list(map(int, input().split())))
    #have a list of list of 2 elements
    #1st ele: year of entry
    #2nd ele: strength
y, p = mooses[0] #year and strength of Karl

mooses.sort(reverse=True) #sort according to the year
#if increasing year, smallest year at the beginning
#so reverse the list -> take out the moose, pop in O(1)

#push the first k-1 mooses of 2011; the loop below adds the k-th one in its first iteration (year 2011)
for _ in range(k - 1):
    heapq.heappush(order, -mooses.pop()[1])
    
unknown = True
for i in range(n):
    heapq.heappush(order, -mooses.pop()[1])
    alpha = heapq.heappop(order)
    #check if alpha is Karl or not and the current year >= y
    #current year = 2011 + i
    if (2011 + i >= y) and p == -alpha:
        unknown = False
        print(2011 + i)
        break
    
if unknown:
    print('unknown')

4.3.4 Annoyed Coworkers

[Figure]

import heapq

class Node:
    def __init__(self, a, b):
        self.a = a # current annoyance level
        self.b = b # annoyance increase per request
    def __lt__(self, other):
        # greedy: ask the coworker whose annoyance AFTER helping (a + b) is smallest
        return self.a + self.b < other.a + other.b

h, c = map(int, input().split())
strategy = [] # priority queue (min heap ordered by a + b)
for _ in range(c):
    a, b = map(int, input().split())
    heapq.heappush(strategy, Node(a, b))

for _ in range(h):
    # pop the coworker to ask, increase their annoyance level, and push them back
    annoyed_worker = heapq.heappop(strategy)
    heapq.heappush(strategy, Node(annoyed_worker.a + annoyed_worker.b, annoyed_worker.b))

print(max(node.a for node in strategy)) # the highest annoyance level at the end

5. Table ADT

A Table ADT must support at least the following three operations as efficiently as possible:

  • Search(v): determine whether v exists in the ADT,
  • Insert(v): insert v into the ADT,
  • Remove(v): remove v from the ADT.

Hash Table is one good implementation of this Table ADT; Binary Search Tree is an alternative implementation.

Q: What is/are the main difference(s) between List ADT basic operations versus Table ADT basic operations?

A: In List ADT, we generally want to insert a new value v at a specific index i, whereas in Table ADT we let the ADT’s underlying data structure decide where to store v internally. Similarly, in List ADT we generally want to remove the existing item at a specific index i, whereas in Table ADT, since we never specified where v is located, we just say: remove the existing value v.

5.1 Direct Addressing Table (DAT)

When the range of the Integer keys is small, e.g., [0…M-1], we can use an initially empty (Boolean) array A of size M and implement the following Table ADT operations directly:

  • Search(v): Check if A[v] is true (filled) or false (empty),
  • Insert(v): Set A[v] to be true (filled),
  • Remove(v): Set A[v] to be false (empty).

That’s it, we use the small Integer key itself to determine the address in array A, hence the name Direct Addressing. It is clear that all three major Table ADT operations are O(1).
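A minimal sketch of these three operations, assuming the keys are ASCII codes so that M = 256 (the class name is illustrative). For frequency counting, the Boolean cells would simply become integer counters.

```python
class DirectAddressTable:
    def __init__(self, M):
        self.A = [False] * M  # one Boolean cell per possible key in [0..M-1]

    def Search(self, v):  # O(1)
        return self.A[v]

    def Insert(self, v):  # O(1)
        self.A[v] = True

    def Remove(self, v):  # O(1)
        self.A[v] = False

dat = DirectAddressTable(256)
dat.Insert(ord('a'))
print(dat.Search(ord('a')), dat.Search(ord('b')))  # -> True False
dat.Remove(ord('a'))
print(dat.Search(ord('a')))  # -> False
```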

Real-life Examples:

  • Counting the frequency of keypresses, we can use ASCII values (256 characters)
  • Counting the frequency of each letter of the alphabet in a string
  • Postal codes that map an address to a number

Limitations:

  • The keys must be (or can be easily mapped to) non-negative Integer values. Note that a basic DAT has a problem with the full version of the example in the previous few slides, as there are actually variants of bus route numbers in Singapore, e.g., 96B, 151A, NR10, etc.
  • The range of keys must be small. The memory usage will be (insanely) large if the key range is (insanely) large.
  • The keys must be dense, i.e., have few gaps between key values. Otherwise, the DAT will contain too many empty (and wasted) cells.

We will overcome these restrictions with hashing.

5.2 Hash Table

Hash Table is a data structure that maps keys to values (it implements the Table or Map Abstract Data Type/ADT). It uses a hash function to map large or even non-Integer keys into a small range of Integer indices (typically [0…hash_table_size-1]).

The probability of two distinct keys colliding into the same index is relatively high and each of this potential collision needs to be resolved to maintain data integrity.

There are several collision resolution strategies, highlighted in VisuAlgo's hash table visualization: Open Addressing (Linear Probing, Quadratic Probing, and Double Hashing) and Closed Addressing (Separate Chaining, the focus of this module).

[Figure]

Q: Hashing or No Hashing: Hash Table is a Table ADT that allows search(v), insert(new-v), and delete(old-v) operations in O(1) average-case time, if properly designed. However, it is not without its limitations. For each of the cases described below, state whether a Hash Table can be used. If it is not possible, explain why a Hash Table is not suitable for that particular case. If it is possible, describe its design, including:

  1. The <Key, Value> pair
  2. Hashing function/algorithm
  3. Collision resolution (for IT5003, only Separate Chaining/SC)

A:

  1. Yes, we can use 2 (Hash) Tables for efficient lookup both ways (nobody is restricting us, so we
    can use more than one data structure if that simplifies our life 😮).
    The first Table ADT:
    <Key, Value> Pair: <name, age>, to easily find age of a given (distinct) name
    Hashing Algorithm: h(name) = standard string hashing
    Collision Resolution: SC
    For the second Table ADT, just see below:
  2. Since age is integer and of small range, we can actually use Direct Addressing Table.
    <Key, Value> Pair: <age, vector>, to find list of names (in any order) given an age.
    Hashing Algorithm: Direct Addressing
    Collision Resolution: SC, appending additional names that have the same age at the back of the vector.
    The implementation can be something like vector age[151].
    The solution can be performed by iterating through age (from age 17 to max age 150?, which
    is not that large), traversing/printing the chain/list/vector of string names for each age,
    starting from the minimum eligible age.
  3. Yes, we can extract the last name (surname) from a full name (using string tokenization).
    <Key, Value> Pair: <last name, vector<pair<full name, age>>>
    Hashing Algorithm: h(last name) = standard string hashing
    Collision Resolution: Separate chaining for people with the same last name
  4. Two issues:
    First, the floating-point precision. If the precision is fixed, e.g., 1 or 2 decimal places, then the
    number can actually be converted to an integer, and stored as in part (2).
    Second issue is the requirement to retrieve a list of students who passed in ranking order.
    Hash Table is NOT designed for this ordered operations, we need to use the data structure
    discussed below: balanced BST.

Q: Which non-linear data structure should you use if you have to support the following three
operations that can come in any order: 1). many insertions, 2) many deletions, and 3) many
requests for the data in sorted order?

A: Definitely not Hash Table. Hash Table is good for frequent insertions and weaker if there are many deletions (if we use Open Addressing as discussed earlier), and it cannot efficiently enumerate its data in sorted order at all, as the data is simply unordered inside the Hash Table. For this, we need a (balanced) Binary Search Tree. The version that we learn in IT5003 is the standard (not self-balancing) one, to save time (otherwise we would not have enough time to discuss graph data structures and simple graph traversal algorithms). So this possibly unbalanced BST is still not fully useful yet.

5.3 Hash Functions

Using hashing, we can:

  • Map (some) non-Integer keys (e.g., Strings) to Integer keys,
  • Map large Integers to smaller Integers.
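As an illustration of both points, here are two common textbook choices (assumptions, not necessarily the functions used in the module). A string is hashed by treating its characters as digits of a base-31 number, reduced modulo the table size M with Horner's rule so that intermediate values stay small. A large integer is hashed by simply taking it modulo M.

```python
def hash_string(s, M):
    """Polynomial hash: (s[0]*31^(L-1) + ... + s[L-1]) mod M, for a string s of length L."""
    h = 0
    for ch in s:
        h = (h * 31 + ord(ch)) % M  # Horner's rule, reduced mod M at every step
    return h

def hash_int(v, M):
    """Map a large non-negative Integer v into [0..M-1]."""
    return v % M

print(hash_string("ab", 1000))  # -> 105, i.e. (97*31 + 98) mod 1000
print(hash_int(1234567, 100))   # -> 67
```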

Q: A good hash function is essential for good Hash Table performance. A good hash function is
easy/efficient to compute and will evenly distribute the possible keys (necessary condition to have good performing Separate Chaining implementation). Comment on the flaw (if any) of the following (integer) hash functions. Assume that for this question, the load factor α = number of keys N / Hash Table size M ≤ 10 (i.e., small enough for our Separate Chaining implementation) for all cases below:

[Figure]
Q: Given the following strings, which are the names of Steven’s current family members: {“Steven Halim”, “Grace Suryani Halim”, “Jane Angelina Halim”, “Joshua Ben Halim”, “Jemimah Charissa Halim”}, design any valid minimal perfect hash function to map these 5 names into index [0…4] without any collision.

A: Well, we can use the first character after the first space in the name (recall basic string processing) and we will have ‘H’, ‘S’, ‘A’, ‘B’, ‘C’. Those five characters are all different and can be mapped to [0…4] without any collision.
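The answer's idea, written as code. The particular letter-to-index assignment (alphabetical order) is an arbitrary choice for illustration; any bijection from these five letters to [0…4] works.

```python
names = ["Steven Halim", "Grace Suryani Halim", "Jane Angelina Halim",
         "Joshua Ben Halim", "Jemimah Charissa Halim"]

SLOT = {'A': 0, 'B': 1, 'C': 2, 'H': 3, 'S': 4}  # one possible assignment of the 5 letters

def h(name):
    second_word = name.split()[1]  # the word right after the first space
    return SLOT[second_word[0]]    # its first character picks the slot

for name in names:
    print(h(name), name)  # slots 3, 4, 0, 1, 2: no collisions
```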

5.4 Collision Resolution - Separate Chaining (SC)

A hash function may (and quite likely will) map different keys (Integer or not) into the same Integer slot, i.e., a many-to-one mapping instead of a one-to-one mapping.

This situation is called a collision, i.e., two (or more) keys have the same hash value.

Separate Chaining (SC) collision resolution technique is simple (in Python: a list of Python lists, or a defaultdict if we want to handle the situation where a key does not exist yet). We use M copies of auxiliary data structures, usually Doubly Linked Lists. If two keys a and b both have the same hash value i, both will be appended to the (front/back of) Doubly Linked List i (in VisuAlgo's visualization, we append to the back in O(1) with the help of a tail pointer). That’s it: where the keys are slotted in depends entirely on the hash function itself, hence we also call Separate Chaining a Closed Addressing collision resolution technique.

If we use Separate Chaining, the load factor α = N/M describes the average length of the M lists, and it determines the performance of Search(v), as we may have to explore α elements on average. As Remove(v) also requires Search(v), its performance is similar to that of Search(v). Insert(v) is clearly O(1) (appending to the back of a chain, assuming we do not check for duplicates).

If we can bound α to be a small constant (true if we know the expected largest N in our Hash Table application so that we can set up M accordingly), then all Search(v), Insert(v), and Remove(v) operations using Separate Chaining will be O(1).

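Q: So the worst case of searching is no longer O(1) if we use SC?

A: Right. Search takes O(1+α) expected time, and we can keep the load factor α = N/M small most of the time; in the worst case (all keys hashed into the same slot), it degrades to O(N).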

Q: When should we rehash?

A: We may not need to rehash that often if we roughly know how much data we will deal with and set the table size accordingly. This keeps our average linked list length at “just a few entries” per bucket. However, when the number of keys is dynamic, we probably need to enlarge the table size by a factor of 2 whenever the load factor exceeds a small threshold (#keys/M > small-threshold; notice that the load factor can be > 1 in Separate Chaining).
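A minimal sketch of a Separate Chaining hash table that follows these rules. It assumes Python lists as the chains (instead of the Doubly Linked Lists used in VisuAlgo), Python's built-in `hash()` as the hash function, and an illustrative threshold of α > 2 for doubling M; the class and method names are hypothetical. Note that this Insert checks for an existing key first, so it costs O(1+α) rather than O(1).

```python
class SeparateChainingTable:
    def __init__(self, M=8, max_load=2.0):
        self.M = M
        self.N = 0
        self.max_load = max_load  # rehash when alpha = N/M exceeds this
        self.table = [[] for _ in range(M)]  # M chains of (key, value) pairs

    def _h(self, key):
        return hash(key) % self.M

    def Search(self, key):  # O(1 + alpha) expected; None if absent
        for k, v in self.table[self._h(key)]:
            if k == key:
                return v
        return None

    def Insert(self, key, value):  # O(1 + alpha): update if present, else append
        chain = self.table[self._h(key)]
        for idx, (k, _) in enumerate(chain):
            if k == key:
                chain[idx] = (key, value)
                return
        chain.append((key, value))  # append to the back of the chain
        self.N += 1
        if self.N / self.M > self.max_load:
            self._rehash(2 * self.M)

    def Remove(self, key):  # O(1 + alpha) expected: Search, then delete from the chain
        chain = self.table[self._h(key)]
        for idx, (k, _) in enumerate(chain):
            if k == key:
                chain.pop(idx)
                self.N -= 1
                return True
        return False

    def _rehash(self, new_M):  # O(N): redistribute every pair into a larger table
        old_items = [pair for chain in self.table for pair in chain]
        self.M = new_M
        self.table = [[] for _ in range(new_M)]
        for k, v in old_items:
            self.table[self._h(k)].append((k, v))

t = SeparateChainingTable()
for i in range(100):
    t.Insert("key%d" % i, i)
print(t.Search("key42"), t.M)  # -> 42 64 (M doubled 8 -> 16 -> 32 -> 64)
```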

Exercise 1:
[Figure]

from collections import defaultdict

n = int(input())
s_to_y = defaultdict(list) # s -> list of years y; a missing key starts as an empty list

for _ in range(n):
    name, year = input().split()
    year = int(year)
    s_to_y[name].append(year)


# equivalent without defaultdict:
# for _ in range(n):
#     name, year = input().split()
#     if name in s_to_y:
#         s_to_y[name].append(int(year))
#     else:
#         s_to_y[name] = [int(year)]

for v in s_to_y.values(): # let alpha be the length of one list: sorting costs O(alpha log alpha) per list, and alpha will be small
    v.sort()
    
query = int(input())
for _ in range(query):
    s, k = input().split()
    k = int(k)
    print(s_to_y[s][k - 1])

Exercise 2:

[Figure]

# replace hyphens with spaces, transform string to lowercase
# simple application of Hash Table (of strings) (Python set()) to avoid duplicates.
# At the end we report the size of the set

n = int(input())
resume = set()

for _ in range(n):
    resume.add(input().replace("-"," ").lower())
print(len(resume))

Exercise 3:
[Figure]

# application of Hash Table (Python set()) again
# For each shopping list, do a set intersection with the running answer (items that appear in all lists so far)
# At the end, we need to call sort to output the answers in sorted (alphabetical) order

n, m = map(int, input().split())

shopping = set(input().split()) 
for _ in range(n - 1):
    new = set(input().split())
    shopping = shopping.intersection(new)

print(len(shopping))
for ele in sorted(shopping):
    print(ele)

5.5 Binary Search Tree (BST)

Q: main difference(s) of Table ADT implementation between Hash Table and (balanced) BST?

A: Hash Table = the keys are unordered (faster than bBST: O(1) vs O(log n)),
bBST = the keys are ordered (more possible Table ADT applications)

Q: Do you fully understand the basic structure of BST data structure?

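A: It is a recursive structure combining a binary tree with binary search: every key in a vertex's left subtree is smaller than the vertex's key, and every key in its right subtree is larger.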

5.5.1 Search(v)

[Figure]

5.5.2 Insert(v)

[Figure]

5.5.3 Remove(v)

[Figure]

5.5.4 Pred-/Succ-essor(v)

[Figure]

5.5.5 Tree Traversal - Inorder

[Figure]
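A minimal (not self-balancing) BST sketch covering Search(v), Insert(v), Successor(v) and the inorder traversal, assuming distinct integer keys; class and method names are illustrative, and Remove(v) is omitted for brevity. All operations except the traversal are O(h), where h is the tree height. The inorder traversal visits the keys in sorted order, which is exactly what a Hash Table cannot offer.

```python
class BSTVertex:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.parent = None

class BST:
    def __init__(self):
        self.root = None

    def Search(self, v):  # O(h): go left if v is smaller, right if larger
        cur = self.root
        while cur is not None and cur.key != v:
            cur = cur.left if v < cur.key else cur.right
        return cur  # None if not found

    def Insert(self, v):  # O(h): walk down to an empty spot, keeping the BST property
        new = BSTVertex(v)
        parent, cur = None, self.root
        while cur is not None:
            parent = cur
            cur = cur.left if v < cur.key else cur.right
        new.parent = parent
        if parent is None:
            self.root = new
        elif v < parent.key:
            parent.left = new
        else:
            parent.right = new

    def Successor(self, v):  # O(h): smallest key > v, or None (assumes v is in the BST)
        cur = self.Search(v)
        if cur.right is not None:  # leftmost vertex of the right subtree
            cur = cur.right
            while cur.left is not None:
                cur = cur.left
            return cur.key
        p = cur.parent  # otherwise: first ancestor whose left subtree contains v
        while p is not None and cur is p.right:
            cur, p = p, p.parent
        return p.key if p is not None else None

    def Inorder(self):  # O(N): left subtree, then the vertex, then the right subtree
        out = []
        def visit(vtx):
            if vtx is None:
                return
            visit(vtx.left)
            out.append(vtx.key)
            visit(vtx.right)
        visit(self.root)
        return out

bst = BST()
for v in [15, 6, 23, 4, 7, 71, 5, 50]:
    bst.Insert(v)
print(bst.Inorder())     # -> [4, 5, 6, 7, 15, 23, 50, 71]
print(bst.Successor(7))  # -> 15
print(bst.Search(10))    # -> None
```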
