Algorithm design
Overview of problems and algorithms
Algorithms on data structures
algorithm: a step-by-step set of operations to be performed in order to accomplish some task.
data structures: trees, queues, vectors/arrays
operations: sorting, searching, adding elements, deleting elements
Optimization problems
Optimization problem: the problem of finding the best solution from the set of all feasible solutions in some real or imagined scenario.
Optimization model: a mathematical (quantitative) representation of (a real or an imagined) optimization problem.
Optimization method (optimization algorithm): an algorithm that is applied in order to solve an optimization problem.
Examples of optimization problems
- Finding the shortest (or fastest) path between two places
- Knapsack problem
- Scheduling the production in a factory so that the production capacity is maximized
- Finding the best location of a factory in order to minimize the costs for transportation of raw material and of finished products
Components of an optimization model
Parameters
- fixed values used in the optimization model
- are known before applying an optimization method, and the method cannot change them
Decision variables
- represent the decisions that the model should determine
- an optimization algorithm searches for the best values
- optimization algorithms operate by systematically changing the values of the decision variables until an optimal (or good enough) solution is eventually found
Objective function
- defines how to evaluate solutions to the optimization model
Constraints
- define limitations/restrictions on the allowed solutions to the optimization model
The knapsack problem
Optimizing and heuristic algorithms
Optimizing algorithms: guaranteed to find the optimal solution (if enough time and resources are available)
Heuristics
- does not guarantee to find the optimal solution
- usually generates a “good” solution within a reasonable amount of time
- can be stated as rules of thumb and are often designed for a specific problem
Why heuristics?
- optimizing algorithms may consume too many resources (CPU time and memory)
- the input data might be inaccurate anyway, so a heuristic solution can be accurate enough
- heuristics are often easier for non-experts to understand
Greedy algorithms
Constructive heuristics
constructive heuristic: a method that iteratively “builds up” a solution
- Assuming that a solution consists of a certain number of solution components (variables), it starts with nothing and adds (fixes) the value of exactly one variable (solution component) at a time (in iterations)
- The algorithm ends when all variables have been assigned a value
- Often used to generate a feasible starting solution, which can later be improved by other algorithms
Greedy algorithms
a natural and common type of constructive heuristic
- in each iteration, the variable to fix (and its value) is chosen as the one that (locally) improves the objective the most
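As an illustration (not from the lecture), a greedy constructive heuristic for the knapsack problem can fix one variable x_i per iteration, always taking the remaining item with the highest profit-to-weight ratio and packing it only if it still fits. The ratio rule and the function name `greedy_knapsack` are assumptions; other greedy rules are possible.

```python
def greedy_knapsack(profits, weights, capacity):
    """Greedy constructive heuristic for the 0/1 knapsack problem.

    One variable x_i is fixed per iteration, in order of decreasing
    profit/weight ratio: x_i = 1 if the item still fits, otherwise 0.
    Assumes all weights are positive.
    """
    n = len(profits)
    x = [0] * n  # start with nothing packed
    remaining = capacity
    order = sorted(range(n), key=lambda i: profits[i] / weights[i], reverse=True)
    for i in order:  # fix exactly one solution component per iteration
        if weights[i] <= remaining:
            x[i] = 1
            remaining -= weights[i]
    return x

print(greedy_knapsack([60, 100, 120], [10, 20, 30], 50))  # [1, 1, 0]
```

For this data the greedy solution has profit 160, while packing items 2 and 3 gives 220, which shows that a greedy heuristic need not find the optimum.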
A greedy, constructive, algorithm for text compression
Huffman’s algorithm: a greedy algorithm that identifies the optimal code for a particular character set and text to code.
Before describing the algorithm, some basic concepts are needed: text compression, graph definitions and Huffman coding
1 Text compression
If a code only has to represent one specific text, the number of bits needed can be reduced (the text is compressed).
2 Huffman coding
- a technique for data compression of a text
- make use of the frequency of characters in order to derive an optimal code of a text
- characters that occur frequently get short codes
- characters that occur infrequently get long codes
3.1 Graph
- a set of objects that are connected to each other through links
- G = (V,E)
- V: the set of vertices (nodes)
- E: the set of edges; each edge e ∈ E is a pair e = (v,w) with v, w ∈ V
- if (v,w) ∈ E, then v is adjacent to w
- sometimes a cost c_e is associated with each edge e
3.2 Tree
Definition
- A tree is an undirected graph where each pair of vertices is connected by exactly one path.
- A tree is a connected undirected graph with n vertices and n-1 edges.
3.3 Rooted tree
one vertex is considered to be the root of the tree
3.4 Binary tree
a (rooted) tree in which no node (vertex) has more than two children
3.5 Forest
a set of disjoint trees
4.1 Tree representation of codes
- character codes can be represented using a special type of tree
- often called a trie (or a prefix tree)
- the code of a character is given by the path from the root to the leaf of that character
- left means 0; right means 1
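A minimal sketch of reading the codes off a trie, assuming a hypothetical representation in which a leaf is a one-character string and an internal node is a (left, right) tuple. Following a left child appends 0 and a right child appends 1, so each code is exactly the root-to-leaf path.

```python
def codes_from_trie(node, prefix=""):
    """Return {character: code} for a trie given as nested tuples.

    Hypothetical representation: a leaf is a one-character string and an
    internal node is a (left, right) pair. Left appends "0", right "1".
    """
    if isinstance(node, str):  # leaf: the path so far is the code
        return {node: prefix}
    left, right = node
    codes = codes_from_trie(left, prefix + "0")
    codes.update(codes_from_trie(right, prefix + "1"))
    return codes

print(codes_from_trie(("a", ("b", "c"))))  # {'a': '0', 'b': '10', 'c': '11'}
```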
4.2 Tries
4.3 Prefix
the beginning of a word (or a text); e.g., the prefixes of the word “hang” are h, ha, han, hang
4.4 Prefix code
- Huffman coding allows character codes to have different lengths
- no character code is a prefix of another character code
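The prefix property is what makes a variable-length code decodable without separators between code words. A small sketch of checking it for a code given as a dictionary (the function name `is_prefix_code` is illustrative):

```python
def is_prefix_code(codes):
    """Return True if no code word is a prefix of another code word.

    Two identical code words are also rejected, since the text could
    then not be decoded unambiguously.
    """
    words = list(codes.values())
    for i, u in enumerate(words):
        for j, v in enumerate(words):
            if i != j and v.startswith(u):
                return False
    return True

print(is_prefix_code({"a": "0", "b": "10", "c": "11"}))  # True
print(is_prefix_code({"a": "0", "b": "01", "c": "11"}))  # False: "0" is a prefix of "01"
```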
5 Multiple codes for a character set
there is a huge number of possible codes for a given character set
6 cost of a code
Cost = Total number of bits required to code the text
- $I$: the index set of the characters
- $d_i$: the depth of character $i$ in the code tree
- $f_i$: the frequency of character $i$ in the text to encode
- $\text{cost} = \sum_{i \in I} d_i \cdot f_i$
- the cost (number of bits) depends on both the code and the text to be coded
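The cost formula can be checked with a short sketch. The depth $d_i$ of a character equals the length of its code word, so the cost is also the length of the encoded text. The code and the text `"aaabbc"` are made-up examples.

```python
from collections import Counter

def code_cost(codes, text):
    """cost = sum over characters i of d_i * f_i.

    d_i (depth in the code tree) is the length of the code word of i;
    f_i is the number of occurrences of i in the text.
    """
    freq = Counter(text)
    return sum(len(codes[ch]) * f for ch, f in freq.items())

codes = {"a": "0", "b": "10", "c": "11"}
print(code_cost(codes, "aaabbc"))  # 3*1 + 2*2 + 1*2 = 9
print(len("".join(codes[ch] for ch in "aaabbc")))  # 9, the length of the encoded text
```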
7 Equivalent codes
swapping the two children of a node does not change the depths of the characters, so the cost of the code stays the same
Huffman’s algorithm
a greedy algorithm that can be used to identify the optimal code for a particular character set and text to code
- at the start there is one tree (with a single node) for each character; the weight of the tree is the frequency of that character
- in each iteration, the two trees with the smallest weights are merged:
  - a new root node is created (the two trees are joined under a new branch)
  - the root nodes of the two trees to merge are added as children of the new root node, forming a new tree whose weight is the sum of their weights
- the algorithm terminates when there is only one tree
- this happens after C-1 iterations, where C is the number of different characters
- (illustrated in the lecture slides)
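A minimal sketch of Huffman's algorithm, assuming Python's `heapq` as the priority queue for picking the two lightest trees and nested tuples as trees (a leaf is a character, an internal node is a (left, right) pair). The counter in each heap entry is only there to break ties without comparing trees.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text):
    """Return {character: code} built by Huffman's algorithm."""
    if not text:
        return {}
    tiebreak = count()  # unique numbers, so trees are never compared
    # One single-node tree per character, weighted by its frequency.
    heap = [(f, next(tiebreak), ch) for ch, f in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:  # C - 1 merges for C different characters
        w1, _, t1 = heapq.heappop(heap)  # smallest weight
        w2, _, t2 = heapq.heappop(heap)  # second smallest weight
        # New root with the two trees as children; the weights add up.
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (t1, t2)))
    _, _, tree = heap[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, str):  # leaf: the path is the code
            codes[node] = prefix or "0"  # text with a single distinct character
        else:
            walk(node[0], prefix + "0")  # left child: 0
            walk(node[1], prefix + "1")  # right child: 1
    walk(tree, "")
    return codes

print(huffman_codes("aaaabbc"))  # {'c': '00', 'b': '01', 'a': '1'}
```

Ties between equal weights may be broken differently by other implementations; this changes the individual codes but not the total cost.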
Improving search heuristics
a method that iteratively improves a feasible solution by performing different types of modifications
requires that a feasible starting solution is provided as input to the algorithm
Neighborhood search
- iteratively searches for improving solutions among the solutions that are close (in some sense) to the current solution
- these close solutions form the neighborhood of the current solution; an improving solution in the neighborhood becomes the new current solution
- the choice of neighborhood is problem specific and is a trade-off between solution quality and solving time
local optima
- neighborhood search algorithms are only guaranteed to find a local optimum (a solution with no improving solution in its neighborhood)
- to reduce the risk of getting stuck in a “bad” local optimum, a neighborhood search algorithm can be restarted with different starting solutions
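A minimal sketch of neighborhood search, applied to the knapsack problem formulated in the next section. The neighborhood used here (all solutions obtained by flipping exactly one x_i) and the best-improvement rule are assumptions chosen for illustration.

```python
def neighborhood_search(profits, weights, capacity, x):
    """Best-improvement neighborhood search for the 0/1 knapsack problem.

    Assumed neighborhood: all solutions obtained by flipping one x_i.
    The starting solution x must be feasible. The search stops in a
    local optimum, i.e. when no feasible neighbor is better.
    """
    def value(sol):
        return sum(p for p, xi in zip(profits, sol) if xi)

    def feasible(sol):
        return sum(w for w, xi in zip(weights, sol) if xi) <= capacity

    x = list(x)
    while True:
        best, best_value = None, value(x)
        for i in range(len(x)):
            y = x.copy()
            y[i] = 1 - y[i]  # flip one variable
            if feasible(y) and value(y) > best_value:
                best, best_value = y, value(y)
        if best is None:  # no improving neighbor: local optimum
            return x
        x = best

print(neighborhood_search([60, 100, 120], [10, 20, 30], 50, [0, 0, 0]))  # [0, 1, 1]
print(neighborhood_search([60, 100, 120], [10, 20, 30], 50, [1, 1, 0]))  # [1, 1, 0]
```

The two calls show why restarts help: from [0, 0, 0] the search reaches profit 220, but from [1, 1, 0] (profit 160) every feasible neighbor is worse, so it stops in a local optimum.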
Algorithm description
- see the lecture slides
The knapsack problem
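- $n$: the number of items
- $I = \{1,\dots,n\}$: the set of items
- $p_i$: the profit (value) of item $i$
- $w_i$: the weight of item $i$
- $W$: the capacity of the knapsack
- $x_i \in \{0,1\}$: decision variable, $x_i = 1$ if item $i$ is put in the knapsack and $0$ otherwise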
Objective function:
Maximize $\sum_{i\in I} p_i x_i$
Constraints:
$\sum_{i\in I} w_i x_i \leq W$
$x_i \in \{0,1\}, \quad \forall i \in I$
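The model can be made concrete with a small sketch: `evaluate` computes the objective value and checks the capacity constraint, and enumerating all $2^n$ vectors $x$ is an optimizing method that is only practical for small $n$. The function names and the data are illustrative assumptions.

```python
from itertools import product

def evaluate(profits, weights, capacity, x):
    """Objective value of x, or None if x violates the capacity constraint."""
    if sum(w * xi for w, xi in zip(weights, x)) > capacity:
        return None
    return sum(p * xi for p, xi in zip(profits, x))

def solve_by_enumeration(profits, weights, capacity):
    """Exact (optimizing) method: try all 2**n binary vectors x."""
    best_x, best_value = None, -1
    for x in product((0, 1), repeat=len(profits)):
        value = evaluate(profits, weights, capacity, x)
        if value is not None and value > best_value:
            best_x, best_value = list(x), value
    return best_x, best_value

print(solve_by_enumeration([60, 100, 120], [10, 20, 30], 50))  # ([0, 1, 1], 220)
```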
Example
- see the iPad notes
Meta heuristics
can be seen as refined improving search heuristics
purpose: to guide an improving search heuristic systematically in order to avoid getting stuck in local optima
examples: tabu search; simulated annealing
Tabu search
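combines neighborhood search with the possibility to move to solutions with a worse objective function value; such moves can make the search cycle between the same solutions, which the tabu list is used to prevent
tabu list (or forbidden list)
- a list of the k most recent previous solutions, which may not be revisited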
- is updated after every iteration
- we can change k to tune the algorithm
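A minimal tabu search sketch for the knapsack problem, assuming the same one-flip neighborhood as plain neighborhood search and a fixed number of iterations as the stopping rule. The tabu list stores the k most recent solutions; a `deque` with `maxlen=k` drops the oldest one automatically.

```python
from collections import deque

def tabu_search(profits, weights, capacity, x, k=3, iterations=20):
    """Tabu search sketch for the 0/1 knapsack problem.

    Each iteration moves to the best feasible, non-tabu neighbor (one
    flipped x_i), even if it is worse than the current solution.
    Returns the best solution found.
    """
    def value(sol):
        return sum(p for p, xi in zip(profits, sol) if xi)

    def feasible(sol):
        return sum(w for w, xi in zip(weights, sol) if xi) <= capacity

    x = tuple(x)
    tabu = deque([x], maxlen=k)  # the k most recent solutions
    best = x
    for _ in range(iterations):
        candidates = []
        for i in range(len(x)):
            y = x[:i] + (1 - x[i],) + x[i + 1:]  # flip x_i
            if feasible(y) and y not in tabu:
                candidates.append(y)
        if not candidates:  # every neighbor is tabu or infeasible
            break
        x = max(candidates, key=value)  # may be worse than the current x
        tabu.append(x)  # update the tabu list every iteration
        if value(x) > value(best):
            best = x
    return list(best)

print(tabu_search([60, 100, 120], [10, 20, 30], 50, [1, 1, 0]))  # [0, 1, 1]
```

Starting from [1, 1, 0], where every feasible one-flip neighbor is worse, the search first accepts the worse solution [0, 1, 0] and then reaches [0, 1, 1] with profit 220.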
Algorithm description
Example (two)
- the second one is in the iPad notes
Simulated annealing
- analogy to the annealing process of slowly cooling down metals
- defines a temperature that slowly decreases
- with a probability that decreases with the temperature, the algorithm allows moving to solutions with worse objective function value
- if the chosen neighbor is better, it is always used as the next solution
- if the chosen neighbor is worse, it is used with a probability that decreases as the temperature decreases
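A simulated annealing sketch for the knapsack problem (maximization). The acceptance probability `exp(delta / T)` for a worse neighbor is the common Metropolis rule, and the starting temperature `t0`, the geometric cooling factor `alpha` and the iteration count are assumed values, not ones given in the lecture.

```python
import math
import random

def simulated_annealing(profits, weights, capacity, x, t0=100.0, alpha=0.95,
                        iterations=500, seed=0):
    """Simulated annealing sketch for the 0/1 knapsack problem.

    A random one-flip neighbor is chosen each iteration. A better (or
    equal) feasible neighbor is always accepted; a worse one is accepted
    with probability exp(delta / T), which shrinks as T decreases.
    """
    rng = random.Random(seed)

    def value(sol):
        return sum(p for p, xi in zip(profits, sol) if xi)

    def feasible(sol):
        return sum(w for w, xi in zip(weights, sol) if xi) <= capacity

    x = list(x)
    best = x.copy()
    t = t0
    for _ in range(iterations):
        y = x.copy()
        i = rng.randrange(len(x))
        y[i] = 1 - y[i]  # random neighbor: flip one variable
        if feasible(y):
            delta = value(y) - value(x)  # negative means y is worse
            if delta >= 0 or rng.random() < math.exp(delta / t):
                x = y
                if value(x) > value(best):
                    best = x.copy()
        t *= alpha  # slowly decrease the temperature (assumed schedule)
    return best
```

Because the best solution seen is kept separately, the returned solution is never worse than the starting one, even though the current solution may temporarily get worse.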
Algorithm description
Divide-and-conquer
Recursion
A function that is defined in terms of itself
A recursive function requires at least one base case
- a base case is an (input) value for which the function can be calculated without recursion
- without a base case, the recursion does not terminate
Example
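As a stand-in example (not necessarily the one from the lecture), the factorial function is defined in terms of itself, has the base case n = 0, and every recursive call moves closer to that base case.

```python
def factorial(n):
    """n! defined recursively: 0! = 1 and n! = n * (n - 1)! for n >= 1."""
    if n < 0:
        raise ValueError("factorial is undefined for negative n")
    if n == 0:  # base case: computed without recursion
        return 1
    return n * factorial(n - 1)  # progress: n - 1 is closer to the base case

print(factorial(5))  # 120
```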
Fundamental rules of recursion
- at least one base case
- each recursive call must make progress towards a base case
Divide-and-conquer
- based on recursion
- recursively dividing a problem into subproblems (two or more) of the same or related type, until the subproblems are small enough to be solved without recursion
- the solutions to the subproblems are then combined (typically in several steps)
two parts
- divide: smaller problems are solved recursively
- conquer: the solution to the original problem is found by combining the solutions to the (smaller) subproblems
Merge sort
a sorting algorithm that uses the principles of divide-and-conquer
Based on the idea of repeatedly merging sorted lists into sorted lists of larger size until the initial list is sorted
Divide and conquer steps (diagram in the lecture slides)
Merge
- the fundamental operation of the merge sort
- “merges” two sorted lists into one sorted list
- used repeatedly in the “conquer”-part
Example
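A minimal merge sort sketch (the data are made up): `merge` combines two sorted lists into one sorted list, and `merge_sort` splits the list in halves, sorts each half recursively (divide) and merges the results (conquer). A list with at most one element is the base case.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # "<=" keeps equal elements in order (stable)
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])  # at most one of these two is non-empty
    result.extend(right[j:])
    return result

def merge_sort(items):
    """Divide-and-conquer sort based on repeated merging."""
    if len(items) <= 1:  # base case: already sorted
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])   # divide: solve the halves recursively
    right = merge_sort(items[mid:])
    return merge(left, right)        # conquer: combine the sorted halves

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```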
Dynamic programming
The Shortest Path Problem
- find the shortest (or cheapest) path between a start node $n_s$ and an end node $n_e$ in a network
- one of the basic problems in network optimization
- often appears as a subproblem in other problems
- assumptions
  - all arcs are directed
  - the end node can be reached from the start node
  - there are no cycles with negative cost
Bellman’s Equations
- $y_s = 0$
- $y_j = \min_{i:(i,j)\in B} \{y_i + c_{ij}\}$ for every node $n_j \neq n_s$
- $B$: the set of arcs (edges)
- $c_{ij}$: the cost of arc $(i,j) \in B$
- $y_j$: the cost of the shortest path from $n_s$ (the start node) to $n_j$
- all methods for solving the SPP try to fulfill the Bellman equations
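One way to fulfill the Bellman equations is the Bellman-Ford label-correcting method, named here as a swapped-in example: it repeatedly corrects any $y_j$ that is larger than $y_i + c_{ij}$ until no equation is violated. The dictionary layout for arcs and the small network are assumptions.

```python
import math

def shortest_path_costs(nodes, arcs, start):
    """Bellman-Ford sketch: returns y with y[start] = 0 and
    y[j] = min over arcs (i, j) of y[i] + c_ij.

    arcs maps (i, j) to c_ij. Assumes no cycle with negative cost.
    """
    y = {v: math.inf for v in nodes}
    y[start] = 0
    for _ in range(len(nodes) - 1):  # n - 1 rounds suffice without negative cycles
        changed = False
        for (i, j), c in arcs.items():
            if y[i] + c < y[j]:  # Bellman equation for j violated: correct y[j]
                y[j] = y[i] + c
                changed = True
        if not changed:  # all Bellman equations are fulfilled
            break
    return y

arcs = {("s", "a"): 4, ("s", "b"): 1, ("b", "a"): 2, ("a", "e"): 1, ("b", "e"): 5}
print(shortest_path_costs(["s", "a", "b", "e"], arcs, "s"))
# {'s': 0, 'a': 3, 'b': 1, 'e': 4}
```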
Dynamic Programming
- used to solve some types of structured optimization problems
- problems solved with DynP can be described as an SPP in an acyclic network
- the computed optimal values fulfill the Bellman equations
- the general idea is to break down the problem into overlapping subproblems, solve the subproblems and then combine their solutions (utilizing the relations between subproblems) to reach an overall solution
- the problem needs to have a structure that allows it to be divided into a number of sequential stages
- in each stage, the optimal solutions of a number of subproblems, each having one (control) variable, are determined
- there is one subproblem for each possible state in each stage
- given a particular state in one stage, a decision (control) determines how it is best to reach that state from the previous stage
- hence a connection (or recursion) between stages is achieved, which allows the optimal solution to be found when the final stage is reached
Summary
- Divide the problem into stages.
- For one stage at a time, decide the best way to reach each state in that stage. Record the optimal decision (control) and the optimal cost for reaching each state.
- Go to the next stage.
- When the final stage has been reached, unravel the optimal solution using the optimal decisions that have been recorded.
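The summary steps can be sketched on a small staged (acyclic) network with made-up data. Forward recursion, the list-of-stages layout and unique state names across stages are assumptions: each state gets the optimal cost of reaching it plus the decision (best predecessor) that achieved it, and the path is unravelled backwards at the end.

```python
import math

def staged_shortest_path(stages, cost):
    """Dynamic programming over stages (forward recursion).

    stages: list of lists of states; stages[0] holds the single start state.
    cost: dict mapping (state in stage k, state in stage k+1) to a cost;
    missing pairs are unreachable. State names must be unique.
    Returns (optimal cost, optimal path) to the cheapest final state.
    """
    start = stages[0][0]
    f = {start: 0}   # optimal cost of reaching each state
    decision = {}    # recorded optimal decision: best predecessor
    for prev_stage, stage in zip(stages, stages[1:]):  # one stage at a time
        for s in stage:
            f[s] = math.inf
            for p in prev_stage:
                c = cost.get((p, s))
                if c is not None and f[p] + c < f[s]:
                    f[s], decision[s] = f[p] + c, p
    end = min(stages[-1], key=lambda s: f[s])
    if f[end] == math.inf:
        return math.inf, []
    path = [end]  # unravel the recorded decisions backwards
    while path[-1] != start:
        path.append(decision[path[-1]])
    return f[end], path[::-1]

stages = [["A"], ["B1", "B2"], ["C1", "C2"], ["D"]]
cost = {("A", "B1"): 2, ("A", "B2"): 4, ("B1", "C1"): 7, ("B1", "C2"): 3,
        ("B2", "C1"): 1, ("B2", "C2"): 4, ("C1", "D"): 1, ("C2", "D"): 5}
print(staged_shortest_path(stages, cost))  # (6, ['A', 'B2', 'C1', 'D'])
```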
Example
- the problem worked through by the teacher in class