Data Structures and Algorithm Analysis
Chapter 1
Computer program design goals:
- to design an algorithm that is easy to understand, code, and debug
- to design an algorithm that makes efficient use of the computer's resources
1.1 A philosophy of data structures
1.1.1 The need for data structures
Data structure: an organization or structuring for a collection of data items
A solution is said to be efficient if it solves the problem within the required resource constraints
- total space to store data: main memory and disk space constraints
- time allowed to perform each subtask
Each problem has constraints on available space and time
A data structure requires:
- space for each data item it stores
- time to perform each basic operation
- programming effort
How to select a data structure:
- analyze the problem to determine the basic operations that must be supported
- analyze the problem to determine the resource constraints that must be met
- quantify the resource constraints for each operation
- select the data structure that best meets these requirements
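1.2 Abstract data types (ADT) and data structures
Type: a collection of values
Simple type: contains no subparts (int, boolean)
Aggregate type: contains subparts
Data item: a piece of information or a record whose value is drawn from a type
Data type: a type together with a collection of operations to manipulate the type
In short: data type = type + operations on the type
Difference between a data type and its implementations: the list data type can be implemented as a linked list or as an array-based list
Abstract data type (ADT): the realization of a data type as a software component
Characteristics of an ADT:
Data abstraction: when an ADT describes an entity handled by the program, it emphasizes the entity's essential features, the functions it can perform, and its interface to outside users
Data encapsulation: the external properties of an entity are separated from its internal implementation details, and those details are hidden from outside users
Encapsulation: implementation details are protected from outside access
Data structure: the physical implementation of an ADT
In C++, an ADT and its implementation together make up a class (the ADT plays a role similar to an interface); each operation of the ADT is implemented by a member function
Data types have both a logical and a physical form:
- the ADT is the logical form
- the implementation as a data structure is the physical form
Representation of an ADT as a triple (D, R, O)
D: the data objects; R: the relations on the data set; O: the basic operations on the data set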
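The split between logical and physical form can be sketched in C++ as an abstract class plus one concrete class. This is only an illustration: the names IntBag, ArrayIntBag, and kCapacity are hypothetical and do not come from the notes.

```cpp
#include <cstddef>

// Logical form: the operations a "bag of ints" must support (hypothetical ADT).
class IntBag {
public:
    virtual ~IntBag() {}
    virtual bool add(int value) = 0;             // false if the bag is full
    virtual bool contains(int value) const = 0;
    virtual std::size_t size() const = 0;
};

// Physical form: one possible data structure, a fixed-capacity array.
class ArrayIntBag : public IntBag {
    static const std::size_t kCapacity = 8;      // arbitrary illustrative capacity
    int items[kCapacity];
    std::size_t count;
public:
    ArrayIntBag() : count(0) {}
    bool add(int value) {
        if (count == kCapacity) return false;
        items[count++] = value;
        return true;
    }
    bool contains(int value) const {
        for (std::size_t i = 0; i < count; i++)
            if (items[i] == value) return true;
        return false;
    }
    std::size_t size() const { return count; }
};
```

Code written against IntBag would keep working if ArrayIntBag were swapped for, say, a linked implementation, which is the point of hiding the physical form.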
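1.3 Design patterns
Design pattern: a reusable description of the interactions of objects and classes
A design pattern is a template that describes the framework of a solution, to be filled in with the specific details of a given problem
1.3.1 Flyweight
- an application with many objects
- some of these objects are identical in the information they contain and the role they play
- those objects must be reached from various places
Goal: reduce memory cost by sharing space
Method: in a text layout, the letter C can be represented as an object, but not every occurrence of C is a separate object. Instead a single object is created, and every occurrence of C holds a reference to that one object. The multiple references to the shared C object are the flyweights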
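A minimal sketch of the flyweight idea for characters in a text, assuming a hypothetical Glyph type and GlyphFactory class (neither name appears in the notes). Each distinct character is allocated once, and the text stores only pointers to the shared objects.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Shared, immutable data for one character (hypothetical type).
struct Glyph {
    char symbol;
    explicit Glyph(char c) : symbol(c) {}
};

// Hands out one shared Glyph per distinct character.
class GlyphFactory {
    std::map<char, Glyph*> pool;
public:
    ~GlyphFactory() {
        for (std::map<char, Glyph*>::iterator it = pool.begin(); it != pool.end(); ++it)
            delete it->second;
    }
    const Glyph* get(char c) {
        std::map<char, Glyph*>::iterator it = pool.find(c);
        if (it == pool.end())
            it = pool.insert(std::make_pair(c, new Glyph(c))).first;
        return it->second;
    }
    std::size_t distinctGlyphs() const { return pool.size(); }
};

// A text stores only references (pointers) to the shared glyphs.
std::vector<const Glyph*> layout(const std::string& text, GlyphFactory& factory) {
    std::vector<const Glyph*> refs;
    for (std::size_t i = 0; i < text.size(); i++)
        refs.push_back(factory.get(text[i]));
    return refs;
}
```

For the text "CACC" this creates two Glyph objects and four references, three of which point to the same C object.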
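1.3.2 Visitor
For a tree of objects, rather than writing a separate traversal function for every task, it is better to write one generic traversal function and pass in the operation to perform at each node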
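One way this could look in C++ is sketched below. The Node, NodeVisitor, SumVisitor, and CountVisitor names are hypothetical, and preorder is an arbitrary choice of traversal order.

```cpp
#include <cstddef>

// Minimal binary tree node (hypothetical, for illustration only).
struct Node {
    int value;
    Node* left;
    Node* right;
    Node(int v, Node* l = NULL, Node* r = NULL) : value(v), left(l), right(r) {}
};

// Interface for the operation applied at each node.
class NodeVisitor {
public:
    virtual ~NodeVisitor() {}
    virtual void visit(const Node& node) = 0;
};

// One generic preorder traversal; the visitor decides what to do at each node.
void traverse(const Node* root, NodeVisitor& visitor) {
    if (root == NULL) return;
    visitor.visit(*root);
    traverse(root->left, visitor);
    traverse(root->right, visitor);
}

// Two different activities reuse the same traversal.
class SumVisitor : public NodeVisitor {
public:
    int total;
    SumVisitor() : total(0) {}
    void visit(const Node& node) { total += node.value; }
};

class CountVisitor : public NodeVisitor {
public:
    int count;
    CountVisitor() : count(0) {}
    void visit(const Node& node) { (void)node; count++; }
};
```

Adding a new activity means adding a new visitor class; traverse itself does not change.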
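1.3.3 Composite
Used mainly when objects form a multi-level hierarchy, for example a document made of an article object, line objects, and character objects
Method: every sub-object implements the operations that apply to it. To perform an operation, call the method on the outermost object only; the call is forwarded to the matching sub-objects
1.3.4 Strategy
Method: encapsulate a family of interchangeable methods. For example, to compute tax for different countries, first define an interface that declares the replaceable method, then provide one implementation per country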
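The tax example might be sketched as follows. The class names and the tax rate passed in are made up for illustration; they are not real tax rules.

```cpp
// Interface declaring the replaceable method.
class TaxStrategy {
public:
    virtual ~TaxStrategy() {}
    virtual double tax(double amount) const = 0;
};

// Two interchangeable strategies (hypothetical; rates are illustrative only).
class FlatTax : public TaxStrategy {
    double rate;
public:
    explicit FlatTax(double r) : rate(r) {}
    double tax(double amount) const { return amount * rate; }
};

class NoTax : public TaxStrategy {
public:
    double tax(double) const { return 0.0; }
};

// Client code depends only on the interface, so strategies can be swapped freely.
double totalPrice(double amount, const TaxStrategy& strategy) {
    return amount + strategy.tax(amount);
}
```

Calling totalPrice(100.0, FlatTax(0.25)) and totalPrice(100.0, NoTax()) runs the same client code with two different tax computations.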
1.4 Problems, algorithms, and programs
1.4.1 Problems
A problem is a task to be performed
- best thought of in terms of inputs and matching outputs
- a problem definition should include constraints on the resources that may be consumed by any acceptable solution
A problem can be viewed as a function in the mathematical sense
A function is a matching between inputs (the domain) and outputs (the range)
The values making up an input are called the parameters of the function
1.4.2 Algorithms
An algorithm is a method or a process followed to solve a problem
An algorithm is an implementation of the function that transforms an input to the corresponding output
A problem can have many algorithms
An algorithm must have the following properties:
- It must be correct
- It must be composed of a series of concrete steps
- There can be no ambiguity as to which step will be performed next
- It must be composed of a finite number of steps
- It must terminate
1.4.3 Programs
A computer program is an instance, or concrete representation, of an algorithm in some programming language
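Chapter 3
3.1 Introduction
Size: the number of inputs processed
Basic operation: an operation whose time to complete does not depend on the particular values of its operands (an indivisible unit of work)
Algorithm performance: the number of basic operations required to process an input of a given size
Growth rate: the rate at which the cost of the algorithm grows as the size of its input grows
Constant coefficients in the time cost function are ignored when comparing growth rates
cn: linear growth rate
cn^2: quadratic growth rate
2^n: exponential growth rate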
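// Return the position of the largest value in A[0..n-1]
int largest(int A[], int n) {
  int largest = 0;               // Position of the largest value seen so far
  for (int i = 0; i < n; i++) {
    if (A[largest] < A[i]) {
      largest = i;
    }
  }
  return largest;                // Return only after the whole array has been scanned
}
Time cost: T(n) = cn (examining one element takes time c, and n elements are examined)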
sum = 0;
for (int i = 1; i <= n; i++) {
  for (int j = 1; j <= n; j++) {
    sum++;
  }
}
Time cost: T(n) = cn^2 (one increment takes time c, and the increment runs n^2 times)
3.2 Best, Worst, Average Cases
Example: sequential search for K in an array of n elements
Best case: K is found at the first position. Cost is 1 comparison
Worst case: K is found at the last position. Cost is n comparisons
Average case: (n+1)/2 comparisons, assuming K is equally likely to be at any position
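The three cases can be observed by instrumenting a sequential search with a comparison counter. The function name and the counter parameter below are hypothetical, added only to make the cost visible.

```cpp
// Sequential search that also reports how many key comparisons were made.
// Returns the index of K, or -1 if K is not present.
int sequentialSearch(const int A[], int n, int K, int& comparisons) {
    comparisons = 0;
    for (int i = 0; i < n; i++) {
        comparisons++;
        if (A[i] == K) return i;
    }
    return -1;
}
```

Searching for the first element costs 1 comparison, and searching for the last element (or for a missing key) costs n.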
3.3 Faster Computer or Algorithm
T(n) | n | n’ | Change | n’/n |
---|---|---|---|---|
10n | 1,000 | 10,000 | n’ = 10n | 10 |
20n | 500 | 5,000 | n’ = 10n | 10 |
5n log n | 250 | 1,842 | √10 n < n’ < 10n | 7.37 |
2n^2 | 70 | 223 | n’ = √10 n | 3.16 |
2^n | 13 | 16 | n’ = n + 3 | ----- |
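A faster computer reduces T(n) by a constant factor (in effect, the constant c becomes smaller); it does not change the number of basic operations the algorithm performs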
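3.3 Asymptotic Analysis
Constant coefficients are ignored when estimating the running time of an algorithm
Big-Oh (upper bound)
Definition: For T(n) a non-negatively valued function, T(n) is in the set O(f(n)) if there exist two positive constants c and n0 such that T(n) <= cf(n) for all n > n0
That is, if some c and n0 exist such that T(n) <= cf(n) for every n > n0, then T(n) is in the set O(f(n))
When T(n) is a polynomial in n, the term with the highest power of n gives the upper bound
Example 2: T(n) = c1n^2 + c2n in the average case.
c1n^2 + c2n <= c1n^2 + c2n^2 <= (c1 + c2)n^2 for all n > 1.
T(n) <= cn^2 for c = c1 + c2 and n0 = 1.
Therefore, T(n) is in O(n^2) by the definition.
*For algorithms with distinct best, average, and worst cases, a bound is stated for one particular case (here the average case)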
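Big-Omega (lower bound)
Definition: For T(n) a non-negatively valued function, T(n) is in the set Ω(g(n)) if there exist two positive constants c and n0 such that T(n) >= cg(n) for all n > n0
Example: T(n) = c1n^2 + c2n. Since c1n^2 + c2n >= c1n^2 for all n > 1, T(n) is in Ω(n^2) with c = c1 and n0 = 1
Big-Oh and Big-Omega often coincide: for a polynomial in n, both are given by the term with the highest power, which is the tightest bound
Difference between Big-Oh and the worst case
The worst case concerns inputs of the same size whose contents differ and therefore take different amounts of time; the input arrangement that takes the longest is the worst case (only some algorithms have distinct best and worst cases)
Big-Oh describes an upper bound on the growth rate
An algorithm is said to be Θ(h(n)) if it is in O(h(n)) and it is in Ω(h(n)).
When the upper and lower bounds are the same, use Θ directly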
Simplifying Rules
1. If f(n) is in O(g(n)) and g(n) is in O(h(n)), then f(n) is in O(h(n)). (transitivity)
2. If f(n) is in O(kg(n)) for any constant k > 0, then f(n) is in O(g(n)). (constant factors can be ignored)
3. If f1(n) is in O(g1(n)) and f2(n) is in O(g2(n)), then (f1 + f2)(n) is in O(max(g1(n), g2(n))). (sum rule)
4. If f1(n) is in O(g1(n)) and f2(n) is in O(g2(n)), then f1(n)f2(n) is in O(g1(n)g2(n)). (product rule)
Example:
sum = 0;
for (i=1; i<=n; i++)
  for (j=1; j<=i; j++)
    sum++;
for (k=0; k<n; k++)
  A[k] = k;
The inner increment runs n(n+1)/2 times and the second loop runs n times, so T(n) = (1/2)n^2 + (3/2)n, which is in Θ(n^2)
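The closed form can be checked empirically by counting loop-body executions. The helper below is a hypothetical sketch that charges one unit per sum++ and one unit per assignment A[k] = k.

```cpp
// Count how many times the loop bodies of the example execute.
long countOperations(int n) {
    long ops = 0;
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= i; j++)
            ops++;                  // stands in for sum++
    for (int k = 0; k < n; k++)
        ops++;                      // stands in for A[k] = k
    return ops;
}
```

For n = 4 the count is 10 + 4 = 14, matching (1/2)(16) + (3/2)(4) = 14.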
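Binary search
The cost is derived from a recurrence: each recursive call of binary search halves the size of the data
T(n) = T(n/2) + c, T(1) = c. Expanding the recurrence (the terms telescope) gives T(n) = c log n + c for n a power of 2, which is in Θ(log n)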
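A minimal iterative sketch of binary search with a probe counter, to make the logarithmic cost visible. The function name and the probes parameter are assumptions added for illustration; the notes describe the recursive form.

```cpp
// Iterative binary search on an array sorted in ascending order.
// Returns the index of K or -1; 'probes' counts how many elements were examined.
int binarySearch(const int A[], int n, int K, int& probes) {
    int lo = 0, hi = n - 1;
    probes = 0;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   // avoids overflow of lo + hi
        probes++;
        if (A[mid] == K) return mid;
        if (A[mid] < K) lo = mid + 1;   // K can only be in the upper half
        else hi = mid - 1;              // K can only be in the lower half
    }
    return -1;
}
```

On a sorted array of 8 elements, no search examines more than 4 elements, which is log2(8) + 1.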
Time cost of sorting
- Cost of I/O: Ω(n).
- Bubble or insertion sort: O(n^2).
- A better sort (Quicksort, Mergesort, Heapsort, etc.): O(n log n).
- We prove later that sorting is Ω(n log n).
3.4 Multiple parameters
for (i=0; i<C; i++)   // Initialize count
  count[i] = 0;
for (i=0; i<P; i++)   // Look at all pixels
  count[value(i)]++;  // Increment count
sort(count);          // Sort pixel counts
An image has P pixels, and each pixel's color value is in the range 0 to C-1. The task is to count how many times each color value occurs in the image
- The first loop has size C, the number of color values: Θ(C)
- The second loop has size P, the number of pixels: Θ(P)
- The sort costs Θ(C log C)
The total time cost is Θ(P + C log C); neither variable can be dropped
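A runnable version of the fragment might look like the sketch below. It assumes the pixels are given as a vector of color values in [0, C) and uses std::sort for the final step; the function name is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Count how often each color value occurs, then sort the counts.
// 'pixels' holds P color values, each assumed to lie in [0, C).
std::vector<int> sortedColorCounts(const std::vector<int>& pixels, int C) {
    std::vector<int> count(C, 0);                 // Theta(C): initialize counts
    for (std::size_t i = 0; i < pixels.size(); i++)
        count[pixels[i]]++;                       // Theta(P): look at all pixels
    std::sort(count.begin(), count.end());        // O(C log C): sort the counts
    return count;
}
```

With many pixels and few colors the P term dominates; with few pixels and many colors the C log C term dominates, which is why both parameters stay in the bound.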
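3.5 Space cost
Asymptotic analysis applies to space cost as well
Space cost of a one-dimensional array of n integers: if one integer occupies c bytes, the array needs cn bytes, which is Θ(n)
Space/Time Tradeoff Principle
One can often reduce time if one is willing to sacrifice space, or vice versa
Disk-based Space/Time Tradeoff Principle: The smaller you make the disk storage requirements, the faster your program will run.
Disk access is so slow that reducing the amount of data stored on disk also reduces the running time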
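As one illustration of the in-memory tradeoff (not an example taken from the notes), counting the 1-bits of an integer can be done with a precomputed 256-entry table, which spends space to save time, or with a plain loop, which uses no extra space but takes one step per bit. The names below are hypothetical and the table version assumes 32-bit values.

```cpp
// More space, less time: 256-entry table of bit counts, built once.
class BitCountTable {
    unsigned char table[256];
public:
    BitCountTable() {
        table[0] = 0;
        for (int i = 1; i < 256; i++)
            table[i] = (i & 1) + table[i / 2];   // reuse the count for i/2
    }
    int count(unsigned int x) const {            // assumes 32-bit values
        return table[x & 0xFF] + table[(x >> 8) & 0xFF] +
               table[(x >> 16) & 0xFF] + table[(x >> 24) & 0xFF];
    }
};

// Less space, more time: no table, one loop step per bit.
int countBitsLoop(unsigned int x) {
    int bits = 0;
    while (x != 0) {
        bits += x & 1;
        x >>= 1;
    }
    return bits;
}
```

Both functions return the same answer; they differ only in how much memory and how many steps they use.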
Chapter 4 Lists, stacks, and queues
4.1 Lists
A list is a finite, ordered sequence of data items (two implementations: array-based list and linked list)
Notation: <a0, a1, …, an-1>
The list implementation supports the concept of a current position
by defining the list in terms of left and right partitions. The partitions are separated by the fence
<20, 23 | 12, 15>
List ADT
The ADT is expressed formally as an abstract class
template <class Elem> class List {
public:
  List() {}
  virtual ~List() {}
  virtual void clear() = 0;
  // Insert at the current position
  virtual bool insert(const Elem&) = 0;
  // Append to the end of the list
  virtual bool append(const Elem&) = 0;
  // Remove an element from the list
  virtual bool remove(Elem&) = 0;
  // Place the fence at the head of the list
  virtual void setStart() = 0;
  // Place the fence at the tail of the list
  virtual void setEnd() = 0;
  virtual void prev() = 0;
  virtual void next() = 0;
  virtual int leftLength() const = 0;
  virtual int rightLength() const = 0;
  virtual bool setPos(int pos) = 0;
  virtual bool getValue(Elem&) const = 0;
  virtual void print() const = 0;
};
List: <12 | 32, 15>
MyList.insert(99);
Result: <12 | 99, 32, 15>
// Traverse the list: place the fence at the head, read the current value, then move to the next position (similar to an iterator)
for (MyList.setStart(); MyList.getValue(it); MyList.next())
DoSomething(it);
// Return true iff K is in list
// List Find Function
bool find(List<int>& L, int K) {
int it;
for (L.setStart(); L.getValue(it); L.next())
if (K == it) return true; // Found it
return false; // Not found
}
4.1.1 Array-based list implementation
template <class Elem> // Array-based list
class AList : public List<Elem> { // AList inherits from the abstract class List
private:
  int maxSize;     // Maximum size of list
  int listSize;    // Actual elem count
  int fence;       // Position of fence
  Elem* listArray; // Array holding list (pointer to the stored data)
public:
  AList(int size=DefaultListSize) {
    maxSize = size;                // Set the maximum capacity
    listSize = fence = 0;
    listArray = new Elem[maxSize]; // Allocate the storage
  }
  ~AList() { delete [] listArray; }
  void clear() {
    delete [] listArray;
    listSize = fence = 0;
    listArray = new Elem[maxSize];
  }
  bool insert(const Elem&); // Defined outside the class
  bool append(const Elem&);
  bool remove(Elem&);
  void setStart() { fence = 0; }
  void setEnd() { fence = listSize; }
  void prev() { if (fence != 0) fence--; }
  void next() { if (fence < listSize) fence++; } // Never move past the end of the list
  int leftLength() const { return fence; }
  int rightLength() const { return listSize - fence; }
  bool setPos(int pos) {
    if ((pos >= 0) && (pos <= listSize))
      fence = pos;
    return (pos >= 0) && (pos <= listSize);
  }
  bool getValue(Elem& it) const {
    // The reference parameter lets the function write into the caller's variable
    if (rightLength() == 0) return false;
    it = listArray[fence];
    return true;
  }
};
Three member functions defined outside the class
// Insert at front of right partition
template <class Elem>
bool AList<Elem>::insert(const Elem& item) {
  if (listSize == maxSize) return false;
  for(int i=listSize; i>fence; i--)
    listArray[i] = listArray[i-1]; // Shift every element after the fence one position to the right
  listArray[fence] = item;         // Store the item at the current position
  listSize++;
  return true;
}
// Append Elem to end of the list
template <class Elem>
bool AList<Elem>::append(const Elem& item) {
  if (listSize == maxSize)
    return false;
  listArray[listSize++] = item;    // Store the item just past the last element
  return true;
}
// Remove and return first Elem in right
// partition
template <class Elem> bool AList<Elem>::remove(Elem& it) {
  if (rightLength() == 0)
    return false;
  it = listArray[fence];           // Copy Elem
  for(int i=fence; i<listSize-1; i++)
    listArray[i] = listArray[i+1]; // Shift every element right of the fence one position to the left
  listSize--;                      // Decrement size
  return true;
}
4.1.2 Linked list
// Singly-linked list node (definition of the node class)
template <class Elem> class Link {
public:
  Elem element; // Value for this node
  Link *next;   // Pointer to next node
  Link(const Elem& elemval, Link* nextval =NULL) {
    element = elemval;
    next = nextval;
  }
  Link(Link* nextval =NULL) {
    next = nextval; // Used for the header node, which stores no data
  }
};
// Linked list implementation
// Unlike the array-based list, no maximum size has to be fixed in advance
template <class Elem> class LList:
public List<Elem> {
private:
  Link<Elem>* head;  // Pointer to list header
  Link<Elem>* tail;  // Pointer to last Elem
  Link<Elem>* fence; // Last element on left
  int leftcnt;       // Size of left
  int rightcnt;      // Size of right
  void init() {      // Initialization routine
    fence = tail = head = new Link<Elem>;
    leftcnt = rightcnt = 0;
  }
public:
  LList() { init(); }
  bool insert(const Elem&); // Defined outside the class
  bool append(const Elem&);
  bool remove(Elem&);
  void prev();
  void setStart() {
    fence = head;
    rightcnt += leftcnt;
    leftcnt = 0;
  }
  void setEnd() {
    fence = tail;
    leftcnt += rightcnt;
    rightcnt = 0;
  }
  void next() {
    // Don't move fence if right empty
    // fence points to a Link node, which stores the address of the next node
    if (fence != tail) {
      fence = fence->next;
      rightcnt--;
      leftcnt++;
    }
  }
  int leftLength() const { return leftcnt; }
  int rightLength() const { return rightcnt; }
  bool getValue(Elem& it) const {
    if (rightLength() == 0) return false;
    it = fence->next->element;
    return true;
  }
};
// Insert at front of right partition
template <class Elem>
bool LList<Elem>::insert(const Elem& item) {
fence->next = new Link<Elem>(item, fence->next);
// Uses the first constructor, which sets the next pointer of the newly created node directly
if (tail == fence)
tail = fence->next;
rightcnt++;
return true;
}
// Append Elem to end of the list
template <class Elem>
bool LList<Elem>::append(const Elem& item) {
tail = tail->next = new Link<Elem>(item, NULL);
rightcnt++;
return true;
}
// Remove and return first Elem in right partition
// Note: this unlinks the node fence->next, not the node the fence itself points to
template <class Elem> bool LList<Elem>::remove(Elem& it) {
if (fence->next == NULL)
return false;
it = fence->next->element; // Remember val
// Remember link node
Link<Elem>* ltemp = fence->next;
fence->next = ltemp->next; // Remove
if (tail == ltemp) // If the last node was removed, tail must be reset
tail = fence;
delete ltemp; // Reclaim space
rightcnt--;
return true;
}
prev
// Move fence one step left;
// no change if left is empty
template <class Elem> void
LList<Elem>::prev() {
Link<Elem>* temp = head;
if (fence == head) return; // No prev Elem
while (temp->next!=fence)
temp=temp->next;
fence = temp;
leftcnt--;
rightcnt++;
}
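Array-Based Lists:
Insertion and deletion are costly
- Insertion and deletion are Θ(n), because every element after the position must be shifted
- Prev and direct access are Θ(1).
- Array must be allocated in advance.
- No overhead if all array positions are full. (Overhead here means structural overhead, such as the next pointer stored with each linked-list element)
Linked Lists:
Access by position is costly
- Insertion and deletion are Θ(1)
- Prev and direct access are Θ(n): each node holds only a forward pointer, so moving backward requires a traversal from the head
- Space grows with number of elements
- Every element requires overhead
When the list holds few elements relative to the array capacity, the linked list uses less space, because the array allocates positions that go unused
Space comparison:
"Break-even" point (the number of elements at which both implementations use the same space; for larger n the array-based list is more space-efficient):
n = DE / (P + E)
E: Space for data value.
P: Space for pointer.
D: Maximum number of elements in array.
n is the number of linked-list nodes that fit in the space occupied by an array of D elements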
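The break-even formula can be evaluated directly. The function names below are hypothetical, and the sizes used in the usage note are arbitrary illustrative values, not measurements.

```cpp
// Break-even point n = DE / (P + E), in elements.
// D: array capacity, E: bytes per data value, P: bytes per pointer.
double breakEven(double D, double E, double P) {
    return D * E / (P + E);
}

// Space used by each implementation when the list holds n elements.
double arraySpace(double D, double E) { return D * E; }
double linkedSpace(double n, double E, double P) { return n * (P + E); }
```

With D = 100, E = 4, and P = 4 the break-even point is 50 elements: at that size both implementations use 400 bytes, and beyond it the linked list uses more.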
4.1.3 freelist
// Singly-linked list node with freelist
template <class Elem> class Link {
private:
static Link<Elem>* freelist; // Head
public:
Elem element; // Value for this node
Link* next; // Point to next node
Link(const Elem& elemval,Link* nextval =NULL) {
element = elemval;
next = nextval;
}
Link(Link* nextval =NULL) {next=nextval;}
void* operator new(size_t); // Overload
void operator delete(void*); // Overload
};
template <class Elem>
Link<Elem>* Link<Elem>::freelist = NULL;
template <class Elem> // Overload for new
void* Link<Elem>::operator new(size_t) {
if (freelist == NULL) return ::new Link;
Link<Elem>* temp = freelist; // Reuse
freelist = freelist->next;
return temp; // Return the link
}
template <class Elem> // Overload delete
void Link<Elem>::operator delete(void* ptr){
((Link<Elem>*)ptr)->next = freelist;
freelist = (Link<Elem>*)ptr;
}
4.1.3 Doubly Linked Lists
// Doubly-linked list link node
template <class Elem> class Link {
public:
Elem element; // Value for this node
Link *next; // Pointer to next node
Link *prev; // Pointer to previous node
Link(const Elem& e, Link* prevp =NULL,
Link* nextp =NULL)
{ element=e; prev=prevp; next=nextp; }
Link(Link* prevp =NULL, Link* nextp =NULL)
{ prev = prevp; next = nextp; }
};
// Insert at front of right partition
template <class Elem>
bool LList<Elem>::insert(const Elem& item) {
fence->next =
new Link<Elem>(item, fence, fence->next);
if (fence->next->next != NULL) // After the insert, the prev pointer of the following node must be updated
fence->next->next->prev = fence->next;
if (tail == fence) // Appending new Elem
tail = fence->next; // so set tail
rightcnt++; // Added to right
return true;
}
// Remove, return first Elem in right part
template <class Elem>
bool LList<Elem>::remove(Elem& it) {
if (fence->next == NULL) return false;
it = fence->next->element;
Link<Elem>* ltemp = fence->next;
if (ltemp->next != NULL)
ltemp->next->prev = fence;
else tail = fence; // Reset tail
fence->next = ltemp->next; // Remove
delete ltemp; // Reclaim space
rightcnt--; // Removed from right
return true;
}
4.1.4 Dictionary
Often want to insert records, delete records, search for records.
Required concepts:
- Search key: Describe what we are looking for
- Key comparison
  - Equality: sequential search
  - Relative order: sorting
- Record comparison
4.1.5 Stacks
LIFO: Last In, First Out (the last element inserted is the first one removed).
Restricted form of list: Insert and remove only at front of list.
Insert: PUSH
Remove: POP
The accessible element is called TOP.
Stack ADT
// Stack abstract class
template <class Elem> class Stack {
public:
// Reinitialize the stack
virtual void clear() = 0;
// Push an element onto the top of the stack.
virtual bool push(const Elem&) = 0;
// Remove the element at the top of the stack.
virtual bool pop(Elem&) = 0;
// Get a copy of the top element in the stack
virtual bool topValue(Elem&) const = 0;
// Return the number of elements in the stack.
virtual int length() const = 0;
};
Array-Based Stack
// Array-based stack implementation
template <typename E> class AStack {
private:
  int maxSize;  // Maximum size of stack
  int top;      // Index of the first free position (the top element is at top-1; the stack grows toward the end of the array)
  E *listArray; // Array holding elements
public:
  AStack(int size = defaultSize) {
    maxSize = size;
    top = 0;
    listArray = new E[size];
  }
  ~AStack() { delete [] listArray; }
  void push(const E& it) {
    Assert(top != maxSize, "stack is full"); // Assert checks the condition and reports the message if it fails
    listArray[top] = it;
    top++;
  }
  E pop() {
    Assert(top != 0, "stack is empty");
    top--;
    return listArray[top];
  }
  const E& topValue() const {
    Assert(top != 0, "stack is empty");
    return listArray[top-1];
  }
};
Linked Stack
// Linked stack implementation
// Assumes a Stack<E> base class whose virtual functions match these signatures
template <typename E> class LStack : public Stack<E> {
private:
  Link<E>* top; // Pointer to the first node; unlike the array-based stack, the list head is the stack top, so no index is needed
  int size;     // Count number of elems
public:
  LStack(int sz = defaultSize) { // sz is unused; kept for compatibility with AStack
    top = NULL;
    size = 0;
  }
  ~LStack() { clear(); }
  void clear() {
    while (top != NULL) {
      Link<E>* temp = top;
      top = top->next;
      delete temp;
    }
    size = 0;
  }
  void push(const E& it) {
    top = new Link<E>(it, top); // Unlike the list's append, the new node is added at the front
    size++;
  }
  E pop() {
    Assert(top != NULL, "stack is empty");
    E it = top->element;
    Link<E>* ltemp = top->next; // Remember the rest of the stack
    delete top;
    top = ltemp;
    size--;
    return it;
  }
  const E& topValue() const {
    Assert(top != NULL, "Stack is empty");
    return top->element;
  }
};
Example:
Converting a decimal number N to base d:
N = (N div d)×d + N mod d
N | N div 8 | N mod 8 |
---|---|---|
1348 | 168 | 4 |
168 | 21 | 0 |
21 | 2 | 5 |
2 | 0 | 2 |
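#include <iostream>
#include <stack>
// Read a decimal number and print it in octal, using std::stack in place of the stack classes above
void conversion() {
  std::stack<int> s;         // Create an empty stack
  int N;
  std::cin >> N;             // Read a decimal number
  while (N) {
    s.push(N % 8);           // Push the remainder: for 1348 the push order is 4 -> 0 -> 5 -> 2
    N = N / 8;               // Continue with the nonzero quotient
  }
  while (!s.empty()) {
    // Print the octal digits in the reverse of the order in which the remainders were computed
    std::cout << s.top();
    s.pop();
  }
}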
Queues (a queue has two access points, one for insertion and one for removal; a stack has only one)
FIFO: First In, First Out
Restricted form of list: Insert at one end, remove from the other.
Notation:
Insert: Enqueue
Delete: Dequeue
First element: Front
Last element: Rear
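The notes give no queue implementation, so the following is only a sketch of one common choice, a fixed-capacity circular array. The class name ArrayQueue, the capacity kCapacity, and the bool-returning interface are assumptions made for illustration.

```cpp
// Fixed-capacity circular queue of ints (hypothetical sketch).
class ArrayQueue {
    static const int kCapacity = 4;   // arbitrary illustrative capacity
    int items[kCapacity];
    int front;                        // index of the first element
    int count;                        // number of stored elements
public:
    ArrayQueue() : front(0), count(0) {}
    bool enqueue(int value) {         // insert at the rear
        if (count == kCapacity) return false;
        items[(front + count) % kCapacity] = value;
        count++;
        return true;
    }
    bool dequeue(int& value) {        // remove from the front
        if (count == 0) return false;
        value = items[front];
        front = (front + 1) % kCapacity;
        count--;
        return true;
    }
    int length() const { return count; }
};
```

Wrapping the indices with the modulus operator lets both enqueue and dequeue run in constant time without shifting elements.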
Chapter 5 Binary trees
A binary tree is made up of a finite set of nodes that is either empty or consists of a node called the root together with two binary trees, called the left and right subtrees, which are disjoint from each other and from the root
Notation: node, children (B is a child of A), edge, parent (A is the parent of B), ancestor, descendant (D is a descendant of A), path, depth, height, level, leaf node (a node with no children), internal node, subtree.
Full binary tree: Each node is either a leaf or an internal node with exactly two non-empty children.
Complete binary tree: If the height of the tree is d, then all levels except possibly level d are completely full. The bottom level has all nodes to the left side.
Full Binary Tree Theorem
Theorem: The number of leaves in a non-empty full binary tree is one more than the number of internal nodes.
- Level i (i ≥ 1) of a binary tree contains at most 2^(i-1) nodes
- A binary tree of depth k (k ≥ 1) contains at most 2^k - 1 nodes
- For any binary tree with n0 leaf nodes and n2 nodes of degree 2, n0 = n2 + 1
- A complete binary tree with n nodes has depth ⌊log2 n⌋ + 1
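The relation n0 = n2 + 1 can be checked on small trees with two counting functions. The BinNode type and the function names are hypothetical and exist only for this sketch.

```cpp
#include <cstddef>

// Minimal binary tree node (hypothetical, for illustration only).
struct BinNode {
    BinNode* left;
    BinNode* right;
    BinNode(BinNode* l = NULL, BinNode* r = NULL) : left(l), right(r) {}
};

// Number of nodes with no children (n0).
int countLeaves(const BinNode* root) {
    if (root == NULL) return 0;
    if (root->left == NULL && root->right == NULL) return 1;
    return countLeaves(root->left) + countLeaves(root->right);
}

// Number of nodes with two children (n2).
int countDegreeTwo(const BinNode* root) {
    if (root == NULL) return 0;
    int self = (root->left != NULL && root->right != NULL) ? 1 : 0;
    return self + countDegreeTwo(root->left) + countDegreeTwo(root->right);
}
```

In a full binary tree every internal node has degree 2, so the same check also illustrates the Full Binary Tree Theorem.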