Data Structures and Algorithm Analysis: Study Notes (Part 1)

Latest recommended article published on 2022-08-22 13:39:34

Eric Zerk


Article tags: data structures

Article link: https://blog.csdn.net/weixin_43469023/article/details/108980139

Data Structures and Algorithm Analysis

Chapter 1

Goals of computer program design:

To design an algorithm that is easy to understand, code, and debug
To design an algorithm that makes efficient use of the computer's resources

1.1 A Philosophy of Data Structures

1.1.1 The Need for Data Structures

Data structure: an organization or structuring for a collection of data items

A solution is said to be efficient if it solves the problem within the required resource constraints. Each problem has constraints on available space and time:

total space available to store the data: main memory and disk space constraints
time allowed to perform each subtask

Each data structure requires:

space for each data item it stores
time to perform each basic operation
programming effort

How to select a data structure:

analyze the problem to determine the basic operations that must be supported
analyze the problem to determine the resource constraints that any solution must meet
quantify the resource constraints for each operation
select the data structure that best meets these requirements

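1.2 Abstract Data Types (ADTs) and Data Structures

Type: a collection of values

Simple type: contains no subparts (int, boolean)

Aggregate type: contains subparts

Data item: a piece of information or a record whose value is drawn from a type

Data type: a type together with a collection of operations to manipulate the type

In short: data type = type + operations on the type

A data type is distinct from its implementations: the list data type can be implemented as a linked list or as an array-based list

Abstract data type (ADT): the realization of a data type as a software component

Characteristics of an ADT

Data abstraction: when an ADT describes an entity that a program works with, it emphasizes the entity's essential properties, the functions it can perform, and its interface to outside users

Data encapsulation: the entity's external properties are separated from its internal implementation details, and those details are hidden from outside users

Encapsulation: implementation details are protected from outside access

Data structure: the physical implementation of an ADT

In C++, an ADT and its implementation together make up a class (the ADT plays a role similar to an interface), and each ADT operation is implemented by a member function

Data types have both a logical and a physical form:

the ADT is the logical form
the implementation as a data structure is the physical form

Representation of an ADT as a triple (D, R, O)

D: the data objects; R: the relations on the data set; O: the basic operations on the data set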

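1.3 Design Patterns

Design pattern: a description of the interactions of objects and classes

A design pattern is a template that describes the framework of a solution, which is then filled in with the specific details of a given problem

1.3.1 Flyweight

The setting:

an application has many objects
some of these objects are identical in the information they contain and the role they play
those objects must be reached from various places

Goal: to reduce memory cost by sharing space

Method: in a text layout, each letter C could be its own object. Instead of making every C a separate object, create a single object and let each occurrence of C hold a reference to it. The many references to that one C object are the flyweights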
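The sharing idea can be sketched in C++ as follows. This is only an illustration under assumptions: the names Glyph, GlyphFactory, and layout are hypothetical and do not come from the notes, and std::shared_ptr is just one convenient way to hold the shared references.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// The shared object: one instance per distinct character.
struct Glyph {
  char symbol;
  explicit Glyph(char c) : symbol(c) {}
};

// Hypothetical factory that hands out the single shared Glyph for each character.
class GlyphFactory {
  std::map<char, std::shared_ptr<const Glyph>> pool;
public:
  std::shared_ptr<const Glyph> get(char c) {
    auto it = pool.find(c);
    if (it == pool.end())   // First request for this character: create it once
      it = pool.emplace(c, std::make_shared<Glyph>(c)).first;
    return it->second;      // Every later request reuses the same object
  }
  std::size_t distinctGlyphs() const { return pool.size(); }
};

// A document is a sequence of references to shared glyphs, not of glyph copies.
std::vector<std::shared_ptr<const Glyph>> layout(GlyphFactory& f, const char* text) {
  std::vector<std::shared_ptr<const Glyph>> doc;
  for (const char* p = text; *p != '\0'; ++p)
    doc.push_back(f.get(*p));
  return doc;
}
```

Laying out the text "CCAC" with this sketch stores four references but only two Glyph objects, and all three C positions refer to the same object.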

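1.3.2 Visitor

For a tree of objects, rather than writing a separate traversal function for every activity, it is better to write one generic traversal function and pass in the operation to perform at each node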
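A minimal C++ sketch of the "pass the operation in" idea follows. It is the simplified form described in these notes, not the full double-dispatch Visitor pattern, and the names TreeNode and traverse are hypothetical.

```cpp
#include <functional>

// A plain binary tree node used only for this illustration.
struct TreeNode {
  int value;
  TreeNode* left;
  TreeNode* right;
};

// One generic preorder traversal; the action to perform at each node is a parameter.
void traverse(const TreeNode* node,
              const std::function<void(const TreeNode&)>& visit) {
  if (node == nullptr) return;
  visit(*node);                  // Apply the caller's operation to this node
  traverse(node->left, visit);
  traverse(node->right, visit);
}
```

The same traverse function can then sum the values in one call and count the nodes in another, simply by passing a different lambda each time.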

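1.3.3 Composite

Mainly used for object hierarchies with several levels, for example a document made up of a document object, line objects, and character objects

Method: each object in the hierarchy implements the operations that apply to it. To perform an operation, call the method on the outermost object, and the call is dispatched to the matching subclass implementation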

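1.3.4 Strategy

Method: encapsulate a family of interchangeable methods. For example, to compute tax under different countries' rates, first create an interface that declares the replaceable method, then supply one implementation per country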
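The tax example might look like this in C++. Everything here is an assumption made for illustration: the class names are hypothetical, the 10% and 20% rates are invented, and amounts are kept in integer cents to avoid floating-point rounding.

```cpp
// Interface that declares the replaceable method.
class TaxStrategy {
public:
  virtual ~TaxStrategy() {}
  virtual long taxCents(long incomeCents) const = 0;
};

// Two interchangeable strategies with made-up flat rates.
class FlatTenPercent : public TaxStrategy {
public:
  long taxCents(long incomeCents) const override { return incomeCents * 10 / 100; }
};

class FlatTwentyPercent : public TaxStrategy {
public:
  long taxCents(long incomeCents) const override { return incomeCents * 20 / 100; }
};

// Client code depends only on the interface, so strategies can be swapped freely.
long netIncomeCents(long incomeCents, const TaxStrategy& strategy) {
  return incomeCents - strategy.taxCents(incomeCents);
}
```

The caller picks the strategy at run time; netIncomeCents itself never changes when a new country's rule is added.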

1.4 Problems, Algorithms, and Programs

1.4.1 Problems

A problem is a task to be performed

It is best thought of as inputs and matching outputs

A problem definition should include constraints on the resources that may be consumed by any acceptable solution

Problems can be viewed as functions in the mathematical sense

A function is a matching between inputs (the domain) and outputs (the range)

The values making up an input are called the parameters of the function

1.4.2 Algorithms

An algorithm is a method or process followed to solve a problem

An algorithm is an implementation of the function that transforms an input into the corresponding output

A problem can have many algorithms

An algorithm must have all of the following properties:

It must be correct
It must be composed of a series of concrete steps
There can be no ambiguity as to which step will be performed next
It must be composed of a finite number of steps
It must terminate

1.4.3 Programs

A computer program is an instance, or concrete representation, of an algorithm in some programming language

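Chapter 3

3.1 Introduction

Size: the number of inputs processed

Basic operation: an operation whose time to complete does not depend on the particular values of its operands (an indivisible unit of work)

Algorithm performance: the number of basic operations required to process an input of a given size

Growth rate: the rate at which the cost of an algorithm grows as the size of its input grows

When comparing growth rates, the constant coefficients in the cost function are ignored

cn: linear growth rate

cn²: quadratic growth rate

2ⁿ: exponential growth rate

// Return the position of the largest value in A[0..n-1]
int largest(int A[], int n){
	int largest = 0;            // Position of the largest value seen so far
	for(int i = 0; i < n; i++){
		if(A[largest] < A[i]){
			largest = i;
		}
	}
	return largest;             // Return only after the whole array has been scanned
}

Time cost: T(n) = cn (examining one element takes time c, and n elements are examined)

sum = 0;
for(int i = 1; i <= n; i++){
	for(int j = 1; j <= n; j++){
		sum++;
	}
}

Time cost: T(n) = cn² (each increment takes time c, and sum++ executes n² times)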

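3.2 Best, Worst, and Average Cases

Example: sequential search for K in an array of n elements

Best case: K is found at the first position. Cost is 1 comparison

Worst case: K is found at the last position. Cost is n comparisons

Average case: (n+1)/2 comparisons, assuming K is in the array and is equally likely to be at any position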
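The three cases can be observed by counting comparisons directly. The function below is a sketch written for these notes (the name sequentialSearch and the found out-parameter are assumptions, not the book's code).

```cpp
// Search A[0..n-1] for K from left to right.
// Returns the number of key comparisons made; found reports whether K was located.
int sequentialSearch(const int A[], int n, int K, bool& found) {
  int compares = 0;
  found = false;
  for (int i = 0; i < n; i++) {
    compares++;                 // One comparison of K against A[i]
    if (A[i] == K) {
      found = true;
      break;
    }
  }
  return compares;
}
```

For the array {7, 3, 9, 5}, searching for 7 costs 1 comparison (best case) and searching for 5 costs 4 (worst case). Summed over all four keys the cost is 1 + 2 + 3 + 4 = 10, an average of 2.5 = (4+1)/2.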

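3.3 Faster Computer or Faster Algorithm

The table compares the problem size n that can be handled in a fixed amount of time on the old machine with the size n' that can be handled in the same time on a machine ten times faster

T(n)	n	n'	Change	n'/n
10n	1,000	10,000	n' = 10n	10
20n	500	5,000	n' = 10n	10
5n log n	250	1,842	√10 n < n' < 10n	7.37
2n²	70	223	n' = √10 n	3.16
2ⁿ	13	16	n' = n + 3	-----

A faster computer reduces T by a constant factor (in effect the constant c becomes smaller); it does not change how the number of basic operations grows with n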
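As a worked check of the quadratic and exponential rows, assume a budget of 10,000 basic operations on the old machine and 100,000 on the new one. The budget is inferred from the first row, where 10n = 10,000 at n = 1,000; it is not stated in these notes.

```latex
\begin{align*}
2n^2 = 10{,}000 &\;\Rightarrow\; n = \sqrt{5{,}000} \approx 70.7 \\
2(n')^2 = 100{,}000 &\;\Rightarrow\; n' = \sqrt{50{,}000} \approx 223.6 \\
\frac{n'}{n} &= \sqrt{10} \approx 3.16 \\[6pt]
2^{n} = 10{,}000 &\;\Rightarrow\; n = \log_2 10{,}000 \approx 13.3 \\
2^{n'} = 100{,}000 &\;\Rightarrow\; n' = \log_2 100{,}000 \approx 16.6 \\
n' - n &= \log_2 10 \approx 3.3
\end{align*}
```

Rounding down gives the table entries 70, 223, 13, and 16. For the exponential algorithm the tenfold speedup only adds about 3 to the solvable problem size, which is why the ratio column has no meaningful entry.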

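3.3 Asymptotic Analysis

Constant coefficients are ignored when estimating an algorithm's running time

Big-Oh (upper bound)

Definition: For T(n) a non-negatively valued function, T(n) is in the set O(f(n)) if there exist two positive constants c and n0 such that T(n) <= cf(n) for all n > n0

That is, if there are constants c and n0 such that T(n) <= cf(n) holds for every n > n0, then T(n) is in the set O(f(n))

When T is a polynomial in n, the term with the highest power of n gives the upper bound

Example 2: T(n) = c1n² + c2n in the average case.

c1n² + c2n <= c1n² + c2n² <= (c1 + c2)n² for all n > 1.

T(n) <= cn² for c = c1 + c2 and n0 = 1.

Therefore, T(n) is in O(n²) by the definition.

*This example is for an algorithm that has distinct best, average, and worst cases

Big-Omega (lower bound)

Definition: For T(n) a non-negatively valued function, T(n) is in the set $\Omega$ (g(n)) if there exist two positive constants c and n0 such that T(n) >= cg(n) for all n > n0

Example: T(n) = c1n² + c2n.

c1n² + c2n >= c1n² for all n > 1.

T(n) >= cn² for c = c1 and n0 = 1.

Therefore, T(n) is in $\Omega$ (n²) by the definition.

Big-Oh and Big-Omega often coincide: in both cases we pick the highest-order term of the polynomial in n, which gives the tightest bound

Big-Oh versus worst case

Worst case refers to inputs of the same size whose contents differ and therefore need different amounts of time; the time needed by the most unfavorable input is the worst case (only some algorithms have distinct best and worst cases)

Big-Oh describes an upper bound on the growth rate of a cost function

An algorithm is said to be $\Theta$ (h(n)) if it is in O(h(n)) and it is in $\Omega$ (h(n)).

When the upper and lower bounds are the same, use $\Theta$ directly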

Simplifying Rules

1. If f(n) is in O(g(n)) and g(n) is in O(h(n)), then f(n) is in O(h(n)). (transitivity)

2. If f(n) is in O(kg(n)) for any constant k > 0, then f(n) is in O(g(n)). (constant factors can be ignored)

3. If f1(n) is in O(g1(n)) and f2(n) is in O(g2(n)), then (f1 + f2)(n) is in O(max(g1(n), g2(n))). (sum rule)

4. If f1(n) is in O(g1(n)) and f2(n) is in O(g2(n)), then f1(n)f2(n) is in O(g1(n)g2(n)). (product rule)

Example:

sum = 0;
for (i=1; i<=n; i++)
  for (j=1; j<=i; j++)
    sum++;
for (k=0; k<n; k++)
  A[k] = k;

T(n) = 1/2 n² + 3/2 n, which is in $\Theta$ (n²)
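The closed form can be checked by counting how often the two loop bodies run. The helper below is a sketch for that purpose only (countOperations is a hypothetical name); it charges one unit per execution of sum++ and one per array assignment.

```cpp
// Count executions of the loop bodies in the fragment above for a given n.
long countOperations(int n) {
  long ops = 0;
  for (int i = 1; i <= n; i++)
    for (int j = 1; j <= i; j++)
      ops++;                    // Stands in for sum++: runs n(n+1)/2 times
  for (int k = 0; k < n; k++)
    ops++;                      // Stands in for A[k] = k: runs n times
  return ops;
}
```

For n = 10 the count is 55 + 10 = 65, which matches 1/2 · 100 + 3/2 · 10 = 65.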

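Binary search

The cost is derived from a recurrence: each recursive call of binary search halves the size of the remaining data

T(n) = T(n/2) + c, T(1) = c. Expanding the recurrence so that the intermediate terms cancel (a telescoping sum) gives T(n) = c log₂ n + c, which is in $\Theta$ (log n)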
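The logarithmic bound can be seen by counting probes. This sketch uses the iterative form of binary search rather than the recursive one the notes refer to; each loop iteration corresponds to one level of the recursion. The name binarySearch and the steps out-parameter are assumptions.

```cpp
// Search sorted array A[0..n-1] for K.
// Returns the index of K, or -1 if absent; steps reports how many probes were made.
int binarySearch(const int A[], int n, int K, int& steps) {
  int lo = 0;
  int hi = n - 1;
  steps = 0;
  while (lo <= hi) {
    steps++;
    int mid = lo + (hi - lo) / 2;   // Midpoint without risking integer overflow
    if (A[mid] == K) return mid;
    if (A[mid] < K) lo = mid + 1;   // Discard the left half
    else hi = mid - 1;              // Discard the right half
  }
  return -1;
}
```

On a sorted array of 16 elements an unsuccessful search for a key larger than every element takes 5 probes, which is log₂ 16 + 1.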

Time cost of sorting

Cost of I/O: $\Omega$ (n).
Bubble sort or insertion sort: O(n²).
A better sort (Quicksort, Mergesort, Heapsort, etc.): O(n log n).
We prove later that sorting is $\Omega$ (n log n).

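3.4 Multiple Parameters

for (i=0; i<C; i++)  // Initialize count
  count[i] = 0;
for (i=0; i<P; i++)  // Look at all pixels
  count[value(i)]++; // Increment count
sort(count);         // Sort pixel counts

A picture has P pixels, and each pixel's color value lies in the range 0 to C-1. The task is to count how many times each color value occurs in the picture and then sort the counts

The first loop runs over the C color values: $\Theta$ (C)
The second loop runs over the P pixels: $\Theta$ (P)
The sort costs $\Theta$ (C log C)

The total time cost is $\Theta$ (P + C log C); neither variable can be dropped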
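A runnable version of the fragment might look like the following sketch. The function name sortedColorCounts and the use of std::vector and std::sort are choices made here, not part of the notes; the pixel values are passed in directly instead of through the value(i) helper.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// pixels[i] is the color value of pixel i, assumed to lie in 0..C-1.
// Returns the per-color counts in ascending order.
std::vector<int> sortedColorCounts(const std::vector<int>& pixels, int C) {
  std::vector<int> count(C, 0);              // Initialize C counters: Theta(C)
  for (int v : pixels) {                     // Look at all P pixels: Theta(P)
    assert(v >= 0 && v < C);
    count[v]++;
  }
  std::sort(count.begin(), count.end());     // Sort the C counts: O(C log C)
  return count;
}
```

With pixels {0, 2, 2, 1, 2, 0} and C = 4, the raw counts are 2, 1, 3, 0 and the sorted result is 0, 1, 2, 3. If P is much larger than C the counting loop dominates, and if C is much larger than P the sort dominates, which is why both parameters appear in the bound.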

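3.5 Space Cost

Asymptotic analysis applies to space in the same way

Space cost of a one-dimensional array of n integers: if each integer takes c bytes, the array needs cn bytes, which is $\Theta$ (n)

Space/Time Tradeoff Principle

One can often reduce time if one is willing to sacrifice space, or vice versa

Disk-based Space/Time Tradeoff Principle: The smaller you make the disk storage requirements, the faster your program will run.

Disk reads are so slow that reducing the amount of data stored on disk also reduces the running time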

Chapter 4 Lists, Stacks, and Queues

4.1 Lists

A list is a finite, ordered sequence of data items (two implementations: array-based list and linked list)

Notation: <a0, a1, …, an-1>

A list implementation supports the concept of a current position

This is done by defining the list in terms of left and right partitions, which are separated by the fence

<20, 23 | 12, 15>

List ADT

The ADT is stated formally as an abstract class

template <class Elem> class List {
public:
  virtual ~List() {}
  virtual void clear() = 0;
  // Insert at the current position (front of the right partition)
  virtual bool insert(const Elem&) = 0;
  // Append to the end of the list
  virtual bool append(const Elem&) = 0;
  // Remove the first element of the right partition
  virtual bool remove(Elem&) = 0;
  // Place the fence at the start of the list
  virtual void setStart() = 0;
  // Place the fence at the end of the list
  virtual void setEnd() = 0;
  virtual void prev() = 0;
  virtual void next() = 0;
  virtual int leftLength() const = 0;
  virtual int rightLength() const = 0;
  virtual bool setPos(int pos) = 0;
  virtual bool getValue(Elem&) const = 0;
  virtual void print() const = 0;
};

List: <12 | 32, 15>

MyList.insert(99);

Result: <12 | 99, 32, 15>

// Traverse the list: put the fence at the start, fetch the value, then move to the next position (much like an iterator)
for (MyList.setStart(); MyList.getValue(it); MyList.next())
	DoSomething(it);

// Return true iff K is in list
// List Find Function
bool find(List<int>& L, int K) {
  int it;
  for (L.setStart(); L.getValue(it); L.next())
    if (K == it) return true;  // Found it
  return false;                // Not found
}

4.1.1 Array-Based List Implementation

template <class Elem> // Array-based list
class AList : public List<Elem> {        // AList inherits from the abstract class List
private:
  int maxSize;     // Maximum size of list
  int listSize;    // Actual elem count
  int fence;       // Position of fence
  Elem* listArray; // Array holding list
public:
  AList(int size=DefaultListSize) {
    maxSize = size;                    // Set the capacity
    listSize = fence = 0;
    listArray = new Elem[maxSize];     // Allocate the storage
  }
  ~AList() { delete [] listArray; }
  void clear() {
    delete [] listArray;
    listSize = fence = 0;
    listArray = new Elem[maxSize];
  }
  void setStart() { fence = 0; }
  void setEnd() { fence = listSize; }
  void prev()   { if (fence != 0) fence--; }
  void next()   { if (fence < listSize)   // The fence may not move past the end
                  fence++; }
  int leftLength() const  { return fence; }
  int rightLength() const
    { return listSize - fence; }
  bool setPos(int pos) {
    if ((pos >= 0) && (pos <= listSize))
      fence = pos;
    return (pos >= 0) && (pos <= listSize);
  }

  bool getValue(Elem& it) const {
    // it is a reference, so the caller's variable is modified directly
    if (rightLength() == 0) return false;
    else {
      it = listArray[fence];
      return true;
    }
  }
  // Defined outside the class, below (print() is not shown in these notes)
  bool insert(const Elem&);
  bool append(const Elem&);
  bool remove(Elem&);
};

The three member functions defined outside the class

// Insert at front of right partition
template <class Elem>
bool AList<Elem>::insert(const Elem& item) {
  if (listSize == maxSize) return false;    // List is full
  for(int i=listSize; i>fence; i--)
    listArray[i] = listArray[i-1];          // Shift everything after the fence one slot right
  listArray[fence] = item;                  // Store the item at the fence position
  listSize++;
  return true;
}

// Append Elem to end of the list
template <class Elem>
bool AList<Elem>::append(const Elem& item) {
  if (listSize == maxSize)
      return false;
  listArray[listSize++] = item;			// Store item in the slot just past the last element
  return true;
}

// Remove and return first Elem in right
// partition
template <class Elem> bool AList<Elem>::remove(Elem& it) {
  if (rightLength() == 0)
      return false;
  it = listArray[fence]; // Copy Elem
  for(int i=fence; i<listSize-1; i++)
    listArray[i] = listArray[i+1];		// Shift everything right of the fence one slot left
  listSize--;    // Decrement size
  return true;
}

4.1.2 Linked List

// Singly-linked list node (definition of the node class)
template <class Elem> class Link {
public:
  Elem element; // Value for this node
  Link *next;   // Pointer to next node
  Link(const Elem& elemval,Link* nextval =NULL){
     element = elemval;
     next = nextval;
  }
  Link(Link* nextval =NULL){
      next = nextval; 			// Used for the header node, which stores no data
  }
};

// Linked list implementation
// Unlike the array-based list, no maximum size is needed
template <class Elem> class LList:
public List<Elem> {
private:
  Link<Elem>* head; // Point to list header
  Link<Elem>* tail; // Pointer to last Elem
  Link<Elem>* fence;// Last element on left
  int leftcnt;      // Size of left
  int rightcnt;     // Size of right
  void init() {     // Initialization routine
    fence = tail = head = new Link<Elem>;
    leftcnt = rightcnt = 0;
  }
public:
  LList() { init(); }   // Start with an empty list that holds only the header node
  void setStart() {
    fence = head;
    rightcnt += leftcnt;
    leftcnt = 0;
  }
  void setEnd() {
    fence = tail; leftcnt += rightcnt;
    rightcnt = 0; }
  void next() {
    // Don't move fence if right empty
    if (fence != tail) {
      fence = fence->next; rightcnt--;
      leftcnt++; }
    // fence is itself a Link object, so it holds the address of the next node
  }
  int leftLength() const  { return leftcnt; }
  int rightLength() const { return rightcnt; }
  bool getValue(Elem& it) const {
    if(rightLength() == 0) return false;
    it = fence->next->element;
    return true; }
  // Defined outside the class, below
  // (the destructor, clear(), setPos(), and print() are not shown in these notes)
  bool insert(const Elem&);
  bool append(const Elem&);
  bool remove(Elem&);
  void prev();
};

// Insert at front of right partition
template <class Elem>
bool LList<Elem>::insert(const Elem& item) {
  fence->next = new Link<Elem>(item, fence->next);
  // The first Link constructor sets the new node's next pointer directly
  if (tail == fence)
      tail = fence->next;
  rightcnt++;
  return true;
}
// Append Elem to end of the list
template <class Elem>
bool LList<Elem>::append(const Elem& item) {
  tail = tail->next = new Link<Elem>(item, NULL);
  rightcnt++;
  return true;
}

// Remove and return first Elem in right partition
// This removes the node at fence->next, not the node the fence points to
template <class Elem> bool LList<Elem>::remove(Elem& it) {
  if (fence->next == NULL)
      return false;
  it = fence->next->element; // Remember val
  // Remember link node
  Link<Elem>* ltemp = fence->next;
  fence->next = ltemp->next; // Remove
  if (tail == ltemp)         // If the last node is removed, tail must be reset
    tail = fence;
  delete ltemp;              // Reclaim space
  rightcnt--;
  return true;
}

// Move fence one step left;
// no change if left is empty
template <class Elem> void
LList<Elem>::prev() {
  Link<Elem>* temp = head;
  if (fence == head) return; // No prev Elem
  while (temp->next!=fence)
    temp=temp->next;
  fence = temp;
  leftcnt--;
  rightcnt++;
}

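Array-Based Lists:

Insertion and deletion are expensive

Insertion and deletion are $\Theta$ (n), because every element after the position must be shifted by one slot
Prev and direct access are $\Theta$ (1).
Array must be allocated in advance.
No overhead if all array positions are full. (Overhead here means structural overhead, such as the next pointer stored with every linked-list element)

Linked Lists:

Access by position is expensive

Insertion and deletion are $\Theta$ (1)

Prev and direct access are $\Theta$ (n): each node only has a pointer to its successor, so moving backward requires a traversal from the head

Space grows with number of elements

Every element requires overhead

When the list holds far fewer elements than the array's capacity, the linked list is more space-efficient, because the array allocates space that goes unused

Space comparison:

"Break-even" point (the number of elements at which the two implementations use the same space; with fewer elements the array-based list uses more space than the linked list):

n = DE / (P + E)

E: Space for data value.

P: Space for pointer.

D: Maximum number of elements in array.

n is the number of linked-list nodes that fit in the space occupied by an array with room for D elements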
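The break-even formula can be checked numerically. The helpers below are a sketch with hypothetical names; the sizes are in arbitrary units such as bytes, and the example values D = 100, E = 4, P = 4 are made up.

```cpp
// Space used by an array-based list with capacity D: always D*E, however full it is.
long arraySpace(long D, long E) { return D * E; }

// Space used by a linked list currently holding n elements: n*(P + E).
long linkedSpace(long n, long P, long E) { return n * (P + E); }

// Break-even point n = DE / (P + E).
double breakEven(long D, long P, long E) {
  return static_cast<double>(D * E) / static_cast<double>(P + E);
}
```

With D = 100, E = 4, and P = 4 the break-even point is 400 / 8 = 50. A linked list with 49 elements uses 392 units, less than the array's 400, while one with 51 elements uses 408, more than the array.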

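4.1.3 Freelist

A freelist holds nodes that have been deleted so that later calls to new can reuse them instead of requesting fresh memory; the Link class here overloads new and delete to do this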

// Singly-linked list node with freelist
template <class Elem> class Link {
private:
  static Link<Elem>* freelist; // Head
public:
  Elem element;     // Value for this node
  Link* next;       // Point to next node  
  Link(const Elem& elemval,Link* nextval =NULL) { 
      element = elemval;  
      next = nextval; 
  }
  Link(Link* nextval =NULL) {next=nextval;}
  void* operator new(size_t);  // Overload
  void operator delete(void*); // Overload
};

template <class Elem>
Link<Elem>* Link<Elem>::freelist = NULL;

template <class Elem>   // Overload for new
void* Link<Elem>::operator new(size_t) {
  if (freelist == NULL) return ::new Link;
  Link<Elem>* temp = freelist; // Reuse
  freelist = freelist->next;
  return temp;         // Return the link
}

template <class Elem>   // Overload delete
void Link<Elem>::operator delete(void* ptr){
  ((Link<Elem>*)ptr)->next = freelist;
  freelist = (Link<Elem>*)ptr;
}

4.1.3 Doubly Linked Lists

// Doubly-linked list link node
template <class Elem> class Link {
public:
  Elem element;  // Value for this node
  Link *next;    // Pointer to next node 
  Link *prev;    // Pointer to previous node
  Link(const Elem& e, Link* prevp =NULL, 
                      Link* nextp =NULL)
    { element=e;  prev=prevp;  next=nextp; }
  Link(Link* prevp =NULL, Link* nextp =NULL)
    { prev = prevp;  next = nextp; }
};

// Insert at front of right partition
template <class Elem>
bool LList<Elem>::insert(const Elem& item) {
  fence->next =
   new Link<Elem>(item, fence, fence->next);  
  if (fence->next->next != NULL)	// After the insert, the following node's prev must point to the new node
    fence->next->next->prev = fence->next;
  if (tail == fence)   // Appending new Elem
    tail = fence->next; //   so set tail
  rightcnt++;           // Added to right
  return true;
}

// Remove, return first Elem in right part
template <class Elem>
bool LList<Elem>::remove(Elem& it) {
  if (fence->next == NULL) return false;
  it = fence->next->element;
  Link<Elem>* ltemp = fence->next;
  if (ltemp->next != NULL)
    ltemp->next->prev = fence;
  else tail = fence;         // Reset tail
  fence->next = ltemp->next; // Remove
  delete ltemp;              // Reclaim space
  rightcnt--;                // Removed from right
  return true;
}

4.1.4 Dictionary

We often want to insert records, delete records, and search for records.

Required concepts:

Search key: describes what we are looking for
Key comparison
Equality: sequential search
Relative order: sorting
Record comparison

4.1.5 Stacks

LIFO: Last In, First Out. (the last element inserted is the first one removed)

Restricted form of list: Insert and remove only at front of list.

Insert: PUSH

Remove: POP

The accessible element is called TOP.

Stack ADT

// Stack abstract class
template <class Elem> class Stack {
public:
  // Reinitialize the stack
  virtual void clear() = 0;
  // Push an element onto the top of the stack.
  virtual bool push(const Elem&) = 0;
  // Remove the element at the top of the stack. 
  virtual bool pop(Elem&) = 0;
  // Get a copy of the top element in the stack
  virtual bool topValue(Elem&) const = 0;
  // Return the number of elements in the stack.
  virtual int length() const = 0;
};

Array-Based Stack

// Array-based stack implementation
template <typename E> class AStack {
private:
	int maxSize;  // Maximum size of stack
	int top;      // Index of the first free slot; the top of the stack is at the high end of the array
	E *listArray; // Array holding elements
public:
	AStack(int size = defaultSize){
        maxSize = size;
        top = 0;
        listArray = new E[size];
    }
    ~AStack() { delete [] listArray; }
    void push(const E& it){
        Assert(top != maxSize, "stack is full"); // Assert aborts with the message if the condition is false (similar to a JUnit assertion)
        listArray[top] = it;
        top++;
    }
    E pop(){
        Assert(top != 0, "stack is empty");
        top--;
        return listArray[top];
    }
    const E& topValue() const {
        Assert(top != 0, "stack is empty");
        return listArray[top-1];
    }
};

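Linked Stack

// Linked stack implementation
// pop() and topValue() return values directly, so the signatures differ from
// the bool-returning Stack ADT above and the class is written standalone here
template<typename E> class LStack {
private:
	Link<E>* top;    // Unlike the array-based stack, the head of the linked list is the top of the stack, so no index is needed
	int size;        // Count number of elems
public:
	LStack(int sz = defaultSize){   // sz is unused: a linked stack needs no capacity
        top = NULL;
        size = 0;
    }
    ~LStack() { clear(); }
    void clear(){
        while(top!=NULL) {
            Link<E>* temp = top;
            top = top->next;
            delete temp;
        }
        size = 0;
    }
    void push(const E& it){
        top = new Link<E>(it,top);   // Unlike the list's append, the new node goes at the front
        size++;
    }
    E pop(){
        Assert(top != NULL, "stack is empty");
        E it = top->element;
        Link<E>* ltemp = top->next;  // Remember the rest of the stack
        delete top;
        top = ltemp;
        size--;
        return it;
    }
    const E& topValue() const {
        Assert(top != NULL, "Stack is empty");
        return top->element;
    }
};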

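Example:

Converting a decimal number N to base d:

N = (N div d) × d + N mod d

N	N div 8	N mod 8
1348	168	4
168	21	0
21	2	5
2	0	2

Reading the remainders from bottom to top gives 1348 in decimal = 2504 in octal

#include <iostream>
#include <stack>

// Uses std::stack in place of the notes' InitStack/Push/Pop/StackEmpty pseudocode
void conversion() {
	std::stack<int> s;          // Create an empty stack
	int N;
	std::cin >> N;              // Read a decimal number
	while (N) {
		s.push(N % 8);          // Push the remainder; for 1348 the push order is 4 -> 0 -> 5 -> 2
		N = N / 8;              // Continue with the nonzero quotient
	}
	while (!s.empty()) {
		// Output the octal digits in the reverse of the order the remainders were produced
		std::cout << s.top();
		s.pop();
	}
}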

Queues (a queue has two access points, one for insertion and one for removal, whereas a stack has only one)

FIFO: First in, First Out

Restricted form of list: Insert at one end, remove from the other.

Notation:

Insert: Enqueue

Delete: Dequeue

First element: Front

Last element: Rear
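The notes stop at the notation. As an illustration only, one common way to realize enqueue and dequeue is a circular array; the sketch below assumes that approach, and the class name AQueue and its bool-returning interface are choices made here rather than anything taken from the notes.

```cpp
// Array-based circular queue sketch.
template <typename E> class AQueue {
private:
  int maxSize;    // Capacity of the array
  int front;      // Index of the front element
  int count;      // Number of elements currently stored
  E* listArray;   // Array holding the elements
public:
  explicit AQueue(int size)
    : maxSize(size), front(0), count(0), listArray(new E[size]) {}
  ~AQueue() { delete [] listArray; }
  AQueue(const AQueue&) = delete;             // Copying is not supported in this sketch
  AQueue& operator=(const AQueue&) = delete;

  // Insert at the rear; returns false if the queue is full.
  bool enqueue(const E& it) {
    if (count == maxSize) return false;
    listArray[(front + count) % maxSize] = it;  // Rear position wraps around
    count++;
    return true;
  }
  // Remove from the front; returns false if the queue is empty.
  bool dequeue(E& it) {
    if (count == 0) return false;
    it = listArray[front];
    front = (front + 1) % maxSize;              // Front position wraps around
    count--;
    return true;
  }
  int length() const { return count; }
};
```

Both operations are constant time because no elements are shifted; only the front index and the count change.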

Chapter 5 Binary Trees

A binary tree is made up of a finite set of nodes that is either empty or consists of a node called the root together with two binary trees, called the left and right subtrees, which are disjoint from each other and from the root

Notation: node, children (B is a child of A), edge, parent (A is the parent of B), ancestor, descendant (D is a descendant of A), path, depth, height, level, leaf node (a node with no children), internal node, subtree.

Full binary tree: each node is either a leaf or an internal node with exactly two non-empty children.

Complete binary tree: every level except possibly the bottom one is completely full, and the bottom level has all of its nodes filled in from the left side.

Full Binary Tree Theorem

Theorem: The number of leaves in a non-empty full binary tree is one more than the number of internal nodes.

Properties (levels and depth are counted from 1):

Level i (i ≥ 1) of a binary tree has at most 2^(i-1) nodes
A binary tree of depth k (k ≥ 1) has at most 2^k - 1 nodes
For any binary tree with n₀ leaf nodes and n₂ nodes of degree 2, n₀ = n₂ + 1
A complete binary tree with n nodes has depth ⌊log₂ n⌋ + 1
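The relation n₀ = n₂ + 1 can be checked on a concrete tree by counting nodes by degree. The sketch below uses a hypothetical BinNode struct and countDegrees function written for this purpose.

```cpp
// Minimal binary tree node; only the shape of the tree matters here.
struct BinNode {
  BinNode* left;
  BinNode* right;
};

// Add to n0 the number of leaves and to n2 the number of nodes with two children.
void countDegrees(const BinNode* node, int& n0, int& n2) {
  if (node == nullptr) return;
  if (node->left == nullptr && node->right == nullptr)
    n0++;                                   // Degree 0: a leaf
  else if (node->left != nullptr && node->right != nullptr)
    n2++;                                   // Degree 2
  countDegrees(node->left, n0, n2);
  countDegrees(node->right, n0, n2);
}
```

For a root with children a and b, where a has a single left child c and both b and c are leaves, the counts are n₀ = 2 and n₂ = 1. The node a has degree 1 and does not affect the relation.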
