Reading Notes on Programming Pearls

 Between July 3 and July 17, 2008, I carefully read the second edition of Programming Pearls, which I had bought long ago, and I got quite a bit out of it. The book gives clever solutions to a number of problems and, around those problems, explains several principles of program design in a clear way. Its weakness is that, compared with The Art of Computer Programming or Introduction to Algorithms, it is not very difficult overall; still, it is well worth reading. In this article I record some of the principles, problems and solution ideas from the book for future reference. Since I read the English edition, the rest of the article is written mainly in English.

 

Reading Notes on Programming Pearls, Second Edition

This book illustrates a central theme of software development and programming:

"Thinking hard about a case can be fun and can also lead to practical benefits."

 

There are 15 columns in this book, covering problem definition, algorithm and data structure design, code tuning and testing. The last several columns illustrate three important and practical topics in algorithms and data structures: sorting, heaps and strings.

 

The First Column

In the first column, the author examines a problem: how to sort a disk file whose records are telephone numbers in the US. A US telephone number has the format dddddddddd, consisting of a three-digit "area code" followed by seven digits. If the first three digits are 800, the number is a toll-free number. The problem is to sort these toll-free numbers and integrate the sort with a system for processing such a database. The input is a list of telephone numbers and the output is a file of these numbers sorted in increasing order. The context also defines the performance requirement: the sort should not take more than a few minutes.

 

Before giving a solution, the book gives a clear definition of the problem, which is very important for understanding it and finding solutions. The author lists the input, output and constraints of the problem and leaves the process section blank, so that we can write our own thoughts (whether correct or not) in that section. The problem definition can be arranged as follows:

 

Problem Def:

(1) Input: a file of n positive integers, each less than n, where n = 10^7.

(2) Output: a sorted file in increasing order of the input integers

(3) Constraint: at most 1MB of main memory; run time within a few minutes

(4) Process

 

Then how can we solve the problem? The obvious approach is a general disk-based merge sort, which requires many intermediate files. However, such a program might take a long time to run, and building it might take a long time as well. A second solution makes more use of the fact that we are sorting integers: calculate the maximum number of integers we can hold in the available memory (here 1MB, so about 250,000 32-bit integers), and build a program that makes 40 passes over the input file. On the first pass it reads into memory every integer between 0 and 249,999, sorts them and writes them to the output file; on the second pass it does the same for the integers from 250,000 to 499,999, and so on. A quicksort is efficient for the in-memory sort. This solution eliminates the intermediate disk files of the disk-based merge sort, but it reads the input file 40 times, which is costly.

 

The author then gives a third solution: use a bitmap (bit vector) to represent the integers. In this problem, each seven-digit number is an integer less than ten million, so we can represent the set of integers by a string of ten million bits in which the i-th bit is on if and only if i is in the set. Given the bitmap structure, the program can be written in the following three phases:

/*phase 1: empty the set*/

   for i =[0,n)

       bit[i] = 0

/*phase 2: insert present element into the set*/

   for each i in the input file

       bit[i] = 1

/*phase 3: write sorted output*/

   for i = [0,n)

        if bit[i] = 1

           write i on the output file

An implementation of the bitmap using a 32-bit integer array can be programmed as follows:

#define BITSPERWORD 32
#define SHIFT 5
#define MASK 0x1F
#define N 10000000

int a[1 + N/BITSPERWORD];

void set(int i)
{ a[i>>SHIFT] |=  (1<<(i & MASK)); }

void clr(int i)
{ a[i>>SHIFT] &= ~(1<<(i & MASK)); }

int test(int i)
{ return a[i>>SHIFT] & (1<<(i & MASK)); }
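To make the functions above directly usable, a minimal driver in the spirit of the book's three-phase pseudocode (my own sketch, reading the integers from standard input) could look like this:

#include <stdio.h>

int main(void)
{
    int i;
    /* phase 1: empty the set */
    for (i = 0; i < N; i++)
        clr(i);
    /* phase 2: insert each input integer into the set */
    while (scanf("%d", &i) == 1)
        set(i);
    /* phase 3: write the sorted output */
    for (i = 0; i < N; i++)
        if (test(i))
            printf("%d\n", i);
    return 0;
}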

 

What could we learn from this case? 

(1) Careful analysis of the problem

(2) Data Structure

(3) Time-Space Tradeoff

(4) A Simple Design

                                                            

Second Column

This column presents three problems and how to solve them, covering binary search and sorting. Let us look at them one by one.

(1) Given a sequential file that contains at most four billion 32-bit integers in random order, find an integer that does not exist in the file.

      If memory were unlimited, how would you solve it? If there are only a few hundred bytes of memory available and several external scratch files can be used, how would you solve it?

For the first part of this question, it is natural to apply the bitmap structure. For the second part, the author gives a method based on binary search, which is usually applied to locate an element in a sorted array. To solve this problem, we keep shrinking a range that is known to contain a missing element until the range's size becomes 1. (We need a range, a representation of the elements within the range, and a probing method to determine which half of the range contains a missing element.) We first define the range to cover the whole space of 32-bit integers. The insight is to probe a range by counting the elements above and below its midpoint: either the upper or the lower half contains at most half of the elements in the total range. Because the total range is missing an element, the half with fewer elements than it has slots must also be missing an element. The algorithm can be written as follows:

range = [0, 2^32 - 1]

 

while(size of range > 1)

{

     midpoint = (range.left  +  range.right) / 2;

     lowerCnt = upperCnt = 0;

     lowerFile: scratch file that holds integers not greater than midpoint

     upperFile: scratch file that holds integers greater than midpoint

     foreach integer i in the range

           if (i <= midpoint)

               lowerCnt ++;

           else

              upperCnt++;

           put i in corresponding scratch file

     if(lowerCnt < upperCnt)

           range = [range.left, midpoint]

     else

           range = [midpoint+1, range.right]

}
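As a concrete illustration of the idea, here is a small in-memory sketch of my own (std::vector stands in for the two scratch files, and each half's count is compared against the number of values that half can hold, which amounts to the same probe):

#include <cstdint>
#include <vector>

/* Find a 32-bit value that does not occur in v.
   Invariant: v holds exactly the input values that lie in [lo, hi),
   and v.size() < hi - lo, so some value in [lo, hi) is missing. */
uint32_t find_missing(std::vector<uint32_t> v)
{
    uint64_t lo = 0, hi = 1ULL << 32;
    while (hi - lo > 1)
    {
        uint64_t mid = lo + (hi - lo) / 2;
        std::vector<uint32_t> lower, upper;     /* the "scratch files" */
        for (uint32_t x : v)
            (x < mid ? lower : upper).push_back(x);
        if (lower.size() < mid - lo)            /* lower half has an empty slot */
        {
            v = std::move(lower);
            hi = mid;
        }
        else                                    /* then the upper half must have one */
        {
            v = std::move(upper);
            lo = mid;
        }
    }
    return static_cast<uint32_t>(lo);
}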

 

(2) Rotate a one-dimensional vector of n elements left by i positions

This problem corresponds to the problem of swapping adjacent memory blocks of unequal size. The time and space constraints are important: O(n) time and O(1) extra space. The author proposes three methods, of which the third is the best. The first method is a juggling act: move x[0] to a temporary variable, then move x[i] to x[0], x[2i] to x[i], and so on, until we come back to taking an element from x[0], at which point we instead take the element from the temporary variable and stop the process.

 

i = i % n;

for start = [0, gcd(i, n))
{
    t = x[start];
    j = start;
    loop:  k = j + i;
               if (k >= n) k -= n;
               if (k == start) break;
               x[j] = x[k];
               j = k;
    x[j] = t;
}

 

gcd(i,n) calculates the greatest common divisor of i and n.

 

The second proposed solution treats the problem as swapping two adjacent memory blocks of unequal size. Let a be the block before position i and b be the block after it. If a is shorter than b, swap a with the ending segment of b that has the same length as a; otherwise swap b with the starting segment of a that has the same length as b. Then apply the same technique to the two pieces that remain out of place, until the two blocks have equal length, at which point swap them directly. This procedure sounds complex, but the author gives a concise way to code it. Assume p is the position separating the two blocks to be swapped, x[0..p-i) is in final position, x[p-i..p-1] = a, x[p..p+j-1] = b, and x[p+j..n-1] is in final position; the length of a is i and the length of b is j. Our task is to swap a and b so that all of x[0..n-1] is in final position. If i > j, we swap x[p-i..p-i+j-1] (the first j elements of a) with x[p..p+j-1] (all of b), which puts b into its final position, and the problem shrinks to blocks of length i-j and j. If i < j, we swap x[p-i..p-1] (all of a) with x[p+j-i..p+j-1] (the last i elements of b), which puts a into its final position, and the problem shrinks to blocks of length i and j-i. Continue this process until i == j, at which point we directly swap x[p-i..p-1] with x[p..p+i-1].

 

rodist = rodist % n;
if (rodist == 0)
    return;
i = p = rodist;
j = n - rodist;
while (i != j)
{
     /*invariant:
        x[0..p-i) is in final position
        x[p-i..p-1] = a
        x[p..p+j-1] = b
        x[p+j..n-1] is in final position
     */
     if (i > j)
     {
         swap(x, p-i, p, j);
         i -= j;
     }
     else
     {
         swap(x, p-i, p+j-i, i);
         j -= i;
     }
}

swap(x, p-i, p, i);

 

The third method is the most elegant. Our task is to transform ab into ba. First reverse a, so that ab becomes reverse(a)b. Then reverse b, giving reverse(a)reverse(b). Finally reverse the whole sequence, which produces reverse(reverse(a)reverse(b)) = ba.

 

The code is very simple:

reverse(x,0,i-1)

reverse(x,i,n-1)

reverse(x,0,n-1)

 

reverse(x,i,j) reverses the segment of x that starts at index i and ends at index j.

 

I implemented the above three techniques; my code is pasted below:

/*
 * rotate_1() and rotate_2() are implementations of the first method.
 * rotate_1() is my own implementation; rotate_2() is a copy from the book.
 */

void rotate_1(int x[], int size, int rodist)
{
    int unmoved = size;
    int i = 0;
    rodist %= size;
    while (unmoved > 0)
    {
        int t = x[i];
        int j = i;
        do
        {
            int k = j + rodist;
            if (k >= size) k -= size;
            if (k == i) break;
            x[j] = x[k];
            j = k;
            unmoved--;
        } while (true);
        x[j] = t;
        unmoved--;
        i++;
    }
}

int gcd(int a, int b)
{
    while (a != b)
    {
        if (a > b)
            a -= b;
        else
            b -= a;
    }
    return a;
}

void rotate_2(int x[], int size, int rodist)
{
    rodist %= size;
    if (rodist == 0)        /* guard: gcd(size, 0) below would not terminate */
        return;
    int cnt = gcd(size, rodist);
    for (int i = 0; i < cnt; i++)
    {
        int t = x[i];
        int j = i;
        do
        {
            int k = j + rodist;
            if (k >= size) k -= size;
            if (k == i) break;
            x[j] = x[k];
            j = k;
        } while (true);
        x[j] = t;
    }
}

/*
 * rotate_3() and rotate_4() are implementations of the second method.
 * rotate_3() is my own implementation; rotate_4() is a copy from the book.
 */
void swap(int x[], int p, int q, int l)
{
    while (l > 0)
    {
        int temp = x[p+l-1];
        x[p+l-1] = x[q+l-1];
        x[q+l-1] = temp;
        l--;
    }
}

void rotate_3(int x[], int size, int rodist)
{
    rodist %= size;
    if (rodist == 0)
        return;
    int dist = rodist;
    while (dist != size - dist)
    {
        /* blocks: x[0..dist-1] and x[dist..size-1] */
        if (dist < size - dist)
        {
            swap(x, 0, size-dist, dist);
            size -= dist;
        }
        else
        {
            swap(x, dist, 2*dist-size, size-dist);
            int swDist = size - dist;
            size -= swDist;
            dist -= swDist;
        }
    }
    swap(x, 0, dist, dist);
}

void rotate_4(int x[], int size, int rodist)
{
 rodist %= size;

 if(rodist == 0)
  return;

 

 int p = rodist;
 int i = p;
 int j = size - rodist;

 

 while(i != j)
 {
  if(i<j)
  {
   swap(x,p-i,p+j-i,i);
   j =  j - i;
  }
  else
  {
   swap(x,p-i,p,j);
   i = i-j;
  }
 }

 

 swap(x,p-i,p,i);
}

 

/*
 * rotate_5() is the implementation of the third method.
 * rotate_6() applies the same technique to a similar problem.
 */

void reverse(int x[],int l,int u)
{
 while(l<u)
 {
  int temp = x[l];
  x[l] = x[u];
  x[u] = temp;
  l++;
  u--;
 }
}

 

void rotate_5(int x[], int size, int rodist)
{
 rodist %= size;

 

 if(rodist == 0)
  return;

 

 reverse(x,0,rodist-1);
 reverse(x,rodist,size-1);
 reverse(x,0,size-1);
}

 

/*
 * abc -> cba
 */

void rotate_6(int x[], int size, int rodist_0, int rodist_1)
{
    rodist_0 %= size;
    rodist_1 %= size;
    if (rodist_0 > rodist_1)
        return;
    /* a = x[0..rodist_0-1], b = x[rodist_0..rodist_1-1], c = x[rodist_1..size-1] */
    reverse(x, 0, rodist_0 - 1);
    reverse(x, rodist_0, rodist_1 - 1);
    reverse(x, rodist_1, size - 1);
    reverse(x, 0, size - 1);    /* rev(rev(a)rev(b)rev(c)) = cba */
}

 

The third problem is about grouping anagrams. Given a set of English words, find all sets of anagrams. For instance, "pots", "stop" and "tops" are all anagrams of one another because each can be formed by permuting the letters of the others. The solution is to first sort the letters of each word to obtain its signature (the signature is the word with its letters reordered alphabetically), and then sort the whole set of words by signature so that words with the same signature become adjacent. Words that have the same signature are anagrams. The key to the problem is defining the signature of each word.
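The book builds this out of small programs piped together; an equivalent self-contained C++ sketch of my own (grouping words by their sorted-letter signature in a map) could look like this:

#include <algorithm>
#include <iostream>
#include <map>
#include <string>
#include <vector>

int main()
{
    std::map<std::string, std::vector<std::string> > groups;
    std::string word;
    while (std::cin >> word)
    {
        std::string sig = word;               /* the signature is the word   */
        std::sort(sig.begin(), sig.end());    /* with its letters sorted     */
        groups[sig].push_back(word);
    }
    /* words that share a signature are anagrams of one another */
    for (const auto &g : groups)
    {
        for (const auto &w : g.second)
            std::cout << w << ' ';
        std::cout << '\n';
    }
    return 0;
}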

 

 

Third Column

This column is mainly about restructuring code through the use of data structures. Remember the following useful principles:

(1) Rework repeated code into arrays (see the small sketch below)

(2) Use advanced tools when possible

(3) Let the data structure the program
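As a small illustration of principle (1), in the spirit of the book's tax-table discussion (my own example; the bracket figures are made up):

#include <cstdio>

/* Instead of a long cascade of if statements, one per tax bracket,
   store the brackets in an array and let a loop do the work.
   The figures below are hypothetical, for illustration only. */
struct Bracket { double lower; double base; double rate; };

const Bracket brackets[] = {
    {     0.0,    0.0, 0.00 },
    {  2200.0,    0.0, 0.14 },
    {  2700.0,   70.0, 0.15 },
    {  3200.0,  145.0, 0.16 },
};
const int nbrackets = sizeof(brackets) / sizeof(brackets[0]);

double tax(double income)
{
    int i = nbrackets - 1;
    while (i > 0 && income < brackets[i].lower)   /* find the bracket */
        i--;
    return brackets[i].base + brackets[i].rate * (income - brackets[i].lower);
}

int main()
{
    std::printf("%.2f\n", tax(3000.0));   /* falls in the 2700 bracket */
    return 0;
}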

 

Fourth Column

This column is mainly about writing correct programs, using binary search and its implementation as the running example. The key idea is to use loop invariants to guarantee the correctness of a program. The author first gives a sketch of the program:

{

initialize range to 0..n-1

loop

{invariant:mustbe(range)}

if range is empty

   break and report that t is not in the array

compute m, the middle of the range

use m as a probe to shrink the range

    if t is found during the shrinking process

    break and report its position

}

 

To represent the range, use l and u (lower and upper bound). The loop invariant is the crucial part, and we must ensure it is true at the beginning and the end of each iteration of the loop. Initializing l to 0 and u to n-1 makes the invariant mustbe(0..n-1) true. The next step is to check for an empty range and to compute m. The range l..u is empty if l > u, in which case we store -1 in p and terminate the loop, which gives

        if (l > u) { p = -1; break; }

Compute m: m = (l+u)/2

The final task is to compare t and x[m] and take appropriate action to maintain the invariant. This could be done in the following form:

        case
             x[m] <  t:  l = m+1             /* x[0] <= x[1] <= ... <= x[m] < t */
             x[m] == t:  p = m; break
             x[m] >  t:  u = m-1             /* t < x[m] <= x[m+1] <= ... <= x[n-1] */

Combining these individual pieces of analysis produces the final code.
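Putting the pieces together, the assembled function could look like the following sketch (my own assembly of the fragments above, using l + (u-l)/2 for the midpoint to avoid overflow):

/* return the position of t in the sorted array x[0..n-1], or -1 if absent */
int binarysearch(int x[], int n, int t)
{
    int l = 0, u = n - 1;
    int p = -1;
    while (l <= u)                    /* invariant: mustbe(l..u) */
    {
        int m = l + (u - l) / 2;      /* middle of the range */
        if (x[m] < t)
            l = m + 1;
        else if (x[m] == t)
        {
            p = m;
            break;
        }
        else
            u = m - 1;
    }
    return p;
}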

 

The next four columns discuss aspects of performance. The 6th column discusses the several levels at which performance can be improved. The 7th column describes a method of estimation called the "back of the envelope" calculation. The 8th column describes a small problem and gives a clear illustration of several solutions. The 9th column talks about code tuning. The 10th column describes a problem and discusses squeezing space. The notes below focus on the 8th, 9th and 10th columns.

 

How do we speed up a program? An important message is that a huge speedup can be achieved by working at different levels:

(1) Problem Definition

(2) System Structure

(3) Algorithms and Data Structures

(4) Code Tuning

If a little speedup is needed, work at the most promising level.

If a big speedup is needed, work at many levels.

 

Column 7 is skipped in these notes.

 

Column 8

This column studies four different algorithms for a small problem. The input is a vector x of n floating-point numbers; the output is the maximum sum found in any contiguous subvector of the input. When all inputs are negative, the maximum-sum subvector is the empty vector, which has sum zero.

 

Algorithm 1

maxsofar = 0

for i=[0,n)

    for j =[i,n)

         sum = 0

         for k = [i,j]

              sum += x[k]

         maxsofar = max(maxsofar,sum)

 

Time complexity: O(n^3)

 

Algorithm 2:

This algorithm modifies Algorithm 1 and computes each sum quickly by noting that the sum of x[i..j] is simply the previously computed sum of x[i..j-1] plus x[j].

maxsofar = 0

for i=[0,n)

    sum = 0

    for j =[i,n)

         sum += x[j]

         maxsofar = max(maxsofar,sum)

 

Time complexity: O(n^2)

 

An alternative algorithm with the same time complexity computes the sum in the inner loop by accessing a data structure built before the outer loop is executed. The i-th element of cumarr contains the cumulative sum of the values in x[0..i], so the sum of the values in x[i..j] can be found by computing cumarr[j] - cumarr[i-1].

cumarr[-1] = 0

for i=[0,n)

     cumarr[i] = cumarr[i-1] + x[i]

maxsofar = 0

for i=[0,n)

     for j =[i,n)

          sum = cumarr[j] - cumarr[i-1]

          maxsofar = max(maxsofar,sum)

 

Algorithm 3

Use divide-and-conquer

float maxsum(l,u)

{

    if (l>u)

       return 0

    if (l==u)

       return max(0,x[l])

 

    m = (l+u)/2

 

    lmax=sum=0

    for (i = m; i >= l; i--)

         sum+=x[i]

         lmax=max(lmax,sum)

 

     rmax=sum=0

     for i=(m,u]

         sum+=x[i]

         rmax=max(rmax,sum)

 

      return max(lmax+rmax,maxsum(l,m),maxsum(m+1,u))

}

 

Time complexity: O(nlogn)

 

Algorithm 4

This algorithm starts at the left end and scans through to the right end, keeping track of the maximum-sum subvector seen so far. Suppose we have solved the problem for x[0..i-1]; how do we extend the solution to include x[i]? The maximum-sum subvector in the first i elements either lies entirely in the first i-1 elements or ends at position i. Use maxendinghere to store the maximum sum of a subvector ending at position i.

 

maxsofar = 0

maxendinghere = 0

for i=[0,n)

      maxendinghere = max(maxendinghere+x[i],0)

      maxsofar = max(maxsofar,maxendinghere)

 

Time complexity: O(n)

 

The algorithms above illustrate important algorithm design techniques:

(1) Save state to avoid recomputation

(2) Preprocess information into data structures

(3) Scanning algorithms: extend a solution for x[0..i-1] to a solution for x[0..i]

(4) Cumulatives: store precomputed partial sums so that range sums can be read off quickly

 

Exercises in this column include two interesting, similar problems.

 

(1) Find the subvector whose sum is closest to zero

The scanning algorithm above cannot be applied here, since a solution for x[0..i-1] cannot simply be extended to x[0..i]. Instead we can use a cumulative array that stores the sum of x[0..i] in cumarr[i]. The sum of x[i..j] is cumarr[j] - cumarr[i-1], so the task becomes finding two elements cumarr[i] and cumarr[j] whose difference has the smallest absolute value. This can be done by a divide-and-conquer technique: view the elements of cumarr as points on a line, and the problem becomes locating the two nearest points.

 

#include <cstdlib>    /* for qsort */

struct CumArrElem
{
    float val;
    int pos;
};

static int comp(const void *a, const void *b)
{
    CumArrElem *p1 = ((CumArrElem *)a);
    CumArrElem *p2 = ((CumArrElem *)b);
    if(p1->val > p2->val)
        return 1;
    if(p1->val < p2->val)
        return -1;
    return 0;
}

float getNearestDistance(CumArrElem L[], int l, int u, int &start, int &end)
{
    if(l >= u)
    {
        start = l;
        end = u;
        return -1;
    }
    int m = (l+u)/2;
    int start_l, end_l, start_r, end_r;
    float leftDist = getNearestDistance(L, l, m, start_l, end_l);
    float rightDist = getNearestDistance(L, m+1, u, start_r, end_r);
    float curMin;
    if(rightDist < 0 || (leftDist >= 0 && leftDist < rightDist))
    {
        start = start_l;
        end = end_l;
        curMin = leftDist;
    }
    else
    {
        start = start_r;
        end = end_r;
        curMin = rightDist;
    }

    if(curMin < 0 || L[m+1].val - L[m].val < curMin)
    {
        start = m;
        end = m + 1;
        curMin = L[m+1].val - L[m].val;
    }
    return curMin;
}

float getNearestDist(CumArrElem cumarr[], int size, int &start, int &end)
{
    return getNearestDistance(cumarr, 0, size-1, start, end);
}

float SumClosedToZero(float x[], int size, int &start, int &end)
{
    int i;
    CumArrElem *cumarr = new CumArrElem[size+1];
    cumarr[0].val = 0;
    cumarr[0].pos = -1;
    for(i=0; i<size; i++)
    {
        cumarr[i+1].val = cumarr[i].val + x[i];
        cumarr[i+1].pos = i;
    }
    qsort(cumarr, size+1, sizeof(cumarr[0]), comp);
    float result = getNearestDist(cumarr, size+1, start, end);
    start = cumarr[start].pos;
    end = cumarr[end].pos;
    if(start > end)
    {
        int temp = start;
        start = end;
        end = temp;
    }
    start++;
    delete []cumarr;
    return result;
}

(2) Given an n*n array of reals, find the maximum sum contained in any rectangular subarray

The idea is to fix a range of columns and locate the rectangle with the maximum sum within those columns. For instance, if we focus on columns p through q, we sum the values of each row within columns p..q and store these row sums in a one-dimensional array. Then we apply Algorithm 4 above (the scanning algorithm) to find the maximum-sum subvector of that array, which gives the best choice of rows for this column range. The program is listed below:

 

float maxsum(float x[], int size, int &l, int &u)
{
    float maxsofar = x[0];
    float maxendinghere = x[0];
    int cl, cu;
    cl = cu = l = u = 0;
    for(int i=1; i<size; i++)
    {
        if(x[i] > maxendinghere + x[i])
        {
            maxendinghere = x[i];
            cl = cu = i;
        }
        else
        {
            maxendinghere += x[i];
            cu = i;
        }
        if(maxsofar < maxendinghere)
        {
            maxsofar = maxendinghere;
            l = cl;
            u = cu;
        }
    }
    return maxsofar;
}

float maxsum2(float x[][3], int row, int col, int &start_r, int &start_c, int &rl, int &cl)
{
    float *cumarr = new float[row];
    float *helpCumArr = new float[row];
    float maxsofar;

    int start, end;
    int i, range, p;
    for(i=0; i<row; i++)
    {
        cumarr[i] = x[i][0];
        helpCumArr[i] = 0;
    }
    maxsofar = maxsum(cumarr, row, start, end);
    start_r = start;
    rl = end - start + 1;
    start_c = 0;
    cl = 1;
    for(range = 1; range <= col; range++)
    {
        for(i=0; i<row; i++)
        {
            helpCumArr[i] = helpCumArr[i] + x[i][range-1];
            cumarr[i] = helpCumArr[i];
        }
        float cur = maxsum(cumarr, row, start, end);
        if(cur > maxsofar)
        {
            maxsofar = cur;
            start_r = start;
            rl = end - start + 1;
            start_c = 0;
            cl = range;
        }
        for(p = 1; p + range <= col; p++)
        {
            for(i=0; i<row; i++)
                cumarr[i] = cumarr[i] + x[i][p+range-1] - x[i][p-1];
            cur = maxsum(cumarr, row, start, end);
            if(cur > maxsofar)
            {
                maxsofar = cur;
                start_r = start;
                rl = end - start + 1;
                start_c = p;
                cl = range;
            }
        }
    }
    delete []cumarr;
    delete []helpCumArr;
    return maxsofar;
}

Column 9

This column discusses a low-level approach to improving performance: locate the expensive parts of an existing program and make small changes to improve their speed. The column gives several examples of code tuning, listed below:

(1) k = (j + rodist) % n

      This could be replaced by

      k = j + rodist;

      if (k >= n)

          k-=n;

     This works as long as j + rodist is less than 2n; otherwise a while loop would be needed.

 

(2) Use a sentinel in sequential search

The original sequential search is written like this:

      int ssearch(t)

           for i = [0,n)

                if (x[i] == t)

                   return i

           return -1

The inner loop has two tests: the first tests whether i is at the end of the array, and the second tests whether x[i] is the desired element. We can replace the first test with the second by placing a sentinel at the end of the array. The program is then written like this:

       int ssearch2(t)

            hold = x[n]

            x[n] = t

            for(i=0;;i++)

                if (x[i] == t)

                    break;

            x[n] = hold;

            if (i == n)

                return -1;

            else

                return i;

 

Then the book introduces another version of binary search that solves a different problem: locate the first occurrence of t in x[0..n-1].

We will use the invariant x[l] < t <= x[u] && l < u, and assume that x[-1] < t and x[n] >= t (these fictitious boundary elements are never actually accessed).

The code could be written in the following way:

       l = -1; u = n;

       while (l+1 != u)  

      {

          /*invariant: x[l]<t && x[u]>=t && l<u*/

          m = (l+u)/2;

          if (x[m]<t)

               l = m;

          else

               u = m;

      }

      /*assert l+1=u && x[l]<t && x[u]>=t*/

      p = u;

      if (p>=n || x[p] != t)

          p = -1;

As the loop is repeated, the invariant is maintained by the if statement. Upon termination, if t is anywhere in the array, its first occurrence is at position u. The final two statements set p to the index of the first occurrence of t in x if it is present, and to -1 if it is not.

 

The principle for code tuning: it should be done rarely. Save your concern for efficiency for when it really matters.

 

A problem in this column gives an example of using table lookup to improve efficiency. The problem is stated as:

Given a very long sequence of bytes, efficiently count the total number of one bits.

 

The traditional way is to view the sequence as an array of char, short, int or long and process it unit by unit, counting the one bits in each unit. However, this approach does a lot of recomputation. Remember the principle mentioned in the previous column: "save state to avoid recomputation". We can first build a table giving the number of one bits in every possible value of one unit. While processing the sequence, we maintain an array of counters recording how many times each unit value occurs in the input, and at the end we sum, over all unit values, the occurrence count multiplied by the number of one bits in that value. The program can be written as follows:

#define MAXN 256    /* number of possible byte values */

struct countTableElem
{
    unsigned char oneBitCnt;
    unsigned long occurence;
};

unsigned long countOneBits(unsigned char seq[], int size)
{
    countTableElem countTable[MAXN];
    for(unsigned short c = 0; c < MAXN; c++)
    {
        countTable[(int)c].oneBitCnt = 0;
        countTable[(int)c].occurence = 0;
        unsigned char b = (unsigned char)c;
        while(b)
        {
            countTable[(int)c].oneBitCnt++;
            b = b & (b-1);    /* clear the lowest set bit */
        }
    }
    int i;
    for(i = 0; i < size; i++)
    {
        countTable[(int)seq[i]].occurence++;
    }
    unsigned long totalOneBitCnt = 0;
    for(i = 0; i < MAXN; i++)
    {
        totalOneBitCnt += (countTable[i].occurence * countTable[i].oneBitCnt);
    }
    return totalOneBitCnt;
}

Column 10

This column discusses how to reduce space cost. A real-world problem is presented to illustrate the techniques. A 200*200 grid of point identifiers is given; each identifier is an integer in the range 0..1999, or -1 if no point is at that location. There are two thousand neighborhoods, each attached to a point in the grid. The user accesses a particular location by giving (x, y), and the structure holding the grid should use as little space as possible. If each point identifier is stored as a 16-bit integer, the full two-dimensional array uses 80KB. The problem is how to represent the grid in less space. The book recommends a sparse data structure. An obvious representation of a sparse matrix uses an array for the columns and a linked list of the active elements in each column.

                           colhead            nodes: (row | pointnum | next)

                           0 ----------------> (  2 |  17 | ) --------> (  5 | 538 | /)

                           1 ----------------> (  1 |  98 | ) --------> (138 |  15 | /)

                           2  /

 To search for point(i,j), use the following code:

        for (p=colhead[i]; p != NULL; p = p -> next)

               if (p->row == j)

                     return p->pointnum

        return -1

 

However, the pitfall of this scheme is its space cost: the structure uses an array of 200 pointers and 2000 records, each record holding two integers (row and pointnum) and a next pointer. The column pointers occupy 800 bytes, and if the records are allocated as a 2000-element array they occupy 12 bytes each, for a total of 24,800 bytes. But if we allocate the records dynamically with malloc, each record carries allocation overhead (48 bytes per record in the book's estimate), and the space cost grows to about 96.8KB, which is even more than the 80KB dense matrix. So the book then introduces another structure that eliminates the pointers altogether: an array of 201 elements represents the columns, and two parallel arrays of 2000 elements represent the points.

     int firstincol[201]

     int row[2000]

 

The points in column i are represented in the row array between locations firstincol[i] and firstincol[i+1]-1; firstincol[200] is defined as 2000 to make this condition hold for the last column. To determine what point is stored at position (i, j), use this code:

      for k = [firstincol[i],firstincol[i+1])

           if row[k] == j

              return pointnum[k] //pointnum is a 2000-element array attached to @row

      return -1
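How the firstincol/row/pointnum arrays get filled is not shown in the notes above; a minimal sketch of my own (assuming the points arrive as parallel arrays pcol[k], prow[k], pnum[k] of length npoints, and using a counting pass plus prefix sums) could look like this:

const int NCOLS = 200;
const int NPOINTS = 2000;

int firstincol[NCOLS + 1];
int row[NPOINTS];
int pointnum[NPOINTS];

void build(const int pcol[], const int prow[], const int pnum[], int npoints)
{
    int counts[NCOLS] = {0};
    for (int k = 0; k < npoints; k++)       /* count the points in each column */
        counts[pcol[k]]++;
    firstincol[0] = 0;
    for (int c = 0; c < NCOLS; c++)         /* prefix sums give the column starts */
        firstincol[c + 1] = firstincol[c] + counts[c];
    int next[NCOLS];                        /* next free slot in each column */
    for (int c = 0; c < NCOLS; c++)
        next[c] = firstincol[c];
    for (int k = 0; k < npoints; k++)       /* drop each point into its column's slots */
    {
        int slot = next[pcol[k]]++;
        row[slot] = prow[k];
        pointnum[slot] = pnum[k];
    }
}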

 

After this problem, the book gives some principles for space reduction:

(1) Don't store, recompute

(2) Sparse Data Structures

(3) Data Compression

(4) Allocation Policies

(5) Share Storage

 

The next five columns discuss various common but important data structures and algorithms. Column 11 is about sorting. Column 12 is about generating a random sample of integers. Column 13 is about searching. Column 14 is about heaps, and Column 15 is about strings. The notes below focus on columns 11, 12, 13 and 14.

 

Column 11

Insertion Sort

    for i = [1,n)

         t = x[i]

         for (j=i; j > 0 && x[j-1] > t ; j--)

              x[j] = x[j-1]

        x[j] = t

 

QuickSort

    void qsort(l,u)

        if (l >= u)

           return

        /* partition the array around a particular value, which is eventually placed in its correct position p */

        qsort(l,p-1)

        qsort(p+1,u)

To partition the array around the value t, the book introduces a simple scheme learnt from Nico Lomuto. Given the value t, rearrange x[a..b] and compute the index m such that all elements less than t are to one side of m and all other elements are on the other side. A simple loop scans the array from left to right, using indices i and m to maintain the following invariant:

                   |    < t    |    >= t    |      ?      |
                   ^           ^             ^            ^
                   a           m             i            b

When the code inspects the i-th element, if x[i] >= t then all is fine and the invariant still holds. If x[i] < t, we regain the invariant by incrementing m (which will index the new location of the small element) and then swapping x[i] with x[m]. The complete partitioning code is:

            m = a-1

            for  i = [a,b]

                 if x[i] < t

                     swap(++m,i)

In quicksort we partition the array x[l..u] around the value t = x[l], so a is l+1 and b is u, and we swap x[l] with x[m] when the loop terminates:

     void qsort(l,u)

           if (l>=u)

              return

           m = l

           for i = [l+1, u]

               /*invariant: x[l+1..m] < x[l] && x[m+1..i-1] >= x[l]*/

               if (x[i] < x[l]) 

                   swap(++m, i)

           swap(l,m)

           /*x[l..m-1] < x[m] <= x[m+1..u]*/

           qsort(l,m-1)

           qsort(m+1,u)

 

A better partitioning scheme is two-sided partitioning, which uses the following invariant:

              | t |     <= t     |       ?       |     >= t     |
              ^                  ^                ^             ^
              l                  i                j             u
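For reference, a quicksort using this two-sided partitioning (my own sketch written from memory, in the spirit of the book's version rather than its exact code) could look like this:

void quicksort2(int x[], int l, int u)
{
    if (l >= u)
        return;
    int t = x[l];                 /* partition around t = x[l] */
    int i = l;
    int j = u + 1;
    for (;;)
    {
        do i++; while (i <= u && x[i] < t);   /* scan right for an element >= t */
        do j--; while (x[j] > t);             /* scan left for an element <= t  */
        if (i > j)
            break;
        int tmp = x[i]; x[i] = x[j]; x[j] = tmp;
    }
    int tmp = x[l]; x[l] = x[j]; x[j] = tmp;  /* put the pivot between the halves */
    quicksort2(x, l, j - 1);
    quicksort2(x, j + 1, u);
}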

 