









Finding the Longest Nondecreasing Subsequence of A Given Sequence:

An analysis of dynamic programming











by Justin Dugger

Kansas State University

12/5/2002


Introduction


In 1935 Erdős and Szekeres published a theorem which (among other things) implies that any sequence of n^2 + 1 distinct real numbers contains a monotonic subsequence of length n + 1.  I intend to investigate here a method of calculating a related object, the longest increasing subsequence.  By definition, an increasing subsequence is also a monotonic subsequence, so finding the longest increasing subsequence would be of great value in finding the longest monotonic subsequence.  Additionally, an efficient method here should lead to insight into finding the longest decreasing subsequence.  But the primary question in this research is whether dynamic programming is suitable for this problem. In short, the answer is yes; by using the principles of dynamic programming, a worst-case O(2^n) running time is reduced to Θ(n^2).  This is achieved by a concept similar to caching recursive calls.

The methodology for this experiment was to use the UNIX time command to record the amount of time spent on the CPU.  In today's multitasking systems a simple wall-clock start/finish comparison can be wildly inaccurate, especially for faster algorithms or smaller data sets.  A script was written to generate random data of a given size for the algorithms, and then record the run time for both a naive implementation and a dynamic programming approach.  The language chosen for this experiment was Java.  Being widely available and having a rather clear syntax made it an appropriate choice.  Solaris was chosen for its availability on campus and in the Department.  This experiment should be easily repeatable anywhere Java and bash are installed.  All the code needed for this experiment is available at http://www.cis.ksu.edu/~jld5445/575experiment.html.

The result of all this is that dynamic programming is a big win, especially on larger data sets, where the difference in asymptotic run times becomes especially apparent.  There is some noise in the run-time data, but the error is likely due to the large cost of startup times outweighing the actual computation on small input sizes.


The Scripts

The generator is a bash shell script that takes a single argument and produces that many random numbers as parameters to the implementations.  One concern here is the quality of the pseudo-random number generator behind $RANDOM; however, dumping over a million values of $RANDOM modulo 8 into a file did not exhaust the generator's period.  This should be suitable for the script, though your environment may vary; a bug was reported for a version of bash newer than the one installed on CISUNIX citing a period of 128.

After generating the random argument values the script calls time java <dynamic|naive> and prints the results on screen.  time reports three values: the real time, the time spent in user mode, and the time spent in system mode.  Of these the most important is user mode, which represents the time spent computing and calling functions.  The system mode time represents time spent in system calls; these are usually things like I/O and other considerations not reflective of the algorithms themselves.

For ease of experimentation, this script was run 3 times for inputs of size 10, 50, 100 and 200, and an average score was calculated by hand.  It should be possible to modify the script to record the results of the time command into a file for statistical analysis in a program like MS Excel or possibly MATLAB. To use the script, simply run run_me. This will execute another script, generate, with increasing parameter values. Be wary of calling generate on large values (>100); it quickly becomes apparent that naive.java is slow.

In addition, I wrote a small script to investigate the dynamic class alone. Its speed is impressive and hardly even suggests a quadratic run time. However, the random number generator quickly becomes the limiting factor. It does not affect the scientific results, but it takes 300 seconds to generate enough data to keep dynamic busy for 3 seconds.

The Naive Implementation

The naive class implements a straightforward recursive solution.  After processing the arguments from the command line into an array, a recursive function rec_substring(int, int) is called.  The first parameter is an index into the array holding our sequence.  The second is the hurdle any new element must clear to be added to the list.  The recursion stops when the index hits the end of the sequence; otherwise it decides whether the current index can be included, then takes the maximum of not including it (thus preserving the old lower bound) and electing to include it.
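The recursion just described can be sketched compactly if we only track the length of the subsequence. This is a minimal sketch of my own, not the appendix program; the sample input is an illustrative assumption.

```java
// Minimal length-only sketch of the naive recursion (sample input is an assumption).
public class NaiveLength {
    static int[] seq = {3, 1, 4, 1, 5, 9, 2, 6};

    // Longest nondecreasing subsequence length in seq[index..], where every
    // chosen element must be >= lowerBound.
    static int rec(int index, int lowerBound) {
        if (index == seq.length) return 0;          // past the end: empty subsequence
        int best = rec(index + 1, lowerBound);      // option 1: skip seq[index]
        if (seq[index] >= lowerBound)               // option 2: take seq[index] if allowed
            best = Math.max(best, 1 + rec(index + 1, seq[index]));
        return best;
    }

    public static void main(String[] args) {
        System.out.println(rec(0, Integer.MIN_VALUE)); // prints 4 (e.g. 1, 4, 5, 9)
    }
}
```

Each call branches into at most two recursive calls, which is where the O(2^n) worst case comes from.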

The intermediate subsequences themselves are stored in a standard java.util.LinkedList.  If the Java library authors have done their job, both prepending to the list and querying its size run in constant time.

The Dynamic Implementation

Before I discuss the implementation of the dynamic class, I should mention briefly why dynamic programming (DP) is appropriate here.  Our course book mentions three conditions that should be satisfied for dynamic programming to work well:

  • " Simple Subproblems: There has to be a way of breaking the global problem up into   smaller subproblems of similar structure.
  • Subproblem Optimality: An optimal solution to the global problem must be a  composition of optimal subproblem solutions, using a simple combining operation
  •   Subproblem Overlap: Optimal solutions to unrelated subproblems can contain subproblems in common. The more of this overlap the more efficient DP will be." [1]
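The third condition, subproblem overlap, can be observed directly. Here is a small sketch of my own (not part of the original experiment) that counts how often the naive recursion revisits the same (index, lowerBound) state; the sample sequence and the counting scheme are illustrative assumptions.

```java
import java.util.HashMap;

// Sketch: count repeated (index, lowerBound) states in the naive recursion
// to illustrate subproblem overlap. Sample input is an assumption.
public class OverlapDemo {
    static int[] seq = {2, 7, 1, 8, 2, 8, 1, 8};
    static HashMap<String, Integer> hits = new HashMap<>();

    static int rec(int index, int lowerBound) {
        if (index == seq.length) return 0;
        hits.merge(index + "," + lowerBound, 1, Integer::sum); // record this state
        int best = rec(index + 1, lowerBound);
        if (seq[index] >= lowerBound)
            best = Math.max(best, 1 + rec(index + 1, seq[index]));
        return best;
    }

    public static void main(String[] args) {
        rec(0, Integer.MIN_VALUE);
        long repeated = hits.values().stream().filter(c -> c > 1).count();
        System.out.println("states visited more than once: " + repeated);
    }
}
```

Every repeated state is work the naive recursion redoes and a memoized or bottom-up version computes only once.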

The dynamic class implements a dynamic programming approach to the problem.

After processing the String arguments into an integer array, we proceed in the usual dynamic programming style. Dynamic programming usually consists of three parts:

  1. Initializing the array with the base cases of the recurrence relation.
  2. Filling the array.
  3. Back-tracing through the array to find the optimal solution.

First we need a recurrence relation. Ours will be on a single variable, rather than the two used above. The recurrence relation shall be

L_0 = 1

L_i = 1 + max{ L_j : 0 <= j < i, s_j < s_i },  with L_i = 1 when no such j exists.

So the array at index 0 is initialized to 1, and a tournament is held to find the longest subsequence ending at some j < i, subject to the constraint that the jth number in the sequence is less than the ith.  But in this situation we need another array to keep track of predecessors, because from the lengths alone there is no way to tell which of the previous values was the ancestor. So instead of doing step 3 by inspecting lengths, an array storing each element's ancestor is kept, set during iteration. When iteration is finished we simply mark the entries in a corresponding boolean array for each used element and then scan the bitmap forwards separately to generate the actual subsequence.
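The fill-and-backtrace just described can be condensed into a short sketch. This is my own restatement, not the appendix program; the sample input is an assumption, and -1 (rather than n) is used as the end-of-chain marker.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Condensed sketch of the DP fill and ancestor backtrace described above.
// Sample input is an assumption; prev[i] == -1 marks the start of a chain.
public class LisSketch {
    public static void main(String[] args) {
        int[] x = {3, 1, 4, 1, 5, 9, 2, 6};
        int n = x.length;
        int[] lis = new int[n], prev = new int[n];
        int best = 0; // index where the longest chain found so far ends
        for (int i = 0; i < n; i++) {
            lis[i] = 1;
            prev[i] = -1;
            for (int j = 0; j < i; j++)          // tournament over earlier endpoints
                if (x[j] < x[i] && lis[j] + 1 > lis[i]) {
                    lis[i] = lis[j] + 1;
                    prev[i] = j;                 // remember the ancestor
                }
            if (lis[i] > lis[best]) best = i;
        }
        Deque<Integer> seq = new ArrayDeque<>();
        for (int i = best; i != -1; i = prev[i]) // backtrace through ancestors
            seq.addFirst(x[i]);
        System.out.println(seq);                 // prints [3, 4, 5, 9]
    }
}
```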

This marking step is linear in the number of elements, and the previous (filling) step runs in quadratic time: the inner loop is linear in i, and i runs up to n, so we get the summation 1 + 2 + ... + (n-1) = n(n-1)/2, which is O(n^2).  In experimentation, dynamic ran fast even on the largest data sets, so I decided to examine the run time a little more closely.  Even on an input of size 1000 it ran in 3 seconds!


Comparison

Below is a graph of runtime versus input size.  It becomes quite clear from Fig 1 that dynamic programming is an expedient method compared to a simple recursive solution. Additionally, we can see from Fig 2 that dynamic scales far beyond what would be feasible with the simple recursion.


[Figure: input size vs. run time]


Conclusion

From the above it is quite clear that for any decently sized data set dynamic programming is an excellent tool.  It seems (and rightly so) that this problem has many overlapping subproblems that the recursive version ends up calculating over and over. There is a famous adage in Computer Science, "All of Computer Science is an exercise in caching," and it is no different here. Storing the solutions to subproblems allows us to build from the ground up rather than discarding the work of recursive calls. Finding the longest monotonic increasing subsequence is ripe for the dynamic programming approach. As the implementation shows, it is indeed possible to split our problem up into simple subproblems, and possible to combine them into a globally optimal solution.  Most importantly, there appears to be plenty of subproblem overlap in the recursive function.


Appendix

naive.java


/** naive.java
 *  a typical naive solution for finding the longest nondecreasing subsequence
 *  takes a list of numbers and gives the longest subsequence
 *  @author Justin Dugger
 */

import java.util.*;

public class naive {

  static int[] sequence;

  public static void main(String args[]) {
    if (args.length < 1) {
      System.out.println("Usage: java naive <entry1> ... <entry n>");
      return;
    }
    sequence = new int[args.length];
    for (int i = 0; i < args.length; i++) {
      // convert string parameter into an integer
      sequence[i] = Integer.parseInt(args[i]);
    }
    LinkedList result = (new naive()).rec_substring(0, Integer.MIN_VALUE);
    Iterator i = result.iterator();
    while (i.hasNext()) { System.out.print(((Integer) i.next()).toString() + " "); }
  }

  LinkedList rec_substring(int index, int lowerBound) {
    LinkedList answer;
    if (index >= sequence.length - 1) {
      // base case: the last element of the sequence
      answer = new LinkedList();
      if (sequence[index] >= lowerBound) {
        answer.addFirst(new Integer(sequence[index]));
      }
    } else if (sequence[index] >= lowerBound) {
      // we may wish to take the current index
      LinkedList dontInclude = rec_substring(index + 1, lowerBound);
      LinkedList include = rec_substring(index + 1, sequence[index]);
      // we prefer the optimal solution
      if (dontInclude.size() >= 1 + include.size()) {
        answer = dontInclude;
      } else {
        include.addFirst(new Integer(sequence[index]));
        answer = include;
      }
    } else {
      // we cannot take the current index
      answer = rec_substring(index + 1, lowerBound);
    }
    return answer;
  }
}


dynamic.java


/** dynamic uses dynamic programming to solve the monotonic increasing subsequence problem
 * @author Richard Buckland http://www.cse.unsw.edu.au/~cs3121/email.html
 * note from justin: my approach was way off, using a 2-dimensional array.
 * I've borrowed this implementation after discovering that there existed a way
 * to do it with a single dimension (not to mention an n log n approach!).
 * I've mostly removed some of the printout to reduce run time, and added a few
 * comments about some of the quirkier expressions.
 */

class dynamic {
  public static void main(String[] args) {
    int n;            // the length of the input sequence
    int[] x;          // the input sequence
    int[] lis;        // lis[i] is the length of the longest increasing
                      // subsequence whose final member is x[i]
    int[] previous;   // if x[i] belongs to a LIS located by the algorithm,
                      // previous[i] is the index of x[i]'s predecessor in
                      // the LIS. The initial member of the LIS has
                      // previous[i] = n
    boolean[] in_LIS; // in_LIS[i] is true iff x[i] is in the LIS

    int i, j, k, lis_value, lis_max, k_max, i_max, end_of_longest;

    // initialize variables and read the input sequence from the command line
    n = args.length;
    x = new int[n];
    lis = new int[n];
    previous = new int[n];
    in_LIS = new boolean[n];
    for (i = 0; i < n; i++)
      x[i] = Integer.parseInt(args[i]);

    // find a LIS
    lis[0] = 1;
    previous[0] = n;
    i_max = 0;
    end_of_longest = 0;

    for (i = 1; i < n; i++) {
      lis_max = 0;
      k_max = 0;
      for (k = 0; k < i; k++) {
        // note from j: zero if we can't accept this value, otherwise lis[k]
        lis_value = (x[k] < x[i] ? 1 : 0) * lis[k];
        if (lis_value > lis_max) {
          lis_max = lis_value;
          k_max = k;
        }
      }
      lis[i] = lis_max + 1;
      // if we can't extend any earlier value then store an end-of-list marker
      if (lis_max == 0)
        previous[i] = n;
      else
        previous[i] = k_max;

      if (lis[i] > i_max) {
        i_max = lis[i];
        end_of_longest = i;
      }
    }

    // mark all elements that are in the LIS
    j = end_of_longest;
    in_LIS[j] = true;
    while (previous[j] != n) {
      j = previous[j];
      in_LIS[j] = true;
    }

    // print the elements that are in the LIS
    for (i = 0; i < n; i++) {
      if (in_LIS[i])
        System.out.print(x[i] + " ");
    }
    System.out.println();
  }
}



run_me

#!/bin/bash
# compiles naive and dynamic and calls them on varying data sets

javac naive.java
javac dynamic.java

echo 10
generate 10
generate 10
generate 10
echo 20
generate 20
generate 20
generate 20
echo 50
generate 50
generate 50
generate 50
echo 100
generate 100
generate 100
generate 100
echo 125
generate 125
generate 125
generate 125


generate

#!/bin/bash
# calls naive and dynamic with the number of random arguments specified
# warning, calling with values of greater than 100 can be SLOW

RANDOM=$$$(date +%s)
arg=" "
let count=$1
while [ $count -gt 0 ]
  do arg="$arg $RANDOM"
  let count=$count-1
done
echo "$arg"
time java naive $arg
time java dynamic $arg




References


[1] M. Goodrich and R. Tamassia, Algorithm Design: Foundations, Analysis and Internet Examples, Wiley, 2002.

[2] R. Buckland, "Selected Solutions to Week 7: Self-Test Exercises," http://www.cse.unsw.edu.au/~cs3121/exercises/solutions/core07solution.html, Sept 17, 2002.
