









Finding the Longest Nondecreasing Subsequence of A Given Sequence:

An analysis of dynamic programming











by Justin Dugger

Kansas State University

12/5/2002


Introduction


In 1935 Erdős and Szekeres published a theorem which (among other things) implies that any sequence of n^2 + 1 distinct real numbers contains a monotonic subsequence of length n + 1.  I intend to investigate here a method of calculating a related object, the longest increasing subsequence.  By definition, an increasing subsequence is also a monotonic subsequence, so finding the longest increasing subsequence would be of great value in finding the longest monotonic subsequence.  Additionally, an efficient method here should lead to insight into finding the longest decreasing subsequence.  But the primary question in this research is whether dynamic programming is suitable for this problem. In short, the answer is yes; by using the principles of dynamic programming, a worst-case O(2^n) running time is reduced to Θ(n^2).  This is achieved by a concept similar to caching recursive calls.

The methodology for this experiment was to use the UNIX time command to record the amount of time spent on the CPU.  In today's multitasking systems a simple wall-clock start/finish comparison can be wildly inaccurate, especially for faster algorithms or smaller data sets.  A script was written to generate random data of a given size for the algorithms, and then record the run time for both a naive implementation and a dynamic programming approach.  The language chosen for this experiment was Java.  Being widely available and having a rather clear syntax made it an appropriate choice.  Solaris was chosen for its availability on campus and in the Department.  This experiment should be easily repeatable anywhere Java and bash are installed.  All the code needed for this experiment is available at http://www.cis.ksu.edu/~jld5445/575experiment.html.

The result of all this is that dynamic programming is a big win, especially on larger data sets, where the difference in asymptotic run times becomes especially apparent.  There is some noise in the run-time data, but the error is likely due to the large cost of startup times outweighing the actual computation on small input sizes.


The Scripts

The generator is a bash shell script that takes a single argument and produces that many random numbers as parameters to the implementations.  One concern here is the quality of the pseudo-random number generator behind $RANDOM; however, dumping over a million values of $RANDOM modulo 8 into a file did not exhaust the generator's period.  This should be suitable for the script, though your environment may vary; a bug was reported for a version of bash newer than the one installed on CISUNIX citing a period of 128.

After generating the random argument values the script calls time java <dynamic|naive> and prints the results on screen.  time reports three values: the real time, the time spent in user mode, and the time spent in system mode.  Of these the most important is user mode, which represents the time spent computing and calling functions.  The system mode time represents time spent in system calls; these are usually things like I/O and other considerations not reflective of the algorithms themselves.

For ease of experimentation, this script was run 3 times for inputs of size 10, 50, 100 and 200, and an average score was calculated by hand.  It should be possible to modify the script to record the results of the time command into a file for statistical analysis in a program like MS Excel or possibly MATLAB. To use the script, simply run run_me. This will execute another script, generate, with increasing parameter values. Be wary of calling generate on large values (>100); it quickly becomes apparent that naive.java is slow.

In addition, I wrote a small script to investigate the dynamic class alone. Its speed is impressive and hardly even suggests a quadratic run time. However, the random number generator quickly becomes the limiting factor. It does not affect the scientific results, but it takes 300 seconds to generate enough data to keep dynamic busy for 3 seconds.

The Naive Implementation

The naive class implements a straightforward recursive solution.  After processing the arguments from the command line into an array, a recursive function rec_substring(int, int) is called.  The first parameter is an index into the array holding our sequence.  The second is the hurdle any new element must clear to be added to the list.  The recursion stops when the index hits the end of the sequence; otherwise it decides whether the current index can be included, then takes the maximum of not including it (thus preserving the old lower bound) and electing to include it.
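The recursion just described can be sketched compactly if we only track the length of the subsequence. This is a minimal sketch of my own, not the appendix program; the sample input is an illustrative assumption.

```java
// Minimal length-only sketch of the naive recursion (sample input is an assumption).
public class NaiveLength {
    static int[] seq = {3, 1, 4, 1, 5, 9, 2, 6};

    // Longest nondecreasing subsequence length in seq[index..], where every
    // chosen element must be >= lowerBound.
    static int rec(int index, int lowerBound) {
        if (index == seq.length) return 0;          // past the end: empty subsequence
        int best = rec(index + 1, lowerBound);      // option 1: skip seq[index]
        if (seq[index] >= lowerBound)               // option 2: take seq[index] if allowed
            best = Math.max(best, 1 + rec(index + 1, seq[index]));
        return best;
    }

    public static void main(String[] args) {
        System.out.println(rec(0, Integer.MIN_VALUE)); // prints 4 (e.g. 1, 4, 5, 9)
    }
}
```

Each call branches into at most two recursive calls, which is where the O(2^n) worst case comes from.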

The intermediate subsequences themselves are stored in a standard java.util.LinkedList.  If the Java library authors have done their job, both prepending to the list and querying its size run in constant time.

The Dynamic Implementation

Before I discuss the implementation of the dynamic class, I should mention briefly why dynamic programming (DP) is appropriate here.  Our course book mentions three conditions that should be satisfied for dynamic programming to work well:

  • " Simple Subproblems: There has to be a way of breaking the global problem up into   smaller subproblems of similar structure.
  • Subproblem Optimality: An optimal solution to the global problem must be a  composition of optimal subproblem solutions, using a simple combining operation
  •   Subproblem Overlap: Optimal solutions to unrelated subproblems can contain subproblems in common. The more of this overlap the more efficient DP will be." [1]
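The third condition, subproblem overlap, can be observed directly. Here is a small sketch of my own (not part of the original experiment) that counts how often the naive recursion revisits the same (index, lowerBound) state; the sample sequence and the counting scheme are illustrative assumptions.

```java
import java.util.HashMap;

// Sketch: count repeated (index, lowerBound) states in the naive recursion
// to illustrate subproblem overlap. Sample input is an assumption.
public class OverlapDemo {
    static int[] seq = {2, 7, 1, 8, 2, 8, 1, 8};
    static HashMap<String, Integer> hits = new HashMap<>();

    static int rec(int index, int lowerBound) {
        if (index == seq.length) return 0;
        hits.merge(index + "," + lowerBound, 1, Integer::sum); // record this state
        int best = rec(index + 1, lowerBound);
        if (seq[index] >= lowerBound)
            best = Math.max(best, 1 + rec(index + 1, seq[index]));
        return best;
    }

    public static void main(String[] args) {
        rec(0, Integer.MIN_VALUE);
        long repeated = hits.values().stream().filter(c -> c > 1).count();
        System.out.println("states visited more than once: " + repeated);
    }
}
```

Every repeated state is work the naive recursion redoes and a memoized or bottom-up version computes only once.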

The dynamic class implements a dynamic programming approach to the problem.

After processing the String arguments into an integer array, we proceed in the usual dynamic programming style. Dynamic programming usually consists of three parts:

  1. Initializing the array with the base cases of the recurrence relation.
  2. Filling the array.
  3. Back-tracing through the array to find the optimal solution.

First we need a recurrence relation. Ours will be on a single variable, rather than the two used above. The recurrence relation shall be

L_0 = 1

L_i = 1 + max{ L_j : 0 <= j < i, s_j < s_i },  with L_i = 1 when no such j exists.

So the array at index 0 is initialized to 1, and a tournament is held to find the longest subsequence ending at some j < i, subject to the constraint that the jth number in the sequence is less than the ith.  But in this situation we need another array to keep track of predecessors, because from the lengths alone there is no way to tell which of the previous values was the ancestor. So instead of doing step 3 by inspecting lengths, an array storing each element's ancestor is kept, set during iteration. When iteration is finished we simply mark the entries in a corresponding boolean array for each used element and then scan the bitmap forwards separately to generate the actual subsequence.
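The fill-and-backtrace just described can be condensed into a short sketch. This is my own restatement, not the appendix program; the sample input is an assumption, and -1 (rather than n) is used as the end-of-chain marker.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Condensed sketch of the DP fill and ancestor backtrace described above.
// Sample input is an assumption; prev[i] == -1 marks the start of a chain.
public class LisSketch {
    public static void main(String[] args) {
        int[] x = {3, 1, 4, 1, 5, 9, 2, 6};
        int n = x.length;
        int[] lis = new int[n], prev = new int[n];
        int best = 0; // index where the longest chain found so far ends
        for (int i = 0; i < n; i++) {
            lis[i] = 1;
            prev[i] = -1;
            for (int j = 0; j < i; j++)          // tournament over earlier endpoints
                if (x[j] < x[i] && lis[j] + 1 > lis[i]) {
                    lis[i] = lis[j] + 1;
                    prev[i] = j;                 // remember the ancestor
                }
            if (lis[i] > lis[best]) best = i;
        }
        Deque<Integer> seq = new ArrayDeque<>();
        for (int i = best; i != -1; i = prev[i]) // backtrace through ancestors
            seq.addFirst(x[i]);
        System.out.println(seq);                 // prints [3, 4, 5, 9]
    }
}
```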

This marking step is linear in the number of elements, and the previous (filling) step runs in quadratic time: the inner loop is linear in i, and i runs up to n, so we get the summation 1 + 2 + ... + (n-1) = n(n-1)/2, which is O(n^2).  In experimentation, dynamic ran fast even on the largest data sets, so I decided to examine the run time a little more closely.  Even on an input of size 1000 it ran in 3 seconds!


Comparison

Below is a graph of runtime versus input size.  It becomes quite clear from Fig 1 that dynamic programming is an expedient method compared to a simple recursive solution. Additionally, we can see from Fig 2 that dynamic scales far beyond what would be feasible with the simple recursion.


[Figure: input size vs. run time]


Conclusion

From the above it is quite clear that for any decently sized data set dynamic programming is an excellent tool.  It seems (and rightly so) that this problem has many overlapping subproblems that the recursive version ends up calculating over and over. There is a famous adage in Computer Science, "All of Computer Science is an exercise in caching," and it is no different here. Storing the solutions to subproblems allows us to build from the ground up rather than discarding the work of recursive calls. Finding the longest monotonic increasing subsequence is ripe for the dynamic programming approach. As the implementation shows, it is indeed possible to split our problem up into simple subproblems, and possible to combine them into a globally optimal solution.  Most importantly, there appears to be plenty of subproblem overlap in the recursive function.


Appendix

naive.java


/** naive.java
 *  a typical naive solution for finding the longest nondecreasing subsequence
 *  takes a list of numbers and gives the longest subsequence
 *  @author Justin Dugger
 */

import java.util.*;

public class naive {

  static int[] sequence;

  public static void main(String args[]) {
    if (args.length < 1) {
      System.out.println("Usage: java naive <entry1> ... <entry n>");
      return;
    }
    sequence = new int[args.length];
    for (int i = 0; i < args.length; i++) {
      // convert string parameter into an integer
      sequence[i] = Integer.parseInt(args[i]);
    }
    LinkedList result = (new naive()).rec_substring(0, Integer.MIN_VALUE);
    Iterator i = result.iterator();
    while (i.hasNext()) { System.out.print(((Integer) i.next()).toString() + " "); }
  }

  LinkedList rec_substring(int index, int lowerBound) {
    LinkedList answer;
    if (index >= sequence.length - 1) {
      // base case: the last element of the sequence
      answer = new LinkedList();
      if (sequence[index] >= lowerBound) {
        answer.addFirst(new Integer(sequence[index]));
      }
    } else if (sequence[index] >= lowerBound) {
      // we may wish to take the current index
      LinkedList dontInclude = rec_substring(index + 1, lowerBound);
      LinkedList include = rec_substring(index + 1, sequence[index]);
      // we prefer the optimal solution
      if (dontInclude.size() >= 1 + include.size()) {
        answer = dontInclude;
      } else {
        include.addFirst(new Integer(sequence[index]));
        answer = include;
      }
    } else {
      // we cannot take the current index
      answer = rec_substring(index + 1, lowerBound);
    }
    return answer;
  }
}


dynamic.java


/** dynamic uses dynamic programming to solve the monotonic increasing subsequence problem
 * @author Richard Buckland http://www.cse.unsw.edu.au/~cs3121/email.html
 * note from justin: my approach was way off, using a 2-dimensional array.
 * I've borrowed this implementation after discovering that there existed a way
 * to do it with a single dimension (not to mention an n log n approach!).
 * I've mostly removed some of the printout to reduce run time, and added a few
 * comments about some of the quirkier expressions.
 */

class dynamic {
  public static void main(String[] args) {
    int n;            // the length of the input sequence
    int[] x;          // the input sequence
    int[] lis;        // lis[i] is the length of the longest increasing
                      // subsequence whose final member is x[i]
    int[] previous;   // if x[i] belongs to a LIS located by the algorithm,
                      // previous[i] is the index of x[i]'s predecessor in
                      // the LIS. The initial member of the LIS has
                      // previous[i] = n
    boolean[] in_LIS; // in_LIS[i] is true iff x[i] is in the LIS

    int i, j, k, lis_value, lis_max, k_max, i_max, end_of_longest;

    // initialize variables and read the input sequence from the command line
    n = args.length;
    x = new int[n];
    lis = new int[n];
    previous = new int[n];
    in_LIS = new boolean[n];
    for (i = 0; i < n; i++)
      x[i] = Integer.parseInt(args[i]);

    // find a LIS
    lis[0] = 1;
    previous[0] = n;
    i_max = 0;
    end_of_longest = 0;

    for (i = 1; i < n; i++) {
      lis_max = 0;
      k_max = 0;
      for (k = 0; k < i; k++) {
        // note from j: zero if we can't accept this value, otherwise lis[k]
        lis_value = (x[k] < x[i] ? 1 : 0) * lis[k];
        if (lis_value > lis_max) {
          lis_max = lis_value;
          k_max = k;
        }
      }
      lis[i] = lis_max + 1;
      // if we can't extend any earlier value then store an end-of-list marker
      if (lis_max == 0)
        previous[i] = n;
      else
        previous[i] = k_max;

      if (lis[i] > i_max) {
        i_max = lis[i];
        end_of_longest = i;
      }
    }

    // mark all elements that are in the LIS
    j = end_of_longest;
    in_LIS[j] = true;
    while (previous[j] != n) {
      j = previous[j];
      in_LIS[j] = true;
    }

    // print the elements that are in the LIS
    for (i = 0; i < n; i++) {
      if (in_LIS[i])
        System.out.print(x[i] + " ");
    }
    System.out.println();
  }
}



run_me

#!/bin/bash
# compiles naive and dynamic and calls them on varying data sets

javac naive.java
javac dynamic.java

echo 10
generate 10
generate 10
generate 10
echo 20
generate 20
generate 20
generate 20
echo 50
generate 50
generate 50
generate 50
echo 100
generate 100
generate 100
generate 100
echo 125
generate 125
generate 125
generate 125


generate

#!/bin/bash
# calls naive and dynamic with the number of random arguments specified
# warning, calling with values of greater than 100 can be SLOW

RANDOM=$$$(date +%s)
arg=" "
let count=$1
while [ $count -gt 0 ]
  do arg="$arg $RANDOM"
  let count=$count-1
done
echo "$arg"
time java naive $arg
time java dynamic $arg




References


[1] M. Goodrich and R. Tamassia, Algorithm Design: Foundations, Analysis and Internet Examples, Wiley, 2002.

[2] R. Buckland, "Selected Solutions to Week 7: Self-Test Exercises," http://www.cse.unsw.edu.au/~cs3121/exercises/solutions/core07solution.html, Sept 17, 2002.
