CS 246: Objective-oriented programming

Objective oriented programming

Knuth-Morris-Pratt algorithm

In computer science, the Knuth-Morris-Pratt string searching algorithm (or KMP algorithm) searches for occurrences of a "word" W within a main "text string" S by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re0examination of previously matched characters.
The algorithm was conceived in 1970 by Donald Knuth and Vaughan Pratt, and independelty by James H.Morris.

The most straightforward algorithm is to look for a character match at successive values of the index m, the posisiotn in the string being searched, ie. S[m]. If the index m reaches the end of the string then there is no match, in which case the search is said to fial. At each position m the algorithm first checks for equality of the first character in the word being searched, I.e. S[m] = ? W[0]. If a match is found, the algorithm tests the other characters in the word being searched by checking successive values of the word position index, I. The algorithm retrieves the character W[I] in the word being searched and checks for equality of the expression S[m+i] = ? W[I]. If all successive characters match in W at position m, then a match is found at that position in the search string.
Usually, the trail check will quickly reject the trail match. If the strings are uniformly distributed random letters, then the chance that charaters match is 1 in 26. In most cases, the tral check will reject the match at the initial letter. The chance that the first two letters will match is 1 in 26^2 ( 1 in 676). So if the characters are random, then the expected complexity of searching string S[] of length k is on the order of k comparisons or O(k). The expected performance is very good. If S[] is 1 billion characters and W[] is 100 characters, then the string search should complete after about one billion character comparisons.
The expected peroframnce is not guaranteed. If the strings are not random, then checking a trial m may take many character comparisons. The worst case is if the two strings match in all but the last letter. Imagine that the string S[] consists of 1 billion characters that are all A, and that the word W[] is 999 A characters terminating in a final B character. The simple string matching algorithm will now examine 1000 characters at each trail position before rejecting the match and advancing the trail position. The simple string search example would now take about 1000 character comparisons times 1 billion positions for 1 trillion character comparisons. If the length of W[] is n, then the worst-case performance is O(kn)

The KMP algorithm has a better worst-case performance than the straightforward algorithm. KMP spends a little time precomputing a table, and then it uses that table to do an efficient search of the string in O(k)

The difference is that KMP makes use of previous match information that the straightforward algorithm does not.

Description of pseudocode for the search algorithm

We assume the existence of a "partial match" table T, described below, which indicates where we need to look for the start of a new match in the event that the current one ends in a mismatch. The entries of T are constructed so that if we have a match starting at S[m] that fails when comparing S[m + I] to W[I], then the next possible match will start at index m + I - T[i] in S. This has two implications: first, T[0] = -1, which indicates that if W[0] is a mismatch, we cannot backtrack and must simply check the next character. and second, although the next possible match will begin at index m + I - T[i], as in the example above, we need not actually check any of the T[I] characters after that, so that we continue searching from W[T[I]]. The following is a sample pseudocode implementation of the KMP search algorithm.

algorithm kmp_search:

input:
    an array of characters, S (the text to be searched)
    an array of characters, W (the word sought)
output:
    an integer (the zero-based position in S at which W is found)
define variables:
    an integer, m <- 0 (the beginning of the current match in S)

Lecture 3: egrep | permissions| Scripts

egrep cs246 index.html
[]:= pattern
|:= or, h|c matches h or c. cat|dog matches cat or dog
usage: regexpattern file
* := 0 or more repetitions of the preceding subexpression
(cs)*246 := ex. 246, cs246, cscs246
cs*246 := ex. c246, cs246, css246
for * to not have a special meaning, escape it by using
cs\*246
?:= 0 or 1 of proceeding subexpression
ex. cs ?246:= cs246, cs 246 (cs )?246:= 246, cs 246
+:= 1 or more of the proceeding subexpression
.:= any one characters
.*: any number of any characters
ex. cs.*246:= matches lines that contain a string starting with cs & ending with 246

use ^ to indicate pattern should match start of line
use $ to indicate pattern must end with end of line
"^cs.*246$"
^(cs.*)246$:= 246, csb246, csbcsa246

e.g. Fetch lines of even length
(..)* is not enough as you need to say the pattern start and end with (..)*
correct: ^(..)*$
e.g. files in the current directory that contain exactly one a
ls| egrep "^[^a]a[^a]*$" if "" is not added then will be interpreted as a glopping item
words in the dictionary containing exactly 5 characters
egrep ".{5}" file

Permissions

ls -l:= long listing of pwd
--rwxr--xr-- 1 naneem staff 28 Jan10 abcd.txt
type|user bits| group bits | others bits shortcuts owners group size last modified filename
w: write
x: execute as a program
r: read
group bits: permissions all others members of group have
user bits: permissions that the owner has
others bits: permissions that all others(people not in group staff in this case) has

r for directory:= read contents
w for directory:= write contents
x for directory:= enters a directory

Changing Permissions

only the owner can change permissions
chmod mode file(s)
mode:= ownership class:= U_users, g_group, o_others, a_all
mode:= operator:= +, -, =
mode:= r, w, x

425
100, 010, 101
r--, -w-, r-x

Programming using shell

vairables
$> x=1
variable=value

access the value

usage: $x (beter ${x})
In comparison: $(...) is the embedded command line
e.x. x=1
echo $x
y = 1, not working because of space
values in shell are stored as strings
dir=~/1171/a1
cd $dir
echo '$dir': single quote won't let embedded code to excecute

Script

Script: text file with Linux commands that are executed as a program
#!/bin/bash:= shebang line change this file into command line file
e.x. File basic
chmod a+x basic
Usage: basic, doesn't work, tell the shell where to find it for example ./basic
tells the shell that basic is in the current directory
Arguments to scripts: e.g. is a word in the dictionary
usage: ./isItAWord($0) hello($1)

cat isItAWord:= egrep "^$1$" /usr/share/dict/words

!/bin/bash

Answers whetehr a candidate word might be a good password

usage () {

echo "Usage: $0 password" >&2 (find in piazza| redirecting)
exit 1

}
Using $0 so that if someone change the file name, it still work
${#} := number of args to script
if [${#} -ne 1]; then

usage

fi

syntax for if: if [condition]; then statement elif [condition]; then statement else statement /n fi
egrep "^$1$" /usr/share/dict/words > /dev/null
Loops
e.g. Print numbers 1 to $1
x = 1
while [$x -le $1]; do

echo ${x}
x = $((x+1))

done

e.x. for x in a b c d; do echo $x; done
File: RenameC
change all .C files to .cc

  1. mv Hello.C Hello.cc

  2. filename = Hello.C mv ${filename} ${filename%C}cc

for name in *.C; do

mv ${name} ${name%C}cc

done
eg. count the number of occurrence of $1 in file $2
count = 0
for word in $(cat $2); do

if [${word}] == $1]; then
    count = $((count + 1))
fi

done
echo ${count}
usage: ./countWords "To be" ../sample.txt "" prevent the shell from treat to be as two arguments

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值