Problem:
Given a string s and a string t, check if s is a subsequence of t.
You may assume that there are only lowercase English letters in both s and t. t is potentially a very long (length ~= 500,000) string, and s is a short string (<=100).
A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (i.e., "ace" is a subsequence of "abcde" while "aec" is not).
Example 1:
s = "abc", t = "ahbgdc"
Return true.
Example 2:
s = "axc", t = "ahbgdc"
Return false.
Follow up:
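If there are lots of incoming S, say S1, S2, ... , Sk where k >= 1B, and you want to check them one by one to see whether each is a subsequence of T. In this scenario, how would you change your code?
This problem is actually quite simple without binary search: just walk through both strings with two pointers. Along the way, though, I noticed that the string methods charAt and indexOf differ greatly in execution speed. Compare the approaches below: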
// Method 1
public boolean isSubsequence1(String s, String t) {
    if (s.length() == 0) return true;
    // Walk both strings with two pointers; each comparison makes two charAt calls
    int indexS = 0, indexT = 0;
    while (indexT < t.length()) {
        if (t.charAt(indexT) == s.charAt(indexS)) {
            indexS++;
            if (indexS == s.length()) return true;
        }
        indexT++;
    }
    return false;
}
// Method 2
public boolean isSubsequence(String s, String t) {
    if (t.length() < s.length()) return false;
    int prev = 0;
    // Still iterate over s, but with one charAt and one indexOf per character
    for (int i = 0; i < s.length(); i++) {
        char tmp = s.charAt(i);
        prev = t.indexOf(tmp, prev);
        if (prev == -1) return false;
        prev++;
    }
    return true;
}
// Method 3
public boolean isSubsequence2(String s, String t) {
    int fromIndex = 0;
    for (char c : s.toCharArray()) {
        fromIndex = t.indexOf(c, fromIndex);
        if (fromIndex++ < 0) {
            return false;
        }
    }
    return true;
}
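It turns out that Method 1 takes 50 ms while Method 2 takes only 2 ms, a substantial improvement in execution speed. Why is that? We can refer to the explanation given at this link: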
I checked the origin code of func "indexOf" and "charAt". These two solution both traversed the char of String one by one to search the first occurrence specific char.
The difference is that indexOf only call once function then traversed in "String.value[]" arr, but we used multiple calling function "charAt" to get the value in "String.value[]" arr.
The time expense of calling function made the difference.
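In other words, charAt gets called many times, whereas a single indexOf call does the scanning internally, so the indexOf version runs much faster. Worth writing down in the notebook.
Now for the follow-up. If many strings s have to be checked, the problem becomes more complicated. Reusing the methods above would be inefficient, because every check rescans t and repeats a lot of work. In situations like this we should consider storing t in a different data structure to cut out the redundant work. A natural choice is a HashMap that records every position at which each character occurs, which makes the lookups much more convenient. The code is shown below: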
// Requires java.util.ArrayList, java.util.HashMap, java.util.List, java.util.Map
public boolean isSubsequence3(String s, String t) {
    if (s == null || t == null) return false;
    // Maps each character of t to the (ascending) list of indices where it occurs
    Map<Character, List<Integer>> map = new HashMap<>();
    // Record the information about t in the map
    for (int i = 0; i < t.length(); i++) {
        char curr = t.charAt(i);
        if (!map.containsKey(curr))
            map.put(curr, new ArrayList<>());
        map.get(curr).add(i);
    }
    int prev = -1;
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (map.get(c) == null) {
            return false;
        } else {
            List<Integer> list = map.get(c);
            prev = binarySearch(prev, list, 0, list.size() - 1);
            if (prev == -1) {
                return false;
            }
            prev++;
        }
    }
    return true;
}

// Returns the smallest value in list that is >= index, or -1 if there is none
private int binarySearch(int index, List<Integer> list, int start, int end) {
    while (start <= end) {
        int mid = start + (end - start) / 2;
        if (list.get(mid) < index) {
            start = mid + 1;
        } else {
            end = mid - 1;
        }
    }
    return start == list.size() ? -1 : list.get(start);
}
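Note that isSubsequence3 rebuilds the map on every call, so for the follow-up the index over t would have to be built once and then reused for every incoming s. The following is a minimal sketch of that idea, not a definitive implementation. It assumes only lowercase English letters, as the problem statement says; the class name SubsequenceIndex and the method name contains are hypothetical and do not appear in the original solution.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: indexes t once, then answers many subsequence queries.
class SubsequenceIndex {
    // positions.get(c - 'a') holds the ascending indices at which letter c occurs in t.
    private final List<List<Integer>> positions = new ArrayList<>();

    SubsequenceIndex(String t) {
        // Assumption: t contains only 'a'..'z'.
        for (int i = 0; i < 26; i++) positions.add(new ArrayList<>());
        for (int i = 0; i < t.length(); i++) {
            positions.get(t.charAt(i) - 'a').add(i);
        }
    }

    boolean contains(String s) {
        int next = 0; // smallest index of t that may still be matched
        for (int i = 0; i < s.length(); i++) {
            List<Integer> list = positions.get(s.charAt(i) - 'a');
            // Binary search for the first occurrence at index >= next.
            int lo = 0, hi = list.size();
            while (lo < hi) {
                int mid = lo + (hi - lo) / 2;
                if (list.get(mid) < next) lo = mid + 1;
                else hi = mid;
            }
            if (lo == list.size()) return false; // no usable occurrence left
            next = list.get(lo) + 1;
        }
        return true;
    }
}
```

With this layout the cost of scanning t is paid once in the constructor, and each query costs roughly one binary search per character of s.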
Of course, as we have covered before, a HashMap can be replaced by an array, which gives the method below. Java also provides a built-in binary search, Collections.binarySearch. The code is as follows:
// Requires java.util.ArrayList, java.util.Collections, java.util.List
public boolean isSubsequence4(String s, String t) {
    List<Integer>[] idx = new List[256]; // Just for clarity
    for (int i = 0; i < t.length(); i++) {
        if (idx[t.charAt(i)] == null)
            idx[t.charAt(i)] = new ArrayList<>();
        idx[t.charAt(i)].add(i);
    }
    int prev = 0;
    for (int i = 0; i < s.length(); i++) {
        // This character of s never occurs in t; return early to avoid an NPE
        if (idx[s.charAt(i)] == null) return false;
        int j = Collections.binarySearch(idx[s.charAt(i)], prev);
        // Not found: convert the negative result into the insertion point
        if (j < 0) j = -j - 1;
        if (j == idx[s.charAt(i)].size()) return false;
        prev = idx[s.charAt(i)].get(j) + 1;
    }
    return true;
}
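The line `if (j < 0) j = -j - 1;` relies on the documented return convention of Collections.binarySearch: if the key is present its index is returned, otherwise the result is -(insertion point) - 1. A small sketch of that conversion in isolation follows; the class name BinarySearchDemo and the method name lowerBound are hypothetical, and the list is assumed to be sorted ascending without duplicates, which holds for the position lists built above.

```java
import java.util.Collections;
import java.util.List;

class BinarySearchDemo {
    // Returns the index of the first element >= key, or sorted.size() if none.
    // Assumption: sorted is ascending and contains no duplicates.
    static int lowerBound(List<Integer> sorted, int key) {
        int j = Collections.binarySearch(sorted, key);
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        return j >= 0 ? j : -j - 1;
    }
}
```

For the list [1, 4, 7], lowerBound with key 4 gives index 1 (an exact hit), key 5 gives index 2 (the position of 7), and key 8 gives 3, which equals the list size and corresponds to the "return false" case in isSubsequence4.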