Chapter 1 Arrays and Strings - 1.3

The statement of problem 1.3 is:

Design an algorithm and write code to remove the duplicate characters in a string without using any additional buffer. NOTE: One or two additional variables are fine. An extra copy of the array is not.
FOLLOW UP
Write the test cases for this method.


To be honest, the problem sucks: it doesn't say which occurrence should be kept when a character appears several times. Let's assume that we keep the first occurrence of each character (the first one we meet when iterating from left to right).

The brute-force solution takes O(n^2) time: for each character, scan the rest of the string and remove its later copies.

Recall problem 1.1 and you will find that this problem is similar. We can simply keep a hash table of the characters seen so far, and the algorithm will run in O(n). However, the smart method used in problem 1.1 has a space complexity of O(k), where k is the number of characters in the character set, so it also needs an extra buffer.
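A minimal sketch of the hash-table idea, assuming a Python `set` is an acceptable stand-in for the hash table (`remove_duplicates_hashed` is a hypothetical name). It keeps the first occurrence of each character and runs in O(n) expected time, but both the set and the result list are exactly the kind of extra buffer the problem forbids.

```python
def remove_duplicates_hashed(s):
    """Return a copy of s without repeated characters, keeping first occurrences.
    Uses a set as the hash table, so it needs extra space."""
    seen = set()   # characters met so far
    result = []    # characters kept, in their original order
    for c in s:
        if c not in seen:
            seen.add(c)
            result.append(c)
    return "".join(result)


print(remove_duplicates_hashed("banana"))  # ban
```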

Another solution that came to mind is to sort the string in O(n lg n) and then use one extra variable to eliminate the duplicated characters, since equal characters end up next to each other after sorting. However, this changes the order of the characters, and I am not sure whether that is acceptable.
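A rough sketch of the sort-then-squeeze idea, operating in place on a list of characters (`remove_duplicates_sorted` is a hypothetical name). After sorting, one index `tail` marks the end of the deduplicated prefix. Note that CPython's `list.sort()` may allocate temporary memory while merging, so a strictly buffer-free version would need an in-place sort such as heapsort; this only illustrates the idea.

```python
def remove_duplicates_sorted(chars):
    """Sort chars in place, then squeeze out adjacent duplicates.
    Each character survives once, but in sorted order, not the original order."""
    chars.sort()  # O(n log n); equal characters become adjacent
    if not chars:
        return
    tail = 1  # chars[0:tail] is the deduplicated sorted prefix
    for i in range(1, len(chars)):
        if chars[i] != chars[tail - 1]:
            chars[tail] = chars[i]
            tail += 1
    del chars[tail:]  # drop the leftover slots after the prefix


chars = list("banana")
remove_duplicates_sorted(chars)
print("".join(chars))  # abn
```

The output shows the drawback mentioned above: "banana" becomes "abn" rather than "ban".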

The test cases for the program are given below:

1) empty string

2) "a"

3) "aa"

4) "aba"

5) "bab"

6) "abcd"
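The FOLLOW UP asks for test cases, so here is a small table-driven harness for the six cases above, assuming the function under test takes a list of characters and modifies it in place (like my brute-force implementation) and keeps first occurrences. `run_tests` and `reference_remove_duplicates` are hypothetical names; the reference version uses extra space and is only there so the harness can run on its own.

```python
# (input, expected output) pairs, assuming the first occurrence is kept
TEST_CASES = [
    ("", ""),
    ("a", "a"),
    ("aa", "a"),
    ("aba", "ab"),
    ("bab", "ba"),
    ("abcd", "abcd"),
]


def run_tests(remove_duplicates):
    """Check an in-place remove_duplicates(list_of_chars) against every case."""
    for text, expected in TEST_CASES:
        chars = list(text)
        remove_duplicates(chars)
        actual = "".join(chars)
        assert actual == expected, f"{text!r}: got {actual!r}, expected {expected!r}"
    print("all tests passed")


def reference_remove_duplicates(chars):
    # Hypothetical oracle used only to exercise the harness; it needs extra space.
    # dicts keep insertion order (Python 3.7+), so first occurrences win.
    chars[:] = list(dict.fromkeys(chars))


run_tests(reference_remove_duplicates)
```

Any in-place implementation with the same calling convention can be passed to `run_tests` instead of the reference.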


It seems that I cannot find an ideal solution that runs in less than O(n^2) time and still satisfies all the constraints. OK, let's turn to the answer page...

Well, the standard answer is the O(n^2) one, and the author suggests asking the interviewer what they mean by an additional buffer: can we use an additional array of constant size?
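If the interviewer does allow an array of constant size, a sketch like the following runs in O(n) and works in place on a list of characters. It assumes every character has a code point below 256 (an extended-ASCII character set), so the lookup table always has 256 entries regardless of the input length; `remove_duplicates_fixed_table` is a hypothetical name, and a character outside that range would raise `IndexError`.

```python
def remove_duplicates_fixed_table(chars):
    """Remove repeated characters from chars in place, keeping first occurrences.
    Assumes ord(c) < 256 for every character, so the table has constant size."""
    seen = [False] * 256  # one flag per possible character
    tail = 0              # chars[0:tail] holds the characters kept so far
    for i in range(len(chars)):
        code = ord(chars[i])
        if not seen[code]:
            seen[code] = True
            chars[tail] = chars[i]  # tail <= i, so unread input is never overwritten
            tail += 1
    del chars[tail:]  # drop the leftover slots after the kept prefix


chars = list("abcabc")
remove_duplicates_fixed_table(chars)
print("".join(chars))  # abc
```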

I implemented the brute-force solution as follows:

def removeDuplicates(chars):
    """Remove repeated characters from the list chars in place,
    keeping the first occurrence of each character."""
    if len(chars) < 2:
        return
    i = 0
    while i < len(chars) - 1:
        j = i + 1
        while j < len(chars):
            if chars[i] == chars[j]:
                del chars[j]    # later elements shift left by one
                j = j - 1       # --j is wrong
            j = j + 1           # ++j is wrong
        i = i + 1               # ++i is wrong


if __name__ == '__main__':
    s = 'aaaaaabaaaaa'
    # One cannot change a string in Python,
    # so we convert it to a list for convenience
    str_list = list(s)
    removeDuplicates(str_list)
    print(str_list)             # ['a', 'b']

I ran into an endless loop when I first tried to implement the algorithm above. The wrong code is in the comments: Python has no ++ or -- operator. ++j and --j are parsed as two unary plus or two unary minus operators applied to j, so they leave j unchanged, j never advances, and the inner loop never terminates.
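A tiny illustration of the pitfall (my own snippet, not from the book): `++j` and `--j` are legal Python expressions that simply evaluate to j, so the variable never changes.

```python
j = 5
++j      # parsed as +(+j): evaluates to 5, j is unchanged
--j      # parsed as -(-j): evaluates to 5, j is unchanged
print(j)  # 5

j += 1   # the real increment
print(j)  # 6
```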

Python's del statement lets me delete the duplicated elements directly (each deletion shifts the remaining elements left, so it costs O(n) by itself). However, many languages offer no such convenient tool. For languages without a delete operation, the answer page gives a smart implementation:

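public static void removeDuplicates(char[] str) {
	if (str == null) return;
	int len = str.length;
	if (len < 2) return;

	// str[0..tail-1] holds the distinct characters kept so far
	int tail = 1;

	for (int i = 1; i < len; ++i) {
		int j;
		// look for str[i] in the kept prefix
		for (j = 0; j < tail; ++j) {
			if (str[i] == str[j]) break;
		}
		// not found: append str[i] to the kept prefix
		if (j == tail) {
			str[tail] = str[i];
			++tail;
		}
	}
	// Mark the new end. The guard is needed when nothing was removed
	// (tail == len); otherwise str[len] throws ArrayIndexOutOfBoundsException.
	if (tail < len) str[tail] = 0;
}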

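In my view, the algorithm above is similar to quick sort to some extent, because both of them keep track of the end of a particular segment of the array (here, the prefix of distinct characters kept so far). This strategy reuses memory that won't be read again: once str[i] has been examined, its slot can safely be overwritten.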
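For comparison, here is a sketch of the same `tail` idea ported to Python (my own translation of the answer page's Java code; `remove_duplicates_in_place` is a hypothetical name). It uses no extra buffer and runs in O(n^2). Since a Python list has no terminating 0 character, the sketch cuts off everything after the kept prefix instead of writing `str[tail] = 0`.

```python
def remove_duplicates_in_place(chars):
    """Remove repeated characters from chars in place, keeping first occurrences.
    O(n^2) time, O(1) extra variables, no extra buffer."""
    n = len(chars)
    if n < 2:
        return
    tail = 1  # chars[0:tail] holds the distinct characters kept so far
    for i in range(1, n):
        # scan the kept prefix for chars[i]
        for j in range(tail):
            if chars[i] == chars[j]:
                break
        else:
            # loop finished without break: chars[i] is new, append it
            chars[tail] = chars[i]
            tail += 1
    del chars[tail:]  # plays the role of str[tail] = 0 in the Java version


chars = list("aaaaaabaaaaa")
remove_duplicates_in_place(chars)
print(chars)  # ['a', 'b']
```

The `for ... else` clause stands in for the Java check `j == tail`: the `else` branch runs only when the inner loop did not break.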