Given a string, count the number of distinct subsequences of it (including the empty subsequence). For the uninformed, a subsequence of a string is a new string formed from the original string by deleting some (possibly none) of the characters without disturbing the relative positions of the remaining characters.
For example, "AGH" is a subsequence of "ABCDEFGH" while "AHG" is not.
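To make the definition concrete, here is a minimal Python sketch that checks whether one string is a subsequence of another by scanning the longer string once, left to right. The function name `is_subsequence` is a hypothetical helper, not part of the problem.

```python
def is_subsequence(sub: str, text: str) -> bool:
    """Return True if `sub` can be obtained from `text` by deleting characters."""
    remaining = iter(text)
    # `c in remaining` advances the iterator past the first match of c,
    # so later characters of `sub` can only match later positions of `text`.
    return all(c in remaining for c in sub)


print(is_subsequence("AGH", "ABCDEFGH"))  # True
print(is_subsequence("AHG", "ABCDEFGH"))  # False
```

Greedy matching is enough here: taking the earliest possible match for each character never rules out a match for the characters after it.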
Input
The first line of input contains an integer T, the number of test cases. You are required to process all test cases. Each of the next T lines contains a string s.
Output
Output consists of T lines. The i-th line of the output contains the number of distinct subsequences of the i-th input string. Since this number can be very large, output ans % 1000000007, where ans is the number of distinct subsequences.
Example
Input:
3
AAA
ABCDEFG
CODECRAFT
Output:
4
128
496
Constraints and Limits
T ≤ 100, length(S) ≤ 100000
All input strings shall contain only uppercase letters.
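Before looking at the DP, a brute-force reference can help test a solution on tiny inputs. The Python sketch below (the names `brute_force_count` and `solve` are hypothetical) enumerates every subset of positions, so it takes O(2^n) time and is only usable for very short strings, nowhere near length(S) ≤ 100000. It also parses input in the format above, assuming whitespace-separated tokens.

```python
from itertools import combinations

MOD = 1_000_000_007


def brute_force_count(s: str) -> int:
    """Count distinct subsequences (including the empty one) by enumeration."""
    seen = set()
    for k in range(len(s) + 1):
        # combinations yields index tuples in increasing order,
        # so the relative order of the kept characters is preserved
        for positions in combinations(range(len(s)), k):
            seen.add("".join(s[p] for p in positions))
    return len(seen) % MOD


def solve(raw_input: str) -> list[int]:
    """Parse 'T s1 s2 ... sT' and return one answer per test case."""
    tokens = raw_input.split()
    t = int(tokens[0])
    return [brute_force_count(s) for s in tokens[1 : t + 1]]


print(solve("3\nAAA\nABCDEFG\nCODECRAFT"))  # [4, 128, 496]
```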
It's a classic dynamic programming problem.
Let:
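dp[i] = number of distinct subsequences ending with a[i] that appear for the first time at position i (i.e. that could not already be formed by ending at an earlier occurrence of the same character)
sum[i] = dp[0] + dp[1] + ... + dp[i]. So sum[n] will be your answer.
last[c] = last position at which character c has appeared so far (0 if it has not appeared yet).
A null string has one subsequence (the empty one), so dp[0] = sum[0] = 1. We also take sum[-1] = 0, so nothing is subtracted for a character that appears for the first time.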
read a
n = strlen(a)
sum[0] = 1                                      // the empty subsequence
for i = 1 to n
    dp[i] = sum[i - 1] - sum[last[a[i]] - 1]    // sum[-1] = 0 when a[i] has not appeared before
    sum[i] = sum[i - 1] + dp[i]
    last[a[i]] = i
return sum[n]
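A minimal runnable Python sketch of the pseudocode above, assuming 1-based positions as in the text (so a[i] becomes `a[i - 1]` on a 0-indexed Python string), a dictionary for `last`, and the conventions sum[0] = 1 and sum[-1] = 0. All arithmetic is reduced modulo 1000000007; Python's `%` already returns a non-negative result for a positive modulus, so no extra mod helper is needed. The function name is hypothetical.

```python
MOD = 1_000_000_007


def count_distinct_subsequences(a: str) -> int:
    """Distinct subsequences of `a` (including the empty one), mod MOD.

    Uses 1-based positions: sum_[i] = dp[0] + ... + dp[i], sum[-1] taken as 0.
    """
    n = len(a)
    dp = [0] * (n + 1)
    sum_ = [0] * (n + 1)
    dp[0] = sum_[0] = 1          # the empty subsequence
    last = {}                    # character -> last 1-based position seen
    for i in range(1, n + 1):
        c = a[i - 1]             # a[i] in the 1-based pseudocode
        prev = last.get(c, 0)
        # subsequences already formed at the previous occurrence of c
        overcount = sum_[prev - 1] if prev > 0 else 0
        dp[i] = (sum_[i - 1] - overcount) % MOD
        sum_[i] = (sum_[i - 1] + dp[i]) % MOD
        last[c] = i
    return sum_[n]


for s in ("AAA", "ABCDEFG", "CODECRAFT"):
    print(count_distinct_subsequences(s))  # 4, 128, 496
```

This runs in O(n) time per string, which fits the length(S) ≤ 100000 limit.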
Explanation
dp[i] = sum[i - 1] - sum[last[a[i]] - 1]
Initially, we assume we can append a[i] to every distinct subsequence counted so far, including the empty one (i.e. sum[i - 1]), but this might violate the condition that the counted subsequences must be distinct. Remember that last[a[i]] gives the last position at which a[i] has appeared so far. The only subsequences we overcount are those formed by appending a[i] to subsequences that already existed before that previous occurrence, because they were already counted by position last[a[i]]. So we subtract them (sum[last[a[i]] - 1]).
sum[i] = sum[i - 1] + dp[i]
last[a[i]] = i
Update these values as per their definition.
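As a worked check of the recurrence (an illustrative example of our own, not from the original problem), take a = "ABA" with sum[0] = 1 and sum[-1] = 0:

```latex
\begin{aligned}
i=1\ (\texttt{A}):\quad & \mathit{dp}[1] = \mathit{sum}[0] - \mathit{sum}[-1] = 1 - 0 = 1, & \mathit{sum}[1] &= 1 + 1 = 2 \\
i=2\ (\texttt{B}):\quad & \mathit{dp}[2] = \mathit{sum}[1] - \mathit{sum}[-1] = 2 - 0 = 2, & \mathit{sum}[2] &= 2 + 2 = 4 \\
i=3\ (\texttt{A}):\quad & \mathit{dp}[3] = \mathit{sum}[2] - \mathit{sum}[\mathit{last}[\texttt{A}] - 1] = 4 - \mathit{sum}[0] = 3, & \mathit{sum}[3] &= 4 + 3 = 7
\end{aligned}
```

At i = 3, appending A to the four subsequences counted in sum[2] ("", A, B, AB) gives A, AA, BA, ABA. The plain "A" (A appended to the empty subsequence counted in sum[0]) was already counted at i = 1, so it is subtracted, leaving dp[3] = 3. The total of 7 matches the distinct subsequences "", A, B, AB, AA, BA, ABA.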
If your indexing starts from 0, use a[i - 1] wherever I used a[i]. Also remember to wrap your computations in a mod function, with m = 1000000007, if you're going to submit code. It should be implemented like this:
mod(x) = (x % m + m) % m
This correctly handles negative values (the subtraction in dp[i] can produce one) in languages such as C/C++, where % can return a negative result.
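To see why the extra `+ m` matters, the sketch below emulates C's truncating `%` in Python (Python's own `%` already returns a non-negative result for a positive m, so it would hide the problem). The helper names `c_rem` and `mod` are hypothetical, and `c_rem` assumes m > 0.

```python
M = 1_000_000_007


def c_rem(x: int, m: int) -> int:
    """Remainder with C semantics (truncation toward zero), assuming m > 0:
    the result has the sign of x and lies in (-m, m)."""
    r = abs(x) % m
    return -r if x < 0 else r


def mod(x: int, m: int = M) -> int:
    """mod(x) = (x % m + m) % m, written with C-style % so the result is in [0, m)."""
    return c_rem(c_rem(x, m) + m, m)


print(c_rem(-3, 7))  # -3, what C gives for -3 % 7
print(mod(-3, 7))    # 4
```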
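Also, there is an easier solution to this problem.
The idea is: if all characters of the string are distinct, the total number of subsequences (including the empty one) is 2^n.
Now, if we meet a character that has already occurred, doubling counts some subsequences twice: appending it to any subsequence of the prefix before its previous occurrence gives a string that was already counted at that previous occurrence. We should count each such subsequence only once (otherwise the subsequences would not be distinct), so we subtract the number of subsequences of that earlier prefix, dp[last - 1], where last is the position of the previous occurrence.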
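The doubling idea can be sketched in Python as follows. This is a minimal sketch assuming the same conventions as the implementation that follows (dp[i] counts the distinct subsequences of the first i characters, including the empty one, and a character with no previous occurrence subtracts nothing); the dictionary for `last` and the function name are our own choices.

```python
MOD = 1_000_000_007


def count_by_doubling(s: str) -> int:
    """Distinct subsequences of s (including the empty one), mod MOD.

    dp[i] = number of distinct subsequences of the prefix s[:i].
    """
    n = len(s)
    dp = [0] * (n + 1)
    dp[0] = 1                         # the empty prefix has one subsequence
    last = {}                         # character -> 1-based position of its last occurrence
    for i in range(1, n + 1):
        c = s[i - 1]
        dp[i] = dp[i - 1] * 2 % MOD   # every old subsequence, with or without c
        if c in last:
            # strings ending in c already formed at its previous occurrence:
            # one for each subsequence of the prefix before that occurrence
            dp[i] = (dp[i] - dp[last[c] - 1]) % MOD
        last[c] = i
    return dp[n]


print(count_by_doubling("CODECRAFT"))  # 496
```

Only one array is needed instead of separate dp and sum arrays; the full array is kept because dp[last[c] - 1] can refer to any earlier prefix.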
My implementation is like this:
read s
dp[0] = 1
len = strlen(s)
for (i = 1; i <= len; i++)
{
    dp[i] = mod(dp[i - 1] * 2);
    if (last[s[i]] != 0) dp[i] = mod(dp[i] - dp[last[s[i]] - 1]);
    last[s[i]] = i;