438. Find All Anagrams in a String
Given two strings s and p, return an array of all the start indices of p’s anagrams in s. You may return the answer in any order.
An Anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.
Example 1:
Input: s = “cbaebabacd”, p = “abc”
Output: [0,6]
Explanation:
The substring with start index = 0 is “cba”, which is an anagram of “abc”.
The substring with start index = 6 is “bac”, which is an anagram of “abc”.
Example 2:
Input: s = “abab”, p = “ab”
Output: [0,1,2]
Explanation:
The substring with start index = 0 is “ab”, which is an anagram of “ab”.
The substring with start index = 1 is “ba”, which is an anagram of “ab”.
The substring with start index = 2 is “ab”, which is an anagram of “ab”.
Constraints:
- 1 <= s.length, p.length <= 3 * 10^4
- s and p consist of lowercase English letters.
From: LeetCode
Link: 438. Find All Anagrams in a String
Solution:
Ideas:
1. Character Frequency Count
The algorithm starts by creating two arrays, sCount and pCount, each of size 26 (one slot per lowercase English letter). pCount holds the frequency of each character in p, and sCount holds the frequency of each character in the current window of s, whose size equals the length of p. For example, pCount['a' - 'a'], that is pCount[0], holds the frequency of 'a' in p.
2. Sliding Window Technique
The core of the algorithm involves sliding a window of size equal to the length of p across s. At each step, it updates the sCount array to reflect the frequency of characters within this window. Because an anagram of p must have the same length as p, these windows are the only candidates, so the algorithm checks every substring of s that could be an anagram of p.
3. Comparison and Collection of Results
As the window slides, the algorithm compares the sCount and pCount arrays. If these arrays are identical (i.e., if memcmp(pCount, sCount, sizeof(pCount)) == 0), the characters within the current window of s have exactly the same frequencies as those in p, so the substring is an anagram of p. The start index of each such window is stored in an array named indices. Each comparison covers 26 integers, so its cost does not depend on the length of s or p.
4. Dynamic Memory Allocation
The indices array, which stores the start indices of all anagrams found in s, is dynamically allocated with an initial size equal to the length of s. This is a safe upper bound, since s contains at most s.length - p.length + 1 windows. After all indices are collected, the array is shrunk to the actual number of indices found using realloc, so the caller does not hold on to unused memory.
5. Efficient Window Update
To move the window forward, the algorithm decreases the count of the character that is exiting the window and increases the count of the character entering the window. This update is done in constant time and avoids having to recount characters from scratch for every new window position.
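The counting, comparison, and window-update steps described above can be sketched in isolation. This is a minimal illustration, assuming lowercase input as the constraints state; the helper names count_letters, same_counts, and slide_window are hypothetical and do not appear in the solution itself.

```c
#include <string.h>

/* Hypothetical helper: count the letters in the first n characters of str.
   Assumes str contains only 'a'..'z', as the problem constraints state. */
static void count_letters(const char *str, int n, int counts[26]) {
    memset(counts, 0, 26 * sizeof(int));
    for (int i = 0; i < n; i++) {
        counts[str[i] - 'a']++;
    }
}

/* Hypothetical helper: returns 1 when both frequency tables are identical. */
static int same_counts(const int a[26], const int b[26]) {
    return memcmp(a, b, 26 * sizeof(int)) == 0;
}

/* Hypothetical helper: slide the window one step to the right.
   `out` is the character leaving on the left, `in` the one entering
   on the right. Two array updates, regardless of window length. */
static void slide_window(int counts[26], char out, char in) {
    counts[out - 'a']--;
    counts[in - 'a']++;
}
```

For s = "cbaebabacd" and p = "abc", the first window "cba" has the same table as p. Calling slide_window with out = 'c' and in = 'e' turns it into the table for "bae", which no longer matches.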
Code:
#include <stdlib.h>
#include <string.h>

/**
 * Note: The returned array must be malloced, assume caller calls free().
 */
int* findAnagrams(char* s, char* p, int* returnSize) {
    int sLen = (int)strlen(s), pLen = (int)strlen(p);
    *returnSize = 0;
    if (sLen < pLen) {
        return NULL;
    }

    int sCount[26] = {0}, pCount[26] = {0};
    // Count the letters of p and of the first window of s
    for (int i = 0; i < pLen; i++) {
        pCount[p[i] - 'a']++;
        sCount[s[i] - 'a']++;
    }

    // At most sLen - pLen + 1 windows exist, so sLen slots are always enough
    int* indices = (int*)malloc(sLen * sizeof(int));
    if (indices == NULL) {
        return NULL;
    }
    int indicesCount = 0;

    // Slide a window of length pLen across s
    for (int i = 0; i <= sLen - pLen; i++) {
        // Matching counts mean the window is an anagram of p: save its start index
        if (memcmp(pCount, sCount, sizeof(pCount)) == 0) {
            indices[indicesCount++] = i;
        }
        // Move the window: drop s[i], then add s[i + pLen] if it exists
        sCount[s[i] - 'a']--;
        if (i + pLen < sLen) {
            sCount[s[i + pLen] - 'a']++;
        }
    }

    *returnSize = indicesCount;
    // Shrink to the number of indices found. A zero-size realloc is not
    // portable, so skip it when nothing was found.
    if (indicesCount > 0) {
        int* shrunk = (int*)realloc(indices, indicesCount * sizeof(int));
        if (shrunk != NULL) {
            indices = shrunk;
        }
    }
    return indices;
}
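As an alternative to comparing the two count arrays with memcmp at every position, a single difference table can be kept together with a counter of how many letters currently disagree. The sketch below is a different technique from the solution above, not a restatement of it. The names findAnagramsDiff, diff, and mismatched are hypothetical, and it assumes lowercase input as the constraints state.

```c
#include <stdlib.h>
#include <string.h>

/* Variant sketch (hypothetical name): a window is an anagram of p exactly
   when no letter has a nonzero difference between window and p counts.
   The caller frees the returned array. */
int *findAnagramsDiff(const char *s, const char *p, int *returnSize) {
    size_t sLen = strlen(s), pLen = strlen(p);
    *returnSize = 0;
    if (pLen == 0 || sLen < pLen) {
        return NULL;
    }

    /* diff[c] = (count of c in the window) - (count of c in p) */
    int diff[26] = {0};
    for (size_t i = 0; i < pLen; i++) {
        diff[s[i] - 'a']++;
        diff[p[i] - 'a']--;
    }

    /* Number of letters whose counts currently differ */
    int mismatched = 0;
    for (int c = 0; c < 26; c++) {
        if (diff[c] != 0) {
            mismatched++;
        }
    }

    /* There are exactly sLen - pLen + 1 windows */
    int *indices = malloc((sLen - pLen + 1) * sizeof *indices);
    if (indices == NULL) {
        return NULL;
    }

    int count = 0;
    for (size_t i = 0;; i++) {
        if (mismatched == 0) {
            indices[count++] = (int)i;
        }
        if (i + pLen >= sLen) {
            break; /* no character left to bring into the window */
        }
        int out = s[i] - 'a';       /* letter leaving on the left */
        int in = s[i + pLen] - 'a'; /* letter entering on the right */

        /* Adjust the mismatch counter whenever an entry leaves or reaches 0 */
        if (diff[out] == 0) mismatched++;
        diff[out]--;
        if (diff[out] == 0) mismatched--;

        if (diff[in] == 0) mismatched++;
        diff[in]++;
        if (diff[in] == 0) mismatched--;
    }

    *returnSize = count;
    return indices;
}
```

Both versions run in time linear in the length of s. This one replaces the 26-integer comparison per window with a single integer test, at the cost of slightly more bookkeeping in the update step.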