
Description
The task is to find all 10-letter-long sequences (substrings) that occur more than once in a given string. Each sequence should be returned as a list of strings. For example, given the string "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
, the repeated 10-letter sequences are "AAAAACCCCC"
and "CCCCCAAAAA"
.
Intuition
The problem is about detecting repeating sequences within a string. We need to identify and list substrings of exactly 10 characters that appear more than once. The key challenge is efficiently tracking and comparing these substrings to find duplicates.
Approach
- Handle Short Strings: - If the length of the input string is less than 10, return an empty list immediately since no 10-letter-long sequence can exist.
- Initialize Data Structures: - Use a map (or hash table) to keep track of the occurrences of each 10-letter substring. - Use a list to collect substrings that occur more than once.
- Iterate Through the String: - Loop through the string, extracting each possible 10-letter substring. - Update the map with the count of each substring. - If a substring is encountered for the second time, add it to the result list.
- Return Results: - After processing the string, return the list of repeated substrings.
Complexity
Time Complexity:
O(n)
, where n
is the length of the input string. The algorithm processes each character of the string once and performs constant-time operations for each substring.
Space Complexity:
O(n)
, where n
is the length of the input string. Space is used for the map and result list, which store occurrences of substrings and repeated substrings respectively.
Code
C++
class Solution {
public:
vector findRepeatedDnaSequences(string s) {
vector result;
if (s.length() < 10) return result; // Handle short strings
unordered_map map;
for (int i = 0; i <= s.length() - 10; ++i) {
string substring = s.substr(i, 10);
if (map.count(substring)) {
if (map[substring] == 1) {
result.push_back(substring);
}
map[substring]++;
} else {
map[substring] = 1;
}
}
return result;
}
};
Python
class Solution:
def findRepeatedDnaSequences(self, s: str) -> list[str]:
map = {}
result = []
for i in range(len(s) - 9):
substring = s[i:i + 10]
if substring in map:
if map[substring] == 1:
result.append(substring)
map[substring] += 1
else:
map[substring] = 1
return result
Java
class Solution {
public List findRepeatedDnaSequences(String s) {
Map map = new HashMap<>();
List result = new ArrayList<>();
for (int i = 0; i <= s.length() - 10; i++) {
String substring = s.substring(i, i + 10);
if (map.containsKey(substring)) {
if (map.get(substring) == 1) {
result.add(substring);
}
map.put(substring, map.get(substring) + 1);
} else {
map.put(substring, 1);
}
}
return result;
}
}
JavaScript
var findRepeatedDnaSequences = function (s) {
let map = new Map();
let result = [];
for (let i = 0; i <= s.length - 10; i++) {
let substring = s.substring(i, i + 10);
if (map.has(substring)) {
if (map.get(substring) === 1) {
result.push(substring);
}
map.set(substring, map.get(substring) + 1);
} else {
map.set(substring, 1);
}
}
return result;
};