187. Repeated DNA Sequences

Dare2Solve

Dare2Solve

187. Repeated DNA Sequences
SAMSUNG 49-Inch Odyssey G9
SAMSUNG 49-Inch Odyssey G9
Because earth is not flat

Description

The task is to find all 10-letter-long sequences (substrings) that occur more than once in a given string. Each sequence should be returned as a list of strings. For example, given the string "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", the repeated 10-letter sequences are "AAAAACCCCC" and "CCCCCAAAAA".

Intuition

The problem is about detecting repeating sequences within a string. We need to identify and list substrings of exactly 10 characters that appear more than once. The key challenge is efficiently tracking and comparing these substrings to find duplicates.

Approach

  1. Handle Short Strings:

    • If the length of the input string is less than 10, return an empty list immediately since no 10-letter-long sequence can exist.
  2. Initialize Data Structures:

    • Use a map (or hash table) to keep track of the occurrences of each 10-letter substring.
    • Use a list to collect substrings that occur more than once.
  3. Iterate Through the String:

    • Loop through the string, extracting each possible 10-letter substring.
    • Update the map with the count of each substring.
    • If a substring is encountered for the second time, add it to the result list.
  4. Return Results:

    • After processing the string, return the list of repeated substrings.

Complexity

Time Complexity:

O(n), where n is the length of the input string. The algorithm processes each character of the string once and performs constant-time operations for each substring.

Space Complexity:

O(n), where n is the length of the input string. Space is used for the map and result list, which store occurrences of substrings and repeated substrings respectively.

Code

C++

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        vector<string> result;
        if (s.length() < 10) return result; // Handle short strings

        unordered_map<string, int> map;

        for (int i = 0; i <= s.length() - 10; ++i) {
            string substring = s.substr(i, 10);
            if (map.count(substring)) {
                if (map[substring] == 1) {
                    result.push_back(substring);
                }
                map[substring]++;
            } else {
                map[substring] = 1;
            }
        }
        
        return result;
    }
};

Python

class Solution:
    def findRepeatedDnaSequences(self, s: str) -> list[str]:
        map = {}
        result = []

        for i in range(len(s) - 9):
            substring = s[i:i + 10]
            if substring in map:
                if map[substring] == 1:
                    result.append(substring)
                map[substring] += 1
            else:
                map[substring] = 1
        
        return result

Java

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        Map<String, Integer> map = new HashMap<>();
        List<String> result = new ArrayList<>();

        for (int i = 0; i <= s.length() - 10; i++) {
            String substring = s.substring(i, i + 10);
            if (map.containsKey(substring)) {
                if (map.get(substring) == 1) {
                    result.add(substring);
                }
                map.put(substring, map.get(substring) + 1);
            } else {
                map.put(substring, 1);
            }
        }
        
        return result;
    }
}

JavaScript

var findRepeatedDnaSequences = function (s) {

    let map = new Map();
    let result = [];

    for (let i = 0; i <= s.length - 10; i++) {
        let substring = s.substring(i, i + 10);

        if (map.has(substring)) {
            if (map.get(substring) === 1) {
                
                result.push(substring);
            }
            map.set(substring, map.get(substring) + 1);
        } else {
            map.set(substring, 1);
        }
    }
    return result;

};