All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: ACGAATTCCG. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT, Return: [AAAAACCCCC, CCCCCAAAAA].解題報告:標准字符轉二進制的hash table。直接用string進行存儲會內存超時。
采用( s[i] - 'A' + 1 ) % 5 )把A,C,G,T,轉換為1,3,2,0;t = ( (t<<2) & 0xfffff) | ( ( s[i] - 'A' + 1 ) % 5 );進行20位的存貯。
class Solution {
public:
vector findRepeatedDnaSequences(string s) {
vector result;
unordered_map str;
unordered_map::iterator it;
int t;
for (int i = 0; i < s.size() ;i++) {
t = ( (t<<2) & 0xfffff) | ( ( s[i] - 'A' + 1 ) % 5 );
if (i < 9) continue;
it = str.find(t);
if (it == str.end())
str[t] = 1;
else {
if (str[t] == 1) {
str[t] = 2;
result.push_back(s.substr(i-9,10));
}
}
}
return result;
}
};