For future study abord, I decide to practise wiriting in English, starting from technical articles.

# Repeated DNA Sequences

Problem: All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG".When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.)

For example,

Given

`s ="AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"`

Output:

`["AAAAACCCCC", "CCCCCAAAAA"]`

This problem can recognized as the string matching problem from the first look. The instinctive solution is to try every potional string pairs exhaustivity, but this should be very slow. There are \(N\) length DNA, which means the potional substring is \(N-10\), as a result, there are going to be need \(10(N-10)^2\) compares. It's \(O(n^2)\) !

Actually, there are some other possbile solutions, which is quicker than the \(O(n^2)\).

Observing this problem carefully, we can find that there are only 4 different characters that needed to be compared. Why shouldn't we make a map, mapping the character to the binary code, then we can compare numbers instead of characters.

A \(10\) length DNA sequence can be mapped into a integer and we only need to compare the integer.

More specifically, our maps are: `A=00`

,`C=01`

,`G=10`

and `T=11`

, and the substring `AAAAACCCCC`

can be represented as `0101010101=341`

. This will be much more convient for us to compare substrings.

**Mappings** is the core idea for this problem and the rest is implementing:

```
#!cpp
vector<string> findRepeatedDnaSequences(string s)
{
vector<string> reval;
unordered_set<int> tmp;
if(s.size() <= 10)
return reval;
unordered_set<int> container;
int i,j;
int num = init(s);
container.insert(num);
for(i = 1, j = 10; j < s.size(); ++i,++j)
{
num = num << 2;
num = num | map(s[j]);
num = num & 1048575;
if(container.find(num) != container.end() && tmp.find(num) == tmp.end())
{
reval.push_back(s.substr(i,10));
tmp.insert(num);
}
container.insert(num);
}
return reval;
}
int init(string s)
{
int reval = 0;
for(int i = 0;i < 10; ++i)
{
reval = reval << 2;
reval = reval | map(s[i]);
}
return reval;
}
int map(char DNA)
{
if (DNA == 'A')
return 0;
else if(DNA == 'C')
return 1;
else if(DNA == 'G')
return 2;
else if(DNA == 'T')
return 3;
else
return 0;
}
```

# Number of 1 Bits

Problem: Write a function that takes an unsigned integer and returns the number of`1`

bits it has (also known as the Hamming weight).For example, the 32-bit integer

`11`

has binary representation`00000000000000000000000000001011`

, so the function should return 3.)

This should be easy, we only need to ues the shift operation to check out how many `1`

in the given integer. We need 32 shifts at most.

The code is easy and short:

```
#!cpp
int hammingWeight(uint32_t n) {
uint32_t m = n;
int count = 0;
while(m > 0)
{
count += (m & 1);
m = m >> 1;
}
return count;
}
```

# Reverse Bits

Problem: Reverse bits of a given 32 bits unsigned integer.For example

Given input

`43261596`

(represented in binary as`00000010100101000001111010011100`

),return

`964176192`

(represented in binary as`00111001011110000010100101000000`

).

This should be easy too.

Due to the fact that `X & 1 = X`

and `X | 0 = X`

, in which `X`

can be both `0`

and `1`

.

We can use `X & 1 = X`

to get the last bit of the given integer and use `X | 0 = X`

to give the bit to new integer.

We use two different ways beacuse when we 'cut off'(using `X & 1 = X`

) the last bit of given integer, we should make sure that other bits become `0`

(`using X & 0 = 0`

), and when we 'attach'(using `X | 0 = X`

) the bit to the new integer, we also should make sure the other bits are uneffected(using `X | 0 = 0`

).

The codes are following:

```
#!cpp
uint32_t reverseBits(uint32_t n) {
uint32_t x = n;
uint32_t y = 0;
uint32_t t;
int i = 0;
while(true)
{
t = x & 1;
y = y | t;
i += 1;
if(i >= 32)
{
break;
}
x = x >> 1;
y = y << 1;
}
return y;
}
}
```

###### Comments

So what do you think? Did I miss something? Is any part unclear? Leave your comments below