Analysis of Sequence Population in a High Performance Computing Environment

Student(s):  Chartese Jones, Mississippi Valley State University


Modern pyrosequencing techniques have made it possible to study complex bacterial populations directly from environmental or clinical samples, for example by sequencing 16S rRNA genes. The resulting data sets contain many duplicate sequences, which lead to redundant calculations.

Redundant Sequence Identification - In many large samples of biosequences that I have worked with, I have observed that many of the sequences are repeats. By identifying these repeats up front and tracking where each one occurs, we can run each calculation once per unique sequence instead of once per read, which reduces overall computation time and improves analysis throughput.
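
The idea of identifying repeats up front can be sketched in Python as follows. This is a minimal illustration only, not the method used in the project: it assumes that "repeat" means an exact match after trimming whitespace and normalizing case, and the names dereplicate, analyze_unique, and gc_content are hypothetical, chosen for this example.

```python
def dereplicate(sequences):
    """Group exact duplicates (assumption: case-insensitive exact match).

    Returns a dict mapping each unique sequence to the list of
    positions at which it appeared in the input.
    """
    groups = {}
    for index, seq in enumerate(sequences):
        key = seq.strip().upper()
        groups.setdefault(key, []).append(index)
    return groups


def analyze_unique(sequences, analyze):
    """Run `analyze` once per unique sequence and copy the result
    back to every position where that sequence occurred."""
    groups = dereplicate(sequences)
    results = [None] * len(sequences)
    for seq, positions in groups.items():
        value = analyze(seq)  # computed once per unique sequence
        for i in positions:
            results[i] = value
    return results


def gc_content(seq):
    """Hypothetical stand-in for an expensive per-sequence analysis."""
    return (seq.count("G") + seq.count("C")) / len(seq)


reads = ["ACGT", "acgt", "GGCC", "ACGT"]
groups = dereplicate(reads)
scores = analyze_unique(reads, gc_content)
```

In this sketch the four reads collapse to two unique sequences, so the stand-in analysis runs twice rather than four times. Keeping the original positions in the mapping lets results be reported per read, and the length of each position list gives the abundance of that sequence.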