
   Sample Results

   Our tool generates lists of sub-strings of length N from the DNA input file. Below is a very small sample of the sub-strings of length 30 from the Sulfolobus DNA file.

   The Sulfolobus DNA input file contains 2,694,756 characters drawn from the alphabet {A, C, T, G}.

   Our scanning code extracts every sub-string of length N from the file, stores it in the tree, and records each location in the file where that sub-string occurs.
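
   As a rough illustration of the scanning step, the sketch below collects every sub-string of length N together with the offsets where it starts. It is not the tool's actual code: a plain Python dictionary stands in for the tree, the name build_index is hypothetical, and offsets are assumed to be zero-based.

```python
from collections import defaultdict


def build_index(dna, n):
    """Map each length-n sub-string of dna to the offsets where it starts.

    Assumption: a dict replaces the tree described in the text, and
    offsets are zero-based. Both choices are illustrative only.
    """
    index = defaultdict(list)
    # Slide a window of width n across the input, one character at a time.
    for start in range(len(dna) - n + 1):
        index[dna[start:start + n]].append(start)
    return dict(index)


# Toy input (not taken from the Sulfolobus file).
index = build_index("ACGTACGTAC", 4)
print(index["ACGT"])  # [0, 4]
print(index["TACG"])  # [3]
```

   An exact-match search is then a single lookup such as index.get("ACGT"), which returns the recorded offsets or None when the sub-string does not occur.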

   Searching

   Our tool currently supports exact-match searches for sub-strings of length N in the tree. If the string is found, a message is returned to the user listing its locations in the text file along with the "span" between them. The span is the distance between two consecutive matches in the DNA file.

   We also plan to provide graphical results to the user.

   Sample Search Result

   Found: TTTTATAACTTTTTACTTTATTTAGTTATT.

   Locations: 824759 1020445 (Span:195686) 1166209 (Span:145764) 1543241 (Span:377032) 1773710 (Span:230469) 1923991 (Span:150281) 2001272 (Span:77281) 2027882 (Span:26610) 2240658 (Span:212776)
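
   The spans in the sample result above are the differences between neighbouring locations. A minimal sketch of that arithmetic follows, using the locations exactly as reported; the function name spans is hypothetical and this is not the tool's own code.

```python
def spans(locations):
    """Return the distance between each pair of consecutive match locations."""
    return [later - earlier for earlier, later in zip(locations, locations[1:])]


# Locations copied from the sample search result.
locations = [824759, 1020445, 1166209, 1543241, 1773710,
             1923991, 2001272, 2027882, 2240658]

print(spans(locations))
# [195686, 145764, 377032, 230469, 150281, 77281, 26610, 212776]
```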

   Small sampling of sub-strings of length N from DNA input file

   AAAAAAAAAAAAGATGAACAAAATTCAGA
   AAAAAAAAAAAGATGAACAAAATTCAGAA
   AAAAAAAAAAAGATGAGTTTAACATCTGC
   AAAAAAAAAAAGTTGAATTGACAGAAGAC
   AAAAAAAAAACAAATTTAAAAAAATTCCA
   AAAAAAAAAACTGGAAGCGCTTAGCATAA
   AAAAAAAAAAGATGAACAAAATTCAGAAA
   AAAAAAAAAAGATGAGTTTAACATCTGCA
   AAAAAAAAAAGTTGAATTGACAGAAGACG
   AAAAAAAAAATCAACGATTCTCTCAATAA
   . . .

   TTTTTTTTTCATAATAAAAAGTCATAGAA
   TTTTTTTTTCTTTTAATCTGCTTTTATTT
   TTTTTTTTTTAAAAAAAAGAGCGTTAAAC
   TTTTTTTTTTAATATGGAATTTCTTTCAC
   TTTTTTTTTTTAATATGGAATTTCTTTCA
   TTTTTTTTTTTTAATATGGAATTTCTTTC
   TTTTTTTTTTTTTAATATGGAATTTCTTT

   Sub-string uniqueness

   In this particular data file, a high proportion of the sub-strings of length 30 are unique. Our tool reports the number of unique sub-strings out of the total number of sub-strings. The data shows that 98.81% of all sub-strings are unique in the data file, with only a small number of duplicates.

   Unique sub-strings:   2,662,746

   Total sub-strings:     2,694,726
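
   The percentage quoted above follows directly from the two counts. The sketch below redoes that arithmetic and shows one way such counts could be gathered on a toy input. It assumes that "unique" means the number of distinct sub-strings, which the text does not state explicitly; the function names are hypothetical.

```python
def count_substrings(dna, n):
    """Return (distinct, total) counts of the length-n sub-strings of dna.

    Assumption: "unique" is read as "distinct"; the tool may define it
    differently.
    """
    total = max(len(dna) - n + 1, 0)
    distinct = len({dna[i:i + n] for i in range(total)})
    return distinct, total


def percent_unique(unique, total):
    """Share of unique sub-strings, as a percentage."""
    return 100.0 * unique / total


# Toy input (not taken from the Sulfolobus file).
print(count_substrings("ACGTACGTAC", 4))  # (4, 7)

# Counts as reported for the Sulfolobus file.
print(f"{percent_unique(2_662_746, 2_694_726):.2f}%")  # 98.81%
```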