DNA Scout - User Guide

As a DNA Scout user, here are some things to know:

==========================================================================================================================================================
*Input File*

You'll need to provide a DNA file consisting of {ACTG} characters. A sample file, "dna.txt", is provided with the tool. Additional DNA files can be obtained from NCBI; however, they will need to be translated from ASN.

==========================================================================================================================================================
*Configuring DNA Scout*

DNA Scout currently supports several search modes. Each can be enabled or disabled individually in "options.h" by changing the #define statements. The options, in detail, are:

DEBUG: Enables debug output, which is only useful to DNA Scout developers.
PRINT_TREE_ENABLED: Prints the tree entries in alphabetical order.
PARALLEL_SEARCH: Enables a pthread-based parallel exact match search.
EXACT_SEARCH_ENABLED: Performs an exact match single-threaded search.
SEARCHFILE_INPUT_ENABLED: Allows the user to put all searches in a text file (searchStrings.txt).

==========================================================================================================================================================
*Memory*

Large DNA files consume large amounts of memory. Each character in the input file consumes at least 20 bytes once it is loaded into memory. The total memory required also depends on the length of the match desired.

==========================================================================================================================================================
*Searching*

All DNA variants of a search string are checked automatically. For example, a search for "ACTG" also checks the reverse, "GTCA", as well as the other DNA strand (the complement) in both forward and reverse order.
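The four variants can be sketched in C as shown here. This is a minimal illustration, not DNA Scout's actual code: the function names (reverse_sequence, flip_sequence, flip_reverse_sequence) are hypothetical, and the sketch assumes that "flipped" means the base-wise complement (A<->T, C<->G), which is consistent with the sample output in this guide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map a base to its complement (A<->T, C<->G).
 * Assumption: any other character is passed through unchanged. */
static char complement_base(char base)
{
    switch (base) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    default:  return base;
    }
}

/* Write the reverse of src into dst.
 * dst must hold at least strlen(src) + 1 bytes. */
static void reverse_sequence(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    size_t len = strlen(src);
    for (size_t i = 0; i < len; i++)
        dst[i] = src[len - 1 - i];
    dst[len] = '\0';
}

/* Write the complement ("flipped") strand of src into dst. */
static void flip_sequence(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    size_t len = strlen(src);
    for (size_t i = 0; i < len; i++)
        dst[i] = complement_base(src[i]);
    dst[len] = '\0';
}

/* Write the complement of src, read backwards ("flipped and reversed"). */
static void flip_reverse_sequence(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    size_t len = strlen(src);
    for (size_t i = 0; i < len; i++)
        dst[i] = complement_base(src[len - 1 - i]);
    dst[len] = '\0';
}
```

Calling these helpers on "TATATT" yields "TTATAT", "ATATAA" and "AATATA", the same three strings that appear as the reversed, flipped, and flipped and reversed variants in the sample results in this guide.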
-Exact match searches-

Ensure that EXACT_SEARCH_ENABLED is enabled in options.h and that you've recompiled since changing that option.

Example syntax:
./binningTree2 -t sequential -f dna.txt -s ATAAATTATAGACATACACAAGTACAC

Example results where only one match is found for a given string:

-------------------
Exact string match search
Searching for original string: ACTCTGAAACGTTCCTTATCTCTAACCGAGTCTTTGATTTTAATGTCAG
--> Found: ACTCTGAAACGTTCCTTATCTCTAACCGAGTCTTTGATTTTAATGTCAG. 935
Searching for reversed string: GACTGTAATTTTAGTTTCTGAGCCAATCTCTATTCCTTGCAAAGTCTCA
--> Unable to find GACTGTAATTTTAGTTTCTGAGCCAATCTCTATTCCTTGCAAAGTCTCA.
Searching for flipped string: TGAGACTTTGCAAGGAATAGAGATTGGCTCAGAAACTAAAATTACAGTC
--> Unable to find TGAGACTTTGCAAGGAATAGAGATTGGCTCAGAAACTAAAATTACAGTC.
Searching for flipped and reversed string: CTGACATTAAAATCAAAGACTCGGTTAGAGATAAGGAACGTTTCAGAGT
--> Unable to find CTGACATTAAAATCAAAGACTCGGTTAGAGATAAGGAACGTTTCAGAGT.

The first result denotes that the string was found and that the match starts at character 935 of the input file.

Example results where multiple matches are found for the string. Note that the first search string, "TATATT", matches twice, at characters 401 and 702 in the input file. The difference between these two locations is calculated automatically and reported as (Span:301).

./binningTree2 -t sequential -f dna.txt -s TATATT

-------------------
Exact string match search
Searching for original string: TATATT
--> Found: TATATT. 401 702 (Span:301)
Searching for reversed string: TTATAT
--> Found: TTATAT. 400 701 (Span:301)
Searching for flipped string: ATATAA
--> Found: ATATAA. 420
Searching for flipped and reversed string: AATATA
--> Found: AATATA. 419

-Partial Matches-

The example in this section looks for characters 0 through 4 (inclusive) of the search string GTTTGTTTATT. Thus, the search will be for GTTTG and its variants.
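The substring selection for a partial match can be sketched in C as shown here. This is only an illustration under stated assumptions: extract_range is a hypothetical helper (not a DNA Scout function), and the start and end positions are treated as 0-based and inclusive, which matches the 0 through 4 range producing the five characters GTTTG.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy characters start..end (0-based, inclusive) of src into dst.
 * dst must hold at least end - start + 2 bytes.
 * Returns the number of characters copied, or 0 (with dst set to the
 * empty string) if the range is invalid for src.
 */
static size_t extract_range(const char *src, size_t start, size_t end, char *dst)
{
    assert(src != NULL && dst != NULL);
    size_t len = strlen(src);

    if (start > end || end >= len) {
        dst[0] = '\0';
        return 0;
    }

    size_t count = end - start + 1;   /* inclusive range */
    memcpy(dst, src + start, count);
    dst[count] = '\0';
    return count;
}
```

With src set to "GTTTGTTTATT", start 0 and end 4, the helper copies 5 characters and dst holds "GTTTG".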
./binningTree2 dna-short.txt GTTTGTTTATT 0 4
#./binningTree2 -t partial -f dna-short.txt -s GTTTGTTTATT -start 0 -end 4

PARTIAL MATCH SEARCH
Searching for 5 characters (characters 0 to 4) from the Orig. String of GTTTGTTTATT
Searching for substring: GTTTG
Found: GTTTG in the tree at level 4
Searching for subReversed string: GTTTG
Found: GTTTG in the tree at level 4
Searching for subFlipped string: CAAAC
Found: CAAAC in the tree at level 4
Searching for subFlippedReversed string: CAAAC
Found: CAAAC in the tree at level 4

-Parallel Search-

./binningTree2 -t parallel -f dna.txt -s ATAAATTATAGACATACACAAGTACAC

-Putting search strings into files-

Populate searchStrings.txt with the search strings, separated by spaces or one per line. DNA Scout currently has a limitation: all search strings must be the same length, and a search string of that same length must also be passed in as the second argument.

./binningTree2 -t exactfile -f dna.txt
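Because every search string must have the same length, it can help to validate the contents of searchStrings.txt before running a search. The C sketch below shows one way to do that. It is not part of DNA Scout: check_search_strings and MAX_SEARCH_LEN are hypothetical names, and the sketch assumes the file contents have already been read into a NUL-terminated buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_SEARCH_LEN 255   /* hypothetical upper bound for this sketch */

/*
 * Scan whitespace-separated search strings in text and verify that each
 * one is exactly expected_len characters long.
 * Returns the number of strings found, or -1 if any string has a
 * different length. Strings longer than MAX_SEARCH_LEN are split by
 * sscanf, so they are not handled by this sketch.
 */
static int check_search_strings(const char *text, size_t expected_len)
{
    char token[MAX_SEARCH_LEN + 1];
    int consumed = 0;
    int count = 0;

    assert(text != NULL);

    /* %s skips spaces and newlines alike, so both file layouts are
     * accepted; %n records how far to advance for the next token. */
    while (sscanf(text, "%255s%n", token, &consumed) == 1) {
        if (strlen(token) != expected_len)
            return -1;
        count++;
        text += consumed;
    }
    return count;
}
```

For the buffer "ACTG GGCC\nTTAA\n" and an expected length of 4, the function returns 3. If any string has a different length, it returns -1.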