The overall goal of this procedure is to explore mid-range non-randomness, also called inhomogeneity, of a user-specified genomic sequence. This is accomplished by first examining the oligonucleotide composition of an input sequence and generating a frequency table of the oligonucleotides comprising the sequence. This step is performed by invoking the SRI Analyzer program.
The second step is to produce random sequences that have the same oligonucleotide composition as the input sequence. This is achieved by invoking the SRI Generator program. The third step of the procedure is to find segments within the input and randomized sequences that are significantly enriched in a particular nucleotide or nucleotide combination, for example, GC or AG.
This step is performed by the MRI Analyzer program. The final step of the procedure is to download the files containing the sequences of all MRI regions detected by the program. Ultimately, results can be obtained showing that a majority of genomic regions of multicellular eukaryotes are significantly enriched in segments with base compositional extremes, detected for each of the four nucleotides or any of their combinations.
These inhomogeneous regions are associated with unusual DNA conformations and/or particular DNA properties. This method can help answer key questions in the genomics field, such as finding potentially functional DNA elements within vast areas of non-coding sequence, including intergenic regions and introns. Though this method can provide insights into mammalian genomes, it can also be applied to other organisms such as invertebrates, plants, fungi, and bacteria.
Open the homepage of the online genomic mid-range inhomogeneity (genomic MRI) package at www.bioinfo.utoledo.edu/gri/. The web resource also provides detailed information on the programs in the Help / How-to / Readme link, while all published materials on genomic MRI and similar algorithms are listed in the Links to Relevant Resources link.
To start a genomic MRI analysis session, create a file with a FASTA-formatted sequence. A FASTA-formatted sequence begins with the greater-than symbol (>), followed by a unique identifier or name, with the sequence on the following lines. Only A, T, G, and C nucleotides are processed by genomic MRI, although other characters are allowed in the input.
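To make the input format concrete, here is a minimal Python sketch that reads FASTA text and keeps only A, T, G, and C. The function name `read_fasta` and the choice to upper-case lowercase letters before filtering are assumptions for illustration; this is not the genomic MRI server's own parser.

```python
def read_fasta(text):
    """Return a dict mapping each record identifier to its sequence.

    Hypothetical behavior (assumption): letters are upper-cased, then any
    character other than A, C, G, T (e.g. N) is dropped, mirroring the rule
    that only A, T, G and C are processed. Lines before the first '>' header
    are ignored.
    """
    records = {}
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # identifier = first whitespace-delimited token after '>'
            name = line[1:].split()[0] if len(line) > 1 else ""
            chunks = []
        elif name is not None:
            chunks.append("".join(c for c in line.upper() if c in "ACGT"))
    if name is not None:
        records[name] = "".join(chunks)
    return records


example = ">seq1 demo\nACGTNNacgt\nGGCC\n>seq2\nTTAA\n"
print(read_fasta(example))  # {'seq1': 'ACGTACGTGGCC', 'seq2': 'TTAA'}
```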
The sequence file for the PKD1 gene is used to demonstrate the session. Press the Start button. This opens a new webpage, which gives the option to copy and paste the input sequence into the provided window or, if the file is large, to upload it.
To upload a file, click the Choose File button and browse to the desired file. Then click the Start This Session with This File button. Confirmation that the file was successfully uploaded is displayed at the top of the page.
Below this message is an identifier of the current session, which in this case is C8EP06. To analyze the short-range inhomogeneity of the input sequence, click the Analyze Short-Range Inhomogeneity button. The program opens a new page designed specifically for execution of the SRI Analyzer program.
Here we need to choose the maximal length of oligonucleotides to be examined. Since our input sequence is of medium size, approximately 50,000 nucleotides, we select 4-mers as the highest length of oligonucleotides for which frequencies will be calculated. Finally, press the Analyze File button.
Immediately, the program computes the frequencies of all oligonucleotides of the chosen length and shorter lengths within the input sequence. To see the frequencies of all oligonucleotides, click on the Download Composition File link. This file, named userfile.com, represents a table with three columns.
The first column specifies the oligonucleotides, the second their relative frequencies, and the third the number of occurrences of each oligonucleotide within the input sequence.
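A minimal Python sketch of such a three-column table (oligonucleotide, relative frequency, count) for all lengths up to a chosen maximum follows. It assumes that relative frequency means the count divided by the number of overlapping positions of that length; the SRI Analyzer's exact normalization and file layout are not stated in this walkthrough. For scale, a 50,000-nucleotide sequence has 4^4 = 256 possible 4-mers, so each one occurs about 195 times on average.

```python
from itertools import product


def composition_table(seq, max_k):
    """Return rows (oligo, relative_frequency, count) for every
    oligonucleotide of length 1..max_k, overlapping occurrences counted.

    Assumption: relative frequency = count / (len(seq) - k + 1); the
    server's exact normalization may differ.
    """
    rows = []
    for k in range(1, max_k + 1):
        n_windows = len(seq) - k + 1
        # pre-fill all 4**k words so absent oligonucleotides get a 0 row
        counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
        for i in range(max(n_windows, 0)):
            word = seq[i:i + k]
            if word in counts:  # skip words containing non-ACGT characters
                counts[word] += 1
        for word, c in counts.items():
            freq = c / n_windows if n_windows > 0 else 0.0
            rows.append((word, freq, c))
    return rows


for oligo, freq, count in composition_table("ACGTAC", 1):
    print(oligo, round(freq, 3), count)
```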
To generate randomized sequences with the same oligonucleotide composition as the input file, click on the SRI Generator tab. On the new page, choose the number of samples of random sequences to be generated. Each of these samples will contain random sequences of the same number and length as the input sequences in userfile.
In this example, userfile contains the sequence of the PKD1 gene. Next, choose the longest length of oligonucleotides for which frequencies will be approximated in the randomized sequences by selecting the radio button for 4-mers, which stands for four-base oligonucleotides. Then choose two as the number of samples of randomized sequences to be generated.
Finally, start the program by clicking the Generate File button. For input sequences of hundreds of thousands of nucleotides, it could take a couple of minutes to generate the random sequences, so wait until blue download links appear at the bottom of the page.
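One common way to produce random sequences whose k-mer frequencies approximate those of an input is an order-(k-1) Markov chain trained on that input. The Python sketch below uses this swapped-in technique; it is not necessarily the SRI Generator's algorithm, and the function name `markov_randomize` is hypothetical. It assumes the sequence contains only A, C, G, T and is at least k nucleotides long.

```python
import random
from collections import Counter, defaultdict


def markov_randomize(seq, k=4, rng=None):
    """Return a random sequence of len(seq) whose k-mer frequencies
    approximate those of seq, using an order-(k-1) Markov chain.

    Stand-in technique (assumption), not the SRI Generator's own method.
    Requires only A/C/G/T in seq and len(seq) >= k.
    """
    rng = rng or random.Random()
    order = k - 1
    # transitions[context][base]: how often `base` follows the (k-1)-mer `context`
    transitions = defaultdict(Counter)
    for i in range(len(seq) - order):
        transitions[seq[i:i + order]][seq[i + order]] += 1

    # start from a context drawn in proportion to its frequency in seq
    contexts = list(transitions)
    weights = [sum(transitions[c].values()) for c in contexts]
    out = list(rng.choices(contexts, weights=weights)[0])

    base_counts = Counter(seq)
    while len(out) < len(seq):
        context = "".join(out[-order:]) if order > 0 else ""
        # a context seen only at the very end of seq has no followers:
        # fall back to the overall base composition in that case
        followers = transitions.get(context) or base_counts
        bases = list(followers)
        out.append(rng.choices(bases, weights=[followers[b] for b in bases])[0])
    return "".join(out)


seq = "ACGTTGCAAGCTTAGC" * 20
# two samples, as chosen in the walkthrough; seeds make the runs reproducible
samples = [markov_randomize(seq, k=4, rng=random.Random(seed)) for seed in (1, 2)]
```

Because every base after the first k-1 is drawn from the observed followers of the preceding (k-1)-mer, every k-mer in the output also occurs in the input, except where the dead-end fallback is used.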
To analyze the mid-range inhomogeneity of the input and randomized sequences, click on the MRI Analyzer tab. On the new page, select a sequence to be analyzed from the File to Analyze list box. Then choose GC content from the list of seven content types.
GC content stands for G plus C composition. The window size field allows selection of the length of the window for which content rich and poor sequences will be examined. Keep the default window size of 50 nucleotides.
Finally, choose the upper and lower thresholds for content-rich and content-poor regions, respectively. These thresholds can be defined either by the number of the particular nucleotides in the current window or by the percentage of these nucleotides in the window. In this case, let us initially choose arbitrary values of 60% for the upper threshold and 30% for the lower threshold.
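The window and threshold settings can be illustrated with a minimal Python sketch: a percentage threshold is converted to a nucleotide count (60% of a 50-nucleotide window is 30 nucleotides), and a window sliding along the sequence is flagged as content-rich or content-poor. The function names are hypothetical, and the 1-nucleotide step, 0-based coordinates, and the use of "greater than or equal" for rich and "below" for poor are assumptions, not confirmed details of the MRI Analyzer.

```python
def to_count(threshold, window, as_percent=True):
    """Convert a threshold into a nucleotide count for the given window.

    With as_percent=True, 60 (percent) in a 50-nt window becomes 30.0;
    with as_percent=False the value is already a count and is returned as-is.
    """
    return threshold * window / 100 if as_percent else threshold


def classify_windows(seq, letters="GC", window=50, upper=60.0, lower=30.0,
                     as_percent=True):
    """Return (start, count, label) for every window passing a threshold.

    count is the number of bases from `letters` in seq[start:start + window].
    Assumptions: windows slide by 1 nt, starts are 0-based, 'rich' means
    count >= upper threshold and 'poor' means count < lower threshold.
    """
    hi = to_count(upper, window, as_percent)
    lo = to_count(lower, window, as_percent)
    hits = [1 if base in letters else 0 for base in seq]
    count = sum(hits[:window])  # content of the first window
    flagged = []
    for start in range(len(seq) - window + 1):
        if start > 0:
            # slide by one: add the base entering on the right,
            # remove the base leaving on the left
            count += hits[start + window - 1] - hits[start - 1]
        if count >= hi:
            flagged.append((start, count, "rich"))
        elif count < lo:
            flagged.append((start, count, "poor"))
    return flagged


print(to_count(60, 50))  # 30.0
print(classify_windows("G" * 10 + "A" * 10, window=10, upper=80, lower=20))
# [(0, 10, 'rich'), (1, 9, 'rich'), (2, 8, 'rich'), (9, 1, 'poor'), (10, 0, 'poor')]
```

Updating the count incrementally keeps the scan linear in sequence length, which matters for inputs of hundreds of thousands of nucleotides.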
Next, press the Analyze File button. A link to the output file then appears, and a graphical representation of the results is displayed. All content-rich regions along the input sequence are marked as blue upward spikes and content-poor regions as red downward spikes. The graph shows that there are too many blue spikes, which show GC-rich regions, and many fewer red spikes, which show GC-poor regions.
This means that the chosen parameters are not optimal. For the next iteration, use a 75% upper threshold and a 32% lower threshold. Again, run the MRI Analyzer by clicking the Analyze File button.
The new graph shows that even with these much more stringent parameters, there are hundreds of blue spikes. Therefore, increase the upper threshold to 80% and repeat the calculations. The new graph shows 62 GC-rich regions with GC content greater than or equal to 80% and 28 GC-poor regions with GC content below 32%. Next, compare the results for the input file with the two randomized sequences that have the same oligonucleotide composition as the PKD1 gene.
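To count regions such as the 62 GC-rich and 28 GC-poor ones, overlapping flagged windows have to be grouped somehow. The Python sketch below assumes that windows with the same label that overlap or touch are merged into one region; the MRI Analyzer's actual merging rule is not described here, and `merge_windows` and `count_regions` are hypothetical names.

```python
from collections import Counter


def merge_windows(flagged, window):
    """Merge flagged windows into regions.

    flagged: iterable of (start, label) pairs for windows that passed a
    threshold, e.g. label 'rich' or 'poor'. Assumption: same-label windows
    that overlap or touch form one region. Returns (start, end_exclusive,
    label) tuples sorted by position.
    """
    regions = []
    for start, label in sorted(flagged):
        end = start + window
        last = regions[-1] if regions else None
        if last and last[2] == label and start <= last[1]:
            # extend the current region instead of opening a new one
            regions[-1] = (last[0], max(last[1], end), label)
        else:
            regions.append((start, end, label))
    return regions


def count_regions(regions):
    """Count regions per label ('rich' / 'poor')."""
    return Counter(label for _, _, label in regions)


flagged = [(0, "rich"), (1, "rich"), (2, "rich"), (9, "poor"), (10, "poor"), (100, "rich")]
regions = merge_windows(flagged, window=10)
print(regions)  # [(0, 12, 'rich'), (9, 20, 'poor'), (100, 110, 'rich')]
print(count_regions(regions)["rich"], count_regions(regions)["poor"])  # 2 1
```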
To do this, change the sequence from the PKD1 gene to the first randomized sequence using the File to Analyze list box, while keeping the same parameters for GC-rich and GC-poor regions. The newly generated graph for randomized sequence number one shows that this random sequence has only one GC-poor region and 31 GC-rich regions. The other randomized sequence may show some fluctuation in the number of content-rich and content-poor regions, but the trend should be the same.
Thus, the randomized sequences for the PKD1 gene have fewer GC-rich regions (31 versus 62 for randomized sequence number one) and more than 10 times fewer GC-poor regions (1 versus 28). The next demonstration uses another type of MRI content.
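The comparison between the gene and its randomized counterparts is a simple ratio. A minimal Python sketch, using only the counts reported above for randomized sequence number one (the second sample's counts are not given, so only one value is passed), might look like this; `fold_excess` is a hypothetical helper name.

```python
def fold_excess(observed, randomized):
    """Ratio of the region count in the real sequence to the mean count
    over the randomized samples; infinite if the samples have no regions."""
    mean_random = sum(randomized) / len(randomized)
    return float("inf") if mean_random == 0 else observed / mean_random


print(fold_excess(62, [31]))  # 2.0  (GC-rich regions)
print(fold_excess(28, [1]))   # 28.0 (GC-poor regions)
```

Averaging over all generated samples, rather than using one, smooths the sample-to-sample fluctuation mentioned above.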
In the same human PKD1 gene, the introns contain several purine-rich regions associated with DNA triplex structures. In the MRI Analyzer webpage, switch the DNA content to AG nucleotides, which represent purines. Keep the window size at 50 bases, change the upper threshold to 80% and the lower threshold to 10%, and then press the Analyze File button to invoke the program.
The displayed graph shows that this gene has six AG-rich regions, shown by blue vertical spikes, and 14 AG-poor regions, shown by red spikes. Then compare these results for the PKD1 gene with the data from the randomized sequences that have the same oligonucleotide composition as the gene under study. First switch to a randomized sequence, for example number two, and keep the same parameters as in the previous test. The graph for this random sequence shows that it contains only one AG-rich region and no AG-poor regions.
All data generated during the session are saved in special files that are accessible via the Download Files tab in the upper right corner of each webpage. After opening this link, you can download the input sequence as well as all generated randomized sequences. Below this list of files, there is a link to the oligonucleotide frequency table generated by the SRI Analyzer program.
Then there are links to all results generated by the MRI Analyzer program during the session. For example, click on the file whose name starts with userfile and includes RAND1, AG, 50, 32, and 14. It holds the results obtained for randomized sequence number one with the following parameters: AG content, a 50-nucleotide window, an upper threshold of 32 nucleotides, and a lower threshold of 14 nucleotides.
In this file, all nucleotide sequence segments that match the content-rich or content-poor criteria are listed with their coordinates, in order of their consecutive positions along the input sequence. After watching this video, you should have a good understanding of how to navigate the genomic MRI computational resource.
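A results file like this pairs each detected segment with its coordinates, ordered by position. The Python sketch below renders regions as FASTA-style text with 1-based inclusive coordinates in each header; this layout and the `userfile` default name prefix are hypothetical choices for illustration, not the MRI Analyzer's actual file format.

```python
def format_segments(seq, regions, name="userfile"):
    """Render detected segments as FASTA-style text ordered by position.

    regions: (start, end_exclusive, label) tuples with 0-based starts.
    Hypothetical layout: header '>{name}_{label}_{first}-{last}' using
    1-based inclusive coordinates, followed by the segment sequence.
    """
    lines = []
    for start, end, label in sorted(regions):
        lines.append(f">{name}_{label}_{start + 1}-{end}")
        lines.append(seq[start:end])
    return "\n".join(lines)


print(format_segments("AAAAGGGGCCCCTTTT", [(8, 12, "rich"), (0, 4, "poor")]))
# >userfile_poor_1-4
# AAAA
# >userfile_rich_9-12
# CCCC
```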