A basic question in biology is: what properties do organisms share? Genome sequencing and comparative genomics allow organisms to be compared at the DNA and protein levels. Sequence alignment is a way of arranging DNA, RNA, or protein sequences to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Sequence comparisons can be used to:
- Find evolutionary relationships between organisms
- Identify functionally conserved sequences
- Identify corresponding genes in humans and model organisms, in order to develop models for human diseases
Thus, sequence alignment is an important first step toward the structural and functional analysis of newly determined sequences, and toward drawing functional and evolutionary inferences from them. An alignment is made either between a known sequence and an unknown sequence or between two unknown sequences. The known sequence is called the reference sequence, and the unknown sequence is called the query sequence. Sequences are aligned either two at a time (pairwise alignment) or more than two at a time (multiple sequence alignment) by searching for a series of individual characters or character patterns that occur in the same order in the sequences. Identical or similar characters are placed in the same column, and non-identical characters are either placed in the same column as a mismatch or placed opposite a gap in the other sequence. In an optimal alignment, non-identical characters and gaps are positioned so as to bring as many identical or similar characters as possible into vertical register. Depending on the region being compared, alignments are divided into two types: global and local.
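To make the idea of columns concrete, here is a minimal Python sketch that takes two rows that are already aligned and counts how many columns are matches, mismatches, or gaps. The function name `classify_columns` and the use of `-` as the gap character are assumptions made for illustration only.

```python
def classify_columns(row_a: str, row_b: str, gap: str = "-") -> dict:
    """Count match, mismatch, and gap columns in an already-aligned pair."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            counts["gap"] += 1        # a character sits opposite a gap
        elif a == b:
            counts["match"] += 1      # identical characters share the column
        else:
            counts["mismatch"] += 1   # different characters share the column
    return counts


# Example: one gap column and one mismatch column
print(classify_columns("GATTACA", "GA-TGCA"))
# {'match': 5, 'mismatch': 1, 'gap': 1}
```

An optimal alignment is one in which the gaps are placed so that the match count is as high as the scoring scheme allows.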
Global alignment programs are based on the Needleman-Wunsch algorithm. In global alignment, the two sequences to be aligned are assumed to be generally similar over their entire length. The alignment is carried out from the beginning to the end of both sequences to find the best possible alignment across their entire length.
The two sequences are treated as potentially equivalent.
Goal for global alignment: The goal for global alignment is to identify conserved regions and differences across whole sequences. It is applied when comparing two genes with the same function, or when comparing two sequences for conserved regions.
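The end-to-end nature of global alignment can be sketched with a small dynamic-programming implementation in the spirit of Needleman-Wunsch. This is an illustrative sketch, not a production aligner: the function name and the scoring values (match = 1, mismatch = -1, and a linear gap penalty of -1) are assumptions chosen for simplicity, and real programs usually use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # match or mismatch
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a

    # Trace back from the bottom-right corner to the top-left corner,
    # so the alignment always spans both sequences completely.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
# (2, 'ACGT', 'A-GT')
```

Because the traceback starts at the last cell and ends at the first, every character of both sequences appears in the result, which is what distinguishes a global alignment from a local one.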
Local alignment programs are based on the Smith-Waterman algorithm. Local alignment does not assume that the two sequences in question are similar over their entire length. Instead, it finds only the local regions with the highest level of similarity between the two sequences and aligns these regions without regard for the alignment of the rest of the sequences. There are three primary methods of producing pairwise alignments, local ones included: dot-matrix methods, dynamic programming, and word (k-tuple) methods.
Goal for local alignment: The goal for local alignment is to check whether a substring in one sequence aligns well with a substring in the other. It is applied when searching for local regions of similarity in large sequences (e.g., newly sequenced genomes), or when searching for conserved domains or motifs.
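The substring-to-substring idea can be sketched with a small dynamic-programming implementation in the spirit of Smith-Waterman. As before, this is a hedged illustration: the function name and the scoring values (match = 2, mismatch = -1, gap = -1) are assumptions, not values prescribed by any particular program.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for the best local alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,                             # start a fresh alignment
                              score[i - 1][j - 1] + sub,     # match or mismatch
                              score[i - 1][j] + gap,         # gap in b
                              score[i][j - 1] + gap)         # gap in a
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until the score drops to zero,
    # so only the best-matching region is reported.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# The shared motif ACGT is recovered; the dissimilar flanks are ignored.
print(smith_waterman("CCCCACGTCCCC", "GGGACGTGGG"))
# (8, 'ACGT', 'ACGT')
```

The two differences from the global sketch are the floor of zero in each cell and the traceback that starts at the maximum cell rather than the corner. Together they let the alignment begin and end anywhere in either sequence.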
Significance of sequence alignment
Sequence alignment is useful for discovering functional, structural, and evolutionary information in biological sequences. To discover this information, however, it is important to obtain the best possible, or “optimal”, alignment. Sequences that are very much alike, or “similar” in the parlance of sequence analysis, probably have the same function. They may also derive from a common ancestor sequence, in which case they are defined as homologous. The alignment then indicates the changes that could have occurred between the two homologous sequences during the course of evolution.
Hello! My name is Arunabha Banerjee, and I am the mind behind Biologiks. Learning new things and teaching biology are my hobbies and my passion. It is a continuous journey, and I welcome you all to join me.