|
In bioinformatics, the BLOSUM (BLOcks SUbstitution Matrix) matrix is a substitution matrix used for sequence alignment of proteins. BLOSUM matrices are used to score alignments between evolutionarily divergent protein sequences. They are based on local alignments. BLOSUM matrices were first introduced in a paper by Henikoff and Henikoff. They scanned the BLOCKS database for very conserved regions of protein families (that do not have gaps in the sequence alignment) and then counted the relative frequencies of amino acids and their substitution probabilities. Then, they calculated a log-odds score for each of the 210 possible substitution pairs of the 20 standard amino acids. All BLOSUM matrices are based on observed alignments; they are not extrapolated from comparisons of closely related proteins like the PAM Matrices. ==Biological background== The genetic instructions of every replicating cell in a living organism are contained within its DNA.〔 Throughout the cell's lifetime, this information is transcribed and replicated by cellular mechanisms to produce proteins or to provide instructions for daughter cells during cell division, and the possibility exists that the DNA may be altered during these processes. This is known as a mutation. At the molecular level, there are regulatory systems that correct most — but not all — of these changes to the DNA before it is replicated.〔 The functionality of a protein is highly dependent on its structure. Changing a single amino acid in a protein may reduce its ability to carry out this function, or the mutation may even change the function that the protein carries out.〔 Changes like these may severely impact a crucial function in a cell, potentially causing the cell — and in extreme cases, the organism — to die. Conversely, the change may allow the cell to continue functioning albeit differently, and the mutation can be passed on to the organism's offspring. If this change does not result in any significant physical disadvantage to the offspring, the possibility exists that this mutation will persist within the population. The possibility also exists that the change in function becomes advantageous. The 20 amino acids translated by the genetic code vary greatly by the physical and chemical properties of their side chains.〔 However, these amino acids can be categorised into groups with similar physicochemical properties.〔 Substituting an amino acid with another from the same category is more likely to have a smaller impact on the structure and function of a protein than replacement with an amino acid from a different category. Sequence alignment is a fundamental research method for modern biology. The most common sequence alignment for protein is to look for the similarity between different sequences in order to understand the evolutionarily divergent protein sequences on the molecular level, so that researchers could predict the functions initiated by those mutated genes. Matrices are applied as algorithms to calculate the similarity of different sequences of proteins; however, the utility of Dayhoff Matrix which is a widely used method before is limited due to the requirement of sequences with a similarity more than 85%. In order to fill in this gap, Henikoff and Henikoff introduced BLOSUM (BLOcks SUbstitution Matrix) matrix which led to marked improvements in alignments and in searches using queries from each of the groups of related proteins.〔 抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)』 ■ウィキペディアで「BLOSUM」の詳細全文を読む スポンサード リンク
|