|
A contig (from ''contiguous'') is a set of overlapping DNA segments that together represent a consensus region of DNA.〔Gregory, S. ''Contig Assembly''. Encyclopedia of Life Sciences, 2005.〕 In bottom-up sequencing projects, a contig refers to overlapping sequence data (reads); in top-down sequencing projects, a contig refers to the overlapping clones that form a physical map of the genome, which is used to guide sequencing and assembly.〔Dear, P. H. ''Genome Mapping''. Encyclopedia of Life Sciences, 2005.〕 Contigs can thus refer either to overlapping DNA sequence or to overlapping physical segments (fragments) contained in clones, depending on the context.

==Sequence contigs==

A sequence contig is a contiguous sequence reconstructed from overlapping reads, produced by reassembling the small DNA fragments generated by bottom-up sequencing strategies. This meaning of contig is consistent with the original definition by Rodger Staden (1979). The bottom-up DNA sequencing strategy involves shearing genomic DNA into many small fragments ("bottom"), sequencing these fragments, and reassembling them into contigs and eventually the entire genome ("up"). Because current technology allows for the direct sequencing of only relatively short DNA fragments (300–1000 nucleotides), genomic DNA must be fragmented into small pieces prior to sequencing.〔Dunham, I. ''Genome Sequencing''. Encyclopedia of Life Sciences, 2005.〕

In bottom-up sequencing projects, amplified DNA is sheared randomly into fragments appropriately sized for sequencing. The resulting sequence reads, which are the data containing the sequence of each fragment, are assembled into contigs, which are finally connected by sequencing the gaps between them, yielding a sequenced genome.

The ability to assemble contigs depends on the overlap of reads. Because shearing is random and performed on multiple copies of DNA, each portion of the genome should be represented multiple times in different fragment frames.
In other words, the sequences of the fragments (and thus the reads) should overlap. After sequencing, the overlapping reads are assembled into contigs by assembly software.

Today, it is common to use paired-end sequencing technology, in which both ends of consistently sized, longer DNA fragments are sequenced. Here, a contig still refers to any contiguous stretch of sequence data created by read overlap. Because the fragments are of known length, the distance between the two end reads from each fragment is known. This gives additional information about the orientation of contigs constructed from these reads and allows for their assembly into scaffolds. Scaffolds consist of contigs separated by gaps of known length. The new constraints placed on the orientation of the contigs allow for the placement of highly repeated sequences in the genome. If one end read has a repetitive sequence, its placement is known as long as its mate pair is located within a contig.

The remaining gaps between the contigs in the scaffolds can then be sequenced by a variety of methods, including PCR amplification followed by sequencing (for smaller gaps) and BAC cloning followed by sequencing (for larger gaps).
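
The two ideas described in this section, merging overlapping reads into contigs and using the known fragment length of a read pair to size the gap between two contigs, can be illustrated with a minimal Python sketch. It is not how any particular assembler works: it assumes error-free reads that all come from the same strand, and it uses a simple greedy merge of the longest exact suffix/prefix overlap. The function names (`overlap_length`, `greedy_assemble`, `estimate_gap`), the `min_overlap` parameter, and the example sequences are hypothetical and chosen only for illustration.

```python
def overlap_length(left, right, min_overlap):
    """Return the length of the longest suffix of `left` that equals a
    prefix of `right`, or 0 if it is shorter than `min_overlap`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Greedily merge reads into contigs (illustrative assumption:
    exact overlaps, no sequencing errors, no reverse complements)."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Drop reads that are fully contained in another read.
    contigs = [r for r in contigs
               if not any(r != other and r in other for other in contigs)]
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            # No remaining overlaps: each sequence left is a separate contig.
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


def estimate_gap(fragment_length, left_contig_length,
                 left_read_start, right_read_end):
    """Estimate the gap between two contigs linked by one read pair.

    `left_read_start` is where the first read begins in the left contig;
    `right_read_end` is where its mate ends in the right contig.
    The fragment spans the tail of the left contig, the gap, and the
    head of the right contig.
    """
    covered_left = left_contig_length - left_read_start
    covered_right = right_read_end
    return fragment_length - covered_left - covered_right


reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
print(greedy_assemble(reads))             # → ['ATGCGTACGTTAGC']
print(estimate_gap(500, 1000, 800, 150))  # → 150
```

Reads that share no sufficiently long overlap are returned as separate contigs, which mirrors the situation in which gaps remain to be closed by further sequencing. Real data would also require handling of sequencing errors, both strands, and repeats, which this sketch deliberately leaves out.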
|