
Second- versus first-generation sequencing




    FG- and SG-sequencing

    Sometimes, when people talk about Second-Generation (SG-) sequencing, they imagine that it is something like First-Generation (FG-) sequencing, only more powerful. Let's use the common analogy "nucleotide sequence" ~ "text in a book" to demonstrate how far this view is from reality.

    Let's take as our "book" something similar to the journal Nature: ~150 letters per string (line of text), ~60 strings per page, ~160 pages per journal.

    Laboratory-scale task for FG-sequencing: shotgun sequencing of a 40kb BAC clone

    Nucleotide sequence:

    Prepare a random library with ~700nt inserts.

    Take ~300 clones and sequence them. This gives about 200kb of raw nucleotides.

    5x coverage should be enough to assemble a single contig.

    Text:

    Find the text of an article ~4.5 pages long.

    Library preparation: take ~1000 reprints of this article and cut them randomly into fragments of ~4.5 strings each.

    Sequencing: collect ~300 such fragments.

    Bioinformatics: combine the fragments into one uninterrupted text. A computer would help, but the task can obviously be solved by hand.
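The numbers in this comparison can be checked with a few lines of arithmetic. The sketch below only restates the figures given in the text (40kb clone, ~300 reads of ~700nt, 150 letters per string, 60 strings per page); the variable and function names are chosen here for illustration.

```python
# Book model from the text: ~150 letters per string, ~60 strings per page.
LETTERS_PER_STRING = 150
STRINGS_PER_PAGE = 60


def coverage(n_reads, read_len, target_len):
    """Average number of times each position of the target is read."""
    return n_reads * read_len / target_len


bac_len = 40_000   # 40kb BAC clone
read_len = 700     # ~700nt insert read per clone
n_reads = 300      # ~300 sequenced clones

raw_nt = n_reads * read_len                                  # raw nucleotides collected
bac_coverage = coverage(n_reads, read_len, bac_len)          # fold coverage of the clone
article_pages = bac_len / (LETTERS_PER_STRING * STRINGS_PER_PAGE)
fragment_strings = read_len / LETTERS_PER_STRING

print(f"raw data: {raw_nt / 1000:.0f}kb, coverage: {bac_coverage:.2f}x")
print(f"article: {article_pages:.1f} pages, fragment: {fragment_strings:.1f} strings")
```

The results land close to the rounded values used in the text: about 200kb of raw data, about 5x coverage, an article of about 4.5 pages, and fragments of about 4.5 strings.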


    Laboratory-scale task for SG-sequencing: resequencing of a human genome

    Nucleotide sequence:

    Prepare a random library.

    Perform 5 sequencing runs and collect ~20Gb of raw nucleotides.

    Align the reads to the reference genome, determine the state of known SNP loci, and try to find new SNPs and structural variations.

    Text:

    Human genome: ~3×10^9 nt of haploid sequence packed into ~20 chromosomes. In "journal" terms this would be a superbook containing ~2000 individual journals, organized into 20 volumes, each ~0.5 meter thick (a 10m bookshelf with a total weight of ~600kg).

    Shotgun library preparation: take ~20 real genomes (each genome is two superbooks: one from the mother, the other from the father) and put them through a paper shredder, generating random pieces one string wide and 35 letters long (~3.5x50mm²).

    Sequencing: if we collect ~400 million 35-letter phrases (~3000kg), this gives ~5x coverage of the genome.

    Bioinformatics: we take a reference genome and align the short sequences to it (that is, find the original position in the superbook of every collected 35-letter phrase). Then we have to determine the state of known SNP loci (homozygous or heterozygous) and try to characterize unknown substitutions and structural variations. Special sequencing libraries, an SNP database, a powerful computer and good algorithms are absolutely necessary for this work. A primitive shotgun library would not help to reveal structural variations or to analyse repetitive sequences.
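To make the bioinformatics step concrete, here is a deliberately tiny sketch of "find the original position of each phrase, then decide the state of a known SNP locus". It is not how production aligners or SNP callers work: it assumes error-free reads, a 30-letter toy reference, 20-letter phrases instead of 35, exact-match seeds, and an arbitrary 25% allele-fraction threshold. All function names and parameters are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(reference, k):
    """Map every k-letter word of the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def align(read, reference, index, k, max_mismatches=1):
    """Return the unique position where the read fits, or None if absent or ambiguous."""
    candidates = set()
    for offset in (0, len(read) - k):          # seed from both ends of the read
        for hit in index.get(read[offset:offset + k], ()):
            candidates.add(hit - offset)
    good = []
    for pos in candidates:
        if pos < 0 or pos + len(read) > len(reference):
            continue
        window = reference[pos:pos + len(read)]
        if sum(a != b for a, b in zip(read, window)) <= max_mismatches:
            good.append(pos)
    return good[0] if len(good) == 1 else None


def genotype(reads, reference, locus, k=8, min_fraction=0.25):
    """Pile up the letters observed at one locus and call its state."""
    index = build_index(reference, k)
    pile = Counter()
    for read in reads:
        pos = align(read, reference, index, k)
        if pos is not None and pos <= locus < pos + len(read):
            pile[read[locus - pos]] += 1
    total = sum(pile.values())
    if total == 0:
        return "no coverage"
    alleles = sorted(b for b, n in pile.items() if n / total >= min_fraction)
    if len(alleles) == 1:
        return "homozygous " + alleles[0]
    return "heterozygous " + "/".join(alleles)


def shred(genome, length=20):
    """Every phrase of the given length (an idealised, complete 'shredder')."""
    return [genome[i:i + length] for i in range(len(genome) - length + 1)]


reference = "ACGTTGCAATCGGATCCGTAAGCTTACGGA"
variant = reference[:15] + "T" + reference[16:]   # C -> T substitution at locus 15

# One "superbook" from each parent: one matches the reference, one carries the SNP.
print(genotype(shred(reference) + shred(variant), reference, locus=15))
# -> heterozygous C/T
```

Even this toy shows why repeats are a problem: a phrase that fits in more than one place is discarded as ambiguous, so repetitive regions end up with no usable phrases at all.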


    FG-sequencing can directly read only short clones. When people plan to "read the sequence of a BAC clone" using FG-sequencing, this is slang, because direct reading of 40kb is impossible. But it is not too misleading, because standard, reliable technologies for reconstructing such clones are available. The phrase "read a human genome using SG-sequencing" is totally misleading, because:

    • there are no standard algorithms for such sequencing;
    • an absolutely accurate, 100% complete sequence cannot be generated with today's technologies;
    • accuracy and completeness of the sequence depend on sequence coverage, library construction technology and analysis algorithms.
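The dependence of completeness on coverage can be illustrated with the simplest possible model. The sketch below assumes that reads land uniformly and independently along the genome, so the depth at a position follows a Poisson distribution with mean equal to the coverage. Real libraries violate this assumption (repeats, amplification bias), so the numbers are an optimistic lower bound on the gaps, not a prediction. Function names are illustrative.

```python
import math


def mean_coverage(n_reads, read_len, genome_len):
    """Average depth: total nucleotides read divided by genome length."""
    return n_reads * read_len / genome_len


def fraction_with_depth_below(c, min_depth):
    """Poisson probability that a position is read fewer than min_depth times."""
    return sum(math.exp(-c) * c ** k / math.factorial(k) for k in range(min_depth))


# Figures from the text: ~400 million 35-letter reads on a ~3x10^9 nt genome.
c = mean_coverage(400e6, 35, 3e9)            # roughly 4.7x, i.e. the "~5x" in the text
unread = fraction_with_depth_below(c, 1)     # positions never read at all
shallow = fraction_with_depth_below(c, 4)    # positions read fewer than 4 times

print(f"coverage {c:.2f}x: {unread:.2%} of positions unread, "
      f"{shallow:.0%} read fewer than 4 times")
```

Under this idealised model, a little under 1% of the genome is never read at ~5x coverage, and roughly a third of all positions are read fewer than four times, which is too shallow to tell a heterozygous SNP from a sequencing error with confidence.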



    No clones

    All SG-sequencing platforms use clonally amplified DNA libraries for sequencing. Individual clones are never handled or stored separately. The "minimal unit" of SG-sequencing is a DNA library. To obtain good sequencing results, it is necessary to use a library with (i) a suitable clone length distribution and (ii) sufficient complexity. Both parameters may be either calculated before sequencing (better) or derived from preliminary small-scale sequencing.
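One way to derive library complexity from a small pilot run is to count how many reads are duplicates of each other. The sketch below assumes that every molecule in the library is equally likely to be sequenced, in which case N reads from a library of C distinct molecules are expected to contain U = C(1 - exp(-N/C)) distinct ones; solving for C gives the estimate. This is an idealised model chosen for illustration, not a method described in the text, and the function names are hypothetical.

```python
import math


def expected_distinct(complexity, n_reads):
    """Expected number of distinct molecules among n_reads uniform random draws."""
    return -complexity * math.expm1(-n_reads / complexity)


def estimate_complexity(n_reads, n_distinct):
    """Solve expected_distinct(C, n_reads) == n_distinct for C by bisection."""
    if n_distinct >= n_reads:
        return math.inf        # no duplicates seen: the pilot run is too small to tell
    lo = hi = float(n_distinct)
    while expected_distinct(hi, n_reads) < n_distinct:
        hi *= 2                # grow the bracket until it contains the solution
    for _ in range(100):
        mid = (lo + hi) / 2
        if expected_distinct(mid, n_reads) < n_distinct:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical pilot run: 500,000 reads, of which 393,469 are distinct.
print(f"estimated complexity: {estimate_complexity(500_000, 393_469):,.0f} molecules")
```

If the estimate is well below the number of reads planned for the full run, most of the additional sequencing will only re-read molecules that have already been seen.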




    SG-sequencing

  • Libraries for all SG systems are collections of relatively short DNA fragments (<600bp) flanked by adaptors of known sequence.


  • All SG platforms rely on array-based sequencing. Clones (454, Illumina, SOLiD) or individual molecules (Helicos) are distributed over a two-dimensional surface. For Illumina, an individual clone is the result of on-surface PCR amplification: ~1000 DNA molecules in an area of ~1µm². For SOLiD, a clone is a ~1µm paramagnetic bead bearing ~10,000 DNA molecules.


  • The sequencing reaction is performed step by step for all clones simultaneously. All platforms except 454 use fluorescence-based reading; 454 relies on luminescence. Each step results in specific fluorescence (luminescence) of the clones. A CCD camera takes pictures of the two-dimensional surface (the process is different for 454). Optical filters (channels) are used to visualize individual fluorophores. Images are always black-and-white. Each sequencing step results in thousands of images.


  • During base calling, the positions of all clones are determined, and for each clone
    • fluorescence intensities in all optical channels are recorded;
    • the nucleotide (colour dinucleotide for SOLiD) and the read quality are determined.


  • Sequence analysis. Normally, a reference genome is used to analyse the sequencing data. De novo assembly is possible for short (~10^6 bp) genomes. Strong computing facilities for data analysis and storage are important for SG-sequencing.
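The base-calling step in the list above can be sketched for the simplest case of four optical channels, one per nucleotide. The sketch assumes the intensities have already been extracted from the images and background-corrected; it ignores channel cross-talk, signal decay and SOLiD colour space. The purity measure (brightest channel divided by the sum of the two brightest) and the 0.6 threshold are illustrative choices, not any platform's actual quality model.

```python
CHANNELS = "ACGT"   # assumed order of the four optical channels


def call_base(intensities, min_purity=0.6):
    """Call one sequencing step of one clone from its four channel intensities."""
    ranked = sorted(zip(intensities, CHANNELS), reverse=True)
    (best, base), (second, _) = ranked[0], ranked[1]
    total = best + second
    purity = best / total if total > 0 else 0.0
    return (base if purity >= min_purity else "N"), purity


def call_read(steps, min_purity=0.6):
    """Call every step of one clone; return the read and per-base purity values."""
    calls = [call_base(step, min_purity) for step in steps]
    return "".join(b for b, _ in calls), [round(p, 2) for _, p in calls]


# Hypothetical intensities for three sequencing steps of a single clone.
steps = [
    (900, 40, 30, 20),    # clear signal in channel A
    (50, 60, 800, 45),    # clear signal in channel G
    (300, 280, 20, 10),   # two channels almost equally bright: ambiguous
]
read, purities = call_read(steps)
print(read, purities)
# -> AGN [0.96, 0.93, 0.52]
```

The same calculation has to be repeated for every clone on the surface and every sequencing step, which is one reason the computing and storage requirements mentioned above are so high.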




Second-generation sequencing
URL: http://seq.zbio.net
e-mail: soldatov@molgen.mpg.de
Last modification: 12/12/08
