The Sanger sequencing technique was developed in 1977 by Frederick Sanger and collaborators, work for which Sanger received a Nobel Prize. For a long time it was the reference technique for DNA and cDNA sequencing, initially with a hierarchical sequencing approach and later with a shotgun approach. Today, Sanger sequencing has been partially replaced by next-generation sequencing methods. However, it remains widely used for sequencing small portions of a genome or a single target sequence of interest.
There are two methods of Sanger sequencing, which differ in the procedure and in the type of sequencer used: Sanger sequencing with an acrylamide gel sequencer and Sanger sequencing with a capillary sequencer. The latter is now the most widely used, so to keep the reading light I decided to cover only this one in detail. If you still want to know more about Sanger sequencing with an acrylamide gel sequencer, you can watch the video summary of the Sanger gel sequencing technique below.
Now let's talk in detail about Sanger sequencing with a capillary sequencer. This technique follows three basic steps:
- Construction of the library for Sanger sequencing.
- Sequencing reaction.
- Separation of the sub-fragments by electrophoresis.
Let's analyze the different steps separately:
Construction of the library for Sanger sequencing
An important requirement for sequencing a DNA or cDNA molecule is the construction of a library. As I mentioned in the article "The sequencing", a library is the set of DNA or cDNA fragments obtained from the genome or transcriptome to be sequenced.
A library can be of two types depending on the biological material that is going to be sequenced:
- Genomic library, that is the set of all the DNA fragments whose sum allows us to obtain the entire genome of the organism or a portion of it.
- cDNA or transcriptomic library, that is, the set of cDNAs produced by reverse transcription of the mRNAs that together make up the organism's transcriptome or a part of it.
To build a cDNA or DNA library that is to be used for sequencing using the Sanger capillary sequencer, the following steps are performed:
1. DNA or cDNA extraction and fragmentation. Fragmentation can be done by partial digestion with restriction enzymes, i.e. using restriction enzymes under non-optimal conditions so that the different copies of the extracted DNA or cDNA are cut into partially overlapping fragments (see image A), or by sonication, a mechanical fragmentation that uses ultrasonic waves to break the DNA into partially overlapping fragments (see image B). Sonication damages the ends of the fragments, so they must be repaired ("end repair"). In both cases the goal is to obtain overlapping DNA or cDNA fragments, so that the sequences of the fragments (reads) can later be put back in order during assembly with dedicated bioinformatics software.
When the DNA or cDNA is fragmented, it is also possible to choose the average size of the fragments that will make up the library: by adjusting the reaction time and conditions in the case of enzymatic digestion with restriction enzymes, or by adjusting the duration and intensity of sonication in the case of mechanical fragmentation.
2. Cloning within a vector. Once obtained, the fragments are inserted into cloning vectors, that is, double-stranded DNA (dsDNA) molecules that can replicate autonomously in a host cell and into which exogenous DNA can be inserted. All cloning vectors must have the following characteristics:
- One or more restriction sites (polylinker or multiple cloning site), needed for the insertion of the exogenous DNA, i.e. the individual DNA fragments of the library.
- An origin of replication, that is, the sequence that controls the autonomous replication of the vector within the host organism.
- One or more selectable marker genes, needed to recognize and select the transformed cells, i.e. those that contain the vector.
To insert a fragment into the vector, the ends of the fragment must be complementary to the ends produced in the vector by the cut of a specific restriction enzyme (image C). To obtain this complementarity, the same restriction enzyme used to cut the vector must also be used to fragment the DNA into the fragments that will make up the library. If, on the other hand, the DNA is fragmented mechanically, i.e. by sonication, adapters complementary to the cut site in the vector must be added to the fragments (see image D). In either case, the enzyme that joins the DNA fragment to the vector is called ligase.
There are different types of cloning vectors but the most important are:
- Plasmids, that is, extra-chromosomal double-stranded DNA molecules, which represent about 2-3% of the bacterial genome.
- BACs (bacterial artificial chromosomes), large plasmids built from the F plasmid typical of bacteria, i.e. the plasmid that carries the F genes, or fertility factors, for bacterial conjugation. These vectors can hold and clone fragments with a maximum length of about 300 kbp and are therefore the most suitable vectors for building large genomic libraries.
- YACs (yeast artificial chromosomes), vectors consisting of double-stranded DNA that is a hybrid between bacterial plasmid DNA and yeast DNA. When a fragment is inserted into these vectors, they form artificial, linear mini-chromosomes, which are then introduced into Saccharomyces cerevisiae. YAC vectors can hold fragments with a maximum length of 2 Mbp.
3. Transformation of unicellular organisms with cloning vectors. Cloning vectors carrying the fragments of the DNA or cDNA to be sequenced are introduced into bacteria or yeast, depending on the type of vector. In particular, plasmid vectors are introduced into Escherichia coli by heat shock, while BAC and YAC vectors are introduced into E. coli and S. cerevisiae, respectively, by electroporation.
4. Culture of the transformed cells in suitable growth media, which allows the vectors, and therefore the DNA fragments inserted in them, to be cloned.
5. Selection of correctly transformed cells, i.e. the cells that carry the vectors containing the DNA fragments.
6. Extraction of the cloning vectors containing the DNA fragments, which have been cloned through the reproduction of the bacterium or yeast.
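To make the idea of partial digestion in step 1 more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the enzyme is EcoRI (recognition site GAATTC, cut after the G), the short `genome` string is invented for the example, and "non-optimal conditions" are modeled simply as each site being cut with a probability of 0.5 in each copy.

```python
import random

SITE = "GAATTC"   # EcoRI recognition sequence (enzyme chosen for illustration)
CUT_OFFSET = 1    # EcoRI cuts after the first G: G^AATTC


def site_positions(seq):
    """Return the cut coordinate of every EcoRI site found in `seq`."""
    positions = []
    start = seq.find(SITE)
    while start != -1:
        positions.append(start + CUT_OFFSET)
        start = seq.find(SITE, start + 1)
    return positions


def partial_digest(seq, cut_probability, rng):
    """Cut one copy of `seq`; each site is used only with `cut_probability`."""
    cuts = [p for p in site_positions(seq) if rng.random() < cut_probability]
    bounds = [0] + cuts + [len(seq)]
    # Each fragment is reported as (start, end) coordinates on the original copy.
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


# Hypothetical toy sequence containing three EcoRI sites.
genome = "ATGCGAATTCTTAGGCCGAATTCAACGTGAATTCGGTA"
rng = random.Random(1)
for copy_number in range(3):
    print(copy_number, partial_digest(genome, 0.5, rng))
```

Because each copy is cut at a different subset of sites, a fragment from one copy can span a cut point of another copy. These overlaps are what later allow the reads to be ordered during assembly.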
Sequencing reaction
Once the DNA or cDNA library has been obtained, the sequencing reaction can proceed. It is at this stage that the Sanger capillary sequencer is used. Let's see in detail how the capillary Sanger sequencing reaction takes place.
The reagents needed for the sequencing reaction are:
- Fragments of the library in cloning vectors extracted from the bacterium or yeast used during cloning.
- DNA polymerase (Taq Polymerase)
- Universal sequencing primer, complementary to a portion of the cloning vector used for the construction of the library.
- dNTPs (deoxynucleotides)
- ddNTPs (dideoxynucleotides, which lack the 3'-OH group). These are called terminators because, when DNA polymerase incorporates one into a growing polynucleotide chain, replication stops: the OH group to which the next nucleotide would bind is missing. Terminators are usually present in the reaction mixture in a smaller amount than the deoxynucleotides, with a ratio of 1:100 (1 terminator for every 100 deoxynucleotides). Another fundamental aspect is that the terminators are labeled with 4 different fluorescent molecules, one for each nitrogenous base (A, C, G, T). All of them absorb light at a wavelength of 490 nm, but each emits fluorescence at a different wavelength, which makes it possible to distinguish the nitrogenous bases of the sequenced fragment according to the light emitted.
But here's how the sequencing reaction takes place in detail:
The following steps take place in an Eppendorf tube that contains a specific fragment of the library, so at the end of the sequencing reaction we obtain one tube for each fragment of the library, each containing the product of the sequencing reaction of that fragment. As a preview, the sequencing reaction is modeled on the DNA replication process that occurs in the cell.
- The sequencing primer anneals to the cloning vectors that contain the different copies of a specific fragment.
- The DNA polymerase adds deoxynucleotides until a terminator (dideoxynucleotide) is incorporated at random, which stops the replication of the fragment to be sequenced.
- The first two steps are repeated many times, so that the fragment to be sequenced gives rise to all possible sub-fragments, with lengths ranging from the length of the primer plus one nucleotide up to 800-1000 bp. Each sub-fragment results from replication stopping when one of the four types of terminator in the reaction mixture is incorporated. The sub-fragments produced over the sequencing cycles from the same library fragment differ from one another by one nucleotide: although the terminator is added at random, enough cycles are carried out for terminators to be incorporated at every position between the first nucleotide and the last nucleotide, at 800-1000 bp.
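The chain-termination principle described in these steps can be simulated in a few lines of Python. This is a simplified sketch, not a model of real chemistry. It assumes that the polymerase incorporates ddNTPs and dNTPs with equal efficiency, so with a 1:100 ratio each added nucleotide has a 1 in 101 chance of being a terminator. The template is a short invented sequence written in the order the polymerase reads it, and the primer is left out.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extend_once(template, rng, ddntp_fraction=1 / 101):
    """Extend one primer along `template`; return the terminated product or None."""
    product = []
    for base in template:
        product.append(COMPLEMENT[base])
        if rng.random() < ddntp_fraction:
            # A labeled ddNTP was incorporated: synthesis stops here.
            return "".join(product)
    # No terminator was incorporated, so this molecule carries no dye.
    return None


def sanger_read(template, n_molecules=20000, seed=0):
    """Rebuild the synthesized sequence from the terminated sub-fragments."""
    rng = random.Random(seed)
    terminal_base = {}  # sub-fragment length -> labeled base at its 3' end
    for _ in range(n_molecules):
        product = extend_once(template, rng)
        if product is not None:
            terminal_base[len(product)] = product[-1]
    # Electrophoresis orders sub-fragments by length; reading the dye of each
    # length in turn gives one base of the sequence.
    return "".join(terminal_base[n] for n in sorted(terminal_base))


template = "TACGGATCCATTGCAAGCTTGACCTAGGCA"  # hypothetical 30-nt template
read = sanger_read(template)
expected = "".join(COMPLEMENT[base] for base in template)
print(read)
print(read == expected)
```

With many molecules per template, a terminator ends up at every position at least once, which is why the sub-fragments differ in length by a single nucleotide and the full sequence can be read.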
Separation of the sub-fragments by electrophoresis
This step separates the different sub-fragments obtained from the Sanger sequencing reaction of a specific library fragment. The Eppendorf tube containing the product of the sequencing reaction is placed on a support that holds up to 96 tubes, which is why the capillary Sanger sequencer can sequence up to 96 different fragments of a library simultaneously. The contents of each tube are loaded into the wells of a holder, and each well is connected to a capillary. Each capillary contains a gel through which the DNA or cDNA fragments from the well migrate, driven by the potential difference between the ends of the capillary. Shorter fragments migrate faster, so the sub-fragments reach the detector in order of length. Each capillary is also illuminated by a laser with a wavelength of 490 nm, which excites the fluorescent molecules attached to the terminators. Because a different fluorescent molecule is attached to each type of terminator, the sequence of the fragment can be reconstructed during sequencing.
Once a dedicated detector inside the sequencer has read the fluorescence signals, the software produces a chromatogram, that is, a trace showing the fluorescence peaks that correspond to the nucleotides of the sequence. The software then identifies the nitrogenous base corresponding to each peak through a process called base calling. The first output produced directly by the software is a file with the extension *.scf or *.ab1, in which the chromatogram of the sequenced library fragment is stored. The chromatogram file is then used to generate two files:
1) FASTA file, which contains the sequence of nitrogenous bases of the sequenced fragment.
2) QUAL file, which contains information about the quality of the sequence obtained. This file holds a quality value for each peak in the chromatogram, calculated from parameters such as the shape, position and resolution of the individual peaks. The quality values are reported as numbers separated by spaces.
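As a rough illustration of how the FASTA and QUAL files fit together, the following Python sketch parses both and pairs each base with its quality value. The read name `read1`, the sequence and the scores are invented for the example, and the parser assumes the common layout in which each record starts with a `>` header line.

```python
def parse_fasta(text):
    """Return {read_id: sequence} from FASTA-formatted text."""
    records, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def parse_qual(text):
    """Return {read_id: [quality values]} from QUAL-formatted text."""
    records, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            # Quality values are numbers separated by spaces.
            records[current].extend(int(value) for value in line.split())
    return records


# Hypothetical file contents for a single short read.
fasta_text = ">read1\nACGTTGCA\n"
qual_text = ">read1\n12 20 35 40 41 38 22 9\n"

seqs = parse_fasta(fasta_text)
quals = parse_qual(qual_text)
for base, quality in zip(seqs["read1"], quals["read1"]):
    print(base, quality)
```

The two files are linked only by the read name and by position: the n-th quality value belongs to the n-th base, so both records must have the same length.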
As we have said, the sequencer builds a chromatogram from which the nucleotide sequence (up to 800-1000 bp) of the sequenced DNA or cDNA is obtained. Looking at the chromatogram itself is always very useful, because it shows whether there were any problems during sequencing. Several problems can occur; let's look at some of them in detail:
- The first 50 bp detected by the sequencer are sometimes incorrect, because in this first phase the fluorescence signal cannot be distinguished from background noise, which can arise from the abnormal migration of small DNA fragments carrying clumps of fluorescent molecules (see image below). To get around this problem, the sequencing primer can be designed to anneal to a portion of the cloning vector about 50 nucleotides upstream of the point where the fragment to be sequenced is inserted. In this way the clumps of fluorescent molecules affect the 50 nucleotides of the cloning vector and not our fragment of interest, whose sequence is preserved.
- Sanger sequencing can read fragments up to 800-1000 bp long; beyond that it becomes difficult to resolve fragments that differ by a single nucleotide, and detecting the fluorescence of fragments longer than 1000 bp is more complicated. Indeed, in the terminal portion of a chromatogram (after approximately 900 nucleotides) the peaks are no longer uniform and base-calling errors become frequent (see image below).
- If poorly defined peaks also appear in the central portion of the chromatogram, the DNA used for the sequencing reaction was probably contaminated. To get around this problem, it may be useful to purify the DNA before proceeding with sequencing (see image below). This type of problem occurs because the capillary sequencer is very sensitive to the presence of contaminants.
- If the peaks overlap and are not well spaced, the sequencing reaction mixture probably contains residual salts from a previous PCR. To get around this problem, the DNA must be prepared again and purified to remove possible contaminants (see image below).
- Finally, if the chromatogram is heavily distorted (see image below), the amount of DNA sequenced was probably too high. In this case the DNA used in the sequencing reaction must be diluted.
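The first two problems, unreliable bases at the start and at the end of a read, are often handled in software by trimming the read to its reliable core. Here is a minimal Python sketch of one simple way to do this. The threshold of 20 and the toy read are assumptions made for the example, and real base-calling software may use more refined criteria.

```python
def longest_good_run(quals, min_quality=20):
    """Return (start, end) of the longest stretch whose scores are all >= min_quality."""
    best = (0, 0)
    run_start = None
    # A trailing sentinel below any valid score closes a run that reaches the end.
    for i, quality in enumerate(list(quals) + [-1]):
        if quality >= min_quality:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = None
    return best


# Hypothetical read: noisy at both ends, clean in the middle.
seq = "GTACGTTGCATC"
quals = [5, 8, 30, 35, 40, 40, 38, 36, 33, 31, 9, 4]

start, end = longest_good_run(quals)
trimmed = seq[start:end]
print(start, end, trimmed)
```

Trimming does not fix a contaminated or overloaded reaction. If the quality is low across the whole read, the DNA has to be purified or diluted and sequenced again, as described above.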
Finally I think it is useful to list the main advantages and disadvantages of capillary Sanger sequencing:
Advantages:
- Accuracy (99.9%).
- Relatively long reads (800-1000 bp).
Disadvantages:
- Only a limited number of samples can be sequenced at once (max 96 with the capillary sequencer).
- The work is complex to organize: a very complex library has to be built, which also takes a long time, mainly because of the cloning phase.
- High costs (€0.40/kbp), so Sanger sequencing is economically viable only for small fragments.
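To put the cost and throughput figures in perspective, here is a small back-of-the-envelope calculation in Python. It uses the values quoted above (€0.40 per kbp, up to 96 fragments per run) and assumes reads of 800 bp with no overlap between them. That is an optimistic simplification, since a real project needs overlapping reads. The two target sizes are hypothetical examples.

```python
import math

COST_PER_KBP_EUR = 0.40  # cost figure quoted in the text
READ_LENGTH_BP = 800     # lower end of the 800-1000 bp read length
READS_PER_RUN = 96       # capillaries available in one run


def sanger_estimate(target_bp, coverage=1):
    """Return (reads, runs, cost in EUR) needed to cover `target_bp`."""
    total_bp = target_bp * coverage
    reads = math.ceil(total_bp / READ_LENGTH_BP)
    runs = math.ceil(reads / READS_PER_RUN)
    cost = total_bp / 1000 * COST_PER_KBP_EUR
    return reads, runs, cost


# Hypothetical targets: a single 1 kbp amplicon and a 5 Mbp genome.
for label, size_bp in [("1 kbp target", 1_000), ("5 Mbp genome", 5_000_000)]:
    reads, runs, cost = sanger_estimate(size_bp)
    print(f"{label}: {reads} reads, {runs} runs, about {cost:.2f} EUR")
```

Even in this optimistic estimate, the number of reads and runs grows linearly with the size of the target, which is why the technique is mostly used for small fragments.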
I know that the whole procedure I have just described may seem very abstruse. As always, I have tried to simplify these complex concepts and make them as clear as possible; at least for me, they are impossible to express in a more understandable way. Anyway, to make things easier to follow, I have included a video I found on YouTube that illustrates the different phases of capillary Sanger sequencing very well.
Bye and see you soon.