BIOINFORMATICAMENTE

The sequencing

I think this article may be a little heavy for many of you, but on Instagram you asked me to cover somewhat more technical content while keeping it as simple as possible. Many of the studies and analyses I have carried out myself start from the sequencing process, so it is essential to know more about it. But let's go in order.

A bioinformatician often works with several molecules that are important for the life of an organism, but the one most often in the spotlight is certainly DNA. DNA stands for deoxyribonucleic acid, a nucleic acid made of building blocks called nucleotides, which bind together to form a chain; this chain in turn pairs side by side with a second nucleotide chain running in the opposite (antiparallel) direction. The DNA molecule therefore appears as a double helix, held together by hydrogen bonds between the two chains and by covalent bonds between the nucleotides of the same chain. Each nucleotide, one of the building blocks mentioned above, is itself made up of three parts: a sugar, deoxyribose; a phosphate group, which connects carbon 3 of the sugar to carbon 5 of the sugar of the adjacent nucleotide in the chain; and a nitrogenous base, attached to carbon 1 of the sugar, which forms hydrogen bonds with the nitrogenous base of the nucleotide facing it on the antiparallel chain.

Note, however, that the DNA molecule contains 4 different types of nucleotides, which differ in their nitrogenous base: adenine, thymine, cytosine and guanine.

The nitrogenous bases on the two DNA strands also pair with each other according to a very precise and conserved pattern: adenine pairs with thymine through two hydrogen bonds, and cytosine pairs with guanine through three hydrogen bonds.
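The pairing rules can be written down as a small lookup table. Here is a minimal Python sketch; the names `PAIRS`, `HYDROGEN_BONDS`, `complementary_strand` and `count_hydrogen_bonds` are my own, chosen only for illustration. It builds the antiparallel partner of a strand and counts the hydrogen bonds that hold the two strands together.

```python
# Pairing rules: each base maps to its partner on the other strand.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hydrogen bonds contributed by each base pair (A-T: 2, C-G: 3).
HYDROGEN_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def complementary_strand(strand):
    """Return the partner strand, read in the opposite (antiparallel) direction."""
    return "".join(PAIRS[base] for base in reversed(strand.upper()))


def count_hydrogen_bonds(strand):
    """Count the hydrogen bonds between a strand and its partner."""
    return sum(HYDROGEN_BONDS[base] for base in strand.upper())


example = "ATGC"
print(complementary_strand(example))  # GCAT
print(count_hydrogen_bonds(example))  # 10
```

The strand is reversed before pairing because the two chains run in opposite directions, so the partner of the first base is the last base of the other strand.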

DNA is the seat of an organism's genetic information, written in a code called, appropriately, the genetic code. It is made up of 3-letter words called codons, and its alphabet has 4 letters: A for adenine, C for cytosine, G for guanine and T for thymine. Each letter therefore refers to the corresponding nitrogenous base of the nucleotides that make up the DNA double helix. The main features of the genetic code are:

With this lengthy introduction I want to show you how important it is to know the message carried by a DNA molecule: knowing its nucleotide sequence is the starting point for understanding its function and its biological importance.
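To make the idea of "reading the message" a little more concrete, here is a small Python sketch that cuts a sequence into the 3-letter words (codons) described earlier. The function name `split_into_codons` is hypothetical, and I assume for simplicity that reading starts at the first letter of the sequence; leftover letters that do not fill a whole codon are ignored.

```python
def split_into_codons(sequence):
    """Split a DNA sequence into consecutive 3-letter codons.

    Assumption: reading starts at the first letter; trailing letters
    that do not form a complete codon are dropped.
    """
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


print(split_into_codons("ATGGCCTAAG"))  # ['ATG', 'GCC', 'TAA']

# With an alphabet of 4 letters and words of 3 letters,
# the number of possible codons is 4 ** 3.
print(4 ** 3)  # 64
```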

One of the most important preliminary techniques for studying DNA is sequencing, that is, obtaining in a file the succession of nitrogenous bases that make up a given DNA molecule. It is also important to know that the RNA of an organism can be studied in the same way: RNA can be converted into DNA (which in this case is called cDNA, complementary DNA) by an enzyme called reverse transcriptase, and then sequenced.
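At the level of letters, reverse transcription can be imitated in a few lines of Python. This is only a sketch of the base-by-base logic and not a model of the enzyme; the names `RNA_TO_DNA` and `reverse_transcribe` are my own. It assumes the RNA is written with the letters A, U, C and G, and it returns the first cDNA strand, which is complementary and antiparallel to the RNA.

```python
# Each RNA base and the DNA base that pairs with it (U in RNA pairs with A).
RNA_TO_DNA = {"A": "T", "U": "A", "C": "G", "G": "C"}


def reverse_transcribe(rna):
    """Return the first-strand cDNA, complementary and antiparallel to the RNA."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna.upper()))


print(reverse_transcribe("AUGC"))  # GCAT
```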

There are two basic steps to obtain DNA or cDNA sequencing:

  1. Building a library. To sequence a DNA or cDNA molecule, or even the entire genome or transcriptome of an organism, it must first be fragmented to make it easier to manipulate and sequence. The set of DNA fragments obtained is called a library.
  2. After obtaining the library of DNA or cDNA fragments we proceed with the sequencing.
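The first step can be mimicked with a toy simulation in Python. This is a deliberately simplified sketch: the function `fragment`, its parameters and the choice of fragment lengths (drawn around a mean with 10% spread) are all my own assumptions. It also cuts a single copy of the molecule, whereas a real library comes from many copies cut at different points, so real fragments overlap.

```python
import random


def fragment(sequence, mean_length, seed=0):
    """Cut one copy of a sequence into consecutive pieces of random length.

    Assumption: piece lengths vary around mean_length by about 10%.
    """
    rng = random.Random(seed)  # fixed seed so the cut points are reproducible
    fragments = []
    start = 0
    while start < len(sequence):
        length = max(1, int(rng.gauss(mean_length, mean_length * 0.1)))
        fragments.append(sequence[start:start + length])
        start += length
    return fragments


genome = "ACGT" * 250  # a made-up 1000-base sequence
library = fragment(genome, mean_length=100)
print(len(library), "fragments")
print("".join(library) == genome)  # True: the pieces cover the whole molecule
```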

DNA or cDNA sequencing can be of two types:

The sequencing of a DNA or cDNA molecule (RNA converted into DNA by reverse transcription) can follow two different approaches:

At this point, I think it makes sense to talk about the factors that influence the sequencing of a DNA or cDNA molecule. There are several, but the ones that most influence the sequencing result are:

Diagram showing the main sequencing techniques.

Where:
P is the probability we want of finding a given sequence within the library.
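A relation often used for this kind of estimate is the Clarke and Carbon formula, N = ln(1 - P) / ln(1 - f), where N is the number of clones needed and f is the ratio between the average fragment size and the genome size. Assuming that this is the relation meant here (the symbols N and f and the function name `clones_needed` are my own additions), a minimal Python sketch looks like this:

```python
import math


def clones_needed(p, insert_size, genome_size):
    """Clarke and Carbon estimate: N = ln(1 - P) / ln(1 - f).

    p           -- desired probability of finding a given sequence (0 < p < 1)
    insert_size -- average fragment size in bases
    genome_size -- genome size in bases
    """
    f = insert_size / genome_size  # fraction of the genome in one fragment
    return math.ceil(math.log(1 - p) / math.log(1 - f))


# Hypothetical example: 20 kb fragments from a 3 Gb genome.
for p in (0.90, 0.99):
    print(p, clones_needed(p, insert_size=20_000, genome_size=3_000_000_000))
```

The numbers in the example are invented for illustration; the point is that asking for a higher P increases the number of clones the library must contain.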

For second and third generation sequencing techniques, also called NGS (Next Generation Sequencing) techniques, the representativeness of a library (whether DNA or cDNA) is given by the coverage level, i.e. the average number of times the same DNA or cDNA sequence is sequenced. The higher the coverage, the greater the confidence that all the fragments of the library have been sequenced.
The coverage level (cl) is calculated as follows:

By output we mean the total number of nitrogenous bases sequenced on the flow cell, that is, the support on which the DNA sequencing process takes place. Since the output is specific to each type of flow cell, choosing the flow cell type lets us choose the output and therefore the coverage level.
It is worth pointing out that when a genome is sequenced completely for the first time, very high coverage is needed, usually 40X or more, meaning that on average the same sequence is read 40 or more times. If, on the other hand, a genome has already been sequenced and we just want to compare genomes to look for any variations, which is called re-sequencing, a coverage level of 3-5X is sufficient.
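The arithmetic behind these numbers fits in a couple of lines of Python. This sketch assumes the usual relation in which the coverage level is the output divided by the size of the genome being sequenced; the function names and the 5 Mb genome in the example are hypothetical.

```python
def coverage_level(output_bases, genome_size):
    """Average number of times each base is read (assumed: output / genome size)."""
    return output_bases / genome_size


def output_needed(target_coverage, genome_size):
    """Total number of bases to sequence to reach a target coverage."""
    return target_coverage * genome_size


genome_size = 5_000_000  # hypothetical 5 Mb genome

print(coverage_level(200_000_000, genome_size))  # 40.0
print(output_needed(40, genome_size))            # 200000000 (first sequencing)
print(output_needed(5, genome_size))             # 25000000 (re-sequencing)
```

Read the other way round, the second function tells you what output, and therefore what type of flow cell, you need for a given project.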

That brings us to the end for today. If you liked the article, or if you have any requests for clarification or constructive criticism, I would be very pleased to hear from you, perhaps in a comment.

Bye and see you soon.
