In the middle of my thesis journey

I found myself within a dark technique,

because the dose of the reagents had been mixed up.

Ah, how hard a thing it is to say what it was,

this wild and harsh and strong sample

that at the very thought renews the fear!

If Dante Alighieri had been a biologist, I think the Divine Comedy would have started like this.

Instead, it was me who reinterpreted his words. These days I have had the painful pleasure of extracting DNA from Solanum linnaeanum plants. Yes, you read that right, I said painful pleasure. It sounds like a contradiction, but in fact it is not. Just look at the image below to understand that touching the leaves of this plant, a relative of the eggplant (Solanum melongena), is definitely painful, while extracting its DNA is really fun.

Solanum linnaeanum. Source: “Solanum linnaeanum Hepper & P.-ML Jaeger {ID 7382} - Morella di Sodoma”. In Acta Plantarum, Forum. Available online (consultation date: Oct 2020).

Anyway, while I was fighting with the thorny plant, I remembered all the times I tried to explain to a non-expert that, with certain programs, it is possible to know the DNA sequence of any living organism. Often that person looked at me in confusion. I realize that, at first glance, this concept can seem a bit abstract. After all, DNA is a molecule found in cells, so how is it possible to obtain a sequence from it? And what is meant by a DNA sequence? So I decided to explain, in a simple way, how we get from the DNA of an organism to the computer that lets us read the sequence of letters (A, T, C, G) that describes it.

DNA is a large organic molecule made up of many nucleotides (let's call them bricks). Each nucleotide, in turn, is built from smaller molecules, one of which is called a nitrogenous base. The order of these bases holds the instructions for how the cell, and therefore the whole organism, works. Imagine an individual's DNA as its instruction manual. This manual is written in a four-letter alphabet: A (Adenine), T (Thymine), C (Cytosine) and G (Guanine).

Adenine, Thymine, Cytosine and Guanine are the nitrogenous bases that we find in the nucleotides of DNA.
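To make the idea of a four-letter alphabet a bit more concrete, here is a minimal Python sketch that treats a piece of DNA as plain text. The fragment is made up for illustration, and the function names (`is_valid_dna`, `count_bases`) are my own hypothetical helpers, not part of any bioinformatics library.

```python
from collections import Counter

# The four letters of the DNA alphabet described above.
DNA_ALPHABET = frozenset("ATCG")


def is_valid_dna(sequence: str) -> bool:
    """Return True if the sequence is non-empty and uses only A, T, C, G."""
    return bool(sequence) and set(sequence.upper()) <= DNA_ALPHABET


def count_bases(sequence: str) -> dict:
    """Count how many times each of the four bases appears."""
    counts = Counter(sequence.upper())
    return {base: counts.get(base, 0) for base in "ATCG"}


# A made-up fragment, not taken from any real organism.
fragment = "ATGCGTTAGC"

print(is_valid_dna(fragment))  # True
print(count_bases(fragment))   # {'A': 2, 'T': 3, 'C': 2, 'G': 3}
```

Seen this way, a DNA sequence is just a very long word written with four letters, which is exactly why a computer can work with it so easily.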

Through a process called sequencing it is possible to obtain the entire DNA sequence of an individual, that is, the succession of the aforementioned letters. We could therefore say that sequencing gives us the entire DNA manual written in a file that a computer can easily read, so that we can obtain various information about the organism from which the DNA was extracted.
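As a small taste of the kind of information a computer can pull out of a sequence, the sketch below computes two very simple things: the length of a sequence and the fraction of its letters that are G or C (often called GC content). I chose these two only as an illustration, the fragment is made up, and `summarize_sequence` is a hypothetical helper name.

```python
def gc_fraction(sequence: str) -> float:
    """Return the fraction of bases in the sequence that are G or C."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("the sequence is empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def summarize_sequence(sequence: str) -> dict:
    """Collect a couple of basic facts about a DNA sequence."""
    return {
        "length": len(sequence),
        "gc_fraction": gc_fraction(sequence),
    }


# A made-up fragment, used only to show the calculation.
fragment = "ATGCGTTAGC"
print(summarize_sequence(fragment))  # {'length': 10, 'gc_fraction': 0.5}
```

Real analyses are of course far richer than this, but they all start from the same point: a sequence of letters that a program can read.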

However, to obtain the DNA sequence it is necessary to start from the laboratory and arrive at the computer.

I will try to keep it simple, hoping that those who work in the field will forgive me this simplification. The key steps to go from the cell's DNA to the computer are basically two:

  1. DNA extraction, the process that allows you to isolate DNA from the rest of the cellular components of an organism, just as I extracted the DNA from the leaves of the "killer" plant. To do this you can prepare specific reagents in the laboratory or use kits designed for the purpose. For example, I used an EZNA® Plant DNA Kit.
  2. DNA sequencing. There are a great many sequencing techniques, and discussing them now would weigh down the reading. For now you just need to know that sequencing is a crucial process for us: by means of an instrument called a sequencer, it yields the sequence of all the nitrogenous bases found in the extracted DNA. In this phase we therefore pass from the DNA molecule to a file containing the DNA message. I don't know if I have gotten the idea across, but it is through sequencing that we pass from the cell to the computer.

Laboratories that do not have a sequencer usually send the extracted DNA to specialized sequencing centers; one example is Novogene. These centers sequence the DNA sent to them and, upon payment, return a hard disk with the files containing the DNA sequence of the organism. These files will then be the starting point of the bioinformatics analyses. I will talk about these files in more detail in the next articles.
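I have not yet said which format these files use, so take the following only as a rough sketch. Purely as an assumption for illustration, it imagines a small text file in FASTA style, where a line starting with `>` gives the name of a sequence and the lines after it contain the letters. The file content is made up, and `read_fasta` is a hypothetical helper written for this example, not a function from an existing library.

```python
import tempfile
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA-style text file into a {record_name: sequence} dict."""
    records = {}
    name = None
    chunks = []
    with open(path, encoding="ascii") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A header line starts a new record; store the previous one.
                if name is not None:
                    records[name] = "".join(chunks)
                header = line[1:].split()
                name = header[0] if header else ""
                chunks = []
            elif name is not None:
                # Sequence lines may be wrapped, so collect and join them.
                chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Made-up example content, not real Solanum linnaeanum data.
EXAMPLE_TEXT = ">example_fragment made-up record\nATGCGTTAGC\nGGATCC\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = Path(tmp_dir) / "example.fasta"
    example_path.write_text(EXAMPLE_TEXT, encoding="ascii")
    sequences = read_fasta(example_path)

for record_name, sequence in sequences.items():
    print(record_name, len(sequence), sequence)
```

The point here is only that, once the sequencer has done its job, the DNA message is ordinary text that a few lines of code can load and inspect.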

I hope you now have a clearer understanding of how DNA sequences are obtained and why they are so important for bioinformaticians, who need them to make sense of all the data in front of them and thus answer the different questions that arise when we face a biological phenomenon. But be careful: the bioinformatician works not only with DNA but also with other, equally fundamental molecules, such as RNA, proteins and the various metabolites produced by the different processes that take place in the cell, which I will talk about later.

Bye-bye everybody. See you soon.