As I have said several times here on the blog, bioinformatics lets us answer questions and understand biological phenomena by analyzing data obtained from laboratory experiments. But what general methods does bioinformatics use to turn that data into knowledge?

Bioinformatics relies on two main classes of methods:

  • comparative methods
  • predictive methods

Comparative methods, as the name suggests, answer a biological question by comparing the data under examination with already interpreted data stored in dedicated databases.

Let's take a quick example to better understand how they work:

Let's say we are analyzing the DNA sequence of a gene and our goal is to determine the function of this gene, that is, to answer questions such as: which protein does it encode? What is the structure of that protein?

To answer these questions we can use suitable algorithms, such as alignment algorithms, to compare the sequence of the gene under examination with the sequences of genes whose functions are already known and stored in a reference database. In essence, we can infer the function of the gene, as well as the structure of the protein it encodes, from similar genes and proteins that have already been characterized.
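To make the idea more concrete, here is a minimal sketch of a comparative search in Python. It scores the query against every entry of a tiny reference database with a Needleman-Wunsch global alignment and returns the best match. Everything in it is an assumption made for illustration: the function names `global_alignment_score` and `best_hit`, the scoring values, and the made-up sequences and gene names in `reference_db`. Real analyses rely on dedicated tools and curated databases rather than a toy like this.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score of sequences a and b.

    The scoring values are arbitrary illustrative defaults.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def best_hit(query, database):
    """Return (name, score) of the database entry most similar to the query."""
    scored = ((name, global_alignment_score(query, seq))
              for name, seq in database.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical reference database: names and sequences are invented.
reference_db = {
    "hypothetical_kinase": "ATGGCCAAGTTGACC",
    "hypothetical_transporter": "ATGTTTCCCGGGAAA",
}

query = "ATGGCCAAGTTGACA"
name, score = best_hit(query, reference_db)
print(name, score)
```

The query differs from the first entry by a single base, so that entry comes out as the best hit, and we would tentatively transfer its annotation to our gene.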

Predictive methods, on the other hand, answer biological questions by predicting the answer on the basis of previously acquired knowledge. These methods typically follow approaches based on machine learning: they use algorithms that learn from existing data and then infer something still unknown from what they have learned.
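As a rough illustration of "learning from existing data", the sketch below estimates codon frequencies from a handful of sequences labeled as coding or non-coding, then scores a new sequence with a log-odds ratio. This is only one very simple way to learn from data, chosen for brevity. The function names (`codons`, `train`, `log_odds`) and all training sequences are invented for the example.

```python
import math
from collections import Counter


def codons(seq):
    """Split a sequence into consecutive, non-overlapping triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def train(sequences):
    """Learn codon probabilities, with add-one smoothing over the 64 codons."""
    counts = Counter(c for s in sequences for c in codons(s))
    total = sum(counts.values()) + 64
    return lambda codon: (counts[codon] + 1) / total


def log_odds(seq, p_coding, p_background):
    """Positive values mean the sequence looks more like the coding examples."""
    return sum(math.log(p_coding(c) / p_background(c)) for c in codons(seq))


# Invented training data, for illustration only.
known_coding = ["ATGGCTGCTGCTGAA", "ATGGCTGAAGCTGCT"]
known_noncoding = ["TTTTATATATTTTAA", "TATATTTTTTATATA"]

p_coding = train(known_coding)
p_background = train(known_noncoding)

print(log_odds("GCTGCTGAA", p_coding, p_background) > 0)  # resembles coding
print(log_odds("TTTTATATA", p_coding, p_background) < 0)  # resembles background
```

The model has never seen the two test sequences, yet it can make a guess about them from what it learned, which is the essence of a predictive method.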

Let's take another example:

If our goal is to locate the protein-coding genes present in a genome, we can use algorithms capable of identifying them, taking into account that genes have a characteristic structure. Within a gene we find a 5' UTR and a 3' UTR, exons separated by introns, a start codon and a stop codon that mark where translation begins and ends, and so on. Therefore, by teaching the algorithm these features, it is possible to predict the genes present in the genome on the basis of previously acquired knowledge.
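Here is a heavily simplified sketch of how knowledge about gene structure can be turned into a search. It uses just two of the features mentioned above, the start codon and the stop codon, and scans the forward strand for open reading frames (ORFs). Note that this is a hand-written rule rather than a trained model, and it ignores introns, UTRs and the reverse strand, so it is closer to a toy prokaryotic ORF finder than to a real gene predictor. The function name `find_orfs`, the `min_codons` threshold and the example sequence are assumptions made for illustration.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(genome, min_codons=3):
    """Find ORFs on the forward strand in all three reading frames.

    Returns (start, end, frame) tuples; `end` is exclusive and includes
    the stop codon. `min_codons` counts codons before the stop.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(genome) - 2, 3):
            codon = genome[i:i + 3]
            if start is None and codon == START:
                start = i  # open a candidate ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close it and keep scanning
    return orfs


genome = "CCATGGCTGCTGAATAACC"  # invented sequence
for start, end, frame in find_orfs(genome):
    print(start, end, frame, genome[start:end])
# prints: 2 17 2 ATGGCTGCTGAATAA
```

A more realistic predictor would combine rules like these with statistical models learned from annotated genomes.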

Well, that's all for today. As you may have noticed, this article is shorter than usual, but don't be fooled: it is full of meaning. Understanding how bioinformatics investigates biological data also helps us understand, in general, how the algorithms it uses solve certain problems.

Remember to leave a comment if you have any questions or need clarification. And if you liked the article and found it useful, I would love to know it through a "like".

Bye-bye and see you soon.