To be honest, I am addicted to Google Maps. Without it I could not easily find my favorite shoe store or get to places I have never been before. I don't think I have a bad sense of direction, but after relying on a map to get around the city for so long, I no longer ask myself which route to take from point A to point B; I simply let my smartphone guide me. Why am I telling you this? Simply because maps are extremely important for a bioinformatician too: they let us move around the genome of an organism easily, without running the risk of getting lost. For example, thanks to a good map of the human genome, I can easily find out on which chromosome, and where on it, a gene that influences the color of our eyes is located. After all, a map is nothing more than a graphic representation that helps us orient ourselves among the objects present in a space, which in our case are the different genetic elements present in the genome of an organism.
It should be noted that we can take advantage of different types of maps to orient ourselves in the DNA of an organism. In fact, there are:
- Genetic maps. Explaining in simple terms how genetic maps are built and why they are useful is complicated, so let me give you an illustrative example. My father is a doctor and works in a private practice located just below our house. Patients often call to find out where the medical office is, and although I give them the street name, more detailed directions are usually needed. The simplest thing to do is to give them landmarks that get progressively closer to the destination. For example, I ask them whether they know the school in front of the grocery store near his office, then I tell them that once they turn into a small street they should see a white gate, and so on until they reach the office. Well, building a genetic map involves a fairly similar process: numerous DNA sequences whose location is known are taken as landmarks in order to create a more detailed representation of an organism's genome. Each of these sequences is called a molecular marker, but let's be a little more technical. When you want to build a genetic map, you need to identify the approximate location of specific genes or QTLs (Quantitative Trait Loci) that control traits of interest (qualitative traits for single genes, quantitative traits for QTLs). To find their position, we calculate their association with, and therefore their distance from, the mapped molecular markers taken as a reference. Schematically, building a genetic map requires two steps:
- Map the molecular markers by linkage analysis. To do this, we observe how the molecular markers are inherited within a progeny (a population of individuals obtained by sexual reproduction from two parents). In particular, we need to estimate the recombination frequency (RF), that is, how often crossing over separates two molecular markers, for one pair of markers at a time. This gives their mutual distance, expressed in centiMorgans or cM (1% RF = 1 cM), and their position along a chromosome. The recombination frequency describes the degree of linkage between two markers: the closer and more tightly linked two molecular markers are, the lower the recombination frequency between them, and the more often they are inherited together in the progeny. I hope I haven't confused you, but we need to be a little more precise. Mapping the molecular markers requires a segregating population, i.e. a progeny of individuals obtained by crossing two parents, in which the segregation frequencies of the molecular markers can be observed. Various statistical tests are carried out on these frequencies, and the resulting data are analyzed with specific software such as Mapmaker 3.0 or JoinMap to work out where the markers are positioned.
- Once the position of the molecular markers is known, it is possible to map the different genes and QTLs associated with them. By measuring the association between one or more molecular markers and a specific phenotype of a trait (qualitative or quantitative), we can estimate the position of the gene or QTL that controls that trait. For example, one of the genes that controls eye color is OCA2. Its position in the human genome can be estimated by referring to the position of known molecular markers which, being very close to OCA2 and linked to it, are inherited together with it. Knowing the location of these molecular markers, we can assume that the gene is not far from them. In this case too, specific software and statistical tests are needed. I don't think it's wise to go into those statistical tests here, because I risk boring you more than I already have. What I would like you to take away, however, is that mapping the genes and QTLs that control specific traits of interest is called linkage mapping when it is done in a segregating progeny like the one described above, and "Linkage Disequilibrium Mapping" (or association mapping) when it relies on marker-trait associations in a population of unrelated individuals.
- Physical maps. These are graphic representations of chromosomal DNA sequences in which distances are expressed in base pairs (bp). These distances are no longer relative, as in genetic maps, but are actual physical distances obtained by sequencing the genome, or most of it. The key steps for building a physical map are:
- Building a library, that is, a set of genomic DNA fragments taken from the tissue of the organism whose physical map we want to obtain.
- Sequencing the library fragments. This step reads the nucleotide sequence of the DNA fragments in the library and stores the sequences in a FASTA or FASTQ file. These sequences are called "reads", and their length varies according to the sequencing technique used.
- Assembling the reads. In this last phase, the final construction of the physical map takes place. By assembly of the reads we mean the process of ordering the DNA sequences obtained from sequencing in order to reconstruct the sequence of the entire genome of the organism, which is basically nothing more than the physical map. The reads can be ordered by aligning them to a reference genome of that organism already present in a database, in which case we speak of re-sequencing, or by aligning the reads against one another (de novo assembly), possibly with the help of an optical map, that is, a reference map made from linearized DNA labeled at specific sites, which makes it easier to arrange the reads. Different bioinformatics tools are used for the alignment and assembly of the reads, such as CAP3, Velvet or MaSuRCA v3.1.3. They rely on different types of algorithms, which I will discuss in detail in one of the next articles.
- Cytological maps. Cytological maps are graphical representations of the chromosomes of an individual's genome. They are obtained using dyes that produce specific banding patterns, that is, colored bands that highlight specific chromosomal regions.
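To make two of the steps above a bit more concrete, here is a minimal Python sketch. The first part turns a count of recombinant individuals into a map distance using the 1% = 1 cM rule mentioned for genetic maps, which is only a reasonable approximation for closely linked markers. The second part merges overlapping reads with a toy greedy strategy to illustrate the idea of assembly. All function names and the example numbers and reads are my own inventions for illustration; this is not what Mapmaker, JoinMap, CAP3, Velvet or MaSuRCA actually do internally.

```python
from __future__ import annotations


def recombination_frequency(recombinants: int, total: int) -> float:
    """Fraction of the progeny in which crossing over separated the two markers."""
    if total <= 0:
        raise ValueError("total must be positive")
    return recombinants / total


def map_distance_cm(rf: float) -> float:
    """Rule of thumb from the text: 1% recombination = 1 cM (small rf only)."""
    return rf * 100.0


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy assembler: keep merging the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    # Made-up example: 12 recombinant individuals in a progeny of 200.
    rf = recombination_frequency(12, 200)
    print(f"{map_distance_cm(rf):.1f} cM")  # prints "6.0 cM"

    # Made-up reads that overlap by three bases each.
    print(greedy_assemble(["ATGCGT", "CGTTAC", "TACGGA"]))  # prints ['ATGCGTTACGGA']
```

Real data are much messier: reads contain sequencing errors and repeats, and real genomes produce millions of reads, so actual assemblers use far more sophisticated graph-based algorithms than this pairwise comparison.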
OK, I must admit that I got a little carried away today. I apologize for the length of the article, but believe me, it is not easy to simplify and summarize such a vast and complex topic. I also apologize to the experts in the field, who will certainly find the topic oversimplified, but since I am trying to reach even those who do not know much about molecular biology, I consider the simplifications necessary.
I invite you to subscribe to the blog if you do not want to miss the next articles, and to leave a comment below. Remember that you can also follow me on Instagram and Twitter.
Bye and see you soon.