BIOINFORMATICAMENTE

Fasta and Furious!

As I told you in a previous article, data is stored in files, and these come in different forms, distinguishable by their extensions. A file's extension is the short suffix we commonly find at the end of its name, right after a dot. Maybe an example will make the idea clearer. Open Word, write a new document and save it as "ciccio"; then search for it on your computer and you will see that the file is called "ciccio.docx". Alternatively, right-click the file ciccio and open Properties: the information about the file type shows the .docx extension. The extension describes the type of file: for example, Word files have the extension .docx, text files .txt, PowerPoint files .pptx, and so on.
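If you prefer to ask the computer directly, here is a small Python sketch that reads the extension from a file name. The helper name `file_extension` is my own invention for this example, and the file names are just placeholders; nothing has to exist on disk.

```python
from pathlib import Path


def file_extension(filename: str) -> str:
    """Return the extension of a file name, including the leading dot.

    An empty string is returned when the name has no extension.
    """
    return Path(filename).suffix


# Placeholder file names used only for illustration.
for name in ["ciccio.docx", "notes.txt", "slides.pptx"]:
    print(name, "->", file_extension(name))
```

The function only looks at the name, so it tells you what the file claims to be, not what it really contains.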

Obviously, file extensions are very useful in bioinformatics too, because they let us understand at a glance what kind of file we have in our hands. A bioinformatician deals with many different types of files, but some are very frequent and I think it is right to say something about them:

  1. The FASTA file, recognizable by the extension .fasta, as well as .fa and .fna. A FASTA file is a text file that contains the sequence of a DNA, an RNA, or a protein that has been sequenced.

The structure of a FASTA file is the following:
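In general terms, each FASTA record is a header line that starts with ">" followed by one or more lines of sequence. To make this concrete, here is a minimal Python sketch that reads such records. The two example records (identifiers and sequences) are made up for illustration, and `read_fasta` is a hypothetical helper of mine, not a function from any library.

```python
from io import StringIO

# Made-up example: the identifiers and sequences are invented for illustration.
EXAMPLE_FASTA = """>seq1 hypothetical example sequence
ATGCGTACGTTAGC
GGCTAACGTA
>seq2 another hypothetical sequence
MKTAYIAKQR
"""


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines of the same record are joined together.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


records = list(read_fasta(StringIO(EXAMPLE_FASTA)))
for header, sequence in records:
    print(header, len(sequence))
```

Note that the sequence of a record can be split over several lines, so the sketch joins them before returning the record.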

Two clarifications must be made regarding FASTA files:

  2. The SRA file, identified by the extension .sra, is a raw file obtained as the first output of next-generation sequencing (NGS) processes and stored in the Sequence Read Archive (SRA) database. SRA files are compressed, so in order to analyze the sequences they contain they must first be decompressed with special tools such as the SRA Toolkit ( https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software ), which outputs a FASTQ file.
  3. The FASTQ file, with the extension .fastq or .fq, is in some respects similar to a FASTA file: it also contains the nucleotide sequence of a nucleic acid sequenced by means of next-generation sequencing (NGS) techniques. Unlike a FASTA file, however, it also carries quality information about the sequence, expressed with ASCII characters, that is, by means of numbers, letters and symbols.

The structure of a FASTQ file is as follows:
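In its common form, each FASTQ record takes four lines: a header that starts with "@", the sequence, a separator line that starts with "+", and a quality string with one ASCII character per base. Here is a minimal Python sketch that reads such records and turns the quality characters into numbers. It assumes simple four-line records and the Phred+33 encoding (quality = ASCII code minus 33), which is widespread but not the only one in use. The read is made up, and `read_fastq` and `phred_scores` are hypothetical helpers of mine.

```python
from io import StringIO

# Made-up read: the identifier, bases and qualities are invented for illustration.
EXAMPLE_FASTQ = """@read1 hypothetical read
GATTACA
+
IIII5#!
"""


def read_fastq(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    lines = [line.rstrip("\n") for line in handle if line.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], sequence, quality


def phred_scores(quality, offset=33):
    """Convert a quality string into a list of integer scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality]


fastq_records = list(read_fastq(StringIO(EXAMPLE_FASTQ)))
for header, sequence, quality in fastq_records:
    print(header, sequence, phred_scores(quality))
```

For the made-up read above, the quality string "IIII5#!" becomes the scores 40, 40, 40, 40, 20, 2, 0: the higher the number, the more reliable the base at that position.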

Well, I think we can say goodbye for today. Of course, the files described above are just some of the file types that exist in the world of bioinformatics; we will meet many others in the next articles.

As always, I invite you to leave a comment and follow this blog.

Bye-bye and see you soon.
