Hello. How are you? I'm a little sad. Recently I had an idea for applying ML models in the bioinformatics field. The idea is good, but moving from an idea to a practical application is always complex, and models often do not learn as we hope. In short, I am feeling a little down, but this happens often to a bioinformatician: it is the bittersweet taste of this work. In any case, to distract myself from the disappointment, I decided to write an article, as always with the aim of sharing with you what I study and explore on my own in my free time.

For me it is always fascinating to think that living beings are ultimately determined by how their genes are expressed. For example, an aubergine plant may have more or fewer thorns on the calyx depending on how strongly a certain gene is expressed. In reality, the path from gene expression to the phenotypic trait it controls is not always so linear. First, not all traits are controlled by a single gene (Mendelian traits); often the same trait is determined by the expression of multiple genes.

In addition, the expression of a gene is subject to several levels of control. Indeed, it is not enough for a gene to be transcribed to determine the phenotypic result. In the image below you can see the main levels of regulation of gene expression: a) chromatin level, b) transcriptional level, c and d) post-transcriptional level, e) translational level and f) post-translational level.

Each level of regulation relies on different mechanisms. For example, after transcription, RNA degradation, RNA capture, alternative splicing and epitranscriptomic mechanisms may act at the post-transcriptional level and affect the fate and efficacy of gene expression. Another aspect that is often underestimated is that these control mechanisms almost never switch gene expression ON or OFF: it is much more like a gradient. I like to imagine the control of gene expression as a slope down which balls roll continuously. During the descent they encounter obstacles and walls of varying thickness and width, which eventually affect the number of balls that reach the valley. In essence, the balls are the gene products: to get downstream, that is, to determine the final phenotype, they must overcome various obstacles and modulation mechanisms, and in the end it is the intensity of gene expression that dictates the phenotype. I hope this creates a good mental image of the process; it helps me.
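To make the "balls rolling down a slope" picture a bit more concrete, here is a tiny numerical sketch in Python. It is only an illustration of the gradient idea: the function name and all the pass fractions are invented, and real regulatory layers are certainly not independent multiplicative filters.

```python
def surviving_output(initial_units, pass_fractions):
    """Return how many 'balls' (gene products) reach the valley.

    Each regulatory layer lets through a fraction between 0 and 1,
    so the result is a gradient rather than an ON/OFF switch.
    """
    remaining = float(initial_units)
    for fraction in pass_fractions:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("each pass fraction must be between 0 and 1")
        remaining *= fraction
    return remaining


# Hypothetical pass fractions for: chromatin, transcription,
# post-transcription, translation, post-translation.
layers_condition_a = [0.9, 0.8, 0.7, 0.9, 0.95]
layers_condition_b = [0.3, 0.8, 0.7, 0.9, 0.95]  # only the chromatin layer differs

output_a = surviving_output(1000, layers_condition_a)
output_b = surviving_output(1000, layers_condition_b)
```

In this toy setting, changing only the most upstream layer (chromatin) scales the final output down without ever switching it off completely, which is the point of the mental image.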

Studying the different mechanisms of gene expression regulation together, in the biological system and under the conditions of interest, is the core of so-called multiomic studies, which offer the opportunity to understand broadly how gene expression shapes the phenotype. Unfortunately, being an expert in all the different levels of gene expression is not easy at all, and it is often not computationally simple to statistically associate all the multiomic data obtained.
To date, in the few years of my career, I have had the opportunity to focus on the study of gene expression mainly by looking at two of the control levels mentioned above:

1) Transcriptional level, by quantifying gene transcription (and therefore only potential expression) through bulk or single-cell RNA-seq assays.

2) Chromatin (or pre-transcriptional) level, where one tries to quantify how accessible chromatin is, at specific points or in general, to the molecular machinery necessary for gene transcription. Since this is a single level of regulation, and the most upstream one, chromatin accessibility also tells us only about potential gene expression. However, it is important to note that combining information on chromatin accessibility with the quantification of gene transcription is already a good way of increasing the accuracy with which actual gene expression is inferred.

In this article, which will be very long, I aim to explore some of the main techniques for studying chromatin accessibility and therefore for investigating the regulation of gene expression at the pre-transcriptional level.

So let’s start with a simple but important question:

Why study the accessibility of chromatin?

As mentioned, for a gene to be transcribed, the transcription machinery must reach the promoter of the gene (leaving aside the substantial differences between eukaryotes and prokaryotes, which are difficult to cover in this context). This machinery consists mainly of RNA polymerases and basal transcription factors, which guide the RNA polymerase in the transcription process by binding to the promoter close to the transcription start site. There are also other, specific transcription factors which, by binding to distal regulatory regions such as enhancers and silencers, can further modulate transcription. The transcription factors that bind enhancers are called activators because they boost transcription. Those that bind silencers, on the other hand, are called repressors because they reduce or limit transcription.

At this point, it is easy to understand how the accessibility of chromatin to the proteins that make up the transcriptional machinery and to specific transcription factors represents a crucial point in gene regulation. Of course, other important mechanisms must also be considered, for example the relative amount of repressors and activators produced by the cell, or the availability of basal transcription factors. In general, however, we can say that an increase in chromatin accessibility at promoters and at regulatory elements such as enhancers may facilitate greater transcriptional activity, provided this is accompanied by a favorable molecular context (high presence of activators, low presence of repressors, etc.).
There are also other factors that need to be mentioned, namely the molecular bridges called mediators. Mediators are multiprotein complexes that play a key role in gene transcription in eukaryotes. In particular, they mediate interactions between specific transcription factors (such as the activators or repressors that bind enhancers and silencers) and the basal transcription machinery (RNA polymerase II and general transcription factors).


A Focus on Genome Organization within the Nucleus

In general, chromatin can, depending on its three-dimensional organization, appear in the form of Euchromatin (looser and more accessible) or Heterochromatin (more condensed and less accessible). Depending on the spatial and 3D organization of chromatin, we observe different behaviors of transcription and therefore gene expression. We can thus take home a key message: variations in the three-dimensional organization of the genome can be responsible for differences in gene expression and consequently in phenotype determination.

I believe the best way to understand the three-dimensional organization of DNA in the nucleus is to imagine observing it with progressively more powerful lenses, so follow me on this imaginary journey:

(1) Lens with a resolution of about 100 Mbp
At this resolution, relatively well-defined clusters appear in the nucleus, known as Chromosome Territories.

Be careful! Each chromosome defines a specific territory during interphase.

(2) Lens with a resolution of about 1 Mbp
At this resolution, two distinct compartments can be seen within each chromosome territory:

Compartment A, where chromatin is looser (euchromatin) and therefore transcriptionally active because it is accessible to the molecular machinery for transcription. Genes found here are generally active.

Compartment B, where chromatin is more condensed (heterochromatin), and thus genes are not active because it is difficult for the molecular machinery to access it for transcription.

Important:
Transitions from one compartment state to another are well documented and are fundamental for pre-transcriptional control of gene expression.

(3) Lens with a resolution of 100 Kbp - 1 Mbp
At this resolution, we can identify structures called Topologically Associating Domains (TADs). A TAD is nothing more than a chromatin region whose parts tend to interact—and therefore come into contact—more frequently compared to external regions. These interactions between chromatin regions within the TAD are mediated by effector proteins such as mediators, activators, and repressors, allowing for specific regulation of gene transcription. For example, it is within these TADs that we observe loops that bring enhancers and promoters, or silencers and promoters, into proximity, enabling their interaction. Obviously, TADs are not only distinguishable by their internal interaction frequency, but also by the fact that they are distinct and spatially separated by molecular barriers called boundary regions, which are enriched with CTCF proteins and cohesin complexes.
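The definition of a TAD as a region whose parts contact each other more often than they contact outside regions can be sketched with a toy contact matrix. The Python snippet below is a minimal illustration, not a TAD caller: the counts, the bin assignments and the helper name mean_contact are all invented for the example.

```python
def mean_contact(matrix, rows, cols):
    """Average contact count over the given bin pairs (self-pairs excluded)."""
    values = [matrix[i][j] for i in rows for j in cols if i != j]
    return sum(values) / len(values)


# Toy symmetric contact matrix for 6 genomic bins (invented counts).
# Bins 0-2 form one hypothetical TAD, bins 3-5 another.
contacts = [
    [0, 9, 8, 1, 1, 0],
    [9, 0, 9, 2, 1, 1],
    [8, 9, 0, 2, 2, 1],
    [1, 2, 2, 0, 9, 8],
    [1, 1, 2, 9, 0, 9],
    [0, 1, 1, 8, 9, 0],
]
tad_1 = [0, 1, 2]
tad_2 = [3, 4, 5]

within_tad_1 = mean_contact(contacts, tad_1, tad_1)
within_tad_2 = mean_contact(contacts, tad_2, tad_2)
between_tads = mean_contact(contacts, tad_1, tad_2)
```

With these invented numbers the average contact within each domain is several times higher than the average contact between the two domains, which is the kind of contrast that TAD-calling tools look for in real Hi-C matrices.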

These boundaries serve to separate the chromatin regions of different TADs, which is extremely important because they prevent enhancers from one TAD from interacting with promoters of another. It might seem trivial, but know that DNA mutations, such as translocations, can move or eliminate these boundaries and thus lead to aberrant interactions between TADs, causing significant alterations in gene transcription, such as those of oncogenes. Imagine that an SNP or other variation causes the loss of the CTCF binding site and therefore the loss of a specific boundary between two TADs. This event could allow the enhancer of one TAD to interact with the promoter of an oncogene that is normally not transcribed, leading to disease onset.

Examples of TAD alterations include:

  • Deletion or displacement of TAD boundaries (Boundary Disruption)

  • Fusion between adjacent TADs (TAD merging)

  • Genomic inversions and translocations that move genes into active TADs

Focus on CTCF

  • CTCF proteins bind to specific DNA motifs with a consensus sequence of about 15–20 bp, characterized by multiple C and G repeats and a central conserved sequence CCCTC.

  • CTCF has 11 zinc finger domains that allow it to specifically recognize and bind DNA. Several factors can influence the binding between CTCF and DNA, such as:
    a) DNA methylation: If the cytosines in the CpG sites of the CTCF consensus sequence are methylated, the binding between the protein and DNA is generally inhibited. This also explains how DNA methylation at cytosines is an important pre-transcriptional regulatory mechanism of gene expression.
    b) Histone modifications: The binding between CTCF and DNA is influenced by histone modifications, as they regulate chromatin compaction and thus the possible recruitment of CTCF. For example, H3K9 methylation or H3K27 acetylation can inhibit or favor CTCF recruitment, respectively, depending on how they change local compaction.
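As a rough illustration of what "searching for a CTCF motif" means, the Python sketch below looks for exact matches of the conserved CCCTC core on both DNA strands. This is a deliberate simplification: the real motif is longer and degenerate, and in practice it is usually scored with a position weight matrix. The sequence and the function names are invented for the example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(sequence):
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def find_core_sites(sequence, core="CCCTC"):
    """List (position, strand) for every occurrence of the core on either strand.

    Positions are 0-based and refer to the forward strand.
    """
    sequence = sequence.upper()
    reverse_core = reverse_complement(core)
    hits = []
    for start in range(len(sequence) - len(core) + 1):
        window = sequence[start:start + len(core)]
        if window == core:
            hits.append((start, "+"))
        if window == reverse_core:
            hits.append((start, "-"))
    return hits


toy_sequence = "ATGCCCTCTAGGAGGGTTA"  # invented sequence
sites = find_core_sites(toy_sequence)
```

Here sites contains one hit on the forward strand and one on the reverse strand (where the core appears as GAGGG). An exact-match scan like this would miss most real binding sites, so it should be read only as a picture of the idea.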

(4) Lens with a resolution of 10 Kbp - 100 Kbp

At this point, we can distinguish the Chromatin Nano Domains (CNDs), the true functional units within TADs. These are the individual interaction loops between enhancers and promoters or between silencers and promoters. We can say that today, this represents the finest level of chromatin investigation.

Here is a summary image of how the genome is organized in three dimensions within the nucleus.


How is chromatin accessibility regulated?

Now that we understand why studying chromatin accessibility is important and how it is organized in three dimensions, a natural question arises: how does the cell regulate gene expression by modulating chromatin accessibility? One could write an entire book answering this single question, but to simplify, we can say that cells primarily regulate chromatin accessibility through:

  • Covalent histone modifications

  • Chromatin remodeling via ATP-dependent complexes

  • Direct DNA methylation

Histone Modifications:

Histone tails can undergo chemical modifications that affect chromatin accessibility. The main types of histone modifications include:

  • Histone Acetylation:
    Enzymes involved:
    (a) HATs (Histone Acetyltransferases) → promote chromatin opening (transcription ON)
    (b) HDACs (Histone Deacetylases) → promote chromatin closing (transcription OFF)
    Biological effect:
    Acetylated histones → open chromatin → increased DNA accessibility

  • Histone Methylation:
    Enzymes involved:
    (a) Histone Methyltransferases (HMTs) → add methyl groups to histones (repression or activation depending on the site)
    (b) Histone Demethylases (KDMs) → remove methyl groups
    Biological effect:
    (a) H3K4 methylation → transcriptional activation
    (b) H3K9 or H3K27 methylation → transcriptional repression

  • Phosphorylation, Ubiquitination, SUMOylation of Histones:
    These additional modifications also contribute to the fine-tuning of chromatin structure and gene regulation, often in a context-dependent manner.

ATP-dependent Chromatin Remodeling Complexes

These are protein complexes that use ATP energy to move, reposition, or remove nucleosomes. Specifically, these complexes render DNA more or less accessible by directly modifying chromatin structure, thus regulating gene expression. The major protein families include:

(a) SWI/SNF: opens chromatin (transcriptional activation)

(b) ISWI: can compact or decompact chromatin depending on the context

(c) CHD/Mi-2: often involved in transcriptional repression

(d) INO80: involved in DNA repair and transcriptional regulation



Direct DNA Methylation

This is a chemical modification of DNA, usually occurring at cytosine bases within CpG dinucleotides.

Enzymes involved:
(a) DNA Methyltransferases (DNMTs): add methyl groups → gene repression
(b) TET Demethylases: remove methyl groups → gene reactivation

Biological effect:
Methylated DNA → more compact chromatin → transcription OFF

How is chromatin studied?

So far, so good? I hope so. At this point, we can begin discussing how chromatin is studied in order to gain insights into gene regulation at the pre-transcriptional level—and, in some cases, also into structural modifications, whether small-scale (such as nucleosome positioning) or large-scale (such as chromosomal domains).

To be honest, providing a comprehensive overview of all available techniques in a single article would be nearly impossible—and, frankly, goes beyond the scope of this piece. The world of chromatin profiling technologies is vast and sometimes overwhelming. Below, I offer a personal and non-exhaustive classification of the major methods currently available for studying chromatin.

Here's what I propose: I’ll briefly introduce each category in this diagram. Then, in upcoming articles, I promise to delve deeper—with code examples—into techniques like ATAC-seq, ChIP-seq, Hi-C, and HiChIP. Sounds like a deal? All I ask in return is that you subscribe to the newsletter and leave a comment to help build a stronger scientific community around these topics!

Microscopy-based methods

Some techniques for investigating chromatin conformation let you study the three-dimensional arrangement of specific DNA sequences within the nucleus by looking at them directly under a microscope.

Among these techniques, FISH (Fluorescence In Situ Hybridization) stands out. It is based on the hybridization of fluorescent probes that are complementary and specific to the DNA sequences of interest. These probes bind in situ (that is, inside the cell, on fixed preparations) and, thanks to their fluorescence, allow us to observe under a microscope where the region under investigation is located in the nucleus, and thus to study:

- The nuclear position of specific genes or loci.
Knowing the nuclear position is biologically important because positioning in the nucleus is not random but is fundamental for regulation. In particular:
(a) At the center of the nucleus we find transcriptionally permissive regions, where chromatin is generally open (euchromatin) and genes are generally transcribed. These zones are also rich in RNA polymerase and splicing factors.

(b) Close to nuclear speckles and nucleoli we find very transcriptionally active regions, where highly expressed genes are often located.

(c) At the nuclear periphery, near the nuclear lamina, and in B compartments in general, we find heterochromatin regions whose genes tend not to be transcribed.

- 3D interactions between genomic regions.
For example, the colocalization of enhancers and gene promoters suggests that they interact and therefore influence expression.

- Large chromosomal variations such as translocations, deletions, duplications and so on.
The big limitation is that, since FISH is a targeted method, the variation must be known a priori. Therefore, this method cannot be used to identify new chromosomal variations, only to validate and observe them.

- Changes in chromatin topology under different biological conditions.
For example, it allows us to study the topological changes during cell development.

Crosslinking-based methods

Crosslinking-based chromatin conformation methods are experimental techniques that use chemical crosslinking (typically with formaldehyde) to fix spatial proximities between DNA regions and/or protein-DNA complexes in the nucleus, preserving the three-dimensional chromatin architecture at the time of fixation. These proximities are then detected through sequencing-based approaches.

A common classification in the literature

1. Ligation-based methods
These rely on ligation of DNA fragments that are spatially proximal after crosslinking.
Examples:

  • Hi-C (Genome-wide)

  • 4C, 5C, Capture-C (Targeted)

  • ChIA-PET (Targeted using immunoprecipitation)

  • PLAC-seq, HiChIP
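To give an idea of what the output of a ligation-based experiment looks like computationally, the Python sketch below bins read pairs into a contact matrix. It assumes that each ligation product has already been mapped to two positions on the same toy region; the coordinates, the bin size and the function name are invented, and real pipelines add filtering and normalization steps that are not shown here.

```python
def build_contact_matrix(read_pairs, region_length, bin_size):
    """Count ligation read pairs per pair of genomic bins (symmetric matrix)."""
    n_bins = (region_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for position_a, position_b in read_pairs:
        bin_a = position_a // bin_size
        bin_b = position_b // bin_size
        matrix[bin_a][bin_b] += 1
        if bin_a != bin_b:
            matrix[bin_b][bin_a] += 1  # keep the matrix symmetric
    return matrix


# Invented read pairs: each tuple holds the two mapped positions (in bp)
# of the fragments joined by proximity ligation.
toy_pairs = [(1_000, 48_000), (2_500, 45_000), (12_000, 13_500), (30_000, 31_000)]
toy_matrix = build_contact_matrix(toy_pairs, region_length=50_000, bin_size=10_000)
```

Each cell of toy_matrix counts how many ligation events connect two bins, so distant bins that are often ligated together (here the first and the last) stand out as candidate long-range contacts.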

2. Ligation-free methods
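These do not involve ligation; instead they detect spatial proximity in other ways, for example by sequencing the DNA contained in thin nuclear sections (GAM) or by giving a shared barcode to DNA fragments that belong to the same crosslinked complex (SPRITE, ChIA-Drop).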

Examples:

  • GAM (Genome Architecture Mapping)

  • SPRITE (Split-Pool Recognition of Interactions by Tag Extension)

  • ChIA-Drop

3. Targeted methods
These can fall under either of the two previous categories but are focused on specific regions or loci.

Examples:

  • Capture-C

  • HiChIP, PLAC-seq

  • 4C (captures all interactions of a specific locus)

4. Multimodal methods
These integrate 3D chromatin conformation data with other modalities such as gene expression, epigenetic marks, or accessibility.

Examples:

  • scN3-C, Paired-Tag, SNARE-seq, etc.

Chromatin accessibility-based methods

Chromatin accessibility methods are experimental techniques used to identify regions of the genome that are open, or accessible, to regulatory proteins such as transcription factors. These accessible regions often correspond to active promoters, enhancers, and other regulatory elements. Such methods exploit the fact that accessible chromatin is less compacted and more susceptible to enzymatic cleavage, transposition, or chemical labeling.

Major Classes of Chromatin Accessibility Methods

1. Enzyme-Based Cleavage Methods
These methods use enzymes that preferentially cut nucleosome-free regions.

a. DNase I hypersensitivity assays (DNase-seq)

  • Uses DNase I to digest accessible chromatin.

  • Sequencing of cut sites reveals DNase Hypersensitive Sites (DHS).

  • High resolution; labor-intensive.

b. MNase-seq (Micrococcal Nuclease)

  • Uses MNase, which cuts linker DNA between nucleosomes.

  • Provides a map of nucleosome positioning, indirectly reflecting accessibility.

2. Transposase-Based Methods
ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing): uses the Tn5 transposase to insert sequencing adapters into open regions. It is rapid and highly sensitive, and it is now the most widely used method thanks to its simplicity and resolution.
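Since Tn5 inserts adapters preferentially into open regions, accessible chromatin shows up as a local pile-up of insertion sites. The Python sketch below counts insertions in fixed windows and keeps the windows above a threshold. It is a toy stand-in for what dedicated peak callers do with proper statistics; the coordinates, window size, threshold and function name are all assumptions made for the example.

```python
from collections import Counter


def accessible_windows(insertion_sites, window_size, min_insertions):
    """Return start coordinates of windows with at least min_insertions cuts."""
    counts = Counter(site // window_size for site in insertion_sites)
    return sorted(
        window * window_size
        for window, count in counts.items()
        if count >= min_insertions
    )


# Invented Tn5 insertion coordinates (bp) along a toy region.
toy_insertions = [105, 130, 142, 160, 188, 950, 2010, 2040, 2055, 2090]
open_windows = accessible_windows(toy_insertions, window_size=200, min_insertions=4)
```

With these toy coordinates, the windows starting at 0 and at 2000 pass the threshold, while the isolated insertion at position 950 is treated as background.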

3. Chemical Labeling Methods
FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements): less commonly used now. It relies on formaldehyde crosslinking followed by sonication; open chromatin regions are less efficiently crosslinked and are therefore preferentially recovered.

4. Single-Cell Chromatin Accessibility Methods
Provide accessibility data at single-cell resolution.

Examples:

  • scATAC-seq (single-cell ATAC-seq)

  • sci-ATAC-seq, 10x Genomics scATAC, SNARE-seq (multi-omic: chromatin + transcriptome)

What Do These Methods Reveal?

a) Regulatory landscapes: location of enhancers, promoters, insulators

b) Cell type-specific regulation: especially in heterogeneous tissues

c) Epigenomic changes: in disease, development, or response to stimuli

DNA–protein interaction methods

DNA–protein interaction methods are experimental techniques used to identify specific regions of the genome that interact with DNA-binding proteins, such as transcription factors, histones, or chromatin remodelers. These methods help map where proteins bind on the DNA and infer their regulatory roles.

The most well-known technique in this category is ChIP-seq (Chromatin Immunoprecipitation followed by sequencing).

Main DNA–Protein Interaction Methods

1. ChIP-seq (Chromatin Immunoprecipitation followed by sequencing)

  • Gold standard for mapping DNA-binding proteins or histone modifications.

  • Requires a high-quality, specific antibody.

  • Can be applied to:

(a) Transcription factors (e.g., CTCF, p53)

(b) Histone modifications (e.g., H3K4me3, H3K27ac)

(c) Chromatin-associated proteins

  • Input: Typically requires millions of cells, though low-input and single-cell versions are emerging.

2. CUT&RUN (Cleavage Under Targets and Release Using Nuclease)

  • Uses protein A/G fused to micrococcal nuclease (MNase).

  • Tethers MNase to the antibody-bound protein → cleaves nearby DNA in situ.

  • No crosslinking, lower background, works well with fewer cells.

3. CUT&Tag (Cleavage Under Targets and Tagmentation)

  • Uses Tn5 transposase fused to protein A.

  • Inserts sequencing adapters at binding sites directly → ready for PCR & sequencing.

  • Very low input (even single cells).

  • Ideal for profiling histone marks and some transcription factors.

Note:

Protein A is a bacterial protein (from Staphylococcus aureus) that binds specifically to the Fc region of immunoglobulin G (IgG) antibodies, especially those from rabbits and humans.

It is often used in biochemical assays as a universal antibody-binding adaptor.

4. ChIA-PET (Chromatin Interaction Analysis by Paired-End Tag sequencing)

  • Combines ChIP with 3D chromatin interaction capture.

  • Captures long-range interactions mediated by a specific protein (e.g., CTCF, RNA Pol II).

  • Requires large amounts of material.

What Do These Methods Reveal?

(a) Where proteins bind in the genome.

(b) Which regulatory elements (enhancers, promoters) are active or repressed.

(c) The chromatin state (e.g., active, poised, silenced).

(d) Long-range protein-mediated chromatin interactions (ChIA-PET).
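The state labels listed above (active, poised, silenced) can be illustrated with a very coarse rule-based sketch in Python. The rules are a simplifying assumption made for this example, in particular reading the co-occurrence of activating and repressive marks as "poised"; real chromatin-state annotation relies on many more marks and on statistical models. All names in the snippet are hypothetical.

```python
ACTIVE_MARKS = {"H3K4me3", "H3K27ac"}
REPRESSIVE_MARKS = {"H3K27me3", "H3K9me3"}


def chromatin_state(marks_at_region):
    """Assign a coarse state label from the set of marks detected at a region."""
    marks = set(marks_at_region)
    has_active = bool(marks & ACTIVE_MARKS)
    has_repressive = bool(marks & REPRESSIVE_MARKS)
    if has_active and has_repressive:
        return "poised"  # assumption: co-occurring marks are read as a poised state
    if has_active:
        return "active"
    if has_repressive:
        return "silenced"
    return "unclassified"


# Invented example regions with the marks detected at each of them.
toy_regions = {
    "promoter_1": ["H3K4me3", "H3K27ac"],
    "promoter_2": ["H3K4me3", "H3K27me3"],
    "region_3": ["H3K9me3"],
}
toy_states = {name: chromatin_state(marks) for name, marks in toy_regions.items()}
```

The value of a sketch like this is only to show that the state of a region is inferred from combinations of marks, not from a single mark in isolation.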

Epigenetic Modification Methods

Epigenetic modification methods are techniques used to detect and map chemical modifications to DNA and histones — such as DNA methylation and histone modifications — which play critical roles in regulating chromatin structure and gene expression.

These modifications can influence chromatin conformation by promoting either compaction (heterochromatin) or relaxation (euchromatin).

Modification | Effect on chromatin | Associated structure
H3K27me3, H3K9me3 | Compacts chromatin | Heterochromatin
H3K27ac, H3K4me3 | Loosens chromatin | Euchromatin
DNA methylation | Represses transcription | Often compacted
Histone acetylation | Opens chromatin | Active regions

These marks are read by structural proteins (e.g., HP1, Polycomb, CTCF), which further influence TADs, loop formation, and compartmentalization.

Main Classes of Epigenetic Modification Methods

1. DNA Methylation Profiling

a. Bisulfite Sequencing (BS-seq)

  • Converts unmethylated cytosines to uracil (read as thymine), while methylated cytosines remain unchanged.

  • Can be whole-genome (WGBS) or reduced-representation (RRBS).

  • Reveals CpG methylation patterns.

b. OxBS-seq and TAB-seq

  • Differentiate between 5-mC and 5-hmC.

  • Provide insight into dynamic methylation and hydroxymethylation states.

c. Nanopore or PacBio sequencing

  • Directly detect methylation without bisulfite treatment, using shifts in electric signal or polymerase kinetics.

DNA methylation often correlates with transcriptional repression and heterochromatin formation.
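The logic of bisulfite sequencing described above (unmethylated C is read as T, methylated C stays C) can be sketched in a few lines of Python. This is a toy example under strong assumptions: a single read that aligns to the forward strand without gaps, and invented sequences and function names. Real data need a bisulfite-aware aligner and many reads per site.

```python
def call_methylation(reference, bisulfite_read):
    """Return {position: True/False} for each reference cytosine covered by the read.

    True  -> still read as C after conversion (methylated).
    False -> read as T after conversion (unmethylated).
    Assumes the read aligns to the forward strand without gaps.
    """
    if len(reference) != len(bisulfite_read):
        raise ValueError("reference and read must have the same length")
    calls = {}
    for position, (ref_base, read_base) in enumerate(zip(reference, bisulfite_read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[position] = True
        elif read_base == "T":
            calls[position] = False
        # any other base is treated as a sequencing error and skipped
    return calls


toy_reference = "ACGTCGACCG"
toy_read = "ACGTTGATCG"  # invented read after bisulfite conversion
methylation_calls = call_methylation(toy_reference, toy_read)
methylated_fraction = sum(methylation_calls.values()) / len(methylation_calls)
```

In this toy read, two of the four reference cytosines are still read as C, so half of the covered cytosines would be called methylated.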

2. Histone Modification Mapping
a. ChIP-seq (as previously described)
Maps histone post-translational modifications (PTMs) such as:

  • H3K27ac (active enhancers)

  • H3K4me3 (active promoters)

  • H3K27me3 (Polycomb-mediated repression)

  • H3K9me3 (heterochromatin marker)

b. CUT&RUN / CUT&Tag

  • Higher resolution, lower background, and suitable for low-input samples.

  • Especially useful for mapping histone modifications in specific cell types or single cells.

Conclusion

Well, I think you have noticed that the techniques available for studying pre-transcriptional regulation are numerous, complex and often multifunctional. In short, finding one's way among them is a real skill that I, at least for now, do not have. So with this article I tried to create a mental and logical map that I hope will help you. I remind you to leave comments, thoughts and, why not, corrections too, since what I write is the result of my personal study and I may sometimes misunderstand or misreport concepts learned from books and online resources.

See you soon, and happy bioinformatics to all.

Omar Almolla

