Analyzing scATAC-seq data with the Signac toolkit yields many kinds of results. Among these, I find TF footprint analysis extremely useful because it lets us infer which transcription factors are likely to be active without having to run a separate ChIP-seq experiment for each one. ChIP-seq can then be used to validate the specific transcription factors that the footprint analysis suggests may be active in certain samples or conditions. However, interpreting footprint plots can be challenging, so today I want to clarify how to read these results. But first, let’s review some biological and computational context.
Biological Context:
scATAC-seq data consist of DNA fragments generated by cuts from the Tn5 transposase. This enzyme tends to cut in accessible regions of the chromatin, where the DNA is not compacted or bound by proteins like histones. When a transcription factor (TF) binds to a specific DNA region, that region becomes less accessible to Tn5, as the TF physically occupies the binding site and shields it from cuts.


Computational Context:
The Signac package, used for scATAC-seq data analysis, includes the Footprint() function, which performs TF footprint analysis for motifs of interest, typically those enriched in differentially accessible peaks in a particular comparison. The PlotFootprint() function then generates a footprint plot for each transcription factor whose motif was analyzed. Below is an example of a TF footprint plot to help familiarize you with the meaning of these results.

Interpreting the Plot:
As mentioned, the TF footprint plot is highly informative, but interpreting it can be complex. Below, I share what I’ve understood.
Let’s take the plot above as an example. On the X-axis, we have the position relative to the motif to which the transcription factor binds. Position 0 indicates the center of the binding site, while the values to the left and right indicate the distance from it, extending up to 200 bases on each side. The figure has two panels that share this X-axis:
- Top plot: Represents the enrichment of Tn5 cutting frequency around the motif in question. This enrichment reflects DNA accessibility.
- Bottom plot: Represents the expected Tn5 cutting profile, known as the background. It shows the level of cutting expected from Tn5’s own sequence preference, in the absence of any TF effect.
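To make the X-axis concrete, here is a minimal Python sketch of how a per-position cut profile around a motif could be built. It is a toy illustration, not Signac's implementation: the function name `aggregate_cut_profile` and its inputs are hypothetical, all coordinates are assumed to lie on a single chromosome, and strand orientation of the motif is ignored.

```python
from bisect import bisect_left, bisect_right


def aggregate_cut_profile(cut_positions, motif_centers, flank=200):
    """Count Tn5 cuts at each offset (-flank..+flank) around every motif center.

    cut_positions: genomic coordinates of Tn5 cut sites (one chromosome).
    motif_centers: genomic coordinates of the motif instance centers.
    Returns a list of length 2 * flank + 1; index `flank` is position 0.
    """
    cuts = sorted(cut_positions)
    profile = [0] * (2 * flank + 1)
    for center in motif_centers:
        # Locate the slice of cuts that falls inside this motif's window.
        lo = bisect_left(cuts, center - flank)
        hi = bisect_right(cuts, center + flank)
        for pos in cuts[lo:hi]:
            profile[pos - center + flank] += 1
    return profile


# Hypothetical toy data: two motif instances with a few cuts around each.
toy_profile = aggregate_cut_profile(
    cut_positions=[95, 100, 105, 300, 310],
    motif_centers=[100, 305],
    flank=10,
)
```

Summing over many motif instances is what makes the pattern visible: a single site has far too few cuts to show a footprint on its own.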
Interpreting the Top Plot (Experiment plot)
In the top plot, we observe an increase in cutting frequency as we approach the binding site of the transcription factor. This frequency peaks near the site and then drops sharply at position 0, where the TF binds to the DNA. This inverted “V” pattern, with peaks on the sides and a valley in the center, indicates that the TF physically shields the DNA at the binding site (valley) but makes the adjacent areas more accessible due to chromatin remodeling, causing Tn5 to cut more frequently in the immediate vicinity of the binding site.
Question: Why does Tn5 cutting increase as we get closer to the TF binding site?
This is a valid question. If Tn5 tends to cut where the DNA is free, we might expect the cutting frequency to decrease as we get closer to the occupied binding site. However, the opposite is observed, and this can be explained by the fact that, when a TF binds to DNA, it often recruits chromatin remodeling complexes that make the regions immediately adjacent to the binding site more accessible. This increases the cutting frequency near the site, creating the characteristic inverted “V” pattern.
Interpreting the Bottom Plot (Background)
The bottom plot represents the expected level of Tn5 cutting in that region, assuming no TF is bound to interfere with the cutting. Tn5 has its own sequence preferences, so some positions are cut more often than others regardless of TF binding. This plot serves as the “background” and helps to correct for such biases in the observed cutting frequency, highlighting the actual signal relative to what would be expected in the absence of active TFs.
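The idea of correcting the observed signal with the background can be sketched as follows. This is a conceptual toy under my own assumptions, not the code Signac runs: the function name `correct_for_background` is hypothetical, and subtraction and division are shown simply as two common ways to compare observed with expected values.

```python
def correct_for_background(observed, expected, method="subtract"):
    """Compare an observed cut profile with its expected (background) profile.

    Both inputs are per-position values over the same window.
    method="subtract" returns observed - expected;
    method="divide" returns observed / expected (NaN where expected is 0).
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if method == "subtract":
        return [o - e for o, e in zip(observed, expected)]
    if method == "divide":
        return [o / e if e != 0 else float("nan")
                for o, e in zip(observed, expected)]
    raise ValueError("method must be 'subtract' or 'divide'")


# Hypothetical values: the middle position is cut more than expected.
observed = [2.0, 4.0, 1.0]
expected = [1.0, 2.0, 1.0]
difference = correct_for_background(observed, expected, method="subtract")
ratio = correct_for_background(observed, expected, method="divide")
```

Either way, a position only stands out if the observed value departs from what Tn5 bias alone would predict.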
Importance of Comparing with the Background
Comparing the observed cutting pattern with the background is crucial to determine if a footprint is truly due to the presence of a TF or if it’s simply a random effect or technical artifact. To conclude that a cutting pattern is not actually due to the TF, one should observe:
- A flat or homogeneous distribution in the experimental data, similar to the background, suggesting a lack of TF activity in that region.
- No significant enrichment at the sides of the binding site compared to the background.
- Absence of a central valley in the observed cutting frequency plot.
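The criteria above can be turned into two simple numbers. The sketch below is only an illustration: the function names, the window sizes (`core_half_width`, `shoulder_width`) and the thresholds are hypothetical choices of mine, not values used by Signac, and a visual check of the plot remains necessary.

```python
def summarize_footprint(profile, core_half_width=5, shoulder_width=20):
    """Summarize a background-corrected cut profile centered on a motif.

    profile: values for offsets -flank..+flank (odd length).
    Returns flank enrichment (shoulders vs. distal region) and
    central depth (shoulders vs. motif core).
    """
    if len(profile) % 2 == 0:
        raise ValueError("profile length must be odd (offsets -flank..+flank)")
    flank = len(profile) // 2
    if core_half_width + shoulder_width >= flank:
        raise ValueError("window too short for the chosen core and shoulders")

    core, shoulder, distal = [], [], []
    for index, value in enumerate(profile):
        distance = abs(index - flank)  # distance from position 0
        if distance <= core_half_width:
            core.append(value)
        elif distance <= core_half_width + shoulder_width:
            shoulder.append(value)
        else:
            distal.append(value)

    def mean(values):
        return sum(values) / len(values)

    return {
        "flank_enrichment": mean(shoulder) - mean(distal),
        "central_depth": mean(shoulder) - mean(core),
    }


def looks_like_footprint(summary, min_enrichment=0.5, min_depth=0.5):
    """Require both enriched shoulders and a central valley (toy thresholds)."""
    return (summary["flank_enrichment"] >= min_enrichment
            and summary["central_depth"] >= min_depth)
```

A flat profile gives values near zero for both numbers, which corresponds to the "no footprint" case described in the list.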
Here is an example of a TF that is not active in either sample/condition:

In the Context of scATAC-seq Data Analysis
When evaluating a TF footprint plot, it’s essential to compare Tn5 cutting patterns across the samples or conditions of interest, since differences between their profiles point to potential differences in TF activity.
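When groups differ in cell number or sequencing depth, their raw cut counts are not directly comparable. One simple way to put them on the same scale, shown here as a hedged sketch, is to divide each group's profile by the mean of its outermost positions. The function names and the `edge_width` parameter are hypothetical, and this is not presented as what Signac does internally.

```python
def scale_to_flanks(profile, edge_width=50):
    """Divide a profile by the mean signal at its outer edges (both sides)."""
    if len(profile) < 2 * edge_width:
        raise ValueError("profile is shorter than the two edge windows")
    edges = profile[:edge_width] + profile[-edge_width:]
    baseline = sum(edges) / len(edges)
    if baseline == 0:
        raise ValueError("edge signal is zero; cannot scale")
    return [value / baseline for value in profile]


def scale_groups(profiles_by_group, edge_width=50):
    """Apply the same edge scaling to every group's profile."""
    return {group: scale_to_flanks(profile, edge_width)
            for group, profile in profiles_by_group.items()}


# Hypothetical toy profiles: group "B" has 5x more cuts overall but is flat.
scaled = scale_groups(
    {
        "A": [2.0] * 5 + [6.0] * 3 + [2.0] * 5,
        "B": [10.0] * 13,
    },
    edge_width=5,
)
```

After scaling, group "A" shows a threefold rise over its edges while group "B" stays at 1, even though "B" had the higher raw counts.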

Let’s work through another practical example. Below is how I interpreted the following plot:

Examining the plot above, we can make some observations to determine whether the transcription factor (TF) ZNF384 shows a significant footprint pattern, which would indicate its activity in each of the two groups (01 and 02).
Observations on the Plot
- Behavior of Experimental Data (Groups 01 and 02):
  - The experimental data show a different pattern between the two groups.
  - In Group 01 (red line), there is a marked peak around position 0, which represents the binding site, followed by a gradual decline.
  - In Group 02 (blue line), the enrichment is more uniform and does not show the same distinct peak near the binding site, but rather a flatter distribution, especially around position 0.
- Comparison with the Background:
  - The background (bottom plot) shows a relatively flat expectation, except for a valley at the central motif (around position 0). This is normal: the background reflects Tn5’s sequence preference, and the motif sequence itself can be less favorable for cutting, so a dip here does not by itself indicate TF protection.
  - The background line does not show significant peaks in the areas adjacent to the binding site.
- Footprint Pattern between Groups:
  - In Group 01, the peak near position 0 (to the left of the binding site) and the increase in enrichment seem to indicate a pattern that suggests TF activity. This trend differs from the background, especially around the inverted "V," indicating a potential footprint.
  - In Group 02, however, the distribution of cuts is more similar to the background. There is no defined peak near the binding site, and the distribution is flat and homogeneous. This indicates that ZNF384 may not be active, or is at least less active, in this group, as the cut profile does not show any significant enrichment around the binding site or central protection.
Footprint Interpretation
- For Group 01 (red line): The profile shows a peak around the binding site and a reduction closer to the center, suggesting a footprint pattern that could indicate activity of the TF ZNF384. This profile is different from the background, suggesting that ZNF384 may be active under this condition.
- For Group 02 (blue line): The profile is flat and similar to the background, without a clear footprint pattern. This suggests that ZNF384 may not be active in this group, or, if present, it is not significantly affecting DNA accessibility.
In summary:
- ZNF384 is likely active in Group 01, where the cut pattern suggests a footprint, with enrichment around the binding site and some central protection.
- ZNF384 appears to be less active in Group 02, as the cut profile is flat and similar to the background, without an evident footprint pattern.
References: