HVG selection diagnostics

Will Macnair - Computational Sciences Center of Excellence, F Hoffmann-La Roche Ltd, Basel, Switzerland

March 23, 2026

Selecting highly variable genes (HVGs) is a crucial step both for integration and for identifying clusters in single-cell experiments. However, contamination from ambient RNA can be a significant confounding factor. This report provides plots for evaluating how ambient RNA influences HVG selection and how it might affect downstream analyses.

Are highly variable genes “ambient”?

The plot shows the extent to which HVGs identified using the Seurat VST method may be influenced by ambient RNA. The x-axis shows the log2 fold change (log2fc) in empty droplets vs cells, derived from the ambient gene estimation step in scprocess (see the scprocess documentation for more details), while the y-axis shows the trend-normalized variance calculated using the Seurat VST method. Each point represents a gene, annotated according to whether it is among the top HVGs and whether it is identified as “ambient” by the ambient gene detection step. The labelled genes are the top 20 by mean variance, including genes that were excluded from the HVGs because of high expression in empty droplets.

print(plot_hvg_stats_vs_empty_log2fc(hvgs_dt, edger_dt))
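To make the y-axis more concrete, the sketch below shows one way a VST-style trend-normalized variance can be computed in base R. It is not the code used by Seurat or scprocess: the simulated counts and all variable names are hypothetical, and the loess span of 0.3 and the clipping at the square root of the number of cells are taken as assumptions based on Seurat's documented defaults.

```r
# Sketch of a VST-style trend-normalized variance in base R (illustrative only).
set.seed(1)
n_genes = 200
n_cells = 300

# hypothetical counts: one Poisson mean per gene, genes in rows, cells in columns
mu      = rexp(n_genes, rate = 1) + 0.1
counts  = matrix(rpois(n_genes * n_cells, lambda = rep(mu, times = n_cells)),
  nrow = n_genes, ncol = n_cells)

# observed per-gene mean and variance; drop constant genes before taking logs
gene_mean = rowMeans(counts)
gene_var  = apply(counts, 1, var)
keep      = gene_var > 0
trend_df  = data.frame(
  log_mean = log10(gene_mean[keep]),
  log_var  = log10(gene_var[keep])
)

# mean-variance trend: loess fit of log10(variance) on log10(mean)
fit       = loess(log_var ~ log_mean, data = trend_df, span = 0.3)
var_exp   = 10^fitted(fit)

# standardize each gene by its mean and expected sd, clip large values,
# then take the variance of the standardized values
z         = (counts[keep, , drop = FALSE] - gene_mean[keep]) / sqrt(var_exp)
z         = pmin(z, sqrt(n_cells))
var_std   = rowSums(z^2) / (n_cells - 1)
```

Because the simulated genes carry only Poisson noise, the resulting values should sit close to 1. In real data, genes well above 1 are more variable than the trend predicts for their mean expression, which is what makes them HVG candidates.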

Which genes are “ambient”?

The plot shows the results of the scprocess ambient gene detection procedure. The y-axis shows the -log10 nominal p-value, and the dotted line marks the threshold at which the adjusted p-value falls below 0.01. Multiple plots are shown, each corresponding to a different minimum expression level filter applied to the ambient profiles.
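The relationship between the nominal p-values on the axis and the adjusted p-value threshold can be illustrated with a small base R sketch. The p-values here are simulated, the variable names are hypothetical, and the Benjamini-Hochberg adjustment is an assumption, since this report does not state which adjustment scprocess applies.

```r
# Sketch: where an adjusted p-value cutoff lands on a -log10 nominal p-value axis.
set.seed(1)

# hypothetical nominal p-values: 900 null genes plus 100 genes with strong signal
pvals   = c(runif(900), rbeta(100, shape1 = 0.5, shape2 = 2000))

# Benjamini-Hochberg adjustment (assumed method)
padj    = p.adjust(pvals, method = "BH")
is_sig  = padj < 0.01

# largest nominal p-value that is still significant after adjustment
p_cut   = if (any(is_sig)) max(pvals[is_sig]) else NA_real_

# position of the dotted line on the -log10 scale
y_line  = -log10(p_cut)
```

Because the adjustment preserves the ordering of the p-values, every gene plotted above `y_line` has an adjusted p-value below 0.01, and every gene below the line does not.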

# minimum CPM thresholds in the ambient profiles; 0 keeps all genes
cpm_ls  = c(100, 50, 10, 0)
for (min_cpm in cpm_ls) {
  if (min_cpm == 0) {
    cat("### all genes\n")
  } else {
    cat(sprintf("### >= %d CPM expression in ambient\n", min_cpm))
  }
  print(plot_ambient_gene_calculations(edger_dt, min_cpm_empty = min_cpm))
  cat("\n\n")
}

>= 100 CPM expression in ambient

>= 50 CPM expression in ambient

>= 10 CPM expression in ambient

all genes

Which genes are variable across the ambient profiles?

Examining how ambient genes vary across different samples can provide valuable insights. Such variation may highlight cases where certain samples require distinct treatment, for example, if case and control samples consistently exhibit different ambient profiles.

The heatmaps display pseudobulk expression across the ambient profiles of each sample in the dataset. Each heatmap shows the top 40 genes ranked by one of the following criteria:

  • highest variance across ambient profiles;
  • highest mean expression across ambient profiles;
  • highest log2fc in empty droplets vs cells in scprocess’s ambient gene detection procedure; and
  • smallest p-value in empty droplets vs cells in scprocess’s ambient gene detection procedure.

title_ls = c(
  "var"           = "HVGs",
  "mean"          = "Highest expression",
  "log2fc.empty"  = "Highest log2fc",
  "pval.empty"    = "Smallest p-value"
)
for (top_var in names(title_ls)) {
  cat(sprintf("### %s in ambient\n", title_ls[top_var]))
  suppressMessages(draw(
    plot_heatmap_of_ambient_profiles(vst_obj, top_var = top_var, n_top = 40),
    heatmap_legend_side = "right", merge_legend = TRUE
  ))
  cat("\n\n")
}
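The four ranking criteria listed above can be sketched in base R as follows. This is only an illustration of the ranking logic, not the internals of `plot_heatmap_of_ambient_profiles`: the table `stats_df`, its simulated values and the helper `pick_top` are hypothetical, and the column names simply mirror the `top_var` options used in this report.

```r
# Sketch: pick the top genes per criterion from a table of per-gene statistics.
set.seed(1)
n_genes   = 100

# hypothetical per-gene summary statistics across ambient profiles
stats_df  = data.frame(
  gene_id       = sprintf("gene%03d", seq_len(n_genes)),
  var           = rexp(n_genes),
  mean          = rexp(n_genes),
  log2fc.empty  = rnorm(n_genes),
  pval.empty    = runif(n_genes)
)

# smallest values rank first for p-values, largest values first otherwise
pick_top  = function(df, top_var, n_top = 40) {
  decreasing  = top_var != "pval.empty"
  idx         = order(df[[top_var]], decreasing = decreasing)
  df$gene_id[head(idx, n_top)]
}

# one gene list per criterion
top_vars  = c("var", "mean", "log2fc.empty", "pval.empty")
names(top_vars) = top_vars
top_ls    = lapply(top_vars, pick_top, df = stats_df)
```

Comparing the resulting lists, for example with `intersect(top_ls$var, top_ls$log2fc.empty)`, gives a quick view of how much the criteria overlap.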

HVGs in ambient

Highest expression in ambient

Highest log2fc in ambient

Smallest p-value in ambient

R session info

Details of the R package versions used are given below.

devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.4.3 (2025-02-28)
##  os       Red Hat Enterprise Linux 8.10 (Ootpa)
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       Europe/Zurich
##  date     2026-03-25
##  pandoc   3.8.2.1 @ /home/macnairw/packages/scprocess/.snakemake/conda/4fef11cadd34f9d2d13a0d6139d09340_/bin/ (via rmarkdown)
##  quarto   NA
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package              * version  date (UTC) lib source
##  abind                  1.4-8    2024-09-12 [1] CRAN (R 4.4.3)
##  assertthat           * 0.2.1    2019-03-21 [1] CRAN (R 4.4.3)
##  beeswarm               0.4.0    2021-06-01 [1] CRAN (R 4.4.3)
##  Biobase              * 2.66.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  BiocGenerics         * 0.52.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  BiocManager            1.30.27  2025-11-14 [1] CRAN (R 4.4.3)
##  BiocParallel           1.40.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  BiocStyle            * 2.34.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  bookdown               0.45     2025-10-03 [1] CRAN (R 4.4.3)
##  bslib                  0.9.0    2025-01-30 [1] CRAN (R 4.4.3)
##  ca                     0.71.1   2020-01-24 [1] CRAN (R 4.4.3)
##  cachem                 1.1.0    2024-05-16 [1] CRAN (R 4.4.3)
##  Cairo                  1.7-0    2025-10-29 [1] CRAN (R 4.4.3)
##  callr                  3.7.6    2024-03-25 [1] CRAN (R 4.4.3)
##  cellranger             1.1.0    2016-07-27 [1] CRAN (R 4.4.3)
##  circlize             * 0.4.16   2024-02-20 [1] CRAN (R 4.4.3)
##  cli                    3.6.5    2025-04-23 [1] CRAN (R 4.4.3)
##  clue                   0.3-66   2024-11-13 [1] CRAN (R 4.4.3)
##  cluster                2.1.8.1  2025-03-12 [1] CRAN (R 4.4.3)
##  codetools              0.2-20   2024-03-31 [1] CRAN (R 4.4.3)
##  colorspace             2.1-2    2025-09-22 [1] CRAN (R 4.4.3)
##  ComplexHeatmap       * 2.22.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  crayon                 1.5.3    2024-06-20 [1] CRAN (R 4.4.3)
##  data.table           * 1.17.8   2025-07-10 [1] CRAN (R 4.4.3)
##  DelayedArray           0.32.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  DESeq2               * 1.46.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.3)
##  devtools               2.4.6    2025-10-03 [1] CRAN (R 4.4.3)
##  digest                 0.6.39   2025-11-19 [1] CRAN (R 4.4.3)
##  doParallel             1.0.17   2022-02-07 [1] CRAN (R 4.4.3)
##  dplyr                  1.1.4    2023-11-17 [1] CRAN (R 4.4.3)
##  ellipsis               0.3.2    2021-04-29 [1] CRAN (R 4.4.3)
##  evaluate               1.0.5    2025-08-27 [1] CRAN (R 4.4.3)
##  farver                 2.1.2    2024-05-13 [1] CRAN (R 4.4.3)
##  fastmap                1.2.0    2024-05-15 [1] CRAN (R 4.4.3)
##  forcats              * 1.0.1    2025-09-25 [1] CRAN (R 4.4.3)
##  foreach                1.5.2    2022-02-02 [1] CRAN (R 4.4.3)
##  fs                     1.6.6    2025-04-12 [1] CRAN (R 4.4.3)
##  generics               0.1.4    2025-05-09 [1] CRAN (R 4.4.3)
##  GenomeInfoDb         * 1.42.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  GenomeInfoDbData       1.2.13   2026-03-05 [1] Bioconductor
##  GenomicRanges        * 1.58.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  GetoptLong             1.0.5    2020-12-15 [1] CRAN (R 4.4.3)
##  getPass                0.2-4    2023-12-10 [1] CRAN (R 4.4.3)
##  ggbeeswarm           * 0.7.2    2023-04-29 [1] CRAN (R 4.4.3)
##  ggh4x                * 0.3.1    2025-05-30 [1] CRAN (R 4.4.3)
##  ggplot2              * 4.0.1    2025-11-14 [1] CRAN (R 4.4.3)
##  ggrepel              * 0.9.6    2024-09-07 [1] CRAN (R 4.4.3)
##  git2r                  0.35.0   2024-10-20 [1] CRAN (R 4.4.3)
##  GlobalOptions          0.1.2    2020-06-10 [1] CRAN (R 4.4.3)
##  glue                   1.8.0    2024-09-30 [1] CRAN (R 4.4.3)
##  gridExtra              2.3      2017-09-09 [1] CRAN (R 4.4.3)
##  gtable                 0.3.6    2024-10-25 [1] CRAN (R 4.4.3)
##  htmltools              0.5.8.1  2024-04-04 [1] CRAN (R 4.4.3)
##  httpuv                 1.6.16   2025-04-16 [1] CRAN (R 4.4.3)
##  httr                   1.4.7    2023-08-15 [1] CRAN (R 4.4.3)
##  IRanges              * 2.40.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  iterators              1.0.14   2022-02-05 [1] CRAN (R 4.4.3)
##  jquerylib              0.1.4    2021-04-26 [1] CRAN (R 4.4.3)
##  jsonlite               2.0.0    2025-03-27 [1] CRAN (R 4.4.3)
##  knitr                  1.50     2025-03-16 [1] CRAN (R 4.4.3)
##  later                  1.4.4    2025-08-27 [1] CRAN (R 4.4.3)
##  lattice                0.22-7   2025-04-02 [1] CRAN (R 4.4.3)
##  lifecycle              1.0.4    2023-11-07 [1] CRAN (R 4.4.3)
##  locfit                 1.5-9.12 2025-03-05 [1] CRAN (R 4.4.3)
##  magrittr             * 2.0.4    2025-09-12 [1] CRAN (R 4.4.3)
##  Matrix                 1.7-4    2025-08-28 [1] CRAN (R 4.4.3)
##  MatrixGenerics       * 1.18.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  matrixStats          * 1.5.0    2025-01-07 [1] CRAN (R 4.4.3)
##  memoise                2.0.1    2021-11-26 [1] CRAN (R 4.4.3)
##  otel                   0.2.0    2025-08-29 [1] CRAN (R 4.4.3)
##  patchwork            * 1.3.2    2025-08-25 [1] CRAN (R 4.4.3)
##  pillar                 1.11.1   2025-09-17 [1] CRAN (R 4.4.3)
##  pkgbuild               1.4.8    2025-05-26 [1] CRAN (R 4.4.3)
##  pkgconfig              2.0.3    2019-09-22 [1] CRAN (R 4.4.3)
##  pkgload                1.4.1    2025-09-23 [1] CRAN (R 4.4.3)
##  png                    0.1-8    2022-11-29 [1] CRAN (R 4.4.3)
##  processx               3.8.6    2025-02-21 [1] CRAN (R 4.4.3)
##  promises               1.5.0    2025-11-01 [1] CRAN (R 4.4.3)
##  ps                     1.9.1    2025-04-12 [1] CRAN (R 4.4.3)
##  purrr                  1.2.0    2025-11-04 [1] CRAN (R 4.4.3)
##  R.methodsS3            1.8.2    2022-06-13 [1] CRAN (R 4.4.3)
##  R.oo                   1.27.1   2025-05-02 [1] CRAN (R 4.4.3)
##  R.utils                2.13.0   2025-02-24 [1] CRAN (R 4.4.3)
##  R6                     2.6.1    2025-02-15 [1] CRAN (R 4.4.3)
##  RColorBrewer         * 1.1-3    2022-04-03 [1] CRAN (R 4.4.3)
##  Rcpp                   1.1.0    2025-07-02 [1] CRAN (R 4.4.3)
##  readxl               * 1.4.5    2025-03-07 [1] CRAN (R 4.4.3)
##  registry               0.5-1    2019-03-05 [1] CRAN (R 4.4.3)
##  remotes                2.5.0    2024-03-17 [1] CRAN (R 4.4.3)
##  rhdf5                * 2.50.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.3)
##  rhdf5filters           1.18.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.3)
##  Rhdf5lib               1.28.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  rjson                  0.2.23   2024-09-16 [1] CRAN (R 4.4.3)
##  rlang                  1.1.6    2025-04-11 [1] CRAN (R 4.4.3)
##  rmarkdown              2.30     2025-09-28 [1] CRAN (R 4.4.3)
##  rmdformats             1.0.4    2022-05-17 [1] CRAN (R 4.4.3)
##  rprojroot              2.1.1    2025-08-26 [1] CRAN (R 4.4.3)
##  rstudioapi             0.17.1   2024-10-22 [1] CRAN (R 4.4.3)
##  S4Arrays               1.6.0    2024-10-29 [1] Bioconductor 3.20 (R 4.4.3)
##  S4Vectors            * 0.44.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  S7                     0.2.1    2025-11-14 [1] CRAN (R 4.4.3)
##  sass                   0.4.10   2025-04-11 [1] CRAN (R 4.4.3)
##  scales               * 1.4.0    2025-04-24 [1] CRAN (R 4.4.3)
##  seriation            * 1.5.8    2025-08-20 [1] CRAN (R 4.4.3)
##  sessioninfo            1.2.3    2025-02-05 [1] CRAN (R 4.4.3)
##  shape                  1.4.6.1  2024-02-23 [1] CRAN (R 4.4.3)
##  SingleCellExperiment * 1.28.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  SparseArray            1.6.0    2024-10-29 [1] Bioconductor 3.20 (R 4.4.3)
##  strex                * 2.0.1    2024-10-03 [1] CRAN (R 4.4.3)
##  stringi                1.8.7    2025-03-27 [1] CRAN (R 4.4.3)
##  stringr              * 1.6.0    2025-11-04 [1] CRAN (R 4.4.3)
##  SummarizedExperiment * 1.36.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  tibble                 3.3.0    2025-06-08 [1] CRAN (R 4.4.3)
##  tidyselect             1.2.1    2024-03-11 [1] CRAN (R 4.4.3)
##  TSP                    1.2.6    2025-11-27 [1] CRAN (R 4.4.3)
##  UCSC.utils             1.2.0    2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  usethis                3.2.1    2025-09-06 [1] CRAN (R 4.4.3)
##  vctrs                  0.6.5    2023-12-01 [1] CRAN (R 4.4.3)
##  vipor                  0.4.7    2023-12-18 [1] CRAN (R 4.4.3)
##  viridis              * 0.6.5    2024-01-29 [1] CRAN (R 4.4.3)
##  viridisLite          * 0.4.2    2023-05-02 [1] CRAN (R 4.4.3)
##  whisker                0.4.1    2022-12-05 [1] CRAN (R 4.4.3)
##  withr                  3.0.2    2024-10-28 [1] CRAN (R 4.4.3)
##  workflowr            * 1.7.2    2025-08-18 [1] CRAN (R 4.4.3)
##  xfun                   0.54     2025-10-30 [1] CRAN (R 4.4.3)
##  XVector                0.46.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  yaml                   2.3.11   2025-11-28 [1] CRAN (R 4.4.3)
##  zlibbioc               1.52.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
## 
##  [1] /home/macnairw/packages/scprocess/.snakemake/conda/4fef11cadd34f9d2d13a0d6139d09340_/lib/R/library
##  * ── Packages attached to the search path.
## 
## ──────────────────────────────────────────────────────────────────────────────