Integration diagnostics

Will Macnair - Computational Sciences Center of Excellence, F Hoffmann-La Roche Ltd, Basel, Switzerland

March 23, 2026

Cells that passed quality control (QC) filtering were integrated with Harmony, together with doublets identified by scDblFinder, and then clustered at a high resolution. Clusters enriched in doublets were excluded from further analysis. Integration and clustering were then repeated on the remaining cells.

Doublets over UMAP

The plot shows a binned UMAP, with both the proportion and the number of doublets in each bin, next to a plot of the density of cells over the same UMAP.

g_dbl   = plot_umap_doublets(int_dbl)
g_dens  = plot_umap_density(int_dbl[, .(cell_id, UMAP1, UMAP2) ])
g       = g_dbl + g_dens
print(g)
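The per-bin summary behind such a plot can be sketched as follows. This is a minimal illustration, not the implementation of `plot_umap_doublets`: the helper `calc_binned_doublets`, the column `is_dbl`, the toy data and the square grid of 20 x 20 bins are all assumptions made for the example.

```r
library(data.table)

# toy stand-in for int_dbl; the is_dbl column name is an assumption
set.seed(1)
toy_dt = data.table(
  cell_id = paste0('cell', 1:1000),
  UMAP1   = rnorm(1000),
  UMAP2   = rnorm(1000),
  is_dbl  = runif(1000) < 0.1
)

# hypothetical helper: assign each cell to a square bin, then summarise per bin
calc_binned_doublets = function(dt, n_bins = 20) {
  tmp_dt = copy(dt)
  tmp_dt[, bin_x := cut(UMAP1, breaks = n_bins, labels = FALSE) ]
  tmp_dt[, bin_y := cut(UMAP2, breaks = n_bins, labels = FALSE) ]
  tmp_dt[, .(n_cells = .N, n_dbl = sum(is_dbl), prop_dbl = mean(is_dbl)),
    by = .(bin_x, bin_y) ]
}

bin_dt = calc_binned_doublets(toy_dt)
```

Each row of `bin_dt` is one occupied bin, so `prop_dbl` and `n_dbl` could be mapped to fill colours in two side-by-side panels.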

Doublet proportions in clusters

The plot shows the proportion of doublets for each cluster in relation to the total number of cells in that cluster. Clusters with a doublet proportion exceeding 50% are excluded from further analysis.

( plot_doublet_clusters(int_dbl, dbl_cl_prop) )
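The exclusion rule described above can be illustrated on toy data. This is a sketch under stated assumptions: the column `is_dbl`, the variable `prop_cutoff` (standing in for the 50% threshold) and the toy clusters are hypothetical, and only cluster-level exclusion is shown.

```r
library(data.table)

# toy data: one row per cell, with a cluster label and a doublet flag
toy_dt = data.table(
  cell_id = paste0('cell', 1:10),
  cluster = rep(c('cl1', 'cl2', 'cl3'), times = c(4, 3, 3)),
  is_dbl  = c(FALSE, FALSE, TRUE, FALSE,  TRUE, TRUE, FALSE,  FALSE, FALSE, FALSE)
)
prop_cutoff = 0.5   # assumed threshold: exclude clusters with > 50% doublets

# proportion of doublets in each cluster, relative to the cluster size
cl_dt = toy_dt[, .(n_cells = .N, n_dbl = sum(is_dbl), prop_dbl = mean(is_dbl)),
  by = cluster ]
cl_dt[, is_excl := prop_dbl > prop_cutoff ]

# keep only cells in clusters that are not doublet-enriched
keep_dt = toy_dt[ cluster %in% cl_dt[ is_excl == FALSE ]$cluster ]
```

Here `cl2` has 2 doublets out of 3 cells, so it is the only cluster flagged for exclusion; `cl1` (1 of 4) and `cl3` (0 of 3) are retained.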

Clusters over UMAP

Clustering of data is performed at different resolution values. For each value the clusters are displayed over a UMAP together with a plot showing the density of cells.

After removing doublets, 94810 cells that had passed QC were used for integration.

for (res in res_ls) {
  cat('### ', res, '\n')
  g_cl    = plot_umap_cluster(
    umap_dt   = int_dt[, .(cell_id, UMAP1, UMAP2) ], 
    clust_dt  = int_dt[, .(cell_id, cluster = get(paste0('RNA_snn_res.', res))) ],
    name      = sprintf('res = %s', res))
  g_dens  = plot_umap_density(int_dt[, .(cell_id, UMAP1, UMAP2) ])
  g       = g_cl + g_dens
  print(g)
  cat('\n\n')
}

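[Plots: one tab per clustering resolution (0.1, 0.2, 0.5, 1, 2), each showing the clusters over the UMAP next to a cell density plot.]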

Evaluating cluster distribution across samples

This plot shows, for each cluster, the entropy of its cells across samples against the maximum proportion of cells contributed by a single sample. Entropy measures how evenly the cells in a cluster are distributed across samples: higher entropy indicates a more even distribution, while lower entropy indicates that the cluster is dominated by cells from a small number of samples. A high maximum proportion suggests that a cluster predominantly contains cells from a single sample.

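# count the clusters at each resolution; entropy is only plotted where there is > 1 cluster
n_clusts = vapply(res_ls, function(res) {
  tmp_dt = int_dt[, .(cluster = get(paste0('RNA_snn_res.', res)))]
  length(unique(tmp_dt$cluster))
}, integer(1))
res_ls_pl = res_ls[n_clusts > 1]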
for (res in res_ls_pl) {
  cat('### ', res, '\n')
  input_dt  = int_dt[, .(batch_var = get(batch_var), cell_id, 
    cluster = get(paste0('RNA_snn_res.', res)))]
  suppressWarnings(print(plot_cluster_entropies(input_dt, batch_var, what = "norm")))
  cat('\n\n')
}

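[Plots: one tab per clustering resolution (0.1, 0.2, 0.5, 1, 2), each showing cluster entropy against the maximum proportion of cells from a single sample.]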
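The two quantities plotted for each cluster can be computed as in the sketch below. It assumes that the normalised entropy is the Shannon entropy of the per-batch proportions divided by the log of the number of batches, which is one common definition; what `plot_cluster_entropies` computes with `what = "norm"` may differ. The toy data and variable names are hypothetical.

```r
library(data.table)

# toy data: clA is spread evenly over three samples, clB comes from one sample
toy_dt = data.table(
  cell_id   = paste0('cell', 1:12),
  batch_var = c('s1', 's1', 's2', 's2', 's3', 's3', rep('s1', 6)),
  cluster   = rep(c('clA', 'clB'), each = 6)
)
n_batches = uniqueN(toy_dt$batch_var)

# proportion of each cluster contributed by each batch
counts_dt = toy_dt[, .(n = .N), by = .(cluster, batch_var) ]
counts_dt[, p := n / sum(n), by = cluster ]

# Shannon entropy per cluster, and the largest single-batch proportion
ent_dt = counts_dt[, .(entropy = -sum(p * log(p)), max_prop = max(p)),
  by = cluster ]

# assumed normalisation: divide by the maximum possible entropy, log(n_batches)
ent_dt[, entropy_norm := entropy / log(n_batches) ]
```

With this normalisation, `clA` has a normalised entropy of 1 and a maximum proportion of 1/3, while `clB` has an entropy of 0 and a maximum proportion of 1, so well-mixed clusters sit at high entropy and low maximum proportion.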

Check QC metrics of clusters

Distributions of QC metrics (library size, number of features, mitochondrial proportion, and spliced proportion) are shown for each cluster across different resolution values.

for (res in res_ls) {
  cat('### ', res, '\n')
  suppressWarnings(print(plot_cluster_qc_distns(
    qc_melt, 
    clust_dt  = int_dt[, .(cell_id, cluster = get(paste0('RNA_snn_res.', res)))], 
    name      = res)))
  cat('\n\n')
}

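[Plots: one tab per clustering resolution (0.1, 0.2, 0.5, 1, 2), each showing the distributions of QC metrics for every cluster.]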
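As a numeric companion to these distributions, the QC metrics could be summarised by quartiles for each cluster. The sketch below assumes a long-format QC table with columns `cell_id`, `qc_var` and `qc_val`; these column names, the metric names and the toy values are assumptions, and the actual layout of `qc_melt` may differ.

```r
library(data.table)

# toy long-format QC table: one row per cell and QC metric (assumed layout)
qc_toy = data.table(
  cell_id = rep(paste0('cell', 1:6), times = 2),
  qc_var  = rep(c('log_counts', 'mito_prop'), each = 6),
  qc_val  = c(3.1, 3.3, 3.2, 4.0, 4.2, 4.1,  0.02, 0.03, 0.01, 0.10, 0.12, 0.11)
)
# toy cluster assignments at one resolution
clust_toy = data.table(
  cell_id = paste0('cell', 1:6),
  cluster = rep(c('cl1', 'cl2'), each = 3)
)

# join cluster labels onto the QC values, then take quartiles per cluster and metric
summ_dt = merge(qc_toy, clust_toy, by = 'cell_id')[, .(
    q25 = quantile(qc_val, 0.25),
    q50 = median(qc_val),
    q75 = quantile(qc_val, 0.75)
  ), by = .(cluster, qc_var) ]
```

A cluster whose median library size is much lower, or whose mitochondrial proportion is much higher, than that of the other clusters would stand out in such a table as a candidate for closer inspection.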

R session info

Details of the R package versions used are given below.

devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.4.3 (2025-02-28)
##  os       Red Hat Enterprise Linux 8.10 (Ootpa)
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       Europe/Zurich
##  date     2026-03-25
##  pandoc   3.8.2.1 @ /home/macnairw/packages/scprocess/.snakemake/conda/4fef11cadd34f9d2d13a0d6139d09340_/bin/ (via rmarkdown)
##  quarto   NA
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package              * version  date (UTC) lib source
##  abind                  1.4-8    2024-09-12 [1] CRAN (R 4.4.3)
##  assertthat           * 0.2.1    2019-03-21 [1] CRAN (R 4.4.3)
##  basilisk               1.18.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  basilisk.utils         1.18.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  beachmat               2.22.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.3)
##  beeswarm               0.4.0    2021-06-01 [1] CRAN (R 4.4.3)
##  Biobase              * 2.66.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  BiocGenerics         * 0.52.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  BiocManager            1.30.27  2025-11-14 [1] CRAN (R 4.4.3)
##  BiocNeighbors          2.0.0    2024-10-29 [1] Bioconductor 3.20 (R 4.4.3)
##  BiocParallel         * 1.40.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  BiocSingular           1.22.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.3)
##  BiocStyle            * 2.34.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  bluster                1.16.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.3)
##  bookdown               0.45     2025-10-03 [1] CRAN (R 4.4.3)
##  bslib                  0.9.0    2025-01-30 [1] CRAN (R 4.4.3)
##  ca                     0.71.1   2020-01-24 [1] CRAN (R 4.4.3)
##  cachem                 1.1.0    2024-05-16 [1] CRAN (R 4.4.3)
##  callr                  3.7.6    2024-03-25 [1] CRAN (R 4.4.3)
##  cellranger             1.1.0    2016-07-27 [1] CRAN (R 4.4.3)
##  circlize             * 0.4.16   2024-02-20 [1] CRAN (R 4.4.3)
##  cli                    3.6.5    2025-04-23 [1] CRAN (R 4.4.3)
##  clue                   0.3-66   2024-11-13 [1] CRAN (R 4.4.3)
##  cluster                2.1.8.1  2025-03-12 [1] CRAN (R 4.4.3)
##  codetools              0.2-20   2024-03-31 [1] CRAN (R 4.4.3)
##  colorspace             2.1-2    2025-09-22 [1] CRAN (R 4.4.3)
##  ComplexHeatmap       * 2.22.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  cowplot                1.2.0    2025-07-07 [1] CRAN (R 4.4.3)
##  crayon                 1.5.3    2024-06-20 [1] CRAN (R 4.4.3)
##  data.table           * 1.17.8   2025-07-10 [1] CRAN (R 4.4.3)
##  DelayedArray           0.32.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  deldir                 2.0-4    2024-02-28 [1] CRAN (R 4.4.3)
##  devtools               2.4.6    2025-10-03 [1] CRAN (R 4.4.3)
##  digest                 0.6.39   2025-11-19 [1] CRAN (R 4.4.3)
##  dir.expiry             1.14.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  doParallel             1.0.17   2022-02-07 [1] CRAN (R 4.4.3)
##  dotCall64              1.2      2024-10-04 [1] CRAN (R 4.4.3)
##  dplyr                  1.1.4    2023-11-17 [1] CRAN (R 4.4.3)
##  dqrng                  0.3.2    2023-11-29 [1] CRAN (R 4.4.3)
##  edgeR                  4.4.0    2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  ellipsis               0.3.2    2021-04-29 [1] CRAN (R 4.4.3)
##  evaluate               1.0.5    2025-08-27 [1] CRAN (R 4.4.3)
##  farver                 2.1.2    2024-05-13 [1] CRAN (R 4.4.3)
##  fastDummies            1.7.5    2025-01-20 [1] CRAN (R 4.4.3)
##  fastmap                1.2.0    2024-05-15 [1] CRAN (R 4.4.3)
##  filelock               1.0.3    2023-12-11 [1] CRAN (R 4.4.3)
##  fitdistrplus           1.2-4    2025-07-03 [1] CRAN (R 4.4.3)
##  forcats              * 1.0.1    2025-09-25 [1] CRAN (R 4.4.3)
##  foreach                1.5.2    2022-02-02 [1] CRAN (R 4.4.3)
##  fs                     1.6.6    2025-04-12 [1] CRAN (R 4.4.3)
##  future               * 1.68.0   2025-11-17 [1] CRAN (R 4.4.3)
##  future.apply           1.20.0   2025-06-06 [1] CRAN (R 4.4.3)
##  generics               0.1.4    2025-05-09 [1] CRAN (R 4.4.3)
##  GenomeInfoDb         * 1.42.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  GenomeInfoDbData       1.2.13   2026-03-05 [1] Bioconductor
##  GenomicRanges        * 1.58.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  GetoptLong             1.0.5    2020-12-15 [1] CRAN (R 4.4.3)
##  getPass                0.2-4    2023-12-10 [1] CRAN (R 4.4.3)
##  ggbeeswarm           * 0.7.2    2023-04-29 [1] CRAN (R 4.4.3)
##  ggh4x                * 0.3.1    2025-05-30 [1] CRAN (R 4.4.3)
##  ggplot.multistats    * 1.0.1    2024-09-25 [1] CRAN (R 4.4.3)
##  ggplot2              * 4.0.1    2025-11-14 [1] CRAN (R 4.4.3)
##  ggrepel              * 0.9.6    2024-09-07 [1] CRAN (R 4.4.3)
##  ggridges               0.5.7    2025-08-27 [1] CRAN (R 4.4.3)
##  git2r                  0.35.0   2024-10-20 [1] CRAN (R 4.4.3)
##  GlobalOptions          0.1.2    2020-06-10 [1] CRAN (R 4.4.3)
##  globals                0.18.0   2025-05-08 [1] CRAN (R 4.4.3)
##  glue                   1.8.0    2024-09-30 [1] CRAN (R 4.4.3)
##  goftest                1.2-3    2021-10-07 [1] CRAN (R 4.4.3)
##  gridExtra              2.3      2017-09-09 [1] CRAN (R 4.4.3)
##  gtable                 0.3.6    2024-10-25 [1] CRAN (R 4.4.3)
##  harmony              * 1.2.4    2025-10-10 [1] CRAN (R 4.4.3)
##  hexbin                 1.28.5   2024-11-13 [1] CRAN (R 4.4.3)
##  htmltools              0.5.8.1  2024-04-04 [1] CRAN (R 4.4.3)
##  htmlwidgets            1.6.4    2023-12-06 [1] CRAN (R 4.4.3)
##  httpuv                 1.6.16   2025-04-16 [1] CRAN (R 4.4.3)
##  httr                   1.4.7    2023-08-15 [1] CRAN (R 4.4.3)
##  ica                    1.0-3    2022-07-08 [1] CRAN (R 4.4.3)
##  igraph                 2.1.4    2025-01-23 [1] CRAN (R 4.4.3)
##  IRanges              * 2.40.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  irlba                  2.3.5.1  2022-10-03 [1] CRAN (R 4.4.3)
##  iterators              1.0.14   2022-02-05 [1] CRAN (R 4.4.3)
##  jquerylib              0.1.4    2021-04-26 [1] CRAN (R 4.4.3)
##  jsonlite               2.0.0    2025-03-27 [1] CRAN (R 4.4.3)
##  KernSmooth             2.23-26  2025-01-01 [1] CRAN (R 4.4.3)
##  knitr                  1.50     2025-03-16 [1] CRAN (R 4.4.3)
##  later                  1.4.4    2025-08-27 [1] CRAN (R 4.4.3)
##  lattice                0.22-7   2025-04-02 [1] CRAN (R 4.4.3)
##  lazyeval               0.2.2    2019-03-15 [1] CRAN (R 4.4.3)
##  lifecycle              1.0.4    2023-11-07 [1] CRAN (R 4.4.3)
##  limma                  3.62.1   2024-11-03 [1] Bioconductor 3.20 (R 4.4.2)
##  listenv                0.10.0   2025-11-02 [1] CRAN (R 4.4.3)
##  lmtest                 0.9-40   2022-03-21 [1] CRAN (R 4.4.3)
##  locfit                 1.5-9.12 2025-03-05 [1] CRAN (R 4.4.3)
##  magrittr             * 2.0.4    2025-09-12 [1] CRAN (R 4.4.3)
##  MASS                   7.3-65   2025-02-28 [1] CRAN (R 4.4.3)
##  Matrix               * 1.7-4    2025-08-28 [1] CRAN (R 4.4.3)
##  MatrixGenerics       * 1.18.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  matrixStats          * 1.5.0    2025-01-07 [1] CRAN (R 4.4.3)
##  memoise                2.0.1    2021-11-26 [1] CRAN (R 4.4.3)
##  metapod                1.14.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.3)
##  mgcv                   1.9-4    2025-11-07 [1] CRAN (R 4.4.3)
##  mime                   0.13     2025-03-17 [1] CRAN (R 4.4.3)
##  miniUI                 0.1.2    2025-04-17 [1] CRAN (R 4.4.3)
##  nlme                   3.1-168  2025-03-31 [1] CRAN (R 4.4.3)
##  otel                   0.2.0    2025-08-29 [1] CRAN (R 4.4.3)
##  parallelly             1.45.1   2025-07-24 [1] CRAN (R 4.4.3)
##  patchwork            * 1.3.2    2025-08-25 [1] CRAN (R 4.4.3)
##  pbapply                1.7-4    2025-07-20 [1] CRAN (R 4.4.3)
##  pillar                 1.11.1   2025-09-17 [1] CRAN (R 4.4.3)
##  pkgbuild               1.4.8    2025-05-26 [1] CRAN (R 4.4.3)
##  pkgconfig              2.0.3    2019-09-22 [1] CRAN (R 4.4.3)
##  pkgload                1.4.1    2025-09-23 [1] CRAN (R 4.4.3)
##  plotly                 4.11.0   2025-06-19 [1] CRAN (R 4.4.3)
##  plyr                   1.8.9    2023-10-02 [1] CRAN (R 4.4.3)
##  png                    0.1-8    2022-11-29 [1] CRAN (R 4.4.3)
##  polyclip               1.10-7   2024-07-23 [1] CRAN (R 4.4.3)
##  processx               3.8.6    2025-02-21 [1] CRAN (R 4.4.3)
##  progressr              0.18.0   2025-11-06 [1] CRAN (R 4.4.3)
##  promises               1.5.0    2025-11-01 [1] CRAN (R 4.4.3)
##  ps                     1.9.1    2025-04-12 [1] CRAN (R 4.4.3)
##  purrr                  1.2.0    2025-11-04 [1] CRAN (R 4.4.3)
##  R.methodsS3            1.8.2    2022-06-13 [1] CRAN (R 4.4.3)
##  R.oo                   1.27.1   2025-05-02 [1] CRAN (R 4.4.3)
##  R.utils                2.13.0   2025-02-24 [1] CRAN (R 4.4.3)
##  R6                     2.6.1    2025-02-15 [1] CRAN (R 4.4.3)
##  RANN                   2.6.2    2024-08-25 [1] CRAN (R 4.4.3)
##  RColorBrewer         * 1.1-3    2022-04-03 [1] CRAN (R 4.4.3)
##  Rcpp                 * 1.1.0    2025-07-02 [1] CRAN (R 4.4.3)
##  RcppAnnoy              0.0.22   2024-01-23 [1] CRAN (R 4.4.3)
##  RcppHNSW               0.6.0    2024-02-04 [1] CRAN (R 4.4.3)
##  readxl               * 1.4.5    2025-03-07 [1] CRAN (R 4.4.3)
##  registry               0.5-1    2019-03-05 [1] CRAN (R 4.4.3)
##  remotes                2.5.0    2024-03-17 [1] CRAN (R 4.4.3)
##  reshape2               1.4.5    2025-11-12 [1] CRAN (R 4.4.3)
##  reticulate             1.44.1   2025-11-14 [1] CRAN (R 4.4.3)
##  rhdf5                * 2.50.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.3)
##  rhdf5filters           1.18.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.3)
##  Rhdf5lib               1.28.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  rjson                  0.2.23   2024-09-16 [1] CRAN (R 4.4.3)
##  rlang                  1.1.6    2025-04-11 [1] CRAN (R 4.4.3)
##  rmarkdown              2.30     2025-09-28 [1] CRAN (R 4.4.3)
##  rmdformats             1.0.4    2022-05-17 [1] CRAN (R 4.4.3)
##  ROCR                   1.0-11   2020-05-02 [1] CRAN (R 4.4.3)
##  rprojroot              2.1.1    2025-08-26 [1] CRAN (R 4.4.3)
##  RSpectra               0.16-2   2024-07-18 [1] CRAN (R 4.4.3)
##  rstudioapi             0.17.1   2024-10-22 [1] CRAN (R 4.4.3)
##  rsvd                   1.0.5    2021-04-16 [1] CRAN (R 4.4.1)
##  Rtsne                  0.17     2023-12-07 [1] CRAN (R 4.4.3)
##  S4Arrays               1.6.0    2024-10-29 [1] Bioconductor 3.20 (R 4.4.3)
##  S4Vectors            * 0.44.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  S7                     0.2.1    2025-11-14 [1] CRAN (R 4.4.3)
##  sass                   0.4.10   2025-04-11 [1] CRAN (R 4.4.3)
##  ScaledMatrix           1.14.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  scales               * 1.4.0    2025-04-24 [1] CRAN (R 4.4.3)
##  scattermore            1.2      2023-06-12 [1] CRAN (R 4.4.3)
##  scran                * 1.34.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.3)
##  sctransform            0.4.2    2025-04-30 [1] CRAN (R 4.4.3)
##  scuttle              * 1.16.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.3)
##  seriation            * 1.5.8    2025-08-20 [1] CRAN (R 4.4.3)
##  sessioninfo            1.2.3    2025-02-05 [1] CRAN (R 4.4.3)
##  Seurat               * 5.3.1    2025-10-29 [1] CRAN (R 4.4.3)
##  SeuratObject         * 5.2.0    2025-08-27 [1] CRAN (R 4.4.3)
##  shape                  1.4.6.1  2024-02-23 [1] CRAN (R 4.4.3)
##  shiny                  1.11.1   2025-07-03 [1] CRAN (R 4.4.3)
##  SingleCellExperiment * 1.28.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  sp                   * 2.2-0    2025-02-01 [1] CRAN (R 4.4.3)
##  spam                   2.11-1   2025-01-20 [1] CRAN (R 4.4.3)
##  SparseArray            1.6.0    2024-10-29 [1] Bioconductor 3.20 (R 4.4.3)
##  spatstat.data          3.1-9    2025-10-18 [1] CRAN (R 4.4.3)
##  spatstat.explore       3.6-0    2025-11-22 [1] CRAN (R 4.4.3)
##  spatstat.geom          3.6-1    2025-11-20 [1] CRAN (R 4.4.3)
##  spatstat.random        3.4-3    2025-11-21 [1] CRAN (R 4.4.3)
##  spatstat.sparse        3.1-0    2024-06-21 [1] CRAN (R 4.4.3)
##  spatstat.univar        3.1-5    2025-11-17 [1] CRAN (R 4.4.3)
##  spatstat.utils         3.2-0    2025-09-20 [1] CRAN (R 4.4.3)
##  statmod                1.5.1    2025-10-09 [1] CRAN (R 4.4.3)
##  strex                * 2.0.1    2024-10-03 [1] CRAN (R 4.4.3)
##  stringi                1.8.7    2025-03-27 [1] CRAN (R 4.4.3)
##  stringr              * 1.6.0    2025-11-04 [1] CRAN (R 4.4.3)
##  SummarizedExperiment * 1.36.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  survival               3.8-3    2024-12-17 [1] CRAN (R 4.4.3)
##  tensor                 1.5.1    2025-06-17 [1] CRAN (R 4.4.3)
##  tibble                 3.3.0    2025-06-08 [1] CRAN (R 4.4.3)
##  tidyr                  1.3.1    2024-01-24 [1] CRAN (R 4.4.3)
##  tidyselect             1.2.1    2024-03-11 [1] CRAN (R 4.4.3)
##  TSP                    1.2.6    2025-11-27 [1] CRAN (R 4.4.3)
##  UCSC.utils             1.2.0    2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  usethis                3.2.1    2025-09-06 [1] CRAN (R 4.4.3)
##  uwot                 * 0.2.4    2025-11-10 [1] CRAN (R 4.4.3)
##  vctrs                  0.6.5    2023-12-01 [1] CRAN (R 4.4.3)
##  vipor                  0.4.7    2023-12-18 [1] CRAN (R 4.4.3)
##  viridis              * 0.6.5    2024-01-29 [1] CRAN (R 4.4.3)
##  viridisLite          * 0.4.2    2023-05-02 [1] CRAN (R 4.4.3)
##  whisker                0.4.1    2022-12-05 [1] CRAN (R 4.4.3)
##  withr                  3.0.2    2024-10-28 [1] CRAN (R 4.4.3)
##  workflowr            * 1.7.2    2025-08-18 [1] CRAN (R 4.4.3)
##  xfun                   0.54     2025-10-30 [1] CRAN (R 4.4.3)
##  xtable                 1.8-4    2019-04-21 [1] CRAN (R 4.4.3)
##  XVector                0.46.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  yaml                   2.3.11   2025-11-28 [1] CRAN (R 4.4.3)
##  zellkonverter        * 1.16.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  zlibbioc               1.52.0   2024-10-29 [1] Bioconductor 3.20 (R 4.4.2)
##  zoo                    1.8-14   2025-04-10 [1] CRAN (R 4.4.3)
## 
##  [1] /home/macnairw/packages/scprocess/.snakemake/conda/4fef11cadd34f9d2d13a0d6139d09340_/lib/R/library
##  * ── Packages attached to the search path.
## 
## ──────────────────────────────────────────────────────────────────────────────