Question: My Xenium brain tissue sample shows a low fraction of transcripts in cells. Although this is expected, how can I rescue the transcripts and assign them to cells?
Answer: Rescue transcripts with external transcript-based cell segmentation methods (Approach A) or with Xenium Ranger nuclei-based cell segmentation (Approach B).
This article covers one context out of the multiple presented in Rescue Xenium transcripts outside cells. The article presents learnings from human and mouse Xenium brain samples that may extend to other tissue types.
In brain tissue we expect the interior 18S rRNA stain to segment most cells. The interior segmentation algorithm requires a nucleus to segment a cell, so if nuclei are sparse, cells will also be sparse and many transcripts will remain outside of cells. Despite the lower transcript capture rate, for both mouse and human brain samples Multimodal Cell Segmentation gives better-quality data than nuclei-expansion-only segmentation.
Approach A: Rescue transcripts with external transcript-based cell segmentation
Researchers may opt to define cells anew using an external transcript-based cell segmentation method or use niche or neighborhood analyses. Some popular methods are as follows.
- Baysor (https://github.com/kharchenkolab/Baysor); an article to help get started here.
- Proseg (https://github.com/dcjones/proseg)
- Ficture (https://github.com/seqscope/ficture)
- BIDCell (https://github.com/SydneyBioX/BIDCell)
Apply the external cell segmentation results to Xenium transcript calls using xeniumranger import-segmentation. Usage documentation is here. This produces an updated Xenium results bundle that can be visualized with Xenium Explorer.
Import logic documentation is here. Transcript-based segmentation results can omit nuclei definitions. If only cells are imported but no --nuclei are defined, then xeniumranger-v2 and prior versions will create nuclei that correspond to the cell shapes, whereas xeniumranger-v3+ will not define nuclei. It is possible to supply the XA run nuclei definitions with --nuclei xeniumbundle/cells.zarr.zip.
For niche and neighborhood analyses, manual binning of transcripts is another option, e.g. with Python.
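As a minimal sketch of such manual binning, the Python below counts transcripts per gene inside square spatial bins. It assumes the transcripts have already been loaded as (x, y, gene) tuples in µm; the function name `bin_transcripts`, the 20 µm bin size, and the toy coordinates are illustrative choices, not part of any 10x tool or output.

```python
from collections import Counter, defaultdict
from math import floor


def bin_transcripts(transcripts, bin_size_um=20.0):
    """Count transcripts per gene inside square spatial bins.

    transcripts: iterable of (x_um, y_um, gene) tuples (assumed input layout).
    Returns {(bin_col, bin_row): Counter({gene: count})}.
    """
    if bin_size_um <= 0:
        raise ValueError("bin_size_um must be positive")
    bins = defaultdict(Counter)
    for x_um, y_um, gene in transcripts:
        # Integer bin index along each axis; floor keeps negative coordinates consistent.
        key = (floor(x_um / bin_size_um), floor(y_um / bin_size_um))
        bins[key][gene] += 1
    return dict(bins)


# Toy coordinates, made up for illustration only.
toy = [
    (3.0, 4.0, "Gad1"),
    (12.5, 19.9, "Gad1"),
    (25.0, 4.0, "Slc17a7"),
    (39.9, 0.0, "Slc17a7"),
    (40.0, 0.0, "Gad1"),
]
binned = bin_transcripts(toy, bin_size_um=20.0)
for bin_key in sorted(binned):
    print(bin_key, dict(binned[bin_key]))
```

Each bin then acts as a pseudo-cell in a bin x gene matrix. The bin size trades spatial resolution against counts per bin and should be chosen for the analysis at hand.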
Approach B: Rescue transcripts with Xenium Ranger nuclei-only cell segmentation
To rescue transcripts for comparison with older XA-v1 data or to capture the transcripts in cells for QC using tool frameworks that expect cell x gene matrices, it is possible to reprocess the data with xeniumranger and define cells with only nuclei expansion and an expansion distance larger than 5 µm. The expansion distance in xeniumranger can be set with the --expansion-distance parameter up to 100 µm. Because expansion is based on the biological feature of nuclei and stops when it encounters another expanding cell, this is a reliable option to capture the transcripts found in tissue-covered regions.
- Some brain researchers use --expansion-distance 0 for conservative cell definitions that correspond to the nucleus. This produces a cell x gene matrix of transcript counts corresponding to nuclei and minimizes doublets.
- The results from over-expanding cell definitions beyond 5 µm should be used with caution, e.g. only for metrics comparisons. The data that uses the multimodal cell segmentation stain and logic is better in quality, and researchers should use this for biological insights.
- For a brain sample where the XA-v2 Multimodal Cell Segmentation run gave 54% transcripts in cells, switching to all nuclei-based segmentation and 5 µm expansion captured 65% of transcripts in cells. Further increasing the expansion distance to 15 µm captured 90.4% of transcripts and to 100 µm captured 99.7% of transcripts.
- Xenium Ranger is not forward-compatible, e.g. running xeniumranger-v1 on XA-v2 data is unsupported and will error. See version compatibility matrix here.
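To compare capture rates like those quoted above across reprocessed bundles, the percent of transcripts in cells can be recomputed from per-transcript cell assignments. The sketch below assumes unassigned transcripts carry a label such as "UNASSIGNED" or -1; check the output file documentation for the pipeline version in use, since the label is an assumption here. The function name and toy IDs are hypothetical, and the 50.0% threshold is the warning cutoff cited in the Discussion.

```python
# Labels assumed to mark transcripts outside cells; verify against your bundle.
UNASSIGNED_LABELS = {"UNASSIGNED", "-1", ""}

# Warning cutoff cited in the Discussion of this article.
LOW_FRACTION_THRESHOLD = 50.0


def percent_in_cells(cell_ids, unassigned=UNASSIGNED_LABELS):
    """Return the percent of transcripts whose cell ID is not an unassigned label."""
    total = 0
    assigned = 0
    for cid in cell_ids:
        total += 1
        if cid is not None and str(cid) not in unassigned:
            assigned += 1
    if total == 0:
        raise ValueError("no transcripts supplied")
    return 100.0 * assigned / total


# Toy per-transcript cell IDs, made up for illustration only.
toy_ids = ["cell_a", "cell_a", "UNASSIGNED", "cell_b", -1,
           "UNASSIGNED", "cell_c", "cell_b"]
pct = percent_in_cells(toy_ids)
low = pct <= LOW_FRACTION_THRESHOLD
print(f"{pct:.1f}% of transcripts in cells; low-fraction warning: {low}")
```

The same function can be applied to the cell ID column of each reprocessed bundle to tabulate capture rate against expansion distance.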
For nuclei-only cell segmentation, xeniumranger offers two approaches: the import-segmentation module and the resegment module. Respective documentation is here and here. The commands below illustrate with the default 5 µm expansion distance.
The import-segmentation command uses the nuclei segmentation defined previously and expands the nuclei by the distance in µm set with --expansion-distance to define cells. The command shows use of the cells.zarr.zip nuclei segmentation from a previous XA run.
# --expansion-distance 5 is the default
xeniumranger-xenium2.0/xeniumranger \
import-segmentation \
--id updatedresults \
--xenium-bundle xeniumbundle \
--nuclei xeniumbundle/cells.zarr.zip \
--expansion-distance 5 \
--localcores 32 \
--localmem 128
The resegment module will perform nuclei segmentation anew and consumes more compute than import-segmentation. The command disables both boundary and interior stain logic.
# --expansion-distance 5 is the default
xeniumranger-xenium2.0/xeniumranger \
resegment \
--id updatedresults \
--xenium-bundle xeniumbundle \
--boundary-stain=disable \
--interior-stain=disable \
--expansion-distance 5 \
--localcores 32 \
--localmem 128
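When comparing several expansion distances, it can help to generate one command per distance rather than editing the command by hand. The Python sketch below only assembles and prints import-segmentation command lines using the flags shown above; it does not run them. The helper name, the run ID pattern, and the chosen distances (0, 5, and 15 µm) are illustrative assumptions, and the executable path should be adjusted to the local installation.

```python
import shlex


def import_segmentation_cmd(bundle, expansion_um, run_id,
                            xeniumranger="xeniumranger",
                            localcores=32, localmem=128):
    """Build an import-segmentation command that reuses the bundle's nuclei."""
    # The article states the expansion distance can be set up to 100 µm.
    if not 0 <= expansion_um <= 100:
        raise ValueError("expansion distance must be between 0 and 100 µm")
    return [
        xeniumranger, "import-segmentation",
        "--id", run_id,
        "--xenium-bundle", bundle,
        "--nuclei", f"{bundle}/cells.zarr.zip",
        "--expansion-distance", str(expansion_um),
        "--localcores", str(localcores),
        "--localmem", str(localmem),
    ]


# One command per expansion distance; run IDs are hypothetical names.
commands = [
    import_segmentation_cmd("xeniumbundle", d, f"expansion_{d}um")
    for d in (0, 5, 15)
]
for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell or passed to a job scheduler. Giving each run its own --id keeps the output directories separate.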
Discussion
While brain tissue run on the Xenium Analysis v1 pipeline did not present this issue, brain tissue run on the v2 pipeline shows variable transcript capture rates and may even trigger the 'Low fraction of transcripts within cells' warning. Part of the difference is due to a change in the default nucleus expansion distance: 15 µm in XA-v1 versus 5 µm in XA-v2, which better represents biology. Another contributing factor is the use of Multimodal Cell Segmentation, which also defines cells that better reflect biology. Finally, the XA-v2 pipeline updates the nucleus segmentation model to be more stringent.
- The 'Low fraction of transcripts within cells' warning is triggered when the value is <= 50.0%. We explain alerts at https://www.10xgenomics.com/support/software/xenium-onboard-analysis/latest/analysis/analysis-summary-troubleshooting. The page states this alert is expected for brain samples.
- See this article for XA-v2 pipeline changes that impact data comparability.
- Twenty-two human and mouse Xenium brain samples run with Multimodal Cell Segmentation showed the percent of transcripts within cells ranged from 31.7% to 59.9%. The mean 'Percent of transcripts within cells' was 48% and the median was 51%.
For both human and mouse brain, the majority of cells are segmented based on the interior 18S rRNA stain. The boundary protein stain targets ATP1A1, CD45 (PTPRC), and E-Cadherin (CDH1; E stands for epithelial) on the surface membrane of immune cells and epithelial cells. Researchers may observe increased cell segmentation from the boundary protein stain if the brain tissue section includes immune cells, the choroid plexus, or other cell types expressing the proteins, e.g. in cancer. The choroid plexus is a small, convoluted tissue structure that expresses E-Cadherin [1]. For many Z-planes of the tissue section the choroid plexus is absent. The image shows the boundary stain on mouse choroid plexus.
A third stain, the interior protein stain that targets alphaSMA (ACTA2) and Vimentin (VIM), is not used by the XA software to segment cells. However, it does stain some cell types of interest, e.g. brain astrocytes and microglia. The star-like shape of these cell types, with arms that branch in three dimensions, poses a challenge in defining cells with Xenium thin tissue slices and 2-D stain images.
A 2024 study shows a 3-D mapping of neurons based on electron microscopy that can be explored with the accompanying software tool Neuroglancer (https://github.com/google/neuroglancer). The interior protein stain shapes resemble this mapping. However, use of the interior stain with the current cell segmentation logic will neither define the star-shaped cells nor improve results. Cell segmentation model training uses cell shapes whose boundaries are easily drawable from stain images and does not include star-shaped cells.
Product: Xenium In Situ Gene Expression
Last modified: September 4, 2024