How to demultiplex a Seurat object and convert it to 10X files for analysis in Cellenics®


Demultiplexing is the process of separating sequenced single-cell RNA-sequencing (scRNA-seq) reads for each sample into separate files. To load your data into the Biomage-hosted community instance of Cellenics®, you'll need the raw count matrices as three files: barcodes.tsv, features.tsv, and matrix.mtx. This is a common format produced by 10x Cell Ranger. Each sample needs its own set of files, and each set should be in a separate folder named after the sample.

The following notebook is a brief tutorial written in R that demultiplexes a Seurat object stored in an R data file (.rds) and converts it to 10x files, which can be directly uploaded to the Biomage-hosted community instance of Cellenics®.

In this tutorial, we will use data from this publication. Data is available on this link.

Please note that in this case the .rds file stores a Seurat object, but an .rds file can store many different types of data, such as a count matrix or a SingleCellExperiment object. This tutorial is intended as an example that can be used as a walkthrough for similar cases.

We recommend reading the .rds file with R as illustrated in this tutorial but then looking carefully at the class of the object and its data structure. If you are dealing with an object different from the one used in this tutorial and you're not sure how to convert it, please get in touch with the Biomage team via the community forum (https://community.biomage.net/), and we'll be happy to assist you.
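As a minimal sketch of that inspection step, the snippet below writes a small toy matrix to a temporary .rds file, reads it back, and branches on the class of the result. The toy matrix, the temporary file, and the helper name describe_rds_object are illustrative assumptions and are not part of the tutorial data.

```r
# Toy stand-in for an .rds file of unknown content (assumption: a plain count matrix)
toy <- matrix(c(0, 3, 1, 0, 2, 5), nrow = 2,
              dimnames = list(c("GeneA", "GeneB"), c("Cell1", "Cell2", "Cell3")))
rds.path <- tempfile(fileext = ".rds")
saveRDS(toy, rds.path)

# Hypothetical helper: report what kind of object is stored in the file
describe_rds_object <- function(path) {
  obj <- readRDS(path)
  if (inherits(obj, "Seurat")) {
    "Seurat object"
  } else if (inherits(obj, "SingleCellExperiment")) {
    "SingleCellExperiment object"
  } else if (is.matrix(obj) || inherits(obj, "Matrix")) {
    "count matrix"
  } else {
    paste("unrecognised class:", paste(class(obj), collapse = ", "))
  }
}

result <- describe_rds_object(rds.path)
print(result)
```

Only the first branch corresponds to the case covered by this tutorial; the other branches are there to show where a different conversion would be needed.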

Now let's jump into our data!

1. Load data and check data structure

Set working directory and load rds object:

setwd("/Your/Directory")
data.seurat <- readRDS("HSPCs.rds")

Check the class of the object stored in the .rds file. Please note that if you are using a different .rds file and the class of the object is not “Seurat,” this script won't work. If you're unsure how to convert it, don't hesitate to contact the Biomage team via the community forum (https://community.biomage.net/), and we'll be happy to assist you.

class(data.seurat)
## [1] "Seurat"
## attr(,"package")
## [1] "Seurat"

Load required package:

library("DropletUtils")

Looking at the data we can see that this is a Seurat object with 2 assays (RNA and integrated):

data.seurat
## An object of class Seurat 
## 67388 features across 5183 samples within 2 assays 
## Active assay: RNA (33694 features, 0 variable features)
##  1 other assay present: integrated
##  2 dimensional reductions calculated: pca, umap

We set the default assay to “RNA” because we want the original data, as Cellenics® will take care of normalization and integration. Then we extract the raw count matrix from the counts slot of the RNA assay (the data slot holds normalized values once normalization has been run):

Seurat::DefaultAssay(data.seurat) <- "RNA"
counts <- data.seurat[["RNA"]]@counts

The count matrix has gene symbols as rownames and cell barcodes as colnames. Let's look at the first gene symbol and the first cell barcode as an example:

rownames(counts)[1]
## [1] "RP11-34P13.3"
colnames(counts)[1]
## [1] "AAACCTGAGCGGCTTC-1"
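Because Cellenics® expects raw counts, it can be worth confirming that the extracted matrix holds whole numbers rather than normalized values. The sketch below shows one way to do that on toy sparse matrices built with the Matrix package; the toy data and the helper name looks_like_raw_counts are assumptions made for illustration.

```r
library(Matrix)

# Toy sparse matrices (assumption: stand-ins for the extracted count matrix)
raw.counts <- Matrix::sparseMatrix(i = c(1, 2, 2), j = c(1, 1, 3), x = c(4, 1, 7),
                                   dims = c(2, 3))
normalised <- raw.counts
normalised@x <- log1p(normalised@x)  # mimic a log-normalized matrix

# Hypothetical helper: TRUE when every stored value is a non-negative whole number
looks_like_raw_counts <- function(m) {
  all(m@x >= 0) && all(m@x == round(m@x))
}

raw.ok <- looks_like_raw_counts(raw.counts)
norm.ok <- looks_like_raw_counts(normalised)
print(c(raw = raw.ok, normalised = norm.ok))
```

On the tutorial data, the same helper could be called as looks_like_raw_counts(counts), assuming counts is a sparse matrix with an x slot.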

In this case, samples are encoded in a metadata column of the Seurat object named Sample. We can see all the unique sample names by running:

unique(data.seurat$Sample)
## [1] "FBM4" "FBM5" "FBM3" "UCB5" "UCB1" "ABM4" "ABM3"
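If you are unsure which column holds the sample names in your own object, listing the metadata columns and counting cells per value can help. The sketch below uses a toy data frame as a stand-in for the per-cell metadata of a Seurat object; the barcodes, values, and the nCount_RNA column shown here are made up for illustration.

```r
# Toy metadata table (assumption: stands in for the per-cell metadata,
# a data frame with one row per cell barcode)
meta <- data.frame(
  Sample = c("FBM4", "FBM4", "UCB1", "ABM3", "UCB1"),
  nCount_RNA = c(1200, 950, 3100, 800, 2200),
  row.names = c("AAAC-1", "AAAG-1", "AACT-1", "AAGA-1", "ACCA-1")
)

# Columns available for splitting
print(colnames(meta))

# Number of cells per sample
cells.per.sample <- table(meta$Sample)
print(cells.per.sample)
```

On a real object, the equivalent calls would be colnames(data.seurat@meta.data) and table(data.seurat$Sample).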

IMPORTANT: Please note that samples might be encoded in different ways. For example, they could be encoded inside cell barcodes. If you're dealing with such a case, once you have extracted the count matrix from the Seurat object you can move to this tutorial (https://www.biomage.net/blog/how-to-demultiplex-an-rds-object-and-convert-it-to-10x-files), where demultiplexing is done using samples encoded in cell barcodes. In the following steps, we'll demultiplex the data using sample names encoded inside the Seurat object.
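To give a rough idea of what the barcode-encoded case can look like, here is a minimal sketch on a toy matrix. It assumes the sample is the suffix after the last "-" in each barcode, which is only one possible convention; the linked tutorial may handle it differently, and all names and values below are illustrative.

```r
library(Matrix)

# Toy count matrix (assumption: the sample is the suffix after the last "-")
counts.toy <- Matrix::sparseMatrix(
  i = c(1, 2, 1, 2), j = c(1, 2, 3, 4), x = c(5, 2, 1, 3), dims = c(2, 4),
  dimnames = list(c("GeneA", "GeneB"), c("AAAC-1", "AAAG-1", "AACT-2", "AAGA-2"))
)

# Extract the suffix from every barcode
sample.id <- sub(".*-", "", colnames(counts.toy))

# Split the columns into one matrix per sample
counts.by.sample <- lapply(split(seq_len(ncol(counts.toy)), sample.id),
                           function(idx) counts.toy[, idx, drop = FALSE])

# Number of cells assigned to each sample
print(vapply(counts.by.sample, ncol, integer(1)))
```

Each element of counts.by.sample is a count matrix for a single sample and keeps its original gene and barcode names.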

2. Demultiplex data and export as 10X files

Split the Seurat object by sample names:

data.seurat.list <- Seurat::SplitObject(data.seurat, split.by = "Sample")

Get unique sample names:

sample.names <- unique(data.seurat$Sample)

Check sample names:

head(sample.names)
## [1] "FBM4" "FBM5" "FBM3" "UCB5" "UCB1" "ABM4"
tail(sample.names)
## [1] "FBM5" "FBM3" "UCB5" "UCB1" "ABM4" "ABM3"

To export as 10X files that can be directly uploaded to the Biomage-hosted community instance of Cellenics®, define a function that creates a subdirectory named “demultiplexed” inside the current working directory and saves the 10X data for each sample in its own subfolder. If a folder named “demultiplexed” already exists, the function stops with an error to avoid overwriting files.

demultiplex_convert_to_10x <- function(obj, samples) {
  # obj is the list returned by Seurat::SplitObject(); every element must be a Seurat object
  if (!is.list(obj) || !all(vapply(obj, inherits, logical(1), what = "Seurat"))) {
    stop("This rds file does not contain a Seurat object. ",
         "Check the data type by running: class(data.seurat)")
  }
  out.dir <- file.path(getwd(), "demultiplexed")
  # Refuse to run if the output directory already exists, to avoid overwriting files
  if (dir.exists(out.dir)) {
    stop("A demultiplexed directory already exists")
  }
  dir.create(out.dir)
  for (sample in samples) {
    message("Converting sample ", sample)
    obj.sub <- obj[[sample]]
    # Write the raw counts of the RNA assay as 10x (version 3) sparse files
    DropletUtils::write10xCounts(
      path = file.path(out.dir, sample),
      x = obj.sub[["RNA"]]@counts,
      type = "sparse",
      version = "3"
    )
  }
}

Run the function:

demultiplex_convert_to_10x(obj = data.seurat.list, samples = sample.names)

Now you will find all the samples inside the “demultiplexed” folder. Each sample folder should contain 3 files named “barcodes.tsv”, “features.tsv”, and “matrix.mtx” (gzip-compressed, with a .gz extension, because the function writes them with version = "3"). If the demultiplexed folder was not created, or was created but is empty, there are two likely reasons:

1. The object stored in your .rds file is not a Seurat object. Please look at the first part of section 1 to check the class of the object stored in your .rds file.

2. The Seurat object metadata column that contains sample names is not named "Sample". Please check the structure of the Seurat object and look for the name of the column containing sample names. Then go back to the first part of section 2 of this tutorial and change the name accordingly.
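A quick way to confirm what was actually written is to list the files in each sample subfolder. The sketch below does this for a toy directory created in a temporary location; the helper name check_10x_output and the toy folder and file names are assumptions for illustration.

```r
# Hypothetical helper: list the files found in each sample subfolder
check_10x_output <- function(out.dir) {
  sample.dirs <- list.dirs(out.dir, full.names = TRUE, recursive = FALSE)
  files <- lapply(sample.dirs, list.files)
  names(files) <- basename(sample.dirs)
  files
}

# Toy directory mimicking the expected layout (assumption: one sample, empty files)
out.dir <- tempfile("demultiplexed_")
dir.create(file.path(out.dir, "FBM4"), recursive = TRUE)
file.create(file.path(out.dir, "FBM4", c("barcodes.tsv", "features.tsv", "matrix.mtx")))

found <- check_10x_output(out.dir)
print(found)
```

On the real output, the call would be check_10x_output(file.path(getwd(), "demultiplexed")). An empty list means no sample folders were written.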

Your samples are now ready to be uploaded to Cellenics®! Start your analysis for free using the Biomage-hosted community instance of Cellenics® that's available at https://scp.biomage.net/
