Single-cell RNA sequencing data analysis: 6 tools you need to know


Can you imagine yourself on the brink of a ground-breaking discovery, except you don’t know it yet?

Those of you who work in the single-cell RNA-sequencing (scRNA-seq) field may be familiar with this. Having laid the groundwork, prepared, and analyzed your samples, you now have your results. In front of you are hundreds, thousands, if not millions of cells, and your data is beginning to look like something out of a science fiction film. What can you do with this data if you're not proficient in programming or don't have time to learn bioinformatics?

Fortunately, there is an increasing number of single-cell data analysis platforms available to you. In this article, we have compiled a list of the best single-cell RNA-seq analysis software that will help you move your single-cell research forward.

Single-cell RNA-sequencing – where lies the stumbling block?

In recent years, scRNA-seq has proven to be one of the most successful academic and clinical research methods. While the wet-lab side of scRNA-seq has improved, single-cell research is currently hampered by the difficulty of single-cell data analysis. So, biological insight continues to be locked away in sequencing data. As a result, there is an increasing need for bioinformatics solutions. 

Which factors should you consider?

Once you have your scRNA-seq data, you can gain insights from it using single-cell analysis software. But how do you know which tool is a good fit for you? We look at the common and unique features across the most popular scRNA-seq analysis tools currently out there.

You should evaluate the type of application your workstation can handle and the level of intuitiveness of the platform. 

Next, think about your specific requirements for data analysis. What input formats and single-cell technologies does the software accept? How in-depth do you want your cell filtering and data exploration to be? Do you need single-cell analysis software with automatic cell type prediction?

You should also consider price and licensing. Some scRNA-seq data analysis tools are free, while others charge licensing fees. Users should not overlook the many robust free solutions available. The most expensive software doesn't necessarily have the most features.

You can find a detailed list of features in the overview table below. 

Overview

Our picks for the best tools to analyze scRNA-seq data:

  • Cellenics

  • BioTuring Browser

  • 10x Loupe Browser

  • Partek® Flow®

  • Cellxgene

  • Rosalind

The following table provides an overview of the features that have been considered.

Best scRNA-seq data analysis tools

Cellenics®

Pros
- User-friendly interface with a clear workflow
- Hosted on the cloud
- Integration with NextFlow
- A variety of pre-loaded publication-ready plots
- Free of charge for academic users (up to 500k cells)
Cons
- Only supports the import of raw count matrices
- Doesn't support multi-omics technologies
Dashboard of Data Exploration module in Cellenics

Cellenics® is an open-source scRNA-seq analysis software developed under the scientific supervision of Prof. Peter Kharchenko and the administrative supervision of the Department of Biomedical Informatics at Harvard Medical School (© 2020-2022 President and Fellows of Harvard College). Biomage is an open-source software company that provides services for the design and development of Cellenics® and currently provides services focused on the deployment, training, and user support of Cellenics®. Biomage hosts the largest community instance of Cellenics® (at https://scp.biomage.net/), which was released in August 2021.

Cellenics® is a cloud-based single-cell RNA-seq analysis software that allows you to explore and analyze your dataset without prior programming knowledge. Being in the cloud means users can analyze their dataset from anywhere in the world at any time. So, the user doesn't need a powerful workstation for single-cell data analysis and storage. Cellenics® also has a user-friendly graphical interface divided into four components – data management, data processing, data exploration, and plots and tables.

Data import. Users can import raw count matrices into one of the hosted instances of Cellenics® in the form of three files per sample: barcodes.tsv, features.tsv, and matrix.mtx. This common data type is generated after processing FASTQ files with 10x Genomics’ Cell Ranger. Users can also import BD Rhapsody data as expression_data.st or expression_data.st.gz files [as of November 2022]. Cellenics® currently does not support multi-omics technologies.
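
Before uploading, it can help to confirm that every sample folder really contains the three files named above. The following is a minimal sketch in Python. The file names come from the paragraph above, while the one-folder-per-sample layout and the function name are our own assumptions, not something Cellenics® prescribes.

```python
from pathlib import Path

# File names as listed in the article; the folder layout is an assumption.
REQUIRED_FILES = ("barcodes.tsv", "features.tsv", "matrix.mtx")


def missing_sample_files(sample_dir):
    """Return the required files that are absent from one sample folder."""
    folder = Path(sample_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

An empty list means the sample folder is complete. Anything else tells you which file still needs to be generated or copied.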

Biomage offers additional bioinformatics support to import other data types for a small consulting service fee.

The platform is essentially species agnostic for all functions except pathway analysis. This is because the species is defined at the pre-processing stage when the reads are aligned to a specific genome, which is upstream of data import to Cellenics®.

Data processing. Cellenics® offers an in-depth data processing and quality control workflow where the data are filtered and integrated. Five filters (classifier, cell size distribution, mitochondrial content, number of genes versus UMIs, and doublet) allow step-by-step removal of empty droplets, dead cells, poor-quality cells, and doublets. Data are filtered on a per-sample basis, with a plot for each sample within each filter. Any of the filtering steps can also be disabled if needed.
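
To give a feel for what threshold-based cell filtering does, here is a simplified sketch in Python. It is not the Cellenics® implementation: the dictionary keys, the function name, and the threshold values are all illustrative assumptions, and the real filters are more sophisticated than fixed cut-offs.

```python
def passes_qc(cell, min_umis=500, min_genes=200, max_mito_fraction=0.2):
    """Return True if a cell survives three simple quality filters.

    `cell` is a dict with hypothetical keys:
      umis      - total UMI count of the cell
      genes     - number of genes detected in the cell
      mito_umis - UMIs assigned to mitochondrial genes
    The default thresholds are placeholders, not recommended values.
    """
    if cell["umis"] == 0:
        return False  # nothing was captured; also avoids dividing by zero
    if cell["umis"] < min_umis or cell["genes"] < min_genes:
        return False  # likely an empty droplet or a poor-quality cell
    # A high mitochondrial fraction is commonly read as a sign of a dying cell.
    return cell["mito_umis"] / cell["umis"] <= max_mito_fraction


cells = [
    {"barcode": "AAAC", "umis": 4200, "genes": 1500, "mito_umis": 210},
    {"barcode": "AAAG", "umis": 300, "genes": 120, "mito_umis": 15},
    {"barcode": "AAAT", "umis": 2500, "genes": 900, "mito_umis": 1000},
]
kept = [cell["barcode"] for cell in cells if passes_qc(cell)]
```

In this toy example only the first cell is kept: the second has too few UMIs and genes, and the third has a mitochondrial fraction of 40%.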

Cellenics® supports fast MNN, Harmony, Seurat v4 for data integration, log-transformation for data normalization, and user control over dimensionality reduction. UMAP and t-SNE projections are available for embedding with Louvain as a default clustering method. All quality control plots have customizable options for dimensions, title, axis, and font. 
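
For readers curious about what log-transformation normalization means in practice, the sketch below shows one widely used scheme: scale each cell to a fixed total, then take the natural log of one plus the scaled value. The scale factor of 10,000 is our assumption based on common practice. The exact formula used inside Cellenics® is not described here.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Log-normalize the raw counts of a single cell.

    Each count is divided by the cell's total, multiplied by
    `scale_factor` (an assumed, commonly used value), and passed
    through log(1 + x) so that zero counts stay at zero.
    """
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(count / total * scale_factor) for count in counts]
```

Because every cell is scaled to the same total first, cells that were sequenced at different depths become comparable.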

Data processing steps have options for suggested automatic settings and manual override.

Data exploration. Cellenics® has a wide variety of data exploration features. Custom cell sets can be created using selection tools or based on the expression of one or more genes. It's easy to rename clusters or recolor by sample, metadata, or gene. Standard analysis actions such as marker heatmap and UMAP are pre-loaded. Automatic cluster annotation is available for Human and Mouse species. Users can calculate differential expression between cell sets within a sample/group or compare a cell set between samples and groups. Differential expression results can be filtered further, for example, by selecting only upregulated genes. Users can perform pathway analysis on the list of differentially expressed genes using external services - pantherdb or enrichr. Trajectory analysis is also available in the Plots and Tables module [as of September 2022].
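
The same kind of filtering can be repeated on an exported differential expression table. Below is a minimal sketch in Python. The column names (gene, logFC, p_val_adj) and the cut-offs are hypothetical, so check them against the header of your own .csv file.

```python
import csv
import io


def upregulated_genes(csv_text, max_adj_p=0.05, min_log_fc=0.0):
    """Return gene names with a positive fold change and a small adjusted p-value.

    The column names used here are assumptions; adapt them to the
    actual header of the exported table.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["gene"]
        for row in reader
        if float(row["logFC"]) > min_log_fc and float(row["p_val_adj"]) <= max_adj_p
    ]


example = "gene,logFC,p_val_adj\nCD3E,1.8,0.001\nMS4A1,-2.1,0.0004\nACTB,0.2,0.6\n"
hits = upregulated_genes(example)
```

Here only CD3E passes: MS4A1 is downregulated and ACTB is not significant.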

Plot types. Cellenics® provides a range of pre-loaded plots so that users can export publication-quality figures very quickly. 

Available plots: categorical and continuous embedding, frequency plot for cell sets, volcano plot, dot plot, violin plot, marker heatmap, and custom heatmap. Plots are fully customizable and can be downloaded as SVG and PNG and opened in Vega Editor. 

Data export and sharing. Users can download each sample's features, barcodes, and matrix files from the sample list. Raw and processed Seurat objects can be exported from any processed project. Data Processing settings can be downloaded as a .txt file. Furthermore, a list of differentially expressed genes can be exported as .csv files from the volcano plot.

Users can share their scRNA-seq data and analyses using Cellenics®. Multiple users can collaborate on the analysis of a single dataset. 

Pricing. The Biomage-hosted community instance of Cellenics® is free of charge for academic researchers with datasets up to 500k cells. This includes unlimited storage and processing of an unlimited number of projects.

Users can further enhance their knowledge of Cellenics® and their single-cell RNA-seq analysis skills with Biomage's comprehensive online course. Dive into our dedicated blog post to embark on this enlightening journey: Unlock the Secrets of Single-Cell RNA-Seq Data Analysis with Our Comprehensive Course

BioTuring Browser

Pros
- Supports the import of a variety of data types
- Supports CITE-seq, ATAC-seq, TCR-seq, and spatial transcriptomics data
Cons
- Desktop scRNA-seq data analysis tool
- Paid software
- Not an open-source software
- The quality assessment of data is limited
Dashboard from Bioturing Browser (BBrowser)

BioTuring Browser, or BBrowser, is a tool that provides an end-to-end scRNA-seq data analysis solution. This means that the tool provides an analysis solution that takes you from raw single-cell data to publication-ready figures. It was first released in October 2018, and the latest version of the software is BBrowser X. 

This is a desktop tool and therefore has specific system requirements. It runs on Windows (64-bit only) and macOS operating systems. For datasets over 100 000 cells, BioTuring recommends having 8-16GB RAM, and all data downloaded from the BBrowser database or uploaded to the software is stored on the local computer. This means that one needs a powerful workstation to analyze larger datasets. 

Importantly, BBrowser is not an open-source platform. However, two open-source packages released by BioTuring can write BioTuring Compressed Study (BCS) files from a Seurat object or an AnnData (Scanpy) object.

Data import. BBrowser accepts several file types. You can import MTX, TSV, TXT, CSV, and h5 integer count matrix files. You can also upload raw data (FASTQ files). Doing this requires downloading the reference index you need (each reference file requires 5GB of memory on average) and, if you are using Windows, the Visual C++ Redistributable package. You also need sufficient RAM to run the alignment and quantification of these files. BBrowser also supports the upload of Seurat (.rds) and Scanpy (.h5ad) objects, with integer counts in most cases.

BBrowser accepts data from the following species:

  • Human (Homo sapiens)

  • Long-tailed macaque (Macaca fascicularis)

  • Mouse (Mus musculus)

  • Rat (Rattus norvegicus)

  • Zebrafish (Danio rerio)

  • Fruit fly (Drosophila melanogaster)

For any other species, gene information will be disabled. Note that FASTQ import only works for human and mouse data generated by 10x Genomics’ technologies.

BBrowser accepts data from any scRNA-seq technology, such as 10x Genomics, if the data format is correct. BioTuring's software also supports CITE-seq, ATAC-seq, TCR-seq, and spatial transcriptomics data.

One of the unique features of BBrowser is the available public sequencing data from the latest publications. So, users can analyze the available data and compare multiple datasets. 

Data processing. The process from quality control to dimensionality reduction is applied to public and in-house datasets imported in MTX, TSV or CSV, and FASTQ. If the data has been processed (Seurat, Scanpy, and BCS), quality control (filtering) and batch effect correction will not be applied. 

For imported data BBrowser allows users to define cut-offs for quality control, such as minimum UMI counts, percentage of mitochondrial content, and distribution of cells by the number of detected features and the distribution of features by cell frequency. Data normalization is only available by log transformation. Therefore, it could be said the quality assessment is limited in BBrowser. This is a slight drawback, especially for those with some bioinformatics knowledge or who need specific quality control steps. Users can also skip the filtering steps if needed. In BBrowser 3, users can interact with graphs showing quality assessment. 
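
The "distribution of features by cell frequency" cut-off can be read as keeping only genes detected in a minimum number of cells. The sketch below illustrates that idea in Python. It is our interpretation rather than BBrowser code, and the function name and the default of 3 cells are assumptions.

```python
def genes_detected_in_enough_cells(matrix, gene_names, min_cells=3):
    """Keep genes that have a non-zero count in at least `min_cells` cells.

    `matrix` is a list of rows, one row per gene and one column per cell.
    """
    kept = []
    for name, row in zip(gene_names, matrix):
        cells_expressing = sum(1 for count in row if count > 0)
        if cells_expressing >= min_cells:
            kept.append(name)
    return kept
```

Genes seen in only one or two cells carry little information for clustering, which is why this kind of cut-off is usually applied before dimensionality reduction.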

BBrowser supports three data integration methods for batch effect removal – MNN, CCA, and Harmony. Both t-SNE and UMAP are available for dimensionality reduction, and clustering by the Louvain method is available. 

Data exploration. BBrowser supports finding differentially expressed and marker genes using the Venice or edgeR packages. Also, the software has an automatic feature for cell type prediction. This feature is triggered by selecting a cluster and using BioTuring's knowledge base to match cell type markers. BBrowser can also perform gene set enrichment analysis on biological processes, molecular function, cellular components, and biological pathways based on the GSEA method. The analysis tool also allows trajectory analysis.

Plot types. BBrowser offers a wide range of visualizations, such as UMAP/t-SNE, feature plots in 2D and 3D, correlation plots, heatmap, and violin plot. Most visualizations are available when a user queries genes/proteins' expression. 

There are good options for customizing the main plots (scatter and t-SNE/UMAP) – changing point opacity and size, color scheme, and so on. Most other plots are fixed in design and layout, except the color scheme can be changed before making a plot. Otherwise, you need to export the graph's data to tsv and reconstruct it outside BBrowser. BBrowser supports exporting graphs into .PNG or .SVG formats. Box plots, violin plots, and density can be exported as SVG files in gene query and the differential expression analysis dashboard.

Data export. Users can export many types of data to a TSV file: annotations, clonotypes, graph data, a list of marker genes, and DE genes. The expression matrix can be exported as a folder containing the matrix.mtx, features.tsv, and barcodes.tsv files or a Seurat object (.rds).

Pricing. BBrowser is paid single-cell analysis software. Pricing is available on demand. 

Users without a license can access up to 5 public studies in the BioTuring database, but they cannot upload their own data. Academic users on the free plan can also import Seurat or Scanpy objects and run downstream analysis (except sub-clustering, trajectory analysis, and differential expression analysis).

10x Loupe Browser

Pros
- Free for analyzing 10x Genomics datasets
- Supports integration with ATAC-seq, CITE-seq, and VDJ sequencing data
Cons
- Doesn’t support data processing steps such as data filtering and integration
- Doesn’t support trajectory analysis
- Data import limited to .cloupe files
Dashboard from Loupe Browser by 10x Genomics

Loupe Browser by 10x Genomics is a desktop tool that enables anyone to analyze and visualize 10x Genomics datasets. Users can install the latest Loupe Browser 6.1.0 on macOS or Windows 64-bit. The software requires a minimum of 4GB RAM, and 16GB RAM for datasets over 100 000 cells. Loupe Browser has a simple interface centered around the main view panel. It is worth noting that 10x Loupe Browser is not open-source software, though it is available for free to all users with 10x Genomics data.

Data import. The user can open any .cloupe file generated by other 10x Genomics software to visualize their data. Loupe Browser supports integration with ATAC-seq, CITE-seq, and VDJ sequencing data. 

There is no available information on the supported species in the 10x Loupe Browser. However, we can confidently assume that data from human and mouse species are accepted, as Cell Ranger provided pre-built reference packages for these species. 

Data processing. It’s important to note that Loupe Browser does not offer data processing (filtering, integration, dimensionality reduction, etc.) and requires users to use additional 10x Genomics tools such as Cell Ranger or 10x Cloud for the pre-processing of raw data files. So, 10x Loupe Browser is relatively constrained in data analysis.

Regarding clustering, Loupe Browser provides three ways to display clusters – graph-based clustering, k-means clustering, and custom-created categories. A great function of this software is the ability to split the projections (like t-SNE) by clusters in a single view. 

When the user subsets the cells, it’s possible to recluster them. Reclustering entails setting thresholds for UMI counts, number of features, and mitochondrial fraction. Note that the user must manually input thresholds. Reclustering results in t-SNE and UMAP projections. Reclustering is only possible for datasets with fewer than 100 000 cells in Loupe Browser.

Data exploration. Users can perform differential expression analysis between clusters. The globally distinguishing method identifies features that distinguish a selected cluster from every other cell in the dataset. The locally distinguishing method finds features highly expressed within the chosen clusters compared to other designated groups. Loupe Browser also allows creating cell subsets by expression filtering or by manual selection.
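
The difference between the two methods comes down to which cells form the reference group. The Python sketch below illustrates this with a plain log2 fold change for one gene. It is only an illustration: the statistical test Loupe Browser actually runs is not described here, and the function names and the pseudocount are our own assumptions.

```python
import math


def mean(values):
    return sum(values) / len(values)


def log2_fold_change(target, reference, pseudocount=1.0):
    """log2 ratio of mean expression, with a pseudocount to avoid log(0)."""
    return math.log2((mean(target) + pseudocount) / (mean(reference) + pseudocount))


def distinguishing_score(expr_by_cluster, cluster, compare_to=None):
    """Score one gene for one cluster.

    compare_to=None      -> "global": the reference is every other cluster.
    compare_to=[...]     -> "local": the reference is only the named clusters.
    `expr_by_cluster` maps a cluster name to that gene's per-cell expression.
    """
    if compare_to is None:
        compare_to = [name for name in expr_by_cluster if name != cluster]
    reference = [value for name in compare_to for value in expr_by_cluster[name]]
    return log2_fold_change(expr_by_cluster[cluster], reference)


expression = {"A": [3, 3], "B": [1, 1], "C": [0, 0]}
global_score = distinguishing_score(expression, "A")
local_score = distinguishing_score(expression, "A", compare_to=["B"])
```

The two scores differ because cluster C, where the gene is silent, pulls the global reference mean down but is ignored in the local comparison against B.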

There is no automatic cell set prediction. However, users can see upregulated genes for each cluster in a feature table and use them for manual annotation. Users can also view the expression of particular genes in the projections with configurable parameters such as the minimum expression value (in UMI counts). Loupe Browser also doesn’t support trajectory analysis.

Plot types. Available plots are limited: UMAP/t-SNE, heatmap, violin plot, feature plot. However, users can input custom trajectory projections generated by third-party tools. The plots are not fully customizable so it’s impossible to generate publication-ready figures. The user can adjust the color scheme and point size. 

UMAP and t-SNE coordinates can be exported as CSV. Feature plots, UMAP, and t-SNE can also be exported as images. 

Data export and sharing. Users can export categories in a CSV file with a list of barcodes and their associated cluster labels. The significant gene table and the currently active (selected) features can also be exported as CSV.
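
An exported categories file is easy to summarize outside the browser. The sketch below counts cells per cluster label in Python. It assumes a header row followed by rows with the barcode in the first column and the label in the second; the header names in the example are hypothetical.

```python
import csv
import io
from collections import Counter


def cells_per_cluster(csv_text):
    """Count barcodes per cluster label in an exported categories table.

    Assumes one header row, then rows of (barcode, label).
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row if there is one
    return Counter(row[1] for row in reader if len(row) >= 2)


example = "Barcode,Cluster\nAAAC-1,T cells\nAAAG-1,B cells\nAAAT-1,T cells\n"
summary = cells_per_cluster(example)
```

The resulting counter can be used directly for a frequency table or passed on to a plotting library of your choice.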

Direct data sharing is impossible in Loupe Browser, as files in the software are confined to the user’s local environment. However, users can share datasets by sharing the .cloupe file.

Pricing. Free for analyzing 10x Genomics data. 

Partek® Flow®

Pros
- Supports the import of a variety of data types
- Allows entering the data analysis pipeline at various processing stages
- Offers many statistical algorithms and various analysis options for data processing
Cons
- Requires powerful hardware to run
- Steep learning curve for data analysis interface
- Not an open-source software
- Paid software
Dashboard from Partek Flow software

Partek® Flow® is software for analyzing next-generation and scRNA-seq data. It is a web-based application that users can install on a desktop computer or a computer cluster. Users can also run the software on the Amazon Web Services cloud, which might require additional technical knowledge. Irrespective of where the server is, the user will interact with Partek® Flow® using a web browser. Google Chrome, Mozilla Firefox, Microsoft Edge, and Apple Safari browsers are currently supported.

Still, Partek® Flow® requires powerful hardware to run. Even for datasets with fewer than 100 000 cells, the system requirements are 64GB RAM, > 2TB storage for data, and > 100 GB storage for the root partition. Partek® Flow® recommends accounting for 3-5 times more storage than required, which is an important consideration. Also, Partek® Flow® is not open-source software.

Partek® Flow® has an intuitive interface. The interface presents the user with appropriate task options, so first-time users without bioinformatics knowledge can import their data and perform downstream analysis. However, this falls a bit flat when it comes to data analysis. Partek® Flow® includes many statistical algorithms and various analysis options for data processing. Yet, these are organized in a task graph with data and task nodes, and analysis requires going to and from the task graph and data viewer and some knowledge of analysis processes and methods. This means that the user might need help with bioinformatics to perform downstream analysis confidently. 

Task graph and task nodes from Partek Flow

Data import. Partek® Flow® supports the data import of a wide range of file formats, including count matrices, FASTQ, Seurat objects (.Rds), and H5 files. The software also supports multi-omics technologies scATAC-seq and CITE-seq in specific data analysis tasks, like finding multimodal neighbors. Partek Flow does not seem to have any restrictions regarding supported species. 

An advantage of Partek® Flow® is the ability to enter the data analysis pipeline from various points – the user can start with raw data, aligned reads, count data, or normalized counts. 

Data processing. Partek® Flow® offers several quality control tools. There are four quality assessment plots – counts per cell, detected features per cell, and percentage of mitochondrial and ribosomal counts per cell. Partek® Flow® allows users to filter data by features, groups, barcodes, and more. The thresholds for these are not set automatically and must be configured by the user. 
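
The four quality metrics are straightforward to compute for a single cell, as the Python sketch below shows. It is not Partek® Flow® code. Identifying mitochondrial genes by the "MT-" prefix and ribosomal genes by "RPS"/"RPL" is a common convention for human gene symbols and is our assumption here; other species and annotations use different names.

```python
def qc_metrics(gene_counts):
    """Compute four per-cell QC metrics from a {gene symbol: UMI count} dict.

    The gene-name prefixes below are an assumed convention for human data.
    """
    total = sum(gene_counts.values())
    detected = sum(1 for count in gene_counts.values() if count > 0)
    mito = sum(c for gene, c in gene_counts.items() if gene.startswith("MT-"))
    ribo = sum(c for gene, c in gene_counts.items() if gene.startswith(("RPS", "RPL")))

    def percent(part):
        return 100.0 * part / total if total else 0.0

    return {
        "counts": total,
        "detected_features": detected,
        "pct_mito": percent(mito),
        "pct_ribo": percent(ribo),
    }


metrics = qc_metrics({"MT-CO1": 10, "RPS3": 20, "ACTB": 70, "GAPDH": 0})
```

Plotting these values across all cells gives the distributions from which you would pick your own filtering thresholds.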

The software supports several normalization methods, including log-transformation and SCTransform. Additionally, it provides general linear model, Harmony, and Seurat v3 methods for data integration. The platform provides graph-based and k-means clustering with scatter plots, PCA, t-SNE, and UMAP visualizations. It also offers hierarchical clustering heat maps and violin plots with configurable parameters. Additionally, users can perform trajectory analysis with interactive 2D and 3D plots. The user can also calculate pseudotime from chosen starting points of trajectories with the Monocle 3 method.

Data exploration. Users can perform gene set enrichment analysis with groups defined by Gene Ontology or another imported gene ontology source. Differential pathway expression analysis is also available with interactive KEGG pathway maps for additional information. Automatic cell type prediction is not available. However, if the user has attribute information about the cells in the dataset, they can use this to annotate cells. Marker genes for each cluster can be calculated using ANOVA. 
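
To show the idea behind ANOVA-based marker detection, here is the textbook one-way ANOVA F statistic for a single gene in Python. The exact model Partek® Flow® fits is not described here, so treat this as a sketch of the general technique rather than the software's method.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for one gene.

    `groups` is a list of lists: the gene's expression values in each cluster.
    Requires at least two groups and more values than groups.
    """
    all_values = [value for group in groups for value in group]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(group) / len(group) for group in groups]

    # Variation explained by cluster membership.
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, group_means)
    )
    # Variation left over inside the clusters.
    ss_within = sum(
        (value - m) ** 2 for group, m in zip(groups, group_means) for value in group
    )
    if ss_within == 0:
        # No within-group variance: F is unbounded
        # (strictly undefined if the group means are also equal).
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F value means the gene's expression differs between clusters far more than it varies within them, which is what makes it a candidate marker.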

Plot types. Available plots: UMAP/t-SNE, heatmap, bubble map, scatter plot, dot plot, and volcano plot. Visualizations can be downloaded as publication-quality SVG files with customizable image size and DPI. 

Data export and sharing. Users can export count matrix data (including filtered and normalized counts and more) to h5ad files.

Partek® Flow® supports multiple users on a server, and each user can be classified as an administrator or a regular user. This allows for direct data sharing. 

Pricing. Pricing is available on demand. 

Cellxgene

Pros
- Free of charge for academic users
- An open source scRNA-seq data analysis software
- Can support very large datasets with millions of cells
- Easy to select and subset cells
Cons
- Primarily a data visualization tool
- Installation and data exploration in the desktop version requires programming knowledge

Cellxgene is an open-source, free-to-use scRNA-seq data analysis tool. However, it is essential to note that Cellxgene is almost entirely a data exploration and visualization tool. 

Cellxgene offers a hosted data explorer and a self-hosted desktop data explorer. The hosted data explorer allows you to look at published data sets in the Cellxgene platform in UMAP form. The downloadable desktop version allows the exploration of private single-cell data via a PyPI package. 

To install the self-hosted single-cell RNA-seq analysis software you need Python 3.6+ and Google Chrome 61+, Edge 15+, or Firefox 60+ browser. Cellxgene desktop explorer is launched via the command line in Python. Without coding knowledge, the installation and launch might pose a challenge. 
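
As a rough sketch of what the launch step looks like, the Python helper below builds the command line for the desktop explorer after it has been installed with pip install cellxgene. The file name in the comment is hypothetical, and the helper function is ours. The launch subcommand and the --open flag are taken from the Cellxgene documentation as we understand it, so check them against your installed version.

```python
import subprocess


def cellxgene_launch_command(h5ad_path, open_browser=True):
    """Build the command that starts the Cellxgene desktop explorer.

    `h5ad_path` points to your own .h5ad file (hypothetical here).
    """
    command = ["cellxgene", "launch", str(h5ad_path)]
    if open_browser:
        command.append("--open")  # ask Cellxgene to open a browser tab
    return command


# To actually start the explorer (requires cellxgene to be installed):
# subprocess.run(cellxgene_launch_command("my_data.h5ad"), check=True)
```

Typing the same command directly into a terminal works just as well. Wrapping it in Python is mainly useful when you launch several datasets from a script.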

Cellxgene could be very useful to computational biologists who write their code via a mini Jupyter notebook-like interface. This provides additional capabilities beyond the set of plotting functions provided in the tool.

Data import. As input, Cellxgene takes an h5ad file that contains a pre-computed embedding and, optionally, additional metadata. Data upload (converting data into the correct format and data import) requires programming knowledge. 

Multi-omics technologies are not supported, but multi-omic public datasets are available in the hosted data explorer. There is no available information on the exact species supported in Cellxgene. However, the hosted Cellxgene explorer version seems to revolve mainly around Homo sapiens.

Data processing. Cellxgene doesn’t offer data processing steps. Users must do data processing functions such as filtering, integration, dimensionality reduction, and data normalization outside the Cellxgene platform. This again might require either previous bioinformatics knowledge or time with a single-cell bioinformatics expert. 

However, quality control metrics can be uploaded as metadata and used for filtering. To exclude outliers it’s possible to clip all continuous quality control data to a percentile range. 
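
Percentile clipping itself is simple to reason about: values below the lower percentile are raised to it, and values above the upper percentile are lowered to it. The Python sketch below shows this on a plain list. How Cellxgene computes its percentiles is not described here, so the linear interpolation used below, like the function names, is our assumption.

```python
def percentile(sorted_values, p):
    """Percentile of an already sorted list, using linear interpolation."""
    position = (len(sorted_values) - 1) * p / 100.0
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = position - low
    return sorted_values[low] + (sorted_values[high] - sorted_values[low]) * fraction


def clip_to_percentiles(values, lower=1.0, upper=99.0):
    """Clamp every value into the [lower, upper] percentile range."""
    ordered = sorted(values)
    low_bound = percentile(ordered, lower)
    high_bound = percentile(ordered, upper)
    return [min(max(value, low_bound), high_bound) for value in values]
```

Clipping keeps every cell in the dataset but stops a handful of extreme values from stretching the color scale of the embedding.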

Data exploration. The desktop version offers the ability to annotate cells and recompute a new embedding based on a selection of cells. Cells can be selected and subset via selection on the embedding, gene expression cut-offs, or categorical metadata such as timepoint or sex. Users can also compute differentially expressed and marker genes. Trajectory analysis and cell set prediction are not available.

Plot types. Data visualizations are limited to a few plots such as UMAP/t-SNE, and bivariate plots in the hosted version. The plot customization and export options are also limited. Genes and gene sets can be used to color the embedding. 

Many visualization options are available with cellxgene VIP - an interactive visualization plugin.

Data sharing. Cellxgene desktop is meant to be used by researchers on their local workstations. However, private links can be shared with collaborators or manuscript reviewers who haven’t installed Cellxgene. 

Pricing. Free of charge. 

ROSALIND

Pros
- Deep data exploration available via 50+ knowledge bases
- Virtual rooms allow real-time collaboration between researchers
- Many visualization options available
Cons
- Not an open-source software
- Paid software
- Doesn’t support trajectory analysis

Image source: https://bit.ly/3yzrbrD. Copyright © 2021 ROSALIND, Inc.

ROSALIND is a cloud-based bioinformatics platform designed for life science researchers. It’s not exclusively scRNA-seq data analysis software. The software is accessible via any secure web browser.

ROSALIND data analysis platform is not open-source. 

Data import. ROSALIND accepts raw FASTQ files and processed counts data. It can process 10x Genomics Cell Ranger datasets. The software is optimized for 10x Genomics Chromium single-cell library kits. ROSALIND also supports the analysis of cell clusters created in the 10x Loupe Browser. The platform doesn’t support other single-cell technologies. However, comparisons with multi-omic data (ATAC-seq and ChIP-seq) are possible.

It’s worth noting that users need to pre-define various things before the data upload such as sample kit model, sample attributes, and analysis parameters.

ROSALIND also allows the import of public data from the National Center for Biotechnology Information (NCBI) Sequence Read Archive and Gene Expression Omnibus.

When it comes to scRNA-seq, ROSALIND supports the analysis of Human (Homo sapiens), Mouse (Mus musculus), and Rat (Rattus norvegicus) data.

Data processing. The quality control pipeline consists of automatic contamination detection, Q30 scores, ribosomal content, duplicate rates, gene coverage, sample correlation, and multidimensional scaling. Additionally, information on the number of cells, and average and median reads per cell are also available. ROSALIND provides Cell Ranger, Seurat, and k-means clustering methods.

However, while the plots on quality control are available, there are no apparent options for filtering and adjusting various quality control parameters. So, the data processing step can only be used to verify the experiments and each sample before beginning the interpretation.

Data exploration. The platform has integrated knowledge bases that allow for the exploration of pathways, cell types, and gene ontology. ROSALIND allows users to compare cluster proportions and identify differentially expressed genes. The software also has assisted cell type identification based on the marker genes found. Trajectory analysis is not available.

It is important to note that the analysis of samples requires Analysis Units, which are included in specific subscriptions or can be bought separately ($35/sample in 50-packs).

Plot types. Many customizable visualization options are available – UMAP/t-SNE, heatmap, volcano plot, MA plot, box plots, and more.

Data export and sharing. The main advantage of this analysis software is its collaborative functions. ROSALIND Spaces allows collaboration between researchers through virtual data rooms in which shared experiments can be explored interactively. Every update is instantly available to each participant. Real-time activity feeds and historical reports are also available.

All plots, diagrams, source, and result files are downloadable on ROSALIND. 

Pricing. ROSALIND is paid software with a free trial available. The pricing can be found on their website. For academic researchers, the price of a yearly subscription is 1,800 dollars (2 seats).

Wrap up

Even though scRNA-seq analysis can seem daunting, anyone without coding experience can work with single-cell data given an appropriate tool. In doing so, you can unlock biological insight from your datasets within a few hours and take the next step to advance your research project quickly and easily.

Importantly, there is no universal decision when choosing a single-cell data analysis software, as all of them have their pros and cons. However, we hope you have gained some insight into the best scRNA-seq data analysis tools currently available and are more confident in choosing one for your single-cell research. 

 

Conflict of interest disclaimer.

Biomage hosts a community instance of Cellenics®, an open source, cloud-based analytics tool for single-cell RNA-sequencing data. Cellenics® has been included in our list of the best scRNA-seq data analysis tools.
