Data security in the Biomage-hosted community instance of Cellenics®: frequently asked questions

We often get asked about data security in the community instance of Cellenics® that’s hosted by Biomage (at https://scp.biomage.net/). We understand that data security is important to you. For example, you might have unpublished data or clinical data that is particularly sensitive. This article aims to answer some of the questions we’ve heard most frequently.

It is worth noting that we are a team of scientists and software engineers - none of us is a security expert. We have answered the questions below to the best of our knowledge as non-experts, with advice from external security experts.

When I upload data to the Biomage-hosted community instance of Cellenics®, where is it stored?

All scientific data files that are uploaded to the community instance of Cellenics® that’s hosted by Biomage (located at scp.biomage.net) are stored in the cloud, on Amazon Web Services (AWS). The AWS servers used by this instance of Cellenics® are physically located in Ireland in the European Union (the region name is eu-west-1).

For data storage, we use Amazon S3 (Simple Storage Service), an AWS service that stores files in “buckets”. We store the original files (i.e. the barcodes, features and matrix files), filtered versions of the original files (in the form of .rds objects) after Data Processing is run, and the cell annotations (the Louvain clusters, all annotated clusters and custom cell sets). We also store the data used to draw the plots and populate the tables in Data Processing, Data Exploration and Plots and Tables. For the latter two modules, this data is automatically deleted after a period of user inactivity (within several days of the last user interaction with the respective plot or table).

Who has access to my data in the Biomage-hosted community instance of Cellenics®?

Your scientific data is not accessible to, or shared with, other users of the Biomage-hosted instance of Cellenics® or third parties, except with your specific permission.

You may grant access to your project(s) including the data and metadata therein to other Cellenics® users via the project ‘Share’ feature in the Data Management module. The sharing of your scientific data is your responsibility - Biomage does not accept responsibility for any change, loss or inadvertent disclosure of your scientific data in these circumstances.

Members of the Biomage team have access to the data files and associated metadata that you upload to the Biomage-hosted instance of Cellenics®. Biomage uses this data for Cellenics® development, debugging purposes, and improvement of our proprietary processes, algorithms and machine learning models. Biomage might monetise the machine learning models. Full details are available in our Privacy Policy.

What about clinical data - is it ok to upload clinical data to the Biomage-hosted instance of Cellenics®?

We are not aware of any restrictions on uploading clinical scRNA-seq data to the community instance of Cellenics® that is hosted by Biomage.

We’ve heard that some users are concerned about sequencing data being traced back to the donor. Unlike genetic data (DNA sequences), which is unique to an individual and highly personal, transcriptomics data is by comparison fairly generic. In most studies, transcriptomics data cannot be traced back to the individual. This general statement may not apply in cases of rare genetic disorders, where only a handful of individuals in a population lack expression of specific genes or exhibit a particularly unusual and identifiable signature.

One important point to note, however, is that it is your responsibility as a user of the Biomage-hosted community instance of Cellenics® to ensure that you do not upload any metadata (e.g. patient name, date of birth, etc.) that could result in the patient being identified. In other words, as the data owner, it’s your responsibility to ensure that the data and metadata uploaded to the Biomage instance of Cellenics® are anonymized appropriately so as not to identify individuals or patients who donated the samples to your project.
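As a rough illustration of the metadata check described above, the sketch below scans the column names of a metadata table for terms that often indicate identifying information. The pattern list and the function name `flag_identifying_columns` are hypothetical and are not part of Cellenics®. A check like this only prompts a manual review and cannot confirm that a file is properly anonymized.

```python
import re

# Hypothetical patterns for column names that often hold identifying details.
# This list is an illustrative assumption, not an exhaustive or official one.
IDENTIFYING_PATTERNS = [
    r"name",
    r"birth|dob",
    r"address",
    r"e-?mail",
    r"phone",
    r"mrn|medical_record",
]


def flag_identifying_columns(column_names):
    """Return the column names that match any of the identifying patterns."""
    flagged = []
    for column in column_names:
        # Normalize so that "Patient Name" and "patient_name" are treated alike.
        normalized = column.strip().lower().replace(" ", "_")
        if any(re.search(pattern, normalized) for pattern in IDENTIFYING_PATTERNS):
            flagged.append(column)
    return flagged


columns = ["sample_id", "Patient Name", "date_of_birth", "tissue", "condition"]
print(flag_identifying_columns(columns))
```

The matching is deliberately broad, so harmless columns such as a hypothetical `gene_name` would also be flagged. Treat the result as a list of columns to inspect by hand before upload.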

It’s also worth considering if your institution has specific rules for clinical data. For example, we have heard that some institutions enforce strong restrictions on the ‘sharing’ of clinical data. In some cases, this includes upload of clinical data to the cloud. If your institution restricts you from uploading clinical data to an ‘external’ cloud (such as the Biomage-hosted instance of Cellenics®), it’s worth considering our deployment services (see section below on ‘What can I do if my company/institution wants to use an instance of Cellenics® that’s more secure? What additional security can you offer?’). In any case, please do reach out to us to tell us about any restrictions that are enforced by your institution!

Are there national/international guidelines about data security in the cloud?

Yes, there are. A detailed list of the most popular international standards is available here. AWS adheres to most of them (see their compliance page for exactly which standards they follow). If your organization has specific requirements, let us know what they are!

I’ve heard that Cellenics® runs in AWS. What implications does this have on how you approach security?

AWS follows the shared responsibility model. This means that AWS is responsible for protecting the infrastructure that runs all of the services offered in the AWS Cloud. AWS takes this responsibility seriously and supports more security standards and compliance certifications than any other cloud provider (for more details, read the AWS Compliance page).

A big benefit of using AWS is that we have access to a data center and network architecture that are built to meet the requirements of the most security-sensitive organizations (read more about this here). Hence, there is no need for our team to maintain facilities and hardware ourselves - we have more time to focus on supporting you and building new features.

We uphold our part of the shared responsibility model by applying AWS-recommended best practices when configuring the platform. In particular, we encrypt the data stored in S3 and we use AWS-managed services where possible (for example, we use Amazon Cognito to manage user accounts, and Amazon RDS with Aurora for the SQL database that holds experiment details). In addition, the Biomage-hosted instance of Cellenics® gets periodic reviews from an external security consulting company and we take the time to address any important recommendations.

What else is there to increase the security of my account and data in the Biomage-hosted community instance of Cellenics®?

All experiments hosted in the Biomage instance of Cellenics® are isolated from one another - we store each data file under a different “folder” in S3 and we start a separate backend server (we use an EC2 instance for this) for your data analysis. That server is not shared with any other experiment. Once your session is finished, the instance is recycled and all data that was loaded into its memory is deleted.
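A “folder” in S3 is really just a shared prefix on object keys, because S3 itself has a flat namespace. The sketch below shows the general idea of keeping each experiment’s files under its own prefix and checking that a key belongs to a given experiment. The key layout, the experiment ID and the function names are hypothetical and do not describe the actual layout used by Cellenics®.

```python
def experiment_key(experiment_id, file_name):
    """Build an object key under a per-experiment prefix (hypothetical layout)."""
    if not experiment_id or "/" in experiment_id:
        raise ValueError("experiment_id must be non-empty and contain no '/'")
    return f"{experiment_id}/{file_name}"


def belongs_to_experiment(key, experiment_id):
    """Check that a key sits under the given experiment's prefix.

    The trailing slash matters: without it, "exp-12" would also match
    keys that belong to "exp-123".
    """
    return key.startswith(f"{experiment_id}/")


key = experiment_key("exp-123", "matrix.mtx.gz")
print(key)                                    # exp-123/matrix.mtx.gz
print(belongs_to_experiment(key, "exp-123"))  # True
print(belongs_to_experiment(key, "exp-12"))   # False
```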

What can I do to increase security of my account and data in the Biomage-hosted community instance of Cellenics®?

You can do a lot! In fact, most of our efforts to secure your account and data won’t matter unless you follow good password and account management hygiene. Here is what you can do to help us protect your account and data in the Biomage-hosted instance of Cellenics®.

Passwords. Make sure that you set a strong password when signing up to the Biomage-hosted instance of Cellenics®. Remember to rotate the password periodically (the general recommendation is every 3 months). Use a trusted password manager for your passwords. There are plenty of resources online about what setting a strong password means; here is one example.

One can’t steal what doesn’t exist. Delete data from the Biomage-hosted instance of Cellenics® that you no longer need or use.

Follow the principle of least privilege. Don’t share data with people who don’t need access to it. Make sure you trust the person you are sharing the data with. Periodically review your projects and remove access for people who no longer need the data.
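On the password advice above: a trusted password manager will normally generate strong passwords for you. If you want to see what that involves, here is a minimal sketch using Python’s standard `secrets` module. The length of 20 characters, the minimum of 12 and the requirement to include every character class are illustrative assumptions, not a Cellenics® password policy.

```python
import secrets
import string


def generate_password(length=20):
    """Generate a random password containing all four character classes."""
    if length < 12:
        raise ValueError("use at least 12 characters")
    alphabet = string.ascii_letters + string.digits + string.punctuation
    while True:
        # secrets.choice draws from a cryptographically secure random source.
        password = "".join(secrets.choice(alphabet) for _ in range(length))
        if (
            any(c.islower() for c in password)
            and any(c.isupper() for c in password)
            and any(c.isdigit() for c in password)
            and any(c in string.punctuation for c in password)
        ):
            return password


print(generate_password())
```

The `secrets` module is used rather than `random` because `random` is not designed for security-sensitive values.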

It is your responsibility to follow good password and account management hygiene. If you have any further questions, don’t hesitate to reach out. We are here to help!

What can I do if my company/institution wants to use an instance of Cellenics® that’s more secure? What additional security can you offer?

We (or you - the code is open source after all!) can install a separate deployment of Cellenics® in an AWS account owned by your organization. With such a setup, you can configure several additional security features such as a private URL, access only via a specified VPN, and user account control (single sign-on or restriction of account creation).

If you’d like the Biomage team to support you with setting up a Cellenics® deployment, more information about our services can be found on our website: https://www.biomage.net/software-services. Or reach out to us directly via the community forum (https://community.biomage.net/) to arrange a call.

Take home message

We do our best to ensure the security of your data and account in the Biomage-hosted instance of Cellenics®. We are not security experts and we rely on the shared responsibility model with AWS, on following AWS's best practices when building new features and on advice given to us by security experts.

The vast majority of academic users are comfortable with using the community instance of Cellenics® that’s hosted by Biomage. Most industry partners, on the other hand, choose to have their own deployment of Cellenics® with additional privacy and security settings. These may include a private URL, access only via a specified VPN, and controlled access (e.g. only a specified Admin can create new user accounts). Often, these additional security measures are important to biopharma customers but are unnecessary or even impractical for academic institutions.

We have aimed to answer the most frequently heard questions. However, if there is something that we haven’t covered in this article, please get in touch via the community forum (https://community.biomage.net/) – we will be happy to help!
