Cell Ranger processes all Feature Barcode data through a counting pipeline that quantifies each feature in each cell. This analysis is done by the cellranger count pipeline. The pipeline outputs a unified feature-barcode matrix that contains gene expression counts alongside Feature Barcode counts for each cell barcode. The feature-barcode matrix replaces the gene-barcode matrix emitted by older versions of Cell Ranger.
The pipeline first extracts and corrects the cell barcode and UMI from the feature library using the same methods as gene expression read processing. It then matches the Feature Barcode read against the list of features declared in the Feature Barcode Reference. The counts for each feature are available in the feature-barcode matrix output files and in the Loupe Browser output file.
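To make "quantifies each feature in each cell" concrete, here is a minimal sketch of UMI counting, assuming each read has already been reduced to a corrected cell barcode, UMI, and feature ID. Each count is the number of distinct UMIs seen for one cell barcode and feature. The helper name `count_features` and the barcodes are hypothetical, and the sketch leaves out cell calling and matrix output.

```python
from collections import defaultdict


def count_features(records):
    """Count distinct UMIs per (cell barcode, feature ID).

    `records` is an iterable of (cell_barcode, umi, feature_id) tuples whose
    barcodes and UMIs are assumed to be corrected already.
    """
    umis = defaultdict(set)
    for cell_barcode, umi, feature_id in records:
        umis[(cell_barcode, feature_id)].add(umi)
    # Reads sharing a UMI come from the same molecule, so each UMI counts once.
    return {key: len(umi_set) for key, umi_set in umis.items()}


reads = [
    ("AAACCTGAGAAACCAT", "ACGTACGTAC", "CD3"),
    ("AAACCTGAGAAACCAT", "ACGTACGTAC", "CD3"),  # duplicate read of the same molecule
    ("AAACCTGAGAAACCAT", "TTGCATGCAA", "CD3"),
    ("AAACCTGAGAAACCGC", "ACGTACGTAC", "CD19"),
]
print(count_features(reads))
# {('AAACCTGAGAAACCAT', 'CD3'): 2, ('AAACCTGAGAAACCGC', 'CD19'): 1}
```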
To enable Feature Barcode analysis, cellranger count needs two new inputs:
- Libraries CSV: passed to cellranger count with the --libraries flag, it declares the FASTQ files and library type for each input dataset. In a typical Feature Barcode analysis, there are two input libraries: one for Single Cell Gene Expression reads, and one for Feature Barcode reads. This argument replaces the --fastqs argument.
- Feature Reference CSV: passed to cellranger count with the --feature-ref flag, it declares the set of Feature Barcode reagents in use in the experiment. For each unique Feature Barcode used, this file declares a feature name and identifier, the unique Feature Barcode sequence associated with this reagent, and a pattern indicating how to extract the Feature Barcode sequence from the read sequence. See Feature Barcode Reference for details on how to construct the feature reference.
After creating these two CSV files, run cellranger count, replacing the run directory, sample ID, and file paths in the example command with your own:
```shell
cd /home/jdoe/runs
cellranger count --id=sample345 \
  --libraries=library.csv \
  --transcriptome=/opt/refdata-gex-GRCh38-2020-A \
  --feature-ref=feature_ref.csv
```
The complete set of arguments to cellranger count is covered in the cellranger count manual page.
When inputting Feature Barcode data to Cell Ranger via the Libraries CSV file, you must declare the library_type of each library. Specific values of library_type enable additional downstream processing, specifically for CRISPR Guide Capture and Antibody Capture. The following table outlines the library types that can be specified and what they mean for downstream processing.
library_type | Description |
---|---|
Antibody Capture | For use with experiments measuring cell surface protein expression levels via an antibody and/or antigen-multimer staining assay. Enables a t-SNE projection of the cells using only the Antibody Capture / Cell Surface Protein feature counts. This projection is available in an output file and in Loupe Browser. See the Antibody Capture Algorithm page for more details. |
CRISPR Guide Capture | Enables analysis of gene expression changes caused by the presence of CRISPR perturbations, in a Perturb-Seq style assay. See the CRISPR Guide Capture Algorithm page for more details. This mode also creates a t-SNE projection using only the CRISPR guide counts. This projection is available in an output file and in Loupe Browser. |
Antigen Capture | Only applicable to Barcode Enabled Antigen Mapping (BEAM) libraries. Described in the 5' Immune Profiling section of the software support documentation. |
Custom | Provides processing of the Feature Barcode reads and a basic summary of the sequencing quality and library quality, but performs no special processing of the Feature Barcode counts. |
The Libraries CSV file declares the input FASTQ data for the libraries that make up a Feature Barcode experiment. This includes one library containing Single Cell Gene Expression reads and one or more libraries containing Feature Barcode reads. To use cellranger count in Feature Barcode mode, you must create a Libraries CSV file and pass it with the --libraries flag. The following table describes the columns of the Libraries CSV file.
Column Name | Description |
---|---|
fastqs | A fully qualified path to the directory containing the demultiplexed FASTQ files for this sample. Analogous to the --fastqs argument to cellranger count. This field does not accept comma-delimited paths. If you have multiple sets of FASTQs for this library, add a row and use the same library_type value. |
sample | Same as the --sample argument to cellranger count: the sample name assigned in the bcl2fastq sample sheet. |
library_type | Must match a valid library type as described in the Library/Feature Types section. This field is case-sensitive. FASTQ data from this row is scanned for the features in the Feature Reference CSV whose feature_type matches this library_type. Must be Gene Expression for Single Cell Gene Expression libraries (the same applies to Targeted Gene Expression). For Feature Barcode libraries, must be one of Custom, Antibody Capture (for Cell Surface Protein), or CRISPR Guide Capture. Use Antibody Capture for TotalSeq™-C libraries. |
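The column rules above lend themselves to a pre-flight check before launching a long run. The sketch below is hypothetical and is not part of Cell Ranger. It flags missing columns, fastqs values that are not full paths, rows with more fields than the header (for example an unquoted comma-delimited path), and library_type values outside the case-sensitive set this table lists for cellranger count.

```python
import csv
import io

# library_type values listed in the table above for cellranger count (case-sensitive).
VALID_LIBRARY_TYPES = {
    "Gene Expression",
    "Custom",
    "Antibody Capture",
    "CRISPR Guide Capture",
}
REQUIRED_COLUMNS = ("fastqs", "sample", "library_type")


def check_libraries_csv(text):
    """Return a list of problems found in a Libraries CSV (empty means none found)."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]
    problems = []
    for lineno, row in enumerate(reader, start=2):  # line 1 is the header
        if None in row:  # DictReader stores surplus fields under the key None
            problems.append(f"line {lineno}: more fields than header columns")
        if not (row["fastqs"] or "").startswith("/"):
            problems.append(f"line {lineno}: fastqs is not a fully qualified path")
        if row["library_type"] not in VALID_LIBRARY_TYPES:
            problems.append(f"line {lineno}: unknown library_type {row['library_type']!r}")
    return problems
```

For example, a row with library_type written as gene expression is reported, because the field is case-sensitive.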
This section has a few example Libraries CSVs. Copy and paste the most relevant example into a file, customize it for your experiment, and save it as a CSV. Alternatively, you may download the Libraries CSV template and customize it. Be sure to use the correct full path to your FASTQ files.
Gene Expression + CRISPR Guide Capture libraries. In this example, we have demultiplexed sequencing data from two libraries named GEX_sample1 and CRISPR_sample1 on the bcl2fastq/bcl-convert/mkfastq sample sheet. This generated two FASTQ files named GEX_sample1_S0_L001_R1_001.fastq.gz and CRISPR_sample1_S0_L001_R1_001.fastq.gz in the path /opt/foo (be sure to use the correct full path to your FASTQ files). We pass the FASTQ sample names and paths to Cell Ranger with the appropriate library types:
```csv
fastqs,sample,library_type
/opt/foo/,GEX_sample1,Gene Expression
/opt/foo/,CRISPR_sample1,CRISPR Guide Capture
```
Gene Expression + Antibody Capture + CRISPR Guide Capture libraries. In this example, we have demultiplexed sequencing data from three libraries named GEX_sample3, Ab_sample3, and CRISPR_sample3 on the bcl2fastq/bcl-convert/mkfastq sample sheet. The result is three FASTQ files named GEX_sample3_S0_L001_R1_001.fastq.gz, Ab_sample3_S0_L001_R1_001.fastq.gz, and CRISPR_sample3_S0_L001_R1_001.fastq.gz in the path /opt/foo (be sure to use the correct full path to your FASTQ files). We pass the FASTQ sample names to Cell Ranger with the appropriate library types:
```csv
fastqs,sample,library_type
/opt/foo/,GEX_sample3,Gene Expression
/opt/foo/,Ab_sample3,Antibody Capture
/opt/foo/,CRISPR_sample3,CRISPR Guide Capture
```
Gene Expression + Antibody Capture + Antigen-multimer staining (TotalSeq™-C). In this example, we have demultiplexed sequencing data from three libraries named GEX_sample4, Ab_sample4, and Ag_sample4 on the bcl2fastq/bcl-convert/mkfastq sample sheet. The result is three FASTQ files named GEX_sample4_S0_L001_R1_001.fastq.gz, Ab_sample4_S0_L001_R1_001.fastq.gz, and Ag_sample4_S0_L001_R1_001.fastq.gz in the path /opt/foo (be sure to use the correct full path to your FASTQ files). We pass the FASTQ sample names to Cell Ranger with the appropriate library types:
```csv
fastqs,sample,library_type
/opt/foo/,GEX_sample4,Gene Expression
/opt/foo/,Ab_sample4,Antibody Capture
/opt/foo/,Ag_sample4,Antibody Capture
```
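When many samples share the same layout, generating the Libraries CSV from a script avoids copy-and-paste mistakes such as a mismatched sample name. This is a minimal sketch using Python's standard csv module; the helper name `libraries_csv_text` is hypothetical. Here it produces the three rows of the Gene Expression + Antibody Capture + antigen-multimer example.

```python
import csv
import io


def libraries_csv_text(fastq_dir, libraries):
    """Render a Libraries CSV; `libraries` is a list of (sample, library_type) pairs."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["fastqs", "sample", "library_type"])
    for sample, library_type in libraries:
        # Every library in this sketch shares one FASTQ directory.
        writer.writerow([fastq_dir, sample, library_type])
    return buffer.getvalue()


text = libraries_csv_text(
    "/opt/foo/",
    [
        ("GEX_sample4", "Gene Expression"),
        ("Ab_sample4", "Antibody Capture"),
        ("Ag_sample4", "Antibody Capture"),
    ],
)
print(text, end="")
```

To save the result, write `text` to library.csv with `open("library.csv", "w").write(text)`.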
A Feature Reference CSV file is required when processing Feature Barcode data. It declares the molecule structure and unique Feature Barcode sequence of each feature present in your experiment. Each line of the CSV declares one unique Feature Barcode. The Feature Reference CSV file is passed to cellranger count with the --feature-ref flag, or to cellranger multi in the [feature] section of the multi config CSV file. Please note that the CSV may not contain characters outside of the ASCII range.
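Product names such as TotalSeq™ contain a non-ASCII symbol, so they are easy to paste into a name column by accident. A minimal sketch of an ASCII check that reports where each offending character sits is shown below; the helper name is hypothetical. To check a file, read it first with `open(path, encoding="utf-8").read()`.

```python
def non_ascii_characters(text):
    """Return (line, column, character) for each character outside the ASCII range."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if ord(char) > 127:  # ASCII code points are 0-127
                hits.append((line_number, column, char))
    return hits


reference = "id,name\nCD3,CD3_TotalSeq™-B\n"
print(non_ascii_characters(reference))  # [(2, 17, '™')]
```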
This table describes the columns in the Feature Reference CSV file. Example files can be found below.
Column Name | Description |
---|---|
id | Unique ID used to track feature counts. May only include ASCII characters and must not use whitespace, slash, quote, or comma characters. Each ID must be unique and must not collide with a gene identifier from the transcriptome. |
name | Human-readable name for this feature. May only include ASCII characters and must not use whitespace, slash, quote, or comma characters. This name will be displayed in the Loupe Browser Active Feature list. |
read | Specifies which RNA sequencing read contains the Feature Barcode sequence. Must be R1 or R2. Note: in most cases, R2 is the correct read. |
pattern | Specifies how to extract the Feature Barcode sequence from the read. See the Barcode Extraction Pattern section below for details. |
sequence | Nucleotide barcode sequence associated with this feature. E.g., antibody barcode or sgRNA protospacer sequence. |
feature_type | Type of the feature. See the Library/Feature Types section for details on the allowed values for this field. FASTQ data specified in the Libraries CSV file with a library_type that matches the feature_type will be scanned for occurrences of this feature. Each feature type in the feature reference must match a library_type entry in the Libraries CSV file. This field is case-sensitive. |
mhc_allele | Only relevant for BEAM-T (TCR Antigen Capture). Defines the MHC allele associated with each antigen included in the experiment. See the Feature Reference section on the Antigen Capture page for more details. |
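Several of these column rules can be checked mechanically. The following is a minimal, hypothetical sketch covering only the id, name, read, and feature_type rules from this table. `library_types` is the set of library_type values from your Libraries CSV, and `gene_ids` is the set of gene identifiers in your transcriptome; how you collect those IDs is outside this sketch. Treating both single and double quotes as forbidden "quote" characters is an assumption.

```python
import csv
import io
import re

# id and name must not contain whitespace, slash, quote, or comma characters.
# Counting both ' and " as quote characters is an assumption of this sketch.
FORBIDDEN = re.compile(r"[\s/'\",]")


def check_feature_reference(text, library_types, gene_ids=frozenset()):
    """Return a list of problems in a Feature Reference CSV (empty means none found)."""
    problems = []
    seen_ids = set()
    for lineno, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        for column in ("id", "name"):
            value = row[column] or ""
            if not value or not value.isascii() or FORBIDDEN.search(value):
                problems.append(f"line {lineno}: invalid {column} {value!r}")
        if row["id"] in seen_ids:
            problems.append(f"line {lineno}: duplicate id {row['id']!r}")
        if row["id"] in gene_ids:
            problems.append(f"line {lineno}: id {row['id']!r} collides with a gene ID")
        seen_ids.add(row["id"])
        if row["read"] not in ("R1", "R2"):
            problems.append(f"line {lineno}: read must be R1 or R2")
        # Case-sensitive: feature_type must equal a library_type in the Libraries CSV.
        if row["feature_type"] not in library_types:
            problems.append(f"line {lineno}: no library_type matches {row['feature_type']!r}")
    return problems
```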
The pattern field of the feature reference defines how to locate the Feature Barcode within a read. The Feature Barcode may appear at a known offset with respect to the start or end of the read or may appear at a fixed position relative to a known anchor sequence. The pattern column can be made up of a combination of these elements:
- 5P: denotes the beginning of the read sequence. May appear zero or one time, and must be at the beginning of the pattern. Only one of 5P or 3P may appear, not both (^ may be used instead of 5P).
- 3P: denotes the end of the read sequence. May appear zero or one time, and must be at the end of the pattern ($ may be used instead of 3P).
- N: denotes an arbitrary base.
- A, C, G, T: denote fixed bases that must match the read sequence exactly.
- (BC): denotes the Feature Barcode sequence as specified in the sequence column of the feature reference. Must appear exactly once in the pattern.
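One way to see how these elements combine is to translate a pattern into a regular expression. The sketch below does that under two assumptions: (BC) captures exactly as many bases as the feature's sequence, and only the first match in the read is used. `pattern_to_regex` and `extract_barcode` are hypothetical helpers, not Cell Ranger's implementation.

```python
import re


def pattern_to_regex(pattern, barcode_length):
    """Compile a feature-reference pattern into a regex with a named group "bc"."""
    if pattern.count("(BC)") != 1:
        raise ValueError("(BC) must appear exactly once")
    # ^ and $ are accepted as shorthands for 5P and 3P.
    if pattern.startswith("^"):
        pattern = "5P" + pattern[1:]
    if pattern.endswith("$"):
        pattern = pattern[:-1] + "3P"
    anchored_start = pattern.startswith("5P")
    anchored_end = pattern.endswith("3P")
    if anchored_start and anchored_end:
        raise ValueError("only one of 5P or 3P may appear")
    body = pattern[2:] if anchored_start else pattern
    body = body[:-2] if anchored_end else body
    before, after = body.split("(BC)")

    def translate(chunk):
        # Fixed bases must match exactly; each N matches any single base.
        if not re.fullmatch(r"[ACGTN]*", chunk):
            raise ValueError(f"unexpected element in {chunk!r}")
        return chunk.replace("N", ".")

    regex = (
        ("^" if anchored_start else "")
        + translate(before)
        + f"(?P<bc>.{{{barcode_length}}})"
        + translate(after)
        + ("$" if anchored_end else "")
    )
    return re.compile(regex)


def extract_barcode(read_sequence, pattern, barcode_length):
    """Return the candidate Feature Barcode from a read, or None if the pattern does not match."""
    match = pattern_to_regex(pattern, barcode_length).search(read_sequence)
    return match.group("bc") if match else None


read = "GGTCATGGAC" + "ACGTACGTACGTACG" + "TTTTTTTT"
print(extract_barcode(read, "5PNNNNNNNNNN(BC)", 15))  # ACGTACGTACGTACG
```

The extracted sequence is only a candidate; it still has to be matched against the sequence column, as described next.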
Any constant sequences made up of A, C, G, and T in the pattern must match exactly in the read sequence. Any N in the pattern is allowed to match a single arbitrary base. A modest number of fixed bases should be used to minimize the chance of a sequencing error disrupting the match. The fixed sequence should also be long enough to uniquely identify the position of the Feature Barcode. For feature types that require a non-N anchor, we recommend 12bp-20bp of constant sequence.
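For feature types that rely on an anchor, the 12bp-20bp recommendation can be checked by counting the fixed bases in a pattern. A minimal sketch is shown below; the helper names and their default window are hypothetical. It strips the positional markers and the (BC) placeholder first so that the letters B and C inside "(BC)" are not mistaken for bases. TotalSeq-style patterns contain no fixed bases and do not need this check.

```python
import re


def constant_bases(pattern):
    """Count the fixed A/C/G/T bases in a pattern, ignoring 5P, 3P, ^, $, N, and (BC)."""
    # Remove markers and the barcode placeholder before counting bases.
    body = re.sub(r"5P|3P|\(BC\)|\^|\$", "", pattern)
    return sum(base in "ACGT" for base in body)


def anchor_in_recommended_range(pattern, low=12, high=20):
    """True if the constant sequence length falls in the recommended 12-20 bp window."""
    return low <= constant_bases(pattern) <= high
```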
The extracted Feature Barcode sequence is compared with the sequences in the feature reference, and up to one mismatched base is allowed: extracted sequences are corrected up to a Hamming distance of one using the same 10x Genomics barcode correction algorithm that is used to correct cell barcodes.
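The one-mismatch rule can be sketched as a nearest-match lookup. This is a simplification. Cell Ranger reuses its cell-barcode correction algorithm, which, as we understand it, also weighs base quality scores. This sketch ignores qualities and returns no match when more than one reference sequence is one mismatch away. The helper names are hypothetical.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def correct_feature_barcode(observed, reference):
    """Map an extracted barcode to a feature ID, allowing at most one mismatch.

    `reference` maps each feature sequence to its feature ID. Returns None when no
    reference sequence is within one mismatch, or when several are (ambiguous).
    """
    if observed in reference:
        return reference[observed]
    candidates = [
        feature_id
        for sequence, feature_id in reference.items()
        if len(sequence) == len(observed) and hamming_distance(sequence, observed) == 1
    ]
    return candidates[0] if len(candidates) == 1 else None
```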
TotalSeq™-B is a line of antibody-oligonucleotide conjugates supplied by BioLegend that are compatible with the Single Cell 3' v3 assay. The Feature Barcode sequence appears at a fixed position in the R2 read, starting immediately after the first 10 bases.
read | pattern |
---|---|
R2 | 5PNNNNNNNNNN(BC) |
- Example TotalSeq™-B Feature Reference CSV (Please refer to BioLegend for the latest conjugated Feature Barcode information.)
TotalSeq™-C is a line of antibody-oligonucleotide conjugates supplied by BioLegend that are compatible with the Single Cell 5' assay. The Feature Barcode sequence appears at a fixed position in the R2 read, starting immediately after the first 10 bases.
read | pattern |
---|---|
R2 | 5PNNNNNNNNNN(BC) |
- Example TotalSeq™-C Feature Reference CSV (Please refer to BioLegend for the latest conjugated Feature Barcode information.)
The feature reference for Immudex's dMHC Dextramer® libraries with dCODE Dextramers has the same Feature Barcode pattern as TotalSeq™-C, so the TotalSeq™-C feature reference example can also be used for MHC Dextramer® libraries. Use "Antibody Capture" in the feature_type column for dextramer or multimer reagents.
To analyze Barcode Enabled Antigen Mapping (BEAM) libraries, visit the corresponding 5' Immune Profiling page.
TotalSeq™-A is a line of antibody-oligonucleotide conjugates supplied by BioLegend that are compatible with the Single Cell 3' v2 and Single Cell 3' v3 kits. The Feature Barcode sequence appears at the start of the R2 read.
Although TotalSeq™-A can be used with the CITE-Seq assay, CITE-Seq is not a 10x Genomics-supported assay. Please contact New York Genome Center or BioLegend for assistance with the assay or software.
read | pattern |
---|---|
R2 | 5P(BC) |
- Example TotalSeq™-A Feature Reference CSV (Please refer to BioLegend for the latest conjugated Feature Barcode information.)
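Because the three TotalSeq™ lines differ only in their read/pattern pair, a small script can render feature reference rows for any of them. In this minimal sketch, the helper, the dictionary keys, and the antibody and barcode values are hypothetical; take real sequences from BioLegend. Per the note above, the TotalSeq-C layout also applies to dCODE Dextramer® reagents.

```python
import csv
import io

# read/pattern pairs from the TotalSeq tables above; keys are labels for this sketch.
TOTALSEQ_LAYOUTS = {
    "TotalSeq-A": ("R2", "5P(BC)"),
    "TotalSeq-B": ("R2", "5PNNNNNNNNNN(BC)"),
    "TotalSeq-C": ("R2", "5PNNNNNNNNNN(BC)"),
}


def totalseq_feature_reference(product, antibodies):
    """Render a Feature Reference CSV; `antibodies` is a list of (id, name, sequence)."""
    read, pattern = TOTALSEQ_LAYOUTS[product]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "name", "read", "pattern", "sequence", "feature_type"])
    for feature_id, name, sequence in antibodies:
        writer.writerow([feature_id, name, read, pattern, sequence, "Antibody Capture"])
    return buffer.getvalue()


# Hypothetical antibody and barcode, for illustration only.
print(totalseq_feature_reference("TotalSeq-B", [("CD3", "CD3_TotalSeqB", "ACGTACGTACGTACG")]), end="")
```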
In CRISPR Guide Capture assays, the Feature Barcode sequence is the CRISPR protospacer sequence. The protospacer is followed by a downstream constant sequence in the guide RNA which is used as an anchor to identify the location of the protospacer. We recommend using a 12bp-20bp constant sequence that can be uniquely identified but is short enough that it is unlikely to be disrupted by a sequencing error.
The example Feature Reference CSV files list six guide RNA features, each with a distinct barcode/protospacer sequence (sequence column). The pattern column has the same pattern for all six features. We use the target_gene_id and target_gene_name columns to declare the target gene of each guide RNA, for use in downstream CRISPR perturbation analysis. Two guides are declared with target_gene_id as Non-Targeting. Cells containing Non-Targeting guides will be used as controls for CRISPR perturbation analysis. The four remaining guides target two genes.
Read | Pattern | Assay | Example |
---|---|---|---|
R2 | (BC)GTTTAAGAGCTAAGCTGGAA | 3’ Gene Expression with Feature Barcode | Download 3' CSV |
R2 | TTCCAGCATAGCTCTTAAAC(BC) | 5’ Gene Expression with Feature Barcode | Download 5' CSV |
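To check that a CRISPR feature reference declares targets and controls as intended, you can group guides by their target_gene_id column and set the Non-Targeting guides apart. This minimal sketch uses a reference that mirrors the six-guide layout described above. The guide IDs, the gene IDs GENE_A and GENE_B, and the sequences are all hypothetical. The sketch does not reproduce Cell Ranger's perturbation analysis.

```python
import csv
import io
from collections import defaultdict


def guides_by_target(feature_reference_text):
    """Group CRISPR guide IDs by target_gene_id and split out the Non-Targeting controls."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(feature_reference_text)):
        if row["feature_type"] == "CRISPR Guide Capture":
            groups[row["target_gene_id"]].append(row["id"])
    controls = groups.pop("Non-Targeting", [])
    return dict(groups), controls


PATTERN = "(BC)GTTTAAGAGCTAAGCTGGAA"
GUIDES = [  # (id, hypothetical protospacer, target_gene_id)
    ("NT_1", "A" * 20, "Non-Targeting"),
    ("NT_2", "C" * 20, "Non-Targeting"),
    ("gA_1", "G" * 20, "GENE_A"),
    ("gA_2", "T" * 20, "GENE_A"),
    ("gB_1", "AC" * 10, "GENE_B"),
    ("gB_2", "GT" * 10, "GENE_B"),
]
example = "id,name,read,pattern,sequence,feature_type,target_gene_id,target_gene_name\n" + "".join(
    f"{gid},{gid},R2,{PATTERN},{seq},CRISPR Guide Capture,{target},{target}\n"
    for gid, seq, target in GUIDES
)
targets, controls = guides_by_target(example)
print(targets)   # {'GENE_A': ['gA_1', 'gA_2'], 'GENE_B': ['gB_1', 'gB_2']}
print(controls)  # ['NT_1', 'NT_2']
```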