The `cellranger-arc count` pipeline requires ATAC and GEX FASTQ files as input, which typically come from running `cellranger-arc mkfastq`, a 10x Genomics-aware convenience wrapper for `bcl2fastq`. However, it is possible to use FASTQ files from other sources, such as Illumina's bcl2fastq or BCL Convert, a published dataset, or the 10x Genomics bamtofastq tool. Input FASTQ files must conform to the naming conventions of `bcl2fastq` and `mkfastq` for `cellranger-arc count` to complete successfully. These files are specified using a libraries CSV file and passed to the `cellranger-arc count` pipeline using the `--libraries` argument.
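As a rough illustration of the libraries CSV described above, the following Python sketch assembles a file with one Gene Expression line and one Chromatin Accessibility line. The folder paths and sample names are placeholders, and the helper name `libraries_csv_text` is hypothetical; it is not part of Cell Ranger ARC.

```python
import csv
import io

# Placeholder paths and sample names (assumptions); substitute your own.
LIBRARIES = [
    ("/PATH/TO/GEX_FASTQS", "SAMPLE_GEX", "Gene Expression"),
    ("/PATH/TO/ATAC_FASTQS", "SAMPLE_ATAC", "Chromatin Accessibility"),
]


def libraries_csv_text(rows):
    """Return libraries CSV content: a header line plus one line per FASTQ folder."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["fastqs", "sample", "library_type"])
    writer.writerows(rows)
    return buffer.getvalue()


text = libraries_csv_text(LIBRARIES)
print(text, end="")
```

Saving this text to a file (for example `libraries.csv`, a name chosen here only for illustration) gives a file that could then be supplied through the `--libraries` argument.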
The `cellranger-arc count` pipeline can process data from one Multiome ATAC library and one Multiome GEX library, each of which could be sequenced on multiple flow cells. Multi-library analysis is not possible at this time. `cellranger-arc count` must not be used to process GEX or ATAC data alone.
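Because both library types are required, it may help to check a libraries CSV before launching a run. The sketch below is one possible pre-flight check, not something the pipeline provides; the function name `check_library_types` and the example paths are assumptions made for illustration.

```python
import csv
import io

# Library type strings as they appear in the libraries CSV examples on this page.
REQUIRED_TYPES = {"Gene Expression", "Chromatin Accessibility"}


def check_library_types(csv_text):
    """Return the required library types that are missing from a libraries CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    found = {row["library_type"].strip() for row in reader}
    return REQUIRED_TYPES - found


# Hypothetical inputs: one with GEX only, one with both library types.
gex_only = "fastqs,sample,library_type\n/PATH/TO/GEX,SAMPLE,Gene Expression\n"
both = gex_only + "/PATH/TO/ATAC,SAMPLE,Chromatin Accessibility\n"

print("missing from GEX-only file:", check_library_types(gex_only))
print("missing from combined file:", check_library_types(both))
```

An empty result means both a GEX and an ATAC library are listed.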
There are multiple ways `bcl2fastq`, `bcl-convert`, and `mkfastq` can be invoked, resulting in a wide range of potential file names and locations as output. Since finding the right FASTQ files to process and the right arguments to process those files as desired can be confusing, we will illustrate some common scenarios below.
To serve as inputs for Cell Ranger ARC, FASTQ files should conform to the naming conventions of `bcl2fastq` and `mkfastq` described below.
For GEX libraries:
`[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz`
Where `Read Type` is one of:
- `I1`: Dual index i7 read (optional)
- `I2`: Dual index i5 read (optional)
- `R1`: Read 1
- `R2`: Read 2
For ATAC libraries:
`[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz`
Where `Read Type` is one of:
- `I1`: Dual index i7 read (optional)
- `R1`: Read 1
- `R2`: Dual index i5 read
- `R3`: Read 2
Cell Ranger ARC will also accept ATAC FASTQs in this format:
- `I1`: Dual index i7 read (optional)
- `R1`: Read 1
- `I2`: Dual index i5 read
- `R2`: Read 2
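The naming convention above can be checked mechanically. The following sketch uses a regular expression to split a file name into sample name, sample order, lane, and read type. It assumes that any sample order number (`S1`, `S2`, and so on) and any three-digit lane are acceptable, as in the example listings on this page; the names `FASTQ_NAME` and `parse_fastq_name` are hypothetical.

```python
import re

# Pattern for [Sample Name]_S<order>_L<lane>_<read type>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<order>\d+)_L(?P<lane>\d{3})_(?P<read>[IR]\d)_001\.fastq\.gz$"
)


def parse_fastq_name(filename):
    """Return the fields of a bcl2fastq-style name, or None if it does not match."""
    match = FASTQ_NAME.match(filename)
    return match.groupdict() if match else None


print(parse_fastq_name("test_sample1_S1_L001_R2_001.fastq.gz"))
print(parse_fastq_name("MySample.fastq.gz"))
```

A `None` result flags a file that would need renaming before it can be used as input.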
Where are your GEX FASTQ files?
- In an output folder from `cellranger-arc mkfastq` or `bcl2fastq` (`fastq_path`)
- In a different folder
How are your GEX FASTQ files named?
How did I get here?
By running `cellranger-arc mkfastq` with a simple CSV layout file or Illumina Experiment Manager (IEM) samplesheet, or by running `bcl2fastq` directly (with an IEM samplesheet) on a flow cell.
Your files will be in a `(MKFASTQ_ID)/outs/fastq_path` folder, and the file hierarchy may look similar to this:
MKFASTQ_ID
|-- MAKE_FASTQS_CS
`-- outs
|-- fastq_path
|-- HFLC5BBXX
|-- test_sample1
| |-- test_sample1_S1_L001_I1_001.fastq.gz
| |-- test_sample1_S1_L001_I2_001.fastq.gz
| |-- test_sample1_S1_L001_R1_001.fastq.gz
| |-- test_sample1_S1_L001_R2_001.fastq.gz
| |-- test_sample1_S1_L002_I1_001.fastq.gz
| |-- test_sample1_S1_L002_I2_001.fastq.gz
| |-- test_sample1_S1_L002_R1_001.fastq.gz
| |-- test_sample1_S1_L002_R2_001.fastq.gz
| |-- test_sample1_S1_L003_I1_001.fastq.gz
| |-- test_sample1_S1_L003_I2_001.fastq.gz
| |-- test_sample1_S1_L003_R1_001.fastq.gz
| `-- test_sample1_S1_L003_R2_001.fastq.gz
|-- test_sample2
| |-- test_sample2_S2_L001_I1_001.fastq.gz
| |-- test_sample2_S2_L001_I2_001.fastq.gz
| |-- test_sample2_S2_L001_R1_001.fastq.gz
| |-- test_sample2_S2_L001_R2_001.fastq.gz
| |-- test_sample2_S2_L002_I1_001.fastq.gz
| |-- test_sample2_S2_L002_I2_001.fastq.gz
| |-- test_sample2_S2_L002_R1_001.fastq.gz
| |-- test_sample2_S2_L002_R2_001.fastq.gz
| |-- test_sample2_S2_L003_I1_001.fastq.gz
| |-- test_sample2_S2_L003_I2_001.fastq.gz
| |-- test_sample2_S2_L003_R1_001.fastq.gz
| `-- test_sample2_S2_L003_R2_001.fastq.gz
|-- Reports
|-- Stats
|-- Undetermined_S0_L001_I1_001.fastq.gz
...
`-- Undetermined_S0_L003_R2_001.fastq.gz
If you ran `bcl2fastq` directly, your file hierarchy may look similar to this:
BCL2FASTQ_OUTPUT_DIR
|-- HFLC5BBXX
|-- test_sample1
| |-- test_sample1_S1_L001_I1_001.fastq.gz
| |-- test_sample1_S1_L001_I2_001.fastq.gz
| |-- test_sample1_S1_L001_R1_001.fastq.gz
| |-- test_sample1_S1_L001_R2_001.fastq.gz
| |-- test_sample1_S1_L002_I1_001.fastq.gz
| |-- test_sample1_S1_L002_I2_001.fastq.gz
| |-- test_sample1_S1_L002_R1_001.fastq.gz
| |-- test_sample1_S1_L002_R2_001.fastq.gz
| |-- test_sample1_S1_L003_I1_001.fastq.gz
| |-- test_sample1_S1_L003_I2_001.fastq.gz
| |-- test_sample1_S1_L003_R1_001.fastq.gz
| `-- test_sample1_S1_L003_R2_001.fastq.gz
|-- test_sample2
| |-- test_sample2_S2_L001_I1_001.fastq.gz
| |-- test_sample2_S2_L001_I2_001.fastq.gz
| |-- test_sample2_S2_L001_R1_001.fastq.gz
| |-- test_sample2_S2_L001_R2_001.fastq.gz
| |-- test_sample2_S2_L002_I1_001.fastq.gz
| |-- test_sample2_S2_L002_I2_001.fastq.gz
| |-- test_sample2_S2_L002_R1_001.fastq.gz
| |-- test_sample2_S2_L002_R2_001.fastq.gz
| |-- test_sample2_S2_L003_I1_001.fastq.gz
| |-- test_sample2_S2_L003_I2_001.fastq.gz
| |-- test_sample2_S2_L003_R1_001.fastq.gz
| `-- test_sample2_S2_L003_R2_001.fastq.gz
...
You will have one set of FASTQ files per sample, prefixed with the name of the sample as it appears in the simple CSV layout file or IEM samplesheet.
For more information on the naming conventions, please visit Illumina's support site or refer to the bcl2fastq User Guide. The scenario where your files do not conform to the naming convention is described in a different section later on this page.
The table below describes the line in the libraries CSV file you would use in the corresponding scenario. Be sure to substitute the capitalized text as appropriate. The "All Samples" entries in this table are provided for technical completeness.
Situation | Line in libraries CSV |
---|---|
All samples (mkfastq) | `fastqs,sample,library_type`<br>`/PATH/TO/MKFASTQ_ID/outs/fastq_path,,Gene Expression`<br>`...` |
All samples (mkfastq), multiple flow cells | `fastqs,sample,library_type`<br>`/PATH/TO/MKFASTQ_FLOWCELL1/outs/fastq_path,,Gene Expression`<br>`/PATH/TO/MKFASTQ_FLOWCELL2/outs/fastq_path,,Gene Expression`<br>`...` |
All samples (bcl2fastq direct) | `fastqs,sample,library_type`<br>`/PATH/TO/BCL2FASTQ_OUTPUT_DIR,,Gene Expression`<br>`...` |
Process test_sample1 (mkfastq) | `fastqs,sample,library_type`<br>`/PATH/TO/MKFASTQ_ID/outs/fastq_path,test_sample1,Gene Expression`<br>`...` |
Process test_sample1 and test_sample2 as a single merged sample (mkfastq) | `fastqs,sample,library_type`<br>`/PATH/TO/MKFASTQ_ID/outs/fastq_path,test_sample1,Gene Expression`<br>`/PATH/TO/MKFASTQ_ID/outs/fastq_path,test_sample2,Gene Expression`<br>`...` |
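To fill in the sample column, it can be handy to list the sample name prefixes actually present under a FASTQ folder. The sketch below walks a folder recursively and collects them, skipping `Undetermined` files. It builds a throwaway folder with empty placeholder files purely for demonstration; `list_samples` is a hypothetical helper, and the recursive search is a choice made for this sketch, not a statement about how the pipeline locates files.

```python
import re
import tempfile
from pathlib import Path

NAME = re.compile(r"^(?P<sample>.+)_S\d+_L\d{3}_[IR]\d_001\.fastq\.gz$")


def list_samples(fastq_dir):
    """Return sorted sample-name prefixes found under fastq_dir, skipping Undetermined."""
    samples = set()
    for path in Path(fastq_dir).rglob("*.fastq.gz"):
        match = NAME.match(path.name)
        if match and match["sample"] != "Undetermined":
            samples.add(match["sample"])
    return sorted(samples)


# Demo on a temporary folder that mimics the layout above (empty placeholder files).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in [
        "HFLC5BBXX/test_sample1/test_sample1_S1_L001_R1_001.fastq.gz",
        "HFLC5BBXX/test_sample2/test_sample2_S2_L001_R1_001.fastq.gz",
        "Undetermined_S0_L001_R1_001.fastq.gz",
    ]:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    found = list_samples(root)

print(found)
```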
How did I get here?
An Illumina Experiment Manager-formatted samplesheet was used with either no entry or a blank entry for the `Sample_Project` column. Your hierarchy may look similar to this:
fastq_path
|-- Reports
|-- Stats
|-- test_sample_S1_L001_I1_001.fastq.gz
|-- test_sample_S1_L001_I2_001.fastq.gz
|-- test_sample_S1_L001_R1_001.fastq.gz
|-- test_sample_S1_L001_R2_001.fastq.gz
|-- test_sample_S1_L002_I1_001.fastq.gz
|-- test_sample_S1_L002_I2_001.fastq.gz
|-- test_sample_S1_L002_R1_001.fastq.gz
|-- test_sample_S1_L002_R2_001.fastq.gz
|-- test_sample_S1_L003_I1_001.fastq.gz
|-- test_sample_S1_L003_I2_001.fastq.gz
|-- test_sample_S1_L003_R1_001.fastq.gz
|-- test_sample_S1_L003_R2_001.fastq.gz
|-- Undetermined_S0_L001_I1_001.fastq.gz
...
`-- Undetermined_S0_L003_R2_001.fastq.gz
This is fine; you would use the same arguments as if the FASTQs were organized into subfolders within the output folder.
Situation | Line in libraries CSV |
---|---|
All samples (mkfastq) | `fastqs,sample,library_type`<br>`/PATH/TO/MKFASTQ_ID/outs/fastq_path,,Gene Expression`<br>`...` |
All samples (bcl2fastq direct) | `fastqs,sample,library_type`<br>`/PATH/TO/BCL2FASTQ_OUTPUT_DIR,,Gene Expression`<br>`...` |
Process test_sample only (mkfastq) | `fastqs,sample,library_type`<br>`/PATH/TO/MKFASTQ_ID/outs/fastq_path,test_sample,Gene Expression`<br>`...` |
How did I get here?
It is likely that FASTQ files have been transferred from either a `mkfastq` or `bcl2fastq` run into another folder. They still retain the names assigned by `bcl2fastq`, which combine sample name, sample order, lane, read type, and chunk. Your file hierarchy may look like this:
PROJECT_FOLDER
|-- MySample_S1_L001_I1_001.fastq.gz
|-- MySample_S1_L001_I2_001.fastq.gz
|-- MySample_S1_L001_R1_001.fastq.gz
|-- MySample_S1_L001_R2_001.fastq.gz
|-- MySample_S1_L002_I1_001.fastq.gz
|-- MySample_S1_L002_I2_001.fastq.gz
|-- MySample_S1_L002_R1_001.fastq.gz
|-- MySample_S1_L002_R2_001.fastq.gz
This is fine; since the files are named according to the `bcl2fastq` standard, you would use the same arguments as if the FASTQs were organized into a flow cell folder or `mkfastq` output folder.
How did I get here?
It is likely that you received files that were processed through a proprietary LIMS system, which employs its own naming conventions.
10x Genomics pipelines require files to be named in the `bcl2fastq` convention in order to run properly. You will need to determine the corresponding sample and read type for each file, likely by consulting your sequencing core or the individual who demultiplexed your flow cell.
It is highly likely that these files were initially processed with `bcl2fastq`. Once you have tracked down the origin of each file, rename the files in the following format:
`[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz`
Where `Read Type` is one of:
- `I1`: Dual index i7 read (optional)
- `I2`: Dual index i5 read (optional)
- `R1`: Read 1
- `R2`: Read 2
After the files have been renamed in the specified format, you will use the following arguments:
Situation | Line in libraries CSV |
---|---|
All samples | `fastqs,sample,library_type`<br>`/PATH/TO/PROJECT_FOLDER,,Gene Expression`<br>`...` |
Process SAMPLENAME only | `fastqs,sample,library_type`<br>`/PATH/TO/PROJECT_FOLDER,SAMPLENAME,Gene Expression`<br>`...` |
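The renaming step described above can be planned before any file is touched. The sketch below builds target names in the GEX format from a hand-made mapping of LIMS file names to sample, lane, and read type. The LIMS file names shown are invented for illustration, and `bcl2fastq_name` is a hypothetical helper; the real mapping has to come from your sequencing core.

```python
def bcl2fastq_name(sample, lane, read_type, sample_order=1):
    """Build a file name in the bcl2fastq convention for a GEX read type."""
    if read_type not in {"I1", "I2", "R1", "R2"}:
        raise ValueError(f"unexpected GEX read type: {read_type}")
    return f"{sample}_S{sample_order}_L{lane:03d}_{read_type}_001.fastq.gz"


# Invented LIMS names (assumption) mapped by hand to (sample, lane, read type).
lims_files = {
    "run42_laneA_fwd.fq.gz": ("MySample", 1, "R1"),
    "run42_laneA_rev.fq.gz": ("MySample", 1, "R2"),
}

# Dry run: compute the new names without renaming anything on disk.
rename_plan = {old: bcl2fastq_name(*fields) for old, fields in lims_files.items()}
for old, new in rename_plan.items():
    print(f"{old} -> {new}")
```

Keeping the plan as a dry run makes it easy to review every old and new name before applying the renames with a tool of your choice.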
Where are your ATAC FASTQ files?
- In an output folder from `cellranger-arc mkfastq` or `bcl2fastq` (`fastq_path`)
- In a different folder
How are your ATAC FASTQ files named?
How did I get here?
By running `cellranger-arc mkfastq` with a simple CSV layout file or Illumina Experiment Manager (IEM) samplesheet, or by running `bcl2fastq` directly (with an IEM samplesheet) on a flow cell.
Your files will be in a `(MKFASTQ_ID)/outs/fastq_path` folder, and your file hierarchy may look similar to this:
MKFASTQ_ID
|-- MAKE_FASTQS_CS
`-- outs
|-- fastq_path
|-- HFLC5BBXX
|-- test_sample1
| |-- test_sample1_S1_L001_I1_001.fastq.gz
| |-- test_sample1_S1_L001_R1_001.fastq.gz
| |-- test_sample1_S1_L001_R2_001.fastq.gz
| |-- test_sample1_S1_L001_R3_001.fastq.gz
| |-- test_sample1_S1_L002_I1_001.fastq.gz
| |-- test_sample1_S1_L002_R1_001.fastq.gz
| |-- test_sample1_S1_L002_R2_001.fastq.gz
| |-- test_sample1_S1_L002_R3_001.fastq.gz
| |-- test_sample1_S1_L003_I1_001.fastq.gz
| |-- test_sample1_S1_L003_R1_001.fastq.gz
| |-- test_sample1_S1_L003_R2_001.fastq.gz
| `-- test_sample1_S1_L003_R3_001.fastq.gz
|-- test_sample2
| |-- test_sample2_S1_L001_I1_001.fastq.gz
| |-- test_sample2_S1_L001_R1_001.fastq.gz
| |-- test_sample2_S1_L001_R2_001.fastq.gz
| |-- test_sample2_S1_L001_R3_001.fastq.gz
| |-- test_sample2_S1_L002_I1_001.fastq.gz
| |-- test_sample2_S1_L002_R1_001.fastq.gz
| |-- test_sample2_S1_L002_R2_001.fastq.gz
| |-- test_sample2_S1_L002_R3_001.fastq.gz
| |-- test_sample2_S1_L003_I1_001.fastq.gz
| |-- test_sample2_S1_L003_R1_001.fastq.gz
| |-- test_sample2_S1_L003_R2_001.fastq.gz
| `-- test_sample2_S1_L003_R3_001.fastq.gz
|-- Reports
|-- Stats
|-- Undetermined_S0_L001_I1_001.fastq.gz
...
`-- Undetermined_S0_L003_R3_001.fastq.gz
If you ran `bcl2fastq` directly, your file hierarchy may look similar to this:
BCL2FASTQ_OUTPUT_DIR
|-- HFLC5BBXX
|-- test_sample1
| |-- test_sample1_S1_L001_I1_001.fastq.gz
| |-- test_sample1_S1_L001_R1_001.fastq.gz
| |-- test_sample1_S1_L001_R2_001.fastq.gz
| |-- test_sample1_S1_L001_R3_001.fastq.gz
| |-- test_sample1_S1_L002_I1_001.fastq.gz
| |-- test_sample1_S1_L002_R1_001.fastq.gz
| |-- test_sample1_S1_L002_R2_001.fastq.gz
| |-- test_sample1_S1_L002_R3_001.fastq.gz
| |-- test_sample1_S1_L003_I1_001.fastq.gz
| |-- test_sample1_S1_L003_R1_001.fastq.gz
| |-- test_sample1_S1_L003_R2_001.fastq.gz
| `-- test_sample1_S1_L003_R3_001.fastq.gz
|-- test_sample2
| |-- test_sample2_S1_L001_I1_001.fastq.gz
| |-- test_sample2_S1_L001_R1_001.fastq.gz
| |-- test_sample2_S1_L001_R2_001.fastq.gz
| |-- test_sample2_S1_L001_R3_001.fastq.gz
| |-- test_sample2_S1_L002_I1_001.fastq.gz
| |-- test_sample2_S1_L002_R1_001.fastq.gz
| |-- test_sample2_S1_L002_R2_001.fastq.gz
| |-- test_sample2_S1_L002_R3_001.fastq.gz
| |-- test_sample2_S1_L003_I1_001.fastq.gz
| |-- test_sample2_S1_L003_R1_001.fastq.gz
| |-- test_sample2_S1_L003_R2_001.fastq.gz
| `-- test_sample2_S1_L003_R3_001.fastq.gz
...
You will have one set of FASTQ files per sample, prefixed with the name of the sample as it appears in the simple CSV layout file or IEM samplesheet. Other situations described later on this page deal with the presence of four separate sets of files (four "samples" from bcl2fastq's point of view) per single biological sample/library.
For more information on the naming conventions, please visit Illumina's support site or refer to the bcl2fastq User Guide. The scenario where your files do not conform to the naming convention is described in a different section later on this page.
The table below describes the line in the libraries CSV file you would use in the corresponding scenario. Be sure to substitute the capitalized text as appropriate. The "All Samples" entries in this table are provided for technical completeness.
Situation | Line in libraries CSV |
---|---|
All samples (mkfastq) | `fastqs,sample,library_type`<br>`/PATH/TO/MKFASTQ_ID/outs/fastq_path,,Chromatin Accessibility`<br>`...` |
All samples (mkfastq), multiple flow cells | `fastqs,sample,library_type`<br>`/PATH/TO/MKFASTQ_FLOWCELL1/outs/fastq_path,,Chromatin Accessibility`<br>`/PATH/TO/MKFASTQ_FLOWCELL2/outs/fastq_path,,Chromatin Accessibility`<br>`...` |
All samples (bcl2fastq direct) | `fastqs,sample,library_type`<br>`/PATH/TO/BCL2FASTQ_OUTPUT_DIR,,Chromatin Accessibility`<br>`...` |
Process test_sample1 (mkfastq) | `fastqs,sample,library_type`<br>`/PATH/TO/MKFASTQ_ID/outs/fastq_path,test_sample1,Chromatin Accessibility`<br>`...` |
Process test_sample1 and test_sample2 as a single merged sample (mkfastq) | `fastqs,sample,library_type`<br>`/PATH/TO/MKFASTQ_ID/outs/fastq_path,test_sample1,Chromatin Accessibility`<br>`/PATH/TO/MKFASTQ_ID/outs/fastq_path,test_sample2,Chromatin Accessibility`<br>`...` |
How did I get here?
It is likely that the input samplesheet used explicitly separated the four oligos in a 10x Genomics sample index set into four separate sample names. You may see a file hierarchy similar to this:
bcl2fastq_output
|-- HFLC5BBXX
|-- SI-GA-A1_1
| |-- SI-GA-A1_1_S1_L001_I1_001.fastq.gz
| |-- SI-GA-A1_1_S1_L001_R1_001.fastq.gz
| |-- SI-GA-A1_1_S1_L001_R2_001.fastq.gz
| `-- SI-GA-A1_1_S1_L001_R3_001.fastq.gz
|-- SI-GA-A1_2
| |-- SI-GA-A1_2_S2_L001_I1_001.fastq.gz
| |-- SI-GA-A1_2_S2_L001_R1_001.fastq.gz
| |-- SI-GA-A1_2_S2_L001_R2_001.fastq.gz
| `-- SI-GA-A1_2_S2_L001_R3_001.fastq.gz
|-- SI-GA-A1_3
| |-- SI-GA-A1_3_S3_L001_I1_001.fastq.gz
| |-- SI-GA-A1_3_S3_L001_R1_001.fastq.gz
| |-- SI-GA-A1_3_S3_L001_R2_001.fastq.gz
| `-- SI-GA-A1_3_S3_L001_R3_001.fastq.gz
|-- SI-GA-A1_4
| |-- SI-GA-A1_4_S4_L001_I1_001.fastq.gz
| |-- SI-GA-A1_4_S4_L001_R1_001.fastq.gz
| |-- SI-GA-A1_4_S4_L001_R2_001.fastq.gz
| `-- SI-GA-A1_4_S4_L001_R3_001.fastq.gz
|-- Reports
|-- Stats
|-- Undetermined_S0_L001_I1_001.fastq.gz
|-- Undetermined_S0_L001_R1_001.fastq.gz
|-- Undetermined_S0_L001_R2_001.fastq.gz
`-- Undetermined_S0_L001_R3_001.fastq.gz
You probably want to merge all samples from the `SI-GA-A1` index set into a single analysis. If you only run one index at a time, you will see fewer reads than expected, which may translate to lower than expected coverage or cell count for the experiment.
Situation | Line in libraries CSV |
---|---|
All samples (mkfastq) | `fastqs,sample,library_type`<br>`/PATH/TO/MKFASTQ_ID/outs/fastq_path,,Chromatin Accessibility`<br>`...` |
Process all SI-GA-A1 reads in a single analysis | `fastqs,sample,library_type`<br>`/PATH/TO/MKFASTQ_ID/outs/fastq_path,SI-GA-A1_1,Chromatin Accessibility`<br>`/PATH/TO/MKFASTQ_ID/outs/fastq_path,SI-GA-A1_2,Chromatin Accessibility`<br>`/PATH/TO/MKFASTQ_ID/outs/fastq_path,SI-GA-A1_3,Chromatin Accessibility`<br>`/PATH/TO/MKFASTQ_ID/outs/fastq_path,SI-GA-A1_4,Chromatin Accessibility`<br>`...` |
Only process first sample index | `fastqs,sample,library_type`<br>`/PATH/TO/MKFASTQ_ID/outs/fastq_path,SI-GA-A1_1,Chromatin Accessibility`<br>`...` |
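When a sample index set has been split into four sample names, the four libraries CSV lines differ only in the numeric suffix, so they can be generated rather than typed. The sketch below does this for the `SI-GA-A1` example; `merged_index_rows` is a hypothetical helper, and the path is the same placeholder used in the table.

```python
def merged_index_rows(fastq_path, index_set, library_type, count=4):
    """Return one libraries CSV line per per-oligo sample name (suffixes _1 to _count)."""
    return [
        f"{fastq_path},{index_set}_{i},{library_type}"
        for i in range(1, count + 1)
    ]


lines = ["fastqs,sample,library_type"]
lines += merged_index_rows(
    "/PATH/TO/MKFASTQ_ID/outs/fastq_path", "SI-GA-A1", "Chromatin Accessibility"
)
print("\n".join(lines))
```

Listing all four names under the same library type is what lets the reads from every oligo be analyzed together.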
How did I get here?
An Illumina Experiment Manager-formatted samplesheet was used with either no entry or a blank entry for the `Sample_Project` column. Your hierarchy may look similar to this:
fastq_path
|-- Reports
|-- Stats
|-- test_sample_S1_L001_I1_001.fastq.gz
|-- test_sample_S1_L001_R1_001.fastq.gz
|-- test_sample_S1_L001_R2_001.fastq.gz
|-- test_sample_S1_L001_R3_001.fastq.gz
|-- test_sample_S1_L002_I1_001.fastq.gz
|-- test_sample_S1_L002_R1_001.fastq.gz
|-- test_sample_S1_L002_R2_001.fastq.gz
|-- test_sample_S1_L002_R3_001.fastq.gz
|-- test_sample_S1_L003_I1_001.fastq.gz
|-- test_sample_S1_L003_R1_001.fastq.gz
|-- test_sample_S1_L003_R2_001.fastq.gz
|-- test_sample_S1_L003_R3_001.fastq.gz
|-- Undetermined_S0_L001_I1_001.fastq.gz
...
`-- Undetermined_S0_L003_R3_001.fastq.gz
This is fine; you would use the same arguments as if the FASTQs were organized into subfolders within the output folder.
Situation | Line in libraries CSV |
---|---|
All samples (mkfastq) | `fastqs,sample,library_type`<br>`/PATH/TO/MKFASTQ_ID/outs/fastq_path,,Chromatin Accessibility`<br>`...` |
All samples (bcl2fastq direct) | `fastqs,sample,library_type`<br>`/PATH/TO/BCL2FASTQ_OUTPUT_DIR,,Chromatin Accessibility`<br>`...` |
Process test_sample only (mkfastq) | `fastqs,sample,library_type`<br>`/PATH/TO/MKFASTQ_ID/outs/fastq_path,test_sample,Chromatin Accessibility`<br>`...` |
How did I get here?
It is likely that FASTQ files have been transferred from either a `mkfastq` or `bcl2fastq` run into another folder. They still retain the names assigned by `bcl2fastq`, which combine sample name, sample order, lane, read type, and chunk. Your file hierarchy may look similar to this:
PROJECT_FOLDER
|-- MySample_S1_L001_I1_001.fastq.gz
|-- MySample_S1_L001_I2_001.fastq.gz
|-- MySample_S1_L001_R1_001.fastq.gz
|-- MySample_S1_L001_R2_001.fastq.gz
|-- MySample_S1_L002_I1_001.fastq.gz
|-- MySample_S1_L002_I2_001.fastq.gz
|-- MySample_S1_L002_R1_001.fastq.gz
|-- MySample_S1_L002_R2_001.fastq.gz
This is fine; since the files are named according to the `bcl2fastq` standard, you would use the same arguments as if the FASTQs were organized into a flow cell folder or `mkfastq` output folder.
Situation | Line in libraries CSV |
---|---|
All samples | `fastqs,sample,library_type`<br>`/PATH/TO/PROJECT_FOLDER,,Chromatin Accessibility`<br>`...` |
Process MySample only | `fastqs,sample,library_type`<br>`/PATH/TO/PROJECT_FOLDER,MySample,Chromatin Accessibility`<br>`...` |
How did I get here?
It is likely that you received files that were processed through a proprietary LIMS system, which employs its own naming conventions.
10x Genomics pipelines require files to be named in the `bcl2fastq` convention in order to run properly. You will need to determine the corresponding sample and read type for each file, likely by consulting your sequencing core or the individual who demultiplexed your flow cell.
It is highly likely that these files were initially processed with `bcl2fastq`, so once you have tracked down their origin, you will need to rename the files in one of the following formats:
`[Sample Name]_S1_L00[Lane Number]_[Read Type]_001.fastq.gz`
Where `Read Type` is one of:
- `I1`: Dual index i7 read (optional)
- `R1`: Read 1
- `R2`: Dual index i5 read
- `R3`: Read 2
Alternatively, Cell Ranger ARC will also accept ATAC FASTQs in this format:
- `I1`: Dual index i7 read (optional)
- `R1`: Read 1
- `I2`: Dual index i5 read
- `R2`: Read 2
After you have renamed those files into that format, you'll use the following arguments:
Situation | Line in libraries CSV |
---|---|
All samples | `fastqs,sample,library_type`<br>`/PATH/TO/PROJECT_FOLDER,,Chromatin Accessibility`<br>`...` |
Process SAMPLENAME only | `fastqs,sample,library_type`<br>`/PATH/TO/PROJECT_FOLDER,SAMPLENAME,Chromatin Accessibility`<br>`...` |
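Since two ATAC read layouts are accepted, a quick check of which one a renamed set of files follows can catch a missing read before a run. The sketch below compares the read types present for one sample and lane against the two layouts listed above, treating `I1` as optional in both; `ATAC_LAYOUTS` and `atac_layout` are hypothetical names introduced for this illustration.

```python
# The two accepted ATAC layouts, without the optional I1 read.
ATAC_LAYOUTS = {
    "R1/R2/R3": {"R1", "R2", "R3"},  # R2 carries the dual index i5 read
    "R1/I2/R2": {"R1", "I2", "R2"},  # I2 carries the dual index i5 read
}


def atac_layout(read_types):
    """Return the name of the accepted ATAC layout matching read_types, or None."""
    present = set(read_types) - {"I1"}  # I1 is optional in both layouts
    for name, required in ATAC_LAYOUTS.items():
        if present == required:
            return name
    return None


print(atac_layout(["I1", "R1", "R2", "R3"]))
print(atac_layout(["R1", "I2", "R2"]))
print(atac_layout(["R1", "R2"]))
```

A `None` result indicates that the set of reads matches neither layout, for example because the dual index i5 read is missing.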