The pipeline output directory described in Outputs Overview contains all of the data produced by one invocation of a pipeline (a pipestance) as well as rich metadata describing the characteristics of each stage. This directory contains a specific structure used by the Martian pipeline framework to track the state of the pipeline as execution proceeds.
Xenium Ranger's notion of a pipeline is very flexible: a pipeline can be composed of stages, which run stage code, and of sub-pipelines, which may themselves contain stages or further sub-pipelines.
Xenium Ranger pipelines follow the convention that stages are named with verbs (such as MERGE_METRICS and SELECT_CELLS_DATASET) and sub-pipelines are named with nouns and prefixed with an underscore (e.g., _CELL_SEGMENTOR_FINALIZER). Each stage runs in its own directory bearing its name, and each stage's directory is contained within its parent pipeline's directory.
For example, the xeniumranger resegment pipeline begins with a process graph in which:
- XR_PREFLIGHT is a preflight stage, which validates inputs before the other stages run.
- XENIUM_RANGER_CS is the top-level pipeline stage.
- XR_RESEGMENT is a sub-pipeline contained in XENIUM_RANGER_CS.
- The SEGMENT_NUCLEI and SETUP_CELL_SEGMENTOR_FINALIZER stages are contained in the XR_RESEGMENT sub-pipeline.
Every pipestance operates wholly inside its pipeline output directory. When the pipestance completes, this directory contains three kinds of content: metadata files, the pipestance output file directory, and the top-level pipeline stage directory.
- Metadata files are files prefixed with an underscore (_) and usually contain unstructured text or JSON-encoded arrays and hashes.
- The pipestance output file directory is a directory called outs/ that contains the pipestance's output files.
- The top-level pipeline stage directory is a directory named after the top-level pipeline stage; it contains the child stage directories that compose this pipestance.
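As a rough illustration of this layout, the Python sketch below sorts the top-level entries of a pipeline output directory into those three groups. It relies only on the naming rules stated above (an underscore prefix for metadata files, outs/ for output files) and treats any other directory as the top-level pipeline stage directory. The function name classify_top_level is hypothetical and not part of Xenium Ranger.

```python
from pathlib import Path


def classify_top_level(pipeline_dir: str) -> dict[str, list[str]]:
    """Hypothetical helper (not part of Xenium Ranger): sort the top-level
    entries of a pipeline output directory into three groups."""
    groups: dict[str, list[str]] = {"metadata": [], "outs": [], "stage_dirs": []}
    for entry in sorted(Path(pipeline_dir).iterdir()):
        if entry.name.startswith("_"):
            # Metadata files are prefixed with an underscore.
            groups["metadata"].append(entry.name)
        elif entry.is_dir() and entry.name == "outs":
            # The pipestance output file directory.
            groups["outs"].append(entry.name)
        elif entry.is_dir():
            # Assumption: any other directory is the top-level pipeline
            # stage directory (e.g. XENIUM_RANGER_CS).
            groups["stage_dirs"].append(entry.name)
    return groups
```

For a hypothetical run directory my_run/, calling classify_top_level("my_run") should list XENIUM_RANGER_CS under stage_dirs.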
The top-level pipeline stage directory is a stage directory that contains any number of child stage directories as well as one stage output directory for each fork run by that stage. There is only one possible top-level pipeline stage: XENIUM_RANGER_CS.
All Xenium Ranger pipelines contain only single-fork stages, so there is only one fork0 stage output directory within each stage directory. Chunk output directories are a subset of stage output directories that additionally contain runtime information specific to the job or process being run by that chunk, such as a process ID or cluster job ID.
For example, the pipeline output directory of any Xenium Ranger pipeline contains the following directory structure:
Path | Description |
---|---|
Files beginning with an underscore (_) | Metadata files (described below) |
outs/ | Pipestance output file directory |
XENIUM_RANGER_CS/ | Top-level pipeline stage directory |
INSITU_COUNTER_CS/fork0/ | One of the stage output directories in XENIUM_RANGER_CS/ |
INSITU_COUNTER_CS/fork0/files/ | Stage output files |
INSITU_COUNTER_CS/fork0/CREATE_METRICS_SUMMARY_CSV_CS/ | Stage directory |
INSITU_COUNTER_CS/fork0/CREATE_METRICS_SUMMARY_CSV_CS/fork0/chnk0/ | Chunk output directory |
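The fork and chunk levels of this tree can be located by directory name alone. The sketch below, a hypothetical helper rather than a Xenium Ranger tool, walks a top-level stage directory such as XENIUM_RANGER_CS/ and collects directories named forkN and chnkN. It assumes those names match the pattern shown in the table exactly.

```python
import re
from pathlib import Path

# Assumption: stage output directories are named forkN and chunk output
# directories chnkN, exactly as in the table above.
FORK_RE = re.compile(r"fork\d+")
CHUNK_RE = re.compile(r"chnk\d+")


def find_output_dirs(stage_root: str) -> tuple[list[str], list[str]]:
    """Hypothetical helper: return (fork_dirs, chunk_dirs) found anywhere
    under stage_root, as POSIX paths relative to stage_root."""
    root = Path(stage_root)
    forks: list[str] = []
    chunks: list[str] = []
    for path in sorted(root.rglob("*")):
        if not path.is_dir():
            continue
        rel = path.relative_to(root).as_posix()
        if FORK_RE.fullmatch(path.name):
            forks.append(rel)
        elif CHUNK_RE.fullmatch(path.name):
            chunks.append(rel)
    return forks, chunks
```

Because every Xenium Ranger stage is single-fork, each stage directory found this way should hold exactly one fork0 entry.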
The metadata contained in the pipeline output directory includes:
File Name | Description |
---|---|
_cmdline | The command line used to run the analysis. |
_finalstate | Metadata cache that is populated when a pipestance completes to minimize re-aggregation of metadata. |
_invocation | The MRO call used to invoke this pipestance. |
_jobmode | Job mode specified by --jobmode. |
_log | The log messages that are reported to your terminal window when running xeniumranger commands. |
_mrosource | The entire MRO describing the pipeline with all @include statements dereferenced. |
_perf | Detailed runtime performance data for every stage in the pipestance. |
_sitecheck | System compatibility information. |
_timestamp | The start and finish time for this pipestance. |
_uuid | Unique ID for the analysis pipestance. |
_vdrkill | A list of all of the volatile data (temporary files) removed during pipeline execution as well as total number of files and bytes deleted. |
_versions | Versions of the components used by the pipeline. |
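Because metadata files hold either unstructured text or JSON-encoded arrays and hashes, a reader has to handle both. This minimal sketch (read_metadata is a hypothetical name) tries JSON first and falls back to plain text. The table does not say which files are JSON, so the sketch makes no per-file assumption. It only reads files, never writes them.

```python
import json
from pathlib import Path


def read_metadata(pipeline_dir: str, name: str):
    """Hypothetical helper: read one underscore-prefixed metadata file
    (e.g. "_versions") without modifying it.

    Returns a dict or list if the file is a JSON array or hash, otherwise
    the stripped text.
    """
    text = (Path(pipeline_dir) / name).read_text()
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return text.strip()
    # Only arrays and hashes count as structured; a bare number or string
    # that happens to be valid JSON is returned as text instead.
    return value if isinstance(value, (dict, list)) else text.strip()
```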
Stage directories contain stage output directories, stage output files, and the stage directories of any child stages or pipelines.
Stage output directories typically contain:
File Name | Contents |
---|---|
files/ | Directory containing any files created by this stage that were not considered volatile (temporary). |
split/ | A special stage output directory for the step that divided this stage's input into parallel chunks. |
chnkN/ | A chunk output directory for the Nth parallel chunk executed. |
join/ | A special stage output directory for the step that recombined this stage's parallel output chunks into a single output dataset again. |
_complete | A file that, when present, signifies that this stage has successfully completed. |
_errors | A file that, when present, signifies that this stage failed. Contains the errors that resulted in stage failure. |
_invocation | The MRO call used to execute this stage by the Martian framework. |
_outs | The output files generated by this stage. |
_vdrkill | A list of all of the volatile data (temporary files) removed during pipeline execution as well as total number of files and bytes deleted. |
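The _complete and _errors markers make it possible to check a stage's state without parsing logs. The sketch below is an illustration under stated assumptions, not a Martian or Xenium Ranger API: stage_status (hypothetical) inspects one stage output directory, and find_failed (hypothetical) searches a whole pipeline output directory for _errors files. If both markers were present, the sketch reports failure; that precedence is a design choice of the example.

```python
from pathlib import Path


def stage_status(stage_output_dir: Path) -> str:
    """Hypothetical helper: infer a stage's state from its marker files."""
    if (stage_output_dir / "_errors").exists():
        return "failed"
    if (stage_output_dir / "_complete").exists():
        return "complete"
    # Neither marker: the stage has not finished (or was interrupted).
    return "incomplete"


def find_failed(pipeline_dir: str) -> dict[str, str]:
    """Hypothetical helper: map every directory (stage or chunk output)
    holding an _errors file to that file's contents."""
    root = Path(pipeline_dir)
    return {
        errors.parent.relative_to(root).as_posix(): errors.read_text()
        for errors in sorted(root.rglob("_errors"))
    }
```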
Chunk output directories are a subset of stage output directories that, in addition to the aforementioned stage output, may contain:
File Name | Contents |
---|---|
_args | The arguments passed to the stage's stage code. |
_jobinfo | Metadata describing the stage's execution, including performance metrics, job manager jobid and jobname, and process ID. |
_jobscript | The script submitted to the cluster job manager (cluster mode-only). |
_stdout | Any stage code output that was printed to the stdout stream. |
_stderr | Any stage code output that was printed to the stderr stream. |
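When a stage fails, the per-chunk _stderr files are a natural place to look. This sketch (chunk_stderr is a hypothetical helper) collects the last few lines of _stderr from each chnkN directory directly inside one stage output directory such as fork0, skipping chunks that have no _stderr file. Chunk directories are visited in plain lexicographic order, so chnk10 sorts before chnk2.

```python
from pathlib import Path


def chunk_stderr(stage_output_dir: str, max_lines: int = 20) -> dict[str, list[str]]:
    """Hypothetical helper: return the last max_lines lines of _stderr for
    each chnk* directory directly inside a stage output directory."""
    root = Path(stage_output_dir)
    logs: dict[str, list[str]] = {}
    for chunk in sorted(root.glob("chnk*")):
        stderr = chunk / "_stderr"
        if chunk.is_dir() and stderr.is_file():
            # errors="replace" keeps non-UTF-8 bytes from raising.
            lines = stderr.read_text(errors="replace").splitlines()
            # Guard max_lines == 0, since lines[-0:] would return every line.
            logs[chunk.name] = lines[-max_lines:] if max_lines > 0 else []
    return logs
```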
Metadata files should be treated as read-only. Altering the contents of metadata files is not recommended.