Simple meta-analysis
This is a strategy for a simple meta-analysis Nextflow workflow. Here "simple" means that all of your individual processing steps are already expected to run on all the data from a single study. This is the case for Qiime2 analyses, for instance, because each Qiime2 action always operates on all data from a study.
Here we first have a main table that lists the studies along with the required parameters/settings for each one:
id,layout,forward_primer,reverse_primer,trunc_f,trunc_r,location
study 1,paired,ACG,TGC,220,200,study_1
study 2,single,ACT,GCT,220,0,study_2
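In the workflow below, this table is read with `splitCsv(header: true)`, which emits one Groovy map per row, keyed by the column names. For the first row above, the channel item looks roughly like this (note that all values, including the truncation lengths, arrive as strings):

```groovy
// Sketch of the channel item for the first study (all values are strings)
[
    id: 'study 1',
    layout: 'paired',
    forward_primer: 'ACG',
    reverse_primer: 'TGC',
    trunc_f: '220',   // numeric columns are not auto-converted
    trunc_r: '200',
    location: 'study_1'
]
```

This is what allows the processes to access fields as `study.id`, `study.layout`, and so on.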
The Nextflow pipeline then reads the CSV and passes the study information as a value through all of the steps. The initial channel has one entry for each study.
#!/usr/bin/env nextflow
params.data = "${launchDir}"
params.studies = "${params.data}/studies.csv"
params.visualize = true
workflow {
    // Read the studies table
    studies = Channel.fromPath(params.studies)
        .splitCsv(header: true, sep: ",")

    // Do some work on the studies
    studies | import_data | step1

    // Visualize if desired
    if (params.visualize) {
        visualize(import_data.out)
    }
}
process import_data {
    publishDir "${params.data}/imports", mode: 'copy', overwrite: true
    cpus 4
    memory "8GB"
    time "2h"

    input:
    val(study)

    output:
    tuple val(study), path("*.txt")

    script:
    if (study.layout == "paired") {
        """
        echo "importing from ${params.data}/${study.location}/manifest.tsv in paired-end layout." > '${study.id}.txt'
        """
    } else if (study.layout == "single") {
        """
        echo "importing from ${params.data}/${study.location}/manifest.tsv in single-end layout." > '${study.id}.txt'
        """
    } else {
        error "Invalid library layout specified. Must be 'paired' or 'single' :("
    }
}
process step1 {
    publishDir "${params.data}/step1", mode: 'copy', overwrite: true
    cpus 1
    memory "2GB"
    time "1h"

    input:
    tuple val(study), path(imported)

    output:
    tuple val(study), path("*.result")

    script:
    """
    echo "processed ${study.id} data with truncations of ${study.trunc_f},${study.trunc_r}" > '${study.id}.result'
    """
}
process visualize {
    publishDir "${params.data}/viz", mode: 'copy', overwrite: true
    cpus 1
    memory "2GB"
    time "1h"

    input:
    tuple val(study), path(imported)

    output:
    tuple val(study), path("${study.id}.viz")

    script:
    """
    echo "visualized ${study.id} import" > '${study.id}.viz'
    """
}
Note
The individual processing scripts don't make much sense here. They just serve as an example of how to inject the study parameters.
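As a more concrete (but hypothetical) illustration, the paired-end branch of `import_data` in a real Qiime2 pipeline might call `qiime tools import` instead of `echo`. The `--type` and `--input-format` values below are assumptions that depend on your actual data:

```groovy
// Hypothetical paired-end import using the Qiime2 CLI (sketch only)
if (study.layout == "paired") {
    """
    qiime tools import \
        --type 'SampleData[PairedEndSequencesWithQuality]' \
        --input-path ${params.data}/${study.location}/manifest.tsv \
        --input-format PairedEndFastqManifestPhred33V2 \
        --output-path '${study.id}.qza'
    """
}
```

The process `output:` declaration would then have to match the produced artifact, e.g. `tuple val(study), path("*.qza")`.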
Running this will then distribute each study's row across the pipeline, processing the studies in parallel where possible.
$ nextflow run main.nf
N E X T F L O W ~ version 25.03.1-edge
Launching `main.nf` [stoic_murdock] DSL2 - revision: 0e2f97c09b
executor > local (6)
[ec/8110b0] process > import_data (1) [100%] 2 of 2 ✔
[12/497c3c] process > step1 (1) [100%] 2 of 2 ✔
[1e/c9f3fe] process > visualize (2) [100%] 2 of 2 ✔
This also shows an example of how to globally disable a part of the pipeline (the visualization) while still retaining the cache.
$ nextflow run main.nf -resume --visualize false
N E X T F L O W ~ version 25.03.1-edge
Launching `main.nf` [intergalactic_goldwasser] DSL2 - revision: 0e2f97c09b
[f9/53dc3a] process > import_data (2) [100%] 2 of 2, cached: 2 ✔
[04/179b27] process > step1 (1) [100%] 2 of 2, cached: 2 ✔
You can see the injection of the library layout and the truncation parameters from the main table.
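Because each study is just an item in the initial channel, restricting a run to a subset of studies is a one-line change. A hypothetical filter on the `layout` column, using the standard `filter` channel operator, could look like:

```groovy
// Only keep paired-end studies (illustrative)
studies = Channel.fromPath(params.studies)
    .splitCsv(header: true, sep: ",")
    .filter { it.layout == "paired" }
```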