Beyond Your First Snakefile

This section is intended for people who have already used snakemake, and now want to learn about and apply some more snakemake features!

Some initial motivation

Let's consider the below Snakefile:

FASTQ_FILES = glob_wildcards("{sample}.fastq")

rule all:
    input:
        "multiqc_report.html"

rule multiqc:
    input:
        expand("{sample}_fastqc.html", sample=FASTQ_FILES.sample)
    output:
        "multiqc_report.html"
    shell: """
        multiqc . --filename {output:q} -f
    """

rule fastqc_raw:
    input:
        "{sample}.fastq"
    output:
        "{sample}_fastqc.html", "{sample}_fastqc.zip"
    shell: """
       fastqc {input:q}
    """

This Snakefile will find all files ending in .fastq under the current directory. snakemake will then run FASTQC on each one, and build a summary report using multiqc. It works for any number of files, and will find files under any and all subdirectories. It can run in parallel on a single machine, or on multiple machines on a cluster, limited only by the computational resources you make available to snakemake. And if new FASTQ files are added, snakemake will automatically detect them, run fastqc on them, and rerun multiqc to update the summary report.

You might say that for all this power it is fairly short, as computer programs go. But it is also somewhat terse and complicated looking!

This section is devoted to explaining all of the features of snakemake (and how to write them into Snakefiles) that power the above functionality. By the end of this section, you will be able to use 80% or more of the core features of snakemake! And you will also have pointers into much of the remaining 20% of snakemake's core feature set, which will be available to you when and as you need it.

A summary of this section

This section attempts to bridge between the more gradual on-ramp of the first two sections, and the full power of this fully operational workflow system as discussed in later sections as well as the official snakemake documentation.

This section introduces input and output blocks, wildcards, params blocks, glob_wildcards, and expand. It will also discuss common approaches to debugging snakemake workflows and cover basic syntax rules.