Chapter 9 - using expand to make filenames

You might note that the list of files in the compare_genomes rule all share the same suffix, and they're all built using the same rule. Can we use that in some way?

Yes! We can use a function called expand(...) and give it a template filename to build, and a list of values to insert into that filename.

Below, we build a list of accessions named ACCESSIONS, and then use expand to build the list of input files of the format {acc}.fna.gz.sig from that list, creating one filename for each value in ACCESSSIONS.

ACCESSIONS = ["GCF_000017325.1",
              "GCF_000020225.1",
              "GCF_000021665.1",
              "GCF_008423265.1"]

rule sketch_genome:
    input:
        "genomes/{accession}.fna.gz",
    output:
        "{accession}.fna.gz.sig",
    shell: """
        sourmash sketch dna -p k=31 {input} --name-from-first
    """

rule compare_genomes:
    input:
        expand("{acc}.fna.gz.sig", acc=ACCESSIONS),
    output:
        "compare.mat"
    shell: """
        sourmash compare {input} -o {output}
    """

rule plot_comparison:
    message: "compare all input genomes using sourmash"
    input:
        "compare.mat"
    output:
        "compare.mat.matrix.png"
    shell: """
        sourmash plot {input}
    """

While wildcards and expand use the same syntax, they do quite different things.

expand generates a list of filenames, based on a template and a list of values to insert into the template. It is typically used to make a list of files that you want snakemake to create for you.

Wildcards in rules provide the rules by which one or more files will be actually created. They are recipes that say, "when you want to make a file with name that looks like THIS, you can do so from files that look like THAT, and here's what to run to make that happen.

expand tells snakemake WHAT you want to make, wildcard rules tell snakemake HOW to make those things.

CTB: add discussion about how this is the same as list of strings. CTB: refer to additional expand docs.