For the Programmer - Adding Modules
Author: Menachem Sklarz
Choose a name for the module. e.g.
Decide which level the module will work on: samples or project-wide?
Change the name of the template file to
Make sure the file is within a directory which includes an empty
__init__.py file. This directory is passed to NeatSeq-Flow through the
module_path global parameter (see Parameter file definition).
Change the class name to Step_<module_name> in the line beginning with class Step_.... Make sure <module_name> here is identical to the one you used in the filename above.
Set self.shell to csh or bash, depending on the shell language you want your scripts to be coded in (it is best to use bash, because it will work with the setup described in Install and execute with Conda).
In the step_sample_initiation() method, you can operate on the sample_data structure before the scripts are actually prepared, for example with assertion checks (see Exceptions and Warnings) to make sure the data the step requires exists.
build_scripts() is the actual place to put the step script building code. See Instructions for build_scripts() function.
make_sample_file_index() is a place to put code that produces an index file of the files produced by this step (the BLAST module uses this function and can serve as an example).
In create_spec_preliminary_script() you create the code for a script that will be run before all other step scripts are executed. If the method is not defined or returns nothing, it will be ignored (i.e. you can set it to pass). This is useful if you need to prepare a database, for example, before the other scripts use it.
In create_spec_wrapping_up_script() you create the code for a script that will be run after all other step scripts are executed. If the method is not defined or returns nothing, it will be ignored (i.e. you can set it to pass). This is the place to call make_sample_file_index() to create an index of the files produced in this step, and to call a script that takes the index file and performs some kind of data agglomeration.
- It is highly recommended to create an instance-scope list of the redirected parameters that the user should not pass because they are dealt with by your module. The list should be called auto_redirs and you should place it directly after the class definition line (i.e. the line beginning with class Step_...). After instance creation, the list is checked by NeatSeq-Flow to make sure the user did not pass forbidden parameters.
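The module layout described above might look roughly like the following sketch. It is not taken from the NeatSeq-Flow templates: the `Step` base class here is a hypothetical stand-in so that the example runs on its own, the `redir_params` key and the `ValueError` are assumptions made for illustration, and the module name `mymodule` is invented. Setting `self.shell` in the constructor is likewise only for illustration; in a real module, set it wherever the template does.

```python
class Step:
    """Hypothetical stand-in for the NeatSeq-Flow base class (illustration only)."""

    auto_redirs = []

    def __init__(self, name, params):
        self.name = name
        self.params = params
        # Mimic the check described above: reject redirected parameters
        # that the module declares it handles itself.
        # "redir_params" is an assumed key name, not the documented one.
        forbidden = [p for p in params.get("redir_params", {})
                     if p in self.auto_redirs]
        if forbidden:
            raise ValueError("Step %s: do not pass %s"
                             % (name, ", ".join(forbidden)))


class Step_mymodule(Step):
    # Placed directly after the class definition line:
    # redirected parameters the module sets by itself.
    auto_redirs = ["-i", "-o"]

    def __init__(self, name, params):
        super().__init__(name, params)
        self.shell = "bash"  # or "csh"

    def step_sample_initiation(self):
        pass  # checks on sample_data go here

    def build_scripts(self):
        pass  # script building code goes here

    def make_sample_file_index(self):
        pass  # optional: index of the files produced by this step

    def create_spec_preliminary_script(self):
        pass  # returns nothing, so it is ignored

    def create_spec_wrapping_up_script(self):
        pass  # returns nothing, so it is ignored
```

With this stand-in, `Step_mymodule("map1", {"redir_params": {"-i": "x"}})` fails because `-i` is listed in `auto_redirs`, while passing an unrelated parameter such as `-t` is accepted.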
If sample-level scripts are required, the function should contain a loop:
for sample in self.sample_data["samples"]:
Set self.script to contain the command(s) executed by the script (this goes inside the for loop for sample-level steps).
Initialize it with
self.script = ""
self.script += self.get_script_const() will add the setenv parameter, if it exists, the script_path parameter and the redirected parameters. Then all that remains is to see to the input and output parameters.
The input parameter, typically
-i, is usually based on the sample data structure, e.g.:
self.script += "-i %s \\\n\t" % self.sample_data[sample]["fasta.nucl"]
The "\\\n\t" at the end of the string makes the final script more readable.
The output parameter (typically -o) should be set to a filename within self.base_dir. If the step is a sample-level step, get a directory for the output files by calling:
sample_dir = self.make_folder_for_sample(sample)
Record the location of the output file in the sample data structure, for example:
self.sample_data[sample]["bam"] = (sample_dir + os.path.basename(output_filename))
If the output is a standard file, e.g. BAM or fastq files, put them in the respective places in
sample_data. See documentation for similar modules to find out the naming scheme. Otherwise, use a concise file-type descriptor for the file and specify the location you decided on in the module documentation.
You can add more than one command to the self.script variable if the two commands are typically executed together. See the samtools module for an example.
The function should end with the following line (within the sample-loop, if one exists):
That, plus a little debugging, is usually all it takes to add a module to the pipeline.
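Put together, a sample-level build_scripts() could look something like the sketch below. The class is a self-contained stand-in: the bodies of `get_script_const()` and `make_folder_for_sample()` are simplified assumptions (a real module inherits them from NeatSeq-Flow), the tool name `mytool`, the `mytool.out` descriptor and the `scripts` dictionary are hypothetical, and the closing line that a real build_scripts() needs is not shown.

```python
import os


class Step_mymodule:
    """Self-contained stand-in; a real module inherits the helpers below."""

    def __init__(self, base_dir, sample_data):
        self.base_dir = base_dir
        self.sample_data = sample_data
        self.scripts = {}  # hypothetical: keeps each script for inspection

    def get_script_const(self):
        # Assumed behaviour: the real method adds setenv, script_path
        # and the redirected parameters.
        return "mytool \\\n\t--threads 4 \\\n\t"

    def make_folder_for_sample(self, sample):
        # Assumed behaviour: return a per-sample directory under base_dir,
        # with a trailing separator. Nothing is created on disk here.
        return os.path.join(self.base_dir, sample, "")

    def build_scripts(self):
        for sample in self.sample_data["samples"]:
            # Start with an empty script and add the constant part
            self.script = ""
            self.script += self.get_script_const()
            # Input parameter, taken from the sample data structure
            self.script += "-i %s \\\n\t" % self.sample_data[sample]["fasta.nucl"]
            # Output parameter, a file inside the sample directory
            sample_dir = self.make_folder_for_sample(sample)
            output_filename = sample + ".out"
            self.script += "-o %s\n" % (sample_dir + output_filename)
            # Record the output so that downstream steps can find it
            self.sample_data[sample]["mytool.out"] = (
                sample_dir + os.path.basename(output_filename))
            self.scripts[sample] = self.script


data = {"samples": ["s1"], "s1": {"fasta.nucl": "/data/s1.fna"}}
step = Step_mymodule("/tmp/out", data)
step.build_scripts()
print(step.scripts["s1"])
```

Each parameter ends with "\\\n\t", so the printed script shows one parameter per line, joined by shell line continuations.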
The steps above assume you don’t want to support the option of working on a local directory and transferring the finished results to the final location (see local parameter). If you do want to support it, you have to create a temporary directory with:
use_dir = self.local_start(sample_dir)
use_dir = self.local_start(self.base_dir)
Use use_dir when defining the script, but use self.base_dir when assigning to self.sample_data (see the templates for examples).
Finally, add the following line before
Note that the above procedure enables the user to decide whether to run locally by adding the local parameter to the step parameter block in the parameter file!
When programming a module, the programmer usually has certain requirements of the user: parameters that must be set in the parameter file, sets of parameters from which the user has to choose, and parameters which can take only specific values.
Conditions of this kind are typically programmed in Python using assertions.
In NeatSeq-Flow, assertions are managed with the
AssertionExcept exception class. For testing the parameters, create an
if condition which raises an
AssertionExcept. The arguments to
AssertionExcept are as follows:
- An error message to be displayed. AssertionExcept will automatically add the step name to the message.
- Optional: The sample name, in case the condition failed for a particular sample (e.g. a particular sample does not have a BAM file defined).
A typical condition testing code snippet:
for sample in self.sample_data["samples"]:
    if not CONDITION:
        raise AssertionExcept("INFORMATIVE error message\n", sample)
The reason for using if not CONDITION rather than if CONDITION is that the condition is then a condition for success rather than for failure, which is more intuitive (for me at least).
If you only want to warn the user about a certain issue, rather than failing, you can induce NeatSeq-Flow to produce a warning message with the same format as an
AssertionExcept message, as follows:
for sample in self.sample_data["samples"]:
    if CONDITION:
        self.write_warning("Warning message.\n", sample)
The sample argument is optional.
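The two patterns can be tried outside NeatSeq-Flow with the sketch below. Both `AssertionExcept` and `write_warning()` are hypothetical stand-ins that only mirror the arguments described above: the real ones are supplied by NeatSeq-Flow and format the message with the step name themselves. The `warnings` list and the specific conditions checked are assumptions chosen for illustration.

```python
class AssertionExcept(Exception):
    """Stand-in with the two arguments described above (illustration only)."""

    def __init__(self, comment, sample=None):
        super().__init__(comment)
        self.comment = comment
        self.sample = sample


class Step_mymodule:
    def __init__(self, name, sample_data):
        self.name = name
        self.sample_data = sample_data
        self.warnings = []  # hypothetical: collected instead of printed

    def write_warning(self, comment, sample=None):
        # Stand-in: store the warning together with step and sample names.
        where = "step %s" % self.name
        if sample is not None:
            where += ", sample %s" % sample
        self.warnings.append("Warning (%s): %s" % (where, comment))

    def step_sample_initiation(self):
        for sample in self.sample_data["samples"]:
            # Condition for success: the required input must be defined.
            if "fasta.nucl" not in self.sample_data[sample]:
                raise AssertionExcept("No nucleotide fasta file defined.\n",
                                      sample)
            # Non-fatal issue: warn and carry on.
            if "bam" in self.sample_data[sample]:
                self.write_warning("Existing BAM entry will be overwritten.\n",
                                   sample)
```

A sample without a `fasta.nucl` entry stops the step with an exception that carries the sample name, whereas a sample that already has a `bam` entry only adds a warning.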