Welcome to terminus’s documentation!

What is Terminus

Terminus is a tool that implements a new method for analyzing transcript-level abundance estimates from RNA-seq data. Terminus works downstream of salmon, and collapses individual transcripts into groups whose total transcriptional output can be estimated more accurately and robustly.

Requirements

Terminus uses the [cargo](https://github.com/rust-lang/cargo) build system and package manager. To build terminus from source, you need Rust (ideally v1.40 or greater) installed. You can then build terminus by executing:

cargo build --release

Input to terminus

Terminus expects salmon to be run on the raw FASTQ files with two non-default options enabled: --numGibbsSamples and -d. A typical salmon run suited to terminus looks like this:

salmon quant -la -i <index> -1 <fastq_file(s)> -2 <fastq_file(s)> --numGibbsSamples 100 -d -o <output>

To run terminus on multiple samples, run this salmon step once for each sample.
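
Running salmon per sample can be scripted. The sketch below is only an illustration: the helper name salmon_cmds, the index and directory names, and the <sample>_1.fastq.gz / <sample>_2.fastq.gz naming are all assumptions, not something terminus or salmon requires. It prints the commands instead of running them, so they can be inspected first.

```shell
# Print (do not run) one salmon command per sample.
# Assumed, hypothetical layout: <reads_dir>/<sample>_1.fastq.gz and <reads_dir>/<sample>_2.fastq.gz
salmon_cmds() {
    index="$1"; reads_dir="$2"; out_dir="$3"
    shift 3
    for sample in "$@"; do
        # Same options as the salmon command shown above; only the paths vary per sample.
        printf 'salmon quant -la -i %s -1 %s/%s_1.fastq.gz -2 %s/%s_2.fastq.gz --numGibbsSamples 100 -d -o %s/%s\n' \
            "$index" "$reads_dir" "$sample" "$reads_dir" "$sample" "$out_dir" "$sample"
    done
}

# Example with made-up names: prints two commands.
salmon_cmds salmon_index reads quants sample_1 sample_2
```

Once the printed commands look right, piping the output to sh executes them one after another.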

Terminus Grouping

In the first step, terminus groups the transcripts within each experiment, handling each experiment independently. To run grouping, invoke terminus as follows:

target/release/terminus group -m <> --tolerance <> -d input_dir/<experiment> -o output_dir

The above command expects input_dir/<experiment> to contain all the files written by salmon, including the bootstraps directory and the equivalence class file. Terminus takes <experiment>, the last component of that path, as the experiment name and creates a directory with that name in output_dir. After a successful run, output_dir/<experiment> contains a groups.txt file.

This step is trivially parallelizable across experiments with GNU parallel or a similar tool.
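
One possible way to fan the grouping step out is sketched here. The helper group_cmds and the directory names are hypothetical, and the -m and --tolerance values are left for you to supply. The helper only prints one terminus group command per experiment; GNU parallel, when given no command of its own, runs each input line as a command.

```shell
# Print (do not run) one `terminus group` command per experiment.
# Arguments: <min_spread> <tolerance> <input_dir> <output_dir> <experiment>...
group_cmds() {
    min_spread="$1"; tolerance="$2"; input_dir="$3"; output_dir="$4"
    shift 4
    for experiment in "$@"; do
        printf 'target/release/terminus group -m %s --tolerance %s -d %s/%s -o %s\n' \
            "$min_spread" "$tolerance" "$input_dir" "$experiment" "$output_dir"
    done
}

# Usage sketch (MIN_SPREAD and TOLERANCE are placeholders you must set yourself):
#   group_cmds "$MIN_SPREAD" "$TOLERANCE" quants terminus_out sample_1 sample_2 | parallel -j 4
```

Because each experiment is handled independently, the order in which the commands finish does not matter.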

Grouping options

  • -d: Input directory where the salmon output files are written.
  • -m: The min-spread threshold. For a posterior distribution, spread is defined as (max - min)/mean.
  • -o: Output directory, in which a folder named after the experiment is created.
  • --tolerance: The allowable difference between the weight vectors of transcripts for them to be considered identical.
  • --seed: An integer seed used for all random number generation. The default seed is 10.
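
To make the spread definition concrete, here is a small worked example. It is only an illustration of the (max - min)/mean formula on a handful of made-up posterior samples; the helper name spread is hypothetical and this is not how terminus computes it internally.

```shell
# Compute (max - min) / mean over the numbers given as arguments.
spread() {
    printf '%s\n' "$@" | awk '
        NR == 1 { min = $1; max = $1 }
        { if ($1 < min) min = $1; if ($1 > max) max = $1; sum += $1 }
        END {
            mean = sum / NR
            if (mean == 0) { print "undefined"; exit }   # avoid dividing by zero
            printf "%.4f\n", (max - min) / mean
        }'
}

spread 8 10 12    # (12 - 8) / 10 → 0.4000
```

A tightly concentrated posterior gives a spread near 0, while a wide one gives a larger value.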

Terminus Collapsing

Once the groups have been written as described above, the collapse operation collapses them and creates a consensus set of groups that is common across all the experiments.

target/release/terminus collapse -c <> -d <input_dir/experiment_1> <input_dir/experiment_2> ... -o output_dir

The results of this step are written to output_dir, in the corresponding folders experiment_1, experiment_2, etc.
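
With many experiments, the collapse command line gets long. The following sketch assembles it from a list of experiment names; the helper collapse_cmd and the directory names are hypothetical, and it prints the command instead of running it.

```shell
# Print (do not run) a single `terminus collapse` command covering all experiments.
# Arguments: <consensus_threshold> <input_dir> <output_dir> <experiment>...
collapse_cmd() {
    consensus="$1"; input_dir="$2"; output_dir="$3"
    shift 3
    dirs=""
    for experiment in "$@"; do
        dirs="$dirs $input_dir/$experiment"   # one salmon directory per experiment
    done
    printf 'target/release/terminus collapse -c %s -d%s -o %s\n' "$consensus" "$dirs" "$output_dir"
}

# Example with made-up names and a consensus threshold of 0.5:
collapse_cmd 0.5 quants terminus_out sample_1 sample_2
```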

Collapsing options

  • -d: The root-level salmon output directories, one per experiment.
  • -c: The consensus threshold, which determines the fraction of experiments in which a group must appear. A value of 0.5 means that the final groups are present in at least half of the experiments.
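
As a rough illustration of what a consensus threshold implies, the sketch below converts a threshold and an experiment count into the smallest number of experiments that meets it. It assumes that "at least this fraction" is read as a ceiling; that reading and the helper name min_experiments are assumptions, not taken from the terminus source.

```shell
# Smallest whole number of experiments k with k / n >= c (assumed interpretation).
min_experiments() {
    awk -v c="$1" -v n="$2" 'BEGIN {
        m = c * n
        k = int(m)
        if (k < m) k++    # round up when c * n is not a whole number
        print k
    }'
}

min_experiments 0.5 4    # → 2
min_experiments 0.5 5    # → 3
```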

Output

At the end of the above two steps, the final directory contains one directory per experiment, laid out exactly as salmon writes its output. The number of transcripts in the final output, however, is total_number_of_transcripts - transcripts_collapsed + groups.
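
A quick arithmetic check of that count, using made-up numbers: if 1000 transcripts go in and 7 of them are collapsed into 3 groups, the output has 996 entries. The helper name final_count is hypothetical.

```shell
# total_number_of_transcripts - transcripts_collapsed + groups
final_count() {
    echo $(( $1 - $2 + $3 ))
}

final_count 1000 7 3    # 1000 - 7 + 3 → 996
```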
