.. terminus documentation master file, created by
   sphinx-quickstart on Mon Feb 3 15:38:59 2020.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to terminus's documentation!
====================================

What is Terminus
----------------

Terminus is a tool that implements a new method for analyzing transcript-level abundance estimates from RNA-seq data. Terminus works downstream of salmon and collapses individual transcripts into groups whose total transcriptional output can be estimated more accurately and robustly.

Requirements
------------

Terminus uses the `cargo <https://github.com/rust-lang/cargo>`_ build system and package manager. To build terminus from source, you will need to have rust (ideally v1.40 or greater) installed. Then, you can build terminus by executing:

.. code-block:: bash

    cargo build --release

Input to terminus
-----------------

Terminus expects salmon to be run on the raw fastq files with two non-default options enabled, ``--numGibbsSamples`` and ``-d``. A typical salmon run suited to terminus looks as follows:

.. code-block:: bash

    salmon quant -la -i -1 -2 --numGibbsSamples 100 -d -o

This step can be repeated for each sample if you wish to run terminus on multiple samples.

Terminus Grouping
-----------------

In the first step, terminus groups the transcripts per experiment. This step deals with each experiment independently. To run grouping, invoke terminus with the following command:

.. code-block:: bash

    target/release/terminus group -m <> --tolerance <> -d input_dir/ -o output_dir

The above command expects ``input_dir/`` to contain all the files that are written by salmon, along with the ``bootstraps`` directory and the equivalence class file. Note that it *assumes* that ``experiment`` is the name of the experiment inside ``input_dir``. Terminus creates a directory named ``experiment`` in ``output_dir``.
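
Because grouping handles one experiment at a time, running it on several samples means repeating the command once per salmon output directory. The following is a minimal sketch of such a loop, not part of terminus itself. It assumes that every sample's salmon output is a subdirectory of a single root directory; the function name ``group_all``, its argument order, and the ``TERMINUS`` override variable are hypothetical names introduced only for illustration. The values for ``-m`` and ``--tolerance`` are left for the caller to supply.

```shell
#!/bin/sh
# Sketch: run `terminus group` once per salmon output directory.
# Assumptions (not taken from terminus itself): the names used here, and a
# layout where each sample's salmon output is a subdirectory of one root.

# Usage: group_all SALMON_ROOT OUTPUT_DIR MIN_SPREAD TOLERANCE
group_all() {
    salmon_root=$1
    output_dir=$2
    min_spread=$3
    tolerance=$4
    # TERMINUS may be overridden, e.g. TERMINUS=echo for a dry run.
    terminus_bin=${TERMINUS:-target/release/terminus}

    for sample_dir in "$salmon_root"/*/; do
        # Skip the unexpanded pattern when the root has no subdirectories.
        [ -d "$sample_dir" ] || continue
        # Stop at the first sample for which grouping fails.
        "$terminus_bin" group -m "$min_spread" --tolerance "$tolerance" \
            -d "$sample_dir" -o "$output_dir" || return 1
    done
}
```

Setting ``TERMINUS=echo`` before calling ``group_all`` prints the arguments that would be passed for each sample instead of running terminus, which is a cheap way to check the directory layout first.
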
After a successful run there will be a ``groups.txt`` file written in ``output_dir/experiment``. This step is trivially parallelizable with ``gnu parallel`` or similar tools.

Grouping options
----------------

* ``-d``: Input directory where the salmon files are written.
* ``-m``: The threshold value for `min-spread`. For a posterior distribution, spread is defined as (max - min)/mean.
* ``-o``: Output directory in which a folder named after the root name of the experiment is created.
* ``--tolerance``: The allowable difference between the weight vectors of transcripts for them to be considered identical.
* ``--seed``: An integer seed, used for all random number generation. The default seed is `10`.

Terminus Collapsing
-------------------

Once the groups have been written to the directory specified above, the collapsing operation collapses the groups and creates a `consensus` group that is common across all the experiments.

.. code-block:: bash

    target/release/terminus collapse -c <> -d ... -o output_dir

The results of this step are written to `output_dir` in the corresponding folders `experiment_1`, `experiment_2`, etc.

Collapsing options
------------------

* ``-d``: Input directory where all the root-level salmon directories are present.
* ``-c``: The consensus threshold, which determines the fraction of experiments in which a group has to appear. A value of `0.5` means that the final groups are present in at least half of the experiments.

Output
------

At the end of the above two steps, the final directory contains the experiment directories, laid out in the *exact* same way salmon writes its output. The number of transcripts in the final output, however, is `total_number_of_transcripts - transcripts_collapsed + groups`.

Contents:

.. toctree::
   :maxdepth: 2

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`