@@ -6,22 +6,30 @@ This repo contains scripts used to validate the genomic qPCR data from the paper
Marta Florio, Michael Heide, Anneline Pinson, Holger Brandl, Mareike Albert, Sylke Winkler, Pauline Wimberger, Wieland B. Huttner and Michael Hiller<br>
## Materials & Methods summary taken from the publication:
Paired-End data were trimmed using cutadapt (v1.15; -m 20 -q 25 -a file:${Ill_ADAPTERS} -A file:${Ill_ADAPTERS}) and mapped with STAR (v2.5.2b; --alignSJoverhangMin 100 --outFilterType BySJout --sjdbGTFfile ${gtfFile}). bedtools intersect (v2.25.0) was used to determine the number of overlapping alignments at each locus of interest, and samtools flagstat was used to determine the library size. Final data integration and visualization was implemented using R.
Paired-End data were trimmed using cutadapt (v1.15; `-m 20 -q 25 -a file:${Ill_ADAPTERS} -A file:${Ill_ADAPTERS}`) and mapped with STAR (v2.5.2b; `--alignSJoverhangMin 100 --outFilterType BySJout --sjdbGTFfile ${gtfFile}`). bedtools intersect (v2.25.0) was used to determine the number of overlapping alignments at each locus of interest, and samtools flagstat was used to determine the library size. Final data integration and visualization was implemented using R.
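As a rough illustration of how the steps in this summary could be chained, here is a minimal dry-run sketch. It only prints the command lines rather than executing them. The sample name, file names, STAR index directory, and BED file are hypothetical placeholders, and the output and input options (`-o`, `-p`, `--genomeDir`, `--readFilesIn`, `--readFilesCommand`, `--outSAMtype`, `--outFileNamePrefix`, `-c`) are assumptions added to make the commands complete; they are not taken from the publication. STAR options are written with the double-dash prefix that STAR's command line uses.

```bash
#!/usr/bin/env bash
# Dry-run sketch of the trimming, mapping, and counting steps.
# All file names below are hypothetical placeholders.
set -euo pipefail

sample="sampleA"
Ill_ADAPTERS="adapters.fa"   # adapter FASTA (assumed name)
gtfFile="genes.gtf"          # gene annotation (assumed name)
star_index="star_index"      # pre-built STAR genome index (assumed)
regions_bed="loci.bed"       # loci of interest in BED format (assumed)

# Print the command instead of running it; replace the body with "$@" to execute.
run() { echo "$*"; }

pipeline() {
  local bam="${sample}.Aligned.sortedByCoord.out.bam"

  # 1. Adapter and quality trimming of paired-end reads
  run cutadapt -m 20 -q 25 -a "file:${Ill_ADAPTERS}" -A "file:${Ill_ADAPTERS}" \
      -o "${sample}_1.trim.fastq.gz" -p "${sample}_2.trim.fastq.gz" \
      "${sample}_1.fastq.gz" "${sample}_2.fastq.gz"

  # 2. Mapping with the STAR options listed in the methods summary
  run STAR --genomeDir "${star_index}" \
      --readFilesIn "${sample}_1.trim.fastq.gz" "${sample}_2.trim.fastq.gz" \
      --readFilesCommand zcat \
      --alignSJoverhangMin 100 --outFilterType BySJout --sjdbGTFfile "${gtfFile}" \
      --outSAMtype BAM SortedByCoordinate --outFileNamePrefix "${sample}."

  # 3. Number of overlapping alignments per locus, and library size
  run bedtools intersect -c -a "${regions_bed}" -b "${bam}"
  run samtools flagstat "${bam}"
}

pipeline
```

Printing the commands first makes it easy to review the parameters before committing compute time; the actual steps performed for the paper are those in `quantify_offtargets.sh`.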
We share these computational protocols in the spirit of open data and reproducible research. So feel welcome to provide comments, report errors, or suggest improvements.
## Contained Workflow
## Workflow
[`quantify_offtargets.sh`](./quantify_offtargets.sh) contains all performed steps
**This is currently a placeholder; the workflow will be added within the next few days.**
1. Data download, QC, and trimming
2. Alignment & locus count intersection
3. Off-target ratio calculation and reporting
The region model used is also included under [ortho_model_hsap_ppan_ptro.fixed.txt](./ortho_model_hsap_ppan_ptro.fixed.txt)
## Usage & Disclaimer
The scripts are provided as is, without the intention that an interested reader will be able to run them directly. We share our protocol here, not a ready-to-use tool. Still, we think they should provide enough technical detail to allow replicating our analysis.
The scripts are provided as is under [MIT License](https://opensource.org/licenses/MIT), without the intention that an interested reader will be able to run them directly. We share our protocol here, not a ready-to-use tool. Still, we think they should provide enough technical detail to allow replicating our analysis.
This repo does not detail the steps needed to prepare a Linux environment with the listed tools and dependencies. Required tools include recent versions of R, samtools, STAR, bedtools, as well as:
* https://git.mpi-cbg.de/bioinfo/ngs_tools
* https://git.mpi-cbg.de/bioinfo/ngs_tools which is a collection of deep-sequencing utilities. The version used to build the data was `6747f0add32ba9bc41a3cd04de72dad69afdbb6d`
* https://github.com/holgerbrandl/joblist which is an HPC task manager
* [rend.R](https://github.com/holgerbrandl/datautils/tree/master/tools/rendr) which is a wrapper around `knitr`
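
Since environment setup is not covered here, a quick check along the following lines may help confirm that the command-line tools are on the `PATH` before attempting to follow the workflow. This is only a sketch: the helper name `missing_tools` is hypothetical, the tool list covers just the programs named in this README (including cutadapt from the methods summary), and the executables provided by `ngs_tools`, `joblist`, and `rend.R` are left out because their names are not stated here.

```bash
#!/usr/bin/env bash
# Print the name of every given tool that cannot be found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Programs named in this README (assumed executable names).
required=(R samtools STAR bedtools cutadapt)

missing="$(missing_tools "${required[@]}")"
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are joined onto one line.
  echo "Missing tools:" $missing
else
  echo "All required tools found on PATH."
fi
```

The check only tests for presence, not for the versions listed in the methods summary.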