
Commit 6f23dc09 authored by Holger Brandl

added workflow and region model

parent e499b1fc
# Created by .ignore support plugin (hsz.mobi)
### JetBrains template
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and Webstorm
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839
# User-specific stuff:
.idea/**/workspace.xml
.idea/**/tasks.xml
.idea/dictionaries
# Sensitive or high-churn files:
.idea/**/dataSources/
.idea/**/dataSources.ids
.idea/**/dataSources.xml
.idea/**/dataSources.local.xml
.idea/**/sqlDataSources.xml
.idea/**/dynamic.xml
.idea/**/uiDesigner.xml
# Gradle:
.idea/**/gradle.xml
.idea/**/libraries
# CMake
cmake-build-debug/
cmake-build-release/
# Mongo Explorer plugin:
.idea/**/mongoSettings.xml
## File-based project format:
*.iws
## Plugin-specific files:
# IntelliJ
out/
# mpeltonen/sbt-idea plugin
.idea_modules/
# JIRA plugin
atlassian-ide-plugin.xml
# Cursive Clojure plugin
.idea/replstate.xml
# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties
fabric.properties
......@@ -6,22 +6,30 @@ This repo contains scripts used to validate the genomic qPCR data from the paper
Marta Florio, Michael Heide, Anneline Pinson, Holger Brandl, Mareike Albert, Sylke Winkler, Pauline Wimberger, Wieland B. Huttner and Michael Hiller<br>
## Materials & Methods summary taken from the publication:
Paired-End data were trimmed using cutadapt (v1.15; -m 20 -q 25 -a file:${Ill_ADAPTERS} -A file:${Ill_ADAPTERS}) and mapped with STAR (v2.5.2b; --alignSJoverhangMin 100 --outFilterType BySJout --sjdbGTFfile ${gtfFile}). bedtools intersect (v2.25.0) was used to determine the number of overlapping alignments at each locus of interest, and samtools flagstat was used to determine the library size. Final data integration and visualization was implemented using R.
Paired-End data were trimmed using cutadapt (v1.15; `-m 20 -q 25 -a file:${Ill_ADAPTERS} -A file:${Ill_ADAPTERS}`) and mapped with STAR (v2.5.2b; `--alignSJoverhangMin 100 --outFilterType BySJout --sjdbGTFfile ${gtfFile}`). bedtools intersect (v2.25.0) was used to determine the number of overlapping alignments at each locus of interest, and samtools flagstat was used to determine the library size. Final data integration and visualization was implemented using R.
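The tool invocations named in the summary above could be wrapped roughly as follows. This is a minimal sketch, not the contents of `quantify_offtargets.sh`: the function names, the positional file arguments, and the `starIndex` variable are hypothetical. The STAR input/output options and the `-c` counting mode of `bedtools intersect` are assumptions added to make the calls complete. Only the cutadapt and STAR settings quoted in the summary come from the publication, and STAR options are written here with the two leading dashes that STAR expects.

```shell
#!/bin/sh
# Sketch only: function names, file arguments and ${starIndex} are hypothetical.

# Trim one read pair with the cutadapt settings quoted in the summary.
# $1/$2: raw R1/R2 FASTQ, $3/$4: trimmed R1/R2 output
trim_reads() {
    cutadapt -m 20 -q 25 \
        -a "file:${Ill_ADAPTERS}" -A "file:${Ill_ADAPTERS}" \
        -o "$3" -p "$4" "$1" "$2"
}

# Map a trimmed read pair with the STAR settings quoted in the summary.
# $1/$2: trimmed gzipped R1/R2 FASTQ, $3: output file prefix
align_reads() {
    STAR --genomeDir "${starIndex}" \
        --readFilesIn "$1" "$2" --readFilesCommand zcat \
        --alignSJoverhangMin 100 --outFilterType BySJout \
        --sjdbGTFfile "${gtfFile}" \
        --outSAMtype BAM SortedByCoordinate \
        --outFileNamePrefix "$3"
}

# Count alignments overlapping each locus (assumed: -c reports one count per BED record).
# $1: loci in BED format, $2: alignments in BAM format
count_loci() {
    bedtools intersect -c -a "$1" -b "$2"
}

# Report alignment statistics from which the library size can be read.
# $1: alignments in BAM format
library_size() {
    samtools flagstat "$1"
}
```

Keeping each tool behind a small function makes the parameters from the publication easy to audit in one place, whatever scheduler or loop ends up calling them.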
We share these computational protocols in the spirit of open data and reproducible research. So feel welcome to provide comments, report errors, or suggest improvements.
## Contained Workflow
## Workflow
[`quantify_offtargets.sh`](./quantify_offtargets.sh) contains all performed steps
**This is currently a place holder and the workflow will be added within the next few days**
1. Data DL, QC, and Trimming
2. Alignment & Locus count intersection
3. Off-target ratio calculation and reporting
The region model used is also included in [ortho_model_hsap_ppan_ptro.fixed.txt](./ortho_model_hsap_ppan_ptro.fixed.txt)
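To feed the region model into an interval tool such as bedtools, it first has to be turned into BED records for one species. The sketch below shows one way to do that, assuming the model is a whitespace-separated table with the six columns `external_gene_name`, `ensembl_gene_id`, `chromosome_name`, `start_position`, `end_position` and `species`, and that its coordinates are 1-based and inclusive, as is usual for Ensembl. The function name `model_to_bed` is hypothetical and is not claimed to be part of `quantify_offtargets.sh`.

```shell
#!/bin/sh
# Sketch only: model_to_bed is a hypothetical helper.

# Convert the rows of one species into 4-column BED (chrom, start, end, name).
# $1: region model file, $2: species code (hsap, ptro or ppan)
model_to_bed() {
    awk -v sp="$2" '
        NR == 1 { next }                          # skip the header row
        $6 == sp && $3 != "NA" && $4 != "NA" {    # drop genes without coordinates
            # BED starts are 0-based, so shift the 1-based start by one
            printf "%s\t%d\t%d\t%s\n", $3, $4 - 1, $5, $1
        }' "$1"
}
```

A call such as `model_to_bed ortho_model_hsap_ppan_ptro.fixed.txt hsap > hsap_loci.bed` would then yield one BED line per annotated human locus. Chromosome names are passed through unchanged, so they may need a prefix such as `chr` if the reference genome uses one.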
## Usage & Disclaimer
The scripts are provided as is, without the intention that an interested reader will be able to run them directly. We share our protocol here, not a ready-to-use tool. Still, we think they should provide enough technical detail to allow replicating our analysis.
The scripts are provided as is under the [MIT License](https://opensource.org/licenses/MIT), without the intention that an interested reader will be able to run them directly. We share our protocol here, not a ready-to-use tool. Still, we think they should provide enough technical detail to allow replicating our analysis.
This repo does not detail the steps needed to set up a Linux environment with the listed tools and dependencies. Required tools include recent versions of R, samtools, STAR, bedtools, as well as:
* https://git.mpi-cbg.de/bioinfo/ngs_tools
* https://git.mpi-cbg.de/bioinfo/ngs_tools, which is a collection of deep-sequencing utilities. The version used to build the data was `6747f0add32ba9bc41a3cd04de72dad69afdbb6d`
* https://github.com/holgerbrandl/joblist, which is an HPC task manager
* [rend.R](https://github.com/holgerbrandl/datautils/tree/master/tools/rendr) which is a wrapper around `knitr`
## Support
......
external_gene_name ensembl_gene_id chromosome_name start_position end_position species
ARHGAP11A ENSG00000198826 15 32615144 32639949 hsap
ARHGAP11B ENSG00000187951 15 30624494 30772993 hsap
FAM72A ENSG00000196550 1 206186179 206204414 hsap
FAM72B ENSG00000188610 1 121167646 121185539 hsap
FAM72C ENSG00000263513 1 143955364 143971965 hsap
FAM72D ENSG00000215784 1 145096000 145112696 hsap
GTF2H2 ENSG00000145736 5 71032670 71067689 hsap
GTF2H2B ENSG00000226259 5 70415352 70448015 hsap
GTF2H2C ENSG00000183474 5 69560208 69594723 hsap
SMN1 ENSG00000172062 5 70925030 70953942 hsap
SMN2 ENSG00000205571 5 70049612 70078522 hsap
STX12 ENSG00000117758 1 27773183 27824452 hsap
ARHGAP11A ENSPTRG00000006867 15 29379131 29403889 ptro
ARHGAP11B NA NA NA NA ptro
FAM72A ENSPTRG00000023098 1 185286022 185302699 ptro
FAM72B NA NA NA NA ptro
FAM72C NA NA NA NA ptro
FAM72D NA NA NA NA ptro
GTF2H2 ENSPTRG00000016953 5 45635974 45669662 ptro
GTF2H2C NA NA NA NA ptro
SMN1 ENSPTRG00000016955 5 45526056 45552822 ptro
SMN2 NA NA NA NA ptro
STX12 ENSPTRG00000000416 1 27781808 27833481 ptro
ARHGAP11A NA 15 29947016 29972748 ppan
ARHGAP11B NA NA NA NA ppan
FAM72A NA 1 186075381 186091543 ppan
FAM72B NA NA NA NA ppan
FAM72C NA NA NA NA ppan
FAM72D NA NA NA NA ppan
GTF2H2 NA 5 45985077 46028618 ppan
GTF2H2C NA NA NA NA ppan
SMN1 NA 5 45876144 45903268 ppan
SMN2 NA NA NA NA ppan
STX12 ENSPPAG00000041257 1 28076625 28130032 ppan
\ No newline at end of file