ngs_tools issues
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues
Updated: 2019-02-12T10:35:31Z

Issue 1: make sure that exported tables contain double-normalized counts and not just size-normalized ones
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/1
Updated: 2019-02-12T10:35:31Z
Author: brandl (brandl@mpi-cbg.de)
Assignee: brandl (brandl@mpi-cbg.de)

Issue 6: Add pcr duplication analysis for rna-seq workflow
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/6
Updated: 2017-11-09T15:39:52Z
Author: brandl (brandl@mpi-cbg.de)

https://www.bioconductor.org/packages/release/bioc/html/dupRadar.html
@lakshman what do you think

Issue 7: finish panther enrichment and reintegrate into cp_enrichment.R
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/7
Updated: 2020-02-20T16:06:56Z
Author: brandl (brandl@mpi-cbg.de)

see
/Volumes/furiosa/bioinfo/scripts/ngs_tools/dev/common/panther_enrichment.R
/Volumes/furiosa/bioinfo/scripts/ngs_tools/dev/common/cp_enrichment.R

Issue 8: create tool to create igv session file list of bam files
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/8
Updated: 2017-07-08T23:44:42Z
Author: brandl (brandl@mpi-cbg.de)

@lakshman fyi https://github.com/igvteam/igv/issues/234

Issue 9: Export counts as TPM in featcounts_deseq_mf.R
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/9
Updated: 2017-07-08T23:44:42Z
Author: brandl (brandl@mpi-cbg.de)

They are better than fpkm according to Mathias and allow for better comparison across samples

Assignee: brandl (brandl@mpi-cbg.de)

Issue 10: use http://multiqc.info/ for qc reporting
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/10
Updated: 2019-11-18T09:06:51Z
Author: brandl (brandl@mpi-cbg.de)

Issue 11: Integrate tools to assess gc/3/5 bias, insert size, and capture efficiency
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/11
Updated: 2020-02-20T12:42:13Z
Author: brandl (brandl@mpi-cbg.de)

e.g. using http://deeptools.readthedocs.io/en/latest/content/tools/computeGCBias.html?highlight=bias
(see "Multi-perspective quality control of Illumina RNA sequencing data analysis" http://bfg.oxfordjournals.org/content/early/2016/09/28/bfgp.elw035.abstract )
@lakshman opinion?

Issue 12: Add option to perform diffex-test in mf_feat_counts.R using fc-cutoff > 0.
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/12
Updated: 2017-06-19T07:58:38Z
Author: brandl (brandl@mpi-cbg.de)
Assignee: brandl (brandl@mpi-cbg.de)

Issue 14: export TPM instead of FPKM tables
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/14
Updated: 2017-06-19T11:19:42Z
Author: brandl (brandl@mpi-cbg.de)
Assignee: herseman

Issue 56: dge_workflow: echoing R code in to Rscript not loading personal library packages
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/56
Updated: 2019-06-20T09:09:14Z
Author: domingue

The function `dge_star_counts2matrix`, located in [dge_utils](https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/dge_utils.sh#L364) stops with the error:
```bash
devtools::source_url("https://git.mpi-cbg.de/bioinfo/datautils/raw/v1.40/R/core_commons.R")
Error in loadNamespace(name) : there is no package called ‘devtools’
Calls: :: ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
Execution halted
```
I tracked it down: this occurs because of the `--vanilla` flag in
`| R --vanilla -q`. This flag keeps the R environment "clean", but that also means R packages installed in personal libraries are not found, hence the error despite `devtools` being installed.
There are two (three) solutions for this:
1. Replace the `--vanilla` flag with `--no-save --no-restore --no-site-file`, since `--vanilla` is in fact shorthand for `--no-save`, `--no-restore`, `--no-site-file`, `--no-init-file` and `--no-environ`. Removing the flags that cause the issue works (I tested it).
2. Instead of echoing the `R` code (`echo '[some code]' | R --no-save --no-restore --no-site-file -q`), we could use the approach `Rscript - <<"EOF" [some code] EOF`. This was also tested and also works, but I have not looked into unintended consequences (loading of hidden R files, for instance).
3. More of a long-term solution, and probably not feasible: keep these R snippets in their own `.R` files, or as functions, and call them with `Rscript some_function.R`.
For the time being I would suggest either 1 or 2. @herseman If you have any preference let me know so that I can make the PR.
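
Option 2 could look roughly like the sketch below. It is only an illustration: the function name `run_r_snippet` and the R one-liner are made up for this example and are not taken from `dge_utils.sh`, and it assumes `Rscript` is on the `PATH`. Printing `.libPaths()` is a quick way to check whether the personal library is visible to the R process.

```bash
# Sketch of option 2: feed R code to Rscript through a quoted heredoc
# instead of `echo '...' | R --vanilla -q`.
# `run_r_snippet` is a hypothetical helper name, not part of dge_utils.sh.
run_r_snippet() {
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH" >&2
    return 127
  fi
  # Quoting the delimiter ("EOF") stops the shell from expanding $ or
  # backticks inside the R code.
  Rscript - <<"EOF"
cat(.libPaths(), sep = "\n")
EOF
}

run_r_snippet || echo "R snippet did not run (exit code $?)" >&2
```

If the personal library directory shows up in the output, packages such as `devtools` installed there should be loadable from snippets run this way.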
---
A consequence of this bug, and that was how I found it, is that `star_align.kts` will not produce the count matrix table and, afaik, still finishes successfully with a reported warning.

Assignee: domingue

Issue 57: dge workflow: add GTF as an argument for differential gene expression
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/57
Updated: 2019-06-28T17:51:47Z
Author: domingue

Relates to the [DGE analysis](https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/featcounts_deseq_mf.R) and it would have two uses:
1. retrieval of "accurate" gene lengths, exonic regions only, to calculate RPKM and FPM using `DESeq2` in-built functionality (more details [here](https://www.rdocumentation.org/packages/DESeq2/versions/1.12.3/topics/fpkm))
2. GTFs already contain a wealth of information which currently needs to be retrieved with `biomaRt`. Getting it from the GTF would make the process faster and more reproducible (in my experience `biomaRt` changes quite often), and it would work even for organisms not present in biomart or other marts (e.g. planaria).

Assignees: herseman, domingue

Issue 67: expression_explorer: hardcoded Rscript path
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/67
Updated: 2019-08-21T11:19:33Z
Author: domingue

The path to `Rscript` is [hard coded](https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/expression_explorer/expression_explorer#L5), which means that if this binary is in a different path it will not be found:
```bash
/usr/local/bin/Rscript -<<"EOF" ${SCRIPT_DIRECTORY}
```
I encountered the issue because my local linux installation stores it in `/usr/bin/Rscript`. It is at a different path on `falcon`, and if anyone uses `conda` it is likely to be somewhere else again.
A solution, tested on ubuntu 18.04, is to replace the line with this:
```bash
$(which Rscript) -<<"EOF" ${SCRIPT_DIRECTORY}
```
which will find the path to `Rscript` wherever it might be. Since most users use OSX I am not sure how big a problem this is, but the bug fix would solve it. Shall I go ahead and make the change?

Assignee: domingue

Issue 70: cut delimiter breaks fastqc_summary.R
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/70
Updated: 2019-09-09T14:56:18Z
Author: domingue

In this line:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/common/fastqc_summary.R#L122
`-d'\t'` is unnecessary because `cut` uses tab as its default delimiter, and it breaks the script because the delimiter must be a single character.

Assignee: domingue

Issue 71: Grep error message crashes the script
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/71
Updated: 2019-09-09T14:58:04Z
Author: domingue

This function also has a grep related error:
```R
readBaseQualDist = function(statsFile){
# statsFile="./fastqc/mouse_big_cyst_rep2_fastqc/fastqc_data.txt"
# statsFile="./fastqc/mouse_liver_polar_stage3_rep2_fastqc/fastqc_data.txt"
# grep -A30 -F '>>Per base sequence quality' /Volumes/projects/bioinfo/holger/projects/helin/mouse/fastqc/mouse_big_cyst_rep1_fastqc/fastqc_data.txt | grep -B100 -F '>>END_M' | head -n-1 | tail -n+2 | tr '#' ' '
# echo("reading", statsFile)
baseStats = read.delim(pipe(
#http://stackoverflow.com/questions/1946363/how-do-i-display-data-from-the-beginning-of-a-file-until-the-first-occurence-of/1947950#1947950
paste(get_zip_pipe(statsFile, "fastqc_data.txt"), " | grep -A200 -F '>>Per base sequence quality' | perl -pe 'last if />>END_MODULE/' | head -n-2 | tail -n+2 | tr '#' ' '")
)) %>% mutate(
run=trim_ext(basename(statsFile), ".zip")
)
baseStats %>% mutate(base_order=1:n())
}
grep: write error: Broken pipe
```
The error comes from this [line](https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/common/fastqc_summary.R#L148). The issue has been explained elsewhere:
> grep is complaining because it has more output than 10 lines, and head is cutting it off before it finishes
> I suggest hiding grep's stderr output (this is where the broken pipe error is printed).
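
A minimal sketch of the quoted suggestion, using made-up sample data in place of a real `fastqc_data.txt`: grep's stderr is redirected to `/dev/null`, so a `Broken pipe` message raised when a downstream command exits early is no longer shown. The function names and sample values are hypothetical, and the pipeline is a simplified stand-in (with `sed` in place of the `perl` and `head` steps), not the code in `fastqc_summary.R`.

```bash
# Hypothetical stand-in for a fastqc_data.txt file; the module markers
# mimic FastQC's ">>Name ... >>END_MODULE" layout.
make_stats() {
  printf '>>Basic Statistics\tpass\n>>END_MODULE\n'
  printf '>>Per base sequence quality\tpass\n#Base\tMean\n1\t32.1\n2\t31.8\n>>END_MODULE\n'
  printf '>>Per tile sequence quality\tpass\n>>END_MODULE\n'
}

# Extract the rows of one module from stdin.
extract_module() {
  # 2>/dev/null hides grep's "write error: Broken pipe" complaint, which can
  # appear because the first sed quits as soon as it sees >>END_MODULE.
  grep -A200 -F '>>Per base sequence quality' 2>/dev/null \
    | sed '/>>END_MODULE/q' \
    | sed '$d' \
    | tail -n +2 \
    | tr '#' ' '
}

result=$(make_stats | extract_module)
printf '%s\n' "$result"
```

Note that discarding stderr also hides genuine grep errors (for example an unreadable input), so it is a trade-off rather than a complete fix.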
I will try this.

Assignee: domingue

Issue 72: Changes in ggplot break boxplot in fastqc_summary.R
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/72
Updated: 2019-09-09T14:57:39Z
Author: domingue

In particular this plot:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/common/fastqc_summary.R#L187
with the error:
> Error: Can't draw more than one boxplot per group. Did you forget aes(group = ...)?
This has been noticed before:
https://stackoverflow.com/questions/57192727/getting-an-error-that-ggplot2-3-2-0-cant-draw-more-than-one-boxplot-per-group

Assignee: domingue

Issue 73: Implement renvs
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/73
Updated: 2020-02-20T14:49:29Z
Author: domingue

It started with packrat in https://git.mpi-cbg.de/bioinfo/ngs_tools/issues/53# and after some testing I decided that it was worth adding as a function to our workflow. Still under testing.

Assignee: domingue

Issue 74: dge_workflow: fix lcf help
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/74
Updated: 2019-12-02T14:43:31Z
Author: domingue

Right now it says that the value is used as a cut-off for reporting genes, which is incorrect:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/featcounts_deseq_mf.R#L40
In fact this value is fed into the statistical modelling leading to a more strict set of results. The help should reflect that.
See:
- https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#tests-of-log2-fold-change-above-or-below-a-threshold
- https://support.bioconductor.org/p/101504/

Assignee: domingue

Issue 75: dge_workflow: future improvements
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/75
Updated: 2020-07-20T08:54:19Z
Author: domingue

Whilst https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/featcounts_deseq_mf.R works well, it was developed a long time ago and some of the `DESeq2` functionality and best practices have changed. So has what we might want to add to or remove from the script.
Here we should list the things that we would like to change if we consider creating a new script for analysis.

Assignees: herseman, domingue

Issue 76: iGenome permissions
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/76
Updated: 2020-02-20T14:52:43Z
Author: domingue

Some of the iGenomes folders lack group write permissions, which means that only the owner can add / remove / change files. Is this by design, or was it an accident?

Assignees: henry, herseman, domingue

Issue 81: dge_workflow: analyze duplicates has a broken link to the plot of the model (dupRadar)
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/81
Updated: 2020-02-10T13:08:45Z
Author: domingue

Right now it is a link to a non-permanent location (image hosting site). It should be replaced by a more stable location to avoid issues in the future.

Assignee: domingue