ngs_tools issues
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues
2019-12-02T14:43:31Z

dge_workflow: fix lcf help
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/74
2019-12-02T14:43:31Z, domingue

Right now the help says that the value is used as a cutoff for reporting genes, which is incorrect:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/featcounts_deseq_mf.R#L40
In fact, this value is fed into the statistical modelling, which leads to a stricter set of results. The help should reflect that.
See:
- https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#tests-of-log2-fold-change-above-or-below-a-threshold
- https://support.bioconductor.org/p/101504/
Assignee: domingue

Implement renvs
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/73
2020-02-20T14:49:29Z, domingue

It started with packrat in https://git.mpi-cbg.de/bioinfo/ngs_tools/issues/53# and after some testing I decided that it was worth adding as a function to our workflow. Still under testing.
Assignee: domingue

Changes in ggplot break boxplot in fastqc_summary.R
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/72
2019-09-09T14:57:39Z, domingue

In particular this plot:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/common/fastqc_summary.R#L187
with the error:
> Error: Can't draw more than one boxplot per group. Did you forget aes(group = ...)?
This has been noticed before:
https://stackoverflow.com/questions/57192727/getting-an-error-that-ggplot2-3-2-0-cant-draw-more-than-one-boxplot-per-group
Assignee: domingue

Grep error message crashes the script
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/71
2019-09-09T14:58:04Z, domingue

This function also has a grep-related error:
```R
readBaseQualDist = function(statsFile){
    # statsFile="./fastqc/mouse_big_cyst_rep2_fastqc/fastqc_data.txt"
    # statsFile="./fastqc/mouse_liver_polar_stage3_rep2_fastqc/fastqc_data.txt"
    # grep -A30 -F '>>Per base sequence quality' /Volumes/projects/bioinfo/holger/projects/helin/mouse/fastqc/mouse_big_cyst_rep1_fastqc/fastqc_data.txt | grep -B100 -F '>>END_M' | head -n-1 | tail -n+2 | tr '#' ' '
    # echo("reading", statsFile)
    baseStats = read.delim(pipe(
        #http://stackoverflow.com/questions/1946363/how-do-i-display-data-from-the-beginning-of-a-file-until-the-first-occurence-of/1947950#1947950
        paste(get_zip_pipe(statsFile, "fastqc_data.txt"), " | grep -A200 -F '>>Per base sequence quality' | perl -pe 'last if />>END_MODULE/' | head -n-2 | tail -n+2 | tr '#' ' '")
    )) %>% mutate(
        run=trim_ext(basename(statsFile), ".zip")
    )
    baseStats %>% mutate(base_order=1:n())
}

grep: write error: Broken pipe
```
The error occurs in this [line](https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/common/fastqc_summary.R#L148). The issue has been explained elsewhere:
> grep is complaining because it has more output than 10 lines, and head is cutting it off before it finishes
> I suggest hiding grep's stderr output (this is where the broken pipe error is printed).
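
The suggestion quoted above could look roughly like the sketch below. It is an illustration under assumptions, not the script's actual fix: `extract_module` is a hypothetical helper, `awk` stands in for the `perl` step, the `head -n-2` step is left out, and the sample data is made up.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the body of one FastQC module, i.e. the lines
# between its ">>name" header and the next ">>END_MODULE".
# $1 = module name (fixed string), $2 = path to a fastqc_data.txt file
extract_module() {
    # 2>/dev/null hides "grep: write error: Broken pipe", which grep may print
    # when the next command stops reading before grep has finished writing.
    grep -A200 -F ">>$1" "$2" 2>/dev/null \
        | awk '/^>>END_MODULE/ { exit } { print }' \
        | tail -n +2 \
        | tr '#' ' '
}

# Made-up miniature input, for demonstration only.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '>>Basic Statistics\tpass\n>>END_MODULE\n' > "$sample"
printf '>>Per base sequence quality\tpass\n#Base\tMean\n1\t32.1\n2\t31.8\n>>END_MODULE\n' >> "$sample"
printf '>>Per tile sequence quality\tpass\n>>END_MODULE\n' >> "$sample"

# Prints the column header (with '#' replaced by a space) and the two data rows.
extract_module 'Per base sequence quality' "$sample"
```

Silencing stderr hides the message but not its cause: grep is still cut off when the reader stops early, so a pipeline run under `set -o pipefail` may still report a non-zero status.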
I will try this.
Assignee: domingue

cut delimiter breaks fastqc_summary.R
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/70
2019-09-09T14:56:18Z, domingue

In this line:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/common/fastqc_summary.R#L122
`-d'\t'` is not necessary because `cut` splits on tabs by default, and it breaks the script because a delimiter must be a single character.
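
The behaviour described above can be checked with a small experiment. This is only a sketch with a made-up record; the variable names are illustrative and nothing here is taken from `fastqc_summary.R`.

```bash
#!/usr/bin/env bash

# Made-up two-column, tab-separated record used only for this demonstration.
record=$(printf 'sampleA\t42')

# cut splits on TAB by default, so no -d option is needed.
first_default=$(printf '%s\n' "$record" | cut -f1)

# If the delimiter has to be spelled out, pass one real TAB character.
# Inside single quotes the shell leaves '\t' as two characters
# (backslash and t), which is why cut complains about the delimiter.
tab=$(printf '\t')
second_explicit=$(printf '%s\n' "$record" | cut -d"$tab" -f2)

printf '%s %s\n' "$first_default" "$second_explicit"
```

In bash, `cut -d$'\t'` is a shorter way to pass a literal tab, but simply dropping `-d` keeps the command portable.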
Assignee: domingue

expression_explorer: hardcoded Rscript path
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/67
2019-08-21T11:19:33Z, domingue

The path to `Rscript` is [hard coded](https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/expression_explorer/expression_explorer#L5), which means that the binary will not be found if it is installed in a different location:
```bash
/usr/local/bin/Rscript -<<"EOF" ${SCRIPT_DIRECTORY}
```
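
To see how a given machine is affected, the hard-coded location can be compared with whatever the shell finds on `PATH`. The helper below is a hypothetical sketch (`locate_bin` is not part of ngs_tools), and it uses `command -v`, the POSIX way to look a program up on `PATH`.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the path that should be used for a program.
# $1 = program name, $2 = hard-coded path to try first (may not exist)
locate_bin() {
    if [ -f "$2" ] && [ -x "$2" ]; then
        # The hard-coded location exists on this machine.
        printf '%s\n' "$2"
    else
        # Otherwise fall back to a PATH lookup; prints nothing and
        # returns non-zero if the program is not installed at all.
        command -v "$1"
    fi
}
```

For example, `locate_bin Rscript /usr/local/bin/Rscript` prints the hard-coded path where it exists and the `PATH` result everywhere else.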
I encountered the issue because my local Linux installation stores it in `/usr/bin/Rscript`, it is in a different path on `falcon`, and anyone using `conda` will likely have it somewhere else.
A solution, tested on Ubuntu 18.04, is to replace the line with this:
```bash
$(which Rscript) -<<"EOF" ${SCRIPT_DIRECTORY}
```
which will find the path to `Rscript` wherever it might be. Since most users are on OSX I am not sure how big of a problem this is, but the fix would solve it. Shall I go ahead and make the change?
Assignee: domingue

dge workflow: add GTF as an argument for differential gene expression
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/57
2019-06-28T17:51:47Z, domingue

Relates to the [DGE analysis](https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/featcounts_deseq_mf.R) and it would have two uses:
1. retrieval of "accurate" gene lengths, exonic regions only, to calculate RPKM and FPM using `DESeq2` in-built functionality (more details [here](https://www.rdocumentation.org/packages/DESeq2/versions/1.12.3/topics/fpkm))
2. GTFs already contain a wealth of information which currently needs to be retrieved with `biomaRt`. Getting it from the GTF would make the process faster and more reproducible (in my experience `biomaRt` changes quite often), and it would work even for organisms not present in biomart or other marts (e.g. planaria).
Assignees: herseman, domingue

dge_workflow: echoing R code into Rscript not loading personal library packages
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/56
2019-06-20T09:09:14Z, domingue

The function `dge_star_counts2matrix`, located in [dge_utils](https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/dge_utils.sh#L364), stops with the error:
```bash
devtools::source_url("https://git.mpi-cbg.de/bioinfo/datautils/raw/v1.40/R/core_commons.R")
Error in loadNamespace(name) : there is no package called ‘devtools’
Calls: :: ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
Execution halted
```
I tracked it down: this occurs because of the `--vanilla` flag in
`| R --vanilla -q`. This flag keeps the R environment "clean", but that also means R packages installed in personal libraries are not loaded, hence the error despite `devtools` being installed.
There are two (possibly three) solutions for this:
1. Replace the `--vanilla` flag with `--no-save --no-restore --no-site-file`, since `--vanilla` is in fact shorthand for `--no-save`, `--no-restore`, `--no-site-file`, `--no-init-file` and `--no-environ`. Removing the flags causing the issue works (I tested it).
2. Instead of echoing the `R` code (`echo '[some code]' | R --no-save --no-restore --no-site-file -q`) we could use the approach `Rscript - <<"EOF" [some code] EOF`. This was also tested and also works, but I have not looked into unintended consequences (loading of hidden R files, for instance).
3. More of a long-term solution, and probably not feasible: keep these R snippets in their own `.R` files, or as functions, and call them with `Rscript some_function.R`.
For the time being I would suggest either 1 or 2. @herseman If you have any preference let me know so that I can make the PR.
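
Options 1 and 2 could be sketched as follows. This is only an illustration: the function names and the `R_BIN` / `RSCRIPT_BIN` override variables are hypothetical and not part of `dge_utils.sh`, and the R one-liner is a placeholder for the real snippet.

```bash
#!/usr/bin/env bash

# Option 1: keep piping the code into R, but list only the flags we want.
# --vanilla would additionally imply --no-init-file and --no-environ.
# $1 = R code; R_BIN is a hypothetical override, defaulting to R on PATH
run_r_piped() {
    printf '%s\n' "$1" | "${R_BIN:-R}" --no-save --no-restore --no-site-file -q
}

# Option 2: feed the code to Rscript on stdin through a quoted here-document,
# so the shell does not expand $ or backticks inside the R code.
# RSCRIPT_BIN is a hypothetical override, defaulting to Rscript on PATH
run_r_heredoc() {
    "${RSCRIPT_BIN:-Rscript}" - <<"EOF"
cat(.libPaths(), sep = "\n")
EOF
}
```

Calling `run_r_heredoc` on a machine with a personal library should list that library among the printed paths, which is a quick way to confirm that the chosen invocation sees user-installed packages.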
---
A consequence of this bug, and that was how I found it, is that `star_align.kts` will not produce the count matrix table and, afaik, will finish successfully with a reported warning.
Assignee: domingue

export TPM instead of FPKM tables
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/14
2017-06-19T11:19:42Z, brandl (brandl@mpi-cbg.de)
Assignee: herseman

Add option to perform diffex-test in mf_feat_counts.R using fc-cutoff > 0.
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/12
2017-06-19T07:58:38Z, brandl (brandl@mpi-cbg.de)
Assignee: brandl (brandl@mpi-cbg.de)

Integrate tools to assess gc/3/5 bias, insert size, and capture efficiency
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/11
2020-02-20T12:42:13Z, brandl (brandl@mpi-cbg.de)

e.g. using http://deeptools.readthedocs.io/en/latest/content/tools/computeGCBias.html?highlight=bias
(see "Multi-perspective quality control of Illumina RNA sequencing data analysis" http://bfg.oxfordjournals.org/content/early/2016/09/28/bfgp.elw035.abstract )
@lakshman opinion?

use http://multiqc.info/ for qc reporting
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/10
2019-11-18T09:06:51Z, brandl (brandl@mpi-cbg.de)

Export counts as TPM in featcounts_deseq_mf.R
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/9
2017-07-08T23:44:42Z, brandl (brandl@mpi-cbg.de)

They are better than FPKM according to Mathias and allow for better comparison across samples.
Assignee: brandl (brandl@mpi-cbg.de)

create tool to create igv session file list of bam files
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/8
2017-07-08T23:44:42Z, brandl (brandl@mpi-cbg.de)

@lakshman fyi https://github.com/igvteam/igv/issues/234

finish panther enrichment and reintegrate into cp_enrichment.R
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/7
2020-02-20T16:06:56Z, brandl (brandl@mpi-cbg.de)

see
/Volumes/furiosa/bioinfo/scripts/ngs_tools/dev/common/panther_enrichment.R
/Volumes/furiosa/bioinfo/scripts/ngs_tools/dev/common/cp_enrichment.R

Add pcr duplication analysis for rna-seq workflow
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/6
2017-11-09T15:39:52Z, brandl (brandl@mpi-cbg.de)

https://www.bioconductor.org/packages/release/bioc/html/dupRadar.html
@lakshman what do you think?

make sure that exported tables contain double-normalized counts and not just size-normalized ones
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/1
2019-02-12T10:35:31Z, brandl (brandl@mpi-cbg.de)
Assignee: brandl (brandl@mpi-cbg.de)