ngs_tools issues
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues
2019-08-21T11:19:33Z

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/67
expression_explorer: hardcoded Rscript path
2019-08-21T11:19:33Z domingue
The path to `Rscript` is [hard coded](https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/expression_explorer/expression_explorer#L5), which means that if this binary is in a different path it will not be found:
```bash
/usr/local/bin/Rscript -<<"EOF" ${SCRIPT_DIRECTORY}
```
I encountered the issue because my local Linux installation stores it in `/usr/bin/Rscript`, a different path than on `falcon`, and if anyone uses `conda` it will likely be somewhere else.
A solution, tested on Ubuntu 18.04, is to replace the line with this:
```bash
$(which Rscript) -<<"EOF" ${SCRIPT_DIRECTORY}
```
which will find the path to `Rscript`, wherever that might be. Since most users use OSX I am not sure how big of a problem this is, but the bug fix would solve it. Shall I go ahead and make the change?
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/14
export TPM instead of FPKM tables
2017-06-19T11:19:42Z brandl brandl@mpi-cbg.de
Assignee: herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/9
Export counts as TPM in featcounts_deseq_mf.R
2017-07-08T23:44:42Z brandl brandl@mpi-cbg.de
They are better than FPKM according to Mathias and allow for better comparison across samples.
Assignee: brandl brandl@mpi-cbg.de

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/100
exDesign: numeric vs categorical variables and shrinkage method
2021-02-23T07:51:33Z gohr
- DESeq2 treats numeric experimental variables as numeric variables, not as discrete variables
- hence: all discrete variables should have values that are not numbers, e.g. litter1, litter2, litter3 instead of 1, 2, 3
- DESeq2: one should go through these steps:
1. `contrast_oe <- c("sampletype", "MOV10_overexpression", "control")`
2. `res_tableOE_unshrunken <- results(dds, contrast=contrast_oe, alpha = 0.05)`
3. `res_tableOE <- lfcShrink(dds, contrast=contrast_oe, res=res_tableOE_unshrunken)`
This allows using approaches for shrinking the logFC other than the DESeq2 standard approach. See:
> What you observe is consistent with what we see in testing on the benchmarking data and on simulation data.
> If you just compare method="normal" to method="apeglm" or "ashr", the differences you are likely to see is
> that normal will shrink large effects even if they have high precision (so shrinking too much) and allow
> small effects to float around 0, while apeglm/ashr will not shrink the precise, large effects much at all and
> the small effects which are indistinguishable from 0 will be shrunk to 0.
Papers show that these other two approaches are more effective.
Assignee: gohr

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/75
dge_workflow: future improvements
2020-07-20T08:54:19Z domingue
Whilst https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/featcounts_deseq_mf.R works well, it was developed a long time ago and some of the `DESeq2` functionality and best practices have changed. So has what we might want to add or remove from the script.
Here we should list the things that we would like to change if we consider creating a new script for analysis.
Assignees: herseman, domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/74
dge_workflow: fix lfc help
2019-12-02T14:43:31Z domingue
Right now it says that the value is used as a cut-off for reporting genes, which is incorrect:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/featcounts_deseq_mf.R#L40
In fact this value is fed into the statistical modelling, leading to a stricter set of results. The help should reflect that.
See:
- https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#tests-of-log2-fold-change-above-or-below-a-threshold
- https://support.bioconductor.org/p/101504/
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/91
dge_workflow: expression_explorer app failed to load due to renamed/additional annotation columns
2020-07-10T10:23:01Z herseman
**Issues**:
- at least for the igenome `Homo_sapiens/Ensembl_v99` (*others were not tested*), running `featcounts_deseq_mf.R` with the `--gtf` flag results in empty gene descriptions, which have to be added manually; however, adding the information from biomaRt ensembl yields a column named 'description' instead of 'gene_description' unless it is renamed manually, and this breaks the `expression_explorer` app, which expects a 'gene_description' column rather than 'description'
- annotation columns (e.g. domain prediction) that are additionally added to the dge results are not taken into account when the columns for further data summarization are selected in the `gather` functions
Assignee: herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/56
dge_workflow: echoing R code into Rscript not loading personal library packages
2019-06-20T09:09:14Z domingue
The function `dge_star_counts2matrix`, located in [dge_utils](https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/dge_utils.sh#L364), stops with the error:
```bash
devtools::source_url("https://git.mpi-cbg.de/bioinfo/datautils/raw/v1.40/R/core_commons.R")
Error in loadNamespace(name) : there is no package called ‘devtools’
Calls: :: ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
Execution halted
```
I tracked it down and this occurs because of the flag "--vanilla" in:
`| R --no-save --no-restore --no-site-file -q`. This flag keeps the R environment "clean" but that also means R packages installed in personal libraries are not loaded, hence the error despite `devtools` being installed.
There are two (three) solutions for this:
1. Replace the `--vanilla` flag with `--no-save --no-restore --no-site-file`, since `--vanilla` is in fact shorthand for `--no-save`, `--no-restore`, `--no-site-file`, `--no-init-file` and `--no-environ`. Removing the flags causing the issue works (I tested it).
2. Instead of echoing the `R` code (`echo '[some code]' | R --no-save --no-restore --no-site-file -q`) we could use the approach `Rscript - <<"EOF" [some code] EOF`. This was also tested and also works, but I have not looked into unintended consequences (loading of hidden R files, for instance).
3. More of a long-term solution, and probably not feasible: keep these R snippets in their own `.R` files, or as functions, and call them with `Rscript some_function.R`.
For the time being I would suggest either 1. or 2. @herseman If you have any preference let me know so that I can make the PR.
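A minimal sketch of options 1 and 2, assuming a POSIX-compatible shell with `R` and `Rscript` on the `PATH`. The names `R_FLAGS`, `run_r_stdin` and `run_rscript_heredoc` are hypothetical and are not taken from `dge_utils.sh`; the R one-liner only prints the library search path, so one can check whether the personal library is visible.

```bash
# Hypothetical helpers for illustration; not part of dge_utils.sh.

# Option 1: the flags behind --vanilla, minus --no-init-file and --no-environ.
R_FLAGS="--no-save --no-restore --no-site-file -q"

# Run R code supplied on stdin with the reduced flag set.
run_r_stdin() {
    command -v R >/dev/null 2>&1 || { echo "R not found on PATH" >&2; return 127; }
    # $R_FLAGS is intentionally unquoted so that it splits into separate flags.
    R $R_FLAGS
}

# Option 2: pass the code to Rscript through a quoted heredoc, so the shell
# does not expand $ or backticks inside the R code.
run_rscript_heredoc() {
    command -v Rscript >/dev/null 2>&1 || { echo "Rscript not found on PATH" >&2; return 127; }
    Rscript - <<"EOF"
cat(.libPaths(), sep = "\n")
EOF
}

# Usage (both print the library search path):
#   echo 'cat(.libPaths(), sep = "\n")' | run_r_stdin
#   run_rscript_heredoc
```

Both helpers return 127 instead of failing obscurely when the interpreter is missing, which is a design choice of this sketch rather than existing behaviour.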
---
A consequence of this bug, and that was how I found it, is that `star_align.kts` will not produce the count matrix table and, afaik, will finish successfully with a reported warning.
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/84
dge_workflow: collect_kallisto_data.R fails for paired-end data
2020-03-12T10:54:15Z herseman
- parsing the kallisto.log files failed because the two fastq files are not listed on one line
- additionally, it makes more sense to provide a list of kallisto output folders as an argument to the script instead of taking all subfolders in the current working directory, which causes issues when other subfolders, e.g. from `multiqc`, are present as well
Assignee: herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/81
dge_workflow: analyze duplicates has a broken link to the plot of the model (dupRadar)
2020-02-10T13:08:45Z domingue
Right now it is a link to a non-permanent location (image hosting site). It should be replaced by a more stable location to avoid issues in the future.
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/57
dge workflow: add GTF as an argument for differential gene expression
2019-06-28T17:51:47Z domingue
Relates to the [DGE analysis](https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/featcounts_deseq_mf.R) and it would have two uses:
1. retrieval of "accurate" gene lengths, exonic regions only, to calculate RPKM and FPM using `DESeq2` in-built functionality (more details [here](https://www.rdocumentation.org/packages/DESeq2/versions/1.12.3/topics/fpkm))
2. GTFs already contain a wealth of information which currently needs to be retrieved with `biomaRt`. Getting it from the GTF would make the process faster and more reproducible (in my experience `biomaRt` changes quite often), and it would work even for organisms not present in biomart or other marts (e.g. planaria)
Assignees: herseman, domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/70
cut delimiter breaks fastqc_summary.R
2019-09-09T14:56:18Z domingue
In this line:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/common/fastqc_summary.R#L122
`-d'\t'` is not necessary because `cut` uses tab as its default delimiter, and it breaks the script because a delimiter must be a single character.
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/8
create tool to create igv session file list of bam files
2017-07-08T23:44:42Z brandl brandl@mpi-cbg.de
@lakshman fyi https://github.com/igvteam/igv/issues/234

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/72
Changes in ggplot break boxplot in fastqc_summary.R
2019-09-09T14:57:39Z domingue
In particular this plot:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/common/fastqc_summary.R#L187
with the error:
> Error: Can't draw more than one boxplot per group. Did you forget aes(group = ...)?
This has been noticed before:
https://stackoverflow.com/questions/57192727/getting-an-error-that-ggplot2-3-2-0-cant-draw-more-than-one-boxplot-per-group
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/90
argparser error when --bam_files is the last optional argument
2020-07-01T14:58:02Z domingue
Lena reported an error occurring when the `--bam_files` option was placed as the last optional argument of `genic_counts.R`.
I will investigate.
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/6
Add pcr duplication analysis for rna-seq workflow
2017-11-09T15:39:52Z brandl brandl@mpi-cbg.de
https://www.bioconductor.org/packages/release/bioc/html/dupRadar.html
@lakshman what do you think

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/12
Add option to perform diffex-test in mf_feat_counts.R using fc-cutoff > 0.
2017-06-19T07:58:38Z brandl brandl@mpi-cbg.de
Assignee: brandl brandl@mpi-cbg.de