ngs_tools issues
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues
2019-08-09T09:35:40Z

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/60
ms_workflow: add intensities to results table
2019-08-09T09:35:40Z
domingue
Somehow the protein intensities for each sample (and the average for each condition) are missing - fix this.
domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/59
MS workflow: improvements
2019-08-07T14:44:34Z
domingue
The workflow runs, but it is still experimental, and it would be good if I could have a look at what is available in Bioconductor to improve it.
I will also catch up on some of the MS literature, since I know very little about it.
domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/58
dge_workflow: error message because gtf_file did not exist
2019-06-27T13:53:03Z
herseman
Ran featcounts_deseq_mf.R without specifying the GTF file and got the following error message:
````
Warning message:
package ‘GenomicFeatures’ was built under R version 3.5.3
> ## horrible hack to avoid masking of dplyr::sel
> select <- dplyr::select
>
> ## import gtf
> # if(!file.exists(gene.model)) stop(paste("GTF File:", gene.model, " does NOT exist. Run with: \n", runstr))
>
> gtf <- import.gff(
+ gtf_file,
+ format = "gtf",
+ feature.type = "exon"
+ )
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘import’ for signature ‘"NULL", "character", "character"’
Calls: import.gff -> import.gff -> import -> <Anonymous>
Execution halted
````
herseman, domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/56
dge_workflow: echoing R code into Rscript does not load personal library packages
2019-06-20T09:09:14Z
domingue
The function `dge_star_counts2matrix`, located in [dge_utils](https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/dge_utils.sh#L364), stops with the error:
```bash
devtools::source_url("https://git.mpi-cbg.de/bioinfo/datautils/raw/v1.40/R/core_commons.R")
Error in loadNamespace(name) : there is no package called ‘devtools’
Calls: :: ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
Execution halted
```
I tracked it down: this occurs because of the `--vanilla` flag in:
`| R --no-save --no-restore --no-site-file -q`. This flag keeps the R environment "clean", but that also means R packages installed in personal libraries are not loaded - hence the error despite `devtools` being installed.
There are two (three) solutions for this:
1. Replace the `--vanilla` flag with `--no-save --no-restore --no-site-file`. `--vanilla` is in fact shorthand for `--no-save`, `--no-restore`, `--no-site-file`, `--no-init-file` and `--no-environ`, so this drops only `--no-init-file` and `--no-environ`, the flags causing the issue. I tested it and it works.
2. Instead of echoing the `R` code (`echo '[some code]' | R --no-save --no-restore --no-site-file -q`), we could use the heredoc approach `Rscript - <<"EOF" [some code] EOF`. This was also tested and also works, but I have not looked into unintended consequences (loading of hidden R files, for instance).
3. More of a long-term solution, and probably not feasible: keep these R snippets in separate `.R` files, or as functions, and call them with `Rscript some_function.R`.
For the time being I would suggest either 1 or 2. @herseman, if you have any preference, let me know so that I can make the PR.
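For illustration, option 2 could look roughly like the sketch below. This is not the code in `dge_utils.sh`: the function name `run_r_snippet`, the `R_BIN` override and the placeholder R code are all made up here, and the only thing taken from the description above is the `Rscript - <<"EOF"` pattern.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of dge_utils.sh): pass an R snippet to the
# interpreter on stdin through a heredoc instead of echoing it into `R`.
# R_BIN is an assumed override for the interpreter; it defaults to Rscript.
run_r_snippet() {
    local r_bin="${R_BIN:-Rscript}"
    # The delimiter is quoted ("EOF"), so the shell does not expand $ or
    # backticks inside the R code.
    "$r_bin" - <<"EOF"
# Placeholder R code: show which library paths this session can see.
print(.libPaths())
EOF
}
```

Calling `run_r_snippet` on a machine with a personal library should list that library among the printed paths, which is a quick way to check whether the packages would be found.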
---
A consequence of this bug, and that was how I found it, is that `star_align.kts` will not produce the count matrix table and, afaik, will still finish successfully with a reported warning.
domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/55
dge_workflow: include MA plot from ggpubr in dge report
2019-06-18T07:35:08Z
herseman
- https://rpkgs.datanovia.com/ggpubr/reference/ggmaplot.html
herseman, domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/54
general: add multiQC report to fastqc_summary
2019-12-02T14:45:58Z
herseman
- [multiQC](https://multiqc.info/)
herseman, domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/53
general: Test packrat to keep R environments reproducible
2019-09-16T08:02:05Z
domingue
02.08.2019: created repo https://git.mpi-cbg.de/bioinfo/renvs_test
Right now, every time there is an update to an R package, or to R itself, some dependencies are also silently updated and some scripts might stop working. This is particularly critical for BioC packages, which undergo frequent version updates that are hard to track.
In general terms it would also be good to keep a frozen set of R packages for long-running projects, and if possible to share upon publication.
[Packrat](https://rstudio.github.io/packrat/) should solve some of these issues:
> - Isolated: Installing a new or updated package for one project won’t break your other projects, and vice versa. That’s because packrat gives each project its own private package library.
> - Portable: Easily transport your projects from one computer to another, even across different platforms. Packrat makes it easy to install the packages your project depends on.
> - Reproducible: Packrat records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.
Things to test:
1. How long does it take to install a full set of typical packages for a project? Use the `Ncpus` option in `install.packages`.
2. How large is the resulting folder?
3. Can the project snapshot be recreated on a different computer (ideally a different OS)?
Things needed:
a. A list of packages (see old projects, sessionInfo logs)
b. Find out how to install these in the course of a workflow?
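Test points 1 and 2 could be measured with something like the sketch below. The helper names `time_cmd` and `dir_size_kb` are hypothetical, and the timing has whole-second resolution only, which should be enough for package installs.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the packrat tests; the names are illustrative.

# Run any command, then print how many whole seconds it took.
# The exit status of the command is passed through.
time_cmd() {
    local start end status
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed_seconds=$((end - start))"
    return "$status"
}

# Print the size of a directory in kilobytes.
dir_size_kb() {
    du -sk "$1" | cut -f1
}
```

Assuming a packrat project in the current directory, `time_cmd Rscript -e 'packrat::restore()'` would time the installation and `dir_size_kb packrat/lib` would report the size of the private library.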
**Update 18.07.2019**
TODO:
- test going back to an older package version. Steps:
1. install an R package, old version, perhaps using `install_github` from an old commit.
2. Create snapshot
3. Install up-to-date package version
4. Revert to the old version
I should also test, if at all possible, with a `BioC` package.
domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/52
general: think about using argparser instead of docopt for future scripts
2020-05-26T12:54:01Z
herseman
herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/49
general: create new human igenome for the Ensembl release 95
2019-04-05T13:13:19Z
herseman
* further information: http://www.ensembl.info/2019/01/09/ensembl-95-is-out/
herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/48
general: fix issues which occur due to updated R package versions
2019-04-09T10:48:24Z
herseman
herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/47
sc_workflow: add shinyApp to explore Seurat clusterings
2019-03-22T12:38:42Z
herseman
herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/46
sc_workflow: add additional gene information for the top50 most highly abundant features in `sc_quality_check.R`
2019-03-13T09:03:47Z
herseman
herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/44
sc_workflow: write ShinyApp to explore metrics from the initial quality control
2019-03-20T10:29:13Z
herseman
- include diffusion map and PCA, both of which should be colorable by all metrics found in the meta.cell slot of the seo.rds object and in the cell_infos.txt or, if present, the basic_cell_infos_incl_ccp.txt file
- include violin plot to additionally explore the distribution of those metrics
herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/43
sc_workflow: add script for the calculation of diffusion map and pseudo time using the `destiny` package
2019-03-05T15:57:01Z
herseman
herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/42
sc_workflow: update sc_quality_check.R
2019-03-20T12:58:01Z
herseman
- check if script runs with current `scater` and `scran` versions
- export all quality metrics
- save sessionInfo()
- copy script to working directory
herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/41
ms_workflow: ms_limma.R returns error when no differentially abundant proteins were found
2019-02-14T14:15:01Z
herseman
herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/38
general: save the actually used ngs_tools script in results data folder
2019-03-13T16:08:41Z
herseman
So far, we log the version of ngs_tools used in a project. However, referring to that version is not always accurate, as an individual project may consist of multiple analysis steps done over a longer period of time, which therefore depend on different ngs_tools versions. To ensure reproducibility, it may be useful to copy the script that was actually used to the results output folder, i.e. featcounts_deseq_mf.R in the dge_analysis folders.
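A minimal sketch of such a copy step, assuming it is called from a shell wrapper around the analysis script. The function name `archive_script` is hypothetical and nothing like it is confirmed to exist in ngs_tools.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep a copy of the script that was actually run
# next to the results it produced.
archive_script() {
    local script="$1" results_dir="$2"
    # Refuse to continue if the script path is wrong.
    if [ ! -f "$script" ]; then
        echo "script not found: $script" >&2
        return 1
    fi
    mkdir -p "$results_dir" || return 1
    # -p keeps the modification time of the original file.
    cp -p "$script" "$results_dir/"
}
```

A call such as `archive_script path/to/featcounts_deseq_mf.R dge_analysis` (the path is a placeholder) would leave a copy of the script in the `dge_analysis` folder, alongside the logged ngs_tools version.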
herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/37
ms_workflow: handling of contaminations
2019-12-10T08:11:26Z
herseman
- REV__ entries are nonsense sequences (there was a match against the reverse of an entry of the database of interest); REV__ entries can be removed right at the beginning of the script
- think about removal of keratin as a standard set-up (maybe Anna can provide us with a list of the most common contaminations and we can start by reporting those specifically)
herseman, domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/36
ms_workflow: add additional quality metrics to the differential abundance analysis output
2019-03-08T13:24:24Z
herseman
- take the intensities of the standard as a minimal threshold to mark low-abundant proteins
- summarize MS/MS information as a numeric value per gene and condition (e.g. percentage of MS/MS identifications per gene across all replicates); additionally include some summary of identification types per sample in the ms_data_prep.R script
- report for which entries the proteinGroups order has been changed by sorting
- add information per gene whether the value for any replicate per condition has been imputed
herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/34
ms_workflow: report differential abundance proteins along with the identification type information as well as additional protein information
2019-02-01T09:11:39Z
herseman
herseman