ngs_tools issues
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues
2019-11-18T09:06:51Z

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/10
use http://multiqc.info/ for qc reporting
2019-11-18T09:06:51Z, brandl (brandl@mpi-cbg.de)

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/65
DEP installation on falcon - nc-config missing
2019-09-17T07:17:00Z, domingue

I was trying to install an R/BioC package on falcon r/3.5.1, and ran into a dependency issue:
```R
* installing *source* package ‘ncdf4’ ...
** package ‘ncdf4’ successfully unpacked and MD5 sums checked
configure.ac: starting
checking for nc-config... no
-----------------------------------------------------------------------------------
Error, nc-config not found or not executable. This is a script that comes with the
netcdf library, version 4.1-beta2 or later, and must be present for configuration
to succeed.
If you installed the netcdf library (and nc-config) in a standard location, nc-config
should be found automatically. Otherwise, you can specify the full path and name of
the nc-config script by passing the --with-nc-config=/full/path/nc-config argument
flag to the configure script. For example:
./configure --with-nc-config=/sw/dist/netcdf4/bin/nc-config
Special note for R users:
-------------------------
To pass the configure flag to R, use something like this:
R CMD INSTALL --configure-args="--with-nc-config=/home/joe/bin/nc-config" ncdf4
where you should replace /home/joe/bin etc. with the location where you have
installed the nc-config script that came with the netcdf 4 distribution.
-----------------------------------------------------------------------------------
ERROR: configuration failed for package ‘ncdf4’
```
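The configure message above boils down to two steps: locate `nc-config`, then hand its path to `R CMD INSTALL`. A minimal shell sketch of that, assuming the netcdf library is already installed somewhere readable. `NC_CONFIG` and both function names are hypothetical helpers, not part of ncdf4 or R:

```bash
# Print the path of nc-config, or fail if it cannot be found.
# NC_CONFIG is a hypothetical environment variable for a non-standard location.
find_nc_config() {
  if command -v nc-config >/dev/null 2>&1; then
    command -v nc-config
  elif [ -n "${NC_CONFIG:-}" ] && [ -x "${NC_CONFIG}" ]; then
    printf '%s\n' "${NC_CONFIG}"
  else
    echo "nc-config not found; set NC_CONFIG to its full path" >&2
    return 1
  fi
}

# Install a package source (tarball or directory, e.g. ncdf4) using the
# configure flag quoted in the error message above.
install_with_nc_config() {
  nc_config=$(find_nc_config) || return 1
  R CMD INSTALL --configure-args="--with-nc-config=${nc_config}" "$1"
}

# Usage (not run here; the path is the example from the error message):
#   NC_CONFIG=/sw/dist/netcdf4/bin/nc-config install_with_nc_config ncdf4
```

This only helps if netcdf exists somewhere on the machine; if the library itself is missing, it still has to be installed by someone with the necessary rights.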
I had this issue when installing this same package on my own computer, also running Linux, and if I remember correctly the solution was to install some missing libraries with sudo. I don't have sudo on falcon, so I contacted HPC support to get it fixed.
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/53
general: Test packrat for keep reproducible R environments
2019-09-16T08:02:05Z, domingue

02.08.2019: created repo https://git.mpi-cbg.de/bioinfo/renvs_test
Right now every time there is an update to an R package, or R itself, some dependencies are also silently updated and some scripts might stop working. This is particularly critical for BioC packages, which undergo frequent version updates that are hard to track.
In general terms it would also be good to keep a frozen set of R packages for long-running projects, and if possible to share upon publication.
[Packrat](https://rstudio.github.io/packrat/) should solve some of these issues:
> - Isolated: Installing a new or updated package for one project won’t break your other projects, and vice versa. That’s because packrat gives each project its own private package library.
> - Portable: Easily transport your projects from one computer to another, even across different platforms. Packrat makes it easy to install the packages your project depends on.
> - Reproducible: Packrat records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.
Things to test:
1. how long does it take to install a full set of typical packages for a project. Use the `Ncpus` option of `install.packages` to install in parallel.
2. how large is the resulting folder
3. can the project snapshot be recreated on a different computer (Ideally a different OS).
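Points 1 and 2 could be measured with a small shell helper. This is only a sketch: `time_and_size` is a hypothetical name, and the usage line assumes a project that has already been initialised with packrat, so that the private library lives under `packrat/lib`.

```bash
# Run a command, then report the elapsed wall-clock seconds and the size
# of a directory in kilobytes.
# Usage: time_and_size DIR COMMAND [ARGS...]
time_and_size() {
  dir=$1
  shift
  start=$(date +%s)
  "$@" || return 1
  end=$(date +%s)
  size_kb=$(du -sk "$dir" | cut -f1)
  printf 'elapsed_s=%s size_kb=%s\n' "$((end - start))" "$size_kb"
}

# Usage (not run here; assumes R and packrat are installed and the current
# directory is the project root):
#   time_and_size packrat/lib Rscript -e 'packrat::restore()'
```

Running the same command on a second machine gives a rough answer to point 3 as well, since a failed restore makes the helper return a non-zero status.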
Things needed:
a. A list of packages (see old projects, sessionInfo logs)
b. Find out how to install these in the course of a workflow?
**Update 18.07.2019**
TODO:
- test going back to an older package version. Steps:
1. install an R package, old version, perhaps using `install_github` from an old commit.
2. Create snapshot
3. Install up-to-date package version
4. Revert to the old version
I should also test, if at all possible, with a `BioC` package.
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/71
Grep error message crashes the script
2019-09-09T14:58:04Z, domingue

This function also has a grep-related error:
```R
readBaseQualDist = function(statsFile){
# statsFile="./fastqc/mouse_big_cyst_rep2_fastqc/fastqc_data.txt"
# statsFile="./fastqc/mouse_liver_polar_stage3_rep2_fastqc/fastqc_data.txt"
# grep -A30 -F '>>Per base sequence quality' /Volumes/projects/bioinfo/holger/projects/helin/mouse/fastqc/mouse_big_cyst_rep1_fastqc/fastqc_data.txt | grep -B100 -F '>>END_M' | head -n-1 | tail -n+2 | tr '#' ' '
# echo("reading", statsFile)
baseStats = read.delim(pipe(
#http://stackoverflow.com/questions/1946363/how-do-i-display-data-from-the-beginning-of-a-file-until-the-first-occurence-of/1947950#1947950
paste(get_zip_pipe(statsFile, "fastqc_data.txt"), " | grep -A200 -F '>>Per base sequence quality' | perl -pe 'last if />>END_MODULE/' | head -n-2 | tail -n+2 | tr '#' ' '")
)) %>% mutate(
run=trim_ext(basename(statsFile), ".zip")
)
baseStats %>% mutate(base_order=1:n())
}
grep: write error: Broken pipe
```
The error occurs in this [line](https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/common/fastqc_summary.R#L148). The issue has been explained elsewhere:
> grep is complaining because it has more output than 10 lines, and head is cutting it off before it finishes
> I suggest hiding grep's stderr output (this is where the broken pipe error is printed).
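Besides silencing stderr, another way to avoid the message is to let a single process read the whole stream, so that no upstream command ever writes into a closed pipe. A hedged sketch with `awk` follows. `extract_fastqc_module` is a hypothetical name and this is not the code in `fastqc_summary.R`; unlike the original pipeline, it strips the leading `#` instead of replacing it with a space and does not trim any trailing rows.

```bash
# Print the body of one FastQC module from fastqc_data.txt read on stdin.
# $1 is the module name without the leading '>>'.
extract_fastqc_module() {
  awk -v header=">>$1" '
    # Module header line: start capturing, but do not print the header itself.
    index($0, header) == 1 { inside = 1; next }
    # End of the module: stop capturing, but keep reading to the end of input
    # so that the writing side of the pipe is never cut off.
    inside && /^>>END_MODULE/ { inside = 0; next }
    # Inside the module: drop the leading "#" of the column header row.
    inside { sub(/^#/, ""); print }
  '
}

# Usage (not run here; the archive name is an assumption):
#   unzip -p sample_fastqc.zip '*/fastqc_data.txt' \
#     | extract_fastqc_module 'Per base sequence quality'
```

Because `awk` consumes all of its input, the command feeding it finishes normally and has no reason to report a broken pipe.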
I will try this.
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/72
Changes in ggplot break boxplot in fastqc_summary.R
2019-09-09T14:57:39Z, domingue

In particular this plot:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/common/fastqc_summary.R#L187
with the error:
> Error: Can't draw more than one boxplot per group. Did you forget aes(group = ...)?
This has been noticed before:
https://stackoverflow.com/questions/57192727/getting-an-error-that-ggplot2-3-2-0-cant-draw-more-than-one-boxplot-per-group
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/70
cut delimeter breaks fastqc_summary.R
2019-09-09T14:56:18Z, domingue

In this line:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/common/fastqc_summary.R#L122
`-d'\t'` is not necessary because `cut` splits on tabs by default, and it breaks the script because a delimiter must be a single character.
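The point about `cut` and tabs can be checked in isolation. A minimal sketch, with `tsv_column` as a hypothetical helper name; it is not taken from `fastqc_summary.R`:

```bash
# Extract one column from tab-separated input read on stdin.
# cut uses TAB as its default field delimiter, so no -d option is needed.
# Note: in a shell, -d'\t' passes the two characters backslash and t,
# which cut rejects because a delimiter must be a single character.
tsv_column() {
  cut -f"$1"
}

# Usage:
#   printf 'base\tmean\tmedian\n' | tsv_column 2
```

If an explicit delimiter is ever wanted, bash's `$'\t'` quoting produces a real tab character, but that form is bash-specific.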
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/69
ms_worflow: bug in heatmap clustering
2019-08-28T09:04:01Z, domingue

We calculate the Euclidean distances outside the plotting function and feed them in as the input matrix, but `d3heatmap`, and `pheatmap` for that matter, will also calculate distances internally, leading to overclustering. Fix it.
Lines:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/ms_workflow/02-ms-DEP-analysis.R#L482
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/68
ms_workflow: Add table with % of contaminants in the most abudant proteins
2019-08-26T08:24:52Z, domingue

This can be useful to exclude situations where the bulk of protein intensity comes from "contaminants". Since the contaminants are, to some extent, defined by MaxQuant, it could be that they are in fact of interest for the biological process being studied. We will leave the decision on how to handle contaminants to the project owner.
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/67
expression_explorer: harcoded Rscipt path
2019-08-21T11:19:33Z, domingue

The path to `Rscript` is [hard coded](https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/expression_explorer/expression_explorer#L5), which means that the binary will not be found if it is installed in a different path:
```bash
/usr/local/bin/Rscript -<<"EOF" ${SCRIPT_DIRECTORY}
```
I encountered the issue because my local Linux installation stores it in `/usr/bin/Rscript`, it is in yet another path on `falcon`, and anyone who uses `conda` will likely have it somewhere else.
A solution, tested on Ubuntu 18.04, is to replace the line with this:
```bash
$(which Rscript) -<<"EOF" ${SCRIPT_DIRECTORY}
```
which will find the path to `Rscript`, wherever that might be. Since most users use OSX I am not sure how big of a problem this is, but the bug fix would solve it. Shall I go ahead and make the change?
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/64
ms_workflow: QC improvments
2019-08-13T13:55:45Z, domingue

During a meeting with Olya she mentioned that our QC looked very much like that of a package created by the Kempa lab. There is an `R` package and a [paper](https://doi.org/10.1021/acs.jproteome.5b00780) to go with it, which I will peruse to see if there is something we could use for our workflow.
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/62
ms_workflow: replace PCA with plotly
2019-08-09T14:40:25Z, domingue

Currently having colour and shape to define condition / replicate is not visually pleasant or readable.
I was using `DEP::plot_pca`, but I will replace it with plotly.
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/61
ms_workflow: add % of NAs to results table
2019-08-09T09:58:43Z, domingue

@herseman suggested that it is more informative to know the % of samples from which a protein is missing, and to have this information for both conditions being compared, since it could indicate major differences between the conditions.
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/60
ms_workflow: add intensities to results table
2019-08-09T09:35:40Z, domingue

Somehow the protein intensities for each sample (and the average for each condition) are missing. Fix this.
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/63
ms_workflow: add more explanations
2019-08-09T08:21:02Z, domingue

Currently the text explanations are very sparse. Things to add:
- what are uniq and prop proteins
- LFQ vs raw
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/59
MS worflow: improvements
2019-08-07T14:44:34Z, domingue

Whilst the workflow is running, it is still experimental and it would be good if I could have a look at what is available in Bioconductor to improve it.
I will also catch up on some literature for MS since I know very little about it.
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/58
dge_workflow: error message because gtf_file did not exist
2019-06-27T13:53:03Z, herseman

Ran `featcounts_deseq_mf.R` without specifying the GTF file and got the following error message:
````
Warning message:
package ‘GenomicFeatures’ was built under R version 3.5.3
> ## horrible hack to avoid masking of dplyr::sel
> select <- dplyr::select
>
> ## import gtf
> # if(!file.exists(gene.model)) stop(paste("GTF File:", gene.model, " does NOT exist. Run with: \n", runstr))
>
> gtf <- import.gff(
+ gtf_file,
+ format = "gtf",
+ feature.type = "exon"
+ )
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘import’ for signature ‘"NULL", "character", "character"’
Calls: import.gff -> import.gff -> import -> <Anonymous>
Execution halted
````
Assignees: herseman, domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/56
dge_workflow: echoing R code in to Rscript not loading personal library packages
2019-06-20T09:09:14Z, domingue

The function `dge_star_counts2matrix`, located in [dge_utils](https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/dge_utils.sh#L364), stops with the error:
```bash
devtools::source_url("https://git.mpi-cbg.de/bioinfo/datautils/raw/v1.40/R/core_commons.R")
Error in loadNamespace(name) : there is no package called ‘devtools’
Calls: :: ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
Execution halted
```
I tracked it down and this occurs because of the flag "--vanilla" in:
`| R --no-save --no-restore --no-site-file -q`. This flag keeps the R environment "clean" but that also means R packages installed in personal libraries are not loaded - thus the error despite `devtools` being installed.
There are two (three) solutions for this:
1. Replace `--vanilla` flag with `--no-save --no-restore --no-site-file` since `vanilla` is in fact a wrapper for `--no-save, --no-restore, --no-site-file, --no-init-file and --no-environ`. Removing the flags causing the issue will work (I tested it).
2. Instead of echoing the `R` code (`echo '[some code]' | R --no-save --no-restore --no-site-file -q`), we could use the approach `Rscript - <<"EOF" [some code] EOF`. This was also tested and works, but I have not looked into unintended consequences (loading of hidden R files, for instance).
3. More of a long-term solution, and probably not feasible: keep these R snippets in their own `.R` files, or as functions, and call them with `Rscript some_function.R`.
For the time being I would suggest either 1. or 2. @herseman If you have any preference let me know so that I can make the PR.
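Options 1 and 2 can be sketched as two small shell wrappers. The function names and the `R_BIN` / `RSCRIPT_BIN` overrides are hypothetical and exist only so that the wrappers can be exercised without R; they are not taken from `dge_utils.sh`.

```bash
# Option 1: pipe R code into R with the --vanilla sub-flags that do not
# interfere with personal libraries (--no-init-file and --no-environ dropped).
# R_BIN is a hypothetical override for the R binary.
r_pipe() {
  "${R_BIN:-R}" --no-save --no-restore --no-site-file -q
}

# Option 2: let Rscript read the code from stdin, e.g. from a here-document.
# RSCRIPT_BIN is a hypothetical override for the Rscript binary.
rscript_stdin() {
  "${RSCRIPT_BIN:-Rscript}" -
}

# Usage (not run here):
#   echo 'library(devtools)' | r_pipe
#
#   rscript_stdin <<"EOF"
#   library(devtools)
#   EOF
```

The quoted `"EOF"` delimiter keeps the shell from expanding `$` inside the R code, which is one practical argument for option 2 over `echo`.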
---
A consequence of this bug, and that was how I found it, is that `star_align.kts` will not produce the count matrix table and, afaik, will still finish successfully with a reported warning.
Assignee: domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/55
dge_workflow: include MA plot from ggpubr in dge report
2019-06-18T07:35:08Z, herseman

- https://rpkgs.datanovia.com/ggpubr/reference/ggmaplot.html
Assignees: herseman, domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/48
general: fix issues which occur due to updated R package versions
2019-04-09T10:48:24Z, herseman
Assignee: herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/49
general: create new human igenome for the Ensembl release 95
2019-04-05T13:13:19Z, herseman

* further information: http://www.ensembl.info/2019/01/09/ensembl-95-is-out/
Assignee: herseman