ngs_tools issues
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues
2020-02-10T13:08:45Z

dge_workflow: analyze duplicates has a broken link to the plot of the model (dupRadar)
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/81
2020-02-10T13:08:45Z domingue
Right now it is a link to a non-permanent location (an image hosting site). It should be replaced by a more stable location to avoid issues in the future.
domingue, domingue

updating ms_workflow
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/79
2020-07-13T09:28:16Z herseman
herseman, domingue, herseman

Create stable conda environment for single cell analysis on falcon
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/78
2020-07-08T06:37:41Z domingue
Following up from #77, the goal is now to use conda and create stable, shareable environments with different versions of tools / R that we could use for projects.
domingue, domingue

conda and bioconductor version
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/77
2020-01-21T15:41:41Z domingue
The idea is that we could use conda to set up an environment with different R / BioC packages, to avoid situations where a package is updated without backwards compatibility (as happened with `scater`).
We need to test whether, in a conda env for `R3.5`, the BioC version installed is the latest one or the one that was released with that R version.
herseman, domingue, herseman

iGenome permissions
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/76
2020-02-20T14:52:43Z domingue
Some of the iGenomes folders lack group write permissions, which means that only the owner can add / remove / change files. Is this by design? Or was it an accident?
henry, herseman, domingue, henry

dge_workflow: future improvements
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/75
2020-07-20T08:54:19Z domingue
Whilst https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/featcounts_deseq_mf.R works well, it was developed a long time ago and some of the `DESeq2` functionality and best practices have changed. So has what we might want to add to or remove from the script.
In here we should list the things that we would like to change if we consider creating a new script for the analysis.
herseman, domingue, herseman

dge_workflow: fix lcf help
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/74
2019-12-02T14:43:31Z domingue
Right now the help says that the value is used as a cut-off for reporting genes, which is incorrect:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/featcounts_deseq_mf.R#L40
In fact this value is fed into the statistical modelling, leading to a stricter set of results. The help should reflect that.
See:
- https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#tests-of-log2-fold-change-above-or-below-a-threshold
- https://support.bioconductor.org/p/101504/
domingue, domingue

Implement renvs
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/73
2020-02-20T14:49:29Z domingue
It started with packrat in https://git.mpi-cbg.de/bioinfo/ngs_tools/issues/53# and after some testing I decided that it was worth adding as a function to our workflow. Still under testing.
domingue, domingue

Changes in ggplot break boxplot in fastqc_summary.R
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/72
2019-09-09T14:57:39Z domingue
In particular this plot:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/common/fastqc_summary.R#L187
with the error:
> Error: Can't draw more than one boxplot per group. Did you forget aes(group = ...)?
This has been noticed before:
https://stackoverflow.com/questions/57192727/getting-an-error-that-ggplot2-3-2-0-cant-draw-more-than-one-boxplot-per-group
domingue, domingue

Grep error message crashes the script
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/71
2019-09-09T14:58:04Z domingue
This function also has a grep-related error:
```R
readBaseQualDist = function(statsFile){
    # statsFile="./fastqc/mouse_big_cyst_rep2_fastqc/fastqc_data.txt"
    # statsFile="./fastqc/mouse_liver_polar_stage3_rep2_fastqc/fastqc_data.txt"
    # grep -A30 -F '>>Per base sequence quality' /Volumes/projects/bioinfo/holger/projects/helin/mouse/fastqc/mouse_big_cyst_rep1_fastqc/fastqc_data.txt | grep -B100 -F '>>END_M' | head -n-1 | tail -n+2 | tr '#' ' '
    # echo("reading", statsFile)
    baseStats = read.delim(pipe(
        #http://stackoverflow.com/questions/1946363/how-do-i-display-data-from-the-beginning-of-a-file-until-the-first-occurence-of/1947950#1947950
        paste(get_zip_pipe(statsFile, "fastqc_data.txt"), " | grep -A200 -F '>>Per base sequence quality' | perl -pe 'last if />>END_MODULE/' | head -n-2 | tail -n+2 | tr '#' ' '")
    )) %>% mutate(
        run=trim_ext(basename(statsFile), ".zip")
    )
    baseStats %>% mutate(base_order=1:n())
}
# Error message: grep: write error: Broken pipe
```
The error comes from this [line](https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/common/fastqc_summary.R#L148). The cause has been described elsewhere:
> grep is complaining because it has more output than 10 lines, and head is cutting it off before it finishes
> I suggest hiding grep's stderr output (this is where the broken pipe error is printed).
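
A minimal sketch of that suggestion, assuming the message appears because a downstream command stops reading before `grep` has finished writing. The helper name `first_matches` and the `seq` input are made up for illustration and are not part of `fastqc_summary.R`:

```bash
# Hypothetical helper: print the first N lines that contain a fixed string.
# grep's stderr is discarded so a "Broken pipe" message cannot reach the caller.
first_matches() {
  pattern=$1
  count=$2
  grep -F -- "$pattern" 2>/dev/null | head -n "$count"
}

# Example input: 100000 numbered lines, far more than head will consume.
# '|| true' keeps the script going if the shell runs with pipefail enabled.
result=$(seq 1 100000 2>/dev/null | first_matches '1' 3) || true
printf '%s\n' "$result"
```

Redirecting stderr also hides genuine `grep` errors, such as an unreadable input file, so it is probably best limited to pipelines where early termination is expected.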
I will try this.
domingue, domingue

cut delimiter breaks fastqc_summary.R
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/70
2019-09-09T14:56:18Z domingue
In this line:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/common/fastqc_summary.R#L122
`-d'\t'` is not necessary because `cut` uses tab as its default delimiter, and it breaks the script because a delimiter must be a single character.
domingue, domingue

ms_workflow: bug in heatmap clustering
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/69
2019-08-28T09:04:01Z domingue
We are feeding pre-calculated Euclidean distances, computed outside the plotting function, as the input matrix, but `d3heatmap`, and `pheatmap` for that matter, will also calculate distances internally, leading to overclustering. Fix it.
Lines:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/ms_workflow/02-ms-DEP-analysis.R#L482
domingue, domingue

ms_workflow: Add table with % of contaminants in the most abundant proteins
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/68
2019-08-26T08:24:52Z domingue
This can be useful to exclude situations where the bulk of protein intensity comes from "contaminants". Since the contaminants are, to some extent, defined by MaxQuant, it could be that they are in fact of interest for the biological process being studied. We will leave the decision on how to handle contaminants to the project owner.
domingue, domingue

expression_explorer: hardcoded Rscript path
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/67
2019-08-21T11:19:33Z domingue
The path to `Rscript` is [hard coded](https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/expression_explorer/expression_explorer#L5), which means that if this binary is in a different path it will not be found:
```bash
/usr/local/bin/Rscript -<<"EOF" ${SCRIPT_DIRECTORY}
```
I encountered the issue because my local Linux installation stores it in `/usr/bin/Rscript`, it is at a different path on `falcon`, and anyone using `conda` will likely have it somewhere else again.
A solution, tested on Ubuntu 18.04, is to replace the line with this:
```bash
$(which Rscript) -<<"EOF" ${SCRIPT_DIRECTORY}
```
which will find the path to `Rscript`, wherever that might be. Since most users use OSX I am not sure how big a problem this is, but the fix would solve it. Shall I go ahead and make the change?
domingue, domingue

DEP installation on falcon - nc-config missing
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/65
2019-09-17T07:17:00Z domingue
I was trying to install an R/BioC package on falcon r/3.5.1, and ran into a dependency issue:
```R
* installing *source* package ‘ncdf4’ ...
** package ‘ncdf4’ successfully unpacked and MD5 sums checked
configure.ac: starting
checking for nc-config... no
-----------------------------------------------------------------------------------
Error, nc-config not found or not executable. This is a script that comes with the
netcdf library, version 4.1-beta2 or later, and must be present for configuration
to succeed.
If you installed the netcdf library (and nc-config) in a standard location, nc-config
should be found automatically. Otherwise, you can specify the full path and name of
the nc-config script by passing the --with-nc-config=/full/path/nc-config argument
flag to the configure script. For example:
./configure --with-nc-config=/sw/dist/netcdf4/bin/nc-config
Special note for R users:
-------------------------
To pass the configure flag to R, use something like this:
R CMD INSTALL --configure-args="--with-nc-config=/home/joe/bin/nc-config" ncdf4
where you should replace /home/joe/bin etc. with the location where you have
installed the nc-config script that came with the netcdf 4 distribution.
-----------------------------------------------------------------------------------
ERROR: configuration failed for package ‘ncdf4’
```
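Before passing `--with-nc-config` as the log suggests, it may help to check whether `nc-config` is on the `PATH` at all. The following is only a sketch: the helper name `locate_tool` is hypothetical, and the flag is copied from the configure message above.

```bash
# Hypothetical helper: print the full path of a tool if it is on PATH,
# or return non-zero without printing anything.
locate_tool() {
  command -v "$1" 2>/dev/null
}

if nc_config=$(locate_tool nc-config); then
  # Flag name taken from the configure message in the log above.
  echo "found: --with-nc-config=$nc_config"
else
  echo "nc-config is not on PATH; the netcdf library may need to be installed"
fi
```

If the script is found, its path can be used in the `R CMD INSTALL --configure-args=...` form quoted in the log.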
I had this issue when installing the same package on my own computer, also running Linux, and if I remember correctly the solution was to install some missing libraries with sudo. I don't have sudo on falcon, so I contacted HPC support to get it fixed.
domingue, domingue

ms_workflow: QC improvements
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/64
2019-08-13T13:55:45Z domingue
During a meeting with Olya she mentioned that our QC looked very much like that of a package created by the Kempa lab. There is an `R` package and a [paper](https://doi.org/10.1021/acs.jproteome.5b00780) to go with it, which I will peruse to see if there is something we could use for our workflow.
domingue, domingue

ms_workflow: add more explanations
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/63
2019-08-09T08:21:02Z domingue
Currently the text explanations are very sparse. Things to add:
- what are uniq and prop proteins
- LFQ vs raw
domingue, domingue

ms_workflow: replace PCA with plotly
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/62
2019-08-09T14:40:25Z domingue
Currently, using colour and shape to define condition / replicate is not visually pleasant or readable. I was using `dep::plot_pca`, but I will replace it with plotly.
domingue, domingue

ms_workflow: add % of NAs to results table
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/61
2019-08-09T09:58:43Z domingue
@herseman suggested that it is more informative to know the % of samples from which a protein is missing, and to have this information for both conditions being compared, since it could indicate major differences between the conditions.
domingue, domingue

ms_workflow: add intensities to results table
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/60
2019-08-09T09:35:40Z domingue
Somehow the protein intensities for each sample (and the average for each condition) are missing. Fix this.
domingue, domingue