ngs_tools issues
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues
2020-07-13T09:28:16Z

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/79
updating ms_workflow
2020-07-13T09:28:16Z
herseman
herseman, domingue, herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/86
ms_workflow: ms_ms_prop and reorder information are missing for protein IDs without fasta_header information
2020-03-17T15:07:56Z
herseman
herseman, herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/73
Implement renvs
2020-02-20T14:49:29Z
domingue
This started with packrat in https://git.mpi-cbg.de/bioinfo/ngs_tools/issues/53# and, after some testing, I decided it was worth adding as a function to our workflow. Still under testing.
domingue, domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/83
ms_workflow: refine reorder information of protein groups
2020-02-18T14:07:47Z
herseman
Currently, we only report whether a protein group was reordered prior to merging of the tables. This does not tell us whether the (alphabetical) reordering took place in all samples, so that the groups, although reordered, were originally all the same, or whether protein groups of individual samples were merged only because they could be matched after reordering but differed in their original protein ID orders.
herseman, herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/28
ms_workflow: Improvements
2019-12-10T08:25:51Z
brandl brandl@mpi-cbg.de
- [x] separate protein groups
- [x] report 0-proportions similar to NAs
- [x] expose renaming scheme as an argument
- [ ] `protein_acc` extraction depends on the study (with/without name, w/o separator). Need to generalize `protein_acc=str_split_fixed(protein_ids, "[|]", 3)`. One way: `--extract extrac_acc.R`, which defines an extractor function -> `protein_acc=extract_fun(protein_ids)`
- [x] detect presence/absence of identification types --> conditional execution of the corresponding bits
- [ ] try to postpone annotation handling to the end of the data-prep workflow/analysis
- [x] fix result table links _#' [identSummary](`r add_prefix("ident_types_summary.txt")`)_
- [x] How to auto-adjust imputation proportion?
- [x] We only need imputation because we want an abundance matrix for limma. For t-tests neither NA->0 nor imputation is required because we can work with long data.
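
The open `protein_acc` item above proposes a user-supplied extractor function. A minimal base-R sketch of what such an `extrac_acc.R` might contain follows; the `sep` and `field` parameters and the assumed `db|ACCESSION|NAME` layout are illustrative assumptions, not part of the existing workflow.

```R
# Sketch of an extractor file as proposed in the checklist item.
# Assumption: IDs look like "db|ACCESSION|NAME"; IDs without a separator
# are returned unchanged. `sep` and `field` are hypothetical parameters.
extract_fun <- function(protein_ids, sep = "|", field = 2) {
  parts <- strsplit(protein_ids, sep, fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) == 0) {
      NA_character_          # empty input string
    } else if (length(p) >= field) {
      p[[field]]             # accession field is present
    } else {
      p[[1]]                 # no separator: keep the ID as it is
    }
  }, character(1))
}

# Usage, mirroring the call proposed in the issue:
protein_ids <- c("sp|P12345|AATM_RABIT", "Q9XYZ1")
protein_acc <- extract_fun(protein_ids)
```

Keeping the extraction behind one function means a study with a different header layout only needs to swap this file, not the workflow script.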
limma
* also show batch-corrected QC plots (clustering, PCA)
* _this all needs a thorough look-through at some point_
* why does voom with a `~condition` design fix the sample clustering in `file:///Volumes/project-stepien/stepien_ms_fractions/data/limma/p_vs_s/dge_limma.html`?
* add condition/sample colors to the _voom vs raw_ plot
* externalize annotation of results
* why do the MA plots look so weird?
![image](/uploads/bcf9f12174dc8d16f35795496ce46856/image.png)
herseman, herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/37
ms_workflow: handling of contaminations
2019-12-10T08:11:26Z
herseman
- REV__ entries are nonsense sequences (there was a match against the reverse of an entry in the database of interest); REV__ entries can be removed right at the beginning of the script
- think about removing keratin as a standard set-up (maybe Anna can provide us with a list of the most common contaminations and we can start by reporting those specifically)
herseman, domingue, herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/65
DEP installation on falcon - nc-config missing
2019-09-17T07:17:00Z
domingue
I was trying to install an R/BioC package on falcon r/3.5.1, and ran into a dependency issue:
```R
* installing *source* package ‘ncdf4’ ...
** package ‘ncdf4’ successfully unpacked and MD5 sums checked
configure.ac: starting
checking for nc-config... no
-----------------------------------------------------------------------------------
Error, nc-config not found or not executable. This is a script that comes with the
netcdf library, version 4.1-beta2 or later, and must be present for configuration
to succeed.
If you installed the netcdf library (and nc-config) in a standard location, nc-config
should be found automatically. Otherwise, you can specify the full path and name of
the nc-config script by passing the --with-nc-config=/full/path/nc-config argument
flag to the configure script. For example:
./configure --with-nc-config=/sw/dist/netcdf4/bin/nc-config
Special note for R users:
-------------------------
To pass the configure flag to R, use something like this:
R CMD INSTALL --configure-args="--with-nc-config=/home/joe/bin/nc-config" ncdf4
where you should replace /home/joe/bin etc. with the location where you have
installed the nc-config script that came with the netcdf 4 distribution.
-----------------------------------------------------------------------------------
ERROR: configuration failed for package ‘ncdf4’
```
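
The log suggests passing `--with-nc-config` to the configure script. From within R this can be done through the `configure.args` argument of `install.packages()`; the sketch below only builds the flag, and both the helper name and the example path are hypothetical placeholders rather than real locations on falcon.

```R
# Hypothetical helper: build the configure flag for ncdf4.
# By default it looks for nc-config on the PATH; a full path to a
# user-local netcdf installation can be passed instead.
ncdf4_configure_args <- function(nc_config = Sys.which("nc-config")) {
  nc_config <- unname(nc_config)
  if (!nzchar(nc_config)) {
    stop("nc-config not found on PATH; pass its full path explicitly")
  }
  paste0("--with-nc-config=", nc_config)
}

# Usage (not run here; the path is a placeholder):
# install.packages("ncdf4",
#                  configure.args = ncdf4_configure_args("/path/to/netcdf/bin/nc-config"))
```

This only helps if a netcdf installation that ships `nc-config` is reachable without root, for example one provided by the cluster's software stack.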
I had this issue when installing the same package on my own computer, also running Linux, and if I remember correctly the solution was to install some missing libraries with sudo. I don't have sudo on falcon, so I contacted HPC support to get it fixed.
domingue, domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/69
ms_workflow: bug in heatmap clustering
2019-08-28T09:04:01Z
domingue
We calculate the Euclidean distances outside the plotting function and feed them in as the input matrix, but `d3heatmap`, and `pheatmap` for that matter, will also calculate distances internally, leading to overclustering. Fix it.
Lines:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/ms_workflow/02-ms-DEP-analysis.R#L482
domingue, domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/68
ms_workflow: Add table with % of contaminants in the most abundant proteins
2019-08-26T08:24:52Z
domingue
This can be useful to exclude situations where the bulk of protein intensity comes from "contaminants". Since the contaminants are, to some extent, defined by MaxQuant, they could in fact be of interest for the biological process being studied. We will leave the decision on how to handle contaminants to the project owner.
domingue, domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/64
ms_workflow: QC improvements
2019-08-13T13:55:45Z
domingue
During a meeting, Olya mentioned that our QC looked very much like that of a package created by the Kempa lab. There is an `R` package and a [paper](https://doi.org/10.1021/acs.jproteome.5b00780) to go with it, which I will peruse to see if there is something we could use for our workflow.
domingue, domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/62
ms_workflow: replace PCA with plotly
2019-08-09T14:40:25Z
domingue
Currently, using colour and shape to define condition / replicate is neither visually pleasant nor readable. I was using `dep::plot_pca` but I will replace it with plotly.
domingue, domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/61
ms_workflow: add % of NAs to results table
2019-08-09T09:58:43Z
domingue
@herseman suggested that it is more informative to know the % of samples from which a protein is missing, and to have this information for both conditions being compared; it could indicate major differences between the conditions.
domingue, domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/60
ms_workflow: add intensities to results table
2019-08-09T09:35:40Z
domingue
Somehow the protein intensities for each sample (and the average for each condition) are missing. Fix this.
domingue, domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/63
ms_workflow: add more explanations
2019-08-09T08:21:02Z
domingue
Currently the text explanations are very sparse. Things to add:
- what are uniq and prop proteins
- LFQ vs raw
domingue, domingue

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/36
ms_workflow: add additional quality metrics to the differential abundance analysis output
2019-03-08T13:24:24Z
herseman
- take the intensities of the standard as a minimal threshold to mark low-abundance proteins
- summarize MS/MS information as a numeric value per gene and condition (e.g. percentage of MS/MS identifications per gene across all replicates); additionally include some summary of identification types per sample in the ms_data_prep.R script
- report for which entries the proteinGroups order has been changed by sorting
- add information per gene on whether the value for any replicate per condition has been imputed
herseman, herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/41
ms_workflow: ms_limma.R returns error when no differentially abundant proteins were found
2019-02-14T14:15:01Z
herseman
herseman, herseman

https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/33
ms_workflow: report settings from MaxQuant log file
2019-02-12T10:40:41Z
herseman
herseman, herseman
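
The ms_limma.R issue above (#41) gives no details of the error, so the following is only a generic guard under stated assumptions: the results are a data frame with limma's `adj.P.Val` column (as returned by `topTable()`), and the helper name and `alpha` parameter are hypothetical.

```R
# Hypothetical guard: return the significant rows, or NULL with a message
# when no protein passes the threshold, so downstream steps can be skipped.
filter_hits <- function(results, alpha = 0.05) {
  keep <- !is.na(results$adj.P.Val) & results$adj.P.Val < alpha
  hits <- results[keep, , drop = FALSE]
  if (nrow(hits) == 0) {
    message("No differentially abundant proteins at alpha = ", alpha,
            "; skipping downstream tables and plots.")
    return(invisible(NULL))
  }
  hits
}

# Usage with a toy results table (values are made up for illustration):
toy <- data.frame(protein = c("A", "B"), adj.P.Val = c(0.01, 0.80))
hits <- filter_hits(toy)
if (!is.null(hits)) {
  print(hits)
}
```

Returning `NULL` rather than an empty data frame makes the "nothing found" case explicit for the caller, which then decides whether to render the remaining report sections.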