ngs_tools issues
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues (feed updated 2019-02-12T10:35:31Z)

#1 make sure that exported tables contain double-normalized counts and not just size-normalized ones
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/1 (updated 2019-02-12T10:35:31Z, author: brandl, brandl@mpi-cbg.de)

Assignee: brandl, brandl@mpi-cbg.de

#87 Move workflows to separate repositories
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/87 (updated 2020-03-20T08:32:41Z, author: domingue)

The goal is to keep the NGS tools repo tidy and focused on bulk NGS workflows. We already started the process by moving the single-cell workflow to a separate repo, but the ms_workflow is still here.
Ideally we should be able to:
1. copy the ms_workflow in its current state to a separate repo for further development
2. the commits and versioning history should be transferred as well
3. the current ms_workflow stays in ngs_tools to avoid breaking existing projects.

Assignee: domingue

#80 MS: How is imputation done?
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/80 (updated 2020-01-24T09:44:01Z, author: domingue)

- imputation:
+ applied only to zero counts?
+ How about NAs?
+ Does it also affect non-zero values?
+ Are there (good) guidelines on how to choose the methods? Check the `MSnbase` package and the imputation vignette of `DEP`.

Assignee: domingue

#69 ms_workflow: bug in heatmap clustering
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/69 (updated 2019-08-28T09:04:01Z, author: domingue)

We pass Euclidean distances, pre-calculated outside the plotting function, as the input matrix, but `d3heatmap` (and `pheatmap`, for that matter) also calculates distances internally, which leads to overclustering. Fix it.
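A minimal Python sketch of why this overclusters (plain Python rather than the workflow's R code; the toy matrix and the helper name `euclidean_distances` are made up for illustration): computing Euclidean distances on a matrix that already holds Euclidean distances yields a different "distance of distances" matrix, so the dendrogram no longer reflects the original sample distances. The fix is to compute distances exactly once, either by passing the intensity matrix itself to the plotting function or, if the function supports it, by handing over a precomputed distance object (as far as we know, `pheatmap` accepts a `dist` object in `clustering_distance_rows`).

```python
import math

def euclidean_distances(rows):
    """Return the matrix of pairwise Euclidean distances between the rows of `rows`."""
    return [[math.dist(a, b) for b in rows] for a in rows]

# Toy log-intensity matrix: 3 samples (rows) x 4 proteins (columns); values are made up.
intensities = [
    [1.0, 2.0, 3.0, 4.0],
    [1.5, 2.5, 2.0, 4.5],
    [8.0, 1.0, 0.5, 2.0],
]

# Correct: distances are computed once, directly from the data.
dist_once = euclidean_distances(intensities)

# Bug pattern described in the issue: the precomputed distance matrix is handed to a
# heatmap function as if it were data, and the function computes distances again.
dist_twice = euclidean_distances(dist_once)

for label, matrix in (("once", dist_once), ("twice", dist_twice)):
    print(label, [[round(value, 2) for value in row] for row in matrix])
```

Comparing the two printed matrices shows that the second pass changes every off-diagonal distance, which is what distorts the clustering.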
Lines:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/ms_workflow/02-ms-DEP-analysis.R#L482

Assignee: domingue

#59 MS workflow: improvements
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/59 (updated 2019-08-07T14:44:34Z, author: domingue)

Although the workflow runs, it is still experimental, and it would be good if I could look at what is available in Bioconductor to improve it.
I will also catch up on some of the MS literature, since I know very little about it.

Assignee: domingue

#36 ms_workflow: add additional quality metrics to the differential abundance analysis output
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/36 (updated 2019-03-08T13:24:24Z, author: herseman)

- take the intensities of the standard as a minimal threshold to mark low-abundance proteins
- summarize MS/MS information as a numeric value per gene and condition (e.g. percentage of MS/MS identifications per gene across all replicates); additionally, include a summary of identification types per sample in the ms_data_prep.R script
- report for which entries the proteinGroups order has been changed by sorting
- add information per gene on whether the value for any replicate per condition has been imputed

Assignee: herseman

#60 ms_workflow: add intensities to results table
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/60 (updated 2019-08-09T09:35:40Z, author: domingue)

Somehow the protein intensities for each sample (and the average for each condition) are missing. Fix this.

Assignee: domingue

#63 ms_workflow: add more explanations
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/63 (updated 2019-08-09T08:21:02Z, author: domingue)

Currently the text explanations are very sparse. Things to add:
- what are uniq and prop proteins
- LFQ vs raw

Assignee: domingue

#61 ms_workflow: add % of NAs to results table
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/61 (updated 2019-08-09T09:58:43Z, author: domingue)

@herseman suggested that it is more informative to know the % of samples in which a protein is missing, and to have this information for both conditions being compared; it could indicate major differences between the conditions.

Assignee: domingue

#30 ms_workflow: add option to run analysis on raw intensities instead of LFQs
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/30 (updated 2019-01-15T15:17:59Z, author: herseman)

Assignee: herseman

#68 ms_workflow: Add table with % of contaminants in the most abundant proteins
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/68 (updated 2019-08-26T08:24:52Z, author: domingue)

This can be useful to exclude situations where the bulk of protein intensity comes from "contaminants". Since the contaminants are, to some extent, defined by MaxQuant, they could in fact be of interest for the biological process being studied. We will leave the decision on how to handle contaminants to the project owner.

Assignee: domingue

#32 ms_workflow: aggregate protein isoforms data on gene level
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/32 (updated 2019-02-08T14:31:39Z, author: herseman)

- extract data on gene IDs and UniProt IDs from Ensembl
- check correlation of Ensembl and UniProt IDs
- check whether it is possible to further aggregate protein groups at the gene level

Assignee: herseman

#37 ms_workflow: handling of contaminations
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/37 (updated 2019-12-10T08:11:26Z, author: herseman)

- REV__ entries are nonsense sequences (they mean there was a match against the reverse of an entry in the database of interest); REV__ entries can be removed right at the beginning of the script
- think about removing keratin as part of the standard set-up (maybe Anna can provide us with a list of the most common contaminants, and we can start by reporting those specifically)

Assignees: herseman, domingue

#40 ms_workflow: handling of technical replicates
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/40 (updated 2019-12-09T16:32:07Z, author: herseman)

So far, technical replicates are not taken into account, and we only process MaxQuant outputs with a single technical replicate. It would be nice to integrate the information on technical variability, but this should of course be verified with an appropriate test data set. For now, we should at least modify our pipeline so that it averages technical replicates as part of the workflow and reports the technical variation.

Assignees: herseman, domingue

#28 ms_workflow: Improvements
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/28 (updated 2019-12-10T08:25:51Z, author: brandl, brandl@mpi-cbg.de)

- [x] separate protein groups
- [x] report 0-proportions similar to NAs
- [x] expose renaming scheme as an argument
- [ ] `protein_acc` extraction depends on the study (with/without name, w/o separator). Need to generalize `protein_acc=str_split_fixed(protein_ids, "[|]", 3)`. One way: `--extract extrac_acc.R`, which defines an extractor function -> `protein_acc=extract_fun(protein_ids)`
- [x] detect presence/absence of identification types --> conditional execution of the corresponding bits
- [ ] try to postpone annotation handling to the end of the data-prep workflow/analysis
- [x] fix result table links _#' [identSummary](`r add_prefix("ident_types_summary.txt")`)_
- [x] How to auto-adjust imputation proportion?
- [x] We just need imputation because we want an abundance matrix for limma. For t-tests, neither NA -> 0 nor imputation is required because we can work with long-format data.
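One way to generalize the open `protein_acc` item, sketched in Python rather than the workflow's R (the names `split_pipe_accession`, `extract_accessions` and `underscore_accession`, and the `P02768_ALBU` ID format, are hypothetical): the default extractor mirrors `str_split_fixed(protein_ids, "[|]", 3)` by splitting on `|` into at most three fields and, assuming UniProt-style `db|accession|entry_name` identifiers, returning the second field; identifiers without a separator are returned unchanged. A study-specific extractor, which the proposed `--extract` option would load from a script, is then just another function with the same one-string-in, one-string-out signature.

```python
from typing import Callable, Iterable, List

# An extractor maps one protein identifier to one accession.
Extractor = Callable[[str], str]

def split_pipe_accession(protein_id: str) -> str:
    """Default extractor for UniProt-style IDs such as 'sp|P02768|ALBU_HUMAN'.

    Splits on '|' into at most three fields (like str_split_fixed(..., "[|]", 3))
    and returns the second field; IDs without '|' are returned unchanged.
    """
    fields = protein_id.split("|", 2)
    return fields[1] if len(fields) > 1 else protein_id

def extract_accessions(protein_ids: Iterable[str],
                       extractor: Extractor = split_pipe_accession) -> List[str]:
    """Apply a pluggable extractor to every protein ID."""
    return [extractor(protein_id) for protein_id in protein_ids]

# Hypothetical study-specific format 'P02768_ALBU': keep the part before the first underscore.
def underscore_accession(protein_id: str) -> str:
    return protein_id.split("_", 1)[0]

print(extract_accessions(["sp|P02768|ALBU_HUMAN", "Q9Y6K9"]))  # ['P02768', 'Q9Y6K9']
print(extract_accessions(["P02768_ALBU"], extractor=underscore_accession))  # ['P02768']
```

Keeping the extractor as a plain function argument means the rest of the data-prep code never needs to know which ID format a study uses.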
limma
* also show batch-corrected QC plots (clustering, PCA)
* _all of this needs to be reviewed at some point_
* why does voom with a `~condition` design fix the sample clustering in `file:///Volumes/project-stepien/stepien_ms_fractions/data/limma/p_vs_s/dge_limma.html`
* add condition/sample colors to _voom vs raw_ plot
* externalize annotation of results
* why do MA plots look so weird?
![image](/uploads/bcf9f12174dc8d16f35795496ce46856/image.png)

Assignee: herseman

#66 ms_workflow: improve the results heatmap
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/66 (updated 2019-08-23T13:54:28Z, author: domingue)

Suggested by Olya:
- [x] add gene names to the rows
- [x] include plots as long as there is _any_ hit
- [ ] move legend to the bottom
Some of the text accompanying the plots is now showing.

Assignee: domingue

#45 ms_workflow: move to peptide level
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/45 (updated 2019-12-10T08:27:03Z, author: herseman)

Try and verify the following:
- apply a filter for label-free quantification to get rid of missed-cleavage and modified peptides
- use only proteotypic peptides (matching only 1 protein in the set)
- use only MS/MS and by_matching identifications, and only the top 3 hits

Assignees: herseman, domingue

#41 ms_workflow: ms_limma.R returns error when no differentially abundant proteins were found
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/41 (updated 2019-02-14T14:15:01Z, author: herseman)

Assignee: herseman

#86 ms_workflow: ms_ms_prop and reorder information are missing for protein IDs without fasta_header information
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/86 (updated 2020-03-17T15:07:56Z, author: herseman)

Assignee: herseman

#64 ms_workflow: QC improvements
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/64 (updated 2019-08-13T13:55:45Z, author: domingue)

During a meeting, Olya mentioned that our QC looked very much like that of a package created by the Kempa lab. There is an `R` package and a [paper](https://doi.org/10.1021/acs.jproteome.5b00780) to go with it, which I will peruse to see whether there is something we could use for our workflow.

Assignee: domingue