ngs_tools issues

ngs_tools issues https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues 2019-02-12T10:35:31Z https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/1 make sure that exported tables contain double-normalized counts and not just ... 2019-02-12T10:35:31Z brandl brandl@mpi-cbg.de

make sure that exported tables contain double-normalized counts and not just size-normalized ones

rna_seq brandl brandl@mpi-cbg.de brandl brandl@mpi-cbg.de https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/87 Move workflows to separate repositories 2020-03-20T08:32:41Z domingue

Move workflows to separate repositories

The goal is to keep the NGS tools repo tidy and focused on bulk NGS workflows. We already started the process by having the single-cell workflow in a separate repo, but the ms_workflow is still here. Ideally we should be able to:

1. copy the ms_workflow in its current state to a separate repo for further development
2. transfer the commits and versioning history as well
3. keep the current ms_workflow in ngs_tools to avoid breaking projects

domingue domingue https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/80 MS: How is imputation done? 2020-01-24T09:44:01Z domingue

MS: How is imputation done?

- imputation:
  + applied only to zero counts?
  + How about NAs?
  + Does it also affect non-zero values?
  + are there (good) guidelines on how to choose the methods?

Check the `MSnbase` package and the imputation vignette of `DEP`.

domingue domingue https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/69 ms_workflow: bug in heatmap clustering 2019-08-28T09:04:01Z domingue
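The imputation questions (issue 80) can be made concrete on a toy matrix. The sketch below is not how the workflow, `MSnbase` or `DEP` impute. It is a minimal base-R illustration that assumes zeros are first recoded as NA and that every missing value is replaced by the smallest observed intensity of the same sample. The helper name `impute_col_min` and the data are hypothetical.

```r
# Toy intensity matrix: rows are proteins, columns are samples.
# Both 0 and NA occur.
intens <- matrix(
  c(10, 0, 12,
    NA, 8,  9,
     7, 6,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3"), c("s1", "s2", "s3"))
)

# Hypothetical helper: treat zeros as missing, then fill every missing
# value with the smallest observed value of the same sample (column).
impute_col_min <- function(m) {
  m[!is.na(m) & m == 0] <- NA          # zeros become NA (assumption)
  was_missing <- is.na(m)              # remember which cells get imputed
  for (j in seq_len(ncol(m))) {
    m[was_missing[, j], j] <- min(m[, j], na.rm = TRUE)
  }
  list(values = m, imputed = was_missing)
}

res <- impute_col_min(intens)

# Observed non-zero values are left untouched by this scheme.
observed <- !res$imputed
stopifnot(all(res$values[observed] == intens[observed]))
print(res$values)
```

Keeping the `imputed` mask alongside the values makes it possible to answer "which values were changed?" for any method that is swapped in later.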

ms_workflow: bug in heatmap clustering

We are feeding the pre-calculated Euclidean distances, computed outside the plotting function, as the input matrix, but `d3heatmap`, and `pheatmap` for that matter, will also calculate distances internally, leading to overclustering. Fix it. Lines: https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/ms_workflow/02-ms-DEP-analysis.R#L482

mass_spec domingue domingue https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/59 MS workflow: improvements 2019-08-07T14:44:34Z domingue
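The heatmap clustering bug (issue 69) can be reproduced without any plotting package. The sketch below uses only base R (`dist`, `hclust`) on random toy data. It is not the workflow code, and only illustrates what happens when a distance matrix is passed where the raw abundance matrix is expected.

```r
set.seed(1)
# Toy abundance matrix: 6 samples (rows) x 20 proteins (columns).
abund <- matrix(rnorm(120), nrow = 6,
                dimnames = list(paste0("sample", 1:6), NULL))

# Intended: Euclidean distances between samples, computed once.
d_once <- dist(abund, method = "euclidean")

# The bug: the distance matrix itself is handed over as "data", so the
# plotting function computes distances between rows of distances.
d_twice <- dist(as.matrix(d_once), method = "euclidean")

hc_once  <- hclust(d_once)
hc_twice <- hclust(d_twice)

# The two sets of distances differ, so the trees are built from
# different input.
identical_dists <- isTRUE(all.equal(as.vector(d_once), as.vector(d_twice)))
print(identical_dists)
```

One possible fix is to pass the raw matrix and let the plotting function compute distances once. Another is to supply the precomputed `dist` object through the function's dedicated clustering-distance argument, if it has one. This should be checked against the `pheatmap` and `d3heatmap` documentation.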

MS workflow: improvements

Whilst the workflow is running, it is still experimental, and it would be good if I could have a look at what is available in Bioconductor to improve it. I will also catch up on some literature for MS since I know very little about it.

domingue domingue https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/36 ms_workflow: add additional quality metrics to the differential abundance ana... 2019-03-08T13:24:24Z herseman

ms_workflow: add additional quality metrics to the differential abundance analysis output

- take the intensities of the standard as a minimal threshold to mark low-abundant proteins
- summarize MS/MS information as a numeric value per gene and condition (e.g. percentage of MS/MS identifications per gene across all replicates); additionally include some summary of identification types per sample in the ms_data_prep.R script
- report for which entries the proteinGroups order has been changed by sorting
- add information per gene on whether the value for any replicate per condition has been imputed

mass_spec herseman herseman https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/60 ms_workflow: add intensities to results table 2019-08-09T09:35:40Z domingue

ms_workflow: add intensities to results table

Somehow the protein intensities for each sample (and the average for each condition) are missing - fix this.

mass_spec domingue domingue https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/63 ms_workflow: add more explanations 2019-08-09T08:21:02Z domingue

ms_workflow: add more explanations

Currently the text explanations are very sparse. Things to add:

- what are uniq and prop proteins
- LFQ vs raw

mass_spec domingue domingue https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/61 ms_workflow: add % of NAs to results table 2019-08-09T09:58:43Z domingue

ms_workflow: add % of NAs to results table

@herseman suggested that it is more informative to know the % of samples from which a protein is missing, and to have this information for both conditions being compared - it could indicate major differences between the conditions.

mass_spec domingue domingue https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/30 ms_workflow: add option to run analysis on raw intensities instead of LFQs 2019-01-15T15:17:59Z herseman
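For the "% of NAs" request (issue 61), one way to get the percentage of samples per condition in which a protein is missing is sketched below in base R. The data, the `<condition>_<replicate>` column naming and the `pct_na_` prefix are assumptions made for illustration, not the workflow's actual layout.

```r
# Toy data: 3 proteins, two conditions (A, B) with 3 replicates each.
intens <- matrix(
  c( 5, NA,  6,  NA, NA, NA,
     4,  4,  5,   3,  4, NA,
    NA, NA,  2,   7,  8,  9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3"),
                  c("A_1", "A_2", "A_3", "B_1", "B_2", "B_3"))
)

# Hypothetical naming scheme: strip the replicate suffix to get the condition.
condition <- sub("_[0-9]+$", "", colnames(intens))

# Percentage of samples per condition in which each protein is missing.
pct_na <- sapply(split(seq_along(condition), condition), function(cols) {
  100 * rowMeans(is.na(intens[, cols, drop = FALSE]))
})
pct_na <- round(pct_na, 1)
colnames(pct_na) <- paste0("pct_na_", colnames(pct_na))
print(pct_na)
```

A protein such as `p1`, which is missing in every replicate of condition B but mostly present in A, stands out immediately when the two columns sit side by side in the results table.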

ms_workflow: add option to run analysis on raw intensities instead of LFQs

herseman herseman https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/68 ms_workflow: Add table with % of contaminants in the most abundant proteins 2019-08-26T08:24:52Z domingue

ms_workflow: Add table with % of contaminants in the most abundant proteins

This can be useful to exclude situations where the bulk of protein intensity comes from "contaminants". Since the contaminants are, to some extent, defined by MaxQuant, it could be that they are in fact of interest for the biological process being studied. We will leave the decision on how to handle contaminants to the project owner.

mass_spec domingue domingue https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/32 ms_workflow: aggregate protein isoforms data on gene level 2019-02-08T14:31:39Z herseman
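A table of contaminant share among the most abundant proteins (issue 68) could be computed along the lines below. This is a sketch under assumptions. The column `is_contaminant` is a hypothetical flag that would have to be derived from the contaminant marking in the MaxQuant output, and the `CON__` prefixes in the toy IDs are only illustrative.

```r
# Toy protein table with a hypothetical contaminant flag.
prot <- data.frame(
  protein_id     = c("P1", "CON__K1", "P2", "CON__K2", "P3", "P4"),
  intensity      = c(900, 800, 500, 100, 50, 10),
  is_contaminant = c(FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)
)

# Share of contaminants among the top-n most abundant proteins,
# by protein count and by summed intensity.
contaminant_share <- function(tbl, n) {
  top <- head(tbl[order(tbl$intensity, decreasing = TRUE), ], n)
  data.frame(
    top_n         = n,
    pct_proteins  = 100 * mean(top$is_contaminant),
    pct_intensity = 100 * sum(top$intensity[top$is_contaminant]) /
                      sum(top$intensity)
  )
}

shares <- do.call(rbind, lapply(c(2, 4), contaminant_share, tbl = prot))
print(shares)
```

Reporting both percentages keeps the decision with the project owner: a few contaminants by count can still carry a large share of the intensity.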

ms_workflow: aggregate protein isoforms data on gene level

- extract data on gene IDs and UniProt IDs from Ensembl
- check correlation of Ensembl and UniProt IDs
- check if it's possible to further aggregate protein groups based on gene level

mass_spec herseman herseman https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/37 ms_workflow: handling of contaminations 2019-12-10T08:11:26Z herseman

ms_workflow: handling of contaminations

- REV__ entries are nonsense sequences (there was a match against the reverse of an entry in the database of interest); REV__ entries can be removed right at the beginning of the script
- think about removal of keratin as a standard set-up (maybe Anna can provide us with a list of the most common contaminations and we can start by reporting those specifically)

mass_spec herseman domingue herseman https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/40 ms_workflow: handling of technical replicates 2019-12-09T16:32:07Z herseman
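The two points on handling contaminations (issue 37) can be sketched in base R as follows. The IDs and gene names are toy values. The `;` separator within a protein group and the `^KRT` pattern used to spot keratins are assumptions. The pattern stands in for a curated list of common contaminations.

```r
# Toy protein group IDs; a group may list several IDs separated by ";".
protein_ids <- c("ID1", "REV__ID2", "ID3;ID4", "REV__ID5;REV__ID6")
gene_names  <- c("KRT1", NA, "ACTB", NA)

# Drop groups containing a reversed-database hit right at the start.
is_reverse <- grepl("(^|;)REV__", protein_ids)
kept <- data.frame(protein_ids, gene_names)[!is_reverse, ]

# Report (rather than remove) keratins for now.
kept$is_keratin <- grepl("^KRT", kept$gene_names)
print(kept)
```

Flagging instead of deleting keeps the keratin rows available for a contaminant report while the standard set-up is still being decided.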

ms_workflow: handling of technical replicates

So far technical replicates are not taken into account, and we only process MaxQuant outputs with a single technical replicate. It would be nice to integrate the information on technical variability, but this should of course be verified with an appropriate test data set. For now we should at least modify our pipeline so that it averages technical replicates as part of the workflow and reports the technical variation.

mass_spec herseman domingue herseman https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/28 ms_workflow: Improvements 2019-12-10T08:25:51Z brandl brandl@mpi-cbg.de
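Averaging technical replicates and reporting their variation (issue 40) might look like the base-R sketch below. The long-format layout, the column names and the choice of the coefficient of variation as the measure of technical variation are assumptions. The issue does not prescribe any of them.

```r
# Long-format toy data: one intensity per protein, sample and technical replicate.
ms <- data.frame(
  protein   = rep(c("p1", "p2"), each = 4),
  sample    = rep(rep(c("ctrl_1", "treat_1"), each = 2), times = 2),
  tech_rep  = rep(1:2, times = 4),
  intensity = c(100, 110, 200, 190, 50, 70, 80, 80)
)

# Average technical replicates per protein and biological sample ...
avg <- aggregate(intensity ~ protein + sample, data = ms, FUN = mean)

# ... and report the technical variation as a coefficient of variation (%).
cv <- aggregate(intensity ~ protein + sample, data = ms,
                FUN = function(x) 100 * sd(x) / mean(x))

tech_summary <- merge(avg, cv, by = c("protein", "sample"),
                      suffixes = c("_mean", "_cv_pct"))
print(tech_summary)
```

The averaged values can then feed into the existing single-replicate pipeline, while the CV column gives a first impression of technical noise per protein.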

ms_workflow: Improvements

- [x] separate protein groups
- [x] report 0-proportions similar to NAs
- [x] expose renaming scheme as an argument
- [ ] `protein_acc` extraction depends on study (with/without name, w/o separator). Need to generify `protein_acc=str_split_fixed(protein_ids, "[|]", 3)`. One way: `--extract extrac_acc.R` which defines extractor function -> `protein_acc=extract_fun(protein_ids)`
- [x] detect presence/absence of identification types --> conditional execution of corresponding bits
- [ ] try to postpone annotation handling to end of data-prep workflow/analysis
- [x] fix result table links _#' [identSummary](`r add_prefix("ident_types_summary.txt")`)_
- [x] How to auto-adjust imputation proportion?
- [x] We just need imputation because we want an abundance matrix for limma. For t-tests neither NA->0 nor imputations are required because we can work with long data.

limma

* also show batch-corrected QC plots (clustering, PCA)
* _all of this needs to be looked through at some point_
* why does the voom with a `~condition` design fix the sample clustering in `file:///Volumes/project-stepien/stepien_ms_fractions/data/limma/p_vs_s/dge_limma.html`
* add condition/sample colors to _voom vs raw_ plot
* externalize annotation of results
* why do MA plots look so weird?

![image](/uploads/bcf9f12174dc8d16f35795496ce46856/image.png)

mass_spec herseman herseman https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/66 ms_workflow: improve the results heatmap 2019-08-23T13:54:28Z domingue
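For the open `protein_acc` item in issue 28, a study-specific extractor could be as small as the base-R sketch below. The three ID layouts it handles and the rule "second field if there are at least three `|`-separated fields, otherwise the first" are assumptions for illustration. Only the name `extract_fun` comes from the issue.

```r
# Hypothetical extractor: returns the accession from IDs such as
# "sp|P12345|NAME_HUMAN", "P12345|NAME_HUMAN" or plain "P12345".
extract_fun <- function(protein_ids) {
  parts <- strsplit(protein_ids, "|", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) >= 3) p[2] else p[1]
  }, character(1))
}

protein_ids <- c("sp|P12345|NAME_HUMAN", "P12345|NAME_HUMAN", "P12345")
protein_acc <- extract_fun(protein_ids)
print(protein_acc)
```

A file passed through the proposed `--extract` option would only need to define `extract_fun` in this way. The data-prep script could then load it with `source()` and call `extract_fun(protein_ids)` without knowing the study's ID format.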

ms_workflow: improve the results heatmap

Suggested by Olya:

- [x] add gene names to the rows
- [x] include plots as long as there is _any_ hit
- [ ] move legend to the bottom

Some of the text accompanying the plots is now showing.

mass_spec domingue domingue https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/45 ms_workflow: move to peptide level 2019-12-10T08:27:03Z herseman

ms_workflow: move to peptide level

try and verify the following:

- apply filter for label-free quantification to get rid of mis-cleaved and modified peptides
- use only proteotypic peptides (matching only 1 protein in the set)
- use only MS/MS and by_matching and only the top 3 hits

herseman domingue herseman https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/41 ms_workflow: ms_limma.R returns error when no differentially abundant protein... 2019-02-14T14:15:01Z herseman
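Two of the peptide-level filters from issue 45 (proteotypic peptides only, top 3 per protein) are sketched below in base R. The table layout is hypothetical, and reading "top 3 hits" as "the three most intense peptides per protein" is an assumption. The issue does not say how hits are ranked.

```r
# Toy peptide table; `proteins` lists all matching proteins, ";"-separated.
pep <- data.frame(
  peptide   = paste0("pep", 1:7),
  proteins  = c("A", "A", "A", "A", "A;B", "B", "B"),
  intensity = c(10, 40, 30, 20, 99, 5, 15)
)

# Proteotypic: the peptide maps to exactly one protein in the set.
n_prot <- lengths(strsplit(pep$proteins, ";", fixed = TRUE))
proteotypic <- pep[n_prot == 1, ]

# Keep the 3 most intense proteotypic peptides per protein (assumed ranking).
top3 <- do.call(rbind, lapply(split(proteotypic, proteotypic$proteins),
  function(d) head(d[order(d$intensity, decreasing = TRUE), ], 3)
))
print(top3)
```

The shared peptide `pep5` is dropped despite having the highest intensity, which is the intended effect of restricting to proteotypic peptides.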

ms_workflow: ms_limma.R returns error when no differentially abundant proteins were found

mass_spec herseman herseman https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/86 ms_workflow: ms_ms_prop and reorder information are missing for protein IDs w... 2020-03-17T15:07:56Z herseman

ms_workflow: ms_ms_prop and reorder information are missing for protein IDs without fasta_header information

mass_spec herseman herseman https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/64 ms_workflow: QC improvements 2019-08-13T13:55:45Z domingue

ms_workflow: QC improvements

During a meeting with Olya, she mentioned that our QC looked very much like that of a package created by the Kempa lab. There is an `R` package and a [paper](https://doi.org/10.1021/acs.jproteome.5b00780) to go with it, which I will peruse to see if there is something we could use for our workflow.

mass_spec domingue domingue