ngs_tools issues: https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues (2020-09-23T13:41:04Z)

**#92 GSEA**
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/92 (2020-09-23T13:41:04Z, author: domingue)

Until now, GSEA has been done by @henry using API calls, which was not very efficient. While working on a [project](s://git.mpi-cbg.de/scicomp/bioinfo_team/alexaki_rnaseq_deg) I found that this can be done entirely with R packages:
- msigdb: provides the gene set annotations
- fgsea: performs the enrichment analysis
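For orientation, fgsea scores each gene set with the GSEA running-sum enrichment score over a ranked gene list. Below is a minimal base-R sketch of that score only, assuming named per-gene statistics (e.g. log2 fold changes) and a character vector of gene IDs for the set. `enrichment_score` is a hypothetical helper, not part of fgsea, and it does not compute the permutation-based p-values that fgsea reports; the exponent `p` plays the role of fgsea's `gseaParam` (default 1).

```r
# Hypothetical helper, for illustration only: GSEA-style running-sum
# enrichment score for one gene set. Not fgsea's implementation.
enrichment_score <- function(stats, gene_set, p = 1) {
  # Rank genes by their statistic, largest (most up-regulated) first
  stats <- sort(stats, decreasing = TRUE)
  in_set <- names(stats) %in% gene_set
  n_hit <- sum(in_set)
  n_miss <- length(stats) - n_hit
  if (n_hit == 0 || n_miss == 0) {
    stop("the gene set must cover some, but not all, ranked genes")
  }
  # Hits step up in proportion to |stat|^p, misses step down uniformly;
  # each walk sums to 1, so the running sum starts and ends at 0.
  weights <- abs(stats)^p
  hit_step <- ifelse(in_set, weights / sum(weights[in_set]), 0)
  miss_step <- ifelse(in_set, 0, 1 / n_miss)
  running <- cumsum(hit_step - miss_step)
  # The enrichment score is the signed maximum deviation from zero
  unname(running[which.max(abs(running))])
}

stats <- c(a = 3, b = 2, c = 1, d = -1, e = -2, f = -3)
enrichment_score(stats, c("a", "b"))  # set at the top of the list: returns 1
enrichment_score(stats, c("e", "f"))  # set at the bottom of the list: returns -1
```

A positive score means the set is concentrated among up-regulated genes, a negative one among down-regulated genes.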
Code for testing: https://git.mpi-cbg.de/domingue/test_gsea

Assignee: domingue

**#93 GSEA: add option to use custom gene sets**
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/93 (2020-09-09T10:13:36Z, author: domingue)

Sometimes researchers come with lists of genes taken from a paper that are not listed in MSigDB.

Assignee: domingue

**#99 gsea: better description**
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/99 (2020-11-19T15:10:43Z, author: domingue)

I got feedback that the plots are not very intuitive. We need to add better explanations to the report.

Assignee: domingue

**#94 gsea: bug, hard-coded species**
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/94 (2020-09-15T15:04:24Z, author: domingue)

The mouse database (org.Mm.eg.db) is hard-coded in the gsea.R script when converting Ensembl gene IDs into Entrez IDs.

Assignee: domingue

**#95 gsea: bug, in extra sets the first line is being read as header**
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/95 (2020-09-15T15:13:36Z, author: domingue)

This leads to one set missing from the analysis.

Assignee: domingue

**#103 gsea: fixing expression explorer**
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/103 (2021-04-08T13:38:56Z, author: gohr)

The expression explorer (EE) uses the package shinyjqui.
This package changed its data structures, which caused the EE not to work properly with the most recent version of the package, 0.4.0. I searched for the affected code and adapted the EE so that it runs with all versions of shinyjqui up to v0.4.0. I've added expression_explorer.yml with a minimal Conda environment to run the EE. In addition, I've changed the EE so that it no longer depends on devtools or additional SCF-internal scripts; this includes loading libraries with `library()` statements rather than `load_pack` statements.

Assignee: gohr

**#96 gsea: generate results for each contrast**
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/96 (2020-09-23T12:06:53Z, author: domingue)

Right now all genes are taken into account, because I only had a single-contrast experiment, but this is not always the case.

Assignee: domingue

**#101 GSEA: improve color code of KEGG pathways**
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/101 (2021-03-23T15:34:26Z, author: gohr)

KEGG pathways are overlaid with a color code that should represent differential gene expression. This code was flawed: some boxes/entities are associated with several genes, and the absolute maximum (abs.max) over those genes was color-encoded without indicating that several genes are involved or which gene's expression the color shows. The new version of cp_enrichment improves on this and now makes clear
i) whether several genes are associated with a box/entity
ii) what the differential expression of each of these genes is
iii) how these differential expressions are color-coded, in a coherent way that users can interpret.

Assignee: gohr

**#102 gsea: improvement of description**
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/102 (2021-04-08T13:39:07Z, author: gohr)

The current description is misleading about where up- and down-regulated genes are in the sorted list of analyzed genes. Genes are indeed sorted by differential expression from down-regulated to up-regulated, but GSEA inverts this list, so the output (plots) has the up-regulated genes at the beginning and the down-regulated genes at the end.
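To illustrate the two orderings, a minimal base-R sketch with made-up log2 fold changes (gene names and values are hypothetical): an ascending sort puts down-regulated genes first, whereas a GSEA-style ranking sorts in decreasing order, so up-regulated genes lead the list.

```r
# Hypothetical log2 fold changes, named by gene
log2fc <- c(geneA = -2.1, geneB = 0.3, geneC = 3.4, geneD = -0.8)

# Ascending order: down-regulated genes first (the sorted gene list)
ascending <- sort(log2fc)
names(ascending)  # "geneA" "geneD" "geneB" "geneC"

# GSEA-style ranking: decreasing order, up-regulated genes first
ranked <- sort(log2fc, decreasing = TRUE)
names(ranked)     # "geneC" "geneB" "geneD" "geneA"
```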
I've changed the GSEA description to be clearer on this point, and shortened it to be more precise.

Assignee: gohr

**#85 Hardcoded pvalue in plot**
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/85 (2020-04-15T10:41:53Z, author: domingue)

In the MA plot section "MA and Volcano plots" the description reads:
> Each gene is represented with a dot. Genes with an adjusted p value below a certain threshold are shown in cyan (True)
However, the code that adds the color uses `pvalue` instead of `padj`, with a hard-coded cutoff of `0.05`:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/featcounts_deseq_mf.R#L576
```r
deResults %>%
  ggplot(aes(0.5 * log2(mean_norm_count_1 * mean_norm_count_2),
             log2(mean_norm_count_2 / mean_norm_count_1),
             color = pvalue < 0.05)) +
  geom_point(alpha = 0.1) +
  geom_hline(yintercept = 0, color = "red") +
  facet_grid(condition_1 ~ condition_2)
```
![pvalue_maplot](/uploads/86b9de6b7faebb9ee3c7c3c8caa59200/pvalue_maplot.png)
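A minimal sketch of deriving the colour flag from a configurable cutoff instead of the hard-coded `pvalue < 0.05`, assuming `deResults` has `pvalue` and `padj` columns as in the plotting code above. `flag_hits` is a hypothetical helper; how the workflow itself computes `is_hit` from `qcutoff`/`pcutoff` may differ. The plotting call would then use `color = is_hit`.

```r
# Hypothetical helper: flag hits using either an adjusted p-value cutoff
# (qcutoff) or a raw p-value cutoff (pcutoff); NA means "not set".
flag_hits <- function(deResults, qcutoff = NA, pcutoff = NA) {
  if (!is.na(qcutoff)) {
    # Genes with NA p-values (e.g. filtered out) are treated as non-hits
    deResults$is_hit <- !is.na(deResults$padj) & deResults$padj < qcutoff
  } else if (!is.na(pcutoff)) {
    deResults$is_hit <- !is.na(deResults$pvalue) & deResults$pvalue < pcutoff
  } else {
    stop("set either qcutoff or pcutoff")
  }
  deResults
}

toy <- data.frame(pvalue = c(0.01, 0.04, 0.2, NA),
                  padj   = c(0.03, 0.12, 0.5, NA))
flag_hits(toy, qcutoff = 0.05)$is_hit  # TRUE FALSE FALSE FALSE
flag_hits(toy, pcutoff = 0.05)$is_hit  # TRUE TRUE FALSE FALSE
```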
**Expected behaviour**
Genes coloured by `is_hit`, which reflects the cutoff passed as an argument (`qcutoff` or `pcutoff`).

Assignee: domingue

**#76 iGenome permissions**
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/76 (2020-02-20T14:52:43Z, author: domingue)

Some of the iGenomes folders lack group write permissions, which means that only the owner can add, remove, or change files. Is this by design, or was it an accident?

Assignees: henry, herseman, domingue

**#18 impl better means to access summary and quality statististcs**
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/18 (2017-11-09T15:37:41Z, author: brandl, brandl@mpi-cbg.de)

Learn from https://sourceforge.net/projects/quickrnaseq
Example report: https://github.com/baohongz/QuickRNASeq

**#73 Implement renvs**
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/73 (2020-02-20T14:49:29Z, author: domingue)

It started with packrat in https://git.mpi-cbg.de/bioinfo/ngs_tools/issues/53# and, after some testing, I decided that it was worth adding it as a function to our workflow. Still under testing.

Assignee: domingue

**#11 Integrate tools to assess gc/3/5 bias, insert size, and capture efficiency**
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/11 (2020-02-20T12:42:13Z, author: brandl, brandl@mpi-cbg.de)

e.g. using http://deeptools.readthedocs.io/en/latest/content/tools/computeGCBias.html?highlight=bias
(see "Multi-perspective quality control of Illumina RNA sequencing data analysis", http://bfg.oxfordjournals.org/content/early/2016/09/28/bfgp.elw035.abstract)
@lakshman, opinion?

**#1 make sure that exported tables contain double-normalized counts and not just size-normalized ones**
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/1 (2019-02-12T10:35:31Z, author: brandl, brandl@mpi-cbg.de)

Assignee: brandl (brandl@mpi-cbg.de)

**#87 Move workflows to separate repositories**
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/87 (2020-03-20T08:32:41Z, author: domingue)

The goal is to keep the NGS tools repo tidy and focused on bulk NGS workflows. We already started the process by moving the single-cell workflow to a separate repo, but the ms_workflow is still here.
Ideally we should be able to:
1. copy the ms_workflow in its current state to a separate repo for further development
2. transfer the commit and versioning history as well
3. keep the current ms_workflow in ngs_tools to avoid breaking existing projects.

Assignee: domingue

**#80 MS: How is imputation done?**
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/80 (2020-01-24T09:44:01Z, author: domingue)

- imputation:
  + Is it applied only to zero counts?
  + What about NAs?
  + Does it also affect non-zero values?
  + Are there (good) guidelines on how to choose the method? Check the `MSnbase` package and the imputation vignette of `DEP`.

Assignee: domingue

**#69 ms_worflow: bug in heatmap clustering**
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/69 (2019-08-28T09:04:01Z, author: domingue)

We feed Euclidean distances, pre-calculated outside the plotting function, as the input matrix, but `d3heatmap` (and `pheatmap`, for that matter) also calculates distances internally, which leads to overclustering. Fix it.
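To make the double computation concrete, a minimal base-R sketch with a random matrix (dimensions and names are made up): when `as.matrix(dist(m))` is passed as the data, a function that clusters internally computes distances between rows of the distance matrix, which is not the same as clustering the original values. Computing the clustering once and passing the original data avoids this; `pheatmap` accepts an `hclust` object in `cluster_rows`, and its default row clustering uses Euclidean distance with complete linkage, so passing the raw matrix alone should also work.

```r
set.seed(1)
m <- matrix(rnorm(40), nrow = 8,
            dimnames = list(paste0("protein", 1:8), paste0("sample", 1:5)))

# Buggy pattern: the distance matrix is used as the data, so a heatmap
# function that clusters internally computes distances of distances.
d_data   <- dist(m)                   # Euclidean distances between proteins
d_double <- dist(as.matrix(dist(m)))  # what internal clustering would see

# Intended: cluster once on the original values (complete linkage) ...
hc_rows <- hclust(dist(m))
# ... and pass the data itself, e.g. pheatmap::pheatmap(m, cluster_rows = hc_rows)
```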
Lines:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/ms_workflow/02-ms-DEP-analysis.R#L482

Assignee: domingue

**#59 MS worflow: improvements**
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/59 (2019-08-07T14:44:34Z, author: domingue)

Although the workflow runs, it is still experimental, and it would be good to look at what is available in Bioconductor to improve it.
I will also catch up on some MS literature, since I know very little about it.

Assignee: domingue

**#36 ms_workflow: add additional quality metrics to the differential abundance analysis output**
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/36 (2019-03-08T13:24:24Z, author: herseman)

- take the intensities of the standard as a minimal threshold to mark low-abundant proteins
- summarize MS/MS information as a numeric value per gene and condition (e.g. the percentage of MS/MS identifications per gene across all replicates); additionally, include a summary of identification types per sample in the ms_data_prep.R script
- report for which entries the proteinGroups order has been changed by sorting
- add information per gene on whether the value of any replicate in a condition has been imputed

Assignee: herseman
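A minimal base-R sketch of per-gene, per-condition summaries along the lines of the first, second and fourth points, assuming a long-format table with hypothetical columns `gene`, `condition`, `intensity`, `msms_identified` (logical) and `imputed` (logical). The column names, and the use of the median replicate intensity for the low-abundance flag, are assumptions, not the ms_workflow's actual schema or rule.

```r
# Hypothetical helper: quality flags per gene and condition.
summarise_quality <- function(df, standard_intensity) {
  groups <- split(df, list(df$gene, df$condition), drop = TRUE)
  rows <- lapply(groups, function(g) {
    data.frame(
      gene = g$gene[1],
      condition = g$condition[1],
      # percentage of replicates with an MS/MS identification
      msms_pct = 100 * mean(g$msms_identified),
      # TRUE if any replicate value in this condition was imputed
      any_imputed = any(g$imputed),
      # assumption: low-abundant if the median replicate intensity
      # lies below the intensity of the standard
      low_abundant = median(g$intensity) < standard_intensity,
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  out <- out[order(out$gene, out$condition), , drop = FALSE]
  rownames(out) <- NULL
  out
}

toy <- data.frame(
  gene = rep(c("P1", "P2"), each = 3),
  condition = "ctrl",
  intensity = c(10, 12, 11, 2, 3, 1),
  msms_identified = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE),
  imputed = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)
summarise_quality(toy, standard_intensity = 5)
```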