ngs_tools issues
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues
2020-04-15T10:41:53Z

Hardcoded pvalue in plot
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/85
2020-04-15T10:41:53Z, domingue

In the MA plot section "MA and Volcano plots" the description reads:
> Each gene is represented with a dot. Genes with an adjusted p value below a certain threshold are shown in cyan (True)

However, the code that adds the color uses `pvalue` instead of `padj`, with a hardcoded threshold of `0.05`:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/featcounts_deseq_mf.R#L576
```r
deResults %>% ggplot(aes(0.5 * log2(mean_norm_count_1 * mean_norm_count_2), log2(mean_norm_count_2 / mean_norm_count_1), color = pvalue < 0.05)) +
geom_point(alpha = 0.1) +
geom_hline(yintercept = 0, color = "red") +
facet_grid(condition_1 ~ condition_2)
```
![pvalue_maplot](/uploads/86b9de6b7faebb9ee3c7c3c8caa59200/pvalue_maplot.png)
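One way the plotting call above could be corrected is sketched below. It assumes a hit flag is computed once from the adjusted p value and a user-supplied cutoff. The toy `deResults` table, the `qcutoff` value, and the way `is_hit` is derived here are illustrative assumptions, not taken from `featcounts_deseq_mf.R`. The plot is only built if `ggplot2` is installed.

```r
# Toy stand-in for the deResults table (column names follow the snippet above;
# the values are made up for illustration).
deResults <- data.frame(
  condition_1       = "ctrl",
  condition_2       = "treat",
  mean_norm_count_1 = c(10, 200, 35, 800),
  mean_norm_count_2 = c(40, 210, 30, 100),
  pvalue            = c(0.001, 0.60, 0.04, 0.0001),
  padj              = c(0.004, 0.80, 0.12, 0.0004)
)

# Assumption: the cutoff comes from the command-line arguments rather than
# being hardcoded in the plotting call.
qcutoff <- 0.01

# Flag hits once, based on the adjusted p value; genes without padj are not hits.
deResults$is_hit <- !is.na(deResults$padj) & deResults$padj < qcutoff

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Same MA plot as above, but colored by the precomputed hit flag.
  ma_plot <- ggplot(deResults, aes(
      0.5 * log2(mean_norm_count_1 * mean_norm_count_2),
      log2(mean_norm_count_2 / mean_norm_count_1),
      color = is_hit)) +
    geom_point(alpha = 0.5) +
    geom_hline(yintercept = 0, color = "red") +
    facet_grid(condition_1 ~ condition_2)
}
```

In the toy data, the third gene has `pvalue < 0.05` but is not a hit once the adjusted p value and the cutoff are used, which is the discrepancy this issue describes.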
**Expected behaviour**
Genes coloured by `is_hit`, which reflects the cut-off set in the arguments (`qcutoff` or `pcutoff`).

argparser error when --bam_files is the last optional argument
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/90
2020-07-01T14:58:02Z, domingue

Lena reported an error that occurs when the `--bam_files` option is placed as the last optional argument of `genic_counts.R`.
I will investigate.

dge_workflow: expression_explorer app failed to load due to renamed/additional annotation columns
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/91
2020-07-10T10:23:01Z, herseman

**Issues**:
- at least for the igenome `Homo_sapiens/Ensembl_v99` (*others were not tested*), running `featcounts_deseq_mf.R` with the `--gtf` flag results in empty gene descriptions, which have to be added manually; however, adding the information from biomaRt Ensembl produces a column named 'description' instead of 'gene_description' unless it is renamed manually, and this breaks the `expression_explorer` app, which expects a 'gene_description' column and not 'description'
- annotation columns (e.g. domain prediction) that are added to the dge results are not taken into account when the columns for further data summarization are selected in the `gather` functions

Stranded counts
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/88
2020-07-13T10:52:16Z, domingue

Change the function `dge_star_counts2matrix` to extract read counts based on the library stranding.

dge_workflow: future improvements
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/75
2020-07-20T08:54:19Z, domingue
Assignees: herseman, domingue

Whilst https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/featcounts_deseq_mf.R works well, it was developed a long time ago and some of the `DESeq2` functionality and best practices have changed. So has what we might want to add to or remove from the script.
In here we should list the things that we would like to change if we consider creating a new script for analysis.

GSEA: add option to use custom gene sets
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/93
2020-09-09T10:13:36Z, domingue

Sometimes researchers come with lists of genes that were taken from a paper and are not listed in MSigDB.

gsea: bug, hard-coded species
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/94
2020-09-15T15:04:24Z, domingue

The mouse database (org.Mm.eg.db) is hard-coded when converting Ensembl gene IDs into Entrez IDs in the gsea.R script.

gsea: bug, in extra sets the first line is being read as header
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/95
2020-09-15T15:13:36Z, domingue

This leads to one set missing from the analysis.

gsea: generate results for each contrast
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/96
2020-09-23T12:06:53Z, domingue

Right now all genes are taken into account because I only had a single-contrast experiment, but this is not always the case.

GSEA
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/92
2020-09-23T13:41:04Z, domingue

Until now GSEA has been done by @henry using API calls and it was not super efficient.
While working on a [project](https://git.mpi-cbg.de/scicomp/bioinfo_team/alexaki_rnaseq_deg) I found out that this could be done exclusively with R packages:
- msigdb, contains the annotations
- fgsea, does the enrichment analysis
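
A minimal sketch of what an enrichment run with `fgsea` could look like. The gene IDs, the two hand-written gene sets, and the `minSize`/`maxSize` values are made up for illustration; in practice the sets would come from the MSigDB annotations. The call assumes a recent `fgsea` version (older releases also required an `nperm` argument) and is skipped if the package is not installed.

```r
# Named vector of gene-level statistics (e.g. log2 fold changes); names are gene IDs.
set.seed(42)
gene_ids <- paste0("gene", 1:100)
stats <- setNames(rnorm(100), gene_ids)

# Ranked list, highest statistic first.
ranks <- sort(stats, decreasing = TRUE)

# Gene sets as a named list of character vectors (hypothetical sets).
pathways <- list(
  set_A = gene_ids[1:20],
  set_B = gene_ids[50:80]
)

if (requireNamespace("fgsea", quietly = TRUE)) {
  res <- fgsea::fgsea(pathways = pathways, stats = ranks,
                      minSize = 5, maxSize = 500)
  print(head(as.data.frame(res)))
}
```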
Code for testing in: https://git.mpi-cbg.de/domingue/test_gsea

Gene enrichment: script selects random genes if list too large
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/98
2020-10-07T14:53:05Z, domingue

The `cp_enr` function randomly samples genes if the list is larger than 1500 ([this bit](https://git.mpi-cbg.de/bioinfo/ngs_tools/-/blob/master/common/cp_utils.R#L123-125)).
Replace this function in the script until we start using the `corescf` [package](https://git.mpi-cbg.de/scicomp/bioinfo_team/corescf/).

gsea: improvements
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/97
2020-10-13T06:42:05Z, domingue

A few things to add:
- [x] include a table with the number of Entrez IDs per gene set at the beginning of the report. This would also help to see directly why some of the gene sets were not tested (i.e. because of too few genes) and could help to adjust the settings accordingly.
- [x] add how many genes we miss because no corresponding Entrez ID was found. We did something similar in the cp_enrichment.R script, where we give the percentage of ‘lost’ genes.
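
The two items above could be served by a small summary table. The sketch below uses a hypothetical Ensembl-to-Entrez mapping and made-up gene sets; `min_size` is an assumed minimum set size standing in for whatever threshold the script applies. It is an illustration, not the code used in the report.

```r
# Hypothetical mapping from Ensembl to Entrez IDs; NA means no Entrez ID was found.
id_map <- data.frame(
  ensembl = paste0("ENSG", 1:8),
  entrez  = c("101", "102", NA, "104", "105", NA, "107", "108"),
  stringsAsFactors = FALSE
)

# Percentage of genes lost because no Entrez ID was found.
pct_lost <- 100 * mean(is.na(id_map$entrez))

# Made-up gene sets, given as Entrez IDs.
gene_sets <- list(
  set_A = c("101", "102", "104", "105"),
  set_B = c("107", "999"),
  set_C = c("104", "105", "107", "108", "101")
)

# Number of Entrez IDs per set that are present among the analysed genes,
# and whether the set reaches the assumed minimum size.
min_size <- 3
mapped <- id_map$entrez[!is.na(id_map$entrez)]
set_table <- data.frame(
  gene_set = names(gene_sets),
  n_entrez = vapply(gene_sets, function(ids) sum(ids %in% mapped), integer(1)),
  row.names = NULL
)
set_table$tested <- set_table$n_entrez >= min_size

print(set_table)
cat(sprintf("%.1f%% of genes have no Entrez ID\n", pct_lost))
```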
Some issues to fix:
- [ ] when a single gene list is analysed and it has more genes than `--maxSize` or fewer genes than `--minSize`, it fails without a meaningful error. Add an error message so that it fails gracefully.
- [ ] related, add a table with the gene lists analysed, the number of genes per list, and whether they pass the thresholds.

gsea: better description
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/99
2020-11-19T15:10:43Z, domingue

I got feedback that the plots are not very intuitive. We need to add better explanations to the report.

exDesign: numeric vs categorical variables and shrinkage method
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/100
2021-02-23T07:51:33Z, gohr

- DESeq2 treats numeric experimental variables as numeric variables, not as discrete variables
- hence: all discrete variables should have values that are not numbers, e.g. litter1, litter2, litter3 instead of 1, 2, 3
- DESeq2: one should go through these steps:
  1. `contrast_oe <- c("sampletype", "MOV10_overexpression", "control")`
  2. `res_tableOE_unshrunken <- results(dds, contrast=contrast_oe, alpha = 0.05)`
  3. `res_tableOE <- lfcShrink(dds, contrast=contrast_oe, res=res_tableOE_unshrunken)`
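
The first two points of the list above can be seen in the design matrix that R builds from a formula. The sketch below uses a made-up `litter` variable and base R only; it illustrates the numeric-versus-categorical behaviour and is not part of the workflow script.

```r
# Made-up sample sheet: litter coded as numbers vs. as labels.
samples <- data.frame(
  litter_num = c(1, 1, 2, 2, 3, 3),
  litter_chr = c("litter1", "litter1", "litter2", "litter2", "litter3", "litter3")
)

# Numeric coding: a single slope column, i.e. a linear trend across litters.
mm_num <- model.matrix(~ litter_num, data = samples)

# Label coding: treated as categorical, with one indicator column
# per non-reference level.
mm_chr <- model.matrix(~ litter_chr, data = samples)

# Numbers can also be kept if they are converted to a factor explicitly.
mm_fac <- model.matrix(~ factor(litter_num), data = samples)

print(colnames(mm_num))
print(colnames(mm_chr))
```

With the numeric coding the model estimates one coefficient for `litter_num`, whereas the label coding yields separate coefficients for litter2 and litter3 relative to litter1.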
Calling `results()` and `lfcShrink()` as separate steps makes it possible to use approaches for shrinking the logFC other than the DESeq2 standard approach. See:
> What you observe is consistent with what we see in testing on the benchmarking data and on simulation data.
> If you just compare method="normal" to method="apeglm" or "ashr", the differences you are likely to see is
> that normal will shrink large effects even if they have high precision (so shrinking too much) and allow
> small effects to float around 0, while apeglm/ashr will not shrink the precise, large effects much at all and
> the small effects which are indistinguishable from 0 will be shrunk to 0.

Papers show that these other two approaches (apeglm and ashr) are more effective.

GSEA: improve color code of KEGG pathways
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/101
2021-03-23T15:34:26Z, gohr

KEGG pathways are overlaid with a color code that should represent differential gene expression. This code is flawed: some boxes/entities are associated with several genes, and the absolute maximum over those genes was color-encoded without indicating that several genes are involved or which gene's expression the color shows. The new version of cp_enrichment improves on this and now makes clear
i) if there are several genes associated to a box/entity
ii) what is the diff. expression of all these genes
iii) color codes these diff. expressions in a coherent way so that the user can interpret the color codes.

gsea: fixing expression explorer
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/103
2021-04-08T13:38:56Z, gohr

The EE uses the package shinyjqui. This package has changed its data structures, which caused the EE to not work properly with the most recent version, 0.4.0. I searched for the affected code and adapted the EE to make it run with all versions of shinyjqui up to v0.4.0. I've added expression_explorer.yml with a minimal Conda env to run the EE. On top of that, I've changed the EE so that it no longer depends on devtools and additional SCF internal scripts, which includes loading libraries with `library` statements rather than `load_pack` statements.

gsea: improvement of description
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/102
2021-04-08T13:39:07Z, gohr

The current description is misleading as to where up- and down-regulated genes are in the sorted list of analysed genes. It's true that genes are sorted according to diff. expression from down-regulated to up-regulated genes, but GSEA reverses this list, and the output results (plots) have the up-regulated genes at the beginning and the down-regulated genes at the end.
I've changed the GSEA description to be clearer on this aspect. I also shortened it to be more precise.
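
The ordering described above can be illustrated with made-up fold changes. This is a toy example in base R, not the report code; the gene names and values are invented.

```r
# Made-up log2 fold changes for five genes.
lfc <- c(geneA = -2.1, geneB = 0.3, geneC = 1.8, geneD = -0.4, geneE = 2.5)

# Order in which the genes are prepared: down-regulated to up-regulated.
input_order <- sort(lfc)

# Order shown in the plots: the reversed list, up-regulated genes first.
plot_order <- rev(input_order)

print(names(input_order))
print(names(plot_order))
```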