ngs_tools issues
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues
Updated: 2019-06-28T17:51:47Z

Issue 57: dge workflow: add GTF as an argument for differential gene expression
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/57
Updated: 2019-06-28T17:51:47Z
Author: domingue

Relates to the [DGE analysis](https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/featcounts_deseq_mf.R) and it would have two uses:
1. retrieval of "accurate" gene lengths, exonic regions only, to calculate RPKM and FPM using `DESeq2` in-built functionality (more details [here](https://www.rdocumentation.org/packages/DESeq2/versions/1.12.3/topics/fpkm))
2. GTFs already contain a wealth of information which currently needs to be retrieved with `biomaRt`. Getting it from the GTF would make the process faster and more reproducible (in my experience `biomaRt` changes quite often), and it would work even for organisms not present in biomart or other marts (e.g. planaria).

Listed users: herseman, domingue

Issue 88: Stranded counts
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/88
Updated: 2020-07-13T10:52:16Z
Author: domingue

Change the function `dge_star_counts2matrix` to extract read counts based on the library stranding.

Listed users: domingue

Issue 97: gsea: improvements
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/97
Updated: 2020-10-13T06:42:05Z
Author: domingue

A few things to add:
- [x] include a table with the number of Entrez IDs per gene set at the beginning of the report. This would also help to see directly why some of the gene sets were not tested (i.e. because of too few genes) and could help to adjust the settings accordingly.
- [x] add how many genes we miss because no corresponding Entrez ID was found. We did something similar in the `cp_enrichment.R` script, where we give the percentage of 'lost' genes.
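
The two report additions above could be prototyped roughly as follows. This is a standalone Python sketch, not code from the repository's R scripts: the function names `mapping_loss` and `set_sizes` and the shape of the ID map are hypothetical, chosen only to illustrate the bookkeeping.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def mapping_loss(
    genes: Iterable[str], id_map: Dict[str, Optional[str]]
) -> Tuple[List[str], float]:
    """Return the genes without an Entrez ID and the percentage they represent.

    `id_map` is a hypothetical gene -> Entrez ID lookup; a missing key or an
    empty/None value both count as "no Entrez ID found".
    """
    genes = list(genes)
    lost = [g for g in genes if not id_map.get(g)]
    pct = 100.0 * len(lost) / len(genes) if genes else 0.0
    return lost, pct


def set_sizes(
    gene_sets: Dict[str, Iterable[str]], tested_ids: Iterable[str]
) -> Dict[str, int]:
    """Count, per gene set, how many of its Entrez IDs occur in the analysed data."""
    tested = set(tested_ids)
    return {name: len(set(ids) & tested) for name, ids in gene_sets.items()}
```

Reporting both numbers at the top of the report would make it visible when a gene set drops out simply because too few of its IDs survived the mapping step.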
Some issues to fix:
- [ ] when a single gene list is analysed and it has more genes than `--maxSize` or fewer than `--minSize`, it will fail without a meaningful error. Add an error message so that it fails gracefully.
- [ ] related: add a table with the gene lists analysed, the number of genes per list, and whether they pass the thresholds.

Listed users: domingue
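
For the two open items in issue 97, one possible shape for the size check and the summary table is sketched below in Python. It is an illustration only: `size_table` and `check_sizes` are hypothetical names, the bounds are assumed to be inclusive, and duplicate gene IDs within a list are assumed to count once. None of this is confirmed by the existing script.

```python
from typing import Dict, List, Sequence


def size_table(
    gene_lists: Dict[str, Sequence[str]], min_size: int, max_size: int
) -> List[dict]:
    """One row per gene list: its name, unique gene count, and pass/fail flag."""
    rows = []
    for name, genes in gene_lists.items():
        n = len(set(genes))  # assumption: duplicates count once
        rows.append(
            {"list": name, "n_genes": n, "passes": min_size <= n <= max_size}
        )
    return rows


def check_sizes(
    gene_lists: Dict[str, Sequence[str]], min_size: int, max_size: int
) -> List[dict]:
    """Return the table, or raise a readable error if no list can be tested."""
    rows = size_table(gene_lists, min_size, max_size)
    if not any(r["passes"] for r in rows):
        details = ", ".join(f"{r['list']} ({r['n_genes']} genes)" for r in rows)
        raise ValueError(
            f"No gene list is within --minSize={min_size} and "
            f"--maxSize={max_size}: {details}. Adjust the thresholds."
        )
    return rows
```

Building the table first and deriving the error from it keeps the two items consistent: the same rows can be printed in the report and quoted in the failure message.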