gsea: improvements
A few things to add:
- Include a table with the number of Entrez IDs per gene set at the beginning of the report. This would also show directly why some of the gene sets were not tested (i.e. because they had too few genes) and could help to adjust the settings accordingly.
- Report how many genes are lost because no corresponding Entrez ID was found. We did something similar in the cp_enrichment.R script, where we give the percentage of ‘lost’ genes.
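
A rough sketch of what the two additions could compute, in base R. The object names (`gene_sets`, `mapped`) and the example values are placeholders, not taken from the existing script, and the real report would build these from the loaded gene sets and the ID-mapping step:

```r
# Hypothetical input: a named list of gene sets, each a vector of Entrez IDs.
gene_sets <- list(
  SET_A = c("1", "2", "3"),
  SET_B = c("4"),
  SET_C = c("5", "6")
)

# One row per gene set with the number of unique Entrez IDs it contains.
gene_set_sizes <- data.frame(
  gene_set = names(gene_sets),
  n_entrez = unname(vapply(gene_sets, function(ids) length(unique(ids)), integer(1))),
  stringsAsFactors = FALSE
)

# Hypothetical mapping result: input gene names mapped to Entrez IDs,
# with NA marking genes for which no Entrez ID was found.
mapped <- c(TP53 = "7157", BRCA1 = "672", FAKE1 = NA, FAKE2 = NA)

n_lost <- sum(is.na(mapped))
pct_lost <- 100 * n_lost / length(mapped)
message(sprintf("%d of %d genes (%.1f%%) had no Entrez ID",
                n_lost, length(mapped), pct_lost))
```

`gene_set_sizes` could then be printed at the top of the report, next to the current `--minSize`/`--maxSize` values.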
Some issues to fix:
- When a single gene list is analysed and it has more genes than `--maxSize` or fewer genes than `--minSize`, it fails without a meaningful error. Add an error message so that it fails gracefully.
- Related: add a table with the gene lists analysed, the number of genes per list, and whether each list passes the thresholds.
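
One possible shape for the size check and the summary table, as a base R sketch. The function names `summarise_gene_lists` and `check_gene_lists` are hypothetical, and treating the thresholds as inclusive bounds is an assumption that should be matched to how the script actually applies `--minSize` and `--maxSize`:

```r
# Hypothetical helper: one row per gene list with its size and whether it
# falls within the thresholds (assumed inclusive on both ends).
summarise_gene_lists <- function(gene_lists, min_size, max_size) {
  n_genes <- unname(vapply(gene_lists, function(g) length(unique(g)), integer(1)))
  data.frame(
    gene_list = names(gene_lists),
    n_genes = n_genes,
    passes = n_genes >= min_size & n_genes <= max_size,
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: stop with an informative message if no list can be tested.
check_gene_lists <- function(size_table, min_size, max_size) {
  if (!any(size_table$passes)) {
    stop(sprintf(
      "No gene list has between %d and %d genes (--minSize/--maxSize); sizes found: %s",
      as.integer(min_size), as.integer(max_size),
      paste(size_table$n_genes, collapse = ", ")
    ), call. = FALSE)
  }
  invisible(size_table)
}

# Usage with made-up lists: one too short, one within the thresholds.
lists <- list(short_list = c("1", "2"), ok_list = as.character(1:20))
tab <- summarise_gene_lists(lists, min_size = 15, max_size = 500)
check_gene_lists(tab, min_size = 15, max_size = 500)
```

With a single list that is out of range, `check_gene_lists` stops with a message naming the thresholds and the observed size, instead of failing later in the analysis.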