Commit 73c53bdb authored by Lena Hersemann

replaced padj by pvalue for calculating the rank

parent 447b42f4
@@ -163,7 +163,7 @@ category_exists <- function(category, species) {
#
-#' GSEA uses ranked genes based on a measure of each gene's differential expression with respect to the two phenotypes. In our case genes are ranked with the formula -log10(padj) multiplied by -1 if the gene is down-regulated (negative fold-change). This means that significant down-regulated genes will be at the top of the rank, and significantly up-regulated genes at the bottom. In the middle will be those with non-significant p-values. There other ways of ranking genes. Then the entire ranked list is used to assess how the genes of each gene set are distributed across the ranked list. To do this, GSEA walks down the ranked list of genes, increasing a running-sum statistic when a gene belongs to the set and decreasing it when the gene does not. A simplified example is shown in the following figure:
+#' GSEA uses ranked genes based on a measure of each gene's differential expression with respect to the two phenotypes. In our case genes are ranked with the formula -log10(pvalue) multiplied by -1 if the gene is down-regulated (negative fold-change). This means that significant down-regulated genes will be at the top of the rank, and significantly up-regulated genes at the bottom. In the middle will be those with non-significant p-values. There other ways of ranking genes. Then the entire ranked list is used to assess how the genes of each gene set are distributed across the ranked list. To do this, GSEA walks down the ranked list of genes, increasing a running-sum statistic when a gene belongs to the set and decreasing it when the gene does not. A simplified example is shown in the following figure:
#'
#' ![figure](https://www.genepattern.org/uploaded/content_gseapic1.png)
#'
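Note on the running-sum walk described in this hunk: a minimal sketch in base R, assuming equal-weight steps and hypothetical toy gene names. GSEA implementations typically weight the increments by the ranking metric, so this only illustrates the walk itself and is not the code used in this script.

```r
# Hypothetical toy data: genes already sorted by rank score, plus one gene set.
ranked_genes <- c("g1", "g2", "g3", "g4", "g5", "g6")
gene_set <- c("g1", "g2", "g5")

# TRUE where the gene at that rank position belongs to the set.
in_set <- ranked_genes %in% gene_set
n_hit <- sum(in_set)
n_miss <- length(ranked_genes) - n_hit

# Step up on a hit, down on a miss (equal weights: an assumption of this sketch).
step <- ifelse(in_set, 1 / n_hit, -1 / n_miss)
running_sum <- cumsum(step)

# The enrichment score is the largest deviation of the running sum from zero.
enrichment_score <- running_sum[which.max(abs(running_sum))]
print(running_sum)
print(enrichment_score)
```

With these step sizes the running sum returns to zero at the end of the list, so only the shape of the walk in between carries information.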
@@ -216,8 +216,8 @@ gene_df <- read_tsv(de_file) %>%
ensembl_gene_id,
contrast = paste(condition_1, "vs", condition_2),
logfc = c1_over_c2_logfc,
-padj = ifelse(padj == 0, 1e-314, padj),
-rank_score = -log10(padj)
+pvalue = ifelse(pvalue == 0, 1e-314, pvalue),
+rank_score = -log10(pvalue)
) %>%
mutate(rank_score = ifelse(logfc < 0, rank_score * -1, rank_score))
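Note on the rank score computed in this hunk: a minimal base R sketch of the same steps, using a hypothetical three-gene table (`de`, `p_floor`, and the gene IDs are illustrative names, not part of the script). It shows why p-values of exactly zero are replaced before taking the logarithm: `-log10(0)` is `Inf`, which cannot be ranked meaningfully.

```r
# Hypothetical differential-expression results (illustrative values only).
de <- data.frame(
  ensembl_gene_id = c("geneA", "geneB", "geneC"),
  logfc = c(2.0, -1.5, 0.1),
  pvalue = c(1e-10, 0, 0.5)
)

# Replace exact zeros with a tiny positive floor so -log10() stays finite.
p_floor <- 1e-314
de$pvalue <- ifelse(de$pvalue == 0, p_floor, de$pvalue)

# Magnitude comes from the p-value, sign from the fold-change direction.
de$rank_score <- -log10(de$pvalue)
de$rank_score <- ifelse(de$logfc < 0, -de$rank_score, de$rank_score)

print(de)
```

Here the down-regulated gene with a zero p-value gets a large negative score of about -314 instead of `-Inf`.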
@@ -339,7 +339,7 @@ fgseaRes_l <- ranks_l %>%
#' **Note:** At the top of the list (left of x-axis) are the down-regulated genes, and at the bottom the up-regulated (right of x-axis).
#' Reference: https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html
#'
-#' Regarding the plots saved as png in the `figure` folder, they show the value of the ranking metric as you move down the list of ranked genes. So the x-axis is the metric used to rank the genes (-log10(padj) multiplied by -1 or 1 depending on the fold-change) and the y-axis is the gene rank. Only genes part of the pathway are shown. In essence this is a visually appellative plot but not very informative. Here is an example:
+#' Regarding the plots saved as png in the `figure` folder, they show the value of the ranking metric as you move down the list of ranked genes. So the x-axis is the metric used to rank the genes (-log10(pvalue) multiplied by -1 or 1 depending on the fold-change) and the y-axis is the gene rank. Only genes part of the pathway are shown. In essence this is a visually appellative plot but not very informative. Here is an example:
#'
#'![gsea_table_example](https://git.mpi-cbg.de/bioinfo/ngs_tools/uploads/5acc91e05974178323e9663233e6d035/gsea_table_example.png)
#'