Commit 6c28e105 authored by Holger Brandl

revised imputation scheme for ms data

parent ebb14943
......@@ -11,19 +11,18 @@ Project Setup
1. Create new git repo under `git.mpi-cbg.de`
2. Clone repo to cluster `../scripts/${PRJ_NAME}`
2. Clone repo to cluster `scripts/${PRJ_NAME}`
3. Create project directory under `/projects/bioinfo/data/${PRJ_NAME}`
4. Create `README.md` in repo and populate with project info
5. Check with `quotalist` that the quota leaves enough space (rule of thumb: 10 GB per sample)
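The local parts of the setup steps above (steps 3 and 4) could be scripted roughly as follows. This is only a sketch: the helper name `setup_project`, its arguments, and the README placeholder text are assumptions made for illustration and are not part of the repository. Creating and cloning the git repo (steps 1 and 2) and the `quotalist` check (step 5) are left out because they depend on the server and cluster setup.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not part of the repo): scaffolds the project data
# directory and a starter README.md for a given project name.
# The function body runs in a subshell, so its variables stay local.
setup_project() (
    prj_name="$1"
    scripts_root="${2:-scripts}"               # assumed default, see step 2
    data_root="${3:-/projects/bioinfo/data}"   # path taken from step 3

    # Step 3: create the project data directory
    mkdir -p "${data_root}/${prj_name}"

    # Step 4: create README.md in the repo checkout unless it already exists
    readme="${scripts_root}/${prj_name}/README.md"
    mkdir -p "$(dirname "${readme}")"
    if [ ! -f "${readme}" ]; then
        printf '# %s\n\nProject info goes here.\n' "${prj_name}" > "${readme}"
    fi
)
```

Calling `setup_project "${PRJ_NAME}"` uses the default locations; passing a second and third argument redirects both roots, which is handy for a dry run in a scratch directory.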
RNA-Seq with DESeq2
Differential expression with DESeq2
-------------------
1. Copy template `dge_workflow/dge_star_template.sh` into project repo
2. Adjust template properties and run.
1. Copy template `dge_workflow/dge_star_template.sh` into project repo
2. Adjust template properties and run `featcounts_deseq_mf.R`.
Differential expression with limma
......@@ -31,8 +30,15 @@ Differential expression with limma
Prepare `design`, `contrasts` and `expression/abundance matrix` as for the DESeq2 workflow, then run
```bash
rend.R --toc ${NGS_TOOLS}/dge_workflow/limma/dge_limma.R --contrasts example_contrasts.txt ../inten_matrix_acc.txt ../exp_design.txt
## basic usage
rend.R --toc ${NGS_TOOLS}/dge_workflow/limma/dge_limma.R inten_matrix_acc.txt exp_design.txt
## with more optional arguments
rend.R --toc ${NGS_TOOLS}/dge_workflow/limma/dge_limma.R --gene_info mmus_ens_aug2017_uniprot_compl_gene_info.txt --contrasts example_contrasts.txt inten_matrix_acc.txt exp_design.txt
```
To provide custom gene info for non-Ensembl IDs, provide the `--gene_info <tsv_with_gene_id_as_first_column>` argument.
\ No newline at end of file
For a complete list of arguments
see `${NGS_TOOLS}/dge_workflow/limma/dge_limma.R --help`
......@@ -50,6 +50,12 @@ if (is.numeric(pcutoff))opts$qcutoff = NULL
lfc_cutoff = if (is.null(opts$lfc))0 else as.numeric(opts$lfc)
#' Run configuration was
vec_as_df(unlist(opts)) %>%
filter(! str_detect(name, "^[<-]")) %>%
kable()
########################################################################################################################
#' ## Data Preparation
#' The working directory of the analysis was: `r getwd()`
......@@ -64,11 +70,12 @@ countData = read_tsv(count_matrix_file) %T>% glimpse
names(countData)[1] = "gene_id"
# zero-imputation is disabled here because it is better implemented per experiment
# countData %<>% mutate_if(is.numeric, funs(replace(., is.na(.), 0)))
########################################################################################################################
#' ## QC, Normalization and Preprocessing
countData %<>% mutate_if(is.numeric, funs(replace(., is.na(.), 0)))
cdLong = gather(countData, sample, expr, - gene_id) %T>% glimpse
......@@ -168,7 +175,7 @@ expMatrix = countData %>% column_to_rownames("gene_id") %>% as.matrix
#exp_study = DGEList(counts=column2rownames(countsMatrix, "ensembl_gene_id"), group=names(countsMatrix)[-1])
exp_study = DGEList(counts = expMatrix, group = orderMatcheExpDesign$condition)
par(mfrow=c(1,2)) ## 2-panel plot for mean-var relationship before and after voom
# par(mfrow=c(1,2)) ## 2-panel plot for mean-var relationship before and after voom
## Removing heteroscedasticity from count data
voomNorm <- voom(exp_study, design, plot = TRUE)
......