ngs_tools issues
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues
Updated: 2020-03-27T14:32:23Z

auto-detect strandedness when extracting count matrix
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/16
Updated: 2020-03-27T14:32:23Z
Author: brandl (brandl@mpi-cbg.de)

see dge_workflow/dge_utils.sh:357

Move workflows to separate repositories
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/87
Updated: 2020-03-20T08:32:41Z
Author: domingue

The goal is to keep the NGS tools repo tidy and focused on bulk NGS workflows. We already started the process by moving the single-cell workflow to a separate repo, but the ms_workflow is still here.
Ideally we should be able to:
1. copy the ms_workflow in its current state to a separate repo for further development
2. the commits and versioning history should be transferred as well
3. the current ms_workflow stays in ngs_tools to avoid breaking projects.

Assignees: domingue

ms_workflow: ms_ms_prop and reorder information are missing for protein IDs without fasta_header information
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/86
Updated: 2020-03-17T15:07:56Z
Author: herseman
Assignees: herseman

dge_workflow: collect_kallisto_data.R fails for paired-end data
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/84
Updated: 2020-03-12T10:54:15Z
Author: herseman

- parsing the kallisto.log files failed because the two fastq files are not listed on one line
- additionally, it makes more sense to pass a list of kallisto output folders as an argument to the script instead of taking all subfolders of the current working directory, which causes issues when other subfolders, e.g. from `multiqc`, are present as well

Assignees: herseman

finish panther enrichment and reintegrate into cp_enrichment.R
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/7
Updated: 2020-02-20T16:06:56Z
Author: brandl (brandl@mpi-cbg.de)

see
/Volumes/furiosa/bioinfo/scripts/ngs_tools/dev/common/panther_enrichment.R
/Volumes/furiosa/bioinfo/scripts/ngs_tools/dev/common/cp_enrichment.R

iGenome permissions
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/76
Updated: 2020-02-20T14:52:43Z
Author: domingue

Some of the iGenomes folders lack group write permissions, which means that only the owner can add / remove / change files. Is this by design, or was it an accident?

Assignees: henry, herseman, domingue

Implement renvs
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/73
Updated: 2020-02-20T14:49:29Z
Author: domingue

It started with packrat in https://git.mpi-cbg.de/bioinfo/ngs_tools/issues/53# and after some testing I decided that it was worth adding as a function to our workflow. Still under testing.

Assignees: domingue

update to more recent https://github.com/deeptools/deepTools/releases
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/26
Updated: 2020-02-20T14:19:37Z
Author: brandl (brandl@mpi-cbg.de)

report improvements
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/25
Updated: 2020-02-20T14:19:05Z
Author: brandl (brandl@mpi-cbg.de)

* remove pointless R output **[done]**
* include materials and methods section
* explain alignment efficiency
* more documentation for library complexity in the fastqc report
* add Figure 9.3: Features: genes seen in the whole data set. The x-axis displays the number of uniquely aligned fragments and the y-axis shows how many features are seen within each x subset of fragments.
* add PCA (1−2) of top 500 most diverse genes
* add toc (floating if we can adjust content width) or via markdown if not **[done]**
* detail out column model of main results table(s)
* render Ensembl link in interactive table (make sure to also include non-Ensembl data)
* prep example data for testing
* use symbols for "Extract most signifiantly changed genes and display" heatmap (if present)

Integrate tools to assess gc/3/5 bias, insert size, and capture efficiency
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/11
Updated: 2020-02-20T12:42:13Z
Author: brandl (brandl@mpi-cbg.de)

e.g. using http://deeptools.readthedocs.io/en/latest/content/tools/computeGCBias.html?highlight=bias
(see "Multi-perspective quality control of Illumina RNA sequencing data analysis" http://bfg.oxfordjournals.org/content/early/2016/09/28/bfgp.elw035.abstract )
@lakshman opinion?

ms_workflow: refine reorder information of protein groups
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/83
Updated: 2020-02-18T14:07:47Z
Author: herseman

Currently, we only report whether a protein group was reordered prior to merging of the tables. This does not tell us whether the (alphabetical) reordering of an individual protein group took place in all samples, so that the groups were originally all the same despite being reordered, or whether protein groups of individual samples were merged only because they matched after reordering but differed in their original protein ID order.

Assignees: herseman

general: track unsynchronized files on the cluster
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/82
Updated: 2020-02-17T17:18:02Z
Author: herseman

Using the cluster for data analysis and the project spaces on the fileserver for permanent storage creates a situation in which it is difficult to track whether the fileserver is up-to-date. This is especially important for older projects which could in principle be removed from the cluster but may contain unsaved changes which were never transferred, e.g. due to missing feedback from the researcher with regard to a specific setting.
The aim of this issue is to find a way to track changes made on the cluster which are not yet synchronized with the fileserver.

Assignees: herseman, domingue

dge_workflow: analyze duplicates has a broken link to the plot of the model (dupRadar)
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/81
Updated: 2020-02-10T13:08:45Z
Author: domingue

Right now it is a link to a non-permanent location (image hosting site). It should be replaced by a more stable location to avoid issues in the future.

Assignees: domingue

MS: How is imputation done?
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/80
Updated: 2020-01-24T09:44:01Z
Author: domingue

- imputation:
+ applied only to zero counts?
+ How about NAs?
+ Does it also affect non-zero values?
+ are there (good) guidelines on how to choose the methods? Check the `MSnbase` package and imputation vignette of `DEP`.

Assignees: domingue

conda and bioconductor version
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/77
Updated: 2020-01-21T15:41:41Z
Author: domingue

The idea is that we could use conda to set up an environment with different R / BioC packages to avoid situations where a package is updated without backwards compatibility (as happened with `scater`).
We need to test whether, in a conda env for `R3.5`, the installed BioC version is the latest one or the one that was released with that R version.

Assignees: herseman, domingue

ms_workflow: Improvements
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/28
Updated: 2019-12-10T08:25:51Z
Author: brandl (brandl@mpi-cbg.de)

- [x] separate protein groups
- [x] report 0-proportions similar to NAs
- [x] expose renaming scheme as an argument
- [ ] `protein_acc` extraction depends on study (with/without name, w/o separator). Need to generalize `protein_acc=str_split_fixed(protein_ids, "[|]", 3)`. One way: `--extract extrac_acc.R`, which defines an extractor function -> `protein_acc=extract_fun(protein_ids)`
- [x] detect presence/absence of identification types --> conditional execution of corresponding bits
- [ ] try to postpone annotation handling to end of data-prep workflow/analysis
- [x] fix result table links _#' [identSummary](`r add_prefix("ident_types_summary.txt")`)_
- [x] How to auto-adjust imputation proportion?
- [x] We just need imputation because we want an abundance matrix for limma. For t-tests, neither na->0 nor imputation is required because we can work with long data.
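
The pluggable extractor proposed in the `protein_acc` item above could look roughly like the following. This is a minimal Python sketch of the idea only, not the workflow's R code: the function names `uniprot_acc` and `extract_accessions` are hypothetical, the protein IDs are made-up examples, and the default assumes IDs of the form `db|ACC|NAME`, mirroring the three-way split on `|` in the current code.

```python
from typing import Callable, List


def uniprot_acc(protein_id: str) -> str:
    """Default extractor (assumption): take the second field of 'db|ACC|NAME'.

    IDs without a '|' separator are returned unchanged.
    """
    parts = protein_id.split("|", 2)  # at most three pieces, like str_split_fixed(..., 3)
    return parts[1] if len(parts) >= 2 else protein_id


def extract_accessions(
    protein_ids: List[str],
    extract_fun: Callable[[str], str] = uniprot_acc,
) -> List[str]:
    """Apply a study-specific extractor to every protein ID."""
    return [extract_fun(pid) for pid in protein_ids]


# Default scheme: IDs with and without the 'db|ACC|NAME' layout.
print(extract_accessions(["sp|P02768|ALBU_HUMAN", "Q9Y6K9"]))

# A study with a different layout passes its own extractor instead.
print(extract_accessions(["P02768;ALBU"], lambda pid: pid.split(";")[0]))
```

Keeping the extractor as an argument means the data-prep code never needs to know the ID layout of a given study; only the function passed in changes.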
limma
* also show batch-corrected qc plots (clustering, pca)
* _someone needs to look through all of this at some point_
* why does the voom with a `~condition` design fix the sample clustering in `file:///Volumes/project-stepien/stepien_ms_fractions/data/limma/p_vs_s/dge_limma.html`
* add condition/sample colors to _voom vs raw_ plot
* externalize annotation of results
* why do ma plots look so weird?
![image](/uploads/bcf9f12174dc8d16f35795496ce46856/image.png)

Assignees: herseman

ms_workflow: handling of contaminations
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/37
Updated: 2019-12-10T08:11:26Z
Author: herseman

- REV__ entries are nonsense sequences (a match against the reverse of an entry of the database of interest); REV__ entries can be removed right at the beginning of the script
- think about removal of keratin as a standard set-up (maybe Anna can provide us with a list of the most common contaminations and we can start by reporting those specifically)

Assignees: herseman, domingue

try RNA-SeQC to get qc for bam files
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/15
Updated: 2019-12-09T13:42:04Z
Author: brandl (brandl@mpi-cbg.de)

general: add multiQC report to fastqc_summary
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/54
Updated: 2019-12-02T14:45:58Z
Author: herseman

- [multiQC](https://multiqc.info/)

Assignees: herseman, domingue

dge_workflow: fix lcf help
https://git.mpi-cbg.de/bioinfo/ngs_tools/-/issues/74
Updated: 2019-12-02T14:43:31Z
Author: domingue

Right now it says that the value is used as a cut-off for reporting genes, which is incorrect:
https://git.mpi-cbg.de/bioinfo/ngs_tools/blob/master/dge_workflow/featcounts_deseq_mf.R#L40
In fact, this value is fed into the statistical modelling, leading to a stricter set of results. The help should reflect that.
See:
- https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#tests-of-log2-fold-change-above-or-below-a-threshold
https://support.bioconductor.org/p/101504/

Assignees: domingue
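
To make the distinction in the `dge_workflow: fix lcf help` issue concrete, here is a small Python sketch comparing a reporting cut-off with a threshold that is part of the test itself. It is an illustration under stated assumptions, not DESeq2's implementation and not the workflow's code: it uses a plain normal-approximation Wald test, the function name `wald_pvalue_greater_abs` is hypothetical, and the gene's numbers are made up.

```python
import math


def wald_pvalue_greater_abs(lfc: float, se: float, threshold: float = 0.0) -> float:
    """Two-sided p-value for the null hypothesis |lfc| <= threshold."""
    z = (abs(lfc) - threshold) / se
    # Upper-tail normal probability is 0.5 * erfc(z / sqrt(2)); doubled and capped at 1.
    return min(1.0, math.erfc(z / math.sqrt(2.0)))


# Hypothetical gene: estimated log2 fold change 1.2 with standard error 0.5.
lfc, se, cutoff, alpha = 1.2, 0.5, 1.0, 0.05

# (a) Reporting cut-off: test against zero, then filter on the estimate.
p_zero = wald_pvalue_greater_abs(lfc, se)
reported_by_filter = p_zero < alpha and abs(lfc) > cutoff

# (b) Threshold inside the test: the null hypothesis itself is |lfc| <= cutoff.
p_thresholded = wald_pvalue_greater_abs(lfc, se, cutoff)
reported_by_test = p_thresholded < alpha

print(reported_by_filter, reported_by_test)
```

With these numbers the gene passes the filter in (a) but is not significant in (b), because the estimate is too close to the threshold relative to its standard error. That is the sense in which feeding the value into the model gives a stricter result set than filtering afterwards.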