Following up from #77 (closed), the goal is now to use conda and create stable, shareable environments with different versions of tools / R that we could use for projects.
We should consider installing some R packages with conda, for instance docopt and devtools, to speed up the env creation and reduce the manual steps.
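A minimal sketch of how R package names could be turned into conda package names for such an install. It assumes the usual channel naming, where CRAN packages on conda-forge are published as r-<name> and Bioconductor packages on bioconda as bioconductor-<name>, both lowercase. The helper names and the DESeq2 example are illustrative only and not part of our setup.

```python
def conda_name(package, bioconductor=False):
    """Map an R package name to its (assumed) conda package name."""
    prefix = "bioconductor-" if bioconductor else "r-"
    return prefix + package.lower()


def conda_install_args(cran=(), bioc=()):
    """Build the argument list for a single `conda install` call.

    The list is only built here, not executed, so it can be reviewed
    (or passed to subprocess.run) before anything is installed.
    """
    names = [conda_name(p) for p in cran]
    names += [conda_name(p, bioconductor=True) for p in bioc]
    return ["conda", "install", "-c", "conda-forge", "-c", "bioconda", *names]


if __name__ == "__main__":
    # devtools comes from the comment above; DESeq2 is a hypothetical example.
    print(" ".join(conda_install_args(cran=["devtools"], bioc=["DESeq2"])))
```

Installing everything in one call lets conda resolve the dependencies together instead of package by package.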
Although many packages had to be installed manually with install.packages and BiocManager::install, doing this inside the conda env caused no issues. I rendered the QC analysis of the test dataset (organoids) and it ran without problems (except for the all.equal problem).
This is a second attempt to create the SC stable environment, this time using conda install for all (or as many as possible) R packages. The idea is to avoid conflicting dependencies and keep tight control over what is installed.
Steps:
parse the sessionInfo.txt files to obtain the R packages and versions used in a stable workflow
install the packages
check for packages that, for some reason, were not installed
install the closest possible version of each missing package
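The parsing and checking steps can be sketched roughly as follows. This is not the script that was actually used. It assumes the sessionInfo.txt files contain the standard printed output of R's sessionInfo(), with name_version entries under "other attached packages:" and "loaded via a namespace (and not attached):". "Closest" is interpreted here as the newest available version that is not newer than the wanted one, falling back to the oldest available. All function names and the sample versions are made up for illustration.

```python
import re

# Section headers of printed sessionInfo() output that list name_version pairs.
PACKAGE_SECTIONS = (
    "other attached packages:",
    "loaded via a namespace (and not attached):",
)


def parse_session_info(text):
    """Return {package: version} from the text of a sessionInfo() dump."""
    packages = {}
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:                      # a blank line ends the current section
            in_section = False
        elif line in PACKAGE_SECTIONS:
            in_section = True
        elif in_section:
            for token in line.split():
                if re.fullmatch(r"\[\d+\]", token):
                    continue              # skip R's "[1]" index markers
                name, sep, version = token.rpartition("_")
                if sep and name and version:
                    packages[name] = version
    return packages


def find_missing(wanted, installed):
    """Packages from `wanted` that are absent from `installed`."""
    return {name: ver for name, ver in wanted.items() if name not in installed}


def version_key(version):
    """Turn '1.2-15' into (1, 2, 15) so versions sort numerically."""
    return tuple(int(p) for p in re.split(r"[.-]", version) if p.isdigit())


def closest_version(wanted, available):
    """Newest available version <= wanted, else the oldest available one."""
    if not available:
        return None
    ordered = sorted(available, key=version_key)
    not_newer = [v for v in ordered if version_key(v) <= version_key(wanted)]
    return not_newer[-1] if not_newer else ordered[0]


# Illustrative input only; the package versions are invented for the example.
SAMPLE = """R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)

attached base packages:
[1] stats     graphics  base

other attached packages:
[1] Seurat_2.3.4  Matrix_1.2-15

loaded via a namespace (and not attached):
 [1] Rtsne_0.15       colorspace_1.3-2
"""

if __name__ == "__main__":
    wanted = parse_session_info(SAMPLE)
    missing = find_missing(wanted, installed={"Seurat": "2.3.4", "Rtsne": "0.15"})
    print(sorted(missing))
```

Restricting the parser to the two package sections avoids false matches elsewhere in the file, such as the platform string or the locale settings.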
I have also created a document explaining how the process works, designed both for our use and for external users. It will be committed once we have decided where it should go.
@herseman the conda environments are now ready and each has been tested in a project (SC or DGE). It would be useful if you could also test them to find any potential issues early on. As far as my testing goes, they are ready for production.
To get started on the cluster, just call conda activate /projects/conda/groups/bioinfo/envs/sc_BioC3.8 for single cell or conda activate /projects/conda/groups/bioinfo/envs/dge_BioC3.8 for DGE. A more detailed description can be found in:
To set them up on your computer, use the yml files that reside in the same folders (misc/conda_envs) and create the env with the command conda env create -f /path/to/conda_environment_dge_BioC3.8.full.yml -n dge_BioC3.8. It will be installed in the default conda folder. This has not been tested on OSX, so your testing is essential.
Thank you @domingue. Currently, other sc_tools users and I are using the conda_env_py37_r35_v1/conda_env_py37_r35_v1_macOS conda environments (https://git.mpi-cbg.de/scicomp/bioinfo_team/sc_tools/tree/master/misc/conda_env/yaml_files). Except for the now solved openjdk issue in the yaml files, it runs fine and, à la "never change a winning horse", I am a bit reluctant to change the environment ;-) especially because I have already created a yaml file compatible with macOS. The conda_env_py37_r35_v1 environment is also set up on the cluster. Please feel free to use it for single cell data analysis.
Concerning the dge_BioC3.8, I am happy to give it a try and to give you some feedback. I'll let you know once I am done with the testing.
Do we have a protocol in place to add packages to that conda env (conda_env_py37_r35_v1)? Or do we keep it as is and, if packages are needed for other projects, put them in another env?
I would suggest that we update conda_env_py37_r35_v1 and the corresponding yaml file whenever we add scripts to sc_tools that need packages that are not installed yet. The goal is to provide yaml files for conda environments that can run everything in the sc_tools repo (see sc_tools workflow usage), and by updating the existing one we prevent redundancy. I also specified the python and R versions in the name of the corresponding conda environment, and as long as there is no need to switch to other versions, I would suggest that we maintain conda_env_py37_r35_v1. However, we should work with tags to indicate different sc_tools versions, and this would then also be the place where we announce changes in the yaml file/conda environment.