Escaping from Anaconda
Photo by David Clode on Unsplash
Sometimes a friendly snake can turn dangerous. Here are some hints to escape from Anaconda.
The code for this post is available here
Disclaimer
The Python code presented below was not developed for my current employer but in my free time, so no copyright can be claimed by any company; I offer it as-is under the MIT license, with no warranty of any kind.
Conda history and benefits
Working with Python on Windows used to be difficult in the past (around the years 2000-2010):
- the Python installer used to insert only the latest version in the registry
- people were trying to double-click Python files, with mixed results
- managing multiple Python installations was nearly impossible
When I first started using some C-based Python libraries on Windows it was a real nightmare:
- ABI incompatibilities between libraries compiled with Visual C++ and with GNU GCC
- compiling from source was not always a viable option
- binary wheel packages were not always available
At that time the Miniconda package manager solved all of these issues and handled package removal better than pip.
Recently the commercial licensing terms for the Anaconda repositories changed and my company decided to opt out, so we suddenly needed solutions to replace conda.
Alternative tools
Many more Python tools are now available, such as:
- py.exe, the Python launcher for Windows
- uv, a project management tool
- poetry, another project management tool
- hatch, …
All of these can be used together to replace what conda used to do as a package manager
Installing Python(s)
Managing multiple installations on Windows
It is possible to download Python installers from https://www.python.org/downloads/
These are installed in independent directories and can be accessed using the Python launcher py:
py -3.6 --version
I find it convenient to create virtual environments with py and keep working from there; more details below.
Managing multiple installations on Linux
I find it very convenient to use uv to manage different installations on a Linux distribution:
- it’s fast
- it may be used for way more tasks
Unfortunately, as of today (November 2024), uv does not seem to read pip configuration files, so I only use it to bootstrap a virtual environment.
- to install uv, execute the following command:
curl -LsSf https://astral.sh/uv/install.sh | sh
uv can install new Python versions or find existing installations with the uv python subcommand; the run subcommand can be used to launch a specific project or Python interpreter.
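For example (the version number below is just illustrative):
uv python install 3.12
uv python list
uv run --python 3.12 python --version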
Update PIP to the latest version
Sometimes the default pip version is outdated; updating it will solve a lot of dependency issues. On Windows:
py -3.6 -m pip install --upgrade pip
On Linux:
uv run --python 3.8 python -m pip install --upgrade pip
Recover an old project
Create a virtualenv
To tie your project to a specific local interpreter and segregate its packages. On Windows:
py -3.6 -m venv myproject
or on Linux:
uv run --python 3.8 python -m venv myproject
Recovering dependencies from conda env
This command will dump all of the dependencies, including those automatically added by conda:
conda env export --file myproject.yml
In more recent versions of conda, this command extracts only the dependencies you explicitly added; in some cases this may be enough:
conda env export --from-history --file myproject_reduced.yml
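For reference, the full export (first command) typically looks roughly like this; the names, versions and prefix below are invented for illustration:
name: myproject
channels:
  - defaults
dependencies:
  - python=3.8.5=h7579374_1
  - numpy=1.19.2=py38h54aff64_0
  - pip=20.2.4=py38h06a4308_0
  - pip:
      - requests>=2.25,<3
prefix: /home/user/miniconda3/envs/myproject
This is the structure the small translation tool below expects: plain strings for conda packages (with the build hash appended) and a nested pip dictionary for pip packages.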
Modelling the translation
I prefer to create clear models in order to make it easier to work with my data
In this case I modelled a dependency as a package name plus a list of constraints (which is essentially the pip model).
The Environment class models the conda environment, while the Requirements class models the requirements.txt file.
from typing import Union, List
from dataclasses import dataclass


@dataclass
class Constr:
    version: Union[List[int], str]
    operator: str

    def get_str_version(self):
        if type(self.version) == str:
            return self.version
        else:
            return ".".join([str(i) for i in self.version])


@dataclass
class Dep:
    package: str
    constraints: List[Constr]


@dataclass
class Environment:
    name: str
    deps: List[Dep]
    prefix: str


@dataclass
class Requirements:
    deps: List[Dep]
Parsing the YAML
First we read the YAML file using the PyYAML library:
import re
from pathlib import Path
from yaml import load, CLoader as Loader
from typing import Union, List

from .models import Constr, Dep, Environment


def read(filename: Union[str, Path]):
    with open(filename) as f:
        base_dict = load(f, Loader)
    deps = []
    for dep in base_dict['dependencies']:
        if type(dep) == str:
            deps.append(parse_conda_dep(dep))
        elif type(dep) == dict:
            for pip_dep in dep['pip']:
                deps.append(parse_pip_dep(pip_dep))
        else:
            raise Exception(f"unknown dependency type '{dep}' : {type(dep)}")
    return Environment(name=base_dict['name'], deps=deps, prefix=base_dict['prefix'])
Then we extract the conda dependencies, discarding the build hash:
CONDA_RE = re.compile(r"(?P<package>[^=]+)=(?P<version>[^=]+)")


def parse_conda_dep(value: str):
    matching = CONDA_RE.match(value)
    assert matching is not None, f"cannot parse conda dependency {value}"
    groups = matching.groupdict()
    version = parse_version(groups['version'])
    return Dep(
        package=groups['package'],
        constraints=[
            Constr(
                version=version,
                operator='=='
            )
        ]
    )
It may be convenient to keep numeric version components, when available, for future expansions:
def parse_version(value: str):
    try:
        version = [int(i) for i in value.split('.')]
    except ValueError:
        version = value
    return version
Pip constraints are a little different, and there may be more than one per package:
PIP_RE = re.compile(r"(?P<package>[_A-Za-z0-9\-]+)(?P<constraints>.*)")
PIP_CONSTRAINT = re.compile(r"(?P<operator>[=~\^><]+)(?P<version>.*)")


def parse_pip_dep(value: str):
    dep_matching = PIP_RE.match(value)
    assert dep_matching is not None, f"cannot parse pip dependency {value}"
    dep_groups = dep_matching.groupdict()
    constraints = []
    for c in dep_groups['constraints'].split(','):
        constr_matching = PIP_CONSTRAINT.match(c)
        assert constr_matching is not None, f"cannot parse pip constraint {c} in {value}"
        constr_groups = constr_matching.groupdict()
        version = parse_version(constr_groups['version'])
        constraints.append(
            Constr(
                version=version,
                operator=constr_groups['operator']
            )
        )
    return Dep(
        package=dep_groups['package'],
        constraints=constraints
    )
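A quick sanity check of the parsers on a couple of made-up entries:
parse_conda_dep("numpy=1.19.2")
# -> Dep(package='numpy', constraints=[Constr(version=[1, 19, 2], operator='==')])

parse_pip_dep("requests>=2.25,<3")
# -> Dep(package='requests',
#        constraints=[Constr(version=[2, 25], operator='>='),
#                     Constr(version=[3], operator='<')])

parse_version("2023b")
# -> '2023b' (non-numeric versions fall back to the raw string)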
Dumping the requirements
This is the naive implementation that dumps all requirements into a file.
Of course the transformation function may contain much more logic to generate smarter constraints than ==
from pathlib import Path
from .models import Environment, Requirements
from typing import Union


def env_to_requirement(env: Environment):
    return Requirements(deps=env.deps)


def dump_requirements(reqs: Requirements):
    for dep in reqs.deps:
        yield "{}{}".format(
            dep.package,
            ",".join([
                "{}{}".format(
                    c.operator,
                    c.get_str_version()
                )
                for c in dep.constraints
            ])
        )


def write_requirements(reqs: Requirements, path: Union[str, Path]):
    with open(path, mode="tw") as f:
        for line in dump_requirements(reqs):
            print(line, file=f)
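Putting it all together, here is a minimal end-to-end sketch; the conda2pip package and its module names are just a guessed layout (only models appears explicitly in the code above):
from conda2pip.parse import read
from conda2pip.dump import env_to_requirement, write_requirements

# translate the conda export produced earlier into a pip requirements file
env = read("myproject_reduced.yml")
reqs = env_to_requirement(env)
write_requirements(reqs, "requirements.txt")
# the result can then be installed with: pip install -r requirements.txt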
Packaging our own old dependencies
Create a separate directory for dependencies
I find it useful to keep the directory where I'm fixing my dependencies separate from the final environments; usually a parallel directory, e.g. "repos".
My directory layout now looks like this:
- envs
  - myapp
- repos
  - mydep1
  - mydep2
Download from your repo
git clone ssh://myserver/myproject-url.git
Reset to a specific version
Sometimes you may need a version of your package which is not the latest one:
cd myproject
git log -n 10 --oneline
This is going to list the most recent commits; pick the one you need:
git reset --hard abcd33d
Create a forked branch:
git checkout -b feature/myapp
Create a dedicated venv to build your package
On Windows:
py -3.6 -m venv .venv
.venv\Scripts\activate
On Linux:
uv run --python 3.8 python -m venv .venv
source .venv/bin/activate
Use Poetry to package your code
Hatch can also be used, but I had some issues with dependencies on old projects.
On Windows:
cd myproject
.venv\Scripts\activate.bat
pip install poetry
poetry init
On Linux:
cd myproject
source .venv/bin/activate
pip install poetry
poetry init
Here you can create exactly the version of the package needed to satisfy the dependencies.
You are also able to interactively choose which versions of the dependent packages you want.
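If you need to adjust a constraint after the interactive init, poetry can also pin dependencies from the command line (package names and versions below are only illustrative):
poetry add "numpy==1.19.2"
poetry add "requests>=2.25,<3"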
Now you may want to test your code:
- build a wheel
poetry build
Reactivate the app venv on Linux and install the wheel:
deactivate
cd ../../envs/myapp
source .venv/bin/activate
pip install ../../repos/myproject/dist/myproject-0.1.0-py3-none-any.whl
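A quick smoke test that the freshly built wheel is visible in the app environment (assuming the importable module is also called myproject):
pip show myproject
python -c "import myproject"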
Update repo
Finally, let's push everything back to our base repo:
git add pyproject.toml
git commit -m "packaged"
git push --set-upstream origin feature/myapp
Conclusions
This may be a very long and delicate process, so additional care is needed to make sure that the new packages work correctly.
In future posts I will also cover how to update containers by removing their conda dependencies.