Escaping from Anaconda


Photo by David Clode on Unsplash

Sometimes a friendly snake can turn dangerous. Here are some hints for escaping from Anaconda.

The code for this post is available here.

Disclaimer

The Python code presented below was not developed for my current employer but in my free time, so no copyright can be claimed by any company; I offer it as-is under the MIT license with no implied warranty of any kind.

Conda history and benefits

Working with Python on Windows used to be difficult (around 2000-2010):

  • the Python installer used to register only the latest version in the registry
  • people were trying to double-click Python files, with mixed results
  • managing multiple Python installations was nearly impossible

When I first started using some C-based Python libraries on Windows it was a real nightmare:

  • ABI incompatibility between Visual C++-compiled libraries and GNU GCC ones
  • compiling from source was not always a viable option
  • wheel binary packages were not always available

At that time the Miniconda package manager solved all of these issues and handled package removal better than pip.

Recently the commercial licensing terms for the Anaconda repositories changed and my company decided to opt out, so we were suddenly looking for ways to replace conda.

Alternative tools

Many more Python tools are now available, such as:

  • py.exe, the Python launcher for Windows
  • uv, a project management tool
  • poetry, another project management tool
  • hatch

All of these can be used together to replace what conda used to do as a package manager.

Installing Python(s)

Managing multiple installations on Windows

It is possible to download Python installers from https://www.python.org/downloads/

These now install into independent directories and can be accessed through the Python launcher py:

py -3.6 --version

I find it convenient to create virtual environments with py and keep working from there; more details below.

Managing multiple installations on Linux

I find it very convenient to use uv to manage different installations on a Linux distribution:

  • it’s fast
  • it can be used for many more tasks

Unfortunately, as of today (November 2024), uv does not seem able to read pip configuration files, so I only use it to bootstrap a virtual environment.

To install uv, execute the following command:
curl -LsSf https://astral.sh/uv/install.sh | sh

uv can install or find existing Python installations with the “uv python” subcommand.
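For example, to list the interpreters uv knows about and install a specific one:

uv python list
uv python install 3.8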

The run subcommand can be used to launch a specific project or Python interpreter.
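For instance, to print the version of a given interpreter:

uv run --python 3.8 python -c "import sys; print(sys.version)"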

Update PIP to the latest version

Sometimes the default pip version is outdated; updating it will solve a lot of dependency issues. On Windows:

py -3.6 -m pip install --upgrade pip

On Linux:

uv run --python 3.8 python -m pip install --upgrade pip

Recover an old project

Create a virtualenv

This properly pins your local interpreter and segregates your packages. On Windows:

py -3.6 -m venv myproject

or on Linux:

uv run --python 3.8 python -m venv myproject

Recovering dependencies from conda env

This command dumps all of the dependencies, including those automatically added by conda:

conda env export --file myproject.yml

In more recent versions of conda, this command extracts only the dependencies you explicitly added; in some cases this may be enough:

conda env export --from-history --file myproject_reduced.yml
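For reference, the exported file looks roughly like this (the names, versions, and prefix below are illustrative):

name: myproject
dependencies:
  - python=3.6.13=h39d44d4_2
  - numpy=1.19.2=py36hadc3359_0
  - pip:
      - requests==2.25.1
prefix: /home/user/miniconda3/envs/myproject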

Modelling the translation

I prefer to create clear models in order to make it easier to work with the data.

In this case I modelled a dependency as a package name plus a list of constraints (which is the main pip case).

The Environment class models the conda environment, while the Requirements class models the requirements.txt file:

from typing import Union, List
from dataclasses import dataclass

@dataclass
class Constr:
    """A single version constraint, e.g. == [1, 19, 2]."""
    version: Union[List[int], str]
    operator: str

    def get_str_version(self):
        # numeric versions are stored as lists of ints,
        # anything else as the raw string
        if isinstance(self.version, str):
            return self.version
        return ".".join(str(i) for i in self.version)

@dataclass
class Dep:
    """A package with its list of version constraints."""
    package: str
    constraints: List[Constr]

@dataclass
class Environment:
    """Models a conda environment export."""
    name: str
    deps: List[Dep]
    prefix: str

@dataclass
class Requirements:
    """Models a requirements.txt file."""
    deps: List[Dep]
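For instance, a pinned numpy dependency is represented as:

Dep(package="numpy", constraints=[Constr(version=[1, 19, 2], operator="==")])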

Parsing the yaml

First we read the YAML file using the PyYAML library:

import re
from pathlib import Path
from yaml import load, CLoader as Loader
from typing import Union, List
from .models import Constr, Dep, Environment

def read(filename: Union[str, Path]):
    with open(filename) as f:
        base_dict = load(f, Loader)
        deps = []
        for dep in base_dict['dependencies']:
            # conda deps are plain strings; pip deps are nested in a {'pip': [...]} dict
            if isinstance(dep, str):
                deps.append(parse_conda_dep(dep))
            elif isinstance(dep, dict):
                for pip_dep in dep['pip']:
                    deps.append(parse_pip_dep(pip_dep))
            else:
                raise Exception(f"unknown dependency type '{dep}' : {type(dep)}")
        return Environment(name=base_dict['name'], deps=deps, prefix=base_dict['prefix'])


Then we extract the conda dependencies, dropping the trailing build hash:

CONDA_RE = re.compile(r"(?P<package>[^=]+)=(?P<version>[^=]+)")  # matches "package=version[=build]"

def parse_conda_dep(value: str):
    matching = CONDA_RE.match(value)
    assert matching is not None, f"cannot parse conda dependency {value}"
    groups = matching.groupdict()
    version=parse_version(groups['version'])
    return Dep(
        package=groups['package'],
        constraints=[
            Constr(
                version=version,
                operator='=='
            )
        ]
    )
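For example, the build hash is simply ignored by the regex (the spec below is illustrative):

parse_conda_dep("numpy=1.19.2=py36hadc3359_0")
# -> Dep(package='numpy', constraints=[Constr(version=[1, 19, 2], operator='==')])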


It may be convenient to keep numeric version components, when available, for future extensions:

def parse_version(value: str):
    try:
        version = [int(i) for i in value.split('.')]
    except ValueError:
        version = value
    return version
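For example:

parse_version("1.19.2")    # -> [1, 19, 2]
parse_version("2.25.1b0")  # -> "2.25.1b0" (non-numeric parts fall back to the raw string)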

Pip constraints are a little different, and a package may carry more than one:

PIP_RE = re.compile(r"(?P<package>[_A-Za-z0-9\-]+)(?P<constraints>.*)")  # package name + the rest
PIP_CONSTRAINT = re.compile(r"(?P<operator>[=~\^><]+)(?P<version>.*)")   # e.g. ">=" followed by "2.20"

def parse_pip_dep(value: str):
    dep_matching = PIP_RE.match(value)
    assert dep_matching is not None, f"cannot parse pip dependency {value}"
    dep_groups = dep_matching.groupdict()
    constraints = []
    for c in dep_groups['constraints'].split(','):
        constr_matching = PIP_CONSTRAINT.match(c)
        assert constr_matching is not None, f"cannot parse pip constraint {c} in {value}"
        constr_groups = constr_matching.groupdict()
        version = parse_version(constr_groups['version'])
        constraints.append(
            Constr(
                version = version,
                operator = constr_groups['operator']
            )
        )
    return Dep(
        package = dep_groups['package'],
        constraints = constraints
    )
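A quick check with a double constraint:

parse_pip_dep("requests>=2.20,<3")
# -> Dep(package='requests',
#        constraints=[Constr(version=[2, 20], operator='>='),
#                     Constr(version=[3], operator='<')])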

Dumping the requirements

This is a naive implementation that dumps all requirements to a file.

Of course, the transformation function may contain far more logic to generate cleverer constraints than ==:

from pathlib import Path
from .models import Environment, Requirements
from typing import Union

def env_to_requirement(env: Environment):
    return Requirements(deps=env.deps)

def dump_requirements(reqs: Requirements):
    # one line per package: "name<op><version>[,<op><version>...]"
    for dep in reqs.deps:
        yield "{}{}".format(
            dep.package,
            ",".join(
                "{}{}".format(c.operator, c.get_str_version())
                for c in dep.constraints
            )
        )


def write_requirements(reqs: Requirements, path: Union[str,Path]):
    with open(path, mode="tw") as f:
        for line in dump_requirements(reqs):
            print(line,file=f)
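Putting it all together, a minimal driver could look like this (the module names parser and dumper below are just my hypothetical layout; adapt them to wherever you placed the code above):

# driver sketch -- module names are hypothetical
from mypackage.parser import read
from mypackage.dumper import env_to_requirement, write_requirements

env = read("myproject_reduced.yml")           # the conda export from above
reqs = env_to_requirement(env)                # naive 1:1 translation
write_requirements(reqs, "requirements.txt")  # one "pkg==x.y.z" line per dependency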

Packaging our own old dependencies

Create a separate directory for dependencies

I find it useful to keep the directory where I’m fixing my dependencies separate from the final environments; usually a parallel directory, e.g. “repos”.

My directory layout now looks like this:

  • envs
    • myapp
  • repos
    • mydep1
    • mydep2

Download from your repo

git clone ssh://myserver/myproject-url.git

Reset to a specific version

Sometimes you may need a version of your package which is not the latest one:

cd myproject
git log -n 10 --oneline

This lists the last ten commits; reset to the one you need:

git reset --hard abcd33d

Create a forked branch

git checkout -b feature/myapp

Create a dedicated venv to build your package

On Windows:

py -3.6 -m venv .venv
.venv\Scripts\activate

On Linux:

uv run --python 3.8 python -m venv .venv
source .venv/bin/activate

Use Poetry to package your code

Hatch can also be used, but I had some issues with dependencies on old projects.

On Windows:

cd myproject
.venv\Scripts\activate.bat
pip install poetry
poetry init

On Linux:

cd myproject
source .venv/bin/activate
pip install poetry
poetry init

Here you can create exactly the version of the package needed to satisfy the dependencies.

You can also interactively choose which versions of the dependent packages you want.
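For reference, the resulting pyproject.toml looks roughly like this (name, versions, and dependencies below are illustrative):

[tool.poetry]
name = "myproject"
version = "0.1.0"
description = "recovered legacy package"
authors = ["marco.p.v.vezzoli"]

[tool.poetry.dependencies]
python = "^3.6"
numpy = "1.19.2"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"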

Now you may want to test your code:

  1. build a wheel

poetry build

  2. reactivate the app venv (on Linux) and install the wheel

    deactivate
    cd ../../envs/myapp
    source .venv/bin/activate
    pip install ../../repos/myproject/dist/myproject-0.1.0-py3-none-any.whl

Update repo

Finally, let’s commit and push everything back to our base repo:

git add pyproject.toml
git commit -m "packaged"
git push --set-upstream origin feature/myapp

Conclusions

This may be a very long and delicate process, so additional care is needed to make sure that the new packages work correctly.

In future posts I will also cover how to update containers to remove conda dependencies.

marco.p.v.vezzoli

Self-taught assembler programming at 11 on my C64 (1983). Never stopped since then: always on the lookout for curious things in software development, data science, and AI. Linux and FOSS user since 1994. MSc in Physics in 1996. Working in large semiconductor companies since 1997 (STM, Micron), developing analytics, full-stack web infrastructures, microservices, and ML solutions.
