Ilan Schnell - Early History of the Anaconda Distribution

The Early History of the Anaconda Distribution

(Ilan Schnell, 2018)

When Continuum Analytics started its business in 2012, the main product focus was Wakari, a cloud service platform where data science users would analyze their data on servers through a web interface. The web interface allowed users to open an interactive Python prompt from which data stored on the cloud server was accessed, processed and displayed. In order to make this possible, the user-facing Python interpreter needed to have access to commonly used Python libraries, such as NumPy, SciPy, Pandas, etc.

The first version of Anaconda (0.8), which was announced to the public at the SciPy conference in July of 2012, was rather limited. It was only available for 64-bit (x86_64) Linux because this was the platform Wakari was going to use, and it could only be installed into a specific file system location (/opt/anaconda). Most importantly, only the complete downloadable installer was available, without a package manager, meaning that when a new version of Anaconda was released, the entire installer had to be downloaded and installed again. However, for the purpose of creating the early Anaconda installers, it proved useful to create smaller binary software packages which contained individual components of the final product. These individual components developed into what are now the well-known "conda" packages.

Aside from the fact that Anaconda was missing a package manager, we realized the benefit of having a simple install method for the so-called "SciPy Stack" (that is, the set of scientific packages commonly used by data scientists). In August 2012, we decided to add a MacOSX (x86_64) version of Anaconda (0.9), since most of our own developers were using MacBooks.

Additionally, since some of our customers were using Windows and we already had support for Linux and MacOSX, we decided to add Windows support to the Anaconda distribution in September of 2012. This was the first release (1.0) of Anaconda which supported all three commonly used operating systems (Windows, MacOSX and Linux). All the components of the final installers were using a preliminary version of conda packages internally to make creation easier; however, no package manager was available at this time.

At this stage, package management became our primary focus. Beginning in 1998, the Python community started developing different packaging solutions; however, all these efforts were limited to packaging Python packages. They assumed that the Python interpreter and other required packages were already available on the system. While the solutions of the Python community were well suited for most Python projects, they were not suited for the SciPy Stack. Libraries such as SciPy have non-Python dependencies (e.g. ATLAS or MKL) that could not be installed by the package managers developed by the Python community. Moreover, the Python interpreter itself could not be installed by these package managers.

Since we were already using conda packages as "build artifacts" internally, for the construction of our installers, we decided to use these binary packages as a way to update remote installations of Anaconda. In October 2012, Anaconda 1.1 was released which included the first release (1.0) of the new conda package manager. The first conda version was preliminary and was only able to update the set of conda packages which make up the Anaconda distribution.

Although conda itself was very limited in its functionality at this point, the concept of a conda package was very well-defined as a compressed tar archive. This archive contained the binary files being installed into an "installation prefix". In order to host a "conda repository" (a collection of conda packages for various target platforms), it was necessary to index its content in a way that would make it easy for conda to see both the available versions of the conda packages and the dependencies of each conda package. The solution was to have one directory per platform (named linux-64, osx-64, etc.) which, in addition to the conda packages themselves, contained an index file named repodata.json. This directory structure was then served over HTTP. For example, when conda was running on a 32-bit Windows machine, it would look for <base url>/win-32/repodata.json to know which conda packages were available on the remote repository.
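The per-platform layout described above can be sketched in a few lines of Python. This is only an illustration, not conda's actual code: the helper names, the example base URL, and the simplified index contents (package file names, version numbers and dependency strings) are all assumptions made for the example, and a real repodata.json carries more fields.

```python
import json

# Hypothetical mapping from an operating system name to the prefix used
# in the per-platform directory names (linux-64, osx-64, win-32, ...).
PLATFORM_PREFIX = {"Linux": "linux", "Darwin": "osx", "Windows": "win"}


def repodata_url(base_url, system, bits):
    """Return the URL of the index file for one platform directory."""
    subdir = f"{PLATFORM_PREFIX[system]}-{bits}"
    return f"{base_url.rstrip('/')}/{subdir}/repodata.json"


# A simplified, made-up index: one entry per package file, each entry
# recording the package name, its version and its dependencies.
SAMPLE_INDEX = json.loads("""
{
  "packages": {
    "numpy-1.6.2-py27_0.tar.bz2":
      {"name": "numpy", "version": "1.6.2", "depends": ["python 2.7*"]},
    "numpy-1.7.0-py27_0.tar.bz2":
      {"name": "numpy", "version": "1.7.0", "depends": ["python 2.7*"]},
    "scipy-0.11.0-py27_0.tar.bz2":
      {"name": "scipy", "version": "0.11.0", "depends": ["numpy", "python 2.7*"]}
  }
}
""")


def available_versions(index, name):
    """List the versions of one package found in an index.

    The sort is a plain string sort, which is good enough for this sample.
    """
    return sorted(
        meta["version"]
        for meta in index["packages"].values()
        if meta["name"] == name
    )


if __name__ == "__main__":
    # http://repo.example.com/pkgs is a placeholder for <base url>.
    print(repodata_url("http://repo.example.com/pkgs", "Windows", 32))
    print(available_versions(SAMPLE_INDEX, "numpy"))
```

The point of the design is that a client needs only one small HTTP request per platform to learn every available package, version and dependency, without downloading any package archive.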

By the end of 2012, the Anaconda distribution was available for Linux, MacOSX and Windows with about 100 packages; however, only Python version 2.7 was supported. Even though conda was very rudimentary, the conda repository structure was established. At this time, conda-build, constructor and many other useful tools did not exist. Conda packages and installers were only being created using an internal system out of which these (as well as other tools) later evolved.

At the beginning of 2013, Python 2 was still by far the most commonly used version, and Anaconda was using Python 2.7. With conda giving the ability to install different versions of Python, we added full support of Python 2.6 in Anaconda 1.3 by February 2013. This meant that all supported packages depending on Python had to be rebuilt against Python 2.6. This was not a very difficult task, because at this point in time almost all Python packages still supported Python 2.6. Anaconda 1.3 supporting Python 2.6 meant that users could create a Python 2.6 conda environment with all "Anaconda" packages included, that is, the packages which are also included in the distribution installers (with the exception of conda).

As Python 3 became more widely used, we started building Python packages against Python 3.3, which was the current Python 3 version at the time. The Python community was clearly headed that way, and our goal was to be the first distribution that supported Python 2 and 3 in a consistent way. So in March 2013, we released Anaconda 1.4, which allowed users to create fully supported conda environments based on Python 2.6, 2.7 or 3.3. The Anaconda installers themselves were still only based on Python 2.7.

As we supported three different versions of Python and more packages, in particular older versions of packages, it became evident that conda was reaching its limitations in terms of dependency resolution. At this point, conda was using a graph-based algorithm on top of hand-coded constraints to find a solution to the install dependency problem. We started looking into using SAT solvers to solve the install dependency problem, as other package management systems, for example libzypp and Zero Install, were already using this approach.

The first step towards implementing the SAT-based approach was to choose an existing SAT solver. After some research, we settled on PicoSAT because it was written in a single C source file and had a simple C interface. Since no Python interface existed, we created PycoSAT, a simple Python C extension module. To familiarize ourselves with Boolean problems and also to test PycoSAT, we created a SAT-based Sudoku solver. For each possible digit (1-9) in each of the 81 Sudoku cells, we assigned a Boolean variable, and translated the Sudoku problem into Boolean clauses.
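The Sudoku encoding can be illustrated with the following sketch. It is not the original solver: the variable numbering, the function names and the exact choice of clauses are assumptions made for this example. It only builds the clauses (as lists of signed integers, the usual convention for SAT solvers) and checks a filled grid against them, so that it runs without any SAT library installed; in practice the clause list would be handed to a solver such as PicoSAT.

```python
from itertools import combinations


def var(row, col, digit):
    """Number the variable "cell (row, col) holds digit" from 1 to 729.

    row and col run from 0 to 8, digit from 1 to 9.
    """
    return 81 * row + 9 * col + digit


def sudoku_clauses(givens):
    """Build the clauses for a puzzle; givens maps (row, col) to a digit."""
    clauses = []
    # Every cell holds at least one digit, and no two digits at once.
    for r in range(9):
        for c in range(9):
            clauses.append([var(r, c, d) for d in range(1, 10)])
            for d1, d2 in combinations(range(1, 10), 2):
                clauses.append([-var(r, c, d1), -var(r, c, d2)])
    # The 27 units: 9 rows, 9 columns and 9 boxes of 3 x 3 cells.
    units = [[(r, c) for c in range(9)] for r in range(9)]
    units += [[(r, c) for r in range(9)] for c in range(9)]
    units += [
        [(br + i, bc + j) for i in range(3) for j in range(3)]
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # No digit appears twice within a unit.
    for unit in units:
        for d in range(1, 10):
            for (r1, c1), (r2, c2) in combinations(unit, 2):
                clauses.append([-var(r1, c1, d), -var(r2, c2, d)])
    # The pre-filled cells become single-literal clauses.
    for (r, c), d in givens.items():
        clauses.append([var(r, c, d)])
    return clauses


def satisfies(grid, clauses):
    """Check whether a completely filled 9 x 9 grid satisfies all clauses."""
    true_vars = {var(r, c, grid[r][c]) for r in range(9) for c in range(9)}
    return all(
        any((lit > 0) == (abs(lit) in true_vars) for lit in clause)
        for clause in clauses
    )


if __name__ == "__main__":
    # A valid filled grid built from a simple shifting pattern.
    grid = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
            for r in range(9)]
    clauses = sudoku_clauses({(0, 0): grid[0][0]})
    print(len(clauses), satisfies(grid, clauses))
```

A SAT solver returns the list of true variables; reading the grid back is then a matter of inverting var for every positive variable in the solution.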

Translating the install problem into a Boolean problem is well described in this paper. The basic idea is very similar to solving the Sudoku problem using a SAT solver. That is, for each package, one creates a Boolean variable, and dependencies are then formulated as clauses. After solving the corresponding SAT problem, one knows which packages need to be installed. In June 2013, the first SAT solver based conda was released, and became part of the Anaconda 1.6 release. This release also introduced the graphical .pkg installer on MacOSX.
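A toy version of this translation might look as follows. Everything here is hypothetical: the package names, the rule that two versions of the same package conflict, and the function names are assumptions for illustration, and the brute-force solve function merely stands in for a real SAT solver such as PicoSAT so that the sketch needs no external library. It is not conda's actual resolver.

```python
from itertools import product

# Hypothetical repository: one Boolean variable per package (name-version).
PACKAGES = ["app-1.0", "tool-1.0", "old-1.0", "lib-1.0", "lib-2.0"]

# Each dependency is a list of alternatives, any one of which satisfies it.
DEPENDS = {
    "app-1.0": [["lib-1.0", "lib-2.0"]],  # app works with either lib
    "tool-1.0": [["lib-2.0"]],            # tool needs the newer lib
    "old-1.0": [["lib-1.0"]],             # old needs the older lib
}

# Assumption: two versions of the same package cannot be installed together.
CONFLICTS = [("lib-1.0", "lib-2.0")]


def install_clauses(requested):
    """Translate dependencies, conflicts and the request into clauses."""
    num = {name: i + 1 for i, name in enumerate(PACKAGES)}
    clauses = []
    for pkg, deps in DEPENDS.items():
        for alternatives in deps:
            # "pkg implies one of the alternatives" as a single clause.
            clauses.append([-num[pkg]] + [num[a] for a in alternatives])
    for a, b in CONFLICTS:
        clauses.append([-num[a], -num[b]])
    for pkg in requested:
        clauses.append([num[pkg]])
    return clauses


def solve(clauses, n_vars):
    """Brute-force stand-in for a SAT solver; only usable for tiny problems.

    Returns the set of true variables of the first solution found, or None.
    """
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return {i + 1 for i, bit in enumerate(bits) if bit}
    return None


def resolve(requested):
    """Return the sorted list of packages to install, or None if impossible."""
    model = solve(install_clauses(requested), len(PACKAGES))
    if model is None:
        return None
    return sorted(PACKAGES[i - 1] for i in model)


if __name__ == "__main__":
    print(resolve(["app-1.0", "tool-1.0"]))
    print(resolve(["old-1.0", "tool-1.0"]))
```

In this example the first request has exactly one solution (app-1.0, lib-2.0 and tool-1.0), while the second is unsatisfiable because the two requested packages need conflicting versions of lib, so resolve returns None.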

At this point in time, downloading the Anaconda installer (about 200 to 300MB, depending on the operating system) and installing the full set of over 100 conda packages was the only (easy) way to get a working conda onto the user's system. As the number of conda packages in our repository grew, and conda became more popular, we realized that providing installers that only contained conda (and its dependencies) would be very useful for the community. We called these installers "Miniconda", and the first of these installers was also released in June of 2013 along with the Anaconda 1.6 release. The end user was now able to download an installer of only about 20MB in order to get a working conda onto their system.

As Anaconda became more and more widely used by the community, the demand for building conda packages grew. Until then, all conda packages had been built using a company-internal system. In order to allow the community to build their own conda packages, we decided to add the conda build command to conda (this was before conda-build became its own project). This required rewriting our internal build tool in a way that it would not be tied to other parts of the internal system. Along with our internal system for building conda packages came a set of "recipes" which contained the package meta-data and instructions for building the conda packages. Therefore, we also had to introduce a more formal "conda recipe" definition to the wider public. This, along with adding the conda build command, was done with the conda 1.7 release in June 2013.

Until then, conda itself was only compatible with Python 2.7, even though it had always been possible to install other versions of Python with conda. The conda 1.8 release in July 2013 added Python 3 support to the conda code base, such that conda itself could also run under Python 3.3. While this new functionality did not bring any improvements for the end user, as conda was normally installed as part of the Python 2.7 based Anaconda installers, it allowed Python 3 developers to contribute to the project. It also allowed us to create the first "Miniconda3" installers, in November 2013. This was the first installer which included Python 3. That meant that one could download and install Miniconda3 and have a working conda running on the Python 3.3 interpreter.