Live tracking of Python 3 usage based on a small set of scientific libraries


Update

I’ve now shut down the live tracker, but it was an interesting experiment. A screenshot of the app is shown below.

Previous article

I’m a big fan of Python and the general improvements that have come with Python 3. While Python 3 growth has been slow, it seems that since the release of Python 3.5, Python 3 is beginning to gain momentum thanks to the steady introduction of neat new features such as type hints and the @ matrix multiplication operator.
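For readers who haven't tried them yet, here is a small illustration of both features. The Mat class is a hypothetical toy written for this example: any class that defines __matmul__ can be used with the @ operator (PEP 465), and the annotations are ordinary Python 3.5+ type hints (PEP 484). NumPy arrays support @ directly, so in practice you would rarely write a class like this.

```python
from typing import List

# A type alias built from type hints: a matrix as a list of rows.
Matrix = List[List[float]]


class Mat:
    """Hypothetical toy matrix: defining __matmul__ enables the @ operator."""

    def __init__(self, rows: Matrix) -> None:
        self.rows = rows

    def __matmul__(self, other: "Mat") -> "Mat":
        # Row-by-column product; assumes the shapes are compatible.
        cols = list(zip(*other.rows))
        return Mat([[sum(a * b for a, b in zip(row, col)) for col in cols]
                    for row in self.rows])


a = Mat([[1.0, 2.0], [3.0, 4.0]])
b = Mat([[5.0, 6.0], [7.0, 8.0]])
print((a @ b).rows)  # [[19.0, 22.0], [43.0, 50.0]]
```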

There still seems to be a belief that Python 2 is better supported than Python 3, but by now Python 3 should actually be better supported, based on a projection that more libraries will support Python 3 than Python 2 as of May 2016.

Given this, I thought it would be useful to crunch the numbers on the proportion of Python 3 vs Python 2 package downloads again. A number of people have done this in the past, but I thought it would be neat to have some sort of live tracker giving a daily update on Python 3 usage.

As stated before, these results lag behind because of the large quantity of legacy Python 2 software that is still used on a day-to-day basis. So the picture of Python 3 based on what is being developed now is likely to be more positive than these numbers suggest.

Nevertheless, I thought it would be interesting to create a live tracker that updates daily, giving the proportion of Python 3 downloads for a number of key packages from PyPI. I wanted a data science focus, so I made a fairly arbitrary choice of 6 packages that I use a lot: numpy, scipy, scikit-learn, scikit-image, pandas and theano.

Results

The results are based on the PyPI BigQuery database, which stores information on every download. The query extracts and collates, on a daily basis, every download of the 6 packages above made using pip with either Python 2 or Python 3.
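The query itself isn't reproduced in this post, so the following is only an illustration of what a daily query could look like, not the exact one behind the tracker. It builds a legacy BigQuery SQL string that counts one day's pip downloads per package and Python major version. The daily table name (downloadsYYYYMMDD) and the fields file.project, details.installer.name and details.python are assumptions based on the legacy the-psf:pypi schema and should be checked against the dataset before use; daily_query is a hypothetical helper name.

```python
import datetime

# The six packages tracked in this post.
PACKAGES = ["numpy", "scipy", "scikit-learn", "scikit-image", "pandas", "theano"]


def daily_query(day: datetime.date) -> str:
    """Build a legacy-SQL query counting one day's pip downloads of PACKAGES,
    grouped by project and Python major version ('2' or '3').

    ASSUMPTION: the table and field names follow the legacy the-psf:pypi
    schema and must be verified before running the query.
    """
    projects = ", ".join("'{}'".format(p) for p in PACKAGES)
    return (
        "SELECT LOWER(file.project) AS project, "
        "LEFT(details.python, 1) AS python_major, "
        "COUNT(*) AS downloads "
        "FROM [the-psf:pypi.downloads{:%Y%m%d}] "
        "WHERE details.installer.name = 'pip' "
        "AND LOWER(file.project) IN ({}) "
        "AND LEFT(details.python, 1) IN ('2', '3') "
        "GROUP BY project, python_major"
    ).format(day, projects)


print(daily_query(datetime.date(2016, 6, 1)))
```

Taking the first character of the Python version string ("2.7.12", "3.5.2", ...) is enough to split downloads into Python 2 and Python 3; running the query through a BigQuery client is left out of the sketch.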

The results are based on data from this year (the only data available), so they will get more interesting as more data becomes available. This live graph updates daily.

screenshot of the previous app

Based on these packages, Python 3 only seems to make up 20% of the downloads, but from June onwards there seems to be a steady upward trend. I expect this to accelerate in the next year or two as projects begin to drop Python 2.

A large proportion of these downloads might be due to legacy software, and the proportion of new projects being developed in Python 3 is probably much higher. Nevertheless, this is probably a fairly good indicator of the state of general Python use for scientific applications. Hopefully it will grow fast in future!

Disclaimer: to the best of my knowledge this is correct, but it was my first time using tools such as BigQuery, so I will need to triple-check the queries.

App creation

A few details on the steps to create this live pytracker app:

  1. A Python and SQL script that queries the daily download dataset (the-psf:pypi) and extracts Python 3 and Python 2 download statistics for the 6 packages of interest
  2. Store the daily results in a MySQL database and calculate the daily package proportions into a new table
  3. A Python Flask app that queries the MySQL table and serves a Bokeh plot of the current data. The app sits on a small VPS.
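Step 2 is only described briefly, so here is a minimal sketch of it under some assumptions. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a database server, and the table names (daily_downloads, daily_proportions), the py3_fraction column and the two helper functions are hypothetical. Each day's per-package counts are stored in one table, and the Python 3 share for each package is recomputed into a second table.

```python
import sqlite3

# sqlite3 stands in for MySQL here; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_downloads (
    day TEXT, project TEXT, python_major TEXT, downloads INTEGER,
    PRIMARY KEY (day, project, python_major)
);
CREATE TABLE daily_proportions (
    day TEXT, project TEXT, py3_fraction REAL,
    PRIMARY KEY (day, project)
);
""")


def store_counts(day, rows):
    """Store one day's (project, python_major, downloads) rows."""
    conn.executemany(
        "INSERT OR REPLACE INTO daily_downloads VALUES (?, ?, ?, ?)",
        [(day, project, major, n) for project, major, n in rows],
    )


def update_proportions(day):
    """Recompute the fraction of Python 3 downloads per project for one day."""
    conn.execute("DELETE FROM daily_proportions WHERE day = ?", (day,))
    conn.execute("""
        INSERT INTO daily_proportions (day, project, py3_fraction)
        SELECT day, project,
               1.0 * SUM(CASE WHEN python_major = '3' THEN downloads ELSE 0 END)
                   / SUM(downloads)
        FROM daily_downloads
        WHERE day = ?
        GROUP BY day, project
    """, (day,))
    conn.commit()


# Illustrative made-up counts, not real PyPI data.
store_counts("2016-06-01", [("numpy", "2", 800), ("numpy", "3", 200),
                            ("pandas", "2", 300), ("pandas", "3", 100)])
update_proportions("2016-06-01")
print(conn.execute(
    "SELECT project, py3_fraction FROM daily_proportions ORDER BY project"
).fetchall())  # [('numpy', 0.2), ('pandas', 0.25)]
```

With MySQL, INSERT OR REPLACE would become REPLACE INTO (or INSERT ... ON DUPLICATE KEY UPDATE). The Flask app in step 3 would then only need to read the proportions table and pass the series to a Bokeh plot; that part is not shown here.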
