The state of Python 3 adoption

Last updated on December 28, 2017, in Python

The first version of Python 3 was released 9 years ago. Unfortunately, Python 2.7 is still leading in some fields. With 2017 coming to an end, let's have a look at the current state of Python 3 adoption.

First, if you haven't read the recent JetBrains developer survey about Python's ecosystem, I suggest reading it.

In this article, I want to share another way of estimating usage: looking at PyPI download statistics. Fortunately, the data is publicly available and stored in BigQuery. To start using it, you just need a Google account and basic SQL knowledge.

Here is how to aggregate package statistics for the past 30 days:

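SELECT
  SUBSTR(details.python, 0, 3) as python_version,
  COUNT(*) as download_count
FROM
  TABLE_DATE_RANGE(
    [the-psf:pypi.downloads],
    DATE_ADD(CURRENT_TIMESTAMP(), -30, "days"),
    CURRENT_TIMESTAMP())
WHERE details.installer.name = 'pip' and
  file.project = 'numpy'
GROUP BY python_version
ORDER BY download_count DESC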

More examples.

Visualization of statistics

To get an idea of how downloads are distributed across Python versions, let's visualize the relative frequency of downloads per version.
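As a rough illustration of what "relative frequency" means here, the sketch below turns raw download counts per version into fractions of the total. The function name `relative_frequencies` and the counts are made up for the example; they are not real PyPI numbers, and this is not necessarily how the charts in this post were produced.

```python
def relative_frequencies(download_counts):
    """Convert raw download counts per Python version into fractions of the total."""
    total = sum(download_counts.values())
    if total == 0:
        # Avoid dividing by zero when there are no downloads at all.
        return {version: 0.0 for version in download_counts}
    return {version: count / total for version, count in download_counts.items()}


# Hypothetical counts, for illustration only (not real PyPI data).
example_counts = {"2.7": 600, "3.6": 300, "3.5": 100}
shares = relative_frequencies(example_counts)

# Print versions from most to least downloaded.
for version, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{version}: {share:.1%}")
```

Working with fractions rather than raw counts makes packages with very different download volumes comparable on the same chart.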

Below are usage statistics for the following packages: bokeh, celery, click, cython, django, flask, gensim, jupyter, keras, lxml, matplotlib, nltk, numpy, pandas, pillow, requests, scipy, scrapy, sklearn, spacy, tensorflow and xgboost.

We can see the distributions shift towards Python 3 when looking at Django, Jupyter, spaCy, Cython and Celery. In the case of Django and Celery, there is a good reason for that: they dropped support for Python 2.
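One simple way to quantify such a shift is to collapse the per-version counts into a single Python 3 share for each package. The sketch below assumes the counts are keyed by version strings such as "2.7" or "3.6"; the helper name `python3_share` and the numbers are hypothetical.

```python
def python3_share(download_counts):
    """Return the fraction of downloads that came from any Python 3.x version."""
    total = 0
    python3 = 0
    for version, count in download_counts.items():
        total += count
        # The version may be missing (None) when the installer did not report it.
        if version and version.startswith("3."):
            python3 += count
    return python3 / total if total else 0.0


# Hypothetical counts for two packages, for illustration only.
packages = {
    "package_a": {"2.7": 200, "3.5": 300, "3.6": 500},
    "package_b": {"2.7": 750, "3.6": 250},
}
for name, counts in packages.items():
    print(f"{name}: {python3_share(counts):.0%} of downloads from Python 3")
```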

However, don't be misled by the numbers: PyPI download counts aren't the best way to get accurate usage statistics. Many of the downloads are generated by automated tools, such as continuous integration systems, mirroring clients and tox test runs. In reality, the actual usage of Python 2.7 is probably smaller. It's also worth mentioning that many Linux distributions still use Python 2.

P.S. There is no reason to use Python 2 in the upcoming year, and I'm not advocating it.