python3 transition - dmwm/WMCore GitHub Wiki

Identifying Python2 code to be dropped

Some of the automatic python2 to python3 translation can be performed with the standard 2to3 tool, which ships with python3 installations up to 3.12 (it was removed in python 3.13). It can be run with a command like:

2to3 bin/ -w -n

which would update all the files under the bin/ directory from python2 to python3. Some wrong and/or sub-optimal updates to note are:

  • it fails to deal with metaclasses. The way forward here is to manually update the class definition to class MyClass(Parent, metaclass=MetaClass)
  • it conflicts with past.builtins. E.g., it changes from past.builtins import basestring into from past.builtins import str. The way forward is to either delete that line completely, or to change it to from builtins import str.
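The manual metaclass update can be sketched as follows. This is a minimal illustration only: MetaClass, Parent and MyClass are hypothetical names, not classes taken from WMCore.

```python
# Hypothetical names, used only to illustrate the python3 metaclass syntax.
class MetaClass(type):
    """Metaclass that records the name of every class it creates."""
    registry = []

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        mcs.registry.append(name)
        return cls


class Parent:
    pass


# Python 2 spelling (in python3 this is just an ordinary class attribute
# and the metaclass is silently not applied):
#     class MyClass(Parent):
#         __metaclass__ = MetaClass
# Python 3 spelling:
class MyClass(Parent, metaclass=MetaClass):
    pass
```

After this change, type(MyClass) is MetaClass, which is a quick way to check that the metaclass is really applied.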

Some useful grep commands to spot what needs to be updated are listed below:

egrep -rI 'PY2|PY3|python' * | grep -v 'env python'
grep -I -r 'from future' *
grep -I -r 'from __future' *
grep -I -r 'from past' bin/*

Note that some scripts do not have the .py extension, so searching only the files named *.py is an incomplete solution.
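For those who prefer a script over grep, the same search could be sketched in python as below. This is an assumption-laden sketch, not an existing WMCore tool: the function name find_compat_lines, the regular expression and the demo file names are all made up for illustration. It reads every file regardless of its extension and skips files that look binary.

```python
import os
import re
import tempfile

# Markers of the py2/py3 compatibility layer, loosely mirroring the grep patterns.
COMPAT_PATTERN = re.compile(r"\b(PY2|PY3)\b|^\s*from\s+(__future__|future|past|builtins)\b")


def find_compat_lines(root):
    """Return (path, line number, stripped line) for every marker found under root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                data = handle.read()
            if b"\0" in data:  # crude binary check, similar in spirit to grep -I
                continue
            text = data.decode("utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                if COMPAT_PATTERN.search(line):
                    hits.append((path, number, line.strip()))
    return hits


# Small self-check on a throwaway directory (file names are hypothetical).
with tempfile.TemporaryDirectory() as tmpdir:
    with open(os.path.join(tmpdir, "myscript"), "w") as handle:
        handle.write("#!/usr/bin/env python\nfrom future.utils import viewitems\nif PY3:\n    pass\n")
    with open(os.path.join(tmpdir, "clean.py"), "w") as handle:
        handle.write("print('hi')\n")
    demo_hits = [(os.path.basename(p), n, l) for p, n, l in find_compat_lines(tmpdir)]
```

The script without extension is reported, while the clean .py file is not.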

How to drop py2 support

This is not urgent and can be divided into two main phases:

  1. Stop using the py2 runtime and stop providing bugfixes for errors that occur in py2 only, but maintain potential py2 compatibility for WMCore clients
  2. Remove the compatibility layer for both py2 and py3, so that dmwm/WMCore works with py3 only

The first phase will not require any change in our code; it is merely a change in how we address bugs and communicate with other teams.

The second phase will require some development, and the main steps will be:

  • if using python 3.8.2, move to using pickle protocol 5 everywhere
    • I would suggest using this variable everywhere and pinning it to a specific number, so that we do not get bad surprises when comp changes the runtime version.
  • remove from __future__ import statements
  • remove from past import statements
  • remove from builtins import statements
  • the previous steps mean that we should also stop using basestring. It should be enough to:
    • remove the from past.builtins import basestring
    • isinstance(_, basestring) -> isinstance(_, (str, bytes))
  • fix how dictionaries are iterated over:
    • for k in viewkeys(mydict): -> for k in mydict
    • for v in viewvalues(mydict): -> for v in mydict.values()
    • for v in listvalues(mydict): -> for v in list(mydict.values())
    • for k, v in viewitems(mydict): -> for k, v in mydict.items():
    • for k, v in listitems(mydict): -> for k, v in list(mydict.items()):
    • these are only examples. Such changes also need to be made when the iteration over a dictionary is not used in a loop but in other cases, such as creating a set. Every view*() and list*() helper provided by python-future needs to be replaced by the appropriate statement.
  • remove standard_library.install_aliases(), which was mainly used to access the backported version of py3 urllib and httplib into py2
  • make if PY3 the default, remove all the code in if PY2
    • if this is used in decodeBytesToUnicodeConditional or encodeUnicodeToBytesConditional, then simply use decodeBytesToUnicode and encodeUnicodeToBytes
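Some of the steps above can be sketched in plain python3 as follows. This is only an illustration under stated assumptions: PICKLE_PROTOCOL and is_string_like are hypothetical names and not existing WMCore identifiers.

```python
import pickle

# Assumption: one hypothetical module-level constant pins the pickle protocol,
# so a change of runtime version does not silently change the format.
PICKLE_PROTOCOL = 5  # protocol 5 needs python >= 3.8


def is_string_like(value):
    """py3 replacement for isinstance(value, basestring)."""
    return isinstance(value, (str, bytes))


mydict = {"a": 1, "b": 2}

keys = [k for k in mydict]              # was: for k in viewkeys(mydict)
values = [v for v in mydict.values()]   # was: for v in viewvalues(mydict)
pairs = list(mydict.items())            # was: listitems(mydict)
unique_values = set(mydict.values())    # was: set(viewvalues(mydict))

# A list() copy is still needed when the dictionary is modified while looping.
for key in list(mydict):
    if mydict[key] > 1:
        del mydict[key]

payload = pickle.dumps(mydict, protocol=PICKLE_PROTOCOL)
restored = pickle.loads(payload)
```

Keeping the list() copy only where the dictionary is modified during the loop avoids needless copies elsewhere.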

Then, we can start using all the shiny new features that py3 provides!

Old information

The python3 transition within WMCore isn't really a migration to python3, but it's meant to be a modernization of our code such that it's compatible with python 2.7 and python 3 (latest stable release being 3.8.x at the moment). It's unclear whether python 2.6 would have to be supported as well (especially for the WMRuntime package).

Many of the CMS Computing services are maintained and built by our own group, as well as many of their dependencies. The CMS Computing software stack is currently maintained in this repository/branch: https://github.com/cms-sw/cmsdist/tree/comp_gcc630 where we also build many of the python libraries (either for python2 or python3). Thus, during this python migration, there will be the need to also build new (python) spec files for the required dependencies; and/or to update those that are out-dated or inconsistent between py2 and py3. We use this model such that we have full control of all the dependencies shipped with our CMS software, including any possible patches needed.

This link https://docs.python.org/3/howto/pyporting.html has a lot of good stuff on all the differences between python 2 and 3. The way we have planned this migration considers passing our code through python-futurize http://python-future.org/automatic_conversion.html . Developers should consider doing this now when changing existing code and validating unit tests.

Py2 to Py3 transition work for WMCore

The work of a summer student on the py2 to py3 transition is summarized in:

  • wiki page describing all performed steps (likely deprecated)
  • twiki page summarizing futurize steps for different use-cases (either deprecated or to be reviewed)

Individual notes

items(), keys(), and values()

In Python2, these methods create lists of the dictionary's items, keys, or values. This can take a lot of memory. Python3 has the same syntax, but the methods now return lightweight view objects that are iterated lazily. In Python2, iteritems(), iterkeys(), and itervalues() behave much like the Python3 versions. There is a "problem" with the futurize fixer for these methods in that it

  • Converts python2 uses of items() to list(items()). This is not a problem, it is just explicit
  • Converts iteritems() to items()
    • This is OK for python3, but on python2 it possibly alters the performance of the code
    • And if you convert this again, you now end up with list(items()) in your python3 code, altering the performance under Python3 too

Eric's proposal is to use the futurize fixer, but discard all changes that change iteritems() etc to the python 3 versions and use that as the python2 code. We would run it a second time to create a dedicated python3 version. We could also take this opportunity to review our uses of items(), etc in python2 since in most cases we could be using the iterator versions. The most common case where we can't do this is in doing something like len(items()).
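The difference between the Python3 view and the explicit list(items()) form can be seen in a short sketch (the dictionary content is made up for illustration):

```python
mydict = {"a": 1, "b": 2}

items_view = mydict.items()        # a view: no copy of the dictionary content
items_list = list(mydict.items())  # an explicit snapshot, as futurize writes it

mydict["c"] = 3

view_length = len(items_view)  # the view follows the dictionary
list_length = len(items_list)  # the snapshot does not
```

Note that len() works directly on Python3 views, so the len(items()) limitation mentioned above concerns the Python2 iterator versions only.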