Including files in Python package distributions - lmmx/devnotes GitHub Wiki

Including files in a distributed Python package requires specifying them manually.

See "Including files in source distributions with MANIFEST.in": https://packaging.python.org/guides/using-manifest-in/

There's some discussion here:

As demonstrated by Anthony Sottile in this video, you can also use the package_data argument to setuptools.setup (or, equivalently, to the now-deprecated distutils.core.setup).
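As a rough sketch of the package_data approach, the arguments might look like the following. The package name mypkg and the file patterns are hypothetical placeholders (not taken from the video); the patterns are interpreted relative to the package directory.

```python
# Hypothetical package name and glob patterns, for illustration only.
SETUP_KWARGS = {
    "name": "mypkg",
    "packages": ["mypkg"],
    # Maps a package name to glob patterns, relative to that package's directory.
    "package_data": {"mypkg": ["data/*.json", "templates/*.txt"]},
}

# In a real setup.py these would be passed on as:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```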

Anthony doesn't mention that there's also setuptools_scm which will additionally handle incrementing your version number

The basic idea of setuptools_scm is that whichever files are under version control will also be the files put into the source distribution (the "sdist").

This makes sense to me, and I also like avoiding the need to increment the version number

There's a further discussion on this blog post which can be found linked on StackOverflow questions like this

setuptools_scm appears to have entered the spotlight around 4 or 5 years ago based on what I've seen on Twitter (I may be wrong)

There are also quite heated debates on Twitter among Python core/contributor developers, and reading it I'm inclined to agree with Paul Ganssle's conclusion that "the software is fine". (PEP 517 refers to the build tools requiring internet access among other details which I haven't fully followed)

As can be seen in the README of setuptools_scm, there was a recent feature added which allows you to set the important parameters in pyproject.toml (a config file you often see in Python packages). As of November 2019, setuptools version 42, you can trigger version inference (automated version numbering) by just adding a section to the pyproject.toml file,

[tool.setuptools_scm]

(I don't think you even need to put anything inside it)

Additionally, I followed the instructions to add the build requirements, as when run with just the above, there was still no inclusion of the desired files (in fact it turned out I didn't have setuptools_scm installed, so it was just silently skipping that when reading pyproject.toml)
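Since the missing package went unnoticed here, a quick check like the one below could be run first. This is only a sketch using the standard library; the helper name is_installed is made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("setuptools_scm"):
    print("setuptools_scm is not installed in this environment")
```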

The full pyproject.toml then looks like:

[build-system]
requires = ["setuptools>=42", "wheel", "setuptools_scm[toml]>=3.4"]

[tool.setuptools_scm]

(Note that the "legacy" script includes setup_requires argument to setuptools.setup which is mentioned in the Twitter thread above as deprecated/bad practice)

...however when this is uploaded to PyPI it is rejected!

HTTPError: 400 Bad Request from https://upload.pypi.org/legacy/
'0.1.dev47+gf1906f5.d20201122' is an invalid value for Version. Error: Can't use PEP 440 local
versions. See https://packaging.python.org/specifications/core-metadata for more information.

(Note that this was visible from the name of the tar file in dist/, I just didn't think that there'd be a problem so ignored its weird filename)
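To see what the error is complaining about, the rejected version can be split at the "+" that PEP 440 uses to introduce a local version label. This is a minimal sketch with a made-up helper name, not a full PEP 440 parser.

```python
def split_local_version(version):
    """Split a version string into its public part and its PEP 440 local label.

    PEP 440 separates the local version label from the public version with "+".
    Returns (public, local), where local is None if there is no label.
    """
    public, _, local = version.partition("+")
    return public, local or None


public, local = split_local_version("0.1.dev47+gf1906f5.d20201122")
print(public)  # 0.1.dev47
print(local)   # gf1906f5.d20201122
```

Everything after the "+" is the local part that the upload refuses.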

The advice in the top search result here is

def local_scheme(version):
    return ""
 
setup(
    use_scm_version={"local_scheme": local_scheme},
)

However as mentioned above, we're avoiding setting this parameter, and instead supplying it via pyproject.toml. That means that in the section [tool.setuptools_scm] we need to add something equivalent like:

[tool.setuptools_scm]
local_scheme = ""

...but as could've been guessed from looking at this carefully, what local_scheme actually needs to be is a callable returning a blank string, not just a blank string. This could be achieved with a lambda function if the toml will allow that...

[tool.setuptools_scm]
local_scheme = lambda version: ""

...but the TOML format isn't actually Python, it's just meant to be used for strings, integers etc.

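To my understanding, this means you cannot pass a custom function through pyproject.toml. setuptools_scm does accept the names of its built-in schemes as strings there (for example local_scheme = "no-local-version", which drops the local part), but version_scheme also needs to be set with a less trivial custom function, as shown below, so the configuration stays in setup.py.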

In conclusion (for now), the relevant parts of setup.py should be:

from setuptools import setup, find_packages

def local_scheme(version):
    return ""

def version_scheme(version):
    v = ".".join([version.tag.base_version, str(version.distance)])
    return v

setup(
    ...
    use_scm_version={
        "version_scheme": version_scheme,
        "local_scheme": local_scheme,
    },
    setup_requires=["setuptools_scm"],
    ...
)

This will mean that every git commit increments the micro part of the version (the number of commits since the last tag, not the commit hash). Unless every commit happens alongside a twine upload, there will therefore be gaps in the version numbering history on PyPI.

To release a version now requires tagging a git release, and because of the way this was set up you need to make sure to just tag a major and a minor (as the micro is appended by setuptools_scm), e.g.:

git tag -a v1.0 -m "Releasing version v1.0"
  • -a creates an annotated tag; the argument after it (v1.0) is the tag name, i.e. the label of the tagged release
  • -m supplies the message, i.e. the description accompanying it (like a commit message)

Now when we run python3 setup.py --version we get 1.0.0 (the major and minor from the tag, and the micro from the version.distance i.e. it is the 0'th commit since the release).
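The arithmetic can be tried out without a git repository. In the sketch below, fake_scm_version is a hypothetical stand-in that exposes only the two attributes the version_scheme from setup.py reads; the real object passed by setuptools_scm carries more than that.

```python
from types import SimpleNamespace


def version_scheme(version):
    # Same logic as in setup.py: "<tag base version>.<commits since tag>"
    return ".".join([version.tag.base_version, str(version.distance)])


def fake_scm_version(base_version, distance):
    """Hypothetical stand-in exposing only .tag.base_version and .distance."""
    return SimpleNamespace(
        tag=SimpleNamespace(base_version=base_version),
        distance=distance,
    )


print(version_scheme(fake_scm_version("1.0", 0)))  # 1.0.0 (built on the tagged commit)
print(version_scheme(fake_scm_version("1.0", 3)))  # 1.0.3 (three commits after the tag)
```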

As I understand it then, you must build the package before you make a further commit, as otherwise the commit distance (and with it the micro version) will increment. So first build and upload, then git commit, then push the tags.

Warning: it is not recommended to use

git push origin --tags

instead you should provide the name of the tag, so in this case

git push origin v1.0

Note that this does not involve a commit, it only sends the tag to GitHub (where it then shows up under "releases").

(Note that you don't need to do anything further, the tag will serve as the release).

Unfortunately, as was noted at the start in the comment on StackOverflow, this solution only works for the source distribution (sdist) and not for binary distributions (bdist, e.g. wheels). The problem is not seen when running setup.py, but it is seen when running pip install, which retrieves the binary wheel.

The solution is mentioned in the same Q&A (here): also add include_package_data=True to the arguments of setup() in setup.py, which (like setuptools_scm) "aims to include files from version control".

And with that, pip now installs from PyPI with the data files I was wanting, without needing to specify them in a MANIFEST.in.
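One way to catch the sdist/wheel difference before uploading is to list what actually ended up in each archive under dist/. The helper below is a sketch using only the standard library; its name and the default suffixes are made up, so adjust them to the data files in question.

```python
import tarfile
import zipfile


def data_files(archive_path, suffixes=(".json", ".txt")):
    """List files in an sdist (.tar.gz) or wheel (.whl) whose names end with `suffixes`."""
    if zipfile.is_zipfile(archive_path):  # wheels are zip archives
        with zipfile.ZipFile(archive_path) as zf:
            names = zf.namelist()
    else:  # sdists are (usually gzipped) tarballs
        with tarfile.open(archive_path) as tf:
            names = [m.name for m in tf.getmembers() if m.isfile()]
    return sorted(n for n in names if n.endswith(suffixes))


# Example usage with hypothetical file names:
#     data_files("dist/mypkg-1.0.0.tar.gz")
#     data_files("dist/mypkg-1.0.0-py3-none-any.whl")
```

If the wheel's list is empty while the sdist's is not, the data files are only reaching the source distribution.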


In conclusion, the setup.py call to setuptools.setup should include all of the following arguments (plus the standard ones):

setup(
    ...
    include_package_data=True,
    use_scm_version={
        "version_scheme": version_scheme,
        "local_scheme": local_scheme,
    },
    setup_requires=["setuptools_scm"],
    install_requires=reqs,
    ...
)

...and with that, pip install will give a working package, complete with all version-controlled data files, without needing to manually include them into a MANIFEST.in

  • A nice example of using MANIFEST.in can be found in Oxford Nanopore's bonito here
    • I think one purpose is to avoid shipping the demo notebooks, which aren't included in the manifest, but are in the repository.
    • This library also has nice examples of a package that specifies commands to download data (in this case the most recent version of a statistical model)

Note that for projects which should be handled more strictly, it would be possible to remove the function that turns the prerelease (dev) part into the micro version, and base the version solely on the tagged version number.

  • This would mean that a new tag would be required to increment the version
    • Since the commit distance would no longer feed the micro part, the version would not advance with each git commit
    • Instead you would need to explicitly run git tag to increment, which is more precise
    • Also it would not leave gaps between version numbers like the commit-distance-based method
    • Each tagged release would be more meaningful if it contained multiple commits
      • In this case the project is entirely for dev purposes, and pip install requires --pre to install pre-release versions, which is why the pre-release (dev) marker is stripped and the commit count is used as the micro part instead
  • To be clear, this would mean that a new version could not be uploaded to PyPI without performing git tag, which is not that much different to having to manually write in the version number into the setup.py file each time, which was pretty much why I decided to use auto-versioning in the first place...
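The stricter, tag-only variant could be sketched as below. The function name tag_only_version_scheme and the stand-in object are hypothetical; the real object passed by setuptools_scm has more attributes, and the tag would then need to carry the full version (e.g. v1.0.1 rather than v1.0).

```python
from types import SimpleNamespace


def tag_only_version_scheme(version):
    """Use the tagged version as-is, ignoring the commit distance."""
    return version.tag.base_version


def fake_scm_version(base_version, distance):
    """Hypothetical stand-in exposing only .tag.base_version and .distance."""
    return SimpleNamespace(
        tag=SimpleNamespace(base_version=base_version),
        distance=distance,
    )


# The version stays the same however many commits follow the tag.
print(tag_only_version_scheme(fake_scm_version("1.0.1", 0)))  # 1.0.1
print(tag_only_version_scheme(fake_scm_version("1.0.1", 5)))  # 1.0.1
```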