Including files in Python package distributions - lmmx/devnotes GitHub Wiki
Including data files in a distributed Python package requires manually specifying them.

See: "Including files in source distributions with MANIFEST.in"
https://packaging.python.org/guides/using-manifest-in/
There's some discussion here:

- "Do Python projects need a MANIFEST.in, and what should be in it?"
- "How to include package data with setuptools/distutils?"
  - I found the first comment on the accepted answer helpful:
> I have been researching this issue for the past hour and have been trying many approaches. As you say, `package_data` works for `bdist` and not `sdist`. However, `MANIFEST.in` works for `sdist`, but not for `bdist`! Therefore, the best I have been able to come up with is to include both `package_data` and `MANIFEST.in` in order to accommodate both `bdist` and `sdist`.
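For reference, a `MANIFEST.in` is just a list of include/exclude commands matched against paths in the repository. A minimal sketch (the package name and paths here are hypothetical) might look like:

```
include README.md
recursive-include mypkg/data *.json
exclude notebooks/*.ipynb
```

`recursive-include` pulls in every matching file under the directory, while `exclude` drops specific patterns from the sdist.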
As demonstrated by Anthony Sottile in this video, you can also use the `package_data` argument to `setuptools.setup` (or `distutils.core.setup`, equivalently).
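A minimal sketch of a `setup.py` using `package_data` might look like the following (the package name `mypkg` and its `data/` folder are placeholders, not from the video):

```python
from setuptools import setup, find_packages

setup(
    name="mypkg",
    packages=find_packages(),
    # Map package names to glob patterns of data files to bundle,
    # with patterns relative to the package directory:
    package_data={"mypkg": ["data/*.json"]},
)
```

As the StackOverflow comment above notes, this covers `bdist` but not `sdist` on its own.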
Anthony doesn't mention that there's also `setuptools_scm`, which will additionally handle incrementing your version number.

The basic idea of `setuptools_scm` is that whichever files are under version control will also be the files put into the source distribution. This makes sense to me, and I also like avoiding the need to increment the version number manually.
There's a further discussion on this blog post, which can be found linked on StackOverflow questions like this one.

`setuptools_scm` appears to have entered the spotlight around 4 or 5 years ago, based on what I've seen on Twitter (I may be wrong). There are also quite heated debates on Twitter among Python core/contributor developers, and reading them I'm inclined to agree with Paul Ganssle's conclusion that "the software is fine". (PEP 517 refers to the build tools requiring internet access, among other details which I haven't fully followed.)
- Pretty wild example of a `setup.py` using `setuptools_scm`: https://github.com/yeatmanlab/pyAFQ/blob/master/setup.py#L62 (again via Twitter)
As can be seen in the README of `setuptools_scm`, there was a recent feature added which allows you to set the important parameters in `pyproject.toml` (a config file you often see in Python packages). As of November 2019 (setuptools version 42), you can trigger version inference (automated version numbering) by just adding a section to the `pyproject.toml` file:

```toml
[tool.setuptools_scm]
```

(I don't think you even need to put anything inside it.)
Additionally, I followed the instructions to add the build requirements, as when run with just the above, there was still no inclusion of the desired files (in fact it turned out I didn't have `setuptools_scm` installed, so it was just silently skipped when reading `pyproject.toml`).
The full `pyproject.toml` then looks like:

```toml
[build-system]
requires = ["setuptools>=42", "wheel", "setuptools_scm[toml]>=3.4"]

[tool.setuptools_scm]
```
(Note that the "legacy" script includes a `setup_requires` argument to `setuptools.setup`, which is mentioned in the Twitter thread above as deprecated/bad practice.)
...however, when this is uploaded to PyPI it is rejected!

```
HTTPError: 400 Bad Request from https://upload.pypi.org/legacy/
'0.1.dev47+gf1906f5.d20201122' is an invalid value for Version. Error: Can't use PEP 440 local
versions. See https://packaging.python.org/specifications/core-metadata for more information.
```

(Note that this was visible from the name of the tar file in `dist/`, I just didn't think there'd be a problem so ignored its weird filename.)
The advice in the top search result here is:

```python
def local_scheme(version):
    return ""

setup(
    use_scm_version={"local_scheme": local_scheme},
)
```
However, as mentioned above, we're avoiding setting this parameter in `setup.py`, and instead supplying it via `pyproject.toml`. That means that in the section `[tool.setuptools_scm]` we need to add something equivalent, like:

```toml
[tool.setuptools_scm]
local_scheme = ""
```
...but as could've been guessed from looking at this carefully, what `local_scheme` actually needs to be is a callable returning a blank string, not just a blank string. This could be achieved with a `lambda` function if the TOML will allow that...

```toml
[tool.setuptools_scm]
local_scheme = lambda version: ""
```
...but the TOML format isn't actually Python: it's just meant to be used for strings, integers, etc. To my understanding, this means you cannot use `pyproject.toml` to achieve the desired result here (besides which, `version_scheme` also needs to be set with a less trivial function, as shown below).
In conclusion (for now), the relevant parts of `setup.py` should be:

```python
from setuptools import setup, find_packages

def local_scheme(version):
    return ""

def version_scheme(version):
    v = ".".join([version.tag.base_version, str(version.distance)])
    return v

setup(
    ...
    use_scm_version={
        "version_scheme": version_scheme,
        "local_scheme": local_scheme,
    },
    setup_requires=["setuptools_scm"],
    ...
)
```
This will mean that every git commit you make increments the version (via the commit distance): unless you ensure that every commit happens alongside a `twine` upload, there will be gaps in the version numbering history.
To release a version now requires tagging a git release, and because of the way this was set up you need to make sure to tag just a major and a minor (as the micro is appended by `setuptools_scm`), e.g.:

```shell
git tag -a v1.0 -m "Releasing version v1.0"
```

- `-a` stands for annotate, i.e. the label of the tagged release
- `-m` stands for message, i.e. the description accompanying it (like a commit message)
Now when we run `python3 setup.py --version` we get `1.0.0` (the major and minor from the tag, and the micro from `version.distance`, i.e. it is the 0th commit since the release).
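That arithmetic can be sanity-checked without a git checkout by exercising the `version_scheme` function above with a stand-in for the `ScmVersion` object that `setuptools_scm` would normally pass in (the `SimpleNamespace` objects here are just mocks):

```python
from types import SimpleNamespace

def version_scheme(version):
    # Tagged base version, plus the number of commits since that tag as the micro part
    return ".".join([version.tag.base_version, str(version.distance)])

# Mock of setuptools_scm's ScmVersion: tagged v1.0, 0 commits since tagging
at_tag = SimpleNamespace(tag=SimpleNamespace(base_version="1.0"), distance=0)
print(version_scheme(at_tag))  # 1.0.0

# Three commits later, the micro part advances without retagging
later = SimpleNamespace(tag=SimpleNamespace(base_version="1.0"), distance=3)
print(version_scheme(later))  # 1.0.3
```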
As I understand it then, you must build the package before you commit, as otherwise the commit distance (and thus the version) will increment. So first build and upload, then git commit, then push the tags.
Warning: it is not recommended to use `git push origin --tags`; instead you should provide the name of the tag, so in this case:

```shell
git push origin v1.0
```

Note that this does not involve a commit, it only sends the tag to GitHub (where it then shows up under "releases"). You don't need to do anything further, the tag will serve as the release.
Unfortunately, as was noted at the start in the comment on StackOverflow, this solution will only work for `sdist` and not for `bdist`. This is not seen when running `setup.py`, but is seen when running `pip install` (retrieving the binary wheel made as `bdist`).
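One way to see this for yourself is to list the wheel's contents, since a wheel is just a zip archive. The sketch below builds a tiny in-memory stand-in for a wheel (the filenames are made up), but the same `namelist()` call works on a real `dist/*.whl`:

```python
import io
import zipfile

# Stand-in for a built wheel: a zip archive with a module and a data file
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as whl:
    whl.writestr("mypkg/__init__.py", "")
    whl.writestr("mypkg/data/model.json", "{}")

# On a real build you'd open e.g. zipfile.ZipFile("dist/mypkg-1.0.0-py3-none-any.whl")
with zipfile.ZipFile(buf) as whl:
    names = whl.namelist()

print(names)  # the data file only appears here if the bdist actually bundled it
```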
The solution here is mentioned in the same Q&A (here): also add `include_package_data=True` in the arguments to `setup()`, which (like `setuptools_scm`) "aims to include files from version control". And with that, `pip` now installs from PyPI with the data files I was wanting, without needing to specify them in a `MANIFEST.in`.
In conclusion, the `setup.py` call to `setuptools.setup` should include all of the following arguments (plus the standard ones):

```python
setup(
    ...
    include_package_data=True,
    use_scm_version={
        "version_scheme": version_scheme,
        "local_scheme": local_scheme,
    },
    setup_requires=["setuptools_scm"],
    install_requires=reqs,
    ...
)
```
...and with that, `pip install` will give a working package, complete with all version-controlled data files, without needing to manually include them in a `MANIFEST.in`.
- A nice example of using `MANIFEST.in` can be found in Oxford Nanopore's `bonito` here
  - I think one purpose is to avoid shipping the demo notebooks, which aren't included in the manifest, but are in the repository.
  - This library also has nice examples of a package that specifies commands to download data (in this case the most recent version of a statistical model)
- See also spaCy's implementation, which gets quite complicated as it includes extension modules, but finishes with a `setup` call. Here there is also a `MANIFEST.in` (as well as a `setup.cfg` and a `pyproject.toml`)
Note that for projects which should be handled more strictly, it would be possible to remove the prerelease (`dev`) micro version processing function and solely base the version on the tagged version number.

- This would mean that a new tag would be required for version incrementation
  - Since the commit distance would no longer be incrementing the micro part, the version would not advance with each git commit
  - Instead you would need to explicitly run `git tag` to increment, which is more precise
- Also, it would not leave gaps between version numbers like the commit-distance-based method
  - Each tagged release would be more meaningful if it contained multiple commits
- In this case the project is entirely for dev purposes, and `pip install` requires `--pre` to install pre-release versions, hence stripping them out of the micro part
- To be clear, this would mean that a new version could not be uploaded to PyPI without performing `git tag`, which is not that much different to having to manually write the version number into the `setup.py` file each time, which was pretty much why I decided to use auto-versioning in the first place...