Test automation

Our testing system is suffering from a mild case of feature creep, with everybody working on their own pet project and no consensus about where all of this should ultimately be going.

So this page is intended to collect final goals. Once those goals are implemented well enough to be visible in the test code, this page can go away.

This page is currently in "brainstorming mode": Add new ideas, but do not judge, modify, or delete ideas. Every idea should get its own paragraph and be signed off with the author's name so we can see who has an interest in what areas, can ask for clarification, etc. (Evaluation and consolidation will come in later phases.)

Goals of testing

All features should be tested. (asmeurer)

Any bug fix should be accompanied with a corresponding regression test. (asmeurer)
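
A common convention in SymPy is a regression test named after the issue it guards against. A minimal sketch, with a purely hypothetical issue number:

```python
from sympy import Symbol, limit, oo

def test_issue_1234():
    # Hypothetical issue number, purely illustrative: the assertion
    # pins down behavior that some past bug is imagined to have broken.
    x = Symbol('x')
    assert limit(1/x, x, oo) == 0
```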

Examples should be tested, to make sure they are up-to-date. However, one should differentiate between tests, which are intended to make sure the system works, and documentation, which is intended to demonstrate behavior to users. Doctests should be thought of as documentation that happens to be tested. (asmeurer)
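
As an illustration of the distinction, a doctest lives in a docstring and reads as documentation first. A sketch, with a hypothetical function, following SymPy's docstring conventions:

```python
def fibonacci(n):
    """Return the n-th Fibonacci number.

    Examples
    ========

    >>> fibonacci(0)
    0
    >>> fibonacci(10)
    55
    """
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```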

The default run of the tests should finish in a reasonable amount of time. (asmeurer)

But stress testing is also important, so there should be a (non-default) option to run slower tests, which will take longer. (asmeurer)
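
SymPy's test utilities already provide a decorator for this. A sketch, assuming the slow decorator from sympy.utilities.pytest (the import location may differ between versions):

```python
from sympy import symbols, expand
from sympy.utilities.pytest import slow

x, y = symbols('x y')

@slow
def test_large_expansion():
    # Marked slow: skipped in the default run, executed only when
    # slow tests are explicitly requested.
    assert expand((x + y)**100) != 0
```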

In most situations, we only need to run the tests in one or two environments, unless we specifically expect a failure in one. Only when we release do we need to make sure that everything works everywhere. (asmeurer)

master should be tested all the time on all supported configurations (that's what "continuous integration" means, doesn't it?). Otherwise, many configurations end up broken, and releasing becomes such a chore that we never do it. (rlamy)

Desired properties of changes to the testing infrastructure

Each change should have concrete advantages. (toolforger)

No feature should require long-term commitment in terms of manpower. Rationale: SymPy is about doing symbolic math; our primary expertise is math, not testing infrastructure; we don't want to tie up manpower resources with infrastructure maintenance, the infrastructure should "just work" as far as possible. (toolforger)

Each change should have a clear implementation path. (toolforger)

Environmental variations that tests could/should run under

Python version: all versions that SymPy advertises as supported, i.e. 2.5 to 3.2, excluding 3.0. (toolforger)

Ground types: Python, gmpy. (toolforger)
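
SymPy selects ground types at import time via the SYMPY_GROUND_TYPES environment variable, so a harness has to set it before spawning a fresh interpreter. A sketch:

```python
import os
import subprocess
import sys

# Run the full suite once per ground type; the variable must be set
# before the child process imports sympy.  sympy.test() returns True
# if all tests pass, so map that onto the exit status.
for ground_types in ('python', 'gmpy'):
    env = dict(os.environ, SYMPY_GROUND_TYPES=ground_types)
    subprocess.check_call(
        [sys.executable, '-c',
         'import sys, sympy; sys.exit(0 if sympy.test() else 1)'],
        env=env)
```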

Operating systems: Windows, Linux, Mac OS X (?). This is mostly to avoid imposing OS constraints on contributors, though plotting may need to be tested across operating systems. (toolforger)

Also, sometimes tests fail only on Windows, for example, due to subtle differences in Python. (asmeurer)

Processor (Intel, AMD, maybe more). This can be relevant for tests that involve floating-point calculations: the least significant bits can vary, because the IEEE 754 standard isn't as completely specified as one might think, and if you offload floats to the GPU you don't even get IEEE guarantees. (toolforger)

I've never seen processor-specific test failures, but I suppose it's possible. (asmeurer)

Word size (32 vs. 64 bit). Python's hash values differ between 32-bit and 64-bit builds, so dictionary and set entries land in different slots. Tests that rely, directly or indirectly, on the order in which entries come out of a dict or set will therefore see different results on 32-bit and 64-bit systems, causing spurious failures. (toolforger)
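
One way to keep such tests deterministic is to sort unordered results with a canonical key before comparing. A sketch using default_sort_key (assuming it is importable from the top-level sympy namespace):

```python
from sympy import symbols, default_sort_key

def test_free_symbols_is_order_independent():
    # free_symbols is a set; its iteration order depends on hash
    # values, which differ between 32-bit and 64-bit builds.
    # Sorting with a canonical key sidesteps that.
    x, y, z = symbols('x y z')
    expr = x*y + z
    assert sorted(expr.free_symbols, key=default_sort_key) == [x, y, z]
```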

We need to test all of SymPy's optional dependencies. See Dependencies. Testing every combination of dependencies is probably less important (unless we have the manpower to do it); it should be good enough to test pure Python (no dependencies installed) and with all dependencies installed. (asmeurer)
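
SymPy already has a helper for importing optional dependencies gracefully. A sketch, assuming import_module from sympy.external and skip from sympy.utilities.pytest:

```python
from sympy.external import import_module
from sympy.utilities.pytest import skip

numpy = import_module('numpy')  # None if numpy is not installed

def test_lambdify_with_numpy():
    if numpy is None:
        skip("numpy is not installed")
    from sympy import symbols, lambdify
    x = symbols('x')
    f = lambdify(x, x**2, 'numpy')
    assert f(numpy.array([1, 2, 3])).tolist() == [1, 4, 9]
```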

Run the tests with the cache off. Note that this takes significantly longer than running the tests normally, so it would have to fall into any "slow tests" category. (asmeurer)
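
The cache is controlled by the SYMPY_USE_CACHE environment variable, which is read when sympy is imported, so this follows the same pattern as the ground-types run above. A sketch:

```python
import os
import subprocess
import sys

# Disable the cache in a fresh interpreter; exit nonzero on failure.
env = dict(os.environ, SYMPY_USE_CACHE='no')
subprocess.check_call(
    [sys.executable, '-c',
     'import sys, sympy; sys.exit(0 if sympy.test() else 1)'],
    env=env)
```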

Run the tests in alternate Python implementations, such as PyPy, Jython, and IronPython. (asmeurer)

Proposed modes of operation

Background testing for contributors while they are working on a branch: Whenever a file is modified, re-run all tests that may be affected by it. Finding out what file depends on what other files is nontrivial, so maybe this can't be done. (toolforger)
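
One conceivable (and heavily simplified) approach would be to use the standard library's modulefinder to map each test file to the modules it imports, though it would likely be too slow for a tree of SymPy's size. A sketch:

```python
import os
from modulefinder import ModuleFinder

def tests_affected_by(changed_module, test_root='sympy'):
    # Walk the tree and collect the test files whose import graph
    # contains the changed module.  Purely illustrative.
    affected = []
    for dirpath, _, files in os.walk(test_root):
        for name in files:
            if name.startswith('test_') and name.endswith('.py'):
                path = os.path.join(dirpath, name)
                finder = ModuleFinder()
                finder.run_script(path)
                if changed_module in finder.modules:
                    affected.append(path)
    return affected
```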

Background testing for contributors while they prepare to upload to a pull request: Provide a script that clones the workdir and runs the full test suite, reporting back in case of failure and pushing to the pull request on GitHub in case of success. (toolforger) Such a script already exists. See SymPy Bot. (asmeurer)

Full testing for reviewers: Provide a script that runs a full test suite on somebody else's pull request. Failures are uploaded as comments to the pull request. (toolforger) Ditto. See SymPy Bot. (asmeurer)

Full testing for project admins: Provide a script that merges a pull request into sympy/sympy, runs the full test suite, and either reports failures as comments on the pull request or commits the merge and pushes it back to sympy/sympy on GitHub. (toolforger) I don't see the difference between this and the previous point. (asmeurer) The difference is the automated push to master when the tests succeed; you can then run the whole thing overnight. (toolforger)

Pre-release testing. (toolforger) Take a look at New-Release for some additional things that need to be tested at least before a release is made. (asmeurer)

Modes of operation that do something in case of success should be restrictable to doing nothing but reporting success to the person who started the test suite. (toolforger) You mean like ./sympy-bot -n? (asmeurer) No idea, I never looked too deeply into sympy-bot. If sympy-bot has that already, all the better; we're collecting bullet points here. (toolforger)

Tests could be run locally on the developer's machine, or remotely on a testing server. The latter case may actually be the domain of some CI software that we shouldn't write ourselves; we should just make sure the test suite interoperates well with the CI-provided environment. (toolforger)

There should be a mode that runs the tests most likely to fail first. (toolforger) Interesting idea. Also note that running tests out of order can sometimes affect the outcome. (asmeurer)
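
A sketch of one way to implement this, with a hypothetical history file that persists failure counts between runs:

```python
import json

HISTORY_FILE = '.test_failure_history.json'  # hypothetical location

def load_failure_counts():
    try:
        with open(HISTORY_FILE) as f:
            return json.load(f)
    except (IOError, ValueError):
        return {}

def order_most_likely_to_fail_first(test_names):
    counts = load_failure_counts()
    # Highest failure count first; the sort is stable, so ties keep
    # their original (file) order.
    return sorted(test_names, key=lambda name: -counts.get(name, 0))
```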

There should be a mode that keeps as many CPU cores busy as possible. (toolforger)
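
A sketch of such a mode, assuming test files can be run independently through SymPy's bin/test script:

```python
import multiprocessing
import subprocess
import sys

def run_test_file(path):
    # Run one test file in its own process; return its exit status.
    return path, subprocess.call([sys.executable, 'bin/test', path])

def run_in_parallel(test_files):
    pool = multiprocessing.Pool()  # one worker per CPU core by default
    return pool.map(run_test_file, test_files)
```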

There should be a mode that allows running tests in order or reverse order of expected running time. (toolforger) Or random order. (asmeurer) That's sort of covered by splitting single tests out into separate processes, but yes, random order could be useful just for the principle of the thing. (toolforger)

There should be a mode for running an individual test where the command syntax and the result output are the same whether the test is a doctest or a regular test. (toolforger)

Task ideas; concrete changes to be considered

The test suite should have a mode in which it simply enumerates all tests. An outer layer can then schedule and prioritize individual tests according to various criteria. (toolforger)
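
A minimal sketch of such an enumeration mode, scanning for test functions textually rather than importing anything:

```python
import os

def enumerate_tests(root='sympy'):
    # Yield (file path, test function name) pairs without importing
    # or executing any test code.
    for dirpath, _, files in os.walk(root):
        for name in files:
            if name.startswith('test_') and name.endswith('.py'):
                path = os.path.join(dirpath, name)
                with open(path) as f:
                    for line in f:
                        if line.startswith('def test_'):
                            yield path, line[4:line.index('(')]
```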

If we return to upstream pytest (something that's underway right now), we should probably change sympy-bot to call into pytest instead of implementing its own test-running logic. (toolforger)