Proposal - BrentBaccala/Singular GitHub Wiki

Proposal by Claus

O.: GENERAL TODO: more documentation for newbies and make it easily available for them? Collect FAQ?

The goal

the structure of singular (file system and logic) needs to be easily transparent and extensible. In particular for a (new) user, it should be "obvious" where things are

O.: mostly done

the development cycle (edit-compile-run) should be as fast as possible as our time is expensive.

O.: done

it needs to be easy to produce a release (where easy might still mean a different configuration)

O.: done

a wish/observation: the build system & navigation should be as simple as possible. The more "tools" are used, the steeper the learning curve (a trivial example: grep vs git grep). Currently, due to lack of documentation, my students nearly despair over getting new files into the system. Result: unnecessarily long single files, just to avoid the build system.

O.: our build system is as simple as possible considering the requirements posed on it. Reading the wiki or asking the kernel team members may be helpful for newbies.

the (re)structuring needs to be "cheap", i.e. we cannot spend too much time on this. Our main goal is mathematics, not software engineering.

O.: any restructuring is expensive. The restructuring of Singular has mostly been done already (save for /Singular/) and starting it anew will cost a lot.

Considerations:

what is the market? i.e. how many users need the ability to link/run parts of singular? Is it worth our time to support that? Should it be possible to build w/o factory? Big external dependencies (python, gfan, ...) are different, here there is a point, although I am not necessarily convinced that dynamic modules are worth the pain. If adding stuff to the interpreter were easier, most of the internal modules might vanish.... However, we have them, so we should keep the option.

H.: Beside the main Singular, parts are used by:

SAGE/GAP: libsingular.so

Macaulay2: factory

So we need at least the possibility to provide Singular as a shared library and have factory as a (mostly) separate part of Singular.

C.: Fine - but this is going to be tricky if factory starts using Singular for number field arithmetic - which should happen as soon as we have the number fields up properly. They will be much faster...

O.: IMHO in order to do that, factory should be generically extended to be able to deal with arbitrary numbers, which may require rewriting quite a big part of factory... when will it pay off?

H.: On the other side, dynamic modules: syzextra should be integrated in the main part, gfanlib can be integrated, but the interface to python should not be part of the main Singular (or do we really want a complete python interpreter and library there?),

C.: We don't

H.: and polymake cannot be part of the main Singular (because of its internal design). And if we have the possibility of dynamic modules

C.: Fine.

H.: anyway, why not use it also for development etc.?

C.: I'd like for development to be as close to the final thing as possible, otherwise, the interpreter interface won't be tested (as it's not used). The experience in KaSH/Magma is that only stuff that is put into the proper place with external bindings (interpreter) is actually tested properly.

I've used system, but would argue for an easier way to do "proper" functions, developed as part of the project and there to stay.

O.: AFAIR "dynamic modules" were introduced in order to simplify development and ease the maintenance. The plan was to turn most parts of Singular into separate "dynamic modules". Moreover, "dynamic modules" can currently be built into Singular if necessary. Also, one tests them by loading them into a (once) compiled Singular without rebuilding anything else. Thus there is almost no difference to kernel functions from the user's point of view. After some initial environment setup, developing "dynamic modules" is much easier than hacking the Singular internals directly (such hacks will also be difficult to get into the official Singular, in contrast to a separate, independent and self-contained unit like a "dynamic module").

a re-structuring in the interpreter should make it easier to interact with it. Currently, while not difficult, it is non-trivial to add new functions, hence there are not nearly enough of them. The easier the process here, the more complete the interface is going to be.

O.: the interpreter is slow and not to be changed much. But it can nowadays be extended via "dynamic modules"! AFAIR one of the former goals was to make it possible to use Singular features from something more robust and familiar to end users (e.g. GAP, Sage, Python, M2 etc.)

testing: from my point of view, the product is the Singular executable, not the library, hence I'd do (only) script-testing. Reason:

* target is the interpreter, not the library

O.: wrong: consider the Sage and GAP direct interface!

* for the interpreter, we have well documented testing procedures

* non-trivial data-structures (interesting rings, interesting crings, interesting ideals, ...) are a pain to construct in C, hence the tests in C will mostly cover easy cases.

O.: we are just not there yet :( IMHO it is pretty easy to write ("slow" but trivial to construct) C++ wrappers for numbers/polynomials/ideals/modules (mostly for unit-tests).
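A minimal sketch of what such a "slow but trivial to construct" wrapper could look like for unit tests. The class `TestPoly` and all of its members are hypothetical and do not use or mirror any Singular type; it only illustrates that a value type with `+`, `*` and `==` makes test inputs cheap to write down.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical test-only polynomial (NOT a Singular type):
// a map from exponent vector to integer coefficient.
// Assumption: all operands use the same number of variables.
class TestPoly {
public:
    using Monomial = std::vector<int>;

    // Constant polynomial c in nvars variables.
    static TestPoly constant(long c, std::size_t nvars) {
        TestPoly p;
        if (c != 0) p.terms_[Monomial(nvars, 0)] = c;
        return p;
    }

    // The variable x_i (0-based) in nvars variables.
    static TestPoly var(std::size_t i, std::size_t nvars) {
        assert(i < nvars);
        Monomial m(nvars, 0);
        m[i] = 1;
        TestPoly p;
        p.terms_[m] = 1;
        return p;
    }

    friend TestPoly operator+(TestPoly a, const TestPoly& b) {
        for (const auto& t : b.terms_) a.add_term(t.first, t.second);
        return a;
    }

    friend TestPoly operator*(const TestPoly& a, const TestPoly& b) {
        TestPoly r;
        for (const auto& s : a.terms_) {
            for (const auto& t : b.terms_) {
                Monomial m(s.first);
                for (std::size_t k = 0; k < m.size(); ++k) m[k] += t.first[k];
                r.add_term(m, s.second * t.second);
            }
        }
        return r;
    }

    friend bool operator==(const TestPoly& a, const TestPoly& b) {
        return a.terms_ == b.terms_;
    }

    std::size_t term_count() const { return terms_.size(); }

private:
    // Add c * m and drop the term if it cancels, so that equal
    // polynomials always have equal maps.
    void add_term(const Monomial& m, long c) {
        long& slot = terms_[m];
        slot += c;
        if (slot == 0) terms_.erase(m);
    }

    std::map<Monomial, long> terms_;
};
```

With such a type a test can state an identity directly, e.g. that `(x + y) * (x + y)` equals `x*x + 2*x*y + y*y`, without constructing rings or ideals by hand.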

* it raises the bar for the new developer. I see the "typical" scenario as follows:

1. a student writes a new algorithm for Singular
1. for easy testing and development, the algorithm is made available through the interpreter

as a benefit, at the end of the project, the student may or may not "finalize" it: write documentation and tests.

H.: Documentation is a strong requirement - I would not accept any (nontrivial) code without, it would become useless in some weeks.

Writing tests in c at this point will make it not happen.

O.: our internal unit-tests make sure that all the binary interfaces behave as expected and that there are no hidden dependencies between code-units. Unit-tests are not required in "dynamic modules", which are (IMHO) the only form of contribution by external developers that has anything to do with Singular internals.

I'd rather put many more functions in the interpreter (maybe reserve a namespace for "internal" or "testing" only, to keep users out?) and then test this way.

H.: It is there: see Singular/extra.cc

jjSYSTEM: the official part

jjEXTENDED_SYSTEM: the internal functions

C.: I'd call these "hacks" rather than functions. The reason to make them into proper functions is to also test the user interface: frequently I discovered when trying to use my (brilliant) new function that the interface is orthogonal to most needs, so I'd rather have the interfaces made easier instead of using system (which was very handy).

Obvious exception: python, gap and similar bindings.

* modularization is important - but not as a goal in its own right. We need clear, well defined, mathematical interfaces for the modules. In the long term, the only thing that can guarantee the consistency of Singular is the mathematics. That implies that "hacks" that make sense somewhere to "speed things up" are very suspicious. Almost always, there is a way to get the same result with proper mathematical interfaces.

* interdependencies: I don't think reducing them is particularly important. It is important to know about them and document them, but in the long run, all non-trivial algorithms in any area are going to use a vast amount of the system, and thus will depend on "everything".

O.: We do NOT want to go back to legacy Singular 3, where most of its parts depended on most other parts, even on a priori unrelated ones, right?! Among our important goals was to define concise code-units as well as precise interfaces between them. That we did (partially) via modularization and dependency resolution.

Exception, possibly, integers, FF_p small p, memory management.

Although, I claim, even here the dependency is there: certain algorithms only make sense if the underlying integers support some runtime behavior. E.g. for a product Prod a_i of integers a_i: if (and only if) integer multiplication is sub-quadratic, then Prod a_i = (Prod a_(2i)) * (Prod a_(2i+1)) offers a recursive algorithm that is much faster in practice. If the integers are classical, this is only unnecessarily complicated. So, top-level algorithms are optimized for behavior that is (usually) NOT defined in the interface, thus changing the underlying integers will have interesting results...
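The recursive product mentioned above can be sketched as follows. This is an illustration only, not code from Singular: the function names are made up, and the sketch splits the sequence into a left and a right half rather than into even and odd indices, which gives the same result for a commutative product. With machine integers both variants cost the same; the balanced version only pays off for a big-integer type whose multiplication is sub-quadratic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive left-to-right product: one long accumulator multiplied by one
// small factor at a time.
template <typename Int>
Int product_linear(const std::vector<Int>& a) {
    Int result(1);
    for (const Int& x : a) result = result * x;
    return result;
}

// Balanced product over the half-open index range [lo, hi): both
// recursive calls return operands of similar size, which is the case
// where fast multiplication helps.
template <typename Int>
Int product_tree(const std::vector<Int>& a, std::size_t lo, std::size_t hi) {
    assert(lo <= hi && hi <= a.size());
    if (hi == lo) return Int(1);      // empty product
    if (hi - lo == 1) return a[lo];   // single factor
    const std::size_t mid = lo + (hi - lo) / 2;
    return product_tree(a, lo, mid) * product_tree(a, mid, hi);
}

// Convenience overload for the whole vector.
template <typename Int>
Int product_tree(const std::vector<Int>& a) {
    return product_tree(a, 0, a.size());
}
```

Both functions compute the same value; which one is faster depends entirely on the integer type substituted for `Int`, which is exactly the kind of behavior that the interface does not state.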

O.: any such internal knowledge should be expressed in form of a precise interface if it may be necessary in some use cases (e.g. in some top-level algorithm) and not be simply assumed. Otherwise maintenance will be a hell of a job :/

The same argument is easily made for the memory management: singular is optimized for omalloc. Unless a new manager has the same or similar runtime behavior, exchange is difficult.

O.: not true: omalloc is only responsible for Singular's performance in dealing with vast numbers of small memory blocks of the same size (Singular terms), and it can be replaced by xalloc if necessary.

This scales up to all levels. Whenever a crossover point for different methods is used, it will depend on the current state. Changing anything fundamental will imply a check of everything above. Possible exception: the rare case that something is always faster (if this exists at all).

It used to be very important in larger projects to reduce dependencies and the like, as the linkers were slow, but that is no longer a problem... Also executable size is mostly irrelevant.

H.: Indirectly, executable size matters: with most modern CPUs the role of the cache becomes more and more important, and caches are much smaller than the available memory. And with a larger executable the probability of cache misses increases. Example: Intel i7:

* L1 Instruction cache: 32KB
* L1 Data cache: 32KB
* L2 cache: 256KB
* L3 cache: 8 MB
* TLB: 4 MB

C.: Well, I'd like to see this in reality: what counts is that the portion of the code currently run is in the cache; the rest of the executable is in general not even in memory at all. So my claim is that a tight loop in a huge executable runs at the same speed, with the same cache misses, as the same loop in a small executable.

C.: But this is (for me) essentially about removing complexity, here the complexity of allowing functions to be switched off. This results in a large number of ifdefs and similar (which are probably not tested in all combinations)

O.: the "dynamic modules" may be dynamically loaded OR may be built into the final executable upon user command (via configure arguments) for almost no extra development/maintenance cost/complexity.
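One way to picture why the two variants can share almost all code: the module exposes a single init function that registers its procedures, and only the way that function is found differs. The sketch below is self-contained and purely illustrative; `Registry`, `demo_module_init` and `load_module` are hypothetical names and are not Singular's actual module interface.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the interpreter's table of callable procedures.
using Proc = std::function<int(int)>;

class Registry {
public:
    void add(const std::string& name, Proc f) { procs_[name] = std::move(f); }
    bool has(const std::string& name) const { return procs_.count(name) != 0; }
    int call(const std::string& name, int arg) const {
        return procs_.at(name)(arg);
    }

private:
    std::map<std::string, Proc> procs_;
};

// The module's only entry point: it registers everything it provides.
// A dynamically loaded module would typically export this with C linkage.
void demo_module_init(Registry* r) {
    r->add("demo_double", [](int x) { return 2 * x; });
}

using ModuleInit = void (*)(Registry*);

// Built-in variant: the init function is linked into the executable and
// passed here at startup. Dynamic variant (not shown): the same pointer
// would be looked up at run time in a shared library.
void load_module(Registry& r, ModuleInit init) {
    assert(init != nullptr);
    init(&r);
}
```

Under this assumption the module source is identical in both configurations; only the few lines that obtain the `ModuleInit` pointer depend on the configure choice.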

Proposal, 1st round:

a basically flat file system structure, modules are mostly defined by the directories. Admittedly, some of the modules don't exist at all so far... The list is NOT complete and there is NO intended order. There is NO judging of merit/ importance or anything else implied.

Singular/
    Sources/
        interpreter
        coeffs
        Z, Q, Z/nZ, GF(p)
        mult poly
        uni poly
        fin ext
        real
        complex
        doubles
        uni function field
        mult function field
        lin_alg
        cring/ ring
        ideals (in mult. poly ring)
        modules (over mult poly ring)
        modules (over Euclidean (c)rings/ fields)
        Groebner
        groebner walk
        fglm
        maps
        combinatoric
        factory
        numerical lin alg
        roots in C
        number fields
        orders in number fields
        IO
        OSWrapper
        ADTs (lists, hashes, sets, ...)
        misc (at least sorting of lists, ...)
    DynModules/
        blah/
            src/
            doc/
            tst/
    Omalloc
    Xmalloc
    NTL
    Tests/
        ideally a test file/ directory per module
    Documentation/
        also, ideally a file/ directory per module

H.: If the documentation is in a different place than the corresponding sources, they tend to diverge over time - ideally they should be in the same file.

C.: Indeed - however, then the files get really, really large (especially with lots of examples)

H.: NTL should be external (like Flint and GMP)

C.: While I agree, I still have not managed to compile with NTL... thus my suggestion to include a version of NTL that is known to work in this path...

Each of the Sources subdirectories should produce (in dev mode) an individual shared library - with the exception of interpreter which should contain the executable. Thus, in development, a make in a single small directory is all that is required to test the new stuff. For release, one could create a libSingular as before and produce a statically linked executable.

I personally would also like an include directory in Singular that contains all the public interfaces of the modules. Only reason: it is easier to just #include "blah.h" than to code #include "poly/blah.h", in particular since blah.h might migrate to Singular/blah/blah.h if this turns out to be a rather sophisticated module (or due to refactoring). Internal includes should definitely reside with their modules.

O.: can NOT be done as we currently have headers with similar or equal names in different directories. Moreover IMHO specifying precise location details makes it easier for non-authors to read/understand/mention such code, which IMHO pays off for a bit of extra coding. Also there are tools for easy mass-correction in case of moving/renaming much-included headers.

But then, as this is mainly laziness, I do not have any strong feelings there. One way or the other, the difference between internal headers and external interfaces needs to be clear.

General: I'd like to see way more static functions to break down long algorithms. Coding and documentation should be in English if possible. Documentation needs to contain information about the algorithms, the requirements and the results, all stated in mathematical terms. (As an example: it took me quite some time and lots of questions to figure out the difference between Div, IntDiv and ExactDiv and I'm still not sure). If there are "easy" rules about naming conventions for files/directories, they should be spelled out in some easy-to-find readme. Again, my preference is for long names and no abbreviations unless necessary, as my abbreviations will not be anyone else's. Using tab-completion, I hardly ever type them anyway.

We NEED documentation about the build system. It needs to be easy, even for relative beginners, to add a file/ directory to the system. Maybe even using a script: newdir, newfile. Otherwise people are strongly encouraged to add to existing files/ dirs just to avoid the creation of new stuff.

O.: Ok, but personally I strongly recommend adding new features via dynamic modules instead of kernel hacking. A build system for dynamic modules may be copied over from some existing one (e.g. syzextra) and adapted to your case in a matter of minutes. Also this way we don't risk breaking all of Singular...

The main moving of files would need to be done in a very small time frame as the required pull operations are going to be painful (the patch will probably not merge cleanly). Adding new directories later should be non-critical and patch friendly.

O.: yeah, we have such experience - it is painful.

It is not clear (to me) from the documentation (on Singular/ github) how to actually produce patches for singular or how to change singular in the main repository, and I don't know how to change this. My background is for everyone to commit against one central repository (the svn way), which is not the git way. The git way has many advantages, but, as my students complain regularly, it comes with a steep learning curve.

O.: We prefer to merge tested pull-requests from personal forks as one has to create git commits anyway. I suggest that Mohamed gives a git-intro-talk to new contributors.