Goals - nponeccop/HNC GitHub Wiki

HN/SPL is a joint project to develop a new kind of compiler that generates human-maintainable programs in common high-level languages from higher-level specifications.

While compilers generating high-level code have been around for a long time, they generate obfuscated nonsense that is incomprehensible to humans, let alone maintainable.

Our first goal is to reduce the vendor lock-in caused by a requirement to use a specific programming language X. Many reasons can lead to this requirement:

  • Libraries, frameworks and tools helpful in this project are only available for language X.
  • Developers for language X are widely available on the market and they charge less.
  • An existing team knows language X better.
  • The project is to maintain an existing system written in language X.
  • Management believes in certain language stereotypes (“A system must be written in C++ because it must be fast”, “A system must be written in Erlang because it must serve concurrent requests and be reliable”, “A system must be written in Java because Java is for enterprise automation”, “C is the only truly fast and not bloated language suitable for low-level systems”, “Garbage collection/laziness/purity/runtime metaprogramming is for weaklings”, etc.).
  • A license agreement demands a certain language and no overhead caused by translation layers.

To realize this goal, we are going to research whether machines can generate code similar to code written by humans: the identifiers and general structure of the input program must be preserved in the output program, functions and modules must remain short, the scope hierarchy must not be too deep, and language-specific idioms that humans would not normally use must be avoided (e.g. using a custom functor, a custom iterator and std::for_each instead of a simple for loop in C++).
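To make the for loop versus std::for_each contrast concrete, here is a minimal sketch. The function and type names (`sum_plain`, `sum_functor`, `SumAccumulator`) are hypothetical and chosen only for illustration; this is not actual compiler output.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical example: both functions compute the sum of a vector of ints.

// The style a human maintainer would typically write: a plain loop.
int sum_plain(const std::vector<int>& xs) {
    int total = 0;
    for (int x : xs) {
        total += x;
    }
    return total;
}

// The style the text argues a generator should avoid:
// a custom functor handed to std::for_each.
struct SumAccumulator {
    int total = 0;
    void operator()(int x) { total += x; }
};

int sum_functor(const std::vector<int>& xs) {
    // std::for_each returns the function object after applying it
    // to every element, so the accumulated state is read from the result.
    SumAccumulator acc = std::for_each(xs.begin(), xs.end(), SumAccumulator{});
    return acc.total;
}
```

Both functions return the same result; the difference is only in how much machinery a maintainer has to read through to see that.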

If such a generator is possible, every developer or whole team will be able to use another language without their customer or management knowing, to the benefit of themselves and the whole company, in spite of ignorant bureaucracy.

Our second goal is to please the minds of those programmers and managers who believe (mistakenly or not) in the supremacy of native C and C++ programs over the garbage-collected and JITted bloatware of these days. We are going to develop a C substitute – a language that follows all the basic principles of C language design, including its memory and execution models, and integrates with existing C and C++ code so tightly that it will be possible to use our new language in all areas of current C dominance, including games, operating system kernels and drivers, databases, web servers, application servers and other middleware, browsers, IM clients, archivers/codecs, data recovery software, office productivity tools, media players, language runtime libraries, high performance computing and even embedded firmware with very tight footprint requirements.

If we succeed, all the existing billions of lines of C code will benefit: it will be possible to incrementally, function by function, rewrite the critical parts of a codebase to get the benefits of the new language without losing any of the psychologically pleasant features and facilities of C. Even if some members of your team worship C so passionately that they believe C cannot be substituted no matter what, you will still be able to pretend that you do all your work in raw C without any of these bloated modern-day productivity tools. To reach this goal, we are trying to add several modern language features to our language without losing manual resource management, language simplicity, low-level memory access and the other benefits of C that its worshippers believe in:

  • Modern syntax in the spirit of JavaScript, Haskell and APL
  • ML-style type inference, first-class polymorphism, (almost) first-class higher-order functions without any abstraction penalty – the output code is completely monomorphic and first-order.
  • No way to write type declarations – the source code looks untyped
  • RAII for resource management
  • Controlled side effects to ease optimization by the compiler and refactoring by programmers
  • Memory access safety and strong type safety
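As an illustration of what "completely monomorphic and first-order" output could mean, the sketch below contrasts a generic higher-order function with a specialized equivalent for one concrete use. It is an assumption-laden illustration written by hand in C++: the names `map_generic` and `double_all` are hypothetical, and the actual output of the compiler may look different.

```cpp
#include <vector>

// Generic, higher-order formulation: polymorphic in the element types A and B
// and parameterized by a function object F. This stands in for what a
// programmer might express at the source level.
template <typename B, typename A, typename F>
std::vector<B> map_generic(F f, const std::vector<A>& xs) {
    std::vector<B> result;
    result.reserve(xs.size());
    for (const A& x : xs) {
        result.push_back(f(x));
    }
    return result;
}

// Hypothetical specialized form for one use site ("double every int"):
// no templates (monomorphic) and no function-valued parameters (first-order).
// The function argument has been substituted into the loop body.
std::vector<int> double_all(const std::vector<int>& xs) {
    std::vector<int> result;
    result.reserve(xs.size());
    for (int x : xs) {
        result.push_back(x * 2);
    }
    return result;
}
```

The specialized form is the kind of code a C or C++ programmer would write by hand, which is why it carries no abstraction penalty: there is nothing left for the optimizer to see through.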

So many billions of dollars have been invested in C code generators that we believe the only way to compete with C in the areas where it is traditionally strong is to reuse its compilers, which are available for all imaginable platforms. Because we generate not the spaghetti produced by other compilers that translate into C, but the idiomatic code that C compilers expect from people, we expect to get the full benefit of the extremely mature optimizers in modern C compilers.

Our third goal is to attack the language conservatism of mainstream development teams by reducing the risks of using immature compilers, such as our own, in real-world commercial projects. We believe that to be usable in a commercial project, a non-mainstream compiler must be either mature or tiny. Our current compiler is tiny – below 5 KLOC – and is expected to remain under this limit in the future. To reduce the risks we offer:

  • Output code that can be maintained by humans if a fatal bug in the compiler is discovered at some point during the project.
  • A state-of-the-art compiler implementation that relies on mature and trustworthy codebases for most of its tasks: GHC, UUAGC, HOOPL, Parsec.
  • The remaining untrusted, immature and potentially buggy code is below 5 KLOC of UUAG-annotated Haskell, so its maintenance and fixes can easily be included in your project schedule.

The idea is that if an immature compiler turns out to be broken, you can either fix it yourself or continue developing in a mature language from the point where the problem was discovered. Usually, experimental compilers are too large for fixing the toolset during the main development schedule to be economically feasible.

Our fourth goal is to design a tiny and highly modular research vehicle for language and compiler design.