elegance - LeFreq/Singularity GitHub Wiki

Elegant code is a combination of good architecture (breaking the problem down and conceptualizing upwards into meaningful chunks) and good engineering (breaking the problem down and engineering it towards the design of your hardware).

Quantifying language expressivity (or succinctness), or "ending the language wars", derives from the work of Andrey Kolmogorov, who founded the field of Algorithmic Information Theory. The basic idea is that you can quantify the complexity of something by measuring how BIG a computer program you need to reproduce that complexity.

Let's say you want to explore the notion of quantifying the amount of elegance a programming language provides. That is, the degree to which a high-level language is able to make the complex simpler via its language constructs. Then you can use Algorithmic Information Theory to measure this succinctness of your language: how well it reduces the complexity of your code.

Note: this is separate from the elegance of your own code.

We'll define this idea of "simplification" as a factor of two things. The first is text-wise reduction: fewer characters needed to express complex concepts (a sort of reversal of Algorithmic Information Theory; in literature, the equivalent is the use of sophisticated words that say a lot in a few characters without increasing ambiguity of meaning). The second is another, less easy-to-quantify concept: maintainability.

Fleshing out this latter concept, it clearly has to do with how easily one can establish programmer consensus for the given task; i.e. how many programmers of the language would write it the same way you've expressed it, or otherwise agree on the best implementation of the problem.

I will define the Kolmogorov Quotient so that higher numbers for a given language denote a reduction in complexity (i.e. a greater amount of succinctness) when solving the problem in that language.

Once the basic premise and methodology above are agreed to, any specific implementation/architecture introduces only a rough constant of difference. That is, as long as the architecture is the same across all measurements, the number should be valid and comparable between languages.

But it could be implemented something like so: choose a machine's assembly(-ish) language and measure the number of bytes of machine-code output needed to perform a standard suite(*) of common, non-threaded programming tasks (call it: base_language_count). Then code that exact functionality in the language you want to measure (without using external libraries) and count the number of bytes of source code you used (call it: test_language_count).

The expressivity of your language, then, is base_language_count / test_language_count.
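
To make the measurement concrete, here's a minimal sketch in Python, assuming you've already assembled the byte counts for one task from the suite (the function name and the sample numbers are hypothetical):

```python
def expressivity(base_language_count: int, test_language_count: int) -> float:
    """Kolmogorov Quotient for one task: bytes of machine-code output
    divided by bytes of source text. Higher means more succinct."""
    if test_language_count <= 0:
        raise ValueError("source byte count must be positive")
    return base_language_count / test_language_count

# Hypothetical counts: 4096 bytes of machine code vs. 512 bytes of source.
print(expressivity(4096, 512))  # 8.0
```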

Since we're adding maintainability into the mix, we have to define a second factor.

The architectural factor of your code is (2 / #_of_equivalent_programs). The closer this is to 1.0, the closer the code is to its ideal architecture. That is to say, if there are only two best ways to write your code, you've achieved something in your language or your code: you've reduced it down to the simplest primitives without compromising one for the other. (See the side-talk at the end of this page to understand why the number is 2.)

Elegance combines these:

Elegance = architectural_factor * engineering_value = (2 / #_of_equivalent_programs) * (base_language_count / test_language_count)
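
Putting the two factors together, a sketch under the same assumptions as above (the equivalent-program count would come from the programmer-consensus survey described below):

```python
def elegance(n_equivalent_programs: int,
             base_language_count: int,
             test_language_count: int) -> float:
    """Elegance = architectural_factor * engineering_value.

    architectural_factor = 2 / number of agreed-best equivalent programs
    engineering_value    = machine-code bytes / source-text bytes
    """
    architectural_factor = 2 / n_equivalent_programs
    engineering_value = base_language_count / test_language_count
    return architectural_factor * engineering_value

# Hypothetical inputs: 2 agreed-best programs, 4096 machine bytes, 512 source bytes.
print(elegance(2, 4096, 512))  # 8.0 -- ideal architecture times 8x succinctness
```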

Also, base_language_count should probably be measured from machine code, as the number of bytes generated, while test_language_count is the number of source-text characters, also counted in bytes.

One should also record, for discussion, the number of "equal programs". By which I mean: the number of programs that fluent programmers of the language agree are the best and equivalent solutions to the problem. (Further: "equivalent solutions" are those whose output is the same for every input.) Languages which have a large number of equal programs say something interesting about either the programming language or the types of programmers the language attracts. What it says, I don't know.... :^)

The second ratio should always be greater than 1.0; otherwise the language isn't any better than machine code. I suppose one could game the metric by designing a language with single-letter keywords, but those shouldn't count.

(*) "standard suite of common programming tasks...": I see two main categories:

  • Data-processing suite, limited to simple text I/O (computation towards the machine)
  • GUI suite (computation towards the user)

The purpose of this idea is to end the tiring language wars about whose language is the best. By giving a quantitative metric, people can at least argue better.

Will all languages eventually converge into the same language (apart from a few syntactic flourishes)? My analysis says yes, for a given computational architecture (like VonNeumann). I have called this language Prime, but it could probably be done in assembly with parameterized macros (replicating the functions of higher-level languages), so perhaps it's better to call it by the architecture it's made for: "VonNeumann".

The perfected language, then, for any particular architecture, should be called its "prime" language. An asynchronous architecture would have a different prime language than a VonNeumann one, etc.


The metric for "text-wise reduction" should incorporate a constant factor based on the (non-identifier) symbols used in the language and source text. These factors will be the same across all languages implemented (i.e. designed) for a given architecture (e.g. VonNeumannArchitecture vs. Symbolics) and will be a measure of the significance of the symbol; i.e. the topology (homology?) of the language tokens. (These terms, alas, will also need to be fleshed out and defined.)

The concept assumes that the source language and the machine language use an equal word size of 8 bits. If not, the formula needs to be adjusted to maintain the ideal of a 1:1 comparison between machine language and source text: approximate equivalency of "chunk" size.

You could probably shave extra bytes off your keywords so as not to be penalized for making your language readable, down to 1 byte per keyword (if you have more keywords than letters of the alphabet, you might have to count 2 bytes per keyword, etc.). But then, should you be rewarded for wordy keywords? Isn't PROC an equal and unambiguous substitute for PROCEDURE?
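
As a rough sketch of that keyword normalization (the keyword set, the tokenizer, and the decision to skip whitespace are all assumptions for illustration):

```python
import re

# Hypothetical keyword set for the language under test.
KEYWORDS = {"procedure", "begin", "end", "if", "then", "else", "while", "return"}

def normalized_byte_count(source: str) -> int:
    """Count source bytes, charging each keyword 1 byte (2 bytes once the
    keyword set outgrows the alphabet), so readable keywords aren't
    penalized. Whitespace is ignored in this sketch."""
    keyword_cost = 1 if len(KEYWORDS) <= 26 else 2
    tokens = re.findall(r"[A-Za-z_]\w*|\S", source)
    return sum(keyword_cost if tok.lower() in KEYWORDS
               else len(tok.encode("utf-8"))
               for tok in tokens)

# Six tokens, all keywords or single characters: 6 bytes total.
print(normalized_byte_count("procedure f begin return 1 end"))  # 6
```
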
The formula will ultimately gravitate towards an architecture's prime language.

This article is out-of-date. See expressivity.

The elegance of a program is a factor of bytes of code (the shorter the better), completeness (does it handle corner cases?), and respect for the underlying hardware (it shouldn't be thrashing the hardware). "Readability" is another, arguable, component; yet most elite programmers will probably agree that if you can't understand it, but it passes all of the aforementioned tests, the problem of readability is yours.

Since the number of bytes of code depends on the language constructs that your programming environment provides, elegance is dependent upon and related to your language's expressivity.

Elegance can be calculated thusly: 2 / #_of_equivalent_programs. Since most systems are CPU-deficient and memory-rich, we can further refine this by counting only the most CPU-efficient of the equivalent programs...

The closer this is to 1.0, the closer code is to its maximum elegance.

One should also discuss what is meant by "equal programs". I mean the number of programs that fluent programmers of the language agree are good and equivalent solutions to the problem. Whether the output must be the same for every input is untested (there may be good solutions that don't cover every extreme case). Languages which have a large number of equal programs say something interesting about either the programming language or the types of programmers the language attracts.


Side-talk for the newbies...

There are always at least two equivalent programs for a given programming task: one that optimizes space and one that optimizes time. Hence, the numerator is 2.
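
A toy pair in Python to make this concrete (both functions are illustrative; what matters is that their output is the same for every input, so they count as "equivalent solutions" in the sense above):

```python
from functools import lru_cache

# Time-optimized: cache every previously computed value (spends memory).
@lru_cache(maxsize=None)
def fib_fast(n: int) -> int:
    return n if n < 2 else fib_fast(n - 1) + fib_fast(n - 2)

# Space-optimized: constant memory, recomputes the sequence on every call.
def fib_small(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Equivalent solutions: identical output for every input.
assert all(fib_fast(n) == fib_small(n) for n in range(30))
```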


See also: