kolmogorov quotient - LeFreq/Singularity GitHub Wiki

XXX This text is out of date -- it needs to separate language theory from code elegance.

Quantifying language expressivity (or succinctness), or "Ending the Language Wars", derives from the work of Andrei Kolmogorov, who invented the field of Algorithmic Information Theory. The basic idea is that you can quantify the complexity of something by measuring how BIG a computer program you need to reproduce that complexity.
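A toy illustration in Python: a string of a million identical characters is long but not complex, because a one-line program reproduces it.

```python
# The string below has a million characters, but its Kolmogorov complexity
# is bounded by the size of this tiny program that reproduces it.
s = "a" * 1_000_000
print(len(s))  # 1000000
```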

Let's say you want to explore the notion of quantifying the amount of expressivity a programming language provides. That is, how much a high-level language can make the complex simpler via its language constructs. You can then use Algorithmic Information Theory to measure this expressivity.

Note: this is separate from the elegance of your own code. See the page on elegance for that.

We'll define this idea of "simplification" as a factor of two things: text-wise reduction (fewer characters needed to express complex concepts, a sort of reversal of Algorithmic Information Theory; in literature, the equivalent is the use of sophisticated words that say a lot in a few characters without increasing ambiguity of meaning) and another, less easy-to-quantify concept of maintainability (note: change to architecture).

Fleshing out this latter concept: it clearly has to do with how easily one can establish programmer consensus for the given task; i.e. how many programmers of the language would put it back the same way you've expressed it, or would otherwise agree on the best implementation of the problem.

I will define the Kolmogorov Quotient so that higher numbers for a given language denote a reduction in the complexity (i.e. greater succinctness) of solving the problem in that language.

Once the basic premise and methodology above are agreed to, it is only a matter of a rough constant of difference for any specific implementation/architecture. That is, as long as the architecture is the same across all measurements, the number should be valid and comparable between languages.

It could be implemented something like so: choose a machine's Assembly(ish) language and measure the number of bytes of machine-code output needed to perform a standard suite(*) of common, non-threaded programming tasks (call it: base_language_count). Then code that exact functionality in the language you want to measure (without using external libraries) and count the number of bytes of source code you used (call it: test_language_count).

The expressivity of your language, then, is base_language_count / test_language_count.
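A minimal sketch of this measurement in Python, assuming the reference suite has already been compiled to machine code and rewritten in the language under test (the file names are hypothetical):

```python
# Hypothetical inputs: "suite_reference.bin" is the machine-code output of
# the reference suite; "suite.py" is the same suite written in the language
# under test (Python here, as an example).
import os

def byte_count(path: str) -> int:
    """Size, in bytes, of a compiled binary or a source file."""
    return os.path.getsize(path)

base_language_count = byte_count("suite_reference.bin")  # machine code, in bytes
test_language_count = byte_count("suite.py")             # source text, in bytes

expressivity = base_language_count / test_language_count
print(f"expressivity = {expressivity:.2f}")
```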

Since we're adding maintainability into the mix, we have to define a second factor.

The elegance of your code, related to and dependent on the expressivity of your language, is (2/#_of_equivalent_programs). The closer this is to 1.0, the closer the code is to its ideal elegance. The Kolmogorov Quotient then combines these:

KQuotient = elegance * expressivity = (2/#_of_equivalent_programs)*(base_language_count/test_language_count)
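As a sketch, the whole formula in one function (the numbers in the example are made up, purely to show the arithmetic):

```python
def kolmogorov_quotient(base_language_count: int,
                        test_language_count: int,
                        num_equivalent_programs: int) -> float:
    """KQuotient = elegance * expressivity, per the definitions above."""
    expressivity = base_language_count / test_language_count
    elegance = 2 / num_equivalent_programs  # 1.0 is the ideal
    return elegance * expressivity

# Made-up numbers: 40 KiB of machine code, 4 KiB of source, and two
# "best" programs that fluent programmers agree are equivalent.
print(kolmogorov_quotient(40_960, 4_096, 2))  # -> 10.0
```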

Also, base_language_count should probably be measured in bytes of machine code generated, while test_language_count should be the number of text (source) characters, also counted in bytes.

One should also record, for discussion, the number of "equal programs", by which I mean the number of programs that fluent programmers of the language agree are the best and equivalent solutions to the problem. (Further: "equivalent solutions" are those whose output is the same for every input.) Languages which have a large number of equal programs say something interesting about either the programming language or the types of programmers the language attracts. What it says, I don't know.... :^)

The first ratio should always be greater than 1.0. I suppose one could game the metric by designing a language with single-letter keywords, but those shouldn't count.

(*) "standard suite of common programming tasks...": I see two main categories:

  • Data-processing suite, limited to simple text I/O (computation towards the machine)
  • GUI suite (computation towards the user)

The purpose of this idea is to end the tiring language wars about whose language is best. By giving a quantitative metric, people can at least argue better.

Will all languages eventually converge into the same language (apart from a few syntactical flourishes)? My analysis says yes, for a given computational architecture (like VonNeumann). I have called this language Prime, but it could probably be done in assembly with parameterized macros (replicating the functions of higher-level languages), so perhaps it's better to name it after the architecture it's made for: "VonNeumann".

The perfected language, then, for any particular architecture, should be called its "prime" language. An asynchronous architecture would have a different prime language than a VonNeumann one, etc.


The metric for "text-wise reduction" should incorporate a constant factor based on the (non-identifier) symbols used in the language and source text. These factors will be the same across all languages implemented (i.e. designed) for a given architecture (e.g. VonNeumannArchitecture vs. Symbolics) and will be a measure of the significance of each symbol; i.e. the topology (homology?) of the language tokens. (These terms, alas, will also need to be fleshed out and defined.)

The concept assumes that the source language and machine language use an equal word size of 8 bits. If not, the formula needs to be adjusted to maintain the ideal of a 1:1 comparison between machine language and source text, with approximately equivalent "chunk" sizes.

You could probably shave extra bytes off your keywords so as not to be penalized for making your language readable, down to 1 byte per keyword (if you have more keywords than letters of the alphabet, you might have to count 2 bytes per keyword, etc.). But then, should you be rewarded for wordy keywords? Isn't PROC an equal and unambiguous substitute for PROCEDURE?
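One way to sketch that keyword normalization: charge every keyword occurrence a single byte, however it is spelled, so PROC and PROCEDURE cost the same. The keyword set and the crude tokenizer below are hypothetical simplifications; a real measurement would use the language's actual lexer.

```python
import re

# Hypothetical keyword set for the language being measured.
KEYWORDS = {"PROCEDURE", "BEGIN", "END", "RETURN", "IF", "ELSE", "WHILE"}

def normalized_count(source: str) -> int:
    """Byte count of source, with each keyword occurrence charged as 1 byte."""
    count = len(source.encode("utf-8"))
    for word in re.findall(r"[A-Za-z_]+", source):
        if word in KEYWORDS:
            count -= len(word) - 1  # a keyword costs 1 byte, not len(word)
    return count

print(normalized_count("PROCEDURE f BEGIN RETURN 1 END"))  # 11, not 30
```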
The formula will ultimately gravitate towards an architecture's prime language.