SoftwareArchitecture - LeFreq/Singularity GitHub Wiki

Someone cut my cocaine with Sani-Flush and the following is speaking with authority without integration of the whole soul.

Perfect software architecture requires mastery of epistemology -- the study of how we know things -- not engineering. When software programmers approach a problem, they are approaching a megalith of sorts -- an (often) giant system of order, the internals of which are unknown. A programmer must grok the whole substructure of a problem and break it down in order to master it. Once they know it, they have arrived at the perfect architecture for their program: one that embodies the underlying wisdom which made the original system of procedures in the physical world. Until then, they must refactor mercilessly (an eXtreme Programming maxim).

Good architecture reduces the complexity and size of your code base, partly by simply moving oft-used functions into the operating system -- things like record-locking (keeping other processes or threads from accessing the same part of a storage device), network queuing, and writing to a screen.

Good architecture starts with Encapsulation and labelling.

Good architecture mirrors the physical architecture of whatever you're trying to emulate in software. For accounting software, that means the structures of the bookkeeper -- but have they mastered their domain? You have to ask whether others do it the same way (is it standard?), and so on. So what is the physical architecture of an operating system? The manager's job. Most managers haven't mastered their problem domain, so how will an OS?

Taking a note from Shannon's information theory and Kolmogorov's algorithmic information theory, we can quantify knowledge. After all, what a program represents is knowledge: mastery of the problem such that you can quantify every minute detail into a working program. We just have to get rid of the negative sign (measuring information gained rather than entropy lost).

Much like the game of Twenty Questions, which finds one answer out of millions of animals, the reduction is a log function. The precise twenty questions, chosen out of all possible questions, represent the wisdom that has been honed from, by, and into your programming architecture (not the engineering, mind you). The formula (assuming a CPU with no function-call logic) is:

Ideal_LOC = CPU_translation_factor * log2(unarchitected_LOC)

The CPU_translation_factor expresses the power that a particular CPU offers in its op-code architecture -- services like iteration and call-stack support in hardware (giving simple assembly instructions). As these services grow, the factor approaches 1. I estimate the Intel 8088's CPU_translation_factor at about 100. As of 2020 AD, I suggest the factor is about 20.
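Here is a quick sketch of the formula in code (assuming Python; the two factor values and the Twenty-Questions figure are the ones given on this page, nothing else):

```python
import math

print(math.log2(1_000_000))  # ~19.93: twenty yes/no questions pick one animal out of a million

def ideal_loc(unarchitected_loc: int, cpu_translation_factor: float = 20) -> float:
    """Ideal line count per the formula above; 20 is the suggested 2020-era factor."""
    return cpu_translation_factor * math.log2(unarchitected_loc)

print(ideal_loc(100_000))       # ~332 lines with the 2020-era factor of 20
print(ideal_loc(100_000, 100))  # ~1661 lines with the estimated Intel 8088 factor of 100
```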

The reasoning behind the equation is that you can roughly divide your monolithically-engineered code in two, repeatedly, until you get to the cognitive structures which created the original problem. Engineers approach the problem from the other direction, but architects approach from the top -- the only way to show true mastery of the problem. Often, engineers do ad hoc architecture, creating structures like a data table to hold records of a consistent form, or a Rational object to hold precise numbers; with enough time in the wild, these evolve into common tropes, reused for many problems. But real architecture decides that you need, for example, a query language to interface to a database. All of these high-level ideas are reusable and can be placed outside the application, in the larger data ecosystem.
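The Rational-object trope is a concrete case of this migration outward (a sketch, assuming Python as the environment): the construct eventually left individual applications and settled in the standard library, so an app now pays one import instead of carrying its own class.

```python
# The Rational trope, already absorbed into Python's standard library:
# exact arithmetic with zero app-side code beyond the import.
from fractions import Fraction

print(Fraction(1, 3) + Fraction(1, 6))  # -> 1/2, exact, no floating-point error
```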

If you aren't near this number, you probably haven't understood your problem domain completely, your programming environment is inadequate, or you're simply too lazy to rewrite your code.

If your unarchitected program is 100K lines of code, you should be able to achieve 20 * log2(100000) ≈ 332 lines of code -- call it 330, since our magical constant only has two digits of precision. That seems impossible, but consider that the equation assumes you can push functions used by more than one application upward toward the OS, continuously minimizing what your application itself needs. (See Gregory Chaitin's Omega Number and Algorithmic Information Theory.)

The equation is asymptotic to a theoretical ideal. In other words, it assumes you are converging on an optimal destination: a perfect language plus a perfect operating system. A perfect operating system is built such that programmers never have duplicate code (across all applications). If you have two apps that need record locking for write protection, for example, you put it into the operating system so each app needs only one line of code (the function call) for the operation -- provided by the OS, of course. (The usual heuristic is actually three occurrences, but if you can already see the generalization, you put it in a separate function rather than duplicate the code.)
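On POSIX systems this particular service already exists, which makes for a minimal sketch (the file name is made up; `fcntl.flock` is a real OS-provided call):

```python
import fcntl

# Record locking is already an OS service on POSIX: the application pays
# one call per lock instead of carrying its own locking machinery.
with open("ledger.dat", "a+b") as f:   # "ledger.dat" is a hypothetical file
    fcntl.flock(f, fcntl.LOCK_EX)      # one line: the OS grants an exclusive lock
    f.write(b"credit 42\n")            # the write other processes must not interleave
    fcntl.flock(f, fcntl.LOCK_UN)      # one line: release
```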

The equation shows that you actually add more code with small code bases, in anticipation of the generalized constructs you'll be using, and to make explicit the otherwise hidden conceptual constructs you use when writing programs. This scaffolding costs a marginal amount of code -- things like named functions or an object definition. 500 LOC becomes ~896 LOC of well-architected code (using the factor of 100 from the footnote below), and the cross-over point is around 1000 LOC. Like I said, these numbers assume you have grokked your problem domain completely, in order to architect the code.
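A quick numerical check of that cross-over claim (a sketch; both factor values are the ones given on this page):

```python
import math

def crossover(factor: float) -> int:
    """Smallest LOC at which the architected ideal stops exceeding the original,
    i.e. the first x where factor * log2(x) <= x."""
    x = 2
    while factor * math.log2(x) > x:
        x += 1
    return x

print(crossover(100))  # -> 997, roughly the 1000 LOC cross-over cited above
print(crossover(20))   # -> 144 with the 2020-era factor of 20
```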

This formula is for undocumented, uncommented LOC counts. For documented code, you may have to multiply the result by 2 to arrive at the number of lines you should have. That is probably about the healthy ratio of code to comments/docs: one to one. But that assumes poor variable naming, undocumented architecture, poor function names, and/or bad coding. Good code needs no comment.

But if your poorly architected code base is 25M LOC, you should be able to reduce that to about 5,000 lines of documented code (100 * log2(25,000,000) ≈ 2,458, doubled for documentation). That's the difference between architecture and engineering -- two entirely different specializations, though each must use the other to make good software. These numbers assume your OS is helping with your engineering, giving you record locking, network scheduling, or whatever services would be used by many apps.
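Checking that arithmetic (same formula, the deeper-epistemics factor of 100 from the footnote, doubled for documentation):

```python
import math
print(round(2 * 100 * math.log2(25_000_000)))  # -> 4915, i.e. about 5,000 documented lines
```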

Pretty cool, huh?

Here's another piece of architecting wisdom: no function or object should need more than 4 parameters (except for meta-programs, such as compilers). If your program wants more, you haven't architected it properly. A set of functions for drawing polygons, for example, should be generalized to take a list of (x,y) points, or the points should be made a separate structure passed in at once.
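A minimal sketch of that polygon example (the names Point and draw_polygon are illustrative, not from any particular library):

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

# Poorly architected: one parameter per coordinate, so the signature grows
# with every new shape -- draw_triangle(x1, y1, x2, y2, x3, y3), and so on.

# Better: pass the points as one structure and handle any polygon at once.
def draw_polygon(points: list[Point], color: str = "black") -> None:
    """Illustrative renderer -- the signature is the point here, not the body."""
    for a, b in zip(points, points[1:] + points[:1]):
        print(f"line ({a.x},{a.y}) -> ({b.x},{b.y}) in {color}")

draw_polygon([Point(0, 0), Point(1, 0), Point(0.5, 1)])
```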

Lastly, most code uses reusable constructs -- that is what the formula above is about. The estimated amount is about 50% of your code (and that's if you're doing something novel -- uncommon in the mundane practices of business). Most mathematical formulae, for example, relate to something in the real world, allowing generalization of functions that might otherwise be embedded in your own.


NOTE: If you don't have an OS, but are on an application-specific computer, you can't move functions toward the OS. So how does this equation operate in such an instance?
(*) The actual term is "epistemics", as written on the wikiwikiweb -- the understanding of knowledge from its root: data. Interestingly, with OOP alone the formula is about 1000 * log2(LOC). With epistemics, it is 100 * log2(LOC), because you go deeper to the core.
(**) The simplicity of that equation might be misleading: it is a revolution for Computer Science in itself.