Sunday, July 27, 2014

UNISTACK - a form of Intellectual Property Compression


This post is a continuation of NFX / Aum Cluster - The Philosophy. Here I am trying to outline an effect of intellectual property compression: an attempt to reduce system complexity.

Some Definitions

What is intellectual property? - It is everything you have in your mind and on disk - (in no particular order) design docs, data, state, code, architecture, patterns, know-how. "Information" is the most general term. As we all know, information takes space to store and time to process, unless we use...

Data Compression - according to the Wikipedia article: "Lossless compression reduces bits by identifying and eliminating statistical redundancy" and "Compression is useful because it helps reduce resource usage, such as data storage space or transmission capacity". Keep reading the article and we see that "Because compressed data must be decompressed to use, this extra processing imposes computational or other costs through decompression; this situation is far from being a free lunch. Data compression is subject to a space–time complexity trade-off". In other words, it takes time to compress and decompress. If we take the analogy of byte stream compression and apply it to the stream of "unified concepts" that UNISTACK promotes, we can conclude that the compression time is actually paid by the creator of the UNISTACK: it is our pain to create a good dictionary of solutions, so whoever compresses "a file" (a particular project) does not need to create prefix codes and build Huffman trees. On the other hand, unlike with regular ZIP files, decompression is not needed for the end product to operate; on the contrary, it operates better in the "compressed" form. This is because the compression is performed in a different domain of information: the conceptual domain, not the binary domain on a particular machine.
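The "dictionary paid for up front" idea has a rough byte-level counterpart in DEFLATE preset dictionaries. The sketch below is only an illustration of the analogy, not part of UNISTACK: the dictionary contents, the project payload and the helper names are all made up for the example. It uses Python's standard zlib module, whose compressobj and decompressobj accept a zdict argument.

```python
import zlib

# Hypothetical "dictionary of solutions": fragments that recur in every project.
# Whoever maintains the stack builds this once, ahead of time.
SHARED_DICTIONARY = b'{"node": "", "role": "web", "log-level": "info", "port": 8080}'

# Hypothetical project description that mostly reuses those fragments.
PROJECT = b'{"node": "alpha-01", "role": "web", "log-level": "info", "port": 8080}'


def compress(payload: bytes, zdict: bytes = b"") -> bytes:
    """Deflate payload, optionally priming the compressor with a preset dictionary."""
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(payload) + comp.flush()


def decompress(blob: bytes, zdict: bytes = b"") -> bytes:
    """Inflate blob; the same preset dictionary must be supplied if one was used."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


plain = compress(PROJECT)
with_dict = compress(PROJECT, SHARED_DICTIONARY)

print("without shared dictionary:", len(plain), "bytes")
print("with shared dictionary:   ", len(with_dict), "bytes")
```

The payload that reuses the shared fragments comes out smaller because the cost of finding the repeating patterns was already paid when the dictionary was written. The analogy stops there: bytes still have to be inflated before use, whereas the post argues that a conceptually "compressed" system runs as-is.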

IT Chagrgrgrgrin

I think that the software industry is the place with the most speculation and the most misunderstood and misused terms. Take "MVC", for example. What one person calls "M" may easily be re-interpreted as "V" by someone else: the "M" of a UI developer may really be a data projection, which is a "VIEW" to a DB developer, because for a DB developer the normalized tables and the domains they rely on are the models, while VIEWS are views. It is all tautological.

Why am I bringing this up? I am sick and tired of "configuration" frameworks. When I talk to people I hear "have you seen this?" (CFEngine/Chef/Puppet/ZooKeeper...). This is exactly the same kind of confusion that I described in the prior case with MVC. What does "configuration" mean? It turns out that for most people in IT operations it means "installing updates and patches", for developers it means "reading XML files", etc. But what does it really mean? See, there is no "really" - it all depends on who you ask, BUT at the end of the day what matters is how MUCH $$$$ you have spent, both in literal monetary sums and time-wise, to build your product that does XYZ, and what the ongoing costs are.

UNISTACK = Compression

The concept of the UNIFIED (Software) STACK is to cut all of the costs and times without sacrificing features. This is achieved by decreasing the number of standards that the system is built against. Standards here mean: languages, runtimes, libraries, components, frameworks, file formats, DB backends, etc.

Below I have compiled the typical Q/A that I usually have while talking about our product.

  • Different languages do different things better. Why do you insist on using "one language"? - I do not disagree that different languages have better internal models for handling specific tasks, e.g. take Erlang for its concurrency model. But this is a very superficial view. If your project is large, you will introduce more trouble by employing more and more languages, as complexity will keep growing. Of course you can use some language for some particular system component, but then you would need to build an ecosystem just for that.
  • If I use Python for some backend business logic and NodeJS for web, and I want to use some Erlang for a chat server, why not? - You can use anything you want, you can even write a QuickBasic stdio redirect into a web server; the question is WHY? The problem in today's world is that there are very many languages out there, yet none of them really has a solid cluster-forming concept. Maybe Erlang does to some extent, but those languages are just bare. In C++ you cannot think about fault tolerance in a cluster; that is just not what the language was built for.
  • But what do you suggest? That's why we use different tools. - Everyone does, and complexity keeps increasing. There is a fundamental flaw in the approach to problem solving. Different disjoint frameworks introduce complexity when the whole system needs to work in concert. This all happens because there is never enough time to do things the right way from day one. It is an illusion that by 'somehow making it work', regardless of 'how', you can save a lot. Maybe you can, as a consultant who is interested in hourly rates and 3-month projects.
  • So what do you suggest? - I suggest that you take a good general-purpose programming language/runtime (not PHP) and write all components from scratch according to one simple and elegant blueprint, and that blueprint gotta have the cluster in mind from day 1. As a result you will have a 10-fold reduction in the number of edge cases that you need to support in your system today, from Date parsing in JSON to configuring your nodes from central servers.
  • This is too good to be true, how long will it take one to write "everything"? - In practice it will take 20+ years to refine the concepts, especially for cluster programming. You would need to write a good 10+ large-scale client/server web systems from scratch, each containing 100s of screens, to get the experience and get rid of the fright. At the end of the day it becomes apparent that the majority of software development frameworks today are created/coded by people who really lack experience building large systems all by themselves. Modern developers have really lost their software engineering qualifications; that is where the term "developer" came from: landscaping. Nothing needs to be developed in software - a compiler does that for you automatically. Software is all about architecture only. If there is "development" work in your project, then it means that you are not doing it the right way. There should be no repetition, only architecture; repetition belongs only in the binary files made by the compiler. Of course, there are some edge cases, like government forms and some other sorts of corporate chaff, but that is not software engineering.
  • Some real numbers? - How many languages do you need to know in your project? 3? I need one. How many config file formats do you need to remember/document? 10? I need 1. How many entities like classes/functors/namespaces do you need to work with? 1000s? I only have < 100. And the most important thing: there are no 3rd party libs, no software/license updates, a tiny footprint, self-reliance. Let's assume that there are 200 servers to deploy your system on. How many web servers, how many job servers, how many DBs, etc.? How do you deploy changes? I press one button, I monitor one screen with the ability to drill down anywhere I want. In other words - had it not been for physical hardware that fails every now and then (and requires attention), one can use UNISTACK to build a Google-like system that will be supported by only a few people, no matter how big it gets. That's the real practical power of UNISTACK - a unified stack of software for everything you need. It is an effect of intellectual property compression - akin to ZIP - you decrease the information chaos by creating a dictionary of repeating patterns and re-using them. For that, one needs to operate on a homogeneous stream of information, say "concepts" (ZIP operates on byte streams). UNISTACK provides a stream of "unified concepts" so our process (ZIP) can compress them.
  • But is this not what all other libs are for - get things done for you? - They are, but it is a question of scope. Most frameworks are very narrow-minded (e.g. take Hibernate - and you will get into 1000s of classes just to execute a simple SQL); we are talking about a framework that does everything all other common frameworks do - in the same way. Of course we do not do things like speech recognition or 10-dim differential equations, there is no need for that, but there are 100s of general-purpose functions that are just not unified across myriads of frameworks. For example, the moment I use some 3rd party control it requires log4j, but I use another logger, so now I need to support 2 completely different logging solutions in the same process. And this applies to every single function your app needs. It's like misspelling your customer's name in 10 different ways on the same page!
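
The logging complaint in the last answer can be shown in miniature. The sketch below is a Python analogy, not UNISTACK code: the component names ("app.orders", "vendor.grid_control") and the helper function are hypothetical. It shows what "unified" buys you when every component, including a vendor-supplied one, writes through the same standard logging facility: one sink, one format, one place to configure.

```python
import io
import logging


def attach_unified_sink(stream) -> logging.Handler:
    """Attach a single handler with a single format to the root logger.

    Every logger in the process propagates to the root by default,
    so this one handler sees messages from all components.
    """
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s|%(name)s|%(message)s"))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.INFO)
    return handler


sink = io.StringIO()
attach_unified_sink(sink)

# Our own code and a (hypothetical) third-party control log the same way.
logging.getLogger("app.orders").info("order accepted")
logging.getLogger("vendor.grid_control").warning("column width clamped")

lines = sink.getvalue().splitlines()
for line in lines:
    print(line)
```

Had the vendor control shipped its own logger with its own config file and output format, the same process would need two logging setups, which is exactly the duplication the answer above objects to.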