Sunday, August 17, 2014

Aum Cluster or Cloud?

Disambiguation and some FAQ

What is a cloud-system? How does it relate to clustering? What about server virtualization?

There are many definitions, each with its own little flavor, leaving an aftertaste of confusion. If you inoculate yourself against correctness angst, trendy words and "term juggling", become nonchalant and just look at the facts, you will realize that several completely orthogonal ontologies have become entangled.

Virtual Servers

First, let's get virtualization out of the way. One can build a cloud/cluster/large system without any virtualization whatsoever. The confusion comes from the fact that many service providers these days offer "clouds" as dynamic sets of "virtual" (not real) logical computers that customers can create/delete/start/stop. Those computers are not real hardware; they are emulated, one way or another, as if they were real. Of course those "logical" computers run on some physical machines, but this is transparent to the user. The point is: virtualization by itself neither makes nor prevents your application being a "cloud" application. Your app must be designed for the cloud regardless of virtualization.

Virtualization is good for cloud apps because it lets you dynamically increase or decrease your server usage by adding or removing boxes as you need them, so you can better utilize your hardware. BUT if your app is not designed to adjust its set of participating nodes/servers dynamically (at runtime), the benefits of virtualization diminish.

What are "Clouds"?

A cloud is just an abstraction of "somewhere on the internet". A cloud system is a system that runs on servers in data centers, NOT on your laptop/tablet/phone. Local devices of course interface with clouds, but they do not store your data - less headache! Contrary to what many believe, clouds do not have to be publicly accessible (like Facebook or Twitter); indeed, many corporations have built their own internal clouds that only internal resources/devices/employees can connect to (e.g. via VPN).

Server Clusters?

As demand grows, systems end up employing many servers to do a job, e.g. serving database requests or building web pages. A cluster is a set of machines that appears as a single logical system performing some specific task. The machines are usually tightly interconnected within a data center, and a cluster may even span multiple geographically separate data centers.

Am I in the cloud now?

Simple. If all of your personal PCs/tablets/gadgets got fried today, would you lose the data/software in question? If yes, then you are not in the modern cloud. Modern cloud systems give you this benefit: just remember your ID and password, and you can continue where you left off from any machine, anywhere in the world. This rule applies to general apps, which are usually web-based. Of course there are special kinds of apps (like 3D games) that would require you to re-install something on the new computer, but you would still recover all of your "state" from where you left off before all of your devices got lost.

Do I need to use clusters to be in the cloud?

Most likely your cloud system does include some form of cluster software/hardware, but the answer is still NO. One may create a cloud service out of many disjoint computers (which someone else may call a "cluster").

Do I need to use virtual servers to build clouds?

Absolutely not. Any cloud service can be created without a single virtual computer.

What are "cloud-apps"?

These are applications engineered to run in the cloud. Usually these are systems that know how to deal with myriads of problems that do not exist in "regular"/local apps. For example, a cloud cluster has many servers to deal with: how does the app get configuration and connect strings for the other members of the cloud? There are 100s of questions that cloud apps need to address which local (or small client/server) apps don't care about.
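As a minimal sketch of the "how does the app find the other members" question, assume a hypothetical in-process registry: members announce themselves under a role, and peers resolve connect strings at runtime instead of reading hard-coded ones. MemberRegistry and its join/leave/lookup methods are illustrative names, not Aum Cluster APIs; a real cluster would also replicate and health-check this state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical membership registry (not an Aum Cluster API): members join under a
// role such as "db" or "web", and peers look them up at runtime by that role.
final class MemberRegistry {
    private final Map<String, List<String>> byRole = new ConcurrentHashMap<>();

    // A node announces its connect string when it starts (or is added at runtime).
    void join(String role, String connectString) {
        byRole.computeIfAbsent(role, r -> new CopyOnWriteArrayList<>()).add(connectString);
    }

    // A node that stops (or is declared dead) is removed, so lookups stop returning it.
    void leave(String role, String connectString) {
        List<String> members = byRole.get(role);
        if (members != null) {
            members.remove(connectString);
        }
    }

    // Returns a snapshot of the current connect strings for a role (empty if none).
    List<String> lookup(String role) {
        return new ArrayList<>(byRole.getOrDefault(role, Collections.emptyList()));
    }
}
```

Because lookups always reflect the current membership, nodes can be added or removed while the system runs, which is exactly what lets an app benefit from adding/removing (virtual) boxes on demand.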

Can I AUTO-convert (without spending time) my existing client/server DB app into a "cloud-app"?

If you still expect only 5-10 active users, then yes, there is no need to convert. Just host your current client-server app on something like Amazon, and nothing needs to be changed (except for some config files). On the other hand, it is not going to be what guys like Google, Facebook and Twitter call a "cloud app". There is no way to auto-convert your client-server application into a scalable web service that serves 1,000,000 customers a day. You need a completely different architecture for that.

To summarize: cloud systems are in the cloud (literally somewhere else). Clustering is just a way of sticking many computers together (either physically or logically). Cloud services usually comprise software and hardware clusters of all sorts. They run applications that were engineered with all the crazy nuances of cloud systems in mind (and that cost a lot of $$$ :( ). And finally, virtualization is not a necessary requirement to be in the cloud, although some find it convenient.

Aum Cluster

"Aum Cluster" is a software library/framework for creation of massive general-purpose computer clusters that may be used to create public/private cloud-based applications. The "general-purpose clusters" means - many computers that perform app-dependent tasks, for example, unlike Oracle Database Cluster, which is a strictly-speaking just a name for Oracle's database product. Aum Cluster is a library, which means - you build what you want with it, be it particle physics simulation or online e-commerce site.

The purpose of Aum Cluster is to address 100s of very complex software problems that arise in distributed systems, so that its users may concentrate on business-specific tasks. For example, configuration of millions of servers, discovery, peer name resolution, unique ID generation, replicated data stores, process management, remote control and security are all factored in.
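Unique ID generation is one of the problems listed above. The post does not say how Aum Cluster solves it, so the Java sketch below uses a different, well-known technique: a Snowflake-style layout that packs a millisecond timestamp, a node number and a per-millisecond sequence into one 64-bit long, letting every node mint IDs without coordination. ClusterIdGenerator and its bit widths are assumptions for illustration.

```java
import java.util.function.LongSupplier;

// Snowflake-style layout (an assumption, not Aum Cluster's scheme):
// [41 bits: Unix ms][10 bits: node id][12 bits: sequence within that ms].
// 41 bits of raw Unix milliseconds last until roughly 2039.
final class ClusterIdGenerator {
    private static final int NODE_BITS = 10;
    private static final int SEQ_BITS = 12;
    private static final long MAX_NODE = (1L << NODE_BITS) - 1;
    private static final long SEQ_MASK = (1L << SEQ_BITS) - 1;

    private final long nodeId;
    private final LongSupplier clockMs; // injectable so tests can control time
    private long lastMs = -1;
    private long sequence = 0;

    ClusterIdGenerator(long nodeId, LongSupplier clockMs) {
        if (nodeId < 0 || nodeId > MAX_NODE) {
            throw new IllegalArgumentException("nodeId out of range: " + nodeId);
        }
        this.nodeId = nodeId;
        this.clockMs = clockMs;
    }

    synchronized long next() {
        long now = clockMs.getAsLong();
        if (now < lastMs) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastMs) {
            sequence = (sequence + 1) & SEQ_MASK;
            if (sequence == 0) {
                // 4096 IDs already issued in this ms: spin until the clock advances
                while (now <= lastMs) {
                    now = clockMs.getAsLong();
                }
            }
        } else {
            sequence = 0;
        }
        lastMs = now;
        return (now << (NODE_BITS + SEQ_BITS)) | (nodeId << SEQ_BITS) | sequence;
    }

    // Recovers which node minted an ID.
    static long nodeOf(long id) {
        return (id >>> SEQ_BITS) & MAX_NODE;
    }
}
```

Because node numbers are disjoint, two nodes never produce the same ID; the price is trusting each node's clock, and this sketch simply refuses to continue if the clock goes backwards.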

What sets Aum Cluster apart from many "cloud systems" is the Unistack approach. Unistack is a unified software library that gets deployed to all participating servers, thus reducing complexity 10-fold. I have blogged about it before.

Aum Cluster can run on either virtual or physical servers. Virtualization has no real significance when you write your app.

To Summarize: Aum Cluster framework allows you to properly architect and build huge systems (with millions of nodes) taking care of 100s of complex issues that exist in any distributed system. It is like Google/Facebook/Twitter internal mechanisms made available to any application in a general way.

Saturday, August 2, 2014

NFX.Glue - Interprocess Communication

Definition + Features

NFX.Glue is the part of the NFX framework that allows developers to quickly interconnect ("glue together") various process instances, much faster than with WCF/RMI or remoting. In one sentence: NFX.Glue is a contract-based, stateless or stateful RPC mechanism that uses messages as the logical delivery unit. The core implementation of Glue is probably less than 10,000 LOC (very typical for NFX) and adds roughly 300 KB to the final assembly image.

NFX.Glue Features

  • Very simple to use and configure
  • Built into the NFX application container, so it can be hosted in any app type without special "service hosts"
  • Contract-based programming
  • Injectable binding types define the protocol and message exchange pattern (e.g. sync blocking/async/multicast etc.)
  • Pre-implemented native bindings: TCP sync, TCP async, In-process
  • Native bindings allow for transparent serialization with no need for special attributes (unlike WCF or ProtoBuf), and support objects of any complexity, including cyclical references
  • Message-based: every call turns into a RequestMsg, and the server generates a ResponseMsg for two-way calls
  • Supports MessageHeaders for extra data (e.g. security credentials)
  • Supports one-way or two-way calls
  • Supports multilevel message filtering/inspection (glue/client/binding)
  • Supports security: guard contracts/methods/classes with permission attributes
  • Supports stateless or stateful server programming with a volatile process lifecycle (allows a process to restart without "forgetting" its state)
  • Proxy clients natively provide sync and async call trampolines without any extra threads or wait queues/reactors
  • Built-in channel/transport lifecycle management: impose limits on the number of outgoing connections per host, how long to keep idle channels alive, etc.
  • Detailed statistics: number of messages/bytes/calls, call round-trip times per contract/method
  • Performance on a 6-core machine: ~120,000 ops/sec for simple two-way calls (return int as string + 'hello!') via the native TCP sync binding
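The "every call turns into RequestMsg" and proxy-client features can be sketched in Java with java.lang.reflect.Proxy. This is not how NFX builds its proxies: RequestMsg and ResponseMsg borrow the post's names, but their fields, LoopbackServer, GlueLikeClient and the Echo contract are hypothetical, and the "transport" is a direct in-process call standing in for a real binding.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.UUID;

// Hypothetical message shapes; only the names RequestMsg/ResponseMsg come from the post.
final class RequestMsg {
    final UUID requestId = UUID.randomUUID();
    final String method;
    final Class<?>[] paramTypes;
    final Object[] args;

    RequestMsg(String method, Class<?>[] paramTypes, Object[] args) {
        this.method = method;
        this.paramTypes = paramTypes;
        this.args = args;
    }
}

final class ResponseMsg {
    final UUID requestId;
    final Object result;

    ResponseMsg(UUID requestId, Object result) {
        this.requestId = requestId;
        this.result = result;
    }
}

// Server side: looks the method up on the contract and invokes it on the implementation.
final class LoopbackServer<T> {
    private final Class<T> contract;
    private final T impl;

    LoopbackServer(Class<T> contract, T impl) {
        this.contract = contract;
        this.impl = impl;
    }

    ResponseMsg handle(RequestMsg req) throws Exception {
        Method m = contract.getMethod(req.method, req.paramTypes);
        return new ResponseMsg(req.requestId, m.invoke(impl, req.args));
    }
}

// Client side: a dynamic proxy that turns every contract call into a RequestMsg.
final class GlueLikeClient {
    @SuppressWarnings("unchecked")
    static <T> T proxy(Class<T> contract, LoopbackServer<T> server) {
        return (T) Proxy.newProxyInstance(
                contract.getClassLoader(),
                new Class<?>[] {contract},
                (self, method, args) -> {
                    if (method.getDeclaringClass() == Object.class) {
                        // equals/hashCode/toString are answered locally, never sent
                        switch (method.getName()) {
                            case "equals": return self == args[0];
                            case "hashCode": return System.identityHashCode(self);
                            default: return contract.getSimpleName() + "Proxy";
                        }
                    }
                    RequestMsg req = new RequestMsg(method.getName(), method.getParameterTypes(), args);
                    ResponseMsg resp = server.handle(req); // stands in for a binding/transport
                    if (!resp.requestId.equals(req.requestId)) {
                        throw new IllegalStateException("response does not match request");
                    }
                    return resp.result;
                });
    }
}

// Example contract mirroring the benchmark call above: int in, string + "hello!" out.
interface Echo {
    String echo(int x);
}

final class EchoServer implements Echo {
    public String echo(int x) {
        return x + "hello!";
    }
}
```

Usage: `GlueLikeClient.proxy(Echo.class, new LoopbackServer<>(Echo.class, new EchoServer())).echo(5)` returns "5hello!", with the call travelling as a RequestMsg/ResponseMsg pair rather than a direct method call.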

How NFX.Glue Works

A call originates from the calling party, like so:
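   // Node string: "<binding name>://<host>:<port>"; the "async" binding is declared in the glue config
   var node = new Node("async://quad:7311"); 
   var console = new RemoteTerminalClient( node );
   console.Connect("Jack Lowery");      // [Constructor]: creates the server-side terminal instance

   Console.WriteLine("The time on connected node is: " + console.Execute("time"));

   console.Disconnect();                // [Destructor]: tears the server-side instance down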
Here we have connected to machine "quad" on port 7311 using the "async" binding. The calling process has a piece of config that says:
 glue
 {
  bindings
  {
   binding {name="async" type="NFX.Glue.Native.MpxBinding, NFX"}
  }
 } 
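A minimal Java sketch of the resolution step, assuming a client node string has the form binding://host:port and that the "bindings" config section is loaded into a simple name-to-type map. NodeAddress and BindingRegistry are hypothetical names, not NFX's Node type or config API. java.net.URI cannot extract a host from the server-side wildcard form "*:7700", so this sketch only handles client addresses.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical parsed form of a client node string "binding://host:port" (not NFX's Node type).
final class NodeAddress {
    final String binding;
    final String host;
    final int port;

    private NodeAddress(String binding, String host, int port) {
        this.binding = binding;
        this.host = host;
        this.port = port;
    }

    static NodeAddress parse(String node) {
        URI uri = URI.create(node); // throws IllegalArgumentException on malformed syntax
        if (uri.getScheme() == null || uri.getHost() == null || uri.getPort() < 0) {
            throw new IllegalArgumentException("expected binding://host:port but got: " + node);
        }
        return new NodeAddress(uri.getScheme(), uri.getHost(), uri.getPort());
    }
}

// Hypothetical registry filled from the "bindings" config section: binding name -> type name.
final class BindingRegistry {
    private final Map<String, String> typeByName = new HashMap<>();

    void register(String name, String typeName) {
        typeByName.put(name, typeName);
    }

    // The scheme part of the node string selects the binding declared in config.
    String resolve(NodeAddress node) {
        String type = typeByName.get(node.binding);
        if (type == null) {
            throw new IllegalStateException("no binding named '" + node.binding + "' in config");
        }
        return type;
    }
}
```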
So now the Glue runtime knows that "async" is an instance of "NFX.Glue.Native.MpxBinding, NFX" (with about a dozen parameters, such as TCP buffer windows). The original contract for the service is this:
    /// <summary>
    /// Represents a contract for working with remote entities using terminal/command approach
    /// </summary>
    [Glued]
    [AuthenticationSupport]
    [RemoteTerminalOperatorPermission]
    [LifeCycle(ServerInstanceMode.Stateful, SysConsts.REMOTE_TERMINAL_TIMEOUT_MS)]
    public interface IRemoteTerminal
    {
        [Constructor]
        RemoteTerminalInfo Connect(string who);

        string Execute(string command);

        [Destructor]
        string Disconnect();
    }
It is a stateful contract: a call to "Connect" (marked [Constructor]) initializes the server instance (a terminal connection, in our case), which then either times out after "REMOTE_TERMINAL_TIMEOUT_MS" or gets torn down by a call to "Disconnect" (marked [Destructor]). In these semantics, a constructor/destructor is just a special kind of method that does regular method work, possibly returning some values, but also tells Glue what to do with the server instance. The "LifeCycle" is part of the contract, not the implementation, because it really dictates which other methods a contract should or should not have. Pay attention to "RemoteTerminalOperatorPermission", which guards ALL methods of this contract. A caller must supply a valid token, which is why "AuthenticationSupport" is stipulated.
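The stateful lifecycle can be pictured as a server-side table of live instances. This Java sketch assumes the timeout counts from the last call made on an instance (the post does not specify this) and passes the current time explicitly to stay deterministic; StatefulInstances and its methods are hypothetical, not Glue's implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical server-side table for a stateful contract (not Glue's implementation).
// Assumption: the timeout counts from the last call made on an instance.
final class StatefulInstances<T> {
    private static final class Entry<E> {
        final E instance;
        long lastTouchMs;

        Entry(E instance, long nowMs) {
            this.instance = instance;
            this.lastTouchMs = nowMs;
        }
    }

    private final Map<Long, Entry<T>> live = new HashMap<>();
    private final Supplier<T> factory;
    private final long timeoutMs;
    private long nextId = 1;

    StatefulInstances(Supplier<T> factory, long timeoutMs) {
        this.factory = factory;
        this.timeoutMs = timeoutMs;
    }

    // [Constructor] method: create the server instance and hand its id back to the client.
    synchronized long construct(long nowMs) {
        long id = nextId++;
        live.put(id, new Entry<>(factory.get(), nowMs));
        return id;
    }

    // Ordinary method: find the instance (if it has not expired) and refresh its idle clock.
    synchronized T get(long id, long nowMs) {
        sweep(nowMs);
        Entry<T> e = live.get(id);
        if (e == null) {
            throw new IllegalStateException("instance " + id + " timed out or was destroyed");
        }
        e.lastTouchMs = nowMs;
        return e.instance;
    }

    // [Destructor] method: tear the instance down explicitly.
    synchronized void destruct(long id) {
        live.remove(id);
    }

    // Drops instances that have been idle for longer than the timeout.
    synchronized void sweep(long nowMs) {
        live.values().removeIf(e -> nowMs - e.lastTouchMs > timeoutMs);
    }

    synchronized int liveCount() {
        return live.size();
    }
}
```

Keeping construction and destruction in the table, rather than in the implementation class, mirrors the post's point that the lifecycle belongs to the contract.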

On the server we will include in config:


 glue
 {
  servers
  {
   server {name="TerminalAsync" 
           node="async://*:7700"
           contract-servers="ahgov.HGovRemoteTerminal, ahgov"}
  }
 } 
And then implement the interface like so:
    /// <summary>
    /// Provides basic app-management capabilities
    /// </summary>
    [Serializable]
    public class AppRemoteTerminal : IRemoteTerminal
    {
        public AppRemoteTerminal()
        {                                                            
        .....
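        }

        // Cleanup hook of the server object itself; the [Destructor] contract method is Disconnect().
        // NOTE: 'override' compiles only if AppRemoteTerminal derives from a base class declaring
        // 'protected virtual void Destructor()'; no such base class appears in this excerpt.
        protected override void Destructor()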
        {
        .....
        }

        private int m_ID;
        private string m_Name;
        private string m_Who;
        private DateTime m_WhenConnected;
        private DateTime m_WhenInteracted;
        
        
        public virtual RemoteTerminalInfo Connect(string who)
        {
          ..........
        }

        [AppRemoteTerminalPermission]
        public virtual string Execute(string command)
        {
           ....... 
        }

        public virtual string Disconnect()
        {
            return "Good bye!";
        }
     }
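Notice the use of instance fields: because the contract is stateful, each connection gets its own server instance, and these fields keep per-connection state between calls.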
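The permission attributes above act as a two-level guard: one on the whole contract (RemoteTerminalOperatorPermission) and one on a single implementation method (AppRemoteTerminalPermission on Execute). A minimal Java analogue, using a hypothetical RequiresRight annotation and plain string rights in place of NFX's permission attributes and security tokens:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Set;

// Hypothetical stand-in for a permission attribute: names the right a caller must hold.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface RequiresRight {
    String value();
}

final class PermissionGuard {
    // The contract-level right guards every method; an implementation method may add its own.
    static void check(Class<?> contract, Object impl, String methodName,
                      Set<String> callerRights, Class<?>... paramTypes) {
        require(contract.getAnnotation(RequiresRight.class), callerRights);
        try {
            Method m = impl.getClass().getMethod(methodName, paramTypes);
            require(m.getAnnotation(RequiresRight.class), callerRights);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + methodName, e);
        }
    }

    private static void require(RequiresRight r, Set<String> callerRights) {
        if (r != null && !callerRights.contains(r.value())) {
            throw new SecurityException("missing right: " + r.value());
        }
    }
}

// Example contract and implementation loosely mirroring IRemoteTerminal / AppRemoteTerminal.
@RequiresRight("terminal.operator")
interface Terminal {
    String execute(String command);
    String disconnect();
}

final class AppTerminal implements Terminal {
    @RequiresRight("terminal.execute")
    public String execute(String command) { return "ran " + command; }

    public String disconnect() { return "Good bye!"; }
}
```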

Let's look at the call flow:

The call is made in client code and then turned into a "RequestMsg". The client transport creates a "CallSlot", a kind of "spirit-less mailbox" (no threads/events) that captures the request with its timestamp and unique GUID. At the end of the call the server sends a ResponseMsg (unless the call is OneWay), and the response is matched to the original "CallSlot" by its RequestID.
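A minimal Java sketch of the CallSlot idea, assuming pending slots are kept in a map keyed by request ID and that a slot is a passive holder (no thread or event of its own) which the receive path fills in. CallSlot's fields and ClientTransport are illustrative, not Glue's actual classes; one-way calls simply never get a slot.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical CallSlot: a passive mailbox with no thread or event of its own.
final class CallSlot {
    final UUID requestId;
    final long sentAtMs;
    private volatile Object response;
    private volatile boolean completed;

    CallSlot(UUID requestId, long sentAtMs) {
        this.requestId = requestId;
        this.sentAtMs = sentAtMs;
    }

    void complete(Object payload) {
        response = payload;  // written before the flag, so a reader that sees
        completed = true;    // completed == true also sees the payload
    }

    boolean isCompleted() { return completed; }

    Object response() {
        if (!completed) throw new IllegalStateException("no response yet for " + requestId);
        return response;
    }
}

// Hypothetical client transport: tracks pending two-way calls by request id.
final class ClientTransport {
    private final Map<UUID, CallSlot> pending = new ConcurrentHashMap<>();

    // Two-way call: register a slot before the RequestMsg goes on the wire.
    CallSlot sendTwoWay(UUID requestId, long nowMs) {
        CallSlot slot = new CallSlot(requestId, nowMs);
        pending.put(requestId, slot);
        return slot;
    }

    // Receive path: match an incoming ResponseMsg to its slot by request id.
    boolean onResponse(UUID requestId, Object payload) {
        CallSlot slot = pending.remove(requestId);
        if (slot == null) return false; // unknown id, or the slot already expired
        slot.complete(payload);
        return true;
    }

    // Uses each slot's timestamp to drop calls whose responses never arrived.
    int expire(long nowMs, long timeoutMs) {
        int removed = 0;
        for (Iterator<CallSlot> it = pending.values().iterator(); it.hasNext(); ) {
            if (nowMs - it.next().sentAtMs > timeoutMs) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int pendingCount() { return pending.size(); }
}
```

Responses may arrive in any order; matching by ID is what makes that safe, and the captured timestamp is what allows stale slots to be dropped so that a late response is simply ignored.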

An interesting part of this design is the "Binding" area. A binding controls the means of message delivery (e.g. TCP/IP/USB/COM/LPT or anything else) and the message exchange mode: synchronous or asynchronous. In SYNC mode the request is sent and the response is received in one operation, akin to blocking TCP sockets. In ASYNC mode we use I/O completion ports on Windows to establish a bidirectional traffic channel on every single socket. Both implementations are provided in the "NFX.Glue.Native" namespace as "SyncBinding" and "MpxBinding" (MultiplexingBinding).

In "MpxBinding", which is asynchronous by definition, sending is orthogonal to receiving: the physical TCP channel IS NOT BLOCKED for the duration of the call execution. For example, suppose the server needs 100 ms to execute some method. One can post 1000s of calls over the same transport via MpxBinding, and the responses arrive as the server generates them. Had we used "SyncBinding" instead, we would have needed as many TCP connections as there are pending calls. That does not make "SyncBinding" useless: blocking sockets give much lower call round-trip latency when calls are infrequent and not highly parallel. For example, a local machine clock update done every minute works better time-wise via SyncBinding than via async socket/message IO (a difference of a few milliseconds). So "MpxBinding" is better for throughput with tolerable latency at high call rates (1000s/sec), whereas "SyncBinding" gives better latency for relatively infrequent calls (10s/sec).
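The "as many TCP connections as pending calls" point can be quantified with Little's law, L = λW: the average number of calls in flight equals the arrival rate times the time each call spends in the system. Taking an assumed rate of 1000 calls/sec and the 100 ms server time from the example above (network time ignored, so this is a lower bound):

```latex
L = \lambda W = 1000\,\tfrac{\text{calls}}{\text{s}} \times 0.1\,\text{s} = 100 \ \text{calls in flight}
```

So at that rate SyncBinding would need on the order of 100 concurrent TCP connections, whereas MpxBinding can carry all of those pending calls over the same transport.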

A Few Q/A

  • How does this relate to ZeroMQ? - NFX.Glue is a contract-based, object-level message-passing system, whereas ZeroMQ is byte-message oriented. NFX.Glue is a much higher-level framework designed to work with higher-level constructs conducive to solving business problems.
  • Is Glue slower than ZeroMQ? - It really depends on what type of "business payload" your app is pushing. The network part of Glue is as fast as ZeroMQ, as it uses basic sockets and avoids buffer copies whenever possible, but please do not compare sending byte[4] with calling a method on a remote class instance.
  • How does Glue relate to Erlang? - The answer is similar to the ZeroMQ question above: Erlang works with much lower-level (than Glue) data primitives, such as tuples and lists. One cannot really compare the two technologies directly, as building a similar feature set in Erlang would require significant effort (adding security, permissions, state management). Erlang uses its own communication platform (OTP) very well, but it is still much narrower in scope than NFX.Glue. Take a look at NFX.Erlang instead if you need to support Erlang/OTP from NFX.
  • Does Glue completely replace WCF? - For us, YES, 200%. The whole Aum Cluster is based on Glue, because all nodes in the cluster run NFX; this is a benefit of the UNISTACK concept that I described a few weeks back. If you are a corporate SOAP/WSE consumer, then NO: Glue does not support it with native bindings and never will. One can create bindings for SOAP and other corporate bloat, but there is really no need to pollute the clean NFX library with outdated crap.
  • How do I expose a Glue contract as JSON/REST? - You'd need a JSONHttp binding for that, which I have not created and have no intention of creating, because it has no practical value. In NFX, REST services are done much more easily with NFX.Wave MVC controllers, which should expose your internal Glue services as a facade. Remember: Glue was never meant to be exposed publicly. It could be, via corresponding bindings, but there is no need to create bindings just to support standards that will never be used.

Sunday, July 27, 2014

UNISTACK - a form of Intellectual Property Compression

About

This post is a continuation of "NFX / Aum Cluster - The Philosophy". I am trying to outline an effect of intellectual property compression: an attempt to reduce system complexity.

Some Definitions

What is intellectual property? - It is everything you have in your mind and on disk, in no particular order: design docs, data, state, code, architecture, patterns, know-how. "Information" is the most general term. As we all know, information takes space to store and time to process, unless we use...

Data Compression - according to the Wikipedia article: "Lossless compression reduces bits by identifying and eliminating statistical redundancy" and "Compression is useful because it helps reduce resource usage, such as data storage space or transmission capacity". Reading further, we see that "Because compressed data must be decompressed to use, this extra processing imposes computational or other costs through decompression; this situation is far from being a free lunch. Data compression is subject to a space–time complexity trade-off". We know that it takes time to compress and decompress. If we take the analogy of byte-stream compression and apply it to the stream of "unified concepts" that UNISTACK promotes, we can conclude that the compression time is actually paid by the creator of the UNISTACK: it is our pain to create a good dictionary of solutions, so that when compressing "a file" (a particular project) one does not need to create prefix codes and build Huffman trees. On the other hand, unlike with regular ZIP files, no decompression is needed for the end product to operate; on the contrary, it operates better in the "compressed" form. This is because the compression is performed in a different domain of information: the conceptual domain, not the binary domain of a particular machine.
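On the byte-stream side of the analogy, a dictionary coder makes "build a dictionary of repeating patterns, then reuse it" concrete. The Java sketch below is LZW, a dictionary technique different from the Huffman coding mentioned above and not something UNISTACK literally does; it assumes every input character is below 256.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// LZW: the dictionary starts with all single characters and learns longer
// repeated substrings as it goes; each output code stands for one dictionary entry.
// Assumes every input char is below 256.
final class Lzw {
    static List<Integer> encode(String text) {
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put(String.valueOf((char) i), i);
        String w = "";
        List<Integer> out = new ArrayList<>();
        for (char c : text.toCharArray()) {
            String wc = w + c;
            if (dict.containsKey(wc)) {
                w = wc;                     // keep extending the current match
            } else {
                out.add(dict.get(w));       // emit the longest known prefix
                dict.put(wc, dict.size());  // learn the new pattern
                w = String.valueOf(c);
            }
        }
        if (!w.isEmpty()) out.add(dict.get(w));
        return out;
    }

    static String decode(List<Integer> codes) {
        if (codes.isEmpty()) return "";
        Map<Integer, String> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put(i, String.valueOf((char) i));
        String w = dict.get(codes.get(0));
        StringBuilder sb = new StringBuilder(w);
        for (int k = 1; k < codes.size(); k++) {
            int code = codes.get(k);
            String entry;
            if (dict.containsKey(code)) {
                entry = dict.get(code);
            } else if (code == dict.size()) {
                entry = w + w.charAt(0);    // code defined by this very step
            } else {
                throw new IllegalArgumentException("bad code: " + code);
            }
            sb.append(entry);
            dict.put(dict.size(), w + entry.charAt(0)); // rebuild the encoder's dictionary
            w = entry;
        }
        return sb.toString();
    }
}
```

For "TOBEORNOTTOBEORTOBEORNOT" (24 characters) the encoder emits 16 codes, because repeated substrings such as "TOB" are replaced by single dictionary entries learned along the way.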

IT Chagrgrgrgrin

I think that the software industry is the place with the most speculation and the most misunderstanding/misuse of terms. Take "MVC", for example. What one person calls "M" may easily be re-interpreted as "V" by someone else: the "M" of a UI developer may really be a data projection, which is a "VIEW" to a DB developer, because for a DB developer the normalized tables and domains they rely on are models, while VIEWS are views. It is all tautological.

Why am I bringing this up? I am sick and tired of "configuration" frameworks. When I talk to people I hear "have you seen this?" (CFEngine/Chef/Puppet/ZooKeeper...). This is exactly the same kind of confusion that I described above with MVC. What does "configuration" mean? It turns out that for most people in IT operations it means "installing updates and patches", for developers it means "reading XML files", etc. But what does it really mean? See, there is no "really"; it all depends on whom you ask. BUT at the end of the day what matters is how MUCH $$$$ you have spent, both in literal money and in time, to build your product that does XYZ, and what the ongoing costs are.

UNISTACK = Compression

The concept of the UNIFIED (Software) STACK is to cut all of the costs/times without sacrificing features. This is achieved by decreasing the number of standards that the system is built against. "Standards" here means languages, runtimes, libraries, components, frameworks, file formats, DB backends, etc.

Below I have compiled the typical questions and answers that come up when I talk about our product.

  • Different languages do different things better. Why do you insist on using "one language"? - I do not dispute that different languages have better internal models for handling specific tasks, e.g. Erlang and its concurrency model. But this is a very superficial view. If your project is large, employing more and more languages introduces more trouble, because complexity keeps growing. Of course you can use some language for a particular system component, but then you would need to build an ecosystem just for that.
  • If I use Python for some backend business logic and NodeJS for the web, and I want to use some Erlang for a chat server, why not? - You can use anything you want; you can even redirect QuickBasic stdio into a web server. The question is WHY? The problem in today's world is that there are very many languages out there, yet none of them really has a solid cluster-forming concept. Maybe Erlang does to some extent, but the rest are just bare. In C++ you cannot think in terms of fault tolerance in a cluster; that is just not what the language was built for.
  • But what do you suggest? That's why we use different tools. - Everyone does, and complexity keeps increasing. There is a fundamental flaw in this approach to problem solving: disjoint frameworks introduce complexity when the whole system needs to work in concert. This all happens because there is never enough time to do things the right way from day one. It is an illusion that by 'somehow making it work', regardless of 'how', you can save a lot. Maybe you can if you are a consultant interested in hourly rates and 3-month projects.
  • So what do you suggest? - I suggest that you take a good general-purpose programming language/runtime (not PHP) and write all components from scratch according to one simple and elegant blueprint, and that blueprint has to have the cluster in mind from day 1. As a result you will get a 10-fold reduction in the number of edge cases that you need to support in your system today, from date parsing in JSON to configuring your nodes from central servers.
  • This is too good to be true; how long will it take to write "everything"? - In practice it takes 20+ years to refine the concepts, especially for cluster programming. You would need to write a good 10+ large-scale client/server web systems from scratch, each containing 100s of screens, to get the experience and get rid of the fright. At the end of the day it becomes apparent that the majority of today's software development frameworks are created by people who really lack experience building large systems all by themselves. Modern developers have really lost their software engineering qualifications; that's where the term "developer" came from: landscaping. Nothing needs to be developed in software; a compiler does that for you automatically. Software is all about architecture. If there is "development" work in your project, then you are not doing it the right way. There should be no repetition, only architecture; repetition belongs only in the binary files made by the compiler. Of course, there are some edge cases, like government forms and other sorts of corporate chaff, but that is not software engineering.
  • Some real numbers? - How many languages do you need to know in your project? 3? I need one. How many config file formats do you need to remember/document? 10? I need 1. How many entities like classes/functors/namespaces do you need to work with? 1000s? I only have < 100. And most importantly: no 3rd-party libs, no software/license updates, a tiny footprint and self-reliance. Let's assume that there are 200 servers to deploy your system on. How many web servers, how many job servers, how many DBs, etc.? How do you deploy changes? I press one button and monitor one screen, with the ability to drill down anywhere I want. In other words, had it not been for physical hardware that fails every now and then (and requires attention), one could use UNISTACK to build a Google-like system supported by only a few people, no matter how big it gets. That's the real practical power of UNISTACK, a unified stack of software for everything you need. It is an effect of intellectual property compression akin to ZIP: you decrease the information chaos by creating a dictionary of repeating patterns and re-using them. For that, one needs to operate on a homogeneous stream of information, say "concepts" (ZIP operates on byte streams). UNISTACK provides a stream of "unified concepts" so that our process (ZIP) can compress them.
  • But is this not what all other libs are for, getting things done for you? - They are, but it is a question of scope. Most frameworks are very narrow-minded (e.g. take Hibernate, and you get into 1000s of classes just to execute a simple SQL statement); we are talking about a framework that does everything all other common frameworks do, in the same way. Of course we do not do things like speech recognition or 10-dimensional differential equations, there is no need for that, but there are 100s of general-purpose functions that are just not unified across myriads of frameworks. For example, the moment I use some 3rd-party control, it requires log4j, but I use another logger, so now I need to support 2 completely different logging solutions in the same process. And this applies to every single function your app needs. It's like misspelling your customer's name in 10 different ways on the same page!