Saturday, November 30, 2013

NFX Instrumentation and Telemetry

NFX.Instrumentation Overview

Instrumentation is the ability to insert various level and event counters that allow for detailed runtime monitoring of the instrumented software. The Windows paradigm has a similar concept, the "PerformanceCounter"; however, Windows performance counters are OS-specific and do not scale well in hierarchical server clusters. NFX, on the other hand, provides a 100% native-code, cluster-enabled (it can run in large server farms), platform-independent concept; for example, JFX will support the same concept for applications written in JAVA.

Instrumentation is a built-in function of the NFX library. It is a service of IApplication, so any application that runs inside the NFX application container has the luxury of being instrumented. It all comes down to a single method called "Record":

    /// <summary>
    /// Stipulates instrumentation contract
    /// </summary>
    public interface IInstrumentation : ILocalizedTimeProvider 
    {
      /// <summary>
      /// Indicates whether instrumentation is enabled
      /// </summary>
      bool Enabled { get;}
      
      /// <summary>
      /// Records instrumentation datum
      /// </summary>
      void Record(Datum datum);
    }
 

Instrumentation is built around the Datum class, which provides a general abstraction for two direct sub-types: Events and Gauges. Every Datum instance has start/stop timestamps and a Count property that says how many times a measurement has been taken / an event has occurred.

 /// <summary>
 /// Base class for single measurement events (datums) reported to instrumentation
 /// </summary>
 public abstract class Datum
 {
       /// <summary>
       /// Returns UTC time stamp when event happened
       /// </summary>
       public DateTime UTCTime

       /// <summary>
       /// Returns UTC time stamp when event happened. This property may be gotten only if IsAggregated==true, otherwise UTCTime value is returned
       /// </summary>
       public DateTime UTCEndTime

       /// <summary>
       /// Indicates whether this instance represents a roll-up/aggregation of multiple events
       /// </summary>
       public bool IsAggregated

       /// <summary>
       /// Returns count of measurements. This property may be gotten only if IsAggregated==true, otherwise zero is returned
       /// </summary>
       public int Count

       /// <summary>
       /// Returns datum source. Data are rolled-up by type of recorded datum instances and source
       /// </summary>
       public string Source

       /// <summary>
       /// Returns rate of occurrence string
       /// </summary>
       public string Rate

       /// <summary>
       /// Returns description for data that this datum represents. Base implementation returns full type name of this instance
       /// </summary>
       public virtual string Description

       /// <summary>
       /// Provides access to value polymorphically
       /// </summary>
       public abstract object ValueAsObject { get;}

       /// <summary>
       /// Provides name for units that value is measured in
       /// </summary>
       public abstract string ValueUnitName { get; }
 }

Every instance of the Datum class represents either a single event/measurement or an aggregation of multiple measurements/events. The instrumentation framework gathers reported data and "reduces" it by keys into summary/aggregation objects once every X seconds. The aggregated data then gets written into an NFX.Instrumentation.InstrumentationProvider-implementing class.

An Event is a kind of Datum that does not have a value. Basically, the very fact that an event instance exists tells the system that some event happened.

In contrast to events, a Gauge is a level/meter that signifies a measurement of some volatile value and captures the measurement value in its Value property (pardon the tautology). There are LongGauge and DoubleGauge general ancestors for integer and real arithmetic. Just like an event, an instance of Gauge also has a Count property that says how many times a measurement has been performed.

The object model is classic OOP. You can design your own Datum derivatives and capture whatever data you want, e.g. complex values like lists or arrays (or real complex numbers).

Here is an example of a self-explanatory event:

    [Serializable]
    public class ServerGotOverMaxMsgSizeErrorEvent : ServerTransportErrorEvent
    {
        protected ServerGotOverMaxMsgSizeErrorEvent(string src) : base(src) {}
        
        public static void Happened(Node node)
        {
          var inst = App.Instrumentation;
          if (inst.Enabled)
           inst.Record(new ServerGotOverMaxMsgSizeErrorEvent(node.ToString())); 
        }
        
        public override string Description { get{ return "Server-side errors getting messages with sizes over limit"; }}

        protected override Datum MakeAggregateInstance()
        {
            return new ServerGotOverMaxMsgSizeErrorEvent(this.Source); 
        }
    }
And here is how it is used in Glue TCP server transport:
    
 var size = msb.ReadBEInt32();
 if (size > Binding.MaxMsgSize)
 {
   Instrumentation.ServerGotOverMaxMsgSizeErrorEvent.Happened(Node);

   // This is an unrecoverable error - close the channel!
   throw new MessageSizeException(size, Binding.MaxMsgSize, "getRequest()", closeChannel: true);
 }

Here is an example of a self-explanatory gauge:

    [Serializable]
    public class ServerBytesReceived : ServerGauge, INetInstrument
    {
        protected ServerBytesReceived(string src, long value) : base(src, value) {}

        public static void Record(Node node, long value)
        {
           var inst = App.Instrumentation;
           if (inst.Enabled)
             inst.Record(new ServerBytesReceived(node.ToString(), value)); 
        }


        public override string Description { get{ return "How many bytes server received"; }}
        public override string ValueUnitName { get{ return "bytes"; }}


        protected override Datum MakeAggregateInstance()
        {
            return new ServerBytesReceived(this.Source, 0); 
        }
    }
And here is how it is used in one place that dumps TCP transport statistics on the Glue server:
    if (m_InstrumentServerTransportStat)
    {
      Instrumentation.ServerBytesReceived.Record(node, transport.StatBytesReceived);
      Instrumentation.ServerBytesSent    .Record(node, transport.StatBytesSent);
      Instrumentation.ServerMsgReceived  .Record(node, transport.StatMsgReceived);
      Instrumentation.ServerMsgSent      .Record(node, transport.StatMsgSent);
      Instrumentation.ServerErrors       .Record(node, transport.StatErrors);
    }
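
Following the same pattern, here is a quick sketch of what an application-level gauge of your own might look like. The class and the "queue depth" metric are made up for illustration, and it assumes LongGauge exposes a (source, value) constructor like the ServerGauge-derived example above:

    [Serializable]
    public class QueueDepthGauge : LongGauge
    {
        protected QueueDepthGauge(string src, long value) : base(src, value) {}

        // call this from application code wherever the queue depth gets sampled
        public static void Record(string queueName, long depth)
        {
           var inst = App.Instrumentation;
           if (inst.Enabled)
             inst.Record(new QueueDepthGauge(queueName, depth));
        }

        public override string Description   { get{ return "How many messages are waiting in a queue"; }}
        public override string ValueUnitName { get{ return "messages"; }}

        protected override Datum MakeAggregateInstance()
        {
            return new QueueDepthGauge(this.Source, 0);
        }
    }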

Working with Instrumentation Data

NFX.Instrumentation outputs data via NFX.Instrumentation.InstrumentationProvider. Currently we have two implementing classes: LogInstrumentationProvider and TelemetryInstrumentationProvider. The second one does everything the first one does, plus it sends data over the network to an ITelemetryReceiver endpoint:

    /// <summary>
    /// Represents a contract for working with remote receiver of telemetry information
    /// </summary>
    [Glued]
    [LifeCycle(Mode = ServerInstanceMode.Singleton)]
    public interface ITelemetryReceiver
    {
        /// <summary>
        /// Sends data to remote telemetry receiver
        /// </summary>
        /// <param name="siteName">the name/identifier of the reporting site</param>
        /// <param name="data">Telemetry data</param>
        [OneWay] void Send(string siteName, Datum data);
    }

There is a tool called "TelemetryViewer" that implements the aforementioned interface. If you want to get your app instrumented, just add these lines to your config:

  instrumentation
  {
    name="Instruments"
    interval-ms="5000"

    provider
    {
      name="Telemetry Instrumentation Provider"
      type="NFX.Instrumentation.Telemetry.TelemetryInstrumentationProvider"
      use-log="true"
      receiver-node="sync://devsrv2:8300"
    }
  }
This will instruct NFX.Instrumentation to aggregate all data every 5 seconds and write it to the local log and the remote telemetry endpoint.
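
If you only need local output without sending anything over the network, presumably the same section can point at the log-only provider instead. This is a sketch; the exact type name is my assumption based on the LogInstrumentationProvider class mentioned above:

  instrumentation
  {
    name="Instruments"
    interval-ms="5000"

    provider
    {
      name="Log Instrumentation Provider"
      type="NFX.Instrumentation.LogInstrumentationProvider"
    }
  }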

Now, it is time to run the TelemetryViewer tool. If you have your instrumented app running, then you will see its events and gauges in a matter of seconds:

Here is how NFX.DataAccess.Cache output looks:

This shows how the sweeper (red) has evicted expired cache records, so the orange line (total records in cache) went down:

But some people like flower-looking charts:

I like those screen shots!

Here we can see collision count vs. cache page load factor behaviour:

And finally - the grand finale! Allocating 25,000,000 cache items and then watching them die with time:

Conclusion

NFX provides instrumentation services built-in. Unlike Windows performance counters, the NFX ones are platform-independent and work on Linux (via Mono) and interoperate with JAVA software (using JFX).

NFX instrumentation architecture is very simple and scalable in cluster environments. Aum Clusterware is built using NFX and aggregates real-time telemetry data from 1000s of live running cluster nodes.

Sunday, November 24, 2013

Working with JSON data in NFX

JSON support in NFX

There is no life in this day and age without JSON support. If you are a seasoned developer, you know better - JSON serialization is a pain. There are many options for it, but there is always something that either does not work right (e.g. dates are not serialized per the ISO standard), or is slow, or both.

As I have already mentioned here, NFX is a "Unistack" concept, meaning it has to have all vital functions natively integrated in its core. JSON support certainly qualifies as such. We cannot afford to spend time figuring out why some structure does not serialize the way Twitter or some other service expects. It has to be small and simple. It has to perform very well.

How is JSON Support Implemented in NFX

I wrote the JSON lexer and parser in less than 8 hours, and that includes Unicode escapes, intricate string escapes and much more. I also managed to write around 50 unit tests the next day. The reason I was able to write this so fast is that NFX has a nice "CodeAnalysis" concept that provides abstract support for writing language analysis tools/compilers. This certainly deserves its own blog post, but I'll just say that the low-level mechanisms of tokenization, source code management (read from buffer, from file), compiler pipeline contexts (Lexer->Parser->Semantics->Code Generator), warnings/errors, source positioning, and other things are already there, so I just needed to add JSON support.
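
For orientation, here is a minimal sketch of driving the JSON lexer directly. JSONLexer and StringSource are the same types that the tests below alias as JL and use as-is; the exact using directives are assumptions:

    // assumes the relevant NFX.CodeAnalysis using directives are in place
    var lexer = new JSONLexer(new StringSource("{a: 1, b: 'yes'} //done"));

    foreach (var token in lexer.Tokens)   // the lexer is lazy - tokens are produced on demand
      Console.WriteLine("{0} -> {1}", token.Type, token.Value);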

Here are a few JSON lexer unit tests:

    [TestCase]
        public void TokenClassifications()
        {
          var src = @"a 'string' : 12 //comment";

          var tokens = new JL(new StringSource(src)).Tokens;

          Assert.IsTrue( tokens[0].IsBOF);
          Assert.IsTrue( tokens[0].IsNonLanguage);
          Assert.IsFalse( tokens[0].IsPrimary);
          Assert.AreEqual(TokenKind.BOF, tokens[0].Kind);

          Assert.AreEqual( JSONTokenType.tIdentifier, tokens[1].Type);
          Assert.IsFalse( tokens[1].IsNonLanguage);
          Assert.IsTrue( tokens[1].IsPrimary);
          Assert.AreEqual(TokenKind.Identifier, tokens[1].Kind);

          Assert.AreEqual( JSONTokenType.tStringLiteral, tokens[2].Type);
          Assert.IsFalse( tokens[2].IsNonLanguage);
          Assert.IsTrue( tokens[2].IsPrimary);
          Assert.IsTrue( tokens[2].IsTextualLiteral);
          Assert.AreEqual(TokenKind.Literal, tokens[2].Kind);

          Assert.AreEqual( JSONTokenType.tColon, tokens[3].Type);
          Assert.IsFalse( tokens[3].IsNonLanguage);
          Assert.IsTrue( tokens[3].IsPrimary);
          Assert.IsTrue( tokens[3].IsOperator);
          Assert.AreEqual(TokenKind.Operator, tokens[3].Kind);


          Assert.AreEqual( JSONTokenType.tIntLiteral, tokens[4].Type);
          Assert.IsFalse( tokens[4].IsNonLanguage);
          Assert.IsTrue( tokens[4].IsPrimary);
          Assert.IsTrue( tokens[4].IsNumericLiteral);
          Assert.AreEqual(TokenKind.Literal, tokens[4].Kind);

          Assert.AreEqual( JSONTokenType.tComment, tokens[5].Type);
          Assert.IsFalse( tokens[5].IsNonLanguage);
          Assert.IsFalse( tokens[5].IsPrimary);
          Assert.IsTrue( tokens[5].IsComment);
          Assert.AreEqual(TokenKind.Comment, tokens[5].Kind);

        }
  
  [TestCase]
        public void BasicTokens2()
        {
          var src = @"{a: 2, b: true, c: false, d: null, e: ['a','b','c']}";

          var lxr = new JL(new StringSource(src));

          var expected = new JSONTokenType[]
          { 
           JSONTokenType.tBOF, JSONTokenType.tBraceOpen,
           JSONTokenType.tIdentifier, JSONTokenType.tColon, JSONTokenType.tIntLiteral, JSONTokenType.tComma,
           JSONTokenType.tIdentifier, JSONTokenType.tColon, JSONTokenType.tTrue, JSONTokenType.tComma,
           JSONTokenType.tIdentifier, JSONTokenType.tColon, JSONTokenType.tFalse, JSONTokenType.tComma,
           JSONTokenType.tIdentifier, JSONTokenType.tColon, JSONTokenType.tNull, JSONTokenType.tComma,
           JSONTokenType.tIdentifier, JSONTokenType.tColon, JSONTokenType.tSqBracketOpen, JSONTokenType.tStringLiteral, JSONTokenType.tComma,
                                                                                          JSONTokenType.tStringLiteral, JSONTokenType.tComma,
                                                                                          JSONTokenType.tStringLiteral,
                                                            JSONTokenType.tSqBracketClose,
           JSONTokenType.tBraceClose, JSONTokenType.tEOF};
           
          Assert.IsTrue( lxr.Select(t => t.Type).SequenceEqual(expected) );
        }
  
  [TestCase]
        public void IntLiterals()
        {
           Assert.AreEqual(12,  new JL(new StringSource(@"12")).Tokens.First(t=>t.IsPrimary).Value);
           Assert.AreEqual(2,   new JL(new StringSource(@"0b10")).Tokens.First(t=>t.IsPrimary).Value);
           Assert.AreEqual(16,  new JL(new StringSource(@"0x10")).Tokens.First(t=>t.IsPrimary).Value);
           Assert.AreEqual(8,   new JL(new StringSource(@"0o10")).Tokens.First(t=>t.IsPrimary).Value);
        }
  
  

Did you see something weird? YES, this is not JSON, this is a superset of JSON!

JSON+ is JSON on Steroids (or vodka)

NFX supports reading a superset of JSON. It naturally happened that my JSON parser was built on the NFX.CodeAnalysis namespace, so I got support for the following things for free:

  • Single line comments
  • Multiline comment blocks
  • Compiler directives
  • Verbatim strings
  • Hex,Bin,Octal prefixes in integer literals
Instead of removing those features because JSON does not support them, I decided to leave them as-is, so now I can use JSON+ (that is what I call the NFX.JSON superset) for other things. For example:

        [TestCase]
        public void ParallelDeserializationOfManyComplexObjects()
        {
            const int TOTAL = 1000000;
            var src = @"
 {FirstName: ""Oleg"",  //comments dont hurt
  'LastName': ""Ogurtsov"",
  ""Middle Name"": 'V.',
  ""Crazy\nName"": 'Shamanov',
  LuckyNumbers: [4,5,6,7,8,9], 
  /* comments
  do not break stuff */
  |* in this JSON superset *|
  History: 
  [
    #HOT_TOPIC    //ability to use directive pragmas
    {Date: '05/14/1905', What: 'Tsushima'},
    #MODERN_TOPIC
    {Date: '09/01/1939', What: 'WW2 Started', Who: ['Germany','USSR', 'USA', 'Japan', 'Italy', 'Others']}
  ] ,
  Note:
$'This note text
can span many lines
and
this \r\n is not escape'
 }
";
  var watch = Stopwatch.StartNew();
     
  System.Threading.Tasks.Parallel.For
  (0, TOTAL,
   (i)=>
   {
    var obj = src.JSONToDynamic();
    Assert.AreEqual("Oleg", obj.FirstName);
    Assert.AreEqual("Ogurtsov", obj.LastName);
    Assert.AreEqual("V.", obj["Middle Name"]);
    Assert.AreEqual("Shamanov", obj["Crazy\nName"]);
    Assert.AreEqual(6, obj.LuckyNumbers.Count);
    Assert.AreEqual(6, obj.LuckyNumbers.List.Count);
    Assert.AreEqual(7, obj.LuckyNumbers[3]);
    Assert.AreEqual("USSR", obj.History[1].Who[1]);
   }
  );

  var time = watch.ElapsedMilliseconds;
  Console.WriteLine("Long JSON->dynamic deserialization test took {0}ms for {1} objects @ {2}op/sec"
        .Args(time, TOTAL, TOTAL / (time / 1000d))
        );
        }
 

This approach is 100% compatible with "regular" JSON, as "regular" JSON does not have comments and verbatim strings. The only "dangling" feature is the compiler pragmas that I left there - they are just ignored for now and we may use them for something else in the future. The bottom line is that I spent <20 hours writing the lexer, the parser and around 90 unit tests in total. The tests are hand-written and cover many edge cases like a comment within a string or a string inside a comment, Unicode escapes, etc.

JSON Pattern Matching

I guess you already figured that we do know about the benefits of programming in functional languages (such as Erlang) and message-oriented systems. Since our JSON support is based on NFX.CodeAnalysis, we have a feature right out of the box - pattern matching. Pattern matching is really cool because we can use it to quickly filter/reject JSON messages that we do/do not need. Let's look at the code:

    // match a person - a message with a first name
 public class JSONPersonMatchAttribute : JSONPatternMatchAttribute
 {
  public override bool Match(NFX.CodeAnalysis.JSON.JSONLexer content)
  {
    return content.LazyFSM(
     (s,t) => s.LoopUntilMatch(
      (ss, tk) => tk.LoopUntilAny("First-Name","FirstName","first_name"),
      (ss, tk) => tk.IsAnyOrAbort(JSONTokenType.tColon),
      (ss, tk) => tk.IsAnyOrAbort(JSONTokenType.tStringLiteral),
      (ss, tk) => FSMI.TakeAndComplete
            ),
     (s,t) => FSMI.Take
    ) != null;  
  }
 }
   
And now, we can quickly filter by doing this:
        [JSONPersonMatch] //<---- OUR MATCHER!
        [TestCase]
        public void JSONPatternMatchAttribute3()
        {
          var src = @"{ code: 1121982, color: red, 'first_name': 'Alex', DOB: null}";
          var lxr = new JL(new StringSource(src));
          var match = JSONPatternMatchAttribute.Check(MethodBase.GetCurrentMethod(), lxr);          
          Assert.IsTrue( match );
        }
 

The filter statement above is an example of an imperative filter. It is a Finite State Machine that gets fed from the lexically-analyzed JSON stream. What makes it very fast is the fact that the JSON lexer is a lazy one - it parses input only when the parser asks for the next token. Suppose we need to parse a message that has 64 kbytes of JSON content. Why would the lexer need to parse all 64 kbytes if our particular business code can only process a JSON message that has a certain structure? So, the way it is implemented now, as soon as the pattern match fails there is no need to keep parsing to the end. Again, this is not a JSON-specific concept in NFX, but rather a general NFX.CodeAnalysis concept that applies to other parsers (C#, Laconic, RelationalSchema, etc.)

One more thing about parsing: as you may have noticed, I used a pattern attribute on a method declaration. This is purposely done for message-processing cases (e.g. web MVC applications), where a method signature may take some JSON data as input and the pattern-match attribute will guard the method this way. Sounds like Erlang or Delphi messages, or better yet ObjC, to anyone?

As far as pattern matching is concerned, people ask me "why do you not use regular expressions?". Simple - because we use a similar approach with an FSM (Finite State Machine), but we analyze tokens, not characters, so our matches are 100% correct in terms of language grammar, whereas RegExp has no clue about tokens as it works on strings. The feature that we do want to add in the future, though, is the ability to write pattern matches not only imperatively but also in a regexp style, and we do have a reservation of special matching terminal symbols in the language. We just have not had time to implement it yet.

Reading JSON data

The best way to describe our features is to show some code:

 [TestCase]
 public void ReadSimple2()//using DYNAMIC
 {
  var obj = "{a: -2, b: true, c: false, d: 'hello'}".JSONToDynamic();

  Assert.AreEqual(-2, obj.a);
  Assert.AreEqual(true, obj.b);
  Assert.AreEqual(false, obj.c);
  Assert.AreEqual("hello", obj.d);
 }
 
 [TestCase]
 public void ReadSimpleNameWithSpace()//using DYNAMIC
 {
  var obj = @"{a: -2, 'b or \'': 'yes, ok', c: false, d: 'hello'}".JSONToDynamic();

  Assert.AreEqual(-2, obj.a);
  Assert.AreEqual("yes, ok", obj["b or '"]);
  Assert.AreEqual(false, obj.c);
  Assert.AreEqual("hello", obj.d);
 }
 
 [TestCase]
 public void RootObject()//using JSONData
 {
   var src = @"{a: 1, b: true, c: null}";

   var parser = new JP(  new JL( new StringSource(src) )  );

   parser.Parse();

   Assert.IsInstanceOf(typeof(JSONDataMap), parser.ResultContext.ResultObject);
   var obj = (JSONDataMap)parser.ResultContext.ResultObject;

   Assert.AreEqual(3, obj.Count);
   Assert.AreEqual(1, obj["a"]);
   Assert.AreEqual(true, obj["b"]);
   Assert.AreEqual(null, obj["c"]);
 }
 

In the code above I read JSON content into dynamic and into a JSONData hashtable. And now, the shocking confession: NFX.JSON does not support reading JSON into arbitrary CLR classes! Why? Because it is not needed, and it is impossible to implement correctly, as JSON is a total "impedance" mismatch for CLR complex types. I do know about Newtonsoft etc., but I have never had a need to deserialize JSON into a type-safe structure because either a) you need to code that by hand anyway, or b) your CLR structure must be dumb-simple in order to map to JSON 1:1. NFX design does not endorse the creation of garbage DTOs (data transfer objects) just for the purpose of being able to read from JSON. Dynamic languages are far superior for these purposes, so I decided NOT to implement JSON deserialization into custom CLR types. Think about it, and you will agree that working with the "dynamic" keyword is far more convenient than creating hundreds of junk classes.
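
To illustrate the "code it by hand" point, here is a minimal sketch of mapping the dynamic read into a typed object; the Person class is hypothetical:

 internal class Person
 {
  public string FirstName { get; set; }
  public string LastName  { get; set; }
 }

 // read JSON into dynamic, then hand-pick the couple of fields you actually need
 dynamic d = "{FirstName: 'Alex', LastName: 'Kozloff', Junk: [1,2,3]}".JSONToDynamic();
 var person = new Person { FirstName = d.FirstName, LastName = d.LastName };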

Writing JSON data

Writing is a whole different story. The JSONWriter class serializes any CLR complex type, IEnumerable or IDictionary into JSON:

   [TestCase]
 public void RootDictionary_object()
 {
  var dict = new Dictionary<object, object>{ {"name", "Lenin"}, 
                                             {"in space", true},
               {1905, true},
               {1917, true},
               {1961, false},
               {"Bank", null} };
  var json = JW.Write(dict);
  Console.WriteLine(json);
  Assert.AreEqual("{\"name\":\"Lenin\",\"in space\":true,\"1905\":true,\"1917\":true,\"1961\":false,\"Bank\":null}",  json);
 }
A more complex case:
    [TestCase]
 public void RootListOfDictionaries_object_SpaceSymbols()
 {
  var lst = new List<object>
       {
       12,
       16,
       new Dictionary<object, object>{ {"name", "Lenin"}, {"in space", true}},
       new Dictionary<object, object>{ {"name", "Solovei"}, {"in space", false}},
       true,
       true,
       -1789,
       new Dictionary<object, object>{ {"name", "Dodik"}, {"in space", false}}
       };
  var json = JW.Write(lst, new JSONWritingOptions{SpaceSymbols=true});
  Console.WriteLine(json);
  Assert.AreEqual("[12, 16, {\"name\": \"Lenin\", \"in space\": true}, {\"name\": \"Solovei\", \"in space\": false}, true, true, -1789, {\"name\": \"Dodik\", \"in space\": false}]", json);
 }
  
 [TestCase]
 public void RootDictionaryWithLists_object()
 {
  var lst = new Dictionary<object, object>
       {
      {"Important", true},
      {"Patient", new Dictionary<string, object>{{"LastName", "Kozloff"},
                {"FirstName","Alexander"}, 
                {"Occupation","Idiot"}}},
      {"Salaries", new List<object>{30000, 78000,125000, 4000000}},
      {"Cars", new List<object>{"Buick", "Ferrari", "Lada", new Dictionary<string,object>{ {"Make","Zaporozhets"}, {"Model", "Gorbatiy"}, {"Year", 1971}  }    }},

       };
  var json = JW.Write(lst, JSONWritingOptions.PrettyPrint);
  Console.WriteLine(json);

        var expected=
@"
{
  ""Important"": true, 
  ""Patient"": 
    {
      ""LastName"": ""Kozloff"", 
      ""FirstName"": ""Alexander"", 
      ""Occupation"": ""Idiot""
    }, 
  ""Salaries"": [30000, 78000, 125000, 4000000], 
  ""Cars"": [""Buick"", ""Ferrari"", ""Lada"", 
      {
        ""Make"": ""Zaporozhets"", 
        ""Model"": ""Gorbatiy"", 
        ""Year"": 1971
      }]
}";
            Assert.AreEqual(expected, json);
        }

And now, from JSONDynamicObject:

   [TestCase]
 public void Dynamic1()
 {
  dynamic dob = new JDO(NFX.Serialization.JSON.JSONDynamicObjectKind.Map);

  dob.FirstName = "Serge";
  dob.LastName = "Rachmaninoff";
  dob["Middle Name"] = "V";

  var json = JW.Write(dob);

  Console.WriteLine(json);

  Assert.AreEqual("{\"FirstName\":\"Serge\",\"LastName\":\"Rachmaninoff\",\"Middle Name\":\"V\"}", json);
 }
How about a full loop write->JSON->read :
   [TestCase]
 public void Dynamic3_WriteRead()
 {
  dynamic dob = new JDO(NFX.Serialization.JSON.JSONDynamicObjectKind.Map);
  dob.FirstName = "Al";
  dob.LastName = "Kutz";
  dob.Autos = new List<string>{"Buick", "Chevy", "Mazda", "Oka"};

  string json = JW.Write(dob);

  var dob2 = json.JSONToDynamic();
  Assert.AreEqual(dob2.FirstName, dob.FirstName);
  Assert.AreEqual(dob2.LastName, dob.LastName);
  Assert.AreEqual(dob2.Autos.Count, dob.Autos.Count);
 }
And some crazy Unicode content, notice the option to write JSON using ASCII-only:
 [TestCase]
 public void StringEscapes_2_ASCII_NON_ASCII_Targets()
 {
  var lst = new List<object>{ "Hello\n\rDolly!", "Главное за сутки"};
  var json = JW.Write(lst, JSONWritingOptions.CompactASCII );//ASCII-only
  Console.WriteLine(json);
  Assert.AreEqual("[\"Hello\\n\\rDolly!\",\"\\u0413\\u043b\\u0430\\u0432\\u043d\\u043e\\u0435 \\u0437\\u0430 \\u0441\\u0443\\u0442\\u043a\\u0438\"]", json);
  json = JW.Write(lst, JSONWritingOptions.Compact );
  Console.WriteLine(json);
  Assert.AreEqual("[\"Hello\\n\\rDolly!\",\"Главное за сутки\"]", json);
 } 
How about anonymous classes? Here ya go:
   [TestCase]
 public void RootAnonymousClass_withArrayandSubClass()
 {
  var data = new {Name="Kuklachev", Age=99, IsGood= new object []{ 1, new {Meduza="Gargona", Salary=123m},true}}; 
  var json = JW.Write(data);
  Console.WriteLine(json);
  Assert.AreEqual("{\"Name\":\"Kuklachev\",\"Age\":99,\"IsGood\":[1,{\"Meduza\":\"Gargona\",\"Salary\":123},true]}", json);
 }
And a "regular" .NET CLR POCO class:
 internal class ClassWithAutoPropFields
 {
  public string Name{ get; set;}
  public int Age{ get; set;}
 }
 [TestCase]
 public void RootAutoPropFields()
 {
  var data = new ClassWithAutoPropFields{Name="Kuklachev", Age=99}; 

  var json = JW.Write(data);
  Console.WriteLine(json);
  Assert.AreEqual("{\"Name\":\"Kuklachev\",\"Age\":99}", json);
 }
And a few more cool features, mostly for performance and portability:
 /// <summary>
 /// Denotes a CLR type-safe entity (class or struct) that can directly write itself as JSON content string. 
 /// This mechanism bypasses all of the reflection/dynamic code.
 /// This approach may be far more performant for some classes that need to serialize their state/data in JSON format, 
 /// than relying on general-purpose JSON serializer that can serialize any type but is slower
 /// </summary>
 public interface IJSONWritable
 {
  /// <summary>
  /// Writes entity's data/state as JSON string
  /// </summary>
  /// <param name="wri">TextWriter to write JSON content into</param>
  /// <param name="nestingLevel">
  /// A level of nesting that this instance is at, relative to the graph root.
  /// Implementations may elect to use this parameter to control indenting or ignore it
  /// </param>
  /// <param name="options">
  /// Writing options, such as indenting.
  /// Implementations may elect to use this parameter to control text output or ignore it
  /// </param>
  void WriteAsJSON(TextWriter wri, int nestingLevel, JSONWritingOptions options = null);
 } 
And here is the place where it is used:
    /// <summary>
    /// Provides base for rowset implementation. 
    /// Rowsets are mutable lists of rows where all rows must have the same schema, however a rowset may contain a mix of
    ///  dynamic and typed rows as long as they have the same schema.
    /// Rowsets are not thread-safe
    /// </summary>
    [Serializable]
    public abstract class RowsetBase : IList, IComparer, IJSONWritable
    {
    .........
      /// <summary>
      /// Writes rowset as JSON including schema information. 
      /// Do not call this method directly, instead call rowset.ToJSON() or use JSONWriter class
      /// </summary>
      public void WriteAsJSON(System.IO.TextWriter wri, int nestingLevel, JSONWritingOptions options = null)
      {
        var tp = GetType();

        var map = new Dictionary<string, object>
        {
          {"Instance", m_InstanceGUID.ToString("D")},
          {"Type", tp.FullName},
          {"IsTable", typeof(Table).IsAssignableFrom( tp )},
          {"Schema", m_Schema},
          {"Rows", m_List}
        };
        JSONWriter.WriteMap(wri, map, nestingLevel, options);
      }
    }
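
In the same spirit, a user class can implement IJSONWritable itself to skip reflection entirely. Here is a sketch - the Temperature class is made up, and it leans on JSONWriter.WriteMap exactly like RowsetBase above:

 public class Temperature : IJSONWritable
 {
  public string Station { get; set; }
  public double Celsius { get; set; }

  public void WriteAsJSON(System.IO.TextWriter wri, int nestingLevel, JSONWritingOptions options = null)
  {
   // a hand-rolled map keeps serialization free of reflection/dynamic code
   var map = new Dictionary<string, object>
   {
     {"Station", Station},
     {"Celsius", Celsius}
   };
   JSONWriter.WriteMap(wri, map, nestingLevel, options);
  }
 }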

Performance/Benchmarks

And finally, some numbers. Let's compare NFX.Serialization.JSON with the MS-provided stuff. The tests reside in NFX.NUnit.Integration for now:

 ***** NFX.NUnit.Integration.Serialization.JSON.Benchmark_Serialize_DataObjectClass()
Serialize.DataObjectClass
    NFX: 15290.5198776758 op/sec 
    MS JSser: 3777.86173026067 op/sec
    MS DataContractSer: 8920.60660124888 op/sec
    Ratio NFX/JS: 4.04740061162079
    Ratio NFX/DC: 1.71406727828746 
     
***** NFX.NUnit.Integration.Serialization.JSON.Benchmark_Serialize_DictionaryPrimitive()
Serialize.DictionaryPrimitive
    NFX: 303030.303030303 op/sec 
    MS JSser: 270270.27027027 op/sec
    MS DataContractSer: 45248.8687782805 op/sec
    Ratio NFX/JS: 1.12121212121212
    Ratio NFX/DC: 6.6969696969697 
     
***** NFX.NUnit.Integration.Serialization.JSON.Benchmark_Serialize_ListObjects()
DataContractJSONSerializer does not support this test case: [System.Runtime.Serialization.SerializationException] 
Type 'NFX.NUnit.Integration.Serialization.APerson' with data contract name 'APerson:http://schemas.datacontract.org/2004/07/NFX.NUnit.Integration.Serialization'
 is not expected. Consider using a DataContractResolver or add any types not known statically to the list of known types - for example, by using the KnownTypeAttribute attribute or by adding them to the list of known types passed to DataContractSerializer.
Serialize.ListObjects
    NFX: 71942.4460431655 op/sec 
    MS JSser: 16366.612111293 op/sec
    MS DataContractSer: 1.0842021724855E-12 op/sec
    Ratio NFX/JS: 4.39568345323741
    Ratio NFX/DC: N/A 
     
***** NFX.NUnit.Integration.Serialization.JSON.Benchmark_Serialize_ListPrimitive()
Serialize.ListPrimitive
    NFX: 344827.586206897 op/sec 
    MS JSser: 370370.37037037 op/sec
    MS DataContractSer: 114942.528735632 op/sec
    Ratio NFX/JS: 0.931034482758621
    Ratio NFX/DC: 3 
     
***** NFX.NUnit.Integration.Serialization.JSON.Benchmark_Serialize_PersonClass()
Serialize.PersonClass
    NFX: 285714.285714286 op/sec 
    MS JSser: 52631.5789473684 op/sec
    MS DataContractSer: 277777.777777778 op/sec
    Ratio NFX/JS: 5.42857142857143
    Ratio NFX/DC: 1.02857142857143 
 
 
What we see here is that the NFX JSON code really beats both the Microsoft JavaScript and DataContract serializers, and frankly I have not had a chance yet to optimize the JSON lexer in NFX. I guess I can squeeze another good 25% speed boost if I revise string parsing, but that is not important now.

Conclusion

NFX provides rich support for working with the JSON data format. The functionality is built on top of NFX.CodeAnalysis, which unifies and simplifies the construction of lexers and parsers and, as a benefit, allows us to filter/pattern match against JSON data without reading in the whole JSON content. The library is well tested against edge cases like Unicode escapes and ISO dates, and it also supports reading of the JSON+ superset, which understands comments and hex/bin/octal numeric prefixes that make the format very well suited for config files. The library writes standard JSON with extensive ability to serialize IEnumerable<>, IDictionary<,> and POCO classes, specifying indentation, ASCII vs. Unicode and ISO date options.

Friday, November 15, 2013

Aum Cluster - Application Remote Terminals

What is it?

Aum Clusterware supports a concept of application remote terminals. They behave much like a regular command prompt; however, the commands are application-level - they operate not on OS objects but rather on application objects. Here is a typical command: "gc" - it performs a garbage collection inside your server process.

The way it works is really simple - take a look at the following contract:

    /// <summary>
    /// Represents a contract for working with remote entities using terminal/command approach
    /// </summary>
    [Glued]
    [AuthenticationSupport]
    [RemoteTerminalOperatorPermission]
    [LifeCycle(ServerInstanceMode.Stateful, SysConsts.REMOTE_TERMINAL_TIMEOUT_MS)]
    public interface IRemoteTerminal
    {
        [Constructor]
        RemoteTerminalInfo Connect(string who);

        string Execute(string command);

        [Destructor]
        string Disconnect();
    }
This is a typical NFX.Glue stateful entity that serves remote console sessions; it is hosted on the server that you want to log into and gets activated by the NFX.Glue runtime. All we need to do is add the following lines to the configuration file of the host process:

 glue
 {
  servers
  {
   server {name="TerminalAsync" 
           node="async://*:7700"
           contract-servers="ahgov.HGovRemoteTerminal, ahgov"}
  }
 } 
This exposes the HGovRemoteTerminal class as a server using the "async" binding on all network interfaces on TCP port 7700. This class is the one that implements the aforementioned IRemoteTerminal contract. Now we can consume the service remotely!

How do I work with/manage my remote application?

Since our app remote terminal architecture relies on NFX.Glue, we can simply connect to a remote terminal using the Aum.Cluster.Clients.RemoteTerminal class, which resides in the Aum.Cluster.dll assembly - but why would you want to do this? The whole point of this approach is to be able to work with application containers remotely without any coding - that is what people use command prompts for, or maybe even graphical management tools that send commands in the background.
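
If you ever did want to script it, a programmatic session against the IRemoteTerminal contract above might look roughly like this. This is only a sketch: the constructor signature, the node string and the IDisposable usage are my assumptions about the Glue-generated client proxy:

 // hypothetical programmatic session via the Glue client proxy
 using (var terminal = new Aum.Cluster.Clients.RemoteTerminal("async://myserver:7700"))
 {
   var info = terminal.Connect("admin@workstation");  // [Constructor] call - activates a stateful server instance
   Console.WriteLine(terminal.Execute("ver"));        // runs the VER command-let shown below
   Console.WriteLine(terminal.Disconnect());          // [Destructor] call - deallocates the server instance
 }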

So we have created a tool - "ascon" which stands for "aum server console". Here is a nice screenshot:

The app runs in verbose or silent modes; here I have typed in a "help" command:
You can also specify the command text right from the original command line and use the "silent" switch to suppress all logos and info messages. In this example we get information about the remote server build set: dates, computer names and who built the software that runs on the server:

Command-Lets

We chose a flexible approach to creating server handlers that understand those little commands that you enter via the ASCON tool. We use reflection to dynamically discover Cmdlet class implementers, and that is how commands get dispatched and provide help. Let's look at the VER command-let:
    public class Ver : Cmdlet
    {
        public Ver(AppRemoteTerminal terminal, IConfigSectionNode args) : base(terminal, args) {}
       
        public override string Execute()
        {
            var result = new StringBuilder(0xff);
            result.AppendLine("Server Version/Build information:");
            result.AppendLine(" App:     " + App.Name);
            result.AppendLine(" NFX:     " + BuildInformation.ForFramework);
            result.AppendLine(" Cluster: " + new BuildInformation( typeof(Aum.Cluster.Clusterware).Assembly ));
            result.AppendLine(" Host:    " + new BuildInformation( Assembly.GetEntryAssembly() ));

            return result.ToString();
        }

        public override string GetHelp()
        {
            return "Returns version/build information";
        }
    }
Enough said! You've got the idea!

Remote Management - IO Redirection

Here is another nice command-let, "run". I want to ping my host "SEXTOD" from my server; notice how I forgot to specify the args and "PING" responded with its help text into our server process, which redirected the IO into this ASCON session using NFX.Glue:

Remote Management - Load Testing and Performance

If you are not convinced that this is cool, here are a few more commands for you: "toil" and "perf".

"TOIL" - is used to load server with garbage work, you can specify options - whether you want CPU only, or RAM only, or both. Take a look:
And here is the result on the server computer:
This is really useful for load testing - stress the poor server out until it cries for medical help! But wait! I am not at the server! How can I see my CPU and RAM spikes? No need to worry about things like physical locality (since we are not fully enlightened beings yet and must crawl in those damn bodies of ours that depend on stupid physics!) - we have included another interesting command-let, "PERF":
I was "toiling" the server from another console session; the RAM graph got really high, then I ran "GC" and RAM went down!

Conclusion

What is a remote application console? It is a simple feature without which one cannot expect to build a reliable cluster-enabled system spanning thousands of machines. Yes, there are SSH and other goodies, but they are NOT APPLICATION level, they are OS level - and that is where we hugely differ from the rest in our approach: our whole framework is cloud-aware, so services like application containers and other components are built for the cloud from day one. The Remote Application Console is a living testimony to this.

Tuesday, October 8, 2013

NFX: Native Interoperability of .NET with Erlang

One of the components included in the NFX library under the NFX.Erlang namespace is a set of classes that represent Erlang language types, plus the connectivity transport that allows starting a distributed Erlang node in a running .NET process and performing demultiplexed communication with other Erlang nodes on the network. This includes the ability to redirect I/O, perform RPC calls, etc.

The component can be logically broken down into two parts:

  • Erlang types
  • Erlang distributed connectivity and message passing

The following basic .NET types (in the NFX.Erlang namespace) map to the corresponding types in the Erlang language; they all implement the NFX.Erlang.IErlObject interface:

  • ErlAtom
  • ErlBinary
  • ErlBoolean
  • ErlByte
  • ErlDouble
  • ErlList
  • ErlLong
  • ErlPid
  • ErlPort
  • ErlRef
  • ErlString
  • ErlTuple
  • ErlVar

Most of these types are structs - i.e. they are merely wrappers around the corresponding native types that carry no additional memory or performance overhead. These types are instantiated in an intuitive manner:

var n = new ErlLong(1000);
var a = new ErlAtom("abc");
var l = new ErlList(n, a, 100, 10.5, "str");
var t = new ErlTuple(n, a, l);

Most of the Erl native types support implicit casting:

ErlLong   n1 = 1000;
ErlAtom   a1 = "abc";
ErlString s1 = "efg";
ErlDouble d1 = 10.128;
ErlByte   b1 = 10;

int       n2 = n1;
string    a2 = a1;
string    s2 = s1;
double    d2 = d1;
byte      b2 = b1;

There are string extension methods that allow parsing strings into Erlang terms:

IErlObject t0 = "{ok, [{a, 10}, {b, good}, {c, 2.0}]}".ToErlObject();
IErlObject t1 = "{ok, [{a, ~i}, {b, ~w},   {c, ~f}]}".ToErlObject(10, new ErlAtom("good"), 2.0);
ErlList    l0 = "[a,  1, c]".To<ErlList>();
ErlList    l1 = "[a, ~w, c]".To<ErlList>(1);

What can you do with Erlang objects (terms)?

The most useful thing you can do with Erlang types is pattern matching.

The basic idea behind pattern matching is that you can overlay a pattern over an Erlang object, such that the pattern can extract values of sub-objects and bind them to variables. In order to familiarize yourself with pattern matching, we'll introduce another .NET type called ErlVarBind. It is actually a dictionary mapping variable names to IErlObject instances.

In order to illustrate the execution of the code snippets below, we can use the LINQPad program. Once you install it, open it in the "C# Statement(s)" language mode, and add the reference to NFX.dll by right-clicking the query section, going to "Query Properties", and adding NFX.dll to the list of "Additional References", and "NFX.Erlang" to the tab of "Additional Namespace Imports". After having done that, type the following code in the query window, and press "F5" to execute:

var V = new ErlVar("V");           // Create a variable named "V"
var p = "{ok, ~w}".ToErlObject(V); // V is stored as a variable that can be bound
var t = "{ok, 123}".ToErlObject(); // Erlang term to match

ErlVarBind b = t.Match(p);         // Match a term against the pattern
                                   // ErlVarBind is a dictionary of bound variables
if (b != null)
    Console.WriteLine("Value of variable {0} = {1}", V.Name, b[V].ValueAsInt);

When a match is not successful, the IErlObject.Match() call returns null. Each Erlang object has a set of properties to retrieve its .NET native value. These properties are called ValueAs{Type}, where {Type} is a .NET-specific type, such as Int, Double, Decimal, etc.
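
For example, here is a tiny sketch of these conversion properties (ValueAsInt and ValueAsDateTime appear elsewhere in this post; the other ValueAs{Type} member names are assumed from the naming convention):

var d = new ErlDouble(10.5);
var n = new ErlLong(42);

Console.WriteLine(d.ValueAsDouble);   // 10.5
Console.WriteLine(n.ValueAsInt);      // 42
Console.WriteLine(n.ValueAsDecimal);  // 42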

Erlang terms can be serialized into Erlang External Binary format using NFX.Erlang.ErlOutputStream and NFX.Erlang.ErlInputStream:

var x = "{ok, [{a, 10}]}".ToErlObject();
var s = new ErlOutputStream(x);
Console.WriteLine(s.ToBinaryString());
// Output:  <<131,104,2,100,0,2,111,107,108,0,0,0,1,104,2,100,0,1,97,97,10,106>>

Analogously we can deserialize the binary representation back into the corresponding Erlang object:

var i = new ErlInputStream(new byte[] {131,104,2,100,0,2,111,107,108,0,0,0,1,104,2,100,0,1,97,97,10,106});
Console.WriteLine(i.Read().ToString());
// Output:  {ok,[{a,10}]}

Distributed Erlang: working with remote nodes

In order to illustrate how we can connect a .NET program to an Erlang node, let's fire off an Erlang shell and give it a security cookie 'hahaha' to be used for inter-node authentication:

$ erl -sname r -setcookie hahaha
(r@pipit)1>

Let's try to connect to this Erlang node and send a message from .NET to Erlang. In order to accomplish this we'll register a named mailbox in the Erlang shell, called "me", and start waiting for an incoming message:

(r@pipit)1> register(me, self()).
true
(r@pipit)2> f(M), receive M -> io:format("Got message: ~p\n", [M]) end.

Now, we can send a message from .NET to this process on the Erlang node:

var n = new ErlLocalNode("abc", new ErlAtom("hahaha"));
n.AcceptConnections = false;   // Don't accept incoming connections
n.Start();

var m = n.CreateMbox("test");
n.Send(m.Self, remoteNode: "r@pipit", toName: "me", new ErlString("Hello!"));

What we've done here is create a local .NET Erlang node called "abc" using the same authentication cookie 'hahaha', and start it. We instruct the node not to register with the local port mapping daemon (epmd process), so that other nodes cannot connect to it by name. Then we created a named mailbox "test". We used this mailbox in order to send messages (sending messages to remote named processes (mailboxes) requires having the PID of the sender (m.Self)), make an RPC call, and capture the result.

After executing the code above, the Erlang shell prints:

Got message: "Hello!"
ok

Let's try to obtain the UTC time from the Erlang node by making an RPC call from .NET to the erlang:now() function:

var r = m.RPC("r@pipit", "erlang", "now", new ErlList());
Console.WriteLine("Remote time: {0}", r.ValueAsDateTime.ToString());

We used the previously registered mailbox "test" in order to make an RPC call and capture the result. Once the call returned, we output the content to the console. .NET outputs:

Remote time: 10/9/2013 3:29:47 PM

Along with the synchronous RPC, it is possible to do asynchronous calls. Let's illustrate by example (we reuse the variables from the preceding example):

// The following call is non-blocking - it sends an RPC message
// and returns immediately. Note that ErlList.Empty is analogous to
// "new ErlList()"
m.AsyncRPC("r@pipit", "erlang", "now", ErlList.Empty);

// WaitAny call can take several mailboxes and it returns the index
// of the first mailbox, whose queue has some messages waiting
int i = n.WaitAny(m);

if (i < 0)
{
    Console.WriteLine("Timeout waiting for RPC result");
    goto exit;
}

// This call fetches an RPC result from the mailbox
r = m.ReceiveRPC();

Console.WriteLine(
    "AsyncRPC call to erlang:now() resulted in response: {0}",
    r.ValueAsDateTime.ToLocalTime());

Let's beef this up a little by writing a loop that will pattern match all messages received in our "test" mailbox and print them out to the console. When we receive the atom 'stop', we exit the loop:

bool active = true;
var matcher =
    new ErlPatternMatcher {
        {"stop", (p, t, b, _args) => { active = false; return null; } },
        {"Msg",  (p, t, b, _args) => { Console.WriteLine(b["Msg"].ToString()); return null; } },
    };

while (active) {
    m.ReceiveMatch(matcher);
}

Console.WriteLine("Done!");

Here we introduced another class, ErlPatternMatcher, that takes an array of actions, where the first item is the pattern to match the incoming message against, and the second item is a lambda function receiving four arguments: the matched pattern, the Erlang object that was matched against the pattern, the ErlVarBind map containing matched/bound variables from the pattern, and finally _args - the array of optional parameters that can be passed to the ErlMbox.ReceiveMatch() call.

Let's test this in the Erlang shell:

(r@pipit)1> {test, ab@pipit} ! "Hello".

    LINQPad's output panel prints: "Hello"
(r@pipit)2> {test, ab@pipit} ! {ok, [1,2,{data, [{a, 12}]}]}.

    LINQPad's output panel prints: {ok,[1,2,{data,[{a,12}]}]}
(r@pipit)3> {test, ab@pipit} ! stop.

    LINQPad's output panel prints: Done!

The example above illustrated RPC calls from .NET to Erlang. However, it is also possible to do the reverse. In the current implementation an ErlLocalNode starts a dispatching thread per connection that dispatches incoming Erlang messages to the corresponding mailboxes. Messages sent to non-existing mailboxes get silently dropped. One of the internal registered mailboxes created on the node's startup is the RPC mailbox called "rex". There's a simple Erlang RPC protocol that the .NET implementation supports, which makes it possible for Erlang to invoke static member functions in .NET. Here we illustrate a call from the Erlang shell into the .NET node in order to obtain the local time from .NET:

(r@pipit)3> f(Time), {ok, Time} = rpc:call(ab@pipit, 'System.DateTime', 'UtcNow', []), calendar:now_to_local_time(Time).
{{2013,10,8},{1,24,26}}

Having illustrated RPC, we now show how to do I/O redirection. Suppose you do an RPC call from .NET to Erlang, and you want to make sure that all output printed by that call via io:format() and similar functions is sent back to .NET. This is accomplished by the fact that the .NET node runs another server thread that polls for data in a special mailbox registered under the name "user". This mailbox is also accessible via ErlLocalNode.GroupLeader. All RPC calls pass that mailbox information by default, so that remote nodes can deliver the output there. Let's illustrate:

var n = new ErlLocalNode("d") {
    OnIoOutput = (_encoding, output) =>
        Console.WriteLine("<I/O output>  ==> Received output: {0}", output)
};
 
n.Start();
var m = n.CreateMbox("test");
var c = n.Connection("r@pipit");
var r = m.RPC(c.RemoteNode.NodeName, "io", "format", new ErlList("Hello world!"));
Console.WriteLine("Result: {0}", r.ToString());

When we execute this code, here's what gets printed:

<I/O output>  ==> Received output: "Hello world!"
Result: ok

Runtime configuration

The Erlang node implementation supports the powerful NFX framework configuration concept. In order to auto-configure a .NET application to start an Erlang node at application startup, we need to define a starter configuration section and provide the node startup details:

nfx
{
    starters {
        starter{ name="Erlang" type="NFX.Erlang.ErlApp" }
    }
  
    erlang
    {
        cookie="hahaha"

        node="me" {
            trace="wire"
            accept=true
            address="localhost" // address="127.0.0.1:1234"
            tcp-no-delay=true
            tcp-rcv-buf-size=4096
            tcp-snd-buf-size=4096
        } 

        node="r@localhost" {
            tcp-no-delay=false
            tcp-rcv-buf-size=100000
            tcp-snd-buf-size=100000
        }
    }   
}

The "starters" section in NFX contains a list of static starter types that implement IApplicationStarter behavior. For Erlang node that is implemented by the "NFX.Erlang.ErlApp" type.

Next we define the "erlang" section that tells to use cookie "hahaha" for connecting to distributed Erlang nodes. It also defines a node "me" to be used as the local node name (since it doesn't have the "@hostname" suffix), which will be accessible through a static singleton variable ErlApp.Node. Also at startup the local node "me" will connect to remote node "r@localhost", whose connection configuration details are customized under 'node="r@localhost'.

The local node "me" will register with EPMD daemon, and accept incoming connections from other nodes ("accept=true"), it will enable debug tracing to print wire-level messages ("trace=wire") if ErlApp.Node.OnTrace event has been set in the application code.

The NFX application startup code and a simplistic sample implementation that prints out messages received by the "test" mailbox then look something like this:

static void Main(string[] args)
{
    Configuration argsConfig = new CommandArgsConfiguration(args);

    using(new ServiceBaseApplication(args, null))
        run();
}

static void run()
{
    var mbox = ErlApp.Node.CreateMbox("test");

    while (App.Active)
    {
        var result = mbox.Receive(1000);
        if (result != null)
            Console.WriteLine("Mailbox {0} got message: {1}", mbox.Self, result);
    }
}

Conclusion

The NFX implementation of Erlang terms and the distributed transport provides a rich set of primitives needed to communicate with applications running Erlang nodes very efficiently, with minimal memory and processing overhead.

The implementation takes advantage of modern C# concepts that include Linq, Enumerators, and other paradigms that make writing distributed systems a very pleasant experience.

NFX Erlang is a complete rewrite of its predecessor, otp.net. The current version eliminates all deficiencies of the former, inherited from the initial auto-conversion of the corresponding Java code, contains a much cleaner and simpler conceptual model, and gives a .NET programmer a very powerful tool for exploring Erlang interoperability.

Wednesday, September 18, 2013

NFX Online Documentation Released

A Beta version of the documentation for the .NET Framework Extension Project has been released here: NFX Library Documentation.

The name "NFX" stands for ".NET Framework Extension", there is also "JFX" and "*FX" in works. All of them form a "Unistack". "Unistack" = "Unified Software Stack", a conceptually-monolithic library that facilitates whatever service/facilities developers need to create scale-able, solid business/data-driven application systems (not just applications). As such, the unistack support many facilities that traditionally are scattered among myriads of disjoint software frameworks.

NFX is an application-system development framework that addresses many concerns in a UNIFIED (as in the "same pattern") way. Oh, and one more thing. NFX does not use any 3rd party libraries for the services it provides, except for things like database drivers (e.g. the MongoDB driver). So, NFX is a self-contained library (but for DB drivers), less than 2 MB in size compiled, that provides these functions:

NFX Library Documentation.

Wednesday, August 21, 2013

What is NFX? What is Unistack?

The name "NFX" stands for ".NET Framework Extension", there is also "JFX" and "*FX" in works. All of them form a "Unistack". "Unistack" = "Unified Software Stack", a conceptually-monolithic library that facilitates whatever service/facilities developers need to create scale-able, solid business/data-driven application systems (not just applications). As such, the unistack support many facilities that traditionally are scattered among myriads of disjoint software frameworks.

What is NFX?

NFX is an application-system development framework that addresses the following concerns in a UNIFIED (as in the "same pattern") way. Oh, and one more thing. NFX does not use any 3rd party libraries for the services it provides, except for things like database drivers (e.g. the MongoDB driver). So, NFX is a self-contained library (but for DB drivers), less than 2 MB in size compiled, that provides:
  • Application Container - no counterpart in .NET framework
    • Provides a unified model (the same way of working with) for all application types: console, web, forms, service
    • Provides central hub for all application services: log, instrumentation, glue, throttling, security, zoned time etc.
    • Dependency injection container
    • Policy/Behavior injection container
    • Configuration management
    • Big Memory object database client - promotes stateful in-memory programming model for web and other server applications. Supports virtual out-of-process heaps of native CLR objects, supported by custom-purposed CLR-specific serialization mechanisms, with ability to transform object memory field structure between software upgrades
  • Code Analysis - no counterpart in .NET framework
    • Promotes run-time code/textual analysis that is very useful for pattern matching, serialization of dynamic formats and code inventorization
    • Provides general abstraction for languages, their lexers, parsers and semantic analyzers
    • Facilitates code analysis, provides pattern-matching language polymorphic Finite State Machine implementations
    • Organically supports non-copying serializers/deserializers of various text-based dynamic formats (XML, JSON, Erlang tuple stream, Laconic, etc.)
  • Data Abstraction Layer - replaces System.Data, ORMs (nHibernate, LINQ to SQL, Entity Framework)
    • Decouples business code from particular backing store implementations, supports SQL and NoSQL backends in the same way
    • Scaffolds RDB and NoSQL data, based on meta-information
    • Generates CRUD SQL automatically according to "Convention-over-Config" pattern, with ability to override any underlying provider statement
    • Abstracts non-homogeneous/hybrid data stores in a unified data-access service available within the application container
    • Supported as of right now: PostgreSQL, MsSQLServer, MySQL, MongoDB. Planned: Oracle, DB2, Riak, CouchDB, Redis, Mnesia
    • DataStore providers may utilize NATIVE CAPABILITIES of the target backend; developers can use backend-proprietary features without sacrificing performance (e.g. use CONNECT BY in ORACLE)
    • Ability to optionally implement Query handlers in code, e.g. when a particular backend cannot perform some action using a built-in command
  • Environment / Configuration - compensates for .NET configuration framework deficiencies described here
    • Format-abstract configuration tree in memory
    • Non-file-based configurations (i.e. database configuration, cluster configuration, command-args-based configuration)
    • Supports XML and Laconic configuration file formats by default
    • Embed-able in Microsoft configuration system (i.e. web.config)
    • Full support for variable evaluation within configuration, path concatenation support (slashes)
    • Node navigation a la XPath (but works on any format including JSON and INI files)
    • Full support for imperative macro execution within configuration (IF, LOOP, SET)
    • Structural configuration merging, overrides, with rules (allow/stop/deny)
    • Aspect injection with Behaviors
    • Host environment variables support
    • APIs to configure classes (fields/properties) using attributes declaratively or imperatively
    • Plug-able macros for variable and function evaluation (i.e. DATE= :NOW)
    • Dependency injection class factory utils integrated
    • Unified way to configure any component in NFX (be it logger, CRUD data store, MVC, or anything else)
  • Inter-node/process Communication / Distributed Programming / Glue - replaces .NET WCF for cluster/grid programming
    • Promotes service-oriented architecture
    • Contract-based design
    • Synchronous and Asynchronous TCP/IP implementations (based on completion ports)
    • Extendable bindings - define protocol and channel-level interface
    • OneWay method calls, broadcasts
    • Blocking synchronous or asynchronous clients with CallReactor support
    • Message inspectors server and client-side
    • Contract / Server/Method level security support (authentication and authorization)
    • Allows for 100% stateful server programming with BigMemory support (in addition to typical stateless architecture)
    • Detailed instrumentation - bytes/transactions/calls/failures per node/binding/endpoint/contract/method
  • Instrumentation - much more flexible than .NET performance counters (i.e. write into cluster server or NoSQL db or file)
    • Events, gauges, levels, typed classes - Datum-based type safe classes checked at compile time
    • Multiple classifications of instruments (i.e. MyDatabaseFreePrimaryDisk = MyDatabaseInstrument, IOInstrument, DataVolumeInstrument, etc.)
    • Asynchronous, transparent acquisition of data - does not affect performance
    • Plug-able instrumentation backends (i.e. MongoDB, log file, Aum Cluster Server)
    • Application container log integration
    • Multidimensional histograms
  • Code Inventorization - no counterpart in .NET framework
    • Promotes decoration of types and methods with Inventory attribute
    • Allows various components of the system to be discovered automatically based on their logical classification (i.e. deployment risk)
    • Allows for creation of various automation tools that may generate/transform code based on some other code/metadata provided by inventorization
    • In cloud/cluster framework allows for automatic registration of components, services and their coupling
  • Various IO Formats - no such concept in .NET framework
    • Provides polymorphic StreamerFormat-derived classes that allow for efficient reading/writing of various binary terms
    • Supports unified parsing of binary formats such as Slim, Erlang OTP, etc.
    • Supports native Hadoop binary formatting (protocol)
  • Logging - no built-in library in .NET framework (replaces MS EntLib, Log4Net, NLog, etc.)
    • Asynchronous logging, priority logging
    • Sinks: MsSQLServer, SMTP, CSV file, debug file, MongoDB, Composite, AsyncComposite, *nix Syslog
    • Filters: Level ranges, days of week, hour ranges, date ranges, message patterns, custom filters (injectable filter statement)
    • SLA, Failures, Failovers (when A fails, log to B)
    • Flood prevention (log message time latch)
    • DEBUG/TRACE integration
    • Instrumentation integration
  • Record Model - MVVM not only for Web - no built-in library in .NET framework
    • Unified model for: Web, console, Forms, or anything else (Wpf, Silverlight etc.)
    • Allows developers to write complex business logic without considering the particular view/UI engine
    • Supports complex lookup scenarios - foreign keys selectable from a different model/screen
    • Various levels of validation: record-level, field-level, deferred
    • Field attributes: Enabled, Visible, Applicable, ReadOnly, Validated, Valid, Modified, Description, Watermark... and many more
    • Field validation attributes: Required, Default, Min/Max checking, Regexp, US Zip/Phone, Lookup Dictionaries, Char Casing, Passwords
    • Automatic DataStore CRUD (no need to write any SQL or NoSQL commands to load/save records) with ability to override default behaviors
    • Stateful programming - models may be retained between calls in BigMemory heap allowing for 100% stateful programming in volatile environments (i.e. on the Web)
    • Custom validation scripts for various targets (i.e. field/record-level validation JavaScript for browser)
  • Relational Model - no built-in library in either .NET or any other major framework
    • Declarative well-structured programming language for database design (not only for RDBMS)
    • Database schema becomes a parsable configuration file - i.e. you can generate code from it, analyze dependencies, etc. Cannot parse ORACLE-specific DDL today to generate C# code? How about MongoDB (there is no DDL in MongoDB)? The RelationSchema in NFX is the same source whether you model against ORACLE, MongoDB, Google BigTable or anything else
    • Pluggable backend compiler generates DDL/script for particular backends
    • Supports: tables, check constraints, keys, indexes, foreign keys, comments, verbatim scripts, domains, identities
    • Allows macro execution - i.e. create many tables in a loop and make them inter-dependent
    • Outputs DDL statements of various kinds either into separate files or a single file
    • Supports data domains (even if backend does not support it) with pick-lists and range checks
    • Supports identities that result in different entities (Sequence in ORACLE, IDENTITY in MsSQL etc..)
    • Supports delta schema generation
  • Security - built-in support in .NET is very weak and hard to deal with (especially when there are 100s of permissions to check). .NET permissions are called "Roles" and basically are boolean flags. In NFX permissions are complex vectors that may support integer levels of access along with any number of custom flags (i.e. "allow printing the invoice, but hide the co-payor names")
    • Integrated with all components of NFX
    • Unified security model for: web, console, service, forms, wpf, etc.
    • Declaratively guard methods/classes/actions, or imperatively check access levels by hand
    • Typed permissions - no need to type strings; type classes instead and get a compile-time error if misspelled
    • Imperative permissions - write additional security assertions inside the permission itself - inversion of control principle
    • Designed to handle 1000s of permissions/grants per user - this is needed for systems with many screens where every field/button may need to be protected
    • Smart security descriptor caching /invalidation
    • Credentials/Identity/Tokens marshalling, transparent impersonation of users on remote hosts via Glued endpoints
    • Built for modern web/distributed systems: Integration with OAuth, Twitter, Facebook, LinkedIn and others using flexible Credentials class (that you can derive from)
  • Serialization - replaces various 3rd party libs working with JSON. Replaces DataContractSerializer for internal Glue programming
    • Slim Serialization - supports efficient binary serialization of CLR types. Implementation uses dynamic compilation of expression trees that yields much better (up to 15 times faster) performance than binary formatter. No need for data contracts.
    • JSON Serialization - very fast reading/writing objects without extra string copies. Perform JSON pattern matching over lexer-provided token stream (no need to parse the content if match does not happen). Output objects into JSON in a custom way (IJSONWritable).
    • Erlang native binary term serialization
    • Portable Object Documents - a format to express complex (including cyclical) native object graphs in normalized way (i.e. without reference cycles). Change object version and migrate objects using transforms
  • Service Model - no counterpart in .NET
    • Promotes a concept of a light-weight service - a "process" that may or may not own a thread/threads
    • Integrates in dependency injection application container. All major components in NFX are services
    • Provides message pumps/queues for asynchronous parallel programming
  • Templatization - in .NET only text templates are implemented, for web and VS templates
    • Allows to create templates for any content generation (not only text, i.e. may create templatized images)
    • Provides text-based engine for generic document generation, and web documents. Performs 25% faster than classic ASP.NET pages and up to 40% faster than Razor
    • Web: may be served as pages or MVC views. Templates are 100% embed-able in DLL, no need to deploy files. No one will mend web page files by hand on a 100-server farm
  • Throttling - no counterpart in .NET
    • Declarative/imperative control of throughput
    • Time sliding/spacing throttles
    • Execution quotas (transactions per second, CPU% etc.)
  • Time Services - no counterpart in .NET as there are no "service" and "application" concepts
    • Facilitates distributed cluster-enabled programming in regions that span many time zones
    • Time Zoning in the application container. All components of the container may work in pre-configured zones (i.e. log time, transaction time, local time etc.). Zones defined by policies (possibly cluster-global)
    • Injectable external time sources (high precision/remote clocks)
    • Inter-component Time zone conversions
  • Web-specific MVC (Model View Controller) - supersedes ASP.MVC
    • Tiny implementation (less than 15 classes)
    • Routing support, controllers, actions, parameter binding, JSON support
    • 100% integrated with security/authorization/authentication - just tag action methods with permission attributes (that may be typed)
    • May use MVC without routing
    • Supports 100% stateful controllers (in addition to traditional stateless architecture) - controller instances may live in BigMemory heaps
    • Declarative control of controller lifecycle, timeouts, security, Http constraints etc.
    • Automatic payload parsing/assignment to action methods
    • Integrated support for NFX.Templatization (return typed views etc.)
  • Web controls for RecordModel MVVM (auto building controls) - no similar concept in .NET or other major frameworks
    • Utilize metadata
    • Think about records and fields, not about divs and CSS
    • Override by hand in custom cases
    • Field views and grids
    • We support foreign keys (when they have to be looked up from other screens with 100,000 rows)
    • Declarative fields placement in HTML template markup
  • Windows forms controls for RecordModel - no similar concept in .NET or other major frameworks
    • Very similar to the controls on the Web
    • Utilize metadata
    • Think about records and fields, not boxes/buttons and style properties
    • Override by hand in custom cases
    • Field views, grids and forms
    • We support foreign keys (when they have to be looked up from other screens with 100,000 rows)

Wednesday, August 14, 2013

Aum Configuration as Facilitated By NFX

Problems with "built-in" approaches

Configuration is just another pain that everyone is constantly struggling with. I have never liked the .NET built-in configuration mechanisms. Why? Because they lack so many needed functions that developers (we) have to compensate for daily. Although a typical .NET or Java developer is used to working within those limits, it is time to reconsider.

It is a fact of life that many systems to this day (in 2013!!!!) use INI-like files with hand-written primitive string parsers. It is a total fiasco when it comes to supporting config files for DEV, TEST, and SANDBOX environments.

Here, I have compiled a small list of things lacking in general config frameworks I have been working with in the past 15 years:

  • Centralized network configuration - what if I have to configure 10 servers? Do I copy files? How to keep all configs in one place, say SQL db?
  • Absurdly complex file locations. I once spent 1 hr trying to find the app.config file for some desktop app of mine on a Windows Vista computer because the installer put the file in the abyss of Profile..... folders
  • Microsoft .NET configuration framework has more than 100 classes, it is very complex yet very inflexible. Many people parse text files by hand
  • Inability to evaluate variables, i.e. in .NET there is no variable support in configuration, one must parse it by hand. INI files lack it. The Registry is even more of a mess and hard to deal with
  • Painful APIs, i.e. an attempt to read a non-existing node must always be preceded by IF statements in .NET. Int values are hard to get as bool, dates as numbers etc...
  • Absence of unified configuration tree that could be hydrated from different sources, be it XML, INI, JSON, or even command-line args
  • And finally - the absence of unified configuration application - every component needs to configure itself "by hand"

Welcome to NFX Configuration

NFX library provides a unified configuration approach by providing:

  • Format-abstract configuration tree in-memory
  • Support for navigation (similar to x-path) within the tree
  • Variable evaluation, node inter-referencing, infinite cycles are detected and stopped
  • Environment variable (external vars) evaluation built-in
  • Pluggable variable value providers
  • Structural merging, overrides with rules. Prohibition (sealed sections) of overrides
  • Pluggable variable evaluation macros (i.e. ::NOW)
  • Support for XML, Laconic, Command-Line Args formats
  • Full support for imperative constructs (macros) - loops, vars, Ifs, blocks
  • Unified model to configure classes, services, properties, fields etc. from configuration
  • Aspect injection with configuration Behaviors - named kits of config values that may be applied to different nodes indirectly. This approach addresses cross-cutting concerns on the configuration level
  • Multiple getters for different nodes and data types (i.e. ValueAs: String/Date/Enum/Int.....) with defaults

Example of an XML-based configuration:

    <nfx log-root="c:\nfx\"
         log-csv="NFX.Log.Destinations.CSVFileDestination, NFX"
         log-debug="NFX.Log.Destinations.DebugDestination, NFX"
         debug-default-action="LogAndThrow" debug-conf-refresh="false" app-name="test-client">

      <log name="Logger" default-failover="destFailures">

        <destination type="$(/$log-csv)" name="$(/$app-name)"
                     filename="$(@/$log-root)$(::now fmt=yyyyMMdd)-$($name).csv.log"
                     create-dir="true" min-level="Info" />

        <destination type="$(/$log-csv)" name="$(/$app-name)-perf"
                     filename="$(@/$log-root)$(::now fmt=yyyyMMdd)-$($name).csv.log"
                     create-dir="true" min-level="PerformanceInstrumentation" max-level="PerformanceInstrumentation" />

        <destination type="$(/$log-debug)" name="$(/$app-name)-debug"
                     filename="$(@/$log-root)$(::now fmt=yyyyMMdd)-$($name).log"
                     min-level="Debug" max-level="TraceZ" />
      </log>
    </nfx>

Example of a Laconic configuration with the same tree as the above:

    nfx
    {
       log-root="c:\nfx\"
       log-csv="NFX.Log.Destinations.CSVFileDestination, NFX"
       log-debug="NFX.Log.Destinations.DebugDestination, NFX"
       debug-default-action="LogAndThrow"
       debug-conf-refresh="false"
       app-name="test-client"
    

     log{
         name="Logger" default-failover="destFailures"

        destination { type="$(/$log-csv)"
             name="$(/$app-name)"
             filename="$(@/$log-root)$(::now fmt=yyyyMMdd)-$($name).csv.log"
             create-dir="true"
             min-level="Info" }


        destination{  type="$(/$log-csv)"
             name="$(/$app-name)-perf"
             filename="$(@/$log-root)$(::now fmt=yyyyMMdd)-$($name).csv.log"
             create-dir="true"
             min-level="PerformanceInstrumentation"
             max-level="PerformanceInstrumentation" }

        destination{ type="$(/$log-debug)"
             name="$(/$app-name)-debug"
             filename="$(@/$log-root)$(::now fmt=yyyyMMdd)-$($name).log"
             min-level="Debug"
             max-level="TraceZ" }

    }
   }//nfx - notice the use of comments
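
For illustration, here is a minimal sketch of reading typed values out of the tree above, per the "ValueAs" getters and navigation bullets earlier. The member names (CreateFromString, Navigate, AttrByName, ValueAs*) are approximate - check NFX.Environment for the exact API:

    // Parse the Laconic text above and read typed values with defaults
    var conf = NFX.Environment.LaconicConfiguration.CreateFromString(laconicText);

    var appName = conf.Root.AttrByName("app-name").ValueAsString("unknown");     // "test-client"
    var refresh = conf.Root.AttrByName("debug-conf-refresh").ValueAsBool(true);  // false

    // Navigation a-la XPath; variables like $(/$log-csv) are evaluated on read
    var dest = conf.Root.Navigate("/log/destination") as NFX.Environment.IConfigSectionNode;
    var type = dest.AttrByName("type").ValueAsString();                          // the CSV destination type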

Example of a command-line configuration used to inject some compiler settings. Yes, the same framework handles that:

    gluec "c:\mysrc\contacts" -assemblies NFX.dll MyBusiness.dll
                              -out "c:\templates\"
                              -opt single-file=true crlf=dos comments=false
                                   override=true base=MyPages.SimplePageBase
                              -sign name="Dmitriy" date=now 

Because all of the things above come down to the same tree in memory, we can use those completely disjoint ways of specifying settings in the same way to configure classes in code like so:

    /// 
    /// Implements log destination that sends emails
    /// 
    public class SMTPDestination : Destination
    {
     ......
        [Config("$smtp-host")]
        public string SmtpHost { get; set; }
        [Config("$smtp-port")]
        public int SmtpPort { get; set; }

        [Config("$smtp-ssl")]
        public bool SmtpSSL { get; set; }



        [Config("$from-address")]
        public string FromAddress { get; set; }
        [Config("$from-name")]
        public string FromName { get; set; }
    .......
    }
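
For reference, a destination section that the attributes above would bind from might look like this (the type and values are purely illustrative):

    destination
    {
      type="NFX.Log.Destinations.SMTPDestination, NFX"
      name="email-alerts"
      smtp-host="smtp.example.com"
      smtp-port="587"
      smtp-ssl="true"
      from-address="alerts@example.com"
      from-name="NFX Alerts"
    }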

The configuration is a tree in memory, but how do we bind it to the actual code structures (properties, fields etc..)? How do we use it in our code? For that we have a number of ways.

  • Of course you can write regular code to bind any configuration value into any variable at runtime.
  • You can create a Settings-derived type-safe class that wraps the configuration tree in a type-safe way. This is needed primarily for performance reasons, when some tight code block would suffer from frequent access to text-based values that involve string parsing
  • You can use ConfigurationAttribute.Apply(object, section) method to apply config section data to some object
  • You can implement IConfigurable.Configure(section) in your class and apply section data to your class in code, as sketched below. This is useful for handling dynamic configuration structures when, for example, a "parent" class manages many subordinate child classes that are polymorphic and may be injected by configuration
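
Here is a minimal sketch of the last two options. The SMTPDestination class and its [Config] attributes are reused from the example above; the exact signatures of ConfigurationAttribute.Apply, IConfigurable and the ValueAs* accessors should be verified against NFX.Environment:

    // Option 3: apply a config section to an object decorated with [Config] attributes
    public static SMTPDestination MakeSmtp(IConfigSectionNode smtpSection)
    {
      var smtp = new SMTPDestination();
      ConfigurationAttribute.Apply(smtp, smtpSection); // binds $smtp-host, $smtp-port, etc.
      return smtp;
    }

    // Option 4: implement IConfigurable and hydrate the instance by hand
    public class MyComponent : IConfigurable
    {
      private string m_Host;
      private int    m_Port;

      public void Configure(IConfigSectionNode node)
      {
        m_Host = node.AttrByName("host").ValueAsString("localhost");
        m_Port = node.AttrByName("port").ValueAsInt(8080);
      }
    }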

This topic organically touches the inversion-of-control (really dependency injection) container of NFX.ApplicationModel. In Aum (and NFX) any process is hosted in an IApplication derivative, which injects all of its services into its own properties. There are many cases when one needs to configure their own particular components; for that, the FactoryUtils class is used, which "manufactures" object instances as specified by the configuration and checks certain contract assertions supplied:

  /// 
  /// Creates and configures an instance of appropriate configurable object
  /// as specified in supplied config node. Applies configured behaviors
  /// 
  public static T MakeAndConfigure<T>(IConfigSectionNode node, Type defaultType = null,
                                      IApplication application = null, object[] args = null)
            where T : IConfigurable
  ...............

  protected override void DoConfigure(IConfigSectionNode node)
  {
     base.DoConfigure(node);

     foreach (var snode in 
                  node.Children.Where(n =>
                                        n.IsSameName(CONFIG_SINK_SECTION)
                                     ))
         RegisterSink( FactoryUtils.MakeAndConfigure(snode) as LogSink );
 }
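
For context, a configuration fragment that the DoConfigure above might consume could look like the following. The section name "sink" stands in for whatever CONFIG_SINK_SECTION actually resolves to, and the types/values are illustrative:

    log
    {
      name="Logger"

      sink { type="NFX.Log.Destinations.CSVFileDestination, NFX" name="main"  min-level="Info" }
      sink { type="NFX.Log.Destinations.DebugDestination, NFX"   name="debug" min-level="Debug" }
    }

As in the Laconic example earlier, the type attribute tells the factory which class to instantiate; the remaining attributes then configure the created instance.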

Aum Cluster Configuration

Aum Clusterware is built on top of NFX (on the CLR platform), so it is all based on the NFX.Environment.Configuration classes; however, it takes those configuration capabilities to the next level - a hierarchical configuration tree from which the whole cluster is configured. Because of it, we do not need to maintain configurations for 1000s of servers; we project configuration segments down the tree to arrive at the final configuration section built for a particular node/service/endpoint. We do come all the way down to the endpoint level.
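
Purely as a conceptual illustration (this is not actual Aum Cluster syntax, and the section/attribute names are made up), the idea is that higher levels of the tree supply defaults and lower levels keep only the deltas:

    // region level - applies to every node underneath
    region=us-east
    {
      log  { min-level="Info" }
      glue { default-timeout-ms="15000" }
    }

    // host level - overrides only what differs for this one box
    host=web-07
    {
      log { min-level="Debug" }   // temporary override while troubleshooting
    }

The effective configuration for the host is the structural merge of the levels above it, using the same merge/override rules described earlier.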

This topic definitely deserves a post of its own that I'll do next month.

Monday, July 22, 2013

Aum Security Permission Model

Overview

It is not a secret that application security is not a simple topic. In this post I will concentrate on the Authorization API side of it - something that is usually overlooked during the original design phase of most applications and then added later.

The Aum framework uses permission-based security where named permission sets are called "roles"; in other words, Aum uses role-based security at the superficial level but goes deeper, down to the permission level. This approach is much more granular than typical frameworks like ASP.MVC because we go down to the permission level when we run methods, show pages, glue contracts, etc.

What are permissions?

Permissions are pieces of security-addressable functionality. We support two kinds of permissions: typed and ad-hoc permissions. Typed permissions are specified in code, and by definition, their namespace and class name establish a presence in the security data space of the authorization store. Ad-hoc permissions are not typed and must specify their string name and path.
    [Glued]
    [AuthenticationSupport]
    public interface ITestingContract
    {
      [AdHocPermission("/Testing/CategoryA", "Echo", AccessLevel.VIEW_CHANGE)] //adhoc permission
      string Echo(string text);

      [OneWay] 
      [NotificationsPermission] //typed permission
      void Notify(string text);

      object ObjectWork(object dummy);
    }

In most existing systems, permissions are specified as string literals (ASP.MVC) - similar to AdHocPermission in Aum. The advantage of this approach is simplicity for applications that do not need many security-addressable/guarded functions. However, when applications start growing big, it is not fun to keep repeating the same permission name in UI, Web server, App server and maybe 10 other places. Typed permissions solve this by providing a type-safe check at compile time - if you mistype you get an error.
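
For illustration, declaring a typed permission is just declaring a class, so the compiler catches a misspelled name anywhere it is used. The TypedPermission base class appears in the imperative example later in this post; the constructor shape and access level below are assumptions made for the sketch:

    public class NotificationsPermission : TypedPermission
    {
      // the required access level here is an assumption for the sketch
      public NotificationsPermission() : base(AccessLevel.VIEW_CHANGE) { }
    }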

Another great benefit of typed permissions is their inherent imperative nature.

Imperative Typed Permissions

Imperative permissions are typed permissions that specify permission/use-case-specific security conditions in their constructor and may override the Check method. Think about it this way - a permission is not a boolean flag anymore, it has logic of its own. This is a form of refactoring - instead of writing IF statements in every place where we check the permission, we rely on the permission to do this work. Ultimately, the act of authorization obtains a boolean PASS/FAIL flag, but how does it obtain it? Having a boolean flag in your permission store is sometimes not enough, as you then need to write more IF statements that check more things.

Let's take a concrete example. Suppose we are building a point-of-sale (POS) application for a distribution club (like COSTCO) where every customer is known at the point of checkout. Suppose we have some club member roles/levels like SILVER, GOLD, PLATINUM - where every role defines a set of, say, 100 permissions. "AlcoholCheckout" is one of them. This permission is granted for GOLD-level roles and up. What does this mean? This means that an under-aged customer may buy alcoholic beverages if he/she has earned enough membership credits.

How is this problem addressed? Checking just for permission grant is not enough here, as the final authorization decision is based on customer's age. In Aum we would do an imperative typed permission check like so:

        
    public class CheckoutPerformer
    {
      [AlcoholCheckoutPermission] //notice: no extra code here
      public void CheckoutAlcohol(CartItem item) { /*...*/ }

      ................ 
    }


    public class AlcoholCheckoutPermission : TypedPermission
    { 
         public override bool Check()
         {
             return base.Check() && Session.User.Age > 21;
         } 
    }

What we could also do here, instead of checking against a constant (21), is look up the legal alcohol sale age limit for the user, depending on his/her locality. This approach allows us to use the same simple authorization scheme in the most complicated scenarios. Having moved the Check logic into a typed permission class, we are no longer limited by boolean checks. For example, we can now inject the minimum desired AccessLevel at the point of the application which is guarded.

        
    //In this example we supply security assertion in constructor call
    //Only registered voters can donate if they have enough access level
    [PageTemplatePermission(AccessLevel.VIEW_CHANGE, UserKind.RegisteredVoter)]
    public class MayorReElectionDonationPage : WebTemplate
    {
        ................ 
    }

    //Same permission as above. Check is not user-dependent
    [PageTemplatePermission(AccessLevel.VIEW)]
    public class RallyInvitationPage : WebTemplate
    {
        ................ 
        
        //This will not be called if authorization assertion of linked page fails
        [LinkPagePermission(typeof(MayorReElectionDonationPage))]
        private void donationSection()
        {
           ...... emit URL link to MayorReElectionDonationPage
        }  
    }


Sunday, February 3, 2013

NFX / Aum Cluster - The Philosophy

The Problem

The cost of distributed cloud systems development has skyrocketed. Many companies decided to use simple languages like PHP and Ruby that facilitate quick Web site construction but are not suitable for high computational loads, and they face big problems down the road (i.e. PHP@Facebook). To this day there is no unified way of organizing/operating clusters of server nodes that would be as standardized as C APIs are for IO tasks. It is currently impossible to create and operate/manage a system instance with the features listed below without significant resources / a big staff.

There are many cloud-system offerings, however none of them really fit the purpose from my perspective. Those systems can very broadly be categorized as one of the following:

  • Hardware/Infrastructure management and/or IaaS (Infrastructure as a Service) - Amazon, Mesos (for cluster management), Azure
  • Cloud-based "all-done-for-you" PaaS (Platform as Service) - Google App Engine etc.

Why did I subdivide those numerous offerings this way? Because in my experience, it is either bare metal/OS management that they do for you OR a whole special kind of cloud OS that you need to code against from scratch (i.e. Google App Engine). I can not run Google App Engine locally at home, neither can I use it for a client-server app that my local client wants to have installed in his store. Furthermore, the skill set that is required is completely different from the one many client/server/web-developers have.

In real life it is very hard to build a simple cloud/highly-scalable system. Those offerings are great when you read the tech papers but really have very many limitations. For example, Google App Engine requires a really special kind of architecture to operate in. One can not take code away from Google App Engine and port it to Azure. Azure, on the other hand, mostly gives you physical boxes - and some services (like a special kind of SQL server), but that is it. The .NET framework does not have concepts/classes that would promote the creation of cluster software (I am not saying that it should). One can not take a console app in C# and run it "in the cloud", because there is no well-defined stipulation of what "run in the cloud" means.

Why Do Many Startups Face Troubles 2 Years Later?

Many companies/startups elect languages like Python/Ruby or PHP because those languages have a convenient set of libraries allowing them to put some working sites in production very quickly. The problem, however, is that later, when the number of users increases and business requirements start to demand more intense logic, those solutions fail for a number of reasons:

  • Usually, the lack of proper architecture of the system as a whole. No consideration may have been given from day one to concerns like user geo affinity, distributed caching, session state management, common configuration formats, logging, instrumentation, management of large installations with many nodes. Developers usually do not consider:
    • That any process (be it a web server, app server, tool etc.) needs to be remotely controlled in a cluster environment so it can be stopped/queried/signaled
    • All tools must be command-line enabled (not UI only), so they can be scripted and executed in an unattended fashion
    • There may be 100s of computers to configure, instead of 1. Are we ready to maintain 100s of configuration files?
    • Time zones in distributed systems, cluster groups, NOCs. Where is time obtained from? What time zone? What time shows in global reports?
    • Any UI element on the screen may be protected by permission (i.e. “Post Charges” button may not show for users who do not have access)
    • Row-based security. Security checks may span not only screens/functions but also some data entities such as rows
    • Web session state may not reside locally (i.e. local disk/memcache) if user reconnects to a different server in the cluster
    • Pushing messages to subscribers/topics. Using appropriate protocols (i.e. UDP). Not thought about when the whole system runs from one web server.
  • Most startups use one central database instance (which is convenient to code against), and have big troubles when they need to split databases so they can scale, because all code depends on central data locations (one connect string used in 100s of classes)
  • The scripting languages (e.g. PHP, Python, Ruby) used for web site implementation are not performant enough for solving the general programming problems (try to build a PHP compiler in PHP) involved in high-throughput processing. They are slow for such tasks and were never meant to be used that way. What happens next is that developers start to use C++ or C, where the development paradigm is absolutely incompatible with the one in PHP, and complexity keeps increasing as the number of standards internal to the system grows. You need more bodies to develop this way
  • Security functionality is usually overlooked as well as most applications do not have security beyond user login and conditional menu display on the homepage which depends on 5-10 fixed role checks. Later, businesses need to start protecting individual screens/UI elements with permissions. This usually creates mess in code and eventually precipitates a major re-write. The inter-service security in the backend is usually completely overlooked so any node can call any other node bypassing all checks.
  • The ALM (Application Lifecycle Management) is usually not really thought about. “We will deploy and manage changes somehow when we come to it”

Our Vision

We have been dealing with these problems for the past 20 years. We came to the realization as far back as 1996 that there is a need for a “Business Operating System”. The main idea is to have a lego-like kit of micro solutions that are very configurable and allow developers to assemble complex systems in no time, as the majority of system/architecture/ALM-specific challenges are already solved for them.

When you use Linux or Windows you do not need to understand how files are written to disk. You don’t need to know how the video card works. OSes do a great job, but when it comes to business/data-centric apps - there is no similar approach.

These days systems are very much distributed and back-end/cloud based. Inherently, there are many nodes/servers to run your system on, hence we came up with our “Clusterware” concept. One may say that Hadoop does just that, but this is only a slice of Aum Cluster/NFX.

We try to address a whole wide array of different problems, whereas Hadoop mostly concentrates on job orchestration; we do that as just one of Aum Cluster's functions.

In a nutshell, Aum Clusterware is a software system that promotes multi-layered/tiered architecture with central coordinator service bus and accompanying set of libraries (less than 5 mbytes total) that unify/provide:

  • Central admin control panel for any admin task
  • Configuration of 10,000s of nodes
  • Inventorization of nodes, services, endpoints, configurations, components, classes and even methods
  • Deployment / Change Management in server farms. Component versioning and distribution
  • Monitoring/Instrumentation/Logging/Alerts
  • Job / Change scheduling
  • Security/Permissions/Modules/Namespaces/Roles
  • User profile management/migration. User grouping. OpenID/Twitter/FB integration
  • Data Partitioning
  • Workload balancing
  • Contract-based service bus. Internal bus uses direct TCP/UDP/ZeroMQ for max throughput
  • Async and Sync messaging and queuing
  • Subscription / Notifications
  • Charge processing
  • Content Management
  • Complex UI construction on Web, Mobile, Desktop purposely built to interact with Aum.RecordModel MVVM
  • Support for SQL and NoSQL databases. Decoupling business logic from database
  • Map:Reduce + scheduling for high volumes of data and/or computations
  • Big memory - stateful web and server instances
  • Full text search
  • Global unique ID generation, obfuscation, validation and resolution
  • Dynamic database partitioning. Home databases. Locality of reference
  • Geo-aware data migrations and replication
  • Social dashboards with profile integration: chat/voting
  • C# code base, interoperable with C++ on Linux, Erlang for complex concurrency tasks

The Philosophy

We want to reduce complexity but without feature loss. The way to achieve this effect is this - reduce the number of standards used in the system. The fewer standards we need to support/keep in mind/remember - the simpler the whole system becomes. For example: instead of supporting 4 different logging frameworks, where one uses this kind of configuration and another uses a completely different one - we use just one. Once a developer reads the logging tutorial he/she can easily understand how logging works in any tier of the whole system.

Another big thing is the runtime/language. I know that I will start a holy war here. Before reading further, please answer the following questions:

  • What primary language/s are Windows and Linux (and others) written in?
  • What language are your web browsers written in?
  • What about databases? Oracle, MySQL, MsSQL, DB2?
  • What about major desktop apps: Photoshop, Office, various Audio and Video editing tools?
  • What about compilers/Interpreters for: C, C++, JavaScript, Java, C#, Ruby, PHP?
For some reason, none of the mission-critical software like OS kernels and DB servers, compilers and web browsers is written in PHP or Ruby. It is not because of historical reasons. Has anyone written a new web browser in Erlang, Ruby or Python yet? Of course not.

This is all because Erlang, Ruby, Python, PHP (and 20 others) are specialized tools that simplify some particular aspects of system architecture/coding, but they all SUCK BIG TIME when it comes to system programming. PHP is not a good tool for database server programming. Ruby was not meant to be used for writing high-performance compilers. So, to build a large sophisticated system one would need 25 different languages for different things. Because all of those disparate components require their own "gurus" and configuration/operation standards, we decided against those tools/languages.

That did it for us - we had to select a language which is really a general-purpose one, very suitable for system programming and creation of large-scale systems. When I say "system programming", I do not mean "device drivers" and "Linux kernel modules". What I do mean is this:

  • Ability to implement custom data structures efficiently (trees, maps) with execution speed close or equal to C/C++
  • 64 bit support - a must. Must be able to allocate 128 GB of RAM per process - with no sweat
  • Process model that supports large ram and long execution times (months) without restarts (efficient GC)
  • Process model that supports globals, context globals, threads and lite-weight concurrency models
  • Efficient background/concurrent GC
  • Ability to minimize GC load by using instance pools (or stack-alloced structs in CLR)
  • Fast integer and floating point math. Efficient CPU-register Bit operations
  • Good code modularization
  • Good support for working with strings, especially Unicode compliance
  • Good network stack support - TCP/IP/Sockets
Both JVM and CLR definitely qualify.

Of course no one can touch C and C++ when it comes to efficiency, custom memory allocators and pointer manipulation, but for one tiny defect: those bare-bones languages are very low-level for business/data-centric app creation. The lack of good reflection in C++ really kills it for our purposes. Had we taken C++, we would have needed to create a C++ frontend language that supported reflection and GC. So basically we would have had to create our own JVM+Java or CLR+C#, which is not practical. Instead we decided to use the existing CLR/C# or JVM/Java in such a way that our code does not depend on particular features of the library bloat that surrounds those platforms; rather, we have re-written all base services ourselves, thus bringing a "Unistack" to life.

UNI-STACK = a Unified Stack of software to get your job done. Use one library instead of 25 3rd parties that all increase complexity. In Unistack everything is named, configured, and operated the same way, thus reducing complexity 10-fold. Unistack was purposely coded to facilitate distributed app creation, yet (unlike many PaaS) it can be used to write basic client-server apps that all run on one machine without any bloat.

Transaction processing: share nothing, scale horizontally

Configuration/management: share everything or as much as possible

Unify patterns, languages, components

Avoid 3rd parties as much as possible - direct and transitive dependencies

First: reuse, Second: build using Aum/NFX, Third: use open source, Fourth: buy proprietary