Friday, November 1, 2013

Compile and execute a code snippet from your C# program

The problem

Every so often I find myself creating little command line programs that:
  1. Gather a list of items from somewhere
  2. Perform some user-controllable operation on each item
My problem for today is in that user-controllable operation. Somehow I often end up designing and implementing a little mini-programming language.
    > ChangeDocuments "UserName = Frank" "Set RetentionPolicy=30y"

    > ChangeDocuments "Today > ExpirationDate" "Add State=Expired"
The first parameter to this ChangeDocuments program is a quite regular search query that I can feed into the system I am programming against. The system is well designed, so I can feed nicely complex search queries into it:
    RetentionPolicy = 30y AND (ExpirationDate = Null OR AnyOtherField = AnyValue)
When I am testing my little ChangeDocuments program I can clearly see that the query language was designed by people who know that sort of stuff and parse search queries for a living.

Unfortunately that doesn't apply to me. Which means that the second parameter, the one that I have to implement myself, turns into a mini programming language that gets uglier with every step.

    Set RetentionPolicy=30y
    Add State=Expired
    Clear RetentionPolicy
    Set ExpirationDate=TodayPlus30Years()
Ouch! Did you see that last one? Not only is it ugly, but I'll have to find a way to parse that function call out of there and implement the function. And what if they want a different number of years? Sure, I could write a proper parser for that and implement function calls, call stacks, contexts and all that. But I also need it to be done today and to feel slightly more wieldy than the way I'll end up implementing it. Besides... aren't there enough people that implement programming languages for a living already?

So what I'd really prefer to do is execute a snippet of C# code, so that the above examples can turn into something more regular, like:

    Set("RetentionPolicy", "30y")
    Add("State", "Expired")
    Clear("RetentionPolicy")
    Set("ExpirationDate", DateTime.Now + TimeSpan.FromDays(30 * 365))
Note that this is still not an ideal language. I'd prefer to have symbols like RetentionPolicy and State to be available like named variables. But short of implementing my own domain-specific language, this is as close as I could get with C# in a reasonable time.

Walkthrough

When I first dynamically compiled code in .NET, I was shocked at how simple this is.

Let's start with this code snippet:

    DateTime.Now + TimeSpan.FromDays(30 * 365)
For the moment, we'll put it in a variable:
     var snippet = "DateTime.Now + TimeSpan.FromDays(30 * 365)";
We'll be compiling the code using .NET's built-in C# compiler. The C# compiler can only handle full "code files" as its input, that is, the type of thing you'd normally find inside a .cs file.

So we'll wrap our snippet in a little template:

    using System;
    public class Snippet {
        public static void main(string[] args) {
            Console.WriteLine(CODE SNIPPET GOES HERE);
        }
    }
In code:
    var template = "using System;\npublic class Snippet {{\n\tpublic static void main(string[] args) {{\n\t\tConsole.WriteLine({0});\n\t}}\n}}";
    var code = string.Format(template, snippet);
Note that for now we simply write the result of the expression to the console. We'll see other ways to handle this later.

Next up is the bulk of our operation: compiling this code into an assembly.

    var provider = new Microsoft.CSharp.CSharpCodeProvider();

    var parameters = new System.CodeDom.Compiler.CompilerParameters{ GenerateExecutable = false, GenerateInMemory = true };

    var results = provider.CompileAssemblyFromSource(parameters, code);                                                

    if (results.Errors.Count > 0) {
        foreach (System.CodeDom.Compiler.CompilerError error in results.Errors) {
            Console.Error.WriteLine("Line number {0}, Error Number: {1}, '{2};\r\n\r\n", error.Line, error.ErrorNumber, error.ErrorText);
        }
    } else {
        var type = results.CompiledAssembly.GetType("Snippet");
        var method = type.GetMethod("main" );
        method.Invoke(null, new object[] { new string[0] });
    }
That is really all there is to it. Running this prints something like:
    10/25/2043 1:21:04 PM
There are tons of additional options you can set on the CompilerParameters, but this minimal set tells the compiler to:
  • generate an assembly, instead of an executable
  • keep the generated assembly in memory, instead of putting it on disk

Complete code

This is the complete code snippet that we constructed so far:
    var snippet = "DateTime.Now + TimeSpan.FromDays(30 * 365)";
    var template = "using System;\npublic class Snippet {{\n\tpublic static void main(string[] args) {{\n\t\tConsole.WriteLine({0});\n\t}}\n}}";

    var code = string.Format(template, snippet);

    var provider = new Microsoft.CSharp.CSharpCodeProvider();

    var parameters = new System.CodeDom.Compiler.CompilerParameters{ GenerateExecutable = false, GenerateInMemory = true };

    var results = provider.CompileAssemblyFromSource(parameters, code);                                                
    if (results.Errors.Count > 0) {
        foreach (System.CodeDom.Compiler.CompilerError error in results.Errors) {
            Console.Error.WriteLine("Line number {0}, Error Number: {1}, '{2};\r\n\r\n", error.Line, error.ErrorNumber, error.ErrorText);
        }
    } else {
        var type = results.CompiledAssembly.GetType("Snippet");
        var method = type.GetMethod("main" );
        method.Invoke(null, new object[] { new string[0] });
    }
I normally run this type of code snippet in LINQPad, which does something quite similar (albeit likely a lot more elaborate) internally. But if you just paste the above into the main of a command line program in your Visual Studio project it'll also work of course.

Possible changes and considerations

Use an instance method

In the above code we use a static main method. If you'd instead prefer to use a regular instance method, you'll need to instantiate an object and pass it to Invoke, like this:
    var type = results.CompiledAssembly.GetType("Snippet");
    var obj = Activator.CreateInstance(type);
    var method = type.GetMethod("main" );
    method.Invoke(obj, new object[] { new string[0] });
If you do this, I recommend that you name the method something other than main, since most people will associate main with a static void method.

Pass parameters to the snippet

The snippet operates in complete isolation so far. To make it a bit more useful, let's pass some parameters into it:

First we'll need to modify our template to do something with the new parameter:

	var template = "using System;\npublic class Snippet {{\n\tpublic static void main(string[] args) {{\n\t\tConsole.WriteLine(args[0], {0});\n\t}}\n}}";
So we just use the first parameter as the format string for Console.WriteLine(args[0], {0}). Then we pass a value for this parameter when we invoke the method:
    method.Invoke(null, new object[] { new string[] { "Expiration date = {0}" } });
And now the snippet will print:
    Expiration date = 10/25/2043 1:21:04 PM
Make the script return a value

However interesting writing to the console is, it is probably even more useful if our snippet returned its value instead.

To accomplish this, we'll change the template for the main method to this:

    public static string main(string[] args) {{\n\t\treturn string.Format(args[0], {0});\n\t}}
And the invocation now needs to handle the return value:
    var output = method.Invoke(null, new object[] { new string[] { "Expiration date = {0}" } });
    Console.WriteLine(output);
Note that the snippet itself has remained completely unchanged through all our modifications so far. This is a good design principle if you ever allow the users of your application to specify logic in this way: make sure that any changes you make to your code are backwards compatible. Whatever code the users wrote should keep working without changes.

The line numbers for errors are offset

If there is an error in the snippet, the script will write the error message(s) that it gets back from the C# compiler.

So when we change the snippet to:

    var snippet = "DateTime.No + TimeSpan.FromDays(30 * 365)";
We'll get this output:
    Line number 4, Error Number: CS0117, ''System.DateTime' does not contain a definition for 'No';
The error message itself is correct of course, but the snippet we provided is one line, so the line number is clearly wrong. The reason for that is that our snippet is merged into the template and becomes:
    using System;
    public class Snippet {
        public static string main(string[] args) {
            return string.Format(args[0], DateTime.No + TimeSpan.FromDays(30 * 365));
        }
    }
And indeed, this C# contains the problem on line 4.

The solution for this line number offset is to either subtract the offset from the line number in the error message or simply not print the line number. In a simple case such as this the latter option is not as bad as it may sound: we only support short snippets of code, so the line numbers should be of limited value. But then again: never underestimate the ability of your users to do more with a feature than you ever thought possible.

Make the wrapper class more complete

Probably the most powerful way you can extend the abilities of your snippet-writing users is by providing them more "built-in primitives".

  • Any method you add to the Snippet class becomes like a built-in, global function to the snippet author. So the Set, Add and Clear methods of my original snippets could be implemented like that.
  • You can also make the Snippet class inherit from your own base class, where you implement these helper functions.
  • Any variables that you define inside the main method before you include the user's snippet will become like built-in, global variables to your snippet authors. I've had great success in the past with making utilities such as a log variable available like this.
Allow importing more namespaces and referencing more assemblies

The template above imports just one namespace and only binds to the default system assemblies of .NET. To allow your snippet authors to use more functionality easily, you can expand the number of using statements in the template and add additional references to the ReferencedAssemblies of the CompilerParameters.

Alternatively you can give the users a syntax that allows them to specify their own namespaces to import and even assemblies to reference. In the past I got some pretty decent results with this syntax:

   <%@ Import Namespace="Path.Of.Namespace.To.Import" %>
Use VB instead of C#

There is also a compiler for Visual Basic code. If you'd prefer to use that, you can find it here:
    var provider = new Microsoft.VisualBasic.VBCodeProvider();

Saturday, April 6, 2013

Gaining knowledge is not the same as asking questions

"Does anyone know how xyz works?"

-- silence --

"Guys, I really need to know how xyz works."

-- silence --

Does this sound familiar? It's rude that nobody answers? Right? After all, somebody should know the answer. It's not like your question is difficult. Right? You are asking something relatively straightforward that certainly somebody must know. If they won't tell you, how do they expect knowledge to spread? How can you ever become knowledgeable on these subjects if others outright refuse to answer your questions? Right?

WRONG!

Knowledge is not created by people asking questions and waiting for others to provide answers. Knowledge is gained by people who do things. Instead of asking somebody else to provide the answer for you, think what you can do to find the answer yourself. In fact, first spend a moment to come up with the most likely answer. And then validate that answer, proving yourself right or wrong. Try it, you'll be surprised how quickly you can try certain things. Heck... it may well be faster to try it than to ask again and again. And even if it's not faster, you will almost certainly gain a better understanding of the thing you are working with. In addition to getting an answer to your question, you'll also have a better understanding of the why around that answer. You'll have gained knowledge and understanding.

Yet many people seem to prefer asking a question over trying to find the answer themselves. Instead of building knowledge they are trying to get other people to give them the knowledge. And while I don't mind showing off my knowledge about a topic, mine is often built from doing things - not from asking others. So ask yourself: do you want to be a knowledge sink? Or do you want to be a knowledge source?

This situation is so common that an entire subculture has arisen. When somebody asks a very open question on Stack Overflow, someone almost always comments with: "what have you tried?". Matt Gemmell even put his original blog post to that effect on a separate domain: whathaveyoutried.com. Have a look around Stack Overflow and see what type of questions solicit the "what have you tried?" response. The phenomenon has spread quite far. Often I feel bad for the person asking the question, because they clearly have no clue what they've done wrong.

Sunday, January 20, 2013

Handling asynchronicity in an API

Designing a good API is one of the more difficult tasks when it comes to software development. Unfortunately it is also one of the important tasks, since it is really hard to change an API after it's been made public. Well OK, maybe it is not hard for you to change the API, but it is hard on the users of your API: they have to update all their client programs if you decide to make changes.

Nowadays many APIs deal with asynchronous operations, especially when an API bridges the gap between a client-side web application and a server-side back-end. Whenever a web application needs some additional data from the server, it needs to initiate an (XMLHttpRequest) call to the server. And since such a call can take a significant amount of time, it should (in almost all cases) be handled asynchronously.

Checking if the title is loaded

Here is how I recently saw this done in a JavaScript API:

    function displayTitle(object) {
      if (object.isLoaded()) {
        document.getElementById('title').innerText = object.getTitle();
      } else {
        function onLoaded(object) {
          object.removeEventHandler('load', onLoaded);
          document.getElementById('title').innerText = object.getTitle();
        }
        object.addEventHandler('load', onLoaded);
        object.load();
      }
    }

To display the title of the object, we first have to check if the data for the object has been loaded from the server. If so, we display the title straight away. If not, we load it and then display the title.

That is quite a lot of code, for what is a quite common operation. No wonder so many people dislike asynchronous operations!

Use a callback for the asynchronous operation

Let's see if we can simplify the code a bit.

For example, the event handler is only used once here. And although registered event handlers can be useful for things that happen all the time, clearly in many cases people will want to check if the object is loaded in a regular sequence of code - not respond to whenever the object is (re)loaded.

If we allow the onLoaded function to be passed into the load() call, things clean up substantially:

    function displayTitle(object) {
      if (object.isLoaded()) {
        document.getElementById('title').innerText = object.getTitle();
      } else {
        object.load(function onLoaded(object) {
          document.getElementById('title').innerText = object.getTitle();
        });
      }
    }

The nice thing about this change is that you can add it to the existing API after releasing it:

    MyObject.prototype.loadAndThen = function(callback) {
      function onLoaded(object) {
        object.removeEventHandler('load', onLoaded);
        callback(object);
      }
      this.addEventHandler('load', onLoaded);
      this.load();      
    };

This is mostly a copy of the code we removed between the first and second fragments above. But now instead of everyone having to write/copy this plumbing, you just have to write it once: in the prototype used for the object in question.

Note that I named the function loadAndThen, to avoid conflicting with the existing load function.
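To try loadAndThen outside a browser, here is a minimal sketch. Only the method names (addEventHandler, removeEventHandler, load, getTitle) come from the fragments above. The body of MyObject and the simulateServerResponse helper are assumptions made up for illustration, standing in for whatever the real API does when the server answers.

```javascript
// Hypothetical stand-in for the API's object type. The method names match the
// article; their implementations are assumptions for this sketch.
function MyObject(title) {
  this._title = title;
  this._handlers = { load: [] };
}
MyObject.prototype.addEventHandler = function (name, handler) {
  this._handlers[name].push(handler);
};
MyObject.prototype.removeEventHandler = function (name, handler) {
  this._handlers[name] = this._handlers[name].filter(function (h) {
    return h !== handler;
  });
};
MyObject.prototype.getTitle = function () { return this._title; };
// In a real API this would start the request to the server.
MyObject.prototype.load = function () {};
// Assumed helper that plays the role of the server's reply arriving.
MyObject.prototype.simulateServerResponse = function () {
  var self = this;
  // Iterate over a copy, so handlers can unregister themselves safely.
  this._handlers.load.slice().forEach(function (h) { h(self); });
};

// The wrapper from the article, unchanged.
MyObject.prototype.loadAndThen = function (callback) {
  function onLoaded(object) {
    object.removeEventHandler('load', onLoaded);
    callback(object);
  }
  this.addEventHandler('load', onLoaded);
  this.load();
};

var seen = [];
var doc = new MyObject('Quarterly report');
doc.loadAndThen(function (object) { seen.push(object.getTitle()); });
doc.simulateServerResponse(); // the callback runs once
doc.simulateServerResponse(); // the handler is already removed, nothing happens
```

After both simulated responses, seen holds the title exactly once, which is the point of removing the handler inside onLoaded: the caller gets a one-shot callback without ever touching the event plumbing.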

Assume asynchronicity

But when I started using the Firebase API a while ago, I noticed how natural their way of handling asynchronous operations feels. If we'd apply their API style to the above example, the displayTitle function would become:

    function displayTitle(object) {
      object.getTitle(function (title) {
        document.getElementById('title').innerText = title;
      });
    }

Since the title might have to be loaded from the server, they require you to always pass in a callback function. And they will simply call that function once the title is loaded.

Now I can see you thinking: "but what happens if the title is already loaded?" That is the beauty of it: if the title is already loaded, they simply invoke the callback straight away.

If we'd like to implement such an API on top of our example, we could implement getTitle like this:

    MyObject.prototype.getTitleAndThen = function(callback) {
      if (this.isLoaded()) {
        // Already loaded: hand over the title straight away.
        callback(this.getTitle());
      } else {
        function onLoaded(object) {
          object.removeEventHandler('load', onLoaded);
          callback(object.getTitle());
        }
        this.addEventHandler('load', onLoaded);
        this.load();
      }
    };

Like before I gave the function a suffixed name to prevent clashing with the existing getTitle function. But of course if you end up implementing this in your own API, you can just stuff such code in the regular getTitle function (which probably reads the title from a member field of this).

If you think this is a lot of code to add to your framework, look back at our first example. If you don't add the code to your framework, every user will end up adding something similar to their application.
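To see both paths of getTitleAndThen in action, here is a small self-contained sketch. As before, the MyObject body, the loadCalls counter and the simulateServerResponse helper are hypothetical, invented only so the example can run without a server. Only the method names are taken from the article.

```javascript
// Hypothetical minimal object; only the method names come from the article.
function MyObject(title) {
  this._title = title;
  this._loaded = false;
  this._handlers = [];
  this.loadCalls = 0; // counts how often a (simulated) request was started
}
MyObject.prototype.isLoaded = function () { return this._loaded; };
MyObject.prototype.getTitle = function () { return this._title; };
MyObject.prototype.addEventHandler = function (name, handler) {
  if (name === 'load') this._handlers.push(handler);
};
MyObject.prototype.removeEventHandler = function (name, handler) {
  var index = this._handlers.indexOf(handler);
  if (name === 'load' && index !== -1) this._handlers.splice(index, 1);
};
MyObject.prototype.load = function () { this.loadCalls++; };
// Assumed helper that plays the role of the server's reply arriving.
MyObject.prototype.simulateServerResponse = function () {
  var self = this;
  this._loaded = true;
  this._handlers.slice().forEach(function (h) { h(self); });
};

MyObject.prototype.getTitleAndThen = function (callback) {
  if (this.isLoaded()) {
    callback(this.getTitle());
  } else {
    var onLoaded = function (object) {
      object.removeEventHandler('load', onLoaded);
      callback(object.getTitle());
    };
    this.addEventHandler('load', onLoaded);
    this.load();
  }
};

var titles = [];
var report = new MyObject('Annual report');
// Not loaded yet: the callback is parked until the response arrives.
report.getTitleAndThen(function (t) { titles.push('first: ' + t); });
report.simulateServerResponse();
// Loaded by now: the callback runs straight away, without a second request.
report.getTitleAndThen(function (t) { titles.push('second: ' + t); });
```

The caller writes the same line of code in both situations. Whether the title had to be fetched or was already there is a detail the API keeps to itself.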

Conclusion

By assuming that certain operations are (or at least can be) asynchronous, you can reduce this code:

    function displayTitle(object) {
      if (object.isLoaded()) {
        document.getElementById('title').innerText = object.getTitle();
      } else {
        function onLoaded(object) {
          object.removeEventHandler('load', onLoaded);
          document.getElementById('title').innerText = object.getTitle();
        }
        object.addEventHandler('load', onLoaded);
        object.load();
      }
    }

To this:

    function displayTitle(object) {
      object.getTitle(function (title) {
        document.getElementById('title').innerText = title;
      });
    }

The biggest disadvantage I see in the second example is that users of your API are more directly confronted with closures. Although I hang around on Stack Overflow enough to realize that closures are a real problem for those new to JavaScript, I'm afraid it is for now a bridge that everyone will have to cross at their own pace.