Text 28 May GiveCamp UK

This morning I signed up for GiveCamp UK. It is an event taking place this October in London where people involved in the software industry are donating a weekend of their free time to help charities with their IT needs.

If you can spare the time then sign up as a volunteer, or if you work for a charity that could use some help, submit a proposal. I have no doubt it is going to be a great weekend which will make a real difference to several charities.

Link 9 Feb Tags are magic! - Guardian»
Text 7 Feb 4 notes Pragmatic web service design

Web services are a crucial part of most solutions nowadays. I spend a significant portion of my time designing and writing them, and I have read a lot about how to make them better, faster and more resilient each time. This is a summary of how I approach web service design and the things I bear in mind.

Protocols and content types

Unless you require extreme performance from your service, use the most compatible technologies available. Today that means HTTP, JSON and HTML forms. The lowest common denominator in any solution is usually JavaScript in the browser. This shapes all your decisions about how to expose your service. HTTP, JSON and HTML forms are the easiest things to work with in JavaScript and they are well supported in other languages. XML is an option but JSON is a more efficient transport medium and much easier to work with in JavaScript.

Before you write a web service make sure to learn HTTP inside and out. It is a powerful protocol that solves many more problems than most people realise. I would recommend RESTful Web Services as a starting point; it demonstrates how to create a web service that is sympathetic to HTTP and there is a useful glossary in the back. This is not going to be another post about REST but if you know about it already there will be some familiar concepts.

Using the correct HTTP status code is important; think of it as a well established domain-specific language for what the server thought of your request. It removes the need to duplicate something like a response code within the body of the response. One caveat on exploiting all that HTTP gives you: avoid 3xx redirect responses, as browsers will follow the redirect themselves even when it is a response to an AJAX request, so your script never gets to see the 3xx response.
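
As a rough illustration of letting the status code carry the outcome, here is a minimal sketch in plain Ruby. The `show_user` method, the `USERS` data and the `[status, headers, body]` triple are hypothetical names invented for this example rather than the API of any particular framework.

```ruby
require 'json'

# Hypothetical in-memory data standing in for a real data store.
USERS = { '1' => { 'name' => 'Ada' } }.freeze

# Returns [status, headers, body]. The status code alone says what the
# server thought of the request; the body carries only data.
def show_user(id)
  return [400, {}, ''] unless id =~ /\A\d+\z/ # malformed id: Bad Request
  user = USERS[id]
  return [404, {}, ''] if user.nil?           # unknown id: Not Found
  [200, { 'Content-Type' => 'application/json' }, JSON.generate(user)]
end
```

Note there is no "result code" field inside the JSON; a consumer can branch on the status alone.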

For sending and receiving data I recommend HTML form values in, JSON out. You may want to send requests in as JSON instead and that’s not a bad choice; I just find HTML forms easier. However, I would recommend using the same input and output content type for requests across your whole API whenever possible. It makes it easier to consume an API when you do not have to think about what format a given method accepts and responds with.
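
A small sketch of "form values in, JSON out" using only Ruby's standard library. The `create_note` method and its `title` and `tags` fields are assumptions made up for illustration.

```ruby
require 'json'
require 'uri'

# Parses an application/x-www-form-urlencoded request body into a Hash
# and returns the JSON response body for a hypothetical "create note" call.
def create_note(form_body)
  params = URI.decode_www_form(form_body).to_h
  note = {
    'title' => params['title'],
    'tags' => params.fetch('tags', '').split(',')
  }
  JSON.generate(note)
end

puts create_note('title=Hello+world&tags=http%2Cjson')
```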

Avoid supporting several content types, particularly for the first couple of releases. You will end up iterating over your service during initial development and having to maintain compatibility for several content types will be adding wasteful overhead at this point. Get it shipped with one pair of content types then look to fix your bugs, harden your API and learn from real usage patterns. Put this into the next version and repeat. Once things stabilise you can evaluate whether support for several content types is needed based upon real world requests without it being such a burden as your API goes through the churn of first contact with the real world.

Core design considerations

A lot of the considerations are identical to those of designing any API but there are some additional ones that are specific to web services as they involve transmission over a network.

Log everything

It should be obvious but you need to log every request that comes in. You may need to replicate an issue that is a result of several requests; without logging this will be difficult. You want to easily answer the question “what was the consumer doing at the time?”. Logs will also allow you to monitor usage, response times and other useful metrics. As your first attempt at an API is no more than an informed guess you need to be collecting metrics in order to make an educated decision about what to do next.
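
One way to make sure nothing slips through is to wrap the whole service in a logging layer. This is only a sketch: `RequestLogger`, the request Hash with `:method` and `:path` keys, and the two-element response are hypothetical shapes chosen to keep the example self-contained.

```ruby
require 'logger'
require 'stringio'

# Wraps any callable "handler" (a hypothetical stand-in for your service)
# and logs every request with its status and duration.
class RequestLogger
  def initialize(handler, logger)
    @handler = handler
    @logger = logger
  end

  def call(request)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, body = @handler.call(request)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @logger.info("#{request[:method]} #{request[:path]} -> #{status} (#{(elapsed * 1000).round(1)}ms)")
    [status, body]
  end
end

# Log to an in-memory buffer here; a real service would log to a file.
log_output = StringIO.new
handler = ->(request) { [200, "hello from #{request[:path]}"] }
service = RequestLogger.new(handler, Logger.new(log_output))
service.call({ method: 'GET', path: '/notes' })
puts log_output.string
```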

Comply with the expected behaviour of HTTP

There are several expected behaviours with HTTP such as GET not producing side effects and PUT and DELETE requests being idempotent. This all comes from knowing HTTP as previously encouraged. Being able to justify a design decision by referring to RFCs is awesome.
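
To make those expectations concrete, here is a sketch against a hypothetical in-memory `NoteStore`. It is not a web service, just the same method semantics in miniature: GET changes nothing, and repeating a PUT or DELETE leaves the store in the same state as doing it once.

```ruby
# A hypothetical in-memory resource store, used only to illustrate the
# HTTP method semantics; a real service would sit behind an HTTP server.
class NoteStore
  def initialize
    @notes = {}
  end

  # GET: safe, never changes state.
  def get(id)
    @notes.key?(id) ? [200, @notes[id]] : [404, nil]
  end

  # PUT: idempotent, repeating it leaves the same state.
  def put(id, text)
    created = !@notes.key?(id)
    @notes[id] = text
    [created ? 201 : 200, text]
  end

  # DELETE: idempotent, the note is gone however many times it is called.
  def delete(id)
    @notes.delete(id)
    [204, nil]
  end
end
```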

Fewer methods returning more data

The biggest bottleneck in communicating with a server is no longer the size of the data being communicated; it is in establishing connections. This is particularly true for internal services. The common bottleneck with web services is the number of concurrent requests they can serve, not the amount of data being transferred. You want to aim to have fewer methods but return more data from them, therefore reducing the number of requests consumers need to make. Having fewer methods also makes it easier for the consumer to make the right choice.

Imagine everything a consumer is likely to want to display as a result of making the request and give it to them. The only thing you need to be careful of is your internal implementation bleeding into your API. You do not want to expose yourself in public, especially early on. Following this advice makes it likely the consumer will get all the data they need in one request rather than making a separate request for each part. This reduces the surface area of your API which has several great benefits:

  • as a developer I have fewer methods to maintain
  • as a consumer I have a lower cognitive load, I might even be able to memorise your API
  • as an administrator fewer requests helps me with caching and usually means I can use less hardware, improving service and making scaling up cheaper

In terms of web service surface area and required requests, less is definitely more.
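
A sketch of what "return more data" might look like in practice. The `POSTS` and `AUTHORS` data and the `post_response` method are invented for illustration; the point is that the author is embedded in the response rather than left for a second request, and the internal `author_id` stays internal.

```ruby
require 'json'

# Hypothetical data; the names and fields are invented for this sketch.
AUTHORS = { 7 => { 'name' => 'Ada', 'avatar_url' => 'http://example.com/avatars/7.png' } }.freeze
POSTS = { 1 => { 'title' => 'Pragmatic web service design', 'author_id' => 7 } }.freeze

# One request returns the post together with the author a consumer is
# likely to display, instead of forcing a second lookup per post.
def post_response(post_id)
  post = POSTS.fetch(post_id)
  JSON.generate({
    'title' => post['title'],
    'author' => AUTHORS.fetch(post['author_id'])
  })
end

puts post_response(1)
```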

This may seem to contradict my recommendation of using JSON, but failing to choose the more efficient format when there is no compelling reason not to would be foolish.

Highlander principle

There should be one, and only one, way to do something. Not only does it save you effort, it makes things easier for the consumer. Sometimes this may mean that one action requires several requests but this is less confusing in the long run than creating a specific method for every action. It is generally acceptable that the solution is merely good enough; performing several simple steps is cognitively easier than wading through a sea of methods to find the one intended for your task.

Again I may seem to be contradicting myself as this goes against the idea of trying to reduce the number of connections required to do something, and it does to some extent. However, when you consider that most systems are at least 80% reads, that this situation usually applies to writes, and that this should not be a regularly occurring problem if you have modelled the domain correctly, then it should be a drop in the ocean of the general usage of the API.

Any method you add on a hunch about future use is one you will have to support for the lifetime of the API. It is better to wait until you have real statistics and use-cases to work from than to increase the surface area of your API speculatively. If an action requiring several requests becomes common practice then you can add a method to simplify it and you will be certain that it is adding value to your code base and consumers.

Give people URLs

Whenever possible provide URLs to the consumer, do not make them work them out. Every URL a consumer has to create is a support call waiting to happen. When someone has to create a URL it is likely your internals are bleeding into your API. Once someone else is creating a URL you can never change that URL and that restricts refactoring and scaling options. When you create URLs for your consumers you can rename them and point them towards different servers to name just two options that would be closed to you the second you are not in control of your URLs.
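
A minimal sketch of handing out URLs instead of raw identifiers. `BASE_URL`, `note_representation` and the `comments_url` field are hypothetical; the idea is simply that the server is the only place that knows how a URL is put together.

```ruby
require 'json'

# Hypothetical base URL; because the server builds every link, it can be
# changed (or pointed at another host) without breaking consumers.
BASE_URL = 'http://api.example.com'

def note_representation(id, title)
  {
    'title' => title,
    'url' => "#{BASE_URL}/notes/#{id}",
    'comments_url' => "#{BASE_URL}/notes/#{id}/comments"
  }
end

puts JSON.generate(note_representation(42, 'Give people URLs'))
```

The consumer follows `comments_url` as given and never needs to know that `42` is part of the path.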

Version from the outset

There will be several versions of your web service. Think about identifying the schema of your data when it is returned, and think about how you will host several versions of the API, as you cannot just switch one on and the previous version off. It will happen and it will be difficult to lever in at a later date, so make it a solved problem.
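
One possible approach, sketched here under the assumption that the version lives in the URL path and that each response names its schema. The `HANDLERS` table, the `/people/` resource and the `schema` field are all made up for this example; other schemes, such as versioned content types, are equally valid.

```ruby
require 'json'

# Hypothetical handlers for two versions of the same resource. Version 2
# adds a field; version 1 keeps answering exactly as it always did.
HANDLERS = {
  'v1' => ->(name) { { 'schema' => 1, 'name' => name } },
  'v2' => ->(name) { { 'schema' => 2, 'name' => name, 'greeting' => "Hello, #{name}" } }
}.freeze

# Routes paths such as "/v1/people/world" to the handler for that version.
def respond(path)
  match = path.match(%r{\A/(v\d+)/people/([^/]+)\z})
  handler = match && HANDLERS[match[1]]
  return [404, ''] unless handler
  [200, JSON.generate(handler.call(match[2]))]
end
```

Both versions are hosted side by side, so existing consumers of `/v1/...` are untouched when `/v2/...` appears.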

Example - Twitter’s feed

Here’s the response for a single tweet from my timeline:

[ { ...,
    "entities" : { "hashtags" : [  ],
        "urls" : [  ],
        "user_mentions" : [ { "id" : 1102,
              "id_str" : "1102",
              "indices" : [ 3,
                  10
                ],
              "name" : "David Ulevitch",
              "screen_name" : "davidu"
            } ]
      },
    "favorited" : false,
    "geo" : null,
    "id" : 34612066580955136,
    "id_str" : "34612066580955136",
    "in_reply_to_screen_name" : null,
    "in_reply_to_status_id" : null,
    "in_reply_to_status_id_str" : null,
    "in_reply_to_user_id" : null,
    "in_reply_to_user_id_str" : null,
    "place" : null,
    "retweet_count" : 10,
    "retweeted" : false,
    "retweeted_status" : { 
        ...,
        "id" : 34583191385935872,
        "id_str" : "34583191385935872",
        ...,
        "retweet_count" : 10,
        ...,
        "text" : "The last 5% of a project is always the worst half of a project. :-)",
        "truncated" : false,
        "user" : { "contributors_enabled" : false,
            "created_at" : "Sun Jul 16 02:30:23 +0000 2006",
            "description" : "Positively disruptive.  Started OpenDNS, ...",
            ...,
            "id" : 1102,
            "id_str" : "1102",
            ...,
            "profile_background_color" : "9BE5E9",
            "profile_background_image_url" : "http://a3.twimg.com/profile...jpg",
            ...
          }
      },
    "source" : "web",
    "text" : "RT @davidu: The last 5% of a project is always the worst half of a project. :-)",
    "truncated" : false,
    "user" : { "contributors_enabled" : false,
        "created_at" : "Thu May 29 20:25:50 +0000 2008",
        "description" : "Freelance software developer fond of Linux, ...",
        ...
      }
  } ]

As you can see this contains everything about the tweet, including which tweet was being retweeted, who they were, what their profile preferences are, where their profile image is, everything you could want really. As a consumer I don’t need to make at least two additional requests to retrieve the details for the involved users, which would lead to an N+1 load on the server.

The only thing that sticks out to me as possibly being bad is the id values; that smells of internal details leaking out. Instead they might use user_url and give a URL for the user’s full profile and so forth. There also doesn’t appear to be any mention of a version for the message but they may handle versioning by using different URLs for different versions of the API.

Things to bear in mind during implementation

Do not reinvent the wheel

For example, methods to achieve caching already exist which utilise the caching mechanisms built into HTTP itself. Use these when possible rather than reimplementing them yourself. Squid and Varnish are two open source solutions that are easy to set up. Learn about the wide array of established HTTP headers available; it is rare that you need to create a custom header.
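
As an example of leaning on what HTTP already provides, here is a sketch of a conditional GET using the standard ETag, If-None-Match and Cache-Control headers. The `cached_response` method and its arguments are hypothetical, and using an MD5 of the body as the ETag is just one simple choice.

```ruby
require 'digest'

# Builds a response for a GET, honouring the standard If-None-Match
# request header. The method and its arguments are hypothetical.
def cached_response(body, if_none_match = nil)
  etag = %("#{Digest::MD5.hexdigest(body)}")
  headers = { 'ETag' => etag, 'Cache-Control' => 'public, max-age=60' }
  # The client (or a proxy such as Squid or Varnish) already holds this
  # version, so answer 304 Not Modified with no body.
  return [304, headers, ''] if if_none_match == etag
  [200, headers, body]
end
```

A client that sends back the ETag it was given receives an empty 304 until the content actually changes.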

Comprehensive documentation

Yes, even for internal use. If your API is not documented it will be hard to use and underutilised. Without documentation you will spend a lot of time explaining how to use your API when you could write it down once in an HTML file and have that explain it to everyone.

Use examples of performing a common task with your API on top of an example for each method. Ideally the reader will be able to run these examples as they read. Ask a designer to throw a template together for you, it doesn’t need to be amazing but it will make reading your documentation a more pleasurable experience.

No breaking changes, ever

Once you’ve published it and it is exposed to the world, you cannot change anything, ever. Not even fix bugs as people will have written workarounds for them. You can add methods or return more data in response but you must never alter what has been published before. People will be relying on it and you will fuck them up.

TL;DR

  • Writing an API is very hard
  • Really learn HTTP
  • Remove choices and complexity by using fewer content types and exposing fewer methods
  • Reduce the quantity of requests by returning more data in responses
  • Give URLs to the consumer, don’t let them create them
  • Documentation, documentation, documentation
Video 13 Jan

Nero - Me And You

I love this track and I noticed they’ve made a Tron inspired video for it!

Link 5 Jan 1 note Cross-site XMLHttpRequest with CORS»

I was working on a personal project over the Christmas period and wanted to make cross-site requests from an AJAX client to an API.

As the client and the API were hosted on different domains they violated the same origin policy implemented by almost all browsers. Modern browsers allow cross-site AJAX requests if the API allows cross-origin resource sharing (CORS).

The resources I found at the time on this were a little bare on details but I managed to piece it together and get something working. I was planning on writing a blog post about how to do it but now I know all the terminology I came across this complete article that explains CORS very well, linking to several useful resources.
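
For a flavour of what the server side involves, here is a minimal sketch written in the style of a Rack middleware. The `Cors` class and `ALLOWED_ORIGIN` are hypothetical; the `Access-Control-*` headers are the standard CORS response headers. A real deployment would need to think about which origins, methods and headers to allow.

```ruby
# A Rack-style middleware sketch: wraps an application and adds CORS
# headers. ALLOWED_ORIGIN is a hypothetical client origin.
class Cors
  ALLOWED_ORIGIN = 'http://client.example.com'

  def initialize(app)
    @app = app
  end

  def call(env)
    cors = {
      'Access-Control-Allow-Origin' => ALLOWED_ORIGIN,
      'Access-Control-Allow-Methods' => 'GET, POST, OPTIONS',
      'Access-Control-Allow-Headers' => 'Content-Type'
    }
    # Browsers send an OPTIONS "preflight" before some cross-site requests.
    return [204, cors, []] if env['REQUEST_METHOD'] == 'OPTIONS'
    status, headers, body = @app.call(env)
    [status, headers.merge(cors), body]
  end
end

api = ->(_env) { [200, { 'Content-Type' => 'application/json' }, ['{}']] }
status, headers, _body = Cors.new(api).call({ 'REQUEST_METHOD' => 'GET' })
puts headers['Access-Control-Allow-Origin']
```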

Text 7 Dec Why Rack should matter to .NET developers

There’s been a lot of talk in the .NET community about Sinatra clones, namely Nancy and Nina in recent times, but there are several others as well. Unfortunately, all these frameworks are missing the underlying reason that Sinatra is awesome, and that is Rack. Sinatra packs a hell of a lot of functionality yet weighs in at 1198 lines of code including documentation. The main reason for that is the fact it is building on top of Rack, which has allowed the authors to concentrate on creating a framework without having to worry about the details of dealing with server requests. So what is Rack, why should you care and why does it matter to you as a .NET developer?

What is Rack?

Rack is a layer of abstraction which sits between servers and frameworks. What it brings as a result is greater interoperability between HTTP server implementations and web frameworks. As the author of an HTTP server, if you get your server to speak Rack you can now host applications written in several Ruby frameworks, Sinatra and Rails being two of the most well known. As a web framework author, if you get your framework to speak Rack your application can now be hosted on several types of HTTP server.

This greatly reduces the barrier to adoption for both servers and frameworks. Rather than the previous method of having to write an adapter for each server or framework you wished to be compatible with, you are now able to write a single adapter for Rack and be hosted by several types of server or host multiple frameworks, depending on which side of the boundary you are sitting.
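
To show how small that shared contract is, here is a sketch of a Rack application that needs no gems at all: any object that responds to `call`, takes the request environment Hash and returns a status, headers and body. The hand-built environment below is deliberately minimal and is an assumption for illustration; a real server supplies many more keys.

```ruby
# A Rack application is any object that responds to `call`, takes the
# request environment Hash and returns [status, headers, body].
app = lambda do |env|
  name = env['PATH_INFO'].sub(%r{\A/hello/}, '')
  [200, { 'Content-Type' => 'text/plain' }, ["Hello, #{name}"]]
end

# Any Rack-compatible server could host `app`; here we call it by hand
# with a hand-built, deliberately minimal environment.
status, headers, body = app.call({ 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/hello/world' })
puts status
puts body.join
```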

Needless to say, the Ruby community has jumped all over this. It removes the duplication of effort that was previously required to get the same level of interoperability allowing the Ruby community to focus on innovation. For example, Sinatra did not exist before Rack. Rack didn’t make Sinatra possible but it made it much easier to create. Dealing with HTTP servers had been done for them. They could instead focus on turning requests into responses in an elegant manner.

Why should you care?

First, let’s talk about testing. Rack is entirely Ruby based at its core. That means that when you are testing your application you can mimic a real HTTP request being invoked and see what response your application turns that into. Let me show you an example of that being done using RSpec against a simple Sinatra application:

get '/hello/:person' do |person|
  "Hello, #{person}"
end

We can test how that application will respond to an HTTP request using RSpec:

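require 'rack/test'

# Server is the Sinatra application class under test
describe Server, 'making a hello request' do
  # Provides get, post, last_response and friends
  include Rack::Test::Methods
  
  # rack-test sends its requests to whatever `app` returns
  let(:app) {subject}
  
  describe 'saying hello to world' do
    before do
      get '/hello/world'
    end
    
    it 'should respond "Hello, world"' do
      last_response.body.should == 'Hello, world'
    end
  end
end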

I have put the fully commented source for these examples up on GitHub if you want further explanation of what is happening here.

We’ve just written an integration test for our application without having to deploy to a server or create our own abstraction of HTTP. This same type of test will work for any application written using a Rack-based framework. Just think of how painful the same thing is in .NET currently. Not only does Rack give us greater interoperability between servers and frameworks, it gives us a way to test them easily as well.

Rack also has the concept of middleware. These are modules built on top of Rack that can then be utilised by any application based upon Rack. This again reduces duplication of effort within the community. Things that can be implemented in the Rack layer, for example caching, are implemented there, and that removes the need for functional clones for each Rack-based framework that wants the same functionality.
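
A middleware is itself just a Rack application that wraps another one. The sketch below is a hypothetical `Timer` middleware that adds a made-up `X-Runtime-Ms` header; it is written from scratch for illustration and would work in front of any application that follows the Rack calling convention.

```ruby
# Middleware sketch: adds a hypothetical X-Runtime-Ms header to the
# response of whatever Rack application it wraps.
class Timer
  def initialize(app)
    @app = app
  end

  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env)
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000
    [status, headers.merge({ 'X-Runtime-Ms' => elapsed_ms.round(2).to_s }), body]
  end
end

app = ->(_env) { [200, { 'Content-Type' => 'text/plain' }, ['ok']] }
status, headers, body = Timer.new(app).call({ 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/' })
puts headers.keys.inspect
```

Because the wrapper only relies on the calling convention, the same class could sit in front of a Sinatra or a Rails application unchanged.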

Why it matters to you as a .NET developer

If you’re thinking “well I don’t do integration tests, write HTTP servers or web frameworks, why should I care?” I’ll tell you why you should. For the frameworks you use, a .NET equivalent to Rack would eliminate a large amount of tedious and brittle code that adds no direct value to a framework. This would allow the developers of frameworks to focus their time on refinement giving you more powerful and stable tools to work with. You may not use it yourself but you will benefit from its existence within .NET.

A .NET equivalent to Rack would open the door for a whole new generation of web frameworks to be written. The barrier for writing a quality web framework would be lowered as the nuts and bolts of request and response handling would be a solved problem. We could see some real innovation in .NET rather than a mass of clones of frameworks from other languages.

The door would also be opened for existing servers, such as Apache, to be used for hosting .NET applications and enable entirely new servers to be created. IIS would have competitors. This alone makes it worthwhile. It would make the choice of .NET as a platform less tied, if at all, to choosing Microsoft as a vendor. Even more so than Mono does today, particularly in the web space.

There is a discussion going on right now about creating or choosing a single, common abstraction around HTTP, like Rack, for .NET. Get involved in this discussion and contribute some of your time. This is an opportunity to vastly improve the .NET ecosystem. We cannot afford to miss it.

Text 19 Nov Use integration tests when working in a new language

This is in some respects a follow up to my previous post in that it has been triggered by our internal project in Ruby. When working in Ruby I consciously lean towards integration tests. Sinatra and Sequel make this really easy, as with rack-test and an in-memory SQLite database it is simple to simulate a full conversation of HTTP requests, and the tests run reasonably quickly too.

How does this relate to learning a new language? When writing anything in a new language you will make mistakes. You are likely to structure your application poorly. At a minimum you will find there is a much better way of achieving the same result. If you have based your work on unit tests you will have a straitjacket tying you to your code, making the effort to heavily refactor it much greater than it needs to be. Tests will need to be moved, rewritten, thrown away and renamed. However, when you use integration tests you are only verifying the behaviour of the whole system, not its moving parts. This means you can replace the internals of your application entirely and your integration tests will not need to be changed and will ensure you have a system that still behaves the same as it did before.
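
The idea can be sketched without any test framework. Below, a hypothetical `check_hello` helper exercises an application purely from the outside, using the Rack calling convention, and the same check passes against a first draft and against a completely restructured version. All the names here are invented for illustration.

```ruby
# The check only exercises the outside of the application: a request goes
# in, a response comes out. `check_hello` is a hypothetical helper.
def check_hello(app)
  status, _headers, body = app.call({ 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/hello/world' })
  status == 200 && body.join == 'Hello, world'
end

# First attempt: everything inline.
first_draft = ->(env) { [200, {}, ["Hello, #{env['PATH_INFO'].split('/').last}"]] }

# After a heavy refactor: the internals are completely different.
class Greeter
  def call(env)
    [200, {}, [greeting_for(person_from(env))]]
  end

  private

  def person_from(env)
    env['PATH_INFO'].delete_prefix('/hello/')
  end

  def greeting_for(person)
    "Hello, #{person}"
  end
end

puts check_hello(first_draft)
puts check_hello(Greeter.new)
```

Had the first draft been pinned down by unit tests on its internals, every one of them would have needed rewriting for `Greeter`.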

Am I saying that you should never use unit tests in a new language? Of course not. However, you should choose the right level of test to match your level of understanding. Once your application’s structure has ossified it makes sense to exercise the complex parts directly through unit tests. The second you make that shift to using unit tests you are setting your structure in stone, so you want to make sure that does not happen before it has to.

Text 18 Nov 30 notes Simple tools and fundamental principles

We are doing an experiment in using Ruby for an internal project at work. I am the most experienced in Ruby so I’m leading the choices of gems and so forth, and the team questioning those choices has led to a bit of a personal epiphany.

I know that for a long while I have preferred simple tools and components. Simple tools tend to be more enabling as they give you control over precisely what is happening in your application whilst managing the boilerplate code for you. I’ve also focused my personal development and reading on fundamental principles to give me a solid grounding in concepts that can then be applied to any situation or language.

These opinions are being reflected in the tools I use for my Ruby development and therefore what I am encouraging the other developers at my company to use: Sinatra and Sequel. I enjoy the power you get from both to exploit all that is available from HTTP and SQL. I find it much easier to apply my knowledge of HTTP gained through reading about REST, though I doubt I’ll ever write a truly RESTful application. Things such as returning the correct HTTP codes and headers to enable client and proxy caching are ridiculously easy to do when compared to ASP.NET MVC. Being able to leverage my knowledge of SQL allows me to fine tune that vital query yet Sequel will still give me back easily consumable objects.

Choosing these simple tools for our task will help the team to learn Ruby and build an application, the real point of the task, rather than spending a large period of time learning how to use a more powerful framework such as Rails. The simpler tools should also let them exercise more of their existing knowledge, again allowing them to focus on learning the Ruby language rather than being distracted by Ruby frameworks.

As for me, I’m happy I’ve chosen this method of working and learning. I think it will stand me in good stead for working in any language I may want or need to learn in the future. It will be interesting to hear what the team thinks of this approach as we proceed.

Link 14 Oct The Long Beard's Revenge»

Very interesting article on a reduction in contribution to open source and the possible underlying forces behind it.

Link 11 Oct 10 Common Mistakes Made by API Providers»
