Monday, May 27, 2013

Composition through REST

TL;DR


By excluding application code from the client side of the Web, REST messaging can become the exclusive driver of application behavior. Generic clients with finer-grained control of the document can then introduce compositional qualities, improving user control over application behavior.

Compositional Architectures


The basis of composition is a unified interface, and unification is created through constraints. Universal I/O connectors simplify configuration and can result in emergent qualities. The go-to example is Unix's files and streams: components using streams can combine with pipes, resulting in more compact and reusable software.

Plan9 expanded on the Unix premise with a more rigorous unified interface ("everything is a file"), a networked messaging protocol, dynamic file namespace construction through layering, and a self-wiring GUI. It can reconfigure the view of resources -- keyboards, monitors, CPUs, storage, NICs, printers, software -- for each process by layering namespaces. That mechanism is used to enforce permissions, alter software behavior, and bridge machines over the network (without process awareness). By treating every interface as a file, Plan9 is able to apply tools and policies across broad slices of the system, often simplifying its design in the process.

The Web Protocol


REST defines a universal networked namespace of resources, application-driven navigation with hypermedia, and content-type negotiation, with constraints to allow caching, layering, and uncapped client counts (due to statelessness). It's a distributed data-store, but the resources are defined by applications which can apply their own logic (like 9P). The unified interface then makes it possible for clients to be architecturally generic -- but only to a point. Despite the compositional qualities of REST, Web applications don't let end-users take advantage of them. There is no equivalent to the pipe and stream.

Why is this? My belief: browsers don't provide managed, cross-application access to client-side resources. Instead, rooted in the document-browsing model, they've relied on complete isolation in the GUI environment. This policy forms a tight binding to the host of the application, limiting configurability to the options provided by the application itself. By not continuing to refine the granularity of the client architecture (possibly due in part to failures of features like frames), we've missed opportunities to create more powerful uniform controls in the browser.

In this post, I'm going to argue for altering the role of client-side code. By keeping application logic out of the client (zero out-of-band knowledge) and using Javascript to extend the browser's behaviors, all application-level interactions are forced into REST messages. This provides the opportunity to introduce compositional tools, possibly mimicking concepts from Unix and Plan9.

Client as Browser, Server as Application


This approach requires the frontend to fully separate from the backend. The document software focuses on response interpretation, management of resources (such as the DOM or localStorage), traffic permissioning, and other system properties. Some of these responsibilities could be implemented without scripts (in the browser itself), but the goal is not to eliminate client Javascript - just to keep it out of application-specific behaviors (which are server-side concerns).

In this model, users choose between pages to get different security profiles, UI capabilities, and software configurations, then load interfaces for applications (defined by servers) into the environment. By sharing behaviors (particularly around response interpretation), different pages can reuse applications. One obvious use-case is responsive design - clients can be selected to fit the medium, and applications can reduce layout concerns.

Content Security Policies can be used to embed untrusted content without sanitization. This makes it possible to implement frame-like behaviors at a finer grain with client code. The strike against CSP has been the difficulty of delivering richness without Javascript. This, too, is solved by the generic client, which interprets directives from the response (status, headers, content-type behaviors, HTML directives) to safely produce high-level UI behaviors.
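
To sketch the idea (the exact directives here are illustrative, not a recommendation), a generic client's page could be served with a policy that permits its own scripts and styles while denying embedded responses any executable code:

Content-Security-Policy: default-src 'none'; script-src 'self'; style-src 'self'; img-src *; connect-src *

Under a policy like this, HTML arriving in responses can be inserted into the document without its inline scripts ever running, leaving the client's interpretation of statuses, headers, and directives as the only source of behavior.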

(It should be noted that iframes, even with some recent additions, are not suitable containers for untrusted code until they execute in a separate thread.)

Available Tools


The concept of "layering" makes it possible to introduce proxies into interactions with an application. This can be (and often is) used to add or modify behaviors, cache intelligently, anonymize, log, run security policies, and separate UIs from data-services (and layouts from UIs). Giving the client simple controls over this behavior - perhaps using a piping language - should result in more compact and flexible programs.

Hypermedia is another important tool to examine further. With data-uris, links can now designate static resources (a good opportunity for client-supplied data). Increasing the kinds of resources which URIs can target (by adding new schemes) increases the opportunities for unified controls.
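
For instance, a link can carry its representation inline rather than pointing at a host (the payload below is made up):

<a href="data:text/csv;charset=utf-8,name%2Cemail%0Abob%2Cbob%40example.com" rel="item" title="bob">bob</a>

A generic client can treat this like any other URI: dereferencing it simply yields the embedded CSV, no host required.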

The Link header centralizes programmatic access to hypermedia and creates a mechanism for exporting configuration (with HEAD requests). Link attributes - particularly "rel" and "title" - can describe an href with application-level semantics. Custom rel attributes (using URLs) can apply additional requirements on the target resource and its supplied representation, such as object schema definitions - important for compatibility negotiation. URI Templates parameterize the links, further reducing out-of-band knowledge and simplifying client design. In combination, these techniques enable programmatic navigation.
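
A sketch of what such a header might look like (the custom rel URL and the template are illustrative, not taken from a real service):

Link: <http://myhost.com/users>; rel="collection"; title="users",
      <http://myhost.com/users/{id}>; rel="item http://myhost.com/rel/user"; title="user"

The second entry is a URI Template: the client fills in "id" before issuing a request, and the custom rel URL signals what requirements (such as a schema) the target's representation is expected to satisfy.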

Using link attributes to navigate hypermedia allows clients to parameterize views of resources. For instance, consider the following navigations:

HEAD http://myhost.com
  HEAD rel=collection title=users (http://myhost.com/users)
  GET  rel=item       title=bob   (http://myhost.com/users/bob)

As far as the consuming program is concerned, there is some initial URL, the "collection:users" navigation, then the "item:bob" navigation. Thus, its view of the process is:

HEAD (x)
  HEAD rel=collection title=users (y)
  GET  rel=item       title=bob   (z)

By expecting host URLs in the "x" parameter, programs can be configured to consume different services by handing them a "dot com," a familiar and simple tool for users.
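
A minimal sketch of such a program in client Javascript follows. It is not Grimwire's actual API; fetch() and the naive Link-header parser stand in for whatever Ajax layer the client provides.

// Parse 'Link: <url>; rel="x"; title="y", ...' into an array of link objects.
// Naive: assumes no commas inside URLs and quoted attribute values.
function parseLinks(header) {
  return (header || '').split(',').map(function (part) {
    var link = { href: part.match(/<([^>]*)>/)[1] };
    part.replace(/(\w+)="([^"]*)"/g, function (_, key, value) { link[key] = value; });
    return link;
  });
}

// Follow the "collection:users" then "item:bob" navigations from a host URL (x).
function navigate(x) {
  return fetch(x, { method: 'HEAD' })
    .then(function (res) {
      var users = parseLinks(res.headers.get('Link')).filter(function (l) {
        return l.rel === 'collection' && l.title === 'users';
      })[0];
      return fetch(users.href, { method: 'HEAD' });
    })
    .then(function (res) {
      var bob = parseLinks(res.headers.get('Link')).filter(function (l) {
        return l.rel === 'item' && l.title === 'bob';
      })[0];
      return fetch(bob.href); // GET the final representation
    });
}

// Configure the program by handing it a "dot com":
navigate('http://myhost.com');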

A notable recent addition to HTML5 is Server-Sent Events, which enables realtime server-to-client updates across the network (though at some cost of scale, as each event-stream requires a persistent connection). This is a useful mechanism for syncing the client with the application without requiring an initiating user-action.
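
Subscribing to a stream takes very little client code (the endpoint path below is hypothetical):

// Subscribe to a hypothetical event-stream resource exposed by the application
var stream = new EventSource('http://myhost.com/events');
stream.onmessage = function (e) {
  // e.data carries the payload of each 'data:' line pushed by the server
  console.log('update from the application:', e.data);
};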

Generic Client Designs


The available tools can be exploited in a variety of ways, each with its own effects. The client I'm developing (Grimwire) has been through a number of iterations.

The first iteration was command-line based, included "fat" pipes (response->request pipes that defined the content type), and assigned names to individual "frames" on the page which could be used as aliases to the host that delivered the response. The CLI looked like this:

users> get http://myapp.com/users [json]
friends> get users/pfraze/friends [json]
get users/jdoe [json] post friends
delete friends/harry

This original "fat" pipe technique (defined above by the brackets, eg "[json]") did not leverage proxies. Instead, it issued each request in the pipeline as individual requests from the client, using the body of the previous response as the body of the subsequent request.

The "frame" which issued the request would receive the final response in a "response" DOM event. At that time, I allowed application client code (as widgets) to handle the response and alter the DOM directly. I also had the widgets themselves export Web APIs (as servers) which made it possible to do things like checking a box in a table with ck mail/5. You can find a demo video here.

(In-document servers, I should note, are added through an Ajax library which examines the protocol and routes "httpl://" requests to Javascript functions. As mentioned above, this increases the usability of URIs, allowing them to target client resources - like the DOM - and applications which live in Web Workers - described below.)
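
Here is one way such a dispatch() might examine the protocol, continuing the hypothetical helpers above (this is not the actual library's API):

var localServers = {}; // hostname -> handler function

function registerLocal(host, handler) {
  localServers[host] = handler;
}

function dispatch(request) {
  var match = /^httpl:\/\/([^\/]+)(\/.*)?$/.exec(request.url);
  if (match && localServers[match[1]]) {
    // route to an in-document (or Worker-hosted) Javascript function
    return Promise.resolve(localServers[match[1]](request, match[2] || '/'));
  }
  // otherwise fall through to a normal HTTP request
  return fetch(request.url, { method: request.method, headers: request.headers, body: request.body });
}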

I dropped the first iteration's approach for a couple of reasons. The first is that I think the CLI can be dropped in favor of something simpler and GUI-driven without losing precision. My feeling is that the CLI, if needed, should be an application, not a client feature.

The other issue was whether it's right to export a server-interface for the DOM. Doing so is in line with Plan9's approach to the GUI (where windowing and rendering are controlled, of course, by files) and matches the intent of a universal interface, but it seems to break the mental model of the browser without gaining much value. It confuses the server/client relationship a bit to handle requests by issuing subsequent requests on the DOM (what does the response do?), and remote servers would be unable to access the same tools. The async nature of requests also made them somewhat less convenient to use for DOM manipulation.

In lieu of the widget-servers, I started using Web Workers to host "local" servers under the "httpl://" namespace. This forms an on-demand collection of apps within the session which can give strong guarantees of privacy without requiring private webhosts. Shared Workers, once they have better support, will also be available to bridge between browser tabs. Ideally, Workers could be used to sandbox untrusted code, which would then run with the same privacy guarantees -- but the security of that approach isn't proven yet.
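
As an illustration of the Worker side (the message format is an assumption, not the real protocol):

// --- worker.js: a "local" server living in a Web Worker ---
self.onmessage = function (e) {
  var request = e.data; // { method, url, headers, body }
  if (request.method === 'GET') {
    self.postMessage({ status: 200,
                       headers: { 'content-type': 'application/json' },
                       body: JSON.stringify({ hello: 'from httpl://myapp' }) });
  } else {
    self.postMessage({ status: 405, reason: 'method not allowed' });
  }
};

// --- page side: register the Worker under the local namespace ---
var worker = new Worker('worker.js');
registerLocal('myapp', function (request) {
  return new Promise(function (resolve) {
    worker.onmessage = function (e) { resolve(e.data); };
    worker.postMessage(request);
  });
});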

Servers in the local namespace are considered elevated regarding Hypermedia discovery - it's assumed that any links they export should be consumed. For instance, the search program looks through all local servers for a rel="http://grimwire.com/rel/index" link, then automatically GETs and indexes that content. The registry of auto-discovered workers could also include links to remote hosts set by the user. This should provide high-level control over software composition with a fairly simple configuration-step (choosing active programs).
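
A sketch of that discovery step, reusing the hypothetical helpers from earlier (indexDocument() is a made-up hook for the search program):

// For each local server, HEAD it and follow any exported "index" links.
function discoverIndexes(hosts) {
  hosts.forEach(function (host) {
    dispatch({ method: 'HEAD', url: 'httpl://' + host }).then(function (res) {
      parseLinks(res.headers['link']).forEach(function (link) {
        var rels = (link.rel || '').split(' ');
        if (rels.indexOf('http://grimwire.com/rel/index') !== -1) {
          dispatch({ method: 'GET', url: link.href }).then(indexDocument);
        }
      });
    });
  });
}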

Rather than have DOM regions export interfaces, I'm testing embedded information which can flow through operations. You can find older demo videos of this here and here. In the current build, dropping the "Email" operation onto embedded data results in a GET request to the mail application for an interface to insert. The "Email" operation is itself just an exported link with the "http://grimwire.com/rel/transform" rel attribute. The generated request includes serialized data about the drop target, allowing the application to seamlessly populate itself.

In the future, I want to experiment with more layering (proxy-based) tools, and take stronger advantage of Hypermedia discovery rather than configuration. Currently, I use JSON files to define application packages; I think they could be factored out. Additionally, I need to start forming an approach to security and permissioning. Doing so would enable untrusted applications to enter the user's environment, and could expand the possibilities for multi-user environments.

Final Words


Adding a new constraint to the Web is not easy, particularly when it takes away Document API access from application developers. Some modern projects, such as Meteor, actually push for more application code in the frontend, not less. However, bucking this trend -- at the cost of a little added pain -- could result in significant payoffs.

The quality of the system is largely dependent on the capabilities of the client, but the fundamental concepts (server-side applications, common response interpretation behaviors) are homogeneous enough to translate between many different clients. By using pages as extensions to the browser, users can customize the client on-demand and choose the environment that suits their needs.

As we move into new computing interfaces (VR/AR, motion tracking, voice-recognition), client-side versatility will become increasingly important. Moreover, effective application of these techniques could address gaps in the architecture of the Web (privacy, user-extensibility). Improved granularity could result in more compact and reusable applications, and create opportunities for effective compositional techniques on the Web.

If you'd like to know more about Grimwire, visit the documentation in the nightly build, the Google Group, or the repository, or follow me on Twitter.
