Tuesday, May 7, 2013

How the Grimwire Search Program Discovers New Applications and Indexes their Content

The Search application on Grimwire recognizes when new applications enter the environment and fetches data from them to populate its index. When applications are closed, their data is automatically removed from the index. This post explains how this works, both so that you can take advantage of the Search app and so that you can use similar techniques in your own applications.

The Config Server

All of the environment's active configuration is hosted at httpl://config.env. It hosts the /apps collection, which provides the "Applications" interface and the JSON configs of the active applications.
The only other environment server at this time is httpl://storage.env, a sessionStorage wrapper.
The /apps collection also hosts an event-stream which emits an "update" event when applications are added, removed, or reloaded (which happens on every config change).

// listen for new applications
local.http.subscribe('httpl://config.env/apps')
    .on('update', updateSources);
updateSources();

This is how the Search application becomes aware of new apps in the environment. If you're not familiar with Server-Sent Events, this post gives a brief introduction.
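The snippets in this post do not show how data from closed applications is removed. One way to handle it is to compare the startpages in the latest /apps listing against the sources already indexed each time "update" fires. The sketch below assumes a hypothetical diffSources() helper and the config shape used later in updateSources() (an object keyed by app ID, each entry carrying a startpage); it is an illustration, not code from the Search app.

```javascript
// Hypothetical helper: work out which sources appeared or disappeared
// between the previously known startpages and the latest /apps listing.
function diffSources(knownUrls, cfgs) {
    var current = Object.create(null); // startpages present in the new listing
    for (var appId in cfgs) {
        if (Object.prototype.hasOwnProperty.call(cfgs, appId) && cfgs[appId].startpage)
            current[cfgs[appId].startpage] = true;
    }

    var known = Object.create(null); // startpages that are already indexed
    knownUrls.forEach(function(url) { known[url] = true; });

    return {
        // in the listing but not yet indexed
        added: Object.keys(current).filter(function(url) { return !known[url]; }),
        // indexed but no longer in the listing
        removed: knownUrls.filter(function(url) { return !current[url]; })
    };
}
```

The "added" list could be passed to addSource(), while the "removed" list identifies sources whose documents should be dropped.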

Finding the Data to Index

Every application defines a "startpage," which is the primary entry-point for its APIs. The Search application uses that URL to identify each source.

function updateSources() {
    local.http.dispatch({
        url:'httpl://config.env/apps',
        headers:{ accept:'application/json' }
    }).then(
        function(res) {
            var cfgs = res.body;
            for (var appId in cfgs)
                addSource(cfgs[appId].startpage); // <-- here
        },
        function(res) {
            console.log('Failed to fetch active applications');
        }
    );
}

Grimwire defines a custom rel-type for exported searchable data: http://grimwire.com/rel/index. The Search app checks each application startpage for a link to that relation.

function resolveSourceIndex(startPage) {
    return local.http.navigator(startPage)
        .relation('http://grimwire.com/rel/index')
        .resolve(); // just give me the URL
}

If you're not familiar with the navigator, there's a small introduction here. This function generates one HEAD request to the startpage URL, then checks the links returned in the response's Link header for one with rel="http://grimwire.com/rel/index".
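To make the lookup concrete, here is a rough sketch of what finding a relation in a Link header involves. It is not the navigator's implementation: findLinkByRel() is a hypothetical helper, the httpl://mail.usr host is made up, and the parser is simplified (it assumes no commas or semicolons inside quoted parameter values).

```javascript
// Hypothetical, simplified Link header lookup.
// Returns the target URL of the first link carrying the given rel, or null.
function findLinkByRel(linkHeader, rel) {
    // Split on commas that are followed by the "<" that opens the next link.
    var entries = (linkHeader || '').split(/,\s*(?=<)/);
    for (var i = 0; i < entries.length; i++) {
        var m = /^\s*<([^>]*)>(.*)$/.exec(entries[i]);
        if (!m) continue; // not a well-formed link entry
        var relMatch = /;\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))/i.exec(m[2]);
        if (!relMatch) continue; // link has no rel parameter
        // A rel parameter may list several space-separated relation types.
        var rels = (relMatch[1] !== undefined ? relMatch[1] : relMatch[2]).split(/\s+/);
        if (rels.indexOf(rel) !== -1) return m[1];
    }
    return null;
}
```

For a header such as <httpl://mail.usr>; rel="self", <httpl://mail.usr/index>; rel="http://grimwire.com/rel/index", looking up the Grimwire index relation returns httpl://mail.usr/index.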

Getting the Data

The next step is to get the docs and to watch them for updates in the future.

function addSource(sourceUrl) {
    resolveSourceIndex(sourceUrl).succeed(function(indexUrl) {
        // fetch the documents now...
        getSourceDocuments(sourceUrl, indexUrl);
        // ...and fetch them again whenever the index emits an "update" event
        local.http.subscribe(indexUrl).on('update', function() {
            getSourceDocuments(sourceUrl, indexUrl);
        });
    });
}

This allows applications to trigger re-indexing so they can maintain freshness.
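Because each "update" event re-fetches a source's whole index, the dataset has to replace that source's earlier documents rather than append to them. A minimal sketch of that bookkeeping follows, assuming a hypothetical in-memory store keyed by source URL; how the Search app actually stores its documents is not covered in this post.

```javascript
// Hypothetical store: documents are grouped by the source that exported them.
function createDataset() {
    var docsBySource = Object.create(null);
    return {
        // Replace every document previously indexed for this source.
        setSourceDocuments: function(sourceUrl, items) {
            docsBySource[sourceUrl] = items.slice();
        },
        // Drop a source entirely, e.g. when its application is closed.
        removeSource: function(sourceUrl) {
            delete docsBySource[sourceUrl];
        },
        // Flatten all sources into one list for searching or rendering.
        allDocuments: function() {
            var all = [];
            for (var sourceUrl in docsBySource)
                all = all.concat(docsBySource[sourceUrl]);
            return all;
        }
    };
}
```

Keying by source keeps re-indexing idempotent: calling setSourceDocuments() twice for the same source leaves only the latest documents.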

getSourceDocuments() issues a GET request for JSON. It expects the response to be an object whose items attribute is an array of objects, according to this schema:

{
    items: required array of {
        href: required url (primary key),
        title: required string,
        desc: optional string,
        category: optional string
    }
}

This data will be used to populate the search dataset and to render results. The use of this schema is implied by the "http://grimwire.com/rel/index" relation, which is why the custom relation is used instead of the standard "index" relation.
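An application exporting an index may want to check its documents against the schema before serving them. The following is a sketch of such a check, assuming a hypothetical validateIndexDocument() helper that is not part of Grimwire:

```javascript
// Hypothetical validator for the index schema described above.
// Returns an array of error messages; an empty array means the document is valid.
function validateIndexDocument(doc) {
    var errors = [];
    if (!doc || !Array.isArray(doc.items)) {
        errors.push('items must be an array');
        return errors;
    }
    doc.items.forEach(function(item, i) {
        var prefix = 'items[' + i + ']';
        if (!item || typeof item !== 'object') {
            errors.push(prefix + ' must be an object');
            return;
        }
        // href is the primary key, so it must be a non-empty string
        if (typeof item.href !== 'string' || !item.href)
            errors.push(prefix + '.href is required');
        if (typeof item.title !== 'string')
            errors.push(prefix + '.title is required');
        // desc and category are optional, but must be strings when present
        ['desc', 'category'].forEach(function(key) {
            if (item[key] !== undefined && typeof item[key] !== 'string')
                errors.push(prefix + '.' + key + ' must be a string');
        });
    });
    return errors;
}
```

A document such as { items: [{ href: 'httpl://mail.usr/1', title: 'Inbox' }] } passes with no errors, while an item without a title produces one error message.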

Wrapping Up

The two tools in use here are Server-Sent Events for syncing and the navigator for discovery. By listening to the event-stream of the config server's /apps collection, the Search app can react to new programs. Likewise, by listening to the indexes of the apps, Search can keep its dataset fresh. Meanwhile, by looking for the "http://grimwire.com/rel/index" relation in Link headers, Search can discover what data to index and expect a specific schema.

These techniques can be used in your applications, and are not restricted to Worker servers. Server-Sent Events are standardized as part of HTML5, and the Link response header is defined by the Web Linking specification (RFC 5988), so any remote host can leverage the same tools.

Read more at github.com/grimwire/grimwire.
