The Case for Internet Agents on Mobile Devices

In this article, I want to put forward an approach to writing server-side code that mobile phones can execute to perform long-running, periodic, and battery-intensive tasks. The aim is that no monolithic centralised server is needed to host that code, and that the data required to execute it (e.g. usernames and passwords, or credit-card information) is harder to obtain or read than in current systems.

I’m going to start with a bit of philosophy and history, which will probably be old news to many, but I think it sets the scene. The new stuff happens from the second section onwards.

Information Ownership as it Stands

The original premise of the Internet was freedom and decentralisation. Anybody could have their own server and run a mail system, or an HTTP server, or whatever. This is still an ideal that many netizens hold dear; however, the monetisation of the Internet has meant that it evolved in a very different way. The vast majority of people don’t care about having their own server, and wouldn’t know how to set one up if they did. People want a service provided to them that is easy to use. Internet companies have capitalised on this, building whole walled-in ecosystems in which their users exist.

Facebook, Google, and Apple are the biggest players in this space, and indeed the biggest innovators on the Internet, because they have a financial model that justifies the development effort. This has, however, come with some compromises. The security, stability, and even the availability of your Internet persona and data are controlled by single entities. On the whole, these companies do a great job of protecting your (their) data and have good reliability.

But when something does go wrong (e.g. the recent Sony data breach) it is a massive event, affecting far more than a few dozen or a few hundred people. More importantly, centralising the information creates a gold mine (almost literally) for the companies that control it, and for governments wishing to get access.

While projects like Diaspora are laudable in their efforts to give control back to the people, there are numerous challenges. Foremost, and one I don’t think we should bother fighting, is the innovation financial model. Diaspora is producing open-source software, which doesn’t directly create income (at least, not on the scale that massive corporations want), and they have to produce a competitive product that people want to use, while competing with companies that have orders of magnitude more resources. Simple logic states that the big boys will win, unless there is a compelling reason for people to choose the open route. Data privacy is the obvious choice, but at the moment almost everyone is blasé about information security.

The final challenge, and the topic I want to talk about in this post, surrounds ease of use. People don’t run their own servers because they don’t know how to, and even for those who do, servers are laborious to set up and cost money to run. Even people who take security (somewhat) seriously falter. I ran the email and web sites of my family for some time, and eventually gave up, moving them to Google Apps, which I still feel vaguely embarrassed about.

The Case for Mobile Agents

What if it wasn’t hard to deploy services, and it could be done in such a way that only the devices and people you chose had access? Cloud computing provides the opportunity. For the moment, most people use cloud computing services in a manner similar to traditional web servers: a monolithic server that services all users from the same database and computer(s), because that’s how we’ve been taught to use them.

Instead, I can see a mechanism whereby your phone or laptop simply registers code with a cloud computing service, and your needs are met that way. This could be managed by the device itself, without user interaction, now that there are APIs for creating and consuming cloud computing services. Furthermore, the interactions could (most of the time) be encrypted, such that only the device creating the agent can retrieve the information, or even get the data required for processing a request, thereby making the agent less of a target for hacking.

I find this especially interesting for mobile phones. Mobile phones are getting more powerful every day, but they are held back by battery limitations. On the server, developers can afford to be wasteful or expansive with resources; less so when writing mobile apps. The traditional way to resolve these issues has been to perform much of the processing on a server, and use the mobile device as an I/O platform to access the web data. This works well for platforms that own the whole experience, but can be difficult for mashup-style applications that may not have the resources to run enterprise servers. Also, there are security implications inherent in releasing information to a third-party server.

In Practice

I’ll take an example from my own application NodeDroid, which screen scrapes telco websites to report phone usage. NodeDroid stores the username and password on the phone, accesses the telco’s website to download the HTML pages, then parses the usage information out of them (which is somewhat processor intensive).
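To make the parsing step concrete, here is a minimal sketch of pulling usage figures out of a downloaded page. It is written in Python purely for illustration and is not NodeDroid’s actual code; the markup (a `td` cell with `class="usage"`) and the `UsageParser` name are assumptions, since every telco lays its pages out differently.

```python
from html.parser import HTMLParser


class UsageParser(HTMLParser):
    """Collect the text of every <td class="usage"> cell (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._in_usage_cell = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "td" and ("class", "usage") in attrs:
            self._in_usage_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_usage_cell = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a usage cell.
        if self._in_usage_cell and data.strip():
            self.values.append(data.strip())


# Stand-in for a page downloaded from the telco (made-up content).
page = (
    "<table>"
    '<tr><td>Data</td><td class="usage">1.2 GB</td></tr>'
    '<tr><td>Calls</td><td class="usage">42 min</td></tr>'
    "</table>"
)

parser = UsageParser()
parser.feed(page)
print(parser.values)  # ['1.2 GB', '42 min']
```

A real scraping session typically involves logging in and walking several pages like this one, which is where the processing and bandwidth cost on the phone comes from.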

It would be awesome if the telco provided an API to do this without screen scraping, but by and large they don’t. At the moment, all of the processing happens on the phone. I toyed with the idea of setting up a service to do the fetching for the phones, but I haven’t for two reasons. The first is that I don’t make any money out of this, so I don’t see why I should have to take on the expense of running a server. Second and more importantly, I didn’t want to see people’s passwords.

If I had a server, the phones would have to send the username/password pair so that my server could pass them on to the telco and fetch the data. Call me an idealist, but there could be a better way. I’ve ranted about this before; in summary, I have always felt uncomfortable about spinning up another web application every time I want a new service.

Google provide a cloud computing service called Google App Engine (GAE). Every person with a Google account (which is almost everyone who has an Android phone) can create their own cloud apps with very generous free limits. The phone could create a special web app, called an ‘agent’, specifically for fetching its usage data, and deploy it to GAE. The deployed agent would include the username/password needed to access the telco, but for security’s sake these would be encrypted using public key cryptography, with the private key kept on the phone, so that even if somebody broke into the web app on GAE, they wouldn’t be able to retrieve the info. When the phone wanted to fetch information, it would send the key to the agent for the duration of the exchange. The agent would decrypt the parameters required to execute the function, return the data to the user, and then forget the key. By doing this, we have achieved a number of advantages:

The workload of fetching the information has been offloaded to the server, saving on battery power and bandwidth usage, as only the information required is sent back to the phone, rather than the whole screen scraping session.
The user’s information is kept secure on the server, as it is only readable during an individual invocation from the device.
The user’s phone is no longer storing sensitive information, and if somebody were to steal the phone, they’d be able to fetch the user’s usage (by invoking the agent) but not retrieve the user/pass to access other information on the telco’s site. They’d need to break into both the phone and the server to get it, and even then they would only get one user’s details.
The agents could be distributed to run on any cloud computing provider, or on the user’s own server at home if the user wanted to be extra safe with their information.
In the event of one cloud provider going down, the phone could simply recreate the agent somewhere else to achieve reliability.
The server agent could run background or repetitive tasks, saving the phone from having to do that. Notifications could be delivered via C2DM (Android’s Cloud to Device Messaging), even if the source service didn’t support push notifications itself.
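
The exchange described above (the agent holds only encrypted credentials, the phone lends it the key for one request) can be sketched roughly as follows. This is an illustration under stated assumptions, not the system I am building: it is in Python rather than Android code, and because Python’s standard library has no public key cryptography, it swaps in a toy symmetric scheme built from HMAC-SHA256. A real agent should use a vetted cryptography library. The names `seal`, `unseal`, `Agent`, and `fake_telco_fetch` are all made up for the example.

```python
import hashlib
import hmac
import secrets


def _subkey(key: bytes, label: bytes) -> bytes:
    """Derive a separate key for each purpose (encryption, authentication)."""
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 over nonce + counter. Illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Runs on the phone: encrypt and tag the credentials before deploying them."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def unseal(key: bytes, blob: bytes) -> bytes:
    """Runs on the agent, per request: verify the tag, then decrypt."""
    if len(blob) < 48:
        raise ValueError("blob too short")
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered blob")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class Agent:
    """Runs in the cloud: stores the sealed blob, never the key."""

    def __init__(self, sealed_credentials: bytes, fetch):
        self.sealed_credentials = sealed_credentials
        self._fetch = fetch  # hypothetical callable: (username, password) -> summary

    def handle_request(self, key: bytes) -> str:
        # The key and the decrypted credentials live only in local variables,
        # so nothing readable remains on the agent once this call returns.
        username, password = unseal(key, self.sealed_credentials).decode("utf-8").split("\n", 1)
        return self._fetch(username, password)


def fake_telco_fetch(username: str, password: str) -> str:
    """Stand-in for the real screen scraping session (made-up result)."""
    return f"usage for {username}: 1.2 GB"


phone_key = secrets.token_bytes(32)  # generated and kept on the phone
agent = Agent(seal(phone_key, b"alice\nhunter2"), fake_telco_fetch)
print(agent.handle_request(phone_key))  # usage for alice: 1.2 GB
```

Somebody who breaks into the agent alone finds only the sealed blob, and a request made with the wrong key fails instead of returning data. Note that Python cannot guarantee the key is wiped from memory afterwards, so “forgetting” here only means the agent never stores it.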

I like it as an approach, and I am currently experimenting with a system to do exactly this on Android. It’s not perfect, and there are still some security concerns, but it’s a lot safer, and a lot more personal and private than most approaches used today. In order for this to work, it will have to become easier to write agent code than it is to write a server and create the infrastructure to host it. Agents will be extensions of the phone application, bundled within the application, and won’t require any special servers. I think I can do that. I’ll post more once I have a running system.