I floated a site into the cloud, and it didn't rain down in chunks...
My baby steps migration
By Damon Hart-Davis • In Cloud • At 12:31 GMT 18th March 2011
WAR on the Cloud, part 1:
The "cloud" is still somewhat in its novelty phase, much as virtualisation and (say) XML were in years gone by, when simply waving them at an application was supposed to make all your troubles magically drop away, like sessions on a crashing web server.
All of these technologies do have their value, just not as a panacea.
For example, in the cloud you are likely to pay for every resource you use – CPU, bandwidth, storage, etc – which makes you vulnerable to ambush bills, in the worst case from a targeted DoS attack on your cloud-hosted sites, whereas typical conventional hosting arrangements have most costs capped.
Also, in the cloud, access to storage is a complicated affair: if you have a variable number of front-ends, how do they coordinate safe access to writable data? For operations that suit a database, its concurrency controls may be sufficient, but not every app can afford the transaction overhead. And getting data in and out of the cloud is likely to be chargeable too.
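One common answer to the coordination question is optimistic concurrency: each record carries a version number, and a write is accepted only if the record is still at the version the writer read. The sketch below is purely illustrative. The class `VersionedStore`, its method names and the in-memory map standing in for a shared store are all assumptions, not anything from the site or from AWS.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical sketch of optimistic concurrency control.
 * The in-memory map is only a stand-in for a store shared by several front-ends.
 */
final class VersionedStore {
    /** Immutable snapshot of a record: its value plus a version counter. */
    static final class Versioned {
        final long version;
        final String value;

        Versioned(long version, String value) {
            this.version = version;
            this.value = value;
        }
    }

    private final ConcurrentMap<String, Versioned> records = new ConcurrentHashMap<>();

    /** Returns the current snapshot; a missing record reads as version 0 with a null value. */
    Versioned read(String key) {
        Versioned current = records.get(key);
        return (current == null) ? new Versioned(0L, null) : current;
    }

    /**
     * Writes newValue only if the record is still at expectedVersion.
     * Returns false if another writer got there first, so the caller can re-read and retry.
     */
    boolean writeIfUnchanged(String key, long expectedVersion, String newValue) {
        boolean[] written = {false};
        // compute() runs atomically per key in ConcurrentHashMap.
        records.compute(key, (k, current) -> {
            long currentVersion = (current == null) ? 0L : current.version;
            if (currentVersion != expectedVersion) {
                return current; // stale write: leave the record untouched
            }
            written[0] = true;
            return new Versioned(currentVersion + 1, newValue);
        });
        return written[0];
    }
}
```

A front-end that loses the race gets `false` back and has to re-read before trying again, which avoids holding a lock across the network at the price of occasional retries.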
I have a moderately busy website that gets a few million unique visitors per year (depending on how you count 'em) and that I have already distributed to mirrors in data centres in the UK, US, India and Australia, with a dedicated (expensive) machine in each location on Solaris and Linux. This hand-crafted "cloud" already works fairly well. Downtime on a single instance doesn't have much impact on the total number of daily visitors, and users are automatically bounced to a "closer" mirror where it will improve responsiveness.
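To make the "bounce to a closer mirror" idea concrete, here is a minimal sketch that assumes the visitor's coarse region is already known (for example from a geo-IP lookup, which is not shown). The region codes, the `example.org` host names and the class `MirrorChooser` are placeholders for illustration, not the site's real configuration or logic.

```java
import java.util.Map;

/** Hypothetical sketch: pick a mirror host for a visitor from a coarse region code. */
final class MirrorChooser {
    // Assumed region codes and placeholder host names, one per mirror location.
    private static final Map<String, String> MIRROR_BY_REGION = Map.of(
            "UK", "uk.mirror.example.org",
            "US", "us.mirror.example.org",
            "IN", "in.mirror.example.org",
            "AU", "au.mirror.example.org");

    /** Returns the mirror for the visitor's region, or the current host if none is known. */
    static String chooseMirror(String visitorRegion, String currentHost) {
        if (visitorRegion == null) {
            return currentHost; // Map.of() maps reject null keys, so guard first
        }
        return MIRROR_BY_REGION.getOrDefault(visitorRegion, currentHost);
    }

    /** True when the visitor would be better served by a different host. */
    static boolean shouldRedirect(String visitorRegion, String currentHost) {
        return !chooseMirror(visitorRegion, currentHost).equals(currentHost);
    }
}
```

Falling back to the current host for unknown regions means a bad lookup never sends a visitor somewhere worse than where they already are.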
For the two main mirrors in the UK and US having a dedicated box makes sense; they are fairly busy most of the time and I can hang other services on them such as DNS secondaries.
The two other boxes are relatively under-utilised and expensive both in terms of money and indeed energy (think carbon footprint) but, as it stands, the technology is too heavyweight to share a box on conventional JSP hosting services.
I wanted to test the waters and see if I could migrate one or both into the cloud, but keep geographically close to end users around AsiaPac.
Rather than try to do this all in one huge "big bang", probably involving changing a significant chunk of the 60k lines of Java in my site, it made sense to first test the ease with which a much simpler site, for example a static Apache or simple JSP/Java site, can be put up within an existing cloud environment, such as Amazon's.
In the first instance Amazon's "AWS Free Usage Tier" looked like a good place to start, with the right price, and setting up an AWS (Amazon Web Services) account took a few minutes.
A few minutes' reading in the AWS FAQs suggested that the "AWS Elastic Beanstalk" could be the right place for my site to end up, since it accepts WAR (Web ARchive) files that I already build, and offers automatic scalability and so on, which is more than I currently achieve with my home-brew technology.
Since I already develop my site/app in Java with Eclipse, the first step was to download the appropriate plugin from Eclipse, which includes the AWS SDK and a bunch of other stuff. The site also has some useful short explanatory videos such as this one on getting started with the AWS Elastic Beanstalk.
Free... as in free beer?
Using the installed features in Eclipse I created a "New AWS Java Web Project" from the pull-down menu next to the new AWS icon, with the aim of making a very simple custom app.
However, when invited to "Run as ... Run on Server" I could not find that menu option, so decided that I probably had to upgrade from my 'normal' Eclipse to the heavyweight JEE version (and another 200MB download) as AWS recommends! (I'd seen errors in Eclipse indicating that I should get the JEE version too.)
I afterwards discovered that the "Run on Server" option doesn't apply to the bare project (somewhat annoying), only to their sample "TravelLog" site. Still, upgrading to the full-fat (but still free) Eclipse did no harm...
Having worked that out and followed the video's instructions, I had the TravelLog site up and running in the cloud in a few more minutes. At the risk of being written off as an AWS fanboi, that was pretty smooth!
I was pretty excited that I had something running in the cloud in a few hours from scratch and I hadn't spent any money yet.
I still wanted to make my own non-static (Java/JSP based) trivial site, and as I couldn't get the sample AWS minimal site working, I instead stripped down the TravelLog, basically to show "Hello World" and the Java time in milliseconds.
The simple site has the following code for its top-level index.jsp ("welcome") file which is sort of minimally correct though in practice all but the line containing "Hello world again" could probably safely be omitted:
<%@ page language="java" contentType="text/html; charset=utf-8" pageEncoding="utf-8"%>
<% if (request.getMethod().equals("HEAD")) return; %>
<!DOCTYPE html>
Hello world again @ <%= java.lang.System.currentTimeMillis() %>!
for which output on one page load was:
Hello world again @ 1299270813546!
with the number rising on each page refresh in Eclipse or from the browser.
There are still about a dozen configuration files, and the "magic" that bundles my index.jsp into a WAR file for deployment to AWS is hidden from me, which I don't like: I want to see the cogs turning. Nor do I yet have a fixed friendly URL for the server that I could point an incoming user at (eg via a CNAME alias in DNS). But even this basic service has up to four instances behind a load balancer, auto-started as necessary, which is very grown-up.
Also, as of March 2011 the Elastic Beanstalk service is only available in the US (where I already have a dedicated machine) but I'm happy to deploy in the US for now for testing. Singapore is likely to be available soon, and AWS service there might replace (say) my Mumbai/Bombay server.
Note that a quiet site running on one server instance can stay entirely within the "free tier" while the AWS offer lasts, and subsequently, according to an AWS-supplied calculation, might cost of the order of US$40 per month, which would be much cheaper than my current dedicated AsiaPac hosts. (Though I did seem inexplicably to have run up a bill of a whole 3 cents while mucking around.)
In general, this is the sort of area where virtualisation and the cloud can reduce total cost of ownership: where hardware resources are not heavily utilised, transparent sharing with other services will save you money (and energy).
Functionality, storage and delivery
The next step would be to build a WAR file of my own similar in functionality to the trivial one in this "magic" Eclipse project, and learn how to deploy it manually to a stable externally visible aliased URL (or IP address) so that it would be seen as being "my" site transparently to an end user, and so that I don't have to turn my current build process upside-down or tie myself too closely to AWS ...
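For anyone wanting to see the cogs turning: a WAR file is just a ZIP archive with a conventional layout, so a trivial one can be assembled with nothing but the JDK. The sketch below is an assumption-laden illustration rather than the Eclipse plugin's actual procedure. The class name `MinimalWar`, the bare-bones `web.xml` and the entry order are all hypothetical, and whether Elastic Beanstalk accepts the result unchanged has not been checked here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch: assemble a tiny WAR (a ZIP archive) in memory. */
final class MinimalWar {
    // The welcome page from the article, one statement per line.
    static final String INDEX_JSP =
            "<%@ page language=\"java\" contentType=\"text/html; charset=utf-8\" pageEncoding=\"utf-8\"%>\n"
            + "<% if (request.getMethod().equals(\"HEAD\")) return; %>\n"
            + "<!DOCTYPE html>\n"
            + "Hello world again @ <%= java.lang.System.currentTimeMillis() %>!\n";

    // Assumed minimal deployment descriptor that only names the welcome file.
    static final String WEB_XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<web-app xmlns=\"http://java.sun.com/xml/ns/javaee\" version=\"2.5\">\n"
            + "  <welcome-file-list><welcome-file>index.jsp</welcome-file></welcome-file-list>\n"
            + "</web-app>\n";

    /** Builds the archive and returns its bytes. */
    static byte[] build() {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("WEB-INF/web.xml", WEB_XML);
        files.put("index.jsp", INDEX_JSP);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Lists entry names in archive order, to confirm what went into the WAR. */
    static List<String> listEntries(byte[] war) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(war))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }
}
```

Writing the returned bytes to disk (say with `java.nio.file.Files.write`) gives a file that can be inspected with any unzip tool, which is the point: nothing in the bundle needs to be magic.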
I should also deal with storage; the mirrors of my site are mainly caches for the large dataset that they front – and a naive implementation would probably lose that cache each time a server was redeployed – so I should consider using the S3 Storage Service or the CloudFront content delivery network (CDN) to do some of the heavy lifting.
I must also find out why I was charged the princely sum of USD0.03 when I had thought I would be entirely in the free tier, and how to permanently remove my trial server from AWS when done (after repeated attempts to kill it, it just kept coming back).
And I really need to find an auto-cutoff so that if I'm running up a huge bill unawares then I get sent a warning or two and eventually the service just gets taken down: my site isn't important enough for me to bankrupt myself if something goes wrong. ®
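The decision logic of such a cutoff is simple enough to sketch, even though the hard part (getting a trustworthy month-to-date figure and actually stopping the service) is not shown. Everything here is hypothetical: the class `BudgetGuard`, the thresholds and the idea that spend arrives as a whole number of US cents are assumptions for illustration, not an AWS feature.

```java
/** Hypothetical sketch: decide whether to warn or shut down based on month-to-date spend. */
final class BudgetGuard {
    /** What the caller should do next. */
    enum Action { OK, WARN, SHUT_DOWN }

    private final long warnAtCents;
    private final long cutoffAtCents;

    /** Both thresholds are in US cents; the cutoff must not be below the warning level. */
    BudgetGuard(long warnAtCents, long cutoffAtCents) {
        if (warnAtCents < 0 || cutoffAtCents < warnAtCents) {
            throw new IllegalArgumentException("need 0 <= warnAtCents <= cutoffAtCents");
        }
        this.warnAtCents = warnAtCents;
        this.cutoffAtCents = cutoffAtCents;
    }

    /** Maps the current month-to-date spend (in cents) to an action. */
    Action check(long monthToDateCents) {
        if (monthToDateCents >= cutoffAtCents) {
            return Action.SHUT_DOWN; // past the hard limit: take the service down
        }
        if (monthToDateCents >= warnAtCents) {
            return Action.WARN; // past the soft limit: send a warning
        }
        return Action.OK;
    }
}
```

With a warning at the estimated US$40 monthly cost and an arbitrary hard limit of US$100, `new BudgetGuard(4000, 10000).check(3)` returns `OK` for a 3 cent bill, while a runaway month trips `WARN` and then `SHUT_DOWN`.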