Monday 5th July 2010

Map rendering on EC2

Over the last two years I’ve been running the OpenCycleMap tileserver on Amazon’s EC2 service. Plenty of other people do the same, and I get asked about it a lot when I’m doing consulting for other companies. I thought it would be good to take some time to say a bit about my experiences, and maybe this will be useful to you at some point.

EC2 is great if you have a need for lots and lots of computing power, and your need for using CPUs fluctuates. At its best, you have a task that needs hundreds of CPUs, but only for a few hours. So you can spin up as many instances as you like, do your task, and switch them back off again. Map rendering, and here I’m talking about mapnik/mod_tile rendering of OpenStreetMap data, initially seems to hit that use-case – generating map tiles involves lots of processing of the map data, and then you have your finished map images which are trivial to serve.

But that’s not really the case, it turns out. After you’ve finished experimenting with small areas and start moving to a global map, you find that disk IO is by far the most important thing. There are two stages to the data processing – import and rendering. During import you take a 10GB OpenStreetMap planet file and feed it into PostGIS with osm2pgsql. You want to use osm2pgsql --slim (to allow diff updates), but that involves huge amounts of writing and reading from disk for the intermediate tables. It can take literally weeks to import. When you’re rendering, renderd lifts the data from the database, renders it, writing the tiles back to disk, and then mod_tile reads the disk store to send the tiles to the client. All in all, lots of disk activity. And hugely more if you mention contours or hillshading.

Which wouldn’t be too bad, except the disks on EC2 suck. It’s not a criticism, since it’s an Elastic Compute Cloud, not an Elastic Awesome-Disks Cloud. It’s a system designed for doing calculations, not handling reading and writing huge datasets to and from disk. So their virtual disks are much slower than you would like or expect from the rest of the specs. On the OpenCycleMap “large” EC2 instance, roughly one core is being used for processing, and the rest is all blocked on IO. Although it’s marked as having “high” IO performance on their instance types page, I’d suggest for “moderate” and “high” you should read “dreadful” and “merely poor” respectively.

Amazon’s S3 is their storage component of their Web Services suite. So instead of thrashing the disks on EC2, how about storing tiles on S3? It’s possible, but the main drawback is that it makes it much, much harder to generate tiles on-the-fly. If you point your web app at an S3 bucket there’s no way that I know of to pass 404s onto an EC2 instance to fulfil. If you’re happy with added latency, then you could still run a server that queries S3 before deciding to render, and copy the output to S3, but I can’t imagine that being faster than using EC2’s local storage. You can certainly use S3 to store limited tilesets, such as limited geographical areas or a limited number of zooms. But pre-generating a full planet’s worth of z18 tiles would take up terabytes of space, and only a vanishingly small number of tiles would ever be served.
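To see why pre-generating everything down to z18 gets out of hand, here is a rough back-of-envelope sketch. It assumes the usual web-map tile pyramid, where each zoom level has four times as many tiles as the one before. The average tile size is a made-up placeholder (real tiles vary a lot between empty sea and dense city), so treat the storage figure as an order-of-magnitude guess only.

```python
# Back-of-envelope tile counts for a standard web-map pyramid.
# ASSUMED_BYTES_PER_TILE is a hypothetical average, not a measured value.

ASSUMED_BYTES_PER_TILE = 500


def tiles_at_zoom(zoom: int) -> int:
    """Tiles covering the whole planet at one zoom level (2^z by 2^z)."""
    return 4 ** zoom


def tiles_up_to_zoom(max_zoom: int) -> int:
    """Total tiles for every zoom level from 0 to max_zoom inclusive."""
    return sum(tiles_at_zoom(z) for z in range(max_zoom + 1))


def storage_terabytes(tile_count: int,
                      bytes_per_tile: int = ASSUMED_BYTES_PER_TILE) -> float:
    """Approximate storage in decimal terabytes for a given tile count."""
    return tile_count * bytes_per_tile / 1e12


if __name__ == "__main__":
    z18 = tiles_at_zoom(18)
    print(f"z18 alone: {z18:,} tiles")
    print(f"z0 to z18: {tiles_up_to_zoom(18):,} tiles")
    print(f"z18 storage at the assumed size: {storage_terabytes(z18):.1f} TB")
```

Even with a tiny assumed tile size the z18 layer alone runs to tens of terabytes, which is why on-the-fly rendering of the tiles people actually ask for is the usual approach.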

Finally, there is the cost of running a tileserver. Although Amazon are quite cheap if you want a hundred servers for a few hours, the costs start mounting if you have only one server running 24 hours a day – which is what you need from a tileserver or any other kind of webserver. $0.34 per hour seems reasonable until you price for the first four weeks uptime, where all kinds of non-cloud providers come into play, simply paying monthly rent on a server instead. Factoring in bandwidth costs for a moderately well-used tileserver can make it mightily expensive. Any extras can be added too – EBS if you want your database to survive the instance being pulled, or S3 storage.
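To put a number on that, here is a rough sketch of the uptime arithmetic using the $0.34 hourly rate quoted above. It deliberately leaves out bandwidth, EBS and S3 charges, and the function name is only for illustration.

```python
# Instance-hours only; bandwidth, EBS and S3 are NOT included.

HOURLY_RATE_USD = 0.34  # the per-hour figure quoted in the post


def uptime_cost(hours: float, rate: float = HOURLY_RATE_USD) -> float:
    """Cost in USD of keeping one instance running for the given hours."""
    return hours * rate


if __name__ == "__main__":
    four_weeks = uptime_cost(24 * 7 * 4)   # 672 hours
    full_year = uptime_cost(24 * 365)      # 8760 hours
    print(f"Four weeks of uptime: ${four_weeks:.2f}")
    print(f"One year of uptime:   ${full_year:.2f}")
```

Compare the four-week figure against the monthly rent quoted by a dedicated-server provider, then add your expected bandwidth on top of the EC2 side.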

EC2 is, more or less, exactly not what you want from a tileserver. Expensive to run, slow disks. So why is it popular? First off is buzzwords – cloud, scalable and so on. If you aren’t careful you can easily empty the piggybank on running a handful of tileservers long before you’re running enough to do proper demand-based scaling changing from hour to hour during the day. If you’re trying to “enterprise” your system you’ll worry about failovers long before you need such elastic scaling, and you need your failovers and load balancers running 24×7 too. Second is for capacity planning – if you want to do no planning whatsoever, then EC2 is great! But it’s much cheaper to rent a few servers for the first couple of months, and add more to your pool when (if?) your tileserver gets popular. But there is a third reason that is quite cool – for people like Development Seed’s TileMill – you can give your tileserver image to someone else extremely easily, and it’s their credit card that gets billed, and they can turn on and off as many servers as they like without hassling you.

I’ve been setting up a new tileserver for OpenCycleMap that’s not on EC2, and I’ll post here again later with details of how I got on. I’m also working on another couple of map styles – with terrain data, of course, and if you’re interested in hearing more then get in touch.

So in summary

  • I’d recommend EC2 if you want to pre-generate large numbers of tiles (say a continent down to z16), copy them somewhere and then switch off the renderer
  • I’d consider EC2 for ultra-large setups where you are running 5 or more tileservers already, but only as additional-load machines
  • I wouldn’t recommend EC2 if you want to run an on-the-fly tileserver. Which is what most people want to do.

Any thoughts? Running a tileserver on EC2 and disagree? Let me know below.

Wednesday 17th February 2010

Finishing the UK road network

A few weeks ago I was discussing the progress of mapping the UK at one of the London OSM pub meetups (Harry picked up on it in his diary entry for the evening). I was making the point that we’re making great progress, and if things continue as they are then most towns and villages will be mapped in 12 months’ time. Now we’ve certainly heard that before (Steve Coast was targeting summer 2008 if I recall) but my guesstimates are based on weekly road length analysis that I’ve seen and I’m currently working on making public.

Fill the Gap

But leaving things to take their due course is the easy way out, and I think we can do most of the remaining work this summer if we collectively put our minds to it. What would that involve? Well, a few dozen mapping parties would be a good start, since there are only currently two scheduled (Witham and Maidstone). CloudMade had been sponsoring a few mapping parties in the past, but that seems to have fizzled out, so it’s up to the community to sort things out ourselves. A good source of ideas for places is the UK Mapping Priorities and Secondary Priorities pages. I’ve been updating the former over the last week, and it’s impressive how many places have been mapped over the last six months. But there are some glaring problem areas – anyone want to organise a Darlington party?

What else beyond parties? Publicity is something we’ve been reasonably poor at over the years. Getting in the press is a good way to “prime the pump” for gaining new members, and probably encouraging people who might have looked before to look again. We can just make random “press releases” about all kinds of things we do – that’s what everyone else does! I’ve just gone looking on the wiki for previous press releases, and they are woefully lacking. Whilst it’s great to get coverage in the national press, I think we should be aiming for all the local papers that struggle to find anything interesting to print. Of course, if we had those two dozen mapping parties they would be a good excuse for releases. But beyond that, lots and lots of blogging, discussing on forums and things like that. Just try to find ways to put the word out. I’m selling promotional stickers in the OpenCycleMap shop – any more ideas like that? We could get some funding or fundraising for more leaflets to hand out, or for organising stalls at trade shows, or for buying another banner, or buying our own aerial imagery.

And when we have all these new people, we’ll have more awesome tools for them. Grant is sorting out the wiki onto new, faster hardware, and I’ve been finding time to work on Potlatch2. More development helps, so if you’re that way inclined I’d love to have you helping. But it’s completely plausible to finish the UK road network this summer if we get organised and get motivated. Who’s up for the challenge?

Wednesday 13th January 2010

ASTER – Not worth it yet

A few months ago NASA caused a stir by releasing a new global height dataset called ASTER. I use an earlier dataset (SRTM) for OpenCycleMap, which has a few problems that ASTER, at least initially, promised to solve. The three of primary interest to me are:

  • Voids – SRTM has “no data” gaps in certain places of the world, where the radar reflections went haywire. These happen in marshes (not of interest) and mountains (of great interest!), especially over the Alps. ASTER is void-filled already, so the clever-but-inaccurate void-filling I use wouldn’t be necessary
  • Resolution – It’s great that SRTM covers the whole world, but I’d love to see it at a higher resolution. ASTER’s nominal resolution is three times greater than SRTM, so it’s very attractive.
  • Arctic coverage – SRTM only goes as far north as 60°N, which is a bit of a problem in Scandinavia. Although there’s GTOPO30 data for these areas, that’s got a horizontal resolution measured in kilometres, so not exactly great for me. ASTER covers those areas too, up to 83°N.
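To put the resolution point in ground terms, here is a small sketch converting arc-seconds of longitude into metres. It assumes the global SRTM product at 3 arc-second postings and ASTER at 1 arc-second, and uses the WGS84 equatorial radius. It only gives the east-west spacing, which shrinks with the cosine of the latitude, and the function name is just for illustration.

```python
import math

# WGS84 equatorial radius in metres.
EQUATORIAL_RADIUS_M = 6_378_137.0


def arcsec_to_metres(arcsec: float, latitude_deg: float = 0.0) -> float:
    """East-west ground distance covered by `arcsec` of longitude.

    Uses a simple spherical approximation: one arc-second at the equator,
    scaled by cos(latitude).
    """
    circumference = 2 * math.pi * EQUATORIAL_RADIUS_M
    metres_per_arcsec = circumference / (360 * 3600)
    return arcsec * metres_per_arcsec * math.cos(math.radians(latitude_deg))


if __name__ == "__main__":
    for name, arcsec in (("ASTER (1 arc-second)", 1),
                         ("SRTM (3 arc-second, assumed)", 3)):
        print(f"{name}: {arcsec_to_metres(arcsec):.1f} m at the equator, "
              f"{arcsec_to_metres(arcsec, 60):.1f} m at 60°N")
```

The nominal postings come out at roughly 30 m versus 90 m at the equator, which is where the “three times greater” figure comes from.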

So far so good. But when I started work with ASTER in December, things spiralled rapidly downhill. First is the pointlessly irritating “order a dataset” website, that sucked up hours of going round in circles. It’s like a shopping website from 1999. You need to use a stupid interface to order which 1°x1° tiles you want, and “All” isn’t an option, despite there being 22,600 of them. It seems geared up for people who want a couple of dozen at a time, and the whole thing has a feel of being run by men with beards and sandals who’d rather you didn’t use their website in anything newer than Netscape 4 on HP-UX.

When I read the README alarm bells started ringing. There’s a section on “mole runs” and “pit artefacts” that sounded a bit worrying, but I wasn’t sure how much of an issue they’d be – if they were small and few and far between then that’s not much of a problem. But the biggest thing that caught my eye, buried on page six after pages of confirmation of how good the accuracy was, right at the end of a section as a throwaway comment, was the following statement:

Also, while the elevation postings in the ASTER GDEM are at 1 arc-second, or approximately 30 m, the detail of topographic expression resolvable in the ASTER GDEM appears to be between 100 m and 120 m.

That’s a bit of a bomb-shell – it’s saying that although it’s got a much higher nominal resolution than SRTM, its effective resolution is about the same – there’s not any more actual detail, just more pixels. That was almost enough to make me give up there and then, but it’s still void filled and covering more area of the planet, which would be good improvements. So I grabbed the DEM (thankfully, they’re straightforward GeoTIFFs) and got to work over Snowdon. I did some colouring and contours, and they both looked excellent and much better than what I made from SRTM. But then I tried hill-shading, and disaster!

Here’s the area around Snowdon (click through for original size):

ASTER Snowdon

and a detail of Snowdon itself:

ASTER Snowdon detail

The mole runs are everywhere – all across that image, even on the flat bits. And the pit artefacts are huge – the size of quarries, and really, really obvious. I honestly can’t use that – maybe for a “what the world would look like if it looked like the moon” project, but nothing more serious than that. And considering that SRTM has only a handful of single-pixel voids in that area, the guys making ASTER have made something that’s substantially worse than an oversampled SRTM. And considering they were even using SRTM to fill the gaps in the ASTER data, that’s a pretty poor show. I started reading around and found a few people saying similar things. And when I thought about it, the “improvements” to the contours I saw could be recreated with SRTM by using gdalwarp to artificially increase the resolution (with some nice smoothing) before generating the contour lines. So I gain nothing from ASTER in the 95% of the planet that doesn’t have significant voids, and in that same 95% it’s not really usable.
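As a rough illustration of the gdalwarp trick, the sketch below builds the two command lines involved: one to resample an SRTM tile onto a finer grid with a smoothing resampler, and one to cut contours from the result. The file names, the 1 arc-second target grid, the cubic spline resampler and the 10 m interval are all assumptions picked for the example, not the settings actually used for OpenCycleMap.

```python
import shlex


def upsample_command(src: str, dst: str, pixel_size_deg: float,
                     resampling: str = "cubicspline") -> list[str]:
    """Build a gdalwarp call that resamples `src` to a finer pixel size.

    -tr sets the target resolution (x then y), -r picks the resampler.
    """
    size = str(pixel_size_deg)
    return ["gdalwarp", "-tr", size, size, "-r", resampling, src, dst]


def contour_command(src: str, dst: str, interval_m: float = 10) -> list[str]:
    """Build a gdal_contour call writing contour lines every `interval_m`.

    -a names the attribute that stores each line's elevation,
    -i sets the contour interval.
    """
    return ["gdal_contour", "-a", "height", "-i", str(interval_m), src, dst]


if __name__ == "__main__":
    # Hypothetical file names; 1/3600 degree is one arc-second.
    steps = [
        upsample_command("srtm_tile.tif", "srtm_tile_fine.tif", 1 / 3600),
        contour_command("srtm_tile_fine.tif", "contours.shp"),
    ]
    for cmd in steps:
        # Print rather than execute; pass cmd to subprocess.run to run it.
        print(shlex.join(cmd))
```

The upsampling adds no real detail. It just gives the contouring step a smoother surface to trace, so the lines come out less jagged.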

So for now, I’ve given up with ASTER. I might revisit it for the band between 60°N and 83°N, but it also says in the README they have voids over Eurasia for that area (so much for void-filled, eh?). And it would be interesting to see if someone can fill the large SRTM voids with ASTER (which sounds back to front, hey-ho), but I don’t have time for that. However, as they say in the docs, all these artefacts are happening in the boundaries where they have different numbers of original samples, so maybe a future version will have these automatically smoothed out, and if they can figure out how to stop their 15m sampling getting turned into 120m effective resolution, that would be awesome. But for now I would say it’s not worth using it.

Tuesday 5th January 2010

Hill-shading on OpenCycleMap.org

It was over 18 months ago that I was originally trying to get hill-shading and hill-colouring working on OpenCycleMap (in fact, it wasn’t even called that back then, but that’s a different story). I eventually dropped the hill-shading part of it due to nasty boundary artefacts between source tiles, and due to the fact that the shading, well, didn’t look as nice as I wanted. It was all a bit grey and manky.

Hill Shade Teaser

So instead I launched just the hill-colouring in August 2008, which I was very happy with, and put hill-shading on the back burner. Time passed. Much time.

A few weeks ago, I rolled up my sleeves and got stuck in to figuring out how to do the hillshading properly. With some pointers from Matt, Mike and the OSM Wiki, I played around for a few days until I liked the end result.
Hill Shading results

Here’s a look at Snowdon before hill-shading. The colours do a good job of showing the lie of the land, but it’s a bit flat:
Snowdon Before Shading
Detail:
Snowdon Before Shading (Detail)
And what Snowdon looks like now. The shading lifts the peaks out from the map, and gives them a more solid-object feel:
Snowdon After Shading
Detail:
Snowdon After Shading (Detail)
It really helps most in complex mountains, like here in the Alps, where the contours would otherwise become a jumble and it’s hard to tell valleys from ridges. With the shading, it’s easy.
Valleys in the Alps

It’s a hard balancing act, since OpenCycleMap is first and foremost a map for cyclists, and too much hill-shading overpowers and distracts from the rest of the map. But then again, too little and it doesn’t seem worth the effort! I went for a subtle approach, where it’s enough to make the hills stand out but little enough you might not consciously notice. Unfortunately the effect is diminished in forested areas, and by dense contours, since it’s only the background height colouring that is shaded and those things start obscuring stuff.

Also, I was never really happy with the “drab grey” approach to shading – just making the shadows grey and the highlights white using alpha-blending – so I settled in the end for “hardlight” compositing. It’s a bit like the evolution of GUI buttons from Windows 3.11 (“right, top and left edges are white, other two edges are black, grey in the middle”) to those of Mac OS X (“ooh, shiny”). Compare OpenCycleMap to Google Terrain and other hill-shaded maps, and I’m quite proud of the results.
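For anyone curious about the difference, here is a minimal sketch of the two approaches on a single colour channel, with values scaled to the range 0 to 1. It uses the standard hard-light blend formula as found in image editors and compositing specs. How the real OpenCycleMap pipeline applies it isn't described here, so the function names and sample colour are purely illustrative.

```python
def alpha_blend(base: float, shade: float, opacity: float) -> float:
    """Plain alpha blend: mixes the base colour towards the shade value."""
    return base * (1 - opacity) + shade * opacity


def hard_light(base: float, shade: float) -> float:
    """Standard hard-light blend with the shade layer on top.

    Shade below 0.5 darkens (multiply), above 0.5 lightens (screen),
    and exactly 0.5 leaves the base colour untouched.
    """
    if shade < 0.5:
        return 2 * base * shade
    return 1 - 2 * (1 - base) * (1 - shade)


def hard_light_rgb(colour: tuple[float, float, float],
                   shade: float) -> tuple[float, ...]:
    """Apply hard light to each channel of an RGB colour."""
    return tuple(hard_light(c, shade) for c in colour)


if __name__ == "__main__":
    green = (0.6, 0.8, 0.5)  # hypothetical hill-colouring value
    for shade in (0.25, 0.5, 0.75):
        blended = tuple(round(alpha_blend(c, shade, 0.5), 3) for c in green)
        lit = tuple(round(c, 3) for c in hard_light_rgb(green, shade))
        print(f"shade={shade}: alpha={blended} hard-light={lit}")
```

With alpha blending every pixel drifts towards grey, even on flat ground where the shade layer is mid-grey. Hard light leaves flat ground alone and keeps the hue of the underlying colour on the slopes, which is what avoids the “drab grey” look.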

If you have a map project that could do with some good-looking terrain info, then I’m available for freelance work.

Wednesday 9th December 2009

The View from Above

Over the last few months I’ve been involved in three different aerial imagery projects, all of which were to make imagery available for OpenStreetMap contributors. It’s nice that we have imagery available from the guys at Yahoo!, but on occasion we lay our hands on some better stuff.

First off was Stratford-upon-Avon, here in the UK. As an experiment we hired a small plane, put one of our contributors on board with his SLR, and flew around town. All the photos were then put online, and even though I’d never been to the place before I could use Tim Waters’ online map rectifier and re-purpose it slightly to warp the photos and line them up to the map data. Other people did the same, and then I collected all the separate images and processed them into one map layer. A few days later I was at a “Traditional GIS” conference in Stratford, and there was a great deal of interest from people in the aerial imagery project and OSM in general. I can recommend it as a publicity stunt for other conferences!

Central Stratford

You can see more pictures of the end results on flickr, or read more about it on the OpenStreetMap wiki.

Next up was the Philippines. After massive deadly flooding, aid agencies on the ground were using OpenStreetMap to help with the disaster relief. Manning Sambale from the OSM Philippines community received a donation of satellite imagery of part of the affected area, and asked for help processing it and making it available. I made some space available for him to upload it, and then processed it into the right format for OSM editors. With such high-quality imagery available so soon after the disaster, OSM volunteers both in-country and working remotely set about mapping the villages and marking on the locations of bridges and damaged areas. You can get a sense of scale of the damage from the image below – the gravel banks covered fields and villages around the river, and the imagery was a huge help.

Philippines imagery

The third project was in Georgia, USA, where I got hold of some fairly recent (2007) imagery from the Department of Agriculture’s National Agriculture Imagery Program (NAIP). Although Yahoo! has good quality imagery available across the whole of the USA, this public-domain imagery was more up to date and slightly higher quality than what Yahoo! has in rural areas of Georgia. This is by far the biggest set of imagery I’ve had access to – hundreds of gigabytes of the stuff – and only a handful of counties were processed.

Georgia Imagery

I’m sure as time goes on we’ll get more and more sources of imagery to help with OSM, and I look forward to lots of “crowd sourcing” experiments like the stuff from Stratford as much as I like the imagery from the professionals.

If you have access to any sources of imagery and need a hand getting it processed, get in touch!

Friday 27th November 2009

State of the Map 2007 Videos – in HD!

I wonder whether everyone missed the donations link last time, so I’ll put it first instead! Go on, drop a pound in the collections tin.


When Jon Burgess found out that I was editing the 2008 videos, he dug out his recordings of the first SOTM conference and sent me a disk full of them. This time I knew what I was doing a bit more, and the quality is much improved – in fact, if you have the bandwidth and computer for it, you can also watch them in full HD glory.

Unfortunately Jon didn’t have enough space to record all the talks, but we have 15 available on vimeo. For a full list of talks and links see the OpenStreetMap wiki, or just have a look at my video account and watch them all!

My pick of the bunch are:

It’s great watching these videos – I wasn’t even “into” OSM enough back then to go to the conference! And it’s nice to see the things that are wildly different now, and all the things that are still familiar topics.