Saturday 5th November 2011

The Darwin Traverse

It’s always nice to get some feedback on OpenCycleMap – especially when it’s being put to unexpected use. I received this in an email:

We are french Alpinist and we just want to thanks you for your job in open cycle map.
We’ve just finish a traverse in cordillera darwin in chili and we have used i phone with “motion x gps” apps.
on this aps we have acces to open cycle map relief and it was the best map of the region, just some very little mistake.

Teaser english version Darwin traverse from gmhm chamonix on Vimeo.

Wow! Amazing stuff. Thanks to François from the expedition for getting in touch.

Wednesday 6th April 2011

OpenStreetMap Hack Weekend

Last weekend we held another Hack Weekend for OpenStreetMap, and I thoroughly enjoyed it from start to finish. Especially the start, which involved sitting outside on a warm spring evening with a cold beer and unwinding!

This was probably one of the largest Hack Weekends that we’ve run so far – I counted 25 people at one point – and I volunteered to help anyone who was interested in using git, developing Potlatch2 and improving the Rails Port (aka the OpenStreetMap website). As part of this I ran a few short workshops which were surprisingly well attended – I’d expected 2 or 3 people for each one but ended up with 10-15 each instead! I’ll be curious to see what workshops people want at the next Hack Weekend.

When I wasn’t running workshops or helping other people, I was working with Richard Fairhurst on the Potlatch 2.0 release – and this was the point where we made it the default editor on the OpenStreetMap website. It’s been painful for the last few months watching thousands of people learning to use potlatch1, so we’ve just made a big step in making OpenStreetMap easier to get started with. The news made it onto OpenGeoData and even ReadWriteWeb. Development doesn’t stop at 2.0, of course – we’ve got lots of in-progress work on branches (including the long-awaited History dialog that I’ve been working on) and it’ll be good to see them being merged in when they are ready. We also managed to spot a few bugs within the first few hours of the new release!

It was also great to see a bunch of people committing code to projects they’d never worked on before – one of the main reasons we run the weekends. There was lots of work on the Rails Port, including improving the layout on mobile screens and working round bugs with postgres 9. But I’ve no idea what everyone was up to at the far end of the room – it was such a big, busy weekend that I couldn’t keep track! One thing that was prevalent was people picking up git for the first time, and our recent migration to using git for Potlatch2 proved really useful when juggling which features to include in 2.0 and which to leave for further development.

I’m looking forward to the next Hack weekend, which Matt is already organising. If you’re tempted to come help develop OSM and learn something new, you should come along!

Monday 31st January 2011

Tweak a little here, fix a little there

Another round of updates to the OpenCycleMap cartography was released a week ago, after a few days of local testing, bug investigating and general “technical-debt” payments.

The biggest fix is that I’ve finally tracked down what was causing all kinds of problems with riverbanks. The OpenCycleMap code dates back to when the riverbank tag was first introduced; since then use of the tag has greatly expanded, and it is now heavily used in multipolygons. The bug was that some code treated riverbanks as linear features while other code treated them as polygons – which used to work fine, but was recently leading to giant triangles lurching across the landscape. Thankfully it turned out not to be a problem with the relation-handling code in osm2pgsql – I had enough of that last year!

Riverbanks gone mad

A major feature of this update is the map now treats points of interest – like shops, pubs and so on – equally, whether they are tagged as nodes or as areas. So in hyper-detailed places where shop nodes are being replaced by building outlines the names and icons will now show up properly. You can see some examples around Peckham where Tom Chance has been hard at work.

Another ‘technical debt’ problem was regarding the “cycle node networks” widely used in the Netherlands and Belgium. When I originally tried rendering the icons at the junctions mapnik blew up - there was a bug with running ShieldSymbolizer on points. Even though this was fixed in mapnik years ago it was only last summer that I started using a new enough version, and it’s taken until now for me to reinstate (and redesign) the icons. But the new circles certainly look nicer than just numbers on the map, so it’s been worth the wait!

Node network

Pedestrian areas are finally drawn properly, and cafés have been added. Bike shops get a new, clearer icon and suburbs and localities are shown. On the attention to detail front, at medium zoom levels national cycle routes are consistently prioritised over regional and local routes, and place names should behave a bit more predictably as you zoom in. And finally street labels won’t bend back on themselves so much and should therefore be easier to read.

Tangled Mess

The server is chugging away at refreshing all the tiles – it’ll take a week or so to get through them all, but you can see the updates filtering through already in the most popular areas.

Many thanks go to MotionX for supporting the project and this round of updates in particular, and to everyone who diligently filed bug reports and (gently) encouraged me to fix them!

Wednesday 27th October 2010

Tiger Edited Map resurrected

Recently I’ve been working with MapQuest to rebuild the OpenStreetMap “Tiger Edited Map”. It was publicly released last week (blog, link).

Tiger Edited Map

The original map was created by Matt while he was at CloudMade, but it disappeared not long after we left at the end of last year. This is a from-scratch reimplementation with a few bonus features – it’s updated every few minutes, and the stylesheets are available on GitHub. It uses osm2pgsql with extended attribute information to enable styling by openstreetmap id and date ranges (see the nitty-gritty here) – and a word to the wise: don’t turn on extended attributes for nodes unless you have infinite hard drive space and patience to go with it!

It’s great to see how much progress there’s been this year, and it shows where we need to check for the usual TIGER issues. One of the interesting things for me is that it shows a recognisable editing pattern across the entire US – the major roads have all been edited (most multiple times), as have vast swathes of urban areas – enough that OSM is a distinct enough dataset from TIGER to stand out on its own. Hopefully this will inspire more people to fix up the streets in their own areas and drive the quality of OSM data in the US upwards – step by step. My next plan along these lines is to work on the Rapid Assessment Tool I made some time ago – moving along the QA debate from the origins of the data (I believe we’re often too hung up on the word “TIGER”) and onto assessing how good OSM data is on its own merits.

If anyone has any suggestions for improvements to the style – especially changes to the detection algorithm, or similar ideas for other regions – then I’d love to see either forks from the git sources or even plain old comments below!

Monday 4th October 2010

Quick and dirty usability testing of OSM

Last week I joined Ant and Deb from MapQuest in order to help out with the UCL mapping party. On the Wednesday I went out with some new Masters students and got soaked in the rain around Camden, but the main interest for me was the following day when we all gathered in the computer lab to upload the newly collected data. While I was helping out I was also scribbling furiously whenever I found someone stuck on some aspect of OSM that I hadn’t expected.

UCL student mapping party

I was briefly worried that there would be a flurry of activity while they logged on and that I’d miss most of it, but actually the account creation was so long and tortuous that it gave me plenty of time to watch. Silver linings, etc, I guess. I took notes, and so here they are, in the order I wrote them down.

  1. Where did the email go? – The biggest hurdle and the one that spread them out was confirming their email. Given that the OSM servers are on the same campus as we were, it took an extraordinary amount of time for them to appear. But the issue here was that on the user signup page there was no indication as to which email address the confirmation email was sent to, and one person was worried there was a typo. It also made it impossible for me to check that there wasn’t a typo in their (to them) brand new address.
  2. Nobody reads the CTs, and everyone ticks the PD box without reading it either – I’ll win no friends with this observation, but I saw nobody scrolling the CTs box, and everyone reflexively ticked the box beside the agree button. I’m guessing they all thought it was a “have you read the above legal stuff” which you normally get on such forms.
  3. Send another confirmation email – There’s no way to trigger sending another copy of the confirmation email. Sometimes they go missing, and at least if there was a button the frustration levels would go down.
  4. Not obvious what the settings page is for – After confirming their email the users end up on the settings page, where almost the first thing it shows you is your email address and a box to put a new email address into. That confused a lot of people. Things like add a friend, set a home location, read some getting started notes etc would be more useful.
  5. Highlight unrecognised tags – I found one guy who had somehow (it’s not clear how) ended up with all his name tags spelled with a capital N. It would be better if the editor highlighted an unexpected tag like this while editing.
  6. Anxiety over tags missing from autocomplete lists – on two occasions I had people worried that what they were typing (in both cases “office”) wasn’t in the autocomplete list. I had to explain that there are things on Map Features (and elsewhere on the wiki) that aren’t in the list, and that’s not a problem.
  7. Confusion over the preset dropdown (10a and 10b on this image) – Three people struggled to make it stay open (i.e. click – hold – move – release). One guy kept selecting different things, and didn’t realise it was adding more tags and changing one (amenity) that he’d already set, until I pointed it out. I had to explain the small icon (10a) was a button that changed what was on the dropdown. Most of the icons used in 10a weren’t understood (car and bike were good, the football and postbox less so). Many people made the same mistake of adding a POI, adding the correct tags, and then worrying that it said “(no preset)” and trying to find the correct thing in the menu – i.e. they misunderstood the purpose of it.
  8. Couldn’t find double-click – Since they were entering POIs they’d already collected, they rapidly found themselves without an appropriate one on the POI panel and searched the wiki. With the tags in hand, they were then stumped on how to add a blank POI. One guy worked out he could change the tags on an existing one, but either instructions (“double click”) or a multi-purpose / “blank” POI icon would be better.
  9. Couldn’t add extra tags – three or four times people needed the + icon pointed out to them.
  10. Map Features – long descriptions – most people found themselves on Map Features reading the key, value and short description, but I didn’t see anyone realise that they could click the value for more details. This should perhaps go (automatically) onto the end of the short description text as a “More details…” link.
  11. Confusion with abandoned features – repeatedly people found proposed and/or abandoned features, and similar wiki-works-in-progress. As well as not understanding, they also didn’t care, and didn’t read the page either – they were just skim-reading to find the tags they needed. I’d lean strongly towards clearing off the 3-year-old abandoned pages, but I realise there are “wiki-historians” who want to keep everything for posterity.
  12. Search beyond Map Features – most people searched up and down the Map Features page using the browser-based search (Ctrl+F). They were then stuck when they couldn’t find the thing they were looking for, and had to be pointed towards the search box to search the rest of the wiki. Again, it wasn’t clear that there are plenty of things obscure enough to not be on the main list. Also, “Also known as” and “similar to” and “see also” sections of the tag documentation are worth their weight in gold. A surprising number of pages don’t have them.
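As a rough illustration of observation 5 (highlighting unrecognised tags), here is a minimal Python sketch. It is not how Potlatch works: the `KNOWN_KEYS` set and the `check_tags` helper are made up for this example, and a real editor would need a much larger list of keys. Given observation 6, anything like this should be a gentle hint rather than an error, since plenty of valid tags will never be on any list.

```python
from typing import Dict, Optional

# Illustrative only: a tiny, made-up subset of commonly used tag keys.
KNOWN_KEYS = {"name", "amenity", "shop", "office", "highway"}


def check_tags(tags: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each unrecognised key to a suggested known key, or None.

    A suggestion is only offered when the key matches a known key
    apart from capitalisation (e.g. "Name" -> "name").
    """
    by_lower = {key.lower(): key for key in KNOWN_KEYS}
    hints: Dict[str, Optional[str]] = {}
    for key in tags:
        if key in KNOWN_KEYS:
            continue  # recognised, nothing to highlight
        hints[key] = by_lower.get(key.lower())
    return hints


hints = check_tags({"Name": "Camden Arms", "amenity": "pub"})
for key, suggestion in hints.items():
    if suggestion:
        print(f"'{key}' is unexpected, did you mean '{suggestion}'?")
    else:
        print(f"'{key}' is not in the list (which may be fine)")
```

The capital-N case from the mapping party gets a concrete suggestion, while a genuinely unknown key is only noted, not rejected.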

A lot of the most interesting stuff I found was regarding Potlatch 1, and (fortunately?) very little of it applies to Potlatch2 since the UI has been overhauled. I’d love to also work on the Friends functionality of the website, since when the students started “friending” each other, pretty much nothing happened. We could show friends’ edits, diary entries etc. One thing that stood out for me though, was we should remove the PD tickbox from the CTs. It’s added confusion if you read it, and most people don’t so the point of it is moot. It’s not on the critical path for signup so it shouldn’t be in the signup flow at all. It can live in the user settings page or somewhere similar. It’s not legally binding and it’s not working as a straw poll either. Finally, it would be great if there was more stuff possible before the email was confirmed, like adding friends – or even links to introduction videos or something like that.

I’ll leave you with the best and least-expected I-never-thought-of-that example of the day. I watched one student find the entry in Map Features for the shop that he wanted to add. He highlighted the icon, right clicked and selected Copy, then changed tabs to Potlatch and right clicked in order to paste the icon where he wanted it to appear.

If only, my friend, if only.

Thanks to Muki Haklay and Thomas Koukoletsos from UCL for inviting us along. If anyone has any similar opportunities for me to come and watch people learning OSM, please get in touch.

Monday 5th July 2010

Map rendering on EC2

Over the last two years I’ve been running the OpenCycleMap tileserver on Amazon’s EC2 service. Plenty of other people do the same, and I get asked about it a lot when I’m doing consulting for other companies. I thought it would be good to take some time to say a bit about my experiences, and maybe this will be useful to you at some point.

OpenCycleMap tile

EC2 is great if you have a need for lots and lots of computing power, and your need for using CPUs fluctuates. At its best, you have a task that needs hundreds of CPUs, but only for a few hours. So you can spin up as many instances as you like, do your task, and switch them back off again. Map rendering, and here I’m talking about mapnik/mod_tile rendering of OpenStreetMap data, initially seems to hit that use-case – generating map tiles involves lots of processing of the map data, and then you have your finished map images which are trivial to serve.

But that’s not really the case, it turns out. After you’ve finished experimenting with small areas and start moving to a global map, you find that disk IO is by far the most important thing. There are two stages to the data processing – import and rendering. During import you take a 10GB OpenStreetMap planet file and feed it into PostGIS with osm2pgsql. You want to use osm2pgsql --slim (to allow diff updates), but that involves huge amounts of writing and reading from disk for the intermediate tables. It can take literally weeks to import. When you’re rendering, renderd lifts the data from the database, renders it and writes the tiles back to disk, and then mod_tile reads the disk store to send the tiles to the client. All in all, lots of disk activity. And hugely more if you add contours or hillshading.
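To make the rendering half of that concrete, here is a minimal Python sketch of the “render on a miss, write to disk, serve from disk” flow. It is a toy under stated assumptions, not how renderd or mod_tile are implemented: `render_tile` and `DiskTileStore` are hypothetical names, the “rendering” just produces placeholder bytes, and the real on-disk layout is different.

```python
import tempfile
from pathlib import Path


def render_tile(z: int, x: int, y: int) -> bytes:
    """Hypothetical stand-in for the slow step (database query plus drawing)."""
    return f"tile {z}/{x}/{y}".encode()


class DiskTileStore:
    """Serve tiles from disk, rendering and saving them on a cache miss."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.renders = 0  # how many times the slow path ran

    def path_for(self, z: int, x: int, y: int) -> Path:
        # Simplified z/x/y layout, chosen only for this example.
        return self.root / str(z) / str(x) / f"{y}.png"

    def get(self, z: int, x: int, y: int) -> bytes:
        path = self.path_for(z, x, y)
        if path.exists():
            return path.read_bytes()  # disk read on every cache hit
        data = render_tile(z, x, y)  # slow path on a miss
        self.renders += 1
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)  # disk write so the next request is a hit
        return data


store = DiskTileStore(Path(tempfile.mkdtemp()))
first = store.get(12, 2047, 1362)   # miss: render, then write
second = store.get(12, 2047, 1362)  # hit: read back from disk
print(store.renders)
```

Even in this toy version every request touches the disk at least once, which is why slow disks hurt so much.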

Which wouldn’t be too bad, except the disks on EC2 suck. It’s not a criticism, since it’s an Elastic Compute Cloud, not an Elastic Awesome-Disks Cloud. It’s a system designed for doing calculations, not handling reading and writing huge datasets to and from disk. So their virtual disks are much slower than you would like or expect from the rest of the specs. On the opencyclemap “large” EC2 instance, roughly one core is being used for processing, and the rest is all blocked on IO. Although it’s marked as having “high” IO performance on their instance types page, I’d suggest for “moderate” and “high” you should read “dreadful” and “merely poor” respectively.

Amazon’s S3 is their storage component of their Web Services suite. So instead of thrashing the disks on EC2, how about storing tiles on S3? It’s possible, but the main drawback is that it makes it much, much harder to generate tiles on-the-fly. If you point your web app at an S3 bucket there’s no way that I know of to pass 404s onto an EC2 instance to fulfil. If you’re happy with added latency, then you could still run a server that queries S3 before deciding to render, and copy the output to S3, but I can’t imagine that being faster than using EC2’s local storage. You can certainly use S3 to store limited tilesets, such as limited geographical areas or a limited number of zooms. But pre-generating a full planet’s worth of z18 tiles would take up terabytes of space, and only a vanishingly small number of tiles would ever be served.
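The scale of a full z18 tileset is easy to check: each zoom level has four times as many tiles as the one before. The Python sketch below counts them; the average tile size is purely an assumption for illustration (real sizes vary enormously between empty sea tiles and dense cities), so treat the storage figure as an order-of-magnitude guess only.

```python
def tiles_at_zoom(z: int) -> int:
    """Number of tiles covering the whole world at zoom level z."""
    return 4 ** z


def tiles_up_to_zoom(z: int) -> int:
    """Total number of tiles for all zoom levels from 0 to z inclusive."""
    return sum(tiles_at_zoom(level) for level in range(z + 1))


# Assumption for illustration only, not a measured figure.
ASSUMED_AVG_TILE_BYTES = 1024

z18_tiles = tiles_at_zoom(18)
all_tiles = tiles_up_to_zoom(18)
storage_tib = all_tiles * ASSUMED_AVG_TILE_BYTES / 2 ** 40

print(f"z18 alone: {z18_tiles:,} tiles")
print(f"z0 to z18: {all_tiles:,} tiles")
print(f"storage at the assumed average size: {storage_tib:.1f} TiB")
```

That is roughly 69 billion tiles at z18 alone, almost all of them in places nobody will ever look at.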

Finally, there is the cost of running a tileserver. Although Amazon are quite cheap if you want a hundred servers for a few hours, the costs start mounting if you have only one server running 24 hours a day – which is what you need from a tileserver or any other kind of webserver. $0.34 per hour seems reasonable until you price up the first four weeks’ uptime, at which point all kinds of non-cloud providers come into play, where you simply pay monthly rent on a server instead. Factoring in bandwidth costs for a moderately well-used tileserver can make it mightily expensive. Any extras add to the bill too – EBS if you want your database to survive the instance being pulled, or S3 storage.
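The arithmetic is worth spelling out. This Python sketch uses the $0.34 hourly rate quoted above and covers instance hours only; bandwidth, EBS and S3 are deliberately left out, and `uptime_cost` is just a helper name made up for the example.

```python
HOURLY_RATE_USD = 0.34  # the per-hour instance price quoted in the post
HOURS_PER_DAY = 24


def uptime_cost(days: int, hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Cost of keeping one instance running non-stop for the given days.

    Instance hours only: bandwidth, EBS and S3 charges are not included.
    """
    return days * HOURS_PER_DAY * hourly_rate


four_weeks = uptime_cost(28)
full_year = uptime_cost(365)

print(f"four weeks: ${four_weeks:.2f}")
print(f"one year:   ${full_year:.2f}")
```

Four weeks comes to about $228 before any extras, which is the figure to compare against a monthly server rental.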

EC2 is, more or less, exactly not what you want from a tileserver. Expensive to run, slow disks. So why is it popular? First off is buzzwords – cloud, scalable and so on. If you aren’t careful you can easily empty the piggybank on running a handful of tileservers long before you’re running enough to do proper demand-based scaling changing from hour to hour during the day. If you’re trying to “enterprise” your system you’ll worry about failovers long before you need such elastic scaling, and you need your failovers and load balancers running 24×7 too. Second is for capacity planning – if you want to do no planning whatsoever, then EC2 is great! But it’s much cheaper to rent a few servers for the first couple of months, and add more to your pool when (if?) your tileserver gets popular. But there is a third reason that is quite cool – for people like Development Seed’s TileMill – you can give your tileserver image to someone else extremely easily, and it’s their credit card that gets billed, and they can turn on and off as many servers as they like without hassling you.

Cambridge

I’ve been setting up a new tileserver for OpenCycleMap that’s not on EC2, and I’ll post here again later with details of how I got on. I’m also working on another couple of map styles – with terrain data, of course, and if you’re interested in hearing more then get in touch.

So in summary

  • I’d recommend EC2 if you want to pre-generate large numbers of tiles (say a continent down to z16), copy them somewhere and then switch off the renderer
  • I’d consider EC2 for ultra-large setups where you are running 5 or more tileservers already, but only as additional-load machines
  • I wouldn’t recommend EC2 if you want to run an on-the-fly tileserver. Which is what most people want to do.
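If you are weighing up the first option, it helps to estimate how many tiles a pre-generation job involves. The Python sketch below uses the standard slippy-map formula for converting longitude/latitude to tile numbers and counts the tiles covering a bounding box at every zoom down to z16. The bounding box is a rough, hypothetical rectangle picked for illustration, not a real continent outline, and the function names are my own.

```python
import math


def lonlat_to_tile(lon: float, lat: float, z: int) -> tuple:
    """Tile x/y containing a point, using the standard slippy-map formula."""
    n = 2 ** z
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the map edge stay inside the valid tile range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_in_bbox(west: float, south: float, east: float, north: float, z: int) -> int:
    """Number of tiles needed to cover a bounding box at zoom z."""
    x_min, y_min = lonlat_to_tile(west, north, z)  # tile y grows southwards
    x_max, y_max = lonlat_to_tile(east, south, z)
    return (x_max - x_min + 1) * (y_max - y_min + 1)


def tiles_down_to(west: float, south: float, east: float, north: float, max_zoom: int) -> int:
    """Total tiles covering the box for every zoom from 0 to max_zoom."""
    return sum(tiles_in_bbox(west, south, east, north, z) for z in range(max_zoom + 1))


# Hypothetical rough rectangle (west, south, east, north), for illustration only.
EXAMPLE_BBOX = (-10.0, 35.0, 30.0, 60.0)

total = tiles_down_to(*EXAMPLE_BBOX, 16)
print(f"tiles to pre-generate down to z16: {total:,}")
```

Multiplying the total by your own measured render time per tile gives a first guess at how many instance-hours the job would need.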

Any thoughts? Running a tileserver on EC2 and disagree? Let me know below.