The Most Popular Pub Names

By Ross Lawley, MongoEngine maintainer and Scala Engineer at 10gen

Earlier in the year I gave a talk at MongoDB London about the different aggregation options with MongoDB. The topic recently came up again in conversation at a user group, so I thought it deserved a blog post.

Gathering ideas for the talk

I wanted to give a more interesting aggregation talk than the standard “counting words in text”, and as the aggregation framework gained shiny 2dsphere geo support in 2.4, I figured I’d use that. I just needed a topic…

What is top of mind for us Brits?

Two things immediately sprang to mind: weather and beer.

I opted to focus on something close to my heart: beer :) But what to aggregate about beer? Then I remembered an old pub quiz favourite…

What is the most popular pub name in the UK?

I knew there was some great open data, including a wealth of information on pubs, available from the awesome OpenStreetMap project. I just needed to get at it, and happily the Overpass API provides a simple “xapi” interface for OSM data. All I needed was anything tagged with amenity=pub within the bounds of the UK, and with the xapi interface this is as simple as a wget:

http://www.overpass-api.de/api/xapi?*[amenity=pub][bbox=-10.5,49.78,1.78,59]

Once I had an OSM file I used the imposm Python library to parse the XML and then converted each pub to the following format, with the location stored as a GeoJSON point:

{
    "_id" : 451152,
    "amenity" : "pub",
    "name" : "The Dignity",
    "addr:housenumber" : "363",
    "addr:street" : "Regents Park Road",
    "addr:city" : "London",
    "addr:postcode" : "N3 1DH",
    "toilets" : "yes",
    "toilets:access" : "customers",
    "location" : {
        "type" : "Point",
        "coordinates" : [-0.1945732, 51.6008172]
    }
}
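To make the shape of that conversion concrete, here is a minimal Python sketch. It is not the code from osm2mongo.py and it does not use imposm: the helper name `node_to_document` and its arguments are hypothetical, and it only assumes that each parsed OSM node gives you an id, a dictionary of tags and a longitude/latitude pair.

```python
def node_to_document(osm_id, tags, lon, lat):
    """Build a MongoDB-ready dict from one parsed OSM node (hypothetical helper)."""
    # Copy the OSM tags (amenity, name, addr:street, ...) as top-level fields.
    doc = dict(tags)
    # Reuse the OSM node id as the document's _id.
    doc["_id"] = osm_id
    # GeoJSON points list longitude first, then latitude.
    doc["location"] = {"type": "Point", "coordinates": [lon, lat]}
    return doc


pub = node_to_document(
    451152,
    {"amenity": "pub", "name": "The Dignity", "addr:city": "London"},
    -0.1945732,
    51.6008172,
)
print(pub)
```

The one detail worth remembering is the coordinate order: longitude comes before latitude, which is the opposite of how most of us say it out loud.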

Then it was simply a case of inserting each one as a document into MongoDB. I quickly noticed that the data needed a little cleaning, as I was seeing the same pub name in slightly different forms, for example “The Red Lion” and “Red Lion”. Because I wanted to make a wordle, I normalised all the pub names.
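The exact cleaning rules live in the loading script, so treat the following as an illustrative guess rather than what it actually does. This sketch assumes the only variation to handle is stray whitespace and a missing or differently cased leading “The”; the function name `normalise_pub_name` is made up for the example.

```python
def normalise_pub_name(name):
    """Collapse simple variants of a pub name (illustrative rule only)."""
    # Squash leading, trailing and repeated whitespace.
    cleaned = " ".join(name.split())
    # Drop an existing leading "the" (any case) so it is not doubled up.
    if cleaned.lower().startswith("the "):
        cleaned = cleaned[4:]
    # Always put a single "The" back on the front.
    return "The " + cleaned


for raw in ["Red Lion", "The Red Lion", "  the   Royal Oak "]:
    print(repr(raw), "->", normalise_pub_name(raw))
```

With a rule like this, “Red Lion” and “The Red Lion” end up in the same group when the names are counted.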

If you want to know more about the importing process, the full loading code is available on GitHub: osm2mongo.py

Top pub names

It turns out finding the most popular pub names is very simple with the aggregation framework. Just group by the name and then sum up all the occurrences. To get the top five most popular pub names we sort by the summed value and then limit to 5:

db.pubs.aggregate([
    {"$group":
        {"_id": "$name",
         "value": {"$sum": 1}
        }
    },
    {"$sort": {"value": -1}},
    {"$limit": 5}
]);
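If the pipeline reads a little abstractly, this is roughly what it computes, sketched in plain Python over an in-memory list instead of a MongoDB collection. The `top_names` helper and the tiny sample list are invented for illustration and are not real pub data.

```python
from collections import Counter


def top_names(pubs, limit=5):
    """Count documents per name and return the most frequent ones."""
    # $group with {"$sum": 1}: one counter increment per document.
    counts = Counter(pub.get("name") for pub in pubs)
    # $sort by value descending followed by $limit.
    return counts.most_common(limit)


# Toy sample, made up for the example.
sample = [
    {"name": "The Red Lion"},
    {"name": "The Crown"},
    {"name": "The Red Lion"},
    {"name": "The Royal Oak"},
    {"name": "The Crown"},
    {"name": "The Red Lion"},
]
print(top_names(sample, limit=2))
# prints [('The Red Lion', 3), ('The Crown', 2)]
```

The difference in practice is that the aggregation framework does the counting on the server, so the documents never have to be pulled into the application.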

For the whole of the UK this returns:

1. The Red Lion
2. The Royal Oak
3. The Crown
4. The White Hart
5. The White Horse


Top pub names near you

At MongoDB London I thought that was too easy, so I filtered the results to find the top pub names near the conference, showing off some of the geo functionality that became available in MongoDB 2.4. To limit the result set, add a $match stage that keeps only locations within a 2 mile radius by using $centerSphere. Just provide the coordinates [longitude, latitude] and the radius in radians: 3959 is approximately the radius of the earth in miles, so dividing 2 by it gives roughly 2 miles:

db.pubs.aggregate([
    // $within is the operator used at the time; 2.4 introduced $geoWithin as its replacement
    { "$match" : { "location":
        { "$within":
            { "$centerSphere": [[-0.12, 51.516], 2 / 3959] }}}
    },
    { "$group" :
        { "_id" : "$name",
          "value" : { "$sum" : 1 } }
    },
    { "$sort" : { "value" : -1 } },
    { "$limit" : 5 }
]);
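The `2 / 3959` in that query is the part people tend to trip over, so here is a small Python sketch of the arithmetic. It is not how MongoDB implements $centerSphere internally; it simply uses the standard haversine formula to show what "within 2 miles, expressed in radians" means. The helper names are hypothetical, and the test point is the example pub document from earlier in the post.

```python
import math

EARTH_RADIUS_MILES = 3959  # approximate, the same figure used in the query


def miles_to_radians(miles):
    """Convert a distance on the earth's surface to an angle in radians."""
    return miles / EARTH_RADIUS_MILES


def angular_distance(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points in radians (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * math.asin(math.sqrt(a))


def within_radius(point, centre, miles):
    """True if point ([lon, lat]) lies within `miles` of centre ([lon, lat])."""
    return angular_distance(*point, *centre) <= miles_to_radians(miles)


centre = [-0.12, 51.516]              # centre used in the query above
dignity = [-0.1945732, 51.6008172]    # the example pub document
print(miles_to_radians(2))
print(within_radius(dignity, centre, 2), within_radius(dignity, centre, 10))
```

Dividing by the earth's radius is all the conversion there is; if you prefer kilometres, use a radius of roughly 6371 instead.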

What about where I live?

At the conference I looked at the most popular pub names near the venue. That’s great if you happen to live in the centre of London, but what about everyone else in the UK? So for this blog post I decided to update the demo code and make it dynamic, based on where you live.

See: pubnames.rosslawley.co.uk

Apologies to those outside the UK: the demo app doesn’t have data for the whole world, although it’s surely possible to do.

Cheers

All the code is available in my repo on GitHub, including the BSON file of the pubs and the wordle code, so fork it and start playing with MongoDB’s great geo features!
