Getting location coordinates in Python with the Bing or Google Maps API

Here is my problem. I have a sample text file where I store text data crawled from various HTML pages. This text contains information about various events and their times and locations. I want to fetch the coordinates of these locations, but I have no idea how to do that in Python. I am using NLTK to recognize named entities in this sample text. Here is the code:

import nltk

with open('sample.txt', 'r') as f:
    sample = f.read()

sentences = nltk.sent_tokenize(sample)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
chunked_sentences = nltk.batch_ne_chunk(tagged_sentences, binary=True)

#print chunked_sentences
#print tokenized_sentences
#print tagged_sentences

def extract_entity_names(t):
    entity_names = []
    if hasattr(t, 'node') and t.node:
        if t.node == 'NE':
            entity_names.append(' '.join([child[0] for child in t]))
        else:
            for child in t:
                entity_names.extend(extract_entity_names(child))
    return entity_names

entity_names = []
for tree in chunked_sentences:
    # Print results per sentence
    # print extract_entity_names(tree)
    entity_names.extend(extract_entity_names(tree))

# Print all entity names
#print entity_names

# Print unique entity names
print set(entity_names)

The sample file looks something like this:

La bohème at Covent Garden

When: 18 Jan 2013 (various dates), 7.30pm Where: Covent Garden, London

John Copley's perennially popular Royal Opera production of Puccini's La bohème is revived for the first of two times this season, aptly over the Christmas period. Sir Mark Elder conducts Rolando Villazón as Rodolfo and Maija Kovalevska as Mimì. Mimì meets poet Rodolfo (Dmytro Popov sings the role on 5 and 18 January) one cold Christmas Eve in Paris' Latin Quarter. Fumbling around in the dark after her candle has gone out, they fall in love. Rodolfo lives with three other lads: philosopher Colline (Nahuel di Pierro/Jihoon Kim on 18 January), musician Schaunard (David Bizic) and painter Marcello (Audun Iversen), who loves Musetta (Stefania Dovhan). Both couples break up and the opera ends in tragedy as Rodolfo finds Mimì dying of consumption in a freezing garret.

I want to fetch coordinates for Covent Garden, London from this text. How can I do it?

Solution

You really have two questions:

1. How to extract location text (or potential location text).
2. How to get a location (latitude, longitude) by calling a geocoding service with that location text.

I can help with the second question. (But see edit below for some help with your first question.)

With the old Google Maps API (which is still working), you could get the geocoding down to one line (one ugly line):

def geocode(address):
    return tuple([float(s) for s in list(urllib.urlopen('http://maps.google.com/maps/geo?' + urllib.urlencode({'output': 'csv', 'q': address})))[0].split(',')[2:]])

Here’s the readable seven-line version plus some wrapper code (when calling from the command line, remember to enclose the address in quotes):

import sys
import urllib

googleGeocodeUrl = 'http://maps.google.com/maps/geo?'

def geocode(address):
    parms = {
        'output': 'csv',
        'q': address}
    url = googleGeocodeUrl + urllib.urlencode(parms)
    resp = urllib.urlopen(url)
    resplist = list(resp)
    line = resplist[0]
    status, accuracy, latitude, longitude = line.split(',')
    return latitude, longitude

def main():
    if 1 < len(sys.argv):
        address = sys.argv[1]
    else:
        address = '1600 Amphitheatre Parkway, Mountain View, CA 94043, USA'
    coordinates = geocode(address)
    print coordinates

if __name__ == '__main__':
    main()

It's simple to parse the CSV format, but the XML format has better error reporting.
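Since the CSV line is just status,accuracy,latitude,longitude, a slightly more defensive parser is easy to sketch. This is a minimal example, assuming the old API's convention that status 200 means success (other codes, such as 602, signalled failure); the sample lines below are illustrative values, not live API output:

```python
def parse_geocode_csv(line):
    """Parse one 'status,accuracy,lat,lng' line from the old CSV output.

    Returns (lat, lng) as floats, or None when the status code
    indicates failure (200 meant success in the old API).
    """
    status, accuracy, latitude, longitude = line.strip().split(',')
    if status != '200':
        return None
    return float(latitude), float(longitude)

# Illustrative responses, not live API output:
print(parse_geocode_csv('200,8,51.5129,-0.1224'))   # (51.5129, -0.1224)
print(parse_geocode_csv('602,0,0,0'))               # None
```

Returning None on failure lets the caller tell "no match" apart from a crash, which the one-line version above cannot do.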

Edit - Help with your first question

I looked into NLTK. It's not trivial, but I can recommend the Natural Language Toolkit documentation, Chapter 7 - Extracting Information from Text, specifically section 7.5, Named Entity Recognition. At the end of the section, they point out:

NLTK provides a classifier that has already been trained to recognize named entities, accessed with the function nltk.ne_chunk(). If we set the parameter binary=True, then named entities are just tagged as NE; otherwise, the classifier adds category labels such as PERSON, ORGANIZATION, and GPE.

You're specifying True, but you probably want the category labels, so:

chunked_sentences = nltk.batch_ne_chunk(tagged_sentences)

This provides category labels (named entity types), which seemed promising. But after trying it on your text and a few simple phrases containing locations, it's clear that more rules are needed. Read the documentation for more information.
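To see why the category labels matter, here is a minimal sketch of pulling out only GPE-labelled (geo-political entity) chunks from a chunked sentence. The nested tuples below are a hypothetical stand-in for nltk.Tree, and the sentence itself is invented for illustration:

```python
# Each subtree is (label, children); leaves are (word, pos_tag) pairs.
# This mimics the shape of an NLTK chunk tree just enough for illustration.
def extract_locations(tree):
    label, children = tree
    if label == 'GPE':                       # geo-political entity
        return [' '.join(word for word, tag in children)]
    locations = []
    for child in children:
        if isinstance(child[1], list):       # a nested subtree
            locations.extend(extract_locations(child))
    return locations

sentence = ('S', [
    ('PERSON', [('Rodolfo', 'NNP')]),
    ('lives', 'VBZ'), ('in', 'IN'),
    ('GPE', [('Covent', 'NNP'), ('Garden', 'NNP')]),
    (',', ','),
    ('GPE', [('London', 'NNP')]),
])
print(extract_locations(sentence))   # ['Covent Garden', 'London']
```

With real NLTK trees you would test isinstance(child, nltk.Tree) and read the label via t.node or t.label() (depending on your NLTK version) instead of pattern-matching tuples, but the traversal logic is the same.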
