Calculating Distances with the Google Maps API
1. What the Code Does
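- The code first imports the libraries it needs: requests for HTTP requests, BeautifulSoup for parsing HTML, pandas for data handling (imported but not actually used in the script), and tqdm for progress bars.
- The Google Maps API key is stored in google_map_api. The key is required for both geocoding and distance calculation (see the Distance Matrix API documentation).
- The starting URL, init_url, is the school search page on the My School website.
- The code sends an HTTP GET request to init_url with requests.get(), retrieves the page HTML, and parses it with BeautifulSoup into a soup object.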
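- The code finds the pagination links by locating the <div> element with class "pagination" and collecting all the <a> tags inside it. The total number of pages is read from the text of the second-to-last link (total_pages is computed but not used afterwards).
- The base URL is set to "https://www.myschool.edu.au/", because the URLs of the subsequent pages are relative to it.
- A list named page_urls stores the URLs of the result pages. It is built by iterating over the pagination links, taking each link's href attribute, and joining it with the base URL. The last link is excluded because it belongs to the "next page" button.
- A for loop iterates over page_urls. For each page, it sends an HTTP GET request, parses the HTML, and finds the elements that hold school information.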
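- Inside the loop, a tqdm progress bar is created to track progress. For each school element, the code extracts the school name, the state, and the suburb.
- school_distance() is called to compute the distance between the school and UQ (the University of Queensland). The school's information (name, state, distance) is then printed, and the progress bar is advanced once each school has been processed.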
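The page-URL step above can be sketched with the standard library alone. This minimal example assumes the pagination hrefs are site-relative paths; the "?page=N" hrefs below are hypothetical, since the real hrefs on My School are not shown. urljoin avoids the doubled slash that plain string concatenation produces when an href already starts with "/", and it also copes with absolute hrefs.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.myschool.edu.au/"

def build_page_urls(hrefs, base_url=BASE_URL):
    """Turn pagination hrefs into absolute URLs, dropping the final "next page" link."""
    return [urljoin(base_url, href) for href in hrefs[:-1]]

# Hypothetical hrefs: pages 2 and 3, followed by a "next page" link
hrefs = ["/school-search?page=2", "/school-search?page=3", "/school-search?page=2"]
print(build_page_urls(hrefs))
# ['https://www.myschool.edu.au/school-search?page=2', 'https://www.myschool.edu.au/school-search?page=3']

# Plain concatenation keeps both slashes
print(BASE_URL + hrefs[0])
# https://www.myschool.edu.au//school-search?page=2
```

In the script itself, the hrefs would come from link['href'] for each <a> tag in page_links.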
2. Key Functions
2.1 geocode()
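geocode(address) takes an address and uses the Google Maps Geocoding API to look up its latitude and longitude. It builds the request, sends it with requests.get(), and parses the response to extract the coordinates. If the API status is not "OK", it returns None.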
# Geocoding API request function
def geocode(address):
    url = "https://maps.googleapis.com/maps/api/geocode/json"
    params = {
        "address": address,
        "key": google_map_api,  # replace with your Google Maps API key
    }
    response = requests.get(url, params=params, timeout=10)  # avoid hanging forever
    data = response.json()
    if data["status"] == "OK":
        location = data["results"][0]["geometry"]["location"]
        return location["lat"], location["lng"]
    else:
        return None
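To see what geocode() reads from the response, here is a sketch that applies the same parsing logic to a hand-written dictionary shaped like a Geocoding API JSON reply. The coordinates are placeholder values, and parse_geocode_response is a hypothetical helper name, not part of any library. Keeping the parsing separate from the HTTP call makes it testable without an API key.

```python
def parse_geocode_response(data):
    """Return (lat, lng) from a Geocoding API JSON reply, or None if the lookup failed."""
    if data.get("status") != "OK" or not data.get("results"):
        return None
    location = data["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]

# Placeholder reply with the same nesting the Geocoding API uses
sample = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": -27.5, "lng": 153.0}}}],
}
print(parse_geocode_response(sample))                                      # (-27.5, 153.0)
print(parse_geocode_response({"status": "ZERO_RESULTS", "results": []}))   # None
```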
2.2 school_distance()
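school_distance() takes a school's name, state, and suburb. It uses geocode() to look up the coordinates of both the school and the University of Queensland (UQ), then calls calculate_distance() to get the distance between them. It handles failures in both steps: on success it returns the distance in kilometres, and on failure it returns an error string ('Geocoding failed' or 'Calculation failed').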
def school_distance(school_name, school_state, school_suburb):
    # Geocode the University of Queensland (UQ)
    uq_name = "The University of Queensland"
    uq_address = f"{uq_name}, St Lucia, QLD, AU"  # may need adjusting for your case
    uq_location = geocode(uq_address)  # (lat, lng) or None
    if not uq_location:
        print("Geocoding failed for the University of Queensland.")
        return 'Geocoding failed'
    uq_coordinates = f"{uq_location[0]},{uq_location[1]}"
    # Build the school's address from its name, suburb and state, then geocode it
    school_address = f"{school_name}, {school_suburb}, {school_state}, AU"
    school_location = geocode(school_address)  # (lat, lng) or None
    if not school_location:
        print(f"Geocoding failed for {school_name}.")
        return 'Geocoding failed'
    school_coordinates = f"{school_location[0]},{school_location[1]}"
    # Distance between the school and UQ, returned in metres
    distance = calculate_distance(school_coordinates, uq_coordinates)
    if distance is None:  # a distance of 0 is still a valid result
        print(f"Distance calculation failed for {school_name}.")
        return 'Calculation failed'
    return distance / 1000  # convert metres to kilometres
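school_distance() geocodes the University of Queensland again for every school, even though that address never changes. One way to avoid the repeated API calls, sketched below, is to memoise the lookup with functools.lru_cache. fake_geocode is a hypothetical stand-in for geocode() that only counts how often it is called, so the example runs without an API key.

```python
from functools import lru_cache

calls = {"count": 0}

def fake_geocode(address):
    """Hypothetical stand-in for geocode(): counts calls and returns fixed coordinates."""
    calls["count"] += 1
    return (-27.5, 153.0)

@lru_cache(maxsize=None)
def cached_geocode(address):
    # Only the first call for a given address reaches fake_geocode; repeats hit the cache
    return fake_geocode(address)

for _ in range(100):
    cached_geocode("The University of Queensland, St Lucia, QLD, AU")

print(calls["count"])  # 1
```

Note that lru_cache also caches a None result, so a failed lookup would not be retried; geocoding UQ once before the main loop and passing its coordinates in is an equally simple alternative.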
2.3 calculate_distance()
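calculate_distance(origin, destination) takes two sets of coordinates (origin and destination) and uses the Google Maps Distance Matrix API to compute the distance between them. It builds the request, sends it, and extracts the distance from the response. The value is the travel distance along a route (driving by default) in metres, not the straight-line distance.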
# Distance Matrix API request function
def calculate_distance(origin, destination):
    url = "https://maps.googleapis.com/maps/api/distancematrix/json"
    params = {
        "origins": origin,
        "destinations": destination,
        "key": google_map_api,  # replace with your Google Maps API key
    }
    response = requests.get(url, params=params, timeout=10)
    data = response.json()
    if data["status"] != "OK":
        return None
    element = data["rows"][0]["elements"][0]
    # Each element has its own status, e.g. "NOT_FOUND" or "ZERO_RESULTS" when no route exists
    if element["status"] != "OK":
        return None
    return element["distance"]["value"]  # distance in metres
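The Distance Matrix API reports a status for the whole request and a separate status for each origin/destination element, so a request can succeed overall while an individual element is "NOT_FOUND" or "ZERO_RESULTS" and carries no distance field. The sketch below uses a hypothetical helper, parse_distance_response, and placeholder numbers to check both levels before reading distance.value, which is in metres.

```python
def parse_distance_response(data):
    """Return the distance in metres for the first origin/destination pair, or None."""
    if data.get("status") != "OK":
        return None
    element = data["rows"][0]["elements"][0]
    if element.get("status") != "OK":
        return None  # e.g. NOT_FOUND or ZERO_RESULTS: no "distance" key is present
    return element["distance"]["value"]

ok_reply = {
    "status": "OK",
    "rows": [{"elements": [{"status": "OK", "distance": {"text": "12.3 km", "value": 12345}}]}],
}
no_route = {"status": "OK", "rows": [{"elements": [{"status": "ZERO_RESULTS"}]}]}

metres = parse_distance_response(ok_reply)
print(metres / 1000)                       # 12.345
print(parse_distance_response(no_route))   # None
```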
3. Full Code
import requests
from bs4 import BeautifulSoup
import pandas as pd
from tqdm import tqdm
from urllib.parse import urljoin

google_map_api = '????'  # obtain your own API key
# Send an HTTP request and fetch the HTML of the search page
init_url = "https://www.myschool.edu.au/school-search?FormPosted=True&SchoolSearchQuery=&SchoolSector=C%2CI&SchoolType=S&State=Qld"
response = requests.get(init_url, timeout=10)
html_content = response.text
# Parse the HTML
soup = BeautifulSoup(html_content, "html.parser")
# Find the pagination links
page_links = soup.find("div", class_="pagination").find_all("a")
total_pages = int(page_links[-2].text.strip())
# Build the URL of each page; the last link is the "next page" button, so skip it
base_url = "https://www.myschool.edu.au/"  # base URL
page_urls = [urljoin(base_url, link['href']) for link in page_links[:-1]]
# Geocoding API request function
def geocode(address):
    url = "https://maps.googleapis.com/maps/api/geocode/json"
    params = {
        "address": address,
        "key": google_map_api,  # replace with your Google Maps API key
    }
    response = requests.get(url, params=params, timeout=10)
    data = response.json()
    if data["status"] == "OK":
        location = data["results"][0]["geometry"]["location"]
        return location["lat"], location["lng"]
    else:
        return None

# Distance Matrix API request function
def calculate_distance(origin, destination):
    url = "https://maps.googleapis.com/maps/api/distancematrix/json"
    params = {
        "origins": origin,
        "destinations": destination,
        "key": google_map_api,  # replace with your Google Maps API key
    }
    response = requests.get(url, params=params, timeout=10)
    data = response.json()
    if data["status"] != "OK":
        return None
    element = data["rows"][0]["elements"][0]
    # Each element has its own status, e.g. "NOT_FOUND" or "ZERO_RESULTS" when no route exists
    if element["status"] != "OK":
        return None
    return element["distance"]["value"]  # distance in metres

def school_distance(school_name, school_state, school_suburb):
    # Geocode the University of Queensland (UQ)
    uq_name = "The University of Queensland"
    uq_address = f"{uq_name}, St Lucia, QLD, AU"  # may need adjusting for your case
    uq_location = geocode(uq_address)  # (lat, lng) or None
    if not uq_location:
        print("Geocoding failed for the University of Queensland.")
        return 'Geocoding failed'
    uq_coordinates = f"{uq_location[0]},{uq_location[1]}"
    # Build the school's address from its name, suburb and state, then geocode it
    school_address = f"{school_name}, {school_suburb}, {school_state}, AU"
    school_location = geocode(school_address)  # (lat, lng) or None
    if not school_location:
        print(f"Geocoding failed for {school_name}.")
        return 'Geocoding failed'
    school_coordinates = f"{school_location[0]},{school_location[1]}"
    # Distance between the school and UQ, returned in metres
    distance = calculate_distance(school_coordinates, uq_coordinates)
    if distance is None:  # a distance of 0 is still a valid result
        print(f"Distance calculation failed for {school_name}.")
        return 'Calculation failed'
    return distance / 1000  # convert metres to kilometres
for id_, url in enumerate(page_urls):
    print(f"============== Page {id_}: {url} ==============")
    response = requests.get(url, timeout=10)
    html_content = response.text
    soup = BeautifulSoup(html_content, "html.parser")
    # Locate the HTML elements that hold school information
    school_elements = soup.find_all("div", class_="school-section")
    pbar = tqdm(total=len(school_elements), desc='Processing')
    # Extract the information for each school
    for school_element in school_elements:
        # School name
        school_name = school_element.find("h2").text.strip()
        # Location: a label <p> followed by a value <p>, expected to look like "Suburb, State"
        state_info = school_element.find_all("div", class_="col")[2].find_all("p")
        state_label = state_info[0].text.strip()
        state_value = state_info[1].text.strip()
        # Split "Suburb, State"; partition does not raise if the comma is missing
        suburb, _, state = state_value.partition(',')
        # Compute the distance to UQ
        _distance = school_distance(school_name, state.strip(), suburb.strip())
        # Print the school's information (or process it further)
        print("School Name:", school_name)
        print(state_label + ":", state_value)
        if isinstance(_distance, float):
            print(f"The distance between {school_name} and the University of Queensland is {_distance} kilometers.")
        else:
            print(f"Distance to the University of Queensland: {_distance}")
        print("+++++++++++++++++++++++++++++++++++++\n")
        pbar.update(1)  # advance the progress bar once this school is done
    pbar.close()
print('\n\n------------finish!!!------------')
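The distance the script reports is a road distance from the Distance Matrix API. As an optional offline sanity check, which is not part of the original script, the straight-line (great-circle) distance between the same two coordinate pairs can be computed with the haversine formula. A road distance noticeably shorter than it, or many times longer, usually points to a school geocoded to the wrong place. The sketch assumes a spherical Earth with a mean radius of 6371 km, so results are approximate; haversine_km is a hypothetical helper name.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(a), math.sqrt(1 - a))

# One degree of longitude along the equator
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))  # 111.19
```

In the script, the inputs would be the (lat, lng) tuples that geocode() returns for the school and for UQ.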