Generating row numbers when saving scraped data as CSV in Python: writing and saving CSV files from data scraped with Python and BeautifulSoup4

I am trying to scrape data from the PGA.com website to get a table of all of the golf courses in the United States. In my CSV table I want to include the name of the golf course, address, ownership, website, and phone number. With this data I would like to geocode it, place it on a map, and keep a local copy on my computer.

I used Python and Beautiful Soup 4 to extract my data. I have gotten as far as extracting the data from the website, but I am having difficulty writing the script to export the data into a CSV file with the fields I need.

Attached below is my script. I need help creating code that will transfer my extracted data into a CSV file and save it to my desktop.

Here is my script below:

import csv
import requests
from bs4 import BeautifulSoup

url = "http://www.pga.com/golf-courses/search?searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

g_data1 = soup.find_all("div", {"class": "views-field-nothing-1"})
g_data2 = soup.find_all("div", {"class": "views-field-nothing"})

for item in g_data1:
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-counter"})[0].text)
    except IndexError:
        pass
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-course-type"})[0].text)
    except IndexError:
        pass

for item in g_data2:
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-title"})[0].text)
    except IndexError:
        pass
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-address"})[0].text)
    except IndexError:
        pass
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-city-state-zip"})[0].text)
    except IndexError:
        pass

This is what I currently get when I run the script. I want to take this data and make it into a CSV table for geocoding later.

1801 Merrimac Trl

Williamsburg, Virginia 23185-5905

12551 Glades Rd

Boca Raton, Florida 33498-6830

Preserve Golf Club

13601 SW 115th Ave

Dunnellon, Florida 34432-5621

1000 Acres Ranch Resort

465 Warrensburg Rd

Stony Creek, New York 12878-1613

1757 Golf Club

45120 Waxpool Rd

Dulles, Virginia 20166-6923

27 Pines Golf Course

5611 Silverdale Rd

Sturgeon Bay, Wisconsin 54235-8308

3 Creek Ranch Golf Club

2625 S Park Loop Rd

Jackson, Wyoming 83001-9473

3 Lakes Golf Course

6700 Saltsburg Rd

Pittsburgh, Pennsylvania 15235-2130

3 Par At Four Points

8110 Aero Dr

San Diego, California 92123-1715

3 Parks Fairways

3841 N Florence Blvd

Florence, Arizona 85132

3-30 Golf & Country Club

101 Country Club Lane

Lowden, Iowa 52255

401 Par Golf

5715 Fayetteville Rd

Raleigh, North Carolina 27603-4525

93 Golf Ranch

406 E 200 S

Jerome, Idaho 83338-6731

A 1 Golf Center

1805 East Highway 30

Rockwall, Texas 75087

A H Blank Municipal Course

808 County Line Rd

Des Moines, Iowa 50320-6706

A-Bar-A Ranch Golf Course

Highway 230

Encampment, Wyoming 82325

A-Ga-Ming Golf Resort, Sundance

627 Ag A Ming Dr

Kewadin, Michigan 49648-9397

A-Ga-Ming Golf Resort, Torch

627 Ag A Ming Dr

Kewadin, Michigan 49648-9397

A. C. Read Golf Club, Bayou

Bldg 3495, Nas Pensacola

Pensacola, Florida 32508

A. C. Read Golf Club, Bayview

Bldg 3495, Nas Pensacola

Pensacola, Florida 32508

Solution

All you really need to do here is put your output in a list and then use the csv library to export it. I'm not entirely clear on what you are getting out of views-field-nothing-1, but to focus just on views-field-nothing, you could do something like:

courses_list = []

for item in g_data2:
    try:
        name = item.contents[1].find_all("div", {"class": "views-field-title"})[0].text
    except IndexError:
        name = ''
    try:
        address1 = item.contents[1].find_all("div", {"class": "views-field-address"})[0].text
    except IndexError:
        address1 = ''
    try:
        address2 = item.contents[1].find_all("div", {"class": "views-field-city-state-zip"})[0].text
    except IndexError:
        address2 = ''
    course = [name, address1, address2]
    courses_list.append(course)

This will put the courses in a list; next, you can write them to a CSV like so:

import csv

with open('filename.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for row in courses_list:
        writer.writerow(row)
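Since the question also asks how to save the file to the desktop, here is a minimal sketch of that last step. The file name golf_courses.csv, the header labels, and the two sample rows are my own assumptions, not from the scraper; the rows just mirror the [name, address1, address2] shape built above. os.path.expanduser turns "~" into the user's home directory, so the path works on any account:

```python
import csv
import os

# Hypothetical sample rows shaped like the scraper's output: [name, address1, address2].
courses_list = [
    ["Preserve Golf Club", "13601 SW 115th Ave", "Dunnellon, Florida 34432-5621"],
    ["1757 Golf Club", "45120 Waxpool Rd", "Dulles, Virginia 20166-6923"],
]

# Build an absolute path to a CSV file on the Desktop.
desktop = os.path.join(os.path.expanduser("~"), "Desktop")
os.makedirs(desktop, exist_ok=True)  # in case the Desktop folder is missing
out_path = os.path.join(desktop, "golf_courses.csv")

with open(out_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Address", "City/State/ZIP"])  # header row for geocoding
    writer.writerows(courses_list)
```

Adding the header row up front means any geocoding tool that reads the CSV later can address the columns by name instead of position.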
