The Simplest Web Scraping in Python

lune819

Category: Python. Tags: python

Article link: https://blog.csdn.net/weixin_41386674/article/details/80758869

1. Install two Python packages:

$ pip install requests
$ pip install beautifulsoup4

2. Create test.py

 
    # coding=utf-8
    import requests
    from bs4 import BeautifulSoup


    # Fetch the page at `url` and return its HTML as text
    def get_html(url):
        response = requests.get(url)
        response.encoding = 'utf-8'
        return response.text


    # Return the text of the page's <title> element
    def get_title(html):
        soup = BeautifulSoup(html, 'html.parser')
        title_content = soup.select('title')[0].get_text()
        return title_content


    # Print the text of every <p> element on the page
    def print_p(html):
        soup = BeautifulSoup(html, 'html.parser')
        for p in soup.select('p'):
            print(p.get_text())


    url = "http://www.cityu.edu.hk/"
    html = get_html(url)
    title_content = get_title(html)
    print(title_content)
    print_p(html)
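
The parsing half of test.py can also be tried without a network connection. The sketch below is only an illustration: `SAMPLE_HTML`, `extract_title`, and `extract_paragraphs` are hypothetical names that do not appear in the script above, and the sample page is made up. It uses the same `html.parser` backend and the same `select` calls, but returns the text instead of printing it and tolerates a page with no `<title>`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used here in place of a live request
SAMPLE_HTML = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </body>
</html>
"""


def extract_title(html):
    """Return the <title> text, or None if the page has no <title>."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one("title")
    return tag.get_text(strip=True) if tag else None


def extract_paragraphs(html):
    """Return the text of every <p> element as a list of strings."""
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text(strip=True) for p in soup.select("p")]


print(extract_title(SAMPLE_HTML))
for text in extract_paragraphs(SAMPLE_HTML):
    print(text)
```

Returning values rather than printing them makes the two helpers easy to reuse, for example on the string returned by `get_html(url)`.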