A Small Python Crawler Example for Douban Movie Homepage Categories: A Lexical Analysis Data Collection Project

Latest recommended article published on 2023-06-19 20:07:50

Andy Chen 陈郑游

Article link: https://blog.csdn.net/javawebrookie/article/details/80467989

Python is an easy to learn, powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. Python’s elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms.

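That is how the official Python website introduces the language: Python is easy to learn yet powerful, with efficient high-level data structures and a simple but effective approach to object-oriented programming. What actually drew me to Python, though, was the saying "Life is short, I use Python". I gave it a try, and it really is good. Community support is strong too, and it is easy to get started on projects such as crawlers.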

My time is limited, so I may update this blog less often from now on. The GitHub repository is updated regularly:

https://github.com/andyczy/czy-study-py-ml-deepLearning

This is my GitHub repository about Python.

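1. Why the name "Python 365 Journey"?

365: a reminder to myself to keep learning continuously
Journey: treat the learning process as a trip and enjoy it
Python: fewer fundamentals, more worked examples

What I write may be rough, but I will work to improve it. Suggestions are welcome in the comments.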

2. Python basics

2.1 Basic output

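# 2017-12-20
# Basic Python statements. Python is interpreted (no compile step), interactive, object-oriented, cross-platform, and simple to use
a = 'hello world'
greeting = "欢迎来到 aweekit "  # renamed from str so the built-in str type is not shadowed
# Slicing a str counts characters: each English letter is 1 character
print("String slice: " + a[2:])
# A Chinese character is also 1 character in a Python 3 str; it takes 3 bytes only when encoded as UTF-8
print("Chinese output: " + greeting)

Console:

String slice: llo world
Chinese output: 欢迎来到 aweekit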
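To make the character-versus-byte distinction concrete, here is a minimal sketch. The helper name `char_and_byte_len` and the sample strings are made up for illustration; the point is only that `len()` on a Python 3 `str` counts characters, while the UTF-8 encoded form of a common Chinese character takes 3 bytes.

```python
# Compare the character count of a str with the byte count of its UTF-8 encoding.
# char_and_byte_len and the sample strings are hypothetical, for illustration only.

def char_and_byte_len(text):
    """Return (number of characters, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))

english = "hello"
chinese = "欢迎"

print(char_and_byte_len(english))  # (5, 5): one byte per ASCII letter
print(char_and_byte_len(chinese))  # (2, 6): three bytes per character here
print(chinese[1:])                 # 迎: slicing works on characters, not bytes
```

So a slice such as `a[2:]` always drops two characters, whether they are English letters or Chinese characters.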

2.2 Slicing (similar to Java collection operations?)

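items = ['chenzhengyou', 786, 2.23, 'able', 70.2]  # renamed from list so the built-in list type is not shadowed
tinylist = [123, 'able']

print(items)  # print the whole list
print(items[0])  # print the first element
print(items[1:3])  # print the second and third elements
print(items[2:])  # print everything from the third element to the end
print(tinylist * 2)  # print the list repeated twice
print(items + tinylist)  # print the two lists concatenated

Console:

['chenzhengyou', 786, 2.23, 'able', 70.2]
chenzhengyou
[786, 2.23]
[2.23, 'able', 70.2]
[123, 'able', 123, 'able']
['chenzhengyou', 786, 2.23, 'able', 70.2, 123, 'able']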
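Slicing supports a few more forms than the ones shown so far. The sketch below uses a made-up list named `values` to illustrate negative indices, a step value, and the fact that a slice returns a new list rather than a view of the old one.

```python
# A few more slice forms; the list `values` is a made-up example.
values = ['a', 'b', 'c', 'd', 'e']

last_two = values[-2:]        # negative index counts from the end
every_other = values[::2]     # third field is the step
reversed_copy = values[::-1]  # a step of -1 walks the list backwards
clamped = values[3:10]        # an out-of-range end is clamped, no IndexError
shallow_copy = values[:]      # a full slice makes a new list object

print(last_two)                # ['d', 'e']
print(every_other)             # ['a', 'c', 'e']
print(reversed_copy)           # ['e', 'd', 'c', 'b', 'a']
print(clamped)                 # ['d', 'e']
print(shallow_copy is values)  # False
```

Unlike Java's `List.subList`, which returns a view backed by the original list, a Python slice is an independent (shallow) copy, so changing it does not alter the source list.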

3. A simple Python crawler

Douban Movie homepage category crawler: the endpoint (http://movie.douban.com/j/search_tags?type=movie) returns its data in JSON format.

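# encoding: UTF-8
import urllib.request

# url = "http://www.baidu.com"
url = "http://movie.douban.com/j/search_tags?type=movie"
# Open the URL and read the raw response body (bytes); the with block closes the connection afterwards
with urllib.request.urlopen(url) as response:
    data = response.read()
# Decode the bytes into a str before printing
data = data.decode('UTF-8')
print(data)

Console: the script prints the JSON text returned by the URL.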
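Since the endpoint returns JSON, a natural next step is to parse the text instead of printing it raw. The sketch below is only an illustration under assumptions: the response is assumed to be a JSON object with a top-level `"tags"` list, the `sample` string is made up rather than captured from the site, and `fetch_text`, `parse_tags`, and the User-Agent value are hypothetical names chosen for this example.

```python
import json
import urllib.request


def fetch_text(url, timeout=10):
    """Download url and return the response body decoded as UTF-8."""
    # Hypothetical User-Agent value; some sites reject urllib's default one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_tags(json_text):
    """Return the list under the assumed "tags" key, or [] if the key is absent."""
    payload = json.loads(json_text)
    return payload.get("tags", [])


# Made-up sample standing in for a real response, so the example runs offline.
sample = '{"tags": ["热门", "最新", "经典"]}'
for tag in parse_tags(sample):
    print(tag)
```

To try it against the live address, call `parse_tags(fetch_text("http://movie.douban.com/j/search_tags?type=movie"))`. Whether that request succeeds, and whether the real payload has this shape, depends on the site.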