Python: From Novice to Master_PaddlePaddle Deep Learning Academy (Python: From Novice to Master) Study Notes

Latest recommended article published on 2022-09-28 21:54:37

weixin_39794734


Article tags: Python: From Novice to Master

PaddlePaddle Deep Learning Academy (Python: From Novice to Master) Study Notes (Baidu PaddlePaddle)

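Motivation

I am a beginner with only a C++ background and some self-taught deep learning knowledge. I took this course to broaden my skills: to learn deep learning and Python, both very popular right now, and to improve my abilities.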

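Course content

The course runs for 7 days; counting the opening and closing sessions, there are 8 lessons:

Lesson 1: Prerequisite course

Lesson 2: Day1 - Overview of artificial intelligence and introductory basics

Lesson 3: Day2 - Advanced Python

Lesson 4: Day3 - Common Python libraries for AI

Lesson 5: Day4 - PaddleHub hands-on and applications

Lesson 6: Day5 - EasyDL hands-on and assignment release

Lesson 7: Day6 - PaddleHub creative contest release

Lesson 8: Day7 - Course closing

There were four assignments and one comprehensive assignment. (They were hard, truly! But I still did my best to finish them, damn!)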

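What I gained

First, this was my first time using AI Studio. It lets you run projects online and is really powerful, and the platform hosts many published example projects. If one interests you, you can run it and pick up its approach and code logic (for me this was a godsend).

Day1 - Python basics exercises

1. Print the 9*9 multiplication table (mind the formatting)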

def table():
    # Print the 9*9 multiplication table, one row per value of i
    for i in range(1, 10):
        for j in range(1, i + 1):
            print("{}*{}={:<2}".format(j, i, i * j), end=" ")
        print("")

if __name__ == '__main__':
    table()

2. Find files with a specific name

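# Import the os module
import os

# Directory to search
path = "Day1-homework"
# Text to look for
filename = "2020"
# List that collects the matching paths
result = []

def findfiles(path, filename):
    # Walk the directory tree and collect every file whose path contains filename
    for root, dirs, files in os.walk(path):
        for item in files:
            # Join with root (the directory currently being walked), not the top-level
            # path, so files in subdirectories get their real location
            item_path = os.path.join(root, item)
            if filename in item_path:
                result.append(item_path)
    for index, item in enumerate(result):
        print([index + 1, item])

if __name__ == '__main__':
    findfiles(path, filename)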
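
As a small variation on the exercise above, here is a minimal sketch that returns the matches instead of appending to a module-level list, which makes the function easier to reuse and test. The name `find_files` and the choice to match only the file name (not the whole path) are assumptions of this sketch, not part of the course template.

```python
import os


def find_files(top, keyword):
    """Return the paths of all files under `top` whose file name contains `keyword`."""
    matches = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            # Match on the file name only, then build the full path from root
            if keyword in name:
                matches.append(os.path.join(root, name))
    # Sort so the output order does not depend on the file system
    return sorted(matches)


if __name__ == "__main__":
    for index, item in enumerate(find_files("Day1-homework", "2020"), start=1):
        print([index, item])
```

Because the function has no global state, calling it twice does not duplicate results.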

Day2 - Crawling contestant information for "Youth With You 2"

A crawler that scrapes the good-looking contestants.

import json
import re
import requests
import datetime
from bs4 import BeautifulSoup
import os

# Get today's date and format it for use in file names, e.g. 20200420
today = datetime.date.today().strftime('%Y%m%d')

def crawl_wiki_data():
    """
    Crawl the contestant information for "Youth With You 2" from Baidu Baike and return the HTML
    """
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
    }
    url = 'https://baike.baidu.com/item/青春有你第二季'
    try:
        response = requests.get(url, headers=headers)
        print(response.status_code)
        # Passing a document string to the BeautifulSoup constructor yields a document object
        soup = BeautifulSoup(response.text, 'lxml')
        # Find the tables whose class is "table-view log-set-param"
        tables = soup.find_all('table', {'class': 'table-view log-set-param'})
        crawl_table_title = "参赛学员"
        for table in tables:
            # Search the tags and strings that precede the current node
            table_titles = table.find_previous('div').find_all('h3')
            for title in table_titles:
                if crawl_table_title in title:
                    return table
    except Exception as e:
        print(e)

def parse_wiki_data(table_html):
    '''
    Parse contestant information from the HTML returned by Baidu Baike and save it
    as a JSON file in the work directory, named after the current date
    '''
    bs = BeautifulSoup(str(table_html), 'lxml')
    all_trs = bs.find_all('tr')
    error_list = ['\'', '\"']
    stars = []
    for tr in all_trs[1:]:
        all_tds = tr.find_all('td')
        star = {}
        # Name
        star["name"] = all_tds[0].text
        # Link to the contestant's own Baidu Baike page
        star["link"] = 'https://baike.baidu.com' + all_tds[0].find('a').get('href')
        # Hometown
        star["zone"] = all_tds[1].text
        # Zodiac sign
        star["constellation"] = all_tds[2].text
        # Height
        star["height"] = all_tds[3].text
        # Weight
        star["weight"] = all_tds[4].text
        # Flower motto, with single and double quotes removed
        flower_word = all_tds[5].text
        for c in error_list:
            flower_word = flower_word.replace(c, '')
        star["flower_word"] = flower_word
        # Company
        if all_tds[6].find('a') is not None:
            star["company"] = all_tds[6].find('a').text
        else:
            star["company"] = all_tds[6].text
        stars.append(star)
    # Dump the list directly: round-tripping through str() and quote replacement
    # fails as soon as any value contains a quote character
    with open('work/' + today + '.json', 'w', encoding='UTF-8') as f:
        json.dump(stars, f, ensure_ascii=False)

def crawl_pic_urls():
    '''
    Crawl each contestant's Baidu Baike images and save them
    '''
    with open('work/' + today + '.json', 'r', encoding='UTF-8') as file:
        json_array = json.loads(file.read())
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
    }
    for star in json_array:
        name = star['name']
        link = star['link']
        pic_urls = []
        pic_set = set()
        # Crawl this contestant's images and store every image URL in the list pic_urls
        response = requests.get(link, headers=headers)
        print(response.status_code)
        # Passing a document string to the BeautifulSoup constructor yields a document object
        soup = BeautifulSoup(response.text, 'html.parser')
        # Follow the link inside each summary-pic block and collect the images found there
        imgs_left = soup.find_all('div', {'class': 'summary-pic'})
        for item in imgs_left:
            imgResponse2 = requests.get("https://baike.baidu.com" + item.a['href'], headers=headers)
            imgSoup2 = BeautifulSoup(imgResponse2.text, 'html.parser')
            imgs_a2 = imgSoup2.find_all('a', {'class': 'pic-item'})
            for a_item in imgs_a2:
                if a_item.img:
                    # Skip URLs that have already been collected
                    if a_item.img['src'] not in pic_set:
                        pic_set.add(a_item.img['src'])
                        pic_urls.append(a_item.img['src'])
        # Download every image in pic_urls into a folder named after the contestant
        down_pic(name, pic_urls)

def down_pic(name, pic_urls):
    '''
    Download every image in the list pic_urls and save it in a folder named after name
    '''
    path = 'work/' + 'pics/' + name + '/'
    if not os.path.exists(path):
        os.makedirs(path)
    for i, pic_url in enumerate(pic_urls):
        try:
            pic = requests.get(pic_url, timeout=15)
            string = str(i + 1) + '.jpg'
            with open(path + string, 'wb') as f:
                f.write(pic.content)
                print('Downloaded image %s: %s' % (str(i + 1), str(pic_url)))
        except Exception as e:
            print('Failed to download image %s: %s' % (str(i + 1), str(pic_url)))
            print(e)
            continue

def show_pic_path(path):
    '''
    Walk through every crawled image and print its absolute path
    '''
    pic_num = 0
    for (dirpath, dirnames, filenames) in os.walk(path):
        for filename in filenames:
            pic_num += 1
            print("Photo %d: %s" % (pic_num, os.path.join(dirpath, filename)))
    print("Crawled %d photos of \"Youth With You 2\" contestants in total" % pic_num)

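if __name__ == '__main__':
    # Crawl the contestant information for "Youth With You 2" from Baidu Baike and return the HTML
    html = crawl_wiki_data()
    # Parse the HTML to get the contestant information and save it as a JSON file
    parse_wiki_data(html)
    # Crawl the images from each contestant's Baidu Baike page and save them
    crawl_pic_urls()
    # Print the paths of the crawled contestant images
    show_pic_path('/home/aistudio/work/pics/')
    print("All information crawled!")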

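Day3 - Data analysis of "Youth With You 2" contestants

The code for the last four days is too long, so I won't post it.

Approach: building on the Day 2 exercise, which used Python to crawl the information of all "Youth With You 2" contestants from Baidu Baike, run a data visualization analysis. (In other words, draw a chart.)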
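
To give a feel for the analysis step without the full assignment code, here is a minimal sketch that counts contestants per hometown and prints the counts as text bars. It assumes records shaped like the Day 2 JSON (a list of dicts with a `zone` key); the helper names and the sample records are hypothetical, and the real assignment would draw the chart with a plotting library rather than text.

```python
from collections import Counter


def count_by_field(stars, field):
    """Count how many contestant records share each value of `field`."""
    return Counter(star[field].strip() for star in stars if star.get(field))


def text_bar_chart(counts, width=20):
    """Render counts as text bars, largest first, scaled to `width` characters."""
    if not counts:
        return []
    top = max(counts.values())
    lines = []
    for label, n in counts.most_common():
        bar = "#" * max(1, round(n * width / top))
        lines.append(f"{label:<10}{bar} {n}")
    return lines


# Hypothetical sample records, not real course data
sample = [
    {"name": "A", "zone": "Sichuan"},
    {"name": "B", "zone": "Sichuan"},
    {"name": "C", "zone": "Beijing"},
]

if __name__ == "__main__":
    for line in text_bar_chart(count_by_field(sample, "zone")):
        print(line)
```

The same `Counter` result can be passed to any plotting call as two parallel lists of labels and values.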

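Day4 - Recognizing "Youth With You 2" contestants

Task overview: image classification is an important area of computer vision whose goal is to assign an image to one of a set of predefined labels. Recently, researchers have proposed many kinds of neural networks that have greatly improved the performance of classification algorithms. This task uses a self-built dataset, recognizing contestants in "Youth With You 2", as an example of how to use PaddleHub for image classification.

My result here was not very good. Some of the images crawled from the contestants' Baidu Baike pages do not match the person (or are group photos), so training reached only 60% accuracy.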
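
For the self-built dataset, one common preparation step is to turn the per-contestant image folders into labelled training and validation lists. The sketch below assumes one folder per contestant (as produced by the Day 2 crawler) and a plain "path label" text format; the function names, the 20% validation ratio, and the list format are assumptions of this sketch, not something the course or PaddleHub prescribes. Mismatched or group photos would still need to be removed by hand before this step.

```python
import os
import random

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def build_split(pics_dir, val_ratio=0.2, seed=0):
    """Return (labels, train, val); train and val are lists of (path, label_id)."""
    labels = sorted(
        d for d in os.listdir(pics_dir)
        if os.path.isdir(os.path.join(pics_dir, d))
    )
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for label_id, label in enumerate(labels):
        folder = os.path.join(pics_dir, label)
        images = sorted(
            f for f in os.listdir(folder)
            if os.path.splitext(f)[1].lower() in IMAGE_EXTS
        )
        rng.shuffle(images)
        # Hold out at least one image per class when the class has more than one
        n_val = max(1, int(len(images) * val_ratio)) if len(images) > 1 else 0
        for i, name in enumerate(images):
            target = val if i < n_val else train
            target.append((os.path.join(folder, name), label_id))
    return labels, train, val


def write_list(pairs, list_path):
    """Write one 'path label_id' line per image."""
    with open(list_path, "w", encoding="utf-8") as f:
        for path, label_id in pairs:
            f.write(f"{path} {label_id}\n")
```

Splitting per class rather than over the whole pool keeps every contestant represented in the validation list, which matters when some folders hold only a handful of images.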

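Day5 - Comprehensive assignment

This one took me a long time...

Step 1: Crawl comment data for "Youth With You 2" from iQiyi

Crawl the comments under any one full episode

Collect at least 1000 comments

Step 2: Count word frequencies and visualize them

Data preprocessing: clean special characters out of the comments (such as @#￥%, and emoji), and store the cleaned result as a txt file

Chinese word segmentation: add new words (such as 青你, 奥利给, 冲鸭) and remove stop words (such as 哦, 因此, 不然, 也好, 但是)

Find the top 10 most frequent words

Visualize the most frequent words

Step 3: Draw a word cloud

Generate a word cloud from the word frequencies

Optional: add a background image and generate the word cloud following its outline

Step 4: Use PaddleHub to run content moderation on the comments

As you can probably tell from my results, there are two errors: one is that the line break was counted as a word, and the other is a word segmentation error (the token 欣虞书 appeared). Sigh!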
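
The cleaning and counting in step 2 can be sketched as follows, including a guard against the first error mentioned above (whitespace such as a line break being counted as a word). This is a minimal sketch using only the standard library: it assumes the tokens have already been produced by a Chinese word segmenter, and the function names and sample data are hypothetical. The segmentation error would more likely need a fix on the segmenter side, for example by adding the contestant's name to its custom dictionary, which is not shown here.

```python
import re
from collections import Counter

# Keep Chinese characters, ASCII letters and digits; everything else
# (symbols, emoji, line breaks) is collapsed into a single space
_NOT_WORD = re.compile(r"[^\u4e00-\u9fa5A-Za-z0-9]+")


def clean_comment(text):
    """Strip special characters from one comment."""
    return _NOT_WORD.sub(" ", text).strip()


def top_words(tokens, stopwords, n=10):
    """Return the n most frequent tokens, ignoring blanks and stop words."""
    counts = Counter()
    for tok in tokens:
        tok = tok.strip()
        # An empty token after strip() was whitespace, e.g. a line break
        if not tok or tok in stopwords:
            continue
        counts[tok] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical pre-segmented tokens, not real comment data
    tokens = ["青你", "\n", "哦", "青你", "冲鸭", " "]
    print(clean_comment("冲鸭!!!\n@青你#2"))
    print(top_words(tokens, stopwords={"哦"}))
```

Filtering inside `top_words` rather than only in `clean_comment` also covers whitespace tokens that the segmenter itself emits.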

qq_32332831
