Python Basics (Database Operations and Web Scraping)

一鸣888

Last modified: 2023-08-10 14:50:28

Column: Curated Highlights; article tags: python, programming languages, web scraping, etl

First published: 2023-06-27 12:22:12

Article link: https://blog.csdn.net/HelloWowofei/article/details/131414514

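What is Python used for? (Python 3.7 is used here)
Retrieving different types of data from different data sources;
processing the data itself;
storing the data in a database.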

Compiled languages: Java, C
Scripting languages: Python, JavaScript, Ruby

Where to write Python code: PyCharm, Eclipse, VS, Notepad++, EditPlus, Sublime, IDLE …

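Comments

Single-line comment: starts with #

Paragraph (multi-line) comment: wrap the text in triple quotes ''' ''' (strictly speaking this is a string literal that is never used, so Python ignores it)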

Variables
n=10
n2='hello'
n3=1.666
Syntax: variable name, assignment operator =, value

Chained assignment
a=b=c=d=10
Multiple assignment
a,b,c,d=10,20,30,40
a,b=b,a    # swaps the values of a and b
Augmented assignment
a+=2
a-=3
a*=4
a/=5

Data types: check them with print(type(variable_or_value))
Boolean: True or False, type bool, used when making decisions
a=True
b=False

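Numbers: integer int, floating point float, complex number complex
a=666
b=1.5678
c=3+2j
Arithmetic operators:
+ - * /  addition, subtraction, multiplication, division
** exponentiation
a=3
print(a**4)    # 81
// floor division
a=10
print(a//3)    # 3
% modulo (remainder)
a=10
print(a%3)     # 1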

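Decimal (floating-point) arithmetic in Python can show small rounding errors; the following does not print exactly -0.1:
print(0.1+0.1-0.3)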
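Floats are stored in binary, so a value like 0.1 is only approximated. A minimal sketch of the two usual standard-library workarounds: compare with a tolerance using math.isclose, or do exact decimal arithmetic with decimal.Decimal (built from strings, not from floats).

```python
import math
from decimal import Decimal

# Binary floating point cannot represent 0.1 exactly, so == comparisons can fail
print(0.1 + 0.2 == 0.3)               # False

# Option 1: compare with a tolerance instead of exact equality
print(math.isclose(0.1 + 0.2, 0.3))   # True

# Option 2: exact decimal arithmetic; create Decimals from strings, not floats
total = Decimal('0.1') + Decimal('0.1') - Decimal('0.3')
print(total)                          # -0.1
```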

Strings: str
a='hello'
b="world"
c='''I'm Lilei'''    # triple-quoted strings can span several lines and contain quotes and other special characters

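Input: input() always returns a string
n1=input('Enter a number: ')
n2=input('Enter another number: ')
print(n1+n2)    # joins the two strings: entering 1 and 2 prints 12

To add the inputs as numbers, convert them to a numeric type first:
n1=float(input('Enter a number: '))
n2=float(input('Enter another number: '))
print(n1+n2)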

Concatenating strings:
a='hello'
b='world'
c=a+b
print(c)    # helloworld

Repeating a string:
print('hello'*100)

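String methods: none of them change the original string (strings are immutable); each returns a new value, so assign the result back to keep it.
String indexes in Python start at 0.
s='Hello World'    # example string
Upper case: s=s.upper()
Lower case: s=s.lower()
Capitalize the first letter of each word: s=s.title()
Find: returns the index of a substring, or -1 if it is not found; if it occurs several times, only the first occurrence is reported
a=s.find('d')
print(a)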

What is the difference between find() and index()?
When the substring is not found, find() returns -1, while index() raises a ValueError.
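A small sketch of the difference on an example string; index() is usually wrapped in try/except when the substring might be missing.

```python
s = 'Hello World'

print(s.find('o'))    # 4  (first occurrence only)
print(s.find('z'))    # -1 (not found)
print(s.index('o'))   # 4

# index() raises ValueError instead of returning -1
try:
    s.index('z')
except ValueError as e:
    print('not found:', e)
```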

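Replace: s=s.replace('o','-')    # string.replace(old, new) replaces every occurrence
Removing spaces
All spaces, including those in the middle: s=s.replace(' ','')
Leading (left) spaces: s=s.lstrip()
Trailing (right) spaces: s=s.rstrip()
Both ends: s=s.strip()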

Count occurrences: a=s.count('o')
Check the start: s.startswith('He')
Check the end: s.endswith('orld')

Splitting: cut a string at a separator into a list of strings
s="lilei,hanmeimei,lucy,tom,mike"
s1=s.split(',')
print(s1)    # ['lilei', 'hanmeimei', 'lucy', 'tom', 'mike']

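String formatting
Use % to insert data into a string
name='天下一霸'
level=30
exp=54320231.55
items='Dragon Slaying Saber'

npc="%s, you are now level %d, your quest experience is %.2f, and your reward is %s!"%(name,level,exp,items)
print(npc)

You can also use %s for every value:
npc="%s, you are now level %s, your quest experience is %s, and your reward is %s!"%(name,level,exp,items)

%s string
%d integer
%f float; %.nf keeps n decimal places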

If the string itself already contains % characters, the % style above cannot be used, because Python would try to read sequences such as %E7 as format specifiers.
A typical example is a URL: "https://image.baidu.com/search/index?tn=baiduimage&ipn=r&ct=201326592&cl=2&lm=-1&st=-1&fm=index&fr=&hs=0&xthttps=111111&sf=1&fmq=&pv=&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype=2&ie=utf-8&word=%E7%86%8A%E7%8C%AB&oq=%E7%86%8A%E7%8C%AB&rsp=-1"
In that case use the string's own format() method, where each {} is a placeholder (the parentheses let the string pieces on several lines be joined into one string):
wangzhan='sougou'
url=("https://image.{}.com/search/index"
     "?tn={}image&ipn=r&ct=201326592&cl="
     "2&lm=-1&st=-1&fm=index&fr=&hs=0&xthttps=111111&sf=1&fmq=&pv="
     "&ic=0&nc=1&z=&se=1&showtab=0&fb=0&width=&height=&face=0&istype"
     "=2&ie=utf-8&word=%E7%86%8A%E7%8C%AB&oq=%E7%86%8A%E7%8C%AB&rsp=-1").format(wangzhan,wangzhan)
print(url)

npc="{}, you are now level {}, your quest experience is {}, and your reward is {}!".format(name,level,exp,items)
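Since Python 3.6 there is also the f-string, which puts the variable names directly inside the braces. A sketch comparing the three styles on the same data (the values are made up for illustration):

```python
name = 'Hero'
level = 30
exp = 54320231.55

s1 = "%s is level %d with %.2f exp" % (name, level, exp)
s2 = "{} is level {} with {:.2f} exp".format(name, level, exp)
s3 = f"{name} is level {level} with {exp:.2f} exp"

print(s1)               # Hero is level 30 with 54320231.55 exp
print(s1 == s2 == s3)   # True
```

The f-string version is usually the easiest to read, and like format() it is not confused by literal % signs in the text.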

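String slicing:
Get one character by its index
string[index]
s='abcdefghijklmn'
print(s[3])    # d

Get a range of characters
string[start:end]  the range is half-open: start is included, end is excluded (so end = last wanted index + 1)

Negative indexes count from the end:
-1 is the last character, -2 the second to last, and so on

string[:]  from the beginning to the end
string[start:]  from start to the end
string[:end]  from the beginning up to (not including) end

string[start:end:step]  within the range, take one character every step positions
print(s[1:10])      # bcdefghij
print(s[1:10:2])    # the characters at indexes 1 3 5 7 9: bdfhj

print(s[-2:-9:-3])  # mjg (indexes -2, -5, -8)
Print everything from back to front:
print(s[::-1])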

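Lists: list
a=["cookies","watermelon","hello",666,1.23456]
Adding
Append a new element to the end: list.append(element)
a=["cookies","watermelon","hello",666,1.23456]
a.append('strawberry')
print(a)
a.append('Oreo')
print(a)
Insert an element at a given position: list.insert(index, element)
a.insert(1,'tangerine')
print(a)

Deleting
Delete by index: list.pop(index)
a.pop(4)
print(a)
Delete by value: list.remove(element); only the first matching element is removed
a.remove('watermelon')
print(a)

Modifying: assign a new value to an index
list[index]=new_value
a[2]='hi'
print(a)

Querying: works exactly like string slicing
print(a[0])
print(a[1:5])
print(a[1:5:2])
print(a[::-1])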

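Sorting: ascending and descending
Ascending: list.sort()
a=[5,8,3,1,0,9]
a.sort()
print(a)

Descending:
a=[5,8,3,1,0,9]
a.sort()
print(a[::-1])
or
a.sort(reverse=True)
print(a)

Joining: join a list of strings into one string with a separator (every element must be a string)
names=['lilei','han','lucy','tom']
names2='-'.join(names)
print(names2)    # lilei-han-lucy-tom

Counting: list.count(element) returns how many times the element appears
print(a.count('watermelon'))

a=[1,2,3]
b=[4,5,6]
print(a+b)     # concatenates the two lists
print(a*10)    # repeats the list's contents 10 times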

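Tuples: tuple, an immutable data type
Elements cannot be added, removed, modified, sorted in place, …
a=("hello","rice",1000,1.666)

A tuple with a single element needs a trailing comma after the element:
a=(1,)    # without the comma, (1) is just the integer 1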

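Dictionaries: dict, a key-value (mapping) type
Elements come in key-value pairs and keys must be unique. Elements cannot be accessed by position (since Python 3.7 dictionaries do keep insertion order).
{key: value, key: value}
menu={"fried rice":12,"rice noodles":10}
Querying
All keys: print(menu.keys())
All values: print(menu.values())
The value of one key: print(menu['rice noodles'])    # raises KeyError if the key does not exist

Adding: when the key does not exist yet, dict[key]=value
menu['milk tea']=13
print(menu)

Modifying: when the key already exists, the same syntax overwrites its value
menu['milk tea']=16
print(menu)

Deleting: dict.pop(key)
menu.pop('fried rice')
print(menu)

Clearing the whole dictionary: dict.clear()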
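Two dictionary methods that often come in handy next to the ones above: get() returns a default instead of raising KeyError for a missing key, and items() yields key-value pairs for looping. A short sketch with made-up menu data:

```python
menu = {"fried rice": 12, "rice noodles": 10}

# get() avoids a KeyError when the key may be missing
print(menu.get("milk tea"))        # None
print(menu.get("milk tea", 0))     # 0 (the supplied default)

# items() gives (key, value) pairs
for dish, price in menu.items():
    print(dish, price)
```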

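Sets: set {element1, element2, …}, a collection without duplicates; sets are also unordered
Removing duplicates:
a=['hello','han',234,1.2,234,'han']
a1=set(a)
print(a1)

Computing intersection, union and difference
a=[1,2,3,4]
b=[3,4,5,6]
a=set(a)
b=set(b)
# intersection of the two lists: {3, 4}
print(a&b)
# union: {1, 2, 3, 4, 5, 6}
print(a|b)
# difference (in a but not in b): {1, 2}
print(a-b)
# symmetric difference (in exactly one of the two): {1, 2, 5, 6}
print(a^b)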
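The set operators also have named method equivalents, which accept any iterable, not just sets. A sketch checking the four results above, plus a de-duplication that, unlike set(), keeps the original order (it relies on dicts preserving insertion order):

```python
a = {1, 2, 3, 4}
b = [3, 4, 5, 6]   # a list is fine for the method forms

print(a.intersection(b))          # {3, 4}
print(a.union(b))                 # {1, 2, 3, 4, 5, 6}
print(a.difference(b))            # {1, 2}
print(a.symmetric_difference(b))  # {1, 2, 5, 6}

# Order-preserving de-duplication of a list
words = ['hello', 'han', 234, 1.2, 234, 'han']
print(list(dict.fromkeys(words)))  # ['hello', 'han', 234, 1.2]
```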

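Operators on data types:
Membership: in, not in
a="helloworld"
print('owo' not in a)    # False: "owo" does occur in "helloworld"

a={"fried rice":12,"rice noodles":10,"braised chicken":15,"fried rice":16}    # duplicate key: the later value 16 replaces 12
print('braised chicken' in a)    # True; for a dictionary, in checks the keys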

Identity: is, is not
Checks whether two names refer to the same object (the same memory address), not merely to equal values
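The difference between == (equal values) and is (same object) shows up clearly with lists. A minimal sketch; for everyday code, `x is None` is the main place where is is the idiomatic choice.

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a separate list with the same contents
c = a           # a second name for the same list object

print(a == b)   # True: equal values
print(a is b)   # False: two different objects
print(a is c)   # True: the same object
print(id(a) == id(c))  # True: is compares identities

c.append(4)
print(a)        # [1, 2, 3, 4]: changing c also changes a

x = None
print(x is None)  # True: the idiomatic None check
```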

Comparison: > < >= <= != ==

Logical: and or not

Length of a value: numbers have no length (len(10) raises a TypeError)
len(data)

Input and output
The print function:
print(n)
print(n,n2,n3)    # several values are printed separated by spaces

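Control flow: conditions with if, loops with for and while
Python determines which statements belong to a block by their indentation
Conditions:
if condition:
    statement
num=5    # example value
if num>0:
    print('checking')
    print('positive number')

Several conditions:
if condition1:
    statement
elif condition2:
    statement
elif condition3:
    statement
…

Handling all remaining cases with else:
if condition1:
    statement
elif condition2:
    statement
elif condition3:
    statement
…
else:
    statement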

color='black'
if color=='blue':
    print('blue')
elif color=='red':
    print('red')
elif color=='green':
    print('green')
elif color=='pink':
    print('pink')
else:
    print('other color')

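# Exercise: given the dictionary {"fried rice":12,"rice noodles":10,"braised chicken":15},
# prompt the user for input; if the input is a key in the dictionary, print its value,
# otherwise print a message saying the dish is not available
d={"fried rice":12,"rice noodles":10,"braised chicken":15}
cai=input("Order a dish: ")
if cai in d:
    print(d[cai])
else:
    print('We do not have that dish')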

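# Ask for a user name; if its length is greater than 5 and less than 12 and it does not start with a digit, report that it is valid,
# otherwise report that it is invalid
user=input("User name: ")
# check the length first, so user[0] is never evaluated on an empty string
if 5<len(user)<12 and not "0"<=user[0]<="9":
    print("valid")
else:
    print("invalid")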

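Loops:
Basic for loop syntax
for variable in iterable:
    statement

The iterable can be a string, list, tuple, dictionary (looping over its keys) or set
# Loop exercise: for the string "I'm a single boy", print the index of every space
xuhao=0
for i in "I'm a single boy":
    if i==" ":
        print("index:",xuhao)
    xuhao+=1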

# Given the list ['hello',123,'world',2.222,666,'hi'],
# loop over it and print the integers
for i in ['hello',123,'world',2.222,666,'hi']:
    if type(i)==int:
        print(i)

Controlling loops with range()
for i in range(10):
    print(i)

range(10)  0 to 9
range(20)  0 to 19
range(1,11)  1 to 10; the range is half-open (start included, stop excluded)
range(1,11,2)  1 3 5 7 9; a step can be given
range(10,0,-1)  10 down to 1
range(10,0,-2)  10 8 6 4 2
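Wrapping a range in list() is a quick way to see which numbers it produces; the checks below confirm a few rows of the table above. A range is also lazy: it stores only start, stop and step, so even a huge range costs almost no memory.

```python
print(list(range(1, 11, 2)))   # [1, 3, 5, 7, 9]
print(list(range(10, 0, -2)))  # [10, 8, 6, 4, 2]
print(list(range(10, 0, -1)))  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

# No list of a trillion numbers is ever built here
big = range(0, 10**12)
print(len(big))                # 1000000000000
print(big[-1])                 # 999999999999
```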

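# Print the sum of all even numbers from 0 to 100
s=0
for i in range(0,101,2):
    s=s+i
print(s)    # 2550
# Print every number from 0 to 100 that is not a multiple of 7 and does not contain the digit 7
for i in range(101):
    if i%7!=0 and '7' not in str(i):
        print(i)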
# Using a single for loop over range(), print a diamond of asterisks: 7 rows high, 4 stars wide in the middle row

for i in range(1,8):
    if i<=4:
        print((4-i)*' '+i*'* ')
    else:
        print((i-4)*' '+(8-i)*'* ')

# 9x9 multiplication table
for i in range(1,10):
    for j in range(1,i+1):
        print('{}x{}={}'.format(j,i,i*j), end='\t')
    print()

while loops: as long as the condition is true the loop body runs; once it becomes false the loop ends
while condition:
    statement

n=1
while n<=10:
    print(n)
    n+=1

A number-doubling game with while:
n=3
user=int(input("What is %dx2? "%(n)))
while user==n*2:
    print('Correct, you are so clever!')
    n=n*2
    user=int(input("What is %dx2? "%(n)))
print("Seriously? You can't even get this one!!!")

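# Exercise
# A sheet of paper is 1 mm thick and Mount Everest is 8848 m high.
# How many times must the paper be folded before it is thicker than the mountain is high?
zhi=1          # paper thickness in mm
shan=8848000   # mountain height in mm
cishu=0
while zhi<shan:
    zhi=zhi*2
    cishu=cishu+1
print(cishu)    # 24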

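Loop keywords:
break ends the loop immediately
for i in range(1,11):
    if i==5:
        break
    print(i)    # prints 1 to 4

continue skips the rest of the current iteration and starts the next one
for i in range(1,11):
    if i==5:
        continue
    print(i)    # prints 1 to 10, except 5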

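# Loop up to five times, asking the user for an English word and saving each word in a list;
# if the user enters quit, stop immediately and print what has been saved so far
liebiao=[]
for i in range(5):
    w=input("word:")
    if w=='quit':
        break
    liebiao.append(w)

print(liebiao)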

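Common modules:
Import the module you need in the current file
import module_name

The random module: random
import random
# random float: greater than or equal to 0 and less than 1
a=random.random()
print(a)
# random integer: greater than or equal to the start value and less than or equal to the end value
a=random.randint(1,5)
print(a)
# random element from a sequence (string, list, tuple)
a=random.choice("abcdefg")
print(a)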

# Given the list ['Apple','Pear','banana'], pick one element at random;
# if its first letter is upper case, print it
liebiao=['Apple','Pear','banana']
a=random.choice(liebiao)
if a[0].isupper():
    print(a)
else:
    print('lower case, skipped')

# Build your own two-colour-ball lottery draw: red numbers are 1-32 with no repeats,
# blue numbers are 1-6 with no repeats; draw 5 red numbers and 2 blue numbers,
# i.e. 7 random numbers in total
reds=[]
for i in range(5):
    red=random.randint(1,32)
    while red in reds:    # draw again if the number is already taken
        red=random.randint(1,32)
    reds.append(red)

blues=[]
for i in range(2):
    blue=random.randint(1,6)
    while blue in blues:
        blue=random.randint(1,6)
    blues.append(blue)

reds.sort()
blues.sort()
print(reds+blues)
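The standard library's random.sample() draws k distinct elements in one call, so the retry loops above are not needed. A sketch keeping the exercise's own rules (5 of 1-32, 2 of 1-6); draw() is a hypothetical helper name:

```python
import random

def draw():
    # sample() never repeats an element, so no duplicate check is needed
    reds = sorted(random.sample(range(1, 33), 5))   # 5 distinct numbers from 1-32
    blues = sorted(random.sample(range(1, 7), 2))   # 2 distinct numbers from 1-6
    return reds + blues

print(draw())
```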

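# Play rock-paper-scissors against the computer: the computer picks at random, you pick one, then print who won
diannao=random.randint(1,3)
games={1:'rock',2:'scissors',3:'paper'}
print("The computer chose:",games[diannao])

user=int(input("1: rock, 2: scissors, 3: paper: "))

if diannao==user:
    print('draw')
elif diannao==1 and user==2 or diannao==2 and user==3 or diannao==3 and user==1:
    print('the computer wins')
else:
    print('you win')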

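# From the dictionary below, pick a dish at random and print it together with its price
menu={'fried rice':18,'fried noodles':16,'rice bowl':14}
a=random.choice(list(menu.keys()))    # choice() needs a sequence, so convert the keys to a list
b=menu[a]
print(a)
print(b)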

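# A monkey has a pile of peaches. Every day it eats half of the peaches plus one more.
# On the ninth day only one peach is left. How many peaches were there at the start?
tao=1
for i in range(1,9):    # work backwards through the 8 days of eating
    tao=(tao+1)*2
print(tao)    # 766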

users=(('smith',10,1500,100),('allen',20,800,200),('miller',30,1600),('scott',30,1200))

# Extract the data and print it in the following format:
# Name: smith Dept: 10 Salary: 1500 Bonus: 100
# Name: allen Dept: 20 Salary: 800 Bonus: 200
# Name: miller Dept: 30 Salary: 1600
# Name: scott Dept: 30 Salary: 1200
for user in users:
    print("Name: {} Dept: {} Salary: {}".format(user[0],user[1],user[2]),end=' ')
    if len(user)==4:
        print("Bonus: {}".format(user[3]))
    else:
        print()

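Given the list [1,2,2,3,8,7,2], remove every 2

liebiao=[1,2,2,3,8,7,2]
for i in range(liebiao.count(2)):    # remove() deletes one match per call, so call it once per occurrence
    liebiao.remove(2)
print(liebiao)    # [1, 3, 8, 7]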
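Removing items while looping over that same list skips elements, which is why the loop above iterates over range(count) instead of over the list itself. A common alternative is a list comprehension that builds a new list containing everything except 2:

```python
liebiao = [1, 2, 2, 3, 8, 7, 2]

# Keep every element that is not 2 (builds a new list)
cleaned = [x for x in liebiao if x != 2]
print(cleaned)   # [1, 3, 8, 7]

# The pitfall: removing while iterating over the same list skips the element after each removal
buggy = [2, 2, 3]
for x in buggy:
    if x == 2:
        buggy.remove(x)
print(buggy)     # [2, 3]  (one 2 survives)
```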

The time module datetime: get the current system time
import datetime
a=datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
print(a)

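Reading and writing files:
Reading and writing plain text files:
Writing
# the location and name of the file
filename="C:/文件/a1.txt"

# open the file (the code equivalent of double-clicking it)
file=open(filename,"a")    # "w" = write, overwrites the file; "a" = append, adds to the end

# write content into the file
file.write("\n")
file.write("lilei")

# close the file, which saves it
file.close()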

# Exercise: users=(('smith',10,1500,100),('allen',20,800,200),('miller',30,1600),('scott',30,1200))
# Save the names from the tuple above into a names.txt file, one name per line
users=(('smith',10,1500,100),('allen',20,800,200),('miller',30,1600),('scott',30,1200))

filename='C:/文件/names.txt'
file=open(filename,'a')
for u in users:
    file.write(u[0]+'\n')
file.close()

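Reading: after one open(), the file can be read from top to bottom only once, because reading moves the file position to the end
# the location and name of the file to read
filename='C:/文件/names.txt'

# open the file
file=open(filename,'r')    # r = read

# read the content

read() returns the whole file as one string

neirong1=file.read()
print(neirong1)

readlines() returns the whole file as a list, one string per line (each still ending in '\n')

file.seek(0)    # go back to the start; without this, readlines() returns [] because read() already reached the end
neirong2=file.readlines()
print(neirong2)

# close the file
file.close()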
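A with block closes the file automatically, even if an error occurs, so close() cannot be forgotten. A minimal sketch that writes and reads back a temporary file; the temporary path stands in for C:/文件/names.txt, and the explicit encoding='utf-8' is an assumption (without it, open() uses the platform's default encoding):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'names.txt')

# 'w' overwrites; the file is closed automatically when the block ends
with open(path, 'w', encoding='utf-8') as f:
    for name in ['smith', 'allen', 'miller', 'scott']:
        f.write(name + '\n')

with open(path, 'r', encoding='utf-8') as f:
    lines = f.readlines()          # ['smith\n', 'allen\n', ...]

names = [line.rstrip('\n') for line in lines]
print(names)                       # ['smith', 'allen', 'miller', 'scott']
print(f.closed)                    # True
```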

# Read users.txt (one field per line, four lines per person) and rewrite it as lines of the form name,gender,age,hobby
filename="C:/文件/users.txt"
file=open(filename,'r')
contents=file.readlines()
file.close()

filename2="C:/文件/users_2.txt"
file2=open(filename2,'a')
# An earlier approach, disabled by wrapping it in a triple-quoted string:
'''cishu=1
hang=''
for i in contents:
    hang=hang+','+i.replace('\n','')
    cishu=cishu+1

    if cishu==5:
        file2.write(hang[1:]+'\n')
        cishu=1
        hang=''
'''

cishu=0
for i in contents:
    i=i.replace('\n','')
    cishu+=1
    if cishu%4==0:            # every 4th field ends one person's record
        file2.write(i+'\n')
    else:
        file2.write(i+',')

file2.close()

Reading JSON files: done the same way as reading text files
What is JSON: a key-value text format similar to a Python dictionary, used mainly for transferring data over the network and for storing structured data

filename="C:/文件/alipay.js"
file=open(filename,'r')
contents=file.read()
file.close()

# convert the JSON string into a dictionary
import json
c=json.loads(contents)

print(c['alipay_trade_pay_response']['voucher_detail_list'][0]['name'])
print(c['alipay_trade_pay_response']['voucher_detail_list'][0]['memo'])
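json.loads() turns a JSON string into Python dicts and lists, json.dumps() goes the other way, and json.load(file) reads directly from an open file. A small sketch; the sample text only imitates the nesting of the Alipay response, and all its values are made up:

```python
import json

text = '''{"alipay_trade_pay_response": {
    "trade_no": "T0001",
    "voucher_detail_list": [{"name": "coupon", "memo": "test"}]}}'''

c = json.loads(text)                          # str -> dict
resp = c['alipay_trade_pay_response']
print(resp['voucher_detail_list'][0]['name'])  # coupon

# dict -> str; ensure_ascii=False keeps Chinese characters readable
out = json.dumps({'dish': '炒饭', 'price': 12}, ensure_ascii=False)
print(out)                                    # {"dish": "炒饭", "price": 12}
print(json.loads(out)['price'])               # 12
```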

Exercise: read alipay.js and extract trade_no, real_amount and goods_name
Answer:
import json

filename="C:/文件/alipay.js"
file=open(filename,'r')
contents=file.read()
file.close()

c=json.loads(contents)

# trade_no
trade_no=c['alipay_trade_pay_response']['trade_no']
print(trade_no)
# real_amount
real_amount=c['alipay_trade_pay_response']['fund_bill_list'][0]['real_amount']
print(real_amount)
# goods_name: discount_goods_detail is itself a JSON string, so it has to be parsed a second time
goods_name=c['alipay_trade_pay_response']['discount_goods_detail']
g=json.loads(goods_name)
goods_name=g[0]['goods_name']
print(goods_name)

Homework:

From the products file, read the ProductID, ProductName and UnitPrice of every product and save them to another file in this format:
1,Chai,18.0000
2,Chang,19.0000
Answer:
import json

filename="C:/文件/products.txt"
file=open(filename,'r')
contents=file.read()
file.close()

c=json.loads(contents)
value=c['value']

# open the output file once, before the loop, rather than once per product
filename2="C:/文件/product_list.txt"
file2=open(filename2,'a')
for v in value:
    ProductID=v['ProductID']
    ProductName=v['ProductName']
    UnitPrice=v['UnitPrice']
    file2.write(str(ProductID)+',')
    file2.write(ProductName+',')
    file2.write(str(UnitPrice)+'\n')
file2.close()

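Reading and writing CSV files:
Reading
# import the csv module
import csv

# location of the file
filename="C:/文件/香港酒店数据.csv"

# open the file and wrap it in a csv reader (the csv documentation recommends newline='' when opening)
file=open(filename,'r',newline='')
f=csv.reader(file)

# read the data row by row with a for loop; each row is a list of strings
for i in f:
    print(i[1]+'\t'+i[2])

# close the file
file.close()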

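Writing
import csv
filename="C:/文件/new.csv"
# newline='' prevents an extra blank line after every row on Windows
file=open(filename,'w',newline='')
f=csv.writer(file)
# the data to write:
# the outer list is the whole file, each inner list is one row
datas=[['apple',18],['peach',26],['chestnut',28],['tangerine',6]]
# write the data
for i in datas:
    # write one row at a time; each call takes one list
    f.writerow(i)

file.close()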
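csv.writer also has writerows() for a whole list of rows, and csv.reader gives every field back as a string. A sketch that round-trips the fruit data through an in-memory io.StringIO buffer instead of a real file, so it runs anywhere:

```python
import csv
import io

datas = [['apple', 18], ['peach', 26], ['chestnut', 28], ['tangerine', 6]]

buf = io.StringIO(newline='')
writer = csv.writer(buf)
writer.writerow(['name', 'price'])   # header row
writer.writerows(datas)              # all data rows in one call

buf.seek(0)                          # rewind before reading
rows = list(csv.reader(buf))
print(rows[0])    # ['name', 'price']
print(rows[1])    # ['apple', '18']  (numbers come back as strings)
```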

# Exercise: read the Hong Kong hotel data, extract each hotel's Chinese name, address and price,
# and write these three fields to another CSV file
Answer:
import csv
filename="C:/文件/香港酒店数据.csv"
file=open(filename,'r',newline='')

f1=csv.reader(file)

hangs=[]
for i in f1:
    hang=[i[2],i[4],i[8]]    # Chinese name, address, price
    hangs.append(hang)

file.close()

print(hangs)

filename2="C:/文件/hotel.csv"
file2=open(filename2,'w',newline='')
f2=csv.writer(file2)

for h in hangs:
    f2.writerow(h)
file2.close()

Reading XML files:
# import the module for working with XML
import xml.dom.minidom
# alternatively, import just the parse function and call parse(...) directly
from xml.dom.minidom import parse

# parse the XML file; dom is the parsed document
dom=xml.dom.minidom.parse("C:/文件/product.xml")
# get the root element, which contains everything in the document
content=dom.documentElement

# first find all elements with the common parent tag of the data we need
products=content.getElementsByTagName("m:properties")

# then use the tag names to get each field of every product
for p in products:
    # find the element by its tag name
    ProductID=p.getElementsByTagName("d:ProductID")
    # take the text between the element's opening and closing tags
    pid=ProductID[0].childNodes[0].data

    ProductName=p.getElementsByTagName("d:ProductName")
    name=ProductName[0].childNodes[0].data

    UnitPrice=p.getElementsByTagName("d:UnitPrice")
    price=UnitPrice[0].childNodes[0].data

    print(pid,name,price)
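To try the same calls without product.xml, xml.dom.minidom.parseString() parses XML held in a string. The document below is a made-up, heavily trimmed imitation of the products layout, and the namespace URIs are placeholders; the m: and d: prefixes must be declared, otherwise the parser rejects the document. text_of() is a hypothetical helper.

```python
from xml.dom.minidom import parseString

xml_text = '''<feed xmlns:m="urn:example:m" xmlns:d="urn:example:d">
  <m:properties><d:ProductID>1</d:ProductID><d:ProductName>Chai</d:ProductName><d:UnitPrice>18.0000</d:UnitPrice></m:properties>
  <m:properties><d:ProductID>2</d:ProductID><d:ProductName>Chang</d:ProductName><d:UnitPrice>19.0000</d:UnitPrice></m:properties>
</feed>'''

def text_of(parent, tag):
    # first matching element -> its first child node (the text) -> the string
    return parent.getElementsByTagName(tag)[0].childNodes[0].data

dom = parseString(xml_text)
rows = []
for p in dom.documentElement.getElementsByTagName('m:properties'):
    rows.append((text_of(p, 'd:ProductID'), text_of(p, 'd:ProductName'), text_of(p, 'd:UnitPrice')))

for row in rows:
    print(','.join(row))   # 1,Chai,18.0000 then 2,Chang,19.0000
```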

Third-party modules are installed with:
pip
easy_install (deprecated; pip is the recommended installer)

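Writing and reading Excel files:
Module for writing: xlwt (it writes the old .xls format)
pip install xlwt

# import the Excel writing module
import xlwt

# location and name of the file to write
filename="C:/文件/x1.xls"

# create an Excel workbook
wb=xlwt.Workbook(encoding='utf-8')

# add worksheets to the workbook
st=wb.add_sheet('BIGDATA')
st2=wb.add_sheet('TEST')

# write data into cells by (row, column); indexes start at 0, so (2,3) is cell D3
st.write(2,3,'Li Lei')
st2.write(2,1,'hahaha')

# save the file
wb.save(filename)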

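Module for reading: xlrd
pip install xlrd    (xlrd 2.0 and later can only read .xls files, not .xlsx)

# import the module for reading Excel
import xlrd

# location of the file
filename="C:/文件/香港酒店数据.xls"

# open the workbook
wb=xlrd.open_workbook(filename)

# choose the worksheet to read, in one of two ways
# by index (0 is the first sheet)
#st=wb.sheet_by_index(1)
# by name
st=wb.sheet_by_name('BIGDATA')

# number of rows that contain data
nr=st.nrows

# read the sheet row by row with a for loop
for i in range(nr):
    print(st.row_values(i))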

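Dates in Excel files need special handling
# Restore the date: Excel stores a date as a serial number, the number of days counted from 1900-01-01
t=st.cell(0,4)    # a cell that displays 2020-8-1
# convert the serial number into a tuple (year, month, day, hour, minute, second);
# datemode=0 means the 1900 date system (wb.datemode holds the workbook's actual setting)
t_v=xlrd.xldate_as_tuple(t.value,datemode=0)
# format the date with strftime
from datetime import date
d=date(*t_v[:3]).strftime('%Y-%m-%d')
print(d)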
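The serial-number rule can also be applied by hand, without xlrd. This sketch assumes the 1900 date system and serial numbers above 60: Excel wrongly treats 1900 as a leap year, so for dates after 1900-02-28 counting from 1899-12-30 gives the right result. excel_serial_to_date() is a hypothetical helper.

```python
from datetime import date, timedelta

def excel_serial_to_date(serial):
    # Assumes the 1900 date system and serial > 60 (any date after 1900-02-28)
    return date(1899, 12, 30) + timedelta(days=int(serial))

print(excel_serial_to_date(44044))                      # 2020-08-01
print(excel_serial_to_date(44044).strftime('%Y-%m-%d'))  # 2020-08-01
```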

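Data from the web interfaces of third-party platforms: APIs
An API endpoint is simply an HTTP URL

requests
Install the module: pip install requests

# import the HTTP request module
import requests,json

# the URL to request
url="https://api.inews.qq.com/newsqa/v1/automation/foreign/country/ranklist"

# send the HTTP GET request
shuju=requests.get(url)

# parse the returned JSON text into a dictionary (shuju.json() does the same)
datas=json.loads(shuju.text)

# extract the key fields from the dictionary
for d in datas['data']:
    name=d['name']
    date=d['date']
    confirm=d['confirm']
    print(name,date,confirm)
    print('----------------------')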

Exercise:
From the API https://api.inews.qq.com/newsqa/v1/automation/modules/list?modules=FAutoCountryMerge, save France's daily confirmed-case figures to an Excel file,
keeping the two fields date and confirm
import requests,json,xlwt

url='https://api.inews.qq.com/newsqa/v1/automation/modules/list?modules=FAutoCountryMerge'
response=requests.get(url)
datas=json.loads(response.text)
faguo=datas['data']['FAutoCountryMerge']['法国']    # '法国' (France) is the key used in the API response

# prepare an Excel file
filename="C:/文件/faguo.xls"
wb=xlwt.Workbook(encoding='utf-8')
st=wb.add_sheet('yiqing')

hang=0    # current row
for i in faguo['list']:
    date=i['date']
    date='2020/'+date.replace('.','/')    # turn month.day into 2020/month/day
    confirm=i['confirm']
    st.write(hang,0,date)
    st.write(hang,1,confirm)
    hang+=1

wb.save(filename)

Douban Reading chart:
https://read.douban.com/j/index//charts?type=intermediate_finalized&index=featured&verbose=1&limit=50
Save each book's title (title), author name (name), category (shortName) and rating (averageRating)
to an Excel file; the first row of the sheet must contain column headers

import requests

url="https://read.douban.com/j/index//charts?type=intermediate_finalized&index=featured&verbose=1&limit=50"

# Douban returns HTTP error 418, meaning the server refuses to serve the request
# add browser information (a User-Agent header) to imitate a browser visit

h={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3314.0 Safari/537.36 SE 2.X MetaSr 1.0'}

response=requests.get(url,headers=h)
print(response.text)

Answer:
import requests,json

url=("https://read.douban.com/j/index//charts"
     "?type=intermediate_finalized&index=featured&verbose=1&limit=50")

# store the browser information in a dictionary
h={}
h["User-Agent"]="Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3314.0 Safari/537.36 SE 2.X MetaSr 1.0"
response=requests.get(url,headers=h)
res=json.loads(response.text)

# extract the needed fields: title, author, category

lists=res['list']
for book in lists:
    title=book['works']['title']
    author=book['works']['author'][0]['name']
    kinds=book['works']['kinds'][0]['shortName']
    print(title,author,kinds)

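Regular expressions: find the small substrings that match a given rule or pattern inside a very large string
import re

Symbols a regular expression may use, for example:
src="(https://imgsa.baidu.com/forum/w%3D580/sign=.+?\.jpg)"

1. Put the part of the data you want to capture in parentheses
2. For variable parts, use . as a wildcard; one . matches any one character
3. To repeat, use +, which repeats the preceding item one or more times
4. To match many arbitrary characters, use .+
5. Use \ to escape a special character so it matches literally, e.g. \. matches a real dot
6. Add the non-greedy ? (as in .+?) so that, when several matches are possible, each match ends at the end character closest to its start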

Greedy: (a.+b) applied to
abcabdaabjababaaaaaaaacaaaab
matches the whole string:
abcabdaabjababaaaaaaaacaaaab

Non-greedy: (a.+?b) applied to
abcabdaabjababaaaaaaaacaaaab
finds four matches:
abcab
aab
abab
aaaaaaaacaaaab
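The same two patterns run with re.findall(); the checks confirm the matches listed above. The last line illustrates rule 5: the escaped \. matches only a literal dot.

```python
import re

text = 'abcabdaabjababaaaaaaaacaaaab'

greedy = re.findall(r'(a.+b)', text)
lazy = re.findall(r'(a.+?b)', text)

print(greedy)  # ['abcabdaabjababaaaaaaaacaaaab']
print(lazy)    # ['abcab', 'aab', 'abab', 'aaaaaaaacaaaab']

# Escaping: \. matches only a literal dot, while . would match any character
print(re.findall(r'\w+\.jpg', 'a.jpg bxjpg c.jpg'))  # ['a.jpg', 'c.jpg']
```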

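Steps:

1. Define the rule
guize='''left-hand text(http://.+?)right-hand text'''
2. Compile the rule into a pattern
gongshi=re.compile(guize)
3. Search the content
re.findall(gongshi, page_source)    # same as gongshi.findall(page_source); returns a list of matches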


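# Download all images from a Baidu Tieba thread
import requests,re

page=1
name=1

while page>=1:
    # work out how the page number appears in the URL (the pn parameter)
    url="https://tieba.baidu.com/p/5272519674?pn={}".format(page)

    response=requests.get(url)

    # keep the HTML source code of the page
    html=response.text

    # the regular-expression rule for the image links (a raw string, so \. reaches re unchanged)
    zhengze=r'''src="(https://imgsa.baidu.com/forum/w%3D580/sign=.+?\.jpg)" '''

    # compile the rule into a pattern object of the re module
    gongshi=re.compile(zhengze)

    # search the large html string with the pattern;
    # all matches are returned in a list
    imgs=re.findall(gongshi,html)

    for img in imgs:
        # each image is itself an http URL, so it can also be fetched with requests.get
        r=requests.get(img)
        # for an image we need the binary content
        i=r.content
        file=open("C:/tieba/{}.jpg".format(name),'wb')    # wb = write binary; the C:/tieba folder must already exist
        file.write(i)
        file.close()
        # increase the image file number
        name+=1

    # check whether the current page has a "next page" link ('下一页' is the link text on the page)
    if '''">下一页</a>''' in html:
        page+=1
    else:
        page=-1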

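Catching exceptions:
The error type name can simply be replaced with Exception to catch any ordinary error
try:
    code that might raise an error
except ErrorType as e:
    handling of the error
else:
    code that runs only when no error occurred

# try to open a file
try:
    file="C:/文件/1234.txt"
    f=open(file,'r')
# if the file is not found: Exception is the base class of almost all built-in exceptions
except Exception as e:
    print(e)
# if the file was found
else:
    print(f.read())
    f.close()

print("helloworld")    # the program keeps running after the error has been handled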
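Catching the specific exception (here FileNotFoundError) is usually preferable to catching Exception, and a finally block runs whether or not an error occurred. A sketch; read_or_default() is a hypothetical helper, and the path is built inside a fresh temporary folder so that it does not exist:

```python
import os
import tempfile

def read_or_default(path, default=''):
    log = []
    try:
        f = open(path, 'r', encoding='utf-8')
    except FileNotFoundError:
        log.append('missing')        # only this error type is caught
        result = default
    else:
        with f:                      # runs only if open() succeeded; closes the file afterwards
            result = f.read()
        log.append('read')
    finally:
        log.append('finally')        # runs in every case
    return result, log

missing = os.path.join(tempfile.mkdtemp(), 'no_such_file.txt')
print(read_or_default(missing, 'n/a'))   # ('n/a', ['missing', 'finally'])
```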

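Database module: cx_Oracle
pip install cx_Oracle

Letting 64-bit Python find and connect to a 32-bit Oracle:
1. Unzip the instantclient_11_2 archive to get a folder
2. Add the path of this folder to the PATH environment variable:
3. open Environment Variables → System variables → Path, click Edit, and append to the end of the value
;C:\software\instantclient_11_2;
4. Inside the instantclient_11_2 folder create a network folder, and inside network create an admin folder
5. Copy Oracle's tnsnames.ora file (in PL/SQL Developer: Support Info → Info → TNS File) into that admin folder
6. Copy all the .dll files in the instantclient_11_2 folder into the Python root directory
7. Copy the three files oci.dll, oraocci11.dll and oraociei11.dll into the python/Lib/site-packages folder
8. Run the Python code again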

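Connect to the database
import cx_Oracle
conn=cx_Oracle.connect("username/password@ip_address:port/SID")    # the part after / is the database's service name, often the same as the SID (e.g. ORCL)
Create a cursor
cursor=conn.cursor()
Define the SQL statement
sql=""
Execute it
cursor.execute(sql)
For a query, fetch the result rows
rows=cursor.fetchall()
For DML (INSERT/UPDATE/DELETE), commit the changes
conn.commit()
Close the cursor
cursor.close()
Close the database connection
conn.close()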
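cx_Oracle follows Python's standard DB-API (PEP 249), so the same connect, cursor, execute, fetchall/commit, close pattern works with other drivers. Because the code above needs an Oracle server, this sketch swaps in the built-in sqlite3 module with a throwaway in-memory database; the salgrade table and its rows are sample data. Note the placeholders: sqlite3 uses ?, while cx_Oracle uses named or numbered ones such as :1.

```python
import sqlite3

conn = sqlite3.connect(':memory:')      # in-memory database, gone when closed
cursor = conn.cursor()

cursor.execute('CREATE TABLE salgrade (grade INTEGER, losal INTEGER, hisal INTEGER)')

# DML with bind parameters instead of building the SQL text with format()
rows = [(1, 700, 1200), (2, 1201, 1400), (3, 1401, 2000)]
cursor.executemany('INSERT INTO salgrade VALUES (?, ?, ?)', rows)
conn.commit()                           # make the inserts permanent

cursor.execute('SELECT * FROM salgrade WHERE losal > ? ORDER BY grade', (1000,))
result = cursor.fetchall()
print(result)                           # [(2, 1201, 1400), (3, 1401, 2000)]

cursor.close()
conn.close()
```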

Saving the data:
Write the data from the API https://api.inews.qq.com/newsqa/v1/automation/modules/list?modules=FAutoGlobalStatis,FAutoContinentStatis,FAutoGlobalDailyList,FAutoCountryConfirmAdd into the database, as a full refresh (all existing data is replaced on every run)

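User-defined functions: reusable blocks of code that each do one specific job
Every function returns a value; setting a return value is optional, and a function without a return statement returns None
Syntax for defining a function:
def function_name(parameters):
    function body
    return return_value

Kinds of parameters when defining a function:
# a function without parameters
def sum_():
    a=10+20
    return a

# a function with parameters
def sum_2(x,y):
    a=x+y
    return a

# a function with a default value; parameters with defaults must come after those without
def sum_3(y,x=0):
    a=x+y
    return a

# variable positional arguments: *name accepts any number of arguments and collects them in a tuple
def sum_4(*x):
    s=0
    for i in x:
        s+=i
    return s

# variable keyword arguments: **name collects name=value arguments in a dictionary
def sum_5(**x):
    a=x['jiashu']+x['beijiashu']
    return a

# entry point: this block runs only when the file is executed directly, not when it is imported
if __name__ == "__main__":
    # call the function
    print(sum_5(jiashu=9,beijiashu=8))    # 17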
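A quick check of how Python binds arguments for each parameter style above. show() is a hypothetical helper that simply returns what it received; the same * and ** symbols also work in a call, where they unpack a list or dictionary into arguments.

```python
def show(a, b=2, *args, **kwargs):
    # a: required, b: has a default, args: extra positionals, kwargs: extra keywords
    return a, b, args, kwargs

print(show(1))                    # (1, 2, (), {})
print(show(1, 3, 4, 5, x=6))      # (1, 3, (4, 5), {'x': 6})

# Unpacking a list / dict into arguments at call time
nums = [1, 2, 3]
print(show(*nums))                # (1, 2, (3,), {})
opts = {'b': 10, 'y': 7}
print(show(0, **opts))            # (0, 10, (), {'y': 7})

def no_return():
    pass

print(no_return())                # None
```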

Example:
# create a database connection and return a cursor
def conn_oracle(username,password,ip,port,sid):
    import cx_Oracle
    conn=cx_Oracle.connect("{}/{}@{}:{}/{}".format(username,password,ip,port,sid))
    cursor=conn.cursor()
    return cursor

# query all rows of a table
def select_oracle(tb):
    # the table name is inserted with format(); never do this with untrusted input
    sql="select * from {}".format(tb)
    cursor=conn_oracle('bigdata','111111','192.168.2.109','1521','ORCL')
    s=cursor.execute(sql)
    return s.fetchall()

# entry point
if __name__ == "__main__":
    # call the function
    print(select_oracle('salgrade'))

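User-defined classes: a class describes a kind of thing whose members share the same features
Defining a class:
class ClassName():
    features of the class
    def method_name(self):
        method body

# define a new class
class jisuanji():
    # define its features (methods)
    def skin(self,vip=False):
        if vip==True:
            return('Tencent style')
        else:
            return('Windows style')
    def sum_(self,*x):
        s=0
        for i in x:
            s+=i
        return s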

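A class has to be instantiated as a concrete object before it can be used (for example a=fangdaijisuan() in the entry point).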
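# Inheritance: a subclass inherits all the features of its parent class
class fangdaijisuan(jisuanji):
    def lixi(self):
        return 'mortgage interest'

# entry point
if __name__ == "__main__":
    a=fangdaijisuan()
    print(a.lixi())
    print(a.skin(True))    # skin() is inherited from jisuanji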
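The classes above have no __init__ method, so their objects carry no data of their own. A sketch of the usual pattern, using hypothetical Calculator and MortgageCalculator classes that are not part of the original code: __init__ stores data on each object, and super() lets the subclass reuse the parent's setup while still inheriting its methods.

```python
class Calculator:
    def __init__(self, vip=False):
        self.vip = vip                  # data stored on each object

    def skin(self):
        return 'Tencent style' if self.vip else 'Windows style'

    def total(self, *nums):
        return sum(nums)

class MortgageCalculator(Calculator):   # inherits skin() and total()
    def __init__(self, rate, vip=False):
        super().__init__(vip)           # run the parent's __init__ first
        self.rate = rate

    def interest(self, principal):
        return principal * self.rate

m = MortgageCalculator(0.25, vip=True)
print(m.skin())                   # Tencent style (inherited)
print(m.total(1, 2, 3))           # 6 (inherited)
print(m.interest(1000))           # 250.0
print(isinstance(m, Calculator))  # True
```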

Putting functions with similar purposes into one class makes the code easier to maintain and manage later.
