Extracting JSON from a log or txt file and writing the values of selected keys to Excel

just_listen5

Category: python. Tags: python, json, table output, regular expressions

Article link: https://blog.csdn.net/just_listen5/article/details/84335909

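I recently had some free time and took an internship, where I picked up some Python to process the log files produced by testing at work. This post records what I learned along the way. ____Xuefeng Zhang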

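I. Using regular expressions to split out the JSON data at the interface calls in a log

1.1 Basic usage of re.findall

re.findall returns every non-overlapping substring of string that matches pattern, as a list.
Syntax:

`findall(pattern, string, flags=0)`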

import re

# print(help(re.findall))
# print(dir(re.findall))

findall finds all matches. The r prefix marks a raw string, so backslashes in the pattern are not processed as string escapes.

regular_v1 = re.findall(r"docs","https://docs.python.org/3/whatsnew/3.6.html")

print (regular_v1)

# ['docs']

The ^ symbol anchors the match to the start of the string, so this pattern matches only if the string starts with https.

regular_v2 = re.findall(r"^https","https://docs.python.org/3/whatsnew/3.6.html")

print (regular_v2)

# ['https']

The $ symbol anchors the match to the end of the string, so this pattern checks whether the string ends with html.

regular_v3 = re.findall(r"html$","https://docs.python.org/3/whatsnew/3.6.html")

print (regular_v3)

# ['html']

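[...] matches any one of the characters inside the brackets. The characters are listed without separators: a comma inside the brackets would itself be matched literally.

regular_v4 = re.findall(r"[tw]h","https://docs.python.org/3/whatsnew/3.6.html")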

print (regular_v4)

# ['th', 'wh']

\d is the regex syntax for a single digit from 0 to 9; findall returns the matches as a list.

regular_v5 = re.findall(r"\d","https://docs.python.org/3/whatsnew/3.6.html")

regular_v6 = re.findall(r"\d\d\d","https://docs.python.org/3/whatsnew/3.6.html/1234")

print (regular_v5)

# ['3', '3', '6']

print (regular_v6)

# ['123']

Lowercase \d matches a digit 0-9; uppercase \D matches the opposite, that is, any character that is not a digit.

regular_v7 = re.findall(r"\D","https://docs.python.org/3/whatsnew/3.6.html")

print (regular_v7)

# ['h', 't', 't', 'p', 's', ':', '/', '/', 'd', 'o', 'c', 's', '.', 'p', 'y', 't', 'h', 'o', 'n', '.', 'o', 'r', 'g', '/', '/', 'w', 'h', 'a', 't', 's', 'n', 'e', 'w', '/', '.', '.', 'h', 't', 'm', 'l']

\w matches a word character: lowercase a to z, uppercase A to Z, digits 0 to 9, and the underscore.

regular_v8 = re.findall(r"\w","https://docs.python.org/3/whatsnew/3.6.html")

print (regular_v8)

#['h', 't', 't', 'p', 's', 'd', 'o', 'c', 's', 'p', 'y', 't', 'h', 'o', 'n', 'o', 'r', 'g', '3', 'w', 'h', 'a', 't', 's', 'n', 'e', 'w', '3', '6', 'h', 't', 'm', 'l']

\W matches any character that is not a word character, that is, symbols other than letters, digits, and the underscore.

regular_v9 = re.findall(r"\W","https://docs.python.org/3/whatsnew/3.6.html")

print (regular_v9)

# [':', '/', '/', '.', '.', '/', '/', '/', '.', '.']
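Each class above matches a single character. As a small sketch that goes slightly beyond the original examples, the quantifier + (one or more) and {3} (exactly three) can be combined with \d and \w to pull whole tokens out of a log line. The log line and variable names below are made up for illustration.

```python
import re

# Hypothetical log line, made up for this illustration.
line = "2018-11-22 10:15:03 INFO user_42 GET /api/v1/items 200"

# \d+ matches a run of one or more digits instead of a single digit.
numbers = re.findall(r"\d+", line)

# \w+ matches a run of word characters (letters, digits, underscore).
words = re.findall(r"\w+", line)

# A group in the pattern makes findall return only the captured part;
# here that is a three-digit number at the very end of the line.
status = re.findall(r"(\d{3})$", line)

print(numbers)
print(words)
print(status)
```

Note that user_42 stays in one piece under \w+ because the underscore counts as a word character.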

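1.2 Getting a block of text between two strings

You can use either re or str.find. The code below uses re.

import re

# TXT file that contains the text
file = '123.txt'

# Keywords 1 and 2 (change the text between the quotes)
w1 = '123'
w2 = '456'

# The with block closes the file automatically
with open(file, 'r') as f:
    buff = f.read()
# To strip line breaks, uncomment the next line
# buff = buff.replace('\n', '')
# re.escape makes the keywords match literally even if they contain regex metacharacters
pat = re.compile(re.escape(w1) + '(.*?)' + re.escape(w2), re.S)
result = pat.findall(buff)
print(result)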
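The same idea can be applied to the task in the title. The following is a minimal sketch, assuming each log line carries at most one JSON object and that the object fits on a single line; the log layout and all names here are hypothetical, not taken from the original logs.

```python
import json
import re

# Hypothetical log content; the layout is an assumption for illustration.
log_text = """\
2018-11-22 10:15:03 INFO request /api/user
2018-11-22 10:15:03 INFO response {"statusCode": 200, "data": {"age": "11"}, "msg": "ok"}
2018-11-22 10:15:04 WARN response {"statusCode": 500, "data": {}, "msg": "error"}
"""

# Greedy match from the first "{" to the last "}" on a line.
# Without re.S, "." does not cross newlines, so each match stays on one line.
json_pattern = re.compile(r"\{.*\}")

records = []
for candidate in json_pattern.findall(log_text):
    try:
        records.append(json.loads(candidate))
    except ValueError:
        # Not valid JSON (for example a truncated line); skip it.
        continue

print(records)
```

The greedy .* is deliberate: a non-greedy .*? would stop at the first closing brace and cut nested objects short.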

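What re.S does in Python regular expressions

Python regular expressions accept a flag named re.S (also spelled re.DOTALL). It extends what "." matches to the whole string, including "\n". Consider the following code:

import re
a = '''asdfhellopass:
    123
    worldaf
    '''
b = re.findall('hello(.*?)world', a)
c = re.findall('hello(.*?)world', a, re.S)
print('b is', b)
print('c is', c)

Output:

b is []
c is ['pass:\n    123\n    ']

In a regular expression, "." matches any character except "\n", so by default a match stays within one line. Lines here are delimited by "\n": each line of the string a ends with a "\n", although it is not visible.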

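With a single-line string, the result looks like this:

import re

string = "xxxxxxxxxxxxxxxxxxxxxxxx entry '某某内容' for aaaaaaaaaaaaaaaaaa"
result = re.findall(".*entry(.*)for.*", string)
for x in result:
    print(x)

Output (the captured text also includes the spaces around the quoted part):
# '某某内容'

Without re.S, "." never matches "\n", so a match cannot span lines: if one line has no match, the search starts again on the next line. With re.S, the regular expression treats the string as a whole, and "\n" is matched by "." like any ordinary character.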

II. Parsing a JSON string in Python and getting values by key

See the code:

import json

data = {
    "statusCode": 200,
    "data": {
        "totoal": "5",
        "height": "5.97",
        "weight": "10.30",
        "age": "11"
    },
    "msg": "成功"
}

# dumps: convert the dict to a JSON string
s = json.dumps(data)
print(s)

# loads: convert the JSON string back to a dict
s1 = json.loads(s)
print(s1)
# Print the value of statusCode
print(s1["statusCode"])
# Print the value of age under data
print(s1["data"]["age"])
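When only a few keys are needed, and some of them may be missing in a given record, indexing directly raises KeyError. Below is a minimal sketch of a lookup helper; pick and the dotted-path convention are hypothetical names introduced here, not part of the json module.

```python
import json

# Same shape as the sample above, given here as a JSON string.
raw = '{"statusCode": 200, "data": {"height": "5.97", "age": "11"}, "msg": "成功"}'
record = json.loads(raw)

def pick(obj, dotted_key, default=None):
    """Follow a dotted path such as 'data.age' through nested dicts."""
    current = obj
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current

wanted = ["statusCode", "data.age", "data.missing"]
row = [pick(record, key) for key in wanted]
print(row)

# ensure_ascii=False keeps non-ASCII text readable instead of \uXXXX escapes.
readable = json.dumps(record, ensure_ascii=False)
print(readable)
```

A list such as row maps directly onto one spreadsheet row, which is the form needed in the Excel part below.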

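III. Reading and writing Excel files in Python

We can use the xlwt module to write data to an Excel sheet and the xlrd module to read data from one. Better options might be a database accessed through pymysql, or writing CSV output from Python, but my work required the former.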
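For comparison, the CSV route mentioned above needs only the standard library. This is a minimal sketch with made-up rows; write_csv is a hypothetical helper name.

```python
import csv
import io

# Hypothetical rows: a header followed by values taken from parsed JSON.
rows = [
    ["statusCode", "age", "msg"],
    [200, "11", "成功"],
]

def write_csv(rows, stream):
    """Write rows to any text stream using the standard csv module."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerows(rows)

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, pass a file object opened with newline="" instead of the buffer. If the file is meant to be opened in Excel and contains Chinese text, opening it with encoding="utf-8-sig" is a common choice, since the byte order mark helps Excel detect UTF-8.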

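3.1 Installing xlrd-1.1.0 and xlwt-1.3.0

First download the installation packages: https://pan.baidu.com/s/1HvtpAgEfdtn1JAVOPNcJhw (password: is83; from the source website).

In cmd, change to the xlrd directory and run python setup.py install.

Install xlwt the same way.

That completes the installation.

To verify, open the Python command line and import xlrd and xlwt. If no error is reported, the installation succeeded and you can continue.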
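The check can also be scripted. A minimal sketch, assuming Python 3; is_installed is a hypothetical helper name. Both packages are also published on PyPI, so pip install xlrd xlwt is the usual alternative to a manual setup.py install.

```python
import importlib.util

def is_installed(module_name):
    """Return True if the module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None

for name in ("xlrd", "xlwt"):
    print(name, "installed" if is_installed(name) else "missing")
```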

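3.2 Reading and writing Excel

3.2.1 Writing to Excel:

# -*- coding: utf-8 -*-
# Import the xlwt module
import xlwt
# Create a Workbook object, which is equivalent to creating an Excel file
book = xlwt.Workbook(encoding='utf-8', style_compression=0)
# Workbook takes the encoding and style_compression parameters.
# encoding: sets the character encoding. It is usually set like this:
#   w = Workbook(encoding='utf-8')
# so that Chinese text can be written to Excel. The default is ascii.
# In Python 2, also remember to add these lines at the top of the file:
#   #!/usr/bin/env python
#   # -*- coding: utf-8 -*-
# style_compression: whether to compress styles; rarely used.

# Create a sheet object; one sheet object corresponds to one sheet in the Excel file.
# (A new Excel file created from the desktop right-click menu contains three sheets: sheet1, sheet2, sheet3.)
sheet = book.add_sheet('test', cell_overwrite_ok=True)
# 'test' is the sheet name. cell_overwrite_ok controls whether a cell may be overwritten;
# it is actually a parameter of the Worksheet constructor and defaults to False.
# Add data to the sheet test
sheet.write(0, 0, 'EnglishName')  # (0, 0) means row 0, column 0; 'EnglishName' is the content written to that cell
sheet.write(1, 0, 'Marcovaldo')
txt1 = '中文名字'
sheet.write(0, 1, txt1)  # In Python 3 a str is already Unicode; Python 2 needs txt1.decode('utf-8') here, otherwise an error is raised
txt2 = '马可瓦多'
sheet.write(1, 1, txt2)

# Finally, save the operations above to the specified Excel file
book.save(r'e:\test1.xls')  # The r prefix makes this a raw string, so the backslash is not treated as an escape; otherwise an error may occur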

3.2.2 Reading from Excel

The sheet is as shown in the figure.

# -*- coding: utf-8 -*-
import xlrd
xlsfile = r"C:\Users\Administrator\Desktop\test\Account.xls"  # Path of the xls file to open
book = xlrd.open_workbook(xlsfile)  # Get the book object for the Excel file
sheet0 = book.sheet_by_index(0)  # Get a sheet object by its index
print("1.", sheet0)
sheet_name = book.sheet_names()[0]  # Get the name of the sheet at the given index
print("2.", sheet_name)
sheet1 = book.sheet_by_name(sheet_name)  # Get a sheet by name; if you already know the name, pass it directly
nrows = sheet0.nrows  # Total number of rows
print("3.", nrows)
# Print the content of every row
for i in range(nrows):
    print(sheet1.row_values(i))
ncols = sheet0.ncols  # Total number of columns
print("4.", ncols)
row_data = sheet0.row_values(0)  # List of values in the first row
print(row_data)
col_data = sheet0.col_values(0)  # List of values in the first column
print("5.", col_data)
# Read cells by coordinates
cell_value1 = sheet0.cell_value(0, 0)
print("6.", cell_value1)
cell_value2 = sheet0.cell_value(0, 1)
print("7.", cell_value2)
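Putting the three parts together, the whole flow from log text to an .xls file might look like the sketch below. It assumes one JSON object per log line and only top-level keys with simple values (numbers or strings); extract_rows, save_xls, and the sample log are hypothetical names and data, not the code actually used at work.

```python
import json
import re

def extract_rows(log_text, keys):
    """Build a header row plus one row per JSON object found in log_text."""
    rows = [list(keys)]
    for candidate in re.findall(r"\{.*\}", log_text):
        try:
            record = json.loads(candidate)
        except ValueError:
            continue  # skip fragments that are not valid JSON
        rows.append([record.get(key, "") for key in keys])
    return rows

def save_xls(rows, path, sheet_name="result"):
    """Write rows to an .xls file; xlwt is imported only when this is called."""
    import xlwt
    book = xlwt.Workbook(encoding="utf-8")
    sheet = book.add_sheet(sheet_name, cell_overwrite_ok=True)
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            sheet.write(r, c, value)
    book.save(path)

# Hypothetical log content for illustration.
sample_log = (
    'INFO response {"statusCode": 200, "msg": "ok"}\n'
    'INFO response {"statusCode": 404, "msg": "not found"}\n'
)
rows = extract_rows(sample_log, ["statusCode", "msg"])
print(rows)
# save_xls(rows, "result.xls")  # uncomment once xlwt is installed
```

Keeping the row-building step separate from the writing step means the extraction can be checked on its own, and the writer can be swapped for CSV without touching the parsing.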

References:

http://blog.csdn.net/majordong100/article/details/50708365

http://www.cnblogs.com/lhj588/archive/2012/01/06/2314181.html

http://www.cnblogs.com/snake-hand/p/3153158.html

That completes the automated processing. Below are the errors I ran into.

1.UnicodeEncodeError: 'ascii' codec can't encode character...

Under Python 2.7, I wanted to read category names from a database and write them to a file, and got this message:

UnicodeEncodeError: 'ascii' codec can't encode character u'\uff08' in position 12: ordinal not in range(128)

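Printing with print instead of fp.write works fine, so what is going on?

#! /usr/bin/python
# -*- coding: utf-8 -*-
import sys
print(sys.getdefaultencoding())

Under Python 2.7, running the program above prints:

ascii

So that is the cause. Add the following at the top of the program: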

import sys

reload(sys)
sys.setdefaultencoding('utf-8')

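Run it again and the error is gone.

To summarize: Python 2.7 uses ASCII as the default codec when it implicitly converts between unicode and byte strings, so any character outside the ASCII range raises the exception (ordinal not in range(128)). This workaround is specific to Python 2: in Python 3 the default encoding is already utf-8 and sys.setdefaultencoding no longer exists.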
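An alternative that avoids changing the interpreter-wide default is to state the encoding at the point where the file is opened. This is a minimal sketch, not the fix used above; it relies on io.open, which accepts an encoding argument in both Python 2.7 and Python 3. The sample text and the temporary path are made up.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# u"..." literals work in both Python 2.7 and Python 3.3+.
# The text contains the full-width bracket u'\uff08' from the error message.
text = u"分类（测试）"

path = os.path.join(tempfile.mkdtemp(), "out.txt")

# io.open encodes on write, so no implicit ASCII conversion takes place.
with io.open(path, "w", encoding="utf-8") as fp:
    fp.write(text)

with io.open(path, "r", encoding="utf-8") as fp:
    read_back = fp.read()

print(read_back == text)
```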

2.ValueError: Expecting , delimiter

The JSON was malformed: a comma was missing. Check the JSON format.
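To find where the comma is missing, the exception itself can be inspected. A minimal sketch, assuming Python 3.5 or later, where json.JSONDecodeError (a subclass of ValueError) carries the position of the problem; the broken string is made up.

```python
import json

# A comma is missing between the two members on purpose.
broken = '{"statusCode": 200 "msg": "ok"}'

try:
    json.loads(broken)
    error_info = None
except json.JSONDecodeError as exc:
    # msg, lineno, colno and pos locate the problem in the input.
    error_info = (exc.msg, exc.lineno, exc.colno, exc.pos)

print(error_info)
```

Printing the few characters around exc.pos in the original string usually makes the missing delimiter obvious.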
