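For work I needed to extract a var variable from a script. I read a lot of tutorials and tried a pile of libraries (lxml, bs4, json), only to find that none of them helped: plain re plus requests was enough. First, here is the script that contains the var: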
<script>
var MyMarhq = '';
clearInterval(MyMarhq);
$('.tbl-body tbody').empty();
$('.tbl-header tbody').empty();
var str = '';
var Items = [{"cbbm":"部门","cbbmbm":"109","cbrbm":"360001128","cbrmc":"贾*","count":3},{"cbbm":"部门","cbbmbm":"502","cbrbm":"360001560","cbrmc":"张*","count":1},{"cbbm":"部门","cbbmbm":"109","cbrbm":"360001068","cbrmc":"赵*","count":5},{"cbbm":"部门","cbbmbm":"109","cbrbm":"360001121","cbrmc":"王*","count":1},{"cbbm":"部门","cbbmbm":"109","cbrbm":"360001564","cbrmc":"逄*","count":3}];
var Items_ = 0
I needed the content after var Items. I tried many approaches and none of them worked well; in the end a regular expression did the job.
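import re
import requests

url = 'your url'  # placeholder: the page that contains the var
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # stop early on a 4xx/5xx response
# resp.encoding = resp.apparent_encoding  # uncomment if the Chinese names come back garbled
text = resp.text
# print(text)
# findall returns one (cbrbm, cbrmc) tuple per match, because the pattern has two groups.
cbrbms = re.findall(r""".+?cbrbm":"(.+?)"
.+?cbrmc":"(.+?)"
""", text, re.VERBOSE | re.DOTALL)
# '.+?' is a lazy match: the shortest run of one or more characters that still lets the rest match.
# cbrbm":" and cbrmc":" are the literal keys used to locate each value, the parentheses capture the value, and the double quote after each group marks where the value ends.
# re.VERBOSE makes re ignore whitespace in the pattern, including the line break inside the triple-quoted string, so the two lines act as one pattern; without it that line break would have to appear literally in the page.
# re.DOTALL lets '.' match newlines as well, so '.+?' can run across line breaks in the page. In my case the extraction failed whenever either flag was removed.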
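To check the pattern without requesting the real page, it can be run on a trimmed copy of the script shown above (only the first two records, pasted in by hand rather than fetched). The second call drops re.VERBOSE to show why that flag matters: the line break inside the pattern then has to be matched literally, and the sample has no `"` followed by a newline.

```python
import re

# Trimmed copy of the <script> content from this post (first two records only).
SAMPLE = """<script>
var MyMarhq = '';
clearInterval(MyMarhq);
var str = '';
var Items = [{"cbbm":"部门","cbbmbm":"109","cbrbm":"360001128","cbrmc":"贾*","count":3},{"cbbm":"部门","cbbmbm":"502","cbrbm":"360001560","cbrmc":"张*","count":1}];
var Items_ = 0
</script>"""

# The same pattern as in the post. Under re.VERBOSE the embedded line breaks are
# ignored, so it behaves like: .+?cbrbm":"(.+?)".+?cbrmc":"(.+?)"
PATTERN = r""".+?cbrbm":"(.+?)"
.+?cbrmc":"(.+?)"
"""

pairs = re.findall(PATTERN, SAMPLE, re.VERBOSE | re.DOTALL)
print(pairs)  # [('360001128', '贾*'), ('360001560', '张*')]

# Without re.VERBOSE, the newline after the first group must occur literally
# right after a closing quote in the text; it never does here, so nothing matches.
print(re.findall(PATTERN, SAMPLE, re.DOTALL))  # []
```

Each tuple pairs one staff code with the name that follows it in the same record, which is why the pattern searches for cbrbm first and then cbrmc.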
The result: cbrbm (staff code) and cbrmc (staff name) are extracted, and what comes back is a list of (cbrbm, cbrmc) tuples.
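The data after var Items in this particular script happens to be a valid JSON array (double-quoted keys and strings), so a different option, not the one this post uses, is to capture the whole array with a single regex and parse it with Python's built-in json module. A minimal sketch, assuming the array always ends with `];` and that no string value inside it contains `];`:

```python
import json
import re

# Trimmed copy of the relevant lines from the script in this post.
SCRIPT = """var str = '';
var Items = [{"cbbm":"部门","cbbmbm":"109","cbrbm":"360001128","cbrmc":"贾*","count":3},{"cbbm":"部门","cbbmbm":"502","cbrbm":"360001560","cbrmc":"张*","count":1}];
var Items_ = 0"""

# Capture everything from the "[" after "var Items =" up to the "];" that closes it.
# "Items\s*=" does not match "Items_ = 0", because "_" is neither whitespace nor "=".
match = re.search(r"var\s+Items\s*=\s*(\[.*?\]);", SCRIPT, re.DOTALL)
items = json.loads(match.group(1)) if match else []

# Each record is now a dict, so any field can be read by name.
people = [(item["cbrbm"], item["cbrmc"]) for item in items]
print(people)  # [('360001128', '贾*'), ('360001560', '张*')]
print(sum(item["count"] for item in items))  # 4
```

Compared with matching cbrbm and cbrmc separately, this keeps every field of a record together and also exposes the other keys (cbbmbm, count). The trade-off is that json.loads raises an error if the page ever emits JavaScript that is not strict JSON, such as single-quoted strings or trailing commas.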