May 2020_weixin_45903952


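Original python docx: finding the images in a document, saving them, and inverting their colors

There are two ways to find the images; see the code below. inline_shapes iterates over the inline pictures: find the rId, use document_par.related_parts[rID] to get the image, and save it from the image's ._blob.

    from docx import Document  # pip3 install python-docx
    from docx.shared import Inches  # inches
    import os
    # from docx import Document
    from docx.shared import Pt
    fro...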

2020-05-31 18:54:34 423

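Original python docx: adding a table, putting pictures in its cells, and setting the borders

I wanted to insert pictures into a docx table and set the table borders to white, i.e. hide them. The following code sets the borders:

    from docx.oxml import OxmlElement
    from docx.oxml.ns import qn

    def set_cell_border(cell, **kwargs):
        """
        Set cell's border
        Usage:
        set_cell_border(
            cell,
            top={"sz": 12, "val": "single", ...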

2020-05-30 09:19:40 4190

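Original python: another review of walking the files and subfolders of a directory

"./" is the current directory, "../" is the parent directory, and "/" is the root directory. Just pay attention to where the dots go.

    import os
    # "利润表" (income statement) is the name of a folder in the working directory
    for image in os.listdir(os.path.join(os.getcwd(), "利润表")):
        print(image)
    for root, dirs, files in os.walk("./", topdown=False):  # "./利润表"
        print("所有文件: ")  # "All files: "
        for name in files: ...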

2020-05-24 19:05:27 447
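
The traversal in this entry can be sketched as a small helper. The function name `list_files` and the throwaway directory tree are made up for illustration; only `os.walk` and `os.path.join` come from the post.

```python
import os
import tempfile


def list_files(top):
    """Return the paths of all files under top (subfolders included), relative to top."""
    found = []
    for root, dirs, files in os.walk(top):
        for name in files:
            found.append(os.path.relpath(os.path.join(root, name), top))
    return sorted(found)


# Build a small temporary tree so the example runs anywhere (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(tmp, rel), "w") as fh:
            fh.write("x")
    demo_files = list_files(tmp)

print(demo_files)
```

`os.listdir` only returns the entries of one folder, while `os.walk` also descends into every subfolder, which is why the helper uses the latter.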

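Original python opencv: detecting page boundaries in an image and cropping it page by page

Analyzing the image: a horizontal grey line separates the pages, so its position has to be detected first. In the range of roughly 480-530 there is only white and the grey line, so 500 is used as the sampling position. There is also an ad between two pages. The ad is less than 200 high and is separated from the page by a grey line as well, so any segment shorter than 200 is treated as an ad and is not cropped.

    import os
    import cv2  # pip install opencv-python
    # from matplotlib import pyplot as plt
    def cut(start_y, end_y, width, number):
        save_path = "D:\\ima...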

2020-05-24 18:19:52 153
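
The "shorter than 200 is an ad" rule can be sketched without OpenCV once the rows of the grey lines are known. This is only an illustration of the filtering step: the function name `page_segments`, the row numbers, and the image height are all made up, and detecting the grey rows themselves (by sampling the pixel column at 500) is not shown.

```python
def page_segments(line_ys, height, min_page_height=200):
    """Split [0, height) at the separator rows and drop segments shorter than min_page_height."""
    bounds = [0] + sorted(line_ys) + [height]
    segments = []
    for start_y, end_y in zip(bounds, bounds[1:]):
        # Anything shorter than the threshold is treated as an ad and skipped.
        if end_y - start_y >= min_page_height:
            segments.append((start_y, end_y))
    return segments


# Hypothetical grey-line rows: a page, a 150-row ad, then two more pages.
segments = page_segments([1000, 1150, 2150], 3150)
print(segments)
```

Each remaining `(start_y, end_y)` pair could then be passed to a cropping function such as the post's `cut`.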

Original pandas: fixing "A value is trying to be set on a copy of a slice from a DataFrame"

I wanted to change a single value in a pandas DataFrame. Using dfc['A'][0] = 12 is clearly wrong:

    test.py:28: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame
    See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing...

2020-05-24 12:46:44 2641
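
A minimal sketch of the usual fix, assuming the goal is simply to set one cell: select the row and the column in a single `.loc` call instead of chaining two `[]` lookups. The sample data is made up; only the name `dfc`, the column `'A'`, and the value 12 come from the post.

```python
import pandas as pd

dfc = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# One .loc call addresses row label 0 and column "A" together,
# so the assignment goes to dfc itself rather than to a temporary copy.
dfc.loc[0, "A"] = 12

print(dfc)
```

The chained form `dfc['A'][0] = 12` first evaluates `dfc['A']`, and pandas cannot guarantee that the intermediate object is a view of the original frame, which is what the warning is about.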

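Original python pandas: adding two or more rows together and deleting the original rows

Of the methods below, only method 1 is correct; methods 2 and 3 do not handle it properly. If anyone knows the right way to write them, please let me know.

    import pandas as pd
    chengji = ([[100, 95, 100, 99], [90, 98, 99, 100], [88, 95, 98, 88], [99, 98, 97, 87], [96.5, 90, 96, 85], [94, 94, 93, 91], [91, 99, 92, 87], [85, 88, 85, 90], [90, 9...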

2020-05-23 23:33:24 6675
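
The post's own methods are cut off in this listing, so the following is not the author's method 1. It is one possible sketch under made-up data: sum the numeric columns of the chosen rows, append the total as a new row with `pd.concat`, and keep only the rows that were not merged. The column names and the `heji` tuple layout are assumptions.

```python
import pandas as pd

data = pd.DataFrame({"p": ["a", "b", "c"], "x": [1, 2, 3], "y": [10, 20, 30]})
heji = ("a+b", ["a", "b"])  # hypothetical: (label of the merged row, rows to add up)

numeric_cols = ["x", "y"]
mask = data["p"].isin(heji[1])

# Sum only the numeric columns of the selected rows.
total = data.loc[mask, numeric_cols].sum()
new_row = pd.DataFrame([{"p": heji[0], **total.to_dict()}])

# ~mask drops the original rows; the merged row is appended at the end.
result = pd.concat([data[~mask], new_row], ignore_index=True)
print(result)
```

Summing only the numeric columns avoids concatenating the text labels in `p`, and building the result with `pd.concat` leaves the original `data` unchanged.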

Original pandas: directly fetching a value that matches a condition

I want the value of class for the row whose name is BB.

    import pandas as pd
    df = pd.DataFrame([['AA',1,2,3],['BB',1,2,3],['CC',1,2,3],['DD',1,2,3]],columns=['name','age','class','type'])
    print(df)
    print(df['class'][df['name']=='BB'].values[0])
    print(df.loc[df['name']=='BB','class'].values[0])
    p...

2020-05-23 14:33:24 1387

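Original python: several ways to tally with a dictionary

Total the quantities in a list, grouped by the first element of each item.

    # total the quantities in the list
    list1 = [['abc',6],['bcd',3],['bcd',2]]
    list2 = list(set([x[0] for x in list1]))
    print([[0]*len(list2)])
    dict1 = dict(zip(list2,[0]*len(list2)))
    for x in list1:
        dict1[x[0]] = dict1[x[0]] + x[1]
    print(dict1)
    dict2 = {}
    for x in list1...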

2020-05-20 19:33:55 779
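
Two further ways to write the same tally, shown as a sketch. The excerpt is cut off before its second method, so these are not necessarily the author's variants; `list1` is the post's data and the other names are made up.

```python
from collections import defaultdict

list1 = [['abc', 6], ['bcd', 3], ['bcd', 2]]

# dict.get supplies 0 the first time a key is seen, so no pre-filled dict is needed.
totals_get = {}
for key, amount in list1:
    totals_get[key] = totals_get.get(key, 0) + amount

# defaultdict(int) creates missing keys with the value 0 automatically.
totals_dd = defaultdict(int)
for key, amount in list1:
    totals_dd[key] += amount

print(totals_get)
print(dict(totals_dd))
```

Both variants make a single pass over the list and skip the separate step of collecting the distinct keys first.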

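Original python: copying the charts in an Excel sheet into Word

In Excel, charts exist as shapes named "chart **" and images as shapes named "picture **", so only the shapes whose name is chart are exported to Word. The docx module is used to write the Word file.

    from PIL import ImageGrab, Image
    import docx
    from docx.shared import Inches
    from docx import Document
    import time
    import win32com.client as win32
    myDocument = Document(...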

2020-05-19 23:04:48 1983

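Original python docx: how to set a first-line indent of two characters

When processing a docx document with python, I wanted a first-line indent of 2 characters. Some posts suggest using 0.74 cm instead, but once the font is set that is clearly not two characters. None of the posts I found online had a suitable solution, so I set the indent by hand in a document, read the value back, and found the answer.

This is what I set at first:

    # first-line indent of 0.74 cm, i.e. 2 characters
    paragraph_format.first_line_indent = Cm(0.74)

It should be set like this:

    paragraph_format.first_line_indent = 406400

How do you find out the attribute value? Use the approach below...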

2020-05-18 20:06:36 10451 5
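
A possible explanation for the two numbers, offered as a sketch rather than as the author's reasoning: python-docx stores lengths as integers in EMU (914400 per inch, so 12700 per point and 360000 per centimetre). Assuming one full-width character is as wide as the font size, 406400 EMU is 32 pt, i.e. two characters at 16 pt, while 0.74 cm is about 21 pt, i.e. two characters at 10.5 pt. The helper name `first_line_indent_emu` is made up, and the post does not state which font size it used.

```python
EMU_PER_PT = 12700    # 914400 EMU per inch / 72 points per inch
EMU_PER_CM = 360000   # 914400 EMU per inch / 2.54 cm per inch


def first_line_indent_emu(chars, font_size_pt):
    """Indent in EMU for `chars` full-width characters, assuming each is one font size wide."""
    return int(round(chars * font_size_pt * EMU_PER_PT))


indent_16pt = first_line_indent_emu(2, 16)       # the value read back in the post
cm_074_in_pt = 0.74 * EMU_PER_CM / EMU_PER_PT    # what Cm(0.74) amounts to in points

print(indent_16pt)
print(round(cm_074_in_pt, 2))
```

Under that assumption the indent should follow the paragraph's font size, which would explain why a fixed 0.74 cm stops looking like two characters once the font is changed.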

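Original pandas: adding two or more specified rows together

This adds the data of two specified rows in pandas. For columns it is simply data['列1'] = data['列2'] + data['列3'] (列1, 列2 and 列3 being column names), but there is no equally direct way to add rows. The sum() approach below adds two or more rows:

    data.loc[heji[0]] = data.loc[data['p'].isin(heji[1])].sum()
    data.loc[heji[0],'p'] = heji[0]

    import pandas as pd
    chengji = [['N', 95,0], ['N', 100,88], ['N', 8...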

2020-05-16 10:23:43 14521

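Original pandas dataframe: handling a divisor of zero

Take this example: data2['营业成本率'] = data2['营业成本本年累计']/data2['营业收入本年累计']*100. When 营业收入本年累计 (year-to-date operating revenue) is 0, 营业成本率 (the operating cost ratio) comes out as inf, i.e. infinity, but the table needs to show zero. Fill it in as follows:

    data2['营业成本率'] = data2['营业成本本年累计']/data2['营业收入本年累计']*100
    data2['营业成本率'].replace([np.inf, -np.inf, "", np.nan], 0, inplace=True)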

2020-05-15 23:10:24 6023 2
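
A small variant of the same idea, sketched with made-up English column names and data: compute the ratio, turn the infinities into NaN, fill with 0, and assign the result back to the column instead of replacing in place.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second and third rows have zero revenue.
data2 = pd.DataFrame({"cost": [50.0, 10.0, 0.0], "revenue": [100.0, 0.0, 0.0]})

ratio = data2["cost"] / data2["revenue"] * 100
# 10/0 gives inf and 0/0 gives NaN; map both to 0 and store the result.
data2["cost_rate"] = ratio.replace([np.inf, -np.inf], np.nan).fillna(0)

print(data2["cost_rate"].tolist())
```

Assigning the cleaned Series back avoids relying on an in-place change to a selected column, which newer pandas versions may not propagate to the DataFrame.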

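Original python: batch conversion to docx converts only one file, then raises pywintypes.com_error "The object invoked has disconnected from its clients"

As shown below, converting .txt or .doc files to docx converted only one file and then failed with:

    pywintypes.com_error: (-2147417848, '被调用的对象已与其客户端断开连接。', None, None)

    # convert doc to docx
    def doc2docx(fn):
        word = client.Dispatch("Word.Application")  # start the Word application
        # for file in files:
        doc = word.Documents.Open(fn)  # open wor...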

2020-05-15 19:38:39 2888

Original python: turning a generator or lambdas into a list

I want to add wslist to newtitle, so why do I get [<function <listcomp>.<lambda> at 0x0000000015300790>, or <generator object <genexpr> at 0x00000000152F5F20>?

    titlename = (['new_date','u01','u02','u03','s01','s02','s03'])
    wslists = ['pm25','pm10','so2']
    for wslist in wslists:
        newtitle = ([lambda x=x:ws...

2020-05-13 21:05:26 456
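
A sketch of what probably goes wrong, under an assumption about the goal. The excerpt is cut off before it shows what the lambda computes, so the underscore-joined format below is a guess. A list of lambdas stores functions that are never called, and a generator expression is lazy; a plain list comprehension evaluates the expression right away.

```python
titlename = ['new_date', 'u01', 'u02', 'u03', 's01', 's02', 's03']
wslists = ['pm25', 'pm10', 'so2']

for wslist in wslists:
    # The expression is evaluated immediately for each x; no function object is stored.
    newtitle = [f"{wslist}_{x}" for x in titlename]  # "_" separator is an assumption
    print(newtitle)

# A generator expression produces nothing until it is consumed; list() forces it.
lazy = (f"so2_{x}" for x in titlename)
as_list = list(lazy)
print(as_list)
```

If lambdas really are needed, each one has to be called, for example `[f() for f in funcs]`, before the list contains values instead of function objects.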

Original python crawler: getting the values inside tags with soup

    import requests
    from bs4 import BeautifulSoup
    res = requests.get('http://books.toscrape.com/catalogue/category/books/travel_2/index.html')
    soup = BeautifulSoup(res.text, 'html.parser')
    article = soup.find_all('article', class_='product_pod')
    print(article...

2020-05-13 18:18:18 755

Original python crawler practice: soup 1

    import re
    import requests
    from bs4 import BeautifulSoup
    import bs4
    def getHTMLText(url, headers):
        try:
            r = requests.get(url, timeout=30, headers=headers)
            r.raise_for_status()
            r.encoding = r.apparent_encoding
            return r.t...

2020-05-12 20:28:39 229

Original python crawler practice: the basics

    import re
    import requests
    from bs4 import BeautifulSoup
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'}
    url = 'http://pccz.court.gov.cn/pcajxxw/pcws/wsxq?id=9EE627...

2020-05-12 19:51:29 328

Original python: stripping the markup tags from web page content

2020-05-12 19:25:09 600

Original python openpyxl: drawing border lines and setting a background color

    from openpyxl.chart.shapes import GraphicalProperties,LineProperties
    # remove
    chart.graphical_properties = GraphicalProperties(ln=LineProperties(noFill=False))
    props1 = GraphicalProperties(solidFill="8BADD9")
    # Style the lines
    chart.series[...

2020-05-11 21:25:10 2432

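Original python pandas: renaming the columns of a DataFrame and renaming a Series

A DataFrame's columns can be renamed directly with either of the following:

    test.columns = ['c','b']
    test.rename(columns={'a':'c'},inplace=True)

When there is only one column, i.e. a Series, use the following instead:

    data0 = data0['Close']
    data0.rename(ticker,inplace=True)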

2020-05-09 23:11:05 2977
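
The difference can be sketched as follows. The sample frame and the ticker string "AAPL" are made up for illustration; the names `test` and `Close` come from the post.

```python
import pandas as pd

test = pd.DataFrame({"a": [1, 2], "Close": [3.0, 4.0]})

# A DataFrame has column labels, so rename takes a mapping of old to new labels.
renamed = test.rename(columns={"a": "c"})

# A Series has a single name; passing a scalar to rename sets that name.
close = test["Close"].rename("AAPL")  # "AAPL" is a hypothetical ticker

print(list(renamed.columns))
print(close.name)
```

Both calls return new objects here, so `test` keeps its original column labels.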

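Original python: replacing certain characters in a string

The task is to find the positions where the strings differ and replace them with ?. Conclusion: method 1 is faster.

    s = ['83BD 44FaFFFF','83BD 55FEFFFF','83BD 66FEFFFF']
    first = s[0]
    diff = []
    for s1 in s:
        for num in range(len(s1)):
            if first[num] != s1[num] and num not in diff:
                diff.append(num)
    # method 1
    result = []
    for s1 in s: ...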

2020-05-08 22:44:49 1105
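
A compact alternative, given as a sketch and not as either of the post's methods. It assumes all strings have the same length and that a single masked pattern is the desired result; `masked` is a made-up name.

```python
s = ['83BD 44FaFFFF', '83BD 55FEFFFF', '83BD 66FEFFFF']

# zip(*s) yields one tuple of characters per position; a position differs
# when that tuple contains more than one distinct character.
masked = ''.join(
    chars[0] if len(set(chars)) == 1 else '?'
    for chars in zip(*s)
)

print(masked)
```

Because `zip` stops at the shortest string, inputs of unequal length would be silently truncated, so a length check would be needed for general use.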

Original python: sorting a list by one of its values

    incomelist = [['10932','zhs',15805,4100,2310,983,330,1000],['10933','zhs',15002,4200,2320,986,330,1000],['10934','zhs',15003,4300,2330,989,330,1000],]
    print(sorted(incomelist,key=lambda x:x[2]),rev...

2020-05-07 20:16:17 6147 2
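
A short sketch of sorting the same rows by their third field in both directions. The excerpt is cut off at "rev...", so the use of `reverse=True` here is an illustration, not a reconstruction of the post's code.

```python
from operator import itemgetter

incomelist = [
    ['10932', 'zhs', 15805, 4100, 2310, 983, 330, 1000],
    ['10933', 'zhs', 15002, 4200, 2320, 986, 330, 1000],
    ['10934', 'zhs', 15003, 4300, 2330, 989, 330, 1000],
]

# itemgetter(2) is equivalent to lambda x: x[2].
ascending = sorted(incomelist, key=itemgetter(2))
# reverse=True is an argument of sorted itself, so it goes inside its parentheses.
descending = sorted(incomelist, key=lambda x: x[2], reverse=True)

print([row[0] for row in ascending])
print([row[0] for row in descending])
```

`sorted` returns a new list and leaves `incomelist` untouched; `incomelist.sort(key=...)` would sort it in place instead.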