Data Processing
呜啦吧哈
Counting occurrences
    from collections import Counter
    n = ['a', 'b', 'c', 'a', 'd', 'd']
    m = dict(Counter(n))
    # Keys are the elements, values are the number of times each element occurs
    print(m)
    # {'a': 2, 'b': 1, 'c': 1, 'd': 2}
Original · 2021-05-20 14:00:32
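Counter can also rank elements by frequency, which is often the next step after counting. Below is a small sketch that reuses the list from above. `most_common(k)` returns the k most frequent (element, count) pairs, with ties kept in first-seen order. A missing key reads as 0 instead of raising an error.

```python
from collections import Counter

n = ['a', 'b', 'c', 'a', 'd', 'd']
counts = Counter(n)

top2 = counts.most_common(2)  # [('a', 2), ('d', 2)]
print(top2)
print(counts['z'])            # 0: missing keys count as zero instead of raising KeyError
```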
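Merging tables in Python
Extracting the tables
    #%% Extract the tables from the PDF
    import tabula
    # With pages='all', read_pdf returns a list of DataFrames, one per detected table
    df = tabula.read_pdf("E:\\testdata\\zhushengnan\\Controlled Substances by CSA Schedule-17.pdf", pages='all')
Processing the headers
    import pandas as pd
    df_result = []
    for df_1 in df:
        column = df_1.columns.values.tolist()
        df_c ...
Original · 2020-11-04 16:15:45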
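The header-processing step above is truncated. The sketch below shows one way to merge the extracted list of DataFrames once their headers agree. It assumes the tables differ only by stray whitespace around column names, which is not stated in the post. `merge_tables` is a hypothetical helper, and the two sample tables are made up.

```python
import pandas as pd

def merge_tables(tables):
    """Hypothetical helper: concatenate DataFrames after normalizing their headers.

    Assumption: the tables differ only in whitespace around column names,
    so stripping it is enough to line the columns up.
    """
    cleaned = []
    for t in tables:
        t = t.copy()
        t.columns = [str(c).strip() for c in t.columns]
        cleaned.append(t)
    # ignore_index=True renumbers rows 0..n-1 across all tables
    return pd.concat(cleaned, ignore_index=True)

# Usage with two small stand-in tables
t1 = pd.DataFrame({"Drug ": ["A"], "Schedule": ["I"]})
t2 = pd.DataFrame({"Drug": ["B"], " Schedule": ["II"]})
merged = merge_tables([t1, t2])
print(merged)
```

If real headers differ in more than whitespace (split header rows, renamed columns), a mapping from old to new names would be needed before concatenating.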
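Batch-changing the format of Excel files
    import os
    import os.path
    import win32com.client as win32
    ## Root directory
    rootdir = u'E:\\testdata\\combine\\2020Q1'
    # os.walk yields three values: the parent directory; the names of all subdirectories (without path); the names of all files
    for parent, dirnames, filenames in os.walk(rootdir):
        for ...
Reposted · 2020-04-09 11:41:07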
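The loop body above is cut off. The file-discovery half can be sketched with the standard library alone. `find_excel_files` is a hypothetical helper, and the `.xls` filter is an assumption about which files are converted. The win32com conversion step is Windows-only and is not reproduced here.

```python
import os
import tempfile

def find_excel_files(rootdir, exts=(".xls",)):
    """Hypothetical helper: full paths of files under rootdir whose extension is in exts.

    exts must be lowercase; file names are lowercased before comparing.
    """
    matches = []
    # os.walk yields (parent directory, subdirectory names, file names) for every folder
    for parent, dirnames, filenames in os.walk(rootdir):
        for filename in filenames:
            if filename.lower().endswith(exts):
                matches.append(os.path.join(parent, filename))
    return sorted(matches)

# Usage on a throwaway directory tree
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.xls", os.path.join("sub", "b.XLS"), "notes.txt"):
        open(os.path.join(root, rel), "w").close()
    found = [os.path.basename(p) for p in find_excel_files(root)]

print(found)
```

Each returned path could then be handed to the Excel automation code for conversion.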
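Deleting duplicate rows while keeping the most recent one
    -- Number the rows in each AproveNo group, newest [date] first,
    -- then delete every row except the first (newest) one
    delete T from
    (select ROW_NUMBER() over(partition by [AproveNo] order by [date] desc) as row_number, *
     from ku.dbo.biao) T
    where T.row_number > 1;
Original · 2020-03-02 15:21:16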
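The same "number each group by date, delete everything after row 1" idea can be tried locally with Python's built-in sqlite3. SQLite cannot DELETE through a derived table the way SQL Server does, so this sketch identifies the rows to drop by `rowid` instead. It assumes SQLite 3.25 or newer, which is required for window functions. The table contents are made up.

```python
import sqlite3

# In-memory stand-in for ku.dbo.biao; column names follow the SQL above
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE biao (AproveNo TEXT, date TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO biao VALUES (?, ?, ?)",
    [
        ("A1", "2020-01-01", "old"),
        ("A1", "2020-03-01", "newest"),
        ("B2", "2020-02-01", "only"),
    ],
)

# Rank rows per AproveNo (newest first) and delete all but rank 1
conn.execute(
    """
    DELETE FROM biao
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY AproveNo ORDER BY date DESC) AS rn
            FROM biao
        )
        WHERE rn > 1
    )
    """
)
remaining = conn.execute("SELECT AproveNo, note FROM biao ORDER BY AproveNo").fetchall()
print(remaining)  # [('A1', 'newest'), ('B2', 'only')]
```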
Translation APIs
Youdao:
    http://fanyi.youdao.com/translate?&doctype=json&type=AUTO&i=<text to translate>
Sogou (integration docs: https://deepi.sogou.com/doccenter/texttranslatedoc?fr=process):
    url = "http://fanyi.sogou.com:80/reventondc/...
Original · 2020-01-20 18:34:48
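When calling the Youdao endpoint listed above, the text to translate must be percent-encoded, especially Chinese text. Below is a minimal sketch that only builds the URL from the parameters shown. It does not check whether the service still responds or what it returns. `youdao_url` is a hypothetical helper.

```python
import urllib.parse

# Endpoint and parameters as listed above
BASE = "http://fanyi.youdao.com/translate"

def youdao_url(text):
    """Hypothetical helper: build the query URL, percent-encoding the text as UTF-8."""
    params = {"doctype": "json", "type": "AUTO", "i": text}
    return BASE + "?" + urllib.parse.urlencode(params)

url = youdao_url("你好 world")
print(url)
```

The resulting URL could then be fetched with any HTTP client, for example `requests.get(url)`.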
Understanding numpy.bincount
    import numpy as np
    y = np.bincount(np.array([1, 0, 1]))
    ## The maximum value is 1, so the output has 1 + 1 = 2 entries
    # array([1, 2], dtype=int64)
    ## Value 0 occurs once and value 1 occurs twice, so the output is [1, 2]
    np.bincount(np.array([y_train[i] for i in small_k])).argsor...
Original · 2019-09-17 14:30:51
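The last line above looks like a k-nearest-neighbours vote: count the labels of the k nearest training points, then pick the most frequent one. The sketch below follows that reading. The values of `y_train` and `small_k` are made up, and only their names come from the post.

```python
import numpy as np

# Hypothetical data: class labels of the training set, and the indices of
# the k nearest neighbours of one query point
y_train = np.array([0, 2, 2, 1, 2, 0])
small_k = [1, 2, 5]

votes = np.bincount(np.array([y_train[i] for i in small_k]))
# votes[c] = number of neighbours with label c
print(votes)                    # [1 0 2]
predicted = votes.argmax()      # label with the most votes (first one wins on ties)
ranked = votes.argsort()[::-1]  # labels ordered by vote count, most first
print(predicted)                # 2
```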
Collected helper functions
Set a column to "Yes" where it has a value and "No" where it is missing (useful for handling missing values):
    def set_Cabin_type(df):
        df.loc[(df.Cabin.notnull()), 'Cabin'] = "Yes"
        df.loc[(df.Cabin.isnull()), 'Cabin'] = "No"
        return df
...
Original · 2019-09-03 16:26:05
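Here is a quick usage check of the function above on a tiny made-up DataFrame. Only the column name `Cabin` comes from the post. The order of the two assignments matters: non-null rows become "Yes" first, and the rows that are still null then become "No".

```python
import pandas as pd

def set_Cabin_type(df):
    df.loc[df.Cabin.notnull(), "Cabin"] = "Yes"
    df.loc[df.Cabin.isnull(), "Cabin"] = "No"
    return df

# Made-up sample values; None marks a missing cabin
df = pd.DataFrame({"Cabin": ["C85", None, "E46", None]})
df = set_Cabin_type(df)
print(df.Cabin.tolist())  # ['Yes', 'No', 'Yes', 'No']
```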
Python: grouping a DataFrame by id and joining the remaining strings with semicolons
    import pandas as pd
    df = {'id': ['a', 'a', 'b', 'c', 'c'],
          'en': ['apple', 'app', 'w', 'cf', 'as']}
    df = pd.DataFrame(df)
    grouped = df.groupby('id')
    result = grouped['en'].unique()
    result2 = ...
Original · 2019-06-17 10:44:37
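The snippet stops after `unique()`. Below is a one-step sketch of the full "group, deduplicate, join with ';'" operation on the same data. This uses `agg` with a lambda and may not be how the post finished it.

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'a', 'b', 'c', 'c'],
                   'en': ['apple', 'app', 'w', 'cf', 'as']})

# unique() keeps first-seen order and drops repeats; ';'.join glues them together
result = df.groupby('id')['en'].agg(lambda s: ';'.join(s.unique()))
print(result.to_dict())  # {'a': 'apple;app', 'b': 'w', 'c': 'cf;as'}
```

To keep duplicates, `';'.join` can be passed directly as the aggregation function.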
Fixes for common Python problems
1. Fixing garbled Chinese text in matplotlib legends
2. Upgrading pip
    python -m pip install --upgrade pip
   If you get "Could not install packages due to an EnvironmentError", add --user after install:
    python -m pip install --user --upgrade pip
...
Reposted · 2019-06-04 10:47:44
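The answer to item 1 is cut off. A common approach is to put a font that contains Chinese glyphs at the front of matplotlib's sans-serif list. The sketch assumes SimHei is installed. Any installed CJK font name would work in its place.

```python
import matplotlib

# Assumption: a font with CJK glyphs (here SimHei) is installed on the system.
# Putting it first in the sans-serif fallback list lets Chinese legend text render.
matplotlib.rcParams["font.sans-serif"] = ["SimHei"] + matplotlib.rcParams["font.sans-serif"]
# Draw minus signs with an ASCII hyphen, since many CJK fonts lack the Unicode minus glyph
matplotlib.rcParams["axes.unicode_minus"] = False
```

These settings need to run before the figure is drawn.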
String formatting
    # String formatting
    name = 'jenny'
    a = "%s is a good person" % name
Reposted · 2019-06-03 13:56:44
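The `%` operator above is the oldest of Python's three common string-formatting styles. This sketch builds the same string with each style for comparison. f-strings require Python 3.6 or later.

```python
name = 'jenny'

a = "%s is a good person" % name          # printf-style, as in the post
b = "{} is a good person".format(name)    # str.format
c = f"{name} is a good person"            # f-string (Python 3.6+)
print(a)  # jenny is a good person
```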
Creating a deepchem environment with Anaconda
Run as administrator.
Create a Python 3.6.5 environment:
    conda create --name py365 python=3.6.5
Activate the environment:
    conda activate py365
Check the Python version:
    python --version
Install TensorFlow:
    conda install tensorflow
Install deepchem:
    pip install deepchem
Install deep...
Original · 2019-04-10 15:59:49
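After these steps, you can check from inside Python that the right interpreter is active and that the packages are importable. This is a small sketch. `check_env` is a hypothetical helper, and it only looks modules up with `find_spec`, so nothing heavy gets imported. Inside the py365 environment it should report 3.6.5.

```python
import importlib.util
import sys

def check_env(required=("tensorflow", "deepchem")):
    """Hypothetical helper: return the Python version and which packages can be found."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    # find_spec returns None when a top-level module cannot be located
    installed = {name: importlib.util.find_spec(name) is not None for name in required}
    return version, installed

version, installed = check_env()
print(version, installed)
```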
Collected SQL Server commands
    -- Select rows updated today, sorted by Rank
    select * from table where DATEDIFF(dd, FDate, GETDATE()) = 0 order by Rank
    -- Select rows updated today, sorted by Rank in descending order
    select * from table where DATEDIFF(dd, FDate, GETDATE()) = 0 order by Rank desc
    -- Update values
    u...
Original · 2019-01-11 10:56:17
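`DATEDIFF(dd, FDate, GETDATE()) = 0` means "FDate falls on today's calendar date". The sketch below runs the same filter against an in-memory SQLite database, where the equivalent test is `date(FDate) = date('now', 'localtime')`. The table name `items` and its rows are hypothetical stand-ins for the placeholder `table`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, FDate TEXT, Rank INTEGER)")
# Two rows stamped with today's local date, one older row
conn.execute("INSERT INTO items VALUES ('x', date('now', 'localtime'), 2)")
conn.execute("INSERT INTO items VALUES ('y', date('now', 'localtime'), 1)")
conn.execute("INSERT INTO items VALUES ('z', '2000-01-01', 3)")

today_asc = conn.execute(
    "SELECT name FROM items WHERE date(FDate) = date('now', 'localtime') ORDER BY Rank"
).fetchall()
today_desc = conn.execute(
    "SELECT name FROM items WHERE date(FDate) = date('now', 'localtime') ORDER BY Rank DESC"
).fetchall()
print(today_asc, today_desc)  # [('y',), ('x',)] [('x',), ('y',)]
```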
DataFrame processing
Creating a DataFrame
    df1 = pd.DataFrame(data1, columns=["a", "b"])
Converting an ndarray to a DataFrame
    In: type(y)
    Out: numpy.ndarray
    from pandas import DataFrame
    pre = DataFrame(y, columns=['pre'])
    import pandas a...
Original · 2019-01-09 11:31:45
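Here is a self-contained version of the two conversions above, with made-up arrays standing in for `y` and `data1`. The number of names in `columns` must match the number of array columns: one name for a 1-D array, one per column for a 2-D array.

```python
import numpy as np
import pandas as pd

# Hypothetical predictions, e.g. the output of a model's predict(...)
y = np.array([0, 1, 1, 0])
pre = pd.DataFrame(y, columns=["pre"])  # 1-D array -> one column named "pre"

# 2-D data: one column name per array column
data1 = np.array([[1, 2], [3, 4]])
df1 = pd.DataFrame(data1, columns=["a", "b"])
print(pre.shape, df1.shape)  # (4, 1) (2, 2)
```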
Collected Python regular expressions
Removing punctuation
    import re
    test['clean'] = test.content.apply(lambda x: re.sub(r'[\s+\.\!\/_,$%^*(+\"\')]+|[+——()?【】“”!,。?、~@#¥%……&*()]+::', " ", x))
Extracting Chinese text
    chapter['ch'] = chapter.content.apply(lambda x: re.sub(r'[...
Original · 2019-01-09 16:35:09
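Both tasks can be tried on a single string with the standard `re` module. The sketch below keeps only CJK characters using the common U+4E00 to U+9FA5 range. It also collapses runs of whitespace and ASCII or full-width punctuation into a single space. The punctuation class is a simplified illustration, not the post's exact pattern.

```python
import re

text = "Hello，世界！数据 处理123。"

# Keep only runs of CJK Unified Ideographs
chinese = "".join(re.findall(r"[\u4e00-\u9fa5]+", text))
print(chinese)  # 世界数据处理

# Replace runs of whitespace and ASCII or full-width punctuation with one space
clean = re.sub(r"[\s\.,!?;:，。！？；：、“”【】（）]+", " ", text).strip()
print(clean)    # Hello 世界 数据 处理123
```

On a DataFrame column, either expression can be wrapped in `.apply(lambda x: ...)` as in the post.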
Converting data to RefWorks format with Python
Converting to RefWorks format
    import pandas as pd
    #from pandas.core.frame import DataFrame
    """Read the CSV file"""
    rawdata = pd.read_csv('E:\\testdata\\co2000.csv', encoding="utf-8")
    """Drop rows that contain NaN"""
    #dat...
Original · 2019-03-13 13:32:23
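The conversion step itself is truncated. The sketch below shows the general shape of writing records out as tagged text. It assumes RefWorks Tagged Format: one "TAG value" per line, with records separated by a blank line. The tag subset (RT, A1, T1, JF, YR), the input column names, and the ';' author separator are all assumptions. Check them against the RefWorks documentation and your own CSV. `to_refworks` is a hypothetical helper.

```python
import csv
import io

# Assumed mapping from CSV columns to RefWorks tags
FIELD_TO_TAG = [("author", "A1"), ("title", "T1"), ("journal", "JF"), ("year", "YR")]

def to_refworks(rows):
    """Hypothetical helper: turn CSV rows (dicts) into RefWorks tagged text."""
    records = []
    for row in rows:
        # Skip rows with an empty field, mirroring the post's dropping of NaN rows
        if any(not (row.get(field) or "").strip() for field, _ in FIELD_TO_TAG):
            continue
        lines = ["RT Journal Article"]
        for field, tag in FIELD_TO_TAG:
            # Assumption: multiple authors are separated by ';', one A1 line each
            values = row[field].split(";") if tag == "A1" else [row[field]]
            lines.extend(f"{tag} {v.strip()}" for v in values)
        records.append("\n".join(lines))
    return "\n\n".join(records) + "\n"

sample = io.StringIO(
    "author,title,journal,year\n"
    "Smith J; Li W,A study,Some Journal,2018\n"
    "Wang X,,Other Journal,2019\n"
)
output = to_refworks(csv.DictReader(sample))
print(output)
```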
Query matching (Python)
    import pymssql
    import pandas as pd
    import re
    ### Connect to SQL Server and read the data into a DataFrame
    conn = pymssql.connect('WIN****', 'sa', '*****', 'jsontest')
    cursor = conn.cursor()
    sql = "select drugid,pclass,pName,pNam...
Original · 2019-04-28 10:05:39
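The read-into-DataFrame step can be tried without a SQL Server instance by using an in-memory SQLite database. The sketch uses only the columns visible in the post (drugid, pclass, pName), filled with made-up rows. The "matching" step is a hypothetical case-insensitive regex filter, because the post's actual matching logic is cut off. pandas accepts a sqlite3 connection directly. For other databases, its documentation recommends a SQLAlchemy connectable.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the SQL Server database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drugs (drugid INTEGER, pclass TEXT, pName TEXT)")
conn.executemany(
    "INSERT INTO drugs VALUES (?, ?, ?)",
    [(1, "analgesic", "Aspirin"), (2, "antibiotic", "Amoxicillin"), (3, "analgesic", "Ibuprofen")],
)

df = pd.read_sql("SELECT drugid, pclass, pName FROM drugs", conn)

# Hypothetical matching step: names starting with "a", ignoring case
matched = df[df["pName"].str.contains(r"^a", case=False, regex=True)]
print(matched["drugid"].tolist())  # [1, 2]
```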
Translating with Python
Source: https://www.cnblogs.com/webRobot/p/5407193.html
    import requests, bs4
    #### Strip spaces and noun markers from a word ####
    def word_format(word):
        # Remove surrounding whitespace
        word1 = word.strip()
        word2 = word1.replace("\n", '')
        word3 ...
Reposted · 2019-05-07 15:33:08
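The rest of `word_format` is truncated, so the sketch below is a hypothetical stand-in, not the original function. It performs the two visible steps (strip whitespace, remove newlines). It then drops a leading part-of-speech abbreviation such as "n." or "adj.", which is an assumed reading of "noun markers".

```python
import re

def normalize_word(word):
    """Hypothetical stand-in for word_format (not the post's code)."""
    # The two steps shown in the post: strip surrounding whitespace, drop newlines
    word = word.strip().replace("\n", "")
    # Assumption: remove a leading part-of-speech abbreviation like "n." or "adj."
    return re.sub(r"^(?:n|v|adj|adv)\.\s*", "", word)

print(normalize_word("  n. apple\n"))  # apple
```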
Batch-splitting strings with Python
Environment: Python 3
[Figures: the original table; the table after splitting]
    import pandas as pd
    #from pandas.core.frame import DataFrame
    """Read the file"""
    rawdata = pd.read_csv('E:\\testdata\\test1.csv', encoding="GBK")
    rawdata2...
Original · 2018-11-08 15:51:21
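The before/after tables are not preserved, so the exact split rule is unknown. A common form of this task is turning a cell that holds several delimited values into one row per value. The sketch below assumes a ';' delimiter and uses a made-up table. `DataFrame.explode` requires pandas 0.25 or later.

```python
import pandas as pd

# Made-up table; assumption: one cell holds several values separated by ';'
rawdata = pd.DataFrame({"id": [1, 2], "tags": ["a;b;c", "d"]})

# str.split turns each cell into a list; explode gives each list element its own row
split_df = (
    rawdata.assign(tags=rawdata["tags"].str.split(";"))
    .explode("tags")
    .reset_index(drop=True)
)
print(split_df)
```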