提示:文章写完后,目录可以自动生成,如何生成可参考右边的帮助文档
前言
前天在给用户通过SQL操作数据时,由于数据过多超过10万以上的数据,在DB直接弄出来挺费劲,发现Python里这pandas工具做数据还是挺管用的,几十万的数据Python轻松搞定,学习了下pandas使用
提示:以下是本篇文章正文内容,下面案例可供参考
一、pandas是什么?
示例:pandas 是基于NumPy 的一种工具,该工具是为了解决数据分析任务而创建的。Pandas 纳入了大量库和一些标准的数据模型,提供了高效地操作大型数据集所需的工具。pandas提供了大量能使我们快速便捷地处理数据的函数和方法。
二、使用步骤
1.开发工具
操作系统:Windows 10
开发软件:spyder 3
Python版本:3.6
2.详细代码
代码如下(示例):
# -*- coding: utf-8 -*-
import datetime
import numpy as np
import sqlalchemy
import pandas as pd
#db_engine = sqlalchemy.create_engine('oracle+cx_oracle://mesai:ai#525@WBOEE_UAT',echo=True) # oracle
db_engine = sqlalchemy.create_engine('oracle+cx_oracle://mesai:ai#525@wboee.world',echo=True)
try:
# SQL 查询语句
sql = "SELECT * FROM PDD_ACTIVE_DEVICE WHERE ACT_FLAG='Y' AND DEVICE_TYPE='PRD' AND STATUS='Released' AND IDENTIFY_STATE='Active' AND create_time>'20201230000000'"
df_ai_pdd = pd.read_sql_query(sql, db_engine)
sql = "SELECT * FROM mesai.pdd_active_wirebond"
df_ai_wirebond = pd.read_sql_query(sql, db_engine)
df_ai = pd.merge(df_ai_pdd,df_ai_wirebond,left_on='sysid', right_on='deviceid',how='left')
col_name=df_ai.columns.tolist()
col_name.insert(1,"sub_con")
df_ai=df_ai.reindex(columns=col_name)
df_ai['sub_con']=""
df_ai["tpn"]=""
df_ai["bonding"] = np.select([df_ai['cust_code'] == "GDS",
df_ai['cust_code'] == "SLW"],
[df_ai['s_scc_dwg_no'], df_ai['s_scc_dwg_no']+df_ai['s_scc_dwg_rev']],
default = 'XXX')
df_ai["wiretype"]=""
df_ai["wiretype"] = np.where(df_ai["wire_type"].isnull(),"", df_ai["wire_type"])
df_ai["wiretype"] = np.select([df_ai['wiretype'] == "AG WIRE",df_ai['wiretype'] == "GOLD WIRE",
df_ai['wiretype'] == "COPPER WIRE"],
["A","G","C"],
default = '')
df_ai["pin1方向"]="对缺角"
df_ai["pin1"]=""
df_ai["pin1"]=np.where(df_ai["cust_code"]=="SLW","NA", "")
df_ai["str"]=""
df_ai["m_q"]=""
data = df_ai[(df_ai["cust_code"].isin(['GDS','SLW']))]
data_01 = data[data["cust_code"]=="GDS"]
data_01 = data_01[['sub_con','s_cust_device_desc','cust_device','cust_code','s_scc_dwg_no', 's_scc_dwg_rev','s_marking_spec','tpn','bonding','pin1方向','wiretype','pin1','str','m_q']].rename(columns={'s_cust_device_desc':'cpn','cust_device':'ipn','s_marking_spec':'marking'})
#CUST_DEVICE NOT LIKE 'G%'
data_02 = data[(data["cust_code"]=="SLW")&(~df_ai_pdd["cust_device"].str.startswith("G"))]
data_02 = data_02[['sub_con','s_cust_device_desc','cust_device','cust_code','s_scc_dwg_no', 's_scc_dwg_rev','s_marking_spec','tpn','bonding','pin1方向','wiretype','pin1','str','m_q']].rename(columns={'s_cust_device_desc':'cpn','cust_device':'ipn','s_marking_spec':'marking'})
data_new = pd.concat([data_01,data_02])
print(data_new.head(50))
print(data_new.shape)
data_new.to_excel("E:\\WebFiles\\{0}.xlsx".format(datetime.datetime.now().strftime('%Y%m%d%H%M%S')),index=False)
print("执行成功!!")
except Exception as e:
print('exception!!!' + e)
finally:
#db_conn.close()
db_engine.dispose()
总结
记录点点滴滴