假设,客服值班时间9:00~23:00,需要统计这个时间段内每隔10分钟的并发量。客服信息表存储的数据库是MySQL数据库。
客服消息明细表
message_record table
id | empid | name | first_msg_time | last_msg_time | date |
---|---|---|---|---|---|
20202020 | 100200300 | 韩梅 | 2020-02-13 09:01:04 | 2020-02-13 18:17:19 | 2020-02-13 |
20202021 | 100200302 | 李雷 | 2020-02-13 09:20:28 | 2020-02-13 22:02:46 | 2020-02-13 |
第一步:调用包
import pymysql
import pandas as pd
import datetime
from collections import Counter
from tqdm import tqdm_notebook
第二步:提取数据
sql = '''
SELECT
id,
first_msg_time ,
year(date) as 'year',
month(date) as 'month',
day(date) as 'day',
hour(first_msg_time ) as 'first_msg_hour',
MINUTE(first_msg_time ) as 'first_msg_min',
last_msg_time,
hour(last_msg_time) as 'last_msg_hour',
MINUTE(last_msg_time) as 'last_msg_day',
day(last_msg_time) as 'day' # 获取工单结束日期,判断是否有跨天工单
FROM message_record
WHERE date ='2020-02-13'
and first_msg_time>='2020-02-13 09:00:00'
and first_msg_time<='2020-02-13 23:00:00'
order by firstMsg asc '''
自定义mysql数据库调用函数
def get_mysql_data(sql, db='abc'):
connection = pymysql.connect(host='100.00.100.100', user='admin', password='123456789abcdefh',port=3306,db=db,charset='utf8')
cur = connection.cursor()
sql_select = sql
cur.execute(sql_select)
result = cur.fetchall()
col_result = cur.description
columns = []
for i in range(len(col_result)):
columns.append(col_result[i][0])
df = pd.DataFrame(columns=columns)
for i in range(len(result)):
df.loc[i] = list(result[i])
cur.close()
connection.close()
return df
df=sqlDao.get_mysql_data(sql)
df.head(10)
第三步:处理数据
分钟分组
计算工单创建时间所处分钟区间和工单结束时间所处分钟区间
bins=[0,10,20,30,40,50,60]
labels=[0,10,20,30,40,50]
df['first_msg_min']=pd.cut(df['first_msg_min'],bins=bins,labels=labels,right=False)
df['last_msg_min']=pd.cut(df['last_msg_min'],bins=bins,labels=labels,right=False)
df.head(10)
计算工单跨区时长
跨天工单结束时间几座23:00
当天工单结束时间以工单实际结束时间为准
def get_interval_value(x):
if x["day"] == 13:
return (x['last_msg_hour']*60+x['last_msg_min']-x['first_msg_hour']*60-x['first_msg_min'])/10+1
else:
return (23*60+0-x['first_msg_hour']*60-x['first_msg_min'])/10+1
df['interval']=df.apply(get_interval_value,axis=1)
第四步:计算并发量
interval_time=[]
for index, row in tqdm_notebook(df.iterrows()):
first_interval_time=datetime.datetime(2020,2,13,row['first_msg_hour'],row['first_msg_min'])
delta=datetime.timedelta(minutes=10)
for i in range(int(row['interval'])):
print(first_interval_time+delta*i )
interval_time.append(first_interval_time+delta*i)
result = Counter(interval_time)
df_result=pd.DataFrame.from_dict(result,orient='index',columns=['count'])
df_result
OVER !!!