一、计算查找重复的号码,查找长度不对的号码(应该用trim过滤下,待完善)
输入的excel表只有一列,列名为id,值是身份证号
import pandas as pd
path = r'C:\Users\BackUp\Desktop\培训人员名单(1).xlsx'
data = pd.read_excel(path)
df2 = pd.DataFrame()
df2['id_18'] = data['id']
df2['id_17'] = data['id'].apply(lambda x:x[:17])
df2['id_last'] = data['id'].apply(lambda x:x[-1])
df2['dup18'] = df2.id_18.duplicated()
df2['dup17'] = df2.id_17.duplicated()
print('重复项:')
print(df2[df2.id_18.duplicated()])
print('前17位存在非法字符:')
print(df2[df2.iloc[:,1].str.contains(pat='\s')])
print('最后一位不是数字或者不是Xx:')
print(df2[df2.iloc[:,2].str.contains(pat='[^0-9Xx]')])
print('长度不是18位:')
print(df2[df2.iloc[:,0].str.len()!=18])
二、在将第一步查找出的长度错误的号码修改正确后,检查第18位校验位与前17位的校验和是否一致,不一致说明号码填写有误。
import numpy as np
listid17 = df2['id_17'].apply(lambda x:list(map(int,list(x))))
X = np.array(listid17.tolist())
X = np.matrix(X)
theta = np.matrix(np.array([7,9,10,5,8,4,2,1,6,3,7,9,10,5,8,4,2]))
y = X*theta.T%11
laststr = ['1','0','X','9','8','7','6','5','4','3','2']
ar = [laststr[int(i)] for i in y]
s = pd.Series(ar)
df2['check'] = s
df2['id_last'] = df2['id_last'].str.upper()
df2['checkfail'] = df2['id_last']!=df2['check']
print(df2[df2['id_last']!=df2['check']])
df2.to_excel(r'D:\我的文档\文件\在线培训\结果1130.xlsx')
id_18 | id_17 | id_last | dup18 | dup17 | check | checkfail | |
0 | 11111111111111111616 | 1111111111111111161 | 6 | FALSE | FALSE | 6 | FALSE |
1 | 11111111111111111930 | 1111111111111111193 | 0 | FALSE | FALSE | 0 | FALSE |
2 | 11111111111111119028 | 1111111111111118402 | 8 | FALSE | FALSE | 8 | FALSE |
https://jingyan.baidu.com/article/7f41ececff944a593d095c8c.html