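Xiao Zhang's manager gave him 1,000,000 phone numbers and a set of "lucky number" rules, and asked him to filter out the numbers that match those rules. The rules apply to the last four digits of a number: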
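(1) AAAA
(2) ABCD (four consecutive digits, ascending or descending, e.g. 1234 or 4321)
(3) AABB (A and B differ by 1, e.g. 3344 or 4433)
(4) *AAA
(5) A*AA
(6) AA*A
(7) AAA*
(8) *520
(9) *521
(10) 1314
Here A, B, C and D stand for digits and * stands for any digit.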
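The rules can be checked independently of pandas. Below is a minimal pure-Python sketch; `matching_rules` is a hypothetical helper (not part of the original code) that encodes each rule the way the queries later in the article interpret it: ABCD as a run of four consecutive digits and AABB with A and B one apart. Enumerating all 10,000 possible suffixes shows how often the rules fire.

```python
from itertools import product


def matching_rules(suffix: str) -> list[int]:
    """Return the numbers (1-10) of the lucky-number rules a 4-digit suffix matches."""
    # a, b, c, d correspond to the article's h4, h3, h2, h1
    a, b, c, d = (int(ch) for ch in suffix)
    rules = {
        1: a == b == c == d,                               # AAAA
        2: b - a == c - b == d - c and abs(b - a) == 1,    # ABCD: run of 4, up or down
        3: a == b and c == d and abs(a - c) == 1,          # AABB with A, B one apart
        4: b == c == d,                                    # *AAA
        5: a == c == d,                                    # A*AA
        6: a == b == d,                                    # AA*A
        7: a == b == c,                                    # AAA*
        8: suffix[1:] == "520",                            # *520
        9: suffix[1:] == "521",                            # *521
        10: suffix == "1314",                              # 1314
    }
    return [rule for rule, hit in rules.items() if hit]


# Enumerate all 10,000 possible suffixes 0000..9999
suffixes = ["".join(p) for p in product("0123456789", repeat=4)]
results = [matching_rules(s) for s in suffixes]
total_hits = sum(len(r) for r in results)   # one count per (suffix, rule) pair
unique_hits = sum(1 for r in results if r)  # one count per matching suffix

print(matching_rules("1111"))    # [1, 4, 5, 6, 7]
print(total_hits, unique_hits)   # 463 423
```

Assuming the last four digits are uniformly random, about 463/10,000 = 4.63% of numbers produce one hit per matching rule (roughly 46,300 per million), but only 423/10,000 = 4.23% are distinct numbers, because a suffix such as 1111 matches five rules at once.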
The approach is as follows:
# Phone numbers are confidential, so to simulate the scenario we use the faker library to generate 1,000,000 fake mobile numbers.
import pandas as pd
import numpy as np
import faker
f = faker.Faker('zh_CN')
df = pd.DataFrame({
    'phone_number': [f.phone_number() for _ in range(1000000)]
})
# Convert the data to the string dtype so that text (.str) methods can be used
df = df.astype('string')
# Split the last four digits of each number into four columns h4, h3, h2, h1; e.g. if the last four digits are 5247, then h4 = 5, h3 = 2, h2 = 4 and h1 = 7.
df['h4'] = df.phone_number.str.slice(start = -4,stop = -3)
df['h3'] = df.phone_number.str.slice(start = -3,stop = -2)
df['h2'] = df.phone_number.str.slice(start = -2,stop = -1)
df['h1'] = df.phone_number.str.slice(start = -1)
# Convert the h1, h2, h3, h4 columns to integers
df = df.astype({'h1':'int64','h2':'int64','h3':'int64','h4':'int64'})
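Calling `f.phone_number()` one million times in a Python loop can take a while. If only realistic-looking test data is needed, a sketch like the following generates the numbers with NumPy instead and derives h4..h1 arithmetically from the last four digits rather than by string slicing. The prefix list is a made-up assumption for illustration, not a real carrier allocation, and the seed is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility
n = 1_000_000

# Hypothetical 3-digit mobile prefixes, chosen only for illustration
prefixes = np.array([134, 135, 138, 150, 186, 199], dtype=np.int64)

# Each number = 3-digit prefix followed by 8 uniformly random digits (11 digits total)
numbers = rng.choice(prefixes, size=n) * 10**8 + rng.integers(0, 10**8, size=n)

df = pd.DataFrame({'phone_number': pd.Series(numbers.astype(str), dtype='string')})

# Same h4..h1 layout as the string-slicing version, computed from the last four digits
last4 = numbers % 10_000
df['h4'] = last4 // 1000
df['h3'] = last4 // 100 % 10
df['h2'] = last4 // 10 % 10
df['h1'] = last4 % 10
```

Either way, df ends up with a phone_number column plus integer h4..h1 columns, so the filtering queries work unchanged.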
The resulting df looks like this:
Next, the filtering.
# Filter the lucky numbers rule by rule, storing each result in a new DataFrame
df1 = df.query('h4 == h3 == h2 == h1')  # rule (1) AAAA
df2 = df.query('((h4 - h3 == 1) & (h3 - h2 == 1) & (h2 - h1 == 1)) | ((h1 - h2 == 1) & (h2 - h3 == 1) & (h3 - h4 == 1))')  # rule (2) ABCD, descending or ascending run
df3 = df.query('(h4 == h3) & (h2 == h1) & ((h3 - h1 == 1) | (h1 - h3 == 1))')  # rule (3) AABB, A and B one apart
df4 = df.query('h3 == h2 == h1')  # rule (4) *AAA
df5 = df.query('h4 == h2 == h1')  # rule (5) A*AA
df6 = df.query('h4 == h3 == h1')  # rule (6) AA*A
df7 = df.query('h4 == h3 == h2')  # rule (7) AAA*
df8 = df.query('(h3 == 5) & (h2 == 2) & (h1 == 0)')  # rule (8) *520
df9 = df.query('(h3 == 5) & (h2 == 2) & (h1 == 1)')  # rule (9) *521
df10 = df.query('(h4 == 1) & (h3 == 3) & (h2 == 1) & (h1 == 4)')  # rule (10) 1314
# Concatenate the 10 filtered DataFrames
df00 = pd.concat([df1, df2, df3, df4, df5, df6, df7, df8, df9, df10], axis=0, join='outer')
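Finally, 46,411 lucky-number rows were selected, as shown below. Because pd.concat simply stacks the ten results, a number that matches several rules (for example one ending in 1111, which matches rules 1, 4, 5, 6 and 7) appears once per rule it matches, so this count includes repeats.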
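Concatenating the ten per-rule results lists a number once for every rule it matches. A minimal sketch of an alternative, assuming the same integer h4..h1 columns: build one boolean mask per rule with the same conditions as the queries, then keep the rows where any mask is true, so each lucky number appears exactly once and per-rule counts come for free. The six sample numbers are made up for illustration.

```python
import pandas as pd

# Made-up sample numbers for illustration; in the article df holds 1,000,000 rows
df = pd.DataFrame({'phone_number': ['13800001111', '13800004321', '13800003344',
                                    '13800009520', '13800001314', '13800002468']}).astype('string')
# Split the last four digits into integer columns h4..h1, as in the article
for col, pos in zip(['h4', 'h3', 'h2', 'h1'], [-4, -3, -2, -1]):
    df[col] = df.phone_number.str[pos].astype('int64')

h4, h3, h2, h1 = df.h4, df.h3, df.h2, df.h1
# One boolean mask per rule, using the same conditions as the df.query calls
masks = {
    1: (h4 == h3) & (h3 == h2) & (h2 == h1),
    2: ((h4 - h3 == 1) & (h3 - h2 == 1) & (h2 - h1 == 1))
       | ((h1 - h2 == 1) & (h2 - h3 == 1) & (h3 - h4 == 1)),
    3: (h4 == h3) & (h2 == h1) & ((h3 - h1).abs() == 1),
    4: (h3 == h2) & (h2 == h1),
    5: (h4 == h2) & (h2 == h1),
    6: (h4 == h3) & (h3 == h1),
    7: (h4 == h3) & (h3 == h2),
    8: (h3 == 5) & (h2 == 2) & (h1 == 0),
    9: (h3 == 5) & (h2 == 2) & (h1 == 1),
    10: (h4 == 1) & (h3 == 3) & (h2 == 1) & (h1 == 4),
}
hits = pd.DataFrame(masks)       # rows = numbers, columns = rules 1..10
lucky = df[hits.any(axis=1)]     # each matching number appears exactly once
per_rule = hits.sum()            # how many numbers each rule matched

print(len(lucky), int(hits.to_numpy().sum()))  # 5 9
```

Here the concat approach would have produced 9 rows for 5 distinct numbers. If you prefer to keep the concat version, `df00 = df00[~df00.index.duplicated()]` removes the repeats, because every row keeps its original index label from df; this is safer than `drop_duplicates()`, which would also merge distinct rows that happen to contain the same generated number.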