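An earlier post touched on random passwords. To dig a little deeper, I tried brute-force enumerating every password combination from 6 to 12 characters long. After some initial editing, the runnable code looks like this: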
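# Character pool: digits, lowercase and uppercase letters (62 symbols).
# Note: this is a set, so its iteration order is not guaranteed.
numandalp = {
    "0","1","2","3","4","5","6","7","8","9",
    "a","b","c","d","e","f","g","h","i","j","k","l","m",
    "n","o","p","q","r","s","t","u","v","w","x","y","z",
    "A","B","C","D","E","F","G","H","I","J","K","L","M",
    "N","O","P","Q","R","S","T","U","V","W","X","Y","Z"
}

def console_log(f):
    """Decorator: print every item the wrapped function returns."""
    def core(*x):
        a = f(*x)
        count = 0
        for _ in a:
            print(_, end=",")
            count += 1
            if count % 1000 == 0:
                print("{} items printed".format(count))
        print("Output finished!")
        return a
    return core

@console_log
def easydict(basic, n):
    """Return all strings of length n over basic, built recursively."""
    result = []
    if n == 1:
        for _ in basic:
            result.append(_)
    else:
        for _ in basic:
            # The recursive call goes through the decorator again.
            # (With the decorator, the txt file output came out garbled.)
            lastresult = easydict(basic, n-1)
            for lr in lastresult:
                result.append(_ + lr)
    return result

print(len(easydict(numandalp, 2)))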
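A note first: this code does not follow the usual convention of defining a main function and ending the file with a check on the `__name__` attribute before running it. I wrote it this way purely out of scripting habit. Besides, that check always sits entirely outside the main function, which makes the convention a bit of a hybrid: (1) in C, known for its strictness, no statement is left standing alone outside a function like that; (2) a bare top-level statement is really something only script files have.
My script style is harmless for small toys. It runs whether you press F5 in the bundled IDLE or paste it into python.exe. Once a project reaches several hundred lines or several .py files, though, I would recommend following the convention most Python developers use, which also makes the code more consistent to read.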
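For comparison, here is a minimal sketch of the conventional layout mentioned above. The function names `count_combinations` and `main` are made up for illustration and are not part of the code in this post.

```python
import string


def count_combinations(symbols, length):
    """Return how many strings of the given length the symbol pool can form."""
    return len(symbols) ** length


def main():
    # Hypothetical entry point: report the size of the 2-character search space.
    pool = string.digits + string.ascii_lowercase + string.ascii_uppercase
    print(count_combinations(pool, 2))


# Runs main() only when the file is executed directly, not when it is imported.
if __name__ == "__main__":
    main()
```

With this layout another file can import `count_combinations` without triggering any output, which is the main practical benefit over top-level statements.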
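Back to the point. I do not actually recommend running the code above, because I have already fallen into its pits. The main one is recursion. The performance cost meant that by the 5-character test (62 to the 5th power is 916,132,832 strings, all collected in one list) the program hung and would not stop, and that was in python.exe. I expect the bundled IDLE would simply crash right away. On top of that, the recursive function is decorated: testing with 2-character passwords, I found that every return at every level ran the logging once, which is pure waste. Whether the decorator-on-recursion problem has a solution, and what it would be, I will leave aside for now.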
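One possible way around the repeated logging, sketched here under my own assumptions rather than taken from the original code: keep the recursion in an undecorated helper and decorate only a thin outer function. The names `log_once`, `_build` and `easydict_once` are hypothetical. The helper also computes the shorter strings once per level instead of once per leading character.

```python
import functools


def log_once(f):
    """Decorator (hypothetical name): report only the final result size."""
    @functools.wraps(f)
    def core(*args):
        result = f(*args)
        print("generated {} items".format(len(result)))
        return result
    return core


def _build(basic, n):
    """Undecorated recursive helper, so nothing is logged per level."""
    if n == 1:
        return list(basic)
    # Compute the shorter strings once, not once per leading character.
    shorter = _build(basic, n - 1)
    return [head + tail for head in basic for tail in shorter]


@log_once
def easydict_once(basic, n):
    """Decorated entry point: the decorator runs exactly once per call."""
    return _build(basic, n)


result = easydict_once("ab", 3)
```

This only removes the redundant logging and recomputation. It still builds the whole list in memory, so it does not by itself make the 5-character case practical.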
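Without recursion, one can fall back on plain loops, but with this many combinations it is not that simple. Each position has 10 + 26 + 26 = 62 possibilities, so two characters give 62 × 62 = 3844 combinations and four give 3844 × 3844 = 14776336, already in the tens of millions. At 1000 outputs per second that still takes about 14,800 seconds, a little over 4 hours. At only 100 per second, that is 0.01 seconds each, it takes more than a day.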
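The arithmetic above can be checked with a few lines. This is only a back-of-the-envelope sketch. The output rates are the assumed figures from the paragraph, not measurements, and `estimate` is a name introduced here.

```python
SYMBOLS = 10 + 26 + 26  # digits + lowercase + uppercase = 62


def estimate(length, per_second):
    """Return (combination count, hours needed at the given output rate)."""
    total = SYMBOLS ** length
    return total, total / per_second / 3600


for rate in (1000, 100):
    total, hours = estimate(4, rate)
    print("{} combinations at {}/s: {:.1f} hours".format(total, rate, hours))
```

The same function can be called with lengths 6 through 12 to see how quickly the original goal grows out of reach.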
def save_to_txt(f):
    """Decorator: append every item the wrapped function returns to a txt file."""
    def core(*x):
        a = f(*x)
        # Mode "a" appends, so rerunning adds to the existing file.
        with open(r"d:\fourcharps.txt", "a") as pss:
            count = 0
            for _ in a:
                pss.write(_)
                pss.write(",")
                count += 1
                if count % 10000 == 0:
                    print("{} items written".format(count))
        print("Output finished!")
        return a
    return core

@save_to_txt
def easydict2(basic, n):
    """Return all strings of length n over basic, one nested loop per position."""
    result = []
    if n == 1:
        for _ in basic:
            result.append(_)
    elif n == 2:
        for i in basic:
            for j in basic:
                result.append(i + j)
    elif n == 3:
        for i in basic:
            for j in basic:
                for k in basic:
                    result.append(i + j + k)
    elif n == 4:
        for a in basic:
            for b in basic:
                for c in basic:
                    for d in basic:
                        result.append(a + b + c + d)
    elif n == 5:
        ...
    return result

print(len(easydict2(numandalp, 4)))
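This is about as crude as code gets, a suspected "pile of junk code", so there is no need to write out the further branches. A loop-based function with a decorator appears to run fine too. Two results surprised me, though: (1) the four-character passwords did not take long and finished in a few minutes; (2) the output text file came to more than 70 MB (and that is with the separator already changed to a comma)! Notepad can no longer open it for viewing, because a txt file of that size just makes it hang. So I fell back on the old trick of looking at the two-character result first, and that was unexpected as well: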
[The output shown at this point in the original post, set between two "PythonKaiser" divider lines, is not preserved.]
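Although this cannot really be called garbled text (it is simply text I cannot read), and reading the file back in still prints the normal characters, there is still no way to check directly whether the results contain duplicates. At this point the question is no longer simply how to combine and output, but how to do it so that the original simple goal is met: the file opens normally and can be checked by eye.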
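As a side note, the duplicate check does not have to go through a text editor. The sketch below swaps the hand-written nested loops for `itertools.product` from the standard library, which is a different technique from the code in this post. `ALPHABET` and `iter_passwords` are names introduced here. Because a string is used instead of a set, the output order is fixed.

```python
import itertools
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def iter_passwords(alphabet, n):
    """Lazily yield every length-n string over alphabet, in a fixed order."""
    for combo in itertools.product(alphabet, repeat=n):
        yield "".join(combo)


# Peek at the first few results instead of opening a large file.
first = list(itertools.islice(iter_passwords(ALPHABET, 4), 5))
print(first)

# For a small n the whole result fits in memory, so duplicates can be checked directly.
two = list(iter_passwords(ALPHABET, 2))
print(len(two), len(set(two)))
```

Since the generator is lazy, peeking at the four-character case only produces the five strings that are actually requested.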
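Output produced with a bat script, by comparison, was much slower.
By four characters it was already in trouble. The laptop goes to sleep automatically, and although CPU and memory usage did not look high, it still ran extremely hot while charging (credit to the Huawei laptop and charger for not shutting down or burning out at that temperature), so after two days my run was not even half done. The comparison brings out the character of each approach: bat runs slowly, but its output views normally; Python runs fast, but its output does not view normally. Possible improvements for each: (1) have bat save each batch of combinations to a separate file, at the cost of the number of files growing exponentially; (2) have Python change the file encoding when saving and see whether the file then displays normally.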
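Here is a rough sketch of how both ideas could be tried from Python: an explicit encoding and a fixed number of entries per file. I have not confirmed why the original file displayed as unreadable characters. One possible cause is the editor guessing the encoding of a file that has no byte order mark and no line breaks, so the sketch writes one entry per line with `utf-8-sig`, which adds a byte order mark. The function name `write_chunks`, the file names and the chunk size are all assumptions.

```python
import itertools
import os
import tempfile


def write_chunks(items, folder, per_file=1000, encoding="utf-8-sig"):
    """Write items one per line, starting a new file every per_file items."""
    paths = []
    items = iter(items)
    while True:
        chunk = list(itertools.islice(items, per_file))
        if not chunk:
            break
        path = os.path.join(folder, "part{:04d}.txt".format(len(paths) + 1))
        with open(path, "w", encoding=encoding) as out:
            out.write("\n".join(chunk) + "\n")
        paths.append(path)
    return paths


# Small demonstration: 1000 three-digit strings, 400 per file, in a temp folder.
folder = tempfile.mkdtemp()
words = ("".join(p) for p in itertools.product("0123456789", repeat=3))
paths = write_chunks(words, folder, per_file=400)
print(len(paths))
```

With a fixed number of entries per file, the file count grows in proportion to the number of combinations, and each file stays small enough to open.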
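Besides txt, the enumeration results can also be saved into a sqlite3 database (I recommend creating the database file and table beforehand). For four characters the time is still not long, a little over 5 minutes. Done this way, though, it is naturally hard to view everything. It is a black box where I can only check indirectly that the total is right, and the db file reaches nearly 300 MB! Also, my test wrote lengths 1 through 4 column by column. Spot-checking some combinations showed that every column has 62 + 62 × 62 + 62 × 62 × 62 + 62 × 62 × 62 × 62 = 15018570 rows? Further testing makes the reason clear: whenever one column is written, every other column gets a value that is really NULL (as SqliteStudio shows it), which Python reads as None. These all count as rows and make lookups extremely slow. For now I will blame this small problem on the database, which also makes a practical excuse to switch to another one, haha~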
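import sqlite3
import time

def save_to_sqlite(f):
    """Decorator: insert every item the wrapped generator yields into SQLite."""
    def core(*x):
        # Assumes the database file already contains a table allpasswords
        # with one column per length (b1, b2, ...), created beforehand.
        with sqlite3.connect(r"d:\greatdatas.db") as stsql:
            c = 1
            t0 = time.time()
            for _ in f(*x):
                # x[1] is n, so the value goes into column b<n>.
                stsql.cursor().execute("insert into allpasswords (b{}) values (?)".format(x[1]), (_,))
                if c % 10000 == 0:
                    print("{} rows inserted, {:.2f} minutes elapsed".format(c, (time.time()-t0)/60))
                    stsql.commit()  # batched commit: about 1% of the time of committing every row
                c += 1
            stsql.commit()  # final commit; the data was still saved when I left this out (the with block commits on exit)
        return 0
    return core

@save_to_sqlite
def easydict3(basic, n):
    """Yield all strings of length n over basic, one nested loop per position."""
    if n == 1:
        for _ in basic:
            yield _
    elif n == 2:
        for i in basic:
            for j in basic:
                yield i + j
    elif n == 3:
        for i in basic:
            for j in basic:
                for k in basic:
                    yield i + j + k
    elif n == 4:
        for a in basic:
            for b in basic:
                for c in basic:
                    for d in basic:
                        yield a + b + c + d
    elif n == 5:
        ...

easydict3(numandalp, 1)
easydict3(numandalp, 2)
easydict3(numandalp, 3)
easydict3(numandalp, 4)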
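The NULL-filled rows described earlier follow from how INSERT works: each INSERT creates a new row, and any column it does not name is left NULL. A small in-memory sketch makes this visible. The two-column table is a cut-down stand-in for the real one, and the `passwords` table with its `n` and `pw` columns is a hypothetical alternative layout, not something from the original test.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")

# Layout as described in the post: one column per length, one INSERT per value.
conn.execute("create table allpasswords (b1 text, b2 text)")
conn.execute("insert into allpasswords (b1) values (?)", ("a",))
conn.execute("insert into allpasswords (b2) values (?)", ("ab",))
wide_rows = conn.execute("select b1, b2 from allpasswords").fetchall()
print(wide_rows)  # each INSERT made its own row; the unnamed column is None

# Hypothetical alternative: one row per password, with its length stored beside it.
conn.execute("create table passwords (n integer, pw text)")
for n in (1, 2):
    rows = ((n, "".join(p)) for p in itertools.product("ab", repeat=n))
    conn.executemany("insert into passwords (n, pw) values (?, ?)", rows)
conn.commit()

count_two = conn.execute("select count(*) from passwords where n = 2").fetchone()[0]
total = conn.execute("select count(*) from passwords").fetchone()[0]
print(count_two, total)
```

In the alternative layout no row carries padding, and filtering by length is a plain `where n = ...` condition. Whether it would also shrink the file or speed up lookups for the full four-character data is something I have not measured.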