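After collecting domains through GitHub, Baidu and other search engines, I found that many of them are third- or fourth-level subdomains such as jira.xxx.edu.cn, while what we actually need is xxx.edu.cn. So I wrote a small script to strip them down: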
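#!/usr/bin/env python3
# Reduce subdomains such as jira.xxx.edu.cn to xxx.edu.cn and append the
# de-duplicated results to edu.txt.
# Usage: python3 <script> <input_file>
import sys


def main():
    url_list = []
    with open(sys.argv[1], 'r') as fr:
        for line in fr:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            labels = line.split('.')
            # There is a label in front of ".edu", e.g. jira.xxx.edu.cn
            if '.' in line.split('.edu')[0] and len(labels) > 2:
                # Keep only the last three labels: xxx.edu.cn
                line = '.'.join(labels[-3:])
            # Already-reduced names (xxx.edu.cn) are kept as-is;
            # several subdomains of the same school collapse into one entry
            if line not in url_list:
                url_list.append(line)
    with open('edu.txt', 'a') as fw:
        for uri in url_list:
            fw.write(uri + '\n')


if __name__ == '__main__':
    main()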
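Results copied from search engines often come as full URLs (with scheme, port or path) rather than bare host names. The sketch below is one way to make the same reduction more robust, assuming every target sits directly under .edu.cn (it does not handle other suffixes such as .ac.cn or .edu). The function names `to_edu_domain` and `dedupe_edu_domains` are hypothetical, not part of the original script. It uses the standard library `urllib.parse.urlsplit` to pull out the host name, then keeps the last three labels.

```python
from urllib.parse import urlsplit


def to_edu_domain(entry):
    """Hypothetical helper: return the xxx.edu.cn part of a host or URL.

    Returns None when the entry is empty or not under .edu.cn.
    Assumes every target is a second-level name directly under .edu.cn.
    """
    entry = entry.strip().lower()
    if not entry:
        return None
    # urlsplit only fills in the host when a scheme or a leading "//" is present
    host = urlsplit(entry if '//' in entry else '//' + entry).hostname
    if not host:
        return None
    labels = host.rstrip('.').split('.')  # tolerate a trailing dot (FQDN form)
    if len(labels) < 3 or labels[-2:] != ['edu', 'cn']:
        return None
    return '.'.join(labels[-3:])


def dedupe_edu_domains(entries):
    """Hypothetical helper: reduce each entry, dropping non-matches and duplicates."""
    result = []
    seen = set()
    for entry in entries:
        domain = to_edu_domain(entry)
        if domain and domain not in seen:
            seen.add(domain)
            result.append(domain)  # preserve first-seen order
    return result


if __name__ == '__main__':
    sample = [
        'jira.xxx.edu.cn',
        'http://a.b.xxx.edu.cn:8080/login',
        'mail.yyy.edu.cn',
        'xxx.edu.cn',
        'www.baidu.com',
        '',
    ]
    print(dedupe_edu_domains(sample))  # ['xxx.edu.cn', 'yyy.edu.cn']
```

Filtering on the `edu.cn` suffix (instead of just counting dots) keeps unrelated hosts like www.baidu.com out of edu.txt. For arbitrary suffixes, a lookup against the Public Suffix List is the more general approach.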