What Can Python Do with a Classical Chinese Poetry Database, Part 5: Getting Poet Profiles (with Representative Works)

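This article shows how to use the classical poetry database in two ways. First, it gets the profile of one specified poet, including representative works. Second, it gets the profiles of all poets of one dynasty, or of every dynasty, and saves them as txt files.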

Importing the module

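Much of the data in the database is stored in traditional Chinese characters, for example 全唐诗 and 全宋诗. The program therefore uses the zhconv module to convert between traditional and simplified Chinese.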

import zhconv

Getting the biography of a specified poet

The following function looks up the poet with the given name in the database and returns that poet's biography:

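def desc(name):
    
    # Author records store the biography under 'desc' or 'short_description';
    # return '' when the poet is not found, so zhconv.convert always gets a string
    for audict in aulist:
        if zhconv.convert(audict.get('name', ''), 'zh-cn') == name:
            return audict.get('desc') or audict.get('short_description', '')
    return ''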

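In this database, a poet's biography is stored under one of two keys: "desc" or "short_description". The function tries "desc" first and falls back to "short_description".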
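The sketch below writes the same lookup as a standalone function, so it can be tested without the database. Several parts of it are assumptions: the name `find_description`, the sample records, and the `to_simplified` parameter. `to_simplified` stands in for `lambda s: zhconv.convert(s, 'zh-cn')`, which lets the example run without zhconv installed. When the poet is missing, the function returns a default string, so a caller that passes the result on to `zhconv.convert` always receives a string.

```python
from typing import Callable, Dict, List


def find_description(
    authors: List[Dict[str, str]],
    name: str,
    to_simplified: Callable[[str], str] = lambda s: s,
    default: str = '',
) -> str:
    """Return the biography of the first author whose converted name equals `name`.

    A non-empty 'desc' is preferred, then 'short_description'; `default` is
    returned when no author matches or the match has neither value.
    """
    for author in authors:
        if to_simplified(author.get('name', '')) == name:
            return author.get('desc') or author.get('short_description') or default
    return default


# Hypothetical sample records showing the two key variants
sample_authors = [
    {'name': '李白', 'desc': '字太白'},
    {'name': '柳永', 'short_description': '原名三变'},
]
print(find_description(sample_authors, '柳永'))  # 原名三变
```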

Getting the list of indices of a specified poet's poems

def poemx(name):
    
    poemxlist = []
    for poemdictx in range(len(poemlist)):
        poemdict = poemlist[poemdictx]
        if zhconv.convert(poemdict['author'], 'zh-cn') == name:
            poemxlist.append(poemdictx)
    return poemxlist

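This function searches poemlist, the list of all poems, for every poem by the specified poet. It collects the indices of those poems in poemxlist and returns that list. poemlist is assigned in the functions further below.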
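Each call to poemx rescans the whole of poemlist and converts every author field with zhconv. The batch export in poets() calls it once per author. For the Song poems, that means thousands of authors times roughly a quarter of a million poems.

One possible speed-up is to build a name-to-indices index once and then look each poet up in it. This sketch assumes that every poem record has an 'author' key. `build_author_index` is a hypothetical helper, and `to_simplified` again stands in for the zhconv conversion.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def build_author_index(
    poems: List[dict],
    to_simplified: Callable[[str], str] = lambda s: s,
) -> Dict[str, List[int]]:
    """Map each converted author name to the indices of that author's poems."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i, poem in enumerate(poems):
        index[to_simplified(poem['author'])].append(i)
    return dict(index)


poems = [
    {'author': '李白', 'title': '静夜思'},
    {'author': '杜甫', 'title': '春望'},
    {'author': '李白', 'title': '将进酒'},
]
author_index = build_author_index(poems)
print(author_index.get('李白', []))  # [0, 2]
print(author_index.get('王维', []))  # []
```

In poets(), the index would be built once per dynasty, after poemlist is assigned. `author_index.get(name, [])` would then take the place of `poemx(name)`.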

Getting the indices of at most ten of the most popular poems from a list of poem indices

def toppoemx(poemxlist):
    
    # Return at most 10 indices from poemxlist, most popular first.
    # zmdlist[i] is assumed to be the popularity (rank) record of poemlist[i].
    # sorted() builds a new list, so the caller's poemxlist is not modified.
    return sorted(poemxlist, key=lambda poemx: get_a_zmd(zmdlist[poemx]), reverse=True)[:10]

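When a poet has more than 10 poems, this function picks the 10 most popular as the poet's representative works. Popularity is computed by get_a_zmd from an earlier article, which adds up the baidu, so360, google and bing counts of a poem's rank record. The return value is again a list of indices.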
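Picking the ten most popular poems is a top-k selection, and the standard library's `heapq.nlargest` performs it directly. It returns the selected poem indices themselves, most popular first, and does not modify the input list.

The sketch assumes, as the program does, that `ranks[i]` is the popularity record of poem `i`. `popularity` repeats the sum in get_a_zmd so the block runs on its own. `top_poem_indices` and the sample data are hypothetical.

```python
import heapq
from typing import Dict, List


def popularity(rank: Dict[str, int]) -> int:
    """Sum of the four search-engine counts, the same sum get_a_zmd computes."""
    return rank['baidu'] + rank['so360'] + rank['google'] + rank['bing']


def top_poem_indices(poem_indices: List[int], ranks: List[Dict[str, int]], n: int = 10) -> List[int]:
    """Return up to n indices from poem_indices, most popular first."""
    return heapq.nlargest(n, poem_indices, key=lambda i: popularity(ranks[i]))


ranks = [
    {'baidu': 5, 'so360': 1, 'google': 0, 'bing': 0},    # total 6
    {'baidu': 50, 'so360': 10, 'google': 3, 'bing': 2},  # total 65
    {'baidu': 20, 'so360': 0, 'google': 0, 'bing': 1},   # total 21
]
print(top_poem_indices([0, 1, 2], ranks, n=2))  # [1, 2]
```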

Getting the profile of a single specified poet

def poet(name):
    
    poetintro = '姓名:%s\n' % name
    
    poetintro += '简介:%s\n' % zhconv.convert(desc(name), 'zh-cn')
    
    poetintro += '代表作:\n'
    toppoemxlist = toppoemx(poemx(name))
    if 'title' in poemlist[0]:
        titlekey = 'title'
    else:
        titlekey = 'rhythmic'
    for toppoemdictx in toppoemxlist:
        toppoemdict = poemlist[toppoemdictx]
        title = zhconv.convert(toppoemdict[titlekey], 'zh-cn')
        paras = zhconv.convert('\n'.join(toppoemdict['paragraphs']), 'zh-cn')
        poetintro += '《' + title + '》\n' + paras + '\n'
    
    return poetintro

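A poet's profile has three parts: the name, the biography, and up to 10 representative works. Every part is converted to simplified Chinese with zhconv, and the function returns the profile as a single string.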
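Here is a standalone sketch of the same assembly. It collects the lines in a list and joins them once, instead of growing a string with +=.

`format_profile` is hypothetical. It takes each poem's title from 'title' (shi) or 'rhythmic' (ci), the two keys poet() distinguishes, and `to_simplified` stands in for the zhconv conversion. For the same inputs, the output has the same layout as poet().

```python
from typing import Callable, List


def format_profile(
    name: str,
    description: str,
    poems: List[dict],
    to_simplified: Callable[[str], str] = lambda s: s,
) -> str:
    """Build the 姓名 / 简介 / 代表作 text block for one poet."""
    lines = ['姓名:%s' % name, '简介:%s' % to_simplified(description), '代表作:']
    for poem in poems:
        # shi records carry 'title', ci records carry 'rhythmic'
        title = poem.get('title', poem.get('rhythmic', ''))
        lines.append('《%s》' % to_simplified(title))
        lines.append(to_simplified('\n'.join(poem['paragraphs'])))
    return '\n'.join(lines) + '\n'


print(format_profile('李白', '字太白', [
    {'title': '静夜思', 'paragraphs': ['床前明月光,疑是地上霜。', '举头望明月,低头思故乡。']},
]))
```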

Printing the profile of a single specified poet

# 打印指定单个诗人简介
def ask_for_a_poet():
    
    global poemlist, aulist, zmdlist
    
    name = input('输入诗人或词人简体姓名:')
    
    d = input('''t: 唐诗
s: 宋诗
c: 宋词
input one letter: ''')
    
    if d == 't':
        dec_tang()
        dec_auOtang()
        dec_zmdOtang()
        poemlist = tanglist
        aulist = auOtanglist
        zmdlist = zmdOtanglist
    elif d == 's':
        dec_song()
        dec_auOsong()
        dec_zmdOsong()
        poemlist = songlist
        aulist = auOsonglist
        zmdlist = zmdOsonglist
    elif d == 'c':
        dec_ci()
        dec_auOci()
        dec_zmdOci()
        poemlist = cilist
        aulist = auOcilist
        zmdlist = zmdOcilist
    else:
        print('"%s" is not a valid choice.' % d)
        return
    
    print(poet(name))

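Currently, the program supports poets from one of three categories: Tang shi (唐诗), Song shi (宋诗) and Song ci (宋词).
To get a poet's profile, first decode the JSON files for that poet's category (see Part 1 of this series, "Preparation: Decoding the JSON Files"). Then call the functions above.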
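Another way to organize the three-way choice is a table that maps each letter to a loader function. This keeps the three cases in one place and rejects unrecognized input explicitly.

This is a sketch under assumptions: `select_category`, `Datasets` and the demo loaders are hypothetical. In the real program, each loader would call the corresponding dec_* functions from Part 1 and return the three decoded lists.

```python
from typing import Callable, Dict, Optional, Tuple

# (poems, authors, ranks): the three lists that poet() reads through globals
Datasets = Tuple[list, list, list]


def select_category(choice: str, loaders: Dict[str, Callable[[], Datasets]]) -> Optional[Datasets]:
    """Call the loader registered for `choice`; return None for unknown input."""
    loader = loaders.get(choice.strip().lower())
    if loader is None:
        print('"%s" is not a valid choice.' % choice)
        return None
    return loader()


# Hypothetical loaders; in the real program 't' would call dec_tang(),
# dec_auOtang() and dec_zmdOtang(), then return (tanglist, auOtanglist, zmdOtanglist).
demo_loaders = {
    't': lambda: (['tang poems'], ['tang authors'], ['tang ranks']),
    's': lambda: (['song poems'], ['song authors'], ['song ranks']),
    'c': lambda: (['ci poems'], ['ci authors'], ['ci ranks']),
}
print(select_category('t', demo_loaders))
```

ask_for_a_poet would then take poemlist, aulist and zmdlist from the returned tuple, or return early when it receives None.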

Getting all poet profiles and saving them as txt files

def poets():
    
    global poemlist, aulist, zmdlist
    
    dec_tang()
    dec_auOtang()
    dec_zmdOtang()
    dec_song()
    dec_auOsong()
    dec_zmdOsong()
    dec_ci()
    dec_auOci()
    dec_zmdOci()
    
    now = datetime.now()
    nowymd = (now.year, now.month, now.day)
    afterfiletitle = '\n制作者:少陵野小Tommy\n数据来源:https://github.com/chinese-poetry/\n制作日期:%d/%d/%d' % nowymd
    
    start_skip = input('''"tangchaoshiren":
press enter to start or input anything to skip: ''')
    if start_skip == '':
        # 唐朝诗人
        poemlist = tanglist
        aulist = auOtanglist
        zmdlist = zmdOtanglist
        total = len(aulist)  # progress total taken from the data, not a fixed number
        # the with block closes the file even if poet() raises part-way through
        with open(cwd + 'poetry/chinesepoetry/tangchaoshiren.txt', 'w', encoding='UTF-8') as tangpoetfile:
            print('"tangchaoshiren.txt" has been created.')
            print('Writing in ... 0 / %d' % total)
            tangpoetfile.write('唐朝诗人简介' + afterfiletitle)
            for donenum, audict in enumerate(aulist, start=1):
                if 'author' in audict:
                    name = zhconv.convert(audict['author'], 'zh-cn')
                else:
                    name = zhconv.convert(audict['name'], 'zh-cn')
                tangpoetfile.write('\n\n' + poet(name))
                print('Writing in ... %d / %d' % (donenum, total))
    
    start_skip = input('''"songchaoshiren":
press enter to start or input anything to skip: ''')
    if start_skip == '':
        # 宋朝诗人
        poemlist = songlist
        aulist = auOsonglist
        zmdlist = zmdOsonglist
        total = len(aulist)  # progress total taken from the data, not a fixed number
        with open(cwd + 'poetry/chinesepoetry/songchaoshiren.txt', 'w', encoding='UTF-8') as songpoetfile:
            print('"songchaoshiren.txt" has been created.')
            print('Writing in ... 0 / %d' % total)
            songpoetfile.write('宋朝诗人简介' + afterfiletitle)
            for donenum, audict in enumerate(aulist, start=1):
                name = zhconv.convert(audict['name'], 'zh-cn')
                songpoetfile.write('\n\n' + poet(name))
                print('Writing in ... %d / %d' % (donenum, total))
    
    start_skip = input('''"songchaociren":
press enter to start or input anything to skip: ''')
    if start_skip == '':
        # 宋朝词人
        poemlist = cilist
        aulist = auOcilist
        zmdlist = zmdOcilist
        total = len(aulist)  # progress total taken from the data, not a fixed number
        with open(cwd + 'poetry/chinesepoetry/songchaociren.txt', 'w', encoding='UTF-8') as cipoetfile:
            print('"songchaociren.txt" has been created.')
            print('Writing in ... 0 / %d' % total)
            cipoetfile.write('宋朝词人简介' + afterfiletitle)
            for donenum, audict in enumerate(aulist, start=1):
                name = zhconv.convert(audict['name'], 'zh-cn')
                cipoetfile.write('\n\n' + poet(name))
                print('Writing in ... %d / %d' % (donenum, total))
    
    print('Done.')

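This function produces the profiles of Tang poets, Song poets and Song ci poets, and saves them in three separate txt files. For each of the three, the user can choose whether to generate it. Retrieving the profiles and writing the files takes a long time, largely because each profile rescans the complete poem list. The function therefore prints a simple progress counter.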
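The three branches of poets() differ only in the output file name, the heading and the datasets. The shared writing loop could be factored out into a hypothetical helper, `write_profiles`. This sketch makes two further assumptions: the progress total comes from the data, and a `with` block closes the file even if an exception occurs. The demo uses a temporary file and a trivial stand-in for poet().

```python
import os
import tempfile
from typing import Callable, Iterable


def write_profiles(path: str, heading: str, names: Iterable[str],
                   make_profile: Callable[[str], str]) -> int:
    """Write `heading`, then each name's profile preceded by two newlines.

    Prints a progress line per profile and returns the number written.
    """
    name_list = list(names)
    total = len(name_list)
    with open(path, 'w', encoding='UTF-8') as f:
        f.write(heading)
        print('Writing in ... 0 / %d' % total)
        for done, name in enumerate(name_list, start=1):
            f.write('\n\n' + make_profile(name))
            print('Writing in ... %d / %d' % (done, total))
    return total


# Demo with a throwaway file and a trivial stand-in for poet()
out_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
count = write_profiles(out_path, '唐朝诗人简介', ['李白', '杜甫'], lambda n: '姓名:%s\n' % n)
```

With this helper, each branch of poets() would assign poemlist, aulist and zmdlist, then pass the converted names and poet as `make_profile`.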

Complete program

Combined with the code from the previous four articles, the complete program is as follows:

import json
from copy import copy
from pypinyin import pinyin, Style
from collections import Counter
from datetime import datetime
import zhconv

decoded = []

# 询问要解码的JSON文件
def ask_dec_require():
    
    # A -> All
    # E -> Except
    # O -> Of
    
    require = input('''A: 全部
AEpzai: 除诗词平仄、知名度、作者、介绍外全部
tang: 全唐诗
song: 全宋诗
ci: 全宋词
hua: 花间集
nan: 南唐二主词
lun: 论语
shij: 诗经
you: 幽梦影
siwu: 四书五经(仅《大学》《孟子》《中庸》)
meng: 蒙学(《三字经》《百家姓》《千字文》《弟子规》《幼学琼林》《朱子家训》《千家诗》《古文观止》《唐诗三百首》《声律启蒙》《文字蒙求》《增广贤文》)
na: 纳兰性德诗集
caocao: 曹操诗集
chuci: 楚辞
qv: 元曲
pzOtang: 全唐诗平仄
pzOsong: 全宋诗平仄
zmdOtang: 全唐诗知名度
zmdOsong: 全宋诗知名度
zmdOci: 全宋词知名度
auOtang: 全唐诗作者
auOsong: 全宋诗作者
auOci: 全宋词作者
auOnan: 南唐二主词作者
introOnan: 南唐二主词介绍
tang300: 唐诗三百首
ci300: 宋词三百首
separate with blanks: ''')
    
    rlist = require.split()
    return rlist

cwd = 'E:/python/'

# 读取JSON文件
def read_json(path):
    # A with block closes the file handle once parsing is done
    with open(cwd + path, 'r', encoding='utf-8') as readjs:
        return json.load(readjs)

# 全唐诗
def dec_tang():
    
    global tanglist
    
    if 'tang' not in decoded:
        tanglist = []
        print('decoding 全唐诗 ...')
        for namenum in range(0, 57001, 1000):
            path = 'poetry/chinesepoetry/json/poet.tang.%d.json' % namenum
            tanglist += read_json(path)
        decoded.append('tang')
        print('Successfully decoded: 全唐诗')
        print('Total number: %d' % len(tanglist))
    
    else:
        print('全唐诗 has been decoded.')

# 全宋诗
def dec_song():
    
    global songlist
    
    if 'song' not in decoded:
        songlist = []
        print('decoding 全宋诗 ...')
        for namenum in range(0, 254001, 1000):
            path = 'poetry/chinesepoetry/json/poet.song.%d.json' % namenum
            songlist += read_json(path)
        decoded.append('song')
        print('Successfully decoded: 全宋诗')
        print('Total number: %d' % len(songlist))
    
    else:
        print('全宋诗 has been decoded.')

# 全宋词
def dec_ci():
    
    global cilist
    
    if 'ci' not in decoded:
        cilist = []
        print('decoding 全宋词 ...')
        for namenum in range(0, 21001, 1000):
            path = 'poetry/chinesepoetry/ci/ci.song.%d.json' % namenum
            cilist += read_json(path)
        decoded.append('ci')
        print('Successfully decoded: 全宋词')
        print('Total number: %d' % len(cilist))
    
    else:
        print('全宋词 has been decoded.')

# 花间集
def dec_hua():
    
    global hualist
    
    if 'hua' not in decoded:
        hualist = []
        print('decoding 花间集 ...')
        namelist = ['1', '2', '3', '4', '5', '6', '7', '8', '9', 'x']
        for name in namelist:
            path = 'poetry/chinesepoetry/wudai/huajianji/huajianji-%s-juan.json' % name
            hualist += read_json(path)
        decoded.append('hua')
        print('Successfully decoded: 花间集')
        print('Total number: %d' % len(hualist))
    
    else:
        print('花间集 has been decoded.')

# 南唐二主词
def dec_nan():
    
    global nanlist
    
    if 'nan' not in decoded:
        print('decoding 南唐二主词 ...')
        path = 'poetry/chinesepoetry/wudai/nantang/poetrys.json'
        nanlist = read_json(path)
        decoded.append('nan')
        print('Successfully decoded: 南唐二主词')
        print('Total number: %d' % len(nanlist))
    
    else:
        print('南唐二主词 has been decoded.')

# 论语
def dec_lun():
    
    global lunlist
    
    if 'lun' not in decoded:
        print('decoding 论语 ...')
        path = 'poetry/chinesepoetry/lunyu/lunyu.json'
        lunlist = read_json(path)
        decoded.append('lun')
        print('Successfully decoded: 论语')
        print('Total number: %d' % len(lunlist))
    
    else:
        print('论语 has been decoded.')

# 诗经
def dec_shij():
    
    global shijlist
    
    if 'shij' not in decoded:
        print('decoding 诗经 ...')
        path = 'poetry/chinesepoetry/shijing/shijing.json'
        shijlist = read_json(path)
        decoded.append('shij')
        print('Successfully decoded: 诗经')
        print('Total number: %d' % len(shijlist))
    
    else:
        print('诗经 has been decoded.')

# 幽梦影
def dec_you():
    
    global youlist
    
    if 'you' not in decoded:
        print('decoding 幽梦影 ...')
        path = 'poetry/chinesepoetry/youmengying/youmengying.json'
        youlist = read_json(path)
        decoded.append('you')
        print('Successfully decoded: 幽梦影')
        print('Total number: %d' % len(youlist))
    
    else:
        print('幽梦影 has been decoded.')

# 四书五经
def dec_siwu():
    
    global siwulist
    
    if 'siwu' not in decoded:
        siwulist = []
        print('decoding 四书五经 ...')
        namelist = ['daxue', 'mengzi', 'zhongyong']
        for name in namelist:
            path = 'poetry/chinesepoetry/sishuwujing/%s.json' % name
            siwulist.append(read_json(path))
        decoded.append('siwu')
        print('Successfully decoded: 四书五经')
        print('Total number: %d' % len(siwulist))
    
    else:
        print('四书五经 has been decoded.')

# 蒙学
def dec_meng():
    
    global menglist
    
    if 'meng' not in decoded:
        menglist = []
        print('decoding 蒙学 ...')
        namelist = ['sanzijing-new', 'baijiaxing', 'qianziwen', 'dizigui', 'youxueqionglin', 'zhuzijiaxun', 'qianjiashi', 'guwenguanzhi', 'tangshisanbaishou', 'shenglvqimeng', 'wenzimengqiu', 'zengguangxianwen']
        for name in namelist:
            path = 'poetry/chinesepoetry/mengxue/%s.json' % name
            menglist.append(read_json(path))
        decoded.append('meng')
        print('Successfully decoded: 蒙学')
        print('Total number: %d' % len(menglist))
    
    else:
        print('蒙学 has been decoded.')

# 纳兰性德诗集
def dec_na():
    
    global nalist
    
    if 'na' not in decoded:
        print('decoding 纳兰性德诗集 ...')
        path = 'poetry/chinesepoetry/nalanxingde/纳兰性德诗集.json'
        nalist = read_json(path)
        decoded.append('na')
        print('Successfully decoded: 纳兰性德诗集')
        print('Total number: %d' % len(nalist))
    
    else:
        print('纳兰性德诗集 has been decoded.')

# 曹操诗集
def dec_caocao():
    
    global caocaolist
    
    if 'caocao' not in decoded:
        print('decoding 曹操诗集 ...')
        path = 'poetry/chinesepoetry/caocaoshiji/caocao.json'
        caocaolist = read_json(path)
        decoded.append('caocao')
        print('Successfully decoded: 曹操诗集')
        print('Total number: %d' % len(caocaolist))
    
    else:
        print('曹操诗集 has been decoded.')

# 楚辞
def dec_chuci():
    
    global chucilist
    
    if 'chuci' not in decoded:
        print('decoding 楚辞 ...')
        path = 'poetry/chinesepoetry/chuci/chuci.json'
        chucilist = read_json(path)
        decoded.append('chuci')
        print('Successfully decoded: 楚辞')
        print('Total number: %d' % len(chucilist))
    
    else:
        print('楚辞 has been decoded.')

# 元曲
def dec_qv():
    
    global qvlist
    
    if 'qv' not in decoded:
        qvlist = []
        print('decoding 元曲 ...')
        path = 'poetry/chinesepoetry/yuanqu/yuanqu.json'
        qvlist = read_json(path)
        decoded.append('qv')
        print('Successfully decoded: 元曲')
        print('Total number: %d' % len(qvlist))
    
    else:
        print('元曲 has been decoded.')

# 全唐诗平仄
def dec_pzOtang():
    
    global pzOtanglist
    
    if 'pzOtang' not in decoded:
        pzOtanglist = []
        print('decoding 全唐诗平仄 ...')
        for namenum in range(0, 57001, 1000):
            path = 'poetry/chinesepoetry/strains/json/poet.tang.%d.json' % namenum
            pzOtanglist += read_json(path)
        decoded.append('pzOtang')
        print('Successfully decoded: 全唐诗平仄')
        print('Total number: %d' % len(pzOtanglist))
    
    else:
        print('全唐诗平仄 has been decoded.')

# 全宋诗平仄
def dec_pzOsong():
    
    global pzOsonglist
    
    if 'pzOsong' not in decoded:
        pzOsonglist = []
        print('decoding 全宋诗平仄 ...')
        for namenum in range(0, 254001, 1000):
            path = 'poetry/chinesepoetry/strains/json/poet.song.%d.json' % namenum
            pzOsonglist += read_json(path)
        decoded.append('pzOsong')
        print('Successfully decoded: 全宋诗平仄')
        print('Total number: %d' % len(pzOsonglist))
    
    else:
        print('全宋诗平仄 has been decoded.')

# 全唐诗知名度
def dec_zmdOtang():
    
    global zmdOtanglist
    
    if 'zmdOtang' not in decoded:
        zmdOtanglist = []
        print('decoding 全唐诗知名度 ...')
        for namenum in range(0, 57001, 1000):
            path = 'poetry/chinesepoetry/rank/poet/poet.tang.rank.%d.json' % namenum
            zmdOtanglist += read_json(path)
        decoded.append('zmdOtang')
        print('Successfully decoded: 全唐诗知名度')
        print('Total number: %d' % len(zmdOtanglist))
    
    else:
        print('全唐诗知名度 has been decoded.')

# 全宋诗知名度
def dec_zmdOsong():
    
    global zmdOsonglist
    
    if 'zmdOsong' not in decoded:
        zmdOsonglist = []
        print('decoding 全宋诗知名度 ...')
        for namenum in range(0, 254001, 1000):
            path = 'poetry/chinesepoetry/rank/poet/poet.song.rank.%d.json' % namenum
            zmdOsonglist += read_json(path)
        decoded.append('zmdOsong')
        print('Successfully decoded: 全宋诗知名度')
        print('Total number: %d' % len(zmdOsonglist))
    
    else:
        print('全宋诗知名度 has been decoded.')

# 全宋词知名度
def dec_zmdOci():
    
    global zmdOcilist
    
    if 'zmdOci' not in decoded:
        zmdOcilist = []
        print('decoding 全宋词知名度 ...')
        for namenum in range(0, 21001, 1000):
            path = 'poetry/chinesepoetry/rank/ci/ci.song.rank.%d.json' % namenum
            zmdOcilist += read_json(path)
        decoded.append('zmdOci')
        print('Successfully decoded: 全宋词知名度')
        print('Total number: %d' % len(zmdOcilist))
    
    else:
        print('全宋词知名度 has been decoded.')

# 全唐诗作者
def dec_auOtang():
    
    global auOtanglist
    
    if 'auOtang' not in decoded:
        print('decoding 全唐诗作者 ...')
        path = 'poetry/chinesepoetry/json/authors.tang.json'
        auOtanglist = read_json(path)
        decoded.append('auOtang')
        print('Successfully decoded: 全唐诗作者')
        print('Total number: %d' % len(auOtanglist))
    
    else:
        print('全唐诗作者 has been decoded.')

# 全宋诗作者
def dec_auOsong():
    
    global auOsonglist
    
    if 'auOsong' not in decoded:
        print('decoding 全宋诗作者 ...')
        path = 'poetry/chinesepoetry/json/authors.song.json'
        auOsonglist = read_json(path)
        decoded.append('auOsong')
        print('Successfully decoded: 全宋诗作者')
        print('Total number: %d' % len(auOsonglist))
    
    else:
        print('全宋诗作者 has been decoded.')

# 全宋词作者
def dec_auOci():
    
    global auOcilist
    
    if 'auOci' not in decoded:
        print('decoding 全宋词作者 ...')
        path = 'poetry/chinesepoetry/ci/author.song.json'
        auOcilist = read_json(path)
        decoded.append('auOci')
        print('Successfully decoded: 全宋词作者')
        print('Total number: %d' % len(auOcilist))
    
    else:
        print('全宋词作者 has been decoded.')

# 南唐二主词作者
def dec_auOnan():
    
    global auOnanlist
    
    if 'auOnan' not in decoded:
        print('decoding 南唐二主词作者 ...')
        path = 'poetry/chinesepoetry/wudai/nantang/authors.json'
        auOnanlist = read_json(path)
        decoded.append('auOnan')
        print('Successfully decoded: 南唐二主词作者')
        print('Total number: %d' % len(auOnanlist))
    
    else:
        print('南唐二主词作者 has been decoded.')

# 南唐二主词介绍
def dec_introOnan():
    
    global introOnanlist
    
    if 'introOnan' not in decoded:
        print('decoding 南唐二主词介绍 ...')
        path = 'poetry/chinesepoetry/wudai/nantang/intro.json'
        introOnanlist = read_json(path)
        decoded.append('introOnan')
        print('Successfully decoded: 南唐二主词介绍')
        print('Total number: %d' % len(introOnanlist))
    
    else:
        print('南唐二主词介绍 has been decoded.')

# 唐诗三百首
def dec_tang300():
    
    global tang300list
    
    if 'tang300' not in decoded:
        print('decoding 唐诗三百首 ...')
        path = 'poetry/chinesepoetry/json/唐诗三百首.json'
        tang300list = read_json(path)
        decoded.append('tang300')
        print('Successfully decoded: 唐诗三百首')
        print('Total number: %d' % len(tang300list))
    
    else:
        print('唐诗三百首 has been decoded.')

# 宋词三百首
def dec_ci300():
    
    global ci300list
    
    if 'ci300' not in decoded:
        print('decoding 宋词三百首 ...')
        path = 'poetry/chinesepoetry/ci/宋词三百首.json'
        ci300list = read_json(path)
        decoded.append('ci300')
        print('Successfully decoded: 宋词三百首')
        print('Total number: %d' % len(ci300list))
    
    else:
        print('宋词三百首 has been decoded.')

# 除诗词平仄、知名度、作者外全部
def dec_AEpzai():
    
    dec_tang()
    dec_song()
    dec_ci()
    dec_hua()
    dec_nan()
    dec_lun()
    dec_shij()
    dec_you()
    dec_siwu()
    dec_meng()
    dec_na()
    dec_caocao()
    dec_chuci()
    dec_qv()

# 全部
def dec_A():
    
    dec_AEpzai()
    dec_pzOtang()
    dec_pzOsong()
    dec_zmdOtang()
    dec_zmdOsong()
    dec_zmdOci()
    dec_auOtang()
    dec_auOsong()
    dec_auOci()
    dec_auOnan()
    dec_introOnan()
    dec_tang300()
    dec_ci300()

# 询问要解码的文件并解码
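def ask_dec():
    
    rlist = ask_dec_require()
    for r in rlist:
        # Look up dec_<r> by name instead of eval(), so typed input is never executed as code
        func = globals().get('dec_%s' % r)
        if callable(func):
            func()
        else:
            print('"%s" is not a valid instruction.' % r)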

# 获取单个词牌名
def get_tune_title(cidict):
    
    title = cidict['rhythmic']
    
    if '・' in title:
        tune_title = ''
        for title1 in title:
            if title1 != '・':
                tune_title += title1
            else:
                return tune_title
    
    else:
        return title

# 获取所有词牌名
def get_all_tune_titles():
    
    global all_tune_titles_dict
    
    dec_ci()
    
    all_tune_titles_dict = {}
    for cidictx in range(len(cilist)):
        cidict = cilist[cidictx]
        tune_title = get_tune_title(cidict)
        
        if tune_title in all_tune_titles_dict:
            all_tune_titles_dict[tune_title].append(cidictx)
        else:
            all_tune_titles_dict[tune_title] = [cidictx]

marktuple = (',', '。', '?', '!', ';') # 句末标点符号

# 获取一首词中所有句末字的索引-韵母和平仄的字典
def vowelpz(paras):
    
    charax_vowelpz_dict = {}
    for charax in range(len(paras)-2):
        if paras[charax+1] in marktuple:
            py = pinyin(paras[charax], style=Style.FINALS_TONE3, neutral_tone_with_five=True)[0][0]
            if int(py[-1]) <= 2:
                py = py[:-1] + '△'
            else:
                py = py[:-1] + '▲'
            charax_vowelpz_dict[charax] = py
    
    return charax_vowelpz_dict

# 获取一首词韵脚平韵或仄韵与索引的列表
def get_a_vowelpz(paras):
    
    charax_vowelpz_dict = vowelpz(paras)
    count_vowelpz = Counter(charax_vowelpz_dict.values())
    most_common_vowelpz = count_vowelpz.most_common(1)[0][0]
    vowelpz_name = most_common_vowelpz[-1]
    vowelpzx_list = []
    for charax in charax_vowelpz_dict:
        if charax_vowelpz_dict[charax] == most_common_vowelpz:
            vowelpzx_list.append(charax)
    
    return [vowelpz_name, vowelpzx_list]

# 获取单个词牌名韵脚平韵或仄韵与索引的列表
def get_atunetitle_vowelpz(cix_paras_dict):
    
    # 得到该词牌名词数
    cinum = len(cix_paras_dict)
    
    # 得到平韵或仄韵的列表和韵脚索引的列表
    vowelpz_name_list = []
    vowelpzx_list = []
    for cix in cix_paras_dict:
        paras = cix_paras_dict[cix]
        vowelpz_name_vowelpzx_list = get_a_vowelpz(paras)
        vowelpz_name_list.append(vowelpz_name_vowelpzx_list[0])
        vowelpzx_list += vowelpz_name_vowelpzx_list[1]
    
    # 得到平韵或仄韵
    count_vowelpz_name = Counter(vowelpz_name_list)
    most_common_vowelpz_name = count_vowelpz_name.most_common(1)[0][0]
    
    # 得到韵脚索引
    common_vowelpzx_list = []
    count_vowelpzx = Counter(vowelpzx_list)
    for vowelpzx in count_vowelpzx:
        if count_vowelpzx[vowelpzx] / cinum >= 0.4:
            common_vowelpzx_list.append(vowelpzx)
    
    return [most_common_vowelpz_name, common_vowelpzx_list]

missedwords = ('□', '︽', 'め', 'ど', 'け', '》', 'ホ', 'え')

# 获取指定单个词牌格律
def get_a_gelv(tune_title):
    
    cixlist = all_tune_titles_dict[tune_title]
    
    # 索引-词的字典
    cix_paras_dict = {}
    for cix in cixlist:
        cix_paras_dict[cix] = ''.join(cilist[cix]['paragraphs'])
    
    # 获取字数
    paras_len_list = []
    for cix in cix_paras_dict:
        paras_len = len(cix_paras_dict[cix])
        paras_len_list.append(paras_len)
    count_paras_len = Counter(paras_len_list)
    most_common_len = count_paras_len.most_common(1)[0][0]
    
    # 排除总字数不统一的词和缺字的词
    cix_paras_dict_copy = copy(cix_paras_dict)
    for cix in cix_paras_dict:
        paras = cix_paras_dict[cix]
        paras_len = len(paras)
        isneeded = True
        for noneedword in missedwords:
            if noneedword in paras:
                isneeded = False
        if paras_len != most_common_len or isneeded == False:
            del cix_paras_dict_copy[cix]
    cix_paras_dict = copy(cix_paras_dict_copy)
    
    # 平韵或仄韵和韵脚索引的列表
    vowelpz_name_vowelpzx_list = get_atunetitle_vowelpz(cix_paras_dict)
    vowelpz_name = vowelpz_name_vowelpzx_list[0]
    vowelpzx_list = vowelpz_name_vowelpzx_list[1]
    
    # 索引-格律的字典
    cix_gelv_dict = {}
    for cix in cix_paras_dict:
        para = cix_paras_dict[cix]
        pylist = pinyin(para, style=Style.TONE3, neutral_tone_with_five=True)
        tonepara = ''
        for pyx in range(len(pylist)):
            if pyx in vowelpzx_list:
                tonepara += vowelpz_name
            else:
                py = pylist[pyx]
                tonenum = py[0][-1] # 声调标识
                if tonenum.isdigit():
                    if int(tonenum) <= 2: # 阴平或阳平
                        tonepara += '○'
                    else: # 上声或去声
                        tonepara += '●'
                else: # 标点符号
                    tonepara += tonenum
            cix_gelv_dict[cix] = tonepara
    
    # 索引-同一位置的字的平仄的列表的字典
    chara_gelv_dict = {}
    for charax in range(most_common_len):
        chara_gelv_list = []
        try:
            for cix in cix_gelv_dict:
                chara_gelv = cix_gelv_dict[cix][charax]
                chara_gelv_list.append(chara_gelv)
        except IndexError:
            pass
        chara_gelv_dict[charax] = chara_gelv_list
    
    # 得到格律
    universal_gelv = ''
    for charax in chara_gelv_dict:
        chara_gelvlist = chara_gelv_dict[charax]
        count_chara_gelv = Counter(chara_gelvlist)
        chara_gelv_withweight_list = count_chara_gelv.most_common(2)
        if len(chara_gelv_withweight_list) == 1:
            # Compare the symbol ([0][0]), not its count ([0][1])
            if chara_gelv_withweight_list[0][0] == '○':
                chara_gelv_withweight_list.append(('●', 0))
            elif chara_gelv_withweight_list[0][0] == '●':
                chara_gelv_withweight_list.append(('○', 0))
            else:
                chara_gelv_withweight_list.append((0, 0))
        elif len(chara_gelv_withweight_list) == 0:
            continue
        highername = chara_gelv_withweight_list[0][0]
        lowername = chara_gelv_withweight_list[1][0]
        higherweight = chara_gelv_withweight_list[0][1]
        lowerweight = chara_gelv_withweight_list[1][1]
        weight_scale = higherweight / (higherweight + lowerweight)
        if highername in ('○', '●', '△', '▲'):
            if weight_scale >= 0.8:
                chara_gelv = highername
            else:
                chara_gelv = '⊙'
        else:
            chara_gelv = highername
        universal_gelv += chara_gelv
    
    return universal_gelv

# 获取一首诗或词的知名度
def get_a_zmd(zmdcidict):
    
    return zmdcidict['baidu'] + zmdcidict['so360'] + zmdcidict['google'] + zmdcidict['bing']

# 获取单个词牌名词中知名度最大者的索引
def get_max_zmd(tune_title):
    
    cixlist = all_tune_titles_dict[tune_title]
    zmd_list = []
    for cix in cixlist:
        zmdcidict = zmdOcilist[cix]
        zmd_list.append(get_a_zmd(zmdcidict))
    
    max_zmd = max(zmd_list)
    return zmd_list.index(max_zmd)

# 按句分行
def separate_para(para):
    
    return_para = ''
    
    # 句末添加换行符
    for chara in para:
        return_para += chara
        if chara in marktuple:
            return_para += '\n'
    
    return return_para

# 获取全部词牌格律并保存为txt文件
def cigelv():
    
    get_all_tune_titles()
    
    gelvfile = open(cwd + 'poetry/chinesepoetry/ci/cipaigelv.txt', 'w', encoding='UTF-8')
    print('"cipaigelv.txt" has been created.')
    
    donenum = 0
    
    dec_zmdOci()
    
    print('Writing in ... 0 / 1352')
    
    now = datetime.now()
    nowymd = (now.year, now.month, now.day)
    
    gelvfile.write('词牌格律\n')
    gelvfile.write('制作者:少陵野小Tommy\n')
    gelvfile.write('数据来源:https://github.com/chinese-poetry/\n')
    gelvfile.write('制作日期:%d/%d/%d\n\n' % nowymd)
    
    for tune_title in all_tune_titles_dict:
        try:
            gelvfile.write('%s\n' % tune_title)
            gelvfile.write(separate_para(get_a_gelv(tune_title)))
            example_cix = all_tune_titles_dict[tune_title][get_max_zmd(tune_title)]
            example_ci_dict = cilist[example_cix]
            example_ci = ''.join(example_ci_dict['paragraphs'])
            example_ci_author = example_ci_dict['author']
            example_ci_title = example_ci_dict['rhythmic']
            gelvfile.write('%s(%s《%s》)\n\n' % (separate_para(example_ci), example_ci_author, example_ci_title))
        except IndexError:
            gelvfile.write('【暂失】\n\n')
            continue
        donenum += 1
        print('Writing in ... %d / 1352' % donenum)
    
    gelvfile.close()
    print('Done.')

# 询问、获取并打印单个词牌格律
def ask_for_a_gelv():
    
    tune_title = input('输入词牌名:')
    get_all_tune_titles()
    gelv = get_a_gelv(tune_title)
    print(separate_para(gelv))

# 询问需要说明的JSON文件
def ask_intro_require():
    
    # A -> All
    # W -> With
    # O -> Of
    
    require = input('''A: 全部
tsWau: 全唐诗、全宋诗及其作者
ciWau: 全宋词及其作者
hua: 花间集
nanWai: 南唐二主词及其作者和介绍
lun: 论语
shij: 诗经
you: 幽梦影
siwu: 四书五经(仅《大学》《孟子》《中庸》)
meng: 蒙学(《三字经》《百家姓》《千字文》《弟子规》《幼学琼林》《朱子家训》《千家诗》《古文观止》《唐诗三百首》《声律启蒙》《文字蒙求》《增广贤文》)
na: 纳兰性德诗集
caocao: 曹操诗集
pzOts: 全唐诗、全宋诗平仄
zmdOtsc: 全唐诗、全宋诗、全宋词知名度
separate with blanks: ''')
    
    if 'A' in require:
        require = 'tsWau ciWau hua nanWai lun shij you siwu meng na caocao pzOts zmdOtsc'
    rlist = require.split()
    return rlist

# 获取说明文件路径
def intropath(rlist):
    
    pathdict = {
    'tsWau': 'json',
    'ciWau': 'ci',
    'hua': 'wudai/huajianji',
    'nanWai': 'wudai/nantang',
    'lun': 'lunyu',
    'shij': 'shijing',
    'you': 'youmengying',
    'siwu': 'sishuwujing',
    'meng': 'mengxue',
    'na': 'nalanxingde',
    'caocao': 'caocaoshiji',
    'pzOts': 'strains',
    'zmdOtsc': 'rank'
    }
    
    pathlist = []
    for r in rlist:
        try:
            rpath = cwd + 'poetry/chinesepoetry/%s/README.md' % pathdict[r]
            pathlist.append(rpath)
        except KeyError:
            print('"%s" is not a valid instruction.' % r)
    
    return pathlist

# 打开说明文件并打印
def openintro(pathlist):
    
    for path in pathlist:
        with open(path, 'r', encoding='UTF-8') as f:
            lines = f.readlines()
        print()
        for line in lines:
            if line != '\n':
                print(line)

# 获取指定JSON文件说明
def intro():
    
    rlist = ask_intro_require()
    pathlist = intropath(rlist)
    openintro(pathlist)

# 获取指定诗人的简介
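def desc(name):
    
    # Author records store the biography under 'desc' or 'short_description';
    # return '' when the poet is not found, so zhconv.convert always gets a string
    for audict in aulist:
        if zhconv.convert(audict.get('name', ''), 'zh-cn') == name:
            return audict.get('desc') or audict.get('short_description', '')
    return ''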

# 指定诗人的诗的索引的列表
def poemx(name):
    
    poemxlist = []
    for poemdictx in range(len(poemlist)):
        poemdict = poemlist[poemdictx]
        if zhconv.convert(poemdict['author'], 'zh-cn') == name:
            poemxlist.append(poemdictx)
    return poemxlist

# 获取诗的索引的列表中不超过十首知名度最大的诗的索引的列表
def toppoemx(poemxlist):
    
    # Return at most 10 indices from poemxlist, most popular first.
    # zmdlist[i] is assumed to be the popularity (rank) record of poemlist[i].
    # sorted() builds a new list, so the caller's poemxlist is not modified.
    return sorted(poemxlist, key=lambda poemx: get_a_zmd(zmdlist[poemx]), reverse=True)[:10]

# 获取指定单个诗人简介
def poet(name):
    
    poetintro = '姓名:%s\n' % name
    
    poetintro += '简介:%s\n' % zhconv.convert(desc(name), 'zh-cn')
    
    poetintro += '代表作:\n'
    toppoemxlist = toppoemx(poemx(name))
    if 'title' in poemlist[0]:
        titlekey = 'title'
    else:
        titlekey = 'rhythmic'
    for toppoemdictx in toppoemxlist:
        toppoemdict = poemlist[toppoemdictx]
        title = zhconv.convert(toppoemdict[titlekey], 'zh-cn')
        paras = zhconv.convert('\n'.join(toppoemdict['paragraphs']), 'zh-cn')
        poetintro += '《' + title + '》\n' + paras + '\n'
    
    return poetintro

# 打印指定单个诗人简介
def ask_for_a_poet():
    
    global poemlist, aulist, zmdlist
    
    name = input('输入诗人或词人简体姓名:')
    
    d = input('''t: 唐诗
s: 宋诗
c: 宋词
input one letter: ''')
    
    if d == 't':
        dec_tang()
        dec_auOtang()
        dec_zmdOtang()
        poemlist = tanglist
        aulist = auOtanglist
        zmdlist = zmdOtanglist
    elif d == 's':
        dec_song()
        dec_auOsong()
        dec_zmdOsong()
        poemlist = songlist
        aulist = auOsonglist
        zmdlist = zmdOsonglist
    elif d == 'c':
        dec_ci()
        dec_auOci()
        dec_zmdOci()
        poemlist = cilist
        aulist = auOcilist
        zmdlist = zmdOcilist
    else:
        print('"%s" is not a valid choice.' % d)
        return
    
    print(poet(name))

# 获取所有诗人简介并保存为txt文件
def poets():
    
    global poemlist, aulist, zmdlist
    
    dec_tang()
    dec_auOtang()
    dec_zmdOtang()
    dec_song()
    dec_auOsong()
    dec_zmdOsong()
    dec_ci()
    dec_auOci()
    dec_zmdOci()
    
    now = datetime.now()
    nowymd = (now.year, now.month, now.day)
    afterfiletitle = '\n制作者:少陵野小Tommy\n数据来源:https://github.com/chinese-poetry/\n制作日期:%d/%d/%d' % nowymd
    
    start_skip = input('''"tangchaoshiren":
press enter to start or input anything to skip: ''')
    if start_skip == '':
        # 唐朝诗人
        poemlist = tanglist
        aulist = auOtanglist
        zmdlist = zmdOtanglist
        total = len(aulist)  # progress total taken from the data, not a fixed number
        # the with block closes the file even if poet() raises part-way through
        with open(cwd + 'poetry/chinesepoetry/tangchaoshiren.txt', 'w', encoding='UTF-8') as tangpoetfile:
            print('"tangchaoshiren.txt" has been created.')
            print('Writing in ... 0 / %d' % total)
            tangpoetfile.write('唐朝诗人简介' + afterfiletitle)
            for donenum, audict in enumerate(aulist, start=1):
                if 'author' in audict:
                    name = zhconv.convert(audict['author'], 'zh-cn')
                else:
                    name = zhconv.convert(audict['name'], 'zh-cn')
                tangpoetfile.write('\n\n' + poet(name))
                print('Writing in ... %d / %d' % (donenum, total))
    
    start_skip = input('''"songchaoshiren":
press enter to start or input anything to skip: ''')
    if start_skip == '':
        # 宋朝诗人
        poemlist = songlist
        aulist = auOsonglist
        zmdlist = zmdOsonglist
        total = len(aulist)  # progress total taken from the data, not a fixed number
        with open(cwd + 'poetry/chinesepoetry/songchaoshiren.txt', 'w', encoding='UTF-8') as songpoetfile:
            print('"songchaoshiren.txt" has been created.')
            print('Writing in ... 0 / %d' % total)
            songpoetfile.write('宋朝诗人简介' + afterfiletitle)
            for donenum, audict in enumerate(aulist, start=1):
                name = zhconv.convert(audict['name'], 'zh-cn')
                songpoetfile.write('\n\n' + poet(name))
                print('Writing in ... %d / %d' % (donenum, total))
    
    start_skip = input('''"songchaociren":
press enter to start or input anything to skip: ''')
    if start_skip == '':
        # 宋朝词人
        poemlist = cilist
        aulist = auOcilist
        zmdlist = zmdOcilist
        total = len(aulist)  # progress total taken from the data, not a fixed number
        with open(cwd + 'poetry/chinesepoetry/songchaociren.txt', 'w', encoding='UTF-8') as cipoetfile:
            print('"songchaociren.txt" has been created.')
            print('Writing in ... 0 / %d' % total)
            cipoetfile.write('宋朝词人简介' + afterfiletitle)
            for donenum, audict in enumerate(aulist, start=1):
                name = zhconv.convert(audict['name'], 'zh-cn')
                cipoetfile.write('\n\n' + poet(name))
                print('Writing in ... %d / %d' % (donenum, total))
    
    print('Done.')

if __name__ == '__main__':
    
    needs = input('''a: 得到全部词牌格律并保存在"poetry/chinesepoetry/ci/cipaigelv.txt"文件中
s: 得到指定单个词牌格律
d: JSON文件解码
i: 得到JSON文件说明
pa: 得到全部诗人简介并保存在"poetry/chinesepoetry/[tangchaoshiren/songchaoshiren/songchaociren].txt"文件中
ps: 得到指定单个诗人简介
separate with blanks: ''')
    needlist = needs.split()
    # Map each instruction to the function itself instead of a string passed to eval()
    need_todo_dict = {
    'a': cigelv,
    's': ask_for_a_gelv,
    'd': ask_dec,
    'i': intro,
    'pa': poets,
    'ps': ask_for_a_poet
    }
    for need in needlist:
        todo = need_todo_dict.get(need)
        if todo is None:
            print('"%s" is not a valid instruction.' % need)
        else:
            todo()

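Run this code and follow the prompts. You can either get a single poet's profile, or find the txt file containing all poet profiles of a dynasty in the corresponding folder.
Figure: Tang poet profiles (excerpt)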
