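Notes 1: database connections, text correction with pycorrector, three Python output formats, turning points, Traditional-to-Simplified Chinese conversion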

Latest recommended article published on 2024-03-30 14:46:45

weixin_44115638


Tags: python, programming languages

Article link: https://blog.csdn.net/weixin_44115638/article/details/136329446


Connecting to a database and fetching data

MySQL and Oracle databases

import pymysql
# import oracledb

# Open a connection to the database
conn = pymysql.connect(host='localhost', port=3306, user='root', password='password', database='mydatabase')
# conn = oracledb.connect(user='root', password='password', dsn="localhost:1521/mydatabase")
cursor = conn.cursor()

# Run a query
sql = "SELECT * FROM mytable"
cursor.execute(sql)

# fetchall() returns every row of a SELECT; after an UPDATE or DELETE, call conn.commit() instead
result = cursor.fetchall()
# conn.commit()
print(result)

# Close the cursor, then the connection
cursor.close()
conn.close()
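
The connect / cursor / execute / fetchall-or-commit / close sequence above is the standard Python DB-API pattern, so it can be tried without a database server. The following is a minimal sketch using the standard-library sqlite3 module in place of pymysql; the table contents and the helper name fetch_adults are made up for illustration. It also passes the filter value as a query parameter rather than building the SQL string by hand. Note that sqlite3 uses ? as its placeholder, whereas pymysql uses %s.

```python
import sqlite3


def fetch_adults(conn, min_age):
    """Return (name, age) rows with age >= min_age, using a parameterized query."""
    cursor = conn.cursor()
    try:
        # The value is passed separately, so the driver handles quoting
        cursor.execute(
            "SELECT name, age FROM mytable WHERE age >= ? ORDER BY name",
            (min_age,),
        )
        return cursor.fetchall()
    finally:
        cursor.close()


# In-memory database standing in for the MySQL server (assumption for this sketch)
conn = sqlite3.connect(":memory:")
try:
    conn.execute("CREATE TABLE mytable (name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?)",
        [("Lily", 18), ("Tom", 15)],
    )
    conn.commit()  # writes (INSERT/UPDATE/DELETE) are made permanent by commit()

    rows = fetch_adults(conn, 18)
    print(rows)
finally:
    conn.close()  # runs even if a query above raised an exception
```

Wrapping the work in try/finally means the connection is released even when a query fails, which the straight-line version does not guarantee.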

Chinese text correction with PyCorrector

Prerequisites for using PyCorrector:

Install VC (Visual C++), and remember to select the "Desktop development with C++" component.


The transformers package also has to be upgraded to the latest version:

pip install --upgrade transformers

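T5 model: the project implements a T5 model for Chinese text correction in PyTorch, fine-tuning the Langboat/mengzi-t5-base pretrained model on a Chinese correction dataset. It leaves a lot of room for adapting the model, and the results are good.
Kenlm model: the project trains a Chinese n-gram language model with the Kenlm statistical language modeling toolkit. Combined with rule-based methods and a confusion set, it can correct Chinese spelling errors. It is fast and easy to extend, but the results are only average.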

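pycorrector is a text correction tool that ships many ready-made classes. Its __init__.py is reproduced below; you can open each class, find its correction method (usually named correct), and call it. For a tutorial, see the pycorrector home page.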

%% __init__.py
from pycorrector.confusion_corrector import ConfusionCorrector
from pycorrector.corrector import Corrector
from pycorrector.deepcontext.deepcontext_corrector import DeepContextCorrector
from pycorrector.detector import Detector
from pycorrector.detector import USER_DATA_DIR
from pycorrector.en_spell_corrector import EnSpellCorrector
from pycorrector.ernie_csc.ernie_csc_corrector import ErnieCscCorrector
from pycorrector.gpt.gpt_corrector import GptCorrector
from pycorrector.macbert.macbert_corrector import MacBertCorrector
from pycorrector.proper_corrector import ProperCorrector
from pycorrector.seq2seq.conv_seq2seq_corrector import ConvSeq2SeqCorrector
from pycorrector.t5.t5_corrector import T5Corrector
from pycorrector.utils import text_utils, tokenizer, io_utils, math_utils, evaluate_utils
from pycorrector.utils.evaluate_utils import eval_sighan2015_by_model_batch, eval_sighan2015_by_model
from pycorrector.utils.get_file import get_file
from pycorrector.utils.text_utils import (
    get_homophones_by_char,
    get_homophones_by_pinyin,
    traditional2simplified,
    simplified2traditional,
)
from pycorrector.version import __version__

%% test
import sys
# ProperCorrector corrects idioms and proper nouns
sys.path.append("..")
from pycorrector.proper_corrector import ProperCorrector

m = ProperCorrector()
x = [
    '报应接中迩来',
    '今天在拼哆哆上买了点苹果',
]

for i in x:
    print(i, ' -> ', m.correct(i))

# output:
报应接中迩来  ->  {'source': '报应接踵而来', 'target': '报应接踵而来', 'errors': [('接中迩来', '接踵而来', 2)]}
今天在拼哆哆上买了点苹果  ->  {'source': '今天在拼多多上买了点苹果', 'target': '今天在拼多多上买了点苹果', 'errors': [('拼哆哆', '拼多多', 3)]}
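
To make the shape of these result dictionaries concrete (a source string, a target string, and a list of (wrong, right, position) tuples), here is a toy confusion-dictionary corrector in plain Python. This is only an illustrative sketch: the function name correct_with_confusion and the one-entry dictionary are hypothetical, and it is not how pycorrector implements ProperCorrector.

```python
def correct_with_confusion(text, confusion):
    """Replace every known wrong phrase and report what was changed.

    confusion maps a wrong phrase to its correction. Positions are offsets
    in the progressively corrected text; they match the source text when
    each wrong phrase and its correction have the same length.
    """
    target = text
    errors = []
    for wrong, right in confusion.items():
        start = target.find(wrong)
        while start != -1:
            errors.append((wrong, right, start))
            target = target[:start] + right + target[start + len(wrong):]
            # Continue searching after the replacement to avoid re-matching it
            start = target.find(wrong, start + len(right))
    errors.sort(key=lambda item: item[2])
    return {"source": text, "target": target, "errors": errors}


# Hypothetical one-entry confusion dictionary
confusion = {"拼哆哆": "拼多多"}
result = correct_with_confusion("今天在拼哆哆上买了点苹果", confusion)
print(result)
```

A lookup like this only catches phrases that are already listed, which is why the library combines it with language models for errors it has not seen before.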

# T5 correction model
from pycorrector.t5.t5_corrector import T5Corrector
nlp = T5Corrector("shibing624/mengzi-t5-base-chinese-correction")
i = "今天新情很好"
print(i, ' => ', nlp.correct(i))

# output:
今天新情很好  =>  {'source': '今天新情很好', 'target': '今天心情很好', 'errors': [('新', '心', 2)]}

Three ways to format output

Basic output (% formatting)

# Float output
num = 3.1415926
print("%.1f" % num)  # keep 1 decimal place

# Integer output
num = 123
print("%d" % num)

# String output
print('My name is %s,my age is %d' % ("Lily", 18))

Formatted output with str.format()

# Date and time output
import datetime
d = datetime.datetime(2022, 4, 29, 9, 52, 20)
print('{:%Y-%m-%d %H:%M:%S}'.format(d))

# String output
print('My name is {},my age is {}'.format('Lily', 18))

f-string formatted output

name = 'Lily'
age = 18
print(f"My name is {name},my age is {age}")
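
The three styles accept the same format specifiers (such as .1f or d), so the same line can be produced with any of them. A small side-by-side sketch follows; the function name describe and the sample values are chosen here purely for illustration.

```python
import datetime


def describe(name, age, score, when):
    """Build the same sentence with %, str.format() and an f-string."""
    percent_style = "%s is %d, score %.1f" % (name, age, score)
    format_style = "{} is {:d}, score {:.1f}".format(name, age, score)
    fstring_style = f"{name} is {age:d}, score {score:.1f}"
    # datetime objects take strftime codes directly in the format spec
    stamp = f"{when:%Y-%m-%d %H:%M:%S}"
    return percent_style, format_style, fstring_style, stamp


results = describe("Lily", 18, 3.1415926, datetime.datetime(2022, 4, 29, 9, 52, 20))
for line in results:
    print(line)
```

The first three strings come out identical; f-strings are usually the most readable choice on Python 3.6 and later, because the values sit next to their specifiers.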

Learning seaborn

Turning point analysis (points where the first derivative changes sign):

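import numpy as np

# Sample curve data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Numerical first derivative of y with respect to x
dy = np.gradient(y, x)

# Nonzero wherever the sign of the derivative differs between neighbouring samples
changes = np.diff(np.sign(dy))

# Indices of the samples just before each sign change
turning_points = np.where(changes)[0]

# Print the coordinates of each turning point
for idx in turning_points:
    print(f"Turning point: ({x[idx]}, {y[idx]})")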
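
The code above looks for sign changes in the first derivative, which marks local maxima and minima. If what is wanted is an inflection point in the strict calculus sense (where the curve switches between concave and convex), the same idea can be applied to the second derivative. Below is a minimal sketch under that assumption; the helper name sign_change_indices and the finer grid of 1001 samples are choices made for this example.

```python
import numpy as np


def sign_change_indices(values):
    """Return indices i where values[i] and values[i + 1] differ in sign."""
    return np.where(np.diff(np.sign(values)) != 0)[0]


# Same curve as before, sampled more finely so the estimates are tighter
x = np.linspace(0, 10, 1001)
y = np.sin(x)

# Second derivative, estimated by differentiating numerically twice
d2y = np.gradient(np.gradient(y, x), x)

# Samples just before the second derivative changes sign
inflection_x = x[sign_change_indices(d2y)]

for value in inflection_x:
    print(f"Inflection point near x = {value:.2f}")
```

For sin(x) the second derivative is -sin(x), so the detected points should sit next to the multiples of pi inside the interval. Each reported x is the grid sample just before the sign change, so its accuracy is limited by the sample spacing.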

Converting Traditional Chinese to Simplified Chinese

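import opencc

converter = opencc.OpenCC('t2s.json')  # use 's2t.json' for Simplified to Traditional
text = '繁體中文'  # sample Traditional Chinese input
simplified_result = converter.convert(text)
print(simplified_result)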
