Python Markov chains: a third-order Markov chain for natural language processing in Python

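1. Introduction:

During training, every three consecutive words are treated as one unit. Each three-word chunk is a state, and the model records the three words that follow it.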
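Here is a minimal sketch of what "three words as one unit" means. The helper name `chunk_pairs` is hypothetical and does not come from the original code. It slides a three-word window over the text and pairs each window with the up-to-three words that follow it. The sample sentence is the one that appears commented out in the code further down.

```python
def chunk_pairs(text):
    """Pair every three-word window with the (up to) three words after it.

    Hypothetical helper for illustration only.
    """
    words = text.lower().split()
    pairs = []
    # Only keep windows that are followed by at least one word
    for i in range(len(words) - 3):
        window = " ".join(words[i:i + 3])
        following = " ".join(words[i + 3:i + 6])  # 3 words, fewer near the end
        pairs.append((window, following))
    return pairs


for window, following in chunk_pairs("my friend makes the best raspberry pies"):
    print(f"{window!r} -> {following!r}")
```

Near the end of the text the continuation shrinks to two words and then one. That is why the model below needs several possible endings.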

An example:

input:

my dream is that I can be an engineer, so I design more applications for people to use.

my dream is that I can be a bird, so I can fly to everywhere I want.

it is also my dream that I can be a house, so I can warm you in the cold winter.

The resulting Markov chain:

{'START': ['my dream is'], 'my dream is': ['that i can'], 'dream is that': ['i can be'], 'is that i': ['can be a'], 'that i can': ['be a house,'], 'i can be': ['a house, so'], 'can be an': ['engineer, so i'], 'be an engineer,': ['so i design'], 'an engineer, so': ['i design more'], 'engineer, so i': ['design more applications'], 'so i design': ['more applications for'], 'i design more': ['applications for people'], 'design more applications': ['for people to'], 'more applications for': ['people to use.\nmy'], 'applications for people': ['to use.\nmy dream'], 'for people to': ['use.\nmy dream is'], 'people to use.\nmy': ['dream is that'], 'to use.\nmy dream': ['is that i'], 'use.\nmy dream is': ['that i can'], 'can be a': ['house, so i'], 'be a bird,': ['so i can'], 'a bird, so': ['i can fly'], 'bird, so i': ['can fly to'], 'so i can': ['warm you in'], 'i can fly': ['to everywhere i'], 'can fly to': ['everywhere i want.\nit'], 'fly to everywhere': ['i want.\nit is'], 'to everywhere i': ['want.\nit is also'], 'everywhere i want.\nit': ['is also my'], 'i want.\nit is': ['also my dream'], 'want.\nit is also': ['my dream that'], 'is also my': ['dream that i'], 'also my dream': ['that i can'], 'my dream that': ['i can be'], 'dream that i': ['can be a'], 'be a house,': ['so i can'], 'a house, so': ['i can warm'], 'house, so i': ['can warm you'], 'i can warm': ['you in the'], 'can warm you': ['in the cold'], 'warm you in': ['the cold winter.'], 'you in the': ['cold winter.'], 'in the cold': ['winter.'], 'END': ['the cold winter.', 'winter.', 'cold winter.']}
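Keys such as 'use.\nmy' in the chain above contain a line break. Splitting on the space character alone (`split(' ')`) leaves each newline glued to its neighbouring words. `split()` with no argument splits on any run of whitespace instead. The difference, shown on two lines of the sample input:

```python
# Two lines of the sample input, joined by a newline as they are in the file
text = "people to use.\nmy dream is"

# Splitting on ' ' keeps the newline inside a token
print(text.split(' '))  # ['people', 'to', 'use.\nmy', 'dream', 'is']

# Splitting on any whitespace separates the two lines cleanly
print(text.split())     # ['people', 'to', 'use.', 'my', 'dream', 'is']
```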

Generated text:

my dream is that i can be a house, so i can warm you in the cold winter.
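Every state in the chain above has exactly one successor, so the walk from START is fully determined and always produces this sentence. When a state's list holds several entries (duplicates included), `random.choice` picks each distinct continuation with probability proportional to how often it was recorded. The sketch below makes those probabilities explicit with `collections.Counter`. The helper `transition_probs` is hypothetical, and the toy dictionary is made-up data in the same shape as the chain.

```python
from collections import Counter
from fractions import Fraction


def transition_probs(successors):
    """Turn a successor list into {next_chunk: probability}.

    random.choice on the list samples each distinct entry with exactly
    these probabilities, because duplicates are kept in the list.
    """
    counts = Counter(successors)
    total = len(successors)
    return {chunk: Fraction(c, total) for chunk, c in counts.items()}


# Toy model (made-up data) in the same shape as the chain above
toy_model = {
    'START': ['my dream is'],
    'that i can': ['be an engineer,', 'be a bird,', 'be a bird,'],
}

for state, successors in toy_model.items():
    print(state, transition_probs(successors))
```

Keeping duplicates in a plain list is the simplest way to encode the transition counts. A `Counter` per state stores the same information more compactly.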

Code:

# I separate input.txt on whitespace and use a dictionary to store, for every three consecutive words, the three words that follow them.
# At the same time, I store the first three words as the beginning (START), and the last three, two or one words as possible endings (END).
# How output.txt is generated: from START, repeatedly pick one of the stored continuations at random; once a chunk is one of the END entries, generation stops.

import random

# Read the training text
with open("E:\\a2.txt", 'r', encoding='UTF-8') as fhand:
    dataset_file = fhand.read()

# dataset_file = 'my friend makes the best raspberry pies'

# Lower-case the text and split it on any whitespace (spaces and line breaks)
dataset_file = dataset_file.lower().split()

n = len(dataset_file)
model = {}

for i in range(n - 2):
    # The current three-word chunk, e.g. 'my dream is'
    key = " ".join(dataset_file[i:i + 3])
    if i == 0:
        model['START'] = model.get('START', []) + [key]
    if i == n - 3:
        # Last chunk: record the final three words, the last word and the last two words as endings
        model['END'] = model.get('END', []) + [key, dataset_file[i + 2], " ".join(dataset_file[i + 1:i + 3])]
    else:
        # Append the (up to) three following words to this chunk's own successor list
        model[key] = model.get(key, []) + [" ".join(dataset_file[i + 3:i + 6])]

print(model)

generated = []
while True:
    if not generated:
        words = model['START']
    elif generated[-1] in model['END']:
        break
    else:
        words = model[generated[-1]]
    generated.append(random.choice(words))

# Append the generated text to the output file and echo it to the console
with open("E:\\output.txt", 'a', encoding='UTF-8') as fhand:
    for word in generated:
        fhand.write(word + " ")
        print(word, end=' ')
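The code above advances three words at a time. A more common formulation of a third-order Markov chain uses the previous three words as the state and predicts one word at a time. Below is a minimal sketch of that variant, assuming whitespace tokenization. The names `build_model` and `generate`, the seeded `random.Random`, and the `max_words` cap are illustrative choices, not part of the original code.

```python
import random
from collections import defaultdict


def build_model(text):
    """Map each three-word context (a tuple) to the list of words seen after it."""
    words = text.lower().split()
    model = defaultdict(list)
    for i in range(len(words) - 3):
        context = tuple(words[i:i + 3])
        model[context].append(words[i + 3])
    return model, tuple(words[:3])


def generate(model, start, rng, max_words=50):
    """Random walk from `start`, one word per step.

    Stops when the current context was never followed by a word in the
    training text, or after max_words words as a safety cap against loops.
    """
    output = list(start)
    while len(output) < max_words:
        context = tuple(output[-3:])
        if context not in model:  # membership test does not create a key
            break
        output.append(rng.choice(model[context]))
    return " ".join(output)


text = ("my dream is that i can be an engineer, so i design more applications for people to use.\n"
        "my dream is that i can be a bird, so i can fly to everywhere i want.\n"
        "it is also my dream that i can be a house, so i can warm you in the cold winter.")
model, start = build_model(text)
print(generate(model, start, random.Random(0)))
```

Moving one word at a time lets sentences branch at any shared three-word context, such as "that i can". Passing a seeded `random.Random` makes a run reproducible.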
