
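Original ValueError: invalid literal for int() with base 10: '2021-04-05T00:00:00.000+00:00'

After some haggling with GPT-3.5, it finally gave the correct code, shown below. For extracting the time from a timestamp like the one in the title, the code given in the book is.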

2024-04-29 22:03:52 96
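
The error in the post title above can be reproduced and avoided roughly as follows. This is a minimal sketch, not the code from the post or from the book: it assumes Python 3.7 or later and that the goal is to read the year, month, and day out of the timestamp. The variable names are illustrative.

```python
from datetime import datetime

raw = "2021-04-05T00:00:00.000+00:00"

# int() only accepts plain integer literals, so passing the whole
# timestamp raises the ValueError quoted in the post title.
try:
    int(raw)
except ValueError as err:
    error_message = str(err)

# Parse the ISO 8601 string first, then read integer fields from the result.
parsed = datetime.fromisoformat(raw)
year, month, day = parsed.year, parsed.month, parsed.day
```

Parsing before converting keeps the timezone offset available on `parsed`, which slicing the string by hand would throw away.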

Original Learning NLP by Copy and Paste

A liberal arts student learns NLP

2024-04-07 07:51:21 700

Original Quick Reference for Text-Cleaning Code (copy and paste series)

Quick reference for text-cleaning code (copy and paste series). Remove punctuation: text = re.sub(r"[\s+.!/_,$%^(+\"']+|[+——!,。?、~@#¥%……&()]+", "", text)

2020-02-10 14:01:03 406
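
The punctuation-removal one-liner in the entry above can be wrapped in a small helper. This is a sketch under assumptions: the character classes below are an illustrative choice covering common ASCII and full-width Chinese punctuation, not a faithful copy of the post's pattern, and `strip_punctuation` is a hypothetical name.

```python
import re

# Assumed character set: whitespace plus common ASCII punctuation in the
# first class, full-width (Chinese) punctuation in the second.
PUNCT_RE = re.compile(r"[\s+.!/_,$%^*()\"'?]+|[——!,。?、~@#¥%……&()]+")

def strip_punctuation(text):
    """Delete every run of whitespace or punctuation matched by PUNCT_RE."""
    return PUNCT_RE.sub("", text)

cleaned = strip_punctuation("Hello, world! 你好,世界。")
```

Because the pattern also matches whitespace, English words get glued together; drop `\s` from the first class if word boundaries need to survive.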

Original The Definitive Guide to Web Scraping (chap1-2) (Web Scraping with Python, 2e, by Ryan Mitchell)

Chapter 1 Beginning to Scrape
from urllib.request import urlopen
html = urlopen('http://www.chinadaily.com.cn/a/202002/07/WS5e3c81dea310128217275978.html')
print(html.read())
from urllib.request im...

2020-02-07 11:33:21 423

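Repost Chinese Sentiment Analysis Data

A comprehensive collection of sentiment analysis resources (corpora, lexicons, word embeddings, code) Original ...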

2020-02-06 20:54:01 4856

Repost Chinese Datasets

A roundup of Chinese NLP corpora: news text...

2020-02-06 20:40:57 1444

Original Moving the Blog

Getting ready to haul my blogs from various sites over to this one. Tremble!

2020-02-06 20:23:33 87

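Repost The Most Complete Roundup of Dataset Websites Ever

If you are a beginner, your skills improve greatly with every new project you finish; if you are an experienced data science expert, you already know the value here. This article gives you a list of websites and resources whose data you can use for your own data projects, or even to build your own products. 1. How to use these resources? There is no limit on how these data sources can be used; application and use are limited only by...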

2020-02-06 19:53:02 900

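Original (Forwarded) Free Dataset Downloads (continuously updated...)

Just found this site, noting it down: https://blog.csdn.net/alec1987/article/details/69388699
Natural language processing
RCV1: English news data
20news: English news data
First Quora Release Question Pairs: Q&A data
JRC Names: proper entity names in various languages
Multi-Domain Sentiment V2.0
LETOR: information retrieval...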

2020-02-06 19:33:04 518

Original Recipe 6-3. Next Word Prediction

In this section, we will build an LSTM model to learn sequences of words from email data. We will use this model to predict the next word.
file_content = pd.read_csv('spam.csv', encoding = "ISO-8859-1...

2020-02-06 11:17:20 536

Original Recipe 6-2. Classifying Text with Deep Learning

From Unlocking Text Data with Machine Learning and Deep Learning Using Python.
Problem: We want to build a text classification model using CNN, RNN, and LSTM.
Solution: The approach and NLP pipeline woul...

2020-02-06 10:52:41 243

Repost CHAPTER 6 Deep Learning for NLP

In this chapter, we will implement deep learning for NLP:
Recipe 1. Information retrieval using deep learning
Recipe 2. Text classification using CNN, RNN, LSTM
Recipe 3. Predicting the next word/sequ...

2020-02-06 08:57:40 259

Original Step 6-3 Query enhancement/expansion

It is very important to understand the possible synonyms of the entities to make sure search results do not miss out on potential relevance. Say, for example, men’s shoes can also be called as male sh...

2020-02-06 08:53:34 161

Original Recipe 5-5. Clustering Documents

Document clustering yet again includes similar steps, so let’s have a look at them:
Tokenization
Stemming and lemmatization
Removing stop words and punctuation
Computing term frequencies or TF-IDF
Cl...

2020-02-06 08:22:08 213
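
The preprocessing steps listed in the entry above can be sketched in plain Python. This is an illustration under stated assumptions, not the recipe's code: stemming and lemmatization are left out, the stop-word list is a tiny made-up one, the weighting is the basic tf * log(N / df) form, and the names `tokenize` and `tf_idf` are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real pipelines use a fuller one).
STOP_WORDS = {"the", "a", "is", "on", "and", "of"}

def tokenize(doc):
    """Lowercase, keep alphabetic runs only (drops punctuation), remove stop words."""
    return [t for t in re.findall(r"[a-z]+", doc.lower()) if t not in STOP_WORDS]

def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights

docs = ["The cat sat on the mat.", "The dog sat on the log.", "Cats and dogs."]
weights = tf_idf(docs)
```

Terms that occur in fewer documents get larger weights, so `cat` outweighs `sat` in the first document. The resulting vectors are what a clustering step would consume.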

Repost Text summarization: TextRank and feature-based

Import BeautifulSoup and urllib libraries to fetch data from Wikipedia.
from bs4 import BeautifulSoup
from urllib.request import urlopen
Function to get data from Wikipedia
def get_only_text(url):
pag...

2020-02-06 07:54:38 374

Original Text summarization

Text summarization is the process of making large documents into smaller ones without losing the context, which eventually saves readers time. This can be done using different techniques like the foll...

2020-02-05 22:08:43 150

Original text processing

Import libraries
from nltk.corpus import stopwords
from textblob import TextBlob
from textblob import Word
Lower casing and removing punctuations
df['Text'] = df['Text'].apply(lambda x: " ".join(x.low...

2020-02-05 20:42:39 367

Repost CHAPTER 4 Advanced Natural Language Processing

CHAPTER 4 Advanced Natural Language Processing
https://doi.org/10.1007/978-1-4842-4267-4_4
In this chapter, we are going to cover various advanced NLP techniques and leverage machine learning algori...

2020-02-04 16:30:32 125

Original fasttext for word nlp

from gensim.models import FastText
from sklearn.decomposition import PCA
from matplotlib import pyplot
#Example sentences
sentences = [['I', 'love', 'nlp'],['I', 'will', 'learn', 'nlp', 'in', '2','mo...

2020-02-04 16:23:37 171

Original Learning to Use It, Little by Little

Unlocking Text Data with Machine Learning & Deep Learning Using Python: only a few lines for now, more later when I am more familiar with this shit. But to train these models, it requires a huge...

2020-02-04 16:19:50 229

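Copied and ran the TED Talk project from Kaggle; everything except the last few parts is done

On my own computer, I copied and ran the TED Talk project from Kaggle. The last few parts did not run, because I cannot reach sites outside the firewall and my machine's specs fall short. Everything else (basic descriptive analysis of the data, simple visualizations, and so on) is done. Noting it here for the record.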

2024-05-01
