This is a small Python project that uses machine learning to detect whether a given text is spam or ham (not spam). I made it after completing the “Applied Text Mining in Python” course by the University of Michigan on Coursera; the link to the course is given at the end of the blog. Here, I’ll try to explain each step in my code, and if you want the whole code, the GitHub link is also available at the end.
I used my local machine for this project; its specification is listed below. That said, it's a very small project, so you won't need much computing power for it.
Specification
Name: Acer Predator Helios 300 (2019)
Graphics Card: NVIDIA GeForce GTX 1660 Ti
Processor Name: Intel Core i7-9750H
RAM: 16 GB
Importing The Libraries
I used Pandas and NumPy for data manipulation, matplotlib for graph plotting, and sklearn for preprocessing, model creation, and model evaluation. I’ll explain what each library does as I use it in the code.
P.S.: %matplotlib notebook is a Jupyter Notebook magic function.
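For readers following along without the notebook, the imports described above might look roughly like this. It is only a sketch: the specific sklearn classes (CountVectorizer, train_test_split, MultinomialNB, accuracy_score) are assumptions picked to illustrate "preprocessing, model creation and model evaluation", and may not be the exact ones used in the project.

```python
# Data manipulation
import numpy as np
import pandas as pd

# Graph plotting. In a Jupyter notebook, the magic "%matplotlib notebook"
# would sit on its own line before this import. It is kept as a comment
# here because magics are not valid syntax in a plain Python script.
import matplotlib.pyplot as plt

# sklearn pieces. NOTE: these particular classes are assumptions chosen
# for illustration, not necessarily what the project uses.
from sklearn.feature_extraction.text import CountVectorizer  # preprocessing
from sklearn.model_selection import train_test_split         # data splitting
from sklearn.naive_bayes import MultinomialNB                # model creation
from sklearn.metrics import accuracy_score                   # model evaluation
```

The np, pd and plt aliases are the usual community conventions, so code copied from other tutorials should work with them unchanged.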
Analysing The Data
Using the df.head(), df being the Pandas DataFrame o