[Series Notes 1] USYD (University of Sydney) DATA1002 Grok Module 3: courseware and assignment walkthrough

Latest recommended article published on 2024-10-30 16:52:22

不二程序猿

Category: University of Sydney DATA1002. Tags: python, making a living, programmer life, experience sharing

Article link: https://blog.csdn.net/weixin_43773228/article/details/112802125

Module 3 Text processing and data cleaning

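This post covers all of Grok Module 3. It is fairly long, so please read it through patiently. The last subsection of each major section is a programming quiz that counts toward the final grade, so those quizzes are explained in detail.
Module 3 has six major chapters: 1. Introduction 2. Transforming data 3. Filtering data 4. Filtering and transforming 5. Advanced filtering and transforming 6. Alternative transforms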

Contents

Module 3 Text processing and data cleaning
Preface
Introduction
Pattern 1: Transforming data
Pattern 2: Filtering data
Pattern 3: Filtering and transforming
Summary

Preface

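Writing this took effort. Please do not plagiarize; you may quote it as long as you cite the source.
I will do my best to cover every point thoroughly. If you find mistakes or omissions, feel free to leave a comment or send a private message.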

Introduction

In this module we will learn how to process text-based data. We start by looking at how to write programs that open and read from text files.

From there, we will concentrate on two important concepts in the field of text processing: transforming and filtering. These two tasks are routinely applied in data cleaning and data mining.

The patterns for this module are:
1. Transforming data
2. Filtering data
3. Filtering and transforming
4. Advanced filtering and transforming
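
As a rough illustration of the difference between the two concepts, the sketch below applies each of them to a small in-memory list. The list contents and the variable names (words, transformed, filtered) are assumptions made for this example and are not taken from the course material.

```python
# Hypothetical sample data, chosen only for illustration.
words = ["It", "Is", "A", "Truth"]

# Transforming: every item is kept, but each one is changed.
transformed = []
for word in words:
    transformed.append(word.lower())

# Filtering: items are left unchanged, but only some of them are kept.
filtered = []
for word in words:
    if len(word) > 1:
        filtered.append(word)

print(transformed)
print(filtered)
```

Transforming keeps the number of items the same, while filtering can shrink it. The later patterns combine both steps.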

Pattern 1: Transforming data

Transforming data

Below is a text file that contains the beginning of our novel (only the first six lines are shown).

pride_and_prejudice.txt

It
Is
A
Truth
Universally
Acknowledged

We want to transform each word so that it contains only lower-case characters.

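for word in open("pride_and_prejudice.txt"):
    # Each line read from the file still ends with "\n";
    # strip it so print() does not add a blank line after every word.
    word_new = word.strip().lower()
    print(word_new)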

When you run this code, you should see the following output:

it
is
a
truth
universally
acknowledged
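
To try this without the course file at hand, here is a minimal self-contained sketch. It assumes nothing beyond the six words shown above: it writes them to a temporary file and then runs the same kind of loop over that file. The names sample_words, folder, path and lowered are hypothetical helpers for this example only.

```python
import os
import tempfile

# The six words shown above, used as stand-in file contents.
sample_words = ["It", "Is", "A", "Truth", "Universally", "Acknowledged"]

with tempfile.TemporaryDirectory() as folder:
    # Create a throwaway copy of the file, one word per line.
    path = os.path.join(folder, "pride_and_prejudice.txt")
    with open(path, "w") as out_file:
        out_file.write("\n".join(sample_words) + "\n")

    # Read the file back line by line and lower-case each word.
    lowered = []
    with open(path) as in_file:
        for word in in_file:
            lowered.append(word.strip().lower())

for word in lowered:
    print(word)
```

Collecting the results in a list before printing is a choice made for this sketch; printing inside the loop works just as well.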

Breaking it down

The first line of the example program starts a loop that runs through each line of the file. This is a standard pattern when working with files.

for word in open("pride_and_prejudice.txt"):

The loop variable (here, word) plays a special role in the for statement: each line of the file is assigned to it in turn.
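
To see what the loop variable actually holds on each pass, the sketch below prints it with repr(). It uses io.StringIO as a stand-in for an opened text file, which is an assumption made so the example runs without any file on disk; the names fake_file and seen are hypothetical.

```python
import io

# io.StringIO behaves like an open text file for the purposes of this loop.
fake_file = io.StringIO("It\nIs\nA\n")

seen = []
for word in fake_file:
    # On each pass, word is the next line, including its trailing newline.
    seen.append(word)
    print(repr(word))
```

The repr() output makes the trailing "\n" on each line visible, which is worth keeping in mind whenever a line is printed or compared with another string.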