[Series Notes 1] USYD (University of Sydney) DATA1002 Grok Module 3: Course Material and Assignment Walkthrough

Module 3 Text processing and data cleaning

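This post covers all of Grok Module 3. It is fairly long, so please be patient and read to the end. The last subsection of each chapter is a programming quiz that counts toward the final grade, so those quizzes are explained in particular detail.
Module 3 has six chapters: 1. Introduction 2. Transforming data 3. Filtering data 4. Filtering and transforming 5. Advanced filtering and transforming 6. Alternative transforms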



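Foreword

Writing these notes takes effort. Please do not plagiarize; quoting is welcome as long as you cite the source.
I will do my best to cover every topic thoroughly. If you spot mistakes or omissions, feel free to leave a comment or send a private message.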


Introduction

In this module we will learn how to process text-based data. We start by looking at how to write programs that open and read from text files.


From there, we will concentrate on two important concepts in text processing: transforming and filtering. Both tasks are routinely applied in data cleaning and data mining.
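To make the two terms concrete, here is a minimal sketch that applies each one to a small list of words. The word list and the "longer than two characters" filter rule are assumptions chosen for illustration only; they are not taken from the course material.

```python
# Illustrative data only: a handful of words held in a list.
words = ["It", "Is", "A", "Truth", "Universally", "Acknowledged"]

# Transforming: every item is kept, but each one is changed.
lowered = []
for word in words:
    lowered.append(word.lower())

# Filtering: items are left unchanged, but only some are kept.
# The length rule here is a hypothetical example criterion.
long_words = []
for word in words:
    if len(word) > 2:
        long_words.append(word)

print(lowered)
print(long_words)
```

A transform keeps the number of items the same, while a filter can only keep that number the same or reduce it.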

The patterns for this module are:
1. Transforming data
2. Filtering data
3. Filtering and transforming
4. Advanced filtering and transforming



Pattern 1: Transforming data

Transforming data

Below is a text file that contains the beginning of our novel, one word per line (only the first six lines are shown).

pride_and_prejudice.txt

It
Is
A
Truth
Universally
Acknowledged

We want to transform each word so that it contains only lowercase characters.

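for word in open("pride_and_prejudice.txt"):
    # Transform: convert the current line to lowercase.
    word_new = word.lower()
    # Each line read from the file still ends with "\n"; end="" avoids printing a second newline.
    print(word_new, end="")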

When you run this code, you should see the following output:

it
is
a
truth
universally
acknowledged
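For readers who do not have the file at hand, the following is a self-contained sketch of the same transform. It first writes the six sample words to a temporary file, then reads them back and lowercases them. The helper name `lowercase_lines` and the use of a temporary directory are assumptions made for this illustration, not part of the course code.

```python
import os
import tempfile


def lowercase_lines(path):
    """Return every line of the file at `path`, lowercased, without its newline."""
    result = []
    with open(path) as f:
        for word in f:
            # Drop the trailing newline that each line carries, then transform.
            result.append(word.rstrip("\n").lower())
    return result


# Write the six sample words to a temporary file so the sketch runs anywhere.
sample = "It\nIs\nA\nTruth\nUniversally\nAcknowledged\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pride_and_prejudice.txt")
    with open(path, "w") as f:
        f.write(sample)
    lowered = lowercase_lines(path)

for word in lowered:
    print(word)
```

Collecting the results in a list rather than printing inside the loop makes the transformed words easy to reuse or check afterwards.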


Breaking it down

The first line of our example program starts a loop that runs through each line of the file. This is the standard syntax for working with files.

for word in open("pride_and_prejudice.txt"):

The loop variable plays a special role in the for statement: each line of the file is assigned to this variable in turn.
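The small sketch below shows what the loop variable holds on each pass. It uses `io.StringIO` as a stand-in for the opened file (an assumption made so the example runs without any file on disk); iterating over it yields one line at a time, just as iterating over an open text file does.

```python
import io

# Stand-in for open("pride_and_prejudice.txt"), holding only the first three lines.
fake_file = io.StringIO("It\nIs\nA\n")

seen = []
for word in fake_file:
    # On each pass, `word` is bound to the next line, including its trailing newline.
    seen.append(word)
    print(repr(word))
```

Printing with `repr` makes the trailing newline character visible, which is easy to miss when the line is printed normally.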


Transform each wo
