i2ds: Data Import Notes

零级伪码农

Category: Notes. Tags: R language, data analysis

Article link: https://blog.csdn.net/weixin_46585008/article/details/109751645

Contents

Introduction
Paths and the working directory
The readr and readxl packages
- readr
- readxl
Exercise
Downloading files
R-base importing functions
Text versus binary files
Unicode versus ASCII
Organizing data with spreadsheets

Introduction

Currently, one of the most common ways of storing and sharing data for analysis is through electronic spreadsheets. A spreadsheet stores data in rows and columns. It is basically a file version of a data frame.
When a spreadsheet is stored as a text file, like one created with a simple text editor, each new row starts on a new line (a return) and columns are separated by a predefined special character. The most common separators are the comma (,), semicolon (;), space ( ), and tab (written \t and displayed as a preset number of spaces).
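As a small illustration of the separators described above, the sketch below writes the same toy table to two temporary files, one comma-separated and one tab-separated, and reads each back with base R. The column names and values are made up for the example.

```r
# Hypothetical toy table; the values are for illustration only.
rows <- list(c("state", "abb"), c("Alabama", "AL"), c("Alaska", "AK"))

# Write the same rows with two different column separators.
csv_file <- tempfile(fileext = ".csv")
tsv_file <- tempfile(fileext = ".tsv")
writeLines(sapply(rows, paste, collapse = ","), csv_file)
writeLines(sapply(rows, paste, collapse = "\t"), tsv_file)

# The raw text shows one line per row, with the separator between columns.
print(readLines(csv_file))

# read.csv assumes commas; read.delim assumes tabs. Both expect a header row.
from_csv <- read.csv(csv_file)
from_tsv <- read.delim(tsv_file)

# Both files describe the same table, so the two data frames should match.
print(identical(from_csv, from_tsv))
```

The point of the comparison is that the separator is only a storage detail: once the right reading function is chosen, the resulting data frame is the same.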
The first row usually contains column names rather than data. We call this a header, and when we read in data from a spreadsheet it is important to know whether the file has one. Most reading functions assume there is a header. To find out, it helps to look at the file before trying to read it. This can be done with a text editor or with RStudio. In RStudio, we can either open the file in the editor or navigate to the file location, double-click on the file, and choose View File.
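Looking at the first lines can also be done from code. The sketch below uses a made-up temporary file (not the real data) to show how base R's readLines can peek at the top of a file, and what happens when the header assumption is switched on or off in read.csv.

```r
# Hypothetical file standing in for a spreadsheet saved as text.
path <- tempfile(fileext = ".csv")
writeLines(c("state,population", "Alabama,4779736", "Alaska,710231"), path)

# Look at the first lines before importing to see whether there is a header.
first_lines <- readLines(path, n = 2)
print(first_lines)

# With header = TRUE (the read.csv default) the first row becomes column names.
with_header <- read.csv(path, header = TRUE)

# With header = FALSE the same row is treated as data and columns are named V1, V2.
without_header <- read.csv(path, header = FALSE)

print(names(with_header))
print(names(without_header))
```

Reading a file that has a header with header = FALSE adds one spurious data row, which is why checking first is worthwhile.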
Not all spreadsheets are stored as plain text. Google Sheets documents and Microsoft Excel files are two examples, and they can't be viewed with a text editor.

Paths and the working directory

A spreadsheet containing the US murders data is included as part of the dslabs package. Finding this file is not straightforward, but the following lines of code copy the file to the folder in which R looks by default. We explain how these lines work below.

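filename <- "murders.csv"   # file name
dir <- system.file("extdata", package = "dslabs")   # path to the extdata folder inside the dslabs package
fullpath <- file.path(dir, filename)    # full path to the file
file.copy(fullpath, "murders.csv")    # copy the file into the current working directory under the same name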
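One detail worth knowing about file.copy is that it returns TRUE or FALSE to report whether the copy happened, and by default it does not overwrite an existing destination. The sketch below shows this with temporary files only; the file names are made up and nothing from dslabs is touched.

```r
# Hypothetical source file and destination inside a scratch directory.
src <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2"), src)
dest <- file.path(tempdir(), "copy-demo.csv")
unlink(dest)  # make sure the destination does not exist yet

first_try  <- file.copy(src, dest)                    # destination is new, so the file is copied
second_try <- file.copy(src, dest)                    # destination exists, so nothing is copied
forced     <- file.copy(src, dest, overwrite = TRUE)  # explicitly replace the existing file

print(c(first_try, second_try, forced))
print(file.exists(dest))
```

So if the copy of murders.csv is run a second time in the same folder, a FALSE result most likely just means the file is already there.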

The file.copy call does not read the data into R; it only copies the file. But once the file is copied, we can import the data with a simple line of code. Here we use the read_csv function from the readr package, which is part of the tidyverse.

library(tidyverse)
dat <- read_csv(filename)

-- Column specification ------
cols(
  state = col_character(),
  abb = col_character(),
  region = col_character(),
  population = col_double(),
  total = col_double()
)
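The column specification printed above is readr's guess at the type of each column. As a sketch of how that guess can be replaced by an explicit declaration, the example below reads a small made-up file using the col_types argument. The file contents and the choice of an integer column are assumptions for illustration, and only the readr package is loaded.

```r
library(readr)

# Hypothetical two-row file; this is not the real murders.csv.
path <- tempfile(fileext = ".csv")
writeLines(
  c("state,population,total", "Alabama,4779736,135", "Alaska,710231,19"),
  path
)

# Declare the column types instead of letting read_csv guess them.
toy <- read_csv(
  path,
  col_types = cols(
    state = col_character(),
    population = col_double(),
    total = col_integer()
  )
)

# Check the class of each column.
print(sapply(toy, class))
```

When col_types is given in full, read_csv has nothing to guess, which makes the import reproducible even if the first rows of a file are unrepresentative.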

The data is imported and stored in dat. The rest of this section defines some important concepts and provides an overview of how we write code that tells R how to find the files we want to import.

The filesystem

You can think of your computer’s filesystem as a series of nested folders, each containing other folders and files. Data scientists refer to folders as directories. We refer to the folder that contains all other folders as the root directory. We refer to the directory in which we are currently located as the working directory. The working directory therefore changes as you move through folders: think of it as your current location.
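The idea of a current location can be tried out with base R's getwd and setwd. This is a minimal sketch that moves into R's temporary directory and back again; the file name wd-demo.txt is made up for the example.

```r
# Where R is currently "located".
start_dir <- getwd()

# Move into a scratch directory; setwd() invisibly returns the previous location.
scratch <- tempdir()
previous <- setwd(scratch)

# A bare file name is now resolved against the scratch directory.
writeLines("hello", "wd-demo.txt")
print(file.exists(file.path(scratch, "wd-demo.txt")))

# Go back to where we started.
setwd(previous)
print(identical(getwd(), start_dir))
```

Saving the value returned by setwd makes it easy to restore the original working directory, so the rest of a session is not affected by the detour.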

Relative and full paths

The path of a file
