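Foreword: This article summarizes R's basic data types and syntax. As the title says, getting started with R in 3 days is no dream, as long as you already know some Python and Java.
# My reference material was in English, so these notes are in English too. Sorry if that makes them harder to read!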
Data Types:
- scalars: a single value
- vectors: a set of scalars arranged in a one-dimensional array; all data values have the same mode
  - c(): enter all values at once
  - scan(): enter values one by one; to stop, leave an entry blank
  - Vectors of the same length can be added together, element-wise
  - Warning! When a short vector is added to a longer one, R recycles the short vector: its first element is reused after its last, then its second, and so on, until both have the same length (R warns if the longer length is not a multiple of the shorter one)
  - combine two vectors: c(v1, v2)
  - vector with repeated entries: rep(entry, times)
    rep(3,7)
    [1] 3 3 3 3 3 3 3
- factors: a special type of character vector used for categorical data; the distinct values are stored as levels
  - create a character vector first, then convert it into a factor:
    settings <- c("high", "medium", "low")
    settings <- factor(settings)
    # levels: high low medium ----> the values of the categorical variable, sorted alphabetically
- matrices: collections of data values in two dimensions
  - declare with matrix(), which takes a data vector plus dimension parameters (number of rows and columns)
  - e.g. matrix(c(2,3,4,5), nrow = 2, ncol = 2), or a matrix of all 1s: matrix(1, nrow = 2, ncol = 3)
  - if the matrix needs more numbers than are given, the existing values are recycled from the beginning
- arrays: like a matrix, but with more than two dimensions
- lists: can contain data of different modes, including other data objects
  - create with list()
  - e.g. list(5, 6, "seven", T)
  - list elements are indexed with [[ ]]
- data frames: like a spreadsheet
  - each column is a vector; within a column, all elements must have the same mode
  - different columns can have different modes
  - all columns must have the same length
  - to create a data frame, first create the vectors that make it up, e.g. v1, v2, v3 (of equal length), then call data.frame(col_name = v1, ...)
- properties of a data object: type, values & data mode, variable name
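A minimal runnable sketch of the data types above. The variable names and values are made up for illustration, and the comments show what each line should produce.

```r
# Vectors: element-wise arithmetic and recycling
v1 <- c(1, 2, 3, 4)
v2 <- c(10, 20, 30, 40)
v_sum <- v1 + v2              # 11 22 33 44
v_rec <- v1 + c(100, 200)     # short vector recycled: 101 202 103 204
v_all <- c(v1, v2)            # combined: length 8
threes <- rep(3, 7)           # 3 3 3 3 3 3 3

# Factors: levels are the distinct values, sorted alphabetically
settings <- factor(c("high", "medium", "low"))
levels(settings)              # "high" "low" "medium"

# Matrices: a short data vector is recycled to fill the matrix
ones <- matrix(1, nrow = 2, ncol = 3)         # all 1s
m12  <- matrix(c(1, 2), nrow = 2, ncol = 3)   # row 1 is all 1s, row 2 all 2s

# Lists: mixed modes, elements pulled out with [[ ]]
lst <- list(5, 6, "seven", TRUE)
lst[[3]]                      # "seven"

# Data frames: equal-length vectors, one mode per column
df <- data.frame(id = c(1, 2, 3), name = c("a", "b", "c"))
```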
Data Modes:
- character: enclosed in single or double quotes
- numeric
- logical: TRUE/FALSE (or T/F)
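R-Syntax:
Ctrl + Shift + C: comment or uncomment the selected lines (RStudio)
headers (RStudio code sections):
- a comment line: # for level 1, ## for level 2, etc.
- ending with at least 4 #, = or - characters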
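As an illustration, a script using these section headers might look like the sketch below. The section names and data are made up. RStudio lists such lines in its document outline; plain R just treats them as ordinary comments.

```r
# Load data ----
scores <- c(90, 70, 80)

## Summaries ====
avg <- mean(scores)

## Counts ####
n_scores <- length(scores)
```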
| Command | Syntax | Notes |
|---|---|---|
| Run script in console | Ctrl + Return | |
| Print number sequences | a:b; seq(10); seq(30, 0, -3) | a:b counts up or down by 1; seq(10) gives 1 to 10; seq(30, 0, -3) counts down in steps of 3 |
| Assignment | a <- 2 | a gets 2 |
| Multiple assignment | a <- b <- 2 | a and b both get 2 |
| Assign multiple values | x <- c(1,2,3,4) | c = combine/concatenate |
| Auto print | (x <- 3) | the parentheses print the result, so no extra x line is needed |
| log | log(4); log10(100) | base e; base 10 |
| Numeric | | stored as double (double precision) by default |
| Check type | typeof(var_name); mode(var_name) | similar, but mode() is coarser: typeof(1) is "double" while mode(1) is "numeric" |
| Character | "c"; "a long char" | R has no separate string type; text is of type character |
| Logical | TRUE / FALSE | T / F also work when assigning |
| Vector | c(2,4,5,6); c("s","d","e"); c(T, F, T); is.vector(var_name) | single dimension; even a single value is a vector |
| Matrix | matrix(c(2,3,4,5,6,7), nrow = 2) | filled column by column by default, so the rows are 2 4 6 and 3 5 7 |
| Matrix by row | matrix(c(2,3,4,5,6,7), nrow = 2, byrow = T) | filled row by row, so the rows are 2 3 4 and 5 6 7 |
| Array | array(c(1:24), c(4, 3, 2)) | first the data, then the dimensions (rows, cols, tables) |
| Combine columns (cbind) | cbind(var_name, var_name) | result is a matrix with one column per variable; a matrix holds one type, so mixed values are coerced (numbers and even T/F become character if any column is character) |
| Data frame from cbind() | as.data.frame(cbind(...)) | converts the result to a data frame, but the columns keep the type cbind() already coerced them to; use data.frame(v1, v2) to keep each column's own type |
| List | list() | can mix data types; a printed list shows the index of each element |
| Coerce (force a type change) | s <- c(1, "b", T); o <- as.integer(5); s1 <- as.numeric(c("1", "2")) | every element of s becomes character; typeof(o) is "integer"; typeof(s1) is "double" |
| Coerce matrix to data frame | matrix(1:9, nrow = 3); as.data.frame(matrix(1:9, nrow = 3)) | |
| Clear environment | rm(list = ls()) | |
| Clear packages | p_unload(all); detach("package:datasets", unload = T) | p_unload(all) (pacman) clears add-on packages; detach() is for base packages |
| Clear plots | dev.off() | only if there is a plot |
| Clear console | cat("\014") | or Ctrl + L |
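The short script below exercises several rows of the table. Names such as v_num, v_chr, kept and lost are hypothetical. The last lines contrast cbind() with data.frame() on mixed vectors.

```r
# Number sequences
up   <- 1:5              # 1 2 3 4 5
down <- 5:1              # 5 4 3 2 1
by3  <- seq(30, 0, -3)   # 30 27 24 ... 3 0

# Multiple assignment and auto-print
a <- b <- 2              # a and b are both 2
(x <- 3)                 # assigns 3 and prints [1] 3

# Matrix filling: column-wise by default, row-wise with byrow = TRUE
m_col <- matrix(c(2, 3, 4, 5, 6, 7), nrow = 2)                # rows: 2 4 6 / 3 5 7
m_row <- matrix(c(2, 3, 4, 5, 6, 7), nrow = 2, byrow = TRUE)  # rows: 2 3 4 / 5 6 7

# Array with 4 rows, 3 columns and 2 "tables"
arr <- array(1:24, dim = c(4, 3, 2))

# Type checks
typeof(1)                # "double"
mode(1)                  # "numeric"
typeof(as.integer(5))    # "integer"

# cbind() of mixed vectors gives a character matrix; data.frame() keeps column types
v_num <- c(1.5, 2.5)
v_chr <- c("a", "b")
v_lgl <- c(TRUE, FALSE)
mixed <- cbind(v_num, v_chr, v_lgl)
kept  <- data.frame(v_num, v_chr, v_lgl)
lost  <- as.data.frame(mixed)   # v_num is no longer numeric here
```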
Load packages:
pacman::p_load(pacman, xx, xx, xx)
Pipes (%>%, from the magrittr package): break down complex nested syntax into steps that are easy to follow
Import data from a CSV or Excel file:
Graph:
barplot()
barplot(x, col = "col_name"), where the color can be given as:
- a color name (e.g. "red3", "grey7")
- an index number: barplot(x, col = colors()[507])
- RGB values on a 0-1 or 0-255 scale
- a hex code
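The sketch below tries each of the ways to pick a bar color, using base R's barplot(), colors() and rgb(). The bar heights are made up. The plots go to a temporary PDF file so the script also runs without a screen.

```r
x <- c(3, 5, 2)                          # made-up bar heights
pdf(tempfile(fileext = ".pdf"))          # draw into a temporary file

barplot(x, col = "red3")                 # color name
barplot(x, col = colors()[507])          # index into the built-in color list
barplot(x, col = rgb(1, 0, 0))           # RGB on a 0-1 scale
barplot(x, col = rgb(255, 0, 0, maxColorValue = 255))  # RGB on a 0-255 scale
barplot(x, col = "#FF0000")              # hex code

dev.off()

rgb(1, 0, 0)                             # "#FF0000": rgb() just builds a hex code
```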
Working with Vectors:
- Address a specific element: variable_name[index] (indexing starts at 1)
- Select a range: variable_name[start:end], inclusive on both ends
- Overwrite an element: e.g. z[3] <- 7
- Sort data from small to large: sort()
- Order of elements: order() returns the original positions of the elements in sorted order, not the sorted values
  z <- c(2,3,6,7,3,4)
  order(z)
  [1] 1 2 5 6 3 4   # sort(z) gives 2 3 3 4 6 7
- Extract subsets of data from vectors:
  - identify specific elements directly and assign them to a new variable:
    z3 <- z[c(2, 3)]   # elements at index 2 and 3
  - use a logical criterion to select certain elements:
    z3 <- z[z > 100]
- Get the length of a vector: length()
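A small worked example of these vector operations, reusing the z vector from the notes. The names z3 and big are only for illustration.

```r
z <- c(2, 3, 6, 7, 3, 4)
z[1]              # 2: indexing starts at 1
z[2:4]            # 3 6 7: both ends included
sort(z)           # 2 3 3 4 6 7
order(z)          # 1 2 5 6 3 4: positions of the sorted values
z[order(z)]       # same result as sort(z)
z3  <- z[c(2, 3)] # 3 6
big <- z[z > 3]   # 6 7 4
length(z)         # 6
z[3] <- 7         # overwrite: z is now 2 3 7 7 3 4
```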
Working with Data Frames:
- Address a specific column (vector): data_frame_name$vector_name
- Address a specific element: data_frame_name$vector_name[index], e.g. xy$x[2]
- Add a row or column to a data frame:
  - add a row: rbind(), e.g. df <- rbind(xy, w)   # also works for matrices
  - add a column: cbind(), e.g. xyz <- cbind(xy, z)
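A minimal sketch of these data frame operations. The contents of xy, the extra row w and the extra column z are hypothetical. The new row needs the same column names, and the new column needs the same length as the existing ones.

```r
xy <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))
xy$x                      # whole column: 1 2 3
xy$x[2]                   # single element: 2

w <- data.frame(x = 4, y = "d")   # hypothetical new row with matching column names
df <- rbind(xy, w)                # now 4 rows

z <- c(TRUE, FALSE, TRUE)         # hypothetical new column, same length as xy
xyz <- cbind(xy, z)               # columns x, y, z
```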
Checking and Changing Types:
- "is.what" functions:
  - check the data object: is.vector(), is.data.frame()
  - check the data mode: is.character(), is.numeric()
- "as.what" functions: assign the converted result to a variable
  - change the data object: e.g. y <- as.matrix(x)
  - change the data mode, e.g. numeric to character: z <- as.character(x)
  - !!! Illogical conversions, such as turning non-numeric characters into numbers, produce NA values.
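The "is.what" / "as.what" pattern in action, on a made-up vector x. R normally prints a warning for the failed conversion on the last line; suppressWarnings() only keeps this example's output quiet.

```r
x <- c(1, 2, 3)
is.vector(x)        # TRUE
is.data.frame(x)    # FALSE
is.numeric(x)       # TRUE
is.character(x)     # FALSE

y <- as.matrix(x)       # a 3 x 1 matrix
z <- as.character(x)    # "1" "2" "3"

# "four" cannot be read as a number, so it becomes NA
n <- suppressWarnings(as.numeric(c("4", "four")))   # 4 NA
```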
Missing Data:
- Missing values are represented by NA
- Computations on NA carry NA into the result, e.g. NA * 2 is NA
- Check whether a value is NA: is.na()
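A quick demonstration of how NA carries through calculations and how is.na() finds it. The vector v is made up.

```r
v <- c(1, NA, 3)
NA * 2          # NA
v + 1           # 2 NA 4
is.na(v)        # FALSE TRUE FALSE
sum(v)          # NA, because one element is missing
v[!is.na(v)]    # 1 3: keep only the non-missing values
```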
Listing and Deleting Objects in Memory:
- List the objects in the current workspace:
  - ls()
  - objects()
- Remove specific objects:
  - rm(object1, object2, ...)
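A tiny sketch of listing and removing objects. The names tmp_a and tmp_b are hypothetical throwaway variables.

```r
tmp_a <- 1
tmp_b <- "two"
c("tmp_a", "tmp_b") %in% ls()   # TRUE TRUE
rm(tmp_a, tmp_b)
exists("tmp_a")                 # FALSE
```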
Data Edit:
- data.entry() ---> opens a spreadsheet-like table for editing data
Save Work:
- Save to file: saves everything (commands, output, etc.)
- savehistory(): saves the commands you typed (not the objects)
  - format: *.Rhistory file
  - savehistory(file = "fileName.Rhistory")
  - the saved history includes the savehistory() command itself
- save.image(): saves objects only
  - save.image() ---> creates a .RData file
  - save.image(filename) ---> creates a file with the given name
  - the result is an R workspace file
- Load a previously saved workspace: load("file_path")
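A sketch of the save.image() / load() round trip. It writes to a temporary file instead of a real path. savehistory() is left out because it depends on an interactive console session.

```r
tmp <- tempfile(fileext = ".RData")   # temporary file standing in for a real path
a <- 1:3
save.image(file = tmp)   # saves the objects in the global workspace
rm(a)                    # a is gone now
load(tmp)                # restores a from the saved workspace
```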
Importing Files:
Read functions
- read.table(): reads a flat data file in ASCII text format
  - arguments:
    - file name, header = T
    - fileEncoding = "xxx" (optional)
    - row.names = xx (optional) or col.names = "xxx"
  - returns a data frame object
  - separator:
    - sep = " ", or sep = "\t" for a tab-delimited file
- read.csv(): reads comma-delimited spreadsheet data
- scan():
  - used with a file name as its argument
  - can import files of different types
- read.xlsx():
  - requires library(xlsx)
  - arguments: file name, sheetName = "xx", header = T
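A self-contained sketch of read.csv() and read.table(). It first writes two small made-up files to temporary paths, so no real data files are assumed. read.xlsx() is left out because it needs the xlsx package.

```r
# Write two small made-up files so the example runs anywhere
csv_file <- tempfile(fileext = ".csv")
writeLines(c("name,score", "a,1", "b,2"), csv_file)

tsv_file <- tempfile(fileext = ".txt")
writeLines(c("name\tscore", "a\t1", "b\t2"), tsv_file)

from_csv <- read.csv(csv_file)                                # comma-delimited, header assumed
from_tsv <- read.table(tsv_file, header = TRUE, sep = "\t")  # tab-delimited

str(from_csv)   # a data frame with columns name and score
```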
Get Help:
- help(xx)
- ?xxx
- apropos(): when you don't know the exact name of what you're looking for
R - Programming:
Syntax
- Semicolons separate statements on the same line: x <- 5; y <- 7
- Comments start with #
- R is case sensitive
Arithmetic Operators:
- + - * / ^
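A few lines showing the syntax rules and arithmetic operators above. The values are arbitrary.

```r
x <- 5; y <- 7     # two statements on one line
X <- 100           # X and x are different variables (case sensitive)
x + y              # 12
y - x              # 2
x * y              # 35
y / 2              # 3.5
2 ^ 3              # 8
```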
Logical & Relational Operators:
- flow control:
  - order (sequence)
  - selection
  - repetition
Conditional Operators:
- if (condition is true)
  then do this
- can be written on one line:
  if (x <= y) z <- x + y
  if (q < t) {w <- q + t} else w <- q - t  # {} is not mandatory but good to have! Wrapping the whole assignment in () would print w directly.
- curly braces {} are frequently used to group blocks of code
- they also indicate that the code continues on the next line
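The same conditionals written out as a runnable sketch. The values of x, y, q and t are made up. The last line relies on if being an expression in R, so its value can be assigned directly.

```r
x <- 2; y <- 5
if (x <= y) z <- x + y    # condition is TRUE, so z becomes 7

q <- 4; t <- 9
if (q < t) {
  w <- q + t
} else {
  w <- q - t
}
# w is 13

# if returns a value; the outer () prints the result of the assignment
(w2 <- if (q < t) q + t else q - t)
```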
Looping:
- while
- for
- for loops can be nested
- in a matrix, you don't need to define the data mode
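A minimal sketch of a while loop and of nested for loops filling a matrix. The sum and the 2 x 3 fill pattern (row * 10 + column) are invented for illustration.

```r
# while: repeat as long as the condition holds
i <- 1
total <- 0
while (i <= 5) {
  total <- total + i
  i <- i + 1
}
# total is 15

# nested for loops filling a matrix cell by cell
m <- matrix(0, nrow = 2, ncol = 3)
for (row in 1:2) {
  for (col in 1:3) {
    m[row, col] <- row * 10 + col
  }
}
# m has rows 11 12 13 and 21 22 23
```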
Subsetting with Logical Operators:
- Use the result of a logical vector expression to subset a vector
- Only elements where the result is TRUE are selected
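A small example of logical subsetting on a made-up vector of scores. The comparison produces a logical vector, and indexing with it keeps only the TRUE positions.

```r
scores <- c(55, 82, 90, 47)
passed <- scores >= 60                 # FALSE TRUE TRUE FALSE
scores[passed]                         # 82 90
scores[scores >= 60 & scores < 90]     # 82
```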
Functions:
ceiling(5.4) --> 6
Writing Functions:
- Address a column from a data frame: data_frame_name$col_name
# e.g.
add_two_num <- function(num1, num2) {
  num1 + num2
}
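A usage sketch for add_two_num(). The data frame df and its columns a and b are hypothetical. Passing whole columns works because + is element-wise.

```r
add_two_num <- function(num1, num2) {
  num1 + num2            # the last expression is the return value
}

add_two_num(2, 3)        # 5

# Hypothetical data frame: columns go in as vectors
df <- data.frame(a = c(1, 2), b = c(10, 20))
add_two_num(df$a, df$b)  # 11 22
```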
Data Summary Functions in R:
- summary() Function:
  - presents descriptive statistics such as Min., 1st Qu., Median, Mean, 3rd Qu. and Max.
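A quick look at summary() on a made-up numeric vector with one large value. The large value pulls the mean far above the median.

```r
x <- c(1, 2, 3, 4, 100)   # made-up data
s <- summary(x)
s
# Min. 1, 1st Qu. 2, Median 3, Mean 22, 3rd Qu. 4, Max. 100
```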
Graphics:
- High-level plotting functions
  - plot()
    - with a single argument, e.g. plot(x), the values go on the y-axis and their indices on the x-axis
    - xlim = range(a:b) (x-axis limits)
    - main = "name" (main title)
- Low-level plotting functions
  - add information (such as lines) to an existing plot
  - title(main = "xxx") or title("xxxx"), with arguments similar to those of high-level plotting functions
  - text(x, y, labels = "xxxx")
  - lines(x, y)
  - !! When working with multiple plots, low-level plotting functions apply to the most recently drawn plot
- Graphical parameter functions
  - control the graphics window
  - fine-tune the appearance of graphics with color, text and fonts
  - par():
    - split the graphics screen to display more than one plot on the graphics device at a time
    - use the mfrow or mfcol parameter of par()
    - mfrow: draws plots in row order (row 1, col 1; row 1, col 2; ...), filling horizontally
    - mfcol: draws plots in column order (row 1, col 1; row 2, col 1; ...), filling vertically
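A sketch combining the three kinds of graphics functions. The data and labels are made up. Output goes to a temporary PDF so the script runs without a screen. par(mfrow = c(1, 2)) puts two plots side by side, and the low-level calls land on the first plot because it is the most recent at that point.

```r
y <- c(3, 5, 2, 8)                  # made-up data
pdf(tempfile(fileext = ".pdf"))     # draw into a temporary file

par(mfrow = c(1, 2))                # 1 row, 2 plots side by side
layout_used <- par("mfrow")

# High-level: values on the y-axis, indices 1..4 on the x-axis
plot(y, xlim = range(0:5), main = "Values by index")
# Low-level additions go onto the most recent plot
lines(seq_along(y), y)
text(4, 8, labels = "max", pos = 1)
title(sub = "lines() and text() added afterwards")

plot(rev(y), main = "Reversed")     # second panel

dev.off()
```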
-
High-Level
Lower-Level
Graphical Parameters
Histogram:
- hist()
  - argument las = 1 (rotates the y-axis labels 90 degrees clockwise, so they read horizontally)
  - las can take the values 0, 1, 2, 3:
    - 0: labels parallel to the axis (default)
    - 1: always horizontal
    - 2: perpendicular to the axis
    - 3: always vertical
  - breaks = c(num1, num2, ...)
  - breaks = seq(begin_num, end_num, step)
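A histogram sketch with custom breaks and horizontal y-axis labels. The data vector is made up, and the plot goes to a temporary PDF. By default each bin is closed on the right, and the first bin also includes its lower edge.

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)   # made-up data
pdf(tempfile(fileext = ".pdf"))     # draw into a temporary file

h <- hist(x, breaks = seq(0, 6, 2), las = 1, main = "Histogram with custom breaks")

dev.off()

h$breaks   # 0 2 4 6
h$counts   # 3 5 1: bins [0,2], (2,4], (4,6]
```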