5001: Statistical Machine Learning I, Textbook: An Introduction to Machine Learning.

BinaryJack

Category: City University of Hong Kong course notes. Tags: statistics, machine learning

Article link: https://blog.csdn.net/qq_40473204/article/details/108339680


First class

What is data?

Data is a collection of data objects and their attributes.
An object is also known as a record, point, case, sample, entity, or instance.
An attribute is a property or characteristic of an object.
Examples: eye color of a person, age, height, weight.
An attribute is also known as a variable, field, characteristic, or feature.
A collection of attributes describes an object.

Types of variables

Types of data

Data matrix

If data objects have the same fixed set of numeric variables, then the data objects can be thought of as points in a multidimensional space, where each dimension represents a distinct variable.
Such a data set can be represented by an n × p matrix, with n rows, one for each object, and p columns, one for each variable.
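
As a minimal sketch of this idea, the snippet below builds an n × p data matrix from a few records and treats two rows as points in p-dimensional space. The records, the variable names (age, height_cm, weight_kg), and the choice of Euclidean distance are illustrative assumptions, not part of the course material.

```python
import math

# Hypothetical records: every object has the same three numeric variables.
variables = ["age", "height_cm", "weight_kg"]
objects = [
    {"age": 23, "height_cm": 170.0, "weight_kg": 65.0},
    {"age": 35, "height_cm": 182.0, "weight_kg": 80.0},
    {"age": 41, "height_cm": 158.0, "weight_kg": 54.0},
    {"age": 29, "height_cm": 175.0, "weight_kg": 72.0},
]

# Build the n x p data matrix: one row per object, one column per variable.
X = [[float(obj[name]) for name in variables] for obj in objects]
n, p = len(X), len(X[0])

# Each row is a point in p-dimensional space, so geometric notions such as
# the Euclidean distance between two objects are well defined.
d01 = math.dist(X[0], X[1])
print(f"n = {n}, p = {p}, distance between objects 0 and 1 = {d01:.2f}")
```

Note that the variables here are on different scales, so in practice one would usually standardize the columns before comparing distances.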

Text data

Transaction data

Graph data

Data quality

Noise and outliers

Missing values

Sampling bias

Sample distortion arises from a mismatch between the random
sample and the population of interest

Convenience sample

Survivorship bias: information can only be collected from the fighters that returned.
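
The effect can be illustrated with a small simulation. This is only a sketch under made-up assumptions: each aircraft gets a damage level drawn uniformly from [0, 1], and the chance of returning is taken to be 1 minus that damage level. Neither assumption comes from the course notes.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical population: each aircraft takes a damage level in [0, 1].
damage = [rng.random() for _ in range(100_000)]

# Assumed survival rule: the more damage, the less likely the aircraft returns.
returned = [d for d in damage if rng.random() < 1.0 - d]

population_mean = statistics.fmean(damage)
observed_mean = statistics.fmean(returned)

print(f"mean damage, whole population: {population_mean:.3f}")
print(f"mean damage, returned only:    {observed_mean:.3f}")
```

Under these assumptions, the returned aircraft show noticeably less damage on average than the full population, so a sample made up only of returned aircraft understates the true damage.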

Population drift

A data set collected in the USA cannot be used for an analysis of Hong Kong.
A data set collected on Google three months ago cannot be used for an analysis three months later.

What is data exploration?

Data exploration techniques

Summary statistics

Visualization

Sea surface temperature

Iris data

Histogram

2-d histogram

Boxplot

Scatter plot

Matrix plot

Iris similarity matrix

Parallel coordinates plot

Other visualization techniques

Star plots

Chernoff faces

1 Introduction

Unfamiliar words

astrophysics
quadratic discriminant analysis model
demographic information
computationally infeasible
scalar

Content

Statistical learning

Statistical learning refers to a vast set of tools for understanding data. These tools can be classified as supervised or unsupervised.

supervised statistical learning

Supervised statistical learning involves building a statistical model for predicting, or estimating, an output based on one or more inputs.

unsupervised statistical learning

With unsupervised statistical learning, there are inputs but no supervising output; nevertheless, we can learn relationships and structure from such data.
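
The contrast between the two settings can be sketched in a few lines. The data below are made up, and the two methods are illustrative choices rather than anything prescribed by the notes: simple least-squares regression for the supervised case, and a two-cluster k-means on one variable for the unsupervised case.

```python
# Supervised: inputs x come with outputs y, so we can fit a predictive model.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]  # made-up responses, roughly y = 2x

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
s_xx = sum((a - x_bar) ** 2 for a in x)
slope = s_xy / s_xx                  # least-squares slope
intercept = y_bar - slope * x_bar    # least-squares intercept


def predict(new_x):
    """Predict the output for a new input using the fitted line."""
    return intercept + slope * new_x


# Unsupervised: only inputs z, no output. We look for structure (two groups).
z = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
centers = [min(z), max(z)]  # simple deterministic initialization
for _ in range(10):
    # Assign each point to its nearest center, then recompute the centers.
    labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1 for v in z]
    centers = [
        sum(v for v, k in zip(z, labels) if k == g) / labels.count(g)
        for g in (0, 1)
    ]

print(f"fitted line: y = {intercept:.2f} + {slope:.2f} x")
print(f"cluster labels: {labels}")
```

In the first half, the known outputs y guide the fit; in the second half, the grouping comes entirely from the inputs.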

three real-world data sets

Wage Data: predicting a continuous or quantitative output value (a regression problem).

Stock Market Data: predicting whether a given day’s stock market performance will fall into the Up bucket or the Down bucket (a classification problem).

Gene Expression Data: there are only inputs and no output; we wish to understand which observations are similar to each other by grouping them according to their observed characteristics (a clustering problem).

Notation and Simple Matrix Algebra

n

We will use n to represent the number of distinct data points, or observations, in our sample.

p

We will let p denote the number of variables that are
available for use in making predictions.

Variable Names

xij (lowercase x, subscript ij)

We let xij represent the value of the jth variable for the ith observation, where i = 1, 2, …, n and j = 1, 2, …, p.

X

We let X denote an n × p matrix whose (i, j)th element is xij.

yi

We use yi to denote the ith observation of the variable on which we
wish to make predictions
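
Written out in full under the definitions above, the data matrix and the responses look as follows. Collecting the yi into a single column vector named y is an assumption made here for illustration; the notes themselves only define yi.

```latex
\mathbf{X} =
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix},
\qquad
y =
\begin{pmatrix}
y_1 \\ y_2 \\ \vdots \\ y_n
\end{pmatrix}
```

Row i of X holds the p variable values for the ith observation, and column j holds the n observed values of the jth variable.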

2 Statistical Learning

Unfamiliar words

2.1 What Is Statistical Learning?
