Model Representation
To establish notation for future use, we’ll use x^(i) to denote the “input” variables (living area in this example), also called input features, and y^(i) to denote the “output” or target variable that we are trying to predict (price). A pair (x^(i), y^(i)) is called a training example, and the dataset that we’ll be using to learn, a list of m training examples {(x^(i), y^(i)); i = 1, ..., m}, is called a training set. Note that the superscript “(i)” in the notation is simply an index into the training set, and has nothing to do with exponentiation. We will also use X to denote the space of input values, and Y to denote the space of output values. In this example, X = Y = ℝ.
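As a rough illustration of this notation, a training set can be held as a list of (x, y) pairs. The sketch below is only one possible representation: the numbers are made up, and the helper name `example` is hypothetical. It shows that "(i)" is just a 1-based index into the list, not an exponent.

```python
# Hypothetical training set: (living area, price) pairs.
# The values are invented for illustration; they are not real housing data.
training_set = [
    (2104.0, 400.0),
    (1600.0, 330.0),
    (2400.0, 369.0),
    (1416.0, 232.0),
]

m = len(training_set)  # m = number of training examples


def example(i):
    """Return the pair (x^(i), y^(i)) for the 1-based index i used in the text."""
    if not 1 <= i <= m:
        raise IndexError(f"i must be in 1..{m}, got {i}")
    return training_set[i - 1]  # Python lists are 0-based, the notation is 1-based


x_2, y_2 = example(2)  # the second training example
print(m, x_2, y_2)
```

Python lists are 0-based, so the helper subtracts one to match the 1-based index used in the notation.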
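To describe the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis. Seen as a pipeline, the process is therefore like this: the training set is fed to a learning algorithm, which outputs h; h then takes a new input x (the living area of a house) and returns a predicted y (its estimated price).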
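The idea that a learning algorithm takes a training set and returns a function h can be sketched in code. The learner below is deliberately naive: it ignores x and always predicts the mean of the observed y values. It is not the method from the text and stands in only to show the shape of the pipeline. The names `TrainingSet`, `Hypothesis`, and `learn_mean_predictor`, and the toy data, are all assumptions made for this example.

```python
from typing import Callable, List, Tuple

TrainingSet = List[Tuple[float, float]]  # list of (x, y) pairs
Hypothesis = Callable[[float], float]    # h : X -> Y, with X = Y = R


def learn_mean_predictor(training_set: TrainingSet) -> Hypothesis:
    """Toy learning algorithm: returns an h that always predicts the mean of y."""
    mean_y = sum(y for _, y in training_set) / len(training_set)

    def h(x: float) -> float:
        # A real hypothesis would depend on x; this placeholder does not.
        return mean_y

    return h


# Made-up data, for illustration only.
toy_set = [(1000.0, 200.0), (2000.0, 400.0), (3000.0, 600.0)]
h = learn_mean_predictor(toy_set)  # training set -> learning algorithm -> h
print(h(1500.0))                   # new x -> h -> predicted y
```

Any better learning algorithm would keep the same interface: a training set goes in, a callable h comes out.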
When the target variable that we’re trying to predict is continuous, such as in our housing example, we call the learning problem a regression problem. When y can take on only a small number of discrete values (such as if, given the living area, we wanted to predict if a dwelling is a house or an apartment, say), we call it a classification problem.
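The difference between the two problem types shows up in the output type of h. In this minimal sketch both rules are hypothetical, hand-written stand-ins rather than learned models: the coefficient 0.15 and the threshold 1200.0 are arbitrary values chosen only to make the example concrete.

```python
def h_regression(living_area: float) -> float:
    """Regression: the prediction is a continuous value (a price)."""
    return 0.15 * living_area  # hypothetical rule, not a fitted model


def h_classification(living_area: float) -> str:
    """Classification: the prediction is one of a small set of discrete labels."""
    # Hypothetical threshold, chosen arbitrarily for illustration.
    return "house" if living_area >= 1200.0 else "apartment"


print(type(h_regression(2000.0)).__name__)  # a float: continuous output
print(h_classification(800.0), h_classification(1500.0))
```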
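1. Function models in supervised learning
Portland housing prices: the dataset carries accurate labels relating size to price, which makes it suitable for supervised learning.
It is also a regression problem, because the value to be predicted is a continuous number.
A classification problem, by contrast, predicts discrete values, such as 0 or 1.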
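Training set notation
Hypothesis function and linear regression equation
Hypothesis: the function h that maps an input x to a predicted output y
Linear regression with one variable, i.e. the hypothesis is a linear equation in a single variable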
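As a sketch of what linear regression with one variable can look like in practice, the code below assumes the hypothesis has the form h(x) = theta0 + theta1 * x and fits the two parameters with the closed-form least-squares solution. The notes do not specify this form or this fitting method, so both are assumptions. The names `fit_univariate` and `make_hypothesis`, and the data, are also illustrative assumptions.

```python
from typing import Callable, List, Tuple


def fit_univariate(training_set: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (theta0, theta1) minimizing squared error for h(x) = theta0 + theta1 * x."""
    m = len(training_set)
    mean_x = sum(x for x, _ in training_set) / m
    mean_y = sum(y for _, y in training_set) / m
    # Slope = covariance(x, y) / variance(x); intercept follows from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in training_set)
    s_xx = sum((x - mean_x) ** 2 for x, _ in training_set)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    theta1 = s_xy / s_xx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


def make_hypothesis(theta0: float, theta1: float) -> Callable[[float], float]:
    """Build h(x) = theta0 + theta1 * x from fitted parameters."""
    return lambda x: theta0 + theta1 * x


# Made-up points lying exactly on y = 1 + 2x, for illustration only.
points = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
theta0, theta1 = fit_univariate(points)
h = make_hypothesis(theta0, theta1)
print(theta0, theta1, h(4.0))
```

Because the sample points lie exactly on a line, the fit recovers that line. With real housing data the points would scatter, and h would be the line with the smallest total squared error.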