#Supervised: Data points have known outcome
#Unsupervised: Data points have unknown outcome
#Regression: Outcome is continuous (numerical)
#Classification: Outcome is a category
#Supervised learning overview
#Data with answers + Model -> Fit -> Model
#Data without answers + Model -> Predict -> Predicted answers
#Regression:Numerical answers
#Movie data with revenue + Model -> Fit -> Model
#Movie data (unknown revenue) + Model -> Predict -> Predicted revenue
#Classification:Categorical answers
#Labeled data + Model -> Fit -> Model
#Unlabeled data + Model -> Predict -> Labels
#Target:Predicted category or value of the data(column to predict)
#Feature:Properties of the data used for prediction(non-target columns)
#Label:the target value for a single data point
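#A minimal sketch of the three terms above, using a made-up movie table.
#All names and values here are hypothetical and only serve as illustration.
import numpy as np

#Columns: budget (millions), runtime (minutes), revenue (millions)
movie_table = np.array([
    [10.0, 95.0, 30.0],
    [50.0, 120.0, 140.0],
    [5.0, 88.0, 8.0],
])
#Features: every column except the one to predict
movie_features = movie_table[:, :-1]
#Target: the column to predict (revenue)
movie_target = movie_table[:, -1]
#Label: the target value of a single data point (here, the first movie)
first_label = movie_target[0]
print(movie_features.shape, movie_target.shape, first_label)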
#What is needed for classification?
#Model data with:
# Features that can be quantified
# Labels that are known
#Method to measure similarity
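#One common similarity measure is Euclidean distance. It is assumed here for illustration;
#other metrics exist. A minimal sketch with two hypothetical feature vectors:
import numpy as np

point_a = np.array([1.0, 2.0])
point_b = np.array([4.0, 6.0])
#Square root of the summed squared differences between the features
euclidean_distance = np.sqrt(np.sum((point_a - point_b) ** 2))
print(euclidean_distance)
#A smaller distance means the two data points are more similar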
#Comparison of feature scaling methods
#Standard Scaler:Mean center data and scale to unit variance
#Minimum-Maximum Scaler:Scale data to fixed range (usually 0-1)
#Maximum Absolute Value Scaler:Scale each feature by its maximum absolute value (results fall between -1 and 1)
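#The snippets that follow use x_data and y_data without defining them.
#A small made-up dataset is assumed here purely so they can run; the values are hypothetical.
import numpy as np

#Six data points with two features each
x_data = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 220.0],
                   [8.0, 900.0], [9.0, 950.0], [10.0, 870.0]])
#One known label per data point
y_data = np.array([0, 0, 0, 1, 1, 1])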
#Import the class containing the scaling method
from sklearn.preprocessing import StandardScaler
#Create an instance of the class
StdSc = StandardScaler()
#Fit the scaling parameters and then transform the data
StdSc = StdSc.fit(x_data)
#Transform with the fitted scaler (StdSc), not the KNN model
x_scaled = StdSc.transform(x_data)
#Other scaling methods exist:MinMaxScaler,MaxAbsScaler
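#A hedged sketch comparing the three scalers on one hypothetical feature column.
#The variable names and values are made up for illustration.
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, MaxAbsScaler

scaling_demo = np.array([[-2.0], [0.0], [2.0], [4.0]])
#Mean 0 and unit variance
standard_scaled = StandardScaler().fit_transform(scaling_demo)
#Minimum maps to 0 and maximum maps to 1
minmax_scaled = MinMaxScaler().fit_transform(scaling_demo)
#Each value divided by the largest absolute value (4 here)
maxabs_scaled = MaxAbsScaler().fit_transform(scaling_demo)
print(standard_scaled.ravel())
print(minmax_scaled.ravel())
print(maxabs_scaled.ravel())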
#Characteristics of a KNN model
#Fast to create model because it simply stores data
#Slow to predict because many distance calculations are needed
#Can require lots of memory if data set is large
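#The characteristics above can be illustrated with a simplified from-scratch sketch.
#TinyKNN is a hypothetical class, not scikit-learn's implementation, and it assumes
#Euclidean distance with a plain majority vote.
import numpy as np
from collections import Counter

class TinyKNN:
    def __init__(self, n_neighbors=3):
        self.n_neighbors = n_neighbors

    def fit(self, features, labels):
        #Fast: no computation, the training data is only stored (this is also the memory cost)
        self.features = np.asarray(features, dtype=float)
        self.labels = np.asarray(labels)
        return self

    def predict(self, queries):
        predictions = []
        for query in np.asarray(queries, dtype=float):
            #Slow: one distance per stored training point, for every query
            distances = np.sqrt(((self.features - query) ** 2).sum(axis=1))
            nearest = np.argsort(distances)[:self.n_neighbors]
            #Majority vote among the nearest neighbors
            votes = Counter(self.labels[nearest].tolist())
            predictions.append(votes.most_common(1)[0][0])
        return np.array(predictions)

tiny_knn = TinyKNN(n_neighbors=3).fit(
    [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]],
    ["a", "a", "a", "b", "b", "b"],
)
tiny_prediction = tiny_knn.predict([[0.2, 0.1], [5.5, 5.5]])
print(tiny_prediction)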
#K Nearest Neighbors:The Syntax
#Import the class containing the classification method
from sklearn.neighbors import KNeighborsClassifier
#Create an instance of the class
KNN = KNeighborsClassifier(n_neighbors=3)
#Fit the instance on the data and then predict the expected value
KNN = KNN.fit(x_data, y_data)
y_predict = KNN.predict(x_data)
#Regression can be done with KNeighborsRegressor
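#A minimal end-to-end sketch on made-up data: scale the features, fit a KNN classifier
#and a KNN regressor, then predict for one new point. All names and values below are
#hypothetical; the new point is scaled with the scaler fitted on the training data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

knn_x = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 220.0],
                  [8.0, 900.0], [9.0, 950.0], [10.0, 870.0]])
#Categorical answers for classification
knn_labels = np.array([0, 0, 0, 1, 1, 1])
#Numerical answers for regression
knn_values = np.array([1.0, 1.2, 0.9, 5.0, 5.5, 4.8])

knn_scaler = StandardScaler().fit(knn_x)
knn_x_scaled = knn_scaler.transform(knn_x)
new_point = knn_scaler.transform(np.array([[9.0, 900.0]]))

#Classification: majority label among the 3 nearest neighbors
knn_classifier = KNeighborsClassifier(n_neighbors=3).fit(knn_x_scaled, knn_labels)
predicted_label = knn_classifier.predict(new_point)

#Regression: average target value of the 3 nearest neighbors (default uniform weights)
knn_regressor = KNeighborsRegressor(n_neighbors=3).fit(knn_x_scaled, knn_values)
predicted_value = knn_regressor.predict(new_point)
print(predicted_label, predicted_value)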