Machine Learning A-Z (1)

Data preprocessing


Dataset: download it from https://www.superdatascience.com/

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('Data.csv')

X = dataset.iloc[:,:-1].values  # independent variables: iloc selects rows/columns by position; .values returns the data as a NumPy array
y = dataset.iloc[:, 3].values  # dependent variable (column 3)
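To make the `iloc` selection concrete, here is a small sketch on a hypothetical three-row table. The column names (Country, Age, Salary, Purchased) and the values are made up for illustration. The only assumption it shares with the script is the layout: features in the first three columns and the target in column 3.

```python
import pandas as pd

# Hypothetical toy table with the layout the script assumes:
# column 0 categorical, columns 1-2 numeric, column 3 the target
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany"],
    "Age": [44.0, 27.0, 30.0],
    "Salary": [72000.0, 48000.0, 54000.0],
    "Purchased": ["No", "Yes", "No"],
})

# All rows, every column except the last one
X = dataset.iloc[:, :-1].values
# All rows, the column at position 3
y = dataset.iloc[:, 3].values

print(type(X), X.shape)  # <class 'numpy.ndarray'> (3, 3)
print(y)                 # ['No' 'Yes' 'No']
```

Because the columns have mixed types, `X` comes back as an object-dtype array; scikit-learn converts the numeric slices to floats when they are passed to its transformers.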


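"""
Handling missing data
Drop the rows: acceptable when the dataset is large; when the dataset is small, or the other fields of an incomplete row carry important information, dropping it introduces a large error.
Fill in the gaps using the values that are not missing.
"""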
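from sklearn.impute import SimpleImputer
# SimpleImputer fills in missing values (it replaces Imputer, which was removed in scikit-learn 0.22)
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')  # entries equal to np.nan are treated as missing; strategy='mean' fills each column with that column's mean
imputer = imputer.fit(X[:,1:3])
X[:,1:3] = imputer.transform(X[:,1:3])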
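The note above lists two strategies, but the script only shows the second. A minimal pandas sketch of both, on a made-up four-row table (column names and values are hypothetical), might look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing Age, one missing Salary
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Age": [44.0, 27.0, np.nan, 38.0],
    "Salary": [72000.0, 48000.0, 54000.0, np.nan],
})

# Strategy 1: drop every row that contains a missing value
dropped = df.dropna()

# Strategy 2: fill each numeric column with the mean of its non-missing values
numeric_cols = ["Age", "Salary"]
filled = df.copy()
filled[numeric_cols] = filled[numeric_cols].fillna(filled[numeric_cols].mean())

print(len(df), "->", len(dropped))  # 4 -> 2
print(filled.loc[3, "Salary"])      # 58000.0
```

With only four rows, dropping loses half the data, which is exactly the small-dataset case where filling is preferable.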


"""
Encode non-numeric data (strings, etc.) as numbers
"""
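from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
labelencoder_X = LabelEncoder()
X[:,0] = labelencoder_X.fit_transform(X[:,0])  # replace the categories in the first column with integer codes 0, 1, 2, ...
# (optional in scikit-learn >= 0.20, whose OneHotEncoder accepts strings directly)
# Using the integer codes as-is would give the categories an order and a magnitude, and imply relationships between them that do not exist
# One-hot encoding puts every category at the same Euclidean distance from every other, which suits distance-based computations
# Reference: https://hackernoon.com/what-is-one-hot-encoding-why-and-when-do-you-have-to-use-it-e3c6186d008f
# One-hot encode column 0 and pass the other columns through unchanged
# (OneHotEncoder's categorical_features parameter was removed in scikit-learn 0.22)
ct = ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder='passthrough', sparse_threshold=0)
X = ct.fit_transform(X)  # dense NumPy array: the one-hot columns come first, then the remaining columns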
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
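The argument for one-hot encoding can be checked numerically. This sketch assumes three hypothetical categories and compares Euclidean distances between the integer codes produced by `LabelEncoder` and the rows produced by `OneHotEncoder`:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical categories
countries = np.array(["France", "Germany", "Spain"])

# Label encoding: sorted categories mapped to 0, 1, 2
codes = LabelEncoder().fit_transform(countries)
# One-hot encoding: one column per category (input must be 2-D)
onehot = OneHotEncoder().fit_transform(countries.reshape(-1, 1)).toarray()

def euclid(a, b):
    """Euclidean distance between two scalars or vectors."""
    return float(np.linalg.norm(np.atleast_1d(a) - np.atleast_1d(b)))

# Integer codes invent an order: Spain looks twice as far from France as Germany does
label_dists = (euclid(codes[0], codes[1]), euclid(codes[0], codes[2]))
# One-hot: every pair of distinct categories is sqrt(2) apart
onehot_dists = (euclid(onehot[0], onehot[1]), euclid(onehot[0], onehot[2]))

print(label_dists)   # (1.0, 2.0)
print(onehot_dists)  # both equal sqrt(2), about 1.414
```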


"""
Split the dataset into a training set and a test set
"""
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)  # test_size sets the fraction held out for testing; a fixed random_state makes the split identical on every run
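A quick way to see what `random_state` does: split the same hypothetical 10-sample array twice with the same seed and compare the results. With `test_size=0.2`, 2 of the 10 samples go to the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with 2 features each
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Two calls with the same random_state produce the same split
a = train_test_split(X, y, test_size=0.2, random_state=0)
b = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_test, y_train, y_test = a

print(X_train.shape, X_test.shape)                      # (8, 2) (2, 2)
print(all(np.array_equal(p, q) for p, q in zip(a, b)))  # True
```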


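"""
Feature scaling
Bring features measured on very different scales (orders of magnitude) onto a comparable scale,
so that a feature does not dominate distance computations just because its values are large,
and so that gradient-based training converges faster
"""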
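from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
# This differs from the course, which standardizes every column; I don't think the first three columns (the one-hot encoding of the string column) need scaling
X_train[:,3:] = sc_X.fit_transform(X_train[:,3:])
# fit_transform is not used on the test set: in sklearn, fit() learns the parameters an object needs.
# After sc_X.fit_transform(X_train[:,3:]), sc_X stores the mean and standard deviation computed from the training set;
# transform() reuses those training statistics, so the test set is scaled exactly the same way as the training set
X_test[:,3:] = sc_X.transform(X_test[:,3:])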
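`StandardScaler` computes z = (x - μ) / σ per column, where μ and σ are the mean and the population standard deviation of the training column. The sketch below, on hypothetical numbers, checks that formula against a manual computation and shows that the test set is scaled with the training statistics, not its own:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric columns (think age and salary) on very different scales
X_train = np.array([[25.0, 40000.0], [35.0, 60000.0], [45.0, 80000.0]])
X_test = np.array([[30.0, 50000.0]])

sc = StandardScaler()
X_train_s = sc.fit_transform(X_train)  # learns mean_ and scale_ from X_train
X_test_s = sc.transform(X_test)        # reuses the training mean_ and scale_

# Same result by hand: z = (x - mean_train) / std_train, population std (ddof=0)
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
assert np.allclose(X_train_s, (X_train - mu) / sigma)
assert np.allclose(X_test_s, (X_test - mu) / sigma)

print(X_train_s.mean(axis=0))  # approximately [0, 0]: training columns are centred
print(X_test_s)                # the test row expressed in training units
```

After scaling, age and salary sit on the same scale even though their raw values differ by three orders of magnitude.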