Learning From Data, Chapter 1 (Part 2): Reading Notes

1.3 Is Learning Feasible?
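Feasibility of learning: the target function f is what machine learning is ultimately after. Since f is unknown, how can it be determined from the limited information provided?
During learning, a hypothesis that approximates f is obtained from the data set D. If nothing can be guaranteed about predictions outside the training data, the process is only memorizing, not learning. Real learning means learning from the data and still being able to predict the outcome of unseen cases. (The book gives an example here that is worth working through to understand this point.)
The criterion for a successful learning system: its predictions are very close to the true values.
The section then gives an example. Suppose a bin holds an infinite number of red and green marbles. A randomly picked marble is red with probability μ and green with probability 1 − μ. Pick N marbles independently at random and let ν be the fraction of red marbles in the sample. Can ν tell us anything about the true proportion μ of red marbles?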
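Under these assumptions, the relationship between the sample fraction ν and the bin probability μ is governed by the Hoeffding Inequality. For any ε > 0 and sample size N:

$$P[\,|\nu - \mu| > \epsilon\,] \le 2e^{-2\epsilon^2 N}$$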
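Here P[·] denotes the probability of an event, and the inequality measures how close ν is to μ. (Keep the Hoeffding Inequality in mind; the results below are derived from it.) The bin experiment mirrors the learning setup and raises the same question: can a target function learned from a data set be used to predict unseen cases?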
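The bin experiment can be checked numerically. The sketch below is only an illustration, not something taken from the book: the values μ = 0.6, N = 100, ε = 0.1, the number of trials, and the function names are all assumptions chosen for the example. It draws many samples of N marbles, estimates how often ν lands more than ε away from μ, and compares that frequency with the Hoeffding bound.

```python
import math
import random


def hoeffding_bound(eps, n):
    """Right-hand side of the Hoeffding Inequality: 2 * exp(-2 * eps^2 * N)."""
    return 2.0 * math.exp(-2.0 * eps ** 2 * n)


def estimate_deviation_probability(mu, n, eps, trials=2000, seed=0):
    """Estimate P[|nu - mu| > eps] by repeatedly drawing N marbles from the bin."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        # Each marble is red with probability mu, independently of the others.
        reds = sum(1 for _ in range(n) if rng.random() < mu)
        nu = reds / n  # fraction of red marbles in this sample
        if abs(nu - mu) > eps:
            bad += 1
    return bad / trials


# Illustrative values (assumptions, not from the book).
mu, n, eps = 0.6, 100, 0.1
empirical = estimate_deviation_probability(mu, n, eps)
bound = hoeffding_bound(eps, n)
print(f"empirical P[|nu - mu| > eps] = {empirical:.4f}")
print(f"Hoeffding bound              = {bound:.4f}")
```

The bound holds for every μ, so it is usually loose for any particular bin: the estimated frequency should come out well below it.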
The learning model with probability added:
[Figure: learning model with a probability distribution added]
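Next, the author introduces two error measures for a hypothesis h.

In-sample error, the fraction of the N data points on which h and f disagree:

$$E_{\text{in}}(h) = \frac{1}{N}\sum_{n=1}^{N} [\![\, h(x_n) \ne f(x_n) \,]\!]$$

where [[·]] equals 1 if the statement inside is true and 0 otherwise.

Out-of-sample error, the probability that h and f disagree on a randomly drawn point x:

$$E_{\text{out}}(h) = P[\,h(x) \ne f(x)\,]$$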
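In terms of the in-sample error E_in(h) and the out-of-sample error E_out(h) of a fixed hypothesis h, the Hoeffding Inequality can be written, for any ε > 0, as:

$$P[\,|E_{\text{in}}(h) - E_{\text{out}}(h)| > \epsilon\,] \le 2e^{-2\epsilon^2 N}$$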
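The in-sample error plays the role of the sample fraction ν: it is a random variable that depends on the sample. The out-of-sample error plays the role of the bin probability μ: it is unknown, but it is not random.
For learning to work, the two errors should satisfy: 1. the in-sample and out-of-sample errors are as close as possible; 2. the in-sample error is as small as possible.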
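As a small illustration of the two errors, the sketch below fixes a hypothetical target f and a hypothetical hypothesis h on inputs drawn uniformly from [-1, 1]. Both functions, the threshold 0.2, and N = 1000 are assumptions made for the example. For this choice the out-of-sample error is known exactly, because h and f disagree only on the interval [0, 0.2), which has probability 0.1 under the uniform distribution. The in-sample error is computed on one random sample.

```python
import random


def f(x):
    """Hypothetical target function: +1 for x >= 0, otherwise -1."""
    return 1 if x >= 0 else -1


def h(x):
    """Hypothetical fixed hypothesis: same shape as f, but with threshold 0.2."""
    return 1 if x >= 0.2 else -1


def in_sample_error(hyp, target, xs):
    """Fraction of the sample points on which hyp and target disagree."""
    return sum(1 for x in xs if hyp(x) != target(x)) / len(xs)


rng = random.Random(1)
N = 1000
xs = [rng.uniform(-1.0, 1.0) for _ in range(N)]

e_in = in_sample_error(h, f, xs)  # random: depends on the sample drawn
e_out = 0.2 / 2.0                 # fixed: length of [0, 0.2) over length of [-1, 1]

print(f"E_in  = {e_in:.3f}")
print(f"E_out = {e_out:.3f}")
```

Changing the seed changes E_in but not E_out, which is the distinction drawn above: E_in is random, while E_out is fixed and, in a real problem, unknown.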
1.4 Error and Noise
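This section discusses the difference between error and noise.
Error depends mainly on how closely the hypothesis approximates the target. In a fingerprint system, for example, it shows up as the false recognition rate or the false acceptance rate. Noise is more like the circuit experiment from high-school physics: the relationship between current and voltage should be a straight line, but variations in the circuit, such as changes in resistance, make the measured output voltage deviate from what the input current predicts.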
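The circuit analogy can be sketched in a few lines. Everything here is an assumption made for illustration: the resistance of 5 ohms, the list of currents, and the choice to model the disturbance as Gaussian noise added to the voltage. The ideal relationship V = I · R is a straight line, the measured voltages scatter around it, and a least-squares fit through the origin recovers a value close to R.

```python
import random

rng = random.Random(42)

R = 5.0                                      # hypothetical resistance in ohms
currents = [0.1 * k for k in range(1, 11)]   # input currents in amperes
ideal = [R * i for i in currents]            # noiseless voltages: a straight line

# Measured voltages: the ideal value plus a small random disturbance (assumed Gaussian).
measured = [v + rng.gauss(0.0, 0.05) for v in ideal]

# Least-squares slope for a line through the origin: sum(i * v) / sum(i * i).
r_hat = sum(i * v for i, v in zip(currents, measured)) / sum(i * i for i in currents)

print(f"true R      = {R:.3f}")
print(f"estimated R = {r_hat:.3f}")
```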
