Introduction to Machine Learning

Chapter 1 Introduction

1.1 What Is Machine Learning?

To solve a problem on a computer, we need an algorithm. An algorithm is a sequence of instructions that should be carried out to transform the input to output. For example, one can devise an algorithm for sorting. The input is a set of numbers and the output is their ordered list. For the same task, there may be various algorithms and we may be interested in finding the most efficient one, requiring the least number of instructions or memory or both.
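
As a concrete illustration, here is one possible sorting algorithm, insertion sort, sketched in Python:

    # Insertion sort: one of many algorithms that transform an unordered
    # input list of numbers into an ordered output list.
    def insertion_sort(numbers):
        result = list(numbers)              # work on a copy of the input
        for i in range(1, len(result)):
            key = result[i]
            j = i - 1
            # Shift elements larger than `key` one slot to the right.
            while j >= 0 and result[j] > key:
                result[j + 1] = result[j]
                j -= 1
            result[j + 1] = key
        return result

    print(insertion_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]

Other algorithms, such as merge sort, solve the same task with fewer comparisons on large inputs, which is exactly the efficiency trade-off mentioned above.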

For some tasks, however, we do not have an algorithm - for example, to tell spam emails from legitimate emails. We know what the input is: an email document that in the simplest case is a file of characters. We know what the output should be: a yes/no output indicating whether the message is spam or not. We do not know how to transform the input to the output. What can be considered spam changes in time and from individual to individual.

What we lack in knowledge, we make up for in data. We can easily compile thousands of example messages, some of which we know to be spam, and what we want is to “learn” what constitutes spam from them. In other words, we would like the computer (machine) to extract automatically the algorithm for this task. There is no need to learn to sort numbers, we already have algorithms for that; but there are many applications for which we do not have an algorithm but do have example data.

With advances in computer technology, we currently have the ability to store and process large amounts of data, as well as to access it from physically distant locations over a computer network. Most data acquisition devices are digital now and record reliable data. Think, for example, of a supermarket chain that has hundreds of stores all over a country selling thousands of goods to millions of customers. The point-of-sale terminals record the details of each transaction: date, customer identification code, goods bought and their amount, total money spent, and so forth. This typically amounts to gigabytes of data every day. What the supermarket chain wants is to be able to predict who the likely customers for a product are. Again, the algorithm for this is not evident; it changes in time and by geographic location. The stored data becomes useful only when it is analyzed and turned into information that we can make use of, for example, to make predictions.

We may not be able to identify the process completely, but we believe we can construct a good and useful approximation. That approximation may not explain everything, but may still be able to account for some part of the data. We believe that though identifying the complete process may not be possible, we can still detect certain patterns or regularities. This is the niche of machine learning. Such patterns may help us understand the process, or we can use those patterns to make predictions: Assuming that the future, at least the near future, will not be much different from the past when the sample data was collected, future predictions can also be expected to be right.

The application of machine learning methods to large databases is called data mining. The analogy is that a large volume of earth and raw material is extracted from a mine, which when processed leads to a small amount of very precious material; similarly, in data mining, a large volume of data is processed to construct a simple model with valuable use, for example, one with high predictive accuracy. Its application areas are abundant: In addition to retail, in finance banks analyze their past data to build models to use in credit applications, fraud detection, and the stock market.

1.2.5 Reinforcement Learning

In some applications, the output of the system is a sequence of actions. In such a case, a single action is not important; what is important is the policy, that is, the sequence of correct actions to reach the goal. There is no such thing as the best action in any intermediate state; an action is good if it is part of a good policy. In such a case, the machine learning program should be able to assess the goodness of policies and learn from past good action sequences to be able to generate a policy. Such learning methods are called reinforcement learning algorithms.
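
To make the idea of learning a policy concrete, below is a minimal sketch of one reinforcement learning algorithm (tabular Q-learning) on a made-up corridor task; the environment, reward, and parameter values are illustrative assumptions chosen only for this sketch.

    import random

    # Toy corridor: states 0..4, actions move left or right, and reaching
    # state 4 (the goal) gives reward 1. Everything here is an assumption
    # chosen only to illustrate policy learning.
    N_STATES = 5
    ACTIONS = [-1, +1]                      # left, right
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration

    Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]

    def step(state, action_index):
        """Apply an action; reward 1.0 only when the goal state is reached."""
        next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
        done = next_state == N_STATES - 1
        return next_state, (1.0 if done else 0.0), done

    for episode in range(500):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly follow the current policy, sometimes explore.
            if random.random() < EPSILON:
                action = random.randrange(len(ACTIONS))
            else:
                action = max(range(len(ACTIONS)), key=lambda a: Q[state][a])
            next_state, reward, done = step(state, action)
            # Update the estimated goodness of taking `action` in `state`.
            Q[state][action] += ALPHA * (reward + GAMMA * max(Q[next_state]) - Q[state][action])
            state = next_state

    # The learned policy: the action judged best in each state.
    print([("left", "right")[max(range(2), key=lambda a: Q[s][a])] for s in range(N_STATES)])

No single move is rewarded on its own here; only sequences that eventually reach the goal are reinforced, which is the sense in which an action is good only as part of a good policy.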

Chapter 2 Supervised Learning

We discuss supervised learning starting from the simplest case, which is learning a class from its positive and negative examples. We generalize and discuss the case of multiple classes, then regression, where the outputs are continuous.

2.1 Learning a Class from Examples

Let us say we want to learn the class, C, of a “family car”. We have a set of examples of cars, and we have a group of people that we survey to whom we show these cars. The people look at the cars and label them; the cars that they believe are family cars are positive examples, and the other cars are negative examples. Class learning is finding a description that is shared by all positive examples and none of the negative examples. Doing this, we can make a prediction: Given a car that we have not seen before, by checking with the description learned, we will be able to say whether it is a family car or not. Or we can do knowledge extraction: This study may be sponsored by a car company, and the aim may be to understand what people expect from a family car.
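
As a sketch of what such a learned description can look like, suppose each car is represented by two numeric attributes (say, price and engine power; the features and the data below are illustrative assumptions) and the description we search for is the tightest axis-aligned rectangle that contains every positive example and, ideally, no negative ones:

    # Learning a class from examples: the hypothesis is the tightest rectangle
    # in the (price, engine power) plane that encloses all positive examples.
    def learn_rectangle(examples):
        """examples: list of ((price, power), label), label True for a family car."""
        positives = [x for x, label in examples if label]
        prices = [price for price, _ in positives]
        powers = [power for _, power in positives]
        return (min(prices), max(prices)), (min(powers), max(powers))

    def predict(hypothesis, car):
        (p_lo, p_hi), (e_lo, e_hi) = hypothesis
        price, power = car
        return p_lo <= price <= p_hi and e_lo <= power <= e_hi

    training = [((25000, 180), True), ((30000, 200), True),
                ((12000, 90), False), ((80000, 450), False)]
    h = learn_rectangle(training)
    print(h)                          # learned price and engine-power ranges
    print(predict(h, (27000, 190)))   # True: the unseen car fits the description

Checking an unseen car against the learned ranges is the prediction step described above, and reading off the ranges themselves is a simple form of knowledge extraction.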

Chapter 3 Bayesian Decision Theory

We discuss probability theory as the framework for making decisions under uncertainty. In classification, Bayes' rule is used to calculate the probabilities of the classes. We generalize to discuss how we can make rational decisions among multiple actions to minimize expected risk. We also discuss learning association rules from data.
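
For instance, the probability of a class C given an observation x follows from Bayes' rule, P(C|x) = p(x|C)P(C) / p(x). A minimal numeric sketch for a two-class problem, with made-up prior and likelihood values:

    # Bayes' rule: P(C|x) = p(x|C) P(C) / p(x), with
    # p(x) = p(x|C) P(C) + p(x|not C) P(not C). Numbers are illustrative.
    prior_c = 0.4            # P(C)
    likelihood_c = 0.8       # p(x|C)
    likelihood_not_c = 0.3   # p(x|not C)

    evidence = likelihood_c * prior_c + likelihood_not_c * (1 - prior_c)
    posterior_c = likelihood_c * prior_c / evidence

    print(posterior_c)       # 0.32 / 0.50 = 0.64
    # Decision rule: choose C when P(C|x) > 0.5, so C is chosen here.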

3.1 Introduction

Programming computers to make inference from data is a cross between statistics and computer science, where statisticians provide the mathematical framework of making inference from data and computer scientists work on the efficient implementation of the inference methods.

Data comes from a process that is not completely known. This lack of knowledge is indicated by modeling the process as a random process. Maybe the process is actually deterministic, but because we do not have access to complete knowledge about it, we model it as random and use probability theory to analyze it. At this point, it may be a good idea to jump to the appendix and review basic probability theory before continuing with this chapter.

Chapter 4 Parametric Methods

Having discussed how to make optimal decisions when the uncertainty is modeled using probabilities, we now see how we can estimate these probabilities from a given training set. We start with the parametric approach for classification and regression. We discuss the semiparametric and nonparametric approaches in later chapters. We introduce the bias/variance dilemma and model selection methods for trading off model complexity and empirical error.
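
A minimal sketch of the parametric idea: assume the sample comes from a Gaussian and estimate its two parameters, the mean and the variance, by maximum likelihood (the sample below is made up for illustration):

    import math

    # Parametric estimation: assume x ~ N(mu, sigma^2) and fit the two
    # parameters to the training sample by maximum likelihood.
    sample = [2.1, 1.9, 2.4, 2.0, 2.6, 1.8]

    mu_hat = sum(sample) / len(sample)                               # sample mean
    var_hat = sum((x - mu_hat) ** 2 for x in sample) / len(sample)   # ML variance

    def gaussian_pdf(x, mu, var):
        """Density of the fitted Gaussian, usable e.g. as p(x|C) in a classifier."""
        return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    print(mu_hat, var_hat)
    print(gaussian_pdf(2.2, mu_hat, var_hat))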

4.1 Introduction

A statistic is any value that is calculated from a given sample. In statistical inference, we make a decision using the information provided by a sample.
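
For example, the mean, median, and maximum of a sample are all statistics; a trivial sketch with a made-up sample:

    # Any value computed from the sample is a statistic; the sample is made up.
    sample = [3.2, 1.7, 4.5, 2.8, 3.9]

    mean = sum(sample) / len(sample)
    median = sorted(sample)[len(sample) // 2]   # middle value (odd-length sample)
    maximum = max(sample)

    print(mean, median, maximum)                # three statistics of one sample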
