Machine learning system design - Data for machine learning

Abstract: This post is the transcript of Lecture 98, "Data for machine learning", from Chapter 12, "Machine learning system design", of Andrew Ng's Machine Learning course. I wrote it down while watching the video and lightly edited it to make it more concise and easier to read, so that I can refer back to it later, and I am now sharing it here. If you spot any mistakes, corrections are very welcome and sincerely appreciated. I also hope it is helpful to others.
————————————————

In the previous video, we talked about evaluation metrics. I'd like to switch tracks a bit and touch on another important aspect of machine learning system design that will often come up, which is the issue of how much data to train on. Now, in some earlier videos, I cautioned against blindly going out and just spending lots of time collecting lots of data, because it's only sometimes that that would actually help. But it turns out that under certain conditions, and I will say in this video what those conditions are, getting a lot of data and training a certain type of learning algorithm on it can be a very effective way to get very good performance. This arises often enough that, if those conditions hold true for your problem and you're able to get a lot of data, it can be a very good way to get a very high performance learning algorithm. So, in this video, let's talk more about that.

Let me start with a story. Many years ago, two researchers, Michele Banko and Eric Brill, ran the following fascinating study. They were interested in studying the effect of using different learning algorithms versus trying them out on different training set sizes. They were considering the problem of classifying between confusable words. So, for example, in the sentence "for breakfast I ate ___ eggs", should it be "to", "two", or "too"? Well, for this example, "for breakfast I ate two (2) eggs". So this is one example of a set of confusable words, and that's a different set. They took machine learning problems like these, supervised learning problems, and tried to categorize what is the appropriate word to go into a certain position in an English sentence. They took a few different learning algorithms, which were considered state of the art back in 2001, when they ran the study. They took a variant of logistic regression called the perceptron. They also took some other algorithms: the Winnow algorithm, which is also very similar to logistic regression but different in some ways, and a naive Bayes algorithm, which is something I'll actually talk about in this course. The exact details of these algorithms aren't important. What they did was vary the training set size, try out these learning algorithms over that range of training set sizes, and this is the result they got. The trends are very clear. First, most of these algorithms give remarkably similar performance. And second, as the training set size increases (the horizontal axis is the training set size in millions), as you go from a hundred thousand up to a thousand million, that's a billion training examples, the performance of the algorithms all pretty much monotonically increases. In fact, if you picked an "inferior algorithm" but gave that "inferior algorithm" more data, then from these examples it looks like it would most likely beat even a "superior algorithm". Since this original study was very influential, there has been a range of different studies since then showing similar results: many different learning algorithms can, depending on the details, give pretty similar ranges of performance, but what can really drive performance is giving the algorithm a ton of training data. Results like these have led to a saying in machine learning that, often, it's not who has the best learning algorithm that wins, it's who has the most data. So when is this true and when is it not? Because if we have a learning algorithm for which this is true, then getting a lot of data is often the best way to ensure that we have an algorithm with very high performance, rather than debating and worrying about exactly which of these algorithms to use.
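
As an aside from the transcript, here is a minimal sketch of how one might reproduce the shape of such an experiment: train several off-the-shelf classifiers on progressively larger slices of a data pool and record their accuracy on a fixed held-out test set. The synthetic dataset, the particular scikit-learn models, and the chosen training-set sizes are illustrative assumptions, not details from the original 2001 study.

```python
# A minimal sketch (illustrative only, not the original 2001 study) of
# comparing several classifiers across increasing training set sizes.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# A synthetic classification problem stands in for the confusable-words task.
X, y = make_classification(n_samples=60_000, n_features=50,
                           n_informative=20, random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=10_000,
                                                  random_state=0)

models = {
    "perceptron": Perceptron(max_iter=1000),
    "logistic regression": LogisticRegression(max_iter=1000),
    "naive Bayes": GaussianNB(),
}

# Accuracy as a function of training set size, analogous to the plot
# described above: each model is refit on a larger slice of the pool.
for name, model in models.items():
    scores = []
    for m in (1_000, 5_000, 10_000, 50_000):
        model.fit(X_pool[:m], y_pool[:m])
        scores.append(model.score(X_test, y_test))
    print(name, ["%.3f" % s for s in scores])
```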

Let's try to lay out a set of assumptions under which we think having a massive training set will help. Let's assume that in our machine learning problem, the features x contain sufficient information with which to predict y accurately. For example, take the confusable-words problem that we had on the previous slide, and let's say that its features x capture the surrounding words around the blank that we're trying to fill in. So if the features capture the surrounding words in the sentence "for breakfast I ate ___ eggs", then that's pretty much enough information to tell me that the word I want in the middle is "two", and not "to" or "too". So the features capture the surrounding words, and that gives me enough information to pretty unambiguously decide what the label y is, or in other words which word should be used in that blank out of this set of three confusable words. So that's an example where the features x have sufficient information to predict y. For a counterexample, consider the problem of predicting the price of a house from only the size of the house and no other features. So imagine I tell you that a house is 500 square feet, but I don't give you any other features. I don't tell you whether the house is in an expensive part of the city, or the number of rooms in the house, or how nicely furnished the house is, or whether the house is new or old. If I don't tell you anything other than that this is a 500 square foot house, there are so many other factors that affect the price of a house that, if all you know is the size, it's actually very difficult to predict the price accurately. So that would be a counterexample to this assumption that the features have sufficient information to predict the price to the desired level of accuracy. The way I think about testing this assumption is to ask myself: given the features x, that is, given the same information that is available to the learning algorithm, if we were to go to a human expert in this domain, could that human expert confidently predict the value of y? For the first example, if we go to an expert English speaker, then that expert would probably be able to predict what word should go in here. A good English speaker can predict this well, so this gives me confidence that x allows us to predict y accurately. But in contrast, if we go to an expert in housing prices, maybe an expert realtor, someone who sells houses for a living, and I tell them only the size of a house and ask them what the price is, even an expert in pricing or selling houses would not be able to tell me. So this is a sign that, for the housing price example, knowing only the size doesn't give me enough information to predict the price of the house. So, let's say this assumption holds. Let's then see when having a lot of data could help.
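
To make the "sufficient features" idea concrete, here is a small sketch of the confusable-words setup: the words surrounding the blank become the features x, and a simple classifier predicts which confusable word y belongs in the blank. The tiny corpus, the bag-of-words encoding, and the use of logistic regression are illustrative assumptions, not details from the lecture.

```python
# A minimal sketch: surrounding words as features x, confusable word as y.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Each example is the sentence context with the target word blanked out;
# the label is which of the confusable words belongs in the blank.
contexts = [
    "for breakfast i ate __ eggs",
    "i want __ go home",
    "that is far __ expensive",
    "she bought __ tickets for the show",
    "he walked __ the store",
    "it was __ late to call",
]
labels = ["two", "to", "too", "two", "to", "too"]

vectorizer = CountVectorizer()          # bag of surrounding words as x
X = vectorizer.fit_transform(contexts)

clf = LogisticRegression(max_iter=1000)
clf.fit(X, labels)

# With informative context words, even this tiny model can often guess right.
print(clf.predict(vectorizer.transform(["for lunch i ate __ sandwiches"])))
```

By contrast, a feature vector containing only a house's size would be the analogue of giving such a classifier no context words at all: no amount of extra data would make the prediction accurate.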

Suppose the features have enough information to predict the value of y, and let's suppose we use a learning algorithm with a large number of parameters, so maybe logistic regression or linear regression with a large number of features, or, one thing that I sometimes do, a neural network with many hidden units, which would be another learning algorithm with a lot of parameters. These are all powerful learning algorithms with a lot of parameters that can fit very complex functions, so I am going to think of them as low-bias algorithms. Because we have a very powerful learning algorithm, chances are that, if we run it on our data set, it will be able to fit the training set well, and hopefully the training error $J_{train}(\theta)$ will be small. Now, let's say we use a massive training set. In that case, even though we have a lot of parameters, if the training set is much larger than the number of parameters, then hopefully these algorithms will be unlikely to overfit, because we have such a massive training set. And by unlikely to overfit, what that means is that the training error will hopefully be close to the test error, $J_{train}(\theta) \approx J_{test}(\theta)$. Finally, putting these two together: if the training error is small, and the training error is close to the test error, then these two together imply that hopefully the test error $J_{test}(\theta)$ will also be small. Another way to think about this is that, in order to have a high performance learning algorithm, we want it not to have high bias and not to have high variance. The bias problem we address by making sure we have a learning algorithm with many parameters, which gives us a low-bias algorithm. And by using a very large training set, we ensure that we don't have a variance problem either, so hopefully our algorithm will have low variance. Putting these two together, we end up with a low bias and low variance learning algorithm, and this allows us to do well on the test set. Fundamentally, the key ingredients are, first, that the features have enough information and we have a rich class of functions, which is what guarantees low bias, and second, that we have a massive training set, which is what guarantees low variance.
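
Putting the argument into symbols, here is a compact restatement of the chain of reasoning above, in the same notation:

$$
\underbrace{J_{train}(\theta)\ \text{small}}_{\text{low bias: many parameters, informative features}}
\quad\text{and}\quad
\underbrace{J_{train}(\theta) \approx J_{test}(\theta)}_{\text{low variance: massive training set}}
\quad\Longrightarrow\quad
J_{test}(\theta)\ \text{small}
$$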

So this gives us a set of conditions, and hopefully some understanding, of the sort of problem where, if you have a lot of data and you train a learning algorithm with a lot of parameters, that can be a good way to get a high performance learning algorithm. Really, the key tests that I often ask myself are: first, can a human expert look at the features x and confidently predict the value of y? Because that's sort of a certification that y can be predicted accurately from the features x. And second, can we actually get a large training set and train a learning algorithm with a lot of parameters on that training set? If you can do both, then that will more often than not give you a very high performance learning algorithm.

<end>
