C1W1-11_logistic-regression-testing

cymx66688

Article link: https://blog.csdn.net/cymx66688/article/details/107941391

Now that you have your parameters $\theta$, you will use them to predict new data points. For example, given a new tweet, you will use $\theta$ to say whether the tweet is positive or negative. In doing so, you want to analyze whether your model generalizes well. In this video, we'll show you how to check that, and specifically how to compute the accuracy of your model. Let's take a look at how you can do this. For this, you will need X_val and Y_val, the data that was set aside during training, also known as the validation set, and $\theta$, the set of optimal parameters that you got from training on your data. First, you will compute the sigmoid function for X_val with parameters $\theta$, that is, $h(X_{val}, \theta)$. Then you will evaluate whether each value of $h$ is greater than or equal to a threshold value, often set to 0.5.
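As a rough illustration of the first step, here is a minimal sketch in plain Python of computing the sigmoid of X_val with parameters theta. The helper names `sigmoid` and `predict_proba`, the feature layout (a bias term followed by two features), and all numeric values are assumptions made for this example and do not come from the lecture.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(X_val, theta):
    """Return sigmoid(x . theta) for every row x of X_val."""
    return [
        sigmoid(sum(x_j * t_j for x_j, t_j in zip(x, theta)))
        for x in X_val
    ]


# Hypothetical validation rows: [bias, feature_1, feature_2].
X_val = [
    [1.0, 3.0, 1.0],
    [1.0, 0.0, 4.0],
    [1.0, 0.0, 0.0],
]
# Hypothetical parameters standing in for the result of training.
theta = [0.0, 0.5, -0.5]

h = predict_proba(X_val, theta)
print(h)
```

Each entry of `h` lies strictly between 0 and 1 and can be read as the model's estimated probability that the corresponding example is positive.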

For example, if your $h(X_{val}, \theta)$ is equal to the vector $[0.3, 0.8, 0.5, \ldots, h_m]^T$, with one entry per example in your validation set, you check whether each of its components is greater than or equal to 0.5. Is 0.3 greater than or equal to 0.5? No, so the first prediction is 0. Is 0.8 greater than or equal to 0.5? Yes, so the prediction for the second example is 1. Is 0.5 greater than or equal to 0.5? Yes, so the third prediction is 1, and so on. At the end, you will have a vector populated with zeros and ones indicating predicted negative and positive examples, respectively.
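The thresholding step can be sketched as follows, using the three example values 0.3, 0.8 and 0.5. The function name `predict_labels` is a hypothetical choice for this illustration.

```python
def predict_labels(h, threshold=0.5):
    """Map each value to 1 if it is >= threshold, otherwise 0."""
    return [1 if p >= threshold else 0 for p in h]


h = [0.3, 0.8, 0.5]  # first three example values
y_pred = predict_labels(h)
print(y_pred)  # [0, 1, 1]
```

Note that the comparison is "greater than or equal to", so a value of exactly 0.5 is predicted as positive.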

After building the predictions vector, you can compute the accuracy of your model over the validation set. To do so, you compare the predictions you have made with the true value for each observation in your validation data. If the values are equal, your prediction is correct. This metric gives an estimate of how often your logistic regression will work correctly on unseen data. So if your accuracy is equal to 0.5, it means that your model is expected to predict correctly 50 percent of the time.
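Written as a formula, the comparison described above amounts to the fraction of validation examples whose prediction matches the label. The notation here is an assumption for illustration: $m$ is the number of validation examples, $\hat{y}^{(i)}$ is the prediction for example $i$, $y_{val}^{(i)}$ is its true label, and $\mathbb{1}[\cdot]$ equals 1 when its condition holds and 0 otherwise.

$$
\text{accuracy} = \frac{1}{m} \sum_{i=1}^{m} \mathbb{1}\left[\hat{y}^{(i)} = y_{val}^{(i)}\right]
$$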

For instance, given Y_val and prediction vectors for five observations, you compare each pair of values and determine whether they match. The result is a vector with a single 0 in the third position, where the prediction and the label disagree, and 1 everywhere else. Next, you sum the number of times your predictions were right and divide that number by the total number of observations in your validation set. In this example, four of the five predictions are correct, so you get an accuracy of 80 percent.
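Here is a small sketch of that calculation. The two five-element vectors below are illustrative values chosen to reproduce the situation described (a single mismatch in the third position); they are not taken from the text, and the function name `accuracy` is likewise an assumption.

```python
def accuracy(y_pred, y_val):
    """Fraction of positions where prediction and label agree."""
    if len(y_pred) != len(y_val):
        raise ValueError("y_pred and y_val must have the same length")
    if not y_val:
        raise ValueError("validation set must not be empty")
    matches = [1 if p == y else 0 for p, y in zip(y_pred, y_val)]
    return sum(matches) / len(matches)


# Illustrative labels and predictions for five observations.
y_val = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

matches = [1 if p == y else 0 for p, y in zip(y_pred, y_val)]
print(matches)                  # [1, 1, 0, 1, 1]
print(accuracy(y_pred, y_val))  # 0.8
```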

Congratulations on finishing the first week of this specialization. You learned many concepts this week. First, you learned how to preprocess text. You learned how to extract features from that text, and how to use those extracted features to train a model. Then you learned how to test your model. In this week's programming exercise, you're going to get a chance to implement all the concepts that we spoke about. Feel free to go ahead and do the programming exercise. There's also an optional video at the end of this week which covers the intuition behind the cost function for logistic regression. If you don't want to watch that video, feel free to go to next week, where you will learn about a new classification …