Coursera Machine Learning: Logistic Regression exercise (ex2)

GitHub: https://github.com/twomeng/logistic-regression-

ex2.m

%% Machine Learning Online Class - Exercise 2: Logistic Regression
%
%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the logistic
%  regression exercise. You will need to complete the following functions
%  in this exercise:
%
%     sigmoid.m
%     costFunction.m
%     predict.m
%     costFunctionReg.m
%
%  For this exercise, you will not need to change any code in this file,
%  or any other files other than those mentioned above.
%

%% Initialization
clear ; close all; clc

%% Load Data
%  The first two columns contain the exam scores and the third column
%  contains the label.

data = load('ex2data1.txt');
X = data(:, [1, 2]); y = data(:, 3);

%% ==================== Part 1: Plotting ====================
%  We start the exercise by first plotting the data to understand the
%  problem we are working with.

fprintf(['Plotting data with + indicating (y = 1) examples and o ' ...
         'indicating (y = 0) examples.\n']);

plotData(X, y);

% Put some labels
hold on;
% Labels and Legend
xlabel('Exam 1 score')
ylabel('Exam 2 score')

% Specified in plot order
legend('Admitted', 'Not admitted')
hold off;

fprintf('\nProgram paused. Press enter to continue.\n');
pause;


%% ============ Part 2: Compute Cost and Gradient ============
%  In this part of the exercise, you will implement the cost and gradient
%  for logistic regression. You need to complete the code in
%  costFunction.m

%  Setup the data matrix appropriately, and add ones for the intercept term
[m, n] = size(X);

% Add intercept term to X
X = [ones(m, 1) X];

% Initialize fitting parameters
initial_theta = zeros(n + 1, 1);

% Compute and display initial cost and gradient
[cost, grad] = costFunction(initial_theta, X, y);

fprintf('Cost at initial theta (zeros): %f\n', cost);
fprintf('Expected cost (approx): 0.693\n');
fprintf('Gradient at initial theta (zeros): \n');
fprintf(' %f \n', grad);
fprintf('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628\n');

% Compute and display cost and gradient with non-zero theta
test_theta = [-24; 0.2; 0.2];
[cost, grad] = costFunction(test_theta, X, y);

fprintf('\nCost at test theta: %f\n', cost);
fprintf('Expected cost (approx): 0.218\n');
fprintf('Gradient at test theta: \n');
fprintf(' %f \n', grad);
fprintf('Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n');

fprintf('\nProgram paused. Press enter to continue.\n');
pause;


%% ============= Part 3: Optimizing using fminunc  =============
%  In this exercise, you will use a built-in function (fminunc) to find the
%  optimal parameters theta.

%  Set options for fminunc
options = optimset('GradObj', 'on', 'MaxIter', 400);

%  Run fminunc to obtain the optimal theta
%  This function will return theta and the cost
[theta, cost] = ...
    fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);

% Print theta to screen
fprintf('Cost at theta found by fminunc: %f\n', cost);
fprintf('Expected cost (approx): 0.203\n');
fprintf('theta: \n');
fprintf(' %f \n', theta);
fprintf('Expected theta (approx):\n');
fprintf(' -25.161\n 0.206\n 0.201\n');

% Plot Boundary
plotDecisionBoundary(theta, X, y);

% Put some labels
hold on;
% Labels and Legend
xlabel('Exam 1 score')
ylabel('Exam 2 score')

% Specified in plot order
legend('Admitted', 'Not admitted')
hold off;

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

%% ============== Part 4: Predict and Accuracies ==============
%  After learning the parameters, you'll likely want to use them to predict
%  the outcomes on unseen data. In this part, you will use the logistic
%  regression model to predict the probability that a student with score 45
%  on exam 1 and score 85 on exam 2 will be admitted.
%
%  Furthermore, you will compute the training and test set accuracies of
%  our model.
%
%  Your task is to complete the code in predict.m

%  Predict probability for a student with score 45 on exam 1
%  and score 85 on exam 2

prob = sigmoid([1 45 85] * theta);
fprintf(['For a student with scores 45 and 85, we predict an admission ' ...
         'probability of %f\n'], prob);
fprintf('Expected value: 0.775 +/- 0.002\n\n');

% Compute accuracy on our training set
p = predict(theta, X);

fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
fprintf('Expected accuracy (approx): 89.0\n');
fprintf('\n');

plotData.m

function plotData(X, y)
%PLOTDATA Plots the data points X and y into a new figure
%   PLOTDATA(x,y) plots the data points with + for the positive examples
%   and o for the negative examples. X is assumed to be an Mx2 matrix.

% Create New Figure
figure; hold on;

% ====================== YOUR CODE HERE ======================
% Instructions: Plot the positive and negative examples on a
%               2D plot, using the option 'k+' for the positive
%               examples and 'ko' for the negative examples.
%

pos = find(y == 1); % find() returns the indices where y == 1
neg = find(y == 0); % indices where y == 0

plot(X(pos,1), X(pos,2), 'k+', 'LineWidth', 2, 'MarkerSize', 7);
plot(X(neg,1), X(neg,2), 'ko', 'LineWidth', 2, 'MarkerFaceColor', 'y', 'MarkerSize', 7);

% =========================================================================

hold off;

end
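Equivalently, logical indexing avoids the explicit find() calls and produces the same plot (a minor stylistic alternative, not the post author's version):

plot(X(y == 1, 1), X(y == 1, 2), 'k+', 'LineWidth', 2, 'MarkerSize', 7);
plot(X(y == 0, 1), X(y == 0, 2), 'ko', 'LineWidth', 2, 'MarkerFaceColor', 'y', 'MarkerSize', 7);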

 

costFunction.m

function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
%   J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
%   parameter for logistic regression and the gradient of the cost
%   w.r.t. the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Note: grad should have the same dimensions as theta
%
% Note: no learning rate (alpha) belongs here -- fminunc expects the raw
% gradient and chooses its own step sizes.

h = sigmoid(X * theta);                     % hypothesis, m x 1
J = (-1/m) * ( y' * log(h) + (1 - y)' * log(1 - h) );
grad = (1/m) * X' * (h - y);

% =============================================================

end
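For reference, the vectorized expressions implemented above correspond to the unregularized logistic regression cost and gradient, with g(z) = 1/(1 + e^{-z}) denoting the sigmoid:

J(\theta) = -\frac{1}{m}\left[ y^{T}\log g(X\theta) + (1-y)^{T}\log\big(1 - g(X\theta)\big) \right]

\nabla_{\theta} J(\theta) = \frac{1}{m} X^{T}\big(g(X\theta) - y\big)

costFunction also relies on sigmoid.m, which is on the exercise's list of files to complete but was not reproduced in the post. A minimal vectorized sketch:

function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   g = SIGMOID(z) returns the sigmoid of z, applied element-wise to
%   scalars, vectors, or matrices.

g = 1 ./ (1 + exp(-z));

end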

 

The optimization itself is done by the built-in function fminunc(). Because the options set 'GradObj' to 'on', fminunc expects the objective to return both the cost and its gradient, which is exactly the [J, grad] signature of costFunction above.

The decision boundary plotting function, plotDecisionBoundary.m (the linear-boundary branch):

if size(X, 2) <= 3
    % Only need 2 points to define a line, so choose two endpoints
    % just outside the range of the first feature
    plot_x = [min(X(:,2))-2,  max(X(:,2))+2];

    % Calculate the decision boundary line: solve
    % theta(1) + theta(2)*x1 + theta(3)*x2 = 0 for x2 at both endpoints
    plot_y = (-1./theta(3)).*(theta(2).*plot_x + theta(1));

    % Plot, and adjust axes for better viewing (two points determine a line)
    plot(plot_x, plot_y)

    % Legend, specific for the exercise
    legend('Admitted', 'Not admitted', 'Decision Boundary')
    axis([30, 100, 30, 100])
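The excerpt above covers only the linear case (size(X, 2) <= 3). For the regularized exercise, where mapFeature yields many columns, the provided file's else branch evaluates the model over a grid and draws the zero-level contour as the boundary; roughly (paraphrased from the provided plotDecisionBoundary.m, so treat the details as approximate):

else
    % Grid over which to evaluate theta' * mapFeature(u, v)
    u = linspace(-1, 1.5, 50);
    v = linspace(-1, 1.5, 50);

    z = zeros(length(u), length(v));
    for i = 1:length(u)
        for j = 1:length(v)
            z(i, j) = mapFeature(u(i), v(j)) * theta;
        end
    end
    z = z'; % transpose before calling contour

    % Plot z = 0: a single contour level marks the decision boundary
    contour(u, v, z, [0, 0], 'LineWidth', 2)
end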

predict.m

function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic
%regression parameters theta
%   p = PREDICT(theta, X) computes the predictions for X using a
%   threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)

m = size(X, 1); % Number of training examples

% You need to return the following variables correctly
p = zeros(m, 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned logistic regression parameters.
%               You should set p to a vector of 0's and 1's
%

% Compute h(x) for every example, then threshold at 0.5
p = sigmoid(X * theta);
for i = 1:m
    if p(i) >= 0.5
        p(i) = 1;
    else
        p(i) = 0;
    end
end
% =========================================================================
end
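The loop is correct, but in Octave/MATLAB the same thresholding is more idiomatically a single vectorized comparison, producing the same 0/1 vector:

p = sigmoid(X * theta) >= 0.5;   % logical vector of 0/1 predictions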

 

ex2_reg.m

%% Machine Learning Online Class - Exercise 2: Logistic Regression
%
%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the second part
%  of the exercise which covers regularization with logistic regression.
%
%  You will need to complete the following functions in this exercise:
%
%     sigmoid.m
%     costFunction.m
%     predict.m
%     costFunctionReg.m
%
%  For this exercise, you will not need to change any code in this file,
%  or any other files other than those mentioned above.
%

%% Initialization
clear ; close all; clc

%% Load Data
%  The first two columns contain the X values and the third column
%  contains the label (y).

data = load('ex2data2.txt');
X = data(:, [1, 2]); y = data(:, 3);

plotData(X, y);

% Put some labels
hold on;

% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')

% Specified in plot order
legend('y = 1', 'y = 0')
hold off;


%% =========== Part 1: Regularized Logistic Regression ============
%  In this part, you are given a dataset with data points that are not
%  linearly separable. However, you would still like to use logistic
%  regression to classify the data points.
%
%  To do so, you introduce more features to use -- in particular, you add
%  polynomial features to our data matrix (similar to polynomial
%  regression).
%

% Add Polynomial Features

% Note that mapFeature also adds a column of ones for us, so the intercept
% term is handled
X = mapFeature(X(:,1), X(:,2));

% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);

% Set regularization parameter lambda to 1
lambda = 1;

% Compute and display initial cost and gradient for regularized logistic
% regression
[cost, grad] = costFunctionReg(initial_theta, X, y, lambda);

fprintf('Cost at initial theta (zeros): %f\n', cost);
fprintf('Expected cost (approx): 0.693\n');
fprintf('Gradient at initial theta (zeros) - first five values only:\n');
fprintf(' %f \n', grad(1:5));
fprintf('Expected gradients (approx) - first five values only:\n');
fprintf(' 0.0085\n 0.0188\n 0.0001\n 0.0503\n 0.0115\n');

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

% Compute and display cost and gradient
% with all-ones theta and lambda = 10
test_theta = ones(size(X,2),1);
[cost, grad] = costFunctionReg(test_theta, X, y, 10);

fprintf('\nCost at test theta (with lambda = 10): %f\n', cost);
fprintf('Expected cost (approx): 3.16\n');
fprintf('Gradient at test theta - first five values only:\n');
fprintf(' %f \n', grad(1:5));
fprintf('Expected gradients (approx) - first five values only:\n');
fprintf(' 0.3460\n 0.1614\n 0.1948\n 0.2269\n 0.0922\n');

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

%% ============= Part 2: Regularization and Accuracies =============
%  Optional Exercise:
%  In this part, you will get to try different values of lambda and
%  see how regularization affects the decision boundary
%
%  Try the following values of lambda (0, 1, 10, 100).
%
%  How does the decision boundary change when you vary lambda? How does
%  the training set accuracy vary?
%

% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);

% Set regularization parameter lambda to 1 (you should vary this)
lambda = 1;

% Set Options
options = optimset('GradObj', 'on', 'MaxIter', 400);

% Optimize
[theta, J, exit_flag] = ...
    fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);

% Plot Boundary
plotDecisionBoundary(theta, X, y);
hold on;
title(sprintf('lambda = %g', lambda))

% Labels and Legend
xlabel('Microchip Test 1')
ylabel('Microchip Test 2')

legend('y = 1', 'y = 0', 'Decision boundary')
hold off;

% Compute accuracy on our training set
p = predict(theta, X);

fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
fprintf('Expected accuracy (with lambda = 1): 83.1 (approx)\n');
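ex2_reg.m relies on mapFeature.m, a helper supplied with the exercise that was not reproduced in the post. It maps the two inputs to every monomial x1^(i-j) * x2^j up to total degree 6, producing 28 columns, the first of which is all ones for the intercept. A sketch matching the provided implementation:

function out = mapFeature(X1, X2)
%MAPFEATURE Feature mapping function to polynomial features
%   MAPFEATURE(X1, X2) maps the two input features to polynomial terms
%   up to degree 6. Inputs X1, X2 must be the same size.

degree = 6;
out = ones(size(X1(:,1)));   % first column: intercept term
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (X1.^(i-j)) .* (X2.^j);
    end
end

end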

costFunctionReg.m

function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   J = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta

h = sigmoid(X * theta);

% The intercept term theta(1) must not be regularized, so zero it out in
% the penalty. (Note: lambda/(2*m) * theta.^2 is a vector, so adding it
% directly would make J a vector instead of a scalar.)
theta_reg = [0; theta(2:end)];

J = (-1/m) * ( y' * log(h) + (1 - y)' * log(1 - h) ) ...
    + (lambda / (2*m)) * (theta_reg' * theta_reg);
grad = (1/m) * X' * (h - y) + (lambda / m) * theta_reg;

% =============================================================

end
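With the intercept excluded from the penalty, the quantities computed here are (theta_0 in the math corresponds to theta(1) in MATLAB):

J(\theta) = -\frac{1}{m}\left[ y^{T}\log g(X\theta) + (1-y)^{T}\log\big(1 - g(X\theta)\big) \right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2

\frac{\partial J}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_0^{(i)}

\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)} + \frac{\lambda}{m}\theta_j \qquad (j \ge 1)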

 

Reposted from: https://www.cnblogs.com/twomeng/p/9533429.html
