This article covers
- The sigmoid function and the logistic regression classifier
- Our first look at optimization
- The gradient descent optimization algorithm
- Dealing with missing values in our data
General approach to logistic regression
- Collect: Any method
- Prepare: Numeric values are needed for a distance calculation. A structured data format is best.
- Analyze: Any method
- Train: We'll spend most of the time training, where we try to find optimal coefficients to classify our data.
- Test: Classification is quick and easy once the training step is done.
- Use: This application needs to get some input data and output structured numeric values. Next, the application applies the simple regression calculation on this input data and determines which class the input data should belong to. The application then takes some action on the calculated class.
We'll look at two optimization algorithms: gradient ascent and stochastic gradient ascent. These optimization algorithms will be used to train our classifier.
5.1 Classification with logistic regression and the sigmoid function: a tractable step function
Logistic regression
- Pros: Computationally inexpensive, easy to implement, knowledge representation easy to interpret
- Cons: Prone to underfitting, may have low accuracy
- Works with: Numeric values, nominal values
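We'd like a function that takes all of our features and predicts the class. In the two-class case, the function will spit out a 0 or a 1. A hard step function does exactly that, but its instantaneous jump from 0 to 1 is difficult to handle. A function with similar behavior that is easier to deal with mathematically is called the sigmoid. The sigmoid is given by the following equation:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$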
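To get a feel for the shape, the short sketch below evaluates the sigmoid at a few inputs and compares it with a hard 0/1 step. The helper `step` and the sample inputs are illustrative assumptions, not part of the chapter's code.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def step(z):
    """Hard step function (hypothetical helper, for comparison only)."""
    return np.where(z >= 0, 1.0, 0.0)

z = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
print(sigmoid(z))  # rises smoothly from near 0 to near 1
print(step(z))     # jumps from 0 to 1 at z = 0
```

Both functions map large negative inputs toward 0 and large positive inputs toward 1; the sigmoid just gets there smoothly, which is what makes it workable for optimization.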
5.2 Using optimization to find the best regression coefficients
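The input to the sigmoid function described above will be z, where z is given by the following:

$$z = w_0 x_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n = w^T x$$

Here x is the input data and w holds the coefficients we want to find.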
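As a small worked example, the sketch below computes z for one sample as a weighted sum of its features and passes it through the sigmoid. The weights and the sample are made-up values chosen for illustration; in practice the weights come out of training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and one sample. x[0] = 1.0 is a constant term
# that pairs with the intercept weight w[0].
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 3.0, 2.0])

z = np.dot(w, x)                 # w0*x0 + w1*x1 + w2*x2
prob = sigmoid(z)                # estimated probability of class 1
label = 1 if prob > 0.5 else 0   # threshold at 0.5
print(z, prob, label)
```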
5.2.1 Gradient ascent
The first optimization algorithm we're going to look at is called gradient ascent. Gradient ascent is based on the idea that if we want to find the maximum point on a function, then the best way to move is in the direction of the gradient.
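We write the gradient with the symbol $\nabla$. The gradient of a function $f(x, y)$ is the vector of its partial derivatives:

$$\nabla f(x, y) = \begin{pmatrix} \dfrac{\partial f(x, y)}{\partial x} \\ \dfrac{\partial f(x, y)}{\partial y} \end{pmatrix}$$

Each gradient ascent step moves the weights a small distance, set by the step size $\alpha$, in the direction of the gradient:

$$w := w + \alpha \nabla_w f(w)$$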
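Before applying this to logistic regression, it may help to watch gradient ascent climb a function whose maximum we already know. The sketch below uses a toy function of two variables with its peak at (1, -2). The function, the starting point, and the step size are all assumptions picked for illustration.

```python
import numpy as np

def f(w):
    """Toy function with a single maximum at (1, -2)."""
    x, y = w
    return -(x - 1.0) ** 2 - (y + 2.0) ** 2

def grad_f(w):
    """Gradient of f: the vector of partial derivatives."""
    x, y = w
    return np.array([-2.0 * (x - 1.0), -2.0 * (y + 2.0)])

w = np.array([0.0, 0.0])   # arbitrary starting point
alpha = 0.1                # step size
for _ in range(200):
    w = w + alpha * grad_f(w)   # step in the direction of the gradient

print(w, f(w))  # w ends up very close to (1, -2)
```

With this step size, each update shrinks the distance to the maximum by a constant factor, so the loop settles on the peak well before it finishes.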
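Let's put this into action on our logistic regression classifier and some Python.
First, we need a dataset. The code below reads it from testSet.txt, where each line holds two numeric features followed by an integer class label.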
5.2.2 Train: using gradient ascent to find the best parameters
'''
Author: Maxwell Pan
Date: 2022-04-19 06:46:59
LastEditTime: 2022-04-19 10:10:03
FilePath: \cp05\logRegres.py
Description: Logistic regression
Software:VSCode,env:
'''
import numpy as np

# Logistic regression gradient ascent optimization functions
def loadDataSet():
    """Read testSet.txt: two features and an integer class label per line."""
    dataMat = []; labelMat = []
    with open('testSet.txt') as fr:  # the file is closed automatically
        for line in fr:
            lineArr = line.strip().split()
            # Prepend 1.0 so that weights[0] acts as the intercept term.
            dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
            labelMat.append(int(lineArr[2]))
    return dataMat, labelMat

def sigmoid(inX):
    return 1.0 / (1 + np.exp(-inX))

def gradAscent(dataMatIn, classLabels):
    # Convert to the NumPy matrix type so that * means matrix multiplication.
    # np.asmatrix is used because the np.mat alias is not available in NumPy 2.x.
    dataMatrix = np.asmatrix(dataMatIn)
    labelMat = np.asmatrix(classLabels).transpose()  # column vector, m x 1
    m, n = np.shape(dataMatrix)
    alpha = 0.001     # step size
    maxCycles = 500   # number of iterations
    weights = np.ones((n, 1))
    for k in range(maxCycles):
        h = sigmoid(dataMatrix * weights)  # matrix multiplication, m x 1
        error = labelMat - h
        weights = weights + alpha * dataMatrix.transpose() * error
    return weights
Type the following in your notebook:
import logRegres
dataArr,labelMat=logRegres.loadDataSet()
logRegres.gradAscent(dataArr,labelMat)
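If testSet.txt is not at hand, the same batch update can be tried on synthetic data. The sketch below is a variant that uses plain NumPy arrays and the `@` operator instead of the matrix type. The function name `grad_ascent`, the two Gaussian blobs, and the random seed are assumptions made for illustration and say nothing about the contents of the real data file.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_ascent(X, y, alpha=0.001, max_cycles=500):
    """Batch gradient ascent on the logistic log-likelihood."""
    weights = np.ones(X.shape[1])
    for _ in range(max_cycles):
        error = y - sigmoid(X @ weights)          # shape (m,)
        weights = weights + alpha * (X.T @ error)  # uses every sample per step
    return weights

# Synthetic stand-in data (an assumption, not the real testSet.txt):
# two Gaussian blobs, one per class, plus a leading column of 1.0.
rng = np.random.default_rng(0)
class0 = rng.normal(loc=-2.0, scale=1.0, size=(50, 2))
class1 = rng.normal(loc=2.0, scale=1.0, size=(50, 2))
X = np.hstack([np.ones((100, 1)), np.vstack([class0, class1])])
y = np.concatenate([np.zeros(50), np.ones(50)])

weights = grad_ascent(X, y)
predictions = (sigmoid(X @ weights) > 0.5).astype(float)
accuracy = np.mean(predictions == y)
print(weights, accuracy)
```

Every iteration touches all m samples, which is fine for a hundred points but is the cost that stochastic gradient ascent later avoids.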
5.2.3 Analyze: plotting the decision boundary
We're solving for a set of weights used to make a line that separates the different classes of data.
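The line itself comes from asking where the classifier is undecided: the sigmoid returns 0.5 when its input is 0, so the boundary is the set of points where w0 + w1*x1 + w2*x2 = 0. The sketch below solves that for x2. The helper name `boundary_x2` and the weight values are hypothetical and used only for illustration.

```python
import numpy as np

def boundary_x2(weights, x1):
    """Solve w0 + w1*x1 + w2*x2 = 0 for x2 (requires w2 != 0)."""
    w0, w1, w2 = np.asarray(weights, dtype=float).ravel()
    return (-w0 - w1 * x1) / w2

weights = [4.0, 0.5, -0.6]        # hypothetical values, not trained output
x1 = np.arange(-3.0, 3.0, 0.1)
x2 = boundary_x2(weights, x1)

# Every point on the line gives z = 0, i.e. a sigmoid output of 0.5.
z = weights[0] + weights[1] * x1 + weights[2] * x2
print(np.allclose(z, 0.0))
```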
# Plotting the logistic regression best-fit line and dataset.
def plotBestFit(wei):
    import matplotlib.pyplot as plt
    # Accept either the (n, 1) matrix from gradAscent or a 1-D array.
    weights = np.asarray(wei).flatten()
    dataMat, labelMat = loadDataSet()
    dataArr = np.array(dataMat)
    n = np.shape(dataArr)[0]
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if int(labelMat[i]) == 1:
            xcord1.append(dataArr[i, 1]); ycord1.append(dataArr[i, 2])
        else:
            xcord2.append(dataArr[i, 1]); ycord2.append(dataArr[i, 2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = np.arange(-3.0, 3.0, 0.1)
    # Boundary: 0 = w0 + w1*x1 + w2*x2, solved for x2.
    y = (-weights[0] - weights[1] * x) / weights[2]
    ax.plot(x, y)
    plt.xlabel('X1'); plt.ylabel('X2')
    plt.show()
import logRegres
import importlib
importlib.reload(logRegres)
weights = logRegres.gradAscent(dataArr,labelMat)
logRegres.plotBestFit(weights)
5.2.4 Train: stochastic gradient ascent
# Stochastic gradient ascent
def stocGradAscent0(dataMatrix, classLabels):
    dataMatrix = np.array(dataMatrix)
    m, n = np.shape(dataMatrix)
    alpha = 0.01
    weights = np.ones(n)
    for i in range(m):  # one update per training example
        h = sigmoid(sum(dataMatrix[i] * weights))
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatrix[i]
    return weights

def stocGradAscent1(dataMatrix, classLabels, numIter=150):
    dataMatrix = np.array(dataMatrix)
    m, n = np.shape(dataMatrix)
    weights = np.ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))  # examples not yet used in this pass
        for i in range(m):
            # The step size shrinks with each update but never reaches 0.
            alpha = 4 / (1.0 + j + i) + 0.01
            # Pick a random position among the remaining examples.
            randIndex = int(np.random.uniform(0, len(dataIndex)))
            sample = dataIndex[randIndex]
            h = sigmoid(sum(dataMatrix[sample] * weights))
            error = classLabels[sample] - h
            weights = weights + alpha * error * dataMatrix[sample]
            del dataIndex[randIndex]  # do not reuse it in this pass
    return weights
import logRegres
dataArr,labelMat=logRegres.loadDataSet()
weights=logRegres.stocGradAscent1(dataArr,labelMat)
logRegres.plotBestFit(weights)
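One detail of `stocGradAscent1` worth a closer look is its step size, which is recomputed on every update from the pass number j and the update number i. The sketch below isolates that expression in a hypothetical helper, `step_size`, and prints it at a few points. The sample (j, i) pairs are arbitrary.

```python
def step_size(j, i):
    """Step size used on update i of pass j, as in stocGradAscent1."""
    return 4 / (1.0 + j + i) + 0.01

for j, i in [(0, 0), (0, 99), (149, 99)]:
    print(j, i, round(step_size(j, i), 4))
```

Early updates take large steps and later ones take small steps, which damps the oscillation of the weights. The constant 0.01 keeps the step size from ever decaying to zero, so new data still has some influence after many passes.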
5.3 Example: estimating horse fatalities from colic
Example: using logistic regression to estimate horse fatalities from colic
1. Collect: Data file provided.
2. Prepare: Parse a text file in Python, and fill in missing values.
3. Analyze: Visually inspect the data.
4. Train: Use an optimization algorithm to find the best coefficients.
5. Test: To measure the success, we’ll look at error rate. Depending on the error rate, we may decide to go back to the training step to try to find better values for the regression coefficients by adjusting the number of iterations and step size.
6. Use: Building a simple command-line program to collect horse symptoms and output live/die diagnosis won’t be difficult. I’ll leave that up to you as an exercise.
5.3.1 Prepare: dealing with missing values in the data
Here are some options:
- Use the feature’s mean value from all the available data.
- Fill in the unknown with a special value like -1.
- Ignore the instance.
- Use a mean value from similar items.
- Use another machine learning algorithm to predict the value.
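The first three options are easy to try with NumPy. The sketch below assumes missing entries are marked with `np.nan` in a small made-up array. Both the marker and the data are assumptions for illustration, not the format of the horse colic files.

```python
import numpy as np

# Made-up data with missing entries marked as NaN.
X = np.array([[1.0, 2.0, np.nan],
              [3.0, np.nan, 6.0],
              [5.0, 4.0, 9.0]])
missing = np.isnan(X)

# Option 1: replace each gap with its column's mean over the known values.
col_means = np.nanmean(X, axis=0)
X_mean = np.where(missing, col_means, X)

# Option 2: replace each gap with a special value such as -1.
X_special = np.where(missing, -1.0, X)

# Option 3: ignore any instance (row) that has a missing value.
X_dropped = X[~missing.any(axis=1)]

print(X_mean)
print(X_special)
print(X_dropped)
```

Which option is reasonable depends on the classifier and on how much data can be thrown away. Dropping rows is the simplest but here it would discard two of the three samples.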
5.3.2 Test: classifying with logistic regression
def classifyVector(inX, weights):
    prob = sigmoid(sum(inX * weights))
    # Class 1 if the estimated probability exceeds 0.5, otherwise class 0.
    if prob > 0.5:
        return 1.0
    else:
        return 0.0

def colicTest():
    trainingSet = []; trainingLabels = []
    with open('horseColicTraining.txt') as frTrain:
        for line in frTrain:
            currLine = line.strip().split('\t')
            # Columns 0-20 are features; column 21 is the class label.
            trainingSet.append([float(currLine[i]) for i in range(21)])
            trainingLabels.append(float(currLine[21]))
    trainWeights = stocGradAscent1(np.array(trainingSet), trainingLabels, 500)
    errorCount = 0; numTestVec = 0.0
    with open('horseColicTest.txt') as frTest:
        for line in frTest:
            numTestVec += 1.0
            currLine = line.strip().split('\t')
            lineArr = [float(currLine[i]) for i in range(21)]
            # float() first, so labels written as '1' or '1.0' both parse.
            if int(classifyVector(np.array(lineArr), trainWeights)) != int(float(currLine[21])):
                errorCount += 1
    errorRate = float(errorCount) / numTestVec
    print("the error rate of this test is: %f" % errorRate)
    return errorRate

def multiTest():
    # Training is randomized, so average the error rate over several runs.
    numTests = 10; errorSum = 0.0
    for k in range(numTests):
        errorSum += colicTest()
    print("after %d iterations the average error rate is: %f" % (numTests, errorSum / float(numTests)))
import logRegres
import importlib
importlib.reload(logRegres)
logRegres.multiTest()
5.4 Summary
Logistic regression is finding best-fit parameters to a nonlinear function called the sigmoid. Methods of optimization can be used to find the best-fit parameters. Among the optimization algorithms, one of the most common algorithms is gradient ascent. Gradient ascent can be simplified with stochastic gradient ascent.
Stochastic gradient ascent can do as well as gradient ascent using far fewer computing resources. In addition, stochastic gradient ascent is an online algorithm; it can update what it has learned as new data comes in rather than reloading all of the data as in batch processing.
One major problem in machine learning is how to deal with missing values in the data. There’s no blanket answer to this question. It really depends on what you’re doing with the data. There are a number of solutions, and each solution has its own advantages and disadvantages.
In the next chapter we’re going to take a look at another classification algorithm similar to logistic regression. The algorithm is called support vector machines and is considered one of the best stock algorithms.