2. Explore Your Data

Using Pandas to Get Familiar With Your Data

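The first step in any machine learning project is to familiarize yourself with the data. You'll use the Pandas library for this. Pandas is the primary tool data scientists use for exploring and manipulating data. Most people abbreviate pandas in their code as pd. We do this with the following command.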

[1]

import pandas as pd

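The most important part of the Pandas library is the DataFrame. A DataFrame holds the type of data you might think of as a table. This is similar to a sheet in Excel or a table in a SQL database.
Pandas has powerful methods for most things you'll want to do with this type of data.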
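To make the table analogy concrete, here is a minimal sketch that builds a tiny DataFrame by hand. The name `toy_homes` and its values are hypothetical and chosen only for illustration; they are not rows from any real dataset.

```python
import pandas as pd

# Hypothetical toy data: each dictionary key becomes a column,
# and each list position becomes a row, like a small spreadsheet.
toy_homes = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "Price": [650000, 903000, 1330000],
})

# shape is (number of rows, number of columns)
print(toy_homes.shape)
# column labels, in the order they were defined
print(toy_homes.columns.tolist())
```

Each column of a DataFrame can be selected by name, for example `toy_homes["Price"]`, much like picking a column in a spreadsheet.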
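As an example, we'll look at data about home prices in Melbourne, Australia. In the hands-on exercises, you will apply the same processes to a new dataset, which has home prices in Iowa.
The example (Melbourne) data is at the file path ../input/melbourne-housing-snapshot/melb_data.csv.
We load and explore the data with the following commands: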

[2]

# save filepath to variable for easier access
melbourne_file_path = '../input/melbourne-housing-snapshot/melb_data.csv'
# read the data and store data in DataFrame titled melbourne_data
melbourne_data = pd.read_csv(melbourne_file_path) 
# print a summary of the data in Melbourne data
melbourne_data.describe()

               Rooms          Price       Distance       Postcode       Bedroom2       Bathroom            Car       Landsize   BuildingArea      YearBuilt      Lattitude     Longtitude  Propertycount
count   13580.000000   1.358000e+04   13580.000000   13580.000000   13580.000000   13580.000000   13518.000000   13580.000000    7130.000000    8205.000000   13580.000000   13580.000000   13580.000000
mean        2.937997   1.075684e+06      10.137776    3105.301915       2.914728       1.534242       1.610075     558.416127     151.967650    1964.684217     -37.809203     144.995216    7454.417378
std         0.955748   6.393107e+05       5.868725      90.676964       0.965921       0.691712       0.962634    3990.669241     541.014538      37.273762       0.079260       0.103916    4378.581772
min         1.000000   8.500000e+04       0.000000    3000.000000       0.000000       0.000000       0.000000       0.000000       0.000000    1196.000000     -38.182550     144.431810     249.000000
25%         2.000000   6.500000e+05       6.100000    3044.000000       2.000000       1.000000       1.000000     177.000000      93.000000    1940.000000     -37.856822     144.929600    4380.000000
50%         3.000000   9.030000e+05       9.200000    3084.000000       3.000000       1.000000       2.000000     440.000000     126.000000    1970.000000     -37.802355     145.000100    6555.000000
75%         3.000000   1.330000e+06      13.000000    3148.000000       3.000000       2.000000       2.000000     651.000000     174.000000    1999.000000     -37.756400     145.058305   10331.000000
max        10.000000   9.000000e+06      48.100000    3977.000000      20.000000       8.000000      10.000000  433014.000000   44515.000000    2018.000000     -37.408530     145.526350   21650.000000

Interpreting Data Description

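The results show 8 numbers for each column in the original dataset. The first number, the count, shows how many rows have non-missing values.
Missing values arise for many reasons. For example, the size of the second bedroom wouldn't be collected when surveying a one-bedroom house. We'll come back to the topic of missing data.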
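The relationship between the count and missing values can be checked on a small example. This is a sketch on hypothetical data: the `toy` frame and its `Bedroom2Size` column are invented for illustration and do not exist in the Melbourne file.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two homes have no recorded second-bedroom size
toy = pd.DataFrame({
    "Rooms": [1, 2, 3, 4],
    "Bedroom2Size": [np.nan, 9.0, 10.5, np.nan],
})

# count() ignores missing values, so it can be smaller than the row count
valid_counts = toy.count()
# isnull().sum() gives the number of missing values per column
missing_counts = toy.isnull().sum()

print(valid_counts)
print(missing_counts)
```

For every column, the valid count plus the missing count equals the number of rows, which is why a count below the row total in the summary signals missing data in that column.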
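The second value is the mean, which is the average. Under that, std is the standard deviation, which measures how spread out the values are.
To interpret the min, 25%, 50%, 75% and max values, imagine sorting each column from lowest to highest value. The first (smallest) value is the min. If you go a quarter of the way through the list, you'll find a value that is bigger than 25% of the values and smaller than 75% of the values. That is the 25% value (pronounced "25th percentile"). The 50th and 75th percentiles are defined analogously, and the max is the largest number.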
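The sorting description can be verified on a short list. The following sketch uses a hypothetical five-element Series so that each percentile lands exactly on one of the sorted values; with other lengths, pandas interpolates between neighbouring values by default.

```python
import pandas as pd

# Hypothetical values, deliberately unsorted
values = pd.Series([5, 1, 4, 2, 3])

# Sorted from lowest to highest: 1, 2, 3, 4, 5
sorted_values = values.sort_values().tolist()

# describe() reports count, mean, std, min, 25%, 50%, 75%, max
summary = values.describe()
print(summary)

# quantile(q) returns the value a fraction q of the way through the sorted data
first_quartile = values.quantile(0.25)
median = values.quantile(0.50)
third_quartile = values.quantile(0.75)
```

Note that the std reported by pandas is the sample standard deviation (it divides by n - 1), so for these five values it is the square root of 2.5.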

 

Your Turn

Get started on your first coding exercise.

 
