Basic Data Exploration (1)

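This post shows how to use Pandas for a first look at Melbourne home-price data: loading the data and viewing descriptive statistics (mean, standard deviation, quartiles, and so on) to help data scientists get familiar with how the data is distributed.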

1—Using Pandas to Get Familiar With Your Data 

The first step in any machine learning project is to familiarize yourself with the data. Pandas is the primary tool data scientists use for exploring and manipulating data, and most people abbreviate pandas in their code as pd. We'll do this with the following command:

import pandas as pd

Let's load and explore the data about home prices in Melbourne, Australia, with the following commands. In the hands-on exercises, I will apply the same process to a new dataset, possibly home-price data from Iowa or Florida.

Input:

import pandas as pd
melbourne_file_path = '/Users/mac/Desktop/melb_data.csv'
# read the data and store it in a DataFrame named melbourne_data
melbourne_data = pd.read_csv(melbourne_file_path)
# print a summary of the Melbourne data
melbourne_data.describe()

Output:

Out[5]: 
              Rooms         Price  ...    Longtitude  Propertycount
count  13580.000000  1.358000e+04  ...  13580.000000   13580.000000
mean       2.937997  1.075684e+06  ...    144.995216    7454.417378
std        0.955748  6.393107e+05  ...      0.103916    4378.581772
min        1.000000  8.500000e+04  ...    144.431810     249.000000
25%        2.000000  6.500000e+05  ...    144.929600    4380.000000
50%        3.000000  9.030000e+05  ...    145.000100    6555.000000
75%        3.000000  1.330000e+06  ...    145.058305   10331.000000
max       10.000000  9.000000e+06  ...    145.526350   21650.000000

[8 rows x 13 columns]
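If you don't have melb_data.csv on hand, the same read_csv + describe() workflow can be tried on a tiny inline CSV. This is a minimal sketch: the rows below are made-up values, and only the column names Rooms and Price are borrowed from the real file. io.StringIO lets read_csv treat a string like a file.

```python
import io

import pandas as pd

# Hypothetical stand-in for melb_data.csv: made-up rows, real column names
# Rooms and Price, plus a text column (Suburb) to show what describe() skips.
csv_text = """Rooms,Price,Suburb
2,850000,Abbotsford
3,1200000,Richmond
1,480000,Carlton
4,1650000,Kew
3,910000,Brunswick
"""

# read_csv accepts any file-like object, so StringIO works in place of a path.
data = pd.read_csv(io.StringIO(csv_text))

# describe() returns the same 8 statistics per numeric column as above.
summary = data.describe()
print(summary)
```

By default describe() only summarizes numeric columns, so the text column Suburb does not appear in the result.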

2—Interpreting Data Description

The results show 8 numbers for each numeric column in your original dataset.

count: Shows how many rows have non-missing values.

(The count includes only non-missing values, so a column whose count is lower than the number of rows has missing values. Values can be missing for many reasons; for example, the size of the 2nd bedroom wouldn't be collected when surveying a 1-bedroom house.)
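To see how count relates to missing values, here is a minimal sketch on a small hypothetical DataFrame (the numbers are invented; BuildingArea is used only as an example of a column that can have gaps). Comparing each column's count with len() of the DataFrame, or using isna().sum(), shows where values are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical listings: BuildingArea is missing (NaN) for two of four houses.
houses = pd.DataFrame({
    "Rooms": [1, 2, 3, 4],
    "BuildingArea": [50.0, np.nan, 120.0, np.nan],
})

stats = houses.describe()

print(len(houses))               # total number of rows
print(stats.loc["count"])        # non-missing values per column
print(houses.isna().sum())       # missing values per column
```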

mean: The average.

std: The standard deviation, which measures how numerically spread out the values are.
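As a rough check of what mean and std compute, this sketch recomputes both by hand on a few hypothetical prices and compares them with pandas. Note that pandas' std() defaults to the sample standard deviation (ddof=1), which divides the summed squared deviations by n - 1 rather than n.

```python
import math

import pandas as pd

# Hypothetical prices, for illustration only.
prices = pd.Series([650000, 903000, 1330000, 480000, 1200000])

# Mean: the sum of the values divided by how many there are.
mean = prices.sum() / len(prices)

# Sample standard deviation: square root of the summed squared deviations
# from the mean, divided by n - 1 (pandas' default, ddof=1).
squared_dev = ((prices - mean) ** 2).sum()
manual_std = math.sqrt(squared_dev / (len(prices) - 1))

print(mean, prices.mean())
print(manual_std, prices.std())
```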

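min: To interpret min, 25%, 50%, 75%, and max, imagine sorting each column from lowest to highest value. The first (smallest) value is the min.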

(Q: Quartile)

Q1-25%: If you go 25% of the way through the sorted list, you'll find a number that is bigger than 25% of the values and smaller than 75% of the values. (pronounced "25th percentile")

(For example, if my test score of 40 is at the 25th percentile, then 25% of the students scored below 40 and 75% scored above it.)

Q2-50%: If you go 50% of the way through the sorted list, you'll find a number that is bigger than 50% of the values and smaller than 50% of the values. This is the median.

Q3-75%: If you go 75% of the way through the sorted list, you'll find a number that is bigger than 75% of the values and smaller than 25% of the values.

max: The largest number.
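The "sort, then walk a fraction of the way through the list" idea can be sketched in a few lines. This assumes pandas' default quantile method, linear interpolation between neighboring sorted values, which is also what describe() uses; the room counts are hypothetical, and percentile_linear is an illustrative helper, not a pandas function. With repeated values or interpolation, the "bigger than 25% / smaller than 75%" description is only approximate.

```python
import pandas as pd

# Hypothetical room counts, for illustration only.
rooms = pd.Series([3, 1, 4, 2, 3, 2, 5, 3])

# Step 1: sort from lowest to highest, as the definitions describe.
sorted_rooms = rooms.sort_values().reset_index(drop=True)
print(sorted_rooms.tolist())


def percentile_linear(values, q):
    """Illustrative helper: walk q of the way through the sorted values,
    interpolating linearly when the position falls between two entries."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lower = int(pos)                      # index at or just before pos
    upper = min(lower + 1, len(s) - 1)    # next index, clamped to the end
    frac = pos - lower
    return s[lower] + (s[upper] - s[lower]) * frac


quartiles = rooms.quantile([0.25, 0.5, 0.75])
summary = rooms.describe()

for q in (0.25, 0.5, 0.75):
    print(q, percentile_linear(rooms.tolist(), q), quartiles.loc[q])

print(summary[["min", "25%", "50%", "75%", "max"]])
```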
