Basic Data Exploration (2)

1—Selecting Data for Modeling

Your dataset has too many variables to wrap your head around🤷🏼, or even to print out nicely🥀. How can you pare down that overwhelming amount of data to something you can understand🧐?

Let's start by selecting some variables using our intuition🤓.

To choose variables/columns, we’ll need to look at a list of all the columns🤯 in the dataset. This is done with the columns property of the DataFrame👇🏻:

#To choose variables/columns,
#we’ll need to look at a list of all the columns in the dataset.
import pandas as pd
melbourne_file_path = '/Users/mac/Desktop/melb_data.csv'
melbourne_data = pd.read_csv(melbourne_file_path) 
melbourne_data.columns
Out[1]: 
Index(['Suburb', 'Address', 'Rooms', 'Type', 'Price', 'Method', 'SellerG',
       'Date', 'Distance', 'Postcode', 'Bedroom2', 'Bathroom', 'Car',
       'Landsize', 'BuildingArea', 'YearBuilt', 'CouncilArea', 'Lattitude',
       'Longtitude', 'Regionname', 'Propertycount'],
      dtype='object')
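Because `melb_data.csv` lives on the author's machine, here is a minimal, self-contained sketch of the same idea on a tiny made-up DataFrame. The name `toy` and all its values are invented for illustration and are not taken from the Melbourne data. `columns` returns a pandas `Index`, `list(...)` turns it into a plain Python list, and `shape` tells you how many rows and columns you are dealing with.

```python
import pandas as pd

# Hypothetical toy data standing in for melb_data.csv (all values are invented)
toy = pd.DataFrame({
    'Rooms': [2, 3, 4],
    'Price': [850000.0, 1200000.0, 1500000.0],
    'Bathroom': [1.0, 2.0, 1.0],
    'Landsize': [156.0, 134.0, 120.0],
})

# The columns property returns a pandas Index of column labels
print(toy.columns)

# Convert it to a plain list when you want to loop over or compare the names
column_names = list(toy.columns)
print(column_names)  # ['Rooms', 'Price', 'Bathroom', 'Landsize']

# shape is (number of rows, number of columns)
print(toy.shape)  # (3, 4)
```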

2—Selecting The Prediction Target

Now, instead of choosing variables by intuition, let's start with what you're going to predict💁🏼‍♀️. This variable is called the prediction target, and by convention it is called y.

We can pull out a variable with dot-notation. This single column is stored in a Series, which is broadly like a DataFrame with only a single column of data.

#Selecting the prediction target: y = 'Price'
y = melbourne_data.Price
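Dot-notation is a shortcut for bracket-notation. The sketch below, again on an invented `toy` DataFrame rather than the real dataset, checks that both forms return the same Series. Brackets are the safer habit: they also work when a column name contains spaces or clashes with a DataFrame method (a column named `count`, for instance, would be shadowed by `DataFrame.count`).

```python
import pandas as pd

# Hypothetical toy data (invented values), not the real Melbourne dataset
toy = pd.DataFrame({
    'Rooms': [2, 3, 4],
    'Price': [850000.0, 1200000.0, 1500000.0],
})

# Dot-notation and bracket-notation select the same column
y_dot = toy.Price
y_bracket = toy['Price']

print(type(y_dot).__name__)     # Series
print(y_dot.equals(y_bracket))  # True
```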

(Interesting part😄)

3—Choosing "Features"

The columns that are fed into our model (and later used to make predictions) are called “features”. In our case, those are the columns used to determine the home price. By convention, the features are called X.

Sometimes you will use all columns except the target as features. Other times you'll be better off with fewer features.

#For now, we'll build a model with only a few features
melbourne_features = ['Rooms', 'Bathroom', 'Landsize', 
                      'Lattitude', 'Longtitude']
X = melbourne_data[melbourne_features]
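Both options mentioned above (a hand-picked list versus every column except the target) can be sketched on a small invented DataFrame. The names `toy`, `X_few` and `X_all` are hypothetical. Note that selecting with a list of names returns a DataFrame, while selecting a single name returns a Series.

```python
import pandas as pd

# Hypothetical toy data (invented values), not the real Melbourne dataset
toy = pd.DataFrame({
    'Rooms': [2, 3, 4],
    'Price': [850000.0, 1200000.0, 1500000.0],
    'Bathroom': [1.0, 2.0, 1.0],
    'Landsize': [156.0, 134.0, 120.0],
})

y = toy['Price']

# Option 1: hand-pick a few features; a list of names gives back a DataFrame
features = ['Rooms', 'Bathroom']
X_few = toy[features]

# Option 2: use every column except the prediction target
X_all = toy.drop(columns=['Price'])

print(type(X_few).__name__)  # DataFrame
print(list(X_all.columns))   # ['Rooms', 'Bathroom', 'Landsize']
print(X_few.shape, y.shape)  # (3, 2) (3,)
```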

- Is this the end of it? -No😂

Let's quickly review the data we'll use to predict house prices, using the describe method and the head method.

The describe method👀: 

#Summarize the features with the describe method
X.describe()
Out[2]: 
              Rooms      Bathroom       Landsize     Lattitude    Longtitude
count  13580.000000  13580.000000   13580.000000  13580.000000  13580.000000
mean       2.937997      1.534242     558.416127    -37.809203    144.995216
std        0.955748      0.691712    3990.669241      0.079260      0.103916
min        1.000000      0.000000       0.000000    -38.182550    144.431810
25%        2.000000      1.000000     177.000000    -37.856822    144.929600
50%        3.000000      1.000000     440.000000    -37.802355    145.000100
75%        3.000000      2.000000     651.000000    -37.756400    145.058305
max       10.000000      8.000000  433014.000000    -37.408530    145.526350
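The describe output is itself a DataFrame: its index holds the statistic names, and the 50% row is the median. A minimal sketch on invented numbers (`X_toy` is hypothetical) shows how to pull out single values with `.loc`. Comparing mean and median is a quick skew check; in the real output above, the Landsize mean (558) sits well above its median (440), a hint that a few very large lots pull the average up.

```python
import pandas as pd

# Hypothetical toy features (invented values), not the real Melbourne dataset
X_toy = pd.DataFrame({
    'Rooms': [2, 3, 3, 4],
    'Landsize': [0.0, 150.0, 200.0, 650.0],
})

summary = X_toy.describe()

# The index of the summary holds the statistic names
print(list(summary.index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# Pull single statistics with .loc[row, column]; '50%' is the median
print(summary.loc['50%', 'Rooms'])      # 3.0
print(summary.loc['mean', 'Landsize'])  # 250.0
```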

The head method👀: 

#Show the first five rows with the head method
X.head()
Out[3]: 
   Rooms  Bathroom  Landsize  Lattitude  Longtitude
0      2       1.0     202.0   -37.7996    144.9984
1      2       1.0     156.0   -37.8079    144.9934
2      3       2.0     134.0   -37.8093    144.9944
3      3       2.0      94.0   -37.7969    144.9969
4      4       1.0     120.0   -37.8072    144.9941
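head shows the first five rows by default; you can pass a different number, and tail does the same from the end. A short sketch on an invented DataFrame (`X_toy` is hypothetical):

```python
import pandas as pd

# Hypothetical toy features (invented values), not the real Melbourne dataset
X_toy = pd.DataFrame({
    'Rooms': [2, 2, 3, 3, 4, 5, 1],
    'Bathroom': [1.0, 1.0, 2.0, 2.0, 1.0, 3.0, 1.0],
})

print(X_toy.head())   # first 5 rows (the default)
print(X_toy.head(3))  # first 3 rows
print(X_toy.tail(2))  # last 2 rows
```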

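Visually checking your data with these commands is an important part of a data scientist's job. You'll often find surprises💡 in the dataset that deserve further inspection. Here, for example, the describe output shows at least one home with 0 bathrooms, a minimum Landsize of 0, and a maximum Landsize of 433014.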
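One way to follow up on such surprises is with boolean masks, which pull out the rows that match a condition. The sketch below uses an invented DataFrame (`X_toy` and its values are hypothetical) containing one zero-bathroom, zero-land row and one very large lot, then lists the suspicious rows, counts them, and shows the largest lots with `nlargest`.

```python
import pandas as pd

# Hypothetical toy features (invented values), not the real Melbourne dataset
X_toy = pd.DataFrame({
    'Rooms': [2, 3, 3, 4, 2],
    'Bathroom': [1.0, 0.0, 2.0, 1.0, 1.0],
    'Landsize': [156.0, 0.0, 134.0, 120.0, 50000.0],
})

# Rows with zero bathrooms or zero land size look suspicious
suspicious = X_toy[(X_toy['Bathroom'] == 0) | (X_toy['Landsize'] == 0)]
print(suspicious)

# Summing a boolean mask counts the rows where it is True
print((X_toy['Bathroom'] == 0).sum())  # 1
print((X_toy['Landsize'] == 0).sum())  # 1

# Inspect the largest lots to judge whether extreme values are plausible
print(X_toy.nlargest(2, 'Landsize'))
```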
