Loading A CSV Into Pandas


Import modules

import pandas as pd
import numpy as np

Create a dataframe (which we will save to a csv file and then import)

raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'last_name': ['Miller', 'Jacobson', ".", 'Milner', 'Cooze'],
        'age': [42, 52, 36, 24, 73],
        'preTestScore': [4, 24, 31, ".", "."],
        'postTestScore': ["25,000", "94,000", 57, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'preTestScore', 'postTestScore'])
df
   first_name  last_name  age  preTestScore  postTestScore
0       Jason     Miller   42             4         25,000
1       Molly   Jacobson   52            24         94,000
2        Tina          .   36            31             57
3        Jake     Milner   24             .             62
4         Amy      Cooze   73             .             70

Save the dataframe as a csv file (here in a data directory one level above the working directory, which must already exist)

df.to_csv('../data/example.csv')
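To see why later loads show an extra column, here is a minimal sketch that prints the text to_csv produces. With no path argument, to_csv returns the CSV as a string instead of writing a file. The header line starts with an empty field (the unnamed row index), and "25,000" is quoted because it contains the delimiter. Passing index=False leaves the index out entirely.

```python
import pandas as pd

# Same data as the dataframe created above.
raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
            'last_name': ['Miller', 'Jacobson', '.', 'Milner', 'Cooze'],
            'age': [42, 52, 36, 24, 73],
            'preTestScore': [4, 24, 31, '.', '.'],
            'postTestScore': ['25,000', '94,000', 57, 62, 70]}
df = pd.DataFrame(raw_data)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv()
print(csv_text)

# index=False omits the row index, so there is no unnamed first column.
csv_no_index = df.to_csv(index=False)
print(csv_no_index)
```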

Load a csv

df = pd.read_csv('../data/example.csv')
df
   Unnamed: 0  first_name  last_name  age  preTestScore  postTestScore
0           0       Jason     Miller   42             4         25,000
1           1       Molly   Jacobson   52            24         94,000
2           2        Tina          .   36            31             57
3           3        Jake     Milner   24             .             62
4           4         Amy      Cooze   73             .             70
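The Unnamed: 0 column is the row index that to_csv wrote into the file. A minimal sketch of turning it back into the index with index_col=0; csv_text is a hypothetical stand-in for ../data/example.csv, reproduced by hand from what to_csv writes, and io.StringIO lets read_csv treat the string as a file.

```python
import io
import pandas as pd

# Hypothetical stand-in for '../data/example.csv' (the text to_csv writes).
csv_text = """,first_name,last_name,age,preTestScore,postTestScore
0,Jason,Miller,42,4,"25,000"
1,Molly,Jacobson,52,24,"94,000"
2,Tina,.,36,31,57
3,Jake,Milner,24,.,62
4,Amy,Cooze,73,.,70
"""

# index_col=0 uses the first column as the row index
# instead of loading it as a data column called "Unnamed: 0".
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df)
```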

Load a csv with header=None (the first line is read as data rather than as column names)

df = pd.read_csv('../data/example.csv', header=None)
df
     0           1          2    3             4              5
0  NaN  first_name  last_name  age  preTestScore  postTestScore
1  0.0       Jason     Miller   42             4         25,000
2  1.0       Molly   Jacobson   52            24         94,000
3  2.0        Tina          .   36            31             57
4  3.0        Jake     Milner   24             .             62
5  4.0         Amy      Cooze   73             .             70
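header=None is really intended for files that have no header line at all, where pandas then numbers the columns 0, 1, 2, and so on. A minimal sketch with a small hypothetical header-less file held in a string:

```python
import io
import pandas as pd

# Hypothetical header-less file: data rows only, no column names.
csv_text = """Jason,Miller,42
Molly,Jacobson,52
Tina,.,36
"""

# header=None: the first line is data, so the columns are labelled 0, 1, 2.
df = pd.read_csv(io.StringIO(csv_text), header=None)
print(df)
```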

Load a csv while specifying column names

df = pd.read_csv('../data/example.csv', names=['UID', 'First Name', 'Last Name', 'Age', 'Pre-Test Score', 'Post-Test Score'])
df
   UID  First Name  Last Name  Age  Pre-Test Score  Post-Test Score
0  NaN  first_name  last_name  age    preTestScore    postTestScore
1  0.0       Jason     Miller   42               4           25,000
2  1.0       Molly   Jacobson   52              24           94,000
3  2.0        Tina          .   36              31               57
4  3.0        Jake     Milner   24               .               62
5  4.0         Amy      Cooze   73               .               70
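Passing names= implies header=None, which is why the file's own header line appears as row 0 above. A minimal sketch of replacing the existing header instead, by also passing header=0; csv_text is a hypothetical stand-in for ../data/example.csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for '../data/example.csv' (the text to_csv writes).
csv_text = """,first_name,last_name,age,preTestScore,postTestScore
0,Jason,Miller,42,4,"25,000"
1,Molly,Jacobson,52,24,"94,000"
2,Tina,.,36,31,57
3,Jake,Milner,24,.,62
4,Amy,Cooze,73,.,70
"""

names = ['UID', 'First Name', 'Last Name', 'Age', 'Pre-Test Score', 'Post-Test Score']

# header=0 discards the file's header line and uses `names` in its place,
# so the old header is no longer read as a data row.
df = pd.read_csv(io.StringIO(csv_text), names=names, header=0)
print(df)
```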

Load a csv while setting the index column to UID

df = pd.read_csv('../data/example.csv', index_col='UID', names=['UID', 'First Name', 'Last Name', 'Age', 'Pre-Test Score', 'Post-Test Score'])
df
     First Name  Last Name  Age  Pre-Test Score  Post-Test Score
UID
NaN  first_name  last_name  age    preTestScore    postTestScore
0.0       Jason     Miller   42               4           25,000
1.0       Molly   Jacobson   52              24           94,000
2.0        Tina          .   36              31               57
3.0        Jake     Milner   24               .               62
4.0         Amy      Cooze   73               .               70
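When header=0 is passed alongside names=, the stray header row disappears and UID is parsed as integers, so rows can be looked up by their UID label. A minimal sketch, again using a hypothetical csv_text in place of ../data/example.csv:

```python
import io
import pandas as pd

# Hypothetical stand-in for '../data/example.csv' (the text to_csv writes).
csv_text = """,first_name,last_name,age,preTestScore,postTestScore
0,Jason,Miller,42,4,"25,000"
1,Molly,Jacobson,52,24,"94,000"
2,Tina,.,36,31,57
3,Jake,Milner,24,.,62
4,Amy,Cooze,73,.,70
"""

names = ['UID', 'First Name', 'Last Name', 'Age', 'Pre-Test Score', 'Post-Test Score']
df = pd.read_csv(io.StringIO(csv_text), names=names, header=0, index_col='UID')

# UID is now the integer index, so .loc selects a row by its UID.
print(df.loc[2])
```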

Load a csv while setting the index columns to First Name and Last Name

df = pd.read_csv('../data/example.csv', index_col=['First Name', 'Last Name'], names=['UID', 'First Name', 'Last Name', 'Age', 'Pre-Test Score', 'Post-Test Score'])
df
                      UID  Age  Pre-Test Score  Post-Test Score
First Name Last Name
first_name last_name  NaN  age    preTestScore    postTestScore
Jason      Miller     0.0   42               4           25,000
Molly      Jacobson   1.0   52              24           94,000
Tina       .          2.0   36              31               57
Jake       Milner     3.0   24               .               62
Amy        Cooze      4.0   73               .               70
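With two index columns the result has a MultiIndex, and a full row key is a tuple with one value per level. A minimal sketch, assuming a hypothetical csv_text in place of ../data/example.csv and header=0 so the old header line is dropped:

```python
import io
import pandas as pd

# Hypothetical stand-in for '../data/example.csv' (the text to_csv writes).
csv_text = """,first_name,last_name,age,preTestScore,postTestScore
0,Jason,Miller,42,4,"25,000"
1,Molly,Jacobson,52,24,"94,000"
2,Tina,.,36,31,57
3,Jake,Milner,24,.,62
4,Amy,Cooze,73,.,70
"""

names = ['UID', 'First Name', 'Last Name', 'Age', 'Pre-Test Score', 'Post-Test Score']
df = pd.read_csv(io.StringIO(csv_text), names=names, header=0,
                 index_col=['First Name', 'Last Name'])

# Sorting the MultiIndex keeps label-based lookups efficient.
df = df.sort_index()

# Look up one row with a (first name, last name) tuple.
print(df.loc[('Jason', 'Miller')])
```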

Load a csv while specifying "." as missing values

df = pd.read_csv('../data/example.csv', na_values=['.'])
pd.isnull(df)
   Unnamed: 0  first_name  last_name    age  preTestScore  postTestScore
0       False       False      False  False         False          False
1       False       False      False  False         False          False
2       False       False       True  False         False          False
3       False       False      False  False          True          False
4       False       False      False  False          True          False
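Marking "." as missing also changes column types: once those cells are NaN, preTestScore holds only numbers and NaN, so pandas parses it as float64 rather than as strings. A minimal sketch, using a hypothetical csv_text in place of ../data/example.csv:

```python
import io
import pandas as pd

# Hypothetical stand-in for '../data/example.csv' (the text to_csv writes).
csv_text = """,first_name,last_name,age,preTestScore,postTestScore
0,Jason,Miller,42,4,"25,000"
1,Molly,Jacobson,52,24,"94,000"
2,Tina,.,36,31,57
3,Jake,Milner,24,.,62
4,Amy,Cooze,73,.,70
"""

df = pd.read_csv(io.StringIO(csv_text), na_values=['.'])

# preTestScore is now numeric (float64, since NaN is a float).
print(df.dtypes)
# Count of missing values per column.
print(df.isna().sum())
```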

Load a csv while specifying "." and "NA" as missing values in the last_name column and "." as missing values in the preTestScore column. The dictionary keys must match the column names in the file's header; keys such as 'Last Name' would not match any column here, because no names= is passed.

sentinels = {'last_name': ['.', 'NA'], 'preTestScore': ['.']}
df = pd.read_csv('../data/example.csv', na_values=sentinels)
df
   Unnamed: 0  first_name  last_name  age  preTestScore  postTestScore
0           0       Jason     Miller   42           4.0         25,000
1           1       Molly   Jacobson   52          24.0         94,000
2           2        Tina        NaN   36          31.0             57
3           3        Jake     Milner   24           NaN             62
4           4         Amy      Cooze   73           NaN             70
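By default pandas still applies its built-in missing-value strings (such as "NA" and "NaN") to every column on top of the per-column dictionary. A minimal sketch of restricting detection to only the listed sentinels with keep_default_na=False; the dictionary keys use the file's own header names, and csv_text is a hypothetical stand-in for ../data/example.csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for '../data/example.csv' (the text to_csv writes).
csv_text = """,first_name,last_name,age,preTestScore,postTestScore
0,Jason,Miller,42,4,"25,000"
1,Molly,Jacobson,52,24,"94,000"
2,Tina,.,36,31,57
3,Jake,Milner,24,.,62
4,Amy,Cooze,73,.,70
"""

# Keys must match the column names in the file's header line.
sentinels = {'last_name': ['.', 'NA'], 'preTestScore': ['.']}

# keep_default_na=False: only the sentinels above count as missing,
# and only in the columns they are listed for.
df = pd.read_csv(io.StringIO(csv_text), na_values=sentinels, keep_default_na=False)
print(df.isna().sum())
```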

Load a csv while skipping the first 3 lines of the file (the header line counts as one of them, so the line for Tina becomes the header)

df = pd.read_csv('../data/example.csv', na_values=sentinels, skiprows=3)
df
   2  Tina       .  36  31  57
0  3  Jake  Milner  24   .  62
1  4   Amy   Cooze  73   .  70
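skiprows also accepts a list of 0-based line numbers, which skips only those lines. Since line 0 is the header, skipping lines 1 and 2 drops the Jason and Molly rows while keeping the column names. A minimal sketch, using a hypothetical csv_text in place of ../data/example.csv:

```python
import io
import pandas as pd

# Hypothetical stand-in for '../data/example.csv' (the text to_csv writes).
csv_text = """,first_name,last_name,age,preTestScore,postTestScore
0,Jason,Miller,42,4,"25,000"
1,Molly,Jacobson,52,24,"94,000"
2,Tina,.,36,31,57
3,Jake,Milner,24,.,62
4,Amy,Cooze,73,.,70
"""

# Skip file lines 1 and 2 (the first two data rows); line 0, the header, is kept.
df = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2], index_col=0)
print(df)
```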

Load a csv while interpreting "," inside numbers as a thousands separator

df = pd.read_csv('../data/example.csv', thousands=',')
df
   Unnamed: 0  first_name  last_name  age  preTestScore  postTestScore
0           0       Jason     Miller   42             4          25000
1           1       Molly   Jacobson   52            24          94000
2           2        Tina          .   36            31             57
3           3        Jake     Milner   24             .             62
4           4         Amy      Cooze   73             .             70
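The thousands separator matters because it lets postTestScore be parsed as integers rather than strings. Combined with na_values=['.'], both score columns come out numeric. A minimal sketch, assuming a hypothetical csv_text in place of ../data/example.csv:

```python
import io
import pandas as pd

# Hypothetical stand-in for '../data/example.csv' (the text to_csv writes).
csv_text = """,first_name,last_name,age,preTestScore,postTestScore
0,Jason,Miller,42,4,"25,000"
1,Molly,Jacobson,52,24,"94,000"
2,Tina,.,36,31,57
3,Jake,Milner,24,.,62
4,Amy,Cooze,73,.,70
"""

# thousands=',' strips the separator from "25,000" and "94,000";
# na_values=['.'] turns the "." placeholders into NaN.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, thousands=',', na_values=['.'])
print(df.dtypes)
```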

Reposted from: https://my.oschina.net/kaiwangic/blog/920271
