Python Notes (5): Advanced pandas

令狐公子

Article link: https://blog.csdn.net/qq_14959801/article/details/51428615

The pow() method returns the value of x^y (x raised to the power y).

Syntax

The syntax of the pow() method is:

import math

math.pow( x, y )

Example

The following example shows how to use pow():

#!/usr/bin/python
import math   # This will import math module

print("math.pow(100, 2) : ", math.pow(100, 2))
print("math.pow(100, -2) : ", math.pow(100, -2))
print("math.pow(2, 4) : ", math.pow(2, 4))
print("math.pow(3, 0) : ", math.pow(3, 0))

Running the example above produces the following output:

math.pow(100, 2) :  10000.0
math.pow(100, -2) :  0.0001
math.pow(2, 4) :  16.0
math.pow(3, 0) :  1.0
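
Every result above is a float, even when both arguments are integers. As a small sketch of that behaviour (the variable names are only illustrative), math.pow can be compared with the built-in pow() and the ** operator, which keep integer arguments as integers:

```python
import math

# math.pow converts both arguments to float and always returns a float.
float_result = math.pow(2, 4)

# The built-in pow() and the ** operator keep integer results exact.
int_result = pow(2, 4)
op_result = 2 ** 4

# The built-in pow() also accepts an optional third argument (a modulus).
mod_result = pow(2, 4, 5)

print(float_result, int_result, op_result, mod_result)
```

For integer arithmetic the built-in forms are usually the better fit; math.pow is convenient when a float result is wanted anyway.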

pandas.read_csv('path',header=None) stops the first row of data from being treated as the header row; the columns are then labeled 0, 1, 2, ... instead.

In [1]: import pandas as pd

In [2]: pd.read_csv('D:\pydata\ch06\ex2.csv',header=None) # do not treat the first row as the header; the columns are labeled 0, 1, 2, ... instead

Out[2]:
   0   1   2   3      4
0  1   2   3   4  hello
1  5   6   7   8  world
2  9  10  11  12    foo

In [3]: pd.read_csv('D:\pydata\ch06\ex2.csv',names=['a','b','c','d','message']) # supply the column names

Out[3]:
   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo

In [4]: names=['a','b','c','d','message']

In [5]: pd.read_csv('D:\pydata\ch06\ex2.csv',names=names,index_col='message') # use the message column as the index

Out[5]:
         a   b   c   d
message
hello    1   2   3   4
world    5   6   7   8
foo      9  10  11  12
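
The three calls above can be tried without the file on disk. The following is a minimal sketch that assumes ex2.csv contains exactly the three rows shown in the output and feeds them to read_csv through an in-memory buffer; the variable names are illustrative only.

```python
import io

import pandas as pd

# In-memory stand-in for ex2.csv (contents assumed from the output shown above).
csv_text = "1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"
names = ["a", "b", "c", "d", "message"]

# header=None: the first row is data, and the columns are labeled 0..4.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# names=...: supply the column labels explicitly.
named = pd.read_csv(io.StringIO(csv_text), names=names)

# index_col=...: promote one of the named columns to the row index.
indexed = pd.read_csv(io.StringIO(csv_text), names=names, index_col="message")

print(no_header)
print(named)
print(indexed)
```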

In [6]: !cat D:\pydata\ch06\csv_mindex.csv
'cat' is not recognized as an internal or external command, operable program or batch file.

In [7]: parsed=pd.read_csv('D:\pydata\ch06\csv_mindex.csv',index_col=['key1','key2']) # use key1 and key2 together as a hierarchical index

In [8]: parsed
Out[8]:
           value1  value2
key1 key2
one  a          1       2
     b          3       4
     c          5       6
     d          7       8
two  a          9      10
     b         11      12
     c         13      14
     d         15      16
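
As a rough sketch of how the hierarchical index can be used once it is built, the snippet below recreates the data in memory (the CSV text is an assumption reconstructed from the output above, since the file itself could not be displayed) and selects rows by one or both index levels.

```python
import io

import pandas as pd

# Assumed contents of csv_mindex.csv, reconstructed from the parsed output.
csv_text = (
    "key1,key2,value1,value2\n"
    "one,a,1,2\n"
    "one,b,3,4\n"
    "one,c,5,6\n"
    "one,d,7,8\n"
    "two,a,9,10\n"
    "two,b,11,12\n"
    "two,c,13,14\n"
    "two,d,15,16\n"
)

parsed = pd.read_csv(io.StringIO(csv_text), index_col=["key1", "key2"])

# Select every row under the outer key 'one'.
outer = parsed.loc["one"]

# Select a single cell with a (key1, key2) tuple.
cell = parsed.loc[("two", "c"), "value2"]

print(outer)
print(cell)
```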

In [9]: list(open('D:\pydata\ch06\ex3.txt'))
Out[9]:
[' A B C\n',
'aaa -0.264438 -1.026059 -0.619500\n',
'bbb 0.927272 0.302904 -0.032399\n',
'ccc -0.264273 -0.386314 -0.217601\n',
'ddd -0.871858 -0.348382 1.100491\n']

In [10]: result=pd.read_table('D:\pydata\ch06\ex3.txt',sep='\s+')

In [11]: result
Out[11]:
            A         B         C
aaa -0.264438 -1.026059 -0.619500
bbb  0.927272  0.302904 -0.032399
ccc -0.264273 -0.386314 -0.217601
ddd -0.871858 -0.348382  1.100491
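
The file above is separated by a variable amount of whitespace, which is why the regular expression \s+ is passed as the separator. A minimal in-memory sketch of the same idea follows (the text is copied from the listing above; the exact spacing of the real file is an assumption). Because the header line has one field fewer than the data lines, pandas uses the first column as the index.

```python
import io

import pandas as pd

# Whitespace-delimited text; the amount of spacing between fields varies.
text = (
    "            A         B         C\n"
    "aaa -0.264438 -1.026059 -0.619500\n"
    "bbb  0.927272  0.302904 -0.032399\n"
    "ccc -0.264273 -0.386314 -0.217601\n"
    "ddd -0.871858 -0.348382  1.100491\n"
)

# A raw string keeps the backslash in the regex intact.
result = pd.read_csv(io.StringIO(text), sep=r"\s+")

print(result)
```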

In [12]: pd.read_csv('D:\pydata\ch06\ex4.csv')

Out[12]:
                                                                      # hey!
a                                                   b       c   d   message
# just wanted to make things more difficult for you NaN     NaN NaN     NaN
# who reads CSV files with computers                anyway? NaN NaN     NaN
1                                                   2       3   4     hello
5                                                   6       7   8     world
9                                                   10      11  12      foo

In [13]: pd.read_csv('D:\pydata\ch06\ex4.csv',skiprows=[0,2,3]) # skip rows 0, 2 and 3, which hold the comment lines

Out[13]:
   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo
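
A self-contained sketch of the skiprows call, assuming ex4.csv contains the seven lines that pycat prints for it later in this post. The row numbers passed to skiprows are zero-based positions in the raw file.

```python
import io

import pandas as pd

# Assumed contents of ex4.csv: comment lines at positions 0, 2 and 3.
csv_text = (
    "# hey!\n"
    "a,b,c,d,message\n"
    "# just wanted to make things more difficult for you\n"
    "# who reads CSV files with computers, anyway?\n"
    "1,2,3,4,hello\n"
    "5,6,7,8,world\n"
    "9,10,11,12,foo\n"
)

# Skip the three comment lines so that line 1 becomes the header.
cleaned = pd.read_csv(io.StringIO(csv_text), skiprows=[0, 2, 3])

print(cleaned)
```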

In [14]: pycat D:\pydata\ch06\ex5.csv # pycat prints the file contents
something,a,b,c,d,message
one,1,2,3,4,NA
two,5,6,,8,world
three,9,10,11,12,foo

In [15]: result=pd.read_csv('D:\pydata\ch06\ex5.csv') # missing values are read in as NaN

In [16]: result
Out[16]:
  something  a   b    c   d message
0       one  1   2    3   4     NaN
1       two  5   6  NaN   8   world
2     three  9  10   11  12     foo

In [17]: pd.isnull(result) # test each value for missing data
Out[17]:
  something      a      b      c      d message
0     False  False  False  False  False    True
1     False  False  False   True  False   False
2     False  False  False  False  False   False

In [18]: result=pd.read_csv('D:\pydata\ch06\ex5.csv',na_values=['NULL'])

In [19]: result
Out[19]:
  something  a   b    c   d message
0       one  1   2    3   4     NaN
1       two  5   6  NaN   8   world
2     three  9  10   11  12     foo

In [20]: sentinels={'message':['foo','NA'],'something':['two']} # per-column NA markers: foo and NA in message, and two in something, are read as NaN

In [21]: pd.read_csv('D:\pydata\ch06\ex5.csv',na_values=sentinels)
Out[21]:
  something  a   b    c   d message
0       one  1   2    3   4     NaN
1       NaN  5   6  NaN   8   world
2     three  9  10   11  12     NaN
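
The per-column sentinels can be checked with a short in-memory sketch. It reuses the ex5.csv text printed by pycat above; the names result and missing are illustrative.

```python
import io

import pandas as pd

# Contents of ex5.csv as printed by pycat above.
csv_text = (
    "something,a,b,c,d,message\n"
    "one,1,2,3,4,NA\n"
    "two,5,6,,8,world\n"
    "three,9,10,11,12,foo\n"
)

# A dict maps each column name to the extra strings that count as missing there.
sentinels = {"message": ["foo", "NA"], "something": ["two"]}
result = pd.read_csv(io.StringIO(csv_text), na_values=sentinels)

# Boolean mask of the missing cells.
missing = result.isna()

print(result)
print(missing)
```

The sentinel 'two' only applies to the something column, so a 'two' appearing in any other column would be left untouched.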

In [22]: pycat D:\pydata\ch06\ex6.csv # pycat prints the file contents

Error: no such file, variable, URL, history range or macro

In [1]: import pandas as pd
In [2]: pycat D:\pydata\ch06\ex4.csv
# hey!
a,b,c,d,message
# just wanted to make things more difficult for you
# who reads CSV files with computers, anyway?
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
In [3]: result=pd.read_csv('D:\pydata\ch06\ex6.csv')
In [4]: result
Out[4]:
one two three four key
0 0.467976 -0.038649 -0.295344 -1.824726 L
1 -0.358893 1.404453 0.704965 -0.200638 B
2 -0.501840 0.659254 -0.421691 -0.057688 G
3 0.204886 1.074134 1.388361 -0.982404 R
4 0.354628 -0.133116 0.283763 -0.837063 Q
5 1.817480 0.742273 0.419395 -2.251035 Q
6 -0.776764 0.935518 -0.332872 -1.875641 U
7 -0.913135 1.530624 -0.572657 0.477252 K
8 0.358480 -0.497572 -0.367016 0.507702 S
9 -1.740877 -1.160417 -1.637830 2.172201 G
10 0.240564 -0.328249 1.252155 1.072796 8
11 0.764018 1.165476 -0.639544 1.495258 R
12 0.571035 -0.310537 0.582437 -0.298765 1
13 2.317658 0.430710 -1.334216 0.199679 P
14 1.547771 -1.119753 -2.277634 0.329586 J
15 -1.310608 0.401719 -1.000987 1.156708 E
16 -0.088496 0.634712 0.153324 0.415335 B
17 -0.018663 -0.247487 -1.446522 0.750938 A
18 -0.070127 -1.579097 0.120892 0.671432 F
19 -0.194678 -0.492039 2.359605 0.319810 H
20 -0.248618 0.868707 -0.492226 -0.717959 W
21 -1.091549 -0.867110 -0.647760 -0.832562 C
22 0.641404 -0.138822 -0.621963 -0.284839 C
23 1.216408 0.992687 0.165162 -0.069619 V
24 -0.564474 0.792832 0.747053 0.571675 I
25 1.759879 -0.515666 -0.230481 1.362317 S
26 0.126266 0.309281 0.382820 -0.239199 L
27 1.334360 -0.100152 -0.840731 -0.643967 6
28 -0.737620 0.278087 -0.053235 -0.950972 J
29 -1.148486 -0.986292 -0.144963 0.124362 Y
... ... ... ... ... ..
9970 0.633495 -0.186524 0.927627 0.143164 4
9971 0.308636 -0.112857 0.762842 -1.072977 1
9972 -1.627051 -0.978151 0.154745 -1.229037 Z
9973 0.314847 0.097989 0.199608 0.955193 P
9974 1.666907 0.992005 0.496128 -0.686391 S
9975 0.010603 0.708540 -1.258711 0.226541 K
9976 0.118693 -0.714455 -0.501342 -0.254764 K
9977 0.302616 -2.011527 -0.628085 0.768827 H
9978 -0.098572 1.769086 -0.215027 -0.053076 A
9979 -0.019058 1.964994 0.738538 -0.883776 F
9980 -0.595349 0.001781 -1.423355 -1.458477 M
9981 1.392170 -1.396560 -1.425306 -0.847535 H
9982 -0.896029 -0.152287 1.924483 0.365184 6
9983 -2.274642 -0.901874 1.500352 0.996541 N
9984 -0.301898 1.019906 1.102160 2.624526 I
9985 -2.548389 -0.585374 1.496201 -0.718815 D
9986 -0.064588 0.759292 -1.568415 -0.420933 E
9987 -0.143365 -1.111760 -1.815581 0.435274 2
9988 -0.070412 -1.055921 0.338017 -0.440763 X
9989 0.649148 0.994273 -1.384227 0.485120 Q
9990 -0.370769 0.404356 -1.051628 -1.050899 8
9991 -0.409980 0.155627 -0.818990 1.277350 W
9992 0.301214 -1.111203 0.668258 0.671922 A
9993 1.821117 0.416445 0.173874 0.505118 X
9994 0.068804 1.322759 0.802346 0.223618 H
9995 2.311896 -0.417070 -1.409599 -0.515821 L
9996 -0.479893 -0.650419 0.745152 -0.646038 E
9997 0.523331 0.787112 0.486066 1.093156 K
9998 -0.362559 0.598894 -1.843201 0.887292 G
9999 -0.096376 -1.012999 -0.657431 -0.573315 0
[10000 rows x 5 columns]
In [6]: pd.read_csv('D:\pydata\ch06\ex6.csv',nrows=5)    # read only the first 5 rows
Out[6]:
one two three four key
0 0.467976 -0.038649 -0.295344 -1.824726 L
1 -0.358893 1.404453 0.704965 -0.200638 B
2 -0.501840 0.659254 -0.421691 -0.057688 G
3 0.204886 1.074134 1.388361 -0.982404 R
4 0.354628 -0.133116 0.283763 -0.837063 Q
In [7]: chunker=pd.read_csv('D:\pydata\ch06\ex6.csv',chunksize=1000)    # read the file in chunks of 1000 rows
In [8]: chunker
Out[8]: <pandas.io.parsers.TextFileReader at 0xa9070f0>

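In [10]: tot=pd.Series([])     # start from an empty Series
In [11]: for piece in chunker:
    ...:     tot=tot.add(piece['key'].value_counts(),fill_value=0)    # accumulate the count of each key
    ...: 
In [12]: tot=tot.order(ascending=False)    # sort in descending order (later pandas versions replaced Series.order with sort_values)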
In [13]: tot[:10]
Out[13]:
E 368
X 364
L 346
O 343
Q 340
M 338
J 337
F 335
K 334
H 330
dtype: float64
In [14]: tot[:15]
Out[14]:
E 368
X 364
L 346
O 343
Q 340
M 338
J 337
F 335
K 334
H 330
V 328
I 327
U 326
P 324
D 320
dtype: float64
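
The chunked counting loop above can be sketched for a current pandas release as follows. The ten-row data set is made up purely for illustration (ex6.csv itself is not reproduced here), and sort_values takes the place of the older Series.order.

```python
import io

import pandas as pd

# Hypothetical data: ten rows with a 'key' column (A appears 5x, B 3x, C 2x).
keys = ["A", "B", "A", "C", "A", "B", "A", "C", "A", "B"]
csv_text = "value,key\n" + "".join(f"{i},{k}\n" for i, k in enumerate(keys))

# chunksize makes read_csv return an iterator of DataFrames (4 rows each here).
chunker = pd.read_csv(io.StringIO(csv_text), chunksize=4)

# Start from an empty Series with an explicit dtype and add each chunk's counts.
tot = pd.Series(dtype="float64")
for piece in chunker:
    # fill_value=0 treats keys missing from either side as a count of zero.
    tot = tot.add(piece["key"].value_counts(), fill_value=0)

tot = tot.sort_values(ascending=False)
print(tot)
```

Only one chunk is held in memory at a time, which is the point of the technique for files too large to load whole.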

In [15]: data=pd.read_csv('D:\pydata\ch06\ex5.csv')
In [16]: data
Out[16]:
something a b c d message
0 one 1 2 3 4 NaN
1 two 5 6 NaN 8 world
2 three 9 10 11 12 foo
In [17]: data.to_csv('D:\pydata\ch06\out.csv')    # write the data to a new CSV file
In [18]: pycat D:\pydata\ch06\out.csv             # print the contents of the new CSV file
,something,a,b,c,d,message
0,one,1,2,3.0,4,
1,two,5,6,,8,world
2,three,9,10,11.0,12,foo

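In [19]: data.to_csv(sys.stdout,sep='|')
This raises an error, because sys has not been imported in this session.
In [20]: import os
In [21]: data.to_csv(os.sys.stdout,sep='|')     # print the data with '|' as the delimiter; os.sys is the sys module, so a plain import sys works as well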
|something|a|b|c|d|message
0|one|1|2|3.0|4|
1|two|5|6||8|world
2|three|9|10|11.0|12|foo
In [22]: data.to_csv(os.sys.stdout,na_rep='NULL')    # write missing values as NULL
,something,a,b,c,d,message
0,one,1,2,3.0,4,NULL
1,two,5,6,NULL,8,world
2,three,9,10,11.0,12,foo
In [23]: data.to_csv(os.sys.stdout,index=False,header=False)     # omit the row and column labels
one,1,2,3.0,4,
two,5,6,,8,world
three,9,10,11.0,12,foo
In [24]: data.to_csv(os.sys.stdout,index=False,cols=['a','b','c'])     # write only some of the columns (cols is deprecated in favor of columns, as the warning below says)
a,b,c
1,2,3.0
5,6,
9,10,11.0
C:\Anaconda\lib\site-packages\pandas\util\decorators.py:53: FutureWarning: cols is deprecated, use columns instead
warnings.warn(msg, FutureWarning)
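
The to_csv options shown above can be exercised without touching the disk. This is a minimal sketch that assumes a current pandas release, where columns replaces the deprecated cols and where to_csv returns the CSV text as a string when no path is given; the variable names are illustrative.

```python
import io
import sys

import pandas as pd

# Contents of ex5.csv as printed earlier in the post.
csv_text = (
    "something,a,b,c,d,message\n"
    "one,1,2,3,4,NA\n"
    "two,5,6,,8,world\n"
    "three,9,10,11,12,foo\n"
)
data = pd.read_csv(io.StringIO(csv_text))

# With no path argument, to_csv returns the CSV text instead of writing a file.
piped = data.to_csv(sep="|")                                # custom delimiter
with_null = data.to_csv(na_rep="NULL")                      # marker for missing values
subset = data.to_csv(index=False, columns=["a", "b", "c"])  # selected columns only

# Passing sys.stdout writes straight to the console.
data.to_csv(sys.stdout, index=False, header=False)
```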

In [25]: dates=pd.date_range('1/1/2000',periods=7)
In [28]: import numpy
In [29]: ts=pd.Series(numpy.arange(7),index=dates)
In [30]: ts
Out[30]:
2000-01-01 0
2000-01-02 1
2000-01-03 2
2000-01-04 3
2000-01-05 4
2000-01-06 5
2000-01-07 6
Freq: D, dtype: int32
In [32]: ts.to_csv('D:\pydata\ch06\tseries.csv')    # fails: '\t' in the path is interpreted as a tab character
---------------------------------------------------------------------------
In [35]: import os
In [36]: os.getcwd()    # get the current working directory
Out[36]: 'C:\\Users\\JackZhang\\Documents\\Python Scripts'
In [37]: ts.to_csv('D:\\pydata\\ch06\\tseries.csv')           # with the backslashes escaped, the Series is written to the CSV file
In [39]: pycat D:\pydata\ch06\tseries.csv
2000-01-01,0
2000-01-02,1
2000-01-03,2
2000-01-04,3
2000-01-05,4
2000-01-06,5
2000-01-07,6
In [41]: pd.Series.from_csv('D:\\pydata\\ch06\\tseries.csv',parse_dates=True)    # another way to read a CSV file (Series.from_csv was removed in later pandas versions; use pd.read_csv there)
Out[41]:
2000-01-01 0
2000-01-02 1
2000-01-03 2
2000-01-04 3
2000-01-05 4
2000-01-06 5
2000-01-07 6
dtype: int64
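
Since Series.from_csv no longer exists in recent pandas, the same round trip can be sketched with read_csv. This assumes a current pandas release and uses an in-memory buffer in place of tseries.csv; header=False reproduces the header-less file shown above.

```python
import io

import pandas as pd

dates = pd.date_range("1/1/2000", periods=7)
ts = pd.Series(range(7), index=dates)

# Write the Series without a header row, as in the file shown above.
buffer = io.StringIO()
ts.to_csv(buffer, header=False)
buffer.seek(0)

# Read it back: column 0 becomes a parsed date index, and the one remaining
# column is pulled out as a Series.
frame = pd.read_csv(buffer, header=None, index_col=0, parse_dates=True)
restored = frame.iloc[:, 0]

print(restored)
```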

import pandas as pd

import numpy

pycat D:\pydata\ch06\ex7.csv
"a","b","c"
"1","2","3"
"1","2","3","4"

import csv            # read the file with the csv module; unlike pd.read_csv('path',header=None), each row comes back as a list of strings
f=open('D:\pydata\ch06\ex7.csv')
reader=csv.reader(f)
for line in reader:
    print(line)
    
['a', 'b', 'c']
['1', '2', '3']
['1', '2', '3', '4']

lines=list(csv.reader(open('D:\pydata\ch06\ex7.csv')))
lines
Out[9]: [['a', 'b', 'c'], ['1', '2', '3'], ['1', '2', '3', '4']]

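header,values=lines[0],lines[1:]         # unpack the header row and the data rows in one assignment

data_dict={h:v for h,v in zip(header,zip(*values))}     # build a dict of columns; zip stops at the shortest row, so the extra '4' is dropped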

data_dict
Out[12]: {'a': ('1', '1'), 'b': ('2', '2'), 'c': ('3', '3')}
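
The header/values split and the transposition with zip can be reproduced without the file. A small sketch, using the ex7.csv text shown above in an in-memory buffer:

```python
import csv
import io

# Contents of ex7.csv as shown above; the last row has one field too many.
raw = '"a","b","c"\n"1","2","3"\n"1","2","3","4"\n'

lines = list(csv.reader(io.StringIO(raw)))
header, values = lines[0], lines[1:]

# zip(*values) transposes rows into columns and stops at the shortest row,
# so the stray fourth field of the last row is silently discarded.
data_dict = {h: v for h, v in zip(header, zip(*values))}

print(data_dict)
```

Note that every value stays a string; the csv module does no type conversion.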

# define your own reading rules (my_dialect is the class defined below)
reader=csv.reader(f,dialect=my_dialect,quoting=csv.QUOTE_NONE)

reader=csv.reader(f,delimiter='|')

# write a CSV file using your own rules
class my_dialect(csv.Dialect):
    lineterminator='\n'
    delimiter=';'
    quoting=csv.QUOTE_NONE
with open('mydata.csv','w') as f:
    writer=csv.writer(f,dialect=my_dialect)
    writer.writerow(('one','two','three'))
    writer.writerow(('1','2','3'))
    writer.writerow(('4','5','6'))
    writer.writerow(('7','8','9'))
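
To see the custom dialect in both directions, here is a hedged sketch that writes to an in-memory buffer and reads the same text back. The class name MyDialect and the buffer are illustrative stand-ins for my_dialect and mydata.csv; the dialect settings are the same as above.

```python
import csv
import io


class MyDialect(csv.Dialect):
    """Semicolon-delimited, newline-terminated, no quoting."""
    lineterminator = "\n"
    delimiter = ";"
    quoting = csv.QUOTE_NONE


# Write two rows with the dialect.
buffer = io.StringIO()
writer = csv.writer(buffer, dialect=MyDialect)
writer.writerow(("one", "two", "three"))
writer.writerow(("1", "2", "3"))
written = buffer.getvalue()

# Read the same text back with the same dialect.
buffer.seek(0)
rows = list(csv.reader(buffer, dialect=MyDialect))

print(written)
print(rows)
```

With QUOTE_NONE and no escapechar, the writer raises an error if a field itself contains the delimiter, so this dialect only suits data known to be free of semicolons.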

Saving the data from a CSV file in binary (pickle) format and loading it back

frame=pd.read_csv('D:\pydata\ch06\ex1.csv')
frame
Out[29]: 
   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo

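frame.save('D:\pydata\ch06\pick')    # this call failed for the author, who then switched to a name ending in _pickle

frame.save('D:\pydata\ch06\f_pickle')      # wrong path: '\f' is interpreted as a form-feed character
frame.save('D:\\pydata\\ch06\\f_pickle')   # mind the path format: escape the backslashes


pd.load('D:\\pydata\\ch06\\f_pickle')   # restore the frame from the binary file (later pandas versions use to_pickle and read_pickle)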
Out[34]: 
   a   b   c   d message
0  1   2   3   4   hello
1  5   6   7   8   world
2  9  10  11  12     foo
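
On a current pandas release the same save-and-restore step can be sketched with DataFrame.to_pickle and pd.read_pickle. The temporary directory and the file name frame_pickle are assumptions made so that the example runs anywhere; the frame contents are copied from the output above.

```python
import io
import os
import tempfile

import pandas as pd

# Contents of ex1.csv, reconstructed from the frame shown above.
csv_text = "a,b,c,d,message\n1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"
frame = pd.read_csv(io.StringIO(csv_text))

with tempfile.TemporaryDirectory() as tmp_dir:
    # os.path.join avoids hand-written backslashes in the path.
    path = os.path.join(tmp_dir, "frame_pickle")
    frame.to_pickle(path)            # serialize the frame to a binary file
    restored = pd.read_pickle(path)  # load it back

print(restored)
```

Pickle files should only be loaded from sources you trust, because unpickling can execute arbitrary code.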

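csv.reader and csv.writer are well worth learning!

with open('test.csv','wb') as myfile:       # opening with 'wb' instead of 'wt' avoids a blank line after every row (Python 2 on Windows; in Python 3 open with 'w' and newline='' instead)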
     mywriter=csv.writer(myfile)
     mywriter.writerow([3,'q'])
     mywriter.writerow([5,'j'])
     mylist=[[3,4,5],[6,7,8]]
     mywriter.writerows(mylist)

l=[7,8,9,0]

Out[64]: [7, 8, 9, 0]

a=[3,5,3,6]

b=['q','j',4,7]

with open('test.csv','wb') as myfile:
    i=0
    mywriter=csv.writer(myfile)    
    while i<4:                             # write one row per loop iteration
        mywriter.writerow([a[i],b[i],l[i]])
        i=i+1
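
In Python 3 the csv module expects a text-mode file opened with newline='' rather than 'wb'. The following sketch rewrites the loop above on that assumption, using zip instead of a manual counter and a temporary directory so that it does not overwrite anything; the lists are the ones defined above.

```python
import csv
import os
import tempfile

a = [3, 5, 3, 6]
b = ["q", "j", 4, 7]
l = [7, 8, 9, 0]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.csv")

    # newline='' lets the csv module control line endings, which prevents
    # the blank line after every row on Windows.
    with open(path, "w", newline="") as myfile:
        mywriter = csv.writer(myfile)
        for row in zip(a, b, l):
            mywriter.writerow(row)

    # Read the file back to check what was written.
    with open(path, newline="") as myfile:
        rows = list(csv.reader(myfile))

print(rows)
```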
