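1. pandas: data selection and filtering:
Introduces what the pandas library is used for and how to install it, then covers selecting and filtering data.

2. pandas: string extraction and manipulation:
This lesson covers extracting and otherwise manipulating string data with the pandas library.

3. pandas: scatter plots:
This lesson covers drawing scatter plots and trajectories with pandas; the library can also be used for plotting.

4. pandas: histograms:
This lesson covers drawing histograms, bar charts, and box plots with pandas.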
In [278]:
'''
Section 1:
# pandas: data selection and filtering ************************************
'''
In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
dates = pd.date_range('20160715', periods=6)
dates
Out[3]:
DatetimeIndex(['2016-07-15', '2016-07-16', '2016-07-17', '2016-07-18', '2016-07-19', '2016-07-20'], dtype='datetime64[ns]', freq='D', tz=None)
In [115]:
df = pd.DataFrame(np.random.rand(6, 4), index=dates, columns=list('ABCD'))
df
Out[115]:
| | A | B | C | D |
---|---|---|---|---|
2016-07-15 | 0.257509 | 0.264885 | 0.540292 | 0.485975 |
2016-07-16 | 0.629827 | 0.079777 | 0.338386 | 0.187553 |
2016-07-17 | 0.375727 | 0.700579 | 0.384695 | 0.909140 |
2016-07-18 | 0.418069 | 0.308024 | 0.451242 | 0.758287 |
2016-07-19 | 0.114625 | 0.397367 | 0.888026 | 0.358038 |
2016-07-20 | 0.454836 | 0.182236 | 0.158715 | 0.002074 |
In [12]:
df2 = pd.DataFrame({'A': np.random.randn(6)})  # define a DataFrame of random numbers
df2
Out[12]:
| | A |
---|---|
0 | -2.058364 |
1 | 0.817129 |
2 | 1.630002 |
3 | 0.673549 |
4 | 0.416836 |
5 | 0.033933 |
In [22]:
df['A']  # select column A
Out[22]:
2016-07-15    0.736079
2016-07-16    0.183782
2016-07-17    0.198436
2016-07-18    0.865754
2016-07-19    0.095199
2016-07-20    0.731607
Freq: D, Name: A, dtype: float64
In [25]:
df[1:3]  # select the rows at positions 1 and 2 (the end position 3 is excluded)
Out[25]:
| | A | B | C | D |
---|---|---|---|---|
2016-07-16 | 0.183782 | 0.740787 | 0.589655 | 0.167018 |
2016-07-17 | 0.198436 | 0.452805 | 0.275851 | 0.119994 |
In [27]:
df['20160715':'20160718']  # select the rows in a date range; both end dates are included
Out[27]:
| | A | B | C | D |
---|---|---|---|---|
2016-07-15 | 0.736079 | 0.959340 | 0.830599 | 0.627481 |
2016-07-16 | 0.183782 | 0.740787 | 0.589655 | 0.167018 |
2016-07-17 | 0.198436 | 0.452805 | 0.275851 | 0.119994 |
2016-07-18 | 0.865754 | 0.584943 | 0.381434 | 0.966995 |
In [30]:
df.loc['20160715':'20160718', ['A', 'B']]  # .loc slices by label
Out[30]:
| | A | B |
---|---|---|
2016-07-15 | 0.736079 | 0.959340 |
2016-07-16 | 0.183782 | 0.740787 |
2016-07-17 | 0.198436 | 0.452805 |
2016-07-18 | 0.865754 | 0.584943 |
In [31]:
df.loc['20160715', ['A']]  # .loc selects by label
Out[31]:
A    0.736079
Name: 2016-07-15 00:00:00, dtype: float64
In [38]:
df.at[dates[0], 'B']  # .at reads a single value by row label and column label (.iat is the position-based counterpart)
Out[38]:
0.95934014640962595
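
A minimal sketch of how label-based and position-based access differ. It assumes a frame with fixed values (hypothetical, chosen so the results are reproducible) instead of the random data above; `.iloc` and `.iat` are the positional counterparts of `.loc` and `.at` and do not appear in the notebook itself.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data so that the results are reproducible
dates = pd.date_range('20160715', periods=6)
demo = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

# Label-based slice: both end points are included (4 rows)
by_label = demo.loc['20160715':'20160718', ['A', 'B']]

# Position-based slice: the end position is excluded (rows 1 and 2)
by_position = demo.iloc[1:3, 0:2]

# Scalar access: .at takes labels, .iat takes integer positions
value_at = demo.at[dates[0], 'B']
value_iat = demo.iat[0, 1]
```

Both scalar lookups address the same cell here, once by label and once by position.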
In [40]:
df.head(3)  # show the first 3 rows
Out[40]:
| | A | B | C | D |
---|---|---|---|---|
2016-07-15 | 0.736079 | 0.959340 | 0.830599 | 0.627481 |
2016-07-16 | 0.183782 | 0.740787 | 0.589655 | 0.167018 |
2016-07-17 | 0.198436 | 0.452805 | 0.275851 | 0.119994 |
In [41]:
df.tail()  # show the last 5 rows (the default)
Out[41]:
| | A | B | C | D |
---|---|---|---|---|
2016-07-16 | 0.183782 | 0.740787 | 0.589655 | 0.167018 |
2016-07-17 | 0.198436 | 0.452805 | 0.275851 | 0.119994 |
2016-07-18 | 0.865754 | 0.584943 | 0.381434 | 0.966995 |
2016-07-19 | 0.095199 | 0.431479 | 0.394274 | 0.041155 |
2016-07-20 | 0.731607 | 0.018931 | 0.694380 | 0.189079 |
In [42]:
df.index  # the index of the DataFrame
Out[42]:
DatetimeIndex(['2016-07-15', '2016-07-16', '2016-07-17', '2016-07-18', '2016-07-19', '2016-07-20'], dtype='datetime64[ns]', freq='D', tz=None)
In [43]:
df.columns  # the column names
Out[43]:
Index([u'A', u'B', u'C', u'D'], dtype='object')
In [45]:
df.values  # the underlying data as a NumPy array
Out[45]:
array([[ 0.73607859,  0.95934015,  0.83059865,  0.62748078],
       [ 0.18378218,  0.74078748,  0.58965485,  0.16701782],
       [ 0.19843576,  0.4528047 ,  0.27585071,  0.1199943 ],
       [ 0.86575368,  0.58494297,  0.38143413,  0.96699532],
       [ 0.09519916,  0.43147948,  0.39427373,  0.0411551 ],
       [ 0.73160684,  0.01893109,  0.69437963,  0.18907934]])
In [46]:
df.describe()  # summary statistics for each column
Out[46]:
| | A | B | C | D |
---|---|---|---|---|
count | 6.000000 | 6.000000 | 6.000000 | 6.000000 |
mean | 0.468476 | 0.531381 | 0.527699 | 0.351954 |
std | 0.344089 | 0.318945 | 0.212599 | 0.364780 |
min | 0.095199 | 0.018931 | 0.275851 | 0.041155 |
25% | 0.187446 | 0.436811 | 0.384644 | 0.131750 |
50% | 0.465021 | 0.518874 | 0.491964 | 0.178049 |
75% | 0.734961 | 0.701826 | 0.668198 | 0.517880 |
max | 0.865754 | 0.959340 | 0.830599 | 0.966995 |
In [63]:
df.T  # transpose (swap rows and columns)
Out[63]:
| | 2016-07-15 00:00:00 | 2016-07-16 00:00:00 | 2016-07-17 00:00:00 | 2016-07-18 00:00:00 | 2016-07-19 00:00:00 | 2016-07-20 00:00:00 |
---|---|---|---|---|---|---|
A | 0.736079 | 0.183782 | 0.198436 | 0.865754 | 0.095199 | 0.731607 |
B | 0.959340 | 0.740787 | 0.452805 | 0.584943 | 0.431479 | 0.018931 |
C | 0.830599 | 0.589655 | 0.275851 | 0.381434 | 0.394274 | 0.694380 |
D | 0.627481 | 0.167018 | 0.119994 | 0.966995 | 0.041155 | 0.189079 |
In [ ]:
#df2 = pd.DataFrame([1,2,3],index=['a','b','c'])
#df2
In [60]:
df.sort_index()  # sort rows by index, ascending; replaces df.sort(), which recent pandas versions no longer provide
Out[60]:
| | A | B | C | D |
---|---|---|---|---|
2016-07-15 | 0.736079 | 0.959340 | 0.830599 | 0.627481 |
2016-07-16 | 0.183782 | 0.740787 | 0.589655 | 0.167018 |
2016-07-17 | 0.198436 | 0.452805 | 0.275851 | 0.119994 |
2016-07-18 | 0.865754 | 0.584943 | 0.381434 | 0.966995 |
2016-07-19 | 0.095199 | 0.431479 | 0.394274 | 0.041155 |
2016-07-20 | 0.731607 | 0.018931 | 0.694380 | 0.189079 |
In [64]:
df.sort_values('C')  # sort rows by column C, ascending; replaces df.sort('C'), which recent pandas versions no longer provide
Out[64]:
| | A | B | C | D |
---|---|---|---|---|
2016-07-17 | 0.198436 | 0.452805 | 0.275851 | 0.119994 |
2016-07-18 | 0.865754 | 0.584943 | 0.381434 | 0.966995 |
2016-07-19 | 0.095199 | 0.431479 | 0.394274 | 0.041155 |
2016-07-16 | 0.183782 | 0.740787 | 0.589655 | 0.167018 |
2016-07-20 | 0.731607 | 0.018931 | 0.694380 | 0.189079 |
2016-07-15 | 0.736079 | 0.959340 | 0.830599 | 0.627481 |
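
Sorting in recent pandas versions goes through `sort_values` (by column) and `sort_index` (by index). The sketch below shows the descending and multi-key variants; the frame `demo` and its values are hypothetical and only serve to make the results reproducible.

```python
import pandas as pd

# Hypothetical small frame with fixed values
demo = pd.DataFrame({'C': [0.5, 0.8, 0.2], 'D': [1, 3, 2]},
                    index=pd.date_range('20160715', periods=3))

# Sort rows by column C, largest first
by_c_desc = demo.sort_values('C', ascending=False)

# Sort rows by the index in descending order
by_index_desc = demo.sort_index(ascending=False)

# Sort by several columns: D descending, then C ascending
by_two_keys = demo.sort_values(['D', 'C'], ascending=[False, True])
```

Each call returns a new frame and leaves `demo` unchanged.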
In [77]:
df1 = pd.DataFrame(np.random.randn(6, 4), columns=list('abcd'))
df1
Out[77]:
| | a | b | c | d |
---|---|---|---|---|
0 | 1.225333 | -0.694005 | -0.868498 | 2.540235 |
1 | -0.089846 | 0.075165 | 0.722056 | 0.261062 |
2 | 1.087733 | -0.590180 | 0.139107 | -0.768135 |
3 | 1.587061 | -0.176495 | -0.784779 | 0.186754 |
4 | 0.978927 | 0.244961 | -0.576200 | 3.148124 |
5 | -0.057562 | 0.575155 | 0.550383 | -0.859195 |
In [78]:
df1[df1.d > 0]  # keep the rows where column d is positive
Out[78]:
| | a | b | c | d |
---|---|---|---|---|
0 | 1.225333 | -0.694005 | -0.868498 | 2.540235 |
1 | -0.089846 | 0.075165 | 0.722056 | 0.261062 |
3 | 1.587061 | -0.176495 | -0.784779 | 0.186754 |
4 | 0.978927 | 0.244961 | -0.576200 | 3.148124 |
In [97]:
df1[df1.c < 0][['a', 'b']]  # rows where column c is negative, columns a and b only
Out[97]:
| | a | b |
---|---|---|
0 | 1.225333 | -0.694005 |
3 | 1.587061 | -0.176495 |
4 | 0.978927 | 0.244961 |
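
Boolean filters like the ones above can be combined, and the row filter and column selection can be written as a single `.loc` call. The following is a sketch under assumptions: `demo` is a hypothetical frame with fixed values and the same column names as `df1`.

```python
import pandas as pd

# Hypothetical fixed data with the same column names as df1
demo = pd.DataFrame({'a': [1.0, -0.5, 2.0, 0.3],
                     'b': [-1.0, 0.2, 0.4, -0.7],
                     'c': [-0.9, 0.7, -0.1, 0.5],
                     'd': [2.5, 0.3, -0.8, -0.2]})

# Combine conditions with & (and) or | (or); each condition needs parentheses
both = demo[(demo['c'] < 0) & (demo['d'] > 0)]
either = demo[(demo['c'] < 0) | (demo['d'] > 0)]

# Row filter and column selection in one .loc call
subset = demo.loc[demo['c'] < 0, ['a', 'b']]
```

The single `.loc` call selects the same rows and columns as chaining two `[]` lookups, and it is the form to prefer when the result will later be assigned to.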
In [280]:
'''
Section 2:
# pandas: string extraction and manipulation ************************************
'''
In [227]:
s = pd.Series(list('ABCDEF'))
s
Out[227]:
0    A
1    B
2    C
3    D
4    E
5    F
dtype: object
In [228]:
s.str.lower()  # .lower() converts each string to lowercase
Out[228]:
0    a
1    b
2    c
3    d
4    e
5    f
dtype: object
In [229]:
s.str.upper()  # .upper() converts each string to uppercase
Out[229]:
0    A
1    B
2    C
3    D
4    E
5    F
dtype: object
In [230]:
s.str.len()  # .str.len() returns the length of each string
Out[230]:
0    1
1    1
2    1
3    1
4    1
5    1
dtype: int64
In [231]:
s.str.split('')  # .split('') splits each string; the separator goes between the quotes
Out[231]:
0    [A]
1    [B]
2    [C]
3    [D]
4    [E]
5    [F]
dtype: object
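
Splitting is easier to see with an actual separator. The sketch below assumes a hypothetical Series `codes` whose values are joined by an underscore; `expand=True` is a parameter of `str.split` that returns the parts as DataFrame columns instead of lists.

```python
import pandas as pd

# Hypothetical data: values joined by an underscore
codes = pd.Series(['a_1', 'b_2', 'c_3'])

# Each element becomes a list of parts
parts = codes.str.split('_')

# expand=True spreads the parts over DataFrame columns
table = codes.str.split('_', expand=True)

# .str[i] picks the i-th part from each list
first = parts.str[0]
```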
In [232]:
s.str.replace('A', 'Z')  # .replace() replaces the matched text with another string
Out[232]:
0    Z
1    B
2    C
3    D
4    E
5    F
dtype: object
In [233]:
s1 = pd.Series(['a1', 'a2', 'a3', 'a4'])  # create a new Series
s1
Out[233]:
0    a1
1    a2
2    a3
3    a4
dtype: object
In [238]:
s1.str.extract(r'[ab](\d)', expand=False)  # extract the digit that follows 'a' or 'b'; expand=False returns a Series for a single group
Out[238]:
0    1
1    2
2    3
3    4
dtype: object
In [239]:
s1.str.extract(r'([abc])(\d)')  # extract two groups: the letter and the digit, one column per group
Out[239]:
| | 0 | 1 |
---|---|---|
0 | a | 1 |
1 | a | 2 |
2 | a | 3 |
3 | a | 4 |
In [240]:
s1.str.extract(r'(?P<str>[abc])(?P<digit>\d)')  # extract with named groups; the group names become the column names
Out[240]:
| | str | digit |
---|---|---|
0 | a | 1 |
1 | a | 2 |
2 | a | 3 |
3 | a | 4 |
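
What happens when an element does not match the pattern can be sketched as follows. The Series `items` and the group name `letter` are hypothetical; the last element is deliberately chosen so that it does not match.

```python
import pandas as pd

# Hypothetical data: the last element has no digit and will not match
items = pd.Series(['a1', 'b2', 'c'])

# Named groups become column names; expand=True always returns a DataFrame
parsed = items.str.extract(r'(?P<letter>[abc])(?P<digit>\d)', expand=True)

# Rows without a match are filled with missing values in every group column
unmatched = parsed['digit'].isna()

# Extracted values are strings; convert explicitly when numbers are needed
digits = pd.to_numeric(parsed['digit'])
```

Because the pattern must match as a whole, `'c'` yields missing values in both columns even though the first group alone would have matched.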
In [253]:
s2 = pd.Series(['a', 'B', 'c', 'd'])
a_z = r'[a-z]'  # regular expression
s2.str.contains(a_z)  # .contains() tests whether each element contains a match for the pattern
Out[253]:
0     True
1    False
2     True
3     True
dtype: bool
In [267]:
s3 = pd.Series(['ab', 'Ba', 'ac', 'd'])
s3
Out[267]:
0    ab
1    Ba
2    ac
3     d
dtype: object
In [268]:
s3.str.contains('^a')  # str.contains('^a') checks each element against the regex '^a' (starts with 'a') and returns a boolean Series
Out[268]:
0     True
1    False
2     True
3    False
dtype: bool
In [271]:
s3.str.startswith('a')  # str.startswith('a') checks whether each element starts with 'a' and returns a boolean Series; same result as above
Out[271]:
0     True
1    False
2     True
3    False
dtype: bool
In [272]:
s3.str.endswith('a')  # str.endswith('a') checks whether each element ends with 'a' and returns a boolean Series
Out[272]:
0    False
1     True
2    False
3    False
dtype: bool
In [275]:
s3.str.contains('a$')  # str.contains('a$') does the same check with a regex anchored at the end of the string; same result as above
Out[275]:
0    False
1     True
2    False
3    False
dtype: bool
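The four cells above suggest that an anchored regular expression passed to `str.contains` and the literal `str.startswith` / `str.endswith` tests give the same answer on this Series. A minimal sketch that checks this directly, reusing the `s3` values from above; the case-insensitive call at the end is an added illustration and not part of the original notebook:

```python
import pandas as pd

s3 = pd.Series(['ab', 'Ba', 'ac', 'd'])

# Prefix test: regex anchored at the start vs. literal prefix.
starts_regex = s3.str.contains('^a')
starts_plain = s3.str.startswith('a')

# Suffix test: regex anchored at the end vs. literal suffix.
ends_regex = s3.str.contains('a$')
ends_plain = s3.str.endswith('a')

same_start = starts_regex.equals(starts_plain)
same_end = ends_regex.equals(ends_plain)

# Matching is case-sensitive by default; case=False relaxes that,
# so 'Ba' now counts as starting with 'b'.
starts_b_any_case = s3.str.contains('^b', case=False)

print(same_start, same_end)
print(starts_b_any_case.tolist())
```

The literal methods are the simpler choice when the prefix or suffix is fixed text; the regex form is useful when the pattern needs character classes or alternatives.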
In [281]:
'''
Section 3:
# pandas: scatter plots ************************************
'''
In [14]:
duqu = pd.read_csv('603318.XSHG_1m.csv')  # read a .csv file: pd.read_csv('path')
duqu.head()
Out[14]:
| | Unnamed: 0 | open | close | high | low | volume | money |
|---|---|---|---|---|---|---|---|
| 0 | 2015-04-24 09:30:00 | 9.39 | 9.39 | 9.39 | 9.39 | 100 | 939 |
| 1 | 2015-04-24 09:31:00 | 9.39 | 9.39 | 9.39 | 9.39 | 2000 | 18780 |
| 2 | 2015-04-24 09:32:00 | 9.39 | 9.39 | 9.39 | 9.39 | 1000 | 9390 |
| 3 | 2015-04-24 09:33:00 | 9.39 | 9.39 | 9.39 | 9.39 | 1000 | 9390 |
| 4 | 2015-04-24 09:34:00 | 9.39 | 9.39 | 9.39 | 9.39 | 0 | 0 |
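The `Unnamed: 0` column in the output above is the name pandas gives to a column whose header field is empty. The sketch below assumes the real file's header line starts with an empty field (inferred from that column name, not confirmed by the source) and uses a small in-memory stand-in, `csv_text`, built from the first rows shown above. It shows how `index_col=0` with `parse_dates=True` turns that column into a datetime index instead:

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file; the leading comma is the assumed
# empty header field that pandas reports as 'Unnamed: 0'.
csv_text = """,open,close,high,low,volume,money
2015-04-24 09:30:00,9.39,9.39,9.39,9.39,100,939
2015-04-24 09:31:00,9.39,9.39,9.39,9.39,2000,18780
2015-04-24 09:32:00,9.39,9.39,9.39,9.39,1000,9390
"""

# Default read: the timestamps become an ordinary column named 'Unnamed: 0'.
raw = pd.read_csv(io.StringIO(csv_text))

# Use the first column as the index and parse it as datetimes.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(raw.columns.tolist())
print(type(indexed.index).__name__)
```

With a datetime index, time-based selection and resampling become available, although the plotting cells in this section work with either form.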
In [306]:
pl = duqu.plot(kind='scatter', x='volume', y='money').get_figure()  # .plot(kind='<plot type>', x=<x column>, y=<y column>) draws a scatter plot
pl.savefig('1.jpg')  # save the figure
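The scatter-plot cell above depends on a CSV file that may not be at hand. As a rough, self-contained sketch of the same call pattern, the code below builds synthetic `volume` and `money` columns (made-up data, not the original stock data), selects matplotlib's non-interactive `Agg` backend so no display is needed, and writes to a temporary path. The names `demo` and `scatter_demo.png` are hypothetical:

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to files only
import numpy as np
import pandas as pd

# Synthetic stand-in data: money is roughly price * volume.
rng = np.random.RandomState(0)
volume = rng.randint(0, 50, size=200) * 100
price = 9.39 + rng.rand(200) * 0.5
demo = pd.DataFrame({'volume': volume, 'money': volume * price})

# DataFrame.plot returns a matplotlib Axes; get_figure() gives its Figure.
ax = demo.plot(kind='scatter', x='volume', y='money')
fig = ax.get_figure()

# Save into a temporary directory rather than the working directory.
out_path = os.path.join(tempfile.mkdtemp(), 'scatter_demo.png')
fig.savefig(out_path)
print(out_path)
```

Keeping the `Axes` in its own variable, instead of chaining `.get_figure()` immediately, makes it easy to set a title or axis labels before saving.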
In [20]:
'''
Section 4:
# pandas: bar charts / histograms / box plots ************************************
'''
In [9]:
df6 = pd.DataFrame(np.random.rand(10, 4), columns=list('ABCD'))
pd.set_option('mpl_style', 'default')  # set the plotting style (this option exists only in old pandas releases; newer versions removed it, and plt.style.use('ggplot') is a close substitute)
plt6 = df6.plot(kind='bar', stacked=True).get_figure()  # kind='bar' draws a bar chart; stacked=<bool> controls whether the bars are stacked
plt6.savefig('2.jpg')
In [19]:
df7 = pd.DataFrame(np.random.rand(100, 4), columns=list('abcd'))
d8 = df7['a'].hist().get_figure()  # draw a histogram of column 'a'
d8.savefig('3.jpg')
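To see the numbers behind a histogram like the one above without drawing anything, the bin counts can be computed with `np.histogram`. This is a sketch under two assumptions: it uses 10 equal-width bins (the default for both `Series.hist` and `np.histogram`), and it adds a fixed random seed, which the original cell does not have, so the result is reproducible:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # fixed seed: an addition for reproducibility
df7 = pd.DataFrame(rng.rand(100, 4), columns=list('abcd'))

# counts[i] is the number of values falling between edges[i] and edges[i + 1].
counts, edges = np.histogram(df7['a'].values, bins=10)
total = counts.sum()

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print('%.3f to %.3f: %d' % (left, right, n))
```

Each printed row corresponds to one bar: the bin's range sets the bar's position and width, and the count sets its height.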
In [33]:
pd.set_option('mpl_style', 'default')  # old-pandas style option; removed in newer releases
fig, ax = plt.subplots()
df9 = pd.DataFrame(np.random.rand(100, 2), columns=list('ab'))
df9.boxplot(ax=ax)  # draw a box plot
plt.show()
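A box plot summarises each column by its quartiles. The sketch below computes those values directly for data shaped like `df9`, with a fixed seed added for reproducibility. It assumes the common 1.5 × IQR whisker rule, which is matplotlib's default; the names `lower_fence` and `upper_fence` are illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # fixed seed: an addition for reproducibility
df9 = pd.DataFrame(rng.rand(100, 2), columns=list('ab'))

# Quartiles per column: the box spans Q1 to Q3, with a line at the median.
q = df9.quantile([0.25, 0.5, 0.75])
iqr = q.loc[0.75] - q.loc[0.25]

# Points beyond these fences are drawn as individual outliers
# (assuming the 1.5 * IQR whisker rule).
lower_fence = q.loc[0.25] - 1.5 * iqr
upper_fence = q.loc[0.75] + 1.5 * iqr
outliers = ((df9 < lower_fence) | (df9 > upper_fence)).sum()

print(q)
print(outliers)
```

For uniform random data on [0, 1) the fences usually fall outside the data range, so the outlier counts are typically zero and the whiskers simply reach the minimum and maximum.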