Notes on Python for Data Analysis (by Wes McKinney, Chinese translation by 唐学韬)

西瓜情怀总是籽

Category: Data Analysis. Tags: Python

Article link: https://blog.csdn.net/sinat_41842926/article/details/86499724

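I have only just started blogging and my knowledge and experience are limited, so I would be grateful if readers pointed out any mistakes. I also hope to use this platform to keep study notes that I can revisit later. This post collects my study notes on Python for Data Analysis.

Baidu Netdisk link for Python for Data Analysis: https://pan.baidu.com/s/1f3oUdE5ndidYouRf5AvorQ

No extraction code required

Rating: 5 stars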

NumPy Basic Functions

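Tab: auto-completion
help and ?: show help for an object
%run: run a script by giving its file name
Ctrl-A: move the cursor to the start of the line
Ctrl-E: move the cursor to the end of the line
Ctrl-K: delete the text from the cursor to the end of the line
Ctrl-U: clear all text on the current line
Ctrl-F: move the cursor forward one character
Ctrl-B: move the cursor back one character
Ctrl-L: clear the screen
%magic: detailed documentation for the magic commands
%logstart: start logging the session
%run -d: run the script and drop straight into the debugger
%time: time a single execution of a statement
%timeit: run a statement many times and report the average time
ndarray.astype: convert an array to another dtype
The Python keywords and / or do not work on boolean arrays; use & and | instead
np.dot: matrix product, e.g. np.dot(X.T, X) computes X'X
np.log: natural logarithm (ln)
np.log1p: log(1 + x)
np.ceil: ceiling, the smallest integer greater than or equal to each element
np.floor: floor, the largest integer less than or equal to each element
np.rint: round to the nearest integer (halves go to the nearest even integer), keeping the dtype
np.modf: return the fractional and integral parts as two separate arrays
np.power: raise the elements of A to the powers in B
np.copysign: copy the signs of the values in the second array onto the values in the first
np.where: take x where the condition is true, otherwise y
np.cumsum and np.cumprod do not aggregate; they produce an array of the intermediate results
np.all: test whether all values in an array are True
np.any: test whether one or more values in an array are True
np.sort: sort
np.unique: find the unique values in an array and return them sorted
np.in1d: test the membership of each value of one array in another, returning a boolean array (np.isin in newer NumPy)
np.intersect1d: the elements common to two arrays, returned sorted
np.union1d: union, returned sorted
np.setdiff1d: set difference
np.save and np.load: the two main functions for writing and reading array data on disk
np.savez: save several arrays into a single .npz archive (np.savez_compressed writes a compressed one)
np.loadtxt and np.savetxt: load and save text files
Linear algebra routines in np.linalg: trace, det, eig, inv, pinv, qr, svd, solve, lstsq (least squares)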
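
A minimal sketch of a few of the functions listed above. The sample arrays are made up for illustration and do not come from the book.

```python
import numpy as np

arr = np.array([-2.5, -0.5, 0.5, 1.5, 2.5])

# Element-wise boolean logic needs & and |; the keywords `and` / `or`
# raise ValueError on arrays with more than one element.
mask = (arr > -1) & (arr < 2)

# np.where takes the second argument where the condition is True,
# otherwise the third.
signs = np.where(arr > 0, 1, -1)

# np.rint rounds halves to the nearest even integer, not always upward.
rounded = np.rint(arr)

# np.modf returns (fractional parts, integral parts); both keep the sign.
frac, whole = np.modf(arr)

# np.copysign gives the magnitudes of the first argument the signs of the second.
flipped = np.copysign(np.array([1.0, 2.0, 3.0]), np.array([-1.0, 1.0, -1.0]))

# The set operations return sorted results.
common = np.intersect1d([3, 1, 2, 3], [2, 3, 4])
only_left = np.setdiff1d([3, 1, 2, 3], [2, 3, 4])

print(mask, signs, rounded, frac, whole, flipped, common, only_left)
```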

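pandas Basic Functions

Main data structures: Series and DataFrame
A Series exposes its array representation and its index object through the values and index attributes
The most important feature of a Series is that arithmetic operations automatically align differently indexed data
A Series can be created from an array, a dict, and so on
In a DataFrame, columns is the column index; use loc or iloc to work with rows
del removes a column from a DataFrame
When a nested dict is passed to DataFrame, the outer keys become the columns and the inner keys the row index
Index objects are immutable
reindex: create a new object conformed to a new index
drop: return a new object with the specified values removed from the given axis
DataFrame.apply: apply a function to the one-dimensional arrays formed by each column or row
sort_index: sort by the row or column index and return a new, sorted object
Series.order: sort by values (removed from later pandas releases; use Series.sort_values)
Series.rank: ranking (ranks start at 1)
idxmax, idxmin: indirect statistics, i.e. the index labels of the maximum and minimum values
cumsum: cumulative sum along rows or columns
describe: produce several summary statistics in one call
quantile: sample quantile
median: arithmetic median (the 50% quantile)
mad: mean absolute deviation from the mean (removed in pandas 2.0)
var: sample variance
skew: sample skewness
kurt: sample kurtosis
pct_change: percent change
corr: correlation
cov: covariance
unique: array of the unique values, unsorted
value_counts: count how often each value occurs, sorted in descending order
dropna: handle missing data by returning a Series that contains only the non-null data and their index values
DataFrame.dropna(thresh=n): keep only the rows that have at least n non-NA values
fillna: replace missing values with a constant
unstack: rearrange hierarchically indexed data into a DataFrame
stack: the inverse of unstack
swaplevel: takes two level numbers or names and returns a new object with those levels interchanged; the data is unchanged
set_index: turn one or more columns into the row index and create a new DataFrame
reset_index: move the levels of a hierarchical index into columns, the opposite of set_index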
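
A short sketch of index alignment, reindex, rank, idxmax, and the set_index / unstack family from the list above. All labels and values are invented for illustration.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Arithmetic aligns on index labels; labels found in only one operand give NaN.
total = s1 + s2

# reindex returns a new object conformed to the new index.
reindexed = s1.reindex(["c", "b", "a", "z"])

# rank starts at 1; ties share the average rank by default.
ranks = pd.Series([7, 7, 2]).rank()

# idxmax returns the index label of the maximum, not the maximum itself.
top_label = s2.idxmax()

# set_index moves columns into the row index; reset_index moves them back.
frame = pd.DataFrame({"key": ["x", "x", "y"], "n": [1, 2, 1], "v": [0.5, 1.5, 2.5]})
indexed = frame.set_index(["key", "n"])
restored = indexed.reset_index()

# unstack pivots the inner index level into columns; stack would reverse it.
wide = indexed["v"].unstack()

print(total, reindexed, ranks, top_label, wide, sep="\n")
```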

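Data Loading, Storage, and File Formats

Reading data in pandas
read_csv: the default delimiter is a comma
read_table: the default delimiter is a tab
read_fwf: data in fixed-width column format
read_clipboard: read data from the clipboard

read_csv parameters: names gives the column names; index_col turns columns into the index; skiprows skips the given rows;
na_values lists the strings to treat as missing values; delimiter (an alias of sep) is the separator or a regular expression;
header is the row number to use for the column names (the first row by default; set it to None if the file has no header row);
parse_dates parses data as dates; nrows is the number of rows to read; encoding is the text encoding;
thousands is the thousands separator; chunksize is the size of the file chunks used for iteration

to_csv: write the data to a comma-separated file
to_csv parameters: pass sys.stdout to just print the text result; missing values are written as empty strings unless na_rep sets another marker, e.g. na_rep='NULL';
index=False and header=False disable the row and column labels; lineterminator sets the line terminator;
skipinitialspace (a csv dialect and read_csv option) ignores whitespace after the delimiter, False by default
csv.writer: write delimited files manually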
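
The parameters above can be tried without touching the disk by reading from an in-memory string. This is only a sketch: the semicolon-separated sample text, its column names, and the "missing" marker are assumptions made for illustration.

```python
import io
import pandas as pd

# Hypothetical export: one junk line, then a header row, then the data.
raw = (
    "exported by a hypothetical tool\n"
    "id;amount;when\n"
    "1;1,234;2019-01-02\n"
    "2;missing;2019-01-03\n"
    "3;7;2019-01-04\n"
)

frame = pd.read_csv(
    io.StringIO(raw),
    sep=";",                 # the delimiter
    skiprows=1,              # skip the junk line before the header
    index_col="id",          # use the id column as the row index
    na_values=["missing"],   # extra string to treat as NaN
    thousands=",",           # "1,234" is parsed as 1234
    parse_dates=["when"],    # parse this column as dates
)

# Write back as text, marking missing values and dropping the header row.
buf = io.StringIO()
frame.to_csv(buf, na_rep="NULL", header=False)
written = buf.getvalue()

# chunksize returns an iterator of DataFrames instead of one big frame.
chunk_sizes = [
    len(chunk)
    for chunk in pd.read_csv(io.StringIO(raw), sep=";", skiprows=1, chunksize=2)
]

print(frame)
print(written)
print(chunk_sizes)
```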

JSON data
json.loads: convert a JSON string into Python objects
json.dumps: convert a Python object into JSON
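
A minimal round trip through the two functions; the JSON text is a made-up sample.

```python
import json

text = '{"name": "Wes", "places_lived": ["United States", "Spain"], "pet": null}'

# JSON object -> dict, JSON array -> list, null -> None
obj = json.loads(text)

# Python object -> JSON string
round_trip = json.dumps(obj)

print(obj["places_lived"], obj["pet"])
print(round_trip)
```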

XML and HTML data
from lxml.html import parse
from urllib.request import urlopen  # urllib2 in Python 2
from lxml import objectify

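Binary data formats
Serialize with Python's built-in pickle
Save: pd.read_csv().to_pickle('…/frame_pickle') (written as .save() in the book; later pandas releases removed that method)
Load: pd.read_pickle('…/frame_pickle') (written as pd.load() in the book)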
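
A small round trip with to_pickle and read_pickle, the current pandas names for the pickle save and load step (the release the book used spelled them differently). The frame contents and the temporary path are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

frame = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Write to a throwaway directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frame_pickle")
    frame.to_pickle(path)
    loaded = pd.read_pickle(path)

print(loaded)
```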

HDF5 format (hierarchical data format)
Store pandas objects through PyTables: pd.HDFStore()

Excel files
pd.ExcelFile(): create an ExcelFile instance
pd.ExcelFile().parse(): read the data of a worksheet into a DataFrame

Databases
import sqlite3
The main calls are
con = sqlite3.connect('')
con.execute()
con.commit()
plus import pandas.io.sql as sql
sql.read_frame() (removed from later pandas releases; pd.read_sql() is the replacement)
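
The calls above fit together roughly as follows. This sketch uses an in-memory SQLite database, a made-up table, and pd.read_sql in place of the older pandas.io.sql read_frame helper.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
con.execute("CREATE TABLE test (a VARCHAR(20), b REAL, c INTEGER)")
con.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [("Atlanta", 1.25, 6), ("Tallahassee", 2.6, 3)],
)
con.commit()

# Plain DB-API access returns a list of tuples.
rows = con.execute("SELECT * FROM test").fetchall()

# pandas builds a DataFrame directly from the query result.
frame = pd.read_sql("SELECT * FROM test", con)
con.close()

print(rows)
print(frame)
```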

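Data Wrangling

pandas.merge(): connect the rows of different DataFrames based on one or more keys
Parameter suffixes: strings to append to the overlapping column names of the two DataFrames
how: the join type
on: the column names to join on
sort: sort the merged data by the join keys
DataFrame.join(): merge on the index
pandas.concat(): stack multiple objects together along an axis
Parameter join_axes: the indexes to use for the other axes (removed in pandas 1.0; reindex the result instead)
combine_first(): fill the missing data of the calling object with data from the object passed in
reshape: reshaping
pivot: pivoting
When reshaping a hierarchical index, stack rotates the columns into rows and unstack rotates the rows into columns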
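
A compact sketch of merge, join, concat, and combine_first as described above. The two small frames and their column names are invented for illustration.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "val": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "val": [20, 30, 40]})

# Outer merge on a key column; suffixes disambiguate the overlapping "val".
merged = pd.merge(left, right, on="key", how="outer", suffixes=("_left", "_right"))

# join merges on the index (a left join by default).
joined = left.set_index("key").join(
    right.set_index("key"), lsuffix="_left", rsuffix="_right"
)

# concat stacks objects along an axis; keys adds an outer index level.
stacked = pd.concat([left, right], keys=["left", "right"])

# combine_first patches the caller's missing values from the argument.
a = pd.Series([np.nan, 2.5, np.nan], index=["x", "y", "z"])
b = pd.Series([1.0, 9.9, 3.0], index=["x", "y", "z"])
patched = a.combine_first(b)

print(merged, joined, stacked, patched, sep="\n\n")
```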

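DataFrame.duplicated: return a boolean Series indicating whether each row is a duplicate
drop_duplicates: return a DataFrame with the duplicate rows removed
Series.map: accepts a function or a dict-like object containing a mapping
Series.replace: modify a subset of the values in an object
rename: create a transformed version of a data set without modifying the original, i.e. renaming; a dict-like object can also be used to update a subset of the axis labels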
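
A brief sketch of the five methods above on a tiny made-up frame; the -999 sentinel and the lookup dict are assumptions for illustration.

```python
import pandas as pd

data = pd.DataFrame({"k1": ["one", "one", "two", "two"], "k2": [1, 1, 2, 3]})

# True for every row that repeats an earlier row.
dup_mask = data.duplicated()
deduped = data.drop_duplicates()

# map accepts a dict (or a function) and transforms each value.
lookup = {"one": 1, "two": 2}
codes = data["k1"].map(lookup)

# replace swaps selected values, here a sentinel for 0.
replaced = pd.Series([1, -999, 2, -999]).replace(-999, 0)

# rename returns a relabelled copy; dicts update only the listed labels.
renamed = data.rename(columns={"k1": "group"}, index={0: "first"})

print(dup_mask.tolist(), codes.tolist(), replaced.tolist())
print(renamed)
```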

pd.cut(): discretize continuous data or split it into bins
pd.qcut(): bin the data based on sample quantiles
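
The difference between the two shows up in the bin sizes: cut uses the edges it is given, while qcut picks edges so that each bin holds roughly the same number of points. The ages, bin edges, and labels below are sample values chosen for illustration.

```python
import pandas as pd

ages = pd.Series([20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32])
bins = [18, 25, 35, 60, 100]
labels = ["Youth", "YoungAdult", "MiddleAged", "Senior"]

# cut: fixed edges, intervals closed on the right by default.
by_age = pd.cut(ages, bins, labels=labels)
counts = by_age.value_counts()

# qcut: edges come from the sample quantiles, so the bins are equal-sized.
quartile = pd.qcut(pd.Series(range(1, 101)), 4, labels=False)
quartile_sizes = quartile.value_counts().sort_index()

print(counts)
print(quartile_sizes)
```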

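Time Series

Timestamps, fixed periods, and time intervals

Data types in the datetime module
date: a calendar date (year, month, day) in the Gregorian calendar
time: time of day
datetime: date and time
timedelta: the difference between two datetime objects
strftime: convert a date to a string
strptime: convert a string to a date
%Y: four-digit year
%y: two-digit year
%H: hour, 24-hour clock
%I: hour, 12-hour clock
%w: weekday as an integer (0 is Sunday)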
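
A short sketch of the datetime types and format codes above, using only the standard library; the dates are arbitrary examples.

```python
from datetime import datetime, timedelta

stamp = datetime(2011, 1, 3, 14, 5)

# strftime: datetime -> string
text = stamp.strftime("%Y-%m-%d %H:%M")
# two-digit year, 12-hour clock hour, weekday number (2011-01-03 is a Monday)
short = stamp.strftime("%y %I %w")

# strptime: string -> datetime; the format must match the text exactly.
parsed = datetime.strptime(text, "%Y-%m-%d %H:%M")

# Subtracting datetimes gives a timedelta; adding one shifts a datetime.
delta = datetime(2019, 1, 11) - datetime(2019, 1, 1)
later = datetime(2019, 1, 11) + timedelta(days=14)

print(text, short, parsed, delta, later, sep="\n")
```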

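Frequencies in pandas: composed of a base frequency and a multiplier (newer pandas releases rename some of these aliases, e.g. M to ME, H to h, T to min, S to s)
D: every calendar day
B: every business day
M: last calendar day of each month
H: every hour
T: every minute
S: every second
BM: last business day of each month
MS: first calendar day of each month
BMS: first business day of each month
WOM-3FRI: third Friday of each month
Q-JAN: the last calendar day of the last month of each quarter, for a year ending in the indicated month
A-JAN: the last calendar day of the given month each year

shift: move the data forward or backward along the time axis while leaving the index unchanged; commonly used to compute percent changes in a time series
rollforward, rollback: explicitly roll a date forward or backward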
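
A sketch of frequencies, shift, and the roll methods. It sticks to aliases that, as far as I know, are spelled the same in old and new pandas releases (D, WOM-3FRI, h, min) and uses the MonthEnd offset object instead of a month-end alias; the series values are made up.

```python
from datetime import datetime

import pandas as pd
from pandas.tseries.offsets import MonthEnd

idx = pd.date_range("2019-01-01", periods=4, freq="D")
ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# Plain shift moves the values and keeps the index, leaving NaN at the start.
shifted = ts.shift(1)
pct = ts / ts.shift(1) - 1        # percent change between consecutive days

# With freq, the index moves instead and no data is lost.
moved = ts.shift(1, freq="D")

# A frequency is a base frequency times a multiplier; parts can be combined.
steps = pd.date_range("2019-01-01", periods=3, freq="2h30min")
third_fridays = pd.date_range("2019-01-01", "2019-03-31", freq="WOM-3FRI")

# rollforward / rollback move a date onto the offset explicitly.
now = datetime(2019, 1, 12)
month_end = MonthEnd().rollforward(now)
prev_month_end = MonthEnd().rollback(now)

print(shifted, pct, moved, steps, third_fridays, month_end, prev_month_end, sep="\n")
```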

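Time zone objects:
pandas relies on the pytz library for time zone handling (in the release the book used)
tz_localize: convert a naive time series into a localized one
tz_convert: once a time series has been localized, convert it to another time zone
to_period: convert a Series or DataFrame indexed by timestamps to a period index
resample: resampling; the ohlc aggregation produces the open, high, low, close table common in finance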
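
A minimal sketch of localizing, converting, switching to periods, and building OHLC bars. The time zone Asia/Shanghai and the sample values are arbitrary choices for illustration.

```python
import pandas as pd

rng = pd.date_range("2019-03-09 09:30", periods=3, freq="D")
ts = pd.Series([1.0, 2.0, 3.0], index=rng)

# Naive timestamps -> UTC-aware -> another zone; the values never change.
ts_utc = ts.tz_localize("UTC")
ts_shanghai = ts_utc.tz_convert("Asia/Shanghai")

# Timestamps -> monthly periods.
periods = ts.to_period("M")

# One value per minute, aggregated into 5-minute open/high/low/close bars.
minute = pd.Series(
    range(12), index=pd.date_range("2000-01-01", periods=12, freq="min")
)
bars = minute.resample("5min").ohlc()

print(ts_shanghai, periods, bars, sep="\n\n")
```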

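Financial and Economic Data Applications

Data alignment
resample: convert data to a fixed frequency
reindex: conform data to a new index
at_time: select the values at a particular time of day
between_time: select the values between two time objects
asof: get the last valid value at or before a particular timestamp
pd.concat: can be used to switch from one data source to another at a specific point in time
combine_first: can bring in the data from before the splice point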
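
A sketch of at_time, between_time, and asof on two days of minute data. The trading hours, dates, and the gap of missing values are assumptions made for illustration.

```python
from datetime import time

import numpy as np
import pandas as pd

# Two trading days of minute data, 09:30 to 15:59 (390 points per day).
day1 = pd.date_range("2012-06-01 09:30", "2012-06-01 15:59", freq="min")
day2 = pd.date_range("2012-06-04 09:30", "2012-06-04 15:59", freq="min")
idx = day1.append(day2)
ts = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# One value per day at exactly 10:00.
at_ten = ts.at_time(time(10, 0))

# Values between two times of day, both ends included by default.
early = ts.between_time(time(10, 0), time(10, 1))

# Knock out 10:00 to 10:09 on the first day, then ask for the last
# valid value at or before 10:05.
gappy = ts.copy()
gappy.iloc[30:40] = np.nan
last_known = gappy.asof(pd.Timestamp("2012-06-01 10:05"))

print(at_ten, early, last_known, sep="\n")
```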

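Advanced NumPy

reshape: change the shape of an array
flatten: flatten to one dimension (always returns a copy)
ravel: flatten to one dimension (returns a view when possible)
concatenate: join arrays (vstack and hstack are convenience versions)
repeat and tile: repeat elements or whole arrays
take and put: function equivalents of fancy indexing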
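
A short sketch of the functions above, with small made-up arrays. The part worth noticing is the copy versus view behaviour of flatten and ravel.

```python
import numpy as np

arr = np.arange(6)
grid = arr.reshape((2, 3))

flat_copy = grid.flatten()   # always a copy
flat_view = grid.ravel()     # a view here, because grid is contiguous
flat_view[0] = 99            # writes through to grid and arr

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
stacked = np.concatenate([a, b], axis=0)   # same result as np.vstack((a, b))

repeated = np.repeat([1, 2], 2)   # each element repeated
tiled = np.tile([1, 2], 2)        # the whole array repeated

values = np.arange(10) * 100
picked = values.take([7, 1, 2])   # same as values[[7, 1, 2]]
values.put([0, 1], [-1, -2])      # in-place assignment at flat positions

print(grid, flat_copy, stacked, repeated, tiled, picked, values, sep="\n")
```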

Coding (in reverse order? haha... seriously: in no particular order. Posted mainly to save typing for anyone who wants to try the code.)

from pandas import Series,DataFrame
import pandas as pd
import numpy as np
import json
import statsmodels.api as sm
from datetime import datetime, timedelta
from pandas.tseries.offsets import Hour,Minute,Day,MonthEnd
import pytz
import matplotlib.pyplot as plt
import random
import string

random.seed(0)
N = 1000
def rands(n):
    choices = string.ascii_uppercase
    return ''.join([random.choice(choices) for _ in range(n)])
tickers = np.array([rands(5) for _ in range(N)])
M = 500
df = DataFrame({'Mom':np.random.randn(M)/200+0.03,
               'Value':np.random.randn(M)/200+0.08,
               'Short':np.random.randn(M)/200-0.02},
              index = tickers[:M])
ind_names = np.array(['Financial','Tech'])
sampler = np.random.randint(0,len(ind_names),N)
industries = Series(ind_names[sampler],index = tickers,name='industry')
by_industry = df.groupby(industries)
by_industry.mean()
by_industry.describe()
def zscore(group):
    return (group - group.mean())/group.std()
df_stand = by_industry.apply(zscore)
df_stand.groupby(industries).agg(['mean','std'])
by_industry.apply(lambda x: zscore(x.rank()))
f1,f2,f3 = np.random.rand(3,1000)
ticker_subset = tickers.take(np.random.permutation(N)[:1000])
port = Series(0.7*f1-1.2*f2+0.3*f3 + np.random.rand(1000),index = ticker_subset)
fact = DataFrame({'f1':f1,'f2':f2,'f3':f3},index = ticker_subset)
fact.corrwith(port)
a = range(2,3,4)

gdp = Series([1.78,1.94,2.08,2.01,2.15,2.31,2.46],
             index=pd.period_range('1984Q2',periods=7,freq='Q-SEP'))
infl = Series([0.025,0.045,0.037,0.04],
              index=pd.period_range('1982',periods=4,freq='A-DEC'))
infl_q = infl.asfreq('Q-SEP',how='end')
rng = pd.date_range('2012-06-01 09:30','2012-06-01 15:59',freq='T')
rng

DatetimeIndex(['2012-06-01 09:30:00', '2012-06-01 09:31:00',
               '2012-06-01 09:32:00', '2012-06-01 09:33:00',
               '2012-06-01 09:34:00', '2012-06-01 09:35:00',
               '2012-06-01 09:36:00', '2012-06-01 09:37:00',
               '2012-06-01 09:38:00', '2012-06-01 09:39:00',
               ...
               '2012-06-01 15:50:00', '2012-06-01 15:51:00',
               '2012-06-01 15:52:00', '2012-06-01 15:53:00',
               '2012-06-01 15:54:00', '2012-06-01 15:55:00',
               '2012-06-01 15:56:00', '2012-06-01 15:57:00',
               '2012-06-01 15:58:00', '2012-06-01 15:59:00'],
              dtype='datetime64[ns]', length=390, freq='T')

rng = pd.date_range('1/1/2000',periods=12,freq='T')
ts = Series(np.arange(len(rng)),index = rng)
ts.resample('5min').sum()
ts.resample('5min').ohlc()
rng = pd.date_range('1/1/2000',periods=100,freq='D')
ts = Series(np.arange(100),index = rng)
ts.groupby(lambda x: x.month).mean()
ts.groupby(lambda x: x.weekday).mean()
frame = DataFrame(np.random.randn(2,4),index = pd.date_range('1/1/2000',periods=2,freq='W-WED'),
columns= ['C','T','N','O'])
df_daily = frame.resample('D').ffill()
df_daily['C'].plot()

ts1 = Series(np.random.randn(6),index = pd.date_range('1/1/2019',periods=6,freq='W-WED'))
ts1.resample('B').ffill()
print(ts1)
dates = pd.DatetimeIndex(['2019-01-05','2019-01-16','2019-01-19'])
ts2 = Series(np.random.randn(3),index = dates)
ts2
ts1.reindex(ts2.index).ffill()

2019-01-02   -0.333598
2019-01-09    1.450419
2019-01-16   -1.286464
2019-01-23   -1.212363
2019-01-30    0.115858
2019-02-06   -1.691769
Freq: W-WED, dtype: float64


2019-01-05         NaN
2019-01-16   -1.286464
2019-01-19   -1.286464
dtype: float64

index = pd.date_range('1/1/2019','1/6/2019')
index = pd.date_range(end = '1/1/2019',periods=5)
index = pd.date_range('1/1/2019','1/6/2019',freq='5H')
index = Hour(2)+Minute(30)
index = pd.date_range('1/1/2019','1/6/2019',freq='5h30min')
index = pd.date_range('1/1/2019','2/10/2019',freq='WOM-3FRI')
index
ts = Series(np.random.randn(4),index=pd.date_range('1/1/2019',periods=4,freq='M'))
ts.shift(2,freq='M')
now = datetime(2019,1,12)
now+3*Day()
now+2*MonthEnd()
now+MonthEnd(2)
now = datetime(2019,1,12)
MonthEnd().rollforward(now)
MonthEnd().rollback(now)
ts = Series(np.random.randn(20),index=pd.date_range('1/1/2019',periods=20,freq='4d'))
ts.groupby(MonthEnd().rollforward).mean()
pytz.common_timezones[-5:]
tz = pytz.timezone('US/Hawaii')
rng = pd.date_range('3/9/2019 9:30',periods=6,freq='D')
ts = Series(np.random.randn(len(rng)),index = rng)
pd.date_range('3/9/2019 9:30',periods=6,freq='D',tz='UTC')
ts_utc = ts.tz_localize('UTC')
ts_utc.tz_convert('US/Hawaii')
stamp = pd.Timestamp('2011-1-9 4:00')
stamp_utc = stamp.tz_localize('UTC')
stamp_utc.tz_convert('US/Hawaii')
stamp = pd.Timestamp('2011-1-9 4:00',tz='US/Hawaii')
stamp + Hour(3)
p = pd.Period(2007,freq='A-DEC')
rng = pd.period_range('1/1/2019','6/30/2019',freq='M')
Series(np.random.randn(6),index = rng)
rng = pd.period_range('2003Q3','2004Q4',freq='Q-JAN')
ts = Series(np.arange(len(rng)),index=rng)
new_rng = (rng.asfreq('B','e')-1).asfreq('T','s')+16*60
ts.index = new_rng.to_timestamp()
rng = pd.date_range('1/1/2000',periods=3,freq='M')
ts = Series(np.random.randn(3),index = rng)
pts = ts.to_period()
rng = pd.date_range('1/29/2000',periods=6,freq='D')
ts2 = Series(np.random.randn(6),index = rng)
ts2.to_period('M')
print(pts)
pts.to_timestamp(how='end')

2000-01   -0.171205
2000-02    0.248715
2000-03    0.468292
Freq: M, dtype: float64

2000-01-31   -0.171205
2000-02-29    0.248715
2000-03-31    0.468292
Freq: M, dtype: float64

dates = [datetime(2019,1,2),datetime(2019,1,3),datetime(2019,1,4)]
ts = Series(np.random.randn(3),index=dates)
print(ts)
type(ts)
ts.index
ts + ts[::2]
lts = Series(np.random.randn(10),index=pd.date_range('1/1/2019',periods=10))
dates = pd.DatetimeIndex(['1/1/2019','1/2/2019','1/2/2019'])
dup = Series(np.arange(3),index = dates)
group = dup.groupby(level=0)
print(group.count())

2019-01-02   -0.099660
2019-01-03    1.038844
2019-01-04   -1.828429
dtype: float64
2019-01-01    1
2019-01-02    2
dtype: int64

now = datetime.now()
now.year, now.month, now.day
delta = datetime(2019,1,11)-datetime(2019,1,1)
delta
datetime(2019,1,11) + timedelta(14)
stamp = datetime(2011,1,3)
b = stamp.strftime('%Y-%m-%d')
datetime.strptime(b,'%Y-%m-%d')
datestr = ['2019/1/1','2019/1/11']
ids = pd.to_datetime(datestr+[None])
ids[2]

NaT

df = DataFrame({'cata':['a','a','a','a','b','b','b','b'],'data':np.random.randn(8),
               'weight':np.random.rand(8)})
print(df)
a = df.groupby('cata')
a.mean()
b = pd.cut(df.data,4)

  cata      data    weight
0    a -1.410970  0.510190
1    a  0.556708  0.488728
2    a -0.234643  0.892280
3    a  0.674317  0.653063
4    b  0.754127  0.567138
5    b -3.584606  0.380196
6    b -1.062379  0.909739
7    b  0.119287  0.578805

suits = ['H','S','C','D']
card_val = (list(range(1, 11)) + [10] * 3) * 4
base_name = ['A'] + list(range(2,11)) + ['J','Q','K']
cards = []
for suit in suits:
    cards.extend(str(num) + suit for num in base_name)
deck = Series(card_val,index=cards)
print(deck[:13])
def draw(deck,n=5):
    return deck.take(np.random.permutation(len(deck))[:n])
draw(deck)

AH      1
2H      2
3H      3
4H      4
5H      5
6H      6
7H      7
8H      8
9H      9
10H    10
JH     10
QH     10
KH     10
dtype: int64





8S     8
7D     7
JD    10
6C     6
2D     2
dtype: int64

a = np.random.permutation(9)
a

array([4, 3, 8, 1, 6, 7, 0, 2, 5])

s = Series(np.random.randn(6))
s[::2] = np.nan
s

0         NaN
1   -1.494708
2         NaN
3   -0.637867
4         NaN
5    0.701418
dtype: float64

a = s.fillna(s.mean())
a

0   -0.477052
1   -1.494708
2   -0.477052
3   -0.637867
4   -0.477052
5    0.701418
dtype: float64

frame = DataFrame({'data1':np.random.randn(20),'data2':np.random.randn(20)})
factor = pd.qcut(frame.data1,5)
print(factor)
def get_stats(group):
    return {'min':group.min(),'max':group.max()}
grouped = frame.data1.groupby(factor,group_keys=False)
grouped.apply(get_stats).unstack()

0     (-0.376, 0.0911]
1       (0.488, 1.881]
2     (-3.477, -1.532]
3     (-1.532, -0.376]
4      (0.0911, 0.488]
5       (0.488, 1.881]
6     (-3.477, -1.532]
7     (-1.532, -0.376]
8      (0.0911, 0.488]
9     (-1.532, -0.376]
10     (0.0911, 0.488]
11      (0.488, 1.881]
12    (-0.376, 0.0911]
13    (-0.376, 0.0911]
14      (0.488, 1.881]
15    (-3.477, -1.532]
16    (-1.532, -0.376]
17    (-3.477, -1.532]
18     (0.0911, 0.488]
19    (-0.376, 0.0911]
Name: data1, dtype: category
Categories (5, interval[float64]): [(-3.477, -1.532] < (-1.532, -0.376] < (-0.376, 0.0911] < (0.0911, 0.488] < (0.488, 1.881]]

	max	min
data1
(-3.477, -1.532]	-1.549791	-3.476091
(-1.532, -0.376]	-0.408399	-1.527223
(-0.376, 0.0911]	0.082558	-0.355047
(0.0911, 0.488]	0.455659	0.103927
(0.488, 1.881]	1.880841	0.617265

data = DataFrame({'k1':['one']*3 +['two']*4,'k2':[1,1,2,3,3,4,4]})
data['v1']=range(7)
print(data)
data.drop_duplicates(['k1'])
data.describe()

    k1  k2  v1
0  one   1   0
1  one   1   1
2  one   2   2
3  two   3   3
4  two   3   4
5  two   4   5
6  two   4   6

	k2	v1
count	7.000000	7.000000
mean	2.571429	3.000000
std	1.272418	2.160247
min	1.000000	0.000000
25%	1.500000	1.500000
50%	3.000000	3.000000
75%	3.500000	4.500000
max	4.000000	6.000000

s1 = Series([0,1,2,3],index=['a','b','c','d'])
print(s1)
s2 = Series([2,3,4,5],index=['b','c','d','e'])
print(s2)
df = pd.concat([s1,s2],keys=['one','two'])
print(df)
df.unstack().stack()

a    0
b    1
c    2
d    3
dtype: int64
b    2
c    3
d    4
e    5
dtype: int64
one  a    0
     b    1
     c    2
     d    3
two  b    2
     c    3
     d    4
     e    5
dtype: int64





one  a    0.0
     b    1.0
     c    2.0
     d    3.0
two  b    2.0
     c    3.0
     d    4.0
     e    5.0
dtype: float64

df1 = DataFrame({'key1':['f','f','b'],'key2':['o','t','t'],'lval':[1,2,3]})
print(df1)
df2 = DataFrame({'key1':['f','f','b','b'],'key2':['o','o','o','t'],'rval':[1,2,3,4]})
print(df2)
pd.merge(df1,df2,on=['key1','key2'],how='outer')

  key1 key2  lval
0    f    o     1
1    f    t     2
2    b    t     3
  key1 key2  rval
0    f    o     1
1    f    o     2
2    b    o     3
3    b    t     4

	key1	key2	lval	rval
0	f	o	1.0	1.0
1	f	o	1.0	2.0
2	f	t	2.0	NaN
3	b	t	3.0	4.0
4	b	o	NaN	3.0

dt = DataFrame(np.random.randn(4,3))
dt.to_csv('cho1.csv')
dt

	0	1	2
0	1.169573	-0.372106	0.557916
1	0.229560	-0.568512	0.199071
2	0.003694	-1.621262	1.095708
3	-1.632407	0.198777	-0.918157

!type cho1.csv

,0,1,2
0,-0.6649274501471323,0.1568505875966796,2.0463934986909793
1,0.4875590491290402,0.25487978183722804,-0.02295841069629054
2,-1.102747473663344,-1.00245492236496,0.019003095836607142
3,-0.6714493747654602,0.7129870680225009,-0.784597797672288

frame = DataFrame(np.arange(6).reshape(3,2),index=[2,0,1])
print(frame)
print(frame.iloc[0])

   0  1
2  0  1
0  2  3
1  4  5
0    0
1    1
Name: 2, dtype: int32

frame = DataFrame(np.random.randn(4,3),columns=list('adc'))
print(frame)
np.fabs(frame)
frame.sort_index(axis=1)

          a         d         c
0 -0.451933  1.026963  1.553605
1 -1.463336 -0.324006 -1.114068
2 -0.042280 -1.390858  0.302908
3  1.185429 -0.290627  0.409816

	a	c	d
0	-0.451933	1.553605	1.026963
1	-1.463336	-1.114068	-0.324006
2	-0.042280	0.302908	-1.390858
3	1.185429	0.409816	-0.290627

o = Series(range(3),index = ['a','d','c'])
print(o)
o.sort_index()

a    0
d    1
c    2
dtype: int32





a    0
c    2
d    1
dtype: int32

from pandas import Series,DataFrame
import pandas as pd
data = {'A':[1,2,3],"B":[3,2,1],"C":[0,0,0]}
frame = DataFrame(data)
print(frame)
del frame['C']
frame.iloc[2]

   A  B  C
0  1  3  0
1  2  2  0
2  3  1  0

A    3
B    1
Name: 2, dtype: int64

from pandas import Series,DataFrame
import pandas as pd
def f(x,y,z):
    return (x+y)/z
a = 2
b = 3
c = 5

f(1,2,z=3)

1.0

arr = np.array([3.7,9,2])
brr = arr.astype(np.int32)

brr

array([3, 9, 2])

a = np.array([1,2,3],dtype=np.int32)
a.dtype

dtype('int32')

b = a.astype(int)
b.dtype

dtype('int32')

arr = np.array([[1.,2.,3.],[4.,5.,6.]])
arr*arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

data = np.random.randn(7,4)
data

array([[ 0.54869007,  1.37392201, -0.78342317, -1.24509936],
       [-1.12983185,  0.05239011, -1.04543191, -0.46990263],
       [ 0.69662765, -2.36818955, -1.34385099,  0.39715629],
       [ 1.12148482, -0.34732483,  1.98797763, -0.35084757],
       [-1.40928013,  0.60146898,  0.10408849, -0.4332211 ],
       [ 1.13320099, -0.65301464, -1.40721037,  0.32454336],
       [-1.71210266,  0.72566395, -1.11679164,  2.16165969]])

names = np.array(['A','B','C','B'])
names
names == 'B'

array(['A', 'B', 'C', 'B'], dtype='<U1')

arr = np.empty((3,2))
arr[[1,1]]

array([[ 9., 16.],
       [ 9., 16.]])

arr = np.arange(16).reshape((2,2,4))
arr.transpose((1,0,2))

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])

arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

arr = np.random.randn(7)*5
print(arr)
np.modf(arr)

[-1.57935486  0.67115887  2.85975907 -0.65035502  3.65726964  9.86445743
  4.17652556]

(array([-0.57935486,  0.67115887,  0.85975907, -0.65035502,  0.65726964,
         0.86445743,  0.17652556]), array([-1.,  0.,  2., -0.,  3.,  9.,  4.]))

import matplotlib.pyplot as plt
points = np.arange(-5,5,0.01)
x , y = np.meshgrid(points,points)
z = np.sqrt(x**2 + y**2)
plt.imshow(z)
plt.colorbar()
plt.show()

arr = np.random.randn(4,4)
print(arr)
np.where(arr>0,2,-2)

[[-0.52697665 -0.64766927 -1.40893777  0.7755106 ]
 [ 1.58782052  0.31437337  0.37338648 -0.00637067]
 [-0.89267104 -1.20542302  0.22396485  0.99881132]
 [-0.30565723  0.40836548  1.07196103  1.05869402]]

array([[-2, -2, -2,  2],
       [ 2,  2,  2, -2],
       [-2, -2,  2,  2],
       [-2,  2,  2,  2]])

arr = np.random.randn(5,4)
arr.mean()

-0.5278565959601283

arr= np.arange(10)
np.save('some_array',arr)

np.random.randint(0,4)
