NumPy Data Analysis Exercises (21-30)

山水指尖

Categories: python, numpy. Tags: python, numpy

Article link: https://blog.csdn.net/wjw173/article/details/85467832

NumPy exercises 21-30

NumPy data analysis Q&A

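21. How do you print only three decimal places of a numpy array?

Problem: Print or display only 3 decimal places of the numpy array rand_arr.

# given
import numpy as np
rand_arr = np.random.random((5,3))
# limit printing to 3 decimal places
np.set_printoptions(precision=3)
rand_arr[:4]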
# > array([[ 0.443,  0.109,  0.97 ],
# >        [ 0.388,  0.447,  0.191],
# >        [ 0.891,  0.474,  0.212],
# >        [ 0.609,  0.518,  0.403]])
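Note that the precision option changes only how an array is displayed; the stored values keep their full precision. The sketch below contrasts display-only formatting with np.array2string against np.round, which returns new, rounded values. The array arr and its values are made up for illustration.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
arr = np.array([0.123456, 1.5, 2.718281828])

# Display-only: build a string with 3 decimals, the data is not modified
text = np.array2string(arr, precision=3)

# Value-changing: np.round returns a new array whose values are rounded
rounded = np.round(arr, 3)

print(text)
print(arr[0])      # still the full-precision value
print(rounded[0])  # the rounded value
```

Use the print options when only the output should look shorter, and np.round when later calculations should work with the rounded numbers.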

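22. How do you print a numpy array without e-style scientific notation (such as 1e10)?

Problem: Print rand_arr in fixed-point form by suppressing scientific notation (such as 1e10).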

#create the random array
np.random.seed(100)
rand_arr = np.random.random([3,3])/1e3
print(rand_arr)
# > array([[  5.434049e-04,   2.783694e-04,   4.245176e-04],
# >        [  8.447761e-04,   4.718856e-06,   1.215691e-04],
# >        [  6.707491e-04,   8.258528e-04,   1.367066e-04]])
#solution
#reset printoptions to the default (scientific notation allowed)
np.set_printoptions(suppress=False)
np.random.seed(100)
rand_arr = np.random.random([3,3])/1e3
print(rand_arr)
#suppress scientific notation and show 6 decimal places
np.set_printoptions(suppress=True, precision=6)
print(rand_arr)
# > array([[ 0.000543,  0.000278,  0.000425],
# >        [ 0.000845,  0.000005,  0.000122],
# >        [ 0.000671,  0.000826,  0.000137]])
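Because np.set_printoptions changes the settings globally, it can be convenient to apply them only temporarily. A minimal sketch, assuming NumPy 1.15 or later where the np.printoptions context manager is available; the array small holds illustrative values similar to the ones above.

```python
import numpy as np

# Illustrative small values, similar in magnitude to rand_arr above
small = np.array([5.434049e-04, 4.718856e-06, 1.215691e-04])

# Scientific notation is allowed: values below 1e-4 trigger the e-format
with np.printoptions(suppress=False):
    sci_text = str(small)

# Scientific notation is suppressed: values are shown in fixed-point form
with np.printoptions(suppress=True, precision=6):
    fixed_text = str(small)

print(sci_text)
print(fixed_text)
```

Outside the with blocks the previous print settings are restored automatically.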

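23. How do you limit the number of items printed in a numpy array?

Problem: Limit the number of items printed in a numpy array to at most 6 elements.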

#given
a = np.arange(15)
# > array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
#solution
np.set_printoptions(threshold = 6)
a = np.arange(15)
#output
# > array([ 0,  1,  2, ..., 12, 13, 14])

24. How do you print the full numpy array without truncation?

#given
np.set_printoptions(threshold = 6)
a = np.arange(15)
# > array([ 0,  1,  2, ..., 12, 13, 14])
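#solution
#threshold=np.nan raises a ValueError in recent NumPy releases; sys.maxsize disables truncation
import sys
np.set_printoptions(threshold=sys.maxsize)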
a
# > array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
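The threshold option decides when an array is summarized, while the related edgeitems option decides how many items are kept at each end of the summary. The following sketch shows both on the same 15-element array; the choice of edgeitems=2 is only an example.

```python
import sys
import numpy as np

a = np.arange(15)

# More than 6 elements, so the array is summarized; keep 2 items per end
with np.printoptions(threshold=6, edgeitems=2):
    summarized = str(a)

# A very large threshold means the array is never summarized
with np.printoptions(threshold=sys.maxsize):
    full = str(a)

print(summarized)
print(full)
```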

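25. How do you import a dataset that mixes numbers and text while keeping the text intact in a numpy array?

Problem: Import the iris dataset and keep the text unchanged.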

# Solution
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

# Print the first 3 rows
iris[:3]
# > array([[b'5.1', b'3.5', b'1.4', b'0.2', b'Iris-setosa'],
# >        [b'4.9', b'3.0', b'1.4', b'0.2', b'Iris-setosa'],
# >        [b'4.7', b'3.2', b'1.3', b'0.2', b'Iris-setosa']], dtype=object)

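Run the code in IDLE to check the result.
np.genfromtxt() accepts many parameters; the most commonly used ones are listed below.

Parameter	Description	Default
fname	Required; all other parameters are optional. Can be a URL pointing to a text file or a local text file	none (must be supplied)
dtype	Data type of the result, for example object or float	float
comments	Marker that starts a comment in the text	#
delimiter	Field separator; the iris file needs delimiter=','	None (any whitespace)
skip_header	Number of lines to skip at the start of the file	0 (skip none)
skip_footer	Number of lines to skip at the end of the file	0 (skip none)
converters	Usually a dict with a column index or column name as key and a conversion function as value. The conversion function can be a regular function or a lambda, and can also be used to parse dates and times	None
missing_values	Strings that mark missing data; empty fields are treated as missing	None
filling_values	Values used to fill in missing data. When left as None, a type-dependent default is used: bool False, int -1, float np.nan, complex np.nan+0j, string '???'	None
encoding	Encoding used to decode the input. NumPy 1.x emits "VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default." when unicode strings are read without it, so pass encoding explicitly	'bytes' in NumPy 1.x (the author ran NumPy 1.15.3); None selects the system default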
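To see a few of these parameters in action without downloading anything, the sketch below reads a small in-memory text instead of a file. The data (a header line and one empty field) and the fill value -1.0 are made up for illustration.

```python
import numpy as np
from io import StringIO

# Hypothetical inline data: one header line and one empty field in the last row
text = StringIO("a,b,c\n1.0,2.0,3.0\n4.0,,6.0\n")

data = np.genfromtxt(
    text,
    delimiter=",",        # fields are separated by commas
    skip_header=1,        # skip the "a,b,c" header line
    filling_values=-1.0,  # value stored where a field is missing
)
print(data)
```

The empty field in the last row is treated as missing and replaced by the fill value; without filling_values a float column would receive np.nan instead.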

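26. How do you extract a particular column from a 1D array of tuples?

Problem: Extract the text column species from the 1D iris dataset imported in the previous question.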

# given
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)
print(iris_1d.shape)

# Solution:
species = np.array([row[4] for row in iris_1d])
species[:5]
# > (150,)
# > array([b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa',
# >        b'Iris-setosa'],
# >       dtype='|S18')
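With dtype=None and mixed column types, genfromtxt returns a structured array, so a column can also be selected by field name rather than with a list comprehension. A minimal sketch, assuming the automatically generated field names f0 to f4 and using three rows of the iris data inlined as a string so that it runs offline; encoding="utf-8" is passed so the text comes back as str instead of bytes.

```python
import numpy as np
from io import StringIO

# Three rows of the iris data, inlined so the example needs no download
sample = StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)
iris_1d = np.genfromtxt(sample, delimiter=",", dtype=None, encoding="utf-8")

# Fields of a structured array are addressed by name; f4 is the fifth column
species = iris_1d["f4"]

print(iris_1d.shape)
print(species)
```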

27. How do you convert a 1D array of tuples to a 2D numpy array?

Problem: Convert the 1D iris dataset to a 2D array iris_2d by omitting the species text field.

# Given:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',', dtype=None)

# Solution:
# Method 1: Convert each row to a list and get the first 4 items
iris_2d = np.array([row.tolist()[:4] for row in iris_1d])
iris_2d[:4]

# Alt Method 2: Import only the first 4 columns from source url
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
iris_2d[:4]
# > array([[ 5.1,  3.5,  1.4,  0.2],
# >        [ 4.9,  3. ,  1.4,  0.2],
# >        [ 4.7,  3.2,  1.3,  0.2],
# >        [ 4.6,  3.1,  1.5,  0.2]])
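A third option, not part of the original solution, is the helper structured_to_unstructured from numpy.lib.recfunctions (available in NumPy 1.16 and later), which converts selected fields of a structured array into a regular 2D array. The sketch assumes the default field names f0 to f3 and uses three inlined rows of the iris data.

```python
import numpy as np
from io import StringIO
from numpy.lib import recfunctions as rfn

# Three rows of the iris data, inlined so the example needs no download
sample = StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)
iris_1d = np.genfromtxt(sample, delimiter=",", dtype=None, encoding="utf-8")

# Select the four numeric fields, then flatten them into a plain 2D float array
numeric_fields = ["f0", "f1", "f2", "f3"]
iris_2d = rfn.structured_to_unstructured(iris_1d[numeric_fields])

print(iris_2d.shape)
print(iris_2d)
```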

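28. How do you compute the mean, median and standard deviation of a numpy array?

Problem: Find the mean, median and standard deviation of the iris sepal length (column 1).

np function	Description
np.mean	Mean
np.median	Median
np.std	Standard deviation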

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
# dtype='float' sets the data type; usecols=[0] selects only the first column
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])
# Solution
mu, med, sd = np.mean(sepallength), np.median(sepallength), np.std(sepallength)
print(mu, med, sd)
# > 5.84333333333 5.8 0.825301291785
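By default np.std computes the population standard deviation (dividing by N). Passing ddof=1 gives the sample standard deviation (dividing by N - 1), and np.var returns the variance. A small sketch with made-up numbers whose mean is 5 and whose population variance is 4:

```python
import numpy as np

# Illustrative values: mean 5, sum of squared deviations 32
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

pop_sd = np.std(x)             # population SD: sqrt(32 / 8)
sample_sd = np.std(x, ddof=1)  # sample SD: sqrt(32 / 7)
variance = np.var(x)           # population variance: 32 / 8

print(pop_sd, sample_sd, variance)
```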

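29. How do you normalize an array so that its values lie exactly between 0 and 1?

Problem: Create a normalized form of the iris sepal length whose values lie exactly between 0 and 1, so that the minimum is 0 and the maximum is 1.
This min-max scaling is often used to bring data onto a common scale.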

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
# dtype='float' sets the data type; usecols=[0] selects only the first column
sepallength = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0])

# Solution
Smax, Smin = sepallength.max(), sepallength.min()
S = (sepallength - Smin)/(Smax - Smin)
# or use np.ptp (peak to peak = max - min); the ndarray.ptp() method was removed in NumPy 2.0
S = (sepallength - Smin)/np.ptp(sepallength)
print(S)
# > [ 0.222  0.167  0.111  0.083  0.194  0.306  0.083  0.194  0.028  0.167
# >   0.306  0.139  0.139  0.     0.417  0.389  0.306  0.222  0.389  0.222
# >   0.306  0.222  0.083  0.222  0.139  0.194  0.194  0.25   0.25   0.111
# >   0.139  0.306  0.25   0.333  0.167  0.194  0.333  0.167  0.028  0.222
# >   0.194  0.056  0.028  0.194  0.222  0.139  0.222  0.083  0.278  0.194
# >   0.75   0.583  0.722  0.333  0.611  0.389  0.556  0.167  0.639  0.25
# >   0.194  0.444  0.472  0.5    0.361  0.667  0.361  0.417  0.528  0.361
# >   0.444  0.5    0.556  0.5    0.583  0.639  0.694  0.667  0.472  0.389
# >   0.333  0.333  0.417  0.472  0.306  0.472  0.667  0.556  0.361  0.333
# >   0.333  0.5    0.417  0.194  0.361  0.389  0.389  0.528  0.222  0.389
# >   0.556  0.417  0.778  0.556  0.611  0.917  0.167  0.833  0.667  0.806
# >   0.611  0.583  0.694  0.389  0.417  0.583  0.611  0.944  0.944  0.472
# >   0.722  0.361  0.944  0.556  0.667  0.806  0.528  0.5    0.583  0.806
# >   0.861  1.     0.583  0.556  0.5    0.944  0.556  0.583  0.472  0.722
# >   0.667  0.722  0.417  0.694  0.667  0.667  0.556  0.611  0.528  0.444]
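The same min-max formula extends to several columns at once. The helper min_max_scale below is a hypothetical name, not a NumPy function; it is a sketch that scales every column of a 2D array independently and assumes no column is constant (a constant column would cause a division by zero).

```python
import numpy as np

def min_max_scale(x, axis=0):
    """Scale values along `axis` to the range [0, 1] (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=axis, keepdims=True)
    x_range = np.ptp(x, axis=axis, keepdims=True)  # max - min along the axis
    return (x - x_min) / x_range

# Made-up data with two columns on very different scales
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 40.0]])
scaled = min_max_scale(data)
print(scaled)
```

Each column of scaled now has minimum 0 and maximum 1, regardless of its original scale.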

30. How do you compute the softmax score?

Problem: Compute the softmax score of sepallength.

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
sepallength = np.array([float(row[0]) for row in iris])

# Solution
def softmax(x):
    """Compute softmax values for each sets of scores in x.
    https://stackoverflow.com/questions/34968722/how-to-implement-the-softmax-function-in-python"""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

print(softmax(sepallength))
# > [ 0.002  0.002  0.001  0.001  0.002  0.003  0.001  0.002  0.001  0.002
# >   0.003  0.002  0.002  0.001  0.004  0.004  0.003  0.002  0.004  0.002
# >   0.003  0.002  0.001  0.002  0.002  0.002  0.002  0.002  0.002  0.001
# >   0.002  0.003  0.002  0.003  0.002  0.002  0.003  0.002  0.001  0.002
# >   0.002  0.001  0.001  0.002  0.002  0.002  0.002  0.001  0.003  0.002
# >   0.015  0.008  0.013  0.003  0.009  0.004  0.007  0.002  0.01   0.002
# >   0.002  0.005  0.005  0.006  0.004  0.011  0.004  0.004  0.007  0.004
# >   0.005  0.006  0.007  0.006  0.008  0.01   0.012  0.011  0.005  0.004
# >   0.003  0.003  0.004  0.005  0.003  0.005  0.011  0.007  0.004  0.003
# >   0.003  0.006  0.004  0.002  0.004  0.004  0.004  0.007  0.002  0.004
# >   0.007  0.004  0.016  0.007  0.009  0.027  0.002  0.02   0.011  0.018
# >   0.009  0.008  0.012  0.004  0.004  0.008  0.009  0.03   0.03   0.005
# >   0.013  0.004  0.03   0.007  0.011  0.018  0.007  0.006  0.008  0.018
# >   0.022  0.037  0.008  0.007  0.006  0.03   0.007  0.008  0.005  0.013
# >   0.011  0.013  0.004  0.012  0.011  0.011  0.007  0.009  0.007  0.005]
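Two properties of this softmax are worth checking: the scores sum to 1, and subtracting np.max(x) does not change the result while avoiding overflow for large inputs. The sketch below repeats the function from above on made-up scores; the shift of 1000 is only an example of an input that would overflow np.exp without the subtraction.

```python
import numpy as np

def softmax(x):
    """Softmax with the maximum subtracted for numerical stability."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

# Made-up scores for illustration
scores = np.array([1.0, 2.0, 3.0])
probs = softmax(scores)

# Shifting all scores by a constant leaves the softmax unchanged
shifted_probs = softmax(scores + 1000.0)

print(probs, probs.sum())
print(np.allclose(probs, shifted_probs))
```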