[python] python - Plotting, basic statistics
I. Merge sort
1.Python programming. Please use the code skeleton provided on the course website.
(1)Write a python function to merge sort a list of numbers. Test your function on a small random integer array of 10 elements.
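# Generate a random array
import numpy as np
np.random.seed()  # no argument: a different array on every run; pass an integer, e.g. seed(0), to get the same array each time
arr = [np.random.randint(0, 100) for _ in range(10)]
print("Original data:", arr)
Output:
Original data: [75, 18, 50, 89, 1, 75, 61, 4, 97, 37]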
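The remark about `seed()` can be checked directly. The following is a minimal sketch, assuming NumPy's newer `np.random.default_rng` generator instead of the legacy `np.random.seed` call used above; the helper name `random_ints` and the seed value 42 are hypothetical choices for illustration.

```python
import numpy as np

def random_ints(seed, size=10, low=0, high=100):
    """Return `size` random integers in [low, high) from a seeded generator."""
    rng = np.random.default_rng(seed)  # seed=None would give a fresh, unseeded generator
    return rng.integers(low, high, size=size).tolist()

first = random_ints(42)
second = random_ints(42)
print(first == second)  # True: the same seed reproduces the same array
```

Fixing the seed while developing makes a sorting bug reproducible; dropping it afterwards restores truly random test data.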
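from typing import List

# Merge sort
def merge(arr1: List[int], arr2: List[int]) -> List[int]:
    """Merge two sorted lists into one sorted list (consumes both inputs)."""
    result = []
    while arr1 and arr2:
        # <= takes from the left list on ties, which keeps the sort stable
        if arr1[0] <= arr2[0]:
            result.append(arr1.pop(0))
        else:
            result.append(arr2.pop(0))
    # At most one of the two lists still has elements; append the rest
    if arr1:
        result += arr1
    if arr2:
        result += arr2
    return result

def merge_sort(arr: List[int]) -> List[int]:
    """Split the list in half, sort each half recursively, then merge."""
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    return merge(merge_sort(arr[:mid]), merge_sort(arr[mid:]))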
if __name__ == '__main__':
    arr_new = merge_sort(arr)
    print("Merge sort result:", arr_new)
Output:
Merge sort result: [1, 4, 18, 37, 50, 61, 75, 75, 89, 97]
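The merge step above removes elements with `pop(0)`, which shifts the whole list on every call. As a hedged alternative, here is a sketch of the same algorithm that walks both halves with index pointers and leaves its inputs untouched. The names `merge_sorted` and `merge_sort_indexed` are hypothetical, not part of the course skeleton.

```python
from typing import List

def merge_sorted(left: List[int], right: List[int]) -> List[int]:
    """Merge two sorted lists without modifying either of them."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements in their original order
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])   # one of these two slices is empty
    result.extend(right[j:])
    return result

def merge_sort_indexed(values: List[int]) -> List[int]:
    """Return a new sorted list; the input list is not changed."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    return merge_sorted(merge_sort_indexed(values[:mid]),
                        merge_sort_indexed(values[mid:]))

data = [75, 18, 50, 89, 1, 75, 61, 4, 97, 37]  # the sample array printed earlier
print(merge_sort_indexed(data))  # [1, 4, 18, 37, 50, 61, 75, 75, 89, 97]
```

Comparing the result with Python's built-in `sorted(data)` is a quick way to test either version on random input.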
II. Basic statistics: max, min, mean, standard deviation, median, 75th percentile, 25th percentile
(2)Write a python function to calculate the summary statistics of an array of numbers:max, min, mean, standard deviation, median, 75 percentile, 25 percentile. Use the three sets of random numbers as generated in the code skeleton to test your code and output the summary statistics for each data set.
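import pandas as pd
# First generate the three random arrays (d1, d2, d3 below) with the course code skeleton
# max, min, mean, standard deviation, median, 75%, 25%
def status(x):
    # x is one DataFrame column; x.std() is the sample standard deviation (ddof=1)
    return pd.Series([x.max(), x.min(), x.mean(), x.std(), x.median(), x.quantile(.75), x.quantile(.25)],
                     index=['max', 'min', 'mean', 'std', 'median', '75%', '25%'])
df = pd.DataFrame(np.array([d1, d2, d3]).T, columns=['d1', 'd2', 'd3'])
df.apply(status)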
Output:
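As a cross-check on the pandas version, the same seven statistics can be computed with plain NumPy. This is a minimal sketch: the function name `summary_stats` and the five-element `sample` list are hypothetical, and `ddof=1` is passed explicitly because `np.std` defaults to the population formula while pandas defaults to the sample formula.

```python
import numpy as np

def summary_stats(values):
    """Return max, min, mean, std, median and quartiles of a 1-D sequence."""
    arr = np.asarray(values, dtype=float)
    return {
        "max": arr.max(),
        "min": arr.min(),
        "mean": arr.mean(),
        "std": arr.std(ddof=1),          # sample standard deviation, as in pandas
        "median": np.median(arr),
        "75%": np.percentile(arr, 75),   # linear interpolation by default
        "25%": np.percentile(arr, 25),
    }

sample = [1, 2, 3, 4, 5]  # hypothetical test data, not one of the course data sets
result = summary_stats(sample)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

For this sample the mean and median are both 3, the quartiles are 2 and 4, and the sample standard deviation is the square root of 2.5.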
Data normalization
(3)Write a python function that accepts a list of numbers in any range, then scales the numbers to integers in [0, 9], and then count the number of occurrences of each integer in the dataset. Test your function on the three datasets used in (2).
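from sklearn import preprocessing
# Scale to the range (0, 9)
# Note: MinMaxScaler scales each column independently. With x of shape (3, n), every column
# holds one value from each data set, so the scaling runs across the data sets, not within one.
# To scale each data set on its own, fit on x.T and read the result back column by column.
x = np.array([d1, d2, d3])
min_max_scaler = preprocessing.MinMaxScaler(feature_range=(0, 9), copy=False)  # the default range is (0, 1); feature_range changes it
x_minmax = min_max_scaler.fit_transform(x)
print(x_minmax)
Output: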
[[0. 9. 0.68478261 9. 3.04225352 0. 3.10714286 2.86363636 0. 9. ]
[9. 7.27941176 9. 3. 9. 9. 9. 9. 9. 6.23076923]
[1.63636364 0. 0. 0. 0. 6.375 0. 0. 6.54545455 0. ]]
# Convert to integers (float -> int); astype(int) truncates the fractional part
d1_normal = x_minmax[0].astype(int)
d2_normal = x_minmax[1].astype(int)
d3_normal = x_minmax[2].astype(int)
print('d1_normal', d1_normal)
print('d2_normal', d2_normal)
print('d3_normal', d3_normal)
Output:
d1_normal [0 9 0 9 3 0 3 2 0 9]
d2_normal [9 7 9 3 9 9 9 9 9 6]
d3_normal [1 0 0 0 0 6 0 0 6 0]
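The task asks for each list to be scaled to integers in [0, 9] by its own minimum and maximum. A hedged sketch of that per-list scaling with plain NumPy follows; `scale_to_digits` is a hypothetical helper and the input list is made-up test data. Like `astype(int)` above, it truncates, so only the maximum value maps to 9.

```python
import numpy as np

def scale_to_digits(values, low=0, high=9):
    """Min-max scale one list to [low, high] and truncate to integers."""
    arr = np.asarray(values, dtype=float)
    span = arr.max() - arr.min()
    if span == 0:
        # All values are equal: there is nothing to spread, map everything to `low`
        return np.full(arr.shape, low, dtype=int)
    # Multiply before dividing to limit rounding error ahead of the truncation
    scaled = (arr - arr.min()) * (high - low) / span + low
    return scaled.astype(int)

digits = scale_to_digits([10, 20, 30, 55, 100])  # hypothetical input
print(digits.tolist())  # [0, 1, 2, 4, 9]
```

Calling the function once per data set keeps the three scalings independent of each other.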
Count how often each element occurs in each array (Counter)
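from collections import Counter
data1 = Counter(d1_normal)  # array 1
# arrays 2 and 3 are handled the same way
dic = {number: value for number, value in data1.items()}
x = sorted(dic.keys())  # values in ascending order
y = [dic[k] for k in x]  # counts in the same order as x, so each value keeps its own count
df1 = pd.DataFrame({'count': y}, index=pd.Index(x, name='data1'))
df1
Output (arrays 2 and 3 are similar):
data1 | count |
---|---|
0 | 4 |
2 | 1 |
3 | 2 |
9 | 3 |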
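For plotting it can help to have a count for every integer from 0 to 9, including those that never occur, so that all three data sets share one x-axis. A minimal sketch using `np.bincount` follows; the helper name `digit_counts` is hypothetical, and the input is the `d1_normal` array printed earlier.

```python
import numpy as np

def digit_counts(digits, n_bins=10):
    """Count occurrences of each integer 0..n_bins-1; absent values get 0."""
    return np.bincount(np.asarray(digits, dtype=int), minlength=n_bins)

d1_normal = [0, 9, 0, 9, 3, 0, 3, 2, 0, 9]  # copied from the printed output above
counts = digit_counts(d1_normal)
print(counts.tolist())  # [4, 0, 1, 2, 0, 0, 0, 0, 0, 3]
```

The position in the result is the value and the entry is its count, so `range(10)` and `counts` can be passed to a plot call as they are.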
2. Plotting: line charts and scatter plots
2.(50 points) Plotting.
(2)Using the counts you collected in 1-(3), plot the distribution of the three data sets in one figure. Again, try to reproduce all the details of the example Fig 2.
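Draw several lines in one figure, using the normalized arrays from above
import matplotlib.pyplot as plt
# x, y are the sorted values and their counts for data set 1; x2, y2 and x3, y3 are built the same way
plt.figure(figsize=(8, 5))
plt.title("Fig.2")  # title
plt.xlabel('value')  # x-axis label
plt.ylabel("frequency")  # y-axis label
plt.tick_params(axis='both')
# marker: style of the data-point markers; c: color; linestyle: style of the line
plt.plot(x, y, marker='o', c='r', linestyle=':', label='Data1')
plt.plot(x2, y2, marker='*', c='b', linestyle='--', label='Data2')
plt.plot(x3, y3, marker='x', c='g', linestyle='-', label='Data3')
plt.legend()
plt.show()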
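The three `plt.plot` calls differ only in their data and style, so they can be driven from one table. The sketch below is self-contained and makes some assumptions: the function name `plot_count_lines` is hypothetical, and the count lists are tallied by hand from the `d1_normal`, `d2_normal` and `d3_normal` arrays printed earlier, with a zero for every integer that does not occur.

```python
import matplotlib.pyplot as plt

def plot_count_lines(series, title="Fig.2"):
    """Plot one line per data set; `series` maps label -> (counts, style dict)."""
    fig, ax = plt.subplots(figsize=(8, 5))
    for label, (counts, style) in series.items():
        ax.plot(range(len(counts)), counts, label=label, **style)
    ax.set_title(title)
    ax.set_xlabel("value")
    ax.set_ylabel("frequency")
    ax.legend()
    return fig, ax

# Counts of the integers 0..9 in the three normalized arrays shown earlier
series = {
    "Data1": ([4, 0, 1, 2, 0, 0, 0, 0, 0, 3], dict(marker="o", c="r", linestyle=":")),
    "Data2": ([0, 0, 0, 1, 0, 0, 1, 1, 0, 7], dict(marker="*", c="b", linestyle="--")),
    "Data3": ([7, 1, 0, 0, 0, 0, 2, 0, 0, 0], dict(marker="x", c="g", linestyle="-")),
}
fig, ax = plot_count_lines(series)
```

Call `plt.show()` or `fig.savefig(...)` afterwards to display or save the figure.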
(1)For the three sets of data generated in 1-(2), plot the summary statistics in one figure. Try to reproduce all the details of the example Fig 1.
Mark the summary statistics of several data sets in one figure, drawn as a scatter plot
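# Use for loops to add the points
stats = df.apply(status)  # rows: the seven statistics; columns: d1, d2, d3
plt.figure(dpi=200, figsize=(3, 3.5))
plt.rcParams['font.sans-serif'] = ['SimHei']  # font with Chinese glyphs; only needed for Chinese labels
plt.title('Fig1')
marker = ['o', 'p', 'X', 'h', '^', '+', '*']
color = ['r', 'b', 'g', 'c', 'm', 'k', 'y']
label = ['max', 'min', 'mean', 'std', 'median', '75%', '25%']
# Only the d1 points carry labels, so each statistic appears once in the legend
for m, l, c, i in zip(marker, label, color, range(7)):
    plt.scatter('d1', stats['d1'].iloc[i], s=12, c=c, marker=m, label=l)
for m, c, i in zip(marker, color, range(7)):
    plt.scatter('d2', stats['d2'].iloc[i], s=12, c=c, marker=m)
for m, c, i in zip(marker, color, range(7)):
    plt.scatter('d3', stats['d3'].iloc[i], s=12, c=c, marker=m)
plt.legend(fontsize=5)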
Output:
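At first I drew this figure the clumsy way, writing one line of code per point. I do not recommend it: it is the most basic way to draw a scatter plot, and although it does produce the figure, it is inefficient.
plt.scatter('d1', df.apply(status)['d1'].iloc[0], c="r", marker='o', label='max')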
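Going one step beyond one loop per data set, a nested loop over the columns and statistics covers all points in a single pass. The sketch below is self-contained under stated assumptions: `plot_summary_scatter` is a hypothetical name, and `demo` is made-up random data standing in for the course data sets d1, d2 and d3.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

MARKERS = ['o', 'p', 'X', 'h', '^', '+', '*']
COLORS = ['r', 'b', 'g', 'c', 'm', 'k', 'y']

def plot_summary_scatter(stats, title="Fig1"):
    """Scatter every statistic (row) of every data set (column) in one figure."""
    fig, ax = plt.subplots(figsize=(3, 3.5), dpi=200)
    for col_idx, column in enumerate(stats.columns):
        for (name, value), m, c in zip(stats[column].items(), MARKERS, COLORS):
            # Label only the first column so the legend lists each statistic once
            ax.scatter(column, value, s=12, c=c, marker=m,
                       label=name if col_idx == 0 else None)
    ax.set_title(title)
    ax.legend(fontsize=5)
    return fig, ax

# Hypothetical stand-in data; replace with the real d1, d2, d3
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(50, 3)), columns=['d1', 'd2', 'd3'])
stats = pd.DataFrame(
    {col: [s.max(), s.min(), s.mean(), s.std(), s.median(),
           s.quantile(.75), s.quantile(.25)] for col, s in demo.items()},
    index=['max', 'min', 'mean', 'std', 'median', '75%', '25%'])
fig, ax = plot_summary_scatter(stats)
```

Adding a fourth data set then only requires another column in `stats`, with no extra plotting code.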