[python] python - Plotting, basic statistics


I. Merge sort
1.Python programming. Please use the code skeleton provided on the course website.
(1)Write a python function to merge sort a list of numbers. Test your function on a small random integer array of 10 elements.

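# Generate a random array
import numpy as np
np.random.seed()   # with no argument the array differs on every run; pass a fixed seed, e.g. seed(0), to get the same array each time
arr = [np.random.randint(0,100) for _ in range(10)]
print("Original data:", arr)

Output:
Original data: [75, 18, 50, 89, 1, 75, 61, 4, 97, 37]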
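The remark about seed() can be checked directly. The following is a minimal sketch, assuming NumPy's newer Generator API (np.random.default_rng) instead of the legacy np.random.seed call used above; the helper name make_array and the seed value 42 are illustrative and not part of the course skeleton.

```python
import numpy as np

def make_array(seed=None, size=10):
    """Return `size` random integers in [0, 100) as a plain Python list."""
    rng = np.random.default_rng(seed)   # seed=None draws fresh entropy on each call
    return rng.integers(0, 100, size=size).tolist()

a = make_array(seed=42)
b = make_array(seed=42)
print(a == b)   # True: the same seed reproduces the same array
```

Calling make_array() without a seed gives a different array on almost every run, which is the behaviour of the unseeded code above.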

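from typing import List

# Merge sort
def merge(arr1: List[int], arr2: List[int]) -> List[int]:
    """Merge two sorted lists into one sorted list without modifying the inputs."""
    result = []
    i = j = 0
    while i < len(arr1) and j < len(arr2):
        # <= takes from the left list on ties, which keeps the sort stable
        if arr1[i] <= arr2[j]:
            result.append(arr1[i])
            i += 1
        else:
            result.append(arr2[j])
            j += 1
    # At most one of the two lists still has elements left
    result += arr1[i:]
    result += arr2[j:]
    return result

def merge_sort(arr: List[int]) -> List[int]:
    """Return the elements of arr in ascending order; arr itself is left unchanged."""
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    return merge(merge_sort(arr[:mid]), merge_sort(arr[mid:]))

if __name__ == '__main__':
    arr_new = merge_sort(arr)
    print("Merge sort result:", arr_new)

Output:
Merge sort result: [1, 4, 18, 37, 50, 61, 75, 75, 89, 97]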
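One way to gain confidence in a hand-written merge sort is to compare it with Python's built-in sorted() on many random inputs. The sketch below is a variant, not the code from the post: it delegates the merge step to the standard library's heapq.merge, and the function name merge_sort_heapq is made up for this illustration.

```python
import heapq
import random
from typing import List

def merge_sort_heapq(values: List[int]) -> List[int]:
    """Merge sort whose merge step is delegated to heapq.merge."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left = merge_sort_heapq(values[:mid])
    right = merge_sort_heapq(values[mid:])
    return list(heapq.merge(left, right))   # merges two already sorted sequences

# Randomized check against the built-in sort, including empty and one-element lists
random.seed(0)
for _ in range(100):
    data = [random.randint(0, 99) for _ in range(random.randint(0, 20))]
    assert merge_sort_heapq(data) == sorted(data)

print(merge_sort_heapq([75, 18, 50, 89, 1, 75, 61, 4, 97, 37]))
# [1, 4, 18, 37, 50, 61, 75, 75, 89, 97]
```

The same loop can be pointed at any other sort function by swapping the function name.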


II. Basic statistics: max, min, mean, standard deviation, median, 75th percentile, 25th percentile
(2)Write a python function to calculate the summary statistics of an array of numbers:max, min, mean, standard deviation, median, 75 percentile, 25 percentile. Use the three sets of random numbers as generated in the code skeleton to test your code and output the summary statistics for each data set.

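# First generate the three random arrays d1, d2, d3 (from the course code skeleton)
# max, min, mean, standard deviation, median, 75%, 25%
import numpy as np
import pandas as pd

def status(x):
    # x is one DataFrame column (a pandas Series); Series.std() is the sample standard deviation (ddof=1)
    return pd.Series([x.max(),x.min(),x.mean(),x.std(),x.median(),x.quantile(.75),x.quantile(.25)],
                     index=['max','min','mean','std','median','75%','25%'])

df = pd.DataFrame(np.array([d1,d2,d3]).T, columns=['d1','d2','d3'])
df.apply(status)

Output:
[Image: result table]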
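The same seven statistics can also be computed with NumPy alone. This is a small sketch rather than the course's reference solution; the function name summary and the test data are assumptions. Note that NumPy's std defaults to the population formula (ddof=0), so ddof=1 is passed to match the pandas default.

```python
import numpy as np

def summary(values):
    """Return the seven summary statistics of `values` as a dict."""
    x = np.asarray(values, dtype=float)
    return {
        "max": x.max(),
        "min": x.min(),
        "mean": x.mean(),
        "std": x.std(ddof=1),            # ddof=1 matches the default of pandas' Series.std
        "median": np.median(x),
        "75%": np.percentile(x, 75),     # linear interpolation between data points
        "25%": np.percentile(x, 25),
    }

s = summary([1, 2, 3, 4, 5])
for name, value in s.items():
    print(name, value)
```

For the data [1, 2, 3, 4, 5] the mean and median are both 3, the quartiles are 2 and 4, and the sample standard deviation is the square root of 2.5.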


Data normalization
(3)Write a python function that accepts a list of numbers in any range, then scales the numbers to integers in [0, 9], and then count the number of occurrences of each integer in the dataset. Test your function on the three datasets used in (2).

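from sklearn import preprocessing
# Scale to the range (0, 9)
# Note: MinMaxScaler scales each column independently. With x of shape (3, n), every position is
# scaled across the three datasets; pass x.T instead to scale each dataset on its own.
x = np.array([d1,d2,d3])
min_max_scaler = preprocessing.MinMaxScaler(feature_range = (0,9),copy = False)  # MinMaxScaler() defaults to the range (0, 1); feature_range changes it
x_minmax = min_max_scaler.fit_transform(x)
print(x_minmax)

Output: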
[[0. 9. 0.68478261 9. 3.04225352 0. 3.10714286 2.86363636 0. 9. ]
[9. 7.27941176 9. 3. 9. 9. 9. 9. 9. 6.23076923]
[1.63636364 0. 0. 0. 0. 6.375 0. 0. 6.54545455 0. ]]

# Convert to integers (float -> int); astype(int) truncates toward zero
d1_normal = x_minmax[0].astype(int)
d2_normal = x_minmax[1].astype(int)
d3_normal = x_minmax[2].astype(int)
print('d1_normal',d1_normal)
print('d2_normal',d2_normal)
print('d3_normal',d3_normal)

Output:
d1_normal [0 9 0 9 3 0 3 2 0 9]
d2_normal [9 7 9 3 9 9 9 9 9 6]
d3_normal [1 0 0 0 0 6 0 0 6 0]
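MinMaxScaler treats every column as a separate feature, so how the input array is oriented decides what gets scaled together. As a hedged alternative, the sketch below scales one dataset at a time with plain NumPy; the function name scale_to_digits is an assumption, and truncation (rather than rounding) is chosen only to mirror the astype(int) step above.

```python
import numpy as np

def scale_to_digits(values):
    """Scale one dataset linearly onto [0, 9] and truncate to integers."""
    x = np.asarray(values, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:                        # constant data: avoid division by zero
        return np.zeros(x.shape, dtype=int)
    scaled = (x - lo) * 9 / (hi - lo)   # the minimum maps to 0, the maximum to 9
    return scaled.astype(int)           # truncation, as with astype(int) above

print(scale_to_digits([0, 5, 10]))      # [0 4 9]
```

With truncation, only the maximum of a dataset lands on 9; rounding with np.rint would spread the values more evenly over the ten bins, which is a design choice the assignment text leaves open.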


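Count how often each element occurs in each array (Counter)

import pandas as pd
from collections import Counter
data1 = Counter(d1_normal)   # array 1
# Arrays 2 and 3 are handled the same way
dic = dict(data1)
x = sorted(dic.keys())       # the values, in ascending order
y = [dic[k] for k in x]      # the counts, in the same order as x
df1 = pd.DataFrame({'count': y}, index=pd.Index(x, name='value'))
df1

Output (arrays 2 and 3 are similar):

value  count
0      4
2      1
3      2
9      3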



2. Plotting: line plots and scatter plots
2.(50 points) Plotting.
(1)Using the counts you collected in 1-(3), plot the distribution of the three data sets in one figure. Again, try to reproduce all the details of the example Fig 2.
[Image: example Fig 2]

Plot several lines in one figure, using the normalized arrays above.

# Uses the value counts of the normalized arrays above: (x, y), (x2, y2), (x3, y3)
import matplotlib.pyplot as plt
plt.figure(figsize=(8, 5))
plt.title("Fig.2")   # title
plt.xlabel('value')    # x-axis label
plt.ylabel("frequency")    # y-axis label
plt.tick_params(axis='both')
# marker: marker style for the data points    c: color     linestyle: line style
plt.plot(x,y, marker='o', c='r',linestyle=':',label='Data1')
plt.plot(x2,y2, marker='*', c='b', linestyle='--',label='Data2')
plt.plot(x3,y3, marker='x', c='g', linestyle='-',label='Data3')
plt.legend()

[Image: resulting plot]
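The snippet above relies on x2, y2, x3 and y3 having been built the same way as x and y. The following self-contained sketch rebuilds the counts from the three integer arrays printed earlier and draws all three lines in a loop. The Agg backend, the object-oriented axes API and the output file name fig2.png are assumptions made so that the example runs as a script.

```python
import matplotlib
matplotlib.use("Agg")                  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Integer arrays in [0, 9], as printed earlier in the post
datasets = {
    "Data1": [0, 9, 0, 9, 3, 0, 3, 2, 0, 9],
    "Data2": [9, 7, 9, 3, 9, 9, 9, 9, 9, 6],
    "Data3": [1, 0, 0, 0, 0, 6, 0, 0, 6, 0],
}
styles = [("o", "r", ":"), ("*", "b", "--"), ("x", "g", "-")]   # (marker, color, linestyle)

fig, ax = plt.subplots(figsize=(8, 5))
values = np.arange(10)                 # shared x-axis: the integers 0..9
for (name, data), (marker, color, linestyle) in zip(datasets.items(), styles):
    counts = np.bincount(data, minlength=10)   # occurrences of each integer 0..9
    ax.plot(values, counts, marker=marker, color=color, linestyle=linestyle, label=name)
ax.set_title("Fig.2")
ax.set_xlabel("value")
ax.set_ylabel("frequency")
ax.legend()
fig.savefig("fig2.png")                # hypothetical output file name
```

Using np.bincount with minlength=10 puts all three lines on the same x positions, including values that occur zero times.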
(1)For the three sets of data generated in 1-(2), plot the summary statistics in one figure. Try to reproduce all the details of the example Fig 1.
[Image: example Fig 1]
Show the summary statistics of the three datasets in one figure as a scatter plot.

# Use for loops to add the points
import matplotlib.pyplot as plt

stats = df.apply(status)   # statistics table from 1-(2): one row per statistic, one column per dataset
plt.figure(dpi=200,figsize=(3,3.5))
plt.rcParams['font.sans-serif']=['SimHei']   # font with CJK glyphs; only needed if the figure contains Chinese text
plt.title('Fig1')
marker = ['o','p','X','h','^','+','*']
color = ['r','b','g','c','m','k','y']
label = ['max','min','mean','std','median','75%','25%']
for col in ['d1','d2','d3']:
    for i,(m,l,c) in enumerate(zip(marker,label,color)):
        # label only the first dataset so each statistic appears once in the legend
        plt.scatter(col,stats[col].iloc[i],s=12,c=c,marker=m,label=l if col == 'd1' else None)

plt.legend(fontsize=5)

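Output:
[Image: resulting plot]
At first I drew this figure the clumsy way, writing one line of code per point. I don't recommend it: it is the most basic way to draw a scatter plot, and although it does produce the figure, it is inefficient.

plt.scatter('d1',df.apply(status)['d1'].iloc[0],c="r",marker='o',label='max')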
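A further option is to loop over the statistics instead of the datasets, so that each scatter call draws one statistic for all three datasets and the legend gets exactly one entry per statistic. The sketch below is self-contained and uses made-up numbers in place of d1, d2 and d3; the Agg backend and the variable names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")                  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data standing in for d1, d2, d3
df = pd.DataFrame({"d1": [1, 2, 3, 4, 5], "d2": [2, 4, 6, 8, 10], "d3": [5, 5, 6, 7, 9]})
stats = pd.DataFrame({
    "max": df.max(), "min": df.min(), "mean": df.mean(), "std": df.std(),
    "median": df.median(), "75%": df.quantile(0.75), "25%": df.quantile(0.25),
}).T                                   # rows: statistics, columns: datasets

markers = ["o", "p", "X", "h", "^", "+", "*"]
colors = ["r", "b", "g", "c", "m", "k", "y"]

fig, ax = plt.subplots(dpi=200, figsize=(3, 3.5))
for (name, row), m, c in zip(stats.iterrows(), markers, colors):
    # one call per statistic: x is the dataset name, y is the statistic's value
    ax.scatter(list(stats.columns), row.to_numpy(), s=12, color=c, marker=m, label=name)
ax.set_title("Fig1")
ax.legend(fontsize=5)
```

Seven scatter calls replace the twenty-one individual ones, and no special handling is needed to keep duplicate labels out of the legend.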