A Short Talk on Numerical Analysis in Python: Reading Notes on 《Python数据分析从小白到专家》 (Python Data Analysis: From Beginner to Expert)

Article link: https://blog.csdn.net/tsdss_/article/details/123950738

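This article introduces the book 《Python数据分析从小白到专家》. It covers the basics of data processing, numpy operations, pandas operations, matplotlib plotting, and regression analysis with sklearn and statsmodels. Worked examples show how to use reshape, linspace, data type conversion, and the pandas data structures. It is suited to university students who want a quick start in data analysis.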


I recommend 《Python数据分析从小白到专家》, published by 工信出版集团.

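The content is highly practical and matches what a university student typically already knows, so it is very friendly to beginners. The difficulty ramps up in an orderly way. The later chapters touch on the probability theory behind numerical analysis, which helps readers get started quickly with regression problems and take a first look at neural networks.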

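A rough introduction

Data processing

numpy operations

pandas tables and matrices

matplotlib plotting

Regression analysis and statistical computing with sklearn and statsmodels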

numpy

Attributes

ndarray.ndim    int, the number of dimensions
ndarray.size    int, the total number of elements
ndarray.dtype   the data type of the elements
ndarray.shape   tuple, (rows, columns) for a 2-D array
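A quick way to get a feel for these four attributes is to print them for a small array. The sketch below uses a 3-by-4 array named arr, chosen only for illustration.

```python
import numpy as np

# A 3x4 integer array used only for illustration.
arr = np.arange(12).reshape(3, 4)

print(arr.ndim)   # number of dimensions, an int
print(arr.size)   # total number of elements
print(arr.dtype)  # element type (the default integer type depends on the platform)
print(arr.shape)  # (rows, columns) as a tuple
```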

Examples

reshape

1-D to 2-D

import numpy as np
a=np.array([1,2,3,4,5,6,7,8])
b=np.arange(1,9).reshape(4,2)
print(a)
print(b)


reshape(x,x,x)

To 3-D

b=np.arange(1,9).reshape(2,2,2)


reshape(x,-1)

x rows, with the number of columns inferred by NumPy

b=np.arange(1,9).reshape(2,-1)  # x=2 here; -1 makes NumPy infer the 4 columns

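linspace(start,stop,num)

Splits the interval into num evenly spaced floating-point values (the third argument is a count, not a step, and stop is included by default)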

import numpy as np
a=np.arange(0,53,3).reshape(3,-1)
b=np.linspace(0,53,3).reshape(3,-1)
c=np.linspace(0,53,18).reshape(3,-1)
print(a)
print(b)
print(c)
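The difference between the two functions is easiest to see on a small range. In this minimal sketch the names by_step and by_count are illustrative only: arange takes a step and excludes the stop value, while linspace takes a sample count and includes it.

```python
import numpy as np

by_step = np.arange(0, 10, 2)      # step of 2, stop value excluded
by_count = np.linspace(0, 10, 6)   # 6 samples, stop value included

print(by_step)    # integers 0, 2, 4, 6, 8
print(by_count)   # floats 0.0 through 10.0
```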

array(,dtype=float)

Convert to float or complex

a=np.array([1,2,3,4,5,6,7,8],dtype=complex)

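Some other functions and constants

np.empty((x,y))

An uninitialized x-by-y float array; the values are whatever happened to be in memory, not random numbers

np.zeros((x,y))

A matrix of zeros

np.ones((x,y))

A matrix of ones

np.pi

The constant π

np.exp(1)

The constant e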
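A small sketch of how these might be used together. The 2-by-3 shape and the variable names are only for illustration. Note that an np.empty array should be filled before it is read.

```python
import numpy as np

zeros = np.zeros((2, 3))     # all 0.0
ones = np.ones((2, 3))       # all 1.0
scratch = np.empty((2, 3))   # contents are arbitrary until assigned
scratch[:] = np.pi           # fill every element before using it
e = np.exp(1)                # Euler's number

print(zeros)
print(ones)
print(scratch)
print(e)
```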

Matrix operations

The details are easy to look up online, for example in the NumPy documentation

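Some rather funny errors

module 'numpy' has no attribute 'array'
#Do not name your own .py file numpy. Python searches the script's own folder first, so import numpy picks up your file instead of the library, and array is not defined there

Numpy - module has no attribute 'arrange'
#Change arrange to arange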
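When either error shows up, one way to diagnose it is to check which file was actually imported and whether the attribute exists. This is only a sketch: the printed path depends on your installation.

```python
import numpy as np

# If this path points into your own project folder rather than site-packages,
# a local file named numpy.py is shadowing the real library.
print(np.__file__)

# arange exists; arrange is a common misspelling and does not.
print(hasattr(np, "arange"), hasattr(np, "arrange"))
```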

Regular expressions

Regular expressions, the re module: advanced text pattern matching

import re
# Named text rather than str, so the built-in str type is not shadowed
text="""
Getting going with Fedora is easier than ever. All you need is a 2GB USB flash drive, and Fedora Media Writer.

Once Fedora Media Writer is installed, it will set up your flash drive to run a "Live" version of Fedora Workstation, meaning that you can boot it from your flash drive and try it out right away without making any permanent changes to your computer. Once you are hooked, installing it to your hard drive is a matter of clicking a few buttons*.
"""
# search returns the first match; group() extracts the matched string
result = re.search('you',text).group()
print(result)

There are many regular-expression patterns worth learning
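As a taste of such patterns, here is a minimal sketch on one sentence from the text above. The two patterns are my own illustrations, not taken from the book.

```python
import re

text = "All you need is a 2GB USB flash drive, and Fedora Media Writer."

# \d+ matches one or more digits, so this finds a size such as "2GB".
size = re.search(r"\d+GB", text)

# A capital letter followed by lowercase letters, as a whole word.
capitalized = re.findall(r"\b[A-Z][a-z]+\b", text)

print(size.group())
print(capitalized)
```

re.search returns None when nothing matches, so in real code it is safer to check the result before calling group().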

pandas

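CSV files separate fields with commas; similar text files may use spaces instead

Converting other data files to CSV is convenient,

because pandas works so smoothly with CSV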

import pandas as pd
file = pd.read_csv('a.txt')
file = file.head(10)
file.to_csv('a.csv')
print(file)
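read_csv assumes commas by default; for space-separated text the sep parameter can be set. The sketch below uses two small in-memory strings (made-up data) instead of real files, so it runs without a.txt.

```python
import io
import pandas as pd

# Made-up sample data in two layouts.
comma_text = "name,score\nann,90\nbob,85\n"
space_text = "name score\nann 90\nbob 85\n"

comma_df = pd.read_csv(io.StringIO(comma_text))              # default sep=","
space_df = pd.read_csv(io.StringIO(space_text), sep=r"\s+")  # any run of whitespace

print(comma_df)
print(comma_df.equals(space_df))
```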

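pandas data structures

Series and DataFrame

A Series is a one-dimensional labeled array built on top of a NumPy array. NumPy on its own offers only basic statistics, while pandas adds richer analysis and statistics methods. pandas can also convert between Series and DataFrame.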

import pandas as pd
import numpy as np
a = pd.Series([0,1,2,34,np.nan,6,2,3])
print(a)

A Series can also be built from key-value pairs, that is, a dict
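A minimal sketch of the dict form, with made-up names and scores. The dict keys become the index labels, and to_frame shows the Series to DataFrame conversion.

```python
import pandas as pd

# Dict keys become the index, dict values become the data.
scores = pd.Series({"ann": 90, "bob": 85, "cat": 77})
print(scores["bob"])

# Series -> DataFrame with one column, and back to a Series again.
frame = scores.to_frame(name="score")
back = frame["score"]
print(frame)
```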

A tidier example

import pandas as pd
import numpy as np
a = pd.DataFrame([[0,1,2,34,np.nan,6,2,3],
              [12,1,312,3,12,312,4,1],
              [12,3,12,3,np.nan,12312]])
for col in a.columns:
    print(a[col])

DataFrame to Series (each column is a Series) and to a NumPy array (to_numpy)

import pandas as pd
import numpy as np
a = pd.DataFrame([[0,1,2,34,np.nan,6,2,3],
              [12,1,312,3,12,312,4,1],
              [12,3,12,3,np.nan,12312]])
for col in a.columns:
    print(a[col])

print('-'*60)
b = a.to_numpy()
print(b)

DataFrame column selection

import pandas as pd
import numpy as np
a = pd.DataFrame([[0,1,2,34,np.nan,6,2,3],
              [12,1,312,3,12,312,4,1],
              [12,3,12,3,np.nan,123,1,2]],columns=list('ABCDEFGH'))

print(a)
print('-'*30)
print(a['B'])

The loc method selects specific rows and columns by label

loc can also reduce dimensions by selecting only part of the rows and columns
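A short sketch of loc on a cut-down version of the table used in these notes (only four columns, for brevity). Depending on what is selected, the result is a scalar, a Series, or a smaller DataFrame.

```python
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 34],
                   [12, 1, 312, 3],
                   [12, 3, 12, 3]], columns=list("ABCD"))

cell = df.loc[1, "C"]               # one row label, one column label: a scalar
row = df.loc[0]                     # one row: a Series
block = df.loc[0:1, ["A", "C"]]     # label slices include both ends: a 2x2 DataFrame

print(cell)
print(row)
print(block)
```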

Boolean lookup

import pandas as pd
import numpy as np
a = pd.DataFrame([[0,1,2,34,np.nan,6,2,3],
              [12,1,312,3,12,312,4,1],
              [12,3,12,3,np.nan,123,1,2]],columns=list('ABCDEFGH'))

print(a)
print('-'*30)
print(a['A'])
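Boolean lookup itself works by building a True/False mask and indexing with it. A minimal sketch on the same table follows; the thresholds are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 2, 34, np.nan, 6, 2, 3],
                   [12, 1, 312, 3, 12, 312, 4, 1],
                   [12, 3, 12, 3, np.nan, 123, 1, 2]], columns=list("ABCDEFGH"))

mask = df["A"] > 5                            # one True/False value per row
large_a = df[mask]                            # rows where A exceeds 5
missing_e = df[df["E"].isna()]                # rows where E is NaN
both = df[(df["A"] > 5) & (df["C"] > 100)]    # combine conditions with & and parentheses

print(large_a)
print(missing_e)
print(both)
```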

matplotlib

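Chinese text is not supported by default, so take note

import matplotlib as mpl
mpl.rcParams['font.sans-serif']=['SimHei']  # a font with Chinese glyphs; it must be installed
mpl.rcParams['axes.unicode_minus']=False  # keeps the minus sign rendering correctly with that font

Matplotlib accepts RGB tuples directly; for ready-made color palettes the seaborn library is convenient, and depending on the version it may also need the scipy library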

Scatter plot function

import matplotlib.pyplot as plt

def visualModel(x,y,ols,lad):
    # Left panel: data with the OLS fit; right panel: data with the LAD fit
    fig = plt.figure(figsize=(12,6),dpi=80)
    ax2 = fig.add_subplot(121)
    ax3 = fig.add_subplot(122)
    ax2.set_xlabel("$x$")
    ax2.set_xticks(range(0,15000,1500))
    ax2.set_ylabel("$y$")
    ax2.set_title('OLS')
    ax3.set_xlabel("$x$")
    ax3.set_xticks(range(0, 15000, 1500))
    ax3.set_ylabel("$y$")
    ax3.set_title('LAD')
    ax2.scatter(x, y, color="b",alpha=0.4,label='observed data')
    ax2.plot(x,ols,label='OLS prediction')
    ax3.scatter(x, y, color="b", alpha=0.4, label='observed data')
    ax3.plot(x, lad, label='LAD prediction')
    # plt.legend() only affects the last axes, so call legend on each panel
    ax2.legend(shadow=True)
    ax3.legend(shadow=True)
    plt.show()

Scatter plots are drawn mainly with the scatter function

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
x = np.random.randn(1000)
y = np.random.randn(1000)
plt.scatter(x,y,marker='h',s=np.abs(np.random.randn(1000))*100,cmap='Blues',c=y,edgecolors='black')  # abs() keeps marker sizes non-negative
plt.grid(True,linestyle='--')
plt.show()

Building a mathematical model

This is where probability theory comes in

Linear regression

Generating the data

import numpy as np
import pandas as pd
def generate_data():
    np.random.seed(4889)  # fixed seed so the data is reproducible
    x = np.array([10]+list(range(10,29)))
    error = np.round(np.random.randn(20),2)
    y = x + error  # y follows x plus small Gaussian noise
    x = np.append(x,29)
    y = np.append(y,29*10)  # append one outlier at x=29
    return pd.DataFrame({"x": x, "y": y})
print(generate_data())

OLS model implementation and visualization

import matplotlib.pyplot as plt
from sklearn import  linear_model
import numpy as np
import pandas as pd

def generate_data():
    np.random.seed(4889)  # fixed seed so the data is reproducible
    x = np.array([10]+list(range(10,29)))
    error = np.round(np.random.randn(20),2)
    y = x + error  # y follows x plus small Gaussian noise
    x = np.append(x,29)
    y = np.append(y,29*10)  # append one outlier at x=29
    return pd.DataFrame({"x": x, "y": y})

def train_OLS(x,y):
    # Fit ordinary least squares and return the fitted values for x
    model=linear_model.LinearRegression()
    model.fit(x,y)
    predictions=model.predict(x)
    return predictions

def visualize_model(x, y ,ols):
    # Scatter the data and draw the OLS fit as a red dashed line
    fig = plt.figure(figsize=(8,8),dpi=80)
    ax = fig.add_subplot(111)
    ax.set_xlabel("$x$")
    ax.set_xticks(range(10,31,5))
    ax.set_ylabel("$y$")
    ax.scatter(x, y, color="b", alpha=0.4)
    ax.plot(x, ols, 'r--',label="OLS")
    plt.legend(shadow=True)
    plt.show()

if __name__=="__main__":
    data = generate_data()
    features=["x"]
    label=["y"]
    ols=train_OLS(data[features],data[label])
    visualize_model(data[features],data[label],ols)
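To see what the OLS fit computes, and why the single outlier matters, the slope and intercept for one feature can be worked out by hand with numpy. This is a sketch under simplifying assumptions: the data here is noise-free (y equals x) apart from the outlier, and ols_fit is a helper written for this note, not part of sklearn.

```python
import numpy as np

# Simplified stand-in for the generated data: y equals x, plus one outlier.
x = np.arange(10, 30, dtype=float)
y = x.copy()
y[-1] = 29 * 10  # the outlier at x=29

def ols_fit(x, y):
    # Closed-form least squares for one feature:
    # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept

slope, intercept = ols_fit(x, y)            # fit including the outlier
clean_slope, _ = ols_fit(x[:-1], y[:-1])    # fit with the outlier dropped

print(slope, intercept)
print(clean_slope)
```

Because OLS minimizes squared errors, one extreme point pulls the slope far above the value it has without that point.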