Basic Python Features in Practice and Simple Data Analysis

Latest recommended article published on 2023-01-13 06:11:22

两面包+芝士


Category: python. Tags: python, data analysis

Article link: https://blog.csdn.net/weixin_42455006/article/details/123993326


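Task 1 - Rolling a Die

Write a Python script that simulates rolling a six-sided die.
The script should prompt the user: "Roll the dice again? (y/n)".
If the user types "y", roll the die and show the prompt again.
If the user types "n", the script should end.

Hint:
To generate a random number within a range, use the following code: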

import random
rand_number = random.randint(1, 6)

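Roll the die
Get the user's input
Use a while loop that keeps running while the condition holds

import random

# First roll, shown before the user is asked anything
print(random.randint(1, 6))

# Keep rolling for as long as the user answers "y"
while input("Roll the dice again? (y/n) ") == 'y':
    print(random.randint(1, 6))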
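
The same loop can be wrapped in a function so that it can be exercised without a keyboard. This is only a minimal sketch, not the original author's code: the name `roll_until_declined` and its `ask` and `rng` parameters are hypothetical, and any answer other than "y" (ignoring case and surrounding spaces) is treated as "n".

```python
import random


def roll_until_declined(ask=input, rng=random):
    """Roll a six-sided die, then keep rolling while the user answers 'y'.

    `ask` is any callable that takes a prompt string and returns the answer;
    it defaults to the built-in input(). Returns the list of all rolls.
    """
    rolls = [rng.randint(1, 6)]  # the first roll happens before the first prompt
    print(rolls[-1])
    while ask("Roll the dice again? (y/n) ").strip().lower() == "y":
        rolls.append(rng.randint(1, 6))
        print(rolls[-1])
    return rolls
```

Calling `roll_until_declined()` with no arguments behaves like the interactive script; passing a custom `ask` callable makes it possible to feed in scripted answers.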

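Task 2 - Hangman

Create a Hangman game. Ask the user to guess one letter at a time. Limit the number of wrong guesses to 10. Display:

The number of wrong guesses remaining
The letters they have guessed
The letters they have guessed correctly (at their positions in the word)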

original_word = "chair" # the word to guess
word = set(original_word) # letters still to be found, kept in a set for later comparison
blanks = ["_" for c in original_word] # list of blanks; filled in as letters are guessed correctly

running = True # loop condition; becomes False once all lives are used up

guessed_letters = set() # the letters guessed so far

n_lives_left = 10 # number of lives
while running:
    uinput = input("Guess a letter: ") # read the user's guess

    if uinput not in guessed_letters:
        guessed_letters.add(uinput) # record the guess

        # Check whether the guessed letter is in the original word
        if uinput in word:
            print("The letter {0} is in the word".format(uinput))
            word.remove(uinput) # the guess is correct, so remove it from the remaining letters

            blank_index = [pos for pos, char in enumerate(original_word) if char == uinput] # positions of the guessed letter

            for idx in blank_index:
                blanks[idx] = uinput # put the guessed letter at the correct index

            # If every letter has been guessed, print the original word and leave the loop
            if len(word) <= 0:
                print("You guessed the word: {0}".format(original_word))
                break

        else:
            print("The letter {0} is not in the word".format(uinput)) # the guess is incorrect
            n_lives_left = n_lives_left - 1 # reduce the lives by 1
    else:
        print("You already guessed '{0}'!".format(uinput)) # the user guessed the same letter twice

    # Print the guessed letters, the remaining lives and the correct guesses
    print("Guessed letters: {0}".format(guessed_letters))
    print("{0} lives left".format(n_lives_left))
    print(blanks)

    # If all lives are used up, change the loop condition, which ends the loop
    if n_lives_left <= 0:
        running = False
        print("Out of lives!")
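
The game rules can also be separated from the keyboard input, which makes them easier to check. The sketch below is an illustration under assumptions rather than part of the original solution: `play_hangman` is a hypothetical helper that replays a fixed sequence of guesses and, like the loop above, does not penalize repeated guesses.

```python
def play_hangman(word, guesses, max_errors=10):
    """Replay a sequence of single-letter guesses against `word`.

    Returns (blanks, errors_left, won).
    """
    remaining = set(word)        # letters still to be found
    blanks = ["_"] * len(word)   # revealed letters, by position
    guessed = set()              # every letter tried so far
    errors_left = max_errors
    for letter in guesses:
        if letter in guessed:
            continue             # already tried: no penalty
        guessed.add(letter)
        if letter in remaining:
            remaining.discard(letter)
            for pos, char in enumerate(word):
                if char == letter:
                    blanks[pos] = letter
            if not remaining:
                return blanks, errors_left, True
        else:
            errors_left -= 1
            if errors_left <= 0:
                return blanks, errors_left, False
    return blanks, errors_left, False
```

For example, `play_hangman("chair", "czhzair")` replays two correct guesses, one wrong guess (counted once, since the second "z" is a repeat) and then the remaining letters.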

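Task 3 - Second-Order Polynomial

Generate synthetic data from a second-order (quadratic) polynomial

$f = 4 + 1.5 x + 3.2 x^2.$

Add noise to the model

$y = f + \epsilon$

Produce a scatter plot of $y$ against $x$, and add the true values $f$ to the plot.
Fit a linear model to the data, using the normal equation $\beta = (X^T X)^{-1} X^T y$ to estimate $\beta_0$ and $\beta_1$.
Fit a quadratic regression model to the data, estimating the parameters with the normal equation.
Plot the quadratic model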
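
For reference, the normal equation follows from minimizing the residual sum of squares. A brief derivation, assuming $X^T X$ is invertible and writing $J$ for the squared-error objective (a symbol introduced here, not in the task statement):

```latex
\begin{aligned}
J(\beta) &= (y - X\beta)^T (y - X\beta) \\
\nabla_\beta J(\beta) &= -2 X^T (y - X\beta) = 0 \\
X^T X \beta &= X^T y \\
\beta &= (X^T X)^{-1} X^T y
\end{aligned}
```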

1. Let's first generate the synthetic data.

import numpy as np
import matplotlib.pyplot as plt

# Seed the RNG so that we get the same results on every run
np.random.seed(0)

# Number of training points
m = 50

x = np.linspace(0.0, 1.0, m)

# Function coefficients/parameters
beta0 = 4
beta1 = 1.5
beta2 = 3.2

# Values of the second-order polynomial f
f = beta0 + beta1 * x + beta2 * np.power(x, 2)

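2. Let's add random noise. We will sample the noise from a normal distribution with mean zero and variance $0.1$.

# Draw the noise samples; np.random.normal takes the standard deviation, hence the square root
sigma2 = 0.1
y = f + np.random.normal(0, np.sqrt(sigma2), m)

3. Let's produce the scatter plot.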

fig2 = plt.figure()
plt.plot(x, f, label = "f (Ground Truth)")
plt.scatter(x, y, label = "y (Observed Points)", color = "red")
plt.xlabel("Predictor/Feature Value")
plt.ylabel("Target Value")
plt.title("Underlying Function vs Observations")
plt.legend(loc="upper left")

4. Let's compute the solution of the linear model using the normal equation.

# Add a column of ones for beta_0
X = np.column_stack((np.ones(m), x))
# Convert X to a matrix, so that * below means matrix multiplication
X = np.asmatrix(X)

# Estimate the linear regression coefficients
lin_betas = np.linalg.inv(X.T*X) * X.T * y.reshape(m,1)

# beta_0
lin_intercept = lin_betas[0,0]
print("intercept (beta_0): {0:.2f}".format(lin_intercept))

# beta_1
lin_beta = lin_betas[1,0]
print("beta_1: {0:.2f}".format(lin_beta))
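
As a cross-check on the normal equation, the same coefficients can be obtained with NumPy's least-squares solver. The following is a hedged sketch, not the original author's code: it regenerates comparable data with `np.random.default_rng(0)` (an assumption, so the noise differs from the `np.random.seed(0)` run above) and uses plain arrays with `@` instead of `np.asmatrix`.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: a fresh generator, not the seed used above
m = 50
x = np.linspace(0.0, 1.0, m)
y = 4 + 1.5 * x + 3.2 * x**2 + rng.normal(0.0, np.sqrt(0.1), m)

# Design matrix with a column of ones for the intercept
X = np.column_stack((np.ones(m), x))

# Normal equation, written as a linear solve instead of an explicit inverse
betas_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver, which works on X directly without forming X^T X
betas_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("normal equation:", betas_normal)
print("lstsq:          ", betas_lstsq)
```

Both approaches should agree up to floating-point error on a well-conditioned problem like this one; solving the system is generally preferred over computing the inverse explicitly.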

5. Let's plot the predictions of the linear model.

# Reconstruct our model from the coefficients
lin_func = X * lin_betas

fig2 = plt.figure()
plt.plot(x, f, label = "f (Ground Truth)")
plt.scatter(x, y, label = "Observed Points", color = "red")
plt.plot(x, lin_func, label = "Linear model") 
plt.legend()

6. Now we will estimate the coefficients under the assumption of a quadratic model.

# Build the polynomial design matrix by adding a column of squared x values
poly_X = np.column_stack((np.ones(m), x, np.power(x, 2)))

poly_X = np.asmatrix(poly_X)

# Estimate the polynomial regression coefficients
poly_betas = np.linalg.inv(poly_X.T*poly_X) * poly_X.T * y.reshape(m,1)

# beta_0
poly_intercept = poly_betas[0,0]
print("intercept (beta_0): {0:.2f}".format(poly_intercept))

# beta_1
poly_beta1 = poly_betas[1,0]
print("beta_1: {0:.2f}".format(poly_beta1))

# beta_2
poly_beta2 = poly_betas[2,0]
print("beta_2: {0:.2f}".format(poly_beta2))

7. Let's plot the estimated regression curves.

poly_func = poly_X * poly_betas

fig3 = plt.figure()
plt.plot(x, f, label = "f (Ground Truth)")
plt.scatter(x, y, label = "Observed Points", color = "red")
plt.plot(x, poly_func, label = "Poly") 
plt.plot(x, lin_func, label = "Linear") 
plt.legend()
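
The visual comparison can be complemented with a number. The sketch below is an addition under assumptions, not part of the original post: it regenerates similar data with `np.random.default_rng(0)`, and `fit_and_mse` is a hypothetical helper that fits each design matrix by least squares and reports the mean squared residual on the training points.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: not the same noise as the run above
m = 50
x = np.linspace(0.0, 1.0, m)
y = 4 + 1.5 * x + 3.2 * x**2 + rng.normal(0.0, np.sqrt(0.1), m)


def fit_and_mse(design, target):
    """Least-squares fit; returns (coefficients, mean squared residual)."""
    betas, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ betas
    return betas, float(np.mean(residuals**2))


lin_X = np.column_stack((np.ones(m), x))          # columns: 1, x
quad_X = np.column_stack((np.ones(m), x, x**2))   # columns: 1, x, x^2

_, mse_lin = fit_and_mse(lin_X, y)
_, mse_quad = fit_and_mse(quad_X, y)
print("linear MSE: {0:.4f}, quadratic MSE: {1:.4f}".format(mse_lin, mse_quad))
```

Because the linear model is a special case of the quadratic one, the quadratic training error can never be larger; how much smaller it is indicates how much the squared term matters for this data.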


Task 4 - Hands-on Analysis

Use the iris dataset from sklearn. Use the following code to create the train/test split:

from sklearn.model_selection import train_test_split
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

X = iris.data

y = iris.target

np.random.seed(0)

X_train, X_test, y_train, y_test = train_test_split(X, y)

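Subtask 1 - Manually tune the penalty type and regularization strength of a logistic regression classifier

Write your own code to find the best penalty type and regularization parameter "C" on the dataset. You can use "cross_val_score" to help you.

Hint: use exponential scaling for the values of "C" to search a wider range of parameter values, e.g. "np.exp(np.arange(1, 10))"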
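
To see what exponential scaling means in practice, the grid from the hint can be printed directly. This is a small illustrative sketch; the `np.logspace` variant is an assumption added here for comparison, not something the task asks for.

```python
import numpy as np

# Grid from the hint: e^1, e^2, ..., e^9 (each value is e times the previous one)
c_exp = np.exp(np.arange(1, 10))

# Alternative base-10 grid, 10^-3 ... 10^3, which also covers values of C below 1
c_log10 = np.logspace(-3, 3, num=7)

print(c_exp.round(2))
print(c_log10)
```

In scikit-learn's LogisticRegression, smaller values of "C" mean stronger regularization, so a grid that starts at e^1 only explores relatively weak regularization.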

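You can find the details of the LogisticRegression() object at the following link.

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

Subtask 2 - Automated tuning

Perform the same task, this time using sklearn's "GridSearchCV".

Subtask 3 - Compare the tuned model with the untuned model

On the test set, compare the performance of the tuned model with that of a model using sklearn's default parameters.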

Subtask 1

from sklearn.model_selection import train_test_split
import numpy as np
from sklearn import datasets
import warnings
warnings.filterwarnings("ignore")

iris = datasets.load_iris()

X = iris.data

y = iris.target

np.random.seed(0)

X_train, X_test, y_train, y_test = train_test_split(X, y)

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

cv_scores = []
params = []

k_values = np.exp(np.arange(1, 10))
penalties = ['l2', 'l1']

for penalty in penalties:
    for k in k_values:
        # In recent scikit-learn versions the default 'lbfgs' solver does not support
        # the 'l1' penalty, so a solver that handles both penalties is set explicitly
        clf = LogisticRegression(penalty = penalty, C = k, solver = 'saga', max_iter = 5000)

        scores = cross_val_score(clf, X_train, y_train)

        # Mean of the scores across the cross-validation folds
        cv_score = np.mean(scores)

        cv_scores.append(cv_score)

        params.append((penalty, k))

print("Best params: {0}".format(params[np.argmax(cv_scores)]))

Result (as reported in the original post):
Best params: ('l1', 2.718281828459045)

Subtask 2

from sklearn.model_selection import GridSearchCV

param_grid = {
    'C': np.exp(np.arange(1, 10)),
    'penalty': ['l2', 'l1']
}

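# 'saga' supports both the 'l1' and 'l2' penalties in the grid; the default 'lbfgs' solver does not support 'l1'
model = LogisticRegression(solver = 'saga', max_iter = 5000)

searcher = GridSearchCV(model, param_grid)

searcher.fit(X_train, y_train)

print(searcher.best_params_)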

Result (as reported in the original post):
{'C': 54.598150033144236, 'penalty': 'l2'}

Subtask 3

from sklearn.metrics import classification_report
base_model = LogisticRegression().fit(X_train, y_train) # Use default parameter set

base_preds = base_model.predict(X_test)

print(classification_report(y_test, base_preds))


tuned_model = searcher.best_estimator_

tuned_preds = tuned_model.predict(X_test)

print(classification_report(y_test, tuned_preds))
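
Besides the full classification reports, a single accuracy figure per model makes the comparison easier to read. The sketch below is a self-contained illustration under assumptions, not the original experiment: it fixes the split with `random_state=0` rather than `np.random.seed(0)`, tunes only "C" with the default 'l2' penalty, and raises `max_iter` to 1000 to give the solver more iterations.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = datasets.load_iris(return_X_y=True)
# Assumption: a fixed random_state, so this split differs from the one used above
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Untuned baseline with the default C
base = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Tuned model: cross-validated search over C only
c_grid = np.exp(np.arange(1, 10))
search = GridSearchCV(LogisticRegression(max_iter=1000), {"C": c_grid})
search.fit(X_train, y_train)

base_acc = accuracy_score(y_test, base.predict(X_test))
tuned_acc = accuracy_score(y_test, search.best_estimator_.predict(X_test))
print("default: {0:.3f}, tuned (C={1:.2f}): {2:.3f}".format(
    base_acc, search.best_params_["C"], tuned_acc))
```

With a test set this small, differences of one or two samples change the accuracy noticeably, so a small gap between the two models should not be over-interpreted.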

