Hands-On Projects

Contents

Titanic Passenger Survival Prediction

IMDB Movie Review Score Estimation

MNIST Handwritten Digit Image Recognition


Titanic Passenger Survival Prediction

import pandas as pd
from sklearn.feature_extraction import DictVectorizer
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV

#Import pandas for convenient data loading
#Read the training and test data from local files
train = pd.read_csv("../input/titanic/train_data.csv")
test = pd.read_csv("../input/titanic/test_data.csv")
#First print the basic information of the training and test data. This is a good habit: it gives an overall picture of the data size, each feature's dtype, and whether any values are missing
print(train.info())
print(test.info())
#The data turns out to be complete, so no imputation is needed
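
#Note: if info() had shown missing values (as in the raw Kaggle Titanic data),
#imputation would be needed before modelling. A minimal sketch, assuming
#hypothetical raw 'Age'/'Embarked' columns (this preprocessed dataset has
#neither gaps nor those column names):
#   train['Age'].fillna(train['Age'].mean(), inplace=True)
#   train['Embarked'].fillna(train['Embarked'].mode()[0], inplace=True)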
print("----"*20)

#Based on our prior knowledge of the Titanic disaster, manually select features likely to be predictive
selected_features = ['Pclass_1','Pclass_2','Pclass_3','Sex','Age','Emb_1','Emb_2','Emb_3','Family_size','Fare']

X_train = train[selected_features]
X_test = test[selected_features]

y_train = train['Survived']

#Next, vectorize the features with DictVectorizer
dict_vec = DictVectorizer(sparse=False)
X_train = dict_vec.fit_transform(X_train.to_dict(orient="records"))
print(dict_vec.feature_names_)

X_test = dict_vec.transform(X_test.to_dict(orient="records"))
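
#What DictVectorizer actually does (a toy illustration, separate from the
#pipeline above): string-valued features are one-hot encoded, numeric features
#pass through unchanged. For example:
#   DictVectorizer(sparse=False).fit_transform(
#       [{'Age': 22.0, 'Sex': 'male'}, {'Age': 38.0, 'Sex': 'female'}])
#yields [[22., 0., 1.], [38., 1., 0.]] with feature_names_
#['Age', 'Sex=female', 'Sex=male']. Since every selected feature here is
#already numeric, the transform is essentially a dict-to-array conversion.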

#Initialize a RandomForestClassifier (from sklearn.ensemble) with the default configuration
rfc = RandomForestClassifier()

#Import XGBClassifier from the popular xgboost package for this classification problem
#Also initialize it with the default configuration
xgbc = XGBClassifier()

#Evaluate the default-configured RandomForestClassifier and XGBClassifier on the training set with 5-fold cross-validation and report the mean classification accuracy
print(cross_val_score(rfc,X_train,y_train,cv=5).mean())
print(cross_val_score(xgbc,X_train,y_train,cv=5).mean())
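
#For classifiers, cross_val_score with an integer cv uses a stratified 5-fold
#split. A roughly equivalent manual sketch (for illustration only):
#   from sklearn.model_selection import StratifiedKFold
#   scores = []
#   for tr_idx, va_idx in StratifiedKFold(n_splits=5).split(X_train, y_train):
#       rfc.fit(X_train[tr_idx], y_train.iloc[tr_idx])
#       scores.append(rfc.score(X_train[va_idx], y_train.iloc[va_idx]))
#   print(sum(scores) / len(scores))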

#Make predictions with the default-configured RandomForestClassifier
rfc.fit(X_train,y_train)
rfc_y_predict = rfc.predict(X_test)
rfc_submission = pd.DataFrame({'PassengerId':test['PassengerId'],'Survived':rfc_y_predict})
#Save the default RandomForestClassifier's predictions on the test data to rfc_submission.csv
rfc_submission.to_csv("./rfc_submission.csv",index=False)

#Make predictions with the default-configured XGBClassifier
xgbc.fit(X_train,y_train)
xgbc_y_predict = xgbc.predict(X_test)
xgbc_submission = pd.DataFrame({'PassengerId':test['PassengerId'],'Survived':xgbc_y_predict})
#Save the default XGBClassifier's predictions on the test data to xgbc_submission.csv
xgbc_submission.to_csv("./xgbc_submission.csv",index=False)

#Use parallel grid search to look for a better hyperparameter combination, hoping to further improve the XGBClassifier's predictive performance
params = {'max_depth':range(2,4),'n_estimators':range(100,1100,200),'learning_rate':[0.05,0.1,0.25,0.5,1.0]}
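
#Grid size check: 2 max_depth values x 5 n_estimators values x 5
#learning_rate values = 50 candidates; with cv=5 that means 250 fits,
#matching the "Fitting 5 folds for each of 50 candidates" log below.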

xgbc_best = XGBClassifier()
gs = GridSearchCV(xgbc_best,params,n_jobs=-1,cv=5,verbose=1)
gs.fit(X_train,y_train)

#Inspect the optimized XGBClassifier's hyperparameter configuration and its cross-validated accuracy
print("gs.best_score",gs.best_score_)
print("gs.best_params",gs.best_params_)

#Save the predictions of the hyperparameter-optimized XGBClassifier on the test data to xgbc_best_submission.csv
xgbc_best_y_predict = gs.predict(X_test)
xgbc_best_submission = pd.DataFrame({'PassengerId':test['PassengerId'],'Survived':xgbc_best_y_predict})
xgbc_best_submission.to_csv("./xgbc_best_submission.csv",index=False)
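
#Note: gs.predict delegates to gs.best_estimator_, i.e. the XGBClassifier
#refit on the full training set with the best parameters found
#(refit=True is GridSearchCV's default).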

Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 792 entries, 0 to 791
Data columns (total 17 columns):
Unnamed: 0     792 non-null int64
PassengerId    792 non-null int64
Survived       792 non-null int64
Sex            792 non-null int64
Age            792 non-null float64
Fare           792 non-null float64
Pclass_1       792 non-null int64
Pclass_2       792 non-null int64
Pclass_3       792 non-null int64
Family_size    792 non-null float64
Title_1        792 non-null int64
Title_2        792 non-null int64
Title_3        792 non-null int64
Title_4        792 non-null int64
Emb_1          792 non-null int64
Emb_2          792 non-null int64
Emb_3          792 non-null int64
dtypes: float64(3), int64(14)
memory usage: 105.3 KB
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 17 columns):
Unnamed: 0     100 non-null int64
PassengerId    100 non-null int64
Survived       100 non-null int64
Sex            100 non-null int64
Age            100 non-null float64
Fare           100 non-null float64
Pclass_1       100 non-null int64
Pclass_2       100 non-null int64
Pclass_3       100 non-null int64
Family_size    100 non-null float64
Title_1        100 non-null int64
Title_2        100 non-null int64
Title_3        100 non-null int64
Title_4        100 non-null int64
Emb_1          100 non-null int64
Emb_2          100 non-null int64
Emb_3          100 non-null int64
dtypes: float64(3), int64(14)
memory usage: 13.4 KB
None
--------------------------------------------------------------------------------
['Age', 'Emb_1', 'Emb_2', 'Emb_3', 'Family_size', 'Fare', 'Pclass_1', 'Pclass_2', 'Pclass_3', 'Sex']
D:\anaconda3\envs\tree Point five\lib\site-packages\sklearn\ensemble\forest.py:248: FutureWarning: The default value of n_estimators will change from 10 in version 0.20 to 100 in 0.22.
  "10 in version 0.20 to 100 in 0.22.", FutureWarning)
(identical FutureWarning lines emitted for each RandomForestClassifier fit are omitted)
0.798006329113924
0.8358860759493671
Fitting 5 folds for each of 50 candidates, totalling 250 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.
[Parallel(n_jobs=-1)]: Done  26 tasks      | elapsed:    2.9s
[Parallel(n_jobs=-1)]: Done 176 tasks      | elapsed:    5.7s
[Parallel(n_jobs=-1)]: Done 250 out of 250 | elapsed:    7.2s finished
gs.best_score 0.8421717171717171
gs.best_params {'learning_rate': 0.25, 'max_depth': 2, 'n_estimators': 100}

IMDB Movie Review Score Estimation

MNIST Handwritten Digit Image Recognition
