Some Tips Picked Up While Doing Data Science

Anaconda

Installing XGBoost, LightGBM, and CatBoost in Anaconda

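lightgbm and catboost installed with pip may not be usable from Jupyter running in an Anaconda environment. When the pip on your PATH belongs to the local (system) Python, it installs lightgbm and catboost into that local Python environment. Anaconda's Python lives at a different path, so it cannot import packages from the local environment. Installing through conda puts the packages into the Anaconda environment itself: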

$ pip install xgboost
$ conda install -c conda-forge lightgbm
$ conda install -c conda-forge catboost
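
To check whether the path mismatch described above is the cause, it can help to ask the running Jupyter kernel which interpreter it uses and whether it can find each package. The following is a minimal sketch, not part of the original notes; the helper name report_environment is hypothetical.

```python
import importlib.util
import sys


def report_environment(packages=("xgboost", "lightgbm", "catboost")):
    """Return the interpreter path and, per package, whether it is importable."""
    status = {
        name: importlib.util.find_spec(name) is not None for name in packages
    }
    return sys.executable, status


# Run this in a Jupyter cell: the path shows which Python the kernel uses.
interpreter, available = report_environment()
print("Interpreter:", interpreter)
for name, ok in available.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

If the printed interpreter path is not inside the Anaconda installation, or a package shows as missing, the package was most likely installed into a different Python environment.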

Pandas

Filling missing values (NaN) in a DataFrame

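NaN stands for "not a number", and NaN != NaN. In a dataset it marks a missing value. Missing values have to be handled during preprocessing: one option is to drop the rows or columns that contain them, the other is to fill the NaN values.
Filling can be done as follows: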

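# X_all is the pandas DataFrame being preprocessed.
columns_with_nan = ['column_include_nan_1', 'column_include_nan_2',
                    'column_include_nan_3']
for col in columns_with_nan:
    # Assign the result back: fillna(inplace=True) on a selected column
    # may act on a copy and leave X_all unchanged.
    X_all[col] = X_all[col].fillna(-1.0)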
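
As a small self-contained illustration of both options mentioned above (dropping versus filling), here is a sketch on made-up data. The frame X_demo and its column names are assumptions for the example only.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value in each numeric column.
X_demo = pd.DataFrame({
    "age": [22.0, np.nan, 35.0],
    "fare": [7.25, 71.28, np.nan],
    "name": ["a", "b", "c"],
})

# NaN never compares equal to itself, so use isna() to find it.
nan_is_equal = (np.nan == np.nan)
nan_counts = X_demo.isna().sum()

columns_with_nan = ["age", "fare"]

# Option 1: drop every row that has a NaN in the listed columns.
dropped = X_demo.dropna(subset=columns_with_nan)

# Option 2: fill NaN with a sentinel value, working on a copy.
filled = X_demo.copy()
filled[columns_with_nan] = filled[columns_with_nan].fillna(-1.0)

print(nan_counts)
print(dropped)
print(filled)
```

Dropping loses two of the three rows here, which is why filling is often preferred when missing values are spread across many rows.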

Integer-encoding (hard-coding) non-numeric data

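import pandas as pd

X_all = pd.DataFrame(YourData)  # YourData is a placeholder for your own data
used_columns = ['column1', 'column2', 'column3']
for col in used_columns:
    # Map each distinct value to an integer code starting at 1.
    # unique() keeps first-appearance order, so the codes are reproducible
    # (iterating a set is not); missing values are left as NaN.
    c_map = {elem: index + 1 for index, elem in enumerate(X_all[col].dropna().unique())}
    X_all[col] = X_all[col].map(c_map)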
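
A similar encoding can also be obtained with pandas' built-in pd.factorize, shown here as an alternative to the dictionary mapping above rather than as the original method. The sketch uses made-up data; X_demo and the city column are assumptions.

```python
import pandas as pd

# Made-up categorical column containing one missing value.
X_demo = pd.DataFrame({"city": ["Paris", "Oslo", "Paris", None, "Lima"]})

# factorize returns integer codes plus the distinct values.
# sort=True orders the categories, so the codes are reproducible.
codes, categories = pd.factorize(X_demo["city"], sort=True)

# Codes start at 0 and missing values get -1; adding 1 makes the
# codes start at 1 and puts missing values at 0.
X_demo["city_code"] = codes + 1

print(list(categories))
print(X_demo)
```

Keeping the categories array makes it possible to translate codes back to the original values later.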
    

Changing a column's data type

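example_df = pd.DataFrame(YourData)
# astype returns a new DataFrame, so assign the result back.
# data_type is a placeholder for the target type, e.g. 'int64'.
example_df = example_df.astype(data_type)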
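
To convert only specific columns, astype also accepts a dictionary from column name to type. A minimal sketch on made-up data follows; the column names and target types are assumptions for illustration.

```python
import pandas as pd

# Made-up frame: "id" holds numbers stored as strings.
example_df = pd.DataFrame({"id": ["1", "2", "3"], "score": [1, 2, 3]})

# Passing a dict converts each named column to its own target type.
# astype returns a new DataFrame; example_df itself is not modified.
converted = example_df.astype({"id": "int64", "score": "float64"})

print(converted.dtypes)
```

If a value cannot be converted (for example the string "abc" to an integer), astype raises an error instead of silently producing NaN.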
