11 Python Libraries for Data Science You Must Know

One of the reasons Python is so valuable to data science is its huge collection of data analysis and visualization libraries. This article covers the most popular ones.

1. TensorFlow

The TensorFlow deep learning framework, developed by Google, is one of the most popular tools for training neural networks. Google uses the framework in large-scale services such as Gmail and Google Translate. TensorFlow is also used by companies such as Uber, Airbnb, Xiaomi, and Dropbox.

With TensorFlow, you can visualize individual parts of a neural network.
TensorFlow modules can be used standalone.
TensorFlow allows you to train neural networks on both the CPU and GPU.
Training can be organized as a pipeline.
A large team is constantly working to improve stability and add new features.

2. Scikit-learn

Scikit-learn is a popular machine learning library written in Python, C, and C++. It is a common choice for solving classical machine learning problems and is used in both industrial systems and scientific research.

Wide range of supervised and unsupervised learning algorithms.
Scikit-learn specializes exclusively in machine learning algorithms. Loading, processing, and manipulating data, as well as visualization, are outside the library's scope.
Large community and detailed documentation.
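As a rough illustration of the supervised and unsupervised sides of the library, the sketch below trains a classifier and a clustering model on the Iris dataset that ships with scikit-learn. The choice of dataset, models, and parameters is an assumption made for the example, not a recommendation from the article.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Supervised learning: fit a classifier and measure accuracy on held-out data.
classifier = LogisticRegression(max_iter=1000)
classifier.fit(X_train, y_train)
accuracy = classifier.score(X_test, y_test)

# Unsupervised learning: group the same samples without using the labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

print(f"Test accuracy: {accuracy:.2f}")
print(f"Clusters found: {sorted(set(cluster_labels.tolist()))}")
```

Both models follow the same fit/predict pattern, which is a large part of why the library is easy to pick up.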

3. NumPy

NumPy is one of the most popular Python libraries for machine learning. TensorFlow and other libraries use it internally to perform operations on multidimensional arrays.

Mathematical algorithms implemented in an interpreted language such as Python are often much slower than those implemented in compiled languages. NumPy provides implementations of computational algorithms that are optimized for working with multidimensional arrays.
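To make the contrast concrete, here is a minimal sketch that computes the same dot product twice: once with a plain Python loop and once with a single NumPy call that runs in compiled code. The helper name `dot_loop` and the array sizes are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
a = rng.random(100_000)
b = rng.random(100_000)


def dot_loop(x, y):
    """Dot product computed element by element in the interpreter."""
    total = 0.0
    for xi, yi in zip(x, y):
        total += xi * yi
    return total


# Same computation, two ways: interpreted loop vs. one vectorized call.
loop_result = dot_loop(a, b)
vectorized_result = float(np.dot(a, b))

# Operations on multidimensional arrays work along a chosen axis.
matrix = np.arange(12).reshape(3, 4)
column_means = matrix.mean(axis=0)

print(loop_result, vectorized_result)
print(column_means)
```

The two results agree up to floating-point rounding; on arrays of this size the vectorized call is typically much faster, although the exact speedup depends on the machine.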

4. Keras

Keras is a good choice if you need to assemble a deep learning model quickly and easily. It is a high-level layer on top of the TensorFlow and Theano frameworks. The library is aimed at fast work with deep learning networks and is designed to be compact, modular, and extensible. Keras provides a high-level, intuitive set of abstractions that makes it easy to build neural networks, regardless of the scientific computing library used as the computational backend.

Works well on both CPU and GPU.
Supports almost all neural network models, which can be combined to build more complex models.
The platform is written entirely in Python, so you can use standard debugging tools.
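The sketch below shows what assembling a model quickly can look like in practice. It assumes Keras is used through the TensorFlow package (`from tensorflow import keras`); the synthetic data, layer sizes, and training settings are invented for the example.

```python
import numpy as np
from tensorflow import keras

# Synthetic binary classification data: label is 1 when the features sum above 2.
rng = np.random.default_rng(0)
X = rng.random((200, 4)).astype("float32")
y = (X.sum(axis=1) > 2.0).astype("float32")

# A small fully connected network described as a stack of layers.
model = keras.Sequential(
    [
        keras.Input(shape=(4,)),
        keras.layers.Dense(8, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ]
)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Train briefly and predict probabilities for the first three samples.
history = model.fit(X, y, epochs=5, batch_size=32, verbose=0)
predictions = model.predict(X[:3], verbose=0)

print(history.history["loss"])
print(predictions.ravel())
```

The same code runs on a CPU or a GPU without changes; the backend chooses the device.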

5. PyTorch

PyTorch is one of the best options for working with neural networks and a longtime competitor to TensorFlow. It is developed primarily by Facebook's AI research group. Among other things, PyTorch has been used as the framework for building generative adversarial networks (GANs).

Simple GPU support.
In GPU mode, PyTorch provides high-quality optimization, and there is a runtime environment with a C++ API.
Support for asynchronous computation.
Direct access to ONNX-based frameworks, renderers, and runtimes.
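As an illustration of the simple GPU support mentioned in the list, the following sketch picks a device at runtime and fits a tiny linear model on it. The data, the `true_weights` values, and the hyperparameters are assumptions chosen only to keep the example short.

```python
import torch
from torch import nn

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

torch.manual_seed(0)
X = torch.rand(64, 3, device=device)
true_weights = torch.tensor([[1.0], [-2.0], [0.5]], device=device)
y = X @ true_weights

# The model and the data must live on the same device.
model = nn.Linear(3, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

losses = []
for _ in range(200):
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = loss_fn(model(X), y)    # forward pass
    loss.backward()                # backpropagation
    optimizer.step()               # parameter update
    losses.append(loss.item())

print(f"device={device}, first loss={losses[0]:.4f}, last loss={losses[-1]:.6f}")
```

Moving between CPU and GPU comes down to the single `device` value, which is what keeps GPU support simple.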

6. LightGBM

LightGBM is a gradient boosting framework that is among the most popular tools in Kaggle competitions. Gradient boosting is a machine learning technique for classification and regression problems that builds a prediction model as an ensemble of weaker predictive models, usually decision trees.

Fast training and high efficiency.
Low memory consumption.
Support for parallel and GPU computing.
Ability to work with large amounts of data.

7. Pandas

Pandas is a library that provides high-level data structures and a wide range of tools for analyzing data. It lets you perform many complex operations with a small amount of code: sorting and grouping data, handling missing data, working with time series, and so on. Data is represented as DataFrame tables.
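The operations listed above can be sketched on a tiny DataFrame. The cities, dates, and temperatures below are made-up sample data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", "Rome", "Rome", "Rome"],
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-02", "2024-01-03"]
        ),
        "temperature": [-3.0, np.nan, 12.0, 14.0, np.nan],
    }
)

# Missing data: fill each gap with the mean temperature of the same city.
df["temperature"] = df.groupby("city")["temperature"].transform(
    lambda s: s.fillna(s.mean())
)

# Grouping: average temperature per city.
mean_by_city = df.groupby("city")["temperature"].mean()

# Sorting: warmest rows first.
sorted_df = df.sort_values("temperature", ascending=False)

# Time series: average over all cities for each calendar day.
daily = df.set_index("date")["temperature"].resample("D").mean()

print(mean_by_city)
print(sorted_df.head())
print(daily)
```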

8. SciPy

SciPy is essential for scientific and engineering calculations, including machine learning tasks.

Features: finding minima and maxima of functions, computing integrals, special functions, signal and image processing, solving differential equations, and more.
SciPy is closely related to NumPy, so NumPy arrays are supported by default.
SciPy can be used together with PyTables, a hierarchical database designed to manage large amounts of data in HDF5 files.
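Three of the features listed above can be sketched in a few lines: minimizing a function, computing an integral, and solving a differential equation. The specific functions are simple textbook cases chosen for the example because their exact answers are known.

```python
import numpy as np
from scipy import integrate, optimize


def parabola(x):
    """A parabola whose minimum is at x = 2."""
    return (x - 2.0) ** 2 + 1.0


# Minimization of a function of one variable.
minimum = optimize.minimize_scalar(parabola)

# Definite integral of sin(x) from 0 to pi (the exact value is 2).
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)

# Differential equation dy/dt = -y with y(0) = 1 (the exact solution is exp(-t)).
solution = integrate.solve_ivp(
    lambda t, y: -y, t_span=(0.0, 1.0), y0=[1.0], rtol=1e-8, atol=1e-10
)
y_at_1 = solution.y[0, -1]

print(f"argmin={minimum.x:.6f}, integral={area:.6f}, y(1)={y_at_1:.6f}")
```

All inputs and outputs here are plain Python floats or NumPy arrays, which reflects how closely the two libraries are tied together.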

9. Eli5

Eli5 is a Python library for visualizing and debugging machine learning models through a unified API. It has built-in support for several ML frameworks and libraries: scikit-learn, Keras, and LightGBM, all mentioned above, as well as XGBoost, lightning, and CatBoost.

10. NLTK (Natural Language Toolkit)

NLTK is a package of libraries and programs for symbolic and statistical processing of natural language. It is accompanied by extensive documentation, including a book explaining the concepts behind the natural language processing tasks that can be performed with this package.

11. Pillow

Pillow is a maintained fork of PIL (the Python Imaging Library). It supports a variety of file types: PDF, WebP, PCX, PNG, JPEG, GIF, PSD, IM, EPS, ICO, BMP, and others. It also includes many image filters that are useful in computer vision tasks.
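The sketch below applies two of Pillow's built-in filters and writes the result in PNG format. To stay self-contained it draws a test image in memory instead of loading a file; the image contents and sizes are arbitrary.

```python
import io

from PIL import Image, ImageDraw, ImageFilter

# Build a small test image: a black square on a white background.
image = Image.new("RGB", (64, 64), color=(255, 255, 255))
ImageDraw.Draw(image).rectangle([16, 16, 47, 47], fill=(0, 0, 0))

# Filters commonly used as preprocessing steps in computer vision.
blurred = image.filter(ImageFilter.GaussianBlur(radius=2))
edges = image.convert("L").filter(ImageFilter.FIND_EDGES)

# Resize, then encode as PNG in memory and read the result back.
thumbnail = image.resize((32, 32))
buffer = io.BytesIO()
blurred.save(buffer, format="PNG")
reloaded = Image.open(io.BytesIO(buffer.getvalue()))

print(reloaded.format, reloaded.size, edges.mode, thumbnail.size)
```

Saving to another supported format is usually a matter of changing the `format` argument or the file extension.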

Conclusion

We reviewed a selection of useful libraries that are actively used by machine learning specialists, neural network experts, and people working in other areas of data science. If you are interested in data science, take a look at our publications under the Data Science tag.
