Top 20 Python AI and Machine Learning Open Source Projects

Original link: https://www.pybloggers.com/2018/07/top-20-python-ai-and-machine-learning-open-source-projects/

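This article lists the top 20 Python AI and machine learning open source projects on GitHub and their growth since 2016, highlighting the trends for projects such as TensorFlow and Scikit-learn. These projects are useful for learning about the development of machine learning and for getting involved in the open source community.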

Getting into Machine Learning and AI is not an easy task. Many aspiring professionals and enthusiasts find it hard to establish a proper path into the field, given the enormous amount of resources available today. The field is evolving constantly, and it is crucial that we keep up with the pace of this rapid development. A good way to cope with this overwhelming speed of evolution and innovation, and to stay updated and knowledgeable on the advances of ML, is to engage with the community by contributing to the many open-source projects and tools that advanced professionals use daily.

Here we update the information and examine the trends since our previous post Top 20 Python Machine Learning Open Source Projects (Nov 2016).

TensorFlow has moved into first place with triple-digit percentage growth in contributors. Scikit-learn dropped to second place, but still has a very large base of contributors.

Compared to 2016, the projects with the fastest growth in number of contributors were:

TensorFlow, 169% up, from 493 to 1324 contributors
Deap, 86% up, from 21 to 39 contributors
Chainer, 83% up, from 84 to 154 contributors
Gensim, 81% up, from 145 to 262 contributors
Neon, 66% up, from 47 to 78 contributors
Nilearn, 50% up, from 46 to 69 contributors
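
The percentages above follow directly from the two contributor counts given for each project. As a minimal sketch (the `counts` dictionary and the `percent_growth` helper are names introduced here for illustration, not part of the original article), the figures can be reproduced like this:

```python
# Contributor counts (2016, 2018) copied from the list above.
counts = {
    "TensorFlow": (493, 1324),
    "Deap": (21, 39),
    "Chainer": (84, 154),
    "Gensim": (145, 262),
    "Neon": (47, 78),
    "Nilearn": (46, 69),
}


def percent_growth(old, new):
    """Return the relative increase from old to new, in percent."""
    return (new - old) / old * 100


# Growth per project, keyed by project name.
growth = {name: percent_growth(old, new) for name, (old, new) in counts.items()}

# Print the projects from fastest to slowest growth.
for name, pct in sorted(growth.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {pct:.0f}% up")
```

Rounded to whole percentages, this should give the same ordering and values as the list.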

Also new in 2018:

Keras, 629 contributors
PyTorch, 399 contributors

Fig. 1: Top 20 Python AI and Machine Learning projects on Github.

Size is proportional to the number of contributors, and color represents the change in the number of contributors: red is higher, blue is lower. A snowflake shape marks Deep Learning projects; a round shape marks other projects.

We see that Deep Learning projects like TensorFlow, Theano, and Caffe are among the most popular.

The list below gives the projects in descending order of the number of contributors on Github. The change in the number of contributors is relative to the 2016 KDnuggets post, Top 20 Python Machine Learning Open Source Projects.

We hope you enjoy going through the documentation pages of each of these to start collaborating and learning the ways of Machine Learning using Python.

TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization. The system is designed to facilitate research in machine learning, and to make it quick and easy to transition from research prototype to production system. Contributors: 1324 (169% up), Commits: 28476, Stars: 92359. Github URL: Tensorflow
Scikit-learn provides simple and efficient tools for data mining and data analysis that are accessible to everybody and reusable in various contexts. It is built on NumPy, SciPy, and matplotlib, and is open source and commercially usable under the BSD license. Contributors: 1019 (39% up), Commits: 22575, Github URL: Scikit-learn
Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. Contributors: 629 (new), Commits: 4371, Github URL: Keras
PyTorch provides tensors and dynamic neural networks in Python with strong GPU acceleration. Contributors: 399 (new), Commits: 6458, Github URL: pytorch
Theano allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Contributors: 327 (24% up), Commits: 27931, Github URL: Theano
Gensim is a free Python library for scalable statistical semantics; it can analyze plain-text documents for semantic structure and retrieve semantically similar documents. Contributors: 262 (81% up), Commits: 3549, Github URL: Gensim
Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and community contributors. Contributors: 260 (21% up), Commits: 4099, Github URL: Caffe
Chainer is a Python-based, standalone open source framework for deep learning models. Chainer provides a flexible, intuitive, and high performance means of implementing a full range of deep learning models, including state-of-the-art models such as recurrent neural networks and variational auto-encoders. Contributors: 154 (83% up), Commits: 12613, Github URL: Chainer
Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics are available for different types of data and each estimator. Contributors: 144 (33% up), Commits: 9729, Github URL: Statsmodels
Shogun is a machine learning toolbox that provides a wide range of unified and efficient Machine Learning (ML) methods. The toolbox makes it easy to combine multiple data representations, algorithm classes, and general purpose tools. Contributors: 139 (32% up), Commits: 16362, Github URL: Shogun
Pylearn2 is a machine learning library. Most of its functionality is built on top of Theano. This means you can write Pylearn2 plugins (new models, algorithms, etc) using mathematical expressions, and Theano will optimize and stabilize those expressions for you, and compile them to a backend of your choice (CPU or GPU). Contributors: 119 (3.5% up), Commits: 7119, Github URL: Pylearn2
NuPIC is an open source project based on a theory of neocortex called Hierarchical Temporal Memory (HTM). Parts of HTM theory have been implemented, tested, and used in applications, and other parts of HTM theory are still being developed. Contributors: 85 (12% up), Commits: 6588, Github URL: NuPIC
Neon is Nervana’s Python-based deep learning library. It provides ease of use while delivering the highest performance. Contributors: 78 (66% up), Commits: 1112, Github URL: Neon
Nilearn is a Python module for fast and easy statistical learning on NeuroImaging data. It leverages the scikit-learn Python toolbox for multivariate statistics with applications such as predictive modelling, classification, decoding, or connectivity analysis. Contributors: 69 (50% up), Commits: 6198, Github URL: Nilearn
Orange3 is an open source machine learning and data visualization toolkit for novices and experts, with interactive data analysis workflows and a large toolbox. Contributors: 53 (33% up), Commits: 8915, Github URL: Orange3
Pymc is a python module that implements Bayesian statistical models and fitting algorithms, including Markov chain Monte Carlo. Its flexibility and extensibility make it applicable to a large suite of problems. Contributors: 39 (5.4% up), Commits: 2721, Github URL: Pymc
Deap is a novel evolutionary computation framework for rapid prototyping and testing of ideas. It seeks to make algorithms explicit and data structures transparent. It works in perfect harmony with parallelisation mechanisms such as multiprocessing and SCOOP. Contributors: 39 (86% up), Commits: 1960, Github URL: Deap
Annoy (Approximate Nearest Neighbors Oh Yeah) is a C++ library with Python bindings to search for points in space that are close to a given query point. It also creates large read-only file-based data structures that are mapped into memory so that many processes may share the same data. Contributors: 35 (46% up), Commits: 527, Github URL: Annoy
PyBrain is a modular Machine Learning Library for Python. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for Machine Learning Tasks and a variety of predefined environments to test and compare your algorithms. Contributors: 32 (3% up), Commits: 992, Github URL: PyBrain
Fuel is a data pipeline framework which provides your machine learning models with the data they need. It is planned to be used by both the Blocks and Pylearn2 neural network libraries. Contributors: 32 (10% up), Commits: 1116, Github URL: Fuel

The contributor and commit numbers were recorded in February 2018.

Editor’s note: This was originally posted on KDnuggets, and has been reposted with permission. Author Ilan Reinstein is a physicist and data scientist.
