

Data science is a varied field that involves many different roles and responsibilities. The work typically involves processing unstructured data, implementing machine learning (ML) concepts and techniques, and generating insights. The process usually ends with a visual presentation of data-driven insights.

Machine learning is a critical element of this process, but training ML models is often time-consuming and resource-intensive. In the past, gaining access to ML resources was difficult and expensive. Today, many cloud computing vendors offer resources for data science in the cloud.

This article reviews the machine learning options on AWS, Azure, and Google Cloud Platform (GCP) to help you decide which resource meets your ML needs.


The Importance of Cloud Computing for Data Science

Training machine learning and deep learning models can involve thousands of iterations, and you need that many iterations to produce an accurate model. For example, if your training set contains just 1TB of data, 10 passes over it require 10TB of I/O. Input datasets grow even larger when computer vision algorithms process high-resolution images. You can reduce processing time by removing network latency between the compute resources and the source data, which helps ensure the best I/O performance when reading that data.
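
The I/O arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the function names and the two sustained read rates are assumptions made for the example, not figures from any vendor, and it assumes the full dataset is read once per pass.

```python
def total_io_tb(dataset_tb: float, passes: int) -> float:
    """Total data read, in TB, when the full dataset is read once per pass."""
    return dataset_tb * passes


def read_time_hours(total_tb: float, throughput_gb_per_s: float) -> float:
    """Hours needed to read total_tb at a sustained throughput (decimal units)."""
    total_gb = total_tb * 1000  # 1 TB = 1000 GB in decimal units
    return total_gb / throughput_gb_per_s / 3600


# The example from the text: a 1TB training set read 10 times.
io_tb = total_io_tb(dataset_tb=1, passes=10)
print(f"Total I/O: {io_tb} TB")

# Hypothetical sustained read rates in GB/s, chosen only for illustration.
for rate in (0.5, 2.0):
    print(f"{rate} GB/s -> {read_time_hours(io_tb, rate):.2f} hours")
```

The point of the comparison is that read throughput, not just compute, bounds how quickly repeated passes over a dataset can finish.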

Cloud computing enables you to model storage capacity and handle loads at scale, or to scale processing across nodes. For example, AWS offers Graphics Processing Unit (GPU) instances with 8–256GB of memory, priced at an hourly rate. GPUs are specialized processors designed for complex image processing. Azure offers the NC-series of high-performance GPUs for high-performance computing algorithms and applications.

AWS Machine Learning Services and Tools

Amazon offers several machine learning tools and services. These services enable organizations and developers to improve the performance of compute-intensive and high performance computing models. The list below reviews some of these services.

Amazon SageMaker


SageMaker is a fully managed machine learning platform for data scientists and developers. The platform runs on Elastic Compute Cloud (EC2), and enables you to build machine learning models, organize your data, and scale your operations. Machine learning applications on SageMaker include speech recognition, computer vision, and recommendations.

AWS Marketplace offers models you can use instead of starting from scratch. You can then start training and optimizing your model. The most common choices are frameworks like Keras, TensorFlow, and PyTorch. SageMaker can optimize and configure these frameworks automatically, or you can configure them yourself. You can also develop your own algorithm by building it in a Docker container. You can use a Jupyter notebook to build your machine learning model and visualize your data.

Amazon Lex


The Lex API is designed to integrate chatbots into applications. Lex contains deep learning-based Natural Language Processing (NLP) and automatic speech recognition capabilities.

The API can recognize speech and written text. Lex’s user interface (UI) enables you to pass recognized inputs to many different back-end solutions. Besides standalone apps, Lex supports chatbot deployment to Slack, Facebook Messenger, and Twilio.

Amazon Rekognition


Rekognition is a computer vision service that simplifies the development process for image and video recognition applications. Rekognition enables companies to customize their apps according to business needs. Rekognition’s image and video recognition features include:

  • Object detection and classification — enables you to find and identify different objects in images and videos. For example, you can detect people dancing or someone extinguishing a fire.

  • Face recognition — used for face detection and matching. For example, you can use it to detect celebrity faces in images and videos.

  • Facial analysis — used to analyze facial expressions. You can detect smiles, analyze eyes, and even define emotional sentiment in videos.

  • Inappropriate scene detection — enables you to determine if an image or video contains inappropriate content, like explicit adult content or violence.
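
As a rough illustration of how an application might consume object detection results, the sketch below filters a label-detection response by confidence. It assumes the response shape returned by Rekognition's DetectLabels operation (a "Labels" list whose entries carry "Name" and "Confidence"). The helper name, the threshold, and the sample response are hypothetical and hand-written for the example; no call to AWS is made.

```python
def confident_labels(response: dict, min_confidence: float = 90.0) -> list:
    """Return label names at or above the threshold, highest confidence first."""
    labels = [
        label for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]
    labels.sort(key=lambda label: label["Confidence"], reverse=True)
    return [label["Name"] for label in labels]


# Hand-written sample in the assumed response shape, not real service output.
# With boto3 installed and AWS credentials configured, a dict like this would
# come from the Rekognition client's detect_labels call.
sample_response = {
    "Labels": [
        {"Name": "Dancing", "Confidence": 93.4},
        {"Name": "Person", "Confidence": 99.1},
        {"Name": "Fire", "Confidence": 41.0},
    ]
}

print(confident_labels(sample_response))
```

Filtering on confidence is a design choice left to the application: a lower threshold surfaces more labels at the cost of more false positives.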


Azure Machine Learning Services and Tools

Compared to AWS, Azure machine learning offerings are more flexible in terms of out-of-the-box algorithms. Azure machine learning offerings can be separated into two main categories — Azure Machine Learning Services and Bot Service.

Azure Machine Learning (Azure ML) Services

Azure ML is a huge library of pre-trained, pre-packaged machine learning algorithms. Azure ML Service also provides an environment for implementing these algorithms and applying them to real-world applications. The UI of Azure ML enables you to build machine learning pipelines that combine multiple algorithms. You can use the UI to train, test, and evaluate models.

Azure ML also provides solutions for Artificial Intelligence (AI). These include visualizations and other data that help you understand model behavior and compare algorithms to find the best option.

Azure ML Services offerings include:

  • Python packages — contain functions and libraries for computer vision, text analysis, forecasting, and hardware acceleration.


  • Experimentation — enables you to build different models, compare them, set the project to a particular historic configuration, and continue development from that moment.


  • Model management — provides an environment to host models, manage versions, and monitor models that run on Azure or on-premises.


  • Workbench — a simple command-line and desktop environment with dashboards and model development tracking tools.


  • Visual Studio Tools for AI — enables you to add tools for deep learning and other AI projects.

Azure Bot Service

Azure Bot Service provides an environment for building, testing, and deploying bots using different programming languages. Bot Service does not necessarily require machine learning methods, because Microsoft provides five predefined bot templates: basic, form, proactive, language understanding, and Q&A. Only the language understanding template requires advanced AI techniques.

You can use Node.js and .NET technologies to build bots with Azure. You can deploy these bots on services like Skype, Bing, Office 365 email, Slack, Facebook Messenger, Twilio, and Telegram.

Google Cloud Machine Learning Services and Tools

Google provides machine learning and AI services on two levels — Google Cloud Machine Learning Engine for experienced data professionals and the Cloud AutoML platform for beginners.

Google Cloud AutoML

Cloud AutoML is a cloud-based machine learning platform built for inexperienced users. You can upload your datasets, train models, and deploy them on the website. AutoML integrates with all of Google’s services and stores data in the cloud. You can deploy trained models via the REST API.

Several AutoML products are available through a graphical interface. These include model training on structured data, image and video processing services, and a natural language processing and translation engine.

Google Cloud Machine Learning Engine

The Google Cloud ML Engine enables you to run machine learning training jobs and predictions at scale. You can use Google Cloud ML to train a complex model by leveraging GPU and Tensor Processing Unit (TPU) infrastructure. You can also use the service to deploy an externally trained model.

Cloud ML automates all monitoring and resource provisioning processes for running the jobs. Besides hosting and training, Cloud ML can also perform hyperparameter tuning that influences the accuracy of predictions. Without hyperparameter tuning automation, data scientists need to manually experiment with multiple values while evaluating the accuracy of the results.
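
To make the manual alternative concrete, here is a minimal sketch of the kind of experiment a data scientist would otherwise run by hand: a small grid search over two hyperparameters. It is not how Cloud ML performs tuning. The toy data, the one-weight model, and the candidate values are all assumptions chosen so the example runs with the standard library only.

```python
from itertools import product

# Toy data for y = 3x, split into training and validation points.
train = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
valid = [(x, 3.0 * x) for x in (5.0, 6.0)]


def train_model(learning_rate: float, epochs: int) -> float:
    """Fit y = w * x with plain gradient descent on mean squared error."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in train) / len(train)
        w -= learning_rate * grad
    return w


def validation_error(w: float) -> float:
    """Mean squared error of the fitted weight on the validation points."""
    return sum((w * x - y) ** 2 for x, y in valid) / len(valid)


def grid_search(learning_rates, epoch_options):
    """Try every combination and keep the one with the lowest validation error."""
    best = None
    for lr, epochs in product(learning_rates, epoch_options):
        err = validation_error(train_model(lr, epochs))
        if best is None or err < best[0]:
            best = (err, lr, epochs)
    return best


# Hypothetical candidate values; a real search space depends on the model.
best_err, best_lr, best_epochs = grid_search([0.001, 0.01, 0.05], [10, 50])
print(f"best learning rate={best_lr}, epochs={best_epochs}, error={best_err:.3g}")
```

Even this tiny grid requires six training runs, which is why automating the search matters once each run takes hours.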



TensorFlow is an open source software library that uses data-flow graphs for numerical operations. Mathematical operations in these graphs are represented by nodes, while edges represent data transferred from one node to another. Data in TensorFlow is represented as tensors, which are multidimensional arrays. TensorFlow is usually used for deep learning research and practice. TensorFlow is cross-platform. You can run it on GPUs, CPUs, TPUs, and mobile platforms.
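
The node-and-edge idea can be illustrated without TensorFlow itself. The sketch below is a toy data-flow graph in plain Python, not TensorFlow's API: the Node class, the helper functions, and the use of nested lists as rank-2 tensors are all assumptions made for the illustration.

```python
class Node:
    """One operation in the graph; `inputs` are the edges feeding into it."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = list(inputs)
        self.value = value  # only used by constant nodes


def constant(value):
    return Node("const", value=value)


def add(a, b):
    return Node("add", [a, b])


def matmul(a, b):
    return Node("matmul", [a, b])


def evaluate(node):
    """Evaluate a node by first evaluating the nodes that feed into it."""
    if node.op == "const":
        return node.value
    args = [evaluate(n) for n in node.inputs]
    if node.op == "add":
        a, b = args
        return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    if node.op == "matmul":
        a, b = args
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
                for row in a]
    raise ValueError(f"unknown op: {node.op}")


# Rank-2 tensors (matrices) written as nested lists.
x = constant([[1.0, 2.0]])    # shape (1, 2)
w = constant([[3.0], [4.0]])  # shape (2, 1)
b = constant([[0.5]])         # shape (1, 1)

y = add(matmul(x, w), b)      # builds the graph; nothing is computed yet
result = evaluate(y)          # data flows along the edges to produce a value
print(result)
```

Separating graph construction from evaluation is what lets a framework decide where each operation runs, for example on a CPU, GPU, or TPU.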

Conclusion

You can easily get lost in the variety of available data science solutions in the cloud. They differ in terms of algorithms, features, pricing, and programming languages. Moreover, the list of solutions is always changing. There is a good chance that you will choose one vendor and suddenly another one will release a product that is more suited to your business needs. When choosing, first figure out what you want to achieve with machine learning, and then choose the service that can help you accomplish your goals.

