AI and Neural Network Benchmarks: MLPerf

What is MLPerf

MLPerf is a benchmark suite that is used to evaluate training and inference performance of on-premises and cloud platforms. MLPerf is intended as an independent, objective performance yardstick for software frameworks, hardware platforms, and cloud platforms for machine learning.

The goal of MLPerf is to give developers a way to evaluate hardware architectures and the wide range of advancing machine learning frameworks.

MLPerf Training benchmarks

The suite measures the time it takes to train machine learning models to a target level of accuracy.
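As a rough illustration of that measurement, here is a minimal Python sketch of the "time to target quality" idea. The functions train_one_epoch and evaluate, and the target value, are hypothetical stand-ins; the official timing and quality rules are defined per benchmark by MLPerf.

```python
import time

# Minimal sketch of "time to target quality", not the official MLPerf timing rules.
# train_one_epoch() and evaluate() are hypothetical stand-ins for a real training
# loop and validation routine.
TARGET_ACCURACY = 0.759  # illustrative; roughly the ResNet-50 Top-1 target in MLPerf Training

def time_to_target(model, train_one_epoch, evaluate, max_epochs=100):
    start = time.time()
    for epoch in range(max_epochs):
        train_one_epoch(model)
        if evaluate(model) >= TARGET_ACCURACY:
            return time.time() - start  # the reported result: wall-clock time to target
    raise RuntimeError("target accuracy not reached within max_epochs")
```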

MLPerf Inference benchmarks

The suite measures how quickly a trained neural network can perform inference tasks on new data.

The MLPerf Inference benchmark is intended as an objective way to measure inference performance in both the data center and at the edge. Each benchmark has four measurement scenarios: server, offline, single-stream, and multi-stream. [Figure: basic structure of the MLPerf Inference benchmark and the order in which data flows through the system.]

Server and offline scenarios are most relevant for data center use cases, while single-stream and multi-stream scenarios evaluate the workloads of edge devices.
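A minimal sketch of this structure, using the LoadGen Python bindings (mlperf_loadgen, built from the loadgen directory of the mlcommons/inference repository): LoadGen issues queries to the system under test (SUT), which runs inference on samples from a query sample library (QSL) and reports completions back. Binding signatures have varied slightly between releases, and the callback bodies below are placeholders rather than a real model.

```python
import array
import numpy as np
import mlperf_loadgen as lg  # LoadGen Python bindings from the mlcommons/inference repo

def issue_queries(query_samples):
    # LoadGen hands the SUT a batch of QuerySample objects; a real SUT would run
    # inference on each sample. Here a fake output stands in for the model result.
    responses = []
    for qs in query_samples:
        fake_output = np.zeros(1, dtype=np.float32)            # placeholder result
        buf = array.array("B", fake_output.tobytes())
        addr, length = buf.buffer_info()
        responses.append(lg.QuerySampleResponse(qs.id, addr, length * buf.itemsize))
    lg.QuerySamplesComplete(responses)                          # report completions back

def flush_queries():
    pass

def load_samples(sample_indices):    # QSL callback: stage these samples in memory
    pass

def unload_samples(sample_indices):  # QSL callback: release them again
    pass

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline   # or SingleStream / MultiStream / Server
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(1024, 1024, load_samples, unload_samples)
lg.StartTest(sut, qsl, settings)              # LoadGen drives the chosen scenario
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```

With a different scenario setting, LoadGen changes how it issues the queries: for Server it follows a Poisson arrival process, while for SingleStream it waits for each response before sending the next query.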

MLPerf Results Show Advances in Machine Learning Inference Performance and Efficiency | MLCommons

Today, MLCommons®, an open engineering consortium, released new results for three MLPerf™ benchmark suites: Inference v2.0, Mobile v2.0, and Tiny v0.7. These three benchmark suites measure the performance of inference, that is, applying a trained machine learning model to new data. Inference enables adding intelligence to a wide range of applications and systems. Collectively, these benchmark suites scale from ultra-low-power devices that draw just a few microwatts all the way up to the most powerful data center computing platforms. The latest MLPerf results demonstrate wide industry participation, an emphasis on energy efficiency, and up to 3.3X greater performance, ultimately paving the way for more capable intelligent systems to benefit society at large.

The MLPerf Mobile benchmark suite targets smartphones, tablets, notebooks, and other client systems with the latest submissions highlighting an average 2X performance gain over the previous round.

MLPerf Mobile v2.0 includes a new image segmentation model, MOSAIC, that was developed by Google Research with feedback from MLCommons. The MLPerf Mobile application and the corresponding source code, which incorporates the latest updates and submitting vendors’ backends, are expected to be available in the second quarter of 2022.

MLPerf Suite Categories

MLPerf Training v2.0 is the sixth instantiation for training and consists of eight different workloads covering a broad diversity of use cases, including vision, language, recommenders, and reinforcement learning.

MLPerf Inference v2.0 tested seven different use cases across seven different kinds of neural networks. Three of these use cases were for computer vision, one for recommender systems, two for language processing, and one for medical imaging.

Image Classification

MLPerf Training uses ResNet-50 v1.5 with the ImageNet dataset.

MLPerf Inference uses ResNet-50 v1.5 with the ImageNet dataset.
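As a concrete (non-MLPerf) illustration of what the image classification task involves, the snippet below runs torchvision's ImageNet-pretrained ResNet-50 on a single image. The actual MLPerf reference implementation lives in vision/classification_and_detection and additionally drives the model through LoadGen; the image path here is a placeholder.

```python
import torch
from PIL import Image
from torchvision import models, transforms

# Illustrative only: torchvision's ImageNet-pretrained ResNet-50 (torchvision >= 0.13),
# not the MLPerf reference code. "example.jpg" is a placeholder image path.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    logits = model(image)
print("Predicted ImageNet class index:", int(logits.argmax(dim=1)))
```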

Object Detection (Lightweight)

MLPerf Training uses Single-Shot Detector (SSD) on the COCO 2017 dataset.
MLPerf Inference uses SSD-MobileNet-v1 (low-res, 0.09 MP) with the COCO 2017 dataset.

Object Detection (Heavyweight)

MLPerf Training uses Mask R-CNN on the COCO 2014 dataset.
MLPerf Inference uses SSD-ResNet-34 (high-res, 1.44 MP) with the COCO 2017 dataset.

Biomedical Image Segmentation

MLPerf Training uses 3D U-Net with the KiTS19 dataset.

MLPerf Inference uses 3D U-Net with the KiTS19 dataset.

Automatic Speech Recognition (ASR)

MLPerf Training uses RNN-T on the LibriSpeech dataset.

MLPerf Inference uses RNN-T on the LibriSpeech dataset.

Natural Language Processing (NLP)

MLPerf Training uses Bidirectional Encoder Representations from Transformers (BERT) on the Wikipedia 2020/01/01 dataset.
MLPerf Inference uses BERT with the SQuAD v1.1 dataset.

Recommendation

MLPerf Training uses the Deep Learning Recommendation Model (DLRM) on the Criteo 1TB dataset.

MLPerf Inference uses the Deep Learning Recommendation Model (DLRM) on the Criteo 1TB dataset.

Reinforcement Learning

MLPerf Training uses the Minigo benchmark, which is based on Go gameplay.

MLPerf Inference suite

The latest branch for MLPerf Inference is MLPerf Inference v2.0 (submission 02/25/2022)

GitHub - mlcommons/inference: Reference implementations of MLPerf™ inference benchmarks

Use Cases

Use case | Model | Reference app | Framework | Dataset
Image Classification | resnet50-v1.5 | vision/classification_and_detection | tensorflow, pytorch, onnx | imagenet2012
Object Detection (Lightweight) | ssd-mobilenet 300x300 | vision/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 300x300
Object Detection (Heavyweight) | ssd-resnet34 1200x1200 | vision/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 1200x1200
Natural Language Processing (NLP) | bert | language/bert | tensorflow, pytorch, onnx | squad-1.1
Recommendation | dlrm | recommendation/dlrm | pytorch, tensorflow(?), onnx(?) | Criteo Terabyte
Biomedical Image Segmentation | 3d-unet | vision/medical_imaging/3d-unet-kits19 | pytorch, tensorflow, onnx | KiTS19
Automatic Speech Recognition (ASR) | rnnt | speech_recognition/rnnt | pytorch | OpenSLR LibriSpeech Corpus

MLPerf Mobile Inference suite

This benchmark suite measures how fast systems can process inputs and produce results using a trained model. 

GitHub - mlcommons/mobile_app_open: Mobile App Open

GitHub - mlcommons/mobile_models: MLPerf™ Mobile models

GitHub - mlcommons/mobile_datasets: Scripts to create performance-only test sets for MLPerf™ Mobile

GitHub - mlcommons/mobile_results_v2.0

Scenarios and Metrics

In order to enable representative testing of a wide variety of inference platforms and use cases, MLPerf has defined four different scenarios as described below. A given scenario is evaluated by a standard load generator generating inference requests in a particular pattern and measuring a specific metric.

Scenario | Query generation | Duration | Samples/query | Latency constraint | Tail latency | Performance metric
Single stream | LoadGen sends the next query as soon as the SUT completes the previous query | 1024 queries and 60 seconds | 1 | None | 90% | 90th-percentile measured latency
Multiple stream (1.1 and earlier) | LoadGen sends a new query every latency constraint if the SUT has completed the prior query; otherwise the new query is dropped and counted as one overtime query | 270,336 queries and 60 seconds | Variable, see metric | Benchmark specific | 99% | Maximum number of inferences per query supported
Multiple stream (2.0 and later) | LoadGen sends the next query as soon as the SUT completes the previous query | 270,336 queries and 600 seconds | 8 | None | 99% | 99th-percentile measured latency
Server | LoadGen sends new queries to the SUT according to a Poisson distribution | 270,336 queries and 60 seconds | 1 | Benchmark specific | 99% | Maximum Poisson throughput parameter supported
Offline | LoadGen sends all queries to the SUT at the start | 1 query and 60 seconds | At least 24,576 | None | N/A | Measured throughput
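As a worked illustration of how the metrics in the table differ, the snippet below derives a 90th/99th-percentile latency and an Offline-style throughput from a list of per-query latencies. It is purely illustrative (the latencies are synthetic and the throughput assumes back-to-back queries); it is not LoadGen's actual result processing.

```python
import numpy as np

# Synthetic per-query latencies in seconds, standing in for measured values.
latencies_s = np.random.lognormal(mean=-4.0, sigma=0.3, size=10_000)

p90 = np.percentile(latencies_s, 90)   # Single-stream result: 90th-percentile latency
p99 = np.percentile(latencies_s, 99)   # Multi-stream (2.0+) result: 99th-percentile latency
throughput = len(latencies_s) / latencies_s.sum()  # Offline-style samples/second,
                                                   # assuming queries ran back to back

print(f"p90 = {p90 * 1e3:.2f} ms, p99 = {p99 * 1e3:.2f} ms, "
      f"offline throughput = {throughput:.1f} samples/s")
```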

Benchmarks

Each benchmark is defined by a Dataset and Quality Target. The following table summarizes the benchmarks in this version of the suite (the rules remain the official source of truth):

Area | Task | Model | Dataset | Mode | Quality
Vision | Image classification | MobileNetEdgeTPU | ImageNet | Single-stream, Offline | 98% of FP32 (Top-1: 76.19%)
Vision | Object detection | MobileDETs | MS-COCO 2017 | Single-stream | 95% of FP32 (mAP: 0.285)
Vision | Segmentation | DeepLabV3+ (MobileNetV2) | ADE20K (32 classes, 512x512) | Single-stream | 97% of FP32 (32-class mIoU: 54.8)
Vision | Segmentation (MOSAIC) | MOSAIC | ADE20K (32 classes, 512x512) | Single-stream | 96% of FP32 (32-class mIoU: 59.8)
Language | Language processing | MobileBERT | SQuAD 1.1 | Single-stream | 93% of FP32 (F1 score: 90.5)

Each Mobile benchmark requires the single-stream scenario. The image classification benchmark permits an optional Offline scenario.
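The quality column is expressed relative to the FP32 reference score, so the actual pass threshold is the listed fraction multiplied by the FP32 result. A quick calculation of the implied thresholds, using only the numbers from the table above:

```python
# Thresholds implied by the table above: (FP32 reference score, required fraction).
targets = {
    "MobileNetEdgeTPU (Top-1 %)": (76.19, 0.98),
    "MobileDETs (mAP)": (0.285, 0.95),
    "DeepLabV3+ (32-class mIoU)": (54.8, 0.97),
    "MOSAIC (32-class mIoU)": (59.8, 0.96),
    "MobileBERT (SQuAD F1)": (90.5, 0.93),
}
for name, (fp32_score, fraction) in targets.items():
    print(f"{name}: submitted result must reach at least {fp32_score * fraction:.3f}")
```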

Divisions

MLPerf aims to encourage innovation in software as well as hardware by allowing submitters to reimplement the reference implementations. MLPerf has two Divisions that allow different levels of flexibility during reimplementation.

The Closed division is intended to compare hardware platforms or software frameworks “apples-to-apples” and requires using the same model as the reference implementation.

The Open division is intended to foster innovation and allows using a different model or retraining.

MLPerf Mobile Inference v2.0 results

https://mlcommons.org/en/inference-mobile-20/
