Preface
First, look at the performance comparison chart NVIDIA publishes; it shows a genuine qualitative leap in performance.
Since I don't have a professional-grade GPU, I tested with a consumer RTX 2060. The test environment:
- CentOS 7
- CPU: i7-10700
- GPU: RTX 2060 (6 GB VRAM)
- RAM: 16 GB
Environment preparation
- Spark 3+
- NVIDIA GPU driver (Linux)
- CUDA 11.8
- spark-rapids
- TPC-DS
- Miniconda (Python 3.9+)
This article uses NVIDIA's official spark-rapids plugin for the GPU-acceleration tests.
The official requirements:
To enable GPU processing acceleration you will need:
- Apache Spark 3.1+
- A Spark cluster configured with GPUs that comply with the requirements for RAPIDS (GPU models: NVIDIA P100, V100, T4 and A2/A10/A30/A100 GPUs).
- One GPU per executor.
- The RAPIDS Accelerator for Apache Spark plugin jar.
- The config spark.plugins set to com.nvidia.spark.SQLPlugin (a minimal sketch follows this list).
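In practice the last item amounts to a few Spark confs. Here is a minimal PySpark sketch of a plugin-enabled session; the jar filename matches the release used later in this test, and the path is an assumption for your machine:

```python
from pyspark.sql import SparkSession

# Minimal RAPIDS-enabled session; the jar path is an assumption for your setup.
spark = (
    SparkSession.builder
    .appName("rapids-smoke-test")
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
    .config("spark.jars", "rapids-4-spark_2.12-23.04.1.jar")
    # One GPU per executor, as the requirements above state.
    .config("spark.executor.resource.gpu.amount", "1")
    .getOrCreate()
)
```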
Installation
Installing the NVIDIA GPU driver
Check the environment
```bash
sudo yum install pciutils

[root@nebula3 nds]# lspci | grep -i vga
01:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2060 Rev. A] (rev a1)
```
Then download the NVIDIA driver for CentOS from the official site (choose the matching GPU model): https://www.nvidia.cn/geforce/drivers/
Or download the CUDA driver directly (the CUDA toolkit installer bundles a display driver).
Install the required environment
```bash
# 1. Blacklist the Nouveau driver (sudo does not survive shell redirection, so use tee):
echo "blacklist nouveau" | sudo tee -a /etc/modprobe.d/blacklist.conf
# (If nouveau still loads after a reboot, rebuild the initramfs: sudo dracut --force)
# 2. Update the kernel:
sudo yum update
# 3. Install the build dependencies:
sudo yum install gcc kernel-devel kernel-headers
# 4. Stop the X server (service takes the unit name before the action):
sudo service lightdm stop
# 5. Switch to the text-mode runlevel:
sudo init 3
# If the NVIDIA installer fails at this step, reboot and check that nouveau is fully unloaded:
# lsmod | grep nouveau
# 6. Run the installer (the download ends in .run):
sudo sh NVIDIA-Linux-x86_64-*.run
# 7. After installation completes, reboot the system:
sudo reboot
# 8. After boot, verify that the driver installed correctly:
nvidia-smi
```
A complete installation walkthrough is available: "CentOS 7.6 安装 NVIDIA 独立显卡驱动 (完整版)"
Installing cuDF (optional; mainly for Python)
```bash
conda install -c rapidsai -c conda-forge -c nvidia \
    cudf=23.06 python=3.10 cudatoolkit=11.8
```
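To sanity-check the install, a tiny operation run entirely on the GPU is enough; a minimal sketch, assuming the conda environment above is active:

```python
import cudf

# Build a small DataFrame in GPU memory and aggregate it on the GPU.
df = cudf.DataFrame({"key": ["a", "b", "a", "b"], "val": [1, 2, 3, 4]})
print(df.groupby("key").sum())
```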
Test procedure
Reference tutorial
Just follow the official tutorial at https://github.com/NVIDIA/spark-rapids-benchmarks/nds.
The rough steps are:
- Install TPC-DS and generate random data
- Download the spark-rapids jar from https://nvidia.github.io/spark-rapids/docs/archive.html and set the environment variables; for the Spark runtime parameters, see base.template and convert_submit_*.template (the sketch after this list shows both steps)
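For concreteness, the two steps look roughly like this. This is a sketch: the script names and argument order follow my reading of the spark-rapids-benchmarks README, and the scale factor and paths are placeholders, so check the repo for the authoritative usage:

```python
import subprocess

# 1. Generate raw NDS data; arguments (mode, scale factor, parallelism,
#    output dir) are assumptions based on the repo's nds_gen_data.py usage.
subprocess.run(
    ["python", "nds_gen_data.py", "local", "10", "4", "./raw_sf10"],
    check=True,
)

# 2. Transcode the raw data to Parquet through the GPU template; the template
#    file carries the spark-rapids confs shown in the report below.
subprocess.run(
    ["./spark-submit-template", "convert_submit_gpu.template",
     "nds_transcode.py", "./raw_sf10", "./parquet_sf10", "report.txt"],
    check=True,
)
```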
Performance report
CPU (i7-10700)
Spark configuration follows:
```
('spark.driver.extraJavaOptions', '-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED')
('spark.executor.id', 'driver')
('spark.driver.memory', '10G')
('spark.executor.instances', '8')
('spark.driver.port', '42825')
('spark.app.submitTime', '1684138774104')
('spark.app.id', 'local-1684138775340')
('spark.executor.cores', '12')
('spark.executor.memory', '16G')
('spark.rdd.compress', 'True')
('spark.executor.extraJavaOptions', '-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED')
('spark.app.name', 'NDS - transcode - parquet')
('spark.serializer.objectStreamReset', '100')
('spark.sql.warehouse.dir', 'file:/root/spark/test/spark-rapids-benchmarks/nds/spark-warehouse')
('spark.sql.shuffle.partitions', '200')
('spark.master', 'local[*]')
('spark.submit.pyFiles', '')
('spark.submit.deployMode', 'client')
('spark.app.startTime', '1684138774633')
```
GPU (RTX 2060, 6 GB)
Spark configuration follows:
```
('spark.app.startTime', '1684293945663')
('spark.driver.extraJavaOptions', '-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED')
('spark.jars', 'file:///root/spark/test/spark-rapids-benchmarks/nds/rapids-4-spark_2.12-23.04.1.jar')
('spark.plugins.internal.conf.com.nvidia.spark.SQLPlugin.spark.rapids.sql.incompatibleOps.enabled', 'true')
('spark.plugins.internal.conf.com.nvidia.spark.SQLPlugin.spark.rapids.driver.user.timezone', 'Asia/Shanghai')
('spark.driver.port', '37812')
('spark.rapids.driver.user.timezone', 'Asia/Shanghai')
('spark.app.submitTime', '1684293945280')
('spark.executor.resource.gpu.discoveryScript', './getGpusResources.sh')
('spark.executor.cores', '12')
('spark.executor.memory', '16G')
('spark.sql.warehouse.dir', 'file:/root/spark/test/spark-rapids-benchmarks/nds/spark-warehouse')
('spark.plugins.internal.conf.com.nvidia.spark.SQLPlugin.spark.rapids.sql.explain', 'NOT_ON_GPU')
('spark.app.name', 'NDS - transcode - parquet')
('spark.serializer.objectStreamReset', '100')
('spark.files', 'file:///root/spark/spark-3.3.2-bin-hadoop3/examples/src/main/scripts/getGpusResources.sh')
('spark.sql.shuffle.partitions', '200')
('spark.master', 'local[*]')
('spark.submit.deployMode', 'client')
('spark.sql.legacy.parquet.datetimeRebaseModeInWrite', 'CORRECTED')
('spark.sql.extensions', 'com.nvidia.spark.rapids.SQLExecPlugin,com.nvidia.spark.udf.Plugin,com.nvidia.spark.rapids.optimizer.SQLOptimizerPlugin')
('spark.plugins.internal.conf.com.nvidia.spark.SQLPlugin.spark.rapids.sql.variableFloatAgg.enabled', 'true')
('spark.plugins.internal.conf.com.nvidia.spark.SQLPlugin.spark.rapids.sql.concurrentGpuTasks', '2')
('spark.rapids.memory.pinnedPool.size', '8g')
('spark.rapids.sql.incompatibleOps.enabled', 'true')
('spark.executor.id', 'driver')
('spark.app.id', 'local-1684293946417')
('spark.executor.instances', '8')
('spark.driver.memory', '10G')
('spark.app.initial.file.urls', 'file:///root/spark/spark-3.3.2-bin-hadoop3/examples/src/main/scripts/getGpusResources.sh')
('spark.executor.resource.gpu.amount', '1')
('spark.plugins', 'com.nvidia.spark.SQLPlugin')
('spark.rapids.sql.variableFloatAgg.enabled', 'true')
('spark.rapids.sql.concurrentGpuTasks', '2')
('spark.rdd.compress', 'True')
('spark.executor.extraJavaOptions', '-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED')
('spark.submit.pyFiles', '')
('spark.plugins.internal.conf.com.nvidia.spark.SQLPlugin.spark.rapids.memory.pinnedPool.size', '8g')
('spark.plugins.internal.conf.com.nvidia.spark.SQLPlugin.spark.rapids.sql.multiThreadedRead.numThreads', '20')
('spark.rapids.sql.multiThreadedRead.numThreads', '20')
('spark.repl.local.jars', 'file:///root/spark/test/spark-rapids-benchmarks/nds/rapids-4-spark_2.12-23.04.1.jar')
('spark.sql.files.maxPartitionBytes', '2g')
('spark.rapids.sql.explain', 'NOT_ON_GPU')
('spark.app.initial.jar.urls', 'spark://nebula3:37812/jars/rapids-4-spark_2.12-23.04.1.jar')
```
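With spark.rapids.sql.explain set to NOT_ON_GPU, the plugin only logs operators that fall back to the CPU. To eyeball GPU placement directly, printing a physical plan also works; a sketch, assuming the GPU session above and my local parquet path:

```python
# Assumes `spark` is the RAPIDS-enabled session; the path is an assumption.
df = spark.read.parquet("./parquet_sf10/catalog_sales")
df.groupBy("cs_item_sk").count().explain()
# Stages running on the GPU appear as Gpu* operators in the plan,
# e.g. GpuHashAggregate instead of HashAggregate.
```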
Comparison table
All results below are transform durations only; data-read time is not included.

Table | Size (MB) | CPU (s) | GPU (s) |
---|---|---|---|
catalog_sales | 29593.6 | 258 | 168 |
web_sales | 14745.6 | 120 | 96 |
inventory | 8192 | 72 | 45 |
store_returns | 3276.8 | 27 | 35 |
catalog_returns | 2150.4 | 17 | 5 |
web_returns | 1004.7 | 10 | 4 |
customer | 257 | 15 | 0.002 |
customer_address | 106 | 9 | 0.047 |
customer_demographics | 76.9 | 10 | 0.005 |
item | 56.5 | 2 | 0.004 |
date_dim | 10 | 0.8 | 0.004 |
time_dim | 4.9 | 0.5 | 0.003 |
catalog_page | 2.7 | 0.2 | 0.002 |
web_page | 0.1898438 | 0.038 | 0.002 |
household_demographics | 0.1446289 | 0.054 | 0.002 |
promotion | 0.1191406 | 0.046 | 0.002 |
store | 0.1018555 | 0.043 | 0.002 |
call_center | 0.0088867 | 0.038 | 0.003 |
web_site | 0.006543 | 0.03 | 0.003 |
reason | 0.0018682 | 0.023 | 0.004 |
warehouse | 0.0016994 | 0.041 | 0.004 |
ship_mode | 0.0010614 | 0.033 | 0.003 |
income_band | 0.0003128 | 0.024 | 0.003 |
- Chart: results for tables larger than 100 MB
- Chart: results for tables smaller than 100 MB
Summary
As the charts show, even with just 6 GB of VRAM the GPU cuts runtimes by nearly half. But, perhaps because the VRAM is so small, the GPU times gradually close in on the CPU times as data volume grows. At the 48 GB scale the GPU even threw an OOM; I retuned the parameters several times, and no run succeeded without cutting the executor count and partition count, so to keep the tests consistent I dropped the 48 GB experiment.
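For anyone who wants to push past the OOM on a small card anyway, these are the knobs I would lower first. A hedged sketch: the values are illustrative, not settings that were validated in this test:

```python
# Trade throughput for lower GPU memory pressure; values are guesses, not tuned.
spark.conf.set("spark.rapids.sql.concurrentGpuTasks", "1")   # this test used 2
spark.conf.set("spark.sql.files.maxPartitionBytes", "512m")  # this test used 2g
spark.conf.set("spark.sql.shuffle.partitions", "400")        # smaller partitions
```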
Conclusion: GPU-accelerated data analytics genuinely works, but professional-grade cards are priced in a different league from consumer ones; using GPUs to accelerate analytics is a good choice mainly when an enterprise is chasing performance.