openeuler/spark docker image overview


Quick reference


Current Spark docker images are built on openEuler. This repository is free to use and exempted from per-user rate limits.

Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.

Learn more on the Spark website: https://spark.apache.org/.

Supported tags and respective Dockerfile links

The tag of each Spark docker image consists of the Spark version and the version of the openEuler base image. For example, the tag 3.4.0-22.03-lts denotes Spark 3.4.0 on openEuler 22.03-LTS. The details are as follows:

Tags             Currently                           Architectures
3.3.1-22.03-lts  spark 3.3.1 on openEuler 22.03-LTS  amd64, arm64
3.3.2-22.03-lts  spark 3.3.2 on openEuler 22.03-LTS  amd64, arm64
3.4.0-22.03-lts  spark 3.4.0 on openEuler 22.03-LTS  amd64, arm64

Usage

In the examples below, replace {Tag} with the tag from the table above that matches your requirements.

  • Online Documentation
    You can find the latest Spark documentation, including a programming guide, on the project web page. This README file only contains basic setup instructions.

  • Pull the openeuler/spark image from Docker Hub

    docker pull openeuler/spark:{Tag}
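
    For example, to pull the Spark 3.4.0 image using one of the tags listed in the table above:

    docker pull openeuler/spark:3.4.0-22.03-lts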
    
  • Interactive Scala Shell
    The easiest way to start using Spark is through the Scala shell:

    docker run -it --name spark openeuler/spark:{Tag} /opt/spark/bin/spark-shell
    

    Try the following command, which should return 1,000,000,000:

    scala> spark.range(1000 * 1000 * 1000).count()
    

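    Beyond the count, a short transformation can be tried in the same session. This is a minimal sketch that relies only on the spark session the shell already provides; the derived column name is illustrative:

    scala> // Build a 5-row dataset and derive a doubled column from the id column
    scala> spark.range(5).selectExpr("id", "id * 2 AS doubled").show()
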

  • Interactive Python Shell
    The easiest way to start using PySpark is through the Python shell:

    docker run -it --name spark openeuler/spark:{Tag} /opt/spark/bin/pyspark
    

    And run the following command, which should also return 1,000,000,000:

    >>> spark.range(1000 * 1000 * 1000).count()
    

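    A similar quick check works in PySpark. This is a minimal sketch that relies only on the spark session the shell already provides:

    >>> # Count the even numbers in 0..99; the result should be 50
    >>> spark.range(100).filter("id % 2 = 0").count()
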

  • Running Spark on Kubernetes
    See the official guide: https://spark.apache.org/docs/latest/running-on-kubernetes.html. A minimal submission sketch follows.
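
    As a rough illustration, a cluster-mode submission to a Kubernetes API server could look like the sketch below. The API server address, executor count, and example jar path are placeholders and assumptions, not values confirmed for this image; consult the guide above for the authoritative options:

    /opt/spark/bin/spark-submit \
      --master k8s://https://<k8s-apiserver-host>:<port> \
      --deploy-mode cluster \
      --name spark-pi \
      --class org.apache.spark.examples.SparkPi \
      --conf spark.executor.instances=2 \
      --conf spark.kubernetes.container.image=openeuler/spark:{Tag} \
      local:///opt/spark/examples/jars/spark-examples_2.12-3.4.0.jar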

  • Configuration and environment variables
    See more at https://github.com/apache/spark-docker/blob/master/OVERVIEW.md#environment-variable. A sketch of passing a variable follows.
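
    For instance, standard Spark environment variables such as SPARK_CONF_DIR can be passed with docker run -e. The mount path below is hypothetical, and whether a particular variable is honored by this image should be verified against the document above:

    docker run -it \
      -v $(pwd)/my-spark-conf:/opt/spark/conf-custom \
      -e SPARK_CONF_DIR=/opt/spark/conf-custom \
      openeuler/spark:{Tag} /opt/spark/bin/spark-shell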

Questions and answers

If you have any questions or need additional features, please submit an issue or a pull request to the openeuler-docker-images repository.
