1. Apache Spark Components
- Spark SQL, DataFrames, and Datasets
A module for working with structured data.
- MLlib
A scalable machine learning library.
- Structured Streaming
A stream processing engine built on the Spark SQL engine that makes it easy to build scalable, fault-tolerant streaming applications.
- GraphX (legacy)
GraphX is Apache Spark’s library for graphs and graph-parallel computation. However, for graph analytics, GraphFrames is recommended over GraphX, which is less actively developed and lacks Python bindings. GraphFrames is an open-source, general-purpose graph processing library that is similar to GraphX but uses DataFrame-based APIs.
2. Spark Versus PySpark Versus Spark SQL
3. AWS EMR, Azure Databricks, GCP Dataproc
4. PySpark Addresses Challenges of Data Science