References:
1. https://github.com/mahmoudparsian/pyspark-tutorial
2. https://github.com/XD-DENG/Spark-practice
Download, Install Spark and Run PySpark
Basics of PySpark
PySpark Examples and Tutorials
- DNA Base Counting
- Classic Word Count
- Find Frequency of Bigrams
- Join of Two Relations R(K, V1), S(K, V2)
- Basic Mapping of RDD Elements
- How to add all RDD elements together
- How to multiply all RDD elements together
- Find Top-N and Bottom-N
- Find average by using combineByKey()
- How to filter RDD elements
- How to find average
- Cartesian Product: rdd1.cartesian(rdd2)
- Sort By Key: sortByKey() ascending/descending
- How to Add Indices
- Map Partitions: mapPartitions() by Examples
How to Minimize the Verbosity of Spark
PySpark Tutorial and References...
- Getting started with PySpark - Part 1
- Getting started with PySpark - Part 2
- A really really fast introduction to PySpark
- PySpark
- Basic Big Data Manipulation with PySpark
- Working in Pyspark: Basics of Working with Data and RDDs