Spark is very popular right now, so I want to learn it. Not only does it process data faster than Hadoop, it also comes with libraries such as MLlib and GraphX. Spark also supports Python, so I can use PySpark to study it.
Today I tried reading a text file with PySpark. You can also refer to https://spark.apache.org/docs/latest/sql-programming-guide.html#creating-dataframes, which is the Spark page I studied from.
1. Reading a Text File with PySpark
Step 1. I created a new text file called people01.txt in /home/cindy/file. The file content is as follows:
Step 2. See the details in the picture below. The code comes from https://spark.apache.org/docs/latest/sql-programming-guide.html#creating-dataframes, which you can refer to:
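Since the picture may not be visible, here is a minimal sketch of how this step can look. It assumes each line of people01.txt has the form "name, age" (the real file content is not shown here), and the helper names parse_person and read_people are my own, not from the Spark page.

```python
def parse_person(line):
    """Split one 'name, age' line into a (name, age) tuple.

    The 'name, age' layout is an assumption about people01.txt.
    """
    name, age = line.split(",")
    return name.strip(), int(age)


def read_people(spark, path):
    """Load the text file at `path` into a DataFrame with name and age columns.

    `spark` is an existing SparkSession, such as the one the pyspark shell provides.
    """
    lines = spark.sparkContext.textFile(path)    # RDD with one string per line
    people = lines.map(parse_person)             # RDD of (name, age) tuples
    return spark.createDataFrame(people, ["name", "age"])
```

In the pyspark shell, where spark is already defined, calling read_people(spark, "/home/cindy/file/people01.txt").show() should print the rows as a table.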
2. Reading a JSON File with PySpark
Reading a JSON file is easier than reading a text file in Spark.
Step 1. Create a JSON file as follows:
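As a rough sketch, the file can also be produced from Python. By default spark.read.json expects one JSON object per line, so the helper below writes the records that way. The records and the helper name write_json_lines are hypothetical examples, not the content of my original file.

```python
import json

# Hypothetical sample records; the original file's content is not shown here.
people = [
    {"name": "Alice", "age": 15},
    {"name": "Bob", "age": 20},
]


def write_json_lines(records, path):
    """Write one JSON object per line, the layout spark.read.json reads by default."""
    with open(path, "w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
```

For example, write_json_lines(people, "people01.json") creates a two-line file, where people01.json is only an assumed file name.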
Step 2. Read the JSON file and print the result:
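A minimal sketch of this step, assuming the file holds one JSON object per line. The function name show_json is my own. spark.read.json infers the schema from the data, so no parsing code is needed.

```python
def show_json(spark, path):
    """Read a JSON file into a DataFrame, print its schema and rows, and return it.

    `spark` is an existing SparkSession, such as the one the pyspark shell provides.
    """
    df = spark.read.json(path)   # schema is inferred from the JSON keys
    df.printSchema()             # column names and inferred types
    df.show()                    # the rows as a text table
    return df
```

In the pyspark shell this can be called as show_json(spark, path), with path pointing at the JSON file from Step 1.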
By the way, we can also type >>> data = [('Alice', 15), ('Bob', 20)] and pass it to spark.createDataFrame to create a simple DataFrame to study.
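Here is a small sketch of that idea. The column names name and age and the function name make_people_df are my own assumptions. Without column names, Spark falls back to default names such as _1 and _2.

```python
data = [("Alice", 15), ("Bob", 20)]
columns = ["name", "age"]   # assumed column names, chosen for illustration


def make_people_df(spark):
    """Build a small DataFrame from the in-memory list of tuples.

    `spark` is an existing SparkSession, such as the one the pyspark shell provides.
    """
    return spark.createDataFrame(data, columns)
```

In the pyspark shell, make_people_df(spark).show() should print the two rows.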