Spark SQL provides three main capabilities.
- It can load data from a variety of structured sources (e.g., JSON, Hive, and Parquet).
- It lets you query the data using SQL, both inside a Spark program and from external tools that connect to Spark SQL through standard database connectors (JDBC/ODBC), such as business intelligence tools like Tableau.
- When used within a Spark program, Spark SQL provides rich integration between SQL and regular Python/Java/Scala code, including the ability to join RDDs and SQL tables, expose custom functions in SQL, and more.
Loading and saving data
Apache Hive