I'm trying to build and run a Scala/Spark project in IntelliJ IDEA.
I have added org.apache.spark:spark-sql_2.11:2.0.0 in global libraries and my build.sbt looks like below.
name := "test"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.0.0"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.0.0"
I still get an error that says
unknown artifact. unable to resolve or indexed
under spark-sql.
When tried to build the project the error was
Error:(19, 26) not found: type sqlContext, val sqlContext = new sqlContext(sc)
I have no idea what the problem could be. How to create a Spark/Scala project in IntelliJ IDEA?
Update:
Following the suggestions I updated the code to use Spark Session, but it still unable to read a csv file. What am I doing wrong here? Thank you!
val spark = SparkSession
.builder()
.appName("Spark example")
.config("spark.some.config.option", "some value")
.getOrCreate()
import spark.implicits._
val testdf = spark.read.csv("/Users/H/Desktop/S_CR_IP_H.dat")
testdf.show() //it doesn't show anything
//pdf.select("DATE_KEY").show()
解决方案
sql should upper case letters as below
val sqlContext = new SQLContext(sc)
SQLContext is deprecated for newer versions of spark so I would suggest you to use SparkSession
val spark = SparkSession.builder().appName("testings").getOrCreate
val sqlContext = spark.sqlContext
If you want to set the master through your code instead of from spark-submit command then you can set .master as well (you can set configs too)
val spark = SparkSession.builder().appName("testings").master("local").config("configuration key", "configuration value").getOrCreate
val sqlContext = spark.sqlContext
Update
Looking at your sample data
DATE|PID|TYPE
8/03/2017|10199786|O
and testing your code
val testdf = spark.read.csv("/Users/H/Desktop/S_CR_IP_H.dat")
testdf.show()
I had output as
+--------------------+
| _c0|
+--------------------+
| DATE|PID|TYPE|
|8/03/2017|10199786|O|
+--------------------+
Now adding .option for delimiter and header as
val testdf2 = spark.read.option("delimiter", "|").option("header", true).csv("/Users/H/Desktop/S_CR_IP_H.dat")
testdf2.show()
Output was
+---------+--------+----+
| DATE| PID|TYPE|
+---------+--------+----+
|8/03/2017|10199786| O|
+---------+--------+----+
Note: I have used .master("local") for SparkSession object