Scala and Spark dependencies: how to create a Spark/Scala project in IntelliJ IDEA (unable to resolve dependencies in build.sbt)?

I'm trying to build and run a Scala/Spark project in IntelliJ IDEA.

I have added org.apache.spark:spark-sql_2.11:2.0.0 to the Global Libraries, and my build.sbt looks like this:

name := "test"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.0.0"

libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.0.0"

I still get an error that says

unknown artifact. unable to resolve or indexed

under spark-sql.

When I tried to build the project, the error was

Error:(19, 26) not found: type sqlContext
val sqlContext = new sqlContext(sc)

I have no idea what the problem could be. How do I create a Spark/Scala project in IntelliJ IDEA?

Update:

Following the suggestions, I updated the code to use SparkSession, but it is still unable to read a CSV file. What am I doing wrong here? Thank you!

val spark = SparkSession
  .builder()
  .appName("Spark example")
  .config("spark.some.config.option", "some value")
  .getOrCreate()

import spark.implicits._

val testdf = spark.read.csv("/Users/H/Desktop/S_CR_IP_H.dat")
testdf.show() // it doesn't show anything
//pdf.select("DATE_KEY").show()

Solution

sqlContext should use upper-case letters (SQLContext), as below:

val sqlContext = new SQLContext(sc)
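For this line to compile you also need the SQLContext import; a minimal sketch, assuming sc is an existing SparkContext as in the question:

import org.apache.spark.sql.SQLContext

// sc is assumed to be an already-created SparkContext
val sqlContext = new SQLContext(sc)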

SQLContext is deprecated in newer versions of Spark, so I would suggest you use SparkSession instead:

val spark = SparkSession.builder().appName("testings").getOrCreate

val sqlContext = spark.sqlContext

If you want to set the master through your code instead of from the spark-submit command, then you can set .master as well (and other configs too):

val spark = SparkSession.builder().appName("testings").master("local").config("configuration key", "configuration value").getOrCreate

val sqlContext = spark.sqlContext
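Pulling this together, a self-contained sketch (the import and the final stop() are my additions; the config key and value are placeholders, as above):

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("testings")
  .master("local")   // run locally; drop this when submitting to a cluster via spark-submit
  .getOrCreate()

val sqlContext = spark.sqlContext   // only needed for APIs that still expect a SQLContext

// ... read data, run queries ...

spark.stop()   // release resources when the job is done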

Update

Looking at your sample data:

DATE|PID|TYPE
8/03/2017|10199786|O

and testing your code

val testdf = spark.read.csv("/Users/H/Desktop/S_CR_IP_H.dat")

testdf.show()

I had the following output:

+--------------------+
|                 _c0|
+--------------------+
|       DATE|PID|TYPE|
|8/03/2017|10199786|O|
+--------------------+

Now, adding .option for the delimiter and the header:

val testdf2 = spark.read.option("delimiter", "|").option("header", true).csv("/Users/H/Desktop/S_CR_IP_H.dat")

testdf2.show()

The output was:

+---------+--------+----+
|     DATE|     PID|TYPE|
+---------+--------+----+
|8/03/2017|10199786|   O|
+---------+--------+----+
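If column types matter (for example, treating PID as a number), the CSV reader can also be asked to guess the types; a small, optional variation on the snippet above using the inferSchema option:

val testdf3 = spark.read
  .option("delimiter", "|")
  .option("header", true)
  .option("inferSchema", true)   // infer column types instead of reading everything as strings
  .csv("/Users/H/Desktop/S_CR_IP_H.dat")

testdf3.printSchema()   // PID should now be inferred as a numeric type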

Note: I have used .master("local") for the SparkSession object.
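As for the "unknown artifact. unable to resolve or indexed" warning in build.sbt: the coordinates in the question look valid, so that message often just means the IDE's index hasn't caught up yet, and re-importing/refreshing the sbt project in IntelliJ usually clears it. Independently of that, a common style is to use %% so that sbt appends the _2.11 suffix from scalaVersion itself; a sketch with the same versions as in the question:

name := "test"

version := "1.0"

scalaVersion := "2.11.8"

// %% appends the Scala binary version (_2.11) automatically
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.0",
  "org.apache.spark" %% "spark-sql"  % "2.0.0"
)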
