Spark
jerrfy_w
Reading Elasticsearch with PySpark

Approach 1: SQLContext

```python
def readEs():
    conf = SparkConf().setAppName("es").setMaster("local[2]")
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)
    df = sqlContext.read.format("org.elasticsearch.spark.sql") \
        .option("es.nodes.wan.only"…
```

(Original, 2020-10-14 11:54:55)
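A complete version of the truncated snippet might look like the sketch below. It assumes the elasticsearch-spark (es-hadoop) connector jar is on the classpath and an Elasticsearch node is reachable at localhost:9200; the index name is a placeholder.

```python
# Hedged sketch of reading an ES index via the es-hadoop connector.
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

def read_es(index="my_index"):  # "my_index" is a hypothetical index name
    conf = SparkConf().setAppName("es").setMaster("local[2]")
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)
    df = sqlContext.read.format("org.elasticsearch.spark.sql") \
        .option("es.nodes.wan.only", "true") \
        .option("es.nodes", "localhost") \
        .option("es.port", "9200") \
        .load(index)
    return df
```

`es.nodes.wan.only=true` tells the connector to talk only to the declared nodes instead of discovering the cluster, which is useful when Spark runs outside the ES network.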
SparkStreaming Demo

Configuration. The dependency, added the way the official docs suggest:

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.12</artifactId>
    <version>2.4.6</version>
    <scope>provided</scope>
</dependency>
```

Running locally from IDEA…

(Original, 2020-10-14 11:47:58)
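With that dependency in place, a minimal local streaming job might look like this sketch. It assumes a text source on localhost:9999 (e.g. `nc -lk 9999`); the class and batch interval are illustrative choices, not from the original post.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    // local[2]: one thread for the receiver, one for processing
    val conf = new SparkConf().setAppName("StreamingWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))  // 5-second micro-batches

    val lines  = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note `master` must allocate at least two threads locally, because the socket receiver permanently occupies one.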
SparkSQL (3)

http://spark.apache.org/docs/latest/sql-getting-started.html

Converting an RDD to a DataFrame/DataSet, essential when processing text formats.

Approach 1: reflection

```scala
// RDD to DataFrame
import org.apache.spark.sql.SparkSession

object DataFrameRDDAPP {
  def main(args: Array[String]): Unit = {
    val spark = S…
```

(Original, 2020-08-28 16:29:09)
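A full version of the reflection approach might look like the sketch below. The `Person` case class, the input path, and the comma-separated file layout are assumptions; the case class's fields are what reflection turns into the DataFrame schema.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type; its field names become the DataFrame columns.
case class Person(name: String, age: Int)

object DataFrameRDDAPP {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DataFrameRDDAPP").master("local[2]").getOrCreate()
    import spark.implicits._  // enables rdd.toDF / rdd.toDS

    val rdd = spark.sparkContext
      .textFile("file:///path/to/people.txt")   // assumed input: "name,age" per line
      .map(_.split(","))
      .map(a => Person(a(0), a(1).trim.toInt))

    val df = rdd.toDF()   // schema inferred from the case class via reflection
    df.printSchema()
    df.show()

    spark.stop()
  }
}
```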
SparkSQL (2)

Spark Core's programming model is the RDD; SparkSQL's programming model is the DataFrame/DataSet, and its programming entry point is SparkSession.

Three ways to write select:

```scala
df.select("column1", "column2")
df.select(df("column1"), df("column2"))
import spark.implicits._
val frame = df.select($"column1", $"column2")
```

Three ways to write filter: for a numeric value, write the number directly,…

(Original, 2020-08-28 16:27:57)
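The three filter styles mirror the three select styles. A runnable sketch, with an assumed two-column DataFrame built in-line:

```scala
import org.apache.spark.sql.SparkSession

object FilterStyles {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("FilterStyles").master("local[2]").getOrCreate()
    import spark.implicits._

    // hypothetical sample data
    val df = Seq(("alice", 30), ("bob", 17)).toDF("name", "age")

    df.filter("age > 18").show()       // 1: SQL-expression string
    df.filter(df("age") > 18).show()   // 2: Column selected from the DataFrame
    df.filter($"age" > 18).show()      // 3: $-interpolator (needs spark.implicits._)

    spark.stop()
  }
}
```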
SparkSQL (1)

http://spark.apache.org/docs/latest/sql-getting-started.html

```
scala> val df = spark.read.json("file:///home/wzj/app/spark/examples/src/main/resources/people.json")
df: org.apache.spark.sql.DataFrame = [age: bigint, name: string]

scal…
```

(Original, 2020-08-28 16:27:09)
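In that spark-shell session, `spark` is the prebuilt SparkSession. Typical follow-up operations on the loaded DataFrame might look like this sketch (column names taken from the schema shown above):

```scala
df.printSchema()                  // age: bigint, name: string
df.show()
df.select("name").show()
df.filter(df("age") > 20).show()

// Register as a temp view to query it with SQL
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 20").show()
```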
spark-on-yarn jar optimization

The spark-on-yarn jar problem: during spark-submit, Spark's jars are uploaded to HDFS under /user/hadoop/.sparkStaging and deleted once the job finishes. This upload step is actually fairly time-consuming:

```
WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
```

spark.yarn.jars and spar…

(Original, 2020-06-18 23:30:18)
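The usual fix is to upload Spark's jars to HDFS once and point `spark.yarn.jars` (or an archive via `spark.yarn.archive`) at them, so each spark-submit skips the per-job upload. A sketch, with assumed HDFS paths and namenode address:

```
# Upload Spark's jars to HDFS once (paths are assumptions)
hdfs dfs -mkdir -p /spark/jars
hdfs dfs -put $SPARK_HOME/jars/* /spark/jars/

# Then in $SPARK_HOME/conf/spark-defaults.conf:
spark.yarn.jars  hdfs://namenode:8020/spark/jars/*
```

After this, the WARN above disappears and the .sparkStaging upload shrinks to just the application jar.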
Spark Core

The five defining properties of an RDD:

- A list of partitions: a set of partitions
- A function for computing each split: a computation applied to every split
- A list of dependencies on other RDDs
- Optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned)
- Optionally, a list of preferred locations to compute each split on (e.g. block locations for an HDFS file)

(Original, 2020-06-18 23:24:41)
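The first four properties can be observed directly on a concrete RDD. A sketch (the data and partition count are arbitrary):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddProperties {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("RddProperties").setMaster("local[2]"))

    val rdd = sc.parallelize(1 to 100, 4)  // explicitly request 4 partitions
    println(rdd.getNumPartitions)          // property 1: a list of partitions -> 4

    val mapped = rdd.map(_ * 2)            // property 2: a function computed per split
    println(mapped.dependencies)           // property 3: dependency on the parent RDD

    val pairs = mapped.map(x => (x % 10, x)).reduceByKey(_ + _)
    println(pairs.partitioner)             // property 4: a HashPartitioner for key-value RDDs

    sc.stop()
  }
}
```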