Connecting to Spark from Java and submitting data

The snippet below uses the Spark 1.x DataFrame API: it builds a SparkConf, creates a SQLContext over a JavaSparkContext, reads each Oracle table over JDBC, and writes the selected columns to HDFS as Parquet.

```java
import java.util.List;
import java.util.Properties;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

SparkConf conf = new SparkConf()
        .setAppName("SparkExtractData")
        .setMaster(sparkConfig.getMasterUrl())
        .set("spark.executor.memory", sparkConfig.getExecutorMemory())
        .set("spark.driver.host", sparkConfig.getDriverHost())   // hostname of the driver
        .set("spark.driver.port", sparkConfig.getDriverPort())   // port the driver listens on
        .set("spark.cores.max", sparkConfig.getMaxCores())
        .set("spark.executor.cores", sparkConfig.getExecutorCores())
        .set("spark.sql.parquet.writeLegacyFormat", "true");

// On Spark 2.x a SparkSession could be used instead:
// SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
JavaSparkContext javaSparkContext = new JavaSparkContext(conf);
SparkContext sparkContext = JavaSparkContext.toSparkContext(javaSparkContext);
SQLContext spark = SQLContext.getOrCreate(sparkContext);

// JDBC connection properties for the Oracle source
Properties conp = new Properties();
conp.setProperty("user", druidConfig.getUsername());
conp.setProperty("password", druidConfig.getPassword());
conp.setProperty("driver", "oracle.jdbc.driver.OracleDriver");

String[] table = tablename.split(",");
String[] path = filepath.split(",");
for (int i = 0; i < Math.min(table.length, path.length); i++) { // guard against unequal list lengths
    if (StringUtil.isNotEmpty(table[i]) && StringUtil.isNotEmpty(path[i])) {
        DataFrame jdbcDF = spark.read().jdbc(druidConfig.getDbUrl(), table[i].toUpperCase(), conp);
        List<Column> colList = getColumns(table[i]);
        Column[] cols = colList.toArray(new Column[colList.size()]);
        DataFrame m = jdbcDF.select(cols);

        m.show();
        m.printSchema();
        System.out.println("Starting to write data...");
        try {
            m.write().mode("overwrite").parquet(sparkConfig.getHdfs() + sparkConfig.getHiveDb() + "/" + path[i]);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
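The `getColumns` helper is not shown in the original post; it presumably maps a table name to the list of columns to extract. A minimal sketch, assuming the column names come from some per-table configuration (the `columnConfig.getColumnNames` lookup below is hypothetical, not part of the original code):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.spark.sql.Column;
import static org.apache.spark.sql.functions.col;

// Hypothetical helper: builds Column objects for the configured column names of a table.
// columnConfig.getColumnNames(...) is an assumed lookup, not shown in the original post.
private List<Column> getColumns(String tableName) {
    List<Column> columns = new ArrayList<>();
    for (String name : columnConfig.getColumnNames(tableName.toUpperCase())) {
        columns.add(col(name));
    }
    return columns;
}
```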

For a large table, the read can be parallelized by exposing Oracle's ROWNUM as a synthetic partition column and letting Spark split the query into ranges:

```java
// Total row count of TD_AD_DATA_CHARGE_DETAIL, used as the partitioning upper bound
String ct = "58377173";
// Wrap the table in a subquery that exposes ROWNUM as a partition column
String sql_str = "(select a.*, ROWNUM rownum__rn from TD_AD_DATA_CHARGE_DETAIL a) b";
DataFrame m = spark.read().format("jdbc")
        .option("driver", "oracle.jdbc.driver.OracleDriver").option("url", druidConfig.getDbUrl())
        .option("user", druidConfig.getUsername()).option("password", druidConfig.getPassword())
        .option("dbtable", sql_str).option("fetchsize", "100000")
        .option("partitionColumn", "rownum__rn").option("lowerBound", "0").option("upperBound", ct)
        .option("numPartitions", "10").load()
        .drop("rownum__rn"); // the helper column is not needed in the output
m.show();
m.printSchema();
m.repartition(100).write().mode("overwrite").parquet(sparkConfig.getHdfs() + sparkConfig.getHiveDb() + "/charge");

sparkContext.stop();
```
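Hard-coding the row count in `ct` goes stale as the table grows. A minimal sketch of computing the upper bound with a plain JDBC count query before the Spark read, assuming the same `druidConfig` accessors as above (the count query itself is an addition, not part of the original post):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Query the current row count so the partition upper bound tracks the table size.
// (Assumes the enclosing method declares throws SQLException.)
long upperBound;
try (Connection conn = DriverManager.getConnection(
             druidConfig.getDbUrl(), druidConfig.getUsername(), druidConfig.getPassword());
     Statement stmt = conn.createStatement();
     ResultSet rs = stmt.executeQuery("select count(*) from TD_AD_DATA_CHARGE_DETAIL")) {
    rs.next();
    upperBound = rs.getLong(1);
}
String ct = String.valueOf(upperBound);
```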

The steps for submitting a task from Java with SparkLauncher are as follows:

1. Add the required dependencies

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.4.7</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-launcher_2.11</artifactId>
    <version>2.4.7</version>
</dependency>
```

2. Create a SparkLauncher instance

```java
SparkLauncher launcher = new SparkLauncher()
    .setAppName("MyApp")
    .setMaster("local")
    .setSparkHome("/path/to/spark/home")
    .setAppResource("/path/to/my/app.jar")
    .setMainClass("com.mycompany.MyApp")
    .addAppArgs("arg1", "arg2")
    .setConf(SparkLauncher.DRIVER_MEMORY, "2g");
```

3. Launch the task

```java
Process process = launcher.launch();
```

4. Monitor the task's status

```java
InputStream stdout = process.getInputStream();
InputStream stderr = process.getErrorStream();

// Read stdout and stderr on separate threads to avoid blocking the main thread
new Thread() {
    public void run() {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(stdout))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}.start();

new Thread() {
    public void run() {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(stderr))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.err.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}.start();

// Wait for the task to finish and read its exit code
int exitCode = process.waitFor();
System.out.println("Task completed with exit code: " + exitCode);
```

Here, stdout and stderr carry the task's standard output and standard error. Each is read on its own thread so the main thread is never blocked; waitFor blocks until the task finishes and returns its exit code.
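Besides the raw `Process` handle, spark-launcher also offers `startApplication`, which reports lifecycle transitions through a `SparkAppHandle` instead of requiring you to parse process output. A minimal sketch, reusing the illustrative paths and class names from step 2:

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

// startApplication throws IOException; assumed to be handled by the caller here.
SparkAppHandle handle = new SparkLauncher()
        .setAppName("MyApp")
        .setMaster("local")
        .setSparkHome("/path/to/spark/home")
        .setAppResource("/path/to/my/app.jar")
        .setMainClass("com.mycompany.MyApp")
        .startApplication(new SparkAppHandle.Listener() {
            @Override
            public void stateChanged(SparkAppHandle handle) {
                // Called on every lifecycle transition (SUBMITTED, RUNNING, FINISHED, ...)
                System.out.println("State changed: " + handle.getState());
            }

            @Override
            public void infoChanged(SparkAppHandle handle) {
                System.out.println("App id: " + handle.getAppId());
            }
        });
```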
