- Blog (27)
Original: Failed to send data to Kafka
The producer has been rejected from the broker because it tried to use an old epoch with the transactionalId
2024-04-01 11:08:06 303 2
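This error means the broker fenced the producer: a newer producer instance registered the same transactional.id, which bumped the epoch. A minimal sketch of the transactional-producer lifecycle, with placeholder broker, topic, and id; a fenced producer must be closed and recreated, not retried:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.errors.ProducerFencedException

object TxProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker:9092") // placeholder
    props.put("transactional.id", "my-tx-id")     // must be unique per live producer instance
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    producer.initTransactions() // bumps the epoch and fences older producers with the same id
    try {
      producer.beginTransaction()
      producer.send(new ProducerRecord("my-topic", "k", "v"))
      producer.commitTransaction()
    } catch {
      case _: ProducerFencedException =>
        // another instance took over this transactional.id; this one must be closed
        producer.close()
    }
  }
}
```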
Original: Java HBase API + Kerberos, code for Windows and Linux
Java HBase API + Kerberos, with client code for both Windows and Linux; a login sketch follows this entry.
2022-07-11 11:27:45 311
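A minimal sketch of the usual keytab-login pattern, assuming placeholder principal, keytab, and krb5.conf paths; on Windows only the file paths change, the API calls stay the same:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.security.UserGroupInformation

object KerberosHBaseSketch {
  def main(args: Array[String]): Unit = {
    // point the JVM at the right krb5.conf (this path differs between Windows and Linux)
    System.setProperty("java.security.krb5.conf", "/etc/krb5.conf") // placeholder

    val conf = HBaseConfiguration.create()
    conf.set("hadoop.security.authentication", "kerberos")
    conf.set("hbase.security.authentication", "kerberos")

    // log in from the keytab before opening any connection
    UserGroupInformation.setConfiguration(conf)
    UserGroupInformation.loginUserFromKeytab("user@EXAMPLE.COM", "/path/user.keytab") // placeholders

    val connection = ConnectionFactory.createConnection(conf)
    try println(connection.getAdmin.listTableNames().mkString(", "))
    finally connection.close()
  }
}
```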
Original: Spark Structured Streaming + Kerberos, a journey through the pitfalls
Spark Structured Streaming + Kafka + Kerberos pitfalls; the relevant Kafka source options are sketched after this entry.
2022-07-06 18:16:54 465
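A sketch of the Kafka source options that usually matter in this setup, assuming SASL_PLAINTEXT and placeholder broker, topic, keytab, and principal; options prefixed with "kafka." are handed straight to the underlying Kafka consumer:

```scala
import org.apache.spark.sql.SparkSession

object KafkaKerberosSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafkaKerberosSketch").getOrCreate()

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder
      .option("subscribe", "my-topic")                  // placeholder
      // passed through to the Kafka consumer:
      .option("kafka.security.protocol", "SASL_PLAINTEXT")
      .option("kafka.sasl.kerberos.service.name", "kafka")
      .option("kafka.sasl.jaas.config",
        "com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true " +
        "keyTab=\"/path/user.keytab\" principal=\"user@EXAMPLE.COM\";")
      .load()

    df.printSchema()
  }
}
```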
Original: Fixing an HBase table that can neither be dropped nor recreated
hadoop fs -mv /hbase/<table_name> /tmp, or hadoop fs -rm -r /hbase/.tmp/data/dc_sma/<table_name>; either way, go look under this .tmp directory.
2022-03-31 14:28:06 1932
Original: Getting the Kafka offsets for a specified time
public static void main(String[] args) throws ParseException {
    String topicName = args[0];
    String timeStamp = args[1]; // yyyy-MM-dd HH:mm:ss
    String kafkaNode = args[2]; // 10.1.*.186:9092,10.1.*.187:9092,10.1.*.188:9092
    ...
2021-07-12 14:10:46 1670
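The excerpt is cut off; the standard way to turn a wall-clock time into offsets is KafkaConsumer.offsetsForTimes. A minimal sketch with placeholder broker and topic (the original post is Java, but the pattern is identical):

```scala
import java.text.SimpleDateFormat
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import scala.collection.JavaConverters._

object OffsetsForTimeSketch {
  def main(args: Array[String]): Unit = {
    val millis = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
      .parse("2021-07-12 00:00:00").getTime // placeholder timestamp

    val props = new Properties()
    props.put("bootstrap.servers", "broker:9092") // placeholder
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    try {
      // one query entry per partition of the topic
      val query = consumer.partitionsFor("my-topic").asScala
        .map(p => new TopicPartition(p.topic, p.partition) -> java.lang.Long.valueOf(millis))
        .toMap.asJava
      // returns the earliest offset whose timestamp is >= the requested time (null if none)
      consumer.offsetsForTimes(query).asScala.foreach {
        case (tp, oat) if oat != null => println(s"$tp -> ${oat.offset}")
        case (tp, _)                  => println(s"$tp -> no offset at/after that time")
      }
    } finally consumer.close()
  }
}
```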
Original: Syncing Oracle to HBase (Spark SQL + bulkLoad)
This was actually a real-time project consuming Kafka, but the Oracle data first had to be synced into HBase. Earlier approaches: 1. Sqoop straight into HBase, extremely slow; 2. Sqoop into Hive, build an HBase mapping table in Hive, then sync into HBase with Spark SQL, also slow. The current code (a read-side sketch follows this entry):
private val logger = LoggerFactory.getLogger(jdbcTes.getClass)
def main(args: Array[String]): Unit = {
    val spark
2021-07-09 14:55:53 575 1
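A sketch of the read side that makes the JDBC pull parallel, which is what removes most of the slowness; the bulk-load write side is elided, and table name, bounds, and credentials are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object OracleReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("oracleToHbase").getOrCreate()

    // numPartitions concurrent queries, split on partitionColumn between the bounds
    val df = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//host:1521/orcl") // placeholder
      .option("dbtable", "SCHEMA.MY_TABLE")                // placeholder
      .option("user", "user")
      .option("password", "password")
      .option("partitionColumn", "ID")
      .option("lowerBound", "1")
      .option("upperBound", "10000000")
      .option("numPartitions", "20")
      .load()

    // from here the post sorts rows by rowkey and writes HFiles for the HBase bulkLoad
    df.printSchema()
  }
}
```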
Original: Scala Spark job fails to connect to Oracle on the cluster: Exception in thread "main" java.lang.ClassNotFoundException: oracle.jdbc
In this case you need to build a fat jar containing all dependencies and submit that to the cluster. Here is my earlier pom.xml:
<build>
  <pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
    <plugins>
      <!-- clean lifecycle, see https://maven.apache.
2021-06-09 10:23:52 581 1
Original: import org.apache.hadoop.hbase.HBaseConfiguration cannot be resolved
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>2.1.0-cdh6.2.1</version>
</dependency>
<dependency>
  <groupId>org...
2021-05-28 11:09:07 3350 1
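For reference, once the dependency resolves (cdh-suffixed artifacts come from the Cloudera Maven repository, which the truncated pom presumably adds), a minimal smoke test that the import compiles and a connection opens; the quorum hosts are placeholders:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.ConnectionFactory

object HBaseConfSmokeTest {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3") // placeholder hosts
    conf.set("hbase.zookeeper.property.clientPort", "2181")
    val connection = ConnectionFactory.createConnection(conf)
    println("connected: " + !connection.isClosed)
    connection.close()
  }
}
```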
Original: A use of the row_number() over function
During business processing we needed to swap a table's key and value: originally each key mapped to a value, and now the value becomes the key and the key becomes the value, which can yield tens of thousands of entries per value. The old approach was a column-to-row transform with concat_ws(), collect_set and group by on the value, but that still produced tens of thousands of columns and badly hurt efficiency. So we used row_number() over(partition by s.p2 order by s.p2) as num and finally kept the rows with num=1 (a runnable sketch follows this entry)...
2021-05-07 15:36:50 90
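A sketch of that dedup pattern, assuming a table src with columns p1 (old key) and p2 (old value) to match the snippet's s.p1/s.p2:

```scala
import org.apache.spark.sql.SparkSession

object RowNumberDedup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("rowNumberDedup").getOrCreate()
    import spark.implicits._

    Seq(("k1", "v1"), ("k2", "v1"), ("k3", "v2"))
      .toDF("p1", "p2").createOrReplaceTempView("src")

    // keep one arbitrary row per p2, so p2 can serve as the new key
    spark.sql(
      """SELECT p2, p1 FROM (
        |  SELECT s.p1, s.p2,
        |         row_number() OVER (PARTITION BY s.p2 ORDER BY s.p2) AS num
        |  FROM src s
        |) t WHERE num = 1""".stripMargin).show()
  }
}
```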
Original: Problems with Spark Streaming + Kafka (1)
val query = lines
  .selectExpr("CAST(partition AS STRING) as partition", "CAST(topic AS STRING) as topic", "CAST(offset AS STRING) as offset", "CAST(value AS STRING) as value")
  .filter($"value".contains("\"op\":\"ins\"") || $"value".contains("...
2021-04-27 15:06:00 226
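Filtering with string contains on the raw JSON works, but parsing the op field is sturdier; a sketch assuming the value payload carries a top-level op field, as the snippet's "op":"ins" match suggests:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructType}

object OpFilterSketch {
  // `lines` is the Kafka source DataFrame from the snippet above
  def filterInserts(lines: DataFrame): DataFrame = {
    val schema = new StructType().add("op", StringType) // only the field we filter on
    lines
      .selectExpr("CAST(value AS STRING) AS value")
      .withColumn("op", from_json(col("value"), schema).getField("op"))
      .filter(col("op") === "ins") // the original also matched further op values (truncated)
  }
}
```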
Original: Script for creating a Hive external table over HBase
#!/bin/bash
hive -e "use ${databasename};
drop table if exists ${hivetablename};
create external table if not exists ${hivetablename} (
  rowkey string,
  kafkatime string,
  loadtime string,
  offset string
) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageH
2021-04-23 16:33:23 310
Original: Notes on an HBase rowkey design
// salt the key with hashCode % 100 so writes spread across presplit regions
def getHashConcat(key: String): String = {
  val mu = Math.abs(key.hashCode) % 100
  val sbr = new StringBuffer()
  sbr.append(mu.toString).append("_").append(key)
  sbr.toString
}
2021-04-09 16:45:56 96
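Usage, as a quick illustration; the exact two-digit prefix depends on the key's hashCode:

```scala
// prints something like "37_order-20210409"; the prefix spreads keys over up to 100 regions
println(getHashConcat("order-20210409"))
```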
Original: Using concat_ws and collect_set in Spark SQL
concat_ws and collect_set are usually used together. Official definitions: collect_set(expr) - Collects and returns a set of unique elements. concat_ws(sep, [str | array(str)]+) - Returns the concatenation of the strings separated by sep. 1. concat_ws joins strings with the given separator (a runnable sketch follows this entry): conca
2021-04-09 11:13:34 6822
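A runnable sketch of the pairing over a toy table; duplicates collapse because collect_set keeps only unique elements:

```scala
import org.apache.spark.sql.SparkSession

object ConcatWsCollectSet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("concatWsDemo").getOrCreate()
    import spark.implicits._

    Seq(("a", "1"), ("a", "2"), ("a", "1"), ("b", "3"))
      .toDF("k", "v").createOrReplaceTempView("t")

    // one row per key, values deduplicated and joined with commas
    spark.sql("SELECT k, concat_ws(',', collect_set(v)) AS vs FROM t GROUP BY k").show()
    // k=a -> "1,2" (set order not guaranteed), k=b -> "3"
  }
}
```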
Original: Problems hit while testing Structured Streaming + Kafka locally in IDEA
Problems hit while testing Structured Streaming + Kafka locally in IDEA. The code:
def main(args: Array[String]): Unit = {
  val sparksession = SparkSession
    .builder()
    .master("local[*]")
    .appName("demoPro")
    //.config("spark.debug.maxToStringFields", "200")
2021-03-23 13:41:12 258
Original: Notes on a Spark configuration
// serialization
sparksession.conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
// don't lose data on shutdown
sparksession.conf.set("spark.streaming.stopGracefullyOnShutdown", "true")
// enable backpressure
// sparksession.conf.set("spark.streaming.backpressure.enabled", "true"
2021-03-22 15:19:07 469
Original: Notes on my own Structured Streaming + Kafka configuration
val lines = sparksession.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", kafkaServers)
  // set the starting offset
  .option("startingOffsets", topics)
  // whether to fail when the topic is deleted or the offset is out of range
  .option("failOnDataLoss", "false")
  // consumer group
  .option("group.id", groupi
2021-03-22 15:14:30 446
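For comparison, a minimal sketch of the same source with subscribe and startingOffsets as separate options; startingOffsets accepts "earliest", "latest", or a per-partition JSON string (sparksession, kafkaServers, and topics as in the snippet above):

```scala
val stream = sparksession.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", kafkaServers)
  .option("subscribe", topics)           // comma-separated topic list
  .option("startingOffsets", "earliest") // or "latest", or a per-partition JSON map
  .option("failOnDataLoss", "false")     // tolerate deleted topics / out-of-range offsets
  .load()
```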
Original: Fixing writes to HBase coming in too fast
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 226 actions: org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=256.0M, regionName=9004772cc158e83ea71b27bd2ef57f7b. Solution link: https://blog.csdn.net/d
2021-01-14 14:27:16 874
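The linked post covers the server-side memstore tuning; a common client-side mitigation, sketched here with placeholder table and column family, is to funnel puts through a BufferedMutator with a modest write buffer so the region server sees steady small flushes rather than bursts:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{BufferedMutatorParams, ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object BufferedWriteSketch {
  def main(args: Array[String]): Unit = {
    val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val params = new BufferedMutatorParams(TableName.valueOf("my_table")) // placeholder
      .writeBufferSize(4L * 1024 * 1024) // flush roughly every 4 MB instead of in huge bursts
    val mutator = conn.getBufferedMutator(params)
    try {
      val put = new Put(Bytes.toBytes("rowkey-1"))
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"))
      mutator.mutate(put) // buffered; sent when the buffer fills or on flush()
      mutator.flush()
    } finally { mutator.close(); conn.close() }
  }
}
```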
Original: Sqoop export script
Exporting from Hive to MySQL:
#!/bin/bash
#date:20190827
OIFS=$IFS
# current path
baseDirForScriptSelf=$(cd "$(dirname "$0")"; pwd)
# local path
localPath=${baseDirForScriptSelf}
# java file path
java_path=${localPath}/java
mkdir -p ${java_path}
# log file path
cur_date=`date -d "0 days ago" +%Y%m%d`
l
2020-09-23 16:57:57 356 1
Original: Exporting data from Oracle to HBase with Sqoop
T_ADDRESS
sqoop import \
  --connect jdbc:oracle:thin:@//10.1..29:1521/orcl \
  --username sma_ \
  --password sma_dat* \
  --table DEV_PAS.T_ADDRESS \
  --columns address_id,customer_id,address_type,country_code,state,city,d...
2020-03-13 15:06:15 166
Original: Converting time formats in Hive and Impala
from_unixtime(unix_timestamp(XXX, 'yyyy/MM/dd'), 'yyyyMMdd'): the first pattern is the source time format, the second is the target format. For example, from_unixtime(unix_timestamp('2020/03/11', 'yyyy/MM/dd'), 'yyyyMMdd') returns 20200311.
2020-03-11 15:58:43 2347
Original: Impala connection string
impala-shell -i IP:port \
  --auth_creds_ok_in_clear -l -u $username \
  --ldap_password_cmd="echo -n passwd" \
  -d "database" \
  -q "sql..."
(In order: ip:port, cluster username, password, database.)
2020-03-11 15:54:16 436
Original: Importing data from Oracle to Hive with Sqoop
sqoop import \
  --connect jdbc:oracle:thin:@//10.1.67.26:1521/orcl \
  --username gonglt \
  --password test1 \
  -m 1 \
  --table LAAGENT \
  --hive-import \
  --hive-database sas \
  --hive-table laagent \
  --target-dir '/user/hive/warehou...
2020-03-11 15:39:26 460