Notes by topic
About DBeaver
- Apart from Navicat, the go-to tool for relational databases, DBeaver is recommended for all the NoSQL stores, e.g. Redis, Elasticsearch, MongoDB
- This section mainly covers how to work with MongoDB from the command line
MongoDB LIKE-style query
db.checkHistory.find({rowKey:/^T_HDS_GXJS_JZGGZJL/}).toArray()
//.limit(3).toArray()
//db.checkHistory.deleteMany({rowKey:/^T_HDS_GXJG_JZGJCSJZLB/})
Delete with a condition
db.manualHandle.deleteMany({tableUniqueCode:'MYSQL_10.0.x.10_3306_nullDBDatabase^HDS_Basics|DBTable^test2019v2'})
Spark notes
When working with an RDD of rows, fetch field values in the form row.getAs[Object](fieldEnName); otherwise type errors are thrown (see the sketch below)
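A minimal sketch of the pattern (df and fieldEnName are placeholders, not from the original note):
import org.apache.spark.sql.Row
val fieldEnName = "amount"                        // hypothetical column name
df.rdd.map { row =>
  // getAs[Object] returns the raw boxed value whatever the column type, avoiding ClassCastException
  val v = row.getAs[Object](fieldEnName)
  if (v == null) "" else v.toString
}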
show tblproperties targettable;   -- view the table's properties (e.g. the recorded row count)
Compute table statistics (row count)
ANALYZE TABLE 表名 COMPUTE STATISTICS;
Concatenate the files under a directory (for --jars)
--jars $(echo /home/rowen/libs/*.jar | tr ' ' ',')
Spark debugging: --conf spark.driver.extraJavaOptions="-Dorg.slf4j.simpleLogger.defaultLogLevel=trace -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=41999"
- Specifying the compression format
df.write.mode(SaveMode.Overwrite).option("compression", "gzip").csv(s"${path}")
- How Spark connects to Transwarp Inceptor:
https://nj.transwarp.cn:8180/?p=3382
For JDBC access, delete these three jars from the jars directory (see the sketch after this list):
spark-hive_2.11-2.3.2.jar
spark-hive-thriftserver_2.11-2.3.2.jar
hive-jdbc-1.2.1.spark2.jar
If not using JDBC, keep the three jars above and simply run
./spark-shell, then spark.sql("SELECT * FROM default.test_orc").show()
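A minimal sketch of the JDBC route (the URL, driver class, and credentials here are assumptions; replace them with your cluster's values):
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:hive2://inceptor-host:10000/default")  // assumed Inceptor JDBC URL
  .option("driver", "org.apache.hive.jdbc.HiveDriver")        // assumed driver class
  .option("dbtable", "default.test_orc")
  .option("user", "hive")                                     // placeholder credentials
  .option("password", "")
  .load()
jdbcDF.show()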
- Entering interactive mode in a container
kubectl get pods | grep kafka
kubectl exec -it kafka-server-kafka1-58d9ccdd75-5wqzt bash
- Huawei MRS environment
Replacing the archive package
cd $SPARK_HOME
grep "yarn.archive" conf/spark-defaults.conf
source ../../bigdata_env
hdfs dfs -get '<the path found by the grep above>'
cp spark-archive-2x.zip spark-archive-2x_kafka.zip
unzip -l spark-archive-2x.zip | grep kafka   # if the Kafka jar is missing, it has to be added
ll jars/streamingClient010/spark-sql-kafka*   # locate the Kafka jar
zip -u spark-archive-2x.zip spark-sql-kafka.jar   # add the Kafka jar to the archive
hdfs dfs -put spark-archive-2x_kafka.zip '<the archive directory>'
Edit conf/spark-defaults.conf and append the kafka suffix to the file name configured for yarn.archive
Then rerun the job
- Spark 3 tuning
Set spark.sql.adaptive.enabled to true to enable AQE (the default in Spark 3.0 is false); it applies when the query:
- is not a streaming query
- contains at least one exchange (e.g. a join, aggregation, or window operator) or a subquery
By reducing the dependence on static statistics, AQE resolves a trade-off that Spark's CBO handled poorly (the cost of generating statistics vs. query latency) as well as the accuracy problem, making it far more flexible than the previously limited CBO. A minimal way to switch it on is sketched below.
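A minimal sketch of enabling it in spark-shell (the two extra switches are optional AQE features, listed only as examples):
spark.conf.set("spark.sql.adaptive.enabled", "true")
// optionally let AQE coalesce small shuffle partitions and split skewed joins
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
Or pass it at submit time with --conf spark.sql.adaptive.enabled=true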
- Creating a user-defined function in Hive
create temporary function ods.customcheck_date_format as 'com.sefonsoft.dataquality.func.check.check_date_format'
create function ckdate as 'com.soft.quality.func.check.check_date_format' USING JAR 'hdfs://sxx8142.:8020/user/hive/hiveUDF.jar';
Creating a Dataset in spark-shell
Building a DataFrame in spark-shell to test expressions
import org.apache.spark.sql.types.{IntegerType, LongType, StringType, StructField, StructType}
import org.apache.spark.sql.functions.expr
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.Row
val structFields = new ArrayBuffer[StructField]()
structFields += StructField("topic", StringType, true)
structFields += StructField("partition", IntegerType, true)
structFields += StructField("offset", LongType, true)
val row1 = Row("a", 3, 45L)
val row2 = Row("b", 5, 55L)
val lrow = List(row1, row2)
val df = spark.createDataFrame(spark.sparkContext.parallelize(lrow), StructType(structFields))
df.select(expr("3*2").as("nub")).show
df.select(expr("date_add('2022-11-11', 3)").as("express")).show
df.select(expr("date_add(current_date(), 3)").as("express")).show
df.select(expr("round(sqrt(partition),4)").as("express"), df("partition")).show
The above is fairly involved; here is a simpler alternative
val ds = Seq("lisi", "wangwu").toDS()
The column name defaults to value
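To give it a different name, call toDF with the desired column name ("name" here is just an example):
val named = ds.toDF("name")   // the single column is now "name" instead of "value"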
IDEA-related
JVM debugging
-Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=10000
For Spark debugging, export this before spark-submit.sh is invoked
export SPARK_SUBMIT_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=43999
For a remote jconsole connection, add to the JVM startup command
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8999 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
Speeding up GitHub file downloads
https://tool.mintimate.cn/gh/
Redis notes
redis-cli -p 57084 -a Cdsf@119
MySQL notes
mysql -uroot --port=3306 -p123456 -h10.x.x.92 -e "show create database DATEE_RS;"
Linux notes
Replace multiple values in a file
sed -i -e "s/ip1/ip2/g" -e "s/ip3/ip4/g" xxx.properties
find . -type f -name '*.html' | xargs sed -i -e "s/ip5/ip6/g" -e "s/ip7/ip8/g"
Empty a directory while keeping specified files
ls | grep -v "zip" | xargs rm -rf
Substitute for telnet
curl http://10.x.x.200:3306
or
ssh -vp 3306 10.x.x.204
Replacing the eval command
echo ${cmd}|sh &> ${log_dir}/server-app.log &
Force-sync the server time
ntpdate -u server-ip
Exchanging files with Windows: rz receives a file from Windows, sz xx.txt sends a file to Windows
yum install -y lrzsz
Copy the matched files
ls |grep -v "zip" |xargs -i{} cp -rp {} /home/server-app/plugins/
List the files in an archive
unzip -l xxx.zip
Add a file to an archive
zip -u xxx.zip abc.jar
Elasticsearch notes
Accessing ES in the Huawei environment
source /opt/fi-client/bigdata-env
kinit -kt user.keytab mdev
curl -XGET --tlsv1.2 --negotiate -k -u : 'https://10.x.x.135:24100/website/_search?pretty'
https://elasticstack.blog.csdn.net/ — ES blog
If ES reports that the peer closed the connection, enable keep-alive on the ES client and also raise the OS keep-alive time:
1. httpClientBuilder.setDefaultIOReactorConfig(IOReactorConfig.custom().setSoKeepAlive(true).build())
2. sudo sysctl -w net.ipv4.tcp_keepalive_time=300; sysctl -p
Embedding commit info in the Maven build
<plugin>
<groupId>pl.project13.maven</groupId>
<artifactId>git-commit-id-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>revision</goal>
</goals>
</execution>
</executions>
<configuration>
<dateFormat>yyyy-MM-dd HH:mm:ss</dateFormat>
<generateGitPropertiesFile>true</generateGitPropertiesFile>
<generateGitPropertiesFilename>${project.build.directory}/wode/public/git-commit.properties</generateGitPropertiesFilename>
<format>properties</format>
<includeOnlyProperties>
<property>git.remote.origin.url</property>
<property>git.branch</property>
<property>git.commit.id</property>
<property>git.commit.time</property>
</includeOnlyProperties>
</configuration>
</plugin>
- Deploying a snapshot artifact
mvn deploy:deploy-file -Dfile=inceptor-service-8.8.1.jar -DgroupId=com.transwarp -DartifactId=inceptor-service -Dversion=8.8.1-SNAPSHOT -Dpackaging=jar -Durl=http://x.0.x.78/repository/dev-snapshots -DrepositoryId=dev-snapshots
Jars of the versions used by Huawei MRS
https://repo.huaweicloud.com/repository/maven/huaweicloudsdk
Firewall ports
firewall-cmd --zone=public --add-port=80/tcp --permanent
firewall-cmd --zone=public --remove-port=80/tcp --permanent
Maven: change the version number everywhere
Batch-replace the version number across all modules
mvn versions:set -DnewVersion=1.0.0-SNAPSHOT
mvn versions:commit
Alternatively, a property variable can be used
<properties>
<revision>0.3.7-XTY-SNAPSHOT</revision>
</properties>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>flatten-maven-plugin</artifactId>
<version>1.1.0</version>
<configuration>
<updatePomFile>true</updatePomFile>
<flattenMode>resolveCiFriendliesOnly</flattenMode>
</configuration>
<executions>
<execution>
<id>flatten</id>
<phase>process-resources</phase>
<goals>
<goal>flatten</goal>
</goals>
</execution>
<execution>
<id>flatten.clean</id>
<phase>clean</phase>
<goals>
<goal>clean</goal>
</goals>
</execution>
</executions>
</plugin>
Auto-generating the git properties file
<plugin>
<groupId>pl.project13.maven</groupId>
<artifactId>git-commit-id-plugin</artifactId>
<executions>
<execution>
<goals>
<goal>revision</goal>
</goals>
</execution>
</executions>
<configuration>
<dateFormat>yyyy-MM-dd HH:mm:ss</dateFormat>
<generateGitPropertiesFile>true</generateGitPropertiesFile>
<generateGitPropertiesFilename>${project.build.directory}/public/git-commit.properties</generateGitPropertiesFilename>
<format>properties</format>
<includeOnlyProperties>
<property>git.remote.origin.url</property>
<property>git.branch</property>
<property>git.commit.id</property>
<property>git.commit.time</property>
</includeOnlyProperties>
</configuration>
</plugin>
Referencing a local jar when shading with Maven
As everyone knows, coordinates marked <scope>system</scope> point at local jars, and there are several ways to shade them during packaging.
Here is a fairly simple approach; the pom is below
<dependencies>
<dependency>
<groupId>${project.groupId}</groupId>
<artifactId>${artifactId}-kingbase8-8.6.0.jar</artifactId>
<version>${version}</version>
</dependency>
</dependencies>
The kingbase jar above lives inside my project; it has never been uploaded to a repository
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<createDependencyReducedPom>false</createDependencyReducedPom>
<artifactSet>
<includes>
<include>${groupId}:${artifactId}-kingbase8-8.6.0.jar</include>
</includes>
</artifactSet>
<relocations>
<relocation>
<pattern>com.kingbase8</pattern>
<shadedPattern>uniq.com.kingbase8r6</shadedPattern>
</relocation>
</relocations>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/maven/**</exclude>
</excludes>
</filter>
</filters>
</configuration>
</execution>
</executions>
</plugin>
# The shade plugin above renames the kingbase package so it cannot clash with other jars
# The key piece is the addjars-maven-plugin below: it installs every file under
# ${basedir}/lib into the local repository,
# named exactly like my dependency above, so there is no need to write systemPath and the rest
<plugin>
<groupId>com.googlecode.addjars-maven-plugin</groupId>
<artifactId>addjars-maven-plugin</artifactId>
<version>1.0.5</version>
<executions>
<execution>
<goals>
<goal>add-jars</goal>
</goals>
<configuration>
<resources>
<resource>
<directory>${basedir}/lib</directory>
</resource>
</resources>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
Finding a class inside jars
find ./ -name "*.jar" -print | xargs grep "JaasContext"
Kafka
Start ZooKeeper: bin/zkServer.sh start
Start Kafka: bin/kafka-server-start.sh -daemon config/server.properties
Silence Kafka logging via spark/conf/log4j.properties
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.logger.org.apache.kafka.clients.consumer.internals.SubscriptionState = ERROR
Produce
bin/kafka-console-producer.sh --bootstrap-server x.x.x.x:9092 --topic cdctest_new
Consume
bin/kafka-console-consumer.sh --bootstrap-server x.x.x.x:9092 --from-beginning --topic cdctest_new
If CDH Kafka cannot be connected to, configure advertised.listeners=PLAINTEXT://node01:9092 under kafka.brokers in the management console