DolphinScheduler 3.0.1: Data Source Center and Usage

🔼 Previous: DolphinScheduler 3.0.1 Data Quality

*️⃣ Main index: DolphinScheduler 3.0.1 Feature Walkthrough and Source Code Analysis

🔽 Next: DolphinScheduler 3.0.1 Monitor Center (Part 1): Service Management

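DolphinScheduler 2.0 already supported the common databases: MySQL, PostgreSQL, Oracle, SQLServer, and Hive. I verified each of these and they all work. Spark was not supported, because 2.0 shipped no Spark data source component. 3.0 reportedly adds it, so today I will verify that. The remaining types are ones I have not worked with at all so far (explore them yourself if you are interested):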

  • ClickHouse
  • Presto
  • Redshift
  • DB2: another common relational database, though I have not used it yet

🐬 Spark data source


🐠 Creation failed


🐟 Checking the logs


The log shows that the database name I entered was wrong, so 3.0 does appear to support the Spark data source plugin.

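🐟 Reading the source


  • Data source directory structure: it looks like every type is now supported
  • The Spark plugin pulls in classes from the Hive data source plugin. This works, but it goes against the idea of a plugin: if the Hive plugin were removed, the Spark plugin would clearly be affected
    3.1.0 is the same; I do not know whether this will be cleaned up later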

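🐟 Spark SQL


Mention big data and Hadoop and Spark come to mind. I have not actually worked with Hive SQL or Spark SQL yet. Spark is well known, and when I tested the Spark data source on 2.0 the plugin did not support it, so I am quite curious about Spark SQL and will do a little research here.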

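🐡 Official site

Spark SQL official site

  • Spark SQL lets you query structured data inside Spark programs, using either SQL or a familiar DataFrame API. It is usable in Java, Scala, Python, and R. Connect to any data source the same way.

  • DataFrames and SQL provide a common way to access a variety of data sources, including Hive, Avro, Parquet, ORC, JSON, and JDBC. You can even join data across these sources. Run SQL or HiveQL queries on existing warehouses.

  • Spark SQL supports the HiveQL syntax as well as Hive SerDes and UDFs, allowing you to access existing Hive warehouses. A server mode provides industry-standard JDBC and ODBC connectivity for business intelligence tools.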
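The server mode in the last bullet is the Spark Thrift Server, which corresponds to HiveServer2, so clients normally reach it with a jdbc:hive2:// URL and the Hive JDBC driver. That may also explain why the Spark plugin examined earlier leans on Hive classes. Below is a minimal sketch of such a client; the class name, host, and database are made up for illustration, and it assumes the Hive JDBC driver is on the classpath when you actually connect.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Sketch: querying Spark SQL through its JDBC server mode (Spark Thrift Server). */
final class SparkThriftJdbcSketch {

    /** Builds a HiveServer2-style URL, e.g. jdbc:hive2://host:10000/default. */
    static String buildUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    /**
     * Runs one query and prints the first column of each row.
     * Needs the Hive JDBC driver on the classpath and a running Thrift Server.
     */
    static void printFirstColumn(String url, String user, String password, String sql)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical host and database; 10000 is the usual default Thrift Server port.
        String url = buildUrl("spark-host", 10000, "default");
        System.out.println(url);
        // Uncomment against a real server:
        // printFirstColumn(url, "user", "", "SHOW TABLES");
    }
}
```

Only the URL building runs without a server; the actual connection is left commented out on purpose.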


🐡 User guide

User guide

🐟 Hive SQL


🐡 Official site

Official site. Judging by its main features, "Hive SQL" seems to be usually shortened to just "Hive".

🐡 User guide

Hive SQL user guide

🐬 Using data sources


When you define a task node that involves database operations, it uses a data source you have already defined.

🐠 How a node calls the database


  • SqlTask
  • The database client: once you see JDBC here, the trail has reached its destination
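What the node ends up doing can be pictured as the standard JDBC sequence: borrow a connection from the data source, run the statement, read the results, and hand the connection back. The following is a minimal sketch of that sequence, not DolphinScheduler's actual SqlTask code; the class and method names are invented for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

/** Sketch of the plain JDBC flow behind a SQL task. */
final class JdbcQuerySketch {

    /** Runs a query and returns the first column of every row as strings. */
    static List<String> queryFirstColumn(DataSource dataSource, String sql) throws SQLException {
        List<String> values = new ArrayList<>();
        // try-with-resources closes the resources in reverse order; closing a
        // pooled connection returns it to the pool rather than tearing it down.
        try (Connection conn = dataSource.getConnection();
             PreparedStatement stmt = conn.prepareStatement(sql);
             ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                values.add(rs.getString(1));
            }
        }
        return values;
    }
}
```

Any javax.sql.DataSource works here, including a pooled one, so a call might look like queryFirstColumn(dataSource, "SELECT 1").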

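🐵 Other


HikariCP

  • GitHub repository

    • What is it? A database connection pool: a high-performance JDBC connection pool component.

    • Main selling point? It is billed as the fastest.

    • It is Spring Boot's default database connection pool. Back in the database client code mentioned above, a plain new HikariDataSource() is all it takes to get a pool that hands out connections.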

      • JDBCDataSourceProvider
          public static HikariDataSource createJdbcDataSource(BaseConnectionParam properties, DbType dbType) {
              logger.info("Creating HikariDataSource pool for maxActive:{}", PropertyUtils.getInt(Constants.SPRING_DATASOURCE_MAX_ACTIVE, 50));
              HikariDataSource dataSource = new HikariDataSource();
      
              //TODO Support multiple versions of data sources
              ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
              loaderJdbcDriver(classLoader, properties, dbType);
      
              dataSource.setDriverClassName(properties.getDriverClassName());
              dataSource.setJdbcUrl(DataSourceUtils.getJdbcUrl(dbType, properties));
              dataSource.setUsername(properties.getUser());
              dataSource.setPassword(PasswordUtils.decodePassword(properties.getPassword()));
      
              dataSource.setMinimumIdle(PropertyUtils.getInt(Constants.SPRING_DATASOURCE_MIN_IDLE, 5));
              dataSource.setMaximumPoolSize(PropertyUtils.getInt(Constants.SPRING_DATASOURCE_MAX_ACTIVE, 50));
              dataSource.setConnectionTestQuery(properties.getValidationQuery());
      
              if (properties.getProps() != null) {
                  properties.getProps().forEach(dataSource::addDataSourceProperty);
              }
      
              logger.info("Creating HikariDataSource pool success.");
              return dataSource;
          }
      
      • pom.xml
       <dependency>
            <groupId>com.zaxxer</groupId>
            <artifactId>HikariCP</artifactId>
            <version>4.0.3</version>
       </dependency>
      
  • README.md, which explains how to use each configuration parameter

Essentials

🔤 dataSourceClassName
This is the name of the DataSource class provided by the JDBC driver. Consult the documentation for your specific JDBC driver to get this class name, or see the table below. Note XA data sources are not supported. XA requires a real transaction manager like bitronix. Note that you do not need this property if you are using jdbcUrl for "old-school" DriverManager-based JDBC driver configuration. Default: none

- or -

🔤 jdbcUrl
This property directs HikariCP to use "DriverManager-based" configuration. We feel that DataSource-based configuration (above) is superior for a variety of reasons (see below), but for many deployments there is little significant difference. When using this property with "old" drivers, you may also need to set the driverClassName property, but try it first without. Note that if this property is used, you may still use DataSource properties to configure your driver and is in fact recommended over driver parameters specified in the URL itself. Default: none

🔤 username
This property sets the default authentication username used when obtaining Connections from the underlying driver. Note that for DataSources this works in a very deterministic fashion by calling DataSource.getConnection(username, password) on the underlying DataSource. However, for Driver-based configurations, every driver is different. In the case of Driver-based, HikariCP will use this username property to set a user property in the Properties passed to the driver's DriverManager.getConnection(jdbcUrl, props) call. If this is not what you need, skip this method entirely and call addDataSourceProperty("username", ...), for example. Default: none

🔤 password
This property sets the default authentication password used when obtaining Connections from the underlying driver. Note that for DataSources this works in a very deterministic fashion by calling DataSource.getConnection(username, password) on the underlying DataSource. However, for Driver-based configurations, every driver is different. In the case of Driver-based, HikariCP will use this password property to set a password property in the Properties passed to the driver's DriverManager.getConnection(jdbcUrl, props) call. If this is not what you need, skip this method entirely and call addDataSourceProperty("pass", ...), for example. Default: none


Frequently used

✅ autoCommit
This property controls the default auto-commit behavior of connections returned from the pool. It is a boolean value. Default: true

⏳ connectionTimeout
This property controls the maximum number of milliseconds that a client (that's you) will wait for a connection from the pool. If this time is exceeded without a connection becoming available, a SQLException will be thrown. Lowest acceptable connection timeout is 250 ms. Default: 30000 (30 seconds)

⏳ idleTimeout
This property controls the maximum amount of time that a connection is allowed to sit idle in the pool. This setting only applies when minimumIdle is defined to be less than maximumPoolSize. Idle connections will not be retired once the pool reaches minimumIdle connections. Whether a connection is retired as idle or not is subject to a maximum variation of +30 seconds, and average variation of +15 seconds. A connection will never be retired as idle before this timeout. A value of 0 means that idle connections are never removed from the pool. The minimum allowed value is 10000ms (10 seconds). Default: 600000 (10 minutes)

⏳ keepaliveTime
This property controls how frequently HikariCP will attempt to keep a connection alive, in order to prevent it from being timed out by the database or network infrastructure. This value must be less than the maxLifetime value. A "keepalive" will only occur on an idle connection. When the time arrives for a "keepalive" against a given connection, that connection will be removed from the pool, "pinged", and then returned to the pool. The 'ping' is one of either: invocation of the JDBC4 isValid() method, or execution of the connectionTestQuery. Typically, the duration out-of-the-pool should be measured in single digit milliseconds or even sub-millisecond, and therefore should have little or no noticeable performance impact. The minimum allowed value is 30000ms (30 seconds), but a value in the range of minutes is most desirable. Default: 0 (disabled)

⏳ maxLifetime
This property controls the maximum lifetime of a connection in the pool. An in-use connection will never be retired, only when it is closed will it then be removed. On a connection-by-connection basis, minor negative attenuation is applied to avoid mass-extinction in the pool. We strongly recommend setting this value, and it should be several seconds shorter than any database or infrastructure imposed connection time limit. A value of 0 indicates no maximum lifetime (infinite lifetime), subject of course to the idleTimeout setting. The minimum allowed value is 30000ms (30 seconds). Default: 1800000 (30 minutes)

🔤 connectionTestQuery
If your driver supports JDBC4 we strongly recommend not setting this property. This is for "legacy" drivers that do not support the JDBC4 Connection.isValid() API. This is the query that will be executed just before a connection is given to you from the pool to validate that the connection to the database is still alive. Again, try running the pool without this property, HikariCP will log an error if your driver is not JDBC4 compliant to let you know. Default: none

🔢 minimumIdle
This property controls the minimum number of idle connections that HikariCP tries to maintain in the pool. If the idle connections dip below this value and total connections in the pool are less than maximumPoolSize, HikariCP will make a best effort to add additional connections quickly and efficiently. However, for maximum performance and responsiveness to spike demands, we recommend not setting this value and instead allowing HikariCP to act as a fixed size connection pool. Default: same as maximumPoolSize

🔢 maximumPoolSize
This property controls the maximum size that the pool is allowed to reach, including both idle and in-use connections. Basically this value will determine the maximum number of actual connections to the database backend. A reasonable value for this is best determined by your execution environment. When the pool reaches this size, and no idle connections are available, calls to getConnection() will block for up to connectionTimeout milliseconds before timing out. Please read about pool sizing. Default: 10

📈 metricRegistry
This property is only available via programmatic configuration or IoC container. This property allows you to specify an instance of a Codahale/Dropwizard MetricRegistry to be used by the pool to record various metrics. See the Metrics wiki page for details. Default: none

📈 healthCheckRegistry
This property is only available via programmatic configuration or IoC container. This property allows you to specify an instance of a Codahale/Dropwizard HealthCheckRegistry to be used by the pool to report current health information. See the Health Checks wiki page for details. Default: none

🔤 poolName
This property represents a user-defined name for the connection pool and appears mainly in logging and JMX management consoles to identify pools and pool configurations. Default: auto-generated


Infrequently used

⏳ initializationFailTimeout
This property controls whether the pool will "fail fast" if the pool cannot be seeded with an initial connection successfully. Any positive number is taken to be the number of milliseconds to attempt to acquire an initial connection; the application thread will be blocked during this period. If a connection cannot be acquired before this timeout occurs, an exception will be thrown. This timeout is applied after the connectionTimeout period. If the value is zero (0), HikariCP will attempt to obtain and validate a connection. If a connection is obtained, but fails validation, an exception will be thrown and the pool not started. However, if a connection cannot be obtained, the pool will start, but later efforts to obtain a connection may fail. A value less than zero will bypass any initial connection attempt, and the pool will start immediately while trying to obtain connections in the background. Consequently, later efforts to obtain a connection may fail. Default: 1

❎ isolateInternalQueries
This property determines whether HikariCP isolates internal pool queries, such as the connection alive test, in their own transaction. Since these are typically read-only queries, it is rarely necessary to encapsulate them in their own transaction. This property only applies if autoCommit is disabled. Default: false

❎ allowPoolSuspension
This property controls whether the pool can be suspended and resumed through JMX. This is useful for certain failover automation scenarios. When the pool is suspended, calls to getConnection() will not timeout and will be held until the pool is resumed. Default: false

❎ readOnly
This property controls whether Connections obtained from the pool are in read-only mode by default. Note some databases do not support the concept of read-only mode, while others provide query optimizations when the Connection is set to read-only. Whether you need this property or not will depend largely on your application and database. Default: false

❎ registerMbeans
This property controls whether or not JMX Management Beans ("MBeans") are registered or not. Default: false

🔤 catalog
This property sets the default catalog for databases that support the concept of catalogs. If this property is not specified, the default catalog defined by the JDBC driver is used. Default: driver default

🔤 connectionInitSql
This property sets a SQL statement that will be executed after every new connection creation before adding it to the pool. If this SQL is not valid or throws an exception, it will be treated as a connection failure and the standard retry logic will be followed. Default: none

🔤 driverClassName
HikariCP will attempt to resolve a driver through the DriverManager based solely on the jdbcUrl, but for some older drivers the driverClassName must also be specified. Omit this property unless you get an obvious error message indicating that the driver was not found. Default: none

🔤 transactionIsolation
This property controls the default transaction isolation level of connections returned from the pool. If this property is not specified, the default transaction isolation level defined by the JDBC driver is used. Only use this property if you have specific isolation requirements that are common for all queries. The value of this property is the constant name from the Connection class such as TRANSACTION_READ_COMMITTED, TRANSACTION_REPEATABLE_READ, etc. Default: driver default

⏳ validationTimeout
This property controls the maximum amount of time that a connection will be tested for aliveness. This value must be less than the connectionTimeout. Lowest acceptable validation timeout is 250 ms. Default: 5000

⏳ leakDetectionThreshold
This property controls the amount of time that a connection can be out of the pool before a message is logged indicating a possible connection leak. A value of 0 means leak detection is disabled. Lowest acceptable value for enabling leak detection is 2000 (2 seconds). Default: 0

➡ dataSource
This property is only available via programmatic configuration or IoC container. This property allows you to directly set the instance of the DataSource to be wrapped by the pool, rather than having HikariCP construct it via reflection. This can be useful in some dependency injection frameworks. When this property is specified, the dataSourceClassName property and all DataSource-specific properties will be ignored. Default: none

🔤 schema
This property sets the default schema for databases that support the concept of schemas. If this property is not specified, the default schema defined by the JDBC driver is used. Default: driver default

➡ threadFactory
This property is only available via programmatic configuration or IoC container. This property allows you to set the instance of the java.util.concurrent.ThreadFactory that will be used for creating all threads used by the pool. It is needed in some restricted execution environments where threads can only be created through a ThreadFactory provided by the application container. Default: none

➡ scheduledExecutor
This property is only available via programmatic configuration or IoC container. This property allows you to set the instance of the java.util.concurrent.ScheduledExecutorService that will be used for various internally scheduled tasks. If supplying HikariCP with a ScheduledThreadPoolExecutor instance, it is recommended that setRemoveOnCancelPolicy(true) is used. Default: none
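Several of the timing settings in the README constrain one another (keepaliveTime against maxLifetime, validationTimeout against connectionTimeout) and most have a documented minimum. To keep those rules in one place, here is a plain-JDK sketch that checks a java.util.Properties object against the limits quoted above. It is not HikariCP's own validation code, which runs its own checks when the pool starts; the class and method names are hypothetical, and only the property names and numbers come from the README.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Sketch: checks timing settings against the limits quoted in the HikariCP README. */
final class HikariSettingsCheck {

    /** Reads a millisecond setting, falling back to the README default when unset. */
    private static long ms(Properties p, String key, long defaultValue) {
        String raw = p.getProperty(key);
        return raw == null ? defaultValue : Long.parseLong(raw.trim());
    }

    /** Returns the README rules that the given settings break (empty if none). */
    static List<String> check(Properties p) {
        long connectionTimeout = ms(p, "connectionTimeout", 30_000);
        long validationTimeout = ms(p, "validationTimeout", 5_000);
        long idleTimeout = ms(p, "idleTimeout", 600_000);
        long maxLifetime = ms(p, "maxLifetime", 1_800_000);
        long keepaliveTime = ms(p, "keepaliveTime", 0);
        long leakDetectionThreshold = ms(p, "leakDetectionThreshold", 0);

        List<String> problems = new ArrayList<>();
        if (connectionTimeout < 250) {
            problems.add("connectionTimeout must be at least 250 ms");
        }
        if (validationTimeout < 250) {
            problems.add("validationTimeout must be at least 250 ms");
        }
        if (validationTimeout >= connectionTimeout) {
            problems.add("validationTimeout must be less than connectionTimeout");
        }
        // For the settings below, 0 means "disabled" or "unlimited" and is always allowed.
        if (idleTimeout != 0 && idleTimeout < 10_000) {
            problems.add("idleTimeout must be 0 or at least 10000 ms");
        }
        if (maxLifetime != 0 && maxLifetime < 30_000) {
            problems.add("maxLifetime must be 0 or at least 30000 ms");
        }
        if (keepaliveTime != 0 && keepaliveTime < 30_000) {
            problems.add("keepaliveTime must be 0 or at least 30000 ms");
        }
        if (keepaliveTime != 0 && maxLifetime != 0 && keepaliveTime >= maxLifetime) {
            problems.add("keepaliveTime must be less than maxLifetime");
        }
        if (leakDetectionThreshold != 0 && leakDetectionThreshold < 2_000) {
            problems.add("leakDetectionThreshold must be 0 or at least 2000 ms");
        }
        return problems;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("maxLifetime", "60000");
        p.setProperty("keepaliveTime", "120000"); // longer than maxLifetime
        check(p).forEach(System.out::println);
    }
}
```

An empty Properties object passes, since every README default satisfies the rules; the example in main reports the keepaliveTime/maxLifetime conflict.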
                                



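Druid vs HikariCP

Reference
According to the comparison in the reference, Druid is the more feature-complete of the two, while HikariCP has the best performance. Druid's SQL injection protection is worth studying, since a while ago our project added an interceptor that blocks SQL injection and XSS.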

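Druid SQL injection protection

When I have time, I would like to test the SQL injection interceptor we added earlier against Druid's configured SQL injection protection and compare the results.

 <!-- Configure the filters for monitoring statistics (stat) and SQL injection protection (wall) -->
  <property name="filters" value="stat,wall" />

Detailed parameter configuration guide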
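For a sense of what the comparison would be against, here is a rough sketch of the kind of pattern check an interceptor typically applies to request parameters. It is not our project's actual interceptor and not how Druid's wall filter works; the class name and the blocklist are invented for illustration, and a check like this is no substitute for parameterized queries.

```java
import java.util.regex.Pattern;

/** Sketch: a naive blocklist check of the kind a request interceptor might run. */
final class NaiveSqlInputCheck {

    // Deliberately small blocklist: quote, comment markers, statement separator,
    // tautologies such as "or 1=1", and a few SQL keywords.
    private static final Pattern SUSPICIOUS = Pattern.compile(
            "'|--|;|/\\*|\\b(or|and)\\s+\\d+\\s*=\\s*\\d+"
                    + "|\\b(union|select|insert|update|delete|drop)\\b",
            Pattern.CASE_INSENSITIVE);

    /** Returns true when the input contains any blocklisted fragment. */
    static boolean looksSuspicious(String input) {
        return input != null && SUSPICIOUS.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(looksSuspicious("alice"));    // false
        System.out.println(looksSuspicious("1 OR 1=1")); // true
        System.out.println(looksSuspicious("O'Brien"));  // true, a false positive
    }
}
```

The last call shows the weakness of matching on raw input: a legitimate name gets flagged. Druid's wall filter inspects the SQL statement itself rather than the request parameters, so the two approaches may well catch different things, which is what makes the comparison interesting.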
