December 2017_我不是李寻欢

[Original] Transwarp Slipstream Introduction: Practical Applications

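Resource isolation between Applications. For example: when a user works with a Stream under a given App, they can only view that App's StreamJobs from within that App; after exiting the App, they can only view the StreamJobs of their current App, not those of other Apps. Statistics on streams. Emily received her first task from her boss: how to count the number of visits to a website. Suppose the source data looks like this: 27.0.1.125,www.transwarp.io/home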
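The visit-count task can be sketched outside Slipstream in plain Python. This is only an illustration of the aggregation, not Slipstream's StreamSQL: it assumes each record is a line of the form ip,url as in the sample above, and the function name count_visits and every record other than the first are hypothetical.

```python
from collections import Counter


def count_visits(lines):
    """Count visits per URL from records shaped like 'ip,url'."""
    counts = Counter()
    for line in lines:
        # Split on the first comma only: the IP comes first, the URL second.
        _ip, sep, url = line.strip().partition(",")
        if not sep or not url:
            continue  # skip blank or malformed records
        counts[url] += 1
    return counts


# The first record comes from the sample data; the others are made up.
records = [
    "27.0.1.125,www.transwarp.io/home",
    "27.0.1.126,www.transwarp.io/home",
    "27.0.1.125,www.transwarp.io/about",
]

visits = count_visits(records)
for url, n in sorted(visits.items()):
    print(url, n)
```

In a streaming setting the same counter would be updated per batch or per event rather than over a fixed list.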

2017-12-15 16:48:47 2095

[Original] Transwarp Slipstream Introduction: Advanced Features

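1. Monitoring and alerting. Slipstream integrates the monitoring and alerting tool Alert4J, which reports errors when a stream application runs into problems. It supports email push and can also be integrated with WeChat and other monitoring tools. The current version of Alert4J has no dedicated configuration UI; one is planned for the next version. For now it is configured by adding an alert4j.properties file. The typical content of that file, using email push as an example, is: alert4j.service=email email.server.ho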

2017-12-15 15:35:37 5070

[Original] Transwarp Slipstream Introduction: Event-Driven Stream Processing

1. Loading data from a stream table into a regular table: SET streamsql.use.eventmode=true; CREATE STREAM s1(score INT, name STRING) TBLPROPERTIES("topic"="tps1", "kafka.zookeeper"="tw-node127:2181", "kafka.broker.list"="tw-node127:9092");

2017-12-15 14:26:14 1271

[Original] Transwarp Slipstream Introduction: Runtime Management

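Back pressure. In Slipstream, when a data source sends a large volume of messages, it is sometimes necessary, for the sake of stability, to have the receiver pause accepting incoming messages. This feature is called back pressure. It is configured as follows: SET streamsql.enable.backpressure.receiver=true; SET streamsql.backpressure.max.pendingJobs=<int>;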
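The idea behind back pressure can be illustrated with a small Python sketch. It is a conceptual model only and says nothing about how Slipstream implements the feature: the class BackpressureReceiver and its methods are hypothetical, and it assumes that the pending-jobs limit means "stop accepting input once this many jobs are waiting".

```python
from collections import deque


class BackpressureReceiver:
    """Toy receiver that pauses intake when too many jobs are pending."""

    def __init__(self, max_pending_jobs):
        self.max_pending_jobs = max_pending_jobs
        self.pending = deque()

    @property
    def paused(self):
        # The receiver is paused while the backlog is at or above the limit.
        return len(self.pending) >= self.max_pending_jobs

    def offer(self, batch):
        """Try to accept a batch; return False if the receiver is paused."""
        if self.paused:
            return False
        self.pending.append(batch)
        return True

    def complete_one(self):
        """Finish the oldest pending job, freeing room for new input."""
        return self.pending.popleft() if self.pending else None


receiver = BackpressureReceiver(max_pending_jobs=2)
accepted = [receiver.offer(b) for b in ("batch-1", "batch-2", "batch-3")]
print(accepted)            # the third batch is refused while paused
receiver.complete_one()
print(receiver.offer("batch-3"))
```

A refused batch stays with the source until the backlog drains, which is what keeps the processing side stable under bursts.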

2017-12-15 14:11:25 798

[Original] Transwarp Slipstream Introduction: Permission Management

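Overview of Slipstream permission management. Slipstream uses SQL-based permission management: • Administrators can manage roles (ROLE); • Users or roles can be granted, or have revoked, privileges on different data objects. Obtaining the administrator role, managing roles, and managing privileges on the three data objects DATABASE, VIEW and TABLE work the same way in Slipstream as in Inceptor SQL. You can refer directly to 《Transwarp Inceptor 使用手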

2017-12-15 14:04:31 712

[Original] Transwarp Slipstream Introduction: DDL|DML

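Advantages of Slipstream: * Unified micro-batch and event-driven modes (creating a Stream is essentially the same as writing a CREATE TABLE statement, so either mode can be used freely) * Very easy to use (low barrier to entry: knowing SQL is enough) * Better performance (no coding required) * Highly productized (highly encapsulated) * Low migration cost (essentially no migration needed; data in a Stream can be inserted directly into another table with a query). How to create a Stream and trigger a StreamJob: 1. First, log in to any one of the cluster's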

2017-12-15 13:26:48 3047

监控指标.pdf (Monitoring Metrics)

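Alert channels: phone, SMS, email, WeChat Work. At present WeChat Work alerts are sent at every level. Provisional alert levels (finer-grained levels are not yet implemented): 1: SMS, email, WeChat Work; 3: email, WeChat Work; 5: WeChat Work; 7: no alert.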
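The provisional levels above amount to a lookup table from level to channels. A minimal Python sketch follows; the names ALERT_CHANNELS and channels_for and the channel identifiers are hypothetical, and the phone channel is left out because the text does not assign it to any level.

```python
# Level -> channels, exactly as listed in the provisional scheme.
ALERT_CHANNELS = {
    1: ("sms", "email", "wechat_work"),
    3: ("email", "wechat_work"),
    5: ("wechat_work",),
    7: (),  # level 7 does not alert
}


def channels_for(level):
    """Return the channels to notify for a given alert level."""
    try:
        return ALERT_CHANNELS[level]
    except KeyError:
        # Only the four listed levels are defined so far.
        raise ValueError(f"unknown alert level: {level}") from None


for level in sorted(ALERT_CHANNELS):
    print(level, ", ".join(channels_for(level)) or "no alert")
```

Rejecting unlisted levels is a design choice of this sketch; the source does not say how intermediate levels such as 2 or 4 should behave.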

2019-10-09

Elasticsearch调优实践.pdf (Elasticsearch Tuning in Practice)

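Elasticsearch (ES) combines a NoSQL store with a search engine, offering both near-real-time queries and powerful aggregation analysis. It is therefore widely used in full-text search, log analysis, monitoring systems, data analysis and similar fields. The complete Elastic Stack (Elasticsearch, Logstash, Kibana, Beats) goes further and provides an end-to-end solution for data collection, cleansing, storage and visualization. Based on ES 5.6.4, this article introduces basic approaches to ES tuning for performance and stability from three angles: Linux parameter tuning, ES node configuration, and ES usage patterns. ES tuning is never one-size-fits-all; trade-offs and adjustments must be made for the actual business scenario, and corrections for any omissions in the article are welcome.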

2019-10-09

sec_hdp_security_overview.pdf

Security is essential for organizations that store and process sensitive data in the Hadoop ecosystem. Many organizations must adhere to strict corporate security policies. Hadoop is a distributed framework used for data storage and large-scale processing on clusters using commodity servers. Adding security to Hadoop is challenging because not all of the interactions follow the classic client-server pattern. • In Hadoop, the file system is partitioned and distributed, requiring authorization checks at multiple points. • A submitted job is executed at a later time on nodes different from the node on which the client authenticated and submitted the job. • Secondary services such as a workflow system access Hadoop on behalf of users. • A Hadoop cluster scales to thousands of servers and tens of thousands of concurrent tasks. A Hadoop-powered "Data Lake" can provide a robust foundation for a new generation of Big Data analytics and insight, but can also increase the number of access points to an organization's data. As diverse types of enterprise data are pulled together into a central repository, the inherent security risks can increase. Hortonworks understands the importance of security and governance for every business. To ensure effective protection for its customers, Hortonworks uses a holistic approach based on five core security features: • Administration • Authentication and perimeter security • Authorization • Audit • Data protection This chapter provides an overview of the security features implemented in the Hortonworks Data Platform (HDP). Subsequent chapters in this guide provide more details on each of these security features.

2019-06-24

马士兵jvm调优笔记.docx (Ma Shibing's JVM Tuning Notes)

1. Java memory structure 2. Garbage collection algorithms 3. JVM parameters 4. JVM garbage collectors 5. Common parameter settings

2019-06-20

Hbase_目录结构.pptx (HBase Directory Structure)

/hbase/archive (1) Archive directory used during snapshots or upgrades. When compaction deletes an hfile, the old hfile is also archived here. /hbase/corrupt (2) The corrupt directory for splitlog, and the directory for corrupt hfiles.

2019-06-19

Hbase_存储结构.pptx (HBase Storage Structure)

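Apache HBase™ is the Hadoop database, a distributed, scalable big data store. Use Apache HBase™ when you need random, real-time read/write access to your big data. The project's goal is to host very large tables (billions of rows by millions of columns) on clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable, a distributed storage system for structured data. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.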

2019-06-19

Ping An Database Exam Questions

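1 How do you modify the spfile? (A, C) Either run create pfile from spfile, edit the pfile, run create spfile from pfile, and start the database; or run alter system set <parameter>=<value> scope=spfile, in which case the parameter takes effect after the database restarts. a. Generate a pfile from the spfile, edit the pfile, then generate the spfile from the pfile b. Open and edit it directly with vi c. Modify it with the command alter system set ... scope=spfile d. Modify it by rebuilding the control file e. Edit it with a text editor
2 When gathering statistics with dbms_stats under CBO optimization, which parameter controls whether histogram information is collected? method_opt
3 The syntax to create a public synonym tabl_syn for the table tabl is ( ) C a. create synonym table_syn on tabl b. create public synonym tabl_syn on tabl c. create public synonym tabl_syn for tabl d. create synonym table_syn for tabl. Answer: create public synonym tabl_syn for tabl
4 The smallest logical unit in Oracle: Block
5 For a table that is not updated frequently, you should set: lower PCTFREE, higher PCTFREE, lower PCTUSED
6 Oracle 9i reports ORA-4031. Which parameter in the init parameter file should you start with to resolve it? Explanation: this is a shared pool problem. Answer: shared_pool_size
7 When using LogMiner to recover archive log files, the view v$logmnr_contents does not contain ( ) a. the archive logfile path b. table_name c. SCN d. sql_redo
8 Which statement about locally managed tablespaces is true? a. Tables in locally managed tablespaces should be regularly reorganized. b. Locally managed tablespaces have dictionary intervention. c. Extent allocation information for a locally managed tablespace is stored in the tablespace itself.
9 To limit the number of connections for the database user user1, what should you do? ( ) B a. Modify the session parameter in the init file b. Create a profile profile_new with a connection limit and change user1's profile to profile_new c. Drop the user and create a new one d. Use alter user to change the user's connection count directly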

2018-10-13

CM Installation and Deployment Guide

如何在RedHat7.3安装CDH5.14.pdf (How to Install CDH 5.14 on RedHat 7.3) 1.

2018-10-13

Data Warehouse Modeling

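Guiding questions: 1. How should IBM's TDWM conceptual model be understood? 2. What are data models and data warehouse models? 3. Why are data models needed, how is a data model built, and what does the data warehouse data model architecture look like? 4. How many stages is data warehouse modeling divided into? 5. What data warehouse modeling methods are there?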

2018-09-05

Spark Official Documentation

1 Overview
Spark SQL is a Spark component for computing over structured data. It provides a programming abstraction called DataFrames, and it can also act as a distributed SQL query engine.

2 DataFrames
A DataFrame is a distributed collection of data organized into named columns. It can be thought of as a table in a relational database, or as a data frame in R/Python. DataFrames can be constructed from many sources, for example structured data files, tables in Hive, external databases, or RDDs produced during Spark computations. The DataFrame API is available in four languages: Scala, Java, Python, and R.

2.1 Starting Point: SQLContext
The main entry point of a Spark SQL program is the SQLContext class or one of its subclasses. To create a basic SQLContext, all you need is a SparkContext:

Scala
val sc: SparkContext // An existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

Java
JavaSparkContext sc = ...; // An existing JavaSparkContext.
SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);

Besides the basic SQLContext, you can also create a HiveContext. The two differ and relate as follows:
• SQLContext currently supports only the SQL parser (SQL-92 syntax).
• HiveContext currently supports both the SQL parser and the HiveQL parser, with HiveQL as the default. Users can switch to the SQL parser through configuration in order to run syntax that HiveQL does not support. With HiveContext you can use Hive UDFs, read and write Hive tables, and perform other Hive operations. SQLContext cannot operate on Hive.
• Future versions of Spark SQL will keep enriching SQLContext so that the functionality of SQLContext and HiveContext converges; eventually the two may be unified into a single Context.
• HiveContext packages the Hive dependencies. Keeping HiveContext separate means a basic Spark deployment does not need the Hive dependencies; they are added only when HiveContext is actually used.
The SQL parser is selected with the spark.sql.dialect parameter. In SQLContext, only the "sql" parser provided by Spark SQL is available. In HiveContext the default parser is "hiveql", and the "sql" parser is also supported.

2.2 Creating DataFrames
With a SQLContext, a Spark application can create DataFrames from data sources such as RDDs, Hive tables, and JSON data. The following example creates a DataFrame from a JSON file:

Scala
val sc: SparkContext // An existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.read.json("examples/src/main/resources/people.json")
// Displays the content of the DataFrame to stdout
df.show()

Java
JavaSparkContext sc = ...; // An existing JavaSparkContext.
SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);
DataFrame df = sqlContext.read().json("examples/src/main/resources/people.json");
// Displays the content of the DataFrame to stdout
df.show();

2.3 DataFrame Operations
DataFrames provide operation interfaces for Scala, Java, and Python. Below are a few examples in Scala and Java:

Scala
val sc: SparkContext // An existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// Create the DataFrame
val df = sqlContext.read.json("examples/src/main/resources/people.json")
// Show the content of the DataFrame
df.show()
// age  name
// null Michael
// 30   Andy
// 19   Justin
// Print the schema in a tree format
df.printSchema()
// root
// |-- age: long (nullable = true)
// |-- name: string (nullable = true)
// Select only the "name" column
df.select("name").show()
// name
// Michael
// Andy
// Justin
// Select everybody, but increment the age by 1
df.select(df("name"), df("age") + 1).show()
// name    (age + 1)
// Michael null
// Andy    31
// Justin  20
// Select people older than 21
df.filter(df("age") > 21).show()
// age name
// 30  Andy
// Count people by age
df.groupBy("age").count().show()
// age  count
// null 1
// 19   1
// 30   1

Java
JavaSparkContext sc = ...; // An existing JavaSparkContext.
SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);
// Create the DataFrame
DataFrame df = sqlContext.read().json("examples/src/main/resources/people.json");
// Show the content of the DataFrame
df.show();
// age  name
// null Michael
// 30   Andy
// 19   Justin
// Print the schema in a tree format
df.printSchema();
// root
// |-- age: long (nullable = true)
// |-- name: string (nullable = true)
// Select only the "name" column
df.select("name").show();
// name
// Michael
// Andy
// Justin
// Select everybody, but increment the age by 1
df.select(df.col("name"), df.col("age").plus(1)).show();
// name    (age + 1)
// Michael null
// Andy    31
// Justin  20
// Select people older than 21
df.filter(df.col("age").gt(21)).show();
// age name
// 30  Andy
// Count people by age
df.groupBy("age").count().show();
// age  count
// null 1
// 19   1
// 30   1

For the full DataFrame API, see the API Documentation. Beyond simple column references and expressions, DataFrames also come with a rich library of functions, including string manipulation, date operations, and common math operations. For details, see the DataFrame Function Reference.

2.4 Running SQL Queries Programmatically
A Spark application can run SQL queries with the sql() method of SQLContext; sql() returns the query result as a DataFrame:

Scala
val sqlContext = ... // An existing SQLContext
val df = sqlContext.sql("SELECT * FROM table")

Java
SQLContext sqlContext = ...; // An existing SQLContext
DataFrame df = sqlContext.sql("SELECT * FROM table");

2017-09-01

SQOOP导入和导出参数.pdf (Sqoop Import and Export Parameters)

Sqoop operation guide

2017-08-05
