
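Repost: Removing leftover icons on macOS

Quoted from "Removing leftover icons on macOS" (adobe licensing helper). Open the macOS Terminal, enter the command below, then enter your system login password to delete the icon left behind in Launchpad: sqlite3 $(sudo find /private/var/folders -name com.apple.dock.launchpad)/db/db "DELETE FROM apps WHERE title='adobe_l...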

2020-02-06 21:39:39 2114

Original: Night reading

Quiet reading: datebook2020.1, tactful interpersonal relationships

2020-01-29 20:26:36 156

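Original: [2] Monitoring service memory usage

1. Service memory usage monitoring script. A memory monitoring script that checks whether a service such as YourServer is using too much memory and restarts the service when usage exceeds a threshold. Create ./mointer_mem_kill.sh and make it executable: #!/bin/bash interval=120 # sampling interval: 120s mem_threshold=27648 # kill the process when it uses more than 27 GB of memory; supervisor restarts it automatically condition=...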

2019-07-02 23:02:27 150

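Original: [32] Presto liveness monitoring script

1. Service liveness monitoring script. Create the liveness monitoring script /usr/local/presto/mointer-presto-restart.sh and run chmod +x ./mointer-presto-restart.sh. The script checks for the PrestoServer process every 20s in a loop; if the process has died, it restarts it automatically and records the restart time in /usr/local/presto/log.txt. #!/bin/bash interval=20...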

2019-07-02 22:49:32 376

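Repost: [1] CPU time and wall time

Reposted from "CPU time and wall time". Process time, also called CPU time, measures the CPU resources a process uses. Process time is counted in clock ticks. Three figures are reported: real time (Real), user CPU time (User), and system CPU time (Sys). Real time is the time that actually elapsed; user time and system time are the CPU time used by the specific process. Real time is the wall-clock time from when the process starts executing until it finishes, including time slices used by other processes (tim...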

2019-06-12 16:43:21 530
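
The difference between wall time and CPU time described in the entry above can be observed directly. The following is a minimal Python sketch, not taken from the reposted article: `measure`, `sleep_briefly`, and `burn_cpu` are hypothetical helper names, and `time.process_time()` reports user plus system CPU time of the current process.

```python
import time


def measure(fn):
    """Run fn once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def sleep_briefly():
    # Sleeping lets wall time pass while using almost no CPU.
    time.sleep(0.2)


def burn_cpu():
    # A busy loop consumes CPU time as well as wall time.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


sleep_wall, sleep_cpu = measure(sleep_briefly)
busy_wall, busy_cpu = measure(burn_cpu)
print(f"sleep: wall={sleep_wall:.3f}s cpu={sleep_cpu:.3f}s")
print(f"busy:  wall={busy_wall:.3f}s cpu={busy_cpu:.3f}s")
```

For the sleeping call, wall time should be far larger than CPU time; for the busy loop the two should be close on an otherwise idle machine.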

Original: [31] Web UI for Presto

yanagishima github

2019-05-28 13:54:49 284

Repost: [30] Presto Blog

Presto Blog: Even Faster ORC, Even Faster ORC-2

2019-05-16 12:48:51 74

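Repost: Implementing year-over-year and period-over-period comparison and MTD/QTD/YTD cumulative totals in a query engine

1. Computing period-over-period and year-over-year change. (1) Using window functions. When computing monthly sales, we can use the window function LAG to get the previous month's sales: LAG(value, offset, DEFAULT) OVER (). This function returns the value of the row that lies offset rows before the current row; for example, LAG(sum(price), 1) OVER () returns the sales figure of the previous row. The month-over-month formula is (this month's sales - last month's sales)/...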

2019-04-30 16:38:36 3286
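
To illustrate what LAG does in the entry above, here is a minimal plain-Python sketch rather than the query engine implementation. The function names and the sample figures are hypothetical. The excerpt's formula is cut off, so dividing by the previous month's sales is an assumption based on the usual definition of month-over-month growth.

```python
def lag(values, offset=1, default=None):
    """Mimic LAG(value, offset, default) OVER () on an ordered list."""
    return [
        values[i - offset] if i - offset >= 0 else default
        for i in range(len(values))
    ]


def month_over_month(sales):
    """Return (current - previous) / previous per month, None when undefined.

    Assumption: the denominator is the previous month's sales.
    """
    growth = []
    for current, previous in zip(sales, lag(sales, 1)):
        if previous is None or previous == 0:
            growth.append(None)  # first month, or division by zero
        else:
            growth.append((current - previous) / previous)
    return growth


# Hypothetical monthly sales totals, ordered by month.
monthly_sales = [100, 120, 90]
previous_sales = lag(monthly_sales)
growth_rates = month_over_month(monthly_sales)
print(previous_sales, growth_rates)
```

The first month has no predecessor, so both the lagged value and the growth rate are `None`, which matches how LAG yields its default for rows without a preceding row.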

Original: [29] Presto window function

presto:default> select id, name, sum(age) as age_num, sum(sum(age) ) over (partition by name) from mysql.dbtest_1.student group by name,id; id | name | age_num | _col3----+------------+----...

2019-04-29 21:03:00 250
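
The query in the entry above nests an aggregate inside a window function: `sum(age)` is computed per `(name, id)` group first, and `sum(...) over (partition by name)` then totals those group sums per name. The two stages can be sketched in plain Python as below. This is only an illustration of the evaluation order, not Presto code; the function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def grouped_sum_with_partition_total(rows):
    """rows: iterable of (id, name, age) tuples.

    Returns (id, name, age_num, name_total) tuples, where age_num is the
    GROUP BY name, id sum and name_total is the per-name window sum.
    """
    # Stage 1: GROUP BY name, id -> sum(age)
    group_sums = defaultdict(int)
    for id_, name, age in rows:
        group_sums[(id_, name)] += age

    # Stage 2: sum(sum(age)) OVER (PARTITION BY name)
    name_totals = defaultdict(int)
    for (_, name), age_num in group_sums.items():
        name_totals[name] += age_num

    return [
        (id_, name, age_num, name_totals[name])
        for (id_, name), age_num in sorted(group_sums.items())
    ]


# Hypothetical student rows: (id, name, age)
students = [(1, "tom", 10), (1, "tom", 5), (2, "tom", 20), (3, "ann", 7)]
result = grouped_sum_with_partition_total(students)
print(result)
```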

Repost: MySQL Window Functions

Original article: MySQL Window Functions. Summary: in this tutorial, you will learn about the MySQL window functions and their useful applications in solving analytical query challenges. MySQL has supported window ...

2019-04-29 18:14:01 220

Original: [28] Presto type coercion

HiveCoercionPolicyCoercerExpressionAnalyzer::getOperatorTypeRegistry::public Optional<Type> coerceTypeBase(Type sourceType, String resultTypeBase)

2019-04-28 14:07:56 384

Repost: How to Write a Git Commit Message

How to Write a Git Commit Message

2019-04-16 15:48:17 65

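Original: [1] Installing and debugging Kylin on a single Mac

Install the Hadoop ecosystem first; see "Notes on installing open-source software on a single Mac". Kylin installation and configuration: to be added. Single-machine debugging: to be added.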

2019-03-24 11:33:46 220

Translation: [1] Hbase-overview

HBase - Overview. Original article: hbase_overview. Limitations of Hadoop: data is processed sequentially in batches, so even a simple job has to scan the whole dataset. Hadoop can perform only batch processing, data will be accessed only in a sequential manner. That means one has to s...

2019-03-22 23:03:22 63

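Original: [4] Carbondata integration: querying carbondata from presto

1. Build carbondata to obtain the presto connector jars. See: "Building CarbonData and possible dependency errors". In the plugin directory under the presto installation directory (version 0.210+ is recommended; otherwise the SPI interfaces do not match and presto cannot recognize carbondata), create a carbondata directory and copy the jars produced by the carbondata build into it: cd plugin, mkdir carbondata...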

2019-03-22 22:44:58 482

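Original: [3] Building CarbonData and possible dependency errors

carbondata build dependency errors. (1) Building the carbondata master branch fails: Could not resolve dependencies for project org.apache.carbondata:carbondata-core:jar:1.6.0-SNAPSHOT. This may be because dependencies of the current master branch do not exist in the Maven Central repository. For example, the master branch is currently 1....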

2019-03-22 22:04:16 272

Original: [2] Installing and Configuring CarbonData to run locally with Spark Shell

Official documentation: Quick Start. Run locally with Spark Shell: bin/spark-shell --jars /Users/xxx/Documents/software/carbondata/apache-carbondata-1.5.2-bin-spark2.2.1-hadoop2.7.2.jar ..... Welcome to ____ ...

2019-03-18 15:11:10 116

Original: [1] CarbonData Introduction And Docs notes

CarbonData. Apache CarbonData is a new big data file format for faster interactive query using advanced columnar storage, index, compression and encoding techniques to improve computing efficiency, whi...

2019-03-18 12:27:40 98

Repost: Query Router (pseudo pushdown) on kylin

apache_kylin_cube_queries, apache_kylin_data_source_sdk

2019-03-06 10:27:55 73

Repost: Introduction to Presto Cost-Based Optimizer notes

Original article: introduction-to-presto-cost-based-optimizer. Introduction: The Cost-Based Optimizer (CBO) we have released just recently achieves stunning results in industry standard benchmarks (and not only in b...

2019-03-03 15:02:03 261

Repost: Presto join enumeration notes

Original article: presto-join-enumeration. Query improvements: Incorporating join enumeration into Presto means that your queries can automatically run faster without manual adjustments. Such manual adjustments are ...

2019-03-03 14:37:21 291

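Original: How to find the threads using CPU within a process

How to find the threads using CPU within a process. (1) Find the process PID: top -c or jps -v. (2) Find the thread IDs within the process: top -Hp PID. (3) Convert the thread ID to hexadecimal (this corresponds to nid): printf %x PID (the thread ID). (4) Inspect the specific thread: jstack PID | grep <the thread ID as a 4-digit hexadecimal number> -C5 --color to locate the thread, or jstack [PID] > jstac...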

2019-02-28 15:38:50 1114
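
Steps (3) and (4) of the entry above (convert the decimal thread ID to hexadecimal, then search the jstack output for the matching `nid`) can be sketched in Python as follows. This is a hedged illustration, not the author's tooling: the function names and the sample dump text are hypothetical, and the sketch only relies on jstack printing native thread IDs as `nid=0x...`.

```python
def thread_id_to_nid(thread_id):
    """Format a decimal thread ID the way jstack prints it: nid=0x<hex>."""
    return f"nid=0x{thread_id:x}"


def find_thread_lines(jstack_dump, thread_id):
    """Return the header lines of a jstack dump that belong to thread_id."""
    marker = thread_id_to_nid(thread_id)
    # Compare whole whitespace-separated tokens so 0x1a2b does not match 0x1a2bc.
    return [line for line in jstack_dump.splitlines() if marker in line.split()]


# Hypothetical, shortened jstack output used only for illustration.
sample_dump = (
    '"main" #1 prio=5 os_prio=0 tid=0x00007f0000000001 nid=0x1a2b runnable\n'
    '"worker" #2 prio=5 os_prio=0 tid=0x00007f0000000002 nid=0x1a2bc waiting\n'
)

nid = thread_id_to_nid(6699)
matches = find_thread_lines(sample_dump, 6699)
print(nid, matches)
```

In practice the dump would come from redirecting `jstack PID` to a file and reading it, and the thread ID from the `top -Hp PID` listing.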

原创 MySQL storage engine

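MySQL storage engine. Documentation. In MySQL, the storage engine determines the table type (Storage engines are MySQL components that handle the SQL operations for different table types), which shows in the level of support for the following features: storage mechanism, foreign keys, indexing techniques, locking level, transaction support (Transactions), XA, S...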

2019-02-18 12:53:01 216

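Original: PretenureSizeThreshold

I have been tuning a project's JVM parameters recently, so I am gradually collecting my everyday tuning notes here for reference. XX:PretenureSizeThreshold: when an object's size exceeds this value, the object is allocated directly in the old generation. Frequently Asked Questions about Garbage Collection: There is a flag (available in 1.4.2 and l...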

2019-01-28 22:52:36 877

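Repost: Master the core techniques of Sql-On-Hadoop in one article

Reposted from: "Master the core techniques of Sql-On-Hadoop in one article". 1. Classifying SQL On Hadoop. 1.1 Classification by query latency. With so many SQL On Hadoop systems, it is useful to classify them. In general, users care most about query latency; based on the time from query submission to result return, SQL queries fall into three classes: batch SQL, interactive SQL, operation SQL, as shown in Figure 1. ...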

2019-01-19 07:12:41 238

Original: Query optimization

Query optimization. 1. join Optimizer. 1.1 Join optimization based on CBO (Cost-Based Optimizer): cost-based-optimizer-in-apache-spark-2-2, presto-join-enumeration, join optimization. 2. Hash aggregation: Hash aggregation. References: PostgreSQL技术内幕, 数据库查询优化...

2019-01-16 23:13:26 68

Translation: JDBC Tutorial Reading Notes

What is JDBC? JDBC stands for Java Database Connectivity, which is a standard Java API for database-independent connectivity between the Java programming language and a wide range of databases. The JD...

2019-01-10 21:59:39 126

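Original: Demo: Spark reads HDFS (a Hive table), processes the data, and writes the results to a Hive table

1. Implementation logic of a query engine stress-test demo. I had not written a Spark project in a long time. Recently I needed a stress-testing tool for a query engine that replays the queries written to hive (HDFS) each day to test engine performance, which is a good fit for Spark reading HDFS and writing the results to hive. Here is a small demo as a summary. (1) Implementation logic: Spark reads the queries stored in HDFS for an arbitrary day (passed in as a parameter; the second column of the hive_test.engine_queryjson table is the query) and submits them to the engine at 2-second intervals...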

2019-01-08 21:20:49 2853 1

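Original: [27] A brief guide to developing a Presto Event Listener Plugin

1. Event Listener. The presto Event Listener is a plugin that listens for the following events: Query creation (information about the creation of a query); Query completion (success or failure) (information about query execution, including details of successful queries and error codes of failed queries); Split completion (success or failure) (split execution...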

2018-12-22 14:01:00 489

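Original: Hive Create table as select

Create/Drop/Truncate Table. While troubleshooting an issue I hit a Spark bug in Create table as, which was annoying. create table xxx as select: create table table1 as select * from table2 where 2=3; creates table1 with the same structure as table2, copying only the structure and not the data. create table...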

2018-12-21 17:12:16 8781 1
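
The `where 2=3` trick in the entry above (a predicate that is always false, so the new table gets the columns but no rows) can be tried out without a Hive cluster. The sketch below uses Python's built-in `sqlite3` module as a stand-in, which is an assumption made for illustration: SQLite is not Hive, and details such as column types and storage formats differ. The helper name and the sample rows are hypothetical.

```python
import sqlite3


def copy_structure_only(conn, source, target):
    """Create target with the columns of source but none of its rows."""
    # The predicate 2=3 is never true, so the SELECT returns no rows.
    conn.execute(f"CREATE TABLE {target} AS SELECT * FROM {source} WHERE 2=3")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table2 (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(1, "a"), (2, "b")])

copy_structure_only(conn, "table2", "table1")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(table1)")]
row_count = conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0]
print(columns, row_count)
```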

Original: [26] A brief guide to developing Presto functions

1. Presto function registration. Presto functions are registered through FunctionRegistry, which is initialized in MetadataManager. 1.1 MetadataManager: @Inject public MetadataManager(FeaturesConfig featuresConfig, TypeManager type...

2018-12-18 20:48:25 903

Original: [6] Hive3.x SemanticAnalyzer and CalcitePlanner: materialized view source code, part 02

Continued from "Hive3.x SemanticAnalyzer and CalcitePlanner: materialized view source code". SemanticAnalyzer: void analyzeInternal(ASTNode ast, PlannerContextFactory pcf) { .... // 1. Generate Resolved Parse tree from syntax tree...

2018-12-08 15:40:44 352

Original: [5] Hive3.x Query Results Caching

Hive Query Results Caching. Design Docs: Query Results Caching. Hive Query Results Caching related setting parameters: <property> <name>hive.query.results.cache.enabled</name> &...

2018-12-08 12:18:06 231

Original: [4] Hive3.x SemanticAnalyzer and CalcitePlanner: materialized view source code, part 01

SemanticAnalyzer: void analyzeInternal(ASTNode ast, PlannerContextFactory pcf) { .... // 1. Generate Resolved Parse tree from syntax tree boolean needsTransform = needsTransform(); ...

2018-12-04 21:37:52 305

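Translation: [3] Hive3.x Materialized view

Hive Materialized views. LLAP. Objectives: In general, the most effective way to accelerate queries is the pre-computation of relevant summaries, or materialized views. Hive 3.0 begins introducing materialized views and provides automatic query rewriting against them (implemented on Apache Calcite). Notably, in 3.0 ...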

2018-12-02 15:51:07 1124

Translation: Lambda architecture and Kappa architecture.

Lambda architecture and kappa architecture. Reading notes from Mastering Azure Analytics by Zoiner Tejada. Lambda Architecture: Lambda architecture was originally proposed by the creator of Apache Storm, Nathan ...

2018-12-02 14:39:12 428

Translation: ch6 Ways to Restructure Queries

High.Performance.MySQL reading notes. chapter 6 Query Performance Optimization: Ways to Restructure Queries. As you optimize problematic queries, you can sometimes transform queries int...

2018-12-01 22:48:13 76

Translation: ch6 Optimize Data Access

chapter 6 Query Performance Optimization: Optimize Data Access. Slow Query Basics: Optimize Data Access. We’ve found it useful to analyze a poorly performing query in two steps: Find out whether your ap...

2018-12-01 22:12:41 65

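Original: [2] Hive3.x query flow source code: CLI side, part 01

Hive architecture diagram: "Hive principles and source code analysis: Hive source architecture and theory". Hive3.x installation and preparation: for details see "Hive3.x installation and debug". 1. Submit a query from the Hive command line: SELECT deptno, count(deptname) as deptno_cnt from hive3_test.depts group by deptno; 2. CliDriver receives the query: public ...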

2018-11-29 20:55:47 449

Original: [1] Hive3.x installation and debug

1. Download and install hive3.1.1 (download link). Edit hive-env.sh, for example: HADOOP_HOME=/Users/xxx/software/hadoop/hadoop-2.7.4 export HIVE_CONF_DIR=/Users/xxx/software/hive/conf export HIVE_AUX_JARS_PATH=/Users/xxx//software/hive/lib Create...

2018-11-29 20:49:03 802
