December 2018 - zhixingheyi_tian


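[Original] Kafka installation

For a single-node Kafka installation, see the official site. Distributed installation: download the latest stable Kafka release from http://mirror.bit.edu.cn/apache/kafka/2.1.0/kafka_2.11-2.1.0.tgz and extract it. Edit the configuration file config/server.properties and change two settings: broker.id=541 and zookeeper.connect=sr541:2181,sr553:2181,sr554...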

2018-12-26 16:12:04 113

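[Original] ZooKeeper installation

Cluster installation. Edit the configuration file: download zookeeper-3.4.13.tar.gz, extract it, enter the conf directory, and run cp zoo_sample.cfg zoo.cfg. Then edit zoo.cfg: # The number of milliseconds of each tick tickTime=2000 # The number of ticks that the initial # synchroniza...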

2018-12-26 15:50:55 169

[Original] spark-shell

spark-shell is just a script that calls spark-submit internally: function main() { if $cygwin; then # Workaround for issue involving JLine and Cygwin # (see http://sourceforge.net/p/jline/bugs/40/). # If you're usi...

2018-12-23 11:43:12 395 1

[Original] Exploring DataSet

Overview. Before Spark 2.0, the main programming interface of Spark was the Resilient Distributed Dataset (RDD). After Spark 2.0, RDDs are replaced by Dataset, which is strongly-typed like an RDD, but with r...

2018-12-20 17:24:43 274

[Original] Spark Shared Variables

Broadcast variables and accumulators. Normally, when a function passed to a Spark operation (such as map or reduce) is executed on a remote cluster node, it works on separate copies of all the variable...

2018-12-20 16:45:57 201

[Original] Spark: SparkContext

Initializing Spark. The first thing a Spark program must do is to create a SparkContext object, which tells Spark how to access a cluster. To create a SparkContext you first need to build a SparkConf o...

2018-12-20 09:33:09 225 1

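[Original] Hadoop commands

hdfs fsck: HDFS provides the fsck command for checking the health of files and directories on HDFS and for retrieving a file's block information and block locations. hadoop fsck hdfs://bdpe101:9000/user/oap/oaptest/0.5-allFileFormat/tpcds/parquet_tpcds_2/store_sales/part-00064-83f1c5b1-1f95-4830...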

2018-12-19 17:17:40 171

[Original] Exploring RDD

Overview. At a high level, every Spark application consists of a driver program that runs the user's main function and executes various parallel operations on a cluster. RDD. The main abstraction Spark provi...

2018-12-19 15:28:23 284 1

[Original] Spark: persist

persist: /** * Set this RDD's storage level to persist its values across operations after the first time * it is computed. This can only be used to assign a new storage level if the RDD does not...

2018-12-18 16:29:09 609 1

[Original] Spark components

SparkEnv: SparkEnv is Spark's execution environment object; it lives in the driver or executor process. BlockManager: both the driver application and the executors create a BlockManager. Manager running on every node (driver and executors) which provides interfaces ...

2018-12-18 15:05:15 166

[Original] Spark: FileFormat

Every FileFormat implements inferSchema, but it is called only once, at initialization. ParquetFileFormat: Spark obtains a Parquet schema by launching a job. /** * Figures out a merged Parquet schema with a distributed Spark job. * * Note that lo...

2018-12-17 11:43:04 483

[Original] spark on yarn

Apache Hadoop YARN. Concept: The fundamental idea of YARN is to split up the functionalities of resource management and job scheduling/monitoring into separate daemons...

2018-12-13 12:53:55 125

[Original] Spark deployment modes and launched processes

Spark Standalone Mode. Launching Spark Applications. The spark-submit script provides the most straightforward way to submit a compiled Spark application to the cluster. For standalone cl...

2018-12-12 15:45:56 530

[Original] OAP read parquet

Spark 2.1. FileScanRDD: private def nextIterator(): Boolean = { ... currentIterator = readFunction(currentFile) ... } OptimizedParquetFileFormat: override def buildReaderWithPartitionValues( sparkS...

2018-12-12 13:26:02 332

[Original] Reading and writing Parquet with Spark

SQLConf: // This is used to set the default data source val DEFAULT_DATA_SOURCE_NAME = buildConf("spark.sql.sources.default") .doc("The default data source to use in input/output.") .stringCo...

2018-12-10 15:47:41 3062 1

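[Original] C language notes: dynamic libraries

Check which libraries an executable depends on: ldd test. To add dynamic library search paths, edit the file with vim /etc/ld.so.conf, which contains the line include ld.so.conf.d/*.conf. Then load the updated configuration: ldconfig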

2018-12-04 17:34:41 457 1

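[Original] Scala: the case keyword

Benefits of declaring a class with case: it creates the case class together with its companion object; it implements an apply method, so you can create instances without new; it adds val by default to every parameter in the primary constructor's parameter list; it adds natural hashCode, equals, and toString methods. Because == in Scala always means equals, case class instances are always comparable. The following three operations are equivalent: val ...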

2018-12-04 14:16:02 652

[Original] Dataset schema

/**
 * Returns the schema of this Dataset.
 *
 * @group basic
 * @since 1.6.0
 */
def schema: StructType = queryExecution.analyzed.schema

2018-12-04 13:20:02 591

[Original] memkind: allocating and freeing memory

#include <memkind.h> memkind_create_pmem: /// /// \brief Create a new PMEM (file-backed) kind of given size on top of a temporary file /// in the given directory dir /// \note STANDARD API...

2018-12-03 10:45:56 584

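[Original] malloc and free

void *malloc(size_t size); void * denotes a pointer of undetermined type that can point to data of any type. More precisely, when the memory is requested it is not yet known what type of data the user will store in it (char, int, or some other type). void free(void *ptr); is used as the counterpart of malloc() and releases dynamic memory that malloc allocated. (Also: for the statement free(p), if p is...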

2018-12-03 10:31:52 183

Spring Boot in Action

Spring Boot in Action is a developer-focused guide to writing applications using Spring Boot. In it, you'll learn how to bypass the tedious configuration steps so you can focus on your application's behavior. Spring expert Craig Walls uses interesting and practical examples to teach you both how to use the default settings effectively and how to override and customize Spring Boot for your unique environment. Along the way, you'll pick up insights from Craig's years of Spring development experience.

2017-11-23


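Understanding Spark in Depth: Core Ideas and Source Code Analysis.pdf

From Paxos to ZooKeeper: a Chinese-authored book on distributed systems practice

Machine Learning in Action (English edition): all code and datasets from the book

Machine Learning in Action (English edition)

Machine Learning in Action (Chinese edition)

Database System Implementation (English, second edition)

yammer metrics-2.2.0 source code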
