- Blog posts (20)
- Resources (11)
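Original: Kafka installation
For a single-node Kafka installation, see the official site. Distributed installation: download the latest stable Kafka release (http://mirror.bit.edu.cn/apache/kafka/2.1.0/kafka_2.11-2.1.0.tgz), extract it, and edit config/server.properties, changing two settings: broker.id=541 and zookeeper.connect=sr541:2181,sr553:2181,sr554...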
2018-12-26 16:12:04 113
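Original: ZooKeeper installation
Cluster installation. Edit the configuration file: download zookeeper-3.4.13.tar.gz, extract it, enter the conf directory, and run cp zoo_sample.cfg zoo.cfg. Then edit zoo.cfg:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchroniza...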
2018-12-26 15:50:55 169
Original: spark-shell
spark-shell is just a script that invokes spark-submit internally:
function main() {
  if $cygwin; then
    # Workaround for issue involving JLine and Cygwin
    # (see http://sourceforge.net/p/jline/bugs/40/).
    # If you're usi...
2018-12-23 11:43:12 395 1
Original: Exploring DataSet
Overview: Before Spark 2.0, the main programming interface of Spark was the Resilient Distributed Dataset (RDD). After Spark 2.0, RDDs are replaced by Dataset, which is strongly-typed like an RDD, but with r...
2018-12-20 17:24:43 274
Original: Spark Shared Variables
Broadcast variables and accumulators. Normally, when a function passed to a Spark operation (such as map or reduce) is executed on a remote cluster node, it works on separate copies of all the variable...
2018-12-20 16:45:57 201
Original: Spark: SparkContext
Initializing Spark. The first thing a Spark program must do is to create a SparkContext object, which tells Spark how to access a cluster. To create a SparkContext you first need to build a SparkConf o...
2018-12-20 09:33:09 225 1
Original: Hadoop commands
hdfs fsck: HDFS provides the fsck command for checking the health of files and directories on HDFS and for retrieving the block and location information of files. hadoop fsck hdfs://bdpe101:9000/user/oap/oaptest/0.5-allFileFormat/tpcds/parquet_tpcds_2/store_sales/part-00064-83f1c5b1-1f95-4830...
2018-12-19 17:17:40 171
Original: Exploring RDD
Overview: At a high level, every Spark application consists of a driver program that runs the user’s main function and executes various parallel operations on a cluster. RDD: The main abstraction Spark provi...
2018-12-19 15:28:23 284 1
Original: Spark: persist
persist
/**
 * Set this RDD's storage level to persist its values across operations after the first time
 * it is computed. This can only be used to assign a new storage level if the RDD does not...
2018-12-18 16:29:09 609 1
Original: Spark components
SparkEnv: SparkEnv is Spark's execution environment object; it lives in the driver or executor process. BlockManager: both the driver application and the executors create a BlockManager. Manager running on every node (driver and executors) which provides interfaces ...
2018-12-18 15:05:15 166
Original: Spark: FileFormat
Every FileFormat implements inferSchema, but it is called only once, at initialization. ParquetFileFormat: Spark obtains the Parquet schema by launching a job.
/**
 * Figures out a merged Parquet schema with a distributed Spark job.
 *
 * Note that lo...
2018-12-17 11:43:04 483
Original: Spark on YARN
Apache Hadoop YARN. Concept: The fundamental idea of YARN is to split up the functionalities of resource management and job scheduling/monitoring into separate daemons...
2018-12-13 12:53:55 125
Original: Spark deployment modes and launched processes
Spark Standalone Mode. Launching Spark Applications: The spark-submit script provides the most straightforward way to submit a compiled Spark application to the cluster. For standalone cl...
2018-12-12 15:45:56 530
Original: OAP read parquet
Spark 2.1
FileScanRDD
private def nextIterator(): Boolean = {
  ...
  currentIterator = readFunction(currentFile)
  ...
}
OptimizedParquetFileFormat
override def buildReaderWithPartitionValues( sparkS...
2018-12-12 13:26:02 332
Original: Reading and writing Parquet with Spark
SQLConf
// This is used to set the default data source
val DEFAULT_DATA_SOURCE_NAME = buildConf("spark.sql.sources.default")
  .doc("The default data source to use in input/output.")
  .stringCo...
2018-12-10 15:47:41 3062 1
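Original: C notes: dynamic libraries
Check which libraries an executable depends on:
ldd test
Edit /etc/ld.so.conf (vim /etc/ld.so.conf) to add dynamic library search paths:
include ld.so.conf.d/*.conf
./
Load the updated configuration file:
ldconfig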
2018-12-04 17:34:41 457 1
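Original: Scala: the case keyword
Benefits of declaring a class with case: it creates the case class together with its companion object; it implements an apply method, so you can create instances without new; it implicitly prefixes every parameter in the primary constructor's parameter list with val; and it adds natural hashCode, equals, and toString methods. Because == in Scala always means equals, case class instances are always comparable. The following three operations are equivalent: val ...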
2018-12-04 14:16:02 652
Original: Dataset schema
/**
 * Returns the schema of this Dataset.
 *
 * @group basic
 * @since 1.6.0
 */
def schema: StructType = queryExecution.analyzed.schema
2018-12-04 13:20:02 591
Original: memkind memory allocation and release
#include <memkind.h>
memkind_create_pmem
///
/// \brief Create a new PMEM (file-backed) kind of given size on top of a temporary file
/// in the given directory dir
/// \note STANDARD API...
2018-12-03 10:45:56 584
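Original: malloc and free
void *malloc(size_t size); void * denotes a pointer of undetermined type, so a void * can point to data of any type. More precisely, when the memory is requested it is not yet known what type of data the user will store in it (char, int, or some other type). void free(void *ptr); Used as the counterpart of malloc(), it releases dynamic memory that malloc allocated. (Also: for the statement free(p), if p is...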
2018-12-03 10:31:52 183
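As a minimal sketch of the malloc/free pairing described in the malloc and free entry above, the C fragment below allocates an int array and releases it again. The helper names make_squares and release are hypothetical, chosen only for illustration; they do not come from the original post.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an array of n ints holding 0, 1, 4, 9, ...
 * Returns NULL if n is 0, the size would overflow, or malloc fails.
 * The caller owns the returned memory and must release it. */
int *make_squares(size_t n)
{
    if (n == 0 || n > SIZE_MAX / sizeof(int)) {
        return NULL;
    }
    /* malloc returns void *, which converts implicitly to int * in C. */
    int *values = malloc(n * sizeof *values);
    if (values == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        values[i] = (int)(i * i);
    }
    return values;
}

/* Hypothetical helper: free the memory and reset the caller's pointer,
 * so the pointer cannot be used or freed a second time by accident. */
void release(int **p)
{
    free(*p);   /* passing a null pointer to free is a no-op in standard C */
    *p = NULL;
}
```

A caller would write int *v = make_squares(4); check v against NULL, use v[0] through v[3], and finish with release(&v). Resetting the pointer is a design choice rather than a requirement; it simply makes a repeated release harmless.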
Spring Boot in Action
2017-11-23
从PAXOS到ZOOKEEPER (From Paxos to ZooKeeper): a hands-on technical book on distributed systems by a Chinese author
2017-09-22
Machine Learning in Action (English edition): all code and datasets from the book
2017-09-13
Machine Learning in Action (English edition)
2017-09-13
Database System Implementation (second edition, English)
2017-09-11
yammer metrics-2.2.0 source code
2017-09-06