uestzengting - CSDN Blog

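[Original] The powerful screen command in Linux

Today I discovered a gem: the Linux screen command. For remote logins it provides not only nohup-like behavior but also a "multiple desktops" feature that I like a lot. I usually work in a single PuTTY session and often need to switch back and forth between two programs. How? Ctrl-Z with fg and bg? Too cumbersome. With screen, switching becomes quick and easy. I mainly drew on the following two...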

2013-08-24 09:56:35 211

[Original] Configuring the rsync service on Ubuntu

1. Create /etc/rsyncd.conf with the following content: motd file = /etc/rsyncd.motd pid file = /var/run/rsyncd.pid lock file = /var/run/rsyncd.lock log file = /var/log/rsyncd.log [...

2013-08-24 08:38:36 603

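1. How does a Bloom filter work? See http://hi.baidu.com/yizhizaitaobi/blog/item/cc1290a0a0cd69974610646f.html 2. What role does the Bloom filter play in HBase? HBase uses Bloom filters to improve the performance of random reads (Get); for sequential reads (Scan), configuring a Bloom filter has no effect (from 0.92 onward, if you have set...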

2013-08-14 11:36:53 320
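
To make the Get-versus-Scan point in the Bloom filter entry above concrete, here is a minimal Bloom filter sketch in Java. It is an illustration only, not HBase's implementation: the class name SimpleBloomFilter, the bit-array size, and the double-hashing scheme are all assumptions chosen for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

/** Illustrative Bloom filter (hypothetical class, not HBase code). */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** FNV-1a, used here as the second base hash. */
    private static int fnv1a(byte[] data) {
        int hash = 0x811c9dc5;
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= 16777619;
        }
        return hash;
    }

    /** Derives the i-th bit position from two base hashes (double hashing). */
    private int position(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = fnv1a(key);
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(key, i));
        }
    }

    /** false means "definitely absent"; true means "possibly present". */
    boolean mightContain(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1024, 3);
        filter.add("row-001");
        System.out.println(filter.mightContain("row-001")); // an added key is always reported
        System.out.println(filter.mightContain("row-999")); // usually false; false positives can occur
    }
}
```

A point lookup can skip any file whose filter answers "definitely absent", which is why a Get benefits. A Scan walks a range of keys, so a per-key membership test gives it nothing to skip, which matches the entry's remark.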

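[Original] Upgrading MySQL from the CentOS 5.5 repository

Environment: CentOS 5.5. The MySQL in the CentOS 5.5 repository is still at 5.0.19, but setting up master-slave replication requires 5.1 or later, so I went straight to 5.5. 1. Install the yum repository for MySQL 5.5.x: rpm -Uvh http://repo.webtatic.com/yum/centos/5/latest.rpm 2. Install the MySQL client support packages: yum install li...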

2013-08-08 14:03:16 157

[Original] Some HBase SQL solutions found online

http://code.google.com/p/hbase-sql/ https://github.com/forcedotcom/phoenix http://www.cnblogs.com/RicCC/archive/2008/03/09/OQL-ANTLR-SQL-Parser.html http://lxy19791111.iteye.com/ htt...

2013-07-11 14:56:03 168

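[Original] Algorithm Grocery Store: Naive Bayesian classification

1.1 Abstract. Bayesian classification is the general name for a family of classification algorithms that are all based on Bayes' theorem. As the first article in the series on classification algorithms, this post first introduces the classification problem and gives it a formal definition. It then introduces Bayes' theorem, the foundation of Bayesian classification. Finally, it uses an example to discuss the simplest kind of Bayesian classification...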

2013-06-21 14:44:26 125
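
For reference, the theorem that the naive Bayes entry above builds on can be written as follows, together with the usual naive Bayes decision rule under the assumption that features are conditionally independent given the class. The symbols (events A and B, class y, features x_1 to x_n) are generic notation assumed here, not taken from the post.

```latex
P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)}
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```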

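[Original] Getting started with WebDriver (Java)

How to open a browser with WebDriver. The browsers we commonly use are Firefox and IE. Firefox is the browser Selenium supports most maturely, and many new features show up in Firefox first. For page testing, however, it starts slowly, although its speed once running is acceptable. Starting Firefox: create a new FirefoxDriver. If Firefox is not installed in the default location on drive C, you need to specify its path: System.setP...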

2013-04-30 11:05:14 146

CamStudioV2GR: a decent screen recorder, handy for making training videos

2013-03-12 17:12:38 364

[Original] Configuring Spring MVC 3 for Ajax

1. Add <mvc:annotation-driven /> to springmvc-servlet.xml: <context:component-scan base-package="coin" resource-pattern="*...

2013-03-05 14:17:37 106

MySQL system parameter configuration

1. System variable reference: http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html 2. Slow query log settings: slow-query-log=1 slow_query_log_file=/var/log/mysql/mysql-slow.log long_query_time=2 3. Character set settings...

2012-09-16 08:37:09 118

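Integrating Apache and Tomcat on Ubuntu

This post explains how to integrate apache2 with Tomcat on Ubuntu, assuming both are already installed. First install mod_jk, the module that forwards requests to Tomcat: sudo apt-get install libapache2-mod-jk. After installation, a jk.load file appears under /etc/apache2/mods-enabled. After restarting apache2, apach...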

2012-09-15 12:09:40 132

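[Original] IN and EXISTS

IN performs a hash join between the outer and inner tables, whereas EXISTS loops over the outer table and queries the inner table on each iteration. The claim that EXISTS is more efficient than IN is incorrect. If the two tables are about the same size, IN and EXISTS perform similarly. If one table is large and the other small, use EXISTS when the subquery table is the large one and IN when the subquery table is the small one. For example, with table Athor (small) and table Winner (large): SELECT * F...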

2012-09-14 14:52:39 121

Common Maven commands

1. Create a project: mvn archetype:create -DgroupId=test -DartifactId=test 2. Download jars and sources: mvn eclipse:eclipse -DdownloadSources=true 3. Package while skipping tests: mvn package -Dmaven.test.skip=true Common Maven commands: 1. Create a plain Maven ja...

2012-07-30 14:51:50 130

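[Original] Web service configuration

I. Preparation. 1. Download the apache-cxf distribution from http://cxf.apache.org/download.html; I chose version 2.4.1. II. Publishing web services. 1. Create a new web project, add all jars from apache-cxf-2.4.1\lib, and write the interface and implementation of the web service to publish. This step is the same as before. 1) Create a...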

2012-06-20 14:55:20 219

[Original] Remote debugging with Eclipse

http://www.ibm.com/developerworks/cn/opensource/os-eclipse-javadebug/ Noting this down to read later.

2012-05-25 13:52:01 74

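[Original] Moving a foreground scp to the background

Use the kill command to move it to the background: first find the process ID with ps -a, then pause the process with sudo kill -stop procNumber. Next, run jobs to find the job number of the process you want to background, and finally move it to the background with bg jobNumber. It looks roughly like this: ps -a PID TTY TIME CMD 6729 pts/0 00:0...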

2012-03-21 14:14:46 467

[Original] Java thread pools

Using the thread pool in Java JDK 1.5. I. Introduction. The thread pool class is java.util.concurrent.ThreadPoolExecutor, and its commonly used constructor is: ThreadPoolExecutor(int corePoolSize, int maximumPoolSize, long keepAliveTime, TimeUnit unit, BlockingQueue...

2012-02-02 16:19:49 108
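
As a small illustration of the ThreadPoolExecutor entry above, the sketch below builds a pool with the JDK's five-argument constructor and collects results through Future. The class name ThreadPoolDemo, the pool sizes, and the queue capacity are arbitrary assumptions for the example, not values from the post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo class; sizes are illustrative only. */
final class ThreadPoolDemo {

    /**
     * Squares 1..n on a small pool and returns the sum.
     * With 4 threads at most and a queue of 16, keep n at or below 20
     * so that no task is rejected.
     */
    static int sumOfSquares(int n) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2,                                   // corePoolSize
                4,                                   // maximumPoolSize
                30L, TimeUnit.SECONDS,               // keepAliveTime for idle extra threads
                new ArrayBlockingQueue<Runnable>(16) // bounded work queue
        );
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int x = i;
                futures.add(pool.submit(() -> x * x));
            }
            int sum = 0;
            for (Future<Integer> future : futures) {
                sum += future.get(); // waits for each task in submission order
            }
            return sum;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(5)); // prints 55
    }
}
```

With this configuration the pool starts two core threads, queues up to 16 further tasks, and only then grows toward four threads, so the queue capacity matters as much as maximumPoolSize.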

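Locating a bug in HBase bulkload

During an HBase bulkload, if both the rowkey and the version are identical, the most recently imported data cannot be read. Diagnosis: in HBase, when two HFiles contain data with the same rowkey and version, MAX_SEQ_ID_KEY in each HFile's fileinfo decides which file is newer; the file with the larger MAX_SEQ_ID_KEY is the newer one. 1. HFiles written through a flush do write into fileinf...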

2011-12-30 11:13:12 188

[Original] Lesser-used HBase commands

1. Inspect the contents of an HFile: hbase org.apache.hadoop.hbase.io.hfile.HFile usage: HFile [-a] [-f ] [-k] [-m] [-p] [-r ] [-v] -a,--checkfamily Enable family check -f,--file File to scan. Pass full-path;...

2011-12-30 10:38:37 134

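Analysis of the HBase region compaction algorithm

HBase's region compaction algorithm is a multi-way merge external sort. Its defining trait is that each input file is already sorted: all the files are opened together and traversed in order while their current head records are compared, and the result is merged into a single output file. An in-memory heap orders the head records of the files being traversed. In HBase's implementation, the following five classes matter most. 1. org.apache.hadoop.hbase.regionserver.Store 2. org.ap...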

2011-12-08 15:06:14 219
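
The heap-based multi-way merge described in the compaction entry above can be sketched in a few lines of Java. This is a generic k-way merge over in-memory integer lists, assumed here as a stand-in for sorted files; the class name KWayMerge is hypothetical and none of this is HBase code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Generic k-way merge sketch (hypothetical class, not HBase code). */
final class KWayMerge {

    /** The current head record of one sorted input plus the iterator for the rest. */
    private static final class Head {
        final int value;
        final Iterator<Integer> rest;

        Head(int value, Iterator<Integer> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    /** Merges inputs that are each already sorted ascending into one sorted list. */
    static List<Integer> merge(List<List<Integer>> sortedInputs) {
        // The heap holds at most one head record per input.
        PriorityQueue<Head> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.value, b.value));
        for (List<Integer> input : sortedInputs) {
            Iterator<Integer> it = input.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();      // smallest head across all inputs
            merged.add(smallest.value);
            if (smallest.rest.hasNext()) {    // refill from the same input
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Integer>> inputs = Arrays.asList(
                Arrays.asList(1, 4, 7),
                Arrays.asList(2, 5),
                Arrays.asList(3, 6, 8));
        System.out.println(merge(inputs)); // prints [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```

Because only one record per input sits in the heap at any time, memory use grows with the number of inputs rather than with their total size, which is what makes the approach suitable for external sorting.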

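HBase region split source code reading notes

Client side. 1. HbaseAdmin.split(final byte [] tableNameOrRegionName, final byte [] splitPoint). This method first determines whether the argument is a regionName or a tableName; if it is a regionName, only that region is split, and if it is a tableName, all regions of that table are split. if (isRegionN...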

2011-12-07 00:03:46 131

HBase bulkload source code reading notes

1. LoadIncrementalHFiles.doBulkLoad(Path hfofDir, HTable table). It first uses the discoverLoadQueue method to find the HFiles under hfofDir, then calls tryLoad in a loop to load each file; this is a serial process. Deque queue = null; queue = discoverL...

2011-12-06 00:21:37 786

[Original] Common Hive commands

Hive documentation: https://cwiki.apache.org/confluence/display/Hive/Home Enable local mode: set hive.exec.mode.local.auto=true; DDL Operations. Create a table: hive> CREATE TABLE pokes (foo INT, bar STRING); Create a table with an index field ds: hive> C...

2011-11-29 19:25:13 103

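[Original] Reading and modifying the HBase HMerge source

As deletions accumulate in HBase, some regions hold less and less data, but HBase does not reclaim these regions on its own, so the number of regions keeps growing. HBase ships a utility class, HMerge, but it does not run as-is; based on my own understanding of HBase I modified HMerge slightly so that it runs. The table does not need to be disabled while it runs, but note that putting data into the table at the same time may cause problems. The modified code follows: /** ...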

2011-11-14 19:50:11 146

[Original] HBase HLog source code reading notes

HLog: after a client submits an update to a RegionServer, the HLog append method is called to write an entry to the WAL; append is the entry point. 1. append public void append(HRegionInfo info, byte [] tableName, WALEdit edits, final long now) throws IOExceptio...

2011-11-03 18:01:51 164

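Learning how the HBase WAL works

1. Overview. When a client submits data to a RegionServer, a WAL entry is written. The client is told the submission succeeded only after the WAL write succeeds; if the WAL write fails, the client is told the submission failed. In other words, this is the step that makes the data durable. All regions on a RegionServer share a single HLog, and each submission writes the WAL first and then the memstore, as the diagram shows: [img]http://dl.iteye.com/up...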

2011-10-31 20:19:15 185
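
The ordering described in the WAL entry above (log first, memstore second, success reported only after the log write) can be modeled with a toy class. Everything here is an assumption made for illustration: ToyRegionServer, its in-memory list standing in for the shared HLog, and the walAvailable switch used to simulate a failed log write. It is not HBase code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of "write the WAL first, then the memstore" (hypothetical, not HBase code). */
final class ToyRegionServer {
    private final List<String> wal = new ArrayList<>();          // stand-in for the shared HLog
    private final Map<String, String> memstore = new TreeMap<>(); // stand-in for the memstore
    private boolean walAvailable = true;                          // flip to simulate a WAL failure

    void setWalAvailable(boolean available) {
        this.walAvailable = available;
    }

    /** Returns true only if the edit reached the log; the memstore is updated afterwards. */
    boolean put(String row, String value) {
        if (!walAvailable) {
            return false;               // log write failed: report failure, memstore untouched
        }
        wal.add(row + "=" + value);     // 1. append to the log
        memstore.put(row, value);       // 2. then apply to the in-memory store
        return true;
    }

    String get(String row) {
        return memstore.get(row);
    }

    int walSize() {
        return wal.size();
    }

    public static void main(String[] args) {
        ToyRegionServer server = new ToyRegionServer();
        System.out.println(server.put("r1", "a")); // prints true
        server.setWalAvailable(false);
        System.out.println(server.put("r2", "b")); // prints false
        System.out.println(server.get("r2"));      // prints null
    }
}
```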

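[Original] HBase put path source code reading notes

Client side. 1. HTable.put for (Put put : puts) { validatePut(put); // validate the Put, mainly by checking the KV length writeBuffer.add(put); // add to the write buffer currentWriteBufferSize += put.heapSize(); // track the buffer size } if (a...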

2011-10-27 10:22:39 173

[Original] Notes on bulk-loading HDFS data into HBase

bulk-load uses MapReduce to load files on HDFS into HBase, which is very useful for loading massive data sets into HBase; see http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html: HBase provides a ready-made program to import files on HDFS into HBase, namely bulk-loa...

2011-10-12 18:26:54 160

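HBase region compaction source code reading notes

Peripheral code is skipped; these notes cover the core code. Client side. 1. table_jsp._jspService(HttpServletRequest request, HttpServletResponse response) calls the compact method of the client-side HBaseAdmin to compact the region. // use HBaseAdmin to compact the region HBaseAdmin hbadmin = ...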

2011-10-08 19:49:05 132

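[Original] The difference between <context-param> and <init-param> and what each does

What <context-param> does, that is, its role in the web.xml configuration: 1. When a web project starts, the container (for example Tomcat) reads its configuration file web.xml and looks at two nodes: <listener> and <context-param>. 2. Next, the container creates a ServletContext, which every part of the web project shares. 3. The container converts the <context-param> entries into key-value pairs and hands them to the ServletContext. 4. The container instantiates the classes declared in <listener>, creating the listeners. 5. The listener will have a co...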

2011-09-03 08:35:07 61

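[Original] Vertical search can start a prairie fire

In a search era dominated by Baidu and Google, grassroots challengers unwilling to submit have risen in revolt. Their banners come in every color, but all bear the same two words: "vertical search". They rally their allies and carve out territory in fields such as local life, travel, jobs, and cars, steadily eating into the domain of general-purpose search. What kind of drama is this? An epic or a farce, a spark that starts a prairie fire or a flash in the pan? The web is full of competing claims: supporters say vertical search is "becoming a mainstream need" and "a major opportunity",...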

2011-09-02 17:05:56 265

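[Original] Jsoup: a new option for parsing HTML in Java

A new option for handling HTML in Java, with selectors similar to jQuery's. jsoup is a Java HTML parser that can parse a URL or HTML text directly. It provides a very convenient API for extracting and manipulating data through DOM, CSS, and jQuery-like methods. The main features of jsoup are: 1. From a URL, a file, or a...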

2011-09-02 15:36:24 107

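[Original] EasyMock tutorial: custom argument matchers

Although EasyMock provides many methods for argument matching, the built-in matchers fall short in some special cases, for example when an argument is a complex object that cannot be compared simply through equals(). For this, EasyMock provides the IArgumentMatcher interface so that we can implement custom argument matchers. Let an example do the talking: public interface Servic...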

2011-08-30 16:54:43 244

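[Original] Common HBase shell commands

Note: newer versions of HBase dropped HQL support, so only shell commands can be used. Command: disable 'tableName' (disables the table; a table must be disabled before its schema is modified). Command: enable 'tableName' (makes the table available). Command: drop 'tableName' (deletes the table). Basic HBase comm...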

2011-08-04 16:31:02 394

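[Original] CompletionService usage demo

A CompletionService is roughly an Executor plus a BlockingQueue. It fits the case where a batch of tasks has been submitted to run concurrently and the main thread needs to collect their return values as they become available and process them one at a time, handling whichever finishes first. The two classes below do the same job: one uses CompletionService and is more concise; the other uses an Executor and a BlockingQueue directly and is more involved. In fact, reading Completio...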

2011-07-25 13:38:21 217
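
The post's own demo classes are cut off in the excerpt above, so the following is a separate minimal sketch of the "whoever finishes first is handled first" behavior using the JDK's ExecutorCompletionService. The class name CompletionOrderDemo and the sleep-based tasks are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical demo class; tasks simply sleep for the given number of milliseconds. */
final class CompletionOrderDemo {

    /**
     * Submits one task per delay (at least one delay is required) and
     * returns the task results in the order in which the tasks completed.
     */
    static List<Integer> completionOrder(int... delaysMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(delaysMillis.length);
        CompletionService<Integer> service = new ExecutorCompletionService<>(pool);
        try {
            for (int delay : delaysMillis) {
                service.submit(() -> {
                    Thread.sleep(delay);
                    return delay;
                });
            }
            List<Integer> results = new ArrayList<>();
            for (int i = 0; i < delaysMillis.length; i++) {
                // take() blocks until the next finished task is available.
                results.add(service.take().get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // The shortest delay is expected to come back first.
        System.out.println(completionOrder(300, 100, 200));
    }
}
```

Compared with keeping a list of Future objects and calling get() in submission order, take() avoids waiting on a slow early task while later ones have already finished.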

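[Original] Introduction to the Java Future interface

In Java, if you need to cap how long a piece of code may run, that is, to set a timeout, you can combine an ExecutorService thread pool with the Future interface. Future is part of the standard Java API, in the java.util.concurrent package. The Future interface is Java's implementation of the Future pattern for threads, and can...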

2011-07-22 16:38:22 91
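
The timeout technique named in the Future entry above can be sketched with Future.get(long, TimeUnit). The helper name callWithTimeout, the class TimeoutDemo, and the fallback-value convention are assumptions for this example rather than anything from the post.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper showing a time-limited call. */
final class TimeoutDemo {

    /** Runs the task and returns its result, or the fallback if the timeout expires first. */
    static String callWithTimeout(Callable<String> task, long timeoutMillis, String fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);   // interrupt the worker that is still running
            return fallback;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithTimeout(() -> "done", 1000, "timed out")); // prints done
        System.out.println(callWithTimeout(() -> {
            Thread.sleep(5000);
            return "late";
        }, 100, "timed out"));                                                 // prints timed out
    }
}
```

Note that cancel(true) only interrupts the worker thread; a task that ignores interruption keeps running even though the caller has already moved on.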

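[Original] The Future pattern

java.util.concurrent.Callable and java.util.concurrent.Future help you implement the Future pattern. In the Future pattern, when a request is made, a Future object is first created and returned to the requesting client. It acts like a proxy object, while the real target object is produced by a new thread in the background. Once the real target object has been produced, it is set on the Future, and when the client actually needs the target object...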

2011-07-22 16:19:58 86

[Original] Quick HBase installation

1. Download HBase: http://labs.renren.com/apache-mirror//hbase/hbase-0.90.3/hbase-0.90.3.tar.gz 2. Edit conf/hbase-site.xml and set the hbase.rootdir directory: hbase.rootdir ...

2011-07-20 14:59:07 82

[Original] Quick log4j.properties setup

For log4j in Java: download this file, rename it to log4j.properties, and put it in the project's src directory. Adjust any of its properties as needed. The file contents follow: log4j.rootCategory=info,console,file log4...

2011-07-20 10:26:27 83

[Original] Maven commands

1.mvn archetype:generate

2011-07-20 09:49:30 100

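Java NIO (Chinese edition)

This book covers advanced input/output on the Java platform, specifically I/O with the Java 2 Standard Edition (J2SE) Software Development Kit (SDK) version 1.4 and later. J2SE 1.4, code-named Merlin, contains substantial new I/O features, which we discuss in detail. These features live mainly in the java.nio package and its subpackages and are named New I/O (NIO). In this book you will learn how to use these exciting new features to greatly improve the I/O efficiency of Java applications.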

2011-05-31

Five-level administrative division data (province, city, county, town, village)

2024-12-28

pro hadoop

This book is a concise guide to getting started with Hadoop and getting the most out of your Hadoop clusters. My early experiences with Hadoop were wonderful and stressful. While Hadoop supplied the tools to scale applications, it lacked documentation on how to use the framework effectively. This book provides that information. It enables you to rapidly and painlessly get up to speed with Hadoop. This is the book I wish was available to me when I started using Hadoop. Who This Book Is For This book has three primary audiences: developers who are relatively new to Hadoop or MapReduce and must scale their applications using Hadoop; system administrators who must deploy and manage the Hadoop clusters; and application designers looking for a detailed understanding of what Hadoop will do for them. Hadoop experts will learn some new details and gain insights to add to their expertise.

2009-09-24

Hadoop Developer, Issue 4

Hadoop Developer, Issue 4. Contents:
mooon
Evolution of the massive data processing platform architecture
Solving computation skew in Hive
Implementing the join operator in Hadoop
Configuring PostgreSQL as the Hive metadata database
ZooKeeper permission management
How the ZooKeeper server works: principles and workflow
Implementing shared locks with ZooKeeper
Hadoop best practices
Managing jobs through the Hadoop API
Configuration tuning for Hadoop clusters
Java conventions and lessons for the Hadoop platform
Lessons from MapReduce development
Implementing the tar command in Hadoop
Operating statistics from the Hadoop technical forum

2011-05-31

hbase-0.20 Programming

hbase-0.20 programming

2011-09-07

The Definitive ANTLR 4 Reference 2nd Edition

The Definitive ANTLR 4 Reference, 2nd Edition epub

2013-04-19

Elasticsearch design approach

Elasticsearch design approach (PPT slides)

2013-09-18

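A talk on Java concurrent programming

A good talk. Threads and thread models; thread states and the transitions between them; why the JMM (Java Memory Model) exists and how it handles concurrency; the differences and relationships among monitor locks and explicit locks, and among reentrant, exclusive, shared, and spin locks; common kinds of deadlock and approaches to resolving them; an introduction to the J.U.C (java.util.concurrent) framework in the JDK, mainly thread pools, concurrent containers, and concurrency utilities.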

2013-12-27

Programming Pig

Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets. At the present time, Pig's infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs, for which large-scale parallel implementations already exist (e.g., the Hadoop subproject). Pig's language layer currently consists of a textual language called Pig Latin, which has the following key properties:
• Ease of programming. It is trivial to achieve parallel execution of simple, "embarrassingly parallel" data analysis tasks. Complex tasks comprised of multiple interrelated data transformations are explicitly encoded as data flow sequences, making them easy to write, understand, and maintain.
• Optimization opportunities. The way in which tasks are encoded permits the system to optimize their execution automatically, allowing the user to focus on semantics rather than efficiency.
• Extensibility. Users can create their own functions to do special-purpose processing.

2012-05-08

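Hadoop: The Definitive Guide, 3rd Edition (English)

Release notes for the third edition of Hadoop: The Definitive Guide. The third edition is due out in May 2012. You can pre-order a copy now or buy the “Early Release” ebook, which includes the final version. (Not much use to readers in China, heh.) Here is a rough summary of the changes. What is new in the third edition? It covers the Hadoop 1.x (formerly 0.20) releases as well as 0.22 and 0.23. All the examples have been run against these versions, with the few exceptions noted in the text. The features of each release series are described in "Hadoop Releases" in Chapter 1. Most examples use the new API; because the old API is still widely used, it is still discussed alongside, and the old-API code is available on the book's website. The major change in Hadoop 0.23 is the new MapReduce runtime, MapReduce 2, built on a new distributed resource management system called YARN; Chapter 6 explains how it works and Chapter 9 how to run it. There is more MapReduce material as well, such as packaging MapReduce jobs with Maven, setting the Java classpath, and writing MRUnit tests (all in Chapter 5), plus deeper features such as output committers and the distributed cache (Chapter 8) and task memory monitoring (Chapter 9). Chapter 4 adds processing Avro data with MapReduce jobs, and Chapter 5 shows how to run a simple workflow with Oozie. (Sadly, the Oozie coordinator is not covered.) Chapter 3, on HDFS, now introduces High Availability, Federation, and the new WebHDFS and HttpFS filesystems. The coverage of Pig, Hive, Sqoop, and ZooKeeper has been expanded for the features and changes in their latest releases. The book contains many other corrections and improvements.

Original text: Third Edition The third edition is due to be published in May 2012. You can pre-order a copy, or buy the “Early Release” ebook today (you will receive the final ebook version when it is available for no extra charge). The following section is from the book’s preface, and outlines the changes in the third edition. What’s New in the Third Edition? The third edition covers the 1.x (formerly 0.20) release series of Apache Hadoop, as well as the newer 0.22 and 0.23 series. With a few exceptions, which are noted in the text, all the examples in this book run against these versions. The features in each release series are described at a high-level in "Hadoop Releases" in Chapter 1. This edition uses the new MapReduce API for most of the examples. Since the old API is still in widespread use, it continues to be discussed in the text alongside the new API, and the equivalent code using the old API can be found on the book’s website. The major change in Hadoop 0.23 is the new MapReduce runtime, MapReduce 2, which is built on a new distributed resource management system called YARN. This edition includes new sections covering MapReduce on YARN: how it works (Chapter 6) and how to run it (Chapter 9).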
There is more MapReduce material too, including development practices like packaging MapReduce jobs with Maven, setting the user’s Java classpath, and writing tests with MRUnit (all in Chapter 5); and more depth on features such as output committers, the distributed cache (both in Chapter 8), and task memory monitoring (Chapter 9). There is a new section on writing MapReduce jobs to process Avro data (Chapter 4), and on running a simple MapReduce workflow in Oozie (Chapter 5). The chapter on HDFS (Chapter 3) now has introductions to High Availability, Federation, and the new WebHDFS and HttpFS filesystems. The chapters on Pig, Hive, Sqoop, and ZooKeeper have all been expanded to cover the new features and changes in their latest releases. In addition, numerous corrections and improvements have been made throughout the book.

2012-08-01

Mahout in Action (complete edition).pdf

Mahout in Action 3. Representing data 4. Making recommendations 5. Taking recommenders to production 6. Distributing recommendation computations Part 2 Clustering 7. Introduction to clustering 8. Representing data 9. Clustering algorithms in Mahout 10. Evaluating clustering quality 11. Taking clustering to production 12. Real-world applications of clustering Part 3 Classification 13. Introduction to classification 14. Training a classifier 15. Evaluating and tuning a classifier 16. Deploying a classifier 17. Case study: Shop it To Me Appendices A. JVM tuning B. Mahout math C. Resources

2013-04-07

libstdc++.zip

2021-08-30

Veloeclipse.ui_2.0.8

Veloeclipse.ui_2.0.8, posted here for easy download

2012-08-18

CSS: The Definitive Guide

CSS: The Definitive Guide

2011-09-14

The Definitive Guide to jQuery

The Definitive Guide to jQuery

2011-09-06

Inside the Java Virtual Machine, 2nd Edition.pdf (Chinese edition)

2012-03-06

Cassandra: The Definitive Guide.pdf

Welcome to Apache Cassandra The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages. Cassandra's ColumnFamily data model offers the convenience of column indexes with the performance of log-structured updates, strong support for materialized views, and powerful built-in caching.

2011-12-21

ElasticSearch Server (English edition)

ElasticSearch Server, English edition; open it with an EPUB reader