Wang_Qinghe - CSDN Blog

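Repost: New

http://127.0.0.1:1337 Small data pool (note that only strings and numbers have this concept): values within a certain range share the same memory address. Numbers -5~256: to save space (memory), they all share one small data pool (they point to the same memory address). Small integer objects in [-5, 257) are shared in Python. (Note: the range is written this way because Python ranges include the start and exclude the end, so [-5, 257) covers the numbers -5~256.) Integer...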

2018-05-13 08:58:54 232
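
The small-integer sharing described in the entry above can be observed directly. This is a minimal sketch, assuming CPython (the cache is an implementation detail, not a language guarantee); the helper name `shares_identity` is made up for illustration.

```python
def shares_identity(n: int) -> bool:
    """Return True if two separately built ints equal to n are one object."""
    # int(str(n)) builds the value at run time, so the compiler cannot
    # merge two equal literals and hide the effect being observed.
    a = int(str(n))
    b = int(str(n))
    return a is b

# CPython keeps one shared object for every integer from -5 to 256.
print(shares_identity(-5), shares_identity(256))

# Outside that range the result depends on the implementation; on CPython
# the two objects are usually distinct.
print(shares_identity(257))
```

Because of this, `==` rather than `is` is the right way to compare numbers in ordinary code.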

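Original: Decision Tree

Information entropy: describes the amount of information. The larger it is, the more guesses are needed and the harder the outcome is to predict; the unit is bits. ID3 algorithm node selection: Information Gain: IG(Y|X) = H(Y) - H(Y|X); it measures how well an attribute (x) separates the samples (y). When a new attribute (x) is added, the change in the entropy H(Y) is the information gain. The larger IG(Y|X) is, the more important x is, so the attribute with the largest IG is used as the Decision T...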

2017-10-11 22:03:29 378
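
The formula IG(Y|X) = H(Y) - H(Y|X) from the entry above can be sketched in a few lines. The function names and the toy data (`play`, `windy`, `humid`) are hypothetical and only illustrate the calculation; this is not the post's own code.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(Y) in bits."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(xs, ys):
    """IG(Y|X) = H(Y) - H(Y|X) for paired attribute values xs and labels ys."""
    total = len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    # H(Y|X): entropy of each group, weighted by the group's share of the data
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(ys) - conditional

play = ["yes", "yes", "no", "no"]
windy = ["no", "no", "yes", "yes"]   # separates the labels perfectly
humid = ["hi", "lo", "hi", "lo"]     # carries no information about the labels

print(information_gain(windy, play), information_gain(humid, play))
```

ID3 would pick `windy` as the split here, since it has the larger gain.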

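Original: R Note: Statistics Basics

Events and distributions. Discrete: random experiment: an experiment whose possible outcomes are known in advance under identical conditions; sample space; sample point: each possible outcome of the experiment. Continuous. Random event: a subset of the sample space. Certain event: ; impossible event: ; complementary events: mutually contradictory, with probabilities summing to 1; mutually exclusive events: cannot occur at the same time, no intersection. Complementary events are always mutually exclusive, but not vice versa. Probability density: describes the distribution of probability. Probability...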

2017-10-11 12:34:29 537

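1. Definition: a set is an unordered collection of unique elements. A set object is an unordered group of hashable values, and set members can serve as dictionary keys. Sets support membership tests with the in and not in operators, the built-in len() returns the cardinality (size) of the set, and a for loop iterates over its members. However, because a set is itself unordered, it cannot be indexed or sliced, and it has no keys for retrieving element values. A set is like a dict, only without values, equivalent to di...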

2017-10-08 14:27:45 853
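
The set properties listed in the entry above can be checked with a short sketch. The names `fruits`, `indexing_failed` and `lengths` are illustrative only.

```python
fruits = {"apple", "pear", "apple"}   # the duplicate "apple" is dropped
print(len(fruits))                    # cardinality of the set
print("pear" in fruits, "kiwi" not in fruits)

for item in fruits:                   # iteration works, order is not guaranteed
    print(item)

try:
    fruits[0]                         # sets support neither indexing nor slicing
    indexing_failed = False
except TypeError:
    indexing_failed = True
print(indexing_failed)

# Members are hashable, so each one can be used as a dictionary key.
lengths = {member: len(member) for member in fruits}
print(lengths["apple"])
```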

Original: python String substring methods

a = 'abdf' b = 'abfff12abdf56' Implemented with list: def str_str(str1, str2): list1, list2, list3, list4 = [], [], [], [] list1, list2 = list(str1), list(str2) for i in range(0, len(str2)): if lis...

2017-10-08 14:08:48 1257

Original: python Note higher-order functions

def not_empty(s): return s and s.strip() list(filter(not_empty, ['A', '', 'B', None, 'C', ' '])) # Result: ['A', 'B', 'C'] eval() can convert a string into a list, tuple, or dict: a = "[1,2,3]" b = eval(a) print(type(b)) <class...

2017-10-07 11:12:26 248
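
The two ideas in the entry above, filtering with a predicate and turning a string into a list, can be run as follows. One substitution is made on purpose: `ast.literal_eval` is used instead of `eval()`, because it only accepts Python literals and so is safer on untrusted text. That swap is this sketch's choice, not the post's.

```python
import ast

def not_empty(s):
    # None and "" are falsy; strip() makes whitespace-only strings falsy too
    return s and s.strip()

cleaned = list(filter(not_empty, ['A', '', 'B', None, 'C', ' ']))
print(cleaned)

# literal_eval parses literals such as lists, tuples and dicts from a string
parsed = ast.literal_eval("[1,2,3]")
print(type(parsed), parsed)
```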

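Original: python generator

A generator stores the algorithm: each call to next(g) computes the next element of g, until the last element has been produced and no more remain, at which point StopIteration is raised. Use a for loop instead, since a generator is also an iterable object. def fib(max): n, a, b = 0, 0, 1 while n < max: yield b a, b = b, a +...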

2017-10-06 17:54:29 210
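
The excerpt above is cut off mid-function, so the following is a self-contained sketch of a Fibonacci generator rather than a reconstruction of the post's code. The parameter name `limit` and the loop-counter update are assumptions.

```python
def fib(limit):
    """Yield the first `limit` Fibonacci numbers: 1, 1, 2, 3, ..."""
    n, a, b = 0, 0, 1
    while n < limit:
        yield b              # execution pauses here until the next request
        a, b = b, a + b
        n += 1

g = fib(3)
print(next(g), next(g), next(g))
try:
    next(g)                  # a fourth call finds nothing left to yield
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)

# A for loop (or list()) handles StopIteration automatically.
print(list(fib(6)))
```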

Original: Atom python3 UnicodeEncodeError: 'ascii' codec can't encode characters in position

code: import sys print(sys.getdefaultencoding()) f = open("/Users/wqh/Desktop/foo.txt", "w") f.write( "有错误" ) # close the opened file f.close() utf-8 Traceback (most recent call last): File "/Users/wqh/Desktop/t.py",...

2017-10-05 14:02:05 1226
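
The fix the post arrives at is not visible in the excerpt above. One common remedy, sketched here as an assumption, is to pass `encoding="utf-8"` to `open()`: without it, `open()` takes its encoding from the locale rather than from `sys.getdefaultencoding()`, and the locale can be ASCII when a script is launched from an editor. The temporary path is hypothetical.

```python
import os
import tempfile

# A hypothetical scratch file, so the sketch does not touch a real Desktop
path = os.path.join(tempfile.mkdtemp(), "foo.txt")

# Naming the encoding makes the write independent of the process locale
with open(path, "w", encoding="utf-8") as f:
    f.write("有错误")

with open(path, encoding="utf-8") as f:
    content = f.read()

print(content == "有错误")
```

The `with` block also closes the file, so the explicit `f.close()` is no longer needed.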

Repost: python Note III

Formatted output: '!a' (uses ascii()), '!s' (uses str()), and '!r' (uses repr()) >>> for x in range(1, 11): ... print(repr(x).rjust(2), repr(x*x).rjust(3), end=' ') ... # note the use of 'end' on the previous line ... print(repr(x*x*...

2017-10-05 13:12:21 582
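
The three conversion flags named in the entry above behave as follows. The sample value `word` and the variable names are illustrative.

```python
word = "café"

as_str = "{!s}".format(word)     # applies str()
as_repr = "{!r}".format(word)    # applies repr(), which adds quotes
as_ascii = "{!a}".format(word)   # applies ascii(), which escapes non-ASCII
print(as_ascii)

# rjust() pads on the left so that columns line up
for x in range(1, 4):
    print(repr(x).rjust(2), repr(x * x).rjust(3))
```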

Repost: python Note II

end seq. Fibonacci sequence: a, b = 0, 1 while b < 1000: print(b, end=',') a, b = b, a+b 1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987, >>> a=10;b=388;c=98 >>> print(a,b,c,sep='@') 10@388@98 Conditional control...

2017-10-03 11:05:49 208
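
The `end` and `sep` arguments of `print()` from the entry above can be checked by writing into a buffer. Capturing output with `io.StringIO` is this sketch's addition, so that the result can be inspected as a string.

```python
import io

buf = io.StringIO()
a, b = 0, 1
while b < 1000:
    print(b, end=',', file=buf)   # end=',' replaces the default newline
    a, b = b, a + b
series = buf.getvalue()
print(series)

joined = io.StringIO()
print(10, 388, 98, sep='@', file=joined)   # sep goes between the arguments
print(joined.getvalue(), end='')
```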

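Repost: python3 Note I

Command line: getopt. About Python's getopt module: it is used to process command-line arguments when a program is run from the terminal. Function usage: getopt.getopt(args, options[, long_options]). args: the list of command-line arguments to parse. options: defined as a string; a colon (:) after an option means that option requires an argument, and no colon means it takes none.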

2017-10-02 17:44:12 328
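
The colon rule described in the entry above looks like this in practice. The argument list and option names are hypothetical; a real script would usually pass `sys.argv[1:]`.

```python
import getopt

# A hypothetical command line, already split into tokens
argv = ["-h", "-o", "out.txt", "--verbose", "input.txt"]

# "ho:"     -> -h takes no argument, -o requires one (note the colon)
# "output=" -> the long option --output requires an argument (note the "=")
opts, rest = getopt.getopt(argv, "ho:", ["help", "output=", "verbose"])

for name, value in opts:
    print(name, repr(value))
print(rest)
```

`opts` holds (option, value) pairs, with an empty string for flags, and `rest` holds whatever follows the last option.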

Original: R Manage Data

R read and save(). Reading a file: > x = read.table("1.txt") > x V1 V2; 1 134 2; 2 23 3; 3 12 3; 4 234 2; 5 23 null; 6 123 null; 7 123 1; 8 23 23 > a = c(1:10) > save(a,file = "a.RData") > save.imag...

2017-09-22 12:31:45 1839

Original: Data in R

R data structures. Vector data types: integer; numeric (includes decimals); character; logical (TRUE, FALSE); NULL; NA (missing value). c(): combine function > x1=c(2,4,6,8,0) > x2=c(1,3,5,7,9) > rbind(x1,x2) [,1] [,2] [,3] [,4] [,...

2017-09-22 10:10:15 365

Original: Atom python3 Mac

The Mac ships with python2 as the default. To use a non-default python3 in Atom with atom-runner, set this in config.cson: runner: scopes: python: "/Library/Frameworks/Python.framework/Versions/3.6/bin/python3" more

2017-09-14 10:11:27 536

Original: Spark json TempTags Sample

Data Frame -> for each ID (that is, each shop), take the three most frequent review tags. 77287793 { "reviewPics": null, "extInfoList": [ { "title": "contentTags", "values": [ "高大上", "...

2017-09-12 17:28:16 264

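Original: Spark: sortBy sortByKey secondary sort

Sample data (exam room number, class number, student ID) -> exam room ascending, class ascending, student ID descending: 1 1 3; 1 1 4; 1 2 8; 1 3 7; 3 2 9; 3 5 11; 1 4 13; 1 5 12; 2 1 14; 2 1 10; 2 4 1; 2 3 5; 2 4 6; 3 5 2; 3 2 15; 1 1 16; 2 2 17; 3 3 18; 2 2 19; 3 3 20. sortBy: package com.spark.sort...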

2017-09-12 12:29:12 3570
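
The ordering described in the entry above (two fields ascending, the third descending) comes down to a composite sort key. This sketch uses plain Python on a few of the sample records; it is not the post's Spark code, which is cut off in the excerpt.

```python
# (exam room, class, student ID): a subset of the sample records
records = [(1, 1, 3), (1, 1, 4), (3, 2, 9), (2, 4, 1), (2, 4, 6), (1, 1, 16)]

# Tuples compare field by field, so the key below orders by room, then class.
# Negating the numeric student ID reverses the order of that field only.
ordered = sorted(records, key=lambda r: (r[0], r[1], -r[2]))

for record in ordered:
    print(*record)
```

The negation trick only works for numeric fields; other types need a custom comparison.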

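Original: Zookeeper

zookeeper 1. Introduction: a coordination service. "Zookeeper" as in the keeper of a zoo. A centralized service for managing configuration, naming, and distributed synchronization. 2. Installing zk: a) download and untar b) environment variables. zk components: 1. Client: accesses a server node and periodically sends messages to the server to show it is still alive. On connection, the server returns an ack to the client; if the client does not receive the ack, it automatically...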

2017-09-09 23:37:24 308

Original: Scala III

scala higher-order functions: def add(a:Int,b:Int):Int = a + b val f: (Int,Int)=>Int = add _ val f: (Int,Int)=>Int = (a:Int,b:Int)=> a + b 1. Higher-order functions, part 1 // define the function def line(f1:(Int,Int)=>Int,a:Int,b:Int,f2:(Int,Int)=>Int,c:Int,...

2017-09-09 22:47:37 223

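Original: Scala II

Scripting languages: sql, js, shell, python. What you see is what you get. repl // read -> evaluate -> print -> loop. javac, java. java: *.java -------> *.class -------> app. scala: Java statements made scriptable. scala: 1. Download 2.11.8 2. Install...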

2017-09-09 22:42:55 888

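Original: Scala I

Scripting languages: sql, js, shell, python. What you see is what you get. repl // read -> evaluate -> print -> loop. javac, java. java: *.java -------> *.class -------> app. scala: Java statements made scriptable. scala: 1. Download 2.11.8 2. Install...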

2017-09-09 22:41:10 440

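Original: Hive II

Hive: a data warehouse. OLAP (online analytical processing). hdfs; the metadata is kept in a relational database. Hive execution flow: the cli interacts with the driver; the driver compiles through the compiler (syntax parsing and semantic analysis); the compiler queries the metastore during compilation and generates a plan. The execution plan is returned to the driver, the driver submits it to the execution engine, the execution engine submits jobs to hadoop, and hadoop returns the results all the way back to the client. tool,hadoo...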

2017-09-09 20:27:45 298

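Original: Hive I

Data warehouse: OLAP // online analytical processing. // large data volume, low concurrency, high latency. hive // hadoop mr, high efficiency. sql // SQL-like statements. Database: mysql, OLTP // online transaction processing. acid // transaction concurrency phenomena: dirty read | unrepeat...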

2017-09-09 20:18:37 438

Original: Hadoop HA

namenode SPOF: single point of failure. A reliability problem. secondary nn. fail over, disaster recovery. De-OIE: oracle, ibm, emc // HA: high availability. 2NN. The ability to keep providing service. 99.999%. NFS: network filesystem. QJM: Quorum Journal Mana...

2017-09-09 20:16:44 267

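Original: MapReduce II

Sorting: 1. Partial sort: the default. 2. Total sort: 1. a single reduce; 2. a custom partitioner class, which may cause data skew; 3. hadoop's built-in total-order partitioner class: sampling, partition file (sequencefile). 3. Secondary sort: sorting by value; the value is folded into the key to form a composite key. Data skew: large amounts of data flood into one or a few redu...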

2017-09-09 20:07:00 191

Original: MapReduce I

MapReduce. MR: a programming model. WordCount MR: 1. Write the Mapper: package com.hadoop.mr; import org.apache.hadoop.io.IntWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.io.LongWritab...

2017-09-09 19:49:54 263

Repost: ganglia For Hadoop, Hbase

Details

2017-09-04 20:00:42 176

Original: Flume Note

more details

2017-08-29 10:08:25 356

Original: Hadoop Node II

hadoop fully distributed mode. hdfs commands: // . == /user/centos hdfs dfs -put xxx.tar . Find all hadoop configuration information: hadoop-common-2.7.3.jar/core-default.xml // core-site.xml hadoop-hdfs-2.7.3.jar/hdfs-default.xml // hd...

2017-08-27 08:02:49 288

Original: Hadoop Replication Pipelining, Replica Placement Policy, Replication Rack Awareness

https://hadoopabcd.wordpress.com/2015/03/17/hdfs-file-blocks-distribution-in-datanodes/ Replication Rack Awareness 1. Default: the default rack awareness is script-based. <property> <name>net.topology.node.switc...

2017-08-26 06:13:54 250

Repost: Hadoop 2.0 data write operation acknowledgement

Step 1: The client creates the file by calling the create() method on DistributedFileSystem. Step 2: DistributedFileSystem makes an RPC call to the namenode to create a new file in the filesystem's namespac...

2017-08-26 05:24:15 613

Original: java.io.IOException: There appears to be a gap in the edit log. We expected txid 41, but got txid

The edits file needs to be copied to the journal nodes (I have 3 journal nodes: s101, s102, s103). [centos@s100 /home/centos/hadoop/ha/dfs/name1/current]$ scp edits_0000000000000000041-0000000000000000043 centos@s10...

2017-08-24 13:42:54 2593

Original: org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.

I set up Hive and Hadoop on CentOS in VMware on a Mac, and MySQL on the Mac itself. When I ran the 'schematool -dbType mysql -initSchema' command, I got the error below. Then I checked my hive-site.xml: <property> <name>jav...

2017-08-21 15:56:42 5329

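Original: Hadoop Notes I

hadoop: a distributed computing framework. [Built-in modules] 1. common 2. hdfs: distributed storage. Namenode, DataNode, secondaryNamenode 3. mapreduce 4. yarn: ResourceManager, Nodemanager. hadoop deployment modes: local (standalone) // local mode; pseudo...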

2017-08-19 15:10:29 264

Repost: HDFS Commands

HDFS commands

2017-08-10 09:42:33 220

Original: Hadoop Fully distributed mode

Ports: 50070 namenode http port; 50075 datanode http port. ssh: delete all the files under .ssh/ [centos@s201 .ssh]$ rm -rf * [centos@s201 .ssh]$ ssh-keygen [centos@s201 .ssh]$ cp id_rsa.pub authorized_keys

2017-07-02 13:13:13 237

Original: Hadoop Environment Setup (VMware Fusion, CentOS 7)

Set up VMware Fusion. Configure Static IP Address: you will only need to edit the settings for DNS, GATEWAY, PREFIX, IPADDR. $>su root $>cd /etc/sysconfig/network-scripts $>vi ifcfg-eno16777736 ONBOOT=yes IPAD...

2017-06-30 17:30:50 538

Original: The authenticity of host 'localhost (::1)' can't be established.

Error: [centos@s200 hadoop]$ ssh localhost The authenticity of host 'localhost (::1)' can't be established. ECDSA key fingerprint is b7:a7:1d:80:34:67:c6:1e:a2:ac:8e:67:a5:38:38:d0. Are you sure you...

2017-06-30 10:25:00 5283

Original: JDBC+mysql Notes

Concepts. C/S: client / server. Socket / ServerSocket. User experience. B/S: browser / server. ie|firefox / web server (tomcat). SQL: structured query language. insert into t(f1,f2,...) values(v1,v2,...) ; delete from t where .

2017-06-26 06:14:11 299

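Repost: Welcome to the CSDN-markdown editor

Welcome to writing blog posts with the Markdown editor. This Markdown editor is adapted from StackEdit, and writing posts with it brings a brand-new experience: Markdown and extended Markdown with concise syntax; code block highlighting; image links and image upload; LaTex math formulas; UML sequence diagrams and flowcharts; offline blog writing; import and export of Markdown files; rich keyboard shortcuts. Shortcuts: Bold Ctrl + B, Italic Ctrl + I, Quote Ctrl...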

2017-06-26 03:24:15 152
