dawnohdawn - CSDN Blog

Original: [C++ Primer Reading Notes] Day 6

Contents: Scope; Review; Notes; 1. About iterators; 1.1 The role iterators play; 1.2 The begin and end members that return iterators; The analogy between iterators and pointers; 1.3 Iterator operators; 1.4 Iterator typ...

2019-08-09 16:02:53 282
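
A minimal sketch of the iterator ideas listed in the Day 6 contents, using only the standard library. The function names (double_all, capitalize_first_word, sum_range) are hypothetical illustrations, not code from the post. The sketch shows three things. begin() and end() are container members that return an iterator pair. The * and ++ operators work on iterators the same way they work on pointers. A single generic loop therefore accepts both iterators and pointers.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// begin() and end() are container members that return iterators: begin()
// denotes the first element, end() denotes one past the last element (the
// "off-the-end" iterator) and must never be dereferenced.
void double_all(std::vector<int>& v) {
    for (auto it = v.begin(); it != v.end(); ++it) {
        *it *= 2;  // *it yields a reference to the element, as *p does for a pointer
    }
}

// The same pattern on std::string: upper-case characters up to the first
// whitespace. For an empty string begin() == end(), so the loop never runs.
std::string capitalize_first_word(std::string s) {
    for (auto it = s.begin();
         it != s.end() && !std::isspace(static_cast<unsigned char>(*it)); ++it) {
        *it = static_cast<char>(std::toupper(static_cast<unsigned char>(*it)));
    }
    return s;
}

// The iterator/pointer analogy: this loop only uses !=, * and ++, so it
// accepts vector iterators and plain pointers into a built-in array alike.
template <typename It>
int sum_range(It first, It last) {
    int total = 0;
    for (; first != last; ++first) {
        total += *first;
    }
    return total;
}
```

For example, sum_range(v.begin(), v.end()) and sum_range(arr, arr + 4) run the same loop. The template parameter is deduced as a vector iterator in the first call and as int* in the second.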

Original: [C++ Primer Reading Notes] Day 5

Contents: Scope; Review; Notes; 1. Do not put using declarations inside headers; 2. About type; 2.1 Two ways of initializing a string; 2.2 Reading and writing operators for string; 2.3 The string::size_type type; 2.4 The d...

2019-08-06 18:02:11 302
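
A small sketch of the string topics in the Day 5 contents. It assumes that the "two ways of initializing a string" are the copy and direct forms described in C++ Primer. The helper names are hypothetical. std::istringstream stands in for std::cin so that the behavior of >> is easy to check.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Copy initialization uses '='; direct initialization uses parentheses,
// which is required for forms such as the (count, char) constructor.
std::string copy_initialized() {
    std::string s = "hiya";  // copy initialization
    return s;
}
std::string direct_initialized() {
    std::string s(3, 'c');   // direct initialization: "ccc"
    return s;
}

// operator>> skips leading whitespace and stops at the next whitespace, so a
// read loop splits the input into words; operator<< writes strings back out.
std::string normalize_spaces(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    std::string word;
    bool first = true;
    while (in >> word) {
        if (!first) out << ' ';
        out << word;
        first = false;
    }
    return out.str();
}

// size() returns std::string::size_type, an unsigned type; using it for the
// index avoids comparing a signed int with an unsigned size.
std::string::size_type count_char(const std::string& s, char c) {
    std::string::size_type n = 0;
    for (std::string::size_type i = 0; i != s.size(); ++i) {
        if (s[i] == c) ++n;
    }
    return n;
}
```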

Original: [C++ Primer Reading Notes] Day 4

Chapter 2.5 introduces more facilities for dealing with types (type aliases, auto, and decltype). These facilities still obey the rules we have already learned for built-in and compound types. Chapter 2.6 shows how to write a safe header file, protected by header guards, for our own data structures.

2019-08-02 21:25:07 247

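Original: [Big Data] HiveQL Key Points

Q1: What is data skew, how does it arise, and how is it resolved? Q2: What is Hive's strict mode? Q3: How do order by, sort by, distribute by, and cluster by differ? Q4: What does collect_all() do? Q5: How do the three ranking functions differ? Q6: How does Hive work? Q7: How does Hive store its metadata? Q8: Hive optimization methods. Q1: ...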

2019-08-02 11:54:06 268

Original: [Machine Learning Algorithms: Derivations and Summaries] Logistic Regression

Solving with gradient descent. The objective function is: $J(\omega)=-\frac{1}{M}\sum_{i=1}^{M}\left[y^{(i)}\log h(x^{(i)})+(1-y^{(i)})\log\left(1-h(x^{(i)})\right)\right]$ ...

2019-08-02 11:52:39 170
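
A minimal batch gradient descent sketch for the logistic regression objective J(ω) in the excerpt above. It assumes the usual hypothesis h(x) = 1 / (1 + exp(-ωᵀx)), which the excerpt does not spell out. The bias is handled by a constant feature equal to 1. The learning rate alpha and the iteration count are illustrative choices, and all function names are hypothetical. The update uses the standard gradient ∂J/∂ω_j = (1/M) Σ_i (h(x^(i)) − y^(i)) x_j^(i).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // M rows (samples) x N columns (features)

// Assumed hypothesis: h(x) = sigmoid(w^T x).
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

double hypothesis(const Vec& w, const Vec& x) {
    assert(w.size() == x.size());
    double z = 0.0;
    for (std::size_t j = 0; j < w.size(); ++j) z += w[j] * x[j];
    return sigmoid(z);
}

// Cross-entropy objective J(w) = -(1/M) * sum_i [y log h + (1 - y) log(1 - h)].
double cost(const Vec& w, const Mat& X, const Vec& y) {
    double sum = 0.0;
    for (std::size_t i = 0; i < X.size(); ++i) {
        const double h = hypothesis(w, X[i]);
        sum += y[i] * std::log(h) + (1.0 - y[i]) * std::log(1.0 - h);
    }
    return -sum / static_cast<double>(X.size());
}

// Batch gradient descent starting from w = 0:
// w_j <- w_j - alpha * (1/M) * sum_i (h(x_i) - y_i) * x_ij
Vec gradient_descent(const Mat& X, const Vec& y, double alpha, int iters) {
    assert(!X.empty() && X.size() == y.size());
    const std::size_t M = X.size(), N = X[0].size();
    Vec w(N, 0.0);
    for (int t = 0; t < iters; ++t) {
        Vec grad(N, 0.0);
        for (std::size_t i = 0; i < M; ++i) {
            const double err = hypothesis(w, X[i]) - y[i];
            for (std::size_t j = 0; j < N; ++j) grad[j] += err * X[i][j];
        }
        for (std::size_t j = 0; j < N; ++j) w[j] -= alpha * grad[j] / static_cast<double>(M);
    }
    return w;
}
```

As a usage example, take the four one-feature samples x = 0, 1, 2, 3 labelled 0, 0, 1, 1, each with the constant feature added. Running 2000 iterations at alpha = 0.5 drives h below 0.5 at x = 0 and above 0.5 at x = 3. The sketch does not clip h, so cost() can return a non-finite value when |ωᵀx| is very large.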

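Original: [Machine Learning Engineering Summaries] Feature Engineering

Data cleaning must not be overlooked. Repairing data: the sample data used for training and the data used for prediction may have been produced under inconsistent definitions. If this is not repaired, the model can perform well offline and then fail badly in online prediction. The first thing to check is that training and prediction are fed by two separate data streams: are those two streams consistent? Any system built by people can end up inconsistent because of its many upstream and downstream dependencies, even if you write the same logic twice yourself, and that is where things break. This is the most important point: consistency matters most. In addition, is the data you use for training...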

2019-08-02 11:47:17 346 1

Original: [C++ Primer Reading Notes] Day 3

Contents: Scope; Review; Notes; 1. The type of a compound type; 2. About references; 2.1 The inner character of a reference; 2.2 Definition and initialization of a reference; 3. About pointers; 3.1 The inner characte...

2019-08-02 11:43:40 376
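
A short sketch contrasting the reference and pointer topics in the Day 3 contents. The function names are hypothetical. A reference is not an object but another name for one: it must be initialized and stays bound to that object. A pointer, by contrast, is itself an object that holds an address, can be null, and can be reseated.

```cpp
#include <cassert>

// Assigning through a reference changes the object it is bound to.
void increment_via_reference(int& r) { ++r; }

// A pointer may be null, so it is checked before being dereferenced.
void increment_via_pointer(int* p) {
    if (p != nullptr) ++*p;
}

// Because a reference is only an alias, taking its address yields the
// address of the referred-to object.
bool reference_is_alias() {
    int a = 0;
    int& r = a;
    return &r == &a;
}

// A pointer can be reseated; "ref = b" does NOT rebind ref, it copies b's
// value into a. Afterwards a == 10 and ptr points to b (also 10).
int reference_vs_pointer() {
    int a = 1, b = 10;
    int& ref = a;   // ref is another name for a
    int* ptr = &a;  // ptr holds the address of a
    ptr = &b;       // legal: the pointer now points to b
    ref = b;        // assigns 10 to a
    return a + *ptr;
}
```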

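Original: [Machine Learning Algorithms: Derivations and Summaries] SVM

Classification function. An SVM classifies samples using a separating hyperplane: $f(x)=\omega^T x+b$, where $\omega$ is the normal vector of the hyperplane. Because the normal vector gives the hyperplane an orientation, the distance from a sample to the hyperplane (the margin) is signed: a sample whose distance is > 0 belongs to class 1, and a sample whose distance is < 0 belongs to class -1. The distance is computed as $d(x_i)=\frac{\omega^T x_i+b}{\|\omega\|}$ ...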

2019-08-01 14:40:10 194
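
A minimal sketch of the SVM decision function f(x) = ωᵀx + b and the signed distance d(x_i) = (ωᵀx_i + b) / ||ω|| from the excerpt above. It takes an already trained ω and b as given and does not show training. The function names are hypothetical. The excerpt does not say how a distance of exactly 0 is labelled; this sketch assumes +1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// f(x) = w^T x + b
double decision_value(const Vec& w, double b, const Vec& x) {
    assert(w.size() == x.size());
    double s = b;
    for (std::size_t j = 0; j < w.size(); ++j) s += w[j] * x[j];
    return s;
}

// d(x) = (w^T x + b) / ||w||; the sign tells which side of the hyperplane
// x lies on, the magnitude is its Euclidean distance to the hyperplane.
double signed_distance(const Vec& w, double b, const Vec& x) {
    double norm2 = 0.0;
    for (double wj : w) norm2 += wj * wj;
    assert(norm2 > 0.0);  // w must be non-zero to define a hyperplane
    return decision_value(w, b, x) / std::sqrt(norm2);
}

// Class 1 for positive distance, class -1 for negative distance
// (distance exactly 0 is mapped to 1 by assumption).
int classify(const Vec& w, double b, const Vec& x) {
    return signed_distance(w, b, x) >= 0.0 ? 1 : -1;
}
```

Scaling ω and b by the same positive factor leaves d unchanged. For example, ω = (6, 8) with b = -10 gives the same distances as ω = (3, 4) with b = -5, whereas f(x) itself doubles.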

Original: [C++ Primer Reading Notes] Day 2

Chapter 2 mainly introduces the built-in types and the mechanisms for defining a class. Chapters 2.1 and 2.2 focus on built-in types and variables. The characteristics of built-in types are closely tied to how they are represented in the machine's hardware. Once we...

2019-08-01 10:58:51 526

Original: [C++ Primer Reading Notes] Day 1

Chapter 1 is a brief introduction to C++. It focuses on the basics of the language rather than its grammar: the main function, the input and output streams, comments, flow of control, and classes. It shows readers...

2019-08-01 10:57:04 271

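Original: [Big Data] Scala for Beginners, Pitfall No. 1: Functions That Take Many Forms

In Scala, functions have the same status as variables: a function can itself be treated as a kind of variable. 1. Defining functions. There are two ways to define a function: with def, or as an anonymous function. 1.1 The first way: using def. Format: def functionName(paramName: ParamType) = functionBody. def increase(x:Int)=x+1 // this form omits the return type; def increase(x:Int):Int=x+1 // this form puts the return ty...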

2018-08-08 14:36:50 382

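Original: [Machine Learning Algorithms: Derivations and Summaries] Comparing Two Methods for Solving Least-Squares Linear Regression

Normal equation (the analytical method). The objective to minimize is given below. $X$ contains all features of all samples and is an M×N matrix (M samples, N features). $Y$ holds the true values of the M samples as an M-row column vector. $\omega$ is the N-row column vector of regression coefficients. The objective is $\min_{\omega} J(\omega)=\|Y-X\omega\|^2$. Solving for $\omega$ analytically...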

2018-08-06 14:03:38 4776 3
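
A minimal sketch of the normal-equation solution for min ||Y − Xω||² from the excerpt above. The gradient of the objective is −2Xᵀ(Y − Xω), and setting it to zero gives XᵀXω = XᵀY, that is, ω = (XᵀX)⁻¹XᵀY. The sketch assumes XᵀX is invertible, meaning X has full column rank. The function name is hypothetical. Rather than forming the inverse, the code solves the N×N system by Gaussian elimination with partial pivoting.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // X is M rows (samples) x N columns (features)

// Solves (X^T X) w = X^T Y for w, assuming X^T X is invertible.
Vec normal_equation(const Mat& X, const Vec& Y) {
    assert(!X.empty() && Y.size() == X.size());
    const std::size_t M = X.size(), N = X[0].size();
    // Augmented matrix [X^T X | X^T Y] of size N x (N + 1).
    Mat A(N, Vec(N + 1, 0.0));
    for (std::size_t r = 0; r < N; ++r) {
        for (std::size_t c = 0; c < N; ++c)
            for (std::size_t i = 0; i < M; ++i) A[r][c] += X[i][r] * X[i][c];
        for (std::size_t i = 0; i < M; ++i) A[r][N] += X[i][r] * Y[i];
    }
    // Forward elimination with partial pivoting.
    for (std::size_t k = 0; k < N; ++k) {
        std::size_t pivot = k;
        for (std::size_t r = k + 1; r < N; ++r)
            if (std::fabs(A[r][k]) > std::fabs(A[pivot][k])) pivot = r;
        assert(std::fabs(A[pivot][k]) > 1e-12 && "X^T X is singular");
        std::swap(A[k], A[pivot]);
        for (std::size_t r = k + 1; r < N; ++r) {
            const double f = A[r][k] / A[k][k];
            for (std::size_t c = k; c <= N; ++c) A[r][c] -= f * A[k][c];
        }
    }
    // Back substitution.
    Vec w(N, 0.0);
    for (std::size_t k = N; k-- > 0;) {
        double s = A[k][N];
        for (std::size_t c = k + 1; c < N; ++c) s -= A[k][c] * w[c];
        w[k] = s / A[k][k];
    }
    return w;
}
```

As an example, take X = [[1,0],[1,1],[1,2],[1,3]] (a constant column plus one feature) and Y = (1, 3, 5, 7). These points lie exactly on y = 1 + 2x, and the sketch recovers ω ≈ (1, 2). Forming XᵀX squares the condition number of X, so for ill-conditioned data a QR- or SVD-based least-squares solver is usually the safer choice.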

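Original: [Miscellaneous] Comparing Machine Learning and Optimization Algorithms

The essence of machine learning algorithms: when solving a problem (given input X, what is the output?), we do not know what the problem's model is or what rules (formulas) its variables follow. So we simply plug in ready-made general-purpose models (linear regression, logistic regression, SVM, neural networks, and so on) and hope they can fit the problem's actual model. However, the specific parameters inside these general-purpose models are undetermined and must be learned from large amounts of data. Only after the parameters are fixed is the fitted model fully determined. Learning the parameters amounts to minimizing the loss function...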

2018-08-02 16:33:06 4379 2