Blog (27)
[Reposted] Spark: The Definitive Guide, Chapter 14 Notes
In addition to the Resilient Distributed Dataset (RDD) interface, the second kind of low-level API in Spark is two types of “distributed shared variables”: broadcast variables and accumulators. T...
2019-03-04 10:36:00
377
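The data flow of the two shared-variable types can be pictured without a cluster. Below is a plain-Python model, not Spark's API. A read-only lookup set plays the role of a broadcast variable that every task reads. Each task also returns a local count, and the driver merges those counts with addition, the way an accumulator's per-task updates are combined. The partition contents and the helper name `run_task` are hypothetical.

```python
# A plain-Python model of Spark's two shared-variable types (no Spark required).
# Hypothetical partitions standing in for an RDD's partitions.
partitions = [["spark", "", "rdd"], ["", "graphx"], ["sql", ""]]

# "Broadcast variable": a read-only value every task reads but never modifies.
skip_values = frozenset({""})

def run_task(partition, broadcast_value):
    """Process one partition; return the kept records and a task-local counter."""
    kept = [record for record in partition if record not in broadcast_value]
    local_count = len(partition) - len(kept)  # this task's accumulator update
    return kept, local_count

task_results = [run_task(p, skip_values) for p in partitions]

# "Accumulator": the driver merges per-task updates with an associative,
# commutative operation (addition here), so task completion order does not matter.
skipped_total = sum(count for _, count in task_results)
kept_records = [r for kept, _ in task_results for r in kept]

print(skipped_total)  # 3
print(kept_records)   # ['spark', 'rdd', 'graphx', 'sql']
```

In real Spark code the shared value would come from `SparkContext.broadcast` and the counter from an accumulator such as `SparkContext.longAccumulator`. The model above only mirrors how data moves between tasks and the driver.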
[Reposted] Spark: The Definitive Guide, Chapter 13 Notes
This chapter covers the advanced RDD operations and focuses on key–value RDDs, a powerful abstraction for manipulating data. We also touch on some more advanced topics like custom partitioning, a...
2019-03-04 10:03:00
303
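Key-value operations and custom partitioning work together: a partitioner decides which partition each key is shuffled to, and an aggregation then folds each key's values within its partition. The sketch below models that in plain Python, assuming a `reduceByKey`-style sum. The names `reduce_by_key` and `vip_partitioner`, and the rule that pins "vip" keys to partition 0, are hypothetical. Spark's default for key-value RDDs is hash partitioning.

```python
from collections import defaultdict

def reduce_by_key(pairs, func, num_partitions, partitioner):
    """Route (key, value) pairs to partitions, then fold each key's values with func."""
    # Shuffle step: the partitioner decides which partition every key lands in.
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partitioner(key, num_partitions)][key].append(value)

    # Reduce step: each partition folds the values of its own keys independently.
    reduced = []
    for index, partition in enumerate(partitions):
        for key, values in partition.items():
            total = values[0]
            for value in values[1:]:
                total = func(total, value)
            reduced.append((index, key, total))
    return reduced

def vip_partitioner(key, num_partitions):
    """Hypothetical custom partitioner (needs >= 2 partitions): 'vip' keys go to 0."""
    if key.startswith("vip"):
        return 0
    # Deterministic character-sum hash; Python's built-in str hash is salted per process.
    return 1 + sum(map(ord, key)) % (num_partitions - 1)

pairs = [("vip-a", 2), ("bob", 1), ("vip-a", 3), ("dave", 4), ("bob", 5)]
result = reduce_by_key(pairs, lambda a, b: a + b, 3, vip_partitioner)
print(result)  # [(0, 'vip-a', 5), (1, 'dave', 4), (2, 'bob', 6)]
```

Spark's `reduceByKey` also pre-combines values on the map side before the shuffle. This sketch skips that step to keep the partitioning logic visible.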
[Reposted] Spark: The Definitive Guide, Chapter 12 Notes
What Are the Low-Level APIs? There are two sets of low-level APIs: there is one for manipulating distributed data (RDDs), and another for distributing and manipulating distributed shared variable...
2019-02-28 11:24:00
519
[Reposted] Spark: The Definitive Guide, Chapter 11 Notes
Datasets are a strictly Java Virtual Machine (JVM) language feature that work only with Scala and Java. Using Datasets, you can define the object that each row in your Dataset will consist of. In...
2019-02-23 14:51:00
408
[Reposted] Spark: The Definitive Guide, Chapter 10 Notes
What Is SQL? Big Data and SQL: Apache Hive. Big Data and SQL: Spark SQL. The power of Spark SQL derives from several key facts: SQL analysts can now take advantage of Spark’s computation abilities ...
2019-02-23 11:05:00
350
[Reposted] Spark: The Definitive Guide, Chapter 9 Notes
Spark's core data sources: CSV, JSON, Parquet, ORC, JDBC/ODBC connections, plain-text files. The Structure of the Data Sources API. Read API Structure. The core structure for reading data is as follows: Data...
2019-02-23 09:58:00
413
[Reposted] Spark: The Definitive Guide, Chapter 8 Notes
Join Expressions A join brings together two sets of data, the left and the right, by comparing the value of one or more keys of the left and right and evaluating the result of a join expression t...
2019-02-19 12:29:00
237
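A join expression is evaluated for each left/right pair to decide whether the two rows belong together. The plain-Python sketch below assumes an equality join expression and a naive nested-loop strategy. Spark's physical join strategies, such as shuffle-based and broadcast joins, differ. The helper name `equi_join` and the sample rows are hypothetical, and `None` stands in for SQL null in the outer join.

```python
def equi_join(left, right, left_key, right_key, how="inner"):
    """Nested-loop equi-join: pair left and right rows whose join keys are equal.

    how="inner" keeps only matched pairs; how="left_outer" also keeps unmatched
    left rows, paired with None (standing in for SQL null).
    """
    if how not in ("inner", "left_outer"):
        raise ValueError(f"unsupported join type: {how}")
    joined = []
    for l_row in left:
        matches = [r_row for r_row in right if left_key(l_row) == right_key(r_row)]
        joined.extend((l_row, r_row) for r_row in matches)
        if not matches and how == "left_outer":
            joined.append((l_row, None))
    return joined

# Hypothetical data: person = (id, name, program_id), program = (program_id, degree).
person = [(0, "Ann", 0), (1, "Ben", 1), (2, "Cho", 5)]
program = [(0, "Masters"), (1, "Ph.D."), (2, "Masters")]

inner = equi_join(person, program, lambda p: p[2], lambda g: g[0])
left_outer = equi_join(person, program, lambda p: p[2], lambda g: g[0], how="left_outer")
print(inner)       # [((0, 'Ann', 0), (0, 'Masters')), ((1, 'Ben', 1), (1, 'Ph.D.'))]
print(left_outer)  # same two pairs plus ((2, 'Cho', 5), None)
```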
[Reposted] Spark: The Definitive Guide, Chapter 7 Notes
Types of grouping: The simplest grouping is to just summarize a complete DataFrame by performing an aggregation in a select statement. A “group by” allows you to specify one or more keys as well as one or mo...
2019-02-19 11:06:00
256
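The two grouping styles in the excerpt differ in what they summarize. A whole-table aggregation returns one result for the whole dataset. A "group by" returns one result per key, computed with one or more aggregation functions. The plain-Python sketch below models both over hypothetical `(country, quantity)` rows, using sum and count as the aggregation functions.

```python
from collections import defaultdict

# Hypothetical rows: (country, quantity).
rows = [("US", 6), ("UK", 2), ("US", 4), ("FR", 1), ("UK", 3)]

# Simplest grouping: summarize the whole table, like SELECT sum(quantity), count(*).
summary = {"sum": sum(q for _, q in rows), "count": len(rows)}

# "Group by" one key with two aggregation functions (sum and count).
groups = defaultdict(list)
for country, quantity in rows:
    groups[country].append(quantity)
by_country = {c: {"sum": sum(qs), "count": len(qs)} for c, qs in sorted(groups.items())}

print(summary)     # {'sum': 16, 'count': 5}
print(by_country)  # {'FR': {'sum': 1, 'count': 1}, 'UK': {'sum': 5, 'count': 2}, 'US': {'sum': 10, 'count': 2}}
```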
[Reposted] Spark: The Definitive Guide, Chapter 6 Notes
Where to Look for APIs: A DataFrame is essentially a Dataset of type Row, so check https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset regularly to discover API updates. DataFrameStatFunctions and DataFrameNaFunctions...
2019-02-16 12:40:00
220
[Reposted] Spark: The Definitive Guide, Chapter 5 Notes
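A DataFrame consists of a series of records, each of type Row. Columns represent computation expressions that can be performed on each individual record. A schema defines the name and data type of each column. Partitioning defines the physical distribution of a DataFrame or Dataset across the cluster. Schemas: you can let the data source define the schema (known as schema-on-read) or define it explicitly yourself. Warning: schema-on-read can cause precision issues; when using Spark for ET...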
2019-02-14 16:58:00
226
[Reposted] subgraph Example
import org.apache.spark._ import org.apache.spark.graphx._ import org.apache.spark.rdd.RDD val users: RDD[(VertexId, (String, String))] = sc.parallelize(Array( (3L, ("rxin", "student...
2018-12-20 11:40:00
578
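The `subgraph` operator keeps the vertices that satisfy a vertex predicate. It then keeps the edges that satisfy an edge predicate, but only when both of their endpoints survive. The plain-Python sketch below models those semantics. Apart from the `rxin` vertex visible in the excerpt, the vertices and edges are hypothetical. Vertex 0 carries a "Missing" default attribute, similar to what GraphX assigns to vertex ids that appear only in edges. GraphX's edge predicate receives an `EdgeTriplet`; here it is simplified to `(src, dst, attr)`.

```python
# Vertices: id -> (name, role). Vertex 0 holds a default "Missing" attribute.
vertices = {
    3: ("rxin", "student"),
    7: ("jgonzal", "postdoc"),
    5: ("franklin", "prof"),
    0: ("John Doe", "Missing"),
}
edges = [(3, 7, "collab"), (5, 3, "advisor"), (5, 7, "pi"), (5, 0, "colleague")]

def subgraph(vertices, edges, vpred=lambda vid, attr: True,
             epred=lambda src, dst, attr: True):
    """Keep vertices passing vpred, then edges passing epred whose endpoints both survive."""
    kept_vertices = {vid: attr for vid, attr in vertices.items() if vpred(vid, attr)}
    kept_edges = [
        (src, dst, attr)
        for src, dst, attr in edges
        if src in kept_vertices and dst in kept_vertices and epred(src, dst, attr)
    ]
    return kept_vertices, kept_edges

# Drop vertices whose role is "Missing"; edges touching them disappear too.
valid_vertices, valid_edges = subgraph(vertices, edges,
                                       vpred=lambda vid, attr: attr[1] != "Missing")
print(sorted(valid_vertices))  # [3, 5, 7]
print(valid_edges)             # [(3, 7, 'collab'), (5, 3, 'advisor'), (5, 7, 'pi')]
```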
[Reposted] Learning Spark GraphX
import org.apache.spark._ import org.apache.spark.graphx._ import org.apache.spark.rdd.RDD val userGraph: Graph[(String, String), String] Name: Compile Error Message: <console>:30: error: ...
2018-12-20 11:07:00
170
[Reposted] [Quick Note] Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/doc...
https://www.cnblogs.com/informatics/p/8276172.html Reposted from: https://www.cnblogs.com/DataNerd/p/9154489.html
2018-06-08 10:54:00
119
[Reposted] [Quick Note] python: symbol lookup error: /usr/lib/x86_64-linux-gnu/libatk-1.0.so.0: undefined symbol: g_log_...
python: symbol lookup error: /usr/lib/x86_64-linux-gnu/libatk-1.0.so.0: undefined symbol: g_log_structured_standard https://packages.debian.org/sid/amd64/libatk1.0-0/download sudo dpkg -i *.deb ...
2018-05-27 01:43:00
2361
[Reposted] Programming Hive, Section 14.3: Why the Projection Transformation (TRANSFORM) Exercise Failed
While working through the projection transformation example in Section 14.3, running the SQL statement hive (default)> SELECT TRANSFORM(col1, col2) USING '/bin/cut -f1' AS newA, newB FROM a; produced this error: Ended Job = job_local1231989520_0004 with errors Error during job, obtainin...
2018-05-03 22:48:00
730
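The excerpt is cut off before the author's diagnosis, so the cause of the error is not reproduced here. What TRANSFORM itself does can be modeled in plain Python. Hive streams each row to the script's stdin as tab-separated text and splits each line of the script's stdout on tabs into the `AS` columns. The sketch below assumes that missing trailing columns come back as NULL (modeled as `None`). The helpers `cut_f1` and `transform` and the contents of table `a` are hypothetical.

```python
import io

def cut_f1(stdin, stdout):
    """Mimic `cut -f1`: write the first tab-separated field of each input line."""
    for line in stdin:
        stdout.write(line.rstrip("\n").split("\t", 1)[0] + "\n")

def transform(rows, script, out_columns):
    """Model of Hive TRANSFORM: stream rows to a script as TSV, parse its output lines."""
    stdin = io.StringIO("".join("\t".join(map(str, row)) + "\n" for row in rows))
    stdout = io.StringIO()
    script(stdin, stdout)
    results = []
    for line in stdout.getvalue().splitlines():
        fields = line.split("\t")
        # Assumed behavior: missing trailing columns become None (standing in for NULL).
        fields += [None] * (len(out_columns) - len(fields))
        results.append(dict(zip(out_columns, fields)))
    return results

rows = [(4, 5), (3, 2)]  # hypothetical contents of table a (col1, col2)
result = transform(rows, cut_f1, ["newA", "newB"])
print(result)  # [{'newA': '4', 'newB': None}, {'newA': '3', 'newB': None}]
```

Because `cut -f1` emits a single field, only `newA` receives data in this model. Values also come back as strings, since the exchange with the script is plain text.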
[Reposted] Maven: No sources to compile
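Symptom: when running the package command with Maven, I got the message "No sources to compile", and the generated jar contained no class files. Cause: the project was not created with Maven, so its directory structure was wrong. Solution: create the project with Maven so that it generates the correct directory structure. Reference: https://stackoverflow.com/questions/27897104/maven-no-sources-...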
2018-05-01 00:44:00
1424
[Reposted] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.4:single (make-assem...
While running package with Maven, I got [ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.4:single (make-assembly) on project hive-udf: Error reading assemblies: No assembly descri...
2018-04-30 19:42:00
3028
[Reposted] Handling Maven errors like Invalid packaging for parent pom.xml, must be _pom_ but is _xxx...
While running clean with Maven, I got the error Invalid packaging for parent pom.xml, must be pom but is _jar. I found a similar question on Stack Overflow: https://stackoverflow.com/questions/13330930/invalid-packaging-for-parent-pom-xml...
2018-04-30 18:54:00
2008
[Reposted] Installing and Configuring a Hadoop Environment from Scratch with Docker
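Install and configure CentOS. Download the Docker image: docker pull centos (reference: https://hub.docker.com/_/centos/). Start the Docker image and do the necessary configuration: ifconfig, ssh-server, wget. Problem: /usr/sbin/sshd -D fails to run. Workaround: skipped. Reference: https://blog.csdn.net/m...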
2018-04-12 07:08:00
85
[Reposted] Using Date as a DataFrame Column in Spark 2.1.0
Reference: How to store custom objects in Dataset? Reposted from: https://www.cnblogs.com/DataNerd/p/8684613.html
2018-03-31 22:50:00
172
[Reposted] Some Code I've Written
GitHub repository of the practice code I wrote while studying from books. Reposted from: https://www.cnblogs.com/DataNerd/p/8680703.html
2018-03-31 03:08:00
106
[Reposted] ScipyLectures-simple Study Notes
Chapter 1: the common magic functions in 1.4.3. Chapter 2: String repetition >>> 2*b 'hellohello' Type conversion >>> float(1) 1.0 Note: integer division differs between Python 2 and Python 3 # Python 2 >>> 3 / 2 1 # Python...
2017-12-05 19:01:00
151
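The snippets in these notes can be checked in Python 3 directly. The sketch below assumes `b = "hello"`, which the `'hellohello'` output implies. It adds floor division `//`, the operator that keeps the old integer-division result in Python 3.

```python
# Python 3 behavior of the snippets in the notes above.
b = "hello"  # assumed value, implied by the 'hellohello' output

assert 2 * b == "hellohello"  # repetition: sequence * int
assert float(1) == 1.0        # explicit type conversion
assert 3 / 2 == 1.5           # Python 3: / is true division (Python 2 gave 1 for ints)
assert 3 // 2 == 1            # floor division: same result in Python 2 and 3
assert -3 // 2 == -2          # // rounds toward negative infinity, not toward zero
```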
[Reposted] Machine Learning 1: One Month, 2017/11/24-2017/12/24
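Machine Learning (Andrew Ng, Coursera); Advanced Mathematics, Vol. 1; Advanced Mathematics, Vol. 2; Linear Algebra; Probability Theory and Mathematical Statistics; Introduction to Optimization; Machine Learning Foundations; Machine Learning Techniques. Reposted from: https://www.cnblogs.com/DataNerd/p/7890983.html...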
2017-11-24 15:52:00
135
[Reposted] Machine Learning Course MATLAB Exercises
Columns 6557 through 6560 -7.6419 -0.3008 -6.2724 -4.7964 Columns 6561 through 6564 -7.1002 -4.3957 -9.8648 -5.9318 Columns 6565 through 6568 -8.8009 -6.6060 ...
2017-11-15 13:50:00
1670
[Reposted] Tuesday, November 14, 2017
Tuesday, November 14, 2017. TODO: Machine Learning (Andrew Ng) 2, 3, 4, 5, 6, 7, 8, 9. Done: Machine Learning (Andrew Ng) 2-7, Machine Learning (Andrew Ng)...
2017-11-15 00:39:00
250
[Reposted] About Me
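An incomplete record of books I've read, videos I've watched, and MOOCs I've taken; environment setup & configuration & etc.; code I've read; some code I've written. Contact me by email: bruce-du@hotmail.com Reposted from: https://www.cnblogs.com/DataNerd/p/7831732.html...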
2017-11-14 11:41:00
115