wengyupeng - CSDN Blog

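[Original] Linux Vim: Python auto-completion and code hints

Pydiction lets vi/vim auto-complete Python code with the Tab key. Pydiction has no dependencies and consists mainly of three files. 1. Download Pydiction: cd ~/.vim/bundle, then git clone https://github.com/rkulla/pydiction.git 2. Configure Pydiction #- UNIX/LINUX/O...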

2019-09-02 13:52:27 4308

Data Warehouse Architecture: Traditional vs. Cloud. Data warehouse architecture is changing. Learn about traditional EDW vs. cloud-based architectures with lower upfront cost, improved scalability and...

2019-07-30 15:44:49 3237

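[Original] Saving data from PySpark to MongoDB via mongo-hadoop (pymongo_spark)

I. Background: PySpark to connect to MongoDB via mongo-hadoop. II. Configuration steps (adjust the versions as needed; spark-2.4.3, hadoop2.7, Scala2.11): 1. # Get the MongoDB Java Driver # PROJECT_HOME is a user-defined project root directory that holds spark and so on. mkdir -p $PR...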

2019-07-12 17:58:16 1225

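[Repost] Jupyter Notebook: font settings & code auto-completion

Author: 湫兮. Source: https://www.cnblogs.com/qiuxirufeng/p/9609031.html 1. First, download the theme from the command line: pip install --no-dependencies jupyterthemes==0.18.2 2. Once it is installed, some machines may report that lesscpy is missing; install that with pip as well: pip install lesscpy...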

2019-07-12 16:55:14 3301

[Original] Python (pyspark) only supports DataFrames and not RDDs

I. Background: using the Mongo Spark Connector to connect Python (pyspark) and MongoDB. II. Problem: the following error is reported: Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.saveAsNewAPIHadoopFile.: java.l...

2019-07-12 16:34:16 291 1

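[Original] Docker Jenkins connecting to an agent: SSH key presented by the remote host does not match the key saved in the Know

I. Background: Docker is installed on physical machine A, and a Jenkins container runs in Docker. Jenkins must be configured to connect to physical machine A as agent A (because some jobs have to run on machine A). II. Problem: when configuring the agent under Jenkins -> Nodes with "launch slave agents via SSH" selected: [07/10/19 09:35:19] ...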

2019-07-10 18:03:01 1955

[Original] Python in worker has different version 2.7 than that in driver 3.6

I. Problem: Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIV...

2019-06-05 10:21:12 2433
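
For the version-mismatch error in the entry above, one common remedy is to point the driver and the workers at the same interpreter before any Spark session is created. The following is a minimal sketch, not the post's own fix (the excerpt is cut off before the solution). It assumes the interpreter path you pass also exists on every worker; the helper name `pin_pyspark_python` is hypothetical.

```python
import os
import sys


def pin_pyspark_python(interpreter: str = sys.executable) -> dict:
    """Set both PySpark interpreter variables to the same Python binary.

    Assumption: `interpreter` is a path that is valid on the driver and on
    every worker. Call this before creating a SparkContext/SparkSession,
    because the variables are read when the Python workers are launched.
    """
    os.environ["PYSPARK_PYTHON"] = interpreter          # used by the workers
    os.environ["PYSPARK_DRIVER_PYTHON"] = interpreter   # used by the driver
    return {
        name: os.environ[name]
        for name in ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON")
    }
```

Calling `pin_pyspark_python()` with no argument reuses the interpreter that is running the driver script, which is usually the simplest way to keep the two versions in step on a single machine.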

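[Original] Syntax highlighting and auto-completion for python/PySpark code in the shell

1. Usage: 1. pip install "ptpython==0.41" # The default latest version, ptpython-2.0.4, has a bug: auto-completion does not work. 0.41 works fine. # The pip version (pip --version) determines whether the library is installed for Python 2 or Python 3. 2. export PYSPARK_DRIVER_PYTHON=ptpython;...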

2019-06-04 16:56:23 1316

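[Original] spark: Tab auto-completion for Python in the pyspark shell

1. Create .pythonrc in the home directory: ~> vi .pythonrc import rlcompleter, readline readline.parse_and_bind('tab: complete') 2. Add .pythonrc to the shell's startup file. Check which shell you are using. If it is csh, the startup file is .cshr...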

2019-06-04 16:23:22 775

[Original] Alpine pyspark ModuleNotFoundError: No module named 'zlib'

I. Problem (after installing Spark on Alpine and running pyspark): bash-4.4# pyspark Python 3.6.5 (default, May 30 2019, 09:48:14) [GCC 6.4.0] on linux Type "help", "copyright", "credits" or "license" for more information. Traceb...

2019-05-31 16:24:10 7234

[Original] Installing Python 3.6 on Alpine Linux

1: Download. Code: $ wget https://www.python.org/ftp/python/3.6.5/Python-3.6.5.tgz 2: Prepare the environment. // Run gcc --version to check whether GCC is installed; "bash: gcc: command not found" means it is not. apk add build-base // install GCC on...

2019-05-31 16:09:28 14124 3

[Original] docker bash: vi: command not found

1. Problem: vi /tmp/root/hive.log bash: vi: command not found 2. Solution: # cat /etc/issue Debian GNU/Linux 8 # apt-get update # apt-get install vim

2019-05-23 17:59:12 939

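[Original] Java: What is CAS? A plain explanation

Java's concurrency mechanisms implement atomic operations in two ways: locks and CAS. CAS stands for Compare And Swap. Many classes in java.util.concurrent.atomic (such as AtomicInteger, AtomicBoolean and AtomicLong) use CAS. I. Example: the CAS mechanism uses three basic operands: the memory address V, the old...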

2019-05-15 17:25:24 14852 4
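
The compare-and-swap idea in the entry above can be sketched in Python. This is a toy illustration only: Python has no user-level CAS instruction, so the class below (a hypothetical `AtomicInt`, loosely modelled on Java's AtomicInteger) emulates the atomicity of the hardware step with a lock. The three operands are assumed to be the usual ones: the stored value, the expected old value, and the new value.

```python
import threading


class AtomicInt:
    """Toy integer cell with a compare-and-set operation (illustrative only)."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        # The lock stands in for the atomicity a real CAS instruction provides.
        self._lock = threading.Lock()

    def get(self) -> int:
        return self._value

    def compare_and_set(self, expected: int, new: int) -> bool:
        """Store `new` only if the current value still equals `expected`."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    def increment_and_get(self) -> int:
        """Retry loop: re-read and try again whenever another thread won."""
        while True:
            current = self.get()
            if self.compare_and_set(current, current + 1):
                return current + 1
```

The retry loop in `increment_and_get` is the part that matters: a failed `compare_and_set` means another thread changed the value in between, so the caller re-reads and tries again instead of blocking.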

[Original] FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

1. Problem: hive> LOAD DATA LOCAL INPATH '/root/data/cities.csv' OVERWRITE INTO TABLE cities; Loading data to table default.cities Failed with exception Unable to move source file:/root/data/cities.cs...

2019-05-13 18:28:24 15251 2

[Original] Cannot create directory /tmp/hive/root/xxx. Name node is in safe mode

1. Problem: Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hive/root/153df88d-1ef5-401b-bd81-d3026412e732. Name ...

2019-05-13 18:17:55 699

[Original] hive: command not found in ubuntu / hdfs: command not found / hadoop: command not found

1. Problem: hive:command not found in ubuntu; hdfs: command not found; hadoop:command not found 2. Solution: # echo $SHELL /bin/bash # Add the variables shown in red to .bashrc: vi ~/.bashrc export HIVE_HOME=/usr/local/...

2019-05-13 18:12:34 950

[Original] Caused by: java.util.zip.ZipException: invalid LOC header (bad signature)

1. Error: Exception in thread "main" java.lang.IllegalStateException: Failed to read Class-Path attribute from manifest of jar file:/C:/Users/.m2/repository/com/sun/jersey/jersey-client/1.19.1/jersey-cl...

2019-02-26 01:18:27 1727

[Original] pull access denied for frolvlad/alpine-oraclejdk8, repository does not exist or may require '

I. Problem: Step 1/6 : FROM frolvlad/alpine-oraclejdk8:slim ERROR: Service 'config-server' failed to build: pull access denied for frolvlad/alpine-oraclejdk8, repository does not exist or may require 'dock...

2019-02-23 22:17:22 7266 1

[Repost] Advantages of Binary Search Tree over Hash Table

1. Binary Search Trees (reference-based) are memory-efficient. They do not reserve more memory than they need to. For instance, if a hash function has a range R(h) = 0...100, then you need to a...

2018-12-17 22:46:11 511

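[Repost] Illustrated: building a heap

Reposted from: https://blog.csdn.net/u013254061/article/details/52514599 I. Heap: the heap data structure is an array object that can be viewed as a complete binary tree. Its defining property is that each parent node's value is greater than (or less than) the values of its two children (called a max-heap and a min-heap, respectively). II. Building a heap: given n numbers, build the heap starting from node n/2 and working back to the first node. Example: given the array {5,23,3...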

2018-12-17 12:38:49 2526
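
The construction described in the entry above (start at node n/2 and work back to the first node) can be sketched as follows. This is a minimal max-heap version using 0-based array indices, so the last parent is at index n // 2 - 1. The function names are my own, and the sample data in the usage note is made up because the post's example array is cut off.

```python
def sift_down(a: list, i: int, n: int) -> None:
    """Move a[i] down until neither child (within a[:n]) is larger."""
    while True:
        largest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:
            return  # heap property holds at this node
        a[i], a[largest] = a[largest], a[i]
        i = largest  # continue with the subtree we just disturbed


def build_max_heap(a: list) -> list:
    """Heapify in place: visit the parents from the last one back to the root."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
    return a
```

For example, `build_max_heap([4, 10, 3, 5, 1, 8])` rearranges the list so that every parent is at least as large as its children. Leaves are skipped because a single node is already a valid heap.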

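[Original] Asymptotic analysis, the mathematical definition of Big O, and time complexity

I. What is asymptotic analysis? Suppose there are two algorithms for the same task; how do we find out which one is better? A simple approach is to implement both algorithms as programs, feed them different inputs, run both programs on your computer, and see which one takes less time. Analyzing algorithms this way has many problems. 1. For some inputs the first may perform better; for others the second may. 2. For some inputs the first algorithm performs better on one machine, while the other algorithm performs better on a different machine. Asymptotic...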

2018-12-16 19:23:23 6853 1
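
The entry above is cut off before its definition of Big O, so the following is the standard textbook statement rather than the post's own wording. The symbols f, g, c and n_0 are the conventional ones and are assumed here.

```latex
f(n) = O\bigl(g(n)\bigr)
\iff
\exists\, c > 0,\ \exists\, n_0 \ge 0 \ \text{such that}\quad
0 \le f(n) \le c \cdot g(n) \quad \text{for all } n \ge n_0 .

% Worked example: 3n^2 + 5n = O(n^2), taking c = 4 and n_0 = 5,
% because 3n^2 + 5n \le 4n^2 \iff 5n \le n^2 \iff n \ge 5 .
```

The constants c and n_0 are what make the comparison independent of a particular machine or a particular small input, which addresses the two problems the entry raises about timing programs directly.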

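[Original] Illustrated: topological sort

I. What is a topological sort? (Figure: a topological sort.) A topological sort is a linear ordering: if the graph contains a directed edge from u to v, then u must appear before v in the ordering. Wikipedia's definition of a topological sort: a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such th...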

2018-12-14 17:16:27 5965
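
The ordering rule in the entry above (u comes before v for every edge u -> v) can be sketched with Kahn's algorithm. This is one common way to compute a topological order and may not be the method the post itself uses. Vertices are assumed to be numbered 0..n-1, and the function name is my own.

```python
from collections import deque


def topological_sort(num_vertices: int, edges: list) -> list:
    """Return one topological order of a directed graph (Kahn's algorithm).

    `edges` is a list of (u, v) pairs meaning "u must come before v".
    Raises ValueError if the graph contains a cycle.
    """
    adjacency = [[] for _ in range(num_vertices)]
    indegree = [0] * num_vertices
    for u, v in edges:
        adjacency[u].append(v)
        indegree[v] += 1

    # Vertices with no incoming edges can be placed first.
    ready = deque(v for v in range(num_vertices) if indegree[v] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in adjacency[u]:
            indegree[v] -= 1          # "remove" the edge u -> v
            if indegree[v] == 0:
                ready.append(v)

    if len(order) != num_vertices:
        raise ValueError("graph has a cycle; no topological order exists")
    return order
```

A graph can have several valid orders; this sketch returns whichever one the queue produces. The cycle check matters because a topological order exists only for directed acyclic graphs.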

[Repost] Illustrated: Dijkstra's Algorithm (finding shortest paths)

Reposted from: http://www.mathcs.emory.edu/~cheung/Courses/171/Syllabus/11-Graph/dijkstra2.html I. Dijkstra illustrated: Before showing you the pseudo code, I will first execute the Dijkstra's Algorithm on the followin...

2018-12-14 10:10:46 2741

[Repost] Illustrated: in-place Heap Sort

Reposted from: http://www.mathcs.emory.edu/~cheung/Courses/171/Syllabus/9-BinTree/heap-sort2.html Short-coming of the previous "simple" Heap Sort algorithm Consider the (previously discussed) simple Hea...

2018-12-04 18:10:50 1020

[Repost] Illustrated: deleting a node from a heap

Reposted from: http://www.mathcs.emory.edu/~cheung/Courses/171/Syllabus/9-BinTree/heap-delete.html Deleting a specific node from a Heap Problem description: You are given a heap...

2018-12-04 18:06:34 2362 1

[Repost] Illustrated: inserting a node into a heap

Reposted from: http://www.mathcs.emory.edu/~cheung/Courses/171/Syllabus/9-BinTree/heap-insert.html Inserting a value into a Heap Problem description: You are given a heap. F...

2018-12-04 18:04:35 793

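[Original] Quick select: solving the Top K problem

1. Idea: the Quick select algorithm is typically used to find the k-th smallest or k-th largest element in an unsorted array. Quick select is similar to Quick sort; its core is partition. 1. What is partition? (In the figure, 44 is chosen as the pivot and the array is split into two parts: elements smaller than 44 on the left, larger on the right.) Choose one element of the array as the pivot and, according to how each element compares with that pivot, partition the whole...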

2018-12-03 18:07:49 8435
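
The partition-based search described in the entry above can be sketched as follows. This is a minimal version under my own assumptions: k is 1-based, the pivot is chosen at random, and the input is copied so the caller's list is left untouched. The post's own code is not visible in the excerpt.

```python
import random


def quick_select(items: list, k: int):
    """Return the k-th smallest element of `items` (k = 1 is the minimum)."""
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    a = list(items)
    lo, hi = 0, len(a) - 1
    target = k - 1  # index the answer would have in sorted order

    while True:
        # Partition a[lo..hi] around a randomly chosen pivot (Lomuto scheme).
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot = a[hi]
        store = lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]  # pivot is now in its final position

        if store == target:
            return a[store]
        if target < store:
            hi = store - 1   # answer lies in the left part
        else:
            lo = store + 1   # answer lies in the right part
```

Unlike Quick sort, only one side of each partition is examined further, which is why the expected running time is linear. For the k-th largest element, call `quick_select(items, len(items) - k + 1)`.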

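[Original] How many ways are there to place M apples on N plates?

1. Problem: place M identical apples on N identical plates, with empty plates allowed. How many ways are there? 2. Analysis: let f(m,n) denote the number of ways to place m apples on n plates, and consider the cases in turn. When m<n, at least n-m plates are empty (these empty plates do not affect the final count, since every arrangement includes them), so it is enough to consider m apples on m plates: f(m,n)=f(m,m). When m>n, split by whether any plate is empty ...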

2018-12-03 11:33:54 8505 1
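
The case analysis in the entry above can be turned into a short recursive function. The excerpt is cut off before the second case is finished, so this sketch assumes the standard recurrence for this problem: if at least one plate is empty, remove it (f(m, n-1)); otherwise take one apple off every plate (f(m-n, n)). The function name and base cases are my own.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_ways(m: int, n: int) -> int:
    """Ways to put m identical apples on n identical plates (empties allowed)."""
    if m == 0:
        return 1          # one way: every plate stays empty
    if n == 0:
        return 0          # apples left over but no plate to hold them
    if m < n:
        return count_ways(m, m)   # the extra plates are always empty
    # Either some plate is empty, or every plate holds at least one apple.
    return count_ways(m, n - 1) + count_ways(m - n, n)
```

For instance, `count_ways(7, 3)` counts the partitions of 7 into at most three parts (7; 6+1; 5+2; 4+3; 5+1+1; 4+2+1; 3+3+1; 3+2+2). Memoizing with `lru_cache` avoids recomputing the overlapping subproblems.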

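[Original] 4. Illustrated: Binary insertion sort

1. Idea: binary insertion sort is an improvement on straight insertion sort. Straight insertion sort repeatedly inserts each element into the already-sorted sequence in front of it. Because that front part is already sorted, there is no need to compare element by element to find the insertion point; binary search is used instead to find it faster. 2. Illustrated: binary search. Use binary search to insert the key 7 into numbersArray. The search proceeds as shown in the figure. int[] numbersArray = {1,3...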

2018-11-29 16:47:28 1941 1
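
The idea in the entry above (find the insertion point by binary search instead of a linear scan) can be sketched as follows. The post's own example is in Java and is cut off, so this is an independent Python version with my own function name; it sorts a copy and returns it.

```python
def binary_insertion_sort(values: list) -> list:
    """Insertion sort that locates each insertion point by binary search."""
    a = list(values)
    for i in range(1, len(a)):
        key = a[i]
        # Binary search in the sorted prefix a[0:i] for the first slot
        # whose element is greater than key (keeps equal keys in order).
        lo, hi = 0, i
        while lo < hi:
            mid = (lo + hi) // 2
            if a[mid] <= key:
                lo = mid + 1
            else:
                hi = mid
        # Shift a[lo:i] one place to the right and drop key into the gap.
        a[lo + 1:i + 1] = a[lo:i]
        a[lo] = key
    return a
```

Binary search cuts the number of comparisons per element to O(log n), but the elements still have to be shifted, so the overall worst case stays O(n^2).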

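[Original] Catalan number

1. What are Catalan numbers? The Catalan numbers are a sequence in combinatorics that turns up in many counting problems. The formula is: C_n = C(2n, n) / (n + 1). The first few terms (n=0,1,2,3,4,5) are: 1, 1, 2, 5, 14, 42. For n=3, C_3 = 5. 2. Applications. 2.1 Elements are pushed onto a stack in the order 1, 2, 3, ..., n. How many different pop orders are there? When n=3, there are the following 5 different push/pop sequences ...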

2018-11-23 11:18:25 350
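
The terms listed in the entry above can be checked with a few lines of Python. The closed form used here, C_n = C(2n, n) / (n + 1), is the standard one and is assumed, since the formula did not survive in the excerpt. The second function is a hypothetical brute-force counter for the stack problem in section 2.1; `math.comb` needs Python 3.8 or later.

```python
from math import comb


def catalan(n: int) -> int:
    """n-th Catalan number via the closed form C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def count_pop_orders(remaining: int, on_stack: int = 0) -> int:
    """Count the pop orders when `remaining` items are still to be pushed.

    At each step we may push the next item (if any remain) or pop the top
    of the stack (if it is not empty).
    """
    if remaining == 0:
        return 1  # only pops are left, and their order is forced
    total = count_pop_orders(remaining - 1, on_stack + 1)      # push
    if on_stack > 0:
        total += count_pop_orders(remaining, on_stack - 1)     # pop
    return total
```

The brute-force count agrees with the closed form for small n, which is a quick way to convince yourself that the stack problem really is counted by the Catalan numbers.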

[Translation] 3. Illustrated: what is a heap

1. Complete binary tree: Complete binary tree = a binary tree where all levels, except possibly the last level are completely filled with nodes Furthermore: the last level has all its n...

2018-11-22 16:05:58 351

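[Original] 2. Illustrated: Selection sort

1. Idea: in each pass, select the smallest (or largest) element from the elements still to be sorted and place it at the start of the sequence. 2. Illustrated process: each pass picks the smallest. Example 1: Example 2: 3. Code in Java: public class SelectionSort { public static void selectionSort(int[] a) { int n =...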

2018-11-19 20:30:23 200
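
The pass-by-pass selection described in the entry above can be sketched in Python. The post's own implementation is in Java and is cut off in the excerpt, so this is an independent version, not a translation of it; it sorts a copy in ascending order.

```python
def selection_sort(values: list) -> list:
    """Each pass moves the smallest remaining element to the front."""
    a = list(values)
    n = len(a)
    for i in range(n - 1):
        smallest = i
        # Find the index of the smallest element in the unsorted part a[i:].
        for j in range(i + 1, n):
            if a[j] < a[smallest]:
                smallest = j
        # Place it at the start of the unsorted part.
        a[i], a[smallest] = a[smallest], a[i]
    return a
```

Selection sort always performs about n^2 / 2 comparisons regardless of the input order, but at most n - 1 swaps.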

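[Original] 1. Illustrated: Straight insertion sort

1. Idea: straight insertion sort is the process of slotting cards into your hand as you draw them. (Most people arrange them in ascending order.) Idea: insert each item into its proper position in the sorted sequence according to its value. 2. Illustrated process: Example 1, Example 2. 3. Algorithm analysis: sort category, sort method, time complexity, space complexity ...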

2018-11-19 14:22:47 422
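
The card-sorting picture in the entry above maps directly onto code. This is a minimal Python sketch with my own function name; it sorts a copy in ascending order.

```python
def insertion_sort(values: list) -> list:
    """Insert each element into the sorted prefix to its left."""
    a = list(values)
    for i in range(1, len(a)):
        key = a[i]          # the "card" just drawn
        j = i - 1
        # Shift larger elements one place right to open a gap for key.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a
```

On input that is already sorted the inner loop never runs, so the best case is O(n); the worst case (reverse order) is O(n^2), and the sort uses O(1) extra space when done in place.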

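[Repost] Precision, Recall and F1-measure

Clear and easy to follow, so reposting it here. Reposted from: http://www.cnblogs.com/bluepoint2009/archive/2012/09/18/precision-recall-f_measures.html These measures are used frequently in information retrieval and natural language processing; a brief introduction follows. 1. Precision and recall (Precision & Recall): first look at the figure below to deepen your understanding of the concepts, then...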

2018-07-28 13:56:54 667
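
The three measures named in the entry above can be computed from raw counts. This sketch uses the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean of the two); the excerpt is cut off before its own formulas, so the function name, parameter names, and the choice to return 0.0 for an empty denominator are my assumptions.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # share of retrieved items that are relevant
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # share of relevant items that were retrieved
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, a system that returns 10 documents of which 8 are relevant, while 16 relevant documents exist in total, has tp=8, fp=2, fn=8, giving a precision of 0.8 and a recall of 0.5.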

[Original] SSH Connection closed by window cygwin

1. Background: user A runs ssh from Linux to a Windows Server 2008 machine (with cygwin installed and SSH configured) and gets the following problem: --------------------------------------- ssh window64.com.cn Connection closed by 01.10.10.10 // or it prompts for a password: Permission denied, please tr...

2018-06-28 18:15:58 503 1

[Original] linux shell: assign command output to variable

1. Commands: 1.1 Backticks `` (the ~ key above Tab): ~ a=`echo "hello world"` ~ echo $a hello world 1.2 $(): a=$(echo "hello world") echo $a hello world 2. Example: extract ...

2018-04-16 18:19:22 10502

[Original] jpaVendorAdapter....NoClassDefFoundError: org/hibernate/HibernateException in SpringBoot

1. Error: the maven build of a SpringBoot project reports: 'jpaVendorAdapter' threw exception; nested exception is java.lang.NoClassDefFoundError: org/hibernate/HibernateException 2. Solution =========

2018-01-27 12:09:54 3516