Flink CEP example (runnable)

package com.cep
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.cep.PatternSelectFunction
import org.apache.flink.cep.pattern.conditions.SimpleCondition
import org.apache.flink.cep.scala.pattern.Pattern
import ...

2022-02-25

Scala SDK plugin download location: scala-intellij-bin**.zip

Scala SDK plugin download location, all versions. Adds support for the Scala language. The following features are available for free with IntelliJ IDEA Community Edition: Coding assistance (highlighting, completion, formatting, refactorings, etc.) Navigation, search, informat...

2021-07-09

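Hive: Chinese characters are garbled in query output when using the MapReduce execution engine

After configuring MultiDelimitSerDe and creating a Hive table with a multi-character delimiter, select * from tab1 displays Chinese characters correctly, but select s2,substr(s2,3) from db_mul.multi_delimiter_test, which is processed by the MR engine, returns garbled results. The table creation statement is: create table db_mul.multi_delimiter_test( s1 string, s2 string, s3 string)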

2020-11-24

Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found

To resolve this issue, do the following:
Option 1: For clusters without Sentry: Manually add the jar from Hive/Beeline before running the query:
ADD JAR /opt/cloudera/parcels/CDH

2020-11-24

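Changing the Kudu file descriptor limit

Kudu's file descriptor count exceeded the threshold. Kudu's default open-file limit is 32768. Cloudera's limits configuration file is under /etc/security/limits.d/ and sets the limit to 32768: /etc/security/limits.d/cloudera-scm.conf. The change: this 32768 value overrides the system configuration, so every process started by CM has a maximum of 32768 open files. To change this setting, you need to modify CM...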

2019-11-26
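
The limit a process actually runs with can be checked from inside the process. The following is a minimal Python sketch, not part of the original post: it uses the standard-library `resource` module (Unix-only) and inspects the current Python process rather than Kudu itself. The constant `POST_LIMIT` and the helper names are illustrative assumptions; only the value 32768 comes from the post.

```python
import resource  # standard library, Unix-only

# Assumption: 32768 is the cap quoted in the post; used here only for comparison.
POST_LIMIT = 32768


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value, mapping RLIM_INFINITY to the word 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft nofile limit: {describe(soft)}")
    print(f"hard nofile limit: {describe(hard)}")
    if soft != resource.RLIM_INFINITY and soft <= POST_LIMIT:
        print(f"soft limit is at or below {POST_LIMIT}")
```

Running this under the same account and launcher as the service gives a rough idea of which limits file is taking effect.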

Idea Error:java: Compilation failed: internal java compiler error

The fix is simple: File --> Settings... --> Build, Execution, Deployment --> Compiler --> Java Compiler, then set the target bytecode version of the affected module to an appropriate version...

2019-11-26

HUE middleware INFO Processing exception: StandbyException: Operation category READ is not supported

Cause: the active node of the HDFS high-availability (HA) pair changed, but the HUE HDFS web URL did not, so the URL now points at a NameNode that is the standby NameNode, which is why the problem...

2019-11-21

Keras upgrade command

pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps

2019-01-19

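CDH5: every role on one node reports "The health of this role's host is concerning. The following health tests are concerning: network interface speed." Check whether it is a network problem

Resolution: 1. Investigation showed it was not a network or NIC problem. 2. Checked the firewall status (OS: RHEL 7.3) and found the firewall was running:
# systemctl status firewalld
● firewalld.service - firewalld - dynamic...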

2018-09-21

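PR curve, ROC curve, AUC and related metrics; Accuracy vs Precision

Confusion matrix. PR (Precision-Recall) curve: this probably originates from relevance evaluation in information retrieval. Precision is the fraction of the retrieved results that are relevant; recall is the fraction of all relevant results in the database that appear among the retrieved results. So to plot a PR curve, you can first sort by the decision value, which can then be used as a rank, and then take the classification problem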

2018-02-02
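
The ranking idea in the excerpt above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the original post: the function name `pr_curve`, the 0/1 label encoding, and the sample scores are all assumptions. It sorts items by decision score, then treats the top k items as "retrieved" for each k and records recall and precision at that cutoff.

```python
from typing import List, Tuple


def pr_curve(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (recall, precision) points, one per cutoff k in the ranked list.

    labels are assumed to be 1 for relevant/positive and 0 otherwise.
    Tied scores are not grouped; each item gets its own cutoff.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Highest decision score first, so the score acts as a rank.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = []
    true_pos = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        recall = true_pos / total_pos   # relevant retrieved / all relevant
        precision = true_pos / k        # relevant retrieved / all retrieved
        points.append((recall, precision))
    return points


if __name__ == "__main__":
    # Hypothetical data: four items, two of them relevant.
    for recall, precision in pr_curve([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]):
        print(f"recall={recall:.2f} precision={precision:.2f}")
```

Plotting the returned points with recall on the x-axis and precision on the y-axis gives the PR curve.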

java.lang.RuntimeException: java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoNodeE

I wrote a program integrating Storm with Kafka, using the data consumed by KafkaSpout as Storm's data source. Running it fails with the following error: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brok

2018-01-10

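Upgrading GCC/G++ on CentOS 6.6 (latest version at the time of writing: v6.1.0) (complete)

There is no convenient way to upgrade GCC/G++ on CentOS 6.6 (the latest GCC/G++ version at the time of writing is v6.1.0): yum update, yum install, or adding a yum repo file only gets you to 4.4.7. The only option is to compile and install manually, so the first step is to download the source code. 1. Get the package and unpack it: wget http://ft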

2017-03-18

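Why some Spark cluster workers cannot be stopped: analysis and fix

Today I wanted to stop the Spark cluster and found that running stop-all.sh did not stop any of the Spark processes. The messages were:
no org.apache.spark.deploy.master.Master to stop
no org.apache.spark.deploy.worker.Worker to stop
I looked up some material online and then read through stop-all.sh, stop-master.sh,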

2017-03-13

CentOS: installing scipy fails: File "scipy/linalg/setup.py", line 20, in configuration raise NotFoundE

Dependencies: pyparsing, dateutil, scipy, numpy, libpng 1.2 (or later), `freetype` 1.4 (or later)
Install pyparsing: # pip install pyparsing
Install numpy: # pip install numpy
Install dateutil: # pip install

2017-03-13

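Summary of performance tuning for trillion-row HBase storage

Background: our main HBase cluster has been running stably in production for a year and a half. The largest single table has more than 7,200 regions, and around ten billion new records are ingested every day. Our understanding of HBase has gone from hazy to familiar. To cope with the pressure of business data, HBase ingestion was upgraded from the original single-machine, multi-threaded approach to distributed ingestion with a disaster-recovery mechanism, and to catch cluster problems early we also built an alerting system that comprehensively monitors HBase cluster services and applications. This post summarizes HBase tuning (for version 0.94)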

2017-03-08

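Spark (2): Memory management

Spark's main strength as a compute engine is in-memory computation, so its memory management scheme is a very important module. Spark memory falls broadly into two categories, execution and storage: the former covers the memory needed for shuffles, joins, sorts, and aggregations, and the latter covers the memory needed for caching and for transferring data between nodes. In Spark 1.5 and earlier the two were statically configured and could not borrow from each other; Spark 1.6 reworked the memory management module, and through memory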

2017-03-08

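Spark (1): Basic architecture and principles

Apache Spark is a big data processing framework built around speed, ease of use, and sophisticated analytics. It was originally developed in 2009 at UC Berkeley's AMPLab, was open-sourced in 2010, and later became an Apache project. Compared with other big data and MapReduce technologies such as Hadoop and Storm, Spark has the following advantages: Spark provides a comprehensive, unified framework for managing datasets of differing nature (text data, graph data, and so on) and data sources (batch data or real-time streaming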

2017-03-08

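CentOS 7 can reach the internal network but cannot load external web pages

In the connection configuration file (/etc/sysconfig/network-scripts/ifcfg-Shared_Wired_Connection), change BOOTPROTO=none to BOOTPROTO=static or BOOTPROTO=dhcp. Note: this is a network configuration parameter. BOOTPROTO=static means a static IP; BOOTPROTO=dhcp means dynamic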

2017-03-03

Hue installation problem: django.core.exceptions.ImproperlyConfigured: Error loading MySQLdb module: libmysqlclient.so

Problem:
[root@master hue-3.11.0]# build/env/bin/hue syncdb
Traceback (most recent call last):
  File "build/env/bin/hue", line 9, in <module>
    load_entry_point('desktop==3.11.0', 'console_scripts', 'hue'

2017-03-01

Spark: writing data to HBase and reading data from HBase


2017-02-06

HIVE2:ERROR [main]: ql.Driver (:()) - FAILED: Execution Error, return code 1 from org.apache.hadoop.

Running select count(*) from students; in a Hive 2.1 on Tez environment fails with ERROR [main]: ql.Driver (:()) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask. The Hive log shows the specific problem: 2016-12

2016-12-21

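HIVE2: things to note when setting a username and password for beeline connections

beeline connect supports several authentication modes; see hive-site.xml. The default is NONE.
Property: hive.server2.authentication. Value: NONE. Description: Expects one of [nosasl, none, ldap, kerberos, pam, custom]. Client authentication types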

2016-12-19

HIVE2 Error: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteExc

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://localhost:10

2016-12-19

https://packages.elastic.co/elasticsearch/2.3/centos/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22

Operating system: CentOS
# yum install xinetd
Loaded plugins: fastestmirror, refresh-packagekit, security
Setting up Install Process
Loading mirror speeds from cached hostfile
 * base: mirrors.btte.net
 *

2016-11-29

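Hive 2.1: testing insert, update, and delete operations

With Hive's default configuration, the transaction manager does not support update and delete operations. To make Hive support update and delete, some additional configuration is required; for details see: https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions Configuratio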

2016-11-27

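Hadoop columnar storage engines Parquet/ORC and Snappy compression

Original article: http://www.itweet.cn/2016/03/15/columnar-storage-parquet-and-orc/ Topics: Parquet, Hadoop. Compared with traditional row-oriented storage formats, columnar storage engines are favored for their higher compression ratios and fewer I/O operations. Drawback of columnar storage: when there are many columns and each operation touches most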

2016-11-26

Caused by: org.apache.hive.service.cli.HiveSQLException: Failed to open new session: java.lang.Runti

Caused by: org.apache.hive.service.cli.HiveSQLException: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=hive, access

2016-11-26

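Hadoop configuration reference (hdfs-site.xml)

HADOOP: hdfs-site.xml configuration:
name | value | Description
dfs.default.chunk.view.size | 32768 | Size of the content shown for each file in the NameNode's HTTP pages; usually does not need to be set.
dfs.datanode.du.reserved | 1073741824 | Each disk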

2016-11-26

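YARN installation and configuration

(1) A first look at YARN. The YARN architecture diagram is as follows: 1. YARN is the next-generation MapReduce framework, also called MRv2 (MapReduce version 2). It is a general-purpose resource management system that provides unified resource management and scheduling for the applications running on top of it. The basic idea of YARN is to separate the two main functions of the JobTracker (resource management and job scheduling/monitoring); the main approach is to create a global ResourceManager (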

2016-11-26

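Hive configuration parameters explained

hive.ddl.output.format: the output format of Hive DDL statements; the default is text (plain text), and json is also available; this setting was only introduced after 0.90. hive.exec.script.wrapper: the wrapper used when Hive invokes a script; the default is null; if it is set to python, the script invocation statement is prefixed with python, and with null the script is executed directly. h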

2016-11-26

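Hive 2.1 installation: a note on renaming the configuration file

Note: renaming hive-default.xml.template to hive-default.xml is not enough, because a file with that name has no effect. The file must ultimately be named hive-site.xml for its configuration to take effect.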

2016-11-25

Hive2.1:Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException:

Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D    at org.apache.hadoop.

2016-11-25

Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveExcept

[root@master apache-hive-2.1.0-bin]# hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hive/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/s

2016-11-25

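Ubuntu file search commands

I. Searching by file name: 1. The find command: find / -name "filename". Purpose: search for a file named filename, starting from the root directory "/". The file name may contain wildcards (*, ?). Note: filename is a file-name string and may be written with or without double quotes. The find command is powerful and has many options that let you search for files in different ways, for example by date, file size, permissions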

2016-11-19
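
For readers who want the same name-based search from a script, here is a rough Python sketch of what find with -name does. It is not from the original post and is only an approximation: the function name `find_by_name` is an assumption, it uses the standard-library `os.walk` and `fnmatch.fnmatchcase`, and unlike find it does not test the starting directory itself or report permission errors.

```python
import fnmatch
import os
from typing import Iterator


def find_by_name(root: str, pattern: str) -> Iterator[str]:
    """Yield paths under root whose base name matches a shell-style pattern.

    Both directories and files are tested, and matching is case-sensitive,
    which approximates find's -name test.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if fnmatch.fnmatchcase(name, pattern):
                yield os.path.join(dirpath, name)
```

For example, `list(find_by_name("/etc", "*.conf"))` collects every entry under /etc whose name ends in .conf; walking "/" itself can take a long time.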

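Installing PostgreSQL 9.4 on CentOS

I. Preface. PostgreSQL, often called simply Postgres, is a relational database management system available for various Linux distributions, Windows, Solaris, BSD, and Mac OS X. PostgreSQL is released under the PostgreSQL License and is open-source software. It is developed by the PostgreSQL Global Development Group, which consists of volunteers from a handful of companies, such as Red Hat and EnterpriseDB, that also supervise the work.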

2016-11-19

Solr tutorial, worth a read for developers new to search


2016-11-19

ZooKeeper in practice: single-machine cluster mode


2016-11-19

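Compiling and installing MySQL on Linux CentOS 6.5

Contents. I. Preparation before compiling and installing MySQL. Install the tools and libraries needed to compile the source:
yum install gcc gcc-c++ ncurses-devel perl
Install cmake: download the source from http://www.cmake.org, then compile and install it:
wget http://www.cmake.org/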

2016-11-19

Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.

Exception details: Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:us

2016-11-19

Problem running hplsql in Hive 2.0 (unresolved)

[root@master Desktop]# hplsql -f /home/hive/1.sql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hive/apache-hive-2.0.0-bin/lib/hive-jdbc-2.0.0-standalon

2016-11-19

machine-learning-algorithms: Machine Learning Algorithms

This book is an introduction to the world of machine learning, a topic that is becoming more and more important, not only for IT professionals and analysts but also for all those scientists and engineers who want to exploit the enormous power of techniques such as predictive analysis, classification, clustering and natural language processing. Of course, it's impossible to cover all the details with the appropriate precision; for this reason, some topics are only briefly described, giving the user the double opportunity to focus only on some fundamental concepts and, through the references, examine in depth all those elements that will generate much interest. I apologize in advance for any imprecision or mistakes, and I'd like to thank all Packt editors for their collaboration and constant attention.


Machine Learning with Spark - 2nd Edition

Throughout this book, we will focus on real-world applications of machine learning technology. While we may briefly delve into some theoretical aspects of machine learning algorithms and required maths for machine learning, the book will generally take a practical, applied approach with a focus on using examples and code to illustrate how to effectively use the features of Spark and MLlib, as well as other well-known and freely available packages for machine learning and data analysis, to create a useful machine learning system.


Hands-On Machine Learning with Scikit-Learn and TensorFlow ...

This book assumes that you know close to nothing about Machine Learning. Its goal is to give you the concepts, the intuitions, and the tools you need to actually implement programs capable of learning from data. We will cover a large number of techniques, from the simplest and most commonly used (such as linear regression) to some of the Deep Learning techniques that regularly win competitions. Rather than implementing our own toy versions of each algorithm, we will be using actual production-ready Python frameworks: Scikit-Learn is very easy to use, yet it implements many Machine Learning algorithms efficiently, so it makes for a great entry point to learn Machine Learning. TensorFlow is a more complex library for distributed numerical computation using data flow graphs. It makes it possible to train and run very large neural networks efficiently by distributing the computations across potentially thousands of multi-GPU servers. TensorFlow was created at Google and supports many of their large-scale Machine Learning applications.





Beginning Spring Boot 2 Applications and Microservices with the Spring Framework

Copyright © 2017 by K. Siva Prasad Reddy Spring is the most popular Java-based framework for building enterprise applications. The Spring framework provides a rich ecosystem of projects to address modern application needs, like security, simplified access to relational and NoSQL datastores, batch processing, integration with social networking sites, large volume of data streams processing, etc. As Spring is a very flexible and customizable framework, there are usually multiple ways to configure the application. Although it is a good thing to have multiple options, it can be overwhelming to the beginners. Spring Boot addresses this “Spring applications need complex configuration” problem by using its powerful autoconfiguration mechanism. Spring Boot is an opinionated framework following the “Convention Over Configuration” approach, which helps build Spring-based applications quickly and easily. The main goal of Spring Boot is to quickly create Spring-based applications without requiring the developers to write the same boilerplate configuration again and again. In recent years, the microservices architecture has become the preferred architecture style for building complex enterprise applications. Spring Boot is a great choice for building microservices-based applications using various Spring Cloud modules. This book will help you understand what Spring Boot is, how Spring Boot helps you build Spring-based applications quickly and easily, and the inner workings of Spring Boot using easy-to-follow examples.


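Hadoop MapReduce Practical Handbook

This is a one-stop guide to learning Hadoop MapReduce. It gives a complete introduction to the Hadoop ecosystem, including installing, deploying, and operating the Hadoop platform, as well as ecosystem members such as Hive, Pig, HBase, and Mahout. Most importantly, the book contains a wealth of examples and varied real-world scenarios, presenting 90 practical recipes in a simple, direct way with step-by-step guidance. Starting with obtaining Hadoop and running it on a cluster, the book goes on to cover advanced HDFS; advanced Hadoop MapReduce administration; developing complex Hadoop MapReduce applications; the Hadoop ecosystem; statistical analysis; search and indexing; clustering, recommendation, and finding associations; large-scale text data processing; and cloud deployment.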


Learning PySpark.pdf

In this book, we will guide you through the latest incarnation of Apache Spark using Python. We will show you how to read structured and unstructured data, how to use some fundamental data types available in PySpark, build machine learning models, operate on graphs, read streaming data, and deploy your models in the cloud. Each chapter will tackle a different problem, and by the end of the book we hope you will be knowledgeable enough to solve other problems we did not have space to cover here.


Deep Learning with Hadoop.pdf

Deep Learning with Hadoop Copyright © 2017 Packt Publishing All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews. Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book. Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information. First published: February 2017 Production reference: 1130217 Published by Packt Publishing Ltd.


Hadoop 2.x Administration Cookbook

Hadoop is a distributed system with a large ecosystem, which is growing at an exponential rate, and hence it becomes important to get a grip on things and do a deep dive into the functioning of a Hadoop cluster in production. Whether you are new to Hadoop or a seasoned Hadoop specialist, this recipe book contains recipes to deep dive into Hadoop cluster configuration and optimization.


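Docker Development Guide (PDF)

Part 1 first explains what containers are and why you should care about them. It then demonstrates basic Docker operations. Finally, it devotes a longer stretch to Docker's basic concepts and technology, including an overview of Docker commands.
Part 2 explains how to apply Docker across the software development lifecycle. It first covers how to set up a development environment and then builds a simple web application, which serves as the example throughout Part 2. This part also covers development, testing, and integration, as well as how to deploy containers and how to monitor and log effectively in production.
Part 3 goes deeper, covering the tools and techniques that let Docker containers run securely and reliably in multi-host cluster environments. This part suits readers who already use Docker and need to understand how to scale or how to solve networking and security problems.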


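Mastering Redis (English edition)

This book introduces the NoSQL database product Redis, moving from the basics to advanced material and from principles to application scenarios. It not only explains Redis data structures and popular usage patterns in detail, but also offers constructive approaches to Redis key design and management and to memory management. The author also digs into the Redis source code and reveals its internal structure by debugging the source. The book is suited to developers or architects with some NoSQL experience. Readers will find many application scenarios and solutions in it, such as Docker deployment, Redis message queues, Redis-based ETL applications, and Redis-based machine learning.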



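This book contains a great deal of material. We interviewed more than 100 founders, investors, intrapreneurs, and innovators, many of whom shared their experiences with us, and we present more than 30 case studies in the book. We also list many best-practice patterns you can apply immediately. We have divided this material into four parts.
Part 1 focuses on understanding Lean Startup and basic analytics techniques, and on the data-informed mindset that will help you succeed. We review many existing startup frameworks and propose our own, which focuses on data analysis. This is your first lesson in the world of Lean Analytics. By the end of this part you will have a good understanding of basic analytics techniques.
Part 2 shows how to apply Lean Analytics in a startup. Using six business models as examples, we discuss the five stages every startup goes through, during which the company gradually discovers the right product and the best target market. We also discuss how to find the one key metric for your business. After reading this part you will know what business you are in, what stage you are at, and what you should be doing.
Part 3 examines the normal ranges for metrics. Unless you have drawn a line that must not be crossed, you will never know whether you are doing well or badly. In this part you will get reference values for key metrics and learn how to set your own targets.
Part 4 shows how to apply Lean Analytics in your own organization to change its culture, whether it is a consumer-facing or business-facing startup or a well-established company. After all, a data-driven approach is not just for startups.
Most chapters end with a few questions to help you reflect on and apply what you have read.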


Elasticsearch Server - Third Edition Learning February 2016

Elasticsearch Server - Third Edition. Rafał Kuć, Marek Rogoziński. February 2016. Leverage Elasticsearch to create a robust, fast, and flexible search solution with ease



Get familiar with the main features of SAP MDM and the main MDM client tools









