030 Big Data: The BI Tool Zeppelin

Apache Zeppelin : Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala, Python, R and more.

Interpreters in Apache Zeppelin
This section explains the role of interpreters, interpreter groups, and interpreter settings in Zeppelin. The interpreter concept allows any language or data-processing backend to be plugged into Zeppelin. Zeppelin currently supports many interpreters, such as Scala (with Apache Spark), Python (with Apache Spark), SparkSQL, JDBC, Markdown, and Shell.
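To make the interpreter idea concrete: in a Zeppelin note, a paragraph can begin with a `%name` directive such as `%python` or `%md`, which selects the interpreter that runs it. The sketch below only illustrates that dispatch idea and is not Zeppelin's actual implementation; the helper `split_directive` and its `default` argument are hypothetical names.

```python
def split_directive(paragraph, default="spark"):
    """Split a notebook paragraph into (interpreter_name, body).

    Hypothetical helper: if the paragraph starts with '%', the first token
    names the interpreter; otherwise the default interpreter is assumed.
    """
    text = paragraph.lstrip()
    if text.startswith("%"):
        parts = text.split(None, 1)  # split at the first run of whitespace
        name = parts[0][1:]          # drop the leading '%'
        body = parts[1] if len(parts) > 1 else ""
        return name, body
    return default, text
```

For example, `split_directive("%md # Title")` yields the name `md` and the body `# Title`, while a paragraph with no directive falls back to the assumed default.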

1. Test the connection to Kylin from IDEA via JDBC

Add the kylin-jdbc dependency to pom.xml

<dependencies>
      <dependency>
          <groupId>org.apache.kylin</groupId>
          <artifactId>kylin-jdbc</artifactId>
          <version>3.0.2</version>
      </dependency>
</dependencies>

Write the Java code that tests the Kylin connection

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class TestKylin {

    public static void main(String[] args) throws Exception {

        // Kylin JDBC driver class
        String KYLIN_DRIVER = "org.apache.kylin.jdbc.Driver";

        // Kylin URL
        // Connection string: jdbc:kylin://<ip address>:7070/<project name>
        String KYLIN_URL = "jdbc:kylin://hadoop102:7070/gmall";

        // Kylin user name
        String KYLIN_USER = "ADMIN";

        // Kylin password
        String KYLIN_PASSWD = "KYLIN";

        // Load the driver
        Class.forName(KYLIN_DRIVER);

        // Open the connection, precompile the SQL, and run the query;
        // try-with-resources closes all three when the block exits
        try (Connection connection = DriverManager.getConnection(KYLIN_URL, KYLIN_USER, KYLIN_PASSWD);
             PreparedStatement ps = connection.prepareStatement("SELECT SUM(sku_num) FROM DWD_ORDER_DETAIL");
             ResultSet resultSet = ps.executeQuery()) {

            // Iterate over the rows and print them
            while (resultSet.next()) {
                System.out.println(resultSet.getInt(1));
            }
        }
    }
}

2. Install Zeppelin

Extract Zeppelin

[atguigu@hadoop102 software]$ tar -zxvf zeppelin-0.8.0-bin-all.tgz -C /opt/module/

Change the port: the default is 8080; to avoid conflicts, change it to another port.

[atguigu@hadoop102 conf]$ mv zeppelin-site.xml.template zeppelin-site.xml
[atguigu@hadoop102 conf]$ cat zeppelin-site.xml 
<property>
  <name>zeppelin.server.port</name>
  <!-- Default port 8080 changed to 8000 -->
  <value>8000</value>
  <description>Server port.</description>
</property>
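The same edit can be scripted rather than made by hand. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `<configuration>` root element in `SAMPLE` is an assumption (the excerpt above shows only the `<property>` element), and `set_property` and `get_property` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Assumed layout: <property> entries wrapped in a <configuration> root.
SAMPLE = """<configuration>
  <property>
    <name>zeppelin.server.port</name>
    <value>8080</value>
    <description>Server port.</description>
  </property>
</configuration>"""


def get_property(xml_text, name):
    """Return the <value> text of the named <property>, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def set_property(xml_text, name, value):
    """Return xml_text with the <value> of the named <property> replaced."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            prop.find("value").text = str(value)
            return ET.tostring(root, encoding="unicode")
    raise KeyError(name)
```

Note that `ElementTree` drops XML comments and the XML declaration when it parses and re-serializes, so for a real configuration file a manual edit may still be the safer choice.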

Start Zeppelin

[atguigu@hadoop102 zeppelin-0.8.0-bin-all]$ bin/zeppelin-daemon.sh start

Open the web UI: http://hadoop102:8000
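If the page does not load, it can help to first check whether anything is listening on the chosen port. Below is a minimal sketch using only the standard library; `is_port_open` is a hypothetical helper name, and a successful TCP connection only shows that some process accepts connections there, not that it is Zeppelin.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False
```

With the host and port used in this walkthrough, the call would be `is_port_open("hadoop102", 8000)`. Running the same check before starting Zeppelin is one way to spot a port that is already taken.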

3. Connect to Kylin through Zeppelin's Kylin interpreter

Configure the Kylin interpreter settings

4. Connect to Kylin through Zeppelin's Python interpreter

Set up the Python data-visualization environment

# Create a Python 3.6 environment with conda
[atguigu@hadoop102 software]$ conda create --name python36 python=3.6
[atguigu@hadoop102 software]$ conda activate python36
# Install kylinpy and the plotting-related packages
(python36) [atguigu@hadoop102 software]$ pip install --upgrade kylinpy
(python36) [atguigu@hadoop102 software]$ pip install SQLAlchemy
(python36) [atguigu@hadoop102 software]$ pip install pandas
(python36) [atguigu@hadoop102 software]$ pip install matplotlib
(python36) [atguigu@hadoop102 software]$ conda deactivate
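After installing, a quick way to confirm the environment is complete is to check that each package can be found for import. This is a minimal sketch; `missing_modules` is a hypothetical helper name, and the import names `kylinpy`, `sqlalchemy`, `pandas`, and `matplotlib` correspond to the packages installed above.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Run inside the `python36` environment, `missing_modules(["kylinpy", "sqlalchemy", "pandas", "matplotlib"])` should return an empty list if every install succeeded.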

Configure the Python interpreter settings
Connect to Kylin from Python

import sys
import sqlalchemy as sa
import pandas as pd
import matplotlib.pyplot as plt

print(sys.version) 
print(sys.version_info)

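# Configure the Kylin connection
kylin_engine = sa.create_engine('kylin://ADMIN:KYLIN@hadoop102:7070/gmall?version=v1')

# Write the SQL statement
sql = 'SELECT SUM(sku_num) FROM DWD_ORDER_DETAIL'

# Execute the SQL statement and get the result
# results = kylin_engine.execute(sql)

# Execute the SQL statement and load the result into a pandas DataFrame
dataframe = pd.read_sql(sql, kylin_engine)

# Print the result
print(dataframe)

# Plot the result (a DataFrame pie plot needs either y=... or subplots=True)
dataframe.plot(kind='pie', subplots=True)
plt.show()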
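The connection string passed to `create_engine` has the form `kylin://user:password@host:port/project?version=v1`. If the password contains characters such as `@` or `/`, they need to be percent-encoded so the URL still parses. The sketch below builds such a URL with the standard library only; `kylin_url` is a hypothetical helper name, not part of kylinpy or SQLAlchemy.

```python
from urllib.parse import quote


def kylin_url(user, password, host, port, project, version="v1"):
    """Build a kylin:// connection URL, percent-encoding the credentials."""
    return "kylin://{}:{}@{}:{}/{}?version={}".format(
        quote(user, safe=""),      # safe="" so that '/' and '@' are encoded too
        quote(password, safe=""),
        host, port, project, version,
    )
```

With the values used in this walkthrough, `kylin_url("ADMIN", "KYLIN", "hadoop102", 7070, "gmall")` reproduces the string shown in the code above.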

5. Getting started with plotting in pandas and matplotlib

How Matplotlib plotting works, explained so that anyone can follow

matplotlib.pyplot is a collection of functions that make matplotlib work like MATLAB. Each pyplot function makes some change to a figure: e.g., creates a figure, creates a plotting area (Axes) in a figure, plots some lines in a plotting area, decorates the plot with labels (title, axis, label), etc.

In matplotlib.pyplot various states are preserved across function calls, so that it keeps track of things like the current figure and plotting area, and the plotting functions are directed to the current axes (please note that “axes” here and in most places in the documentation refers to the axes part of a figure and not the strict mathematical term for more than one axis).
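The "current figure" and "current axes" that pyplot tracks can be inspected with `plt.gcf()` and `plt.gca()`. The following is a minimal sketch of that state tracking; the `Agg` backend is selected only so that it runs without a display, and the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# pyplot remembers the Figure and Axes it created most recently
same_figure = plt.gcf() is fig
same_axes = plt.gca() is ax

# A bare plt.plot call is directed to the current Axes, which is ax here
plt.plot([1, 2, 3], [4, 5, 6])
line_count = len(ax.lines)

# Creating another figure makes it the current one
fig2 = plt.figure()
current_is_fig2 = plt.gcf() is fig2

plt.close("all")
```

Here `same_figure`, `same_axes`, and `current_is_fig2` all come out true, and the line drawn through `plt.plot` ends up in `ax.lines`.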

import matplotlib.pyplot as plt
# plt is the interface (a module)
print(type(plt),plt)
# Output: <class 'module'> <module 'matplotlib.pyplot' from 'C:\\Program Files\\Anaconda3\\lib\\site-packages\\matplotlib\\pyplot.py'>
fig, ax = plt.subplots()
# fig is a Figure and ax is an Axes; both are objects
print(type(fig),fig)
print(type(ax),ax)
# Output:
# <class 'matplotlib.figure.Figure'> Figure(432x288)
# <class 'matplotlib.axes._subplots.AxesSubplot'> AxesSubplot(0.125,0.125;0.775x0.755)
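# Why can plt plot without an explicitly specified Figure (canvas) or Axes (plotting area)?
"""
Because matplotlib draws on the most recently created figure by default. If you ask it
to plot without specifying an Axes, it automatically creates one, and if there is no
figure either, it automatically creates a Figure as well. This is known as implicit plotting.
"""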
plt.bar([1,2,3],[4,5,6])
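# With multiple Axes, plotting directly through plt only affects the most recently created Axes
# In that case, create an array of Axes with plt.subplots and operate on each Axes through its array element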
fig, axes = plt.subplots(1,3,figsize=(16,6))
ax1 = axes[0]
ax2 = axes[1]
ax3 = axes[2]
 
ax2.barh([1,2,3,4,5],[1,2,3,4,5])

plt.show()
plt.figure(figsize=(16,6))
ax1 = plt.subplot(1,3,1)

ax2 = plt.subplot(1,3,2)
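# plt.bar must come right after ax2 is created in order to draw on ax2
plt.bar([1,2,3,4,5],[1,2,3,4,5])

# plt.subplots creates several Axes at once; plt.subplot creates one at a time, and each call specifies the Axes layout
ax3 = plt.subplot(1,3,3)

# Plotting through ax1's own plot method does not have to come right after ax1 is created
ax1.plot([1,2,3,4,5],[1,2,3,4,5])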

plt.show()
fig = plt.figure(figsize=(16,6))
# plt.subplot and fig.add_subplot create Axes through the pyplot interface and the Figure object, respectively
ax4 = fig.add_subplot(121)
ax4.plot([1,2,3,4,5],[1,2,3,4,5])
ax5 = fig.add_subplot(122)
ax5.plot([1,2,3,4,5],[1,2,3,4,5])

plt.show()

The plot methods of pandas.DataFrame and pandas.Series accept an ax parameter (a matplotlib axes object, default None):
pandas.DataFrame.plot
pandas.Series.plot
matplotlib.axes
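Passing `ax` lets pandas draw onto an Axes you created yourself, which ties the pandas plotting API back to the Figure and Axes objects described above. This is a minimal sketch with made-up numbers (the data does not come from Kylin); the `Agg` backend is selected only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values, purely for illustration
df = pd.DataFrame({"sku_num": [3, 5, 2]}, index=["a", "b", "c"])

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# DataFrame.plot draws the bar chart on the Axes passed as ax
df.plot(kind="bar", ax=ax_left)

# Series.plot draws the pie chart on the other Axes
df["sku_num"].plot(kind="pie", ax=ax_right)

bars = len(ax_left.patches)     # one rectangle per row
wedges = len(ax_right.patches)  # one wedge per row

plt.close(fig)
```

Selecting a single column as a Series is one way to draw a pie chart without passing `y=` or `subplots=True`, which `DataFrame.plot(kind='pie')` requires.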
