Impala 大数据测试

19 篇文章 0 订阅


Impala 是什么不用介绍了,Impala  的作用是什么也不用介绍了,我分享的主要都是干货,

主要是在hadoop平台上,测试 Impala 的性能。如下的代码,都是测试的干货。


准备环境(安装好Impala在linux系统上,),测试数据和创建数据库 :


1.Generate Test Data.

execute the follwing commands:


a:mkdir test ; cd test
b:wget http://elasticmapreduce.s3.amazonaws.com/samples/impala/dbgen-1.0-jar-with-dependencies.jar
c:java -cp dbgen-1.0-jar-with-dependencies.jar DBGen -p /tmp/dbgen -b 1 -c 1 -t 1


java -Xms10240m -Xmx10240m -cp dbgen-1.0-jar-with-dependencies.jar DBGen -p /home/data -b 0.1 -c 1 -t 1
 
d:
hadoop fs -mkdir /data/
hadoop fs -put /tmp/dbgen/* /data/
hadoop fs -ls -h -R /data/


2.Create and Populate Impala Tables.
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/query-impala-create-tables.html
execute the follwing commands:


a: impala-shell


b: create EXTERNAL TABLE books( id BIGINT, isbn STRING, category STRING, publish_date TIMESTAMP, publisher STRING, price FLOAT )
                ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LOCATION '/data/books/';

c:create EXTERNAL TABLE customers( id BIGINT, name STRING, date_of_birth TIMESTAMP, gender STRING, state STRING, email STRING, phone STRING ) 
                    ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LOCATION '/data/customers/';

d:create EXTERNAL TABLE transactions( id BIGINT, customer_id BIGINT, book_id BIGINT, quantity INT, transaction_date TIMESTAMP ) 
                    ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LOCATION '/data/transactions/';


3.Query Data in Impala


a:


SELECT COUNT(*)
FROM customers
WHERE name = 'Harrison SMITH';


b:


SELECT category, count(*) cnt
FROM books
GROUP BY category
ORDER BY cnt DESC LIMIT 10;


c:


SELECT tmp.book_category, ROUND(tmp.revenue, 2) AS revenue
FROM (
  SELECT books.category AS book_category, SUM(books.price * transactions.quantity) AS revenue
  FROM books JOIN [SHUFFLE] transactions ON (
    transactions.book_id = books.id
    AND YEAR(transactions.transaction_date) BETWEEN 2008 AND 2010
  )
  GROUP BY books.category
) tmp
ORDER BY revenue DESC LIMIT 10;


d:
SELECT tmp.book_category, ROUND(tmp.revenue, 2) AS revenue
FROM (
  SELECT books.category AS book_category, SUM(books.price * transactions.quantity) AS revenue
  FROM books
  JOIN [SHUFFLE] transactions ON (
    transactions.book_id = books.id
  )
  JOIN [SHUFFLE] customers ON (
    transactions.customer_id = customers.id
    AND customers.state IN ('WA', 'CA', 'NY')
  )
  GROUP BY books.category
) tmp

ORDER BY revenue DESC LIMIT 10;


主要来自:

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-what-is-emr.html


  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值