layout: post
title: Setting up a Spark development environment on Windows (IDEA)
author: Yinux
categories:
[大数据, spark]
tag: 大数据
top: true
avatar: https://cdn.jsdelivr.net/gh/InfiniteYinux/cloud@master/avatar/avatar.png
author_url: http://www.codingpy.cn
thumbnail: true
mathjax: true
meta:
header: [title, author, date, categories]
footer: [updated, tags, share]
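"Big Data" refers to data volumes that ordinary software tools struggle to capture, manage, and analyze. The "big" in big data is not only about sheer volume. Its greater significance lies in exchanging, integrating, and analyzing massive data sets to discover new knowledge and create new value, bringing "big knowledge", "big technology", "big profit", and "big development". Big data can help companies find answers to hard problems and brings them unprecedented business value and opportunity. It also poses major challenges to enterprise IT systems. Looking at how big data is applied across industries shows how companies use big data and cloud computing to solve their problems and respond flexibly, quickly, and efficiently to fast-changing market demands.
Preface
This article focuses on setting up the dependencies needed to develop Spark applications on Windows 10.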
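Chapter outline
Versions
Environment configuration
JDK configuration
Scala installation and configuration
Spark installation and configuration
Hadoop installation and configuration
IntelliJ IDEA download and configuration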
Versions
jdk:1.8
scala:2.12.0
spark:2.4.3
hadoop:2.7.7
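Environment configuration
JDK configuration
Download: log in to the Oracle website, accept the license agreement, sign in with a registered account, and choose the matching version. My machine runs 64-bit Windows, so I downloaded the 64-bit (Windows x64) JDK installer.
Install: installing the JDK on Windows is straightforward. Double-click the installer and click Next through the wizard. By default it installs to C:\Program Files\Java. The JRE is installed along the way, and the defaults are fine there as well.
Set the environment variables: right-click the "This PC" icon on the desktop and choose "Properties", the last item in the context menu. In the System window, click "Advanced system settings" on the left to open the "System Properties" dialog.
Then select the "Advanced" tab and click the "Environment Variables (N)…" button at the bottom. In the Environment Variables dialog, click the lower "New (W)…" button and enter the values in the dialog that appears.
Find "Path" among the environment variables and add the bin directories of the JDK and the JRE.
Create a new variable named CLASSPATH (written without an underscore; the JVM does not read a variable named CLASS_PATH).
Verify the configuration: run java -version in cmd. If the version information is printed, the JDK is installed and configured correctly.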
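Scala installation and configuration
Download: the Spark download page states "Note: Starting version 2.0, Spark is built with Scala 2.11 by default." The pom.xml later in this article depends on the spark-core_2.12 artifact for Spark 2.4.3, so install Scala 2.12.x. Go to the Scala website, click the download button, and under "Other Releases" find the link "Last 2.12.x maintenance release - Scala 2.12.0". On the download page, scroll down and download the msi installer.
Install: the default install directory is C:\Program Files (x86)\scala.
Environment variables: as with the Java variables, set SCALA_HOME=C:\Program Files (x86)\scala and append %SCALA_HOME%\bin to the end of the Path variable.
Verify: press Win+R, enter cmd to open a command prompt, and run scala -version to see the installed version.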
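As a quick sanity check once both installs are done, a small Scala program can report what the JVM and the Scala runtime actually see. This is only a minimal sketch: the object name EnvCheck and the helper describe are made up for illustration and are not part of the original setup.

```scala
object EnvCheck {
  // Formats one report line; `value` is None when the setting is absent.
  def describe(name: String, value: Option[String]): String = {
    val shown = value.getOrElse("<not set>")
    s"$name = $shown"
  }

  def main(args: Array[String]): Unit = {
    // JVM system property set by the installed JDK/JRE.
    println(describe("java.version", Option(System.getProperty("java.version"))))
    // Version of the Scala library on the classpath.
    println(describe("scala.version", Some(scala.util.Properties.versionNumberString)))
    // Environment variable configured in the step above.
    println(describe("SCALA_HOME", Option(System.getenv("SCALA_HOME"))))
  }
}
```

If SCALA_HOME shows up as not set, the command prompt was probably opened before the variable was saved; opening a new one usually picks it up.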
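Install Maven
For installing and configuring Maven, see 《Hadoop基础教程-第4章 HDFS的Java API(4.1 Maven入门)》 (Hadoop Basics Tutorial, Chapter 4: HDFS Java API, 4.1 Getting Started with Maven).
IntelliJ IDEA bundles Maven, so this article does not cover it in detail.
IntelliJ IDEA download and configuration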
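Download and install: go to the official website and download either Ultimate or Community, depending on your needs. Ultimate is commercial software and requires a paid license. Community is free and sufficient for day-to-day development, so Community is downloaded here.
Launch: after installation, click the IntelliJ IDEA icon to start IntelliJ IDEA. Since this is a first install, there is no configuration to import, so keep the default option. Choose "Evaluate for free" to enter the free version.
Pick a UI theme to taste, then click "Skip Remaining and Set Defaults" in the lower-left corner.
Install the Scala plugin: click Configure -> Plugins in the lower-left corner, then search for and install Scala. After installation, restart IDEA and go on to configure the global Scala SDK.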
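Configure the JDK: open Project Structure, add the JDK installed earlier, and click OK when done.
Configure the global Scala SDK: select "Global Libraries", click the "+" button, and choose "Scala SDK" from the menu. In the "Select JAR's for the new Scala SDK" dialog, choose the Version that matches the locally installed Scala. My Scala version is 2.12.0, so I selected 2.12.0. Click OK in the lower-right corner to finish.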
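Create a Maven project
Click "Create New Project" and select Maven.
Click Next and fill in the GroupId and ArtifactId.
Click Next.
Click Finish. (At this step you can change the Content root and Module file location paths.)
After the project is created, if a prompt appears in the lower-right corner, click Enable Auto-Import.
Next, add the Scala framework to the project (without it you may be unable to create a Scala class). In the Project view on the left side of the IDEA window there is already a project named simpleSpark. Right-click the project name and choose Add Framework Support from the menu. In the list of checkboxes on the left, find Scala and tick it. (I could not find this option on my machine, yet the project still ran. To be safe, the screenshot for this step was borrowed from haijiege.)
Mark the source folder as a source root: select the scala folder -> right-click -> Mark Directory as -> Sources Root.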
Edit the code
pom.xml
For the Spark 2.4.3 Maven coordinates, see https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.12/2.4.3
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>Test.pack</groupId>
      <artifactId>SparkTest</artifactId>
      <version>1.0-SNAPSHOT</version>
      <packaging>jar</packaging>
      <inceptionYear>2008</inceptionYear>
      <properties>
        <spark.version>2.4.3</spark.version>
        <scala.version>2.12.0</scala.version>
      </properties>
      <repositories>
        <repository>
          <id>nexus-aliyun</id>
          <name>Nexus aliyun</name>
          <url>http://maven.aliyun.com/nexus/content/groups/public</url>
        </repository>
      </repositories>
      <pluginRepositories>
        <pluginRepository>
          <id>scala-tools.org</id>
          <name>Scala-Tools Maven2 Repository</name>
          <url>http://scala-tools.org/repo-releases</url>
        </pluginRepository>
      </pluginRepositories>
      <dependencies>
        <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-core_2.12</artifactId>
          <version>2.4.3</version>
        </dependency>
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>4.4</version>
          <scope>test</scope>
        </dependency>
        <dependency>
          <groupId>org.specs</groupId>
          <artifactId>specs</artifactId>
          <version>1.2.5</version>
          <scope>test</scope>
        </dependency>
      </dependencies>
      <build>
        <plugins>
          <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.2-beta-5</version>
            <configuration>
              <classifier>dist</classifier>
              <appendAssemblyId>true</appendAssemblyId>
              <descriptorRefs>
                <descriptor>jar-with-dependencies</descriptor>
              </descriptorRefs>
            </configuration>
            <executions>
              <execution>
                <id>make-assembly</id>
                <phase>package</phase>
                <goals>
                  <goal>single</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
    </project>
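The _2.12 suffix in spark-core_2.12 is the Scala binary version, that is, the first two components of the full Scala version. That is why the installed Scala (2.12.0 here) and the artifact suffix in the pom need to agree. The sketch below only illustrates the naming convention; ArtifactName, binaryVersion, and artifactId are hypothetical helpers, not part of Maven or Spark.

```scala
object ArtifactName {
  // "2.12.0" -> "2.12": keep only the major and minor components.
  def binaryVersion(scalaVersion: String): String =
    scalaVersion.split('.').take(2).mkString(".")

  // Appends the binary-version suffix used by cross-built Scala libraries.
  def artifactId(base: String, scalaVersion: String): String = {
    val suffix = binaryVersion(scalaVersion)
    s"${base}_$suffix"
  }

  def main(args: Array[String]): Unit = {
    // Versions taken from the "Versions" section of this article.
    println(artifactId("spark-core", "2.12.0"))
  }
}
```

A mismatch between the two (for example a 2.11 SDK with a _2.12 artifact) typically surfaces as binary-incompatibility errors at run time rather than as a clear compile error, so it is worth checking up front.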
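After saving pom.xml, if IntelliJ IDEA shows a prompt in the lower-right corner, click "Enable Auto-Import".
WordCount.scala: create a new Scala class file named WordCount.scala (Scala source files use the .scala extension). Right-click the scala folder that was just marked as the sources root and choose New -> Scala Class. Name the new class WordCount and select the Object kind. Open the newly created WordCount.scala, clear its contents, and paste in the WordCount code from the program listing later in this article.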
Running the program
Files
Data file
sampleDataSet:
2983651b-1867-4c69-9c00-c691dbfecb3b,2983651b-1867-4c69,2016/12/10 9:24,df8202bd-8-335312,76,male,f.jsp,window,360,182.84.175.85,河北省,保定,page_10,goods_425,shop_303,15484445f-9f36-442f-a023-00d8c18b0a8a,5484445f-9f36-442f,2016/12/10 9:36,6713b1b4-f-2716640,52,female,b.jsp,Linux,Firefox,182.91.216.230,湖北省,武汉,page_55,goods_838,shop_765,1fd13edd0-c00a-4bca-a918-b59799be5462,fd13edd0-c00a-4bca,2016/12/10 9:14,a4e25b76-2-2677513,32,female,h.jsp,Unix,IE,139.207.96.101,湖北省,黄冈,page_26,goods_641,shop_263,081e6d947-0b6a-46dd-ac16-9b739c441490,81e6d947-0b6a-46dd,2016/12/10 8:59,0188c353-2-292794,17,male,a.jsp,Unix,Safari,106.84.104.228,山西省,太原,page_78,goods_778,shop_77,0913673fa-cb37-47c8-a078-6cb1ab2e423c,913673fa-cb37-47c8,2016/12/10 9:26,3a39246f-6-68542,32,female,h.jsp,Linux,Firefox,139.206.242.96,湖南省,长沙,page_98,goods_585,shop_829,4804591be-b172-44c6-ba59-dee8bf557eca,804591be-b172-44c6,2016/12/10 10:20,cbc5e045-6-3702143,73,female,c.jsp,Linux,Chrome,210.34.162.246,湖南省,长沙,page_41,goods_392,shop_924,0fe5d743d-fa21-4577-b8d4-f9eba6fc4212,fe5d743d-fa21-4577,2016/12/10 8:42,b7bf048b-9-4685057,10,female,b.jsp,Unix,IE,171.8.174.15,上海,上海,page_69,goods_997,shop_477,0283b888f-7ad8-421f-8b94-e041e9c01234,283b888f-7ad8-421f,2016/12/10 10:29,28e3aedd-7-354474,50,male,c.jsp,Linux,Firefox,139.208.228.234,云南省,昆明,page_14,goods_653,shop_261,18d3799aa-3ab9-4837-99c2-bba7d1290d64,8d3799aa-3ab9-4837,2016/12/10 10:20,0fdaa9f8-8-1520544,72,female,a.jsp,window,Safari,139.209.79.243,湖北省,武汉,page_26,goods_90,shop_947,1ff29ee3f-4233-4c20-9233-a143db6877b8,ff29ee3f-4233-4c20,2016/12/10 10:37,62d08546-0-3198946,19,male,c.jsp,Unix,Opera,222.20.69.216,湖北省,黄冈,page_1,goods_636,shop_780,17ab3218f-739b-4aa7-a636-a06298cc7118,7ab3218f-739b-4aa7,2016/12/10 9:05,4f026dbf-1-1551672,39,male,b.jsp,Linux,Chrome,171.13.166.116,四川省,成都,page_51,goods_1,shop_84,05e7258f0-c203-4c14-9bfe-096bb4591406,5e7258f0-c203-4c14,2016/12/10 
10:25,071885de-2-109737,1,female,d.jsp,Unix,360,171.11.73.242,上海,上海,page_99,goods_44,shop_821,211a4f38c-cac7-4fb5-bb77-9c860ceea20f,11a4f38c-cac7-4fb5,2016/12/10 7:54,0767e8b8-9-3700904,30,female,a.jsp,window,Chrome,36.63.109.195,山西省,太原,page_66,goods_773,shop_490,20a3995ac-7036-4bcd-ab49-5084c6f596e0,0a3995ac-7036-4bcd,2016/12/10 7:43,7e5935ba-b-1807435,39,female,g.jsp,Unix,Opera,210.42.10.187,湖北省,武汉,page_90,goods_976,shop_890,0beb71b00-18d1-4288-9a7c-09084fb2a66e,beb71b00-18d1-4288,2016/12/10 10:02,f879bf01-c-179190,34,female,c.jsp,Unix,360,222.81.123.250,湖北省,武汉,page_59,goods_658,shop_468,09cb4102b-380b-4f52-90a5-8eeb0015b7ca,9cb4102b-380b-4f52,2016/12/10 8:51,3671e26c-9-2892578,44,female,e.jsp,Linux,360,123.234.121.11,河南省,郑州,page_20,goods_510,shop_57,07c6dd29a-4230-491a-a386-b5eccec5578c,7c6dd29a-4230-491a,2016/12/10 8:12,f49e5172-8-5268823,28,female,b.jsp,Unix,IE,106.88.150.111,海南省,三亚,page_75,goods_444,shop_720,20e882711-c473-46f7-bca0-28d1faab2305,0e882711-c473-46f7,2016/12/10 10:04,0dc42bbb-f-1668224,18,female,i.jsp,window,Firefox,139.206.125.176,山西省,太原,page_33,goods_96,shop_448,401aef44a-06d6-4997-9f89-fac593bcde91,01aef44a-06d6-4997,2016/12/10 9:43,0121dfc6-5-4195464,5,female,a.jsp,window,Safari,182.86.177.237,山西省,太原,page_97,goods_820,shop_800,07f0286f0-1fdc-4ac7-accc-d04654ba7e39,7f0286f0-1fdc-4ac7,2016/12/10 9:26,86b1ba6a-1-6091699,64,male,e.jsp,Linux,IE,123.234.134.37,河北省,保定,page_44,goods_77,shop_593,3d0dafcfc-bd09-42bc-a96d-e2dc54f8bec9,d0dafcfc-bd09-42bc,2016/12/10 9:01,ae06785a-a-5056962,29,male,b.jsp,Linux,Firefox,210.34.53.155,海南省,三亚,page_87,goods_714,shop_866,4392be58f-cc0d-413b-8bad-deb235c6440e,392be58f-cc0d-413b,2016/12/10 9:17,698da0b3-3-707031,16,female,d.jsp,Linux,Chrome,222.39.3.50,福建省,厦门,page_67,goods_358,shop_125,356bfb76c-bb52-44c6-81f2-be4b547f47b5,56bfb76c-bb52-44c6,2016/12/10 
10:01,c2c4ef2a-c-3100628,56,male,e.jsp,Linux,Firefox,210.38.90.13,福建省,泉州,page_9,goods_243,shop_668,0d9a81500-426c-4aa4-8d2a-b6b81faf2b84,d9a81500-426c-4aa4,2016/12/10 8:42,83193223-8-516018,74,female,i.jsp,Linux,Firefox,171.11.93.221,上海,上海,page_0,goods_905,shop_11,3a9fcaf34-3184-4a9f-a849-99853751d035,a9fcaf34-3184-4a9f,2016/12/10 8:35,43047c8a-7-6182956,52,female,h.jsp,Unix,360,36.56.172.149,湖北省,黄冈,page_98,goods_827,shop_559,384b8194d-8133-4f99-9589-1336e7bd29ff,84b8194d-8133-4f99,2016/12/10 8:49,ab1d396f-d-211581,30,female,c.jsp,Unix,IE,222.29.169.48,湖北省,武汉,page_27,goods_266,shop_149,334d96be3-df76-492b-b5a5-4fc4166d0098,34d96be3-df76-492b,2016/12/10 7:13,8a4a3bb4-8-1102798,30,male,f.jsp,Unix,Firefox,121.76.197.164,北京,北京,page_10,goods_699,shop_857,2e2f41083-8e9f-4c7e-96b4-ec6de92d2fe4,e2f41083-8e9f-4c7e,2016/12/10 8:49,045d3508-1-659358,19,male,d.jsp,Linux,360,106.88.85.184,四川省,成都,page_83,goods_215,shop_319,4ca698200-ac52-499d-87a4-382b1e0dff57,ca698200-ac52-499d,2016/12/10 8:53,d1fd36b1-9-55529,35,male,b.jsp,Unix,360,171.14.88.67,海南省,三亚,page_9,goods_184,shop_512,4488675f5-a022-48fe-84fc-0e2695d81842,488675f5-a022-48fe,2016/12/10 9:55,c1acaaea-1-6320546,59,male,b.jsp,window,Opera,36.59.107.168,山西省,太原,page_23,goods_683,shop_613,2e739d671-f209-4556-bfd4-c2223db5a1ba,e739d671-f209-4556,2016/12/10 7:10,e18f2335-f-2908748,34,female,g.jsp,Linux,Firefox,171.8.152.99,四川省,成都,page_23,goods_876,shop_270,299df9d19-9861-4425-81b0-7c55ca876b2f,99df9d19-9861-4425,2016/12/10 9:57,cf1da15c-d-3372284,22,male,f.jsp,window,Opera,139.206.112.136,黑龙江,哈尔滨,page_39,goods_618,shop_46,4c2c7409c-34ca-4a26-af11-fe2a63643efa,c2c7409c-34ca-4a26,2016/12/10 10:31,9f16d203-1-6776289,78,female,c.jsp,window,Firefox,222.61.221.40,福建省,泉州,page_92,goods_458,shop_122,275c601b8-7483-48ed-b59d-c932d8508990,75c601b8-7483-48ed,2016/12/10 9:10,96d2cbeb-e-787956,55,female,a.jsp,Unix,Chrome,210.30.98.152,湖北省,黄冈,page_75,goods_741,shop_972,3e051937a-e4c1-42be-bcc0-0cf0eaa70f85,e051937a-e4c1-42be,2016/12/10 
7:29,b8b04c3c-0-4701139,22,male,e.jsp,window,Safari,61.235.235.180,海南省,三亚,page_22,goods_155,shop_300,47c1785b7-6e72-419a-a69e-2773df65fce2,7c1785b7-6e72-419a,2016/12/10 7:29,a16791d5-6-2051726,14,male,e.jsp,window,Opera,222.79.192.247,云南省,昆明,page_61,goods_965,shop_748,479394d02-3755-4d99-adaa-00d923bf4f00,79394d02-3755-4d99,2016/12/10 9:06,a9d6c03f-9-5197235,32,female,d.jsp,Unix,Firefox,123.235.10.123,北京,北京,page_94,goods_272,shop_183,06794a58f-636b-416c-9c0b-ea3dbf80d8f6,6794a58f-636b-416c,2016/12/10 10:27,b8eeacad-0-4839234,59,male,h.jsp,Unix,Chrome,171.13.218.136,福建省,厦门,page_34,goods_682,shop_515,1f139995b-3584-4e3e-985a-74a06fd0b89b,f139995b-3584-4e3e,2016/12/10 8:08,11740556-a-5445805,4,female,b.jsp,Linux,IE,121.77.18.19,上海,上海,page_81,goods_790,shop_209,16c274c84-1a3e-43f2-ae84-6ea555a9bfdb,6c274c84-1a3e-43f2,2016/12/10 7:33,e5ff1d84-0-1299133,42,male,d.jsp,window,IE,210.45.58.98,云南省,昆明,page_78,goods_304,shop_376,3d048a286-7bb0-4b4d-b1c8-8cc44cfa3dfd,d048a286-7bb0-4b4d,2016/12/10 10:18,dc363195-4-185391,33,female,c.jsp,Unix,IE,210.40.225.212,黑龙江,哈尔滨,page_45,goods_424,shop_600,4bdafbef8-65b9-404b-9eed-9b21ccf32ceb,bdafbef8-65b9-404b,2016/12/10 7:32,f13379c8-a-128317,43,male,i.jsp,window,IE,182.86.164.46,湖南省,长沙,page_25,goods_119,shop_264,4f8d600f5-4a90-47a1-b397-f80b50e44719,f8d600f5-4a90-47a1,2016/12/10 9:40,c7d3ea9c-1-876643,24,female,i.jsp,Unix,Firefox,123.233.180.172,湖北省,黄冈,page_6,goods_642,shop_677,214160619-8bf1-434d-af15-7318bfcec116,14160619-8bf1-434d,2016/12/10 8:07,65836e81-3-1479508,16,female,a.jsp,window,Safari,123.233.52.68,云南省,昆明,page_70,goods_401,shop_619,12cdf2d96-9e04-4899-9476-957abdd3d609,2cdf2d96-9e04-4899,2016/12/10 9:40,9d2b6186-a-4232155,13,female,h.jsp,Linux,Safari,139.214.59.191,湖南省,长沙,page_46,goods_183,shop_542,1939da5ea-a3dc-4d4f-abd2-a69ba8022c4d,939da5ea-a3dc-4d4f,2016/12/10 
8:11,386c148b-d-1507592,55,male,d.jsp,window,Safari,171.15.187.6,福建省,泉州,page_5,goods_439,shop_199,12fbe6294-7bb0-4ff7-9aba-f2fd4ea5bd71,2fbe6294-7bb0-4ff7,2016/12/10 10:05,69d6f3c8-6-1954530,53,male,c.jsp,Unix,Safari,61.233.239.3,云南省,昆明,page_65,goods_116,shop_256,11fa96b1a-888c-4ecd-a853-16f99d62946c,1fa96b1a-888c-4ecd,2016/12/10 8:56,8147aa3b-6-1496554,63,female,c.jsp,Unix,360,61.236.59.151,福建省,厦门,page_26,goods_534,shop_678,2674f69f0-c32c-4673-bf15-6216d06f9ecd,674f69f0-c32c-4673,2016/12/10 8:13,79647bde-5-260238,32,male,i.jsp,window,IE,36.62.101.34,湖北省,武汉,page_10,goods_796,shop_809,12b6c4e54-5e95-4b96-b9cb-4b731d686530,2b6c4e54-5e95-4b96,2016/12/10 10:21,62a784a3-1-1413427,3,female,i.jsp,window,Safari,139.211.169.237,河北省,保定,page_53,goods_870,shop_973,127a6b62d-3fee-4723-a0d3-76604e54d0d4,27a6b62d-3fee-4723,2016/12/10 10:20,50bf17bc-8-6625864,46,male,f.jsp,window,Firefox,210.41.210.158,湖北省,武汉,page_58,goods_223,shop_583,0d62805c1-1c56-4ca0-891d-a0a63b02fb64,d62805c1-1c56-4ca0,2016/12/10 10:03,53a93163-3-2400283,79,male,h.jsp,Unix,Firefox,222.90.249.26,湖南省,长沙,page_1,goods_812,shop_15,1d2952e68-5952-40f6-b6b1-c5a21ab0dcdb,d2952e68-5952-40f6,2016/12/10 9:29,1cc1d3c6-6-2301392,33,female,e.jsp,Unix,Firefox,139.203.234.202,湖南省,长沙,page_72,goods_216,shop_34,14a7768dc-5080-4613-888d-42224500f931,4a7768dc-5080-4613,2016/12/10 8:44,76ef1883-4-4173032,73,female,b.jsp,Unix,Chrome,36.60.213.165,福建省,泉州,page_37,goods_861,shop_225,3656e76e9-5fc9-4afd-830c-93edf6d11455,656e76e9-5fc9-4afd,2016/12/10 8:24,9bc5dd0a-5-6322967,36,male,h.jsp,Linux,Opera,61.234.49.195,河南省,郑州,page_9,goods_795,shop_503,331071165-300f-48c1-9ce6-78fdb8fa97e5,31071165-300f-48c1,2016/12/10 9:37,d73814d4-0-5334871,33,male,f.jsp,Linux,IE,210.29.209.139,福建省,厦门,page_89,goods_116,shop_14,2a1b6186b-d411-4921-a1cf-f57920a6af9a,a1b6186b-d411-4921,2016/12/10 
7:50,f63ea09c-1-5061472,45,male,i.jsp,Unix,Safari,121.76.223.137,湖北省,武汉,page_63,goods_142,shop_988,1275667a9-dfee-4630-b54a-6fe0a23d4d99,275667a9-dfee-4630,2016/12/10 10:28,1614cda4-6-6291530,59,female,h.jsp,window,Chrome,210.30.36.1,云南省,昆明,page_12,goods_795,shop_850,02d6eba62-c456-46a6-b411-582482d7d1ad,2d6eba62-c456-46a6,2016/12/10 8:54,c431a356-c-5246586,55,female,h.jsp,Linux,Chrome,210.46.183.94,湖北省,黄冈,page_17,goods_587,shop_261,41f8cb1b7-b111-4ba9-9bd5-1ce6b7a90874,1f8cb1b7-b111-4ba9,2016/12/10 7:14,4159b7a3-3-4138715,52,male,h.jsp,Linux,360,36.56.252.145,云南省,昆明,page_37,goods_603,shop_670,402c7cce6-c2b2-434c-b3be-cba378cb7800,02c7cce6-c2b2-434c,2016/12/10 10:00,5ef6b982-9-3566363,77,male,g.jsp,window,Chrome,121.77.134.152,江西省,南昌,page_90,goods_33,shop_964,3926bacca-6128-4961-972c-9725df532a94,926bacca-6128-4961,2016/12/10 7:50,ce895e7d-0-4957535,60,female,b.jsp,Unix,Opera,171.13.58.19,河南省,郑州,page_20,goods_787,shop_600,0e3d69c87-c040-47dd-b618-f642d7bfd792,e3d69c87-c040-47dd,2016/12/10 8:08,c4246a37-0-2571115,36,male,a.jsp,window,Chrome,139.200.242.73,福建省,厦门,page_88,goods_127,shop_347,1de5267e9-7414-4ff4-9965-a60ed0bfac03,de5267e9-7414-4ff4,2016/12/10 9:05,07e79697-6-1915723,59,male,a.jsp,window,Chrome,171.12.100.184,河南省,郑州,page_52,goods_353,shop_803,1ca436b40-b017-48a4-bab2-89a0cb25ecff,ca436b40-b017-48a4,2016/12/10 9:15,79276adc-9-750790,26,female,e.jsp,Linux,Safari,121.77.41.98,湖北省,黄冈,page_46,goods_938,shop_343,2e6843787-901f-4160-bbd3-f0a6958273c1,e6843787-901f-4160,2016/12/10 7:51,684bfa22-f-5820266,51,male,f.jsp,Linux,Chrome,182.89.230.5,北京,北京,page_83,goods_254,shop_151,013be0e4e-1118-4caf-b2ee-804c94ad92ce,13be0e4e-1118-4caf,2016/12/10 8:22,4448cdaa-4-5773322,5,female,d.jsp,window,IE,123.232.229.24,四川省,成都,page_55,goods_232,shop_974,193c9af19-1f23-416c-93e2-2ae7f22f168f,93c9af19-1f23-416c,2016/12/10 
8:38,5e020c60-5-5245957,78,female,g.jsp,Linux,Opera,139.212.30.63,河北省,石家庄,page_18,goods_36,shop_247,34605af95-4602-4326-bdef-9a9876a68eca,4605af95-4602-4326,2016/12/10 10:09,f3069e77-c-724658,66,male,h.jsp,Unix,Firefox,222.45.71.65,上海,上海,page_99,goods_789,shop_133,4647f2a6f-b732-4c6f-b094-369c34eff4bf,647f2a6f-b732-4c6f,2016/12/10 7:20,be1305a3-4-2122574,17,male,i.jsp,Unix,IE,171.10.29.46,海南省,三亚,page_97,goods_914,shop_623,322047204-b5e9-4ff7-afcd-7c010a34edb8,22047204-b5e9-4ff7,2016/12/10 8:16,a5da5fe5-0-6063550,76,male,f.jsp,Unix,Safari,210.32.165.18,湖北省,黄冈,page_68,goods_172,shop_662,3aacba744-0f70-4560-a25f-81b0b3b42940,aacba744-0f70-4560,2016/12/10 9:28,133cd578-4-6123876,45,male,d.jsp,Unix,Opera,36.59.3.12,四川省,成都,page_70,goods_209,shop_633,3e64fa0bd-c622-4ca3-8e04-975ec780145b,e64fa0bd-c622-4ca3,2016/12/10 7:37,03a688ae-d-3633501,40,female,g.jsp,window,IE,61.232.231.251,湖北省,武汉,page_3,goods_430,shop_544,347c67587-76a9-4e26-90e9-4667fd7c5e63,47c67587-76a9-4e26,2016/12/10 8:28,1fb3b029-1-3598644,45,male,a.jsp,Unix,360,222.55.30.143,福建省,泉州,page_53,goods_656,shop_666,3dafc2d10-d307-48c1-b116-78dced9c36e9,dafc2d10-d307-48c1,2016/12/10 10:24,acafbe20-4-6564970,9,male,h.jsp,Linux,Chrome,36.57.180.218,福建省,泉州,page_70,goods_659,shop_544,285f87858-2a5f-4120-b621-e72fed646bcc,85f87858-2a5f-4120,2016/12/10 9:11,921d4141-7-5643235,47,female,h.jsp,window,Chrome,61.232.158.200,湖南省,长沙,page_43,goods_367,shop_365,46dfcada6-ee97-4b4b-8379-03d6fdf5a6e0,6dfcada6-ee97-4b4b,2016/12/10 10:31,5bb34012-d-2974791,72,male,i.jsp,window,360,106.92.170.85,湖北省,黄冈,page_25,goods_179,shop_634,180762a2a-18e3-40fe-92d3-666d45d48c88,80762a2a-18e3-40fe,2016/12/10 9:20,21f0a43a-e-5462851,64,male,i.jsp,window,Opera,121.76.1.232,河南省,郑州,page_68,goods_505,shop_103,4642b2838-8dc5-49f2-9375-871068831b7b,642b2838-8dc5-49f2,2016/12/10 8:54,8ed37193-4-254067,3,male,b.jsp,Unix,360,61.236.40.158,湖北省,黄冈,page_4,goods_663,shop_467,4d2177fd0-f256-414b-95d6-64cf973cd956,d2177fd0-f256-414b,2016/12/10 
8:58,8580d4fe-3-4370748,1,male,h.jsp,window,Opera,182.90.61.40,湖南省,长沙,page_63,goods_414,shop_440,3d4d2e1d4-935b-4d6c-b814-aadebb70f86f,d4d2e1d4-935b-4d6c,2016/12/10 9:24,e16cfd5a-6-3669552,54,male,a.jsp,Linux,360,139.215.150.23,湖北省,武汉,page_54,goods_269,shop_967,0213c04e5-0f4a-44f1-93fc-ba20b56d12df,213c04e5-0f4a-44f1,2016/12/10 8:21,81e78063-e-5756122,0,male,g.jsp,Unix,Safari,222.37.130.113,湖北省,武汉,page_22,goods_211,shop_13,493a50ed3-1bef-4e95-9d7f-d811162eb8de,93a50ed3-1bef-4e95,2016/12/10 10:34,27a20fef-3-2408529,16,female,b.jsp,Unix,Opera,36.57.14.152,湖南省,长沙,page_72,goods_515,shop_555,30b9362b7-fe6b-4bc7-87e9-974035b23b6b,0b9362b7-fe6b-4bc7,2016/12/10 10:07,cce96f11-7-3839350,56,female,b.jsp,Linux,360,139.214.11.59,福建省,厦门,page_31,goods_942,shop_415,4353ac57e-1762-424e-aed6-7abf81d92c25,353ac57e-1762-424e,2016/12/10 10:24,12c641a8-f-3252490,75,female,f.jsp,window,Opera,123.233.35.166,北京,北京,page_45,goods_770,shop_656,1ccee87ee-6b8e-4a54-8584-7d56139ffcf6,ccee87ee-6b8e-4a54,2016/12/10 8:01,6b4c04c6-b-1899511,68,female,b.jsp,Linux,Safari,121.77.18.78,河南省,郑州,page_63,goods_622,shop_146,24998546e-1764-4e8f-a88c-5c534d191397,4998546e-1764-4e8f,2016/12/10 10:06,6eb9426e-0-5662898,46,female,c.jsp,window,Chrome,171.15.14.245,黑龙江,哈尔滨,page_50,goods_415,shop_327,03d24e375-b8fa-4d5d-97e3-cecb22bff8f4,3d24e375-b8fa-4d5d,2016/12/10 7:32,8f73f3b4-6-5150215,6,female,f.jsp,Unix,360,36.60.146.91,海南省,三亚,page_85,goods_257,shop_622,1c9a38a15-4b66-4287-a22c-75e08b17d9c2,c9a38a15-4b66-4287,2016/12/10 9:09,a99f74a0-d-630912,78,male,g.jsp,Unix,Firefox,106.95.26.239,湖北省,黄冈,page_83,goods_709,shop_916,03fa49165-d158-4d37-ba3d-dd56e4a53b81,3fa49165-d158-4d37,2016/12/10 7:50,267f69a9-4-1028039,50,male,c.jsp,Unix,IE,182.86.97.208,河北省,石家庄,page_51,goods_386,shop_317,012c56fba-f2e9-4114-99ed-c30849da1d5b,12c56fba-f2e9-4114,2016/12/10 9:55,b69f13b4-4-5707515,59,male,c.jsp,Unix,Opera,139.196.21.61,山西省,太原,page_34,goods_261,shop_394,213d4c541-7040-444a-ad97-734366a7a8f0,13d4c541-7040-444a,2016/12/10 
10:23,635593e3-2-305888,63,female,d.jsp,Linux,Chrome,106.87.145.66,河南省,郑州,page_17,goods_695,shop_42,4528b3b2e-70c3-441a-904b-1826bca897ba,528b3b2e-70c3-441a,2016/12/10 7:35,3288bf66-1-4251051,4,female,c.jsp,Linux,360,106.84.160.5,湖南省,长沙,page_41,goods_577,shop_845,1b4e05184-c129-41d2-907e-efdbe7879f59,b4e05184-c129-41d2,2016/12/10 9:59,46fadc64-5-4065794,3,female,i.jsp,Linux,Firefox,61.236.25.39,山西省,太原,page_31,goods_525,shop_15,0cde9fef9-6cb2-4713-bb42-388db61a29cc,cde9fef9-6cb2-4713,2016/12/10 8:05,84ce85aa-8-1948677,46,female,e.jsp,Linux,IE,106.92.63.123,上海,上海,page_8,goods_815,shop_487,068af9f26-898a-4066-9379-098974e5cf38,68af9f26-898a-4066,2016/12/10 10:06,5a619e1c-0-441445,76,female,a.jsp,Linux,Safari,106.81.118.237,河南省,郑州,page_65,goods_952,shop_235,0e7a669b9-41d4-4859-bc5a-ff057cc0f326,e7a669b9-41d4-4859,2016/12/10 10:36,5c492d56-7-1235641,27,female,b.jsp,Unix,IE,36.59.241.120,福建省,厦门,page_24,goods_273,shop_226,1ccf66439-9d95-4aaa-a7cc-be8b4cab7d78,ccf66439-9d95-4aaa,2016/12/10 9:28,55f8213e-a-3815358,12,female,c.jsp,Linux,360,121.77.72.185,四川省,成都,page_1,goods_800,shop_961,08ab2aef1-0810-4b8d-9da8-53a7b3c8d911,8ab2aef1-0810-4b8d,2016/12/10 9:07,efa3ee27-d-6398014,69,female,f.jsp,Linux,Safari,139.205.53.13,云南省,昆明,page_34,goods_774,shop_873,2818914de-2e13-4cf7-9885-f17aecec275a,818914de-2e13-4cf7,2016/12/10 10:36,5c3d2ff1-a-5864780,5,male,b.jsp,Unix,IE,123.232.165.27,云南省,昆明,page_93,goods_96,shop_516,2
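Judging from the sample, each record carries 16 comma-separated fields, and the program in this article only reads indexes 2 (timestamp), 7 (operating system), 9 (IP), 10 (province), and 11 (city). The sketch below parses one record into a small case class to make those positions explicit. The field names and the AccessRecord type are assumptions made here for illustration, since the data set itself does not name its columns.

```scala
object AccessRecordDemo {
  // Hypothetical labels for the columns the WordCount program reads.
  final case class AccessRecord(
      time: String,     // index 2, e.g. "2016/12/10 9:24"
      os: String,       // index 7
      ip: String,       // index 9
      province: String, // index 10
      city: String      // index 11
  )

  // Returns None when the line has too few fields to contain index 11.
  def parse(line: String): Option[AccessRecord] = {
    val f = line.split(',')
    if (f.length < 12) None
    else Some(AccessRecord(f(2), f(7), f(9), f(10), f(11)))
  }

  def main(args: Array[String]): Unit = {
    // First record of sampleDataSet, copied from the listing above.
    val line = "2983651b-1867-4c69-9c00-c691dbfecb3b,2983651b-1867-4c69," +
      "2016/12/10 9:24,df8202bd-8-335312,76,male,f.jsp,window,360," +
      "182.84.175.85,河北省,保定,page_10,goods_425,shop_303,1"
    println(parse(line))
  }
}
```

Returning an Option keeps malformed or truncated lines from throwing an index error, which the original one-line split does not guard against.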
Program file
WordCount:
    package Hospit.train

    import org.apache.spark.rdd.RDD
    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      // Run locally with 4 worker threads.
      val conf = new SparkConf().setAppName("TransformationAction").setMaster("local[4]")
      val sc = new SparkContext(conf)

      def main(args: Array[String]): Unit = {
        val data = sc.textFile("D:/Code/Spark/textfile/sampleDataSet")
        // val data = sc.textFile("data.log")
        // println(timeInterval("12"))
        ipGroupByPrc(data)
        println("--------------------------------------")
        ipGroupByTime(data)
        println("--------------------------------------")
        ipGroupByCity(data)
        println("--------------------------------------")
        // prctSttcIp(data)
        data.foreach(println)
      }

      // Number of records per operating system (field 7).
      def systemCount(data: RDD[String]): Unit = {
        val res = data.map(_.split(',')).map(x => (x(7), 1)).reduceByKey(_ + _)
        res.foreach(println)
      }

      // Number of distinct IPs (field 9) per province (field 10).
      def ipGroupByPrc(data: RDD[String]): Unit = {
        val res = data.map(_.split(','))
          .map(x => (x(10), x(9))).distinct().groupByKey().map(x => (x._1, x._2.size))
        res.foreach(println)
      }

      // Number of distinct IPs (field 9) per city (field 11).
      def ipGroupByCity(data: RDD[String]): Unit = {
        val res = data.map(_.split(','))
          .map(x => (x(11), x(9))).distinct().groupByKey().map(x => (x._1, x._2.size))
        res.foreach(println)
      }

      // Distinct IP count per province, grouped by time bucket.
      def ipGroupByTime(data: RDD[String]): Unit = {
        val tmpres = data.map(_.split(',')).map(x => (x(2), x(10), x(11), x(9))).cache()
        val res = tmpres.map(x => ((timeInterval(x._1.split(' ')(1).split(':')(0)), x._2), x._4))
          .distinct()
          .groupByKey().map(x => (x._1, x._2.size))
          .map(x => (x._1._1, (x._1._2, x._2))).groupByKey()
        res.foreach(println)
      }

      // Maps an hour string (e.g. "9") to its time-bucket label.
      def timeInterval(time: String): String = {
        val hour: Int = time.toInt
        if (hour <= 9 && 0 <= hour) {
          "[0-9]"
        } else if (hour <= 12 && 9 < hour) {
          "[9-12]"
        } else if (hour <= 18 && 12 < hour) {
          "[12-18]"
        } else {
          "[18-24]"
        }
      }
    }
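The per-province and per-city functions share one pattern: pair a key column with the IP, drop duplicate pairs, group by key, and count. The sketch below reproduces that pattern on plain Scala collections so it can be tried without a Spark context. It is an illustration of the logic only, not the Spark code itself; distinctIpsBy is a hypothetical helper and the rows in main are shortened, made-up records in the same column layout.

```scala
object DistinctIpsDemo {
  // Counts distinct IPs (column 9) per value of the column at `keyIndex`,
  // mirroring map -> distinct -> groupByKey -> size from the Spark job.
  def distinctIpsBy(lines: Seq[String], keyIndex: Int): Map[String, Int] =
    lines
      .map(_.split(','))
      .map(f => (f(keyIndex), f(9)))
      .distinct
      .groupBy(_._1)
      .map { case (key, pairs) => (key, pairs.size) }

  def main(args: Array[String]): Unit = {
    // Made-up rows: only columns 2, 7, 9, 10 and 11 are meaningful here.
    val lines = Seq(
      "id1,s1,2016/12/10 9:24,u1,76,male,f.jsp,window,360,182.84.175.85,河北省,保定",
      "id2,s2,2016/12/10 9:30,u2,30,male,a.jsp,Linux,IE,182.84.175.85,河北省,保定",
      "id3,s3,2016/12/10 10:20,u3,41,female,b.jsp,Unix,IE,139.212.30.63,河北省,石家庄",
      "id4,s4,2016/12/10 8:13,u4,32,male,i.jsp,window,IE,36.62.101.34,湖北省,武汉"
    )
    distinctIpsBy(lines, 10).foreach(println) // per province
    distinctIpsBy(lines, 11).foreach(println) // per city
  }
}
```

The first two rows share an IP, so they count once for their province. On a real RDD the same steps run distributed, and the order of printed results is not guaranteed.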
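Run
Right-click in the source file and choose Run "WordCount".
The results appear in the Run window. There is a lot of log output, so scroll up and down to find them.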
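When reading the time-grouped part of the output, it helps to know how timestamps map to bucket labels. The sketch below isolates that step using the same boundary conditions as timeInterval in the program. TimeBucketDemo and hourOf are names introduced here for illustration, and the helper takes an Int rather than a String for convenience.

```scala
object TimeBucketDemo {
  // Extracts the hour from a timestamp such as "2016/12/10 9:24".
  def hourOf(timestamp: String): Int =
    timestamp.split(' ')(1).split(':')(0).toInt

  // Same boundaries as timeInterval in the WordCount program:
  // hour 9 falls in "[0-9]", hour 12 in "[9-12]", hour 18 in "[12-18]".
  def timeInterval(hour: Int): String =
    if (hour <= 9 && 0 <= hour) "[0-9]"
    else if (hour <= 12 && 9 < hour) "[9-12]"
    else if (hour <= 18 && 12 < hour) "[12-18]"
    else "[18-24]"

  def main(args: Array[String]): Unit = {
    // Timestamps taken from sampleDataSet.
    val stamps = Seq("2016/12/10 9:24", "2016/12/10 10:20", "2016/12/10 7:13")
    stamps.foreach { ts =>
      val bucket = timeInterval(hourOf(ts))
      println(s"$ts -> $bucket")
    }
  }
}
```

Because every timestamp in the sample falls between 7:10 and 10:37, only the "[0-9]" and "[9-12]" buckets are expected to show up for this data set.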