Spark Quick Start Series (3): Understanding RDDs in Depth

RDDs in Depth

Goal

Understand the internal logic of an RDD and its internal properties (what an RDD is made of).

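Case Study

Requirements

Given a website's access records, commonly called an access log,
find the distinct IP addresses that appear in it and the number of requests made by each.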
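
Before turning to Spark, the requirement can be sketched in plain Python to make the logic concrete. This is only a local illustration, not Spark code: the helper names `extract_ip` and `count_ips` are hypothetical, and the sample lines are shortened stand-ins for real log lines. It assumes the client IP is the first space-separated field of each line, as in the common access log layout.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def extract_ip(line: str) -> str:
    """Return the first space-separated field, assumed to be the client IP."""
    return line.split(" ", 1)[0]


def count_ips(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count requests per IP and sort by count, highest first."""
    cleaned = (line.strip() for line in lines)
    # Skip blank lines so they do not produce an empty "IP".
    ips = (extract_ip(line) for line in cleaned if line)
    counts = Counter(ips)
    # Sort by descending count; ties are ordered by the IP string.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Shortened, illustrative lines (not full log entries).
sample = [
    '181.64.62.158 - - [01/Nov/2017:00:02:45 +0000] "GET /a HTTP/1.1" 200 133',
    '49.146.42.248 - - [01/Nov/2017:00:04:22 +0000] "GET /b HTTP/1.1" 200 133',
    '181.64.62.158 - - [01/Nov/2017:00:03:16 +0000] "GET /c HTTP/1.1" 200 133',
]

for ip, hits in count_ips(sample):
    print(ip, hits)
```

The same three steps (extract the IP, aggregate per key, sort by count) map naturally onto RDD operations such as `map`, `reduceByKey`, and `sortBy`, which is why this problem makes a convenient case study.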

Create a data file named access_log_sample.txt (the full dataset is too large to include here, so use 100 lines for now):

190.217.63.59 - - [01/Nov/2017:00:00:15 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
76.114.21.96 - - [01/Nov/2017:00:00:31 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//tricolor.entravision.com/sacramento/escucha-en-vivo/&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
206.126.121.204 - - [01/Nov/2017:00:00:46 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//zone.msn.com/gameplayer/gameplayer.aspx%3Fgame%3Dfamilyfeud&cat=internet-portal HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
154.121.8.18 - - [01/Nov/2017:00:01:01 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https%3A%2F%2Fwww.google.dz%2Fsearch&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0"
190.238.37.217 - - [01/Nov/2017:00:01:17 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
147.147.163.182 - - [01/Nov/2017:00:01:31 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https%3A%2F%2Fs-usweb.dotomi.com%2Frenderer%2FdelPublishersCookies.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0"
200.78.93.132 - - [01/Nov/2017:00:01:45 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.facebook.com/login/device-based/regular/login/&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
24.200.173.170 - - [01/Nov/2017:00:01:59 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/glade.js&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
189.252.185.4 - - [01/Nov/2017:00:02:15 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https%3A%2F%2Fwww.google.cm%2Fblank.html&cat=internet-portal HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; rv:34.0) Gecko/20100101 Firefox/34.0"
190.90.22.125 - - [01/Nov/2017:00:02:29 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//www.raicesdeeuropa.com/grandes-obras-de-los-principales-escritores-nacidos-durante-el-siglo-xix/&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.64.62.158 - - [01/Nov/2017:00:02:45 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//bancaporinternet.interbank.com.pe/Warhol/redireccionaInicioLogueo&cat=financial-service HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
122.54.153.240 - - [01/Nov/2017:00:03:00 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
181.64.62.158 - - [01/Nov/2017:00:03:16 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.236.239.8 - - [01/Nov/2017:00:03:33 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/search%3Frlz%3D1C2AOHY_esPE760PE760%26source%3Dhp%26ei%3DUw_5WeGVA4TjmAHO8aCgDw%26q%3Dfb%26oq%3Dfb%26gs_l%3Dpsy-ab.3..0i131k1j0l4j0i131k1l2j0l3.1767.1916.0.2135.2.2.0.0.0.0.144.269.0j2.2.0....0...1.1.64.psy-ab..0.2.267....0.pWGbpZy6zwg%26safe%3Dhigh&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
190.110.200.41 - - [01/Nov/2017:00:03:50 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.facebook.com/rsrc.php/v3i0KB4/ye/l/es_LA/G6VcGRK_54X.js&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
77.180.73.169 - - [01/Nov/2017:00:04:06 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//gomovies.co/&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
49.146.42.248 - - [01/Nov/2017:00:04:22 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//mm-a.akamaihd.net/160/sn/assets/common/3d/particle/ns2/texture/line_040.dxt%3Fv%3D25960&cat=content-delivery-network HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36"
181.64.146.165 - - [01/Nov/2017:00:04:39 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.facebook.com/rsrc.php/yR/r/lvSDckxyoU5.ogg&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36"
201.240.33.214 - - [01/Nov/2017:00:04:55 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//es.savefrom.net/%23url%3Dhttp%3A//youtube.com/watch%3Fv%3Dgr_3VrQC8qY%26utm_source%3Dyoutube.com%26utm_medium%3Dshort_domains%26utm_campaign%3Dssyoutube.com&cat=software-download HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.236.56.58 - - [01/Nov/2017:00:05:10 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//scontent.flim5-4.fna.fbcdn.net/v/t1.0-1/p32x32/22310580_351017335344058_8554274362948717253_n.jpg%3Foh%3D5da979568a22e425b79b7ba788dbc30a%26oe%3D5A65BCC3&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.64.192.238 - - [01/Nov/2017:00:05:26 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/%3Fgws_rd%3Dssl&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
201.255.225.35 - - [01/Nov/2017:00:05:41 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.ar/search%3Fq%3D886971865721%26oq%3D886971865721%26aqs%3Dchrome..69i57.719j0j7%26sourceid%3Dchrome%26ie%3DUTF-8%26safe%3Dhigh&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.211.197.246 - - [01/Nov/2017:00:05:56 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.netflix.com/logout%3Flocale%3Des-EC&cat=media-streaming HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
187.3.230.121 - - [01/Nov/2017:00:06:11 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//baixar.programanex.com.br/latest/setup_nex.exe&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
175.158.226.85 - - [01/Nov/2017:00:06:28 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//web.facebook.com/%3F_rdc%3D1%26_rdr&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
24.151.60.116 - - [01/Nov/2017:00:06:43 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//www.edhelper.com/edhelper_monthly.htm&cat=educational-institution HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.236.157.88 - - [01/Nov/2017:00:06:58 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//auth.kaybo1.com/member/login.html%3Fback_url%3Dhttp%3A//pb.kaybo1.com/event/evt20170301_event/event01.html&cat=game HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
75.73.28.212 - - [01/Nov/2017:00:07:13 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.43.170.133 - - [01/Nov/2017:00:07:29 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.121.250.170 - - [01/Nov/2017:00:07:45 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//r1---sn-5mncvap8p5-a2ce.googlevideo.com/generate_204&cat=media-streaming HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
190.237.183.6 - - [01/Nov/2017:00:08:01 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//musicaq.biz/song.php%3Fid%3DQ2hpbml0byBEZWwgQW5kZSAtICAgIFByaW1pY2lhfGh0dHBzOi8vYXBpLnNvdW5kY2xvdWQuY29tL3RyYWNrcy83MjkxNjE2NS9zdHJlYW0%252FY2xpZW50X2lkPTBmOGZkYmJhYTIxYTliZDE4MjEwOTg2YTdkYzJkNzJj&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.67.2.102 - - [01/Nov/2017:00:08:17 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//lcperu.edestinos.com.pe/check-in-online&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.121.218.21 - - [01/Nov/2017:00:08:33 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.4kdownload.com/buy/videodownloader%3Fsource%3Dvideodownloader%26redirect-locale%3Des%26ui_source%3Dshow-on-run-3&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
76.23.172.162 - - [01/Nov/2017:00:08:48 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//open.spotify.com/&cat=music HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.64.101.27 - - [01/Nov/2017:00:09:04 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.facebook.com/&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
114.186.152.178 - - [01/Nov/2017:00:09:21 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.234.56.200 - - [01/Nov/2017:00:09:36 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//toroadvertisingmedia.com/cr%3Fb%3D218558%26p%3D7550%26c%3D6608%26h%3D0d7386ae207d128d276c8fc974f8f99b%26l%3DCO%26tz%3D-5.0%26sh%3D768.0%26sw%3D1360.0%26ad.trans.id%3Dwzj9mrkhearh%26t%3D1509494794724%26u%3Dhttps%253A%252F%252Fwww.popcornvod.com%252Fwelcome.html%253Faff%253D4054%2526theme%253D0922%2526clickid%253DOCM2NjA4IzI0MyM3NTUwfDIxODU1OHxDT3wzfDF8fHx3emo5bXJraGVhcmh8fHw%2526pub%253D1400%2526sub_pub_id%253D&cat=unknown HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.233.78.10 - - [01/Nov/2017:00:09:52 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//www.flvto.biz/es/downloads/mp3/yt_5S-Fjz5CR5s/&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.46.172.102 - - [01/Nov/2017:00:10:07 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//jkanime.net/kimi-ni-todoke-2/5/&cat=entertainment-and-art HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
189.148.47.237 - - [01/Nov/2017:00:10:24 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2800.0 Iron Safari/537.36"
182.251.246.12 - - [01/Nov/2017:00:10:39 +0000] "GET /webapi/getcategory?uri=yakusoku.cocoloni.jp&cat=society HTTP/1.1" 200 60 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
181.234.203.122 - - [01/Nov/2017:00:10:56 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//bonusbitcoin.co/faucet&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
103.4.190.242 - - [01/Nov/2017:00:11:10 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//s1-word-edit-15.cdn.office.net/we/s/1687297775_App_Scripts/2057/WordEditor.Wac.TellMeModel.js&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
72.182.173.74 - - [01/Nov/2017:00:11:23 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=http%3A//store.steampowered.com/agecheck/app/744640/&cat=game HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.179.100.64 - - [01/Nov/2017:00:11:36 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//swx.cdn.skype.com/assets/v/0.0.300/audio/m4a/call-dialing.m4a&cat=internet-communication HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
74.71.124.140 - - [01/Nov/2017:00:11:50 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//my.netzero.net/s/sp&cat=internet-communication HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
94.189.216.28 - - [01/Nov/2017:00:12:04 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//www.nba.com/&cat=sport HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
187.131.9.222 - - [01/Nov/2017:00:12:17 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//www.google.com.mx/%3Fgfe_rd%3Dcr%26dcr%3D0%26ei%3DWRH5WeezN-bo8AeEoo7oDw&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
179.7.171.84 - - [01/Nov/2017:00:12:33 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
200.106.89.161 - - [01/Nov/2017:00:12:49 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https%3A%2F%2Fwww.pnp.gob.pe%2Fadmision_EESTP_PNP%2Fprospecto_proceso_admision_ETSPNP_2017_II.pdf&cat=government HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0"
187.222.252.169 - - [01/Nov/2017:00:13:05 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//s1-word-view-15.cdn.office.net/wv/s/1687297775_resources/3082/progress16.gif&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
49.146.42.248 - - [01/Nov/2017:00:13:22 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//dquchx93qmjdu.cloudfront.net/s3/resources/sound/common/pickweapon_69eea0cef175a3faa11eca989f346a4c.mp3&cat=content-delivery-network HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36"
189.181.11.35 - - [01/Nov/2017:00:13:38 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//adexc.net/network/%3Fref_prm%3D28401%26clck%3Db0ajqvw8zzni%26pub_sd%3DM82IMGZFR%26ad_spv%3D549&cat=botnet HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
190.186.200.125 - - [01/Nov/2017:00:13:56 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https%3A%2F%2Fes-la.facebook.com%2F&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0"
190.171.208.228 - - [01/Nov/2017:00:14:11 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//chiquitests.com/enchinan/&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
75.86.115.195 - - [01/Nov/2017:00:14:28 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (X11; CrOS x86_64 9765.85.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.123 Safari/537.36"
201.240.33.221 - - [01/Nov/2017:00:14:45 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//cf-media.sndcdn.com/OaJxdnP5Fsen.128.mp3%3FPolicy%3DeyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiKjovL2NmLW1lZGlhLnNuZGNkbi5jb20vT2FKeGRuUDVGc2VuLjEyOC5tcDMiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE1MDk0OTU2Nzh9fX1dfQ__%26Signature%3DSjqtGj2LWI9SCvgiIzNXs4M7P7eA-OCfi%7E%7EMwNzxFQ-Pft1DLkoDuUx1vnqf0JC0BGKRegqep0hiMxiJMUUBVLYzEtZq0jZFZKz90zO8lyfvOG38vwnbUj68Jcpb6PTTvwLK1lK9Oo8RA1DSQ-NmA1v1yj8N0DQBZmEF2RXRbmXxgh7kSledHq2OFfQ1Im-OLJyvFEH2Mq-4c3YruyvdxSPxBOkp81CL53ceEm9oAYNThc-7HXv5LPbqB%7EOrcjqXi0VihyE4MSoIou08%7E3sZBNTpq2fB4RhP8TnoNblAQtWsPMEj%7EhXTX9cJ3WrOvb9k67DV3HKf0RYfpiX-jFTfog__%26Key-Pair-Id%3DAPKAJAGZ7VMH2PFPW6UQ&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
202.151.22.3 - - [01/Nov/2017:00:15:00 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_WebFilter&ver=0.19.6.9&url=http%3A%2F%2Fwww.fijitimes.com%2Fstory.aspx&cat=news-and-media HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0"
181.176.73.81 - - [01/Nov/2017:00:15:16 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//web.facebook.com/login.php%3Flogin_attempt%3D1%26lwv%3D110&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.236.208.200 - - [01/Nov/2017:00:15:33 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/search%3Fsafe%3Dstrict%26hl%3Des%26biw%3D1366%26bih%3D662%26tbm%3Disch%26sa%3D1%26ei%3DWxD5WebYNIj4wASqorPIDw%26q%3Dcontribucion%26oq%3Dcon%26gs_l%3Dpsy-ab.1.1.0i67k1l5j0l5.447088.451222.0.455291.37.12.0.0.0.0.394.1792.2-4j2.7.0....0...1.1.64.psy-ab..31.5.1471.0..0i30k1.248.GHQlbsuDZcQ%26safe%3Dhigh&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.137.144.93 - - [01/Nov/2017:00:15:48 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.ar.avon.com/REPSuite/orderEntry.page%3Fredirected%3Dtrue%26isSuccess%3DY&cat=shopping HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
49.148.209.194 - - [01/Nov/2017:00:16:03 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.facebook.com/&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
73.213.34.16 - - [01/Nov/2017:00:16:18 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//swx.cdn.skype.com/assets/v/0.0.300/audio/m4a/call-outgoing-p1.m4a&cat=internet-communication HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
201.240.247.104 - - [01/Nov/2017:00:16:33 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/search%3Fq%3Dcomo+prepara+par+ahacer+una+mascara+de+pantomima%26rlz%3D1C1NHXL_esPE709PE709%26oq%3Dcomo+prepara+par+ahacer+una+mascara+de+pantomima%26aqs%3Dchrome..69i57.19536j0j7%26sourceid%3Dchrome%26ie%3DUTF-8%26safe%3Dhigh&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.130.189.170 - - [01/Nov/2017:00:16:49 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//es.123rf.com/imagenes-de-archivo/ombligo.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.204.104.89 - - [01/Nov/2017:00:17:04 +0000] "GET /webapi/getcategory?uri=www.google.co.ve&cat=search-engine HTTP/1.1" 200 67 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
190.42.233.34 - - [01/Nov/2017:00:17:20 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.facebook.com/&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
187.136.98.155 - - [01/Nov/2017:00:17:35 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//musicaq.biz/descargar-musica/9f352ef6-santana-the-game-of-love-ft-michelle-branch.html&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
147.147.163.182 - - [01/Nov/2017:00:17:50 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https%3A%2F%2Fwww.worldtimebuddy.com%2F&cat=unknown HTTP/1.1" 200 133 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0"
190.232.70.238 - - [01/Nov/2017:00:18:06 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//sv3.onlinevideoconverter.com/download%3Ffile%3De4c2d3a0e4a0c2&cat=adult-and-pornography HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
49.148.209.194 - - [01/Nov/2017:00:18:21 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//web.roblox.com/&cat=game HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
189.170.192.60 - - [01/Nov/2017:00:18:36 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//encrypted-tbn0.gstatic.com/images%3Fq%3Dtbn%3AANd9GcQcdjN8-1NJnSeC6ptIlx7S0wZucgg1jzL4N-i7IWE_8o8-F0gmjw&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
161.18.215.235 - - [01/Nov/2017:00:18:50 +0000] "GET /webapi/getcategory?uri=www.wattpad.com&cat=personal-site-and-blog HTTP/1.1" 200 75 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
138.36.222.166 - - [01/Nov/2017:00:19:04 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
201.230.112.110 - - [01/Nov/2017:00:19:20 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//civilgeeks.com/categor%25C3%25ADa/hidraulica/&cat=personal-site-and-blog HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.234.49.7 - - [01/Nov/2017:00:19:35 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/search%3Fq%3DPINTEREST%26oq%3DPINTERE%26aqs%3Dchrome.0.69i59j69i60j69i65j69i57j0l2.2160j0j1%26sourceid%3Dchrome%26ie%3DUTF-8%26safe%3Dhigh&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.236.239.11 - - [01/Nov/2017:00:19:49 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//scontent.flim5-3.fna.fbcdn.net/v/t1.0-1/p32x32/22687826_1976412995963948_3676302371441952941_n.jpg%3Foh%3D7bc40797d744c7b5d94dd368ae4de823%26oe%3D5A6CCB3A&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
73.193.233.55 - - [01/Nov/2017:00:20:04 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//web2.secureinternetbank.com/pbi_pbi1151/login/Remote/221272028&cat=financial-service HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.66.152.36 - - [01/Nov/2017:00:20:19 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//es.answers.yahoo.com/question/index%3Fqid%3D20120715103200AAX15LS&cat=internet-portal HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
24.12.190.248 - - [01/Nov/2017:00:20:35 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=http%3A//mesgmy.ebay.com/ws/eBayISAPI.dll%3FViewMyMessages%26_trksid%3Dp2057872.m2034.l3912%26CurrentPage%3DMyeBayMyMessages%26ssPageName%3DSTRK%3AME%3ALNLK%3ANone%26FClassic%3Dtrue&cat=auctions HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.88.204.14 - - [01/Nov/2017:00:20:50 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//www.espn.com.ve/futbol/resultados/_/liga/todo/fecha/20171030&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
121.208.9.139 - - [01/Nov/2017:00:21:06 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//contest.cartoonnetwork.com.au/mobile/&cat=entertainment-and-art HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.237.181.93 - - [01/Nov/2017:00:21:21 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.226.68.247 - - [01/Nov/2017:00:21:35 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//articulo.mercadolibre.com.ar/MLA-666799963-ipod-classic-_JM&cat=auctions HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.237.218.215 - - [01/Nov/2017:00:21:51 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/search%3Fsafe%3Dstrict%26rlz%3D1C1NHXL_esPE700PE709%26ei%3DlRP5WaLvKMS1wQS-vbIw%26q%3Dsword+art+online+temporada+3+capitulo+1+sub+espa%25C3%25B1ol%26oq%3Dsword+art+online+temporada+3%26gs_l%3Dpsy-ab.1.1.0i67k1l2j0l8.4652.4951.0.6388.2.2.0.0.0.0.395.650.2-1j1.2.0....0...1.1.64.psy-ab..0.2.635....0.mr5_VTgCxKQ%26safe%3Dhigh&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
187.202.159.92 - - [01/Nov/2017:00:22:06 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=http%3A%2F%2Fwww.excelsior.com.mx%2Feuropa%23view-1&cat=news-and-media HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0"
49.145.255.136 - - [01/Nov/2017:00:22:23 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//ff81k.voluumtrk2.com/8dc38b77-7604-481b-bd63-11eaca6207e4%3FID%3D74575527&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
81.103.165.211 - - [01/Nov/2017:00:22:39 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
129.7.0.190 - - [01/Nov/2017:00:22:55 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//elearning.uh.edu/bbcswebdav/pid-4102743-dt-content-rid-27567989_1/xid-27567989_1&cat=sport HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.137.235.112 - - [01/Nov/2017:00:23:12 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www1.tarjetacencosud.com.ar/sociosce/context/initPrivada.action&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
71.228.46.247 - - [01/Nov/2017:00:23:28 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.udacity.com/courses/data-science&cat=educational-institution HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
187.189.90.132 - - [01/Nov/2017:00:23:45 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.93.5.254 - - [01/Nov/2017:00:24:01 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.banesconline.com/MANTIS/WEBSITE/imagenesinhouse/imagenesinhouse.aspx&cat=financial-service HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.176.85.164 - - [01/Nov/2017:00:24:15 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//web.facebook.com/login.php%3Flogin_attempt%3D1%26lwv%3D110&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.229.2.7 - - [01/Nov/2017:00:24:30 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//windows-file-explorer.softonic.com/%3Fex%3DDSK-309.5&cat=software-download HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
101.102.214.204 - - [01/Nov/2017:00:24:44 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_WebFilter&ver=0.19.6.9&url=https%3A%2F%2Fwww.chatwork.com%2F%23!rid37781593&cat=computer-information HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0"
107.130.125.138 - - [01/Nov/2017:00:25:00 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
187.204.183.145 - - [01/Nov/2017:00:25:14 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//educacion.app.jalisco.gob.mx/cas/Default.aspx&cat=government HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
88.26.241.195 - - [01/Nov/2017:00:25:28 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrlLexibook?appid=android_safebrowser&ver=1.2.4&url=https://www-cdn.whatsapp.net/android/2.17.393/WhatsApp.apk&cat=internet-communication HTTP/1.1" 200 149 "-" "Dalvik/1.6.0 (Linux; U; Android 4.4.2; MFS100ES Build/KOT49H)"
190.218.173.239 - - [01/Nov/2017:00:25:43 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//offer.alibaba.com/exclusive_US_EN.html%3Ftv%3D2%26isFeature%3Dtrue%26imp%3D5b1aor1btqfgc6v2rk7%26xp%3D-baxEQ7WcvtuK1U3YXZj3e11KlWATqHSv3HPF5tfWmkCmo1TaYp8yWdHlHT3IkKE4blNtS6vAcINPyVmlLV4u-mPaUrlz_JCb14tWvEsxKI%26pid%3D1018325%26td%3DPropellerads%26cv%3D1020192%26aff_id%3D182463618%26ct%3D2%26size%3D300_250%26cn%3DPA%26an%3D50001%26bm%3Dcpa%26tp1%3D372702377464%26src%3Dsaf&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"

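Code

// Imports assumed for this snippet (commons-lang3 StringUtils, log4j 1.x Logger, JUnit 4)
import org.apache.commons.lang3.StringUtils
import org.apache.log4j.{Level, Logger}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
import org.junit.Test

class AccessLogAgg {
  @Test
  def ipAgg(): Unit = {
    Logger.getLogger("org").setLevel(Level.ERROR)
    // Create the SparkContext
    val conf = new SparkConf().setMaster("local[6]").setAppName("ip_agg")
    val sc = new SparkContext(conf)
    // Read the file into an RDD of lines
    val path = "dataset/access_log_sample.txt"
    val source: RDD[String] = sc.textFile(path)

    // Extract the IP (first space-separated field) and pair it with a count of 1
    val ipRDD: RDD[(String, Int)] = source.map(x => (x.split(" ")(0), 1))
    // Basic cleaning: drop records with an empty IP.
    // A real job would also drop malformed records and reshape the data as the business requires.
    val cleanRDD: RDD[(String, Int)] = ipRDD.filter(x => StringUtils.isNotEmpty(x._1))

    // Aggregate the counts per IP
    val ipAggRDD: RDD[(String, Int)] = cleanRDD.reduceByKey(_ + _)
    // Sort by count; sortBy is ascending by default, so ask for descending order
    val sortRDD: RDD[(String, Int)] = ipAggRDD.sortBy(x => x._2, ascending = false)

    // Collect to the driver before printing so the sorted order is preserved
    sortRDD.collect().foreach(println)
    sc.stop()
  }
}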

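This small example raises six related questions, each pointing in a different direction.

1. Suppose the whole site's historical data, about 1 TB, has to be processed. How?

Put it on a cluster and use the cluster's many machines to process it in parallel.

2. How does it run on a cluster?

Put simply, parallel computing means using multiple computing resources at the same time to solve one problem. It has four key requirements:

  • The problem must be decomposable into parts that can be computed concurrently
  • Each part must be able to execute on a different processor at the same time
  • A mechanism for sharing memory is needed
  • An overall coordination mechanism is needed for scheduling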
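
The four requirements can be illustrated on a single machine with plain Scala. The sketch below is only a toy under stated assumptions: `ParallelSketch`, `split`, and `parallelSum` are names made up for this example, and Scala's global thread pool stands in for a cluster scheduler.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelSketch {
  // Split the input into roughly equal chunks (the decomposable sub-problems).
  def split(data: Vector[Int], parts: Int): Vector[Vector[Int]] = {
    val size = math.max(1, math.ceil(data.size.toDouble / parts).toInt)
    data.grouped(size).toVector
  }

  // Each chunk is summed in its own Future; the global pool does the scheduling.
  def parallelSum(data: Vector[Int], parts: Int): Long = {
    val partials: Vector[Future[Long]] =
      split(data, parts).map(chunk => Future(chunk.map(_.toLong).sum))
    // Combine the partial results once every part has finished.
    Await.result(Future.sequence(partials), 10.seconds).sum
  }
}
```

Here the chunks are the concurrently computable parts, the thread pool is the coordinator, and the final `sum` is the step where partial results are brought together.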

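3. On a cluster, the computation has to be decomposed. How?

Overview

  • Files in HDFS are split into Blocks
  • The computation can be divided along Block boundaries, with each Block mapped to its own unit of computation

Extension

  • An RDD does not actually store the data. The data lives in HDFS and is read as the computation runs
  • An RDD must at least be partitionable, because files in HDFS are themselves split into pieces. Each RDD partition represents the computation over one piece of the source dataset, and being partitionable is also what makes parallel computation possible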
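
To make the Block-to-partition idea concrete, here is a minimal sketch of cutting a file into block-sized ranges. It is not how Spark or Hadoop actually computes input splits; `BlockSplitSketch`, `Split`, and `splits` are hypothetical names, and the sizes are arbitrary.

```scala
object BlockSplitSketch {
  // One unit of computation: the byte range [start, start + length) of the file.
  final case class Split(index: Int, start: Long, length: Long)

  // Cut a file of `fileSize` bytes into splits of at most `blockSize` bytes.
  def splits(fileSize: Long, blockSize: Long): Vector[Split] = {
    require(blockSize > 0, "blockSize must be positive")
    (0L until fileSize by blockSize).zipWithIndex.map { case (start, i) =>
      Split(i, start, math.min(blockSize, fileSize - start))
    }.toVector
  }
}
```

A 300-byte file with 128-byte blocks yields three splits, the last one shorter than the others. Each split carries only a range, never the data itself, which matches the point that an RDD describes a computation over each piece rather than holding the piece.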

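4. "Moving computation is cheaper than moving data" is a basic optimization. How is it achieved?

Each unit of computation records where its unit of storage is located, and the scheduler tries to run the computation there.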
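
A minimal sketch of that scheduling rule, assuming a made-up `Task` type and plain host-name strings (Spark's real scheduler has several locality levels and wait timeouts that are not modeled here):

```scala
object LocalitySketch {
  // A task over one partition, plus the hosts that hold its data (hypothetical names).
  final case class Task(partition: Int, preferredHosts: Seq[String])

  // Prefer a free host that already holds the data; otherwise fall back to any free host.
  def assign(task: Task, freeHosts: Seq[String]): Option[String] =
    task.preferredHosts
      .find(host => freeHosts.contains(host))
      .orElse(freeHosts.headOption)
}
```

When a preferred host is free, the computation moves to the data. Only when none is free does the data have to travel to the computation.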

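5. Running on a cluster requires many nodes to cooperate, so failures are more likely. What happens when something fails?

Suppose RDD2 fails in the chain RDD1 → RDD2 → RDD3. There are two ways to handle it:

  • Cache RDD2's data and restore RDD2 directly, similar to HDFS's replication mechanism
  • Record RDD2's dependencies and recover RDD2 from its parent RDD, which requires far less data to be transferred and stored

How is an RDD recovered from its parent?

  • Record that RDD2's parent is RDD1
  • Record RDD2's compute function. For example, given RDD2 = RDD1.map(…), map(…) is the compute function
  • When computing RDD2 fails, RDD2 can be recovered from the parent RDD and the compute function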
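
The recovery steps above can be sketched with a toy lineage model in plain Scala. This is not Spark's implementation: `LineageSketch`, `Node`, `Source`, and `Mapped` are names invented for the example, and an in-memory `Vector` plays the role of stable storage.

```scala
object LineageSketch {
  // A toy dataset that stores no data, only how to compute each partition.
  sealed trait Node[A] {
    def numPartitions: Int
    def compute(partition: Int): Vector[A]
    // Derive a child that remembers its parent and its compute function.
    def map[B](f: A => B): Node[B] = Mapped(this, f)
  }

  // Root of the lineage: reads from "stable storage" (here an in-memory Vector).
  final case class Source[A](partitions: Vector[Vector[A]]) extends Node[A] {
    def numPartitions: Int = partitions.size
    def compute(partition: Int): Vector[A] = partitions(partition)
  }

  // Child node: parent plus function is all that is needed to rebuild a partition.
  final case class Mapped[A, B](parent: Node[A], f: A => B) extends Node[B] {
    def numPartitions: Int = parent.numPartitions
    def compute(partition: Int): Vector[B] = parent.compute(partition).map(f)
  }
}
```

If `Source(Vector(Vector(1, 2), Vector(3, 4)))` plays RDD1 and `.map(_ * 10)` produces RDD2, then losing partition 1 of RDD2 costs nothing more than calling `compute(1)` again. Only that one partition is rebuilt, from its parent and the recorded function.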

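6. If the job is very complex and the pipeline very long, with many RDDs depending on one another, how can it be optimized?

Dependencies can be used for fault tolerance, as described above, but when the dependency chain is very long, recomputing through it becomes inefficient. In that case a different approach is needed: recording the state of the dataset.

Spark offers two ways to do this:

  • Caching
  • Checkpoint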
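
The effect of saving a dataset's state can be shown with a toy model. The sketch below only imitates the "keep the computed data" part: `CacheSketch`, `Node`, and `rootComputations` are hypothetical names, everything is single-partition, and writing to reliable storage (which is what a checkpoint adds) is not modeled. In Spark itself the corresponding calls are `rdd.cache()` or `rdd.persist()` for caching, and `sc.setCheckpointDir(...)` followed by `rdd.checkpoint()` for checkpointing.

```scala
object CacheSketch {
  // A toy single-partition dataset: `computeFn` rebuilds the data from lineage.
  final class Node[A](computeFn: () => Vector[A]) {
    private var saved: Option[Vector[A]] = None
    private var persistRequested = false

    // Ask for the data to be kept after it is first computed.
    def persist(): this.type = { persistRequested = true; this }

    // "Action": return saved data if present, otherwise recompute from lineage.
    def collect(): Vector[A] = saved.getOrElse {
      val data = computeFn()
      if (persistRequested) saved = Some(data)
      data
    }

    def map[B](f: A => B): Node[B] = new Node[B](() => collect().map(f))
  }

  // Run two actions on a derived dataset and count how often the root is computed.
  def rootComputations(persistMiddle: Boolean): Int = {
    var count = 0
    val root = new Node[Int](() => { count += 1; Vector(1, 2, 3) })
    val middle = root.map(_ * 2)
    if (persistMiddle) middle.persist()
    middle.map(_ + 1).collect()
    middle.map(_ + 1).collect()
    count
  }
}
```

Without persistence every action walks the whole chain back to the root. With it, the second action stops at the saved intermediate result.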

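RDD Revisited

Goals

  1. Understand why RDDs came about
  2. Understand the main characteristics of RDDs
  3. Understand the five properties of an RDD

Why did RDDs come about?

Before RDDs, MapReduce was the mainstream approach. How did MapReduce run iterative computations?

Multiple MapReduce jobs had no in-memory way to share data; they could only share it through disk.

This is clearly inefficient.

How do RDDs solve the inefficiency of iterative computation?

In Spark, the logical computation of the final Job3 is: Job3 = (Job1.map).filter. The whole process shares data in memory, and intermediate results do not need to be written to a reliable distributed file system.

This provides more flexibility and faster execution while still preserving fault tolerance.

Characteristics of RDDs

An RDD is both a dataset and a programming model
An RDD is a data structure that also exposes a high-level API. That API closely resembles the Scala collections API and likewise consists of operators.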
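
To see the resemblance, here is the IP-count pipeline from the example rewritten against ordinary Scala collections. It is a sketch for comparison only: `CollectionsIpAgg` and `ipCounts` are names invented here, and `groupBy` followed by a sum stands in for `reduceByKey`, which plain collections do not have.

```scala
object CollectionsIpAgg {
  // The same pipeline as the RDD example, on an ordinary in-memory Seq.
  def ipCounts(lines: Seq[String]): Seq[(String, Int)] =
    lines
      .map(line => (line.split(" ")(0), 1))          // extract the IP, pair it with 1
      .filter { case (ip, _) => ip.nonEmpty }        // drop records with an empty IP
      .groupBy { case (ip, _) => ip }                // group the pairs by IP
      .map { case (ip, pairs) => (ip, pairs.map(_._2).sum) }
      .toSeq
      .sortBy { case (ip, count) => (-count, ip) }   // descending by count, ties by IP
}
```

The operator names and the shape of the chain are nearly the same. The difference is that the RDD version describes a distributed, lazily evaluated computation, while this one runs eagerly on one machine.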

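RDD operators fall broadly into two categories:

  • Transformation: operations such as map, flatMap, filter
  • Action: operations such as reduce, collect, count

When an RDD program runs, transformations are not executed right away. Real execution is triggered only when an Action is reached. This behavior is called lazy evaluation.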
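
Lazy evaluation can be observed without Spark, because Scala's `Iterator` behaves the same way: `map` only records the function, and nothing runs until the iterator is consumed. The sketch below uses that as an analogy; `LazySketch` and `demo` are made-up names, and `sum` plays the role of the Action.

```scala
object LazySketch {
  // Returns (calls before the "action", calls after the "action", result).
  def demo(): (Int, Int, Int) = {
    var calls = 0
    // "Transformation": Iterator.map is lazy, so the function has not run yet.
    val doubled = Iterator(1, 2, 3).map { x => calls += 1; x * 2 }
    val before = calls
    // "Action": sum consumes the iterator, which finally runs the mapping function.
    val result = doubled.sum
    (before, calls, result)
  }
}
```

`demo()` returns `(0, 3, 12)`: no calls were made when `map` was applied, and all three happened only once `sum` asked for the values.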

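RDDs can be partitioned

RDDs are Spark's abstraction for distributed computation, so they must support partitioned computation. Only partitioned data can take advantage of a cluster's parallelism.

In addition, an RDD does not need to be materialized at all times. That is, an RDD may hold no data, as long as it has enough information to know what it was computed from. This is a very efficient approach to fault tolerance.

RDDs are read-only

RDDs are read-only and do not allow modification of any kind. The fact that RDDs and HDFS are read-only does not mean that every distributed storage system must be designed that way, but a read-only design significantly reduces complexity. Because an RDD has to support fault tolerance, lazy evaluation, and moving computation, supporting modification would be very hard.

  • RDD2 may hold no data, only its dependencies and compute function, so what would there be to modify?
  • If data had to be stored in order to support modification, how would fault tolerance work?
  • If modification were allowed, how would the row to modify be located? RDD transformations are coarse-grained, meaning an RDD has no knowledge of where any individual row is.

RDDs are fault-tolerant

RDDs support fault tolerance in two ways:

  • Keep the dependencies between RDDs along with the compute functions, and recompute when a failure occurs
  • Store the RDD's data directly in an external storage system and read it back when a failure occurs (Checkpoint)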

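What "Resilient Distributed Dataset" means

Distributed

  • RDDs support partitioning and can run on a cluster

Resilient

  • RDDs support efficient fault tolerance
  • The data in an RDD can be cached in memory, on disk, or in external storage

Dataset

  • An RDD may hold no concrete data, keeping only the information needed to create itself, such as dependencies and the compute function
  • An RDD can also be cached, which amounts to storing concrete data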

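Summary: the five properties of an RDD

First, a recap of the capabilities an RDD has to provide, as discussed above:

  • RDDs have partitions
  • RDDs must support fault tolerance through dependencies and compute functions
  • RDDs must optimize for data locality
  • RDDs support MapReduce-style computation, so they must be able to shuffle data

So what should an RDD contain? From the point of view of the RDD's designer, what properties does this class need at a minimum?

  • Partition List: the list of partitions of the RDD. The number of partitions can be specified when the RDD is created, or changed by applying an operator that produces a new RDD
  • Compute Function: to support fault tolerance, the function applied in the transformation between RDDs must be recorded
  • RDD Dependencies: the dependencies between RDDs. An RDD records which RDD is its parent, which supports both fault tolerance and computation
  • Partitioner: to perform a shuffle, there must be a function that decides which partition each record is sent to (this only applies to key-value RDDs)
  • Preferred Location: to exploit data locality, moving computation rather than data, an RDD records where each of its partitions would best be placed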
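
The five properties can be written down as a small trait. This is a simplified sketch, not Spark's source: every name in `FivePropertiesSketch` is illustrative, although Spark's own `RDD` class exposes analogous members (`getPartitions`, `compute`, `getDependencies`, `partitioner`, `getPreferredLocations`). The hash partitioner shows one common way to decide which partition a key goes to.

```scala
object FivePropertiesSketch {
  // Toy stand-ins; all names in this object are illustrative, not Spark's classes.
  final case class Partition(index: Int)

  trait MiniPartitioner {
    def numPartitions: Int
    def getPartition(key: Any): Int
  }

  // Hash partitioning: non-negative remainder of hashCode by the partition count.
  final class MiniHashPartitioner(val numPartitions: Int) extends MiniPartitioner {
    def getPartition(key: Any): Int =
      if (key == null) 0
      else {
        val mod = key.hashCode % numPartitions
        if (mod < 0) mod + numPartitions else mod
      }
  }

  trait MiniRDD[A] {
    def partitions: Seq[Partition]                              // 1. partition list
    def compute(split: Partition): Iterator[A]                  // 2. compute function
    def dependencies: Seq[MiniRDD[_]]                           // 3. parent RDDs
    def partitioner: Option[MiniPartitioner] = None             // 4. key-value data only
    def preferredLocations(split: Partition): Seq[String] = Nil // 5. hosts holding the data
  }
}
```

The first three members are required for every dataset, while the last two have defaults: a dataset without keys has no partitioner, and one without known storage locations has no preferred hosts.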