November 2016_山谷來客

Original: A roundup of ways to import data generated by pandas to_csv into Hive

step0: create the table (Hive script): USE databasename; CREATE TABLE OrderQuantity_Forecast_Table( masterhotel int COMMENT 'hotel ID', orderdate string COMMENT 'order date', city int COMMENT 'city ID', y_

2016-11-30 13:49:21 4937
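A minimal sketch of the export side, assuming the Hive table is a plain text table declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' (the excerpt cuts off before the row format, so this is an assumption). Hive text tables carry no header row and read \N as NULL, so the header and index are dropped and na_rep is set to match. The function name to_hive_text, the sample rows, and the path in the LOAD DATA comment are hypothetical; only the first three columns of the truncated table are used.

```python
import io

import pandas as pd


def to_hive_text(frame: pd.DataFrame, sep: str = "\t") -> str:
    r"""Serialize a DataFrame as delimited text that a Hive text table can load.

    Assumes the table was declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    (for a table using Hive's default delimiter, pass sep="\x01" instead).
    Hive text tables have no header row, so the header and the pandas index
    are dropped; missing values are written as \N, Hive's default NULL marker.
    """
    buf = io.StringIO()
    frame.to_csv(buf, sep=sep, header=False, index=False, na_rep="\\N")
    return buf.getvalue()


# Hypothetical sample rows; only the first three columns of the truncated
# CREATE TABLE statement (masterhotel, orderdate, city) are shown.
df = pd.DataFrame({
    "masterhotel": [101, 102],
    "orderdate": ["2016-11-01", None],
    "city": [1, 2],
})

text = to_hive_text(df)
print(text)
# After writing text to a file, load it in Hive (hypothetical path):
# LOAD DATA LOCAL INPATH '/tmp/forecast.txt' INTO TABLE OrderQuantity_Forecast_Table;
```

The DataFrame's columns must be in the same order as the table's columns, because Hive maps delimited fields by position, not by name.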

Original: Tips for generating DataFrames in Scala

case1: simple conversion from List() to DataFrame() //step1: first create a case class: case class resultset(masterhotel:Int,quantity:Double,date:String,rank:Int,frcst_cii:Double,hotelid:Int) //step2 //initialize resu

2016-11-24 00:07:52 14100 1
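The post builds a DataFrame in Scala from a List of case class instances (in Spark this is usually done with import spark.implicits._ followed by .toDF()). As a rough pandas analogue, not the post's Spark code, a list of Python dataclass instances converts the same way: each record becomes a row and each field a column. The fields mirror the case class resultset from the excerpt; the class name ResultSet, the helper to_frame, and the sample values are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import List

import pandas as pd


@dataclass
class ResultSet:
    # Same fields, in the same order, as the Scala case class resultset.
    masterhotel: int
    quantity: float
    date: str
    rank: int
    frcst_cii: float
    hotelid: int


def to_frame(rows: List[ResultSet]) -> pd.DataFrame:
    """Turn a list of records into a DataFrame with one column per field."""
    # asdict keeps the declaration order, so the columns follow the class.
    return pd.DataFrame([asdict(r) for r in rows])


# Hypothetical values, for illustration only.
rows = [
    ResultSet(101, 12.0, "2016-11-24", 1, 0.8, 1001),
    ResultSet(102, 7.5, "2016-11-24", 2, 0.6, 1002),
]
df = to_frame(rows)
print(df)
```

Declaring the record type once, as the case class does in Scala, keeps the column names and order in a single place instead of repeating them in the DataFrame constructor.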

Original: Implementing Hive user-defined functions in Python

Case 1. File 1: test.py: # -*- coding: utf-8 -*- import sys for line in sys.stdin: print line.strip('\n') File 2: input.log: hello, world! / python udf / this is a test file / how to use sys.stdin. Execution result: [h

2016-11-21 16:51:46 5404
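The test.py in the excerpt is Python 2 (print line...). A Python 3 sketch of the same stdin/stdout pattern, assuming the script is used through Hive's TRANSFORM clause, which streams each row to the script as one tab-separated line and reads tab-separated lines back. Upper-casing the first column is only an illustrative assumption, and the names transform and main are hypothetical.

```python
import sys
from typing import Iterable, Iterator, TextIO


def transform(lines: Iterable[str]) -> Iterator[str]:
    """Per-row logic of a Hive TRANSFORM script (Python 3).

    Each input line is one Hive row with tab-separated columns; each yielded
    string is one output row. Upper-casing the first column is illustrative.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        fields[0] = fields[0].upper()
        yield "\t".join(fields)


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Read rows from stdin and write transformed rows to stdout."""
    for row in transform(stdin):
        stdout.write(row + "\n")
```

To run it as the Hive script, end the file with a call to main() and register it, for example ADD FILE test.py; then SELECT TRANSFORM(col1, col2) USING 'python3 test.py' AS (col1, col2) FROM some_table; (table and column names are hypothetical). Keeping the row logic in transform() lets it be checked locally with a plain list of strings, without Hive.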

Original: Hive skills

1. Dynamic partition example: set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict; set hive.exec.max.dynamic.partitions.pernode=1000; set hive.exec.max.created.files=100000000; set

2016-11-17 14:52:07 633
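To illustrate what these settings govern: in a dynamic-partition insert such as INSERT OVERWRITE TABLE t PARTITION (dt) SELECT ..., dt FROM src, Hive takes the partition values from the trailing SELECT columns and writes each distinct value to its own partition directory. nonstrict mode lets every partition column be dynamic (strict mode requires at least one static one), and hive.exec.max.dynamic.partitions.pernode caps how many partitions a single mapper or reducer may create. The Python below is a conceptual model of that routing under these assumptions, not Hive's implementation; it treats the whole input as one node, and route_dynamic_partitions is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[object, ...]


def route_dynamic_partitions(
    rows: Sequence[Row],
    part_cols: Sequence[str],
    max_partitions: int = 1000,
) -> Dict[str, List[Row]]:
    """Conceptual model (not Hive source) of a dynamic-partition insert.

    The last len(part_cols) values of each row are its partition values;
    each distinct combination becomes a directory name such as
    "dt=2016-11-17", and those values are not kept in the data rows.
    max_partitions plays the role of hive.exec.max.dynamic.partitions.pernode.
    """
    n = len(part_cols)
    if n == 0:
        raise ValueError("at least one dynamic partition column is required")
    out: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        data, keys = row[:-n], row[-n:]
        path = "/".join(f"{c}={v}" for c, v in zip(part_cols, keys))
        out[path].append(data)
        if len(out) > max_partitions:
            raise RuntimeError(
                f"more than {max_partitions} dynamic partitions; "
                "raise hive.exec.max.dynamic.partitions.pernode"
            )
    return dict(out)
```

Failing fast when the partition count passes the limit mirrors why the post raises the per-node cap: a high-cardinality partition column can otherwise make the job error out partway through.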

Repost: A complete guide to XGBoost parameter tuning (with Python code)

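Translator's note: the code in the article and its results differ somewhat; the complete code can be downloaded here for comparison. Also, when I followed the tutorial myself, my library could not parse string-typed features, so I used only some of the features; my numbers therefore differ from the article's, which actually helps in understanding it. So feel free to modify the code a little rather than follow the tutorial exactly ~ ^0^ Libraries to install beforehand: numpy, matplotlib, pandas, xgboost, scikit-learn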

2016-11-05 15:57:54 2035 1

XGBoost jar package for Spark

XGBoost jar package for Spark; the blog includes a Scala-Spark usage example.

2016-12-16

Statistical Modeling and R Software (统计建模与R软件)

R learning material, by Xue Yi (薛毅) and Chen Liping (陈立萍). Suitable for R beginners and for practitioners who use R for data mining.

2014-01-26