1. Prerequisites
Windows XP
jdk-1_5_0_14
weka-3-5-7.exe
SQLServer2005
mysql-6.0.0
Oracle10.2.0.1.0
Microsoft SQL Server 2005 JDBC Driver 1.2--->sqljdbc.jar
MySQL Driver for JDBC--->mysql-connector-java-5.1.6-bin.jar
Oracle Driver for JDBC--->ojdbc14.jar
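2. Double-click weka-3-5-7.exe to install Weka
3. Go to the Weka installation directory
3.1. Extract weka.jar into a subdirectory named weka
Directory structure after extraction: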
[Weka-3-5]
|____...
|____[weka]
|    |____[META-INF]
|    |____...
|    |____[weka]
|         |____...
|____...
3.2. Create a lib directory and copy the databases' JDBC driver jars into /lib
Directory structure when done:
[Weka-3-5]
|____...
|____[weka]
|    |____[META-INF]
|    |____...
|    |____[weka]
|         |____...
|____[lib]
|    |____mysql-connector-java-5.1.6-bin.jar
|    |____ojdbc14.jar
|    |____sqljdbc.jar
|____...
4. Set environment variables
WEKA_HOME
C:/Program Files/Weka-3-5
ClassPath
.;%WEKA_HOME%/lib/sqljdbc.jar;%WEKA_HOME%/lib/mysql-connector-java-5.1.6-bin.jar;%WEKA_HOME%/lib/ojdbc14.jar;%JAVA_HOME%/lib/tools.jar;%JAVA_HOME%/lib/dt.jar
Once these are set, Weka can find the database driver jars placed in /lib.
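To confirm that the driver jars really are visible on the classpath, a small check like the following can help. It is only a sketch: the class DriverCheck and the method isLoadable are hypothetical names made up for this example and are not part of Weka. The three driver class names are the ones used in the DatabaseUtils.props files in this post.

```java
// Hypothetical helper, not part of Weka: reports which JDBC driver
// classes the current JVM can locate on its classpath.
class DriverCheck {

    // Returns true if the named class can be found by this class's loader.
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, DriverCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // Class found but could not be linked; treat it as unusable
            return false;
        }
    }

    public static void main(String[] args) {
        String[] drivers = {
            "com.microsoft.sqlserver.jdbc.SQLServerDriver", // from sqljdbc.jar
            "com.mysql.jdbc.Driver",                        // from mysql-connector-java-5.1.6-bin.jar
            "oracle.jdbc.driver.OracleDriver"               // from ojdbc14.jar
        };
        for (String d : drivers) {
            System.out.println(d + " -> " + (isLoadable(d) ? "found" : "NOT found"));
        }
    }
}
```

Compile and run it from a console that has the ClassPath variable above in effect; any driver reported as "NOT found" points to a wrong path or file name in ClassPath.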
5. Edit DatabaseUtils.props
In %WEKA_HOME%/weka/weka/experiment/ you will see:
...
DatabaseUtils.props
DatabaseUtils.props.hsql
DatabaseUtils.props.mssqlserver2005
DatabaseUtils.props.mssqlserver
DatabaseUtils.props.mysql
DatabaseUtils.props.odbc
DatabaseUtils.props.oracle
DatabaseUtils.props.postgresql
...
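Weka reads DatabaseUtils.props at run time.
The other files, named 'DatabaseUtils.props.<database name>', are templates that Weka provides for different databases.
First rename DatabaseUtils.props to something else, for example DatabaseUtils.props.sample.
Then rename DatabaseUtils.props.mysql to DatabaseUtils.props (assuming we want to connect to a MySQL database).
Open the new DatabaseUtils.props and you will see the following (# marks a comment; [my own notes are in square brackets]):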
[Version information]
# Database settings for MySQL 3.23.x, 4.x
[I am connecting to MySQL 6, so I changed this to --> # Database settings for MySQL 6.x]
#
# url: http://www.mysql.com/
# jdbc: http://www.mysql.com/products/connector/j/
# author: Fracpete (fracpete at waikato dot ac dot nz)
# version: $Revision: 1.3 $
[JDBC driver version; changed to --> # version: $Revision: 5.1 $]
# JDBC driver (comma-separated list)
jdbcDriver=org.gjt.mm.mysql.Driver
[Change to --> jdbcDriver=com.mysql.jdbc.Driver]
# database URL
jdbcURL=jdbc:mysql://server_name:3306/database_name
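[I recommend leaving this unchanged, so that once inside Weka you can connect to whichever MySQL database you need by editing 'server_name' and 'database_name'. Hard-coding it here, e.g. jdbcURL=jdbc:mysql://localhost:3306/foodmart, also works, since you can still change it inside Weka, but it looks less professional.]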
# specific data types
# string, getString() = 0; --> nominal
# boolean, getBoolean() = 1; --> nominal
# double, getDouble() = 2; --> numeric
# byte, getByte() = 3; --> numeric
# short, getShort() = 4; --> numeric
# int, getInteger() = 5; --> numeric
# long, getLong() = 6; --> numeric
# float, getFloat() = 7; --> numeric
# date, getDate() = 8; --> date
# text, getString() = 9; --> string
[This is the key part! Weka supports only four attribute types: nominal, numeric, string and date. So every data type used in the database has to be mapped onto one of these four.]
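[Change the section above to:
# specific data types
# string, getString() = 0; --> nominal
# boolean, getBoolean() = 1; --> nominal
# double, getDouble() = 2; --> numeric
# byte, getByte() = 3; --> numeric
# short, getShort() = 4; --> numeric
# int, getInteger() = 5; --> numeric
# long, getLong() = 6; --> numeric
# float, getFloat() = 7; --> numeric
# date, getDate() = 8; --> date
# text, getString() = 9; --> string
TINYINT=3
SMALLINT=4
#SHORT=4
SHORT=5
INTEGER=5
INT=5
BIGINT=6
LONG=6
REAL=7
NUMERIC=2
DECIMAL=2
FLOAT=2
DOUBLE=2
CHAR=0
TEXT=0
VARCHAR=0
LONGVARCHAR=9
BINARY=0
VARBINARY=0
LONGVARBINARY=9
BIT=1
BLOB=9
DATE=8
TIME=8
DATETIME=8
TIMESTAMP=8
I drew on several forum posts and did some searching of my own. The commonly used MySQL data types are all mapped here, so Weka should no longer fail to recognize a column's type ^-^
Note: a mapping line (NAME=code) only takes effect when it is not commented out, so remove the leading '#' from any mapping you want Weka to use. The ten legend lines that describe codes 0 to 9 are documentation only and can stay commented.
The appendix provides the DatabaseUtils.props files I prepared:
DatabaseUtils.props.mssqlserver2005_ok
DatabaseUtils.props.mysql6_ok
DatabaseUtils.props.oracle10g_ok
Name the files however you like; just remember to rename the one you use to DatabaseUtils.props.]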
# other options
CREATE_DOUBLE=DOUBLE
CREATE_STRING=TEXT
CREATE_INT=INT
checkUpperCaseNames=false
checkLowerCaseNames=false
checkForTable=true
[Other settings; no changes needed for now]
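To make the type-mapping idea concrete, here is a minimal sketch of how NAME=code entries in a properties file translate into the four Weka attribute types. It is an illustration only and not Weka's own lookup code; the class TypeMappingDemo and its methods are hypothetical names. It relies on the standard java.util.Properties parser, which treats lines starting with '#' as comments.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical illustration, not Weka's implementation.
class TypeMappingDemo {

    // Attribute type for each code 0..9, following the legend in DatabaseUtils.props.
    static final String[] WEKA_TYPES = {
        "nominal", "nominal", "numeric", "numeric", "numeric",
        "numeric", "numeric", "numeric", "date", "string"
    };

    // Parses properties-file text; '#' lines are skipped by the Properties parser.
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected when reading from an in-memory string
            throw new IllegalStateException(e);
        }
        return p;
    }

    // Returns the attribute type for an SQL type name, or null if it is unmapped.
    static String wekaTypeFor(Properties props, String sqlType) {
        String code = props.getProperty(sqlType);
        if (code == null) {
            return null;
        }
        return WEKA_TYPES[Integer.parseInt(code.trim())];
    }

    public static void main(String[] args) {
        Properties p = parse("TINYINT=3\n#SHORT=4\nSHORT=5\nVARCHAR=0\nBLOB=9\nDATE=8\n");
        String[] names = {"TINYINT", "SHORT", "VARCHAR", "BLOB", "DATE", "GEOMETRY"};
        for (String n : names) {
            System.out.println(n + " -> " + wekaTypeFor(p, n));
        }
    }
}
```

A type that has no entry (GEOMETRY in this sample) comes back as null here, which mirrors the situation where Weka cannot recognize a column's type because no mapping line exists for it.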
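6. Rebuild weka.jar and replace the original jar
Weka reads weka.jar at run time, so after editing the files you must repackage them into a jar and replace the original one; only then can Weka connect to the database.
6.1. From the command line, change to %WEKA_HOME%/weka
6.2. Run: jar cvf weka.jar weka/*.*
6.3. In %WEKA_HOME%/weka you will find the newly built weka.jar (refresh the folder if you do not see it)
6.4. Copy weka.jar from %WEKA_HOME%/weka to %WEKA_HOME%. (I suggest first renaming the original weka.jar to weka.jar.sample as a backup. If you later build several weka.jar files for different databases, name them 'weka.jar.<database name>' and strip the suffix when you need one; that way the tedious work only has to be done once.)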
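Before replacing the original jar, it may be worth checking that the rebuilt weka.jar actually contains the edited properties file. The following is a small sketch using the standard java.util.jar API; the class JarEntryCheck and the method hasEntry are hypothetical names for this example, and the entry path weka/experiment/DatabaseUtils.props is assumed from the directory layout used in step 5.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Weka: checks a jar for a given entry.
class JarEntryCheck {

    // Returns true if the jar contains an entry with exactly this name.
    static boolean hasEntry(File jar, String entryName) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            return jf.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Default to weka.jar in the current directory; a path may be passed instead
        File jar = new File(args.length > 0 ? args[0] : "weka.jar");
        String entry = "weka/experiment/DatabaseUtils.props";
        System.out.println(entry + (hasEntry(jar, entry) ? " is present in " : " is MISSING from ") + jar);
    }
}
```

If the entry is reported missing, the jar command was probably run from the wrong directory, so the paths inside the jar do not start with weka/.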
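7. Run Weka
A strange problem: when I launch 'Weka 3.5' (without console), Weka cannot connect to any database (MySQL, Oracle and SQL Server all fail) and reports that no suitable JDBC driver can be found. When I launch 'Weka 3.5 (with console)', everything works. I would welcome an explanation from anyone who knows! (A possible cause, not verified here: if that shortcut starts Java with the -jar option, the CLASSPATH environment variable is ignored, so the driver jars in /lib are never seen.)
Never mind; it works, and you even get a console thrown in for free.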
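7.1. Launch Weka 3.5 (with console)
7.2. Choose Applications ---> Explorer
7.3. Choose Open DB...
7.4. Choose User...
Change Database URL, Username and Password to match your setup.
7.5. Choose Connect
Watch the messages in the Info area at the bottom of the window!
... = true ---> Congratulations, the connection succeeded!
... = false ---> It failed. Don't lose heart; go back through the steps above one by one, and you are not far from true.
7.6. After a successful connection the cursor moves to the Query field, ready for you to type an SQL statement. I entered a very simple one and then chose Execute to run it.
7.7. When it succeeds, the data appears in the Result area.
7.8. Choose OK. Weka has now loaded the data and shows its summary; from here on you can do whatever you like with it.
7.9. What if I don't write an SQL statement and just choose OK right after connecting? Weka complains that there is a problem with the database connection and that no suitable driver was found, and nothing is displayed. So tell it which data you want, or there will be nothing to work with ^_^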
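If the connection keeps returning false, it can help to test the same driver, URL and credentials outside Weka with plain JDBC. The sketch below assumes the MySQL settings from this post (driver com.mysql.jdbc.Driver, URL pattern jdbc:mysql://server_name:3306/database_name). The class JdbcSmokeTest, the method mysqlUrl, the user name root, the empty password and the query SELECT 1 are placeholders for illustration; localhost and foodmart are the sample values mentioned in step 5.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hypothetical smoke test, not part of Weka.
class JdbcSmokeTest {

    // Fills in the template jdbc:mysql://server_name:3306/database_name.
    static String mysqlUrl(String server, int port, String database) {
        return "jdbc:mysql://" + server + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder values; replace them with your own settings
        String url = mysqlUrl("localhost", 3306, "foodmart");
        String user = args.length > 0 ? args[0] : "root";
        String password = args.length > 1 ? args[1] : "";
        String sql = args.length > 2 ? args[2] : "SELECT 1";

        // Fails with ClassNotFoundException if the driver jar is not on the classpath
        Class.forName("com.mysql.jdbc.Driver");

        try (Connection con = DriverManager.getConnection(url, user, password);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            int cols = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= cols; i++) {
                    if (i > 1) row.append('\t');
                    row.append(rs.getString(i));
                }
                System.out.println(row);
            }
        }
    }
}
```

A ClassNotFoundException points to the classpath setup in step 4, while an SQLException points to the URL, user name, password or the database server itself.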
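8. Reference posts
C6H5NO2
Guide to connecting WEKA to a database (MySQL edition)
http://bbs2.wekacn.org/viewtopic.php?f=2&t=216&sid=13cdd0c42079a4b719d5d54b83855780
kongter
Connecting Weka to a MySQL database (Windows XP edition), without extracting the jar and without editing DatabaseUtils.props
http://bbs2.wekacn.org/viewtopic.php?f=2&t=293&sid=13cdd0c42079a4b719d5d54b83855780
ps: This is the method I tried first, and I had no luck getting it to work. It requires editing %WEKA_HOME%/RunWeka.bat, which sacrifices the integrity of the original program just to get a connection, and even once it is configured you have to edit the file again whenever you switch databases. For those reasons I do not recommend it.
数据挖掘青年 (DMman)
How to connect Weka to a database
http://blogger.org.cn/blog/more.asp?name=DMman&id=24991
ps: This is the more orthodox method; without this post, this write-up would not exist.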
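9. Appendix
Here are fairly complete configurations that should be enough for most uses. The specific data types sections follow each database's data type documentation; rarely used types are not listed. (Some of the more unusual types I simply did not know how to map for Weka. If anyone has a more complete configuration, please share it!)
9.1. DatabaseUtils.props.mssqlserver2005_ok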
# Database settings for Microsoft SQL Server 2005 Express Edition
#
# url: http://www.microsoft.com/
# jdbc: http://msdn2.microsoft.com/en-us/data/aa937724.aspx
# author: Fracpete (fracpete at waikato dot ac dot nz)
# version: $Revision: 1.2 $
# JDBC driver (comma-separated list)
jdbcDriver=com.microsoft.sqlserver.jdbc.SQLServerDriver
# database URL
jdbcURL=jdbc:sqlserver://server_name;databaseName=database_name
# specific data types
# string, getString() = 0; --> nominal
# boolean, getBoolean() = 1; --> nominal
# double, getDouble() = 2; --> numeric
# byte, getByte() = 3; --> numeric
# short, getShort() = 4; --> numeric
# int, getInteger() = 5; --> numeric
# long, getLong() = 6; --> numeric
# float, getFloat() = 7; --> numeric
# date, getDate() = 8; --> date
# text, getString() = 9; --> string
bit=1
tinyint=3
smallint=4
int=5
bigint=6
smallmoney=2
money=2
numeric=2
decimal=2
float=2
real=2
smalldatetime=8
datetime=8
timestamp=8
char=0
text=0
varchar=0
nchar=0
ntext=0
nvarchar=0
binary=0
varbinary=0
image=0
uniqueidentifier=9
rowversion=9
# other options
CREATE_DOUBLE=DOUBLE PRECISION
CREATE_STRING=VARCHAR(8000)
CREATE_INT=INT
checkUpperCaseNames=false
checkLowerCaseNames=false
checkForTable=true
9.2. DatabaseUtils.props.mysql6_ok
# Database settings for MySQL 6.x
#
# url: http://www.mysql.com/
# jdbc: http://www.mysql.com/products/connector/j/
# author: Fracpete (fracpete at waikato dot ac dot nz)
# version: $Revision: 5.1 $
# JDBC driver (comma-separated list)
jdbcDriver=com.mysql.jdbc.Driver
# database URL
jdbcURL=jdbc:mysql://server_name:3306/database_name
# specific data types
# string, getString() = 0; --> nominal
# boolean, getBoolean() = 1; --> nominal
# double, getDouble() = 2; --> numeric
# byte, getByte() = 3; --> numeric
# short, getShort() = 4; --> numeric
# int, getInteger() = 5; --> numeric
# long, getLong() = 6; --> numeric
# float, getFloat() = 7; --> numeric
# date, getDate() = 8; --> date
# text, getString() = 9; --> string
TINYINT=3
SMALLINT=4
#SHORT=4
SHORT=5
INTEGER=5
INT=5
BIGINT=6
LONG=6
REAL=7
NUMERIC=2
DECIMAL=2
FLOAT=2
DOUBLE=2
CHAR=0
TEXT=0
VARCHAR=0
LONGVARCHAR=9
BINARY=0
VARBINARY=0
LONGVARBINARY=9
BIT=1
BLOB=9
DATE=8
TIME=8
DATETIME=8
TIMESTAMP=8
# other options
CREATE_DOUBLE=DOUBLE
CREATE_STRING=TEXT
CREATE_INT=INT
checkUpperCaseNames=false
checkLowerCaseNames=false
checkForTable=true
9.3. DatabaseUtils.props.oracle10g_ok
# Database settings for Oracle 10g Express Edition
#
# url: http://www.oracle.com/
# jdbc: http://www.oracle.com/technology/software/tech/java/sqlj_jdbc/
# author: Fracpete (fracpete at waikato dot ac dot nz)
# version: $Revision: 1.3 $
# JDBC driver (comma-separated list)
jdbcDriver=oracle.jdbc.driver.OracleDriver
# database URL
jdbcURL=jdbc:oracle:thin:@server_name:1521:database_name
# specific data types
# string, getString() = 0; --> nominal
# boolean, getBoolean() = 1; --> nominal
# double, getDouble() = 2; --> numeric
# byte, getByte() = 3; --> numeric
# short, getShort() = 4; --> numeric
# int, getInteger() = 5; --> numeric
# long, getLong() = 6; --> numeric
# float, getFloat() = 7; --> numeric
# date, getDate() = 8; --> date
# text, getString() = 9; --> string
CHAR=0
NCHAR=0
VARCHAR2=0
NVARCHAR2=0
RAW=9
NUMBER=2
BINARY_FLOAT=2
DATE=8
TIMESTAMP=8
ROWID=9
DOUBLE_PRECISION=2
# other options
CREATE_INT=INTEGER
CREATE_STRING=VARCHAR2(4000)
CREATE_DOUBLE=NUMBER
checkUpperCaseNames=true
checkForTable=true
From: http://blog.csdn.net/senaku/archive/2008/03/28/2225943.aspx