Using JDBCDataModel with Mahout 0.7

First, create the database and the corresponding table in MySQL:

mysql> create database mahout;
Query OK, 1 row affected (0.00 sec)
mysql> use mahout;
Database changed
mysql> create table intro(
    ->  uid varchar(20) not null,
    ->  iid varchar(50) not null,
    ->  val varchar(50) not null,
    ->  time varchar(50) default null
    -> );

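Note: the computation is resource-intensive. For real use, add indexes and set the relevant tuning parameters in my.ini.

(Here the goal is only to get it working.)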

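Insert the data. This uses the data from the first recommender example in Mahout in Action. Note: delete any blank lines from the file first, otherwise MySQL will complain that a column cannot be null.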
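If you would rather not delete the blank lines by hand, a small Java program can do it. This is a minimal sketch, not part of the original post: it assumes the CSV is UTF-8 or plain ASCII, takes hypothetical input and output paths as arguments, and writes the result with `\n` line endings so it matches `lines terminated by '\n'` in the LOAD DATA statement.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

public class CleanIntroCsv {

    // Keep only lines that contain something other than whitespace.
    static List<String> dropBlankLines(List<String> lines) {
        return lines.stream()
                .filter(line -> !line.trim().isEmpty())
                .collect(Collectors.toList());
    }

    // Join lines with '\n' (not the platform separator) and end with a newline.
    static String joinWithLf(List<String> lines) {
        return String.join("\n", lines) + "\n";
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java CleanIntroCsv <input.csv> <output.csv>");
            return;
        }
        // readAllLines accepts \n, \r and \r\n, so Windows line endings are stripped too
        List<String> cleaned = dropBlankLines(Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8));
        // Write with '\n' endings to match "lines terminated by '\n'" in LOAD DATA
        Files.write(Paths.get(args[1]), joinWithLf(cleaned).getBytes(StandardCharsets.UTF_8));
        System.out.println("kept " + cleaned.size() + " non-blank lines");
    }
}
```

Point the `load data local infile` path at the cleaned output file instead of the original.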


mysql> load data local infile 'D:/intro.csv' replace into table intro fields terminated by ',' lines terminated by '\n' (@col1,@col2,@col3) set uid=@col1,iid=@col2,val=@col3;
Query OK, 21 rows affected (0.19 sec)
Records: 21  Deleted: 0  Skipped: 0  Warnings: 0

Check the data:

mysql> select * from intro;
+-----+-----+-----+------+
| uid | iid | val | time |
+-----+-----+-----+------+
| 1   | 101 | 5.0 | NULL |
| 1   | 102 | 3.0 | NULL |
| 1   | 103 | 2.5 | NULL |
| 2   | 101 | 2.0 | NULL |
| 2   | 102 | 2.5 | NULL |
| 2   | 103 | 5.0 | NULL |
| 2   | 104 | 2.0 | NULL |
| 3   | 101 | 2.5 | NULL |
| 3   | 104 | 4.0 | NULL |
| 3   | 105 | 4.5 | NULL |
| 3   | 107 | 5.0 | NULL |
| 4   | 101 | 5.0 | NULL |
| 4   | 103 | 3.0 | NULL |
| 4   | 104 | 4.5 | NULL |
| 4   | 106 | 4.0 | NULL |
| 5   | 101 | 4.0 | NULL |
| 5   | 102 | 3.0 | NULL |
| 5   | 103 | 2.0 | NULL |
| 5   | 104 | 4.0 | NULL |
| 5   | 105 | 3.5 | NULL |
| 5   | 106 | 4.0 | NULL |
+-----+-----+-----+------+
21 rows in set (0.00 sec)


Next comes the actual program. It is deliberately simple; the main aim is just to get it working.

import java.util.List;
import org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.JDBCDataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import com.mysql.jdbc.jdbc2.optional.MysqlDataSource;
public class MysqlJDBCRecommender {
    public static void main(String[] args) throws Exception {
        // Plain (non-pooling) DataSource pointing at the mahout database created above
        MysqlDataSource dataSource = new MysqlDataSource();
        dataSource.setServerName("localhost");
        dataSource.setUser("root");
        dataSource.setPassword("toor");
        dataSource.setDatabaseName("mahout");

        // Map the intro table: user ID, item ID, preference value and timestamp columns
        JDBCDataModel dataModel = new MySQLJDBCDataModel(dataSource, "intro", "uid", "iid", "val", "time");
        DataModel model = dataModel;

        // User-based CF: Pearson correlation, neighbourhood of the 2 most similar users
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

        // Ask for up to 3 recommendations for user 1
        List<RecommendedItem> recommendations = recommender.recommend(1, 3);
        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
        }
    }
}
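To see what `PearsonCorrelationSimilarity` and `NearestNUserNeighborhood(2, ...)` are doing with this tiny data set, here is a stdlib-only Java sketch. It computes the Pearson correlation between user 1 and every other user over the items both rated, then keeps the 2 most similar users. It is my own illustration of the technique, not Mahout's code: it hard-codes the 21 rows of the `intro` table, ignores Mahout options such as weighting, and treats the similarity as undefined (NaN) when fewer than two items are shared (user 3 shares only item 101 with user 1).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PearsonNeighborhoodSketch {

    // Ratings copied from the intro table: user -> (item -> value).
    static Map<Long, Map<Long, Double>> ratings() {
        double[][] rows = {
            {1, 101, 5.0}, {1, 102, 3.0}, {1, 103, 2.5},
            {2, 101, 2.0}, {2, 102, 2.5}, {2, 103, 5.0}, {2, 104, 2.0},
            {3, 101, 2.5}, {3, 104, 4.0}, {3, 105, 4.5}, {3, 107, 5.0},
            {4, 101, 5.0}, {4, 103, 3.0}, {4, 104, 4.5}, {4, 106, 4.0},
            {5, 101, 4.0}, {5, 102, 3.0}, {5, 103, 2.0}, {5, 104, 4.0}, {5, 105, 3.5}, {5, 106, 4.0}
        };
        Map<Long, Map<Long, Double>> m = new HashMap<>();
        for (double[] r : rows) {
            m.computeIfAbsent((long) r[0], k -> new HashMap<>()).put((long) r[1], r[2]);
        }
        return m;
    }

    // Pearson correlation over co-rated items; NaN when fewer than two items are shared.
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        List<double[]> pairs = new ArrayList<>();
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) pairs.add(new double[] {e.getValue(), other});
        }
        int n = pairs.size();
        if (n < 2) return Double.NaN;
        double meanX = 0, meanY = 0;
        for (double[] p : pairs) { meanX += p[0]; meanY += p[1]; }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (double[] p : pairs) {
            double dx = p[0] - meanX, dy = p[1] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy); // 0/0 also gives NaN
    }

    // The n most similar other users, skipping users with an undefined similarity.
    static List<Long> nearest(long user, int n, Map<Long, Map<Long, Double>> data) {
        List<Long> others = new ArrayList<>();
        for (long u : data.keySet()) {
            if (u != user && !Double.isNaN(pearson(data.get(user), data.get(u)))) others.add(u);
        }
        others.sort((u, v) -> Double.compare(pearson(data.get(user), data.get(v)),
                                             pearson(data.get(user), data.get(u))));
        return others.subList(0, Math.min(n, others.size()));
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> data = ratings();
        for (long u = 2; u <= 5; u++) {
            System.out.printf("sim(1,%d) = %.4f%n", u, pearson(data.get(1L), data.get(u)));
        }
        System.out.println("nearest 2 to user 1: " + nearest(1, 2, data));
    }
}
```

Running it should show user 4 (similarity 1.0000) and user 5 (0.9449) as user 1's two neighbours; user 2 is negatively correlated (-0.7643) and user 3 is NaN.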

Output:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/D:/java%e9%a9%b1%e5%8a%a8/mahout0.7/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/D:/java%e9%a9%b1%e5%8a%a8/mahout0.7/mahout-examples-0.7-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/D:/java%e9%a9%b1%e5%8a%a8/mahout0.7/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/12/07 13:56:41 WARN jdbc.AbstractJDBCDataModel: You are not using ConnectionPoolDataSource. Make sure your DataSource pools connections to the database itself, or database performance will be severely reduced.
RecommendedItem[item:104, value:4.257081]
RecommendedItem[item:106, value:4.0]
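Where does 4.257081 come from? Here is a sketch, under the assumption that `GenericUserBasedRecommender` estimates a preference as the similarity-weighted average of the neighbours' ratings and discards estimates backed by only one neighbour. Worked out by hand from the table, user 1's Pearson similarities to its two nearest neighbours are 1.0 (user 4) and 2.5/√7 ≈ 0.945 (user 5). Those values are hard-coded below, together with the neighbours' ratings for items 104, 105 and 106, which user 1 has not rated.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EstimateSketch {

    // Similarity-weighted average of the neighbours' ratings for one item.
    // Returns NaN when fewer than two neighbours rated it (assumed Mahout-like behaviour).
    static double estimate(double[] sims, Double[] prefs) {
        double weighted = 0, totalSim = 0;
        int count = 0;
        for (int i = 0; i < sims.length; i++) {
            if (prefs[i] == null) continue; // this neighbour did not rate the item
            weighted += sims[i] * prefs[i];
            totalSim += sims[i];
            count++;
        }
        return count <= 1 ? Double.NaN : weighted / totalSim;
    }

    public static void main(String[] args) {
        // Hand-computed similarities of user 1 to users 4 and 5
        double[] sims = {1.0, 2.5 / Math.sqrt(7)};
        // Ratings by {user 4, user 5} for the items user 1 has not rated
        Map<Long, Double[]> candidates = new LinkedHashMap<>();
        candidates.put(104L, new Double[] {4.5, 4.0});
        candidates.put(105L, new Double[] {null, 3.5});
        candidates.put(106L, new Double[] {4.0, 4.0});
        for (Map.Entry<Long, Double[]> e : candidates.entrySet()) {
            System.out.printf("item %d -> %.6f%n", e.getKey(), estimate(sims, e.getValue()));
        }
    }
}
```

It prints `item 104 -> 4.257081`, `item 105 -> NaN` and `item 106 -> 4.000000`, which lines up with the Mahout output above. Item 105 was rated only by user 5, which would explain why it is missing even though 3 recommendations were requested.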


Advice from the MySQLJDBCDataModel API documentation

A JDBCDataModel backed by a MySQL database and accessed via JDBC. It may work with other JDBC databases. By default, this class assumes that there is a DataSource available under the JNDI name "jdbc/taste", which gives access to a database with a "taste_preferences" table with the following schema:

user_id  item_id  preference
987      123      0.9
987      456      0.1
654      123      0.2
654      789      0.3

preference must have a type compatible with the Java float type. user_id and item_id should be compatible with long type (BIGINT). For example, the following command sets up a suitable table in MySQL, complete with primary key and indexes:

 CREATE TABLE taste_preferences (
   user_id BIGINT NOT NULL,
   item_id BIGINT NOT NULL,
   preference FLOAT NOT NULL,
   PRIMARY KEY (user_id, item_id),
   INDEX (user_id),
   INDEX (item_id)
 )
 

The table may optionally have a timestamp column whose type is compatible with Java long.
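The `intro` table earlier in this post declares `uid`, `iid` and `val` as `varchar`, whereas the Javadoc asks for columns compatible with Java `long` and `float`. It worked here because every value is a plain number, but a quick check before loading can catch rows that would not convert. The sketch below is a hypothetical helper, not part of Mahout, using only `Long.parseLong` and `Float.parseFloat`; whether Connector/J converts varchar values exactly the same way is an assumption, and declaring `BIGINT`/`FLOAT` columns as in the `CREATE TABLE` above is the safer route. The sample rows in `main` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnTypeCheck {

    // True if uid/iid parse as Java long and val parses as a finite Java float.
    static boolean isCompatible(String uid, String iid, String val) {
        try {
            Long.parseLong(uid.trim());
            Long.parseLong(iid.trim());
            float f = Float.parseFloat(val.trim());
            return !Float.isNaN(f) && !Float.isInfinite(f);
        } catch (NumberFormatException e) {
            return false; // includes empty strings
        }
    }

    public static void main(String[] args) {
        // Hypothetical rows: only the first one is valid
        String[][] rows = { {"1", "101", "5.0"}, {"2", "abc", "2.5"}, {"3", "104", ""} };
        List<String> bad = new ArrayList<>();
        for (String[] r : rows) {
            if (!isCompatible(r[0], r[1], r[2])) bad.add(String.join(",", r));
        }
        System.out.println("rows that would not map to long/long/float: " + bad);
    }
}
```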

Performance Notes

See the notes in AbstractJDBCDataModel regarding using connection pooling. It's pretty vital to performance.

Some experimentation suggests that MySQL's InnoDB engine is faster than MyISAM for these kinds of applications. While MyISAM is the default and, I believe, generally considered the lighter-weight and faster of the two engines, my guess is the row-level locking of InnoDB helps here. Your mileage may vary.

Here are some key settings that can be tuned for MySQL, and suggested size for a data set of around 1 million elements:

  • innodb_buffer_pool_size=64M

  • myisam_sort_buffer_size=64M

  • query_cache_limit=64M

  • query_cache_min_res_unit=512K

  • query_cache_type=1

  • query_cache_size=64M

Also consider setting some parameters on the MySQL Connector/J driver:

 cachePreparedStatements = true
 cachePrepStmts = true
 cacheResultSetMetadata = true
 alwaysSendSetIsolation = false
 elideSetAutoCommits = true
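One way to apply those Connector/J parameters is to append them to the JDBC URL as query parameters. This sketch only builds the string. The host, the port 3306 and the database name are assumptions mirroring the `setServerName("localhost")` and `setDatabaseName("mahout")` calls in the program above, and the property names are copied verbatim from the Javadoc, so check them against the documentation for your driver version.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class ConnectorJUrl {

    // Appends driver properties to a JDBC URL as "?key=value&key=value".
    static String withProperties(String baseUrl, Map<String, String> props) {
        StringJoiner query = new StringJoiner("&", baseUrl + "?", "");
        for (Map.Entry<String, String> e : props.entrySet()) {
            query.add(e.getKey() + "=" + e.getValue());
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the properties in the order listed in the Javadoc
        Map<String, String> props = new LinkedHashMap<>();
        props.put("cachePreparedStatements", "true");
        props.put("cachePrepStmts", "true");
        props.put("cacheResultSetMetadata", "true");
        props.put("alwaysSendSetIsolation", "false");
        props.put("elideSetAutoCommits", "true");
        System.out.println(withProperties("jdbc:mysql://localhost:3306/mahout", props));
    }
}
```

How the URL is then handed to the driver (DriverManager, or a DataSource's URL setter) depends on your setup.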
 

Thanks to Amila Jayasooriya for contributing MySQL notes above as part of Google Summer of Code 2007.


This article originally appeared on the blog “某人说我技术宅”; please keep this attribution: http://1992mrwang.blog.51cto.com/3265935/1337759
