This builds on the pages the spider has already fetched; it still connects to the MySQL database.
import java.sql.*;

class LinkToDb {
    protected Connection con;
    protected PreparedStatement preCount;
    protected PreparedStatement preSelect;

    LinkToDb(String driver, String sqlurl) {
        try {
            Class.forName(driver);
            con = DriverManager.getConnection(sqlurl);
            preCount = con.prepareStatement("SELECT count(*) AS qty FROM visited_tab");
            preSelect = con.prepareStatement("SELECT * FROM visited_tab");
        } catch (Exception e) {
            e.printStackTrace();   // don't swallow the exception silently
        }
    }

    public int getTableNum() {
        int count = 0;
        try {
            ResultSet rs = preCount.executeQuery();
            rs.next();
            count = rs.getInt("qty");
        } catch (Exception e) {
            e.printStackTrace();
        }
        return count;
    }

    public ResultSet getResult() {
        ResultSet rs = null;
        try {
            rs = preSelect.executeQuery();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return rs;
    }
}
The getResult() method returns every record in the table. (One point I wasn't sure about: is rs a reference/cursor, or does it hold the actual data? A JDBC ResultSet is a cursor, but MySQL's driver buffers the entire result set in memory by default, so a very large database could be a problem...)
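A hedged sketch of the large-database case: with MySQL Connector/J, a plain statement reads the whole result set into memory, but it can be switched to row-by-row streaming with a MySQL-specific fetch-size hint (the class and method names here are illustrative, not from the original code):

```java
import java.sql.*;

// Illustrative sketch: switch a query on visited_tab from the default
// fully-buffered mode to MySQL Connector/J's row-by-row streaming mode.
class StreamingQuery {
    static ResultSet streamAll(Connection con) throws SQLException {
        Statement st = con.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        st.setFetchSize(Integer.MIN_VALUE); // MySQL-specific: stream rows instead of buffering
        return st.executeQuery("SELECT * FROM visited_tab");
    }
}
```

With streaming enabled, only one row at a time is held by the driver, at the cost of keeping the connection busy until the ResultSet is fully read or closed.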
Create a class object: creatIndex ci = new creatIndex();
and an IndexWriter: IndexWriter writer = new IndexWriter(dir, new CJKAnalyzer(), true); — I used CJKAnalyzer, heh. Then Lucene builds the index:
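As an aside, CJKAnalyzer tokenizes runs of CJK characters into overlapping bigrams. A rough sketch of that idea in plain Java (BigramSketch is a hypothetical name for illustration, not the actual Lucene class):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of bigram segmentation, the scheme CJKAnalyzer uses
// for CJK text: "搜索引擎" becomes the terms 搜索, 索引, 引擎.
public class BigramSketch {
    public static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++)
            out.add(text.substring(i, i + 2));   // each adjacent pair of chars
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("搜索引擎"));  // [搜索, 索引, 引擎]
    }
}
```

Bigrams need no dictionary, which is why CJKAnalyzer is an easy default, but they also index meaningless pairs (like 索引 above straddling two words), which inflates the index compared to a dictionary-based segmenter.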
ci.createConnection();
count = ci.getTableNum();
if (count < 1) {
    System.out.println("no record in database");
} else {
    rs = ci.getResult();
    while (rs.next()) {
        Document doc = new Document();
        doc.add(Field.Keyword("url", rs.getString("url")));
        doc.add(Field.Text("title", rs.getString("title")));
        doc.add(Field.UnStored("text", rs.getString("text")));
        doc.add(Field.UnIndexed("encode", rs.getString("encode")));
        doc.add(Field.UnIndexed("last_modify_time", rs.getString("last_modify_time")));
        writer.addDocument(doc);
        System.out.println(rs.getString("url") + " has been indexed");
    }
    writer.optimize();
    writer.close();
    System.out.println("complete");
}
The search code is actually finished too, but since the spider doesn't use any page-analysis algorithm, the results contain a lot of irrelevant content. I want to look into the PageRank algorithm and use it to improve the spider.
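As a first step in that direction, here is a minimal PageRank sketch on a toy link graph (the graph, class name, and damping factor 0.85 are all illustrative choices, not part of the spider):

```java
import java.util.Arrays;

// Minimal power-iteration PageRank on a tiny link graph:
// page 0 links to 1 and 2, page 1 links to 2, page 2 links back to 0.
public class PageRankDemo {
    static int[][] links = { {1, 2}, {2}, {0} };

    public static double[] pageRank(int[][] out, int iters, double d) {
        int n = out.length;
        double[] pr = new double[n];
        Arrays.fill(pr, 1.0 / n);                  // start uniform
        for (int it = 0; it < iters; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1 - d) / n);        // random-jump term
            for (int i = 0; i < n; i++) {
                if (out[i].length == 0) {          // dangling page: spread evenly
                    for (int j = 0; j < n; j++) next[j] += d * pr[i] / n;
                } else {                           // share rank among out-links
                    for (int j : out[i]) next[j] += d * pr[i] / out[i].length;
                }
            }
            pr = next;
        }
        return pr;
    }

    public static void main(String[] args) {
        double[] pr = pageRank(links, 50, 0.85);
        for (int i = 0; i < pr.length; i++)
            System.out.printf("page %d: %.4f%n", i, pr[i]);
    }
}
```

In the spider this would run over the crawled link graph, and the resulting scores could be stored in visited_tab and combined with Lucene's text relevance at search time.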