(Java实现)Cache数据库表结构扫描,坏行坏列扫描实现
- 最近接触了Cache数据库,需要将海量数据通过sqoop抽取到hadoop平台。但是发现一个问题,Cache数据库是一个后关系型数据,底层是按照global进行存存的,表结构的定义通过president持久class文件来实现。
- class文件实现的表结构有很多附加功能,可以添加展示数据和存储数据,添加compute列而不用存储compute值,面向对象开发才会展现。如果添加的compute计算公式相对复杂,引用了多个数据列,而在迭代公式中没有做非法数据判断的话就会导致表结构展现的异常,sqoop或者面向表结构读取信息的工具就会抛出数据库内部异常的错误。Cache这种独门数据对于传统行业的人来说简直是恶梦,根本不清楚如何进入global的世界,global的大门如此神秘。基于这个问题,在研究了Cache数据库这么多年的前提下,和大家分享一个java的扫描方式来批量获取表中的坏行坏列问题。并提供处理方式。大数据的到来,我们没法不懂任何一个数据库,也可能会接触各种各样的奇葩架构,所以要用传统技术来分析偏冷知识。
基本思路就是 :
获取主键–>存储主键(内存or文本)–>多线程访问主键–>根据数据库的行数和分片规则生成统一的调度计划–>监控线程执行情况 -->合并信息 -->生成扫描log
============================================================================= - 获取主键:由于cache数据库的主键并不是都为数字Integer类型,很多varchar类型主键,因此需要指定。而传统的select max(RowId)和select min(RowId)的方式并不适用~,此处为大坑!
- 存储主键:数据行数在1000W以下建议用ArryList的形势存储到内存,加快线程调用速度;大于1000W建议就落盘到文本中。1000W行有时候占用内存就1G了。
- 多线程访问主键:加快数据扫描速度
- 分片规则:建议为50W一个线程,最多多少个根据数据库的License情况来定,Licence占用过多就导致其他线程无法访问了
- 监控的线程执行情况:定义一个ArrayList ThreadMon 的集合,每个线程执行完毕会添加一个Interger的元素,而分片之前已经知道总线程数量,通过size和count的比对来监控是否完成。当然还有很多线程通信的方式,感觉不如这样简单。
- 合并信息:每个线程扫描到的错误都存储到ArrayList SQLTableScanInfo集合中。最后统一把集合信息遍历输出即可。
- 最终生成log,信息如下:
开始时间为:2018-10-30 09:19:58
执行结束,总共分了 13 片
每片数量为:500000
监控线程的ArryList长度为:13
总数据量:6138360
结束时间为:2018-10-30 09:55:16
发现错误数量4
====================================ErrorInfo===================================
PKName:id PKValue:555555
java.sql.SQLException: [SQLCODE: <-400>:<出现致命错误 >]
[Cache Error: <<ILLEGAL VALUE>%0Afirst+8^%sqlcq.pSYS.cls18.1>]
[Location: <ServerLoop - Query Fetch>]
[%msg: <Unexpected error occurred: <ILLEGAL VALUE>%0Afirst+8^%sqlcq.pSYS.cls18.1>]
PKName:id PKValue:30
java.sql.SQLException: [SQLCODE: <-400>:<出现致命错误 >]
[Cache Error: <<ILLEGAL VALUE>%0Afirst+8^%sqlcq.pSYS.cls18.1>]
[Location: <ServerLoop - Query Fetch>]
[%msg: <Unexpected error occurred: <ILLEGAL VALUE>%0Afirst+8^%sqlcq.pSYS.cls18.1>]
PKName:id PKValue:90
java.sql.SQLException: [SQLCODE: <-400>:<出现致命错误 >]
[Cache Error: <<ILLEGAL VALUE>%0Afirst+8^%sqlcq.pSYS.cls18.1>]
[Location: <ServerLoop - Query Fetch>]
[%msg: <Unexpected error occurred: <ILLEGAL VALUE>%0Afirst+8^%sqlcq.pSYS.cls18.1>]
PKName:id PKValue:100
java.sql.SQLException: [SQLCODE: <-400>:<出现致命错误 >]
[Cache Error: <<ILLEGAL VALUE>%0Afirst+8^%sqlcq.pSYS.cls18.1>]
[Location: <ServerLoop - Query Fetch>]
[%msg: <Unexpected error occurred: <ILLEGAL VALUE>%0Afirst+8^%sqlcq.pSYS.cls18.1>]
====================================ErrorInfo===================================
拿着主键和名和主键值就可完成对原库的比对,修复损坏数据。
具体代码如下:
package com.cache.repairtable;
import java.io.*;
import java.sql.*;
import java.sql.Date;
import java.text.SimpleDateFormat;
import java.util.*;
/*
*created by lq;
*For Details Connect to QQ:471832953
*/
@SuppressWarnings({ "unused" })
public class RepairTable4 {
//final static int SplitNum =5000;
final static ArrayList<Integer> ThreadMon = new ArrayList<>();
final static ArrayList<String> SQLTableScanInfo = new ArrayList<>();
@SuppressWarnings("resource")
public static void main(String[] args) throws ClassNotFoundException, SQLException, IOException {
System.out.println("输入ip:");
Scanner ipsc = new Scanner(System.in);
String ipstr = ipsc.nextLine();
System.out.println("输入port:");
Scanner portsc = new Scanner(System.in);
String portstr = portsc.nextLine();
System.out.println("输入namespace:");
Scanner nssc = new Scanner(System.in);
String nsstr = nssc.nextLine();
String DB_URL = "jdbc:Cache://" + ipstr + ":" + portstr + "/" + nsstr;
String JDBC_DRIVER = "com.intersys.jdbc.CacheDriver";
//String DB_URL = "jdbc:Cache://192.168.31.187:1972/%sys";
String UserName = "_system";
String Password = "sys";
Class.forName(JDBC_DRIVER);
Connection dbconnection = DriverManager.getConnection(DB_URL, UserName, Password);
dbconnection.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
System.out.println(DB_URL);
SQLSelectRun(dbconnection, DB_URL, UserName, Password);
}
@SuppressWarnings("resource")
private static void SQLSelectRun(Connection dbconnection, String url, String user, String password)
throws IOException {
Scanner s = new Scanner(System.in);
System.out.println("开始获取表的原始信息:");
System.out.println("please input schema:");
String schema = s.nextLine();
Scanner t = new Scanner(System.in);
System.out.println("please input table:");
String tablename = t.nextLine();
String sqlstr = "select * from " + schema + "." + tablename;
// String sqlstr="select * from DHCCSYS.DatabaseInfo";
System.out.println(sqlstr);
try {
PreparedStatement sqlrst = dbconnection.prepareStatement(sqlstr, ResultSet.TYPE_FORWARD_ONLY,
ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
java.sql.ResultSet sqlRST = sqlrst.executeQuery();
int count = 0;
ResultSetMetaData rsmd = sqlRST.getMetaData();
System.out.println("TableColumns CountNum:" + rsmd.getColumnCount());
String[] arry = new String[rsmd.getColumnCount()];
for (int i = 1; i < rsmd.getColumnCount(); i++) {
String ColumnName = rsmd.getColumnName(i);
String ColumnType = rsmd.getColumnTypeName(i);
System.out.println(ColumnName + " --> " + ColumnType);
}
sqlRST.close();
System.out.println("Which Column is PrimaryKey:");
Scanner sc = new Scanner(System.in);
String PkName = sc.nextLine();
System.out.println("正在获取" + schema + "." + tablename + "的" + PkName + "主键......");
String sqlstr2 = "select " + PkName + " from " + schema + "." + tablename;
System.out.println(sqlstr2);
String FilePKInfo = schema + tablename + "pkinfo.txt";
PreparedStatement sqlrst2 = dbconnection.prepareStatement(sqlstr2, ResultSet.TYPE_SCROLL_INSENSITIVE,
ResultSet.CONCUR_READ_ONLY);
ResultSet sqlRST2 = sqlrst2.executeQuery();
// 执行分片方法
ExecuteSplit(PkName, sqlRST2, dbconnection, schema, tablename, PkName, url, user, password);
} catch (SQLException e) {
e.printStackTrace();
}
}
private static void ExecuteSplit(String PkName, ResultSet sqlRST2, Connection dbconnection, String Schema,
String Table, String PKName, String url, String user, String password) throws IOException, SQLException {
ArrayList<String> ALPrimaryKey = new ArrayList<>();
// ArrayList<Integer> ThreadMon=new ArrayList<>();
String PKFileName=Schema+"_"+Table+"_"+PkName+".txt";
BufferedWriter bw=new BufferedWriter(new FileWriter(PKFileName));
int count=0;
while (sqlRST2.next()) {
//ALPrimaryKey.add(sqlRST2.getString(PkName));
bw.write(sqlRST2.getString(PkName));
bw.newLine();
bw.flush();
count++;
}
bw.close();
System.out.println("主键获取完毕,开始扫描行信息有误报错!");
//for (String PK : ALPrimaryKey) {
// System.out.println(PK);
//}
// BufferedReader br=new BufferedReader(new FileReader(PKFileName));
// int count=0;
// while((br.readLine())!=null) {
// count++;
// }
System.out.println("总数据量:" + count);
System.out.print("请输入数据库允许的最大进程数:");
Scanner sc=new Scanner(System.in);
int ProcessLimitNum=Integer.valueOf(sc.nextLine());
int SplitNum;
SplitNum=count/ProcessLimitNum;
if(SplitNum<=500000) {
SplitNum=500000;
System.out.println("数据量不大,调整分片行数限制为"+SplitNum);
}else {
System.out.println("数据量较大,调整分片行数限制为"+SplitNum);
}
SimpleDateFormat Btime = new SimpleDateFormat("yyyy-MM-dd hh:mm:ss");
String BeginTime = Btime.format(new java.util.Date());
int tempNum1 = count % SplitNum; // 余数
int tempNum2 = count / SplitNum; // 商
// System.out.println(tempNum2);
int splitcount = tempNum2; // 分了多少片呢?
// 行数大于预定的分片数,且有余数
if ((tempNum1 != 0) & (count > SplitNum)) {
for (int i = 1; i <= tempNum2; i++) {
System.out.println("第" + i + "片");
int beginnum = (i - 1) * SplitNum+1;
int endnum = i * SplitNum;
System.out.println(beginnum);
System.out.println(endnum);
new Thread(new Runnable() {
public void run() {
// ThreadFindEachRowInfo(ALPrimaryKey, beginnum, endnum,
// Schema, Table, PkName,url,user,password);
try {
Connection dbconnection = DriverManager.getConnection(url, user, password);
dbconnection.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
BufferedReader br=new BufferedReader(new FileReader(PKFileName));
String pkname;
int cc=0;
while((pkname=br.readLine())!=null) {
cc++;
if((cc>=beginnum)&(cc<=endnum)){
String sqlstr = "select * from " + Schema + "." + Table + " where " + PKName + "=\'"
+ pkname + "\'";
//System.out.println(sqlstr);
try {
// System.out.println(sqlstr);
Statement stmt = dbconnection.createStatement();
ResultSet sqlRST1 = stmt.executeQuery(sqlstr);
while (sqlRST1.next()) {
}
} catch (SQLException e) {
SQLTableScanInfo.add("PKName:" + PKName + " PKValue:" + pkname);
SQLTableScanInfo.add(e.toString());
}
}
}
br.close();
ThreadMon.add(1);
dbconnection.close();
} catch (SQLException e1) {
e1.printStackTrace();
} catch (FileNotFoundException e1) {
e1.printStackTrace();
} catch (IOException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
}
}).start();
}
}
System.out.println("正在后台执行" + splitcount);
File file = new File(Schema + Table + "_Result.log");
FileWriter fw = new FileWriter(file);
try {
while (true) {
Thread.sleep(1000);
/// 每个线程都会往ArryList里add数值1.通过判断ArryList的长度和总子进程数是否相等来判断是不是所有的线程执行完毕
if (ThreadMon.size() == splitcount) {
System.out.println("开始时间为:" + BeginTime);
fw.write("开始时间为:" + BeginTime + "\r\n");
System.out.println("执行结束,总共分了 " + splitcount + " 片\r\n");
fw.write("执行结束,总共分了 " + splitcount + " 片\r\n");
System.out.println("每片数量为:"+SplitNum);
fw.write("每片数量为:"+SplitNum+"\r\n");
System.out.println("监控线程的ArryList长度为:" + ThreadMon.size());
fw.write("监控线程的ArryList长度为:" + ThreadMon.size() + "\r\n");
System.out.println("总数据量:" + count);
fw.write("总数据量:" + count + "\r\n");
SimpleDateFormat Etime = new SimpleDateFormat("yyyy-MM-dd hh:mm:ss");
String EndTime = Etime.format(new java.util.Date());
System.out.println("结束时间为:" + EndTime);
fw.write("结束时间为:" + EndTime + "\r\n");
fw.write("发现错误数量"+SQLTableScanInfo.size()/2+"\r\n");
fw.write("====================================ErrorInfo===================================\r\n");
if (SQLTableScanInfo.size() != 0) {
for (String s : SQLTableScanInfo) {
System.out.println(s);
fw.write(s + "\r\n");
//fw.flush();
}
} else {
System.out.println("恭喜,未发现错误!");
fw.write("恭喜,未发现错误!\r\n");
}
System.out.println("log路径为:" + file.getAbsolutePath());
try {
dbconnection.close();
} catch (SQLException e) {
// TODO 自动生成的 catch 块
e.printStackTrace();
}
fw.write("====================================ErrorInfo===================================\r\n");
fw.close();
break;
}
}
} catch (InterruptedException e) {
// TODO 自动生成的 catch 块
e.printStackTrace();
}
}
private static void ThreadFindEachRowInfo(ArrayList<String> pkarry, int Begin, int End, String Schema, String Table,
String PKName, String url, String user, String password) {
// System.out.println("线程内部执行");
try {
Connection dbconnection = DriverManager.getConnection(url, user, password);
dbconnection.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
for (int i = Begin; i <= End; i++) {
// System.out.println(pkarry.get(i));
String sqlstr = "select * from " + Schema + "." + Table + " where " + PKName + "=\'" + pkarry.get(i)
+ "\'";
try {
// System.out.println(sqlstr);
Statement stmt = dbconnection.createStatement();
ResultSet sqlRST1 = stmt.executeQuery(sqlstr);
while (sqlRST1.next()) {
continue;
}
} catch (SQLException e) {
SQLTableScanInfo.add("PKName:" + PKName + " PKValue:" + pkarry.get(i));
SQLTableScanInfo.add(e.toString());
}
}
ThreadMon.add(1);
dbconnection.close();
} catch (SQLException e1) {
// TODO 自动生成的 catch 块
e1.printStackTrace();
}
}
}