Forum
How to split and analysis a big file into mysql?
Content
UserTable contains xxM .csv file and xxxxxx lines.
Messages2010-2014 contains xG .csv file and xxxxxxxxx lines.
Replies2010-2014 contain seperately xx-xxG .csv files and over than approximately xx billion.
Process
Split the big file:
Linux Environment
Encoding:
file –mime-encoding filename
iconv -f utf-16le -t utf-8 filename1 >filename2
File cat and split:
cat filename1 filename2 … filenamen >new-filename
split -l number -a prefixnumber filename
BufferReader:
JVM Environment
The Java.io.BufferedReader class reads text from a character-input stream, buffering characters so as to provide for the efficient reading of characters, arrays, and lines.Following are the important points about BufferedReader:
The buffer size may be specified, or the default size may be used.
Each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream.
Declaration and Defination:
BufferedReader bufferedReader = null;
bufferedReader = new BufferedReader(new InputStreamReader(
new FileInputStream(filePath), "UTF-8"),
5 * 1024 * 1024);
JDBC Connect MySQL
MySQL provides connectivity for client applications developed in the Java programming language with MySQL Connector/J, a driver that implements the Java Database Connectivity (JDBC) API.
Driver driver = new com.mysql.jdbc.Driver();
String url = "jdbc:mysql://158.182.9.117:3306/slqname?characterEncoding=utf8";
Properties info = new Properties();
info.put("user", "gaodeng");
info.put("password", "123");
Connection conn = null;
“158.182.9.117”is the ip of remote computer as the server.In order to accelerate the transimission of each records,it is better to use “localhost”.
IF there is an error or warning about SSL,we can add this “autoReconnect=true&useSSL=false” in the url.
Note:DO NOT FORGET BUILD PATH
Grant Privileges:
Linux Environment
There are two main steps to grant MySQL remote login account:
- It is necessary to make sure of the firewall on the server which not prevent 3306 port.
MySQL default port is 3306, you need to determine the firewall does not block port 3306, otherwise it is impossible to connect to the remote MySQL port by 3306.
If you specified a different port when you install MySQL, please open your MySQL uses the port number specified in the firewall.
If you do not know how to set up a firewall on your server, ask your server administrator. - Add to allow remote users to connect to MySQL and authorization.
1) First,login the root MySQL account.
In Linux host, enter the following command at the command prompt.
MySQL -uroot -p123
2) Create a remote login and authorize
Grant all PRIVILEGES on db to root@’123.123.123.123 ‘identified by ‘123’ with GRANT OPTION;
The above statement indicates that all the privileges granted from db database to root, allowing the ip is 123.123.123.123, and set the user’s password is 123.
After executing the above statement, and then execute the following statement, to take effect immediately.
Flush privileges;
Access the database via MySQLWorkbench
Configure the remote management:
Hostname:The IP you want to remote.
Username:When you grant the authorization on the server,you have set the username,likes root.
Password:As described above.
After test connection,access into the database “dbname”.
Analysis each records and split them into columns
Three types of records-users/messages/replies
Users:
There are 13 columns in users-table:
id_key | nickname | xxxxx | xxxxx | xxxxxx | xxx |
---|