Forum大数据preprocess-文件预处理与MySQL

Forum

How to split and analysis a big file into mysql?

Content

UserTable contains xxM .csv file and xxxxxx lines.

Messages2010-2014 contains xG .csv file and xxxxxxxxx lines.

Replies2010-2014 contain seperately xx-xxG .csv files and over than approximately xx billion.

Process

Split the big file:

Linux Environment

Encoding:

file –mime-encoding filename

iconv -f utf-16le -t utf-8 filename1 >filename2

File cat and split:

cat filename1 filename2 … filenamen >new-filename

split -l number -a prefixnumber filename

BufferReader:

JVM Environment

The Java.io.BufferedReader class reads text from a character-input stream, buffering characters so as to provide for the efficient reading of characters, arrays, and lines.Following are the important points about BufferedReader:

The buffer size may be specified, or the default size may be used.

Each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream.

Declaration and Defination:

BufferedReader bufferedReader = null;
bufferedReader = new BufferedReader(new InputStreamReader(
          new FileInputStream(filePath), "UTF-8"),
                                 5 * 1024 * 1024);                              

JDBC Connect MySQL

MySQL provides connectivity for client applications developed in the Java programming language with MySQL Connector/J, a driver that implements the Java Database Connectivity (JDBC) API.

Driver driver = new com.mysql.jdbc.Driver();
String url = "jdbc:mysql://158.182.9.117:3306/slqname?characterEncoding=utf8";
Properties info = new Properties();
info.put("user", "gaodeng");
info.put("password", "123");
Connection conn = null;

“158.182.9.117”is the ip of remote computer as the server.In order to accelerate the transimission of each records,it is better to use “localhost”.

IF there is an error or warning about SSL,we can add this “autoReconnect=true&useSSL=false” in the url.

Note:DO NOT FORGET BUILD PATH

Grant Privileges:

Linux Environment

There are two main steps to grant MySQL remote login account:

  1. It is necessary to make sure of the firewall on the server which not prevent 3306 port.
    MySQL default port is 3306, you need to determine the firewall does not block port 3306, otherwise it is impossible to connect to the remote MySQL port by 3306.
    If you specified a different port when you install MySQL, please open your MySQL uses the port number specified in the firewall.
    If you do not know how to set up a firewall on your server, ask your server administrator.
  2. Add to allow remote users to connect to MySQL and authorization.

1) First,login the root MySQL account.
In Linux host, enter the following command at the command prompt.

MySQL -uroot -p123

2) Create a remote login and authorize

Grant all PRIVILEGES on db to root@’123.123.123.123 ‘identified by ‘123’ with GRANT OPTION;

The above statement indicates that all the privileges granted from db database to root, allowing the ip is 123.123.123.123, and set the user’s password is 123.

After executing the above statement, and then execute the following statement, to take effect immediately.

Flush privileges;

Access the database via MySQLWorkbench

Configure the remote management:

Hostname:The IP you want to remote.

Username:When you grant the authorization on the server,you have set the username,likes root.

Password:As described above.

After test connection,access into the database “dbname”.


Analysis each records and split them into columns

Three types of records-users/messages/replies

Users:

There are 13 columns in users-table:

id_key nickname xxxxx xxxxx xxxxxx xxx
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值