Forum大数据preprocess-文件预处理与MySQL

本文介绍了如何处理大型CSV文件并将其导入到MySQL数据库中。通过Linux环境下的文件分割和编码转换,Java的BufferedReader进行高效读取,然后使用JDBC连接MySQL,设置远程登录权限,并使用MySQLWorkbench进行数据库管理和数据插入。文章还提到了在MyEclipse中配置开发环境的相关步骤。
摘要由CSDN通过智能技术生成

Forum

How to split and analysis a big file into mysql?

Content

UserTable contains xxM .csv file and xxxxxx lines.

Messages2010-2014 contains xG .csv file and xxxxxxxxx lines.

Replies2010-2014 contain seperately xx-xxG .csv files and over than approximately xx billion.

Process

Split the big file:

Linux Environment

Encoding:

file –mime-encoding filename

iconv -f utf-16le -t utf-8 filename1 >filename2

File cat and split:

cat filename1 filename2 … filenamen >new-filename

split -l number -a prefixnumber filename

BufferReader:

JVM Environment

The Java.io.BufferedReader class reads text from a character-input stream, buffering characters so as to provide for the efficient reading of characters, arrays, and lines.Following are the important points about BufferedReader:

The buffer size may be specified, or the default size may be used.

Each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream.

Declaration and Defination:

BufferedReader bufferedReader = null;
bufferedReader = new BufferedReader(new InputStreamReader(
          new FileInputStream(filePath), "UTF-8"),
                                 5 * 1024 * 1024);                              

JDBC Connect MySQL

MySQL provides connectivity for client applications developed in the Java programming language with MySQL Connector/J, a driver that implements the Java Database Connectivity (JDBC) API.

Driver driver = new com.mysql.jdbc.Driver();
String url = "jdbc:mysql://158.182.9.117:3306/slqname?characterEncoding=utf8";
Properties info = new Properties();
info.put("user", "gaodeng");
info.put("password", "123");
Connection conn = null;

“158.182.9.117”is the ip of remote computer as the server.In order to accelerate the transimission of each records,it is better to use “localhost”.

IF there is an error or warning about SSL,we can add this “autoReconnect=true&useSSL=false” in the url.

Note:DO NOT FORGET BUILD PATH

Grant Privileges:

Linux Environment

There are two main steps to grant MySQL remote login account:

  1. It is necessary to make sure of the firewall on the server which not prevent 3306 port.
    MySQL default port is 3306, you need to determine the firewall does not block port 3306, otherwise it is impossible to connect to the remote MySQL port by 3306.
    If you specified a different port when you install MySQL, please open your MySQL uses the port number specified in the firewall.
    If you do not know how to set up a firewall on your server, ask your server administrator.
  2. Add to allow remote users to connect to MySQL and authorization.

1) First,login the root MySQL account.
In Linux host, enter the following command at the command prompt.

MySQL -uroot -p123

2) Create a remote login and authorize

Grant all PRIVILEGES on db to root@’123.123.123.123 ‘identified by ‘123’ with GRANT OPTION;

The above statement indicates that all the privileges granted from db database to root, allowing the ip is 123.123.123.123, and set the user’s password is 123.

After executing the above statement, and then execute the following statement, to take effect immediately.

Flush privileges;

Access the database via MySQLWorkbench

Configure the remote management:

Hostname:The IP you want to remote.

Username:When you grant the authorization on the server,you have set the username,likes root.

Password:As described above.

After test connection,access into the database “dbname”.


Analysis each records and split them into columns

Three types of records-users/messages/replies

Users:

There are 13 columns in users-table:

id_key nickname xxxxx xxxxx xxxxxx xxx
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值