Installing MySQL and Workbench on a Mac is quick. Where I really ran into trouble was importing CSV data.
Starting MySQL
Open System Preferences and click MySQL to reach the panel where you can start the server. Then, in Terminal, run mysql -u root -p and enter your password to log in.
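Before importing, make sure the data is in CSV format. My data was already CSV, so I never tried importing an xlsx file; other pages I read say the imported data must be CSV. The most important thing is to know the format of every column before choosing its type in the database. For example, my user_id column contains non-numeric characters: if user_id is declared as INT, the import fails with Incorrect integer value: as soon as it reaches a string in that column.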
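To make the point about column types concrete, here is a minimal Python sketch that scans each CSV column and suggests a MySQL type: INT only if every non-missing value parses as an integer within INT's signed range, BIGINT if the integers are larger, and VARCHAR(255) otherwise. The helper names, the VARCHAR(255) fallback, and treating empty fields and the string null as missing are my own assumptions for illustration, not part of MySQL or Workbench.

```python
import csv
import io

INT_MIN, INT_MAX = -2**31, 2**31 - 1          # MySQL signed INT range
BIGINT_MIN, BIGINT_MAX = -2**63, 2**63 - 1    # MySQL signed BIGINT range
MISSING = {"", "null"}                        # assumption: how missing values appear in the CSV


def parse_int(text):
    """Return the integer value of text, or None if it is not a plain integer."""
    try:
        return int(text)
    except ValueError:
        return None


def suggest_mysql_type(values):
    """Suggest a MySQL column type from one column's raw CSV strings (hypothetical helper)."""
    present = [v for v in values if v.strip() not in MISSING]
    numbers = [parse_int(v) for v in present]
    if not present or any(n is None for n in numbers):
        return "VARCHAR(255)"  # no data, or at least one value that is not a plain integer
    if all(INT_MIN <= n <= INT_MAX for n in numbers):
        return "INT"
    if all(BIGINT_MIN <= n <= BIGINT_MAX for n in numbers):
        return "BIGINT"
    return "VARCHAR(255)"  # integers too large even for BIGINT


def suggest_types(csv_text):
    """Map each header name in csv_text to a suggested MySQL type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            columns[name].append(row[name] or "")  # short rows yield None
    return {name: suggest_mysql_type(vals) for name, vals in columns.items()}
```

With made-up rows, suggest_types("User_id,Date\nu1001,20160217\n1002,null\n") returns {'User_id': 'VARCHAR(255)', 'Date': 'INT'}: a single non-numeric user ID is enough to rule out INT for that column. Note that a value written as 12.0 also counts as non-integer here, which is exactly the float problem that comes up later.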
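There are two main ways to import data:
The first is to import directly in Workbench; you don't need to create the table beforehand.
The second is to import from Terminal. This is much faster, but the table must be created before you import the data.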
1. Importing the data into MySQL failed with ERROR 1290 (HY000): The MySQL server is running with the --secure-file-priv option so it cannot execute this statement
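Solution: run show variables like '%secure%'; to check the value of secure-file-priv. If the value is NULL, importing files is disabled; if it is a directory, only files in that directory can be imported; if it is empty, there is no directory restriction. Recent MySQL versions don't ship a my.cnf in /etc/, so I copied the file contents from https://www.cnblogs.com/liangjiahao713/p/8097942.html, saved them as my.cnf, and added secure-file-priv = '' under the [mysqld] section. The server only reads this option at startup, so restart MySQL from the System Preferences panel after saving.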
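The three cases above can be written down as a small Python check, which is handy for reasoning about whether a given path will be accepted. This is a sketch of the rules as described here, not MySQL's own implementation: the function name is hypothetical, None stands in for a NULL value, and treating subdirectories of the configured directory as allowed is my reading of the server's behaviour, so confirm it on your version.

```python
import os


def load_data_allowed(secure_file_priv, file_path):
    """Apply the secure-file-priv rules to a path (sketch; None stands in for NULL)."""
    if secure_file_priv is None:
        return False  # NULL: LOAD DATA INFILE is disabled entirely
    if secure_file_priv == "":
        return True   # empty: no directory restriction
    # A directory: the file must live inside it. realpath resolves symlinks
    # (on macOS, /var points to /private/var) so both paths are compared the same way.
    allowed = os.path.realpath(secure_file_priv)
    target = os.path.realpath(file_path)
    return os.path.commonpath([allowed, target]) == allowed
```

For instance, with secure-file-priv set to /var/tmp/, a file on the Desktop is rejected while /var/tmp/pool_oto/... passes, which matches why the later steps keep the data under /var/tmp.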
cd /etc
sudo vim my.cnf
Without sudo, vim won't be able to save the file!
# Example MySQL config file for medium systems.
#
# This is for a system with little memory (32M - 64M) where MySQL plays
# an important part, or systems up to 128M where MySQL is used together with
# other programs (such as a web server)
#
# MySQL programs look for option files in a set of
# locations which depend on the deployment platform.
# You can copy this option file to one of those
# locations. For information about these locations, see:
# http://dev.mysql.com/doc/mysql/en/option-files.html
#
# In this file, you can use all long options that a program supports.
# If you want to know which options a program supports, run the program
# with the "--help" option.
# The following options will be passed to all MySQL clients
[client]
default-character-set=utf8
#password = your_password
port = 3306
socket = /tmp/mysql.sock
# Here follows entries for some specific programs
# The MySQL server
[mysqld]
# Allow LOAD DATA INFILE from any directory (empty value = no restriction)
secure-file-priv = ''
port = 3306
socket = /tmp/mysql.sock
skip-external-locking
key_buffer_size = 16M
max_allowed_packet = 1M
table_open_cache = 64
sort_buffer_size = 512K
net_buffer_length = 8K
read_buffer_size = 256K
read_rnd_buffer_size = 512K
myisam_sort_buffer_size = 8M
character-set-server=utf8
init_connect='SET NAMES utf8'
# Don't listen on a TCP/IP port at all. This can be a security enhancement,
# if all processes that need to connect to mysqld run on the same host.
# All interaction with mysqld must be made via Unix sockets or named pipes.
# Note that using this option without enabling named pipes on Windows
# (via the "enable-named-pipe" option) will render mysqld useless!
#
#skip-networking
# Replication Master Server (default)
# binary logging is required for replication
log-bin=mysql-bin
# binary logging format - mixed recommended
binlog_format=mixed
# required unique id between 1 and 2^32 - 1
# defaults to 1 if master-host is not set
# but will not function as a master if omitted
server-id = 1
# Replication Slave (comment out master section to use this)
#
# To configure this host as a replication slave, you can choose between
# two methods :
#
# 1) Use the CHANGE MASTER TO command (fully described in our manual) -
# the syntax is:
#
# CHANGE MASTER TO MASTER_HOST=<host>, MASTER_PORT=<port>,
# MASTER_USER=<user>, MASTER_PASSWORD=<password> ;
#
# where you replace <host>, <user>, <password> by quoted strings and
# <port> by the master's port number (3306 by default).
#
# Example:
#
# CHANGE MASTER TO MASTER_HOST='125.564.12.1', MASTER_PORT=3306,
# MASTER_USER='joe', MASTER_PASSWORD='secret';
#
# OR
#
# 2) Set the variables below. However, in case you choose this method, then
# start replication for the first time (even unsuccessfully, for example
# if you mistyped the password in master-password and the slave fails to
# connect), the slave will create a master.info file, and any later
# change in this file to the variables' values below will be ignored and
# overridden by the content of the master.info file, unless you shutdown
# the slave server, delete master.info and restart the slaver server.
# For that reason, you may want to leave the lines below untouched
# (commented) and instead use CHANGE MASTER TO (see above)
#
# required unique id between 2 and 2^32 - 1
# (and different from the master)
# defaults to 2 if master-host is set
# but will not function as a slave if omitted
#server-id = 2
#
# The replication master for this slave - required
#master-host = <hostname>
#
# The username the slave will use for authentication when connecting
# to the master - required
#master-user = <username>
#
# The password the slave will authenticate with when connecting to
# the master - required
#master-password = <password>
#
# The port the master is listening on.
# optional - defaults to 3306
#master-port = <port>
#
# binary logging - not required for slaves, but recommended
#log-bin=mysql-bin
# Uncomment the following if you are using InnoDB tables
#innodb_data_home_dir = /usr/local/mysql/data
#innodb_data_file_path = ibdata1:10M:autoextend
#innodb_log_group_home_dir = /usr/local/mysql/data
# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
#innodb_buffer_pool_size = 16M
#innodb_additional_mem_pool_size = 2M
# Set .._log_file_size to 25 % of buffer pool size
#innodb_log_file_size = 5M
#innodb_log_buffer_size = 8M
#innodb_flush_log_at_trx_commit = 1
#innodb_lock_wait_timeout = 50
[mysqldump]
quick
max_allowed_packet = 16M
[mysql]
no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates
default-character-set=utf8
[myisamchk]
key_buffer_size = 20M
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M
[mysqlhotcopy]
interactive-timeout
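It is easy to put secure-file-priv in the wrong section of a long file like this. As a quick check, the Python sketch below reads a my.cnf with the standard configparser module and reports the value set under [mysqld]. configparser is not MySQL's own option-file parser, so this is only an approximation: it assumes there are no !include directives, the function name is hypothetical, and treating a bare secure-file-priv line as empty is my assumption.

```python
import configparser


def secure_file_priv_from_cnf(cnf_text):
    """Return secure-file-priv from [mysqld] ('' means unrestricted), or None if unset."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # bare flags such as skip-external-locking or quick
        strict=False,         # tolerate repeated keys such as character-set-server
        interpolation=None,   # do not treat '%' in values specially
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return None
    section = parser["mysqld"]
    for key in ("secure-file-priv", "secure_file_priv"):  # MySQL accepts both spellings
        if key in section:
            # Strip the optional surrounding quotes; a bare key (None) is treated as empty.
            return (section[key] or "").strip("'\"")
    return None
```

Run it on the file's contents (for example open('/etc/my.cnf').read()); a result of None means the option never reached the [mysqld] section.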
2. Next, importing the data failed with
ERROR 13 (HY000): Can't get stat of '/Users/zhangxin/Desktop/pool_oto/ccf_online_stage1_train.csv' (OS errno 13 - Permission denied)
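Solution: run show variables like '%tmpdir%'; to see MySQL's default temporary directory, and move the file into that directory. Another option is to change load data infile to load data local infile (not yet verified): http://www.360doc.com/content/15/1231/20/1073512_524491459.shtml. With LOCAL, the file is read by the mysql client instead of the server, but this may require enabling the local_infile server variable and starting the client with --local-infile=1.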
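The first workaround can be scripted. This Python sketch copies the CSV into the directory reported by tmpdir (the later steps use /var/tmp/pool_oto/) and makes it readable by everyone, on the assumption that the permission error comes from the server process running as a different user than you. The function name is hypothetical, and target_dir should be the value you actually saw on your machine.

```python
import os
import shutil


def stage_for_load(src, target_dir):
    """Copy a CSV into a directory the MySQL server can read; return the path for LOAD DATA."""
    os.makedirs(target_dir, exist_ok=True)
    dest = os.path.join(target_dir, os.path.basename(src))
    shutil.copy(src, dest)
    # Owner read/write, everyone else read-only, so a server running as another user can open it.
    os.chmod(dest, 0o644)
    return os.path.abspath(dest)
```

The returned path is what goes into the LOAD DATA INFILE '...' literal.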
3. The import failed with Incorrect integer value: 'null' for column.
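Solution: the CSV stores missing values as the literal string null, which MySQL cannot put into an INT column. pandas reads null as NaN by default, so I used Python to turn those NaN values into empty strings and then imported the cleaned file.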
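The same cleanup can also be done without pandas. Below is a sketch using Python's standard csv module: it treats null, NULL, NaN, nan and empty fields as missing and writes them as empty strings, and turns whole numbers written as floats (with a trailing .0) back into integers, leaving everything else unchanged. This is an alternative sketched here, not the code I actually ran; the function names and the set of missing-value tokens are assumptions.

```python
import csv
import io

MISSING = {"", "null", "NULL", "NaN", "nan"}  # assumption: tokens that mean "no value"


def clean_value(raw):
    """Map missing-value tokens to '' and drop a trailing '.0' from whole numbers."""
    if raw.strip() in MISSING:
        return ""
    try:
        number = float(raw)
    except ValueError:
        return raw  # not numeric, e.g. '150:20'; keep as-is
    if number.is_integer() and "." in raw:
        return str(int(number))  # '20160217.0' -> '20160217'
    return raw


def clean_csv(text):
    """Clean every data row of a CSV string, keeping the header row untouched."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for i, row in enumerate(csv.reader(io.StringIO(text))):
        writer.writerow(row if i == 0 else [clean_value(v) for v in row])
    return out.getvalue()
```

Empty fields written this way are then turned into real NULLs by the IF(@x = '', NULL, @x) assignments in the SQL below.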
In the end, this code finally got the import to work!
Python code:
import pandas as pd

data = pd.read_csv('/var/tmp/pool_oto/ccf_online_stage1_train.csv')
df = data.where(data.notnull(), '')  # replace NaN with an empty string ''
# NaN forced the date columns to float, so convert the dates back to integers
df['Date'] = df['Date'].apply(lambda x: int(x) if x != '' else x)
df['Date_received'] = df['Date_received'].apply(lambda x: int(x) if x != '' else x)
df.to_csv('/var/tmp/pool_oto/ccf_online_stage1_train2.csv', index=False)
SQL code:
LOAD DATA INFILE '/var/tmp/pool_oto/ccf_online_stage1_train2.csv'
INTO TABLE oto.online_train
FIELDS TERMINATED BY ','
IGNORE 1 LINES /* skip the first line: it holds the column names, and importing it causes an Incorrect integer value: error because a header such as User_id is a string */
/* Reading each field into a user variable (@...) lets us transform the value before it is stored.
   The variables must be listed in the same order as the columns in the CSV file;
   otherwise the imported values will not line up with the right column names. */
(@User_id, @Merchant_id, @Action, @Coupon_id, @Discount_rate, @Date_received, @Date)
SET
User_id = IF(@User_id = '', NULL, @User_id),
Merchant_id = IF(@Merchant_id = '', NULL, @Merchant_id),
Action = IF(@Action = '', NULL, @Action),
Coupon_id = IF(@Coupon_id = '', NULL, @Coupon_id),
Discount_rate = IF(@Discount_rate = '', NULL, @Discount_rate),
Date_received = IF(@Date_received = '', NULL, @Date_received),
Date = IF(@Date = '', NULL, @Date)
;
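Typing seven IF(...) lines by hand is error-prone, so the statement can also be generated. The Python sketch below builds an equivalent LOAD DATA statement from a column list, using MySQL's NULLIF(@x, '') which returns NULL in exactly the cases where IF(@x = '', NULL, @x) does. The function name and the local flag are hypothetical; column names are wrapped in backticks because Date is also a MySQL keyword, and the list must follow the CSV's column order.

```python
def build_load_data_sql(csv_path, table, columns, local=False):
    """Build a LOAD DATA statement that stores empty CSV fields as NULL.

    columns must be listed in the same order as the columns in the CSV file.
    Set local=True to emit LOAD DATA LOCAL INFILE instead.
    """
    quoted_path = csv_path.replace("'", "''")  # escape single quotes for the SQL literal
    keyword = "LOAD DATA LOCAL INFILE" if local else "LOAD DATA INFILE"
    variables = ", ".join(f"@{c}" for c in columns)
    assignments = ",\n  ".join(f"`{c}` = NULLIF(@{c}, '')" for c in columns)
    return (
        f"{keyword} '{quoted_path}'\n"
        f"INTO TABLE {table}\n"
        "FIELDS TERMINATED BY ','\n"
        "IGNORE 1 LINES\n"
        f"({variables})\n"
        f"SET\n  {assignments};"
    )
```

Calling it with '/var/tmp/pool_oto/ccf_online_stage1_train2.csv', 'oto.online_train' and ['User_id', 'Merchant_id', 'Action', 'Coupon_id', 'Discount_rate', 'Date_received', 'Date'] yields a statement equivalent to the one above, ready to paste into the mysql prompt.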