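Note: when local files are imported into Doris with curl (Stream Load), each file should ideally be between 1 and 2 GB. In the generated 1 TB TPC-H dataset, the customer table has 153,600,000 rows and a total size of 24 GB, so the customer file has to be split into smaller files, which are then imported one by one with curl.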
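As a rough check of these numbers, the number of chunk files follows from a ceiling division of the row count by the lines per chunk. The sketch below uses the row count stated in this post and the 5,000,000-line chunk size used in the split step; the function names are made up for illustration.

```shell
# files_needed TOTAL_ROWS ROWS_PER_FILE
# Ceiling division: how many chunk files a split by line count produces.
files_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# last_file_rows TOTAL_ROWS ROWS_PER_FILE
# Rows left for the final, partial chunk (0 means every chunk is full).
last_file_rows() {
  echo $(( $1 % $2 ))
}

files_needed 153600000 5000000     # prints 31
last_file_rows 153600000 5000000   # prints 3600000
```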
1. Splitting the customer file
(1) Check the file size
root@op-service:/home/tpch1t# du -sh customer.tbl
24G customer.tbl
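(2) Split the file. For the usage of this command, see the article:
Split customer.tbl into multiple small files of 5 million lines each. With 153,600,000 rows this produces 31 files: 30 full files plus one final file holding the remaining 3,600,000 lines. The output directory customer/ must exist before the command is run.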
split -a3 -l 5000000 -d customer.tbl customer/customer-
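Before running the split on the 24 GB file, the same flags can be rehearsed on a tiny sample. This is only a sketch: the 25-line sample file and the temporary directory are stand-ins, and it assumes GNU split (the -d numeric-suffix option is not available in every split implementation).

```shell
# Small-scale rehearsal of the split step in a throwaway directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/customer"     # split does not create the output directory

# 25 sample lines stand in for customer.tbl (hypothetical data).
seq 1 25 > "$workdir/sample.tbl"

# Same flags as above: 3-character (-a3) numeric (-d) suffixes,
# here with 10 lines per chunk instead of 5,000,000.
split -a3 -l 10 -d "$workdir/sample.tbl" "$workdir/customer/customer-"

ls "$workdir/customer"                     # customer-000, customer-001, customer-002
wc -l < "$workdir/customer/customer-002"   # 5 lines in the final, partial chunk
```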
(3) Directory listing after the split
root@op-qa-worker:/syw/customer# ll
total 24640092
drwxr-xr-x 2 root root 4096 Mar 7 10:52 ./
drwxr-xr-x 3 root root 4096 Mar 7 10:49 ../
-rw-r--r-- 1 root root 813791778 Mar 7 10:49 customer-000.tbl
-rw-r--r-- 1 root root 814790497 Mar 7 10:49 customer-001
-rw-r--r-- 1 root root 819907996 Mar 7 10:49 customer-002
-rw-r--r-- 1 root root 819972306 Mar 7 10:49 customer-003
-rw-r--r-- 1 root root 819859048 Mar 7 10:49 customer-004
-rw-r--r-- 1 root root 820130300 Mar 7 10:49 customer-005
-rw-r--r-- 1 root root 819848121 Mar 7 10:49 customer-006
-rw-r--r-- 1 root root 819916437 Mar 7 10:49 customer-007
-rw-r--r-- 1 root root 819877928 Mar 7 10:49 customer-008
-rw-r--r-- 1 root root 820106148 Mar 7 10:49 customer-009
-rw-r--r-- 1 root root 819984680 Mar 7 10:49 customer-010
-rw-r--r-- 1 root root 819955439 Mar 7 10:49 customer-011
-rw-r--r-- 1 root root 819943274 Mar 7 10:49 customer-012
-rw-r--r-- 1 root root 819941317 Mar 7 10:49 customer-013
-rw-r--r-- 1 root root 819977248 Mar 7 10:49 customer-014
-rw-r--r-- 1 root root 819879330 Mar 7 10:49 customer-015
-rw-r--r-- 1 root root 819998061 Mar 7 10:49 customer-016
-rw-r--r-- 1 root root 819927193 Mar 7 10:49 customer-017
-rw-r--r-- 1 root root 819985981 Mar 7 10:49 customer-018
-rw-r--r-- 1 root root 819918959 Mar 7 10:49 customer-019
-rw-r--r-- 1 root root 824934822 Mar 7 10:49 customer-020
-rw-r--r-- 1 root root 824903203 Mar 7 10:49 customer-021
-rw-r--r-- 1 root root 824877629 Mar 7 10:50 customer-022
-rw-r--r-- 1 root root 824903800 Mar 7 10:50 customer-023
-rw-r--r-- 1 root root 825023618 Mar 7 10:50 customer-024
-rw-r--r-- 1 root root 824987190 Mar 7 10:50 customer-025
-rw-r--r-- 1 root root 824854649 Mar 7 10:50 customer-026
-rw-r--r-- 1 root root 825067703 Mar 7 10:50 customer-027
-rw-r--r-- 1 root root 824985996 Mar 7 10:50 customer-028
-rw-r--r-- 1 root root 824900754 Mar 7 10:50 customer-029
-rw-r--r-- 1 root root 594112890 Mar 7 10:50 customer-030
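Before importing, it can be worth confirming that no lines were lost in the split. The helper below is a minimal sketch, not part of the original procedure: verify_split is a hypothetical name, and it simply compares the line count of the original file with the combined line count of all chunks that share a prefix.

```shell
# verify_split ORIGINAL CHUNK_PREFIX
# Succeeds when the files whose names start with CHUNK_PREFIX contain,
# in total, exactly as many lines as ORIGINAL.
verify_split() {
  # $(( ... )) normalizes wc output that may carry leading spaces
  original_lines=$(( $(wc -l < "$1") ))
  chunk_lines=$(( $(cat "$2"* | wc -l) ))
  echo "original=$original_lines chunks=$chunk_lines"
  [ "$original_lines" -eq "$chunk_lines" ]
}
```

A call could look like `verify_split customer.tbl customer/customer-`, run from the directory that holds customer.tbl. On the full dataset this reads all 24 GB twice, so it takes a while.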
2. Importing the data
(1) The import command
curl --location-trusted -u root:Aa123456 -T /syw/customer/customer-001 -H "label:9" -H "column_separator:|" http://192.168.48.71:8030/api/syw/customer/_stream_load
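Explanation of the command
curl --location-trusted (follow redirects and keep sending the user name and password to the redirected host; the FE redirects a Stream Load request to a BE node)
-u root:Aa123456 (user name:password)
-T /syw/customer/customer-001 (path of the data file to upload)
-H "label:9" (unique identifier of the load job; use a different value for every import, the format shown here can be used as-is)
-H "column_separator:|" (the field delimiter used in the file)
http://192.168.48.71:8030/api/syw/customer/_stream_load (http://IP address:port/api/database name/table name/_stream_load); 8030 is the default HTTP port of the FE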
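The URL pattern described above can be assembled from its parts, which makes it easier to switch host, database, or table. The following is a sketch under assumptions: the variable names (FE_HOST, FE_HTTP_PORT, DB, TABLE, DORIS_USER, DORIS_PASSWORD) and both function names are invented for illustration, and the defaults are the values used in this post. The second function only prints the curl command so it can be reviewed before anything is sent.

```shell
# Connection settings; every value is a placeholder to adjust.
FE_HOST=${FE_HOST:-192.168.48.71}    # FE address used in this post
FE_HTTP_PORT=${FE_HTTP_PORT:-8030}   # FE HTTP port
DB=${DB:-syw}                        # database name
TABLE=${TABLE:-customer}             # table name

# stream_load_url: print http://HOST:PORT/api/DB/TABLE/_stream_load
stream_load_url() {
  printf '%s\n' "http://${FE_HOST}:${FE_HTTP_PORT}/api/${DB}/${TABLE}/_stream_load"
}

# stream_load_cmd FILE LABEL: print (do not run) the curl command for one file.
# The credentials stay as unexpanded variable references so they are not echoed.
stream_load_cmd() {
  printf '%s\n' "curl --location-trusted -u \"\$DORIS_USER:\$DORIS_PASSWORD\" -T \"$1\" -H \"label:$2\" -H \"column_separator:|\" \"$(stream_load_url)\""
}

stream_load_cmd /syw/customer/customer-002 customer_002
```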
(2) Output after a successful import
root@op-qa-worker:/syw/customer# curl --location-trusted -u root:Aa123456 -T /syw/customer/customer-001 -H "label:9" -H "column_separator:|" http://192.168.48.71:8030/api/syw/customer/_stream_load
{
"TxnId": 2,
"Label": "9",
"TwoPhaseCommit": "false",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 5000000,
"NumberLoadedRows": 5000000,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 814790497,
"LoadTimeMs": 13230,
"BeginTxnTimeMs": 25,
"StreamLoadPutTimeMs": 213,
"ReadDataTimeMs": 9719,
"WriteDataTimeMs": 12937,
"CommitAndPublishTimeMs": 53
}
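The fields of this response can also be checked by a script instead of by eye, which matters because curl's exit code reflects the HTTP transfer rather than the load result. The sketch below makes two assumptions: the response is pretty-printed with one field per line, as in the output above, and values contain no commas or embedded quotes. The names json_field and check_response are hypothetical; a real JSON parser such as jq would be more robust where it is available.

```shell
# json_field NAME
# Read a response on stdin and print the value of field NAME, with the
# surrounding quotes and the trailing comma removed.
json_field() {
  sed -n 's/^[[:space:]]*"'"$1"'"[[:space:]]*:[[:space:]]*"\{0,1\}\([^",]*\)"\{0,1\},\{0,1\}[[:space:]]*$/\1/p'
}

# check_response
# Read a response on stdin; succeed only if Status is Success and the
# number of loaded rows equals the number of rows read.
check_response() {
  response=$(cat)
  status=$(printf '%s\n' "$response" | json_field Status)
  total=$(printf '%s\n' "$response" | json_field NumberTotalRows)
  loaded=$(printf '%s\n' "$response" | json_field NumberLoadedRows)
  echo "status=$status loaded=$loaded/$total"
  [ "$status" = "Success" ] && [ "$loaded" = "$total" ]
}
```

With the response above saved to a file, `check_response < response.json` would print `status=Success loaded=5000000/5000000` and return 0.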
mysql> select count(*) from customer;
+----------+
| count(*) |
+----------+
| 5000000 |
+----------+
1 row in set (0.10 sec)
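This shows that the data of the small file customer-001 was imported successfully: the table now contains 5,000,000 rows. To import the remaining files, change the file name and the label in the command and run it again for each file.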
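Repeating the command by hand for every remaining file is error-prone, so the loop can be scripted. What follows is a sketch under assumptions rather than a tested production script: load_all and curl_loader are hypothetical names, DORIS_USER, DORIS_PASSWORD and DORIS_URL are hypothetical environment variables, and the label is derived from the file name by replacing every character other than letters, digits, and underscores with an underscore. The loop stops at the first failure so that the failing file can be retried.

```shell
# load_all DIR LOADER [ARGS...]
# Run "LOADER [ARGS...] FILE LABEL" for every customer-* file in DIR,
# in name order, and stop at the first failure.
load_all() {
  dir=$1
  shift
  for f in "$dir"/customer-*; do
    [ -f "$f" ] || continue     # the glob matched nothing
    # customer-001 becomes customer_001; each file gets its own label
    label=$(basename "$f" | tr -c 'A-Za-z0-9_\n' '_')
    echo "loading $f with label $label"
    "$@" "$f" "$label" || { echo "failed on $f" >&2; return 1; }
  done
}

# curl_loader FILE LABEL
# One Stream Load request mirroring the command used in this post; it
# succeeds only when the response reports "Status": "Success".
curl_loader() {
  response=$(curl --location-trusted -u "$DORIS_USER:$DORIS_PASSWORD" \
    -T "$1" -H "label:$2" -H "column_separator:|" "$DORIS_URL")
  printf '%s\n' "$response"
  printf '%s\n' "$response" | grep -Eq '"Status"[[:space:]]*:[[:space:]]*"Success"'
}
```

With the three variables exported (DORIS_URL set to the full _stream_load URL), the call would be `load_all /syw/customer curl_loader`. A file that was already imported under a different label, such as customer-001 here, should be moved out of the directory first, otherwise its rows would be loaded a second time.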