Reading Large CSV Files

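    read_csv has a chunksize parameter (read_table accepts it as well). When you pass a chunk size, the file is read in blocks of that many rows, and the call returns an iterable TextFileReader object instead of a single DataFrame. The following example comes from the pandas IO Tools documentation: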


The chunksize parameter of pd.read_csv


In [138]: reader = pd.read_table('tmp.sv', sep='|', chunksize=4)


In [139]: reader
Out[139]: <pandas.io.parsers.TextFileReader at 0x120d2f290>


In [140]: for chunk in reader:
   .....:     print(chunk)
   .....: 
   Unnamed: 0         0         1         2         3
0           0  0.469112 -0.282863 -1.509059 -1.135632
1           1  1.212112 -0.173215  0.119209 -1.044236
2           2 -0.861849 -2.104569 -0.494929  1.071804
3           3  0.721555 -0.706771 -1.039575  0.271860
   Unnamed: 0         0         1         2         3
0           4 -0.424972  0.567020  0.276232 -1.087401
1           5 -0.673690  0.113648 -1.478427  0.524988
2           6  0.404705  0.577046 -1.715002 -1.039268
3           7 -0.370647 -1.157892 -1.344312  0.844885
   Unnamed: 0         0        1         2         3
0           8  1.075770 -0.10905  1.643563 -1.469388
1           9  0.357021 -0.67460 -1.776904 -0.968914
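
    Because each chunk is an ordinary DataFrame, a common pattern is to reduce every chunk to a small result and combine the results, so the whole file never has to be in memory at once. The following is a minimal sketch of that idea, not part of the original documentation example. The in-memory data and the column names id and value are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large pipe-delimited file on disk:
# 10 rows with columns "id" and "value" (value = id * 2).
data = io.StringIO("id|value\n" + "\n".join(f"{i}|{i * 2}" for i in range(10)))

total = 0
row_count = 0

# With chunksize=4, read_csv yields DataFrames of at most 4 rows each.
for chunk in pd.read_csv(data, sep="|", chunksize=4):
    total += chunk["value"].sum()  # reduce the chunk to a single number
    row_count += len(chunk)        # track how many rows were seen

print(total, row_count)
```

    For a real file, replace the StringIO object with the file path. Only the running totals are kept between iterations.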




    Passing iterator=True also returns an iterable TextFileReader object. In this case you choose how many rows to read on each call to get_chunk:


In [141]: reader = pd.read_table('tmp.sv', sep='|', iterator=True)


In [142]: reader.get_chunk(5)
Out[142]: 
   Unnamed: 0         0         1         2         3
0           0  0.469112 -0.282863 -1.509059 -1.135632
1           1  1.212112 -0.173215  0.119209 -1.044236
2           2 -0.861849 -2.104569 -0.494929  1.071804
3           3  0.721555 -0.706771 -1.039575  0.271860
4           4 -0.424972  0.567020  0.276232 -1.087401
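
    Successive get_chunk calls continue from where the previous call stopped, so the sizes can differ from call to call. Here is a small sketch under assumed data (the in-memory contents and the column names id and value are hypothetical):

```python
import io

import pandas as pd

# Hypothetical stand-in for a pipe-delimited file: ids 0..9, value = id * 2.
data = io.StringIO("id|value\n" + "\n".join(f"{i}|{i * 2}" for i in range(10)))

reader = pd.read_csv(data, sep="|", iterator=True)

first = reader.get_chunk(3)   # the first 3 data rows
second = reader.get_chunk(5)  # the next 5 data rows, starting where `first` ended

reader.close()  # release the underlying parser when done

print(first["id"].tolist())
print(second["id"].tolist())
```

    This form is convenient when the amount to read is only known at run time, for example when peeking at the first few rows before deciding how to process the rest.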


