java 读取 cvs,读取CSV文件Java的最快方法

I´ve been trying to read several csv files (arround 20 MB) using openCSV, but so far it has been slow. Im trying to read 4 csv files which I´m loading into a heap, which I have designed. I was wondering, if there is any other way this could be done in a lot less time.

private Heap datosHeap;

public void loadMovingViolations()

{

Runtime garbage = Runtime.getRuntime();

garbage.gc();

try

{

FileReader fileReaderMes1 = new FileReader(FECHAS[0]);

FileReader fileReaderMes2 = new FileReader(FECHAS[1]);

FileReader fileReaderMes3 = new FileReader(FECHAS[2]);

FileReader fileReaderMes4 = new FileReader(FECHAS[3]);

CSVReader enero = new CSVReaderBuilder(fileReaderMes1).withSkipLines(1).build();

CSVReader febrero = new CSVReaderBuilder(fileReaderMes2).withSkipLines(1).build();

CSVReader marzo = new CSVReaderBuilder(fileReaderMes3).withSkipLines(1).build();

CSVReader abril = new CSVReaderBuilder(fileReaderMes4).withSkipLines(1).build();

String[] row;

while((row = enero.readNext()) != null)

{

int objectId = Integer.parseInt(row[0]);

int totalPaid = (int)Double.parseDouble(row[9]);

short fi = Short.parseShort(row[8]);

short penalty1 = Short.parseShort(row[10]);

datosHeap.insert(new VOMovingViolations(objectId, totalPaid, fi, row[2], row[13],

row[12],row[14], row[15], row[4], row[3], penalty1));

}

while((row = febrero.readNext()) != null)

{

int objectId = Integer.parseInt(row[0]);

int totalPaid = (int)Double.parseDouble(row[9]);

short fi = Short.parseShort(row[8]);

short penalty1 = Short.parseShort(row[10]);

datosHeap.insert(new VOMovingViolations(objectId, totalPaid, fi, row[2], row[13],

row[12],row[14], row[15], row[4], row[3], penalty1));

}

while((row = marzo.readNext()) != null)

{

int objectId = Integer.parseInt(row[0]);

int totalPaid = (int)Double.parseDouble(row[9]);

short fi = Short.parseShort(row[8]);

short penalty1 = Short.parseShort(row[10]);

datosHeap.insert(new VOMovingViolations(objectId, totalPaid, fi, row[2], row[13],

row[12],row[14], row[15], row[4], row[3], penalty1));

}

while((row = abril.readNext()) != null)

{

int objectId = Integer.parseInt(row[0]);

int totalPaid = (int)Double.parseDouble(row[9]);

short fi = Short.parseShort(row[8]);

short penalty1 = Short.parseShort(row[10]);

datosHeap.insert(new VOMovingViolations(objectId, totalPaid, fi, row[2], row[13],

row[12],row[14], row[15], row[4], row[3], penalty1));

}

}

catch (FileNotFoundException e) {

// TODO Auto-generated catch block

e.printStackTrace();

} catch (IOException e) {

// TODO Auto-generated catch block

e.printStackTrace();

}

}

I would really appreciate any help or any idea someone could please give me.

解决方案

tl;dr

Reading a 20 MB CSV file, and instantiating an object per row, takes less than 1 second in total elapsed time.

Details

You did not define the term “slow”. So I did an experiment, a casual benchmark test.

First we create a 20 MB file of 40,000 Person records. Each Person holds a first & last name in French, a UUID, and some arbitrary text as a description. The data is written as four columns in a CSV file in UTF-8. I used the Apache Commons CSV library to write and read.

Secondly, this written file is read. Each row of data is read into memory, then used to instantiate and collect a Person object.

Reading this file, and instantiating Person object for each row took less than one second in total elapsed time. Each row takes about 20K nanoseconds. Actually, this includes reading the file twice, as we do a scan to count the number of rows of data to set initial capacity of the collected instances. Also, we are parsing a hex string input into the 128-bit value of a UUID, so we have some time spent on data-processing (not just reading).

Here is the Person class.

package work.basil.example;

import java.util.UUID;

public class Person

{

// Static

static public String LOREM_IPSUM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

// Member variables.

public String givenName, surname, description;

public UUID id;

public Person ( String givenName , String surname , UUID id , String description)

{

this.givenName = givenName;

this.surname = surname;

this.id = id;

this.description = description ;

}

@Override

public String toString ()

{

return "Person{ " +

"givenName='" + givenName + '\'' +

" | surname='" + surname + '\'' +

" | id='" + id + '\'' +

" }";

}

}

And here is the complete app that writes and then reads the 20 MB file. Please study and critique, as I whipped this up in a jiffy. I’ve not double-checked my work.

You will find a write method, and a read method. The main method calls both, and tracks time.

package work.basil.example;

import org.apache.commons.csv.CSVFormat;

import org.apache.commons.csv.CSVPrinter;

import org.apache.commons.csv.CSVRecord;

import java.io.BufferedReader;

import java.io.IOException;

import java.nio.charset.StandardCharsets;

import java.nio.file.Files;

import java.nio.file.Path;

import java.nio.file.Paths;

import java.time.Duration;

import java.time.Instant;

import java.time.temporal.ChronoUnit;

import java.util.ArrayList;

import java.util.List;

import java.util.UUID;

import java.util.concurrent.ThreadLocalRandom;

public class CsvSpeed

{

public List < Person > read ( Path path )

{

// TODO: Add a check for valid file existing.

List < Person > list = List.of(); // Default to empty list.

try

{

// Prepare list.

int initialCapacity = ( int ) Files.lines( path ).count();

list = new ArrayList <>( initialCapacity );

// Read CSV file. For each row, instantiate and collect `DailyProduct`.

BufferedReader reader = Files.newBufferedReader( path );

Iterable < CSVRecord > records = CSVFormat.RFC4180.withFirstRecordAsHeader().parse( reader );

for ( CSVRecord record : records )

{

String givenName = record.get( "givenName" );

String surname = record.get( "surname" );

UUID id = UUID.fromString( record.get( "id" ) );

String description = record.get( "description" );

// Instantiate `Person` object, and collect it.

Person person = new Person( givenName , surname , id , description );

list.add( person );

}

} catch ( IOException e )

{

e.printStackTrace();

}

return list;

}

public void write ( final Path path )

{

ThreadLocalRandom random = ThreadLocalRandom.current();

try ( final CSVPrinter printer = CSVFormat.RFC4180.withHeader( "givenName" , "surname" , "id" , "description" ).print( path , StandardCharsets.UTF_8 ) ; )

{

int limit = 40_000; // 40_000 yields about 20 MB of data.

List < String > givenNames = List.of( "Adrien" , "Aimon" , "Alerion" , "Alexis" , "Alezan" , "Ancil" , "Andre" , "Antoine" , "Archard" , "Aurélien" , "Averill" , "Baptiste" , "Barnard" , "Bartelemy" , "Bastien" , "Baylee" , "Beale" , "Beau" , "Beaumont" , "Beauregard" , "Bellamy" , "Berger" , "Blaize" , "Blondel" , "Boyce" , "Bruce" , "Brunelle" , "Brys" , "Burcet" , "Burnell" , "Burrell" , "Byron" , "Canaan" , "Carden" , "Carolas" , "Cavell" , "Chace" , "Chanler" , "Chante" , "Chappel" , "Charles" , "Chasen" , "Chason" , "Chemin" , "Chene" , "Cher" , "Chevalier" , "Cheyne" , "Clément" , "Clemence" , "Corbin" , "Coty" , "Cygne" , "Damien" , "Dandre" , "Dariel" , "Darl" , "Dauphine" , "Davet" , "Dax" , "Dean" , "Delice" , "Delmon" , "Destin" , "Dominique" , "Donatien" , "Duke" , "Eliott" , "Elroy" , "Enzo" , "Erwan" , "Etalon" , "Ethan" , "Fabron" , "Ferrand" , "Filberte" , "Florent" , "Florian" , "Fontaine" , "Forest" , "Fortune" , "Franchot" , "Francois" , "Fraser" , "Frayne" , "Gaëtan" , "Gabin" , "Gage" , "Gaige" , "Garland" , "Garner" , "Gaston" , "Gauge" , "Gaylord" , "Germain" , "Germaine" , "German" , "Gervaise" , "Giles" , "Gilles" , "Gitan" , "Grosvener" , "Guifford" , "Guion" , "Guy" , "Guzman" , "Henri" , "Holland" , "Hugo" , "Hugues" , "Hyacinthe" , "Jérémy" , "Jacquan" , "Jacques" , "Jacquez" , "Janvier" , "Jardan" , "Jay" , "Jaye" , "Jehan" , "Jemond" , "Jocquez" , "Jonathan" , "Jules" , "Julien" , "Justus" , "Karoly" , "Lado" , "Lafayette" , "Lamond" , "Lancelin" , "Landis" , "Landry" , "Laron" , "Larrimore" , "Laurent" , "LaValle" , "Leandre" , "Leggett" , "Leonce" , "Leron" , "Leverett" , "Lilian" , "Loïc" , "Lorenzo" , "Louis" , "Lowell" , "Luc" , "Lucien" , "Lukas" , "Macaire" , "Mace" , "Mahieu" , "Maison" , "Malleville" , "Manneville" , "Mantel" , "Marc" , "Marcel" , "Marion" , "Marius" , "Markez" , "Markis" , "Marmion" , "Marquis" , "Marquise" , "Marshall" , "Martial" , "Maslin" , "Mason" , "Matheo" , "Mathias" , "Mathys" , "Matthieu" , "Maxence" , "Mayson" , "Mehdi" , "Merle" , "Merville" , "Montague" , "Montaigu" , "Monte" , "Montgomery" , "Montreal" , "Montrel" , "Moore" , "Morel" , "Mortimer" , "Nerville" , "Neuveville" , "Nicolas" , "Noë" , "Noah" , "Noe" , "Norman" , "Norville" , "Nouel" , "Olivier" , "Onfroi" , "Paien" , "Parfait" , "Parnell" , "Pascal" , "Patrice" , "Paul" , "Peppin" , "Percival" , "Percy" , "Pernell" , "Peverell" , "Philipe" , "Pierpont" , "Pierre" , "Pomeroy" , "Prewitt" , "Purvis" , "Quennell" , "Quentin" , "Quincey" , "Quincy" , "Quintin" , "Rémi" , "Rafaelle" , "Ranger" , "Raoul" , "Raphaël" , "Rapier" , "Rawlins" , "Ray" , "Raynard" , "Remi" , "René" , "Renard" , "Rene" , "Reule" , "Reynard" , "Robin" , "Romain" , "Rondel" , "Roy" , "Royal" , "Ruff" , "Rush" , "Russel" , "Rustin" , "Sabastien" , "Sacha" , "Salomon" , "Samuel" , "Satordi" , "Saville" , "Scoville" , "Sebastien" , "Sennett" , "Severin" , "Shant" , "Shantae" , "Sidney" , "Siffre" , "Simeon" , "Simon" , "Sinclair" , "Sofiane" , "Somer" , "Stephane" , "Sully" , "Sydney" , "Sylvain" , "Talbot" , "Talon" , "Telford" , "Tempest" , "Teppo" , "Théo" , "Thayer" , "Thibault" , "Thibaut" , "Thiery" , "Tiennan" , "Tiennot" , "Titouan" , "Toussaint" , "Travaris" , "Tyson" , "Urson" , "Vachel" , "Valentin" , "Valere" , "Vallis" , "Verdun" , "Victoir" , "Victor" , "Waltier" , "William" , "Wyatt" , "Yanis" , "Yann" , "Yves" , "Yvon" , "Zosime" , "Abrial" , "Abrielle" , "Abril" , "Adele" , "Alair" , "Alerion" , "Amee" , "Angelique" , "Annette" , "Antonella" , "Arian" , "Ariane" , "Armandina" , "Aubree" , "Aubrielle" , "Audra" , "Avril" , "Bella" , "Berneta" , "Bette" , "Blaise" , "Blanche" , "Blasa" , "Bonte" , "Brie" , "Brienne" , "Brigit" , "Cachay" , "Calice" , "Camille" , "Camylle" , "Caprice" , "Caressa" , "Caroline" , "Catin" , "Celesta" , "Celeste" , "Cera" , "Cerise" , "Chablis" , "Chalice" , "Chambray" , "Champagne" , "Chandell" , "Chaney" , "Chantal" , "Chante" , "Chanterelle" , "Chantile" , "Chantilly" , "Chantrice" , "Charla" , "Charlotte" , "Charmane" , "Chaton" , "Chemin" , "Chenetta" , "Cher" , "Chere" , "Cheri" , "Cheryl" , "Christine" , "Cidney" , "Cinderella" , "Claire" , "Claudette" , "Colette" , "Cordelle" , "Cydnee" , "Daeja" , "Daija" , "Daja" , "Damzel" , "Darelle" , "Darlene" , "Darselle" , "Dejanelle" , "Deleena" , "Delice" , "Demeri" , "Deni" , "Denise" , "Desgracias" , "Desire" , "Desiree" , "Destanee" , "Destiny" , "Dior" , "Domanique" , "Dominique" , "Elaina" , "Elaine" , "Elayna" , "Elise" , "Eloisa" , "Elyse" , "Emeline" , "Emmaline" , "Emmeline" , "Estella" , "Estrella" , "Etiennette" , "Evette" , "Fabienne" , "Fabrienne" , "Fanchon" , "Fancy" , "Fawna" , "Fayana" , "Fayette" , "Fifi" , "Fleur" , "Fleurette" , "Fontanna" , "Fosette" , "Francine" , "Frederique" , "Gabriel" , "Gabriele" , "Gabrielle" , "Gaby" , "Garcelle" , "Gena" , "Genie" , "Georgette" , "Germaine" , "Gervaise" , "Gitana" , "Harriet" , "Heloisa" , "Holland" , "Honnetta" , "Isabelle" , "Ivette" , "Ivonne" , "Jacqueena" , "Jacquetta" , "Jacquiline" , "Jacyline" , "Jaime" , "Jakqueline" , "Janeen" , "Janelly" , "Janina" , "Janiqua" , "Janique" , "Jannnelle" , "Jaquita" , "Jardena" , "Jeanetta" , "Jermaine" , "Jessamine" , "Jewel" , "Jewell" , "Joli" , "Jolie" , "Josephine" , "Jozephine" , "Julieta" , "Karessa" , "Karmaine" , "Klara" , "Laine" , "Lanelle" , "Laramie" , "Layne" , "Layney" , "Leala" , "Leonette" , "Lissette" , "Lizette" , "Lourdes" , "Lucienne" , "Ly" , "Lyla" , "Lysette" , "Madelaine" , "Malerie" , "Manette" , "Marais" , "Marcelle" , "Marché" , "Mardi" , "Margo" , "Marguerite" , "Marie" , "Marie Claude" , "Marie Frances" , "Marie Joelle" , "Marie Pascale" , "Marie Sophie" , "Marjolaine" , "Marquise" , "Marvella" , "Mathieu" , "Matisse" , "Maurelle" , "Maurissa" , "Mavis" , "Melisande" , "Michelle" , "Miette" , "Mignon" , "Mimi" , "Mirya" , "Monet" , "Moniqua" , "Monteen" , "Musetta" , "Myrlie" , "Nadeen" , "Nadia" , "Nadiyah" , "Naeva" , "Nanon" , "Natalle" , "Naudia" , "Nettie" , "Nicholas" , "Nicki" , "Nicky" , "Nicole" , "Nicolette" , "Nicolina" , "Nicolle" , "Nikolette" , "Ninette" , "Ninon" , "Noelle" , "Nycole" , "Odelette" , "Opaline" , "Orane" , "Orva" , "Page" , "Parisa" , "Parnel" , "Parris" , "Patrice" , "Peridot" , "Pippi" , "Prairie" , "Rachele" , "Rachelle" , "Racquel" , "Raphaelle" , "Raquelle" , "Remi" , "Renée" , "Renea" , "Renelle" , "Renita" , "Risette" , "Rochelle" , "Romy" , "Rosabel" , "Rosiclara" , "Ruba" , "Russhell" , "Saleena" , "Salina" , "Satin" , "Sedona" , "Serene" , "Shandelle" , "Shanta" , "Shante" , "Shariah" , "Sharita" , "Sharleen" , "Sheree" , "Shereen" , "Sherell" , "Sherice" , "Sherry" , "Sidnee" , "Sidney" , "Sidnie" , "Sidonie" , "Sinclaire" , "Solange" , "Solen" , "Sorrel" , "Suzette" , "Sydnee" , "Sydney" , "Tallis" , "Tempest" , "Toinette" , "Turquoise" , "Veronique" , "Vignette" , "Villette" , "Violeta" , "Virginie" , "Voleta" , "Vonny" );

List < String > surnames = List.of( "Arceneau" , "Aucoin" , "Babin" , "Babineaux" , "Benoit" , "Bergeron" , "Bernard" , "Bertrand" , "Bessette" , "Blanc" , "Blanchard" , "Bonnet" , "Boucher" , "Bourg" , "Bourque" , "Boutin" , "Bouvier" , "Braud" , "Broussard" , "Brun" , "Chevalier" , "David" , "Depaul" , "Desmarais" , "Disney" , "Dubois" , "Dupont" , "Dupuis" , "Durand" , "Fortescue" , "Fournier" , "Garnier" , "Gaudet" , "Gillet" , "Gillette" , "Girard" , "Gravois" , "Grosvenor" , "Lambert" , "Landry" , "Laroche" , "Laurent" , "Lefevre" , "Leroy" , "Leveque" , "Lisle" , "Martin" , "Michel" , "Molyneux" , "Moreau" , "Morel" , "Neville" , "Pelletier" , "Petit" , "Prideux" , "Renard" , "Richard" , "Robert" , "Rousseau" , "Roux" , "Rufus" , "Simon" , "Thomas" );

for ( int i = 1 ; i <= limit ; i++ )

{

String givenName = givenNames.get( random.nextInt( 0 , givenNames.size() ) );

String surname = surnames.get( random.nextInt( 0 , surnames.size() ) );

UUID id = UUID.randomUUID();

String description = Person.LOREM_IPSUM;

printer.printRecord( givenName , surname , id , description );

}

} catch ( IOException e )

{

e.printStackTrace();

}

}

public static void main ( final String[] args )

{

// Launch the app.

CsvSpeed app = new CsvSpeed();

// Write.

String when = Instant.now().truncatedTo( ChronoUnit.SECONDS ).toString().replace( ":" , "•" );

Path pathOutput = Paths.get( "/Users/basilbourque/persons.csv" );

app.write( pathOutput );

System.out.println( "Writing file: " + pathOutput );

// Read.

long start = System.nanoTime();

Path pathInput = Paths.get( "/Users/basilbourque/persons.csv" );

List < Person > list = app.read( pathInput );

long stop = System.nanoTime();

// Time.

long elapsed = ( stop - start );

Duration d = Duration.ofNanos( elapsed );

System.out.println( "Reading elapsed: " + d );

System.out.println( "Reading took nanos per row: " + ( elapsed / list.size() ) );

System.out.println( "nanos elapsed: " + elapsed + " | list.size: " + list.size() );

}

}

When run:

Writing file: /Users/basilbourque/persons.csv

Reading elapsed: PT0.857816234S

Reading took nanos per row: 21445

nanos elapsed: 857816234 | list.size: 40000

Technology stack:

Java 11.0.2 — Zulu by Azul Systems (built from OpenJDK)

Run inside IntelliJ 2019.1

MacBook Pro (Retina, 15-inch, Late 2013)

Processor: 2.3 GHz Intel Core i7 (4 cores, 8 hyper)

16 GB 1600 MHz DDR3

Storage: Solid-state built-in by Apple

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值