Read entire file in Scala?

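This post looks at simple, effective ways to read an entire file into memory in Scala, including scala.io.Source and control over character encoding. The discussion covers the trade-offs of the different approaches: efficiency, resource management, and whether the standard library should be used at all.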

Original question: Read entire file in Scala?

What's a simple and canonical way to read an entire file into memory in Scala? (Ideally, with control over character encoding.)

The best I can come up with is:

scala.io.Source.fromPath("file.txt").getLines.reduceLeft(_+_)

Or am I supposed to use one of Java's god-awful idioms, the best of which (without using an external library) seems to be:

import java.util.Scanner
import java.io.File
new Scanner(new File("file.txt")).useDelimiter("\\Z").next()

From reading mailing list discussions, it's not clear to me that scala.io.Source is even supposed to be the canonical I/O library. I don't understand what its intended purpose is, exactly.

... I'd like something dead-simple and easy to remember. For example, in these languages it's very hard to forget the idiom ...

Ruby    open("file.txt").read
Ruby    File.read("file.txt")
Python  open("file.txt").read()

Reply #1

Reference: https://stackoom.com/question/5O8V/在Scala中读取整个文件


Reply #2

val lines = scala.io.Source.fromFile("file.txt").mkString

By the way, "scala." isn't really necessary, as it's always in scope anyway, and you can, of course, import io's contents, fully or partially, and avoid having to prepend "io." too.

The above leaves the file open, however. To avoid problems, you should close it like this:

val source = scala.io.Source.fromFile("file.txt")
val lines = try source.mkString finally source.close()

Another problem with the code above is that it is horribly slow because of how it is implemented. For larger files one should use:

source.getLines mkString "\n"
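The close-after-read pattern above can be folded into one small helper. This is only a sketch under two assumptions: the helper name `readAll` is made up for illustration, and `scala.util.Using` requires Scala 2.13 or later (on older versions the `try`/`finally` form shown earlier does the same job).

```scala
import scala.io.Source
import scala.util.Using

object ReadWholeFile {
  // Hypothetical helper (not part of the standard library): reads the whole
  // file into one String and closes the Source even if reading throws.
  def readAll(path: String, encoding: String = "UTF-8"): String =
    Using.resource(Source.fromFile(path, encoding))(_.mkString)
}
```

With this in scope, `ReadWholeFile.readAll("file.txt")` reads the file as UTF-8, and a second argument such as `"ISO-8859-1"` selects another encoding.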

Reply #3

// For a file with UTF-8 encoding. getLines strips the line terminators,
// so join with "\n" to keep the lines separate.
val lines = scala.io.Source.fromFile("file.txt", "utf-8").getLines.mkString("\n")
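One detail worth seeing in isolation: `getLines` yields each line without its terminator, so the separator handed to `mkString` decides what the joined text looks like. The sketch below uses `Source.fromString` on an in-memory sample so that it runs without a file; the object and value names are illustrative only.

```scala
import scala.io.Source

object JoinLines {
  // In-memory stand-in for the contents of a small text file.
  val text = "first\nsecond\nthird\n"

  // No separator: the lines run together.
  val noSeparator: String = Source.fromString(text).getLines().mkString

  // Explicit separator: line breaks are restored, except the trailing one.
  val withNewlines: String = Source.fromString(text).getLines().mkString("\n")
}
```

Neither form reproduces the input byte for byte: the trailing newline is dropped, and any "\r\n" terminators would come back as "\n".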

Reply #4

The obvious question being "why do you want to read in the entire file?" This is obviously not a scalable solution if your files get very large. The scala.io.Source gives you back an Iterator[String] from the getLines method, which is very useful and concise.

It's not much of a job to come up with an implicit conversion using the underlying Java IO utilities to convert a File, a Reader or an InputStream to a String. I think that the lack of scalability means that they are correct not to add this to the standard API.
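A rough sketch of what such a conversion might look like, built on plain java.io. Everything here is an assumption made for illustration: the object name `SlurpSyntax`, the method name `slurp`, the UTF-8 default, and the use of implicit classes (which wrap the value through an implicit conversion rather than converting straight to String).

```scala
import java.io.{BufferedReader, File, FileInputStream, InputStream, InputStreamReader, Reader}
import java.nio.charset.{Charset, StandardCharsets}

object SlurpSyntax {
  // Reads everything from the reader in 8 KiB chunks, then closes it.
  private def drain(reader: Reader): String = {
    val buffered = new BufferedReader(reader)
    try {
      val out = new java.lang.StringBuilder
      val chunk = new Array[Char](8192)
      var n = buffered.read(chunk)
      while (n != -1) {
        out.append(chunk, 0, n)
        n = buffered.read(chunk)
      }
      out.toString
    } finally buffered.close()
  }

  // Hypothetical extension methods; `slurp` is not a standard-library name.
  implicit class ReaderSlurp(reader: Reader) {
    def slurp: String = drain(reader)
  }

  implicit class InputStreamSlurp(in: InputStream) {
    def slurp(charset: Charset = StandardCharsets.UTF_8): String =
      drain(new InputStreamReader(in, charset))
  }

  implicit class FileSlurp(file: File) {
    def slurp(charset: Charset = StandardCharsets.UTF_8): String =
      drain(new InputStreamReader(new FileInputStream(file), charset))
  }
}
```

After `import SlurpSyntax._`, a call such as `new File("file.txt").slurp()` returns the contents as a String. The scalability concern is unchanged: the whole file still ends up in memory.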


Reply #5

Just to expand on Daniel's solution, you can shorten things up tremendously by inserting the following import into any file which requires file manipulation:

import scala.io.Source._

With this, you can now do:

val lines = fromFile("file.txt").getLines

I would be wary of reading an entire file into a single String. It's a very bad habit, one which will bite you sooner and harder than you think. The getLines method returns a value of type Iterator[String]. It's effectively a lazy cursor into the file, allowing you to examine just the data you need without risking memory glut.
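To make the lazy-cursor point concrete, here is a minimal sketch that walks a file one line at a time and never builds one big String. The helper name `countMatching` and its parameters are invented for this example.

```scala
import scala.io.Source

object LazyLines {
  // Hypothetical helper: counts the lines containing `needle`.
  // Lines are pulled from the iterator one by one and then discarded.
  def countMatching(path: String, needle: String, encoding: String = "UTF-8"): Int = {
    val source = Source.fromFile(path, encoding)
    try source.getLines().count(_.contains(needle))
    finally source.close()
  }
}
```

For example, `LazyLines.countMatching("server.log", "ERROR")` would report how many lines mention ERROR. The iterator has to be consumed before the source is closed, which is why the count happens inside the `try`.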

Oh, and to answer your implied question about Source: yes, it is the canonical I/O library. Most code ends up using java.io due to its lower-level interface and better compatibility with existing frameworks, but any code which has a choice should be using Source, particularly for simple file manipulation.


Reply #6

I've been told that Source.fromFile is problematic. Personally, I have had problems opening large files with Source.fromFile and have had to resort to Java InputStreams.
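The reply does not show what the InputStream fallback looked like, so the following is only a guess at the general shape: a fixed-size buffer is refilled until the stream is exhausted. The name `foreachChunk`, the callback signature, and the 64 KiB default are all assumptions.

```scala
import java.io.{BufferedInputStream, FileInputStream}

object ChunkedRead {
  // Hypothetical helper: passes the file to `handle` in pieces of at most
  // `chunkSize` bytes. The second argument to `handle` is the number of
  // valid bytes in the buffer, which is reused between calls.
  def foreachChunk(path: String, chunkSize: Int = 64 * 1024)(handle: (Array[Byte], Int) => Unit): Unit = {
    val in = new BufferedInputStream(new FileInputStream(path))
    try {
      val buffer = new Array[Byte](chunkSize)
      var n = in.read(buffer)
      while (n != -1) {
        handle(buffer, n)
        n = in.read(buffer)
      }
    } finally in.close()
  }
}
```

Memory use stays bounded by the buffer size regardless of file size. Decoding text this way needs care, because a multi-byte character can straddle two chunks.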

Another interesting solution is using scalax. Here's an example of some well-commented code that opens a log file using ManagedResource and the scalax helpers: http://pastie.org/pastes/420714
