Reading Files in Scala
The file scalaIO.txt in the root directory of drive E: contains the following:
Example code for reading the file:
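import scala.io.Source

// Open the file; on Windows the backslash in the path must be escaped as "\\"
val file = Source.fromFile("E:\\scalaIO.txt")
// getLines() yields the file's lines one at a time
for (line <- file.getLines()) {
  println(line)
}
file.close()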
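- Note 1: In file = Source.fromFile("E:\\scalaIO.txt"), the fromFile() method comes from the Source object in the scala.io package (brought in with import scala.io.Source). Its source code is shown in the figure below:
- file.getLines() returns an iterator (Iterator[String]); its source code (in scala.io) is as follows: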
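The loop above does not close the file if reading throws, and the Iterator returned by getLines() can only be traversed once. A minimal sketch of a safer pattern, assuming the file is UTF-8 encoded (FileReadDemo and readLines are hypothetical names, not part of scala.io), copies the lines into a List inside try/finally:

```scala
import scala.io.Source

object FileReadDemo {
  // Hypothetical helper: reads every line of a text file into a List.
  // Assumes UTF-8 by default; pass another encoding name if the file uses a different charset.
  def readLines(path: String, enc: String = "UTF-8"): List[String] = {
    val source = Source.fromFile(path, enc)
    // getLines() returns a single-pass Iterator[String], so the lines are
    // materialized with toList before the underlying file is closed.
    try source.getLines().toList
    finally source.close()
  }
}
```

For example, FileReadDemo.readLines("E:\\scalaIO.txt").foreach(println) should print the same lines as the loop above. On Scala 2.13 and later, scala.util.Using offers a similar resource-safe pattern.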
Reading Network Resources in Scala
import scala.io.Source

// Fetch the page and print it character by character
val webFile = Source.fromURL("http://spark.apache.org")
webFile.foreach(print)
webFile.close()
The source code of the fromURL() method is as follows:
/** same as fromURL(new URL(s))
*/
def fromURL(s: String)(implicit codec: Codec): BufferedSource =
fromURL(new URL(s))(codec)
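As the source shows, fromURL(s: String) wraps the string in a java.net.URL and takes an implicit Codec that decides how the downloaded bytes are decoded. A minimal sketch that passes the codec explicitly and closes the source even on failure (UrlReadDemo and fetch are illustrative names, and UTF-8 is an assumed default, not something the page guarantees):

```scala
import scala.io.{Codec, Source}

object UrlReadDemo {
  // Hypothetical helper: reads the whole resource at `url` into one String.
  // The codec is passed explicitly; without it, the implicit default is used,
  // which falls back to the platform's default charset.
  def fetch(url: String, codec: Codec = Codec.UTF8): String = {
    val source = Source.fromURL(url)(codec)
    try source.mkString
    finally source.close()
  }
}
```

For example, UrlReadDemo.fetch("http://spark.apache.org") returns the page's HTML as a single String. Because fromURL accepts any URL, a file: URL also works, which is handy for testing offline.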
The content read from the web resource is as follows:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>
Apache Spark™ - Lightning-Fast Cluster Computing
</title>
<meta name="description" content="Apache Spark is a fast and general engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.">
<link href="/css/cerulean.min.css" rel="stylesheet">
<link href="/css/custom.css" rel="stylesheet">
<script type="text/javascript">
<!-- Google Analytics initialization -->
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-32518208-2']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
function trackOutboundLink(link, category, action) {
try {
_gaq.push(['_trackEvent', category , action]);
} catch(err){}
setTimeout(function() {
document.location.href = link.href;
}, 100);
}
</script>
</head>
<body>
<script src="https://code.jquery.com/jquery.js"></script>
<script src="//netdna.bootstrapcdn.com/bootstrap/3.0.3/js/bootstrap.min.js"></script>
<script src="/js/lang-tabs.js"></script>
<script src="/js/downloads.js"></script>
<div class="container" style="max-width: 1200px;">
<div class="masthead">
<p class="lead">
<a href="/">
<img src="/images/spark-logo.png"
style="height:100px; width:auto; vertical-align: bottom; margin-top: 20px;"></a><span class="tagline">
Lightning-fast cluster computing
</span>
</p>
</div>
<nav class="navbar navbar-default" role="navigation">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse"
data-target="#navbar-collapse-1">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
</div>
<div class="collapse navbar-collapse" id="navbar-collapse-1">
<ul class="nav navbar-nav">
<li><a href="/downloads.html">Download</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">
Libraries <b class="caret"></b>
</a>
<ul class="dropdown-menu">
<li><a href="/sql/">SQL and DataFrames</a></li>
<li><a href="/streaming/">Spark Streaming</a></li>
<li><a href="/mllib/">MLlib (machine learning)</a></li>
<li><a href="/graphx/">GraphX (graph)</a></li>
<li class="divider"></li>
<li><a href="http://spark-packages.org">Third-Party Packages</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">
Documentation <b class="caret"></b>
</a>
<ul class="dropdown-menu">
<li><a href="/docs/latest/">Latest Release (Spark 1.5.1)</a></li>
<li><a href="/documentation.html">Other Resources</a></li>
</ul>
</li>
<li><a href="/examples.html">Examples</a></li>
<li class="dropdown">
<a href="/community.html" class="dropdown-toggle" data-toggle="dropdown">
Community <b class="caret"></b>
</a>
<ul class="dropdown-menu">
<li><a href="/community.html">Mailing Lists</a></li>
<li><a href="/community.html#events">Events and Meetups</a></li>
<li><a href="/community.html#history">Project History</a></li>
<li><a href="https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark">Powered By</a></li>
<li><a href="https://cwiki.apache.org/confluence/display/SPARK/Committers">Project Committers</a></li>
<li><a href="https://issues.apache.org/jira/browse/SPARK">Issue Tracker</a></li>
</ul>
</li>
<li><a href="/faq.html">FAQ</a></li>
</ul>
</div>
</nav>
<div class="row">
<div class="col-md-3 col-md-push-9">
<div class="news" style="margin-bottom: 20px;">
<h5>Latest News</h5>
<ul class="list-unstyled">
<li><a href="/news/submit-talks-to-spark-summit-east-2016.html">Submission is open for Spark Summit East 2016</a>
<span class="small">(Oct 14, 2015)</span></li>
<li><a href="/news/spark-1-5-1-released.html">Spark 1.5.1 released</a>
<span class="small">(Oct 02, 2015)</span></li>
<li><a href="/news/spark-1-5-0-released.html">Spark 1.5.0 released</a>
<span class="small">(Sep 09, 2015)</span></li>
<li><a href="/news/spark-summit-europe-agenda-posted.html">Spark Summit Europe agenda posted</a>
<span class="small">(Sep 07, 2015)</span></li>
</ul>
<p class="small" style="text-align: right;"><a href="/news/index.html">Archive</a></p>
</div>
<div class="hidden-xs hidden-sm">
<a href="/downloads.html" class="btn btn-success btn-lg btn-block" style="margin-bottom: 30px;">
Download Spark
</a>
<p style="font-size: 16px; font-weight: 500; color: #555;">
Built-in Libraries:
</p>
<ul class="list-none">
<li><a href="/sql/">SQL and DataFrames</a></li>
<li><a href="/streaming/">Spark Streaming</a></li>
<li><a href="/mllib/">MLlib (machine learning)</a></li>
<li><a href="/graphx/">GraphX (graph)</a></li>
</ul>
<a href="http://spark-packages.org">Third-Party Packages</a>
</div>
</div>
<div class="col-md-9 col-md-pull-3">
<div class="jumbotron">
<b>Apache Spark™</b> is a fast and general engine for large-scale data processing.
</div>
<div class="row row-padded">
<div class="col-md-7 col-sm-7">
<h2>Speed</h2>
<p class="lead">
Run programs up to 100x faster than
Hadoop MapReduce in memory, or 10x faster on disk.
</p>
<p>
Spark has an advanced DAG execution engine that supports cyclic data flow and
in-memory computing.
</p>
</div>
<div class="col-md-5 col-sm-5 col-padded-top col-center">
<div style="width: 100%; max-width: 272px; display: inline-block; text-align: center;">
<img src="/images/logistic-regression.png" style="width: 100%; max-width: 250px;" />
<div class="caption" style="min-width: 272px;">Logistic regression in Hadoop and Spark</div>
</div>
</div>
</div>
<div class="row row-padded">
<div class="col-md-7 col-sm-7">
<h2>Ease of Use</h2>
<p class="lead">
Write applications quickly in Java, Scala, Python, R.
</p>
<p>
Spark offers over 80 high-level operators that make it easy to build parallel apps.
And you can use it <em>interactively</em>
from the Scala, Python and R shells.
</p>
</div>
<div class="col-md-5 col-sm-5 col-padded-top col-center">
<div style="text-align: left; display: inline-block;">
<div class="code">
text_file = spark.textFile(<span class="string">"hdfs://..."</span>)<br />
<br />
text_file.<span class="sparkop">flatMap</span>(<span class="closure">lambda line: line.split()</span>)<br />
.<span class="sparkop">map</span>(<span class="closure">lambda word: (word, 1)</span>)<br />
.<span class="sparkop">reduceByKey</span>(<span class="closure">lambda a, b: a+b</span>)
</div>
<div class="caption">Word count in Spark's Python API</div>
</div>
</div>
</div>
<div class="row row-padded">
<div class="col-md-7 col-sm-7">
<h2>Generality</h2>
<p class="lead">
Combine SQL, streaming, and complex analytics.
</p>
<p>
Spark powers a stack of libraries including
<a href="/sql/">SQL and DataFrames</a>, <a href="/mllib/">MLlib</a> for machine learning,
<a href="/graphx/">GraphX</a>, and <a href="/streaming/">Spark Streaming</a>.
You can combine these libraries seamlessly in the same application.
</p>
</div>
<div class="col-md-5 col-sm-5 col-padded-top col-center">
<img src="/images/spark-stack.png" style="margin-top: 15px; width: 100%; max-width: 296px;" usemap="#stack-map" />
<map name="stack-map">
<area shape="rect" coords="0,0,74,95" href="/sql/" alt="Spark SQL" title="Spark SQL" />
<area shape="rect" coords="74,0,150,95" href="/streaming/" alt="Spark Streaming" title="Spark Streaming" />
<area shape="rect" coords="150,0,224,95" href="/mllib/" alt="MLlib (machine learning)" title="MLlib" />
<area shape="rect" coords="225,0,300,95" href="/graphx/" alt="GraphX" title="GraphX" />
</map>
</div>
</div>
<div class="row row-padded" style="margin-bottom: 15px;">
<div class="col-md-7 col-sm-7">
<h2>Runs Everywhere</h2>
<p class="lead">
Spark runs on Hadoop, Mesos, standalone, or in the cloud. It can access diverse data sources including HDFS, Cassandra, HBase, and S3.
</p>
<p>
You can run Spark using its <a href="/docs/latest/spark-standalone.html">standalone cluster mode</a>, on <a href="/docs/latest/ec2-scripts.html">EC2</a>, on Hadoop YARN, or on <a href="http://mesos.apache.org">Apache Mesos</a>.
Access data in <a href="http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html">HDFS</a>, <a href="http://cassandra.apache.org">Cassandra</a>, <a href="http://hbase.apache.org">HBase</a>,
<a href="http://hive.apache.org">Hive</a>, <a href="http://tachyon-project.org">Tachyon</a>, and any Hadoop data source.
</p>
</div>
<div class="col-md-5 col-sm-5 col-padded-top col-center">
<img src="/images/spark-runs-everywhere.png" style="width: 100%; max-width: 280px;" />
</div>
</div>
</div>
</div>
<div class="row">
<div class="col-md-4 col-padded">
<h3>Community</h3>
<p>
Spark is used at a wide range of organizations to process large datasets.
You can find example use cases at the <a href="http://spark-summit.org/summit-2013/">Spark Summit</a>
conference, or on the
<a href="https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark">Powered By</a>
page.
</p>
<p>
There are many ways to reach the community:
</p>
<ul class="list-narrow">
<li>Use the <a href="/community.html#mailing-lists">mailing lists</a> to ask questions.</li>
<li>In-person events include the <a href="http://www.meetup.com/spark-users/">Bay Area Spark meetup</a> and
<a href="http://spark-summit.org/">Spark Summit</a>.</li>
<li>We use <a href="https://issues.apache.org/jira/browse/SPARK">JIRA</a> for issue tracking.</li>
</ul>
</div>
<div class="col-md-4 col-padded">
<h3>Contributors</h3>
<p>
Apache Spark is built by a wide set of developers from over 200 companies.
Since 2009, more than 800 developers have contributed to Spark!
</p>
<p>
The project's
<a href="https://cwiki.apache.org/confluence/display/SPARK/Committers">committers</a>
come from 16 organizations.
</p>
<p>
If you'd like to participate in Spark, or contribute to the libraries on top of it, learn
<a href="https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark">how to
contribute</a>.
</p>
</div>
<div class="col-md-4 col-padded">
<h3>Getting Started</h3>
<p>Learning Spark is easy whether you come from a Java or Python background:</p>
<ul class="list-narrow">
<li><a href="/downloads.html">Download</a> the latest release — you can run Spark locally on your laptop.</li>
<li>Read the <a href="/docs/latest/quick-start.html">quick start guide</a>.</li>
<li>
Spark Summit 2014 contained free <a href="http://spark-summit.org/2014/training">training videos and exercises</a>.
</li>
<li>Learn how to <a href="/docs/latest/#launching-on-a-cluster">deploy</a> Spark on a cluster.</li>
</ul>
</div>
</div>
<div class="row">
<div class="col-sm-12 col-center">
<a href="/downloads.html" class="btn btn-success btn-lg" style="width: 262px;">Download Spark</a>
</div>
</div>
<footer class="small">
<hr>
Apache Spark, Spark, Apache, and the Spark logo are trademarks of
<a href="http://www.apache.org">The Apache Software Foundation</a>.
</footer>
</div>
</body>
</html>
Process finished with exit code 0
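import scala.io.Source

// No codec is given, so the implicit default (the platform charset) is used
val webFile = Source.fromURL("http://www.baidu.com/")
webFile.foreach(print)
webFile.close()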
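Reading a Chinese-language site runs into an encoding problem, shown below (the fix is left to the reader, as it is not the focus of this article):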
Exception in thread "main" java.nio.charset.MalformedInputException: Input length = 1
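Although the fix is outside the scope of this article, the exception means the decoder met bytes that are not valid in the charset it was using. The sketch below, assuming the page's real charset is known (for example from its Content-Type header), reproduces the error offline with an in-memory stream and shows two options: decode with the correct charset, or replace malformed bytes instead of failing. DecodeDemo, decodeStrict and decodeLenient are hypothetical names; Codec, onMalformedInput and CodingErrorAction are existing scala.io / java.nio APIs.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.CodingErrorAction
import scala.io.{Codec, Source}

object DecodeDemo {
  // Decodes bytes through a BufferedSource, the same path fromURL uses.
  // With a plain Codec, malformed input makes the decoder throw
  // java.nio.charset.MalformedInputException.
  def decodeStrict(bytes: Array[Byte], charset: String): String = {
    val source = Source.fromInputStream(new ByteArrayInputStream(bytes))(Codec(charset))
    try source.mkString
    finally source.close()
  }

  // Lenient variant: malformed or unmappable input is replaced with U+FFFD
  // instead of raising an exception.
  def decodeLenient(bytes: Array[Byte], charset: String): String = {
    val codec = Codec(charset)
      .onMalformedInput(CodingErrorAction.REPLACE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE)
    val source = Source.fromInputStream(new ByteArrayInputStream(bytes))(codec)
    try source.mkString
    finally source.close()
  }
}
```

Applied to the Baidu example, the first option would be Source.fromURL("http://www.baidu.com/")(Codec.UTF8), assuming the page is served as UTF-8. The lenient codec avoids the exception at the cost of U+FFFD replacement characters in the output.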