Spark-to-Cassandra Connection Configuration Notes

Translated 2015-11-19 10:34:13

Cassandra Authentication Parameters

All parameters should be prefixed with spark.cassandra.

Property Name | Default | Description
auth.conf.factory | DefaultAuthConfFactory | Name of a Scala module or class implementing AuthConfFactory, providing custom authentication configuration
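If the default authentication behavior is not sufficient, the factory class can be swapped out through this property. A minimal sketch, assuming a user-supplied class com.example.MyAuthConfFactory (a placeholder name) that implements AuthConfFactory; the host address is also illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// "com.example.MyAuthConfFactory" is a placeholder for your own AuthConfFactory implementation.
val conf = new SparkConf()
  .setAppName("cassandra-auth-example")
  .set("spark.cassandra.connection.host", "10.0.0.1")
  .set("spark.cassandra.auth.conf.factory", "com.example.MyAuthConfFactory")

val sc = new SparkContext(conf)
```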

Cassandra Connection Parameters

All parameters should be prefixed with spark.cassandra.

Property Name | Default | Description
connection.compression | | Compression to use (LZ4, SNAPPY, or NONE)
connection.factory | DefaultConnectionFactory | Name of a Scala module or class implementing CassandraConnectionFactory, providing connections to the Cassandra cluster
connection.host | localhost | Contact point to connect to the Cassandra cluster
connection.keep_alive_ms | 250 | Period of time (ms) to keep unused connections open
connection.local_dc | None | The local DC to connect to (nodes in other DCs will be ignored)
connection.port | 9042 | Cassandra native protocol port
connection.reconnection_delay_ms.max | 60000 | Maximum period of time (ms) to wait before reconnecting to a dead node
connection.reconnection_delay_ms.min | 1000 | Minimum period of time (ms) to wait before reconnecting to a dead node
connection.timeout_ms | 5000 | Maximum period of time (ms) to attempt connecting to a node
query.retry.count | 10 | Number of times to retry a timed-out query
query.retry.delay | 4 * 1.5 | Delay between subsequent retries; can be constant (e.g. 1000), linearly increasing (e.g. 1000+100), or exponential (e.g. 1000*2)
read.timeout_ms | 120000 | Maximum period of time (ms) to wait for a read to return
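These are ordinary Spark configuration entries, so they can be set on the SparkConf before the SparkContext is created (or passed with --conf on spark-submit). A minimal sketch; the host address and the tuned values are illustrative only:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder host; values are examples, not recommendations.
val conf = new SparkConf()
  .setAppName("cassandra-connection-example")
  .set("spark.cassandra.connection.host", "192.168.1.10")   // contact point
  .set("spark.cassandra.connection.port", "9042")           // native protocol port
  .set("spark.cassandra.connection.timeout_ms", "10000")    // connect timeout
  .set("spark.cassandra.connection.keep_alive_ms", "1000")  // keep unused connections open longer
  .set("spark.cassandra.query.retry.count", "10")           // retries for timed-out queries

val sc = new SparkContext(conf)
```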

Cassandra DataFrame Source Parameters

All parameters should be prefixed with spark.cassandra.

Property Name | Default | Description
table.size.in.bytes | None | Used internally by DataFrames; will be updated in a future release to retrieve the size from C*. Can be set manually for now
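Since the connector cannot yet fetch the table size from Cassandra, the hint can be supplied manually and is then visible to DataFrame reads. A sketch assuming the 1.x connector's DataFrame source; the size value (~1 GB) and the keyspace/table names (test/kv) are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Placeholder size hint and connection host.
val conf = new SparkConf()
  .setAppName("cassandra-df-example")
  .set("spark.cassandra.connection.host", "10.0.0.1")
  .set("spark.cassandra.table.size.in.bytes", "1000000000")

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Keyspace "test" and table "kv" are placeholders.
val df = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "test", "table" -> "kv"))
  .load()
```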

Cassandra SQL Context Options

All parameters should be prefixed with spark.cassandra.

Property Name | Default | Description
sql.cluster | default | Sets the default cluster to inherit configuration from
sql.keyspace | None | Sets the default keyspace
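With the 1.x connector's CassandraSQLContext, these options can be set with setConf so that SQL statements do not need fully qualified table names. A sketch under that assumption; the keyspace test and table kv are placeholders:

```scala
import org.apache.spark.sql.cassandra.CassandraSQLContext

// Assumes an existing SparkContext `sc`; keyspace "test" is a placeholder.
val csc = new CassandraSQLContext(sc)
csc.setConf("spark.cassandra.sql.keyspace", "test")  // default keyspace for unqualified table names

val rows = csc.sql("SELECT * FROM kv").collect()
```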

Cassandra SSL Connection Options

All parameters should be prefixed with spark.cassandra.

Property Name | Default | Description
connection.ssl.enabled | false | Enable secure connection to the Cassandra cluster
connection.ssl.enabledAlgorithms | Set(TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA) | SSL cipher suites
connection.ssl.protocol | TLS | SSL protocol
connection.ssl.trustStore.password | None | Trust store password
connection.ssl.trustStore.path | None | Path to the trust store being used
connection.ssl.trustStore.type | JKS | Trust store type
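A sketch of enabling an encrypted connection; the trust store path and password are placeholders and must point at a trust store containing the cluster's certificate:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Placeholder host, trust store path, and password.
val conf = new SparkConf()
  .setAppName("cassandra-ssl-example")
  .set("spark.cassandra.connection.host", "10.0.0.1")
  .set("spark.cassandra.connection.ssl.enabled", "true")
  .set("spark.cassandra.connection.ssl.trustStore.path", "/etc/cassandra/certs/truststore.jks")
  .set("spark.cassandra.connection.ssl.trustStore.password", "changeit")
  .set("spark.cassandra.connection.ssl.trustStore.type", "JKS")

val sc = new SparkContext(conf)
```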

Read Tuning Parameters

All parameters should be prefixed with spark.cassandra.

Property Name | Default | Description
input.consistency.level | LOCAL_ONE | Consistency level to use when reading
input.fetch.size_in_rows | 1000 | Number of CQL rows fetched per driver request
input.metrics | true | Sets whether to record connector-specific metrics on read
input.split.size_in_mb | 64 | Approximate amount of data (MB) to be fetched into a single Spark partition
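A sketch of a tuned read job using the RDD API (sc.cassandraTable); the values are illustrative, not recommendations, and the keyspace/table names are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._  // brings in sc.cassandraTable

// Placeholder host; tuning values are examples only.
val conf = new SparkConf()
  .setAppName("cassandra-read-tuning")
  .set("spark.cassandra.connection.host", "10.0.0.1")
  .set("spark.cassandra.input.split.size_in_mb", "128")     // larger Spark partitions
  .set("spark.cassandra.input.fetch.size_in_rows", "5000")  // more rows per driver page
  .set("spark.cassandra.input.consistency.level", "LOCAL_QUORUM")

val sc = new SparkContext(conf)
val rows = sc.cassandraTable("test", "kv")  // keyspace "test", table "kv" are placeholders
println(rows.count())
```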

Write Tuning Parameters

All parameters should be prefixed with spark.cassandra.

Property Name | Default | Description
output.batch.grouping.buffer.size | 1000 | How many batches per single Spark task can be stored in memory before sending to Cassandra
output.batch.grouping.key | partition | Determines how insert statements are grouped into batches. Available values: none (a batch may contain any statements), replica_set (a batch may contain only statements to be written to the same replica set), partition (a batch may contain only statements for rows sharing the same partition key value)
output.batch.size.bytes | 1024 | Maximum total size of the batch in bytes. Overridden by spark.cassandra.output.batch.size.rows
output.batch.size.rows | None | Number of rows per single batch. The default is 'auto', which means the connector will adjust the number of rows based on the amount of data in each row
output.concurrent.writes | 5 | Maximum number of batches executed in parallel by a single Spark task
output.consistency.level | LOCAL_ONE | Consistency level for writing
output.metrics | true | Sets whether to record connector-specific metrics on write
output.throughput_mb_per_sec | 2.147483647E9 | Maximum write throughput allowed per single core in MB/s (floating point values allowed). On long (8+ hour) runs, limit this to about 70% of the maximum throughput observed on a smaller job, for stability
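A sketch of a tuned write using saveToCassandra; the keyspace test, table kv, and its column names are placeholders, and the tuning values are illustrative only:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._  // brings in rdd.saveToCassandra and SomeColumns

// Placeholder host; tuning values are examples, not recommendations.
val conf = new SparkConf()
  .setAppName("cassandra-write-tuning")
  .set("spark.cassandra.connection.host", "10.0.0.1")
  .set("spark.cassandra.output.batch.grouping.key", "partition")
  .set("spark.cassandra.output.batch.size.rows", "200")  // fixed batch size instead of 'auto'
  .set("spark.cassandra.output.concurrent.writes", "8")
  .set("spark.cassandra.output.consistency.level", "LOCAL_QUORUM")

val sc = new SparkContext(conf)

// Keyspace "test", table "kv" with columns (key, value) are placeholders.
sc.parallelize(Seq((1, "a"), (2, "b")))
  .saveToCassandra("test", "kv", SomeColumns("key", "value"))
```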
