Cassandra Authentication Parameters
All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
auth.conf.factory | DefaultAuthConfFactory | Name of a Scala module or class implementing AuthConfFactory providing custom authentication configuration |
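As a minimal sketch, password authentication can be configured through SparkConf. This assumes the standard spark.cassandra.auth.username and spark.cassandra.auth.password properties read by DefaultAuthConfFactory; the class name in the commented line is a hypothetical custom factory.

```scala
import org.apache.spark.SparkConf

// Sketch: password authentication via the default factory.
// auth.username / auth.password are the credential properties
// assumed to be consumed by DefaultAuthConfFactory.
val conf = new SparkConf()
  .set("spark.cassandra.auth.username", "cassandra")
  .set("spark.cassandra.auth.password", "cassandra")

// To plug in custom authentication instead, point auth.conf.factory
// at your own AuthConfFactory implementation (hypothetical class name):
// conf.set("spark.cassandra.auth.conf.factory", "com.example.MyAuthConfFactory")
```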
Cassandra Connection Parameters
All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
connection.compression | NONE | Compression to use (LZ4, SNAPPY or NONE) |
connection.factory | DefaultConnectionFactory | Name of a Scala module or class implementing CassandraConnectionFactory, providing connections to the Cassandra cluster |
connection.host | localhost | Contact point to connect to the Cassandra cluster |
connection.keep_alive_ms | 250 | Period of time to keep unused connections open |
connection.local_dc | None | The local DC to connect to (other nodes will be ignored) |
connection.port | 9042 | Cassandra native connection port |
connection.reconnection_delay_ms.max | 60000 | Maximum period of time to wait before reconnecting to a dead node |
connection.reconnection_delay_ms.min | 1000 | Minimum period of time to wait before reconnecting to a dead node |
connection.timeout_ms | 5000 | Maximum period of time to attempt connecting to a node |
query.retry.count | 10 | Number of times to retry a timed-out query |
query.retry.delay | 4 * 1.5 | The delay between subsequent retries (can be constant, like 1000; linearly increasing, like 1000+100; or exponential, like 1000*2) |
read.timeout_ms | 120000 | Maximum period of time to wait for a read to return |
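The connection parameters above can be combined in a SparkConf before building the Spark context. The host address, DC name and timeout below are illustrative placeholders, not recommended values.

```scala
import org.apache.spark.SparkConf

// Sketch: connect to a remote cluster with a longer connect timeout
// and reads pinned to a single local DC.
val conf = new SparkConf()
  .set("spark.cassandra.connection.host", "10.0.0.10")   // placeholder contact point
  .set("spark.cassandra.connection.port", "9042")
  .set("spark.cassandra.connection.timeout_ms", "10000") // allow slower handshakes
  .set("spark.cassandra.connection.local_dc", "DC1")     // ignore nodes in other DCs
```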
Cassandra DataFrame Source Parameters
All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
table.size.in.bytes | None | Used internally by DataFrames; will be updated in a future release to retrieve the size from C*. Can be set manually for now |
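Until the connector retrieves the size from C* itself, the estimate can be supplied manually; the byte count below is a placeholder.

```scala
import org.apache.spark.SparkConf

// Sketch: manually hint the table size (in bytes) so DataFrame
// planning has a size estimate to work with. Value is illustrative.
val conf = new SparkConf()
  .set("spark.cassandra.table.size.in.bytes", "10000000")
```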
Cassandra SQL Context Options
All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
sql.cluster | default | Sets the default Cluster to inherit configuration from |
sql.keyspace | None | Sets the default keyspace |
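A short sketch of setting both SQL context options, so unqualified table names resolve against a chosen keyspace; the keyspace name is a placeholder.

```scala
import org.apache.spark.SparkConf

// Sketch: pick the default cluster configuration and keyspace
// used to resolve unqualified table names in SQL queries.
val conf = new SparkConf()
  .set("spark.cassandra.sql.cluster", "default")
  .set("spark.cassandra.sql.keyspace", "my_keyspace") // placeholder keyspace
```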
Cassandra SSL Connection Options
All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
connection.ssl.enabled | false | Enable secure connection to Cassandra cluster |
connection.ssl.enabledAlgorithms | Set(TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA) | SSL cipher suites |
connection.ssl.protocol | TLS | SSL protocol |
connection.ssl.trustStore.password | None | Trust store password |
connection.ssl.trustStore.path | None | Path for the trust store being used |
connection.ssl.trustStore.type | JKS | Trust store type |
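Enabling an encrypted connection typically means turning SSL on and pointing the connector at a trust store; the path and password below are placeholders.

```scala
import org.apache.spark.SparkConf

// Sketch: secure connection backed by a JKS trust store.
val conf = new SparkConf()
  .set("spark.cassandra.connection.ssl.enabled", "true")
  .set("spark.cassandra.connection.ssl.trustStore.path", "/etc/cassandra/truststore.jks") // placeholder path
  .set("spark.cassandra.connection.ssl.trustStore.password", "changeit")                  // placeholder password
  .set("spark.cassandra.connection.ssl.trustStore.type", "JKS")
```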
Read Tuning Parameters
All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
input.consistency.level | LOCAL_ONE | Consistency level to use when reading |
input.fetch.size_in_rows | 1000 | Number of CQL rows fetched per driver request |
input.metrics | true | Sets whether to record connector specific metrics on read |
input.split.size_in_mb | 64 | Approximate amount of data to be fetched into a single Spark partition |
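As a sketch, the read-tuning knobs above might be adjusted together for a cluster with large rows; the values are illustrative, not recommendations.

```scala
import org.apache.spark.SparkConf

// Sketch: larger Spark partitions, bigger driver pages,
// and a stronger read consistency level.
val conf = new SparkConf()
  .set("spark.cassandra.input.split.size_in_mb", "128")    // fewer, larger partitions
  .set("spark.cassandra.input.fetch.size_in_rows", "2000") // more rows per driver request
  .set("spark.cassandra.input.consistency.level", "QUORUM")
```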
Write Tuning Parameters
All parameters should be prefixed with spark.cassandra.
Property Name | Default | Description |
---|---|---|
output.batch.grouping.buffer.size | 1000 | How many batches per single Spark task can be stored in memory before sending to Cassandra |
output.batch.grouping.key | Partition | Determines how insert statements are grouped into batches. Available values are: none, replica_set and partition |
output.batch.size.bytes | 1024 | Maximum total size of the batch in bytes. Overridden by spark.cassandra.output.batch.size.rows |
output.batch.size.rows | None | Number of rows per single batch. The default is 'auto', which means the connector will adjust the number of rows based on the amount of data in each row |
output.concurrent.writes | 5 | Maximum number of batches executed in parallel by a single Spark task |
output.consistency.level | LOCAL_ONE | Consistency level for writing |
output.metrics | true | Sets whether to record connector specific metrics on write |
output.throughput_mb_per_sec | 2.147483647E9 | *(Floating points allowed)* Maximum write throughput allowed per single core in MB/s. On long (8+ hour) runs, limit this to about 70% of the maximum throughput observed on a smaller job for stability |
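Putting a few of the write-tuning parameters together, a long-running bulk load might cap throughput and batch sizes as sketched below; all values are illustrative.

```scala
import org.apache.spark.SparkConf

// Sketch: more parallel batches per task, larger batches,
// and a throughput cap for a long-running job.
val conf = new SparkConf()
  .set("spark.cassandra.output.concurrent.writes", "10")
  .set("spark.cassandra.output.batch.size.bytes", "16384")
  .set("spark.cassandra.output.throughput_mb_per_sec", "70") // cap per-core write rate
```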