Table 10.1. Tez Configuration Parameters
Configuration Parameter | Description | Default Value |
---|---|---|
tez.lib.uris | Location of the Tez jars and their dependencies. Tez applications download required jar files from this location, so it should be publicly accessible. | N/A |
tez.am.log.level | Root logging level passed to the Tez Application Master. | INFO |
tez.staging-dir | The staging directory used by Tez when application developers submit DAGs, or Directed Acyclic Graphs. Tez creates all temporary files for the DAG job in this directory. | /tmp/${user.name}/staging |
tez.am.resource.memory.mb | The amount of memory in MB that YARN will allocate to the Tez Application Master. The size increases with the size of the DAG. | 1536 |
tez.am.java.opts | Java options for the Tez Application Master process. The value specified for -Xmx should be less than the value of tez.am.resource.memory.mb, typically 512 MB less, to account for non-JVM memory in the process. | -server -Xmx1024m -Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+UseParallelGC |
tez.am.shuffle-vertex-manager.min-src-fraction | In case of a Shuffle operation over a Scatter-Gather edge connection, Tez may start data consumer tasks before all the data producer tasks complete in order to overlap the shuffle IO. This parameter specifies the fraction of producer tasks which should complete before the consumer tasks are scheduled. The percentage is expressed as a decimal, so the default value of 0.2 represents 20%. | 0.2 |
tez.am.shuffle-vertex-manager.max-src-fraction | In case of a Shuffle operation over a Scatter-Gather edge connection, Tez may start data consumer tasks before all the data producer tasks complete in order to overlap the shuffle IO. This parameter specifies the fraction of producer tasks which should complete before all consumer tasks are scheduled. The number of consumer tasks ready for scheduling scales linearly between min-fraction and max-fraction. The percentage is expressed as a decimal, so the default value of 0.4 represents 40%. | 0.4 |
tez.am.am-rm.heartbeat.interval-ms.max | This parameter determines how frequently, in milliseconds, the Tez Application Master asks the YARN Resource Manager for resources. A low value can overload the Resource Manager. | 250 |
tez.am.grouping.split-waves | Specifies the number of waves, or the percentage of queue container capacity, used to process a data set, where a value of 1 represents 100% of container capacity. The Tez Application Master considers this parameter value, the available cluster resources, and the resources required by the application to calculate parallelism, or the number of tasks to run. Processing queries with additional containers leads to lower latency. However, resource contention may occur if multiple users run large queries simultaneously. | Tez Default: 1.4; Hive Default: 1.7 |
tez.am.grouping.min-size | Specifies the lower bound of the size of the primary input to each task when the Tez Application Master determines the parallelism of primary input reading tasks. This configuration property prevents input tasks from being too small, which prevents their parallelism from being too large. | 16777216 |
tez.am.grouping.max-size | Specifies the upper bound of the size of the primary input to each task when the Tez Application Master determines the parallelism of primary input reading tasks. This configuration property prevents input tasks from being too large, which prevents their parallelism from being too small. | 1073741824 |
tez.am.container.reuse.enabled | A container is the unit of resource allocation in YARN. This configuration parameter determines whether Tez will reuse the same container to run multiple tasks. Enabling this parameter improves performance by avoiding the memory overhead of reallocating container resources for every task. However, disable this parameter if the tasks contain memory leaks or use static variables. | true |
tez.am.container.reuse.rack-fallback.enabled | Specifies whether to reuse containers for rack-local tasks. This configuration parameter is ignored unless tez.am.container.reuse.enabled is enabled. | true |
tez.am.container.reuse.non-local-fallback.enabled | Specifies whether to reuse containers for non-local tasks. This configuration parameter is ignored unless tez.am.container.reuse.enabled is enabled. | true |
tez.am.container.session.delay-allocation-millis | Determines when a Tez session releases its containers while not actively servicing a query. Specify a value of -1 to never release an idle container in a session. However, containers may still be released if they do not meet resource or locality needs. This configuration parameter is ignored unless tez.am.container.reuse.enabled is enabled. | 10000 (10 seconds) |
tez.am.container.reuse.locality.delay-allocation-millis | The amount of time to wait in milliseconds before assigning a container to the next level of locality. The three levels of locality in ascending order are NODE, RACK, and NON_LOCAL. | 250 |
tez.task.get-task.sleep.interval-ms.max | Determines the maximum amount of time in milliseconds a container agent waits before asking the Tez Application Master for another task. Tez runs an agent on a container in order to remotely launch tasks. A low value may overload the Application Master. | 200 |
tez.session.client.timeout.secs | Specifies the amount of time in seconds to wait for the Application Master to start when trying to submit a DAG from the client in session mode. | 180 |
tez.session.am.dag.submit.timeout.secs | Specifies the amount of time in seconds that the Tez Application Master waits for a DAG to be submitted before shutting down. The value of this property is used when the Tez Application Master is running in session mode, which allows multiple DAGs to be submitted for execution. The idle time between DAG submissions should not exceed this time. | 300 |
tez.runtime.intermediate-output.should-compress | Specifies whether Tez should compress intermediate output. | false |
tez.runtime.intermediate-output.compress.codec | Specifies the codec to use when compressing intermediate output. This configuration is ignored unless tez.runtime.intermediate-output.should-compress is enabled. | org.apache.hadoop.io.compress.SnappyCodec |
tez.runtime.intermediate-input.is-compressed | Specifies whether intermediate input is compressed. | false |
tez.runtime.intermediate-input.compress.codec | Specifies the codec to use when reading intermediate compressed input. This configuration property is ignored unless tez.runtime.intermediate-input.is-compressed is enabled. | org.apache.hadoop.io.compress.SnappyCodec |
tez.yarn.ats.enabled | Specifies that Tez should start the TimelineClient for sending information to the Timeline Server. | false |
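
The linear scaling described for tez.am.shuffle-vertex-manager.min-src-fraction and tez.am.shuffle-vertex-manager.max-src-fraction can be illustrated with a short sketch. This is not the Tez implementation; the function name `schedulable_consumers` and the rounding behavior are assumptions made for illustration, and only the default fractions (0.2 and 0.4) come from the table above.

```python
def schedulable_consumers(completed_fraction, total_consumers,
                          min_fraction=0.2, max_fraction=0.4):
    """Estimate how many consumer tasks are eligible for scheduling.

    Hypothetical helper: models the table's description of a linear
    ramp between min-src-fraction and max-src-fraction. Real Tez
    behavior may differ in rounding and edge cases.
    """
    # Below the minimum fraction, no consumer tasks are scheduled yet.
    if completed_fraction < min_fraction:
        return 0
    # At or above the maximum fraction, all consumers may be scheduled.
    if completed_fraction >= max_fraction or max_fraction <= min_fraction:
        return total_consumers
    # In between, scale linearly with producer progress.
    progress = (completed_fraction - min_fraction) / (max_fraction - min_fraction)
    # Rounding to the nearest task is an assumption of this sketch.
    return round(progress * total_consumers)


if __name__ == "__main__":
    for done in (0.1, 0.2, 0.3, 0.4, 0.9):
        print(done, schedulable_consumers(done, total_consumers=100))
```

With the defaults, no consumers are eligible until 20% of producers finish, about half are eligible at 30%, and all are eligible from 40% onward. Raising both fractions delays consumer startup and reduces the time consumers sit idle waiting for producer output; lowering them overlaps more shuffle I/O at the cost of holding containers earlier.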