We have a use case where we need to export data from HDFS to an RDBMS. I saw this example. There, the username and password are stored in the code. Is there any way to hide the password while exporting the data, like the --password-alias option in Sqoop?
Solution
Setting the password
At the command line, as a plaintext Spark config:
spark-submit --conf spark.jdbc.password=test_pass ...
Using an environment variable:
export jdbc_password=test_pass_export
spark-submit --conf spark.jdbc.password=$jdbc_password ...
Using a Spark config properties file:
echo "spark.jdbc.password=test_pass_prop" > credentials.properties
spark-submit --properties-file credentials.properties ...
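To see what Spark gets from that file, the sketch below parses the same key=value line with java.util.Properties. As far as I know, spark-submit reads --properties-file with java.util.Properties (and trims the values), so the same parsing rules apply; treat that as an assumption. The object name PropertiesFileDemo is only for illustration.

```scala
import java.io.StringReader
import java.util.Properties

object PropertiesFileDemo {
  // Parse key=value content, such as the line written by the echo command above.
  def parse(content: String): Properties = {
    val props = new Properties()
    props.load(new StringReader(content))
    props
  }

  def main(args: Array[String]): Unit = {
    val props = parse("spark.jdbc.password=test_pass_prop\n")
    println(props.getProperty("spark.jdbc.password"))
  }
}
```

One consequence of these rules: a backslash is treated as an escape character, so a password such as te\st must be written as te\\st in the file, otherwise it is read as test.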
With base64 encoding to "obfuscate":
echo "spark.jdbc.b64password=$(echo -n test_pass_prop | base64)" > credentials_b64.properties
spark-submit --properties-file credentials_b64.properties ...
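Base64 is only obfuscation, not encryption: anyone who can read the properties file can recover the password with a single decode call. A minimal Scala sketch of the round trip, using only java.util.Base64 (the object name Base64Demo is illustrative):

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

object Base64Demo {
  // Same transformation as `echo -n test_pass_prop | base64`.
  def encode(plain: String): String =
    Base64.getEncoder.encodeToString(plain.getBytes(StandardCharsets.UTF_8))

  // Reverses it; no key or secret is involved.
  def decode(encoded: String): String =
    new String(Base64.getDecoder.decode(encoded), StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit = {
    val encoded = encode("test_pass_prop")
    println(encoded)
    println(decode(encoded))
  }
}
```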
Using the password in code
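import java.util.Base64                  // for base64 decoding
import java.nio.charset.StandardCharsets // for base64 decoding
// `spark` is the active SparkSession (predefined in spark-shell)
val properties = new java.util.Properties()
properties.setProperty("driver", "com.mysql.jdbc.Driver")
properties.setProperty("url", "jdbc:mysql://mysql-host:3306")
properties.setProperty("user", "test_user")
// Decode the base64 value passed via --conf or --properties-file
// (for the plaintext variants, use spark.conf.get("spark.jdbc.password") directly)
val password = new String(Base64.getDecoder.decode(spark.conf.get("spark.jdbc.b64password")), StandardCharsets.UTF_8)
properties.setProperty("password", password)
val models = spark.read.jdbc(properties.getProperty("url"), "ml_models", properties)
// Exporting to the RDBMS takes the same arguments: df.write.jdbc(url, table, properties)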
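If the job should accept either the plaintext key (spark.jdbc.password) or the base64 key (spark.jdbc.b64password), a small helper can pick whichever one is set. This is a hypothetical sketch, not a Spark API: it takes a plain Map[String, String], so inside a Spark application you could pass it spark.conf.getAll.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

object JdbcPassword {
  // Hypothetical helper: the plaintext key wins; otherwise decode the base64 key.
  def resolvePassword(conf: Map[String, String]): String =
    conf.get("spark.jdbc.password")
      .orElse(conf.get("spark.jdbc.b64password").map { encoded =>
        new String(Base64.getDecoder.decode(encoded.trim), StandardCharsets.UTF_8)
      })
      .getOrElse(throw new IllegalArgumentException(
        "Set spark.jdbc.password or spark.jdbc.b64password"))

  def main(args: Array[String]): Unit = {
    val conf = Map("spark.jdbc.b64password" -> "dGVzdF9wYXNzX3Byb3A=")
    println(resolvePassword(conf))
  }
}
```

Failing fast when neither key is present gives a clearer error than the authentication failure the JDBC driver would report later.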
Edit: spark-submit help output for --conf and --properties-file:
--conf PROP=VALUE Arbitrary Spark configuration property.
--properties-file FILE Path to a file from which to load extra properties. If not
specified, this will look for conf/spark-defaults.conf.
The properties-file name is arbitrary.