ElementwiseProduct
class pyspark.ml.feature.ElementwiseProduct(scalingVec=None, inputCol=None, outputCol=None)
Outputs the Hadamard product (i.e., the elementwise product) of each input vector with a provided "weight" vector. In other words, it scales each column of the dataset by a scalar multiplier.
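Before going through the Spark API, the Hadamard product itself can be sketched in a few lines of plain Python (the `hadamard` helper is illustrative, not part of PySpark):

```python
# Hadamard (elementwise) product: result[i] = a[i] * b[i]
def hadamard(a, b):
    assert len(a) == len(b), "vectors must have the same length"
    return [x * y for x, y in zip(a, b)]

print(hadamard([2.0, 1.0, 3.0], [9.0, 8.0, 7.0]))  # [18.0, 8.0, 21.0]
```

This is exactly the computation ElementwiseProduct applies to every row of the input column.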
01. Initialization
from pyspark.sql import SparkSession
spark = SparkSession.builder.config("spark.driver.host", "192.168.1.3")\
    .config("spark.ui.showConsoleProgress", "false")\
    .appName("ElementwiseProduct").master("local[*]").getOrCreate()
02. Create the data
from pyspark.ml.linalg import Vectors
df = spark.createDataFrame([(Vectors.dense([2.0, 1.0, 3.0]),)], ["values"])
df.show()
Output:
+-------------+
| values|
+-------------+
|[2.0,1.0,3.0]|
+-------------+
03. Scale by a weight vector
from pyspark.ml.feature import ElementwiseProduct
elementwiseProduct = ElementwiseProduct(inputCol="values", scalingVec=Vectors.dense([9.0, 8.0, 7.0]), outputCol="res")
elementwiseProduct.transform(df).show()
Output:
+-------------+---------------+
| values| res|
+-------------+---------------+
|[2.0,1.0,3.0]|[18.0,8.0,21.0]|
+-------------+---------------+
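Since the same weight index is applied to every row, the transform can also be read as column scaling: column j of the whole dataset is multiplied by weights[j]. A minimal sketch over two rows (the second row is made up for illustration):

```python
# Column-scaling view of ElementwiseProduct: each column j is
# multiplied by weights[j] across all rows.
rows = [[2.0, 1.0, 3.0],
        [4.0, 5.0, 6.0]]
weights = [9.0, 8.0, 7.0]
scaled = [[x * w for x, w in zip(row, weights)] for row in rows]
print(scaled)  # [[18.0, 8.0, 21.0], [36.0, 40.0, 42.0]]
```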
04. Scale by a different weight vector
elementwiseProduct2 = ElementwiseProduct(inputCol="values", scalingVec=Vectors.dense([9.0, 2.0, 3.0]), outputCol="res")
elementwiseProduct2.transform(df).show()
Output:
+-------------+--------------+
| values| res|
+-------------+--------------+
|[2.0,1.0,3.0]|[18.0,2.0,9.0]|
+-------------+--------------+
05. Reset the parameter and transform again
setParams updates the transformer in place, so the next transform uses the new weights:
elementwiseProduct2.setParams(scalingVec=Vectors.dense([1.0, 2.0, 3.0]))
elementwiseProduct2.transform(df).show()
Output:
+-------------+-------------+
| values| res|
+-------------+-------------+
|[2.0,1.0,3.0]|[2.0,2.0,9.0]|
+-------------+-------------+