DCT: Discrete Cosine Transform
class pyspark.ml.feature.DCT(inverse=False, inputCol=None, outputCol=None)
A feature transformer that takes the one-dimensional discrete cosine transform of a real vector. No zero padding is performed on the input vector. It returns a real vector of the same length representing the DCT. The returned vector is scaled so that the transform matrix is unitary (also known as the scaled DCT-II).
inverse = Param(parent='undefined', name='inverse', doc='Set transformer to perform inverse DCT, default False.')
How the DCT is computed: https://blog.csdn.net/li_wen01/article/details/72864485
01. Initialization:
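The scaled DCT-II described above can be sketched in plain Python. This is only an illustration of the formula, not Spark's implementation; the helper names `dct_ii_orthonormal` and `idct_ii_orthonormal` are hypothetical and chosen for this example.

```python
import math

def dct_ii_orthonormal(x):
    """Scaled (orthonormal) DCT-II of a real sequence, without zero padding."""
    n = len(x)
    out = []
    for k in range(n):
        # This scale factor makes the transform matrix orthogonal (unitary).
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        total = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i in range(n))
        out.append(scale * total)
    return out

def idct_ii_orthonormal(coeffs):
    """Inverse of dct_ii_orthonormal (the transpose of the same matrix)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            total += scale * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out

coeffs = dct_ii_orthonormal([5.0, 8.0, 6.0])
restored = idct_ii_orthonormal(coeffs)
print([round(c, 4) for c in coeffs])    # [10.9697, -0.7071, -2.0412]
print([round(v, 4) for v in restored])  # [5.0, 8.0, 6.0]
```

The input vector is the same one used in the steps of this walkthrough, so the printed coefficients can be compared with the `resultVec` column that Spark produces.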
from pyspark.sql import SparkSession
# Spark configuration keys are case-sensitive: the key is "spark.driver.host".
spark = SparkSession.builder.config("spark.driver.host","192.168.1.3")\
    .config("spark.ui.showConsoleProgress","false")\
    .appName("DCT").master("local[*]").getOrCreate()
02. Create the data and inspect it
from pyspark.ml.linalg import Vectors
from pyspark.ml.feature import DCT
df1 = spark.createDataFrame([(Vectors.dense([5.0, 8.0, 6.0]),)], ["vec"])
df1.show()
Output:
+-------------+
|          vec|
+-------------+
|[5.0,8.0,6.0]|
+-------------+
03. Apply the forward DCT (the result is a scaled vector)
dct = DCT(inverse=False, inputCol="vec", outputCol="resultVec")
df2 = dct.transform(df1)
df2.show()
df2.head(1)
Output:
+-------------+--------------------+
|          vec|           resultVec|
+-------------+--------------------+
|[5.0,8.0,6.0]|[10.9696551146028...|
+-------------+--------------------+
[Row(vec=DenseVector([5.0, 8.0, 6.0]), resultVec=DenseVector([10.9697, -0.7071, -2.0412]))]
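To see what "the transform matrix is unitary" means for this result, the following sketch builds the 3×3 scaled DCT-II matrix by hand and checks two consequences: the matrix times its transpose is the identity, and the sum of squares of the vector is unchanged by the transform. It is plain Python written for illustration; `dct_matrix` is a hypothetical helper, not part of PySpark.

```python
import math

def dct_matrix(n):
    """Rows of the n x n scaled DCT-II matrix."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

C = dct_matrix(3)
x = [5.0, 8.0, 6.0]
y = [sum(c * v for c, v in zip(row, x)) for row in C]  # forward transform

# Gram matrix C * C^T; for a unitary (orthogonal) matrix it is the identity.
gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in C] for r1 in C]
max_dev = max(abs(gram[i][j] - (1.0 if i == j else 0.0))
              for i in range(3) for j in range(3))

energy_in = sum(v * v for v in x)
energy_out = sum(v * v for v in y)
print(max_dev < 1e-12)                            # True
print(round(energy_in, 6), round(energy_out, 6))  # 125.0 125.0
```

Because the matrix is orthogonal, its inverse is simply its transpose, which is why setting inverse=True can recover the original vector.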
04. Apply the inverse DCT to convert the result back to the original vector
df3 = DCT(inverse=True, inputCol="resultVec", outputCol="origVec").transform(df2)
df3.show()
df3.head(1)
Output:
+-------------+--------------------+-------------+
|          vec|           resultVec|      origVec|
+-------------+--------------------+-------------+
|[5.0,8.0,6.0]|[10.9696551146028...|[5.0,8.0,6.0]|
+-------------+--------------------+-------------+
[Row(vec=DenseVector([5.0, 8.0, 6.0]), resultVec=DenseVector([10.9697, -0.7071, -2.0412]), origVec=DenseVector([5.0, 8.0, 6.0]))]