Transformation operators
The filter operator filters an RDD with a predicate function: it keeps only the elements for which the predicate returns True.
```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "filter-demo")
rdd1 = sc.parallelize([1, 2, 3, 4, 5, 6])

# filter takes a predicate function; keep only the even numbers
rdd_filter = rdd1.filter(lambda x: x % 2 == 0)
print(rdd_filter.collect())  # [2, 4, 6]
```
The distinct operator removes duplicate elements from an RDD.
```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "distinct-demo")
rdd2 = sc.parallelize(['a', 'b', 'c', 'a', 'c', 'd', 'f'])

# distinct removes duplicates (the order of the result is not guaranteed)
rdd_distinct = rdd2.distinct()
print(rdd_distinct.collect())
```