Spark
栗子ma
【Spark】Create DataFrame from string content
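The following article is reposted from: https://stackoverflow.com/questions/44028677/how-to-create-a-dataframe-from-a-string [If this raises any infringement concern, please contact me right away and I will delete the article.] val s: String = """col1 col2 col3 col4 col5 col6 col7 col8 |val1 val...
Reposted 2018-06-05 10:55:05 · 201 views · 0 comments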
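【Spark MLlib】How to map massive numbers of strings to numbers: StringIndexer & IndexToString
[Preface] While using the ALS collaborative filtering API in Spark MLlib, I found that of Rating's three parameters (user ID, product name, product rating), the first two must be Int values. This raises a problem: when the user IDs and product names are Strings, we need a way to map a very large number of Strings to numeric values. Fortunately, Spark MLlib has an answer for this. StringIndexer encodes a string column of ...
Translated 2018-06-05 11:31:29 · 3984 views · 0 comments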
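As a rough illustration of the mapping this entry describes, the sketch below indexes a string user column with StringIndexer, casts the resulting Double index to Int (the type the preface says Rating needs), and maps it back with IndexToString. The sample data, the column names, and the helper `withIntId` are hypothetical and do not come from the article; the code assumes spark-sql and spark-mllib are on the classpath.

```scala
import org.apache.spark.ml.feature.{IndexToString, StringIndexer}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object StringIndexerSketch {
  // Hypothetical helper: adds "<name>Index" (Double, produced by StringIndexer)
  // and "<name>Id" (the same index cast to Int) for the string column `name`.
  def withIntId(df: DataFrame, name: String): DataFrame = {
    val indexed = new StringIndexer()
      .setInputCol(name)
      .setOutputCol(s"${name}Index")
      .fit(df)          // learns the label-to-index mapping from the data
      .transform(df)
    indexed.withColumn(s"${name}Id", col(s"${name}Index").cast("int"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("StringIndexerSketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up ratings: alice appears 3 times, bob twice, carol once.
    val ratings = Seq(
      ("alice", "apple", 5.0), ("alice", "pear", 3.0), ("alice", "plum", 4.0),
      ("bob", "apple", 2.0), ("bob", "pear", 4.0),
      ("carol", "apple", 1.0)
    ).toDF("user", "item", "rating")

    // By default the most frequent label receives index 0.
    val withUserId = withIntId(ratings, "user")

    // IndexToString reads the labels from the metadata of the indexed column.
    val restored = new IndexToString()
      .setInputCol("userIndex")
      .setOutputCol("userRestored")
      .transform(withUserId)

    restored.select("user", "userIndex", "userId", "userRestored").show()
    spark.stop()
  }
}
```

The same helper could be applied to the item column before building ratings for ALS; whether the original article does it this way is not visible from the excerpt.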
【Spark】Training with Implicit Preference (Recommendation)
Training with Implicit Preference (Recommendation) There are two types of user preferences: explicit preference (also referred to as "explicit feedback"), such as "rating" given to i...
Translated 2018-06-05 15:00:51 · 2564 views · 0 comments
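【Spark】Extracting, Transforming, and Selecting Features: Spark Machine Learning
Extracting, transforming and selecting features - spark.ml
This section covers algorithms for working with features, roughly divided into:
Extraction: extracting features from raw data
Transformation: scaling, converting, or modifying features
Selection: selecting a subset from a larger set of features
This section covers algorithms for working with features, roughly divided into th...
Translated 2018-06-06 01:01:31 · 1050 views · 0 comments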
【Spark】TF-IDF
TF-IDF
Term frequency-inverse document frequency (TF-IDF) is a feature vectorization method widely used in text mining to reflect the importance of a term to a document in the corpus. Denote a term b...
Translated 2018-06-06 02:09:58 · 376 views · 0 comments
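For reference, the measure this excerpt introduces can be written out as below. The excerpt is cut off before it fixes its notation, so the symbols here are an assumption taken from the Spark MLlib feature extraction guide rather than from the visible text: $t$ is a term, $d$ a document, $D$ the corpus, $TF(t, d)$ the number of times $t$ appears in $d$, and $DF(t, D)$ the number of documents that contain $t$.

```latex
\begin{align}
  IDF(t, D) &= \log \frac{|D| + 1}{DF(t, D) + 1} \\
  TFIDF(t, d, D) &= TF(t, d) \cdot IDF(t, D)
\end{align}
```

With a corpus of three documents, a term that appears in all three gets $IDF = \log(4/4) = 0$, while a term that appears in only one gets $IDF = \log(4/2) = \log 2 \approx 0.693$ (natural logarithm), so terms that are common across the corpus are weighted down.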