spark
wshzd
Machine learning, deep learning, NLP, reinforcement learning
Spark Python machine learning
http://blog.csdn.net/u013719780?viewmode=contents · Reposted 2017-06-29 10:34:33 · 275 views · 0 comments
IPython/Jupyter SQL Magic Functions for PySpark
Topic: this post discusses using IPython custom magic functions for running SQL with PySpark in Jupyter notebooks. If you are already familiar with Apache Spark and Jupyter notebooks, you may want to go directly to the link. Translated 2017-07-07 13:45:32 · 858 views · 0 comments
Python: increasing the spark.driver.maxResultSize parameter
Spark's default spark.driver.maxResultSize is 1g, so running a Spark job can sometimes fail with: ERROR TaskSetManager: Total size of serialized results of 8113 tasks (1131.0 MB) is bigger than spark.driver.maxResultSize (1024.0 MB). Original 2017-07-28 12:51:28 · 3219 views · 0 comments
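The fix the post title points at is raising the limit before the job starts. A minimal config sketch (the 4g value and the job file name are illustrative, assuming the driver has enough memory to hold the collected results):

```
# spark-defaults.conf
spark.driver.maxResultSize  4g

# or per job on the command line (my_job.py is a hypothetical script name)
spark-submit --conf spark.driver.maxResultSize=4g my_job.py
```

Setting the value to 0 disables the limit entirely, but that risks out-of-memory errors on the driver, so a bounded increase is usually safer.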
An anomaly when creating temporary views in Spark
A Spark DataFrame (named sparkdataframe) was registered as a temporary view via createGlobalTempView(spark_view), then queried with %%sql -o -q spark_sql select * from spark_view. A final check showed that sparkdataframe and spark_sql contain different amounts of data. Has anyone else run into this? Original 2017-07-28 18:43:56 · 2282 views · 0 comments
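One possible cause (an assumption, not confirmed in the post): a view created with createGlobalTempView is registered in Spark's reserved global_temp database, so an unqualified SELECT may resolve to a different session-local view with the same name rather than the global one, which would explain the mismatch. The query would then need to be:

```sql
-- A global temp view lives in the global_temp database,
-- so it must be qualified when queried:
SELECT * FROM global_temp.spark_view
```

If cross-session sharing is not needed, createOrReplaceTempView avoids the qualification entirely.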
Implementing TF-IDF in Spark
```scala
package xxx
import org.apache.log4j.Logger
import org.apache.log4j.Level
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
object TopicExtracti...
```
Reposted 2019-07-11 14:22:16 · 1003 views · 0 comments
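The Scala snippet above is truncated in the listing. As a language-neutral illustration of what the HashingTF + IDF pipeline computes, here is a minimal pure-Python sketch using Spark ML's IDF formula, idf(t) = log((numDocs + 1) / (docFreq(t) + 1)), without the hashing step; the function name and sample documents are my own:

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF over tokenized documents, using Spark ML's IDF formula:
    idf(t) = log((numDocs + 1) / (docFreq(t) + 1))."""
    num_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # each document counts a term at most once
    idf = {t: math.log((num_docs + 1) / (doc_freq[t] + 1)) for t in doc_freq}
    # term frequency is the raw per-document count, as with HashingTF
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]

docs = [["spark", "ml", "spark"], ["ml", "tfidf"]]
weights = tf_idf(docs)
```

Note that a term appearing in every document ("ml" here) gets idf = log(1) = 0, so its weight vanishes; that is the intended down-weighting of uninformative terms.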
Spark: getting the index of the maximum value in a DataFrame vector column
```scala
package com.xxx
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SQLContext
object spark_vector_argmax {
  def main(arg: Array[String]): ...
```
Original 2019-07-18 13:08:33 · 4533 views · 0 comments
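This snippet is also truncated. Spark has no built-in argmax over a vector column (before vector_to_array arrived in 3.0), so a common approach is a small UDF wrapping per-row logic like this pure-Python sketch (function name is my own):

```python
def argmax(vector):
    """Index of the largest element; ties resolve to the first occurrence."""
    best = 0
    for i, v in enumerate(vector):
        if v > vector[best]:
            best = i
    return best

# e.g. picking the predicted class from a classifier's probability vector:
# argmax([0.1, 0.7, 0.2]) -> 1
```

In a real job this would be registered as a UDF and applied to the vector column, with the returned integer stored as a new prediction-index column.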