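I had always assumed that Spark was tightly coupled to Hadoop. After reading the documentation recently, I realized that Spark should be regarded as an independent distributed computing framework, so I started trying it on its own (without a Hadoop environment).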
1. Spark installation problem on Windows
I downloaded the "pre-built for Hadoop 2.7" Spark package from the official site (http://spark.apache.org/downloads.html). After extracting it, running bin\pyspark2.cmd failed with the following error:
2018-04-30 23:04:45 ERROR Shell:397 - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformatio