A record of various problems encountered while learning.

Contents

1. Version conflict when running Wang Zhe's SparrowRecSys (https://github.com/wzhe06/SparrowRecSys) with my own setup

2. Error setting up a Spark environment with Maven

3. Errors compiling Spark

4. Error submitting in Spark Standalone mode

5. Error submitting in Spark YARN mode

6. NLP / PyTorch: torchtext data.Field


1. Version conflict when running Wang Zhe's SparrowRecSys (https://github.com/wzhe06/SparrowRecSys) with my own setup

The run kept failing with: Exception in thread "main" java.lang.IllegalArgumentException: Unsupported class file major version 55. Searching online showed this is a version mismatch: class file major version 55 corresponds to Java 11, while Java 8 produces major version 52. It turned out the Java versions configured in IntelliJ IDEA were inconsistent: the project SDK was set to 11 (File -> Project Structure -> Project Settings -> Project), while IntelliJ IDEA itself was using 8. Setting both to 1.8 solved the problem completely.
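If the project is built with Maven, it also helps to pin the compiler level in pom.xml so the IDE and the build agree. A minimal sketch (these are standard Maven properties, but whether SparrowRecSys already sets them is an assumption):

<properties>
  <!-- keep source and bytecode level at Java 8 to avoid major version 55 classes -->
  <maven.compiler.source>1.8</maven.compiler.source>
  <maven.compiler.target>1.8</maven.compiler.target>
</properties>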

2. Error setting up a Spark environment with Maven

Error:(3, 12) object apache is not a member of package org
import org.apache.spark.SparkConf
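This compiler error generally means the Spark artifacts are not on the project classpath, i.e. the Spark dependency is missing from pom.xml (or has not been downloaded yet). A minimal sketch, assuming Spark 2.3.0 on Scala 2.11 to match the build in the next section; adjust both versions to your environment:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.3.0</version>
</dependency>

After adding it, re-import the Maven project in the IDE so the dependency is actually resolved.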

3. Errors compiling Spark

(1) Spark Project Parent POM ........................... FAILURE

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-clean-plugin:3.0.0:clean (default-clean) on project spark-parent_2.11: Execution default-clean of goal org.apache.maven.plugins:maven-clean-plugin:3.0.0:clean failed: Plugin org.apache.maven.plugins:maven-clean-plugin:3.0.0 or one of its dependencies could not be resolved: Could not transfer artifact org.codehaus.plexus:plexus-component-annotations:jar:1.5.5 from/to central (https://repo.maven.apache.org/maven2): transfer failed for https://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-component-annotations/1.5.5/plexus-component-annotations-1.5.5.jar: Operation timed out (Read failed) -> 

Solution: before compiling, first run from the command line: mvn clean -Dmaven.clean.failOnError=false. The flag lets the build continue even if the clean goal hits errors.

(2) Spark Project Launcher ............................. FAILURE

[ERROR] Failed to execute goal on project spark-launcher_2.11: Could not resolve dependencies for project org.apache.spark:spark-launcher_2.11:jar:2.3.0: Could not find artifact org.apache.hadoop:hadoop-client:jar:2.6.0-cdh5.7.0 in central (https://repo.maven.apache.org/maven2) -> [Help 1]

Solution: the CDH-flavoured hadoop-client artifact lives in Cloudera's repository, not in Maven Central, so add the repository to pom.xml (inside the <repositories> element):

<repository>
  <id>cloudera</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>

4. Error submitting in Spark Standalone mode

Exception: Python in worker has different version 2.7 than that in driver 3.7, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

Solution: this is caused by a Python version mismatch between driver and workers. Configure PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON in conf/spark-env.sh. My machine runs Python 3.7; to find its install path, run which python3.7 and copy the printed path. Mine is /Users/hh/anaconda3/bin/python3.7, so add the following lines to spark-env.sh:

PYSPARK_PYTHON=/Users/hh/anaconda3/bin/python3.7

PYSPARK_DRIVER_PYTHON=/Users/hh/anaconda3/bin/python3.7
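To confirm the setting took effect, a small job like the following can be submitted. This is a minimal sketch (the app name and partition count are arbitrary); it compares the driver's Python version with the versions the workers report:

import sys
from pyspark import SparkConf, SparkContext

# master URL comes from spark-submit; the app name is arbitrary
sc = SparkContext(conf=SparkConf().setAppName("python-version-check"))

driver = sys.version_info[:2]
# each task reports the Python (major, minor) version of the worker it runs in
workers = (sc.parallelize(range(4), 2)
             .map(lambda _: tuple(__import__("sys").version_info[:2]))
             .distinct()
             .collect())
print("driver:", driver, "workers:", workers)
sc.stop()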

21/06/05 16:07:58 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Solution: the warning means the job could not get resources. The master UI showed Memory in use: 7.0 GB Total, 1024.0 MB Used; I was also running a pySpark shell on the same machine, and stopping it let the job run.
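The shell starved the job because, in standalone mode, an application grabs all available cores by default (spark.cores.max is unset), so even an idle pySpark shell can hold every core. If the shell has to stay up, capping it is an option, e.g. pyspark --total-executor-cores 2 (the cap of 2 here is an arbitrary example).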

5. Error submitting in Spark YARN mode

Exception in thread "main" java.lang.Exception: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.

Solution: the official docs say to "Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client side) configuration files for the Hadoop cluster." So set it in conf/spark-env.sh:

HADOOP_CONF_DIR=/Users/hh/app/hadoop-2.6.0-cdh5.15.1/etc/hadoop
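With that in place, a submission such as spark-submit --master yarn --deploy-mode client your_app.py should find the cluster (your_app.py is a placeholder for the actual application).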

6. NLP / PyTorch: torchtext data.Field

Versions: torch 1.10.1, spacy 3.2.1, torchtext 0.11.1

import torch
from torchtext.legacy import data  # torchtext >= 0.9 moved Field into torchtext.legacy

# tokenize with spaCy; 'en' is the old spaCy model shortcut
TEXT = data.Field(tokenize='spacy', tokenizer_language='en')

Error: Can't find model 'en'. It looks like you're trying to load a model from a shortcut, which is obsolete as of spaCy v3.0.

Solution: pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz
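This works because recent torchtext versions map the obsolete 'en' shortcut to the full model name once the model package is installed (behavior I am attributing to torchtext 0.11; worth verifying). A more explicit alternative, assuming the same versions, is to pass the full model name so no shortcut handling is needed:

TEXT = data.Field(tokenize='spacy', tokenizer_language='en_core_web_sm')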
