A Collection of Bugs Encountered While Using Python

  1. Pandas ValueError: setting an array element with a sequence
    The original plan was to process a DataFrame row by row with np.vectorize() and return several new columns, which raised the error ValueError: setting an array element with a sequence.
import numpy as np
import pandas as pd

def test():
    arr = np.random.randn(4, 4)
    cols = ['a', 'b', 'c']
    df = pd.DataFrame(data=arr, columns=['e', 'f', 'g', 'h'])

    def func(a, b, c):
        output1 = a + 1
        output2 = b * 2
        output3 = c - 4
        # Returning a Series makes each vectorized output element a
        # sequence, which cannot be placed into a single array cell.
        return pd.Series([output1, output2, output3])

    vfunc = np.vectorize(func)
    # Raises: ValueError: setting an array element with a sequence.
    df[cols] = vfunc(df['e'], df['f'], df['g'])
    print(df)

test()

The error is caused by a shape mismatch between the assignment target df[cols] and what vfunc returns: the result and the target DataFrame slice have incompatible shapes. The fix is to use apply with result_type="expand", which expands each returned value into its own column of the resulting DataFrame. In apply(func), the number of values func returns must match the number of columns in df[cols].

import numpy as np
import pandas as pd

def test():
    arr = np.random.randn(4, 4)
    cols = ['a', 'b', 'c']
    df = pd.DataFrame(data=arr, columns=['e', 'f', 'g', 'h'])

    def func(row):
        a, b, c = row['e'], row['f'], row['g']
        output1 = a + 1
        output2 = b * 2
        output3 = c - 4
        return output1, output2, output3

    # result_type="expand" turns the returned tuple into columns.
    df[cols] = df.apply(func, axis=1, result_type="expand")
    print(df)

test()

Output

          e         f         g         h         a         b         c
0  0.493280 -0.092513 -3.014135 -0.361842  1.493280 -0.185027 -7.014135
1  0.300695 -0.745392  0.591653 -1.752471  1.300695 -1.490785 -3.408347
2 -0.033944 -1.556307 -0.359979  1.808213  0.966056 -3.112615 -4.359979
3  0.701741 -0.272337  0.041114  0.150049  1.701741 -0.544674 -3.958886

For a single column, the two indexing styles

df['id']

ID = ['id']
df[ID]

give different results: the former is a 1-D Series, e.g. [1, 2, 3, 4], while the latter is a one-column DataFrame, e.g. [[1], [2], [3], [4]].
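A minimal, runnable sketch of this distinction (the 'id' column and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical data: a single 'id' column, invented for this example.
df = pd.DataFrame({'id': [1, 2, 3, 4]})

s = df['id']   # string key -> a 1-D Series
ID = ['id']
d = df[ID]     # list key -> a one-column DataFrame

print(type(s).__name__)  # Series
print(type(d).__name__)  # DataFrame
```

This matters in assignments like df[cols] above: a list of column names always selects (or assigns) a 2-D slice, even when the list holds a single name.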

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

  2. Spark errors
    A pitfall hit while configuring Jupyter + Spark:
    Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

This happens because the local Python environment and the Python version used by the Spark cluster differ; both must be changed to point at the same Python version.

Solution:
(1) Check the local Jupyter kernel versions
jupyter kernelspec list

Also check the Python version from inside the Jupyter UI. Note that a kernel displayed as "Python 2" may actually be Python 3 (a real trap):
import sys
print(sys.version)

(2) Set the environment variables

# Python environment used by PySpark
export PYSPARK_PYTHON=/usr/bin/python
export PYSPARK_DRIVER_PYTHON=jupyter

export PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --ip 1.1.1.1 --port 8088 --log-level 10 py_spark"

(3) Restart Jupyter
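To confirm that driver and workers will really resolve to the same interpreter, a small stdlib-only check can run in the notebook first (a sketch: PYSPARK_PYTHON is whatever you exported above, and sys.executable is the driver's own Python):

```python
import os
import sys

# The driver's "major.minor" Python version; workers must match it.
driver_version = "{}.{}".format(*sys.version_info[:2])

# Which interpreter the workers would use; falls back to the driver's
# own interpreter when PYSPARK_PYTHON is unset.
worker_python = os.environ.get("PYSPARK_PYTHON", sys.executable)

print("driver Python:", driver_version)
print("worker interpreter:", worker_python)
```

If the printed worker interpreter is a different major.minor version than the driver, the exception above will appear as soon as a job runs.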

Update 2022-05-25

1. Installing PySpark on Linux
java8+python3.6+spark-3.2.0-bin-hadoop3.2

Reference links:
https://blog.csdn.net/js010111/article/details/122755433
https://blog.csdn.net/qq_42363032/article/details/115098416

Configure the environment variables (edit with vi ~/.bashrc, then apply with source ~/.bashrc):

export JAVA_HOME=/root/tools/jdk1.8.0_321
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib/
export SPARK_PYTHON=/usr/bin/python3
export SPARK_HOME=/root/tools/spark-3.2.0-bin-hadoop3.2

export PYTHON_HOME=/root/xx/Python-3.6.8
export PATH=$PYTHON_HOME/bin:$PATH
export SPARK_PYTHON=$PYTHON3_PATH

Jupyter configuration:
c.NotebookApp.allow_remote_access = True
c.NotebookApp.ip = "*"
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8888
c.NotebookApp.notebook_dir = "/root/workspace"
c.NotebookApp.allow_root = True
c.NotebookApp.token = 'DEEPlearning+688'

2. The SPARK_HOME env variable is set but Jupyter Notebook doesn't see it.
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"

import findspark
findspark.init('C:/spark')  # pass your actual SPARK_HOME path here

findspark.init() also puts Spark's bundled py4j (e.g. py4j-0.10.9.2-src.zip) on sys.path.

https://stackoverflow.com/questions/31841509/pyspark-exception-java-gateway-process-exited-before-sending-the-driver-its-po
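An equivalent workaround is to set the variable from inside the notebook itself (a sketch: "local[2]", i.e. two local cores, is only an example master, and the assignment must happen before pyspark is imported):

```python
import os

# Must be set before any pyspark import so the launcher sees it;
# "local[2]" (two local cores) is only an example master.
os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[2] pyspark-shell"

print(os.environ["PYSPARK_SUBMIT_ARGS"])
```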

3. Java gateway process exited before sending its port number
import os
os.environ['JAVA_HOME'] = '/root/tools/jdk1.8.0_321'
Fix described here: https://blog.csdn.net/hejp_123/article/details/106784906

Jupyter test:

import os
os.environ['JAVA_HOME'] = '/root/tools/jdk1.8.0_321'

from pyspark.sql import SparkSession
spark = (SparkSession
         .builder
         .appName("my_app_spark")
         .getOrCreate())

spark.sql("select 1").show()

5. Installing MySQL

Uninstalling an existing installation first:
https://blog.csdn.net/weixin_45525272/article/details/107774348

Remove MySQL's data files:
sudo rm -r /var/lib/mysql/

Remove MySQL's config files:
sudo rm -r /etc/mysql/

https://blog.csdn.net/leacock1991/article/details/110406708
https://www.yisu.com/ask/4053.html
