Merging columns with join and with Python

Latest recommended article published on 2024-06-23 15:22:06

weixin_39759182

Tags: python join

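The earlier article on matching disks and renaming devices with PowerPath on Linux covered how to change the alias of an aggregated multipath device. When adding storage disks to a database RAC, another common task is checking that device names agree across the RAC nodes. This article again uses a three-node RAC with EMC storage as the example. Once the EMC PowerPath software is installed, powermt shows a Logical device ID and an aggregated alias (the pseudo name) for each device. The Logical device ID corresponds to the SCSI_ID and is unique. After a rescan, the same Logical device ID may be given a different alias on each of the three hosts. To make the aliases consistent, you first need to compare the aliases that the three hosts assign to the same ID.

This can be done with the shell join command or with a Python script. Both approaches are described below.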

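1. Comparing before and after adding disks

Process the powermt output captured before and after adding the disks, then compare the two results with diff to extract the newly added devices:

[root@irora13s ~]# powermt display dev=all|grep 'Pseudo\|Logical' |awk '{if(NR%2==0){printf $0 "\n"}else{printf "%s\t",$0}}' > /tmp/powermt_new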

[root@irora13s ~]# cat diskinfo/powermt|grep 'Pseudo\|Logical' |awk '{if(NR%2==0){printf $0 "\n"}else{printf "%s\t",$0}}' > /tmp/powermt_old

[root@irora13s ~]# diff /tmp/powermt_new /tmp/powermt_old

27d26

< Pseudo name=emcpoweraz Logical device ID=0B56

29,37d27

< Pseudo name=emcpowerba Logical device ID=0B5A

< Pseudo name=emcpowerbb Logical device ID=0B5E

< Pseudo name=emcpowerbc Logical device ID=0B62

< Pseudo name=emcpowerbd Logical device ID=0B66

< Pseudo name=emcpowerbe Logical device ID=0B6A

< Pseudo name=emcpowerbf Logical device ID=0B6E

……………… (omitted)

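Only the name and the ID are needed here, so one more pass through awk reduces each line to those two fields. The new devices appear on the lines that diff marks with '<', so those are the lines to keep:

# diff /tmp/powermt_new /tmp/powermt_old |grep '^<' |awk '{print $NF,$3}' > 1.txt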

ID=0B56 name=emcpoweraz

ID=0B5A name=emcpowerba

………… (omitted)
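The same extraction can be sketched in Python. This is a minimal illustration and not part of the original procedure: the function names parse_powermt and added_devices are made up for this example, the input is assumed to contain only the "Pseudo name=" and "Logical device ID=" lines left by the grep above, and the "old" sample device (emcpoweray, 0B52) is hypothetical data.

```python
def parse_powermt(text):
    """Map each Logical device ID to its pseudo device name.

    Assumes every 'Pseudo name=' line is followed by the matching
    'Logical device ID=' line, as in the grep output above.
    """
    mapping = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Pseudo name="):
            name = line.split("=", 1)[1]
        elif line.startswith("Logical device ID=") and name is not None:
            mapping[line.split("=", 1)[1]] = name
            name = None
    return mapping


def added_devices(new, old):
    """Return 'ID=... name=...' rows for IDs that exist only in `new`."""
    return ["ID={} name={}".format(dev_id, new[dev_id])
            for dev_id in sorted(new) if dev_id not in old]


# Hypothetical sample data: one device before, three devices after.
old_text = """Pseudo name=emcpoweray
Logical device ID=0B52
"""
new_text = old_text + """Pseudo name=emcpoweraz
Logical device ID=0B56
Pseudo name=emcpowerba
Logical device ID=0B5A
"""

for row in added_devices(parse_powermt(new_text), parse_powermt(old_text)):
    print(row)
```

Comparing dictionaries keyed by ID avoids the dependence on line order that diff has, which may matter if a rescan reorders the powermt output.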

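2. Merging with join

After host1, host2, and host3 have each been processed as above, you will find that ID 0B56 maps to emcpoweraz on host1, but possibly to emcpowerax on host2 and to emcpoweran on host3. The goal is output like this:

# Device ID host1 host2 host3

ID=0B56 name=emcpoweraz name=emcpowerax name=emcpoweran

This makes the differences between the three hosts visible at a glance. join merges only two files at a time and requires both inputs to be sorted on the join field, so the three files are sorted and chained through a pipe: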

[root@localhost disk]# join -a1 <(sort 3.txt) <(sort 2.txt) | join - <(sort 1.txt)

ID=0A72 name=emcpowerdq name=emcpowerbh name=emcpowerbh

ID=0A76 name=emcpowerdr name=emcpowerdq name=emcpowergb

ID=0A7A name=emcpowerds name=emcpowerdr name=emcpowergc

ID=0A7E name=emcpowerdt name=emcpowerds name=emcpowerds

ID=0A82 name=emcpowerdu name=emcpowerdt name=emcpowergh

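………… (omitted)

The -a1 option makes join also print the lines of 3.txt that have no match in 2.txt. Node 3 is involved in ADG synchronization, which is why 3.txt gets this special handling.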
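For comparison, a three-way merge can also be sketched with dictionaries in Python. This is an illustration under assumptions, not a translation of the join pipeline: load_pairs and merge_hosts are hypothetical names, the merge is a full outer merge (every ID seen on any host is printed, with "-" where a host lacks it), and apart from the 0B56 row taken from the example above the sample rows are made up.

```python
def load_pairs(lines):
    """Build {ID: name} from lines such as 'ID=0B56 name=emcpoweraz'."""
    pairs = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            pairs[fields[0]] = fields[1]
    return pairs


def merge_hosts(*hosts):
    """Yield (id, names, consistent) for every ID seen on any host.

    `names` holds one entry per host, '-' when the host lacks the ID.
    `consistent` is True only if all hosts report the same name.
    """
    for dev_id in sorted(set().union(*hosts)):
        names = [host.get(dev_id, "-") for host in hosts]
        yield dev_id, names, len(set(names)) == 1


# Illustrative data; in practice these would be read from 1.txt, 2.txt, 3.txt.
host1 = load_pairs(["ID=0B56 name=emcpoweraz", "ID=0B5A name=emcpowerba",
                    "ID=0B5E name=emcpowerbb"])
host2 = load_pairs(["ID=0B56 name=emcpowerax", "ID=0B5A name=emcpowerba",
                    "ID=0B5E name=emcpowerbb"])
host3 = load_pairs(["ID=0B56 name=emcpoweran", "ID=0B5E name=emcpowerbb"])

for dev_id, names, consistent in merge_hosts(host1, host2, host3):
    flag = "" if consistent else "  <- differs"
    print("{} {}{}".format(dev_id, " ".join(names), flag))
```

Because the inputs are held in dictionaries, no sorting is needed before merging, and adding a fourth host only means passing one more argument.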

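3. Python implementation

Python offers several ways to write this. None is as concise as join, but they are easier to follow. A few versions are collected below.

Script 1: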

f1 = open('1.txt', 'r').readlines()
f2 = open('2.txt', 'r').readlines()

# Pair every line of 1.txt with every line of 2.txt and keep the pairs
# whose first column (the ID) matches; each result is a formatted string.
joinList = [ "{0[0]} {0[1]} {1[1]}".format(l1.split(), l2.split()) for l1 in f1 for l2 in f2 if l1.split()[0] == l2.split()[0] ]

for joinLine in joinList:
    print(joinLine)

Script 2:

f1 = open('1.txt', 'r').readlines()
f2 = open('2.txt', 'r').readlines()

# Same matching as script 1, but each result is kept as a tuple
# (ID, name from 1.txt, name from 2.txt) and joined only when printed.
joinList = [ (l1.split()[0], l1.split()[1], l2.split()[1]) for l1 in f1 for l2 in f2 if l1.split()[0] == l2.split()[0] ]

for joinLine in joinList:
    print(' '.join(joinLine))

Script 3:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = '361way.com'
__author_site__ = 'www.361way.com'

import sys
import getopt

input_file1 = ""
input_file2 = ""

try:
    opts, args = getopt.getopt(sys.argv[1:], "h", ["input1=", "input2="])
except getopt.GetoptError as err:
    print(str(err))
    sys.exit(2)  # opts is undefined after a parse error, so stop here

for op, value in opts:
    if op == "--input1":
        input_file1 = value
    elif op == "--input2":
        input_file2 = value
    elif op == "-h":
        print("python get_value_according_first_column.py --input1 dat1 --input2 dat2 > out.txt")
        sys.exit()

# Everything above only parses the command-line arguments and can be skimmed.

f1 = open(input_file1, 'r')
f2 = open(input_file2, 'r')

lines1 = f1.readlines()  # read the whole file into a list; each element is one line (a string), e.g. lines1[0] is the first line
lines2 = f2.readlines()  # same for file 2; lines2[0][0] is the first character of its first line

for line1 in lines1:  # walk through every line of file 1
    line1 = line1.strip().split()  # strip the trailing newline, then split the line into a list of columns
    for line2 in lines2:  # walk through every line of file 2 in the same way
        line2 = line2.strip().split()
        # line1[0] (note: line1, not lines1) is the first column of the current line;
        # check whether it equals the first column of this line of file 2
        if line1 and line2 and line1[0] == line2[0]:
            line1.extend(line2[1:])  # if so, append the remaining columns of the file 2 line to the file 1 line
    print(' '.join(line1))  # turn the list back into a string and print it

f1.close()  # close the files
f2.close()

Usage:

python test.py --input1 dat1.txt --input2 dat2.txt > 2.out.txt

Script 4:

f1 = open('/tmp/disk/1.txt')
f2 = open('/tmp/disk/2.txt')
f3 = open('/tmp/disk/3.txt', 'w')

lines1 = f1.readlines()
lines2 = f2.readlines()

for line1 in lines1:
    k = line1.split()[0]  # the ID in the first column of 1.txt
    for line2 in lines2:
        if line2.split()[0] == k:
            # Files opened in text mode use '\n' as the line ending on every
            # platform, so strip and write '\n' rather than os.linesep.
            line3 = line1.rstrip('\n') + " " + line2.split()[1] + '\n'
            f3.write(line3)

f1.close()
f2.close()
f3.close()

This script writes the result of matching 1.txt against 2.txt to 3.txt.
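All four scripts compare every line of one file with every line of the other, which is fine for a few dozen disks but grows quadratically with file size. A possible alternative, sketched below, indexes the second file by its first column and then looks each ID up once. The function name hash_join and the sample lines are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def hash_join(lines1, lines2):
    """Inner-join two lists of text lines on their first column.

    Returns one joined string per matching pair, in the order of lines1.
    Blank lines are skipped.
    """
    # Index the second input: first column -> list of remaining columns.
    index = defaultdict(list)
    for line in lines2:
        fields = line.split()
        if fields:
            index[fields[0]].append(fields[1:])

    joined = []
    for line in lines1:
        fields = line.split()
        if not fields:
            continue
        # .get() avoids creating empty entries for unmatched keys.
        for rest in index.get(fields[0], []):
            joined.append(" ".join(fields + rest))
    return joined


# Hypothetical sample lines, shaped like 1.txt and 2.txt.
lines1 = ["ID=0B56 name=emcpoweraz\n", "ID=0B5A name=emcpowerba\n"]
lines2 = ["ID=0B5A name=emcpowerbb\n", "ID=0B56 name=emcpowerax\n", "\n"]

for row in hash_join(lines1, lines2):
    print(row)
```

Unlike join, this version does not need sorted input, and unmatched lines are simply dropped, which matches the behavior of scripts 1, 2, and 4.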
