Python pandas DataFrame: in an array column, if the first item contains a specific string, remove that item from the array

weixin_39640849

I have a DataFrame with a column like the one below, whose cells hold arrays of different sizes:

column
["a_id","b","c","d"]
["d_ID","e","f"]
["h","i","j","k","l"]
["id_m","n","o","p"]
["ID_q","r","s"]

I want to remove the first item from each row's array if that item contains "ID" or "id". The expected output is:

column
["b","c","d"]
["e","f"]
["h","i","j","k","l"]
["n","o","p"]
["r","s"]

How can I check for this in a DataFrame column that contains arrays?
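The question does not show how the frame was built. As a minimal sketch for reproducing the sample data, the block below assumes each cell is a plain Python list of strings; the variable name `df` matches the answer, but the construction itself is an assumption.

```python
import pandas as pd

# Assumption: every cell of "column" is a Python list of strings.
df = pd.DataFrame({
    "column": [
        ["a_id", "b", "c", "d"],
        ["d_ID", "e", "f"],
        ["h", "i", "j", "k", "l"],
        ["id_m", "n", "o", "p"],
        ["ID_q", "r", "s"],
    ]
})

print(df)
```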

Source: https://stackoverflow.com/questions/47173057

Updated: 2019-09-28 12:55

Best answer

Use str[0] to select the first value of each list, then use str.contains to check whether it contains "ID", ignoring case:

m = df['column'].str[0].str.contains('ID', case=False)
print(m)

0     True
1     True
2    False
3     True
4     True
Name: column, dtype: bool

Then use mask to replace the flagged rows with str[1:], which drops the first item:

df['column'] = df['column'].mask(m, df['column'].str[1:])
print(df)

            column
0        [b, c, d]
1           [e, f]
2  [h, i, j, k, l]
3        [n, o, p]
4           [r, s]
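For convenience, the two steps of the answer can be put together into one runnable sketch. The sample frame is rebuilt here under the assumption that the cells are Python lists of strings; the names `first`, `m`, and `result` are illustrative only.

```python
import pandas as pd

# Assumed sample data: each cell is a Python list of strings.
df = pd.DataFrame({
    "column": [
        ["a_id", "b", "c", "d"],
        ["d_ID", "e", "f"],
        ["h", "i", "j", "k", "l"],
        ["id_m", "n", "o", "p"],
        ["ID_q", "r", "s"],
    ]
})

# Step 1: take the first element of each list.
first = df["column"].str[0]

# Step 2: flag rows whose first element contains "ID", ignoring case.
m = first.str.contains("ID", case=False)

# Step 3: where the flag is True, swap in the list without its first element.
result = df["column"].mask(m, df["column"].str[1:])

print(result)
```

Note that `str.contains` matches the substring anywhere in the first item, so a value such as "video" would also be flagged; whether that is acceptable depends on the data.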

2017-11-08
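If the column might also hold empty lists or missing values, the vectorised version above may not behave as expected, because `str[0]` yields NaN for those rows. One possible alternative, which is not part of the original answer, is a small plain-Python helper applied with `Series.map`. The helper name `drop_leading_id` and the extra sample rows are hypothetical.

```python
import pandas as pd


def drop_leading_id(items):
    """Return items without its first element if that element contains 'id'
    (case-insensitive); otherwise return items unchanged."""
    if (
        isinstance(items, list)
        and items
        and isinstance(items[0], str)
        and "id" in items[0].lower()
    ):
        return items[1:]
    return items


# Hypothetical data, including an empty list and a missing value.
df = pd.DataFrame({
    "column": [
        ["a_id", "b", "c", "d"],
        ["h", "i", "j"],
        [],
        None,
        ["ID_q", "r", "s"],
    ]
})

cleaned = df["column"].map(drop_leading_id)
print(cleaned)
```

This trades the vectorised string methods for an explicit per-row function, which is slower on large frames but makes the handling of edge cases explicit.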
