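Python: how to use findall() with a sequence of regular expressions on a pandas DataFrame?

I am using the pandas findall function to extract some patterns. However, I have several regular expressions. How can I run findall with N regular expressions in pandas?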

For example, suppose I want to extract all numbers and all dates from a specific column:

In:

import pandas as pd

dfs = pd.DataFrame(data={'c1': [
    'This dataset 11/12/98 contains 5,000 rows, which were sampled from a 500,000 11/12/12 row dataset spanning the same time period. Throughout these analyses',
    'the number of events you count will be about 100 times smaller than they 11/12/78 actually were, but the 01/12/11 proportions of events will still generally be reflective that larger dataset. In this case, a sample is fine because our purpose is to learn methods of data analysis with Python, not to create 100% accurate recommendations to Watsi.',
]})

dfs

Out:

c1

0 This dataset 11/12/98 contains 5,000 rows, whi...

1 the number of events you count will be about 1...

I tried the following, but it raises an error:

In:

dfs['patterns'] = dfs['c1'].str.findall([r'\d+',r'(\d+/\d+/\d+)']).apply(', '.join)

dfs

Out:

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

in ()

----> 1 dfs['patterns'] = dfs['c1'].str.findall([r'\d+',r'(\d+/\d+/\d+)']).apply(', '.join)

2 dfs

/usr/local/lib/python3.5/site-packages/pandas/core/strings.py in wrapper2(self, pat, flags, **kwargs)

1268

1269 def wrapper2(self, pat, flags=0, **kwargs):

-> 1270 result = f(self._data, pat, flags=flags, **kwargs)

1271 return self._wrap_result(result)

1272

/usr/local/lib/python3.5/site-packages/pandas/core/strings.py in str_findall(arr, pat, flags)

827 extractall : returns DataFrame with one column per capture group

828 """

--> 829 regex = re.compile(pat, flags=flags)

830 return _na_map(regex.findall, arr)

831

/usr/local/Cellar/python3/3.5.2_2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/re.py in compile(pattern, flags)

222 def compile(pattern, flags=0):

223 "Compile a regular expression pattern, returning a pattern object."

--> 224 return _compile(pattern, flags)

225

226 def purge():

/usr/local/Cellar/python3/3.5.2_2/Frameworks/Python.framework/Versions/3.5/lib/python3.5/re.py in _compile(pattern, flags)

279 # internal: compile pattern

280 try:

--> 281 p, loc = _cache[type(pattern), pattern, flags]

282 if loc is None or loc == _locale.setlocale(_locale.LC_CTYPE):

283 return p

TypeError: unhashable type: 'list'
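The traceback ends inside re.compile, which suggests that findall hands its pat argument straight to the re module, and that re expects a single pattern rather than a list. A minimal sketch with the standard library alone (the sample text is made up for illustration, and the exact error message may differ between Python versions):

```python
import re

# Hypothetical sample text, not taken from the question's data.
text = 'sampled on 11/12/98 from 500 rows'

# A single pattern string compiles and matches as expected.
dates = re.findall(r'\d+/\d+/\d+', text)
print(dates)

# A list of patterns is rejected before any matching happens.
try:
    re.compile([r'\d+', r'\d+/\d+/\d+'])
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)
```

So the list has to be turned into one pattern, or each pattern has to be applied separately, before calling findall.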

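So how can I "stack", "nest", or otherwise apply several regular expressions with findall? The output I expect is the matches of every regular expression, separated by commas, in a single column: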

col

0 '11/12/98', '5', '000', '500', '000', '11/12/12'

1 '100', '11/12/78', '01/12/11', '100'

Update

I also tried:

dfs['patterns'] = dfs['c1'].str.map(findall(),[r'\d+',r'(\d+/\d+/\d+)']).apply(', '.join)

dfs

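Solution:

The desired output is still not entirely clear.

But check the following code. It merges both patterns into a single regular expression with alternation (|), with the date pattern listed first so that dates are matched whole rather than split into separate numbers.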

dfs['patterns'] = dfs['c1'].str.findall(r'\d+\/\d+\/\d+|\d+')

# sum() concatenates the per-row lists of matches into one list
print(dfs['patterns'].sum())

['11/12/98', '5', '000', '500', '000', '11/12/12', '100', '11/12/78', '01/12/11', '100']
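To get the per-row, comma-separated column the question asks for, the same idea can be generalized to a list of patterns. The following is a sketch under a few assumptions: the sample strings are shortened versions of the question's data, the names patterns and combined are illustrative, and the patterns are ordered by hand from most to least specific.

```python
import pandas as pd

# Shortened sample text adapted from the question's data.
dfs = pd.DataFrame({'c1': [
    'This dataset 11/12/98 contains 5,000 rows, sampled from a 500,000 11/12/12 row dataset',
    'about 100 times smaller than they 11/12/78 actually were, but the 01/12/11 proportions; not 100% accurate',
]})

# Most specific pattern first: alternation tries branches left to right,
# so a leading r'\d+' would consume the first number of every date.
patterns = [r'\d+/\d+/\d+', r'\d+']

# Wrap each branch in a non-capturing group. A capturing group such as
# r'(\d+/\d+/\d+)' would make findall return the group contents
# (empty strings for plain numbers) instead of the whole match.
combined = '|'.join('(?:{})'.format(p) for p in patterns)

# One list of matches per row, joined into a single string per row.
dfs['patterns'] = dfs['c1'].str.findall(combined).apply(', '.join)

print(dfs['patterns'].tolist())
```

Matches appear in the order they occur in the text, not grouped by pattern. If rows can contain missing values, findall leaves them as NaN and the join step would need to skip those rows.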

Tags: pandas, python-3-x, python

Source: https://codeday.me/bug/20191111/2021739.html
