How to extract word combinations of a string from a list in Python...

I have a string like this:

my_string = "Hello, I need to find php, software-engineering, html, security and safety things or even Oracle in your dataset. #C should be another opetion, databases and queries"

And a list like this:

my_list = ['C#', 'Django', 'Software-Engineering', 'C', 'PHP', 'Oracle Cload', 'React', 'Flask', 'IT-Security market', 'Databases and Queries']

I want to extract every entry of my_list that possibly occurs in my_string.

This is what I expect:

['PHP', 'Software-Engineering', 'C', 'Oracle Cload', 'IT-Security market', 'Databases and Queries']

This is what I tried:

import re

try:
    user_inps = re.findall(r'\w+', my_string)
    extracted_inputs = set()
    for user_inp in user_inps:
        if user_inp.lower() in set(map(lambda x: x.lower(), my_list)):
            extracted_inputs.add(user_inp)
except Exception:
    extracted_inputs = set()

But I get this:

['php', 'C']

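Efficiency is also a concern for me. Any help would be greatly appreciated.

Solution

If you want to avoid re, you can do most of this in pure Python. This should stay fast even for lists on the order of a hundred thousand words.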

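Basic plan: clean up the punctuation, tokenize everything, and use sets for the matching. For a small application, you could adjust the tokens in the keyword table to leave out things like matching on "and".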
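The last suggestion, leaving out filler tokens such as "and", might look like the sketch below. STOP_WORDS and phrase_tokens are hypothetical names chosen for illustration, and the stop-word set itself is an assumption to adapt to your own data.

```python
# Assumed, illustrative stop-word set; extend it for your own data.
STOP_WORDS = {"and", "or", "the", "of", "in"}


def phrase_tokens(phrase, stop_words=STOP_WORDS):
    """Split a phrase on spaces and hyphens, lowercase it, drop stop words."""
    tokens = {w.lower() for w in phrase.replace("-", " ").split()}
    return tokens - stop_words


print(sorted(phrase_tokens("Databases and Queries")))  # ['databases', 'queries']
```

With this filter in place, a stray "and" in the input no longer counts as a hit for 'Databases and Queries'.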

my_string = "Hello, I need to find php, software-engineering, html, security and safety things or even Oracle in your dataset. #C should be another opetion, databases and queries"

my_list = ['C#', 'Django', 'Software-Engineering', 'C', 'PHP', 'Oracle Cload', 'React', 'Flask', 'IT-Security market', 'Databases and Queries']

# make a table of token -> phrase
keywords = {}
for word in my_list:
    # split each entry into lowercase tokens
    tokens = {w.lower() for w in word.replace('-',' ').split()}
    for t in tokens:
        keywords[t] = word

# tokenize the string my_string
# note: this is specifically tailored to your input with commas and hyphens,
# you may need to make this more universal
my_string_tokens = {t.lower() for t in my_string.replace(',','').replace('-',' ').split()}

# now you can just intersect the sets, which is much more efficient than nested looping
matches = my_string_tokens & set(keywords.keys())

for match in matches:  # do what you want here...
    print(f'token: {match:20s}-> {keywords[match]}')

This produces (set iteration order may vary between runs):

token: queries             -> Databases and Queries
token: php                 -> PHP
token: oracle              -> Oracle Cload
token: engineering         -> Software-Engineering
token: databases           -> Databases and Queries
token: software            -> Software-Engineering
token: and                 -> Databases and Queries
token: security            -> IT-Security market
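To get from the token matches to a list of phrases like the one the question asks for, one option is to let each token map to a set of phrases (so two entries sharing a token do not overwrite each other) and then collect the unique hits. This is only a sketch: tokenize, build_index and matched_phrases are hypothetical helper names, and the punctuation stripped here is an assumption about typical input.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase, split on whitespace and hyphens, trim sentence punctuation."""
    words = text.lower().replace("-", " ").split()
    return {w.strip(",.;:!?") for w in words} - {""}


def build_index(phrases):
    """Map every token to the set of phrases that contain it."""
    index = defaultdict(set)
    for phrase in phrases:
        for token in tokenize(phrase):
            index[token].add(phrase)
    return index


def matched_phrases(text, phrases):
    """Return the phrases sharing at least one token with text, in list order."""
    index = build_index(phrases)
    found = set()
    for token in tokenize(text) & index.keys():
        found |= index[token]
    return [p for p in phrases if p in found]


my_string = "Hello, I need to find php, software-engineering, html, security and safety things or even Oracle in your dataset. #C should be another opetion, databases and queries"
my_list = ['C#', 'Django', 'Software-Engineering', 'C', 'PHP', 'Oracle Cload', 'React', 'Flask', 'IT-Security market', 'Databases and Queries']

print(matched_phrases(my_string, my_list))
# ['Software-Engineering', 'PHP', 'Oracle Cload', 'IT-Security market', 'Databases and Queries']
```

'C' is missing from the result because "#C" is kept as the single token "#c", which equals neither "c" nor "c#". Whether "#C" should count as 'C' or 'C#' is a separate decision that this sketch does not make.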
