How to extract word combinations of a string from a list in Python

Article link: https://blog.csdn.net/weixin_29690065/article/details/113520172

I have a string like this:

my_string = "Hello, I need to find php, software-engineering, html, security and safety things or even Oracle in your dataset. #C should be another opetion, databases and queries"

And a list like this:

my_list = ['C#', 'Django', 'Software-Engineering', 'C', 'PHP', 'Oracle Cload', 'React', 'Flask', 'IT-Security market', 'Databases and Queries']

I want to extract every entry of my_list that could match a word in my_string.

This is what I expect:

['PHP', 'Software-Engineering', 'C', 'Oracle Cload', 'IT-Security market', 'Databases and Queries']

This is what I tried:

import re

try:
    user_inps = re.findall(r'\w+', my_string)
    extracted_inputs = set()
    for user_inp in user_inps:
        if user_inp.lower() in set(map(lambda x: x.lower(), my_list)):
            extracted_inputs.add(user_inp)
except Exception:
    extracted_inputs = set()

But I get this:

['php', 'C']

Efficiency is also a concern for me. Any help would be greatly appreciated.

Solution

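If you want to avoid re, most of this can be done in pure Python. It should be fast enough for lists on the order of hundreds of thousands of words.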

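Basic plan: strip the punctuation, tokenize everything, and match using sets. For a small application, you could adjust the tokens in the keyword table to leave out things like matching on "and".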

my_string = "Hello, I need to find php, software-engineering, html, security and safety things or even Oracle in your dataset. #C should be another opetion, databases and queries"

my_list = ['C#', 'Django', 'Software-Engineering', 'C', 'PHP', 'Oracle Cload', 'React', 'Flask', 'IT-Security market', 'Databases and Queries']

# make table of tokens : phrases
keywords = {}
for word in my_list:
    # split each word into tokens
    tokens = {w.lower() for w in word.replace('-',' ').split()}
    for t in tokens:
        keywords[t] = word

# tokenize the string my_string
# note: this is specifically tailored to your input with commas and hyphens, you may need to
# make this more universal
my_string_tokens = {t.lower() for t in my_string.replace(',','').replace('-',' ').split()}
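With both token sets in hand, the matching step is essentially a set intersection followed by a lookup back into the keyword table. The following is only a rough sketch of how the whole plan might be wrapped up, not a definitive implementation. The function name extract_phrases and the stop_words parameter are hypothetical, added for illustration. Unlike the table above, this version stores a set of phrases per token, so two phrases sharing a token do not overwrite each other.

```python
def extract_phrases(text, phrases, stop_words=frozenset()):
    """Return every phrase that shares at least one token with text.

    extract_phrases and stop_words are illustrative names, not part of
    the original answer.
    """
    # Table of token -> set of phrases that contain that token.
    keywords = {}
    for phrase in phrases:
        for token in phrase.lower().replace('-', ' ').split():
            if token not in stop_words:
                keywords.setdefault(token, set()).add(phrase)

    # Same tokenization as above: drop commas, split on hyphens, lowercase.
    text_tokens = {t.lower() for t in text.replace(',', '').replace('-', ' ').split()}

    # Intersect the two token sets, then map shared tokens back to phrases.
    matched = set()
    for token in keywords.keys() & text_tokens:
        matched |= keywords[token]
    return matched


my_string = ("Hello, I need to find php, software-engineering, html, security "
             "and safety things or even Oracle in your dataset. #C should be "
             "another opetion, databases and queries")
my_list = ['C#', 'Django', 'Software-Engineering', 'C', 'PHP', 'Oracle Cload',
           'React', 'Flask', 'IT-Security market', 'Databases and Queries']

print(sorted(extract_phrases(my_string, my_list)))
# Passing stop_words={'and'} keeps a bare "and" from matching a phrase.
print(sorted(extract_phrases("cats and dogs", my_list, stop_words={'and'})))
```

On the sample input this finds 'PHP', 'Software-Engineering', 'Oracle Cload', 'IT-Security market' and 'Databases and Queries'. It does not find 'C', because '#C' is tokenized as '#c'; as the note in the code above says, the punctuation handling is tailored to commas and hyphens and would need to be made more general to cover that case.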