In Search of a Fantasy Football Name

Looking for a new fantasy football team name? I extracted over 500,000 fantasy football team names to help find your next team name! (Shameless plug: FantasyNameSearch.com)

Background

I am the type of person who likes to build things. I am not the typical creative type, nor am I the most witty or punny person you’ll meet. So in a sense, this project was a match made in heaven: I created an app to fake my way into clever team names. I’ve never been more ashamed and proud of myself at the same time.

Gone are the days of searching through countless blog posts to find a new team name. Through my app, users can search actual fantasy team names based on the players on their own fantasy team or on other keywords. For example, I have Todd Gurley and Saquon Barkley on my fantasy team. Searching for “Gurley” or “Saquon” returns team names such as “A Gurley Has No Name” and “Obi Saquon Kenobi” (among other, more vulgar results), along with thousands more to choose from. I created FantasyNameSearch.com to make it easy for like-minded, wit-deprived people such as myself to find their next team name.

That’s enough soapboxing; let’s get to the meat of the discussion, shall we?

Data Extraction

The data was extracted from the Yahoo Fantasy Sports API using Python. If you’ve ever tried to use it, you know it’s not the most well-documented API, but in the last few years there has been a significant increase in third-party utilities that make it easier to connect and extract data. Python API wrappers such as YFPY have made it more accessible and also help to clean up the messy JSON response objects.

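
To see the kind of cleanup those wrappers handle, here is a toy example of digging a value out of a deeply nested response. The structure below is illustrative only, not Yahoo’s actual schema; walking mixed dicts and lists by hand is exactly the tedium a wrapper hides.

```python
import json

# Toy stand-in for the deeply nested JSON the raw API returns
# (illustrative structure only, not Yahoo's actual schema).
raw = json.loads(
    '{"fantasy_content": {"league": [{"teams": '
    '{"0": {"team": [[{"name": "Obi Saquon Kenobi"}]]}}}]}}'
)

# Seven lookups of alternating type just to reach one team name.
team_name = raw["fantasy_content"]["league"][0]["teams"]["0"]["team"][0][0]["name"]
print(team_name)
```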
As a disclaimer, I am not affiliated with Yahoo. Their terms of service require that all data extracted from their API be acknowledged as such. Carry on!

To get started with minimal effort, the following code will get you a response from the API. You’ll need to sign up for a Yahoo developer account, create an app, and generate access tokens before executing this code.

from yahoo_oauth import OAuth2
import json

oauth = OAuth2(None, None, from_file='./auth/yahoo_api_creds.json')  # access/secret API tokens
if not oauth.token_is_valid():
    oauth.refresh_access_token()

game_key = '390'  # 2019 game key
league_id = '123456'  # whatever league you want to access
url = '...'  # league endpoint URL

A second snippet, the tail of my name-cleaning routine, strips unwanted characters and emoji from each extracted name:

import re
import emoji

def clean_word(word, chars_to_remove):  # signature assumed for illustration
    rx = '[' + re.escape(''.join(chars_to_remove)) + ']'
    word = re.sub(rx, '', word)  # remove the list of chars defined above
    word = re.sub(r"'S", r"'s", word)  # one-off replacement for apostrophe-S
    # drop any token that contains an emoji character
    split_word = [letter for letter in word.split() if not any(i in letter for i in emoji.UNICODE_EMOJI)]
    word = ' '.join(split_word)
    word = word.strip()
    return word

Next we want to remove some bad words and limit the searchable names. I removed any racial slurs or homophobic language that I could, because who really needs to perpetuate that type of ignorance in 2020? I also removed any names shorter than four characters: the search algorithm splits the data into trigrams (sub-strings of 3 characters), and 1–3 character strings trivially produce 100% matches, which leads to lengthy searches. Cleansing code is below:

from tqdm.notebook import tqdm  # for nice progress bars in Jupyter Notebook

def format_words(list_x):
    master_list = []
    bad_words = ['bad_word_1', 'bad_word_2']  # curses, slurs, and other bad words
    for word in tqdm(list_x):
        if any(bad_word in word for bad_word in bad_words):
            continue  # skip names containing a banned word
        if len(word) > 3:  # don't add words of 3 characters or fewer
            master_list.append(word)

    return list(dict.fromkeys(master_list))  # a set, while useful, is not ordered; dict keys preserve order while removing duplicates
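
To see why short names are a problem, here is a sketch of the trigram splitting described above (my own illustration, not the search engine’s actual code). A 3-character name yields exactly one trigram, so almost any query containing those three letters scores a perfect match, while anything shorter yields no trigrams at all.

```python
def trigrams(s):
    # break a string into overlapping 3-character sub-strings
    s = s.lower()
    return [s[i:i + 3] for i in range(len(s) - 2)]

print(trigrams("Gurley"))  # several trigrams to match against
print(trigrams("FC"))      # too short: no trigrams at all
```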

Name Search

Determining the best way to search for a name brought me some real grief. Fuzzy matching is not an easy thing to do, especially on a large corpus of textual data. I first started with the FuzzyWuzzy library. It uses Levenshtein distance to calculate the “closeness” of one string to another. FuzzyWuzzy does a great job of matching with a variety of different methods, including partial string matches and exact phrases, but the downside is that it’s slow. On 500,000 names, the average search time was between 25 and 35 seconds. Imagine if your preferred search engine took that long? You’d throw your device out the window. I know I’ll never rival the speed of Google searches, but I could at least do better than 25 seconds!

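
As a quick illustration of the idea (not FuzzyWuzzy’s actual implementation), Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another:

```python
def levenshtein(a, b):
    # classic dynamic-programming edit distance, computed one row at a time
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("gurley", "girly"))  # small distance -> "close" strings
```

A small distance means a close match; libraries like FuzzyWuzzy normalize this into a 0–100 similarity score.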
Enter rapidfuzz. Because FuzzyWuzzy is released under the GPL-2.0 license, with the limitations that entails (I won’t pretend to know the intricacies), rapidfuzz was developed to keep the search function under the MIT license. It also uses faster C++ code to reduce compute time. Thanks to that speed increase, my searches dropped to an average of 8 seconds, which is much more manageable for my use case.

In order to limit the input length and character set, I first implemented a regex check to verify that the search term contains between 3 and 20 alphanumeric characters, including white space. The code I used is below:

import re
from rapidfuzz import process as rapid_process

search_word = "Thielen"

def bad_char_search(strg, search=re.compile(r'[^A-Za-z0-9 ]').search):
    # valid when the query is 3-20 characters long and contains no disallowed characters
    return 3 <= len(strg) <= 20 and not bool(search(strg))

def generate(search_name):
    # team_data: the cleansed list of names built earlier
    for name in rapid_process.iterExtract(re.escape(search_name), team_data, score_cutoff=75):
        # encoding certain characters gets funky when displaying on HTML
        yield name[0].encode('utf-16', 'surrogatepass').decode('utf-16')

for name in generate(search_word):
    print(name)

I actually developed the website to stream the search results in real time, which is why I created a Python generator and use a for loop to print the results. Constructing it this way allowed me to start loading results into the HTML tables immediately, without waiting for the entire search to run. By the time the search completes, the user already has plenty of names to sift through.

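
The streaming behavior can be sketched in plain Python (a simplified stand-in for the fuzzy search, using substring matching instead of rapidfuzz): because the generator yields each match as it is found, the consumer can display the first result before the full scan finishes.

```python
def search_stream(names, query):
    # yield each matching name as soon as it is found,
    # instead of building the full result list first
    for name in names:
        if query.lower() in name.lower():
            yield name

names = ["A Gurley Has No Name", "Obi Saquon Kenobi", "Gurley Things"]
results = search_stream(names, "gurley")
first = next(results)  # available immediately, before the rest of the scan
print(first)
```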
Coming soon: In-depth name analysis, more years of data, and more sports! Stay tuned.

Translated from: https://medium.com/swlh/in-search-of-a-fantasy-football-name-5656e8af5944
