API in Python
The World of Job Skills
So you want to figure out where your skills fit into today’s job market. Maybe you’re just curious to see a comprehensive constellation of job skills, clean and standardized. Or maybe you need a taxonomy of skills for a resume-parsing project. Well, the EMSI Skills API is one possible tool for the job!
In this tutorial, I’ll walk you through some boilerplate code you can use to access a few key endpoints from the API: a global list of skills, skill extraction from a document, skill lookup by name, and lastly finding related skills by skill ID. Let’s get started.
Setup
Getting started is as easy as signing up for the API’s free access. You’ll get authentication credentials emailed to you once you complete that process.
Import Statements
We’ll use a few packages here, so let’s import those first:
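A minimal sketch of those imports, assuming a recent pandas release (1.0 or later) where json_normalize can be imported from the top-level package; older releases exposed it under pandas.io.json instead:

```python
import json  # building and parsing JSON payloads

import pandas as pd  # DataFrames for readable output
import requests  # HTTP calls to the API
from pandas import json_normalize  # flattens nested JSON into a DataFrame
```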
All of these are pretty standard. I’m using the json_normalize function from pandas, which is an easy way to convert JSON into Pandas DataFrames that are nicer to read.
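To see what json_normalize does, here is a small offline sketch. The records are hypothetical stand-ins shaped like a nested API response, not real output from the API:

```python
import pandas as pd

# Hypothetical records with a nested "type" object (IDs are made up).
records = [
    {"id": "A1", "name": "Python", "type": {"id": "T1", "name": "Hard Skill"}},
    {"id": "B2", "name": "Teamwork", "type": {"id": "T2", "name": "Soft Skill"}},
]

# Nested keys are flattened into dotted column names such as "type.name".
df = pd.json_normalize(records)
print(df)
```

Each nested field becomes its own column, so the result reads like an ordinary table.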
Authenticating Your Connection
The first part of accessing the API is simply using the credentials in that signup email to establish a connection and get an access token. I ran the following in a cell in a Jupyter Notebook with Python.
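A minimal sketch of that step, assuming the API uses an OAuth 2.0 client-credentials request. The token URL, the "emsi_open" scope, and the function names here are assumptions, so check them against the details in your signup email:

```python
import requests

# Assumed token endpoint; confirm it against your signup email.
AUTH_URL = "https://auth.emsicloud.com/connect/token"


def build_auth_payload(client_id, client_secret, scope="emsi_open"):
    """Return the form fields for a client-credentials token request.

    The default scope is an assumption for the free-access tier.
    """
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "client_credentials",
        "scope": scope,
    }


def get_access_token(client_id, client_secret, scope="emsi_open"):
    """Request a token and return the value stored under 'access_token'."""
    response = requests.post(
        AUTH_URL,
        data=build_auth_payload(client_id, client_secret, scope),
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["access_token"]
```

In a notebook cell you would then call something like access_token = get_access_token("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET") with your own credentials.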
Sidenote: if my code blocks (like the one above) are cut off, please follow the source link in their caption to read the full code!
This code results in an authentication JSON object, where one of the keys is access_token. Here I’ve explicitly accessed the value of that key and assigned it to a variable of the same name for later use.
The “Hello, World!” of EMSI’s Skills API
EMSI has multiple APIs, but we’ll be focused on the Skills API in this tutorial. To get started, we’re just going to use that access token to pull the full list of skills available to us.
Pull the Global List of Job Skills
I wrote a simple function to pull the skills list and write it to a Pandas DataFrame for nicer formatting and readability.
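A function along these lines might look like the sketch below. The endpoint URL, the "Bearer" header format, and the "data" key in the response are assumptions about the API, and the helper names are hypothetical:

```python
import pandas as pd
import requests

# Assumed endpoint for the skills list; check the API documentation.
SKILLS_URL = "https://emsiservices.com/skills/versions/latest/skills"


def auth_headers(access_token):
    """Build the Authorization header by prefixing the token with 'Bearer '."""
    return {"Authorization": "Bearer " + access_token}


def skills_to_frame(response_json):
    """Flatten the records under the (assumed) 'data' key into a DataFrame."""
    return pd.json_normalize(response_json["data"])


def pull_skills(access_token):
    """Fetch the global skills list and return it as a DataFrame."""
    response = requests.get(
        SKILLS_URL, headers=auth_headers(access_token), timeout=60
    )
    response.raise_for_status()
    return skills_to_frame(response.json())
```

Keeping the header and DataFrame helpers separate from the network call makes each piece easy to check without hitting the API.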
I set the URL to the skills list endpoint, wrapped the access token in the authorization syntax the API requires, and used the requests library to get the data. This results in the following global list of skills:
You can see here there are both hard and soft skills, each skill has a unique ID, and each skill is standardized and proper cased. Each skill type has a type ID as well. There are nearly 30,000 skills listed here!
Extract the Skills That Appear in a Given Document
Say instead you have a document (a resume or job description, for example), and you want to find the relevant skills that the resume holder has or the job poster wants. The following function will prompt you for a text input. Paste the text in there and set a confidence threshold between 0 and 1 (I usually use 0.4 to see a longer list of skills), and voilà: skills extracted!
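A minimal sketch of such an extraction call follows. The endpoint URL, the "text" and "confidenceThreshold" field names, and the "data" key in the response are assumptions about the API, and the function names are hypothetical:

```python
import json

import pandas as pd
import requests

# Assumed endpoint for skill extraction; check the API documentation.
EXTRACT_URL = "https://emsiservices.com/skills/versions/latest/extract"


def build_extract_payload(text, confidence_threshold=0.4):
    """Serialize the request body; the field names are assumptions."""
    if not 0 <= confidence_threshold <= 1:
        raise ValueError("confidence_threshold must be between 0 and 1")
    return json.dumps({"text": text, "confidenceThreshold": confidence_threshold})


def extract_skills(access_token, text, confidence_threshold=0.4):
    """Send the text to the API and return the extracted skills as a DataFrame."""
    headers = {
        "Authorization": "Bearer " + access_token,
        "Content-Type": "application/json",
    }
    response = requests.post(
        EXTRACT_URL,
        data=build_extract_payload(text, confidence_threshold),
        headers=headers,
        timeout=60,
    )
    response.raise_for_status()
    return pd.json_normalize(response.json()["data"])
```

To reproduce the prompt behavior, you could pass input("Paste your text: ") as the text argument.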
I typed “python and such” as a simple example, which unsurprisingly returned this skill extraction with a confidence level of 1.0 (100%):
This is all well and good. But what if you want to find out how a skill is referred to in this taxonomy? There’s an endpoint that finds related skills by ID, but we need to know the ID first! Let’s find that now.
Look Up a Skill by Name to Find Its ID
The following code uses the pandas str.contains method to find skills whose names contain the substring passed as an argument to the function.
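A minimal sketch of such a lookup, run against a small hypothetical stand-in for the global skills DataFrame (the IDs are made up). Matching case-insensitively and as a literal substring are choices made for this sketch:

```python
import pandas as pd


def find_skills_by_name(skills_df, name_substring):
    """Return the rows whose 'name' column contains the substring, ignoring case."""
    mask = skills_df["name"].str.contains(
        name_substring, case=False, regex=False, na=False
    )
    return skills_df[mask]


# Hypothetical stand-in for the global skills list.
sample = pd.DataFrame(
    {
        "id": ["X1", "X2", "X3"],
        "name": [
            "Python (Programming Language)",
            "Pandas (Python Package)",
            "Teamwork",
        ],
    }
)

matches = find_skills_by_name(sample, "python")
print(matches["name"].tolist())
# ['Python (Programming Language)', 'Pandas (Python Package)']
```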
As you can see, using the str.contains(name_substring) method finds all skills that have the word Python in them. This lets us see the full range of possibilities and select the IDs of the ones we want to find related skills for. The DataFrame returned by the above function is shown below:
There is a lot of granularity here! Let’s next find related skills to Pandas and Python as an example by grabbing their IDs and inputting them into the next block of code.
Find Related Skills to a Skill
We have our IDs for the skills of interest. Now we want to find skills related to them. I’ve added the IDs of the skills in question to the payload and as comments at the top of the following code block. If you want to add more, pay close attention to the formatting of payload: the double quotes inside the string have to be escaped, and there are other nuances such as the spacing needed before the closing }.
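One way to avoid escaping the quotes by hand is to let json.dumps build the payload string from a Python list. This is a minimal sketch; the endpoint URL, the "ids" field name, and the "data" key in the response are assumptions about the API, and the skill IDs in the usage note are placeholders:

```python
import json

import pandas as pd
import requests

# Assumed endpoint for related skills; check the API documentation.
RELATED_URL = "https://emsiservices.com/skills/versions/latest/related"


def build_related_payload(skill_ids):
    """Serialize the skill IDs; json.dumps handles quoting and spacing."""
    return json.dumps({"ids": list(skill_ids)})


def find_related_skills(access_token, skill_ids):
    """Post the IDs and return the related skills as a DataFrame."""
    headers = {
        "Authorization": "Bearer " + access_token,
        "Content-Type": "application/json",
    }
    response = requests.post(
        RELATED_URL,
        data=build_related_payload(skill_ids),
        headers=headers,
        timeout=60,
    )
    response.raise_for_status()
    return pd.json_normalize(response.json()["data"])
```

For example, build_related_payload(["ID_FOR_PYTHON", "ID_FOR_PANDAS"]) returns the string {"ids": ["ID_FOR_PYTHON", "ID_FOR_PANDAS"]}, with the placeholders replaced by the IDs found in the previous step.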
We saw in the previous output of skills involving the word Python that there were many options. I chose to find skills related to Python and Pandas. The resulting DataFrame is shown below:
These are great results! They are essentially other Python packages, including NumPy, which almost always accompanies Pandas in data science import statements!
Conclusion and Future Work to Be Done
Thanks for reading this quick tutorial on the EMSI Skills API. I hope you found it useful, whatever your use case may be. If you want to see this developed in a specific direction, please leave me a comment below! EMSI also has many more interesting datasets worth checking out, including ones with information on labor markets, job postings, and much more.
For the next steps, I can re-engineer the related skills code block so that it’s a function, taking in a list of skill IDs as keyword arguments and adding them into the payload. Right now it’s a little finicky and not standardized. I’d like to engineer this into a module, where a connection is a class, and utilization of each endpoint is a method with more robust attributes and arguments. That would certainly save many lines of code.
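As a rough illustration of that direction, here is a hypothetical sketch in which the connection is a class and each endpoint is a method. The class name, method names, base URL, and "ids" field are all assumptions, not an existing module:

```python
import requests


class SkillsConnection:
    """Hypothetical wrapper: one connection object, one method per endpoint."""

    # Assumed base URL; check the API documentation.
    BASE_URL = "https://emsiservices.com/skills/versions/latest"

    def __init__(self, access_token, session=None):
        # A session can be injected, which makes the class testable offline.
        self.session = session if session is not None else requests.Session()
        self.headers = {"Authorization": "Bearer " + access_token}

    def _url(self, endpoint):
        """Join the base URL and an endpoint name."""
        return self.BASE_URL + "/" + endpoint.strip("/")

    def skills(self):
        """Return the parsed JSON of the global skills list."""
        response = self.session.get(
            self._url("skills"), headers=self.headers, timeout=60
        )
        response.raise_for_status()
        return response.json()

    def related(self, *skill_ids):
        """Return the parsed JSON of skills related to the given IDs."""
        response = self.session.post(
            self._url("related"),
            json={"ids": list(skill_ids)},
            headers=self.headers,
            timeout=60,
        )
        response.raise_for_status()
        return response.json()
```

With this shape, adding another endpoint means adding one short method rather than repeating the header and URL handling.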
But till next time — happy coding!
Riley
Translated from: https://towardsdatascience.com/finding-relevant-job-skills-via-api-in-python-ced56cbb3493