LinkedIn data scraping
by Aline Lerner
LinkedIn endorsements are dumb. Here’s the data.
If you’re an engineer who’s been endorsed on LinkedIn for any number of languages/frameworks/skills, you’ve probably noticed that something isn’t quite right. Maybe they’re frameworks you’ve never touched or languages you haven’t used since freshman year of college.
No matter the specifics, you’re probably at least a bit wary of the value of the LinkedIn endorsements feature. The internets, too, don’t disappoint in enumerating some absurd potential endorsements or in bemoaning the lack of relevance of said endorsements, even when they’re given in earnest.
Having a gut feeling for this is one thing, but we were curious about whether we could actually come up with some numbers that showed how useless endorsements can be, and we weren’t disappointed. If you want graphs and numbers, scroll down to the “Here’s the data” section below. Otherwise, humor me and read my completely speculative take on why endorsements exist in the first place.
LinkedIn endorsements are just noisy crowdsourced tagging
Pretend for a moment that you’re a recruiter who’s been tasked with filling an engineering role. You’re one of many people who pay LinkedIn ~$9K/year for a recruiter seat on their platform. (Roughly 60% of LinkedIn’s revenue comes from recruiting, so you can see why this stuff matters.)
That hefty price tag broadens your search radius (which is otherwise artificially constrained) and lets you search the entire system. Let’s say you have to find a strong back-end engineer. How do you begin?
Unfortunately, LinkedIn’s faceted search (pictured below) doesn’t come with a “can code” filter[1]:
So, instead of searching for what you really want, you have to rely on proxies. Some obvious proxies, even though they’re not that great, might be where someone went to school or where they’ve worked before.
But if you need to look for engineering ability, you’re going to have to get more specific. If you’re like most recruiters, you’ll first look for the main programming language your company uses and go from there.
And this is despite knowledge of a specific language not being a good indicator of programming ability and despite most hiring managers not caring which languages their engineers know.
Now pretend you’re LinkedIn. You have no data about how good people are at coding, and though you do have a lot of resume/biographical data, that doesn’t tell the whole story.
You can try relying on engineers filling in their own profiles with languages they know, but given that engineers tend to be pretty skittish about filling in their LinkedIn profile with a bunch of buzzwords, what do you do? You build a crowdsourced tagger, of course! Then, all of a sudden, your users will do your work for you.
Why do I think this is the case? Well, if LinkedIn cared about true endorsements rather than perpetuating the skills-based myth that keeps recruiters in their ecosystem, they could have written a weighted endorsement system by now, at the very least. That way, an endorsement from someone with expertise in some field might mean more than an endorsement from your mom (unless, of course, she’s an expert in the field).
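To make the idea of a weighted endorsement system concrete, here is a minimal sketch. Everything in it is hypothetical: the `Endorsement` class, the `endorser_expertise` field (an assumed 0.0 to 1.0 estimate of how much the endorser knows about the skill), and the sample numbers are illustrations, not anything LinkedIn exposes or uses.

```python
from dataclasses import dataclass


@dataclass
class Endorsement:
    skill: str
    # Hypothetical 0.0-1.0 estimate of the endorser's own expertise in this skill.
    endorser_expertise: float


def weighted_scores(endorsements):
    """Sum endorser expertise per skill instead of counting each endorsement as 1."""
    scores = {}
    for e in endorsements:
        scores[e.skill] = scores.get(e.skill, 0.0) + e.endorser_expertise
    return scores


# Made-up profile: two endorsements from experts, three from well-meaning relatives.
endorsements = [
    Endorsement("Python", 0.9),
    Endorsement("Python", 0.8),
    Endorsement("Java", 0.05),
    Endorsement("Java", 0.05),
    Endorsement("Java", 0.05),
]

scores = weighted_scores(endorsements)
# By raw count Java wins 3 to 2; by weighted score Python comes out well ahead.
print(sorted(scores.items(), key=lambda item: item[1], reverse=True))
```

The hard part, of course, is estimating the endorser's expertise in the first place, which this sketch simply takes as given.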
But they don’t do that, or at least they don’t surface it in candidate search. It’s not worth it. Because the point of endorsements isn’t to get at the truth. It’s to keep recruiters feeling like they’re getting value out of the faceted search they’re paying almost $10K per seat for. In other words, improving the fidelity of endorsements would likely cannibalize LinkedIn’s revenue.
You could make the counterargument that despite the noise, LinkedIn endorsements still carry enough signal to be a useful first-pass filter and that having them is more useful than not having them. This is the question I was curious about, so I decided to cross-reference our users’ interview data with their LinkedIn endorsements.
The setup
So, what data do we have? First, for context, interviewing.io is a platform where people can practice technical interviewing anonymously with interviewers from top companies and, in the process, find jobs. Do well in practice, and you get guaranteed (and anonymous!) technical interviews at companies like Uber, Twitch, Lyft, and more. Over the course of our existence, we’ve amassed performance data from close to 5,000 real and practice interviews.
When an interviewer and an interviewee match on our platform, they meet in a collaborative coding environment with voice, text chat, and a whiteboard and jump right into a technical question. Interview questions on the platform tend to fall into the category of what you’d encounter at a phone screen for a back-end software engineering role.
After every interview, interviewers rate interviewees on a few different dimensions, including technical ability. Technical ability gets rated on a scale of 1 to 4, where 1 is “poor” and 4 is “amazing!”. On our platform, a score of 3 or above has generally meant that the person was good enough to move forward. You can see what our feedback form looks like below:
As promised, I cross-referenced our data with our users’ LinkedIn profiles and found some interesting, albeit not that surprising, stuff.
Endorsements vs. what languages people actually program in
The first thing I looked at was whether the programming language people interviewed in most frequently had any relationship to the programming language for which they were most endorsed. It was nice that, across the board, people tended to prefer one language for their interviews, so we didn’t really have a lot of edge cases to contend with.
It turns out that people’s interview language of choice matched their most endorsed language on LinkedIn just under 50% of the time.
Of course, just because you’ve been endorsed a lot for a specific language doesn’t mean that you’re not good at the other languages you’ve been endorsed for. To dig deeper, I took a look at whether our users had been endorsed for their interview language of choice at all. It turns out that people were endorsed for their language of choice 72% of the time. This isn’t a particularly powerful statement, though, because most people on our platform have been endorsed for at least 5 programming languages.
That said, even when an engineer had been endorsed for their interview language of choice, that language appeared in their “featured skills” section only 31% of the time. This means that most of the time, recruiters would have to click “View more” to see the language that people prefer to code in, if it’s even listed in the first place.
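The three percentages above (top-endorsed match, endorsed at all, and featured given endorsed) could be computed along these lines. This is a sketch under assumptions: the record layout (`interview_language`, `endorsements`, `featured_skills`) and the four toy users are made up for illustration and are not the actual interviewing.io or LinkedIn data.

```python
def top_endorsed(endorsements):
    """Return the language with the most endorsements, or None if there are none."""
    if not endorsements:
        return None
    return max(endorsements, key=endorsements.get)


def match_rates(users):
    """Compute the three rates discussed above for a list of user records."""
    n = len(users)
    top_match = sum(
        1 for u in users
        if top_endorsed(u["endorsements"]) == u["interview_language"]
    )
    # Users endorsed at least once for the language they interview in.
    endorsed = [u for u in users if u["interview_language"] in u["endorsements"]]
    featured = sum(
        1 for u in endorsed if u["interview_language"] in u["featured_skills"]
    )
    return {
        "top_match": top_match / n,
        "endorsed_at_all": len(endorsed) / n,
        # Conditional on being endorsed for the interview language at all.
        "featured_given_endorsed": featured / len(endorsed) if endorsed else 0.0,
    }


# Hypothetical records, invented purely to exercise the functions.
users = [
    {"interview_language": "Python",
     "endorsements": {"Python": 12, "Java": 3},
     "featured_skills": ["Python", "Java", "SQL"]},
    {"interview_language": "Java",
     "endorsements": {"Python": 8, "Java": 2},
     "featured_skills": ["Python", "SQL", "Git"]},
    {"interview_language": "C++",
     "endorsements": {"JavaScript": 5},
     "featured_skills": ["JavaScript"]},
    {"interview_language": "Python",
     "endorsements": {"Java": 10, "Python": 4},
     "featured_skills": ["Java", "Python", "SQL"]},
]

rates = match_rates(users)
print(rates)
```

Note that the third rate is conditional: its denominator is only the users who were endorsed for their interview language, which matches how the 31% figure is phrased.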
So, how often were people endorsed for their language of choice? Quantifying endorsements[2] is a bit fuzzy, but to answer this meaningfully, I looked at how often people were endorsed for that language relative to how often they were endorsed for their most-endorsed language, in the cases when the two languages weren’t the same (recall that this happened about half the time).
Perhaps if these numbers were close to 1 most of the time, then endorsements might carry some signal. As you can see in the histogram below, this was not the case at all.
The x-axis above is how often people were endorsed for their interview language of choice relative to their most-endorsed language. The bars on the left are cases where someone was barely endorsed for their language of choice, and all the way to the right are cases where people were endorsed for both languages equally often. All told, the distribution is actually pretty uniform, making for more noise than signal.
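As a rough illustration of the quantity on that x-axis, the sketch below computes the ratio of endorsements for the interview language to endorsements for the most-endorsed language, then bins the ratios. The function names, the bin count, and the sample profiles are assumptions made for this example. Treating a language with no endorsements as a ratio of 0 is also an assumption; the post doesn't say how those cases were handled.

```python
def endorsement_ratio(endorsements, interview_language):
    """Endorsements for the interview language relative to the most-endorsed one.

    Returns None when the two languages are the same (those cases are excluded)
    or when there is nothing to divide by.
    """
    if not endorsements:
        return None
    top_language = max(endorsements, key=endorsements.get)
    if top_language == interview_language or endorsements[top_language] == 0:
        return None
    # Assumption: a language with no endorsements counts as 0.
    return endorsements.get(interview_language, 0) / endorsements[top_language]


def histogram(ratios, bins=5):
    """Count ratios in equal-width bins over [0, 1]; a ratio of 1.0 goes in the last bin."""
    counts = [0] * bins
    for r in ratios:
        counts[min(int(r * bins), bins - 1)] += 1
    return counts


# Hypothetical (interview language, endorsement counts) pairs.
profiles = [
    ("Java", {"Python": 10, "Java": 1}),
    ("Go", {"Python": 10, "Go": 5}),
    ("Python", {"Java": 9, "Python": 9}),
    ("Python", {"Python": 7, "Java": 2}),  # same language: excluded
]

ratios = [
    r for language, counts in profiles
    if (r := endorsement_ratio(counts, language)) is not None
]
print(histogram(ratios))
```

If endorsements carried signal, most of the mass would pile up in the rightmost bins; a flat histogram is what noise looks like.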
Endorsements vs. interview performance
The next thing I looked at was whether there was any correspondence between how heavily endorsed someone was on LinkedIn and their interview performance. This time, to quantify the strength of someone’s endorsements[3], I looked at how many times someone was endorsed for their most-endorsed language and correlated that to their average technical score in interviews on interviewing.io.
Below, you can see a scatter plot of technical ability vs. LinkedIn endorsements, as well as my attempt to fit a line through it. As you can see, the R² is piss-poor, meaning that there isn’t a relationship to speak of between how heavily endorsed someone is and their technical ability.
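For readers who want to see what fitting a line and reading off R² involves, here is a minimal ordinary least squares sketch in plain Python. The `linear_fit` helper and the sample numbers are invented for illustration; they are not the data behind the scatter plot, and the post doesn't say which tool was used for the fit.

```python
def linear_fit(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept, plus R²."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R² = 1 - (residual sum of squares) / (total sum of squares).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data: endorsements for the most-endorsed language vs.
# average technical score (1 to 4) across interviews.
endorsement_counts = [2, 5, 9, 14, 20, 31]
average_scores = [3.5, 2.0, 3.0, 2.5, 3.5, 2.5]

slope, intercept, r_squared = linear_fit(endorsement_counts, average_scores)
print(f"slope={slope:.4f} intercept={intercept:.4f} R^2={r_squared:.4f}")
```

An R² near 0 means the fitted line explains almost none of the variation in scores; an R² near 1 would mean endorsement counts predict scores almost perfectly.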
Endorsements vs. no endorsements… and closing thoughts
Lastly, I took a look at whether having any endorsements in the first place mattered with respect to interview performance. If I’m honest, I was hoping there’d be a negative correlation. That is, if you don’t have endorsements, you’re a better coder. After running some significance testing, though, it became clear that having any endorsements at all (or not) doesn’t matter.
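The post doesn't say which significance test was run. One option that needs only the standard library is a two-sided permutation test on the difference in mean technical score between the two groups, sketched below. The function name, the permutation count, and the scores are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns the fraction of random relabelings whose absolute difference in
    means is at least as large as the observed one (an estimated p-value).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


# Hypothetical average technical scores (1 to 4) for each group.
with_endorsements = [3.0, 2.5, 3.5, 2.0, 3.0, 2.5]
without_endorsements = [2.5, 3.0, 3.5, 2.0, 3.0, 3.0]

p_value = permutation_test(with_endorsements, without_endorsements)
print(f"p-value: {p_value:.3f}")
```

A large p-value, as with these made-up groups, means the observed gap is easy to produce by shuffling labels at random, which is the "doesn't matter" outcome described above.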
So, where does this leave us? As long as there’s money to be made in peddling low-signal proxies, endorsements won’t go away and probably won’t get much better.
It is my hope, though, that any recruiters reading this will take a second look at the candidates they’re sourcing and try to, where possible, look at each candidate as more than the sum of their buzzword parts.
Want to become awesome at technical interviews and land your next job in the process? Join interviewing.io!
[1] You know what comes with a “can code” filter? interviewing.io does! We know how people are doing in rigorous, live technical interviews, which, in turn, lets us reliably predict how well they will do in future interviews. Roughly 60% of our candidates pass technical phone screens and make it onsite. Want to use us to hire?
[2] There are a lot of possible approaches to comparing endorsements, both to each other and to other stuff. In this post, I decided to mimic, as much as possible, how a recruiter might think about a candidate’s endorsements when looking at their profile. Recruiters are busy (I know; I used to be one) and get paid to make quick judgments. Therefore, given that LinkedIn doesn’t normalize endorsements for you, if a recruiter wanted to do it, they’d have to actually add up all of someone’s endorsements and then do a bunch of pairwise division. This isn’t sustainable, and it’s much easier and faster to look at the absolute numbers. For this exact reason, when comparing the endorsements for two languages, I chose to normalize them relative to each other rather than relative to all other endorsements. And when trying to quantify the strength of someone’s programming endorsements as a whole, I opted to just count the number of endorsements for someone’s most-endorsed language.
[3] See footnote 2 above. I used the same rationale.
Translated from: https://www.freecodecamp.org/news/linkedin-endorsements-are-dumb-heres-the-data-386a9e1606f1/