
Imposter syndrome
I am not a real data scientist.
I have never used a deep learning framework, like TensorFlow or Keras.
I have never touched a GPU.
I don’t have a degree in computer science or statistics. My degree is in mechanical engineering, of all things.
I don't know R.
But I haven’t given up hope. After reading a bunch of job postings, I figured out that all it will take to become a real data scientist is five PhDs and 87 years of job experience.
If this sounds familiar, know that you are not alone. You are not the only one who wonders how much longer they can get away with pretending to be a data scientist. You are not the only one who has nightmares about being laughed out of your next interview.
Imposter syndrome is feeling like everyone else in your field is more qualified than you are, that you will never get hired or, if you already have been, that you are a mistake of the hiring process. Despite its statistical implausibility, most of us feel below average. Based on my conversations with colleagues, I estimate that 9 out of 10 of us suffer from imposter syndrome at one time or another. (If this sounds entirely unfamiliar to you, I recommend an introspective reading of “Unskilled and unaware of it” by Kruger and Dunning.)
[Image omitted: a dog dressed up as a bear]
Even Ewoks feel like imposters sometimes. (Photo courtesy of Diane Rohrer.)
What a real data scientist looks like
“Data science” is a term that has generated a lot of excitement and, like a magnet, has pulled in lots of nearby subfields. The field we call data science is still relatively young, yet already too broad for an individual to be an expert in every corner of it. In my experience, the master-of-all-trades data science unicorn is a mythical beast. None of us can cover all the bases. So how are we to proceed?
There are two paths forward: generalist and specialist.
A good generalist
  • is superficially familiar with every part of data science,
  • recognizes all the jargon and technical terms,
  • has a good notion of what tools and expertise are needed to solve a given problem, and
  • asks insightful questions in technical reviews.
A good specialist
  • understands one area deeply,
  • can explain their area of expertise to non-experts,
  • understands the trade-offs between different approaches,
  • is up to date on current research and new tools, and
  • can use their tools quickly to produce high-quality results.
A generalist does not necessarily know the details of how an algorithm works and the tricks of using a tool. They will tell you that data cleaning is critical, but may not be able to enumerate the trade-offs between methods for replacing missing values. They will tell you that Spark is a good way to speed up your computations, but may not be able to advise you on the best settings to use.
A specialist does not necessarily know much about something that is outside their area. They will know the best architecture for running a linear regression on 500 million data points, but may not be able to explain a naive Bayes classifier. They will keenly grasp the trade-offs between square loss, hinge loss and logistic loss, but may be unable to query data from a Hive table.
Another way to describe generalists and specialists is “broad” versus “deep”. They are both technically savvy, but their expertise is distributed differently. We are all part generalist and part specialist. As you evolve through your career, you get to find the mixture that works best for you.
This distinction can be helpful when hiring data scientists too. Asking specifically for research experience in deep neural networks or a background in financial data visualization will draw applicants that fit your needs more effectively than calling for a "full-stack" data scientist.
How to prove that you are a real data scientist
Traditionally we establish our qualifications in a field with advanced degrees. Unfortunately for most of us, there are few such degrees available in data science. We have no piece of paper to use as a shield when someone questions our qualifications. So what do we do instead? How can we answer our critics, our interviewers, our colleagues, and, harshest of all, the voices in our heads?
Consider woodworking. Imagine that you want to install a custom cabinet in your kitchen. Three carpenters show up inquiring after the job. The first one presents you with a certificate. She says, “I apprenticed with the premier cabinet maker in the city for seven years.” The second opens her toolbox and says, “My chisels are of the latest design, and no one has a sharper plane.” The third hands you a small box, cherry-colored and perfectly smooth. When you pull the handle with a fingertip, a drawer slides out soundlessly. She says, “I made this.”
Certifications, tools, and portfolios are all popular ways of establishing credentials. I won’t argue that one is superior to another, but portfolios are particularly effective for data scientists. Certifications are few and not yet standardized. Listing the algorithms and computer languages we have used doesn’t convey how deeply we know them or what we can do with them. Building things shows a non-technical audience what we can do for them and demonstrates our expertise to technical interviewers and colleagues. Of course, this doesn’t guarantee that you’ll get a job on your first interview. But even if you don't, that’s normal. Keep interviewing.
How it feels to be a real data scientist
Note that both generalists and specialists have lots of things they don’t know. This means that even real data scientists will spend most of their days feeling lost. Our project lead will ask us questions that we don't know the answer to. Colleagues will talk comfortably about algorithms we've never heard of. Teammates will write code that we can't begin to decipher. Articles will cite "hot" subfields that we didn't know existed. arXiv papers will throw around equations that may as well be hieroglyphic gibberish. Interns will point out fundamental flaws in our reasoning. This is OK. You're not doing it wrong. This is OK.
Our goal isn’t to accumulate answers, but to ask better questions. If you are asking questions and using data to find answers, YOU ARE A DATA SCIENTIST. Period.
August 22, 2017