Questions You Should Ask THEM in a Data Science Interview

A quick search on Medium for the keywords “Data Science Interview” returned hundreds of articles guiding the reader through everything from the concepts that are covered to interviews at specific companies such as Tesla, Walmart, Twitter, Apple, and AWS.

Not everyone is going to land a data science job at a major company like Amazon. You might end up at a company that is just starting out in data science. Either way, you should have a good idea of what to expect as an employee there so you can avoid any nasty surprises.

With questions covering ONLY 3 THEMES, you should be able to tell whether this will be a nightmare job or one where you will be supported in your career path as a data scientist. The last thing anyone wants is to start a job and want to leave a few months in. This does not mean every job will be easy: your job should challenge you, but in a good way. And if you like really hard challenges, such as shaking up legacy systems, processes, and bureaucracy at some companies, that is fine. At least you will know what you are getting yourself into.

Theme Number 1: The Data

Some of the first questions I ask are about the data. As a data scientist, you will obviously be expected to wrangle, explore, and transform data and feed it to your models.

  1. How is your data stored? How large is it? If all of your data sits in a data lake and you are expected to create the tables, know now that you will be the database person as well as the data scientist. Make sure you are at least coming into a database that is already set up. Sometimes companies are in the middle of migrating their data between databases, and at some companies that takes a while. A startup I worked at migrated all of its data from Redshift to Snowflake in under 3 months, whereas at a large international company I worked at, it took over a year.

  2. How often does the data get updated, and how, at a high level? Will I be expected to manage the data pipelines? (If they work with large amounts of data, they should have a data engineer. If not, you will be expected to wear both hats.) There should be someone other than you making sure the database is maintained.

  3. What are the sources of the data? You would be surprised at how some data gets updated, sometimes even through Excel uploads. Ideally, it comes in through an API, or from web application logs if the company has a product. I have worked at a company where data came from MANY sources: survey forms, SMS data, email data, a CRM, APIs, and Excel uploads. Make sure you have an idea of the depth and diversity of the data sources.

Theme Number 2: Your Manager

This theme is SO important, and it will be the one with the longest paragraphs. Your manager is really your gateway to success at a company. If you have a bad manager but love the company, it will be much harder to navigate. I read a book called Positioning: The Battle for Your Mind. It is mostly about marketing and brand image, but one paragraph discusses how managers are really part of cultivating your brand image at a company. They are the ones who will give you projects aligned with your interests. Or they will give you a project you didn't know you would like but end up liking anyway, because they understand you. A good manager wants you to succeed.

I have had four managers, and let me tell you: the two kinds of manager feel very different. One kind sees your value and wants you to excel, even if it means you sometimes stand out more than they do. The other kind sees your value, is intimidated by it, and tries to keep you from getting any projects. Also be careful with managers who buy into “hire someone smarter than yourself” but turn it into an arrangement where their direct reports do all the hard work while the manager just sits back. Make sure your manager has skin in the game and wants you to excel. There is nothing worse than a manager who is not passionate about the role they themselves play at the company.

  1. What is your supervisor's background and title? Ideally, your supervisor should speak the same language as you. However, when their title is higher up (say, Manager, Director, VP, or Chief Data Officer), you will sometimes be reporting to someone who is more business-savvy. At the startup I worked at, the Chief Data Officer queried the database themselves and also used Python. Management understands data and strategy, but they might not know why your Support Vector Machine is doing worse than your logistic regression model. Whether you are an entry-level or an experienced data scientist, you might prefer a supervisor who can guide you. It also helps when a project hits a roadblock: they will quickly and thoroughly understand what is causing the delay and can relay that to upper management.

  2. Have they ever managed anyone before? Note that this is not something to be inherently biased about. I once had a first-time manager who took the time to read books about management, really cared about my growth and development, and did not micromanage. I have also reported to a manager with TONS of management experience who had never managed an analytics person and did not know how best to leverage my skill set. However, neither of them had managed a data scientist before. That brings me back to my earlier point: it is so much easier when you speak the same language as your supervisor, especially in an entry-level position.

  3. How many direct reports do they have, and who are they? I don't know of a rule of thumb for how many direct reports a manager should have, but the manager should be able to hold a weekly check-in with each one. It also helps to know who the other reports are, so you get an idea of how much your manager has to juggle. Managing a group of data scientists or analysts is very different from managing a broader team of subject matter experts covering communications, operations, finance, etc. The second manager will have to do more context switching than someone managing a team of analysts. The first manager might have an analyst in each department and will need to understand the nuances of each department's data. Still, they won't need to know as much as the second manager, since each analyst is the go-between for their department and everyone shares similar tools, e.g. SQL and Python or R.

  4. What direct impact has the manager had on the team? Any upper-management title instantly makes you think of meetings, so it can be hard to see their direct impact unless you are in those meetings with them. However, a manager of a technical team should have a tangible impact, producing or facilitating processes, projects, data products, models, case studies, webinars, etc. for the team. As in the book I mentioned earlier, the manager plays a huge part in supporting your brand image and in representing your team's brand image. They may not do the analysis themselves. They should, though, help create an environment where all of the above can happen, by investing in the right tools and people and by building those processes.

  5. What road map do they have in mind for their team and for you? This ties into the previous question about how the manager has contributed to the team's ecosystem. Sometimes the manager is building the team from scratch and is still defining some of those processes, but they should at least have a road map of where they eventually want to be. It is a huge red flag if the manager cannot articulate this road map, even if there is a lot to maintain. Sometimes a manager inherits legacy systems and tries to optimize them or migrate them to something new. Even then, there should always be a clear road map for why their team is focusing on X and how your role fits into it. This question will also surface the current roadblocks and how they think their team adds value to the company.

  6. What is the road map for your career? Does the company hold annual or six-month reviews? At one company I was at, your bonus was tied to your annual review. I liked that because it forced your manager to recognize your strengths and weaknesses and reward you accordingly. It also kept a record, which made promotions a bit more seamless. At another company, you were on probation for six months, and after that there were no more reviews. The latter seemed more like a company that cared about itself than about its employees. Not every company has a clearly laid-out promotion plan with dates and expectations (which is ideal). There should still be some emphasis on career growth, whether through tuition reimbursement, professional development classes, bonuses, raises, etc.

Theme Number 3: Collaboration / Team Environment

  1. Does your company do code review, and what is the process like? If you work with code, pipelines, models, analysis, etc., there should ABSOLUTELY be code review. I have experienced both situations: one company where code review was already heavily established, and another where I implemented the code review process myself. Both companies had a Slack channel that alerted us when code was pushed to production. If a company promotes code from development → staging → production, that is a good sign they know what they are doing.

  2. What types of tools would I use? There is a balance between too many tools and too few. At a minimum, a team analyzing data should have:
     - a database (Redshift, Postgres, Snowflake, etc.);
     - a business intelligence tool for stakeholders outside the data team (Sisense, Looker, Domo, etc.), which is different from the open-source tools data people use to visualize data during exploratory data analysis;
     - version control (GitHub, GitLab);
     - containers for building software (e.g. Docker);
     - pipeline orchestration (Airflow or Luigi);
     - communication tools (email, Slack, Zoom, JIRA, etc.).

  3. Will I work with other data scientists or analysts, or am I the only one? This question is VERY important, especially if you are entry-level. You want your supervisor to speak your language, and you also want others you can learn from. You might each be dedicated to different departments, stakeholders, or projects. Even so, having someone else with a similar skill set around is comforting, especially if your team believes in cross-training and sets aside time to collaborate, share project results, and hold retrospectives together.

  4. Is there a subject matter expert? This is so, so, so important. I can't stress this enough. Yes, as a data scientist you can build a model with high accuracy or a high AUC or F1 score, but will it be usable, and will it actually be implemented? Data scientists need as much context about the data as possible, whether from their own background or through access to a subject matter expert. At the end of the day, we build models for the company or a stakeholder to use. Also, a simpler, more explainable model like logistic regression can beat a complex model on training time and cost, simply because some great feature engineering went into it.

  5. What are the culture and work-life balance like? This question is not specific to data science teams. Still, it is important to know how people communicate, both within the team reporting to your manager and with the other departments you will be working with. At one startup I worked at, we had bi-weekly coffee dates with people from other teams. At another company, people did not know each other unless they had worked there for almost ten years. There was no socializing across departments, and that definitely hinders a company at times, especially when everyone is working toward a specific common goal. I believe a good data science team does some socializing with other teams, because context on the processes behind the data is an immense help.

Thank you for your time! I hope this is helpful. If you have any thoughts, please discuss them below, and if you want to reach out, you can find me at www.monicapuerto.com.