What Is Data Science
No way. No freaking way I'm getting into data science any time soon… That is exactly what I thought a year ago.
A little bit about my data science story: exactly six months ago, I was a complete beginner in data science, desperately looking to switch from digital marketing to data science. You may be wondering: why desperately? Well, because I had become overconfident in my job-hunting abilities and resigned from my previous job without a backup. I started panicking during the last few days of my notice period. The sheer number of courses and tutorials available online, and the vast number of topics I had to cover just to get started, overwhelmed me. They say time flies, and boy, do I agree! I am already half a year into my first data science job, and I cannot wait to share everything I have learned and experienced with you. If you are currently in the same shoes I was in, read on for insights and motivation.
1. Practice more than you read:
I remember going through every single data science boot camp course available on Udemy and buying a couple of top-rated courses that covered Python, SQL, Tableau and machine learning (Pro tip: don’t go for generic “Data Science boot camps”. These courses don’t cover important topics in depth. Instead, try tool-specific boot camps such as a Python boot camp, an SQL boot camp or a deep learning boot camp.) The courses were all detailed and, honestly, very helpful. But even after 50+ hours of lectures and many assignments, I still had no real data science experience. Even the basic analysis tasks in the first month of my job were relatively difficult for me, and I really struggled to meet deadlines.
Looking back, I feel that I focused more on learning and less on practicing. I listened to every lecture, each covering a new topic, did some teeny-tiny assignments and thought I was doing it all the right way. However, I see it all very differently now. Learning should come from practicing and implementing new ideas. That is when you make mistakes, notice new things, research how to code the solution in a better way and, you know... really learn. This certainly happened after I started my current job, because I had to work on and implement new ideas every day. Trust me, that is when I picked up actual skills. If you are in the online course phase, set aside some time to build projects and implement the topics you have learned.
2. Coding skills:
Most people who try to enter this field have a slight misconception that data science involves less coding than software engineering. There is a little bit of truth to it: Python, the most widely used language in data science, has ready-made libraries for almost every type of algorithm and operation. Though these libraries are very helpful, there is only so much they can do. I, for one, thought that data science was all about data analysis, plots, model fitting, prediction and accuracy metrics. These things are of course part of it, but software engineering is another huge part. For example, when you build a production-level product recommendation pipeline, you will have to work on many things such as SQL scripts, data sync, training, tuning, prediction, evaluation frameworks, unit testing, logging, dashboards, admin panels, model deployment, version control and much more. All of this combined involves a hell of a lot of critical thinking and coding. This is the kind of work you will do in the long run, or maybe even in your first few months! I am not saying you need to know how to code everything, but you will need some level of coding proficiency, and it will serve you well.
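To make this concrete, here is a minimal, hypothetical sketch of what even one small pipeline step looks like once logging and unit tests enter the picture. The function `clean_ratings` and its rules (drop rows with a missing user or item, clip ratings into the range 1 to 5) are made up for illustration and are not taken from any real recommendation system. The tests assume pytest as the test runner, which collects plain `test_*` functions.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("recsys.clean")  # hypothetical logger name


def clean_ratings(rows, min_rating=1, max_rating=5):
    """Hypothetical cleaning step for (user_id, item_id, rating) tuples.

    Drops rows with a missing user or item and clips ratings into
    [min_rating, max_rating]. It logs how many rows were dropped, so a bad
    data sync shows up in the logs instead of silently skewing the model.
    """
    cleaned = []
    dropped = 0
    for user_id, item_id, rating in rows:
        if user_id is None or item_id is None:
            dropped += 1
            continue
        clipped = min(max(rating, min_rating), max_rating)
        cleaned.append((user_id, item_id, clipped))
    logger.info("kept %d rows, dropped %d rows", len(cleaned), dropped)
    return cleaned


# Unit tests in pytest style: run with `pytest this_file.py`.
def test_drops_rows_with_missing_ids():
    rows = [("u1", "i1", 4), (None, "i2", 3), ("u2", None, 5)]
    assert clean_ratings(rows) == [("u1", "i1", 4)]


def test_clips_out_of_range_ratings():
    rows = [("u1", "i1", 9), ("u2", "i2", -2)]
    assert clean_ratings(rows) == [("u1", "i1", 5), ("u2", "i2", 1)]
```

None of this is advanced. It is ordinary software engineering, and a real pipeline has dozens of such steps, each needing the same care.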
3. No pressure to learn every single data science tool:
There are way too many data science tools on the market, and it can be quite confusing to figure out where to start. The best option is to learn one data-science-friendly programming language, one database tool and one visualization tool. This is a good way to begin, and it roughly matches the basic requirements of many entry-level roles. When you are just laying the foundation, don’t pressure yourself to learn too many tools. Instead, take things slowly. Understand the basics and explore topics in depth in whatever tool you learn. You will eventually pick up many more tools on the job, either because a project requires them or simply while working on your passion projects.
I started with Python, SQL and Tableau when I was searching for a job. Nothing more. Now I know how to work with several other tools, such as Spark, HBase, Kibana, Dash, Elasticsearch and Logstash, and I am sure I will have to learn new ones in the coming months. The point is: learn each tool with a clear understanding of how it will be useful for your needs.
4. You are ready for interviews:
Tell yourself that whenever you feel like skipping an interview call or meeting because your brain is telling you that you are not going to make it. I have lost count of the times I learned something new while attending an interview: sometimes about the data science industry, sometimes about new products, sometimes just a concept. I am not suggesting that you attend interviews at random just to learn things; that would be an obvious waste of time for the poor interviewer. But data science is a vague term, and so are the job requirements for most data science roles. If you wait until you can tick off every single job requirement before attending an interview, you might never feel ready.
The preparation phase can also be long, depending on your learning speed and prior knowledge. It is very easy to get stuck in that phase because there are so many topics to cover. Set goals for your interview preparation, and as you achieve them, start looking for interview opportunities. Every time you fail an interview, you will discover an area you need to improve or a new market requirement you need to learn. And that, my friend, will help you in your next interviews.
5. Ideal companies to apply to for data science roles:
Usually, beginners are flexible about roles and companies when applying for interviews. But if you are wondering which type of company you should target for a data science role, the answer is completely subjective. Let us talk about product-based and service-based companies from a data science perspective. Service companies usually work on one-off data analyses or prototypes, whereas product companies involve rigorous software development, of which data analysis is just one part. At a service company, Python, R, PowerPoint and Excel will do the job most days, whereas a product company will want you to work with whatever tool the job requires. In short, product companies involve a lot of software engineering on top of data analysis.
Product companies either work on projects that improve their products by incorporating data science, build new data-based products such as product recommendation engines and AI-based chatbots, or simply use analytics to make better decisions within the organization. Service companies work on analytics projects driven purely by client requirements. So, as I said, it comes down to your interests. Choose wisely!
6. Data science can be frustrating:
Data-based problems are very interesting to work on, but some can be just as frustrating. One of the hardest parts of the work is simply waiting patiently for good results. Often you will not know whether you are heading in the right direction. There are many unknowns, and much of your project will require plain trial and error to arrive at the best solution you can find. As they say, it is all fun and games until you reach the hyperparameter-tuning stage of your model :)
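Much of that trial and error follows one loop: pick candidate settings, fit, measure, repeat. Below is a minimal, standard-library-only grid search sketch, assuming a lower-is-better error metric. The toy model (simple exponential smoothing with a single `alpha` parameter), the helper names and the numbers are made up for illustration. In practice you would probably use a library tool such as scikit-learn's `GridSearchCV`, which also handles cross-validation.

```python
from itertools import product


def exp_smoothing_forecasts(series, alpha):
    """One-step-ahead forecasts from simple exponential smoothing."""
    level = series[0]
    forecasts = []
    for value in series[1:]:
        forecasts.append(level)  # forecast is made before seeing `value`
        level = alpha * value + (1 - alpha) * level
    return forecasts


def mse(actual, predicted):
    """Mean squared error (lower is better)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def grid_search(evaluate, grid):
    """Try every combination of values in `grid` and keep the lowest score.

    `grid` maps parameter names to candidate lists; `evaluate` receives one
    combination as keyword arguments and returns an error score.
    """
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    series = [12.0, 13.5, 13.1, 14.8, 15.2, 16.0]  # made-up toy data
    best, score = grid_search(
        lambda alpha: mse(series[1:], exp_smoothing_forecasts(series, alpha)),
        {"alpha": [0.1, 0.3, 0.5, 0.7, 0.9]},
    )
    print(best, score)
```

The cost of exhaustive search grows multiplicatively with every parameter you add to `grid`, which is one reason tuning can take so long.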
Most of us build a proof of concept (POC) before implementing any solution. But sometimes even a POC fails to reveal certain hiccups you will face during the actual task. For example, we once spent an entire month at work researching and implementing a solution for our pipeline. It eventually didn’t work out. We had to start all over again, and this caused a huge lag in a project that had supposedly been going well. The key takeaway from a couple of incidents like this: always set clear goals, evaluate your POC thoroughly, and when you are stuck at one point for too long, remember to try fast, fail fast, evaluate fast and try again fast. Being fast is super important for making good progress.
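The “set clear goals, fail fast” advice can be made fairly mechanical. Below is a hypothetical sketch, not the workflow from the incident described above: each candidate approach is scored as soon as it is tried against a target agreed on up front, failures are recorded and dropped, and no new attempt starts once a time budget has run out. `run_experiments`, the target score and the budget are all illustrative assumptions.

```python
import time


def run_experiments(candidates, evaluate, target_score, time_budget_s):
    """Try (name, approach) pairs in order until one meets the target score.

    Every attempt is evaluated immediately and recorded in `history`.
    No new attempt is started once `time_budget_s` seconds have passed,
    so a dead end cannot quietly eat a whole month.
    Returns (name_of_first_success or None, history).
    """
    deadline = time.monotonic() + time_budget_s
    history = []
    for name, approach in candidates:
        if time.monotonic() > deadline:
            break  # budget spent: stop and rethink instead of pushing on
        score = evaluate(approach)
        history.append((name, score))
        if score >= target_score:
            return name, history
    return None, history


if __name__ == "__main__":
    # Hypothetical candidates whose "approach" is just a precomputed score.
    candidates = [("baseline", 0.62), ("new_features", 0.71), ("bigger_model", 0.83)]
    print(run_experiments(candidates, lambda score: score, target_score=0.8, time_budget_s=60))
```

Returning the full `history` even on failure matters: a failed attempt with a recorded score still tells you something when you decide what to try next.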
7. Your storytelling skills will matter a lot:
You will most likely deal with customers from non-technical backgrounds. Your organization’s leaders may not be data scientists. Your own teammates might come from diverse backgrounds (pure mathematicians, API users, etc.). These are the people who will recognize your work and add value to it.
It is crucial to communicate your thoughts, ideas, analyses and results to your audience in an engaging and understandable way. I clearly remember struggling in my first team meeting with the CEO, where we had to explain our progress on projects and discuss use cases and future AI goals. That is when it hit me that sticking to numbers and relying on analytical skills alone is not enough. A good story explaining the analysis can interest your manager. A story explaining how a particular data science solution solves a pain point can interest your customer. Different stories have different impacts on different people. Frame your story carefully with data science elements such as visualizations, dashboards and reports, and give it your all when you deliver it.
Final Thoughts:
Data science is not rocket science. If I can do it, then you can do it too! There is no better time than now to enter this fast-growing field. That said, it can definitely get a little tough to keep up with the competition and with all the new developments in this field. But what matters is that we keep learning, implementing, making mistakes and growing consistently. Happy analyzing :)
Translated from: https://medium.com/swlh/7-things-you-must-know-if-youre-trying-to-enter-data-science-2a9a531750e0