

There is no better teacher than experience. No matter how many stories a person hears or how many scenarios they think through, they will find no substitute for real-life experience. My romance with Data Science is no different.


Before ACTUALLY working for a company on real-life projects, my experience and understanding of Data Science were confined to “neat and clean” academic datasets from Kaggle or hour-long tutorials on YouTube.


Getting close to 78 percent accuracy on the Titanic dataset gave me the false confidence that I was a soon-to-be “hot commodity” in the field. Don’t get me wrong here: only the top 5 percent of the people who have participated in the Titanic competition have scored more than 80 percent, and that is a massive feat, but it’s only the beginning.


So here are the five things I learned in my first week as an intern:


1. Real-Life Data is going to humble you


Photo by Carlos Muza on Unsplash

That’s right. Real-life data is leaps and bounds different from the academic or well-structured datasets commonly found on Kaggle.


It’s unclean, it’s highly unstructured, and it’s messy to the point that you will probably spend the whole day trying to make sense of what is on the screen.


Now, I am not saying datasets like that are not available on the internet, but the real-life dataset I went through had close to a million rows, with null values scattered all over the place, and it was spread across multiple Excel sheets too.
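To give a feel for what that first look at messy data involves, here is a minimal sketch in plain Python that counts missing values per column across several sheets. The sheet names, columns, and rows are made up for illustration, and each sheet is assumed to have already been loaded as a list of dictionaries. It is not the dataset or tooling from my internship.

```python
from collections import Counter

# Hypothetical stand-ins for rows loaded from separate Excel sheets.
sheets = {
    "january": [
        {"customer": "A", "amount": 120.0, "region": "north"},
        {"customer": "B", "amount": None, "region": ""},
    ],
    "february": [
        {"customer": None, "amount": 80.0, "region": "south"},
        {"customer": "C", "amount": None, "region": "north"},
    ],
}


def is_missing(value):
    """Treat None and blank strings as missing values."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def count_missing(sheets):
    """Return the total row count and the number of missing values per column."""
    missing = Counter()
    total_rows = 0
    for rows in sheets.values():
        for row in rows:
            total_rows += 1
            for column, value in row.items():
                if is_missing(value):
                    missing[column] += 1
    return total_rows, dict(missing)


total_rows, missing = count_missing(sheets)
for column, count in sorted(missing.items()):
    print(f"{column}: {count} of {total_rows} rows missing")
```

On a dataset with close to a million rows, a per-column summary like this is usually more informative than scrolling through the sheets by hand.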


I walked into the office like someone “who knew their business”, asking different people whether they had participated in the Titanic competition, only to leave at the end of the day extremely humbled, realizing that I stood nowhere and had A LOT of ground to cover. So to summarize:


The more you know, the more you know you don’t know.


2. Excel is your BEST Friend


Photo by Mika Baumeister on Unsplash

I cannot emphasize this point enough. All of the YouTube tutorials I went through stress the importance of Python and R, and I completely agree with them. These languages are extremely important to learn, but I also believe that Excel is JUST as important, if not more. A lot of companies have all of their records stored in Excel sheets, and if you don’t know how to work your way around Excel, you’re going to end up wasting A LOT of time and energy. Excel’s ability to create pivot tables, remove null values, and present the data well overall is truly underestimated.
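If you have never built a pivot table, it is essentially a group-and-aggregate operation. The sketch below mimics one in plain Python, assuming some made-up sales records and a hypothetical `pivot_sum` helper. It only illustrates the idea and does not show how Excel implements it.

```python
from collections import defaultdict

# Made-up sales records, purely for illustration.
records = [
    {"region": "north", "product": "widget", "sales": 100},
    {"region": "north", "product": "gadget", "sales": 50},
    {"region": "south", "product": "widget", "sales": 70},
    {"region": "north", "product": "widget", "sales": 30},
    {"region": "south", "product": "gadget", "sales": None},  # a null value
]


def pivot_sum(rows, index, columns, values):
    """Sum `values` for each (index, columns) pair, skipping null entries."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        if row[values] is None:
            continue  # drop nulls, like filtering out blanks in a sheet
        table[row[index]][row[columns]] += row[values]
    return {key: dict(inner) for key, inner in table.items()}


pivot = pivot_sum(records, index="region", columns="product", values="sales")
for region, totals in sorted(pivot.items()):
    print(region, totals)
```

In Excel the same summary takes a few clicks, which is exactly why it is worth knowing your way around it.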


3. Focus on One Domain


Photo by Scott Graham on Unsplash

Data Science in itself is not one particular field. It can be, and is, used in multiple domains such as finance, health, manufacturing, retail, and telecom, just to name a few, and that’s why domain knowledge is a MUST for understanding the underlying problem. It’s not possible for one individual to be knowledgeable in all of these domains, which is why you should focus on one domain and build expertise in it.


4. Focus on the business value of your solution


Photo by Nikita Kachanovsky on Unsplash

Being a student of the field, I tend to use technical jargon a lot with my fellow aspiring Data Scientists, BUT the thing is, most clients who will receive the end product have little to no knowledge of the technicalities that go into making a machine learning model perform well. They are interested in just ONE simple thing: “How can your solution add value to our business?”


The client does not care whether you used KNN or Logistic Regression and, quite frankly, probably wouldn’t even know how to differentiate the two.


And that’s why, when communicating findings to prospective clients, you need to make sure the other person understands what you are trying to say. So in short:


Your technical jargon doesn’t matter if the other person doesn’t see the value in your solution.


5. Being a good team player is a MUST


Photo by Campaign Creators on Unsplash

Unlike the academic datasets I worked on, where I did all of the data preprocessing, model preparation, etc. myself, real-life projects involve a WHOLE lot of teamwork. It’s just not possible for one individual to see a project through from conception to completion, and that is why the ability to work as a team is extremely important.


Conclusion:

My first day as a Data Science intern humbled me. I learned A LOT about what goes into making a successful Data Science project, and if I were to give my two cents to anyone starting in the field, I would say: get into a startup or a firm, even as an intern, and get your hands dirty with real-life projects. Understand the value of what you are trying to do, and be a good team player.


Does my learning end here?


I believe it’s just getting started.


Translated from: https://medium.com/swlh/my-first-week-as-an-internee-at-a-data-science-startup-57785368cf9f






