What I Learned After One Year Working as a Data Scientist

Intro

In October 2019 I started my journey as a data scientist. Here I want to share my experience, some tips, and the things I wish I had known earlier, so that you can avoid getting burned.

At the beginning of October, I will complete my first year as a data scientist at Evo, building awesome tools that apply cutting-edge technologies to the retail and supply chain industry.

This year has been an incredible rollercoaster. I have learned a lot, and I consider myself very lucky to be part of an awesome team that strives for perfection while being helpful, motivating, and competent.

You are the Average of the 5 People You Spend the Most Time With.

Well, I don’t know whether that’s true, but if it is, I am definitely going for it.

One of the things that scared me the most was the moment the pandemic (and the lockdown) hit: I was afraid I would not be able to deliver the way I had before, since the office environment, my colleagues, and the possibility of asking the person sitting next to me for help simply disappeared from one day to the next.

Photo by cottonbro from Pexels

Luckily, as is often the case, it turned out to be just a matter of getting used to it. We are naturally scared of new things; however, once we fall into the water, we either (try to) swim or we sink.

What I wish I had known before

Doing data science outside academia, that is, in a professional environment, differs in many ways that are not always taught or explained:

  • In a professional environment, you do not focus on a single technology (e.g. Python); rather, you work in the middle of an ensemble of different technologies (SQL, Docker, R, Python, Bash…).

  • Sometimes the good old spreadsheet is the way to go: managers are not always familiar with development tools such as pandas, dplyr, or matplotlib.

Photo by energepic.com from Pexels
  • You have to be able to demonstrate and measure everything: sometimes you will work on a task alone, and your teammates will not know much about it, or even about the technology you used to solve it. Be ready to write down your results and to deliver reports that measure the impact of your work.

  • Machine learning is not a black box: if you deliver a solution with some machine learning running behind the scenes, you must be able to understand and explain each prediction the model makes.

  • Everyone has their own opinions and points of view: listen to and respect other people’s views, and always be ready to change your mind.

  • Very often you need to take a step back: when working on a single feature of a product, it is very easy to get overwhelmed and lose sight of the project as a whole. This is very dangerous. Always keep in mind why you are working on a particular thing, and treat it as one piece of a puzzle, not a standalone entity.

Photo by Ivan Bertolazzi from Pexels
  • Do not be a fanboy: we all have our favorite technologies, libraries, cloud providers, or frameworks. Sometimes, however, we won’t be able to work with our beloved tools, so be ready to embrace situations in which you need to use (and learn from scratch) a different stack.

Remember: as a junior, you don’t have to know everything about a few things, but rather a little about everything!

  • You will very often need to perform various operations in spreadsheets: they are (still) often the go-to reporting tool. Get comfortable with pivot tables, VLOOKUP, IF, and creating (and formatting) charts.

  • SQL: I cannot stress enough how important it is to know (at least) the basics of SQL. Data is very often stored in databases, so being familiar with SQL is a must. Get comfortable with joins, SELECT statements, and GROUP BY in at least one SQL flavor (MySQL, MSSQL, Redshift).

Image source: Wikipedia
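
To make the SQL basics mentioned above a little more concrete (a join, a SELECT, and a GROUP BY), here is a minimal sketch that uses Python's built-in sqlite3 module with an in-memory database. The `stores` and `sales` tables and their rows are made up for this illustration, and syntax details vary slightly between flavors such as MySQL, MSSQL, and Redshift.

```python
import sqlite3

# In-memory SQLite database; the schema and the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stores (store_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, store_id INTEGER, amount REAL);
    INSERT INTO stores VALUES (1, 'Turin'), (2, 'London');
    INSERT INTO sales VALUES (1, 1, 10.0), (2, 1, 15.0), (3, 2, 7.5);
    """
)

# Join the two tables, then aggregate the sales per city.
query = """
    SELECT st.city, COUNT(*) AS n_sales, SUM(sa.amount) AS revenue
    FROM sales AS sa
    JOIN stores AS st ON st.store_id = sa.store_id
    GROUP BY st.city
    ORDER BY st.city
"""
rows = conn.execute(query).fetchall()
conn.close()

for city, n_sales, revenue in rows:
    print(city, n_sales, revenue)
```

The same pattern (open a connection, send a query, fetch the rows) is what most database libraries in other languages offer, which is why it pays to learn it once.
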
  • Understand how different technologies communicate with each other. While working on a task, you might need to connect to a database to get the data, so get comfortable with the libraries that connect to databases in your go-to language.

  • Bash/PowerShell scripting: there is a good chance you will encounter things such as cron jobs. Familiarize yourself with the Linux and Bash environment, and learn how to launch scripts with arguments from the terminal, since this is how scripts are often launched.

Docopt: you will definitely hear of it one day…
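
As a rough sketch of what "launching a script with arguments" looks like, the example below parses command-line arguments with argparse from the Python standard library, used here in place of Docopt so that the example has no extra dependencies. The script name, the argument names, and the message are all hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments (all names here are hypothetical)."""
    parser = argparse.ArgumentParser(description="Run a (pretend) daily job.")
    parser.add_argument("input_path", help="file to process")
    parser.add_argument("--date", default="today", help="reference date")
    parser.add_argument("--dry-run", action="store_true", help="do not write results")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    mode = "dry run" if args.dry_run else "real run"
    message = f"Processing {args.input_path} for {args.date} ({mode})"
    print(message)
    return message


# From a terminal this would be launched as, for example:
#   python job.py sales.csv --date 2020-10-01 --dry-run
# Here the same arguments are passed as a list to keep the example self-contained.
result = main(["sales.csv", "--date", "2020-10-01", "--dry-run"])
```

A cron job would launch the same script on a schedule, passing the arguments in exactly the same way.
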
  • Another very common situation is data coming from an API. It is worth understanding how APIs work, including specific concepts such as token authentication, JSON, and HTTP status codes.

  • If data is not stored in a database or served by an API, there is a good chance you will work with buckets: containers in object storage services such as Amazon S3 (Simple Storage Service) on AWS, where users can store files such as .csv files, images, and so on.

An example of how an S3 bucket can be used.
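
The API concepts mentioned above (token authentication, JSON, and HTTP status codes) can be tried out without any external service. The sketch below starts a tiny stand-in API on localhost using only the Python standard library and then calls it once with a valid token and once with a wrong one. The token, the `/items` path, and the payloads are invented for this illustration; a real API will document its own authentication scheme.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

TOKEN = "secret-token"  # hypothetical token, for illustration only


class DemoHandler(BaseHTTPRequestHandler):
    """Tiny stand-in API: returns JSON if the bearer token is valid."""

    def do_GET(self):
        if self.headers.get("Authorization") == f"Bearer {TOKEN}":
            status, payload = 200, {"items": [1, 2, 3]}
        else:
            status, payload = 401, {"error": "invalid token"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


# Opener that ignores any proxy settings, since the server is local.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, token):
    """Return (HTTP status code, decoded JSON body) for a GET request."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with opener.open(request) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as error:
        # urllib raises 4xx/5xx responses as exceptions; the body is still readable.
        return error.code, json.load(error)


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/items"

ok = fetch(url, TOKEN)
denied = fetch(url, "wrong-token")
server.shutdown()
server.server_close()

print(ok)
print(denied)
```

Checking the status code before trusting the body is the habit worth building: a 401 here means the token was rejected, not that the data is empty.
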
  • Working with big data: while learning or at university, we are taught to read data from files such as .csv by calling a small function such as .read_csv, and automagically the data is ready for manipulation. In real life, you often won’t be able to do this at all. Data collections are too big to fit in memory, and different strategies are needed, such as reading files in chunks, using technologies such as Spark, or working on smaller samples of the data.

  • Docker: to keep the whole team aligned, avoid dependency issues between development and production environments, or simply avoid the infamous phrase “it works on my machine”, Docker provides an infrastructure for managing isolated environments (called containers). I would advise familiarizing yourself with Docker and its basic commands, such as docker build and docker-compose, and understanding what a registry is. This will save you a lot of time.

Image source: RSAConference 2019
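
Going back to the big data point above, the "read the file in chunks" strategy can be sketched with nothing but the Python standard library. The helper `iter_chunks`, the column names, and the sample rows are hypothetical; in practice the input would be a large file opened with `open()` rather than a small in-memory string.

```python
import csv
import io
from itertools import islice


def iter_chunks(file_obj, chunk_size):
    """Yield lists of at most `chunk_size` rows (as dicts) from a CSV file."""
    reader = csv.DictReader(file_obj)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        yield chunk


# Hypothetical data standing in for a file too large to load at once.
data = io.StringIO("store,amount\nA,10\nB,5\nA,7\nB,1\nA,3\n")

totals = {}
for chunk in iter_chunks(data, chunk_size=2):
    # Only `chunk_size` rows are held in memory at any time.
    for row in chunk:
        totals[row["store"]] = totals.get(row["store"], 0) + int(row["amount"])

print(totals)
```

pandas exposes the same idea through `pandas.read_csv(path, chunksize=...)`, which returns an iterator of DataFrames instead of loading the whole file.
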

Conclusions

As mentioned earlier, it is very often just a matter of getting used to it. Do not be afraid of learning new things or getting your hands dirty, remember to always accept criticism and listen to others, and finally: do not be afraid to “break things”.

Finally, I understand that it is very easy to get overwhelmed when learning something new. There are thousands of resources, articles, videos, and courses, and when the menu is too long, choosing becomes harder and harder. However:

Remember that you don’t have to get it right, you just have to get it started.

Translated from: https://towardsdatascience.com/what-i-learned-after-one-year-working-as-a-data-scientist-99773296de25
