Interview with a Data Scientist Tool Developer

About Peadar: Peadar Coyle is a data scientist, author and math geek who specializes in applying robust statistical or machine learning models to data to extract business value. His academic interests range from quantum computing to time series forecasting. Peadar has worked or consulted for Amazon, Vodafone, Import.io and JobTODAY, to name a few. He is a core developer of PyMC3 and a regular speaker and keynote speaker at industry conferences such as PyData. His recent book is available at https://leanpub.com/interviewswithdatascientists

Introduction

I interviewed Masaaki Horikoshi (sinhrks), one of the core members of the pandas Python library. I was really happy to interview him, and glad to show that data science and software development are truly global. 🙂 I lightly edited his answers at his request because English is not his native language.

Masaaki Horikoshi’s Biography

I work as a data analyst at a Japanese company. I mostly use Python and R at work. Because I don’t disclose project details of my job publicly, allow me to answer as a tool developer. I contribute to some open source software, such as pandas (a Python package for data analysis), in my private time; see https://github.com/sinhrks

Q & A

1. What project have you worked on that you wish you could go back to and do better?

I’ve learned a lot from the projects I’ve worked on, so I expect I could do better on most of them today. That is because the most difficult part of a project is clarifying what the problem actually is, and for the previous ones I already know what it was, at least to some extent :)

2. What advice do you have for younger analytics professionals, and in particular PhD students in the sciences?

I don’t have a PhD, so my point may be basic, and the requirements depend on what you’re working on.

I think it is a good learning experience to read the source code of popular OSS related to statistics and machine learning. I sometimes find that I cannot understand a subject just by reading a textbook. Reading the source code and confirming each step sometimes reveals my misunderstandings. It can also improve your programming skills, because such software is mostly written in optimized and sophisticated ways.

3. What do you wish you knew earlier about being a data scientist / data tool developer?

That communities are really important. It was only after I started attending programming language conferences that I met a lot of skilled people from a broad range of fields, and communicating with them gives me a lot of knowledge in fields I’m not familiar with. Also, feedback from tool users helps me understand their needs and raises my motivation.

4. How do you respond when you hear the phrase ‘big data’?

I believe most of today’s companies have a lot of data. But whether we actually need all of it depends on the problem. Using ‘big data’ without any specific objective looks unprofitable.

Technically, I’m interested in processing and visualizing such data, and I use tools like Spark.

5. What is the most exciting thing about your field?

The popularity of data science and related programming languages (R and Python). I see many interesting news items and blog posts about data science almost every day, and small conferences are held a few times a month. It is a good opportunity to join the field. And we need more people; there is a lot of work to do!

6. How do you go about framing a software engineering problem? In particular, how do you avoid spending too long, how do you manage expectations, and how do you know what is good enough?

This is the question I find most difficult. The important thing is to clarify the target and goal first.

Then we can decide on a measurable indicator and consider an executable action or implementation. During discussions with end users, we can return to the target and goal once they are agreed on, and judge whether the result is “good enough”.

7. You’re involved with some open source projects. Can you comment on how important you feel these are, and on what exciting new things you’ve worked on?

OSS is important for fulfilling my daily requirements. Besides this, it is a great place where we can learn more and give back. I appreciate all the users and great contributors I’ve gotten to work with!

Source: https://www.pybloggers.com/2016/02/interview-with-a-data-scientist-tool-developer/