Why portfolios?

Data science is a tough field. It combines, in equal parts, mathematics and statistics, computer science, and black magic. As of mid-2020, it is also a booming field, with numerous applicants swarming every job ad. And as I mentioned, it is mid-2020, with the raging pandemic dragging everything down just that extra bit.

Building up a list of course completion certificates won’t get you very far either, unless you also have academic bona fides (a Masters or PhD). MOOC certificates like those from Coursera and edX are nice, but I’ve yet to hear many examples of them counting for much. Kaggle ain’t what it used to be, either. Its free competitions have become graveyards of useless overfit models, while the real competitions are dominated by teams that are hard to compete with, and team results are of limited use for individual portfolios anyway.

So, how does one go about building a profile online? My personal thought is that just as a famous band once said, you can go your own way.

Instead of trying to do exactly what others do, or did, work on projects that you are interested in, build up a portfolio of your work, and put it up there for the world to see what you did, and what you can do.

Original Photo by Ricardo Gomez Angel on Unsplash

Having said all that, I appreciate that it’s easier said than done. Not many data scientists are also designers or front-end developers, and they are not always keen to pick up that extra skill, nor do they necessarily have the time to.

Luckily, we don’t always have to reinvent the wheel. Unlike the old days, when portfolios were literally… portfolios full of glossy pages, or resumes that would only ever cross HR’s desk, many amazing portfolios are now available online. These are invaluable resources, so why not make full use of them?

Learning / Inspiration

Outside of using them as references for our own portfolios, these sites are also extremely valuable resources for learning, and for ideas.


Many of these authors’ projects are practical, interesting and original. They are, for my money, also great complementary learning tools. For example, seeing practical applications of an ML tool provides context when learning the theoretical side, as I consider where I might apply this tool in my work or for my clients.

I’ve said enough – let’s take a dive into some of these amazing works, to look at exactly how they are useful.


These are obviously just a few random selections from the many, many great portfolios out there. Let me know in the comments about any of your favourites, and whether you agree or disagree with my thoughts!

David Venturi

http://davidventuri.com/portfolio

I first came across David Venturi a few years ago while researching data science courses. He had written a blog post called “I Dropped Out of School to Create My Own Data Science Master’s – Here’s My Curriculum”.

That post is from April 2016 and it has certainly stood the test of time, having racked up over 8000 claps on Medium as of August 2020!


Since then, he’s gone on to do much more. He’s created courses for DataCamp, including one for Scala, a part of a Tableau course and one using the MLB’s (baseball) Statcast data.

He has even created a course titled “Up and Down With the Kardashians”, a Python project that uses pandas. Who’d have guessed that the word Python would appear next to the word Kardashians, without it being a reference to the latest scandal or a terrible choice of pet?

Yup. The man is talented.

His portfolio site appropriately reflects this wide range of talents, showcasing the breadth of content types and the variety of subject matter in his work.


http://davidventuri.com/portfolio

The headings on Venturi’s site organise its content by type of end client. They range all the way from courses, projects and content created for DataCamp or Udacity, to a set of personal projects including articles for FreeCodeCamp, sports analytics and a sprinkling of web apps.

What struck me after looking at the site for a while, though, was the clarity with which it demonstrated the exact types of outputs he is capable of producing. In other words:

Each section on Venturi’s portfolio fulfils a marketing purpose.


The MOOCs are easy — after all, he is a seasoned course producer.


But the next section includes two very different videos to highlight his production skills and comfort in front of a camera. One is an instructional video and the other is a highly produced… dog video (a clever marketing video).

And his personal projects highlight the output medium, indicated with bright links. Each output is tagged with a link to one of “Code”, “Demo”, or “Website”. This allows the viewer to instantly see the output of interest in the context of a project.

Even his written works are clearly categorised as one of a “Report”, “Article” or “Post”, explicitly acknowledging the intended audience types. A reader is led straight to a relevant sample product, rather than to a mess of “writing samples” sorted by subject matter. (It does make me wonder if he did any scraping or analysis of job postings to arrive at this taxonomy.)

Check out his portfolio here.


Hannah Yan Han

https://www.hannahyan.com

As a data visualisation geek, I was immediately filled with a combination of joy and envy by this site.


The majority of the projects represented on her front page are (gorgeous, I might add) visualisations. Each project is represented by an image, where a mouseover reveals further details about it — as shown in the animation below.

So, within seconds of visiting the site, the reader is given the opportunity to see the range of visualisations that she’s produced, and her technical prowess in using a diverse range of tools from R, D3.js or P5.js to Tableau.


Personally I also really like the clean layout and simple and consistent interface. It’s simply a pleasure to navigate through.

https://www.hannahyan.com

Clicking on each project takes the reader to an article about the visualisation.


She also has a dedicated data science portfolio, which she has placed on a separate page.


https://www.hannahyan.com/ds-projects.html (look at that dog!)

Clearly, this layout is designed to convey more information about each data science project than the layout of the visualisation page does. By segregating projects by type as she has done, she’s able to achieve visual consistency within each page for the reader. This probably also indicates that, generally, the reader (a prospective client) is interested in only one of visualisation or data science, rather than both.

Check out her portfolio here.


Donne Martin

Before moving on to the next example portfolio, sit down, grab a drink, and brace yourself.


https://donnemartin.com/

Donne Martin claims to be a software engineer at Facebook, but looking at his website and GitHub page, I am quite convinced that he is a time traveller or some sort of wizard who’s able to stretch time. I’ll get back to this point later, but for now, take a look at the animation below, scrolling through his main website.

His approach to the portfolio site is quite different to those we’ve looked at before this. He takes the approach of letting the crowd noise (i.e. GitHub stars) do the talking, and boy — are they loud.

He casually flaunts multiple personal projects with 20k+ stars!

https://donnemartin.com/#portfolio

His GitHub page itself is very impressive. Since we are discussing data science portfolios, let’s take a look at his repo of data science notebooks.

Remember how I said that I think Martin might be a wizard? Whenever we go back to burning witches and wizards, this data science notebook repo is going to be my primary evidence submission against Martin.

I just don’t understand when he could possibly have had time to create all of these unless he has the ability to slow down time. Here is just a sampling — a very small sampling, actually, of the notebooks that he has made available in this repo.

https://github.com/donnemartin/data-science-ipython-notebooks

It’s a dense list, but grouped by the primary library used, it does a great job as a showcase. Even before opening any of his notebooks or even reading the summaries of these notebooks, this list easily demonstrates his work ethic, breadth of skills and ability to communicate and teach.
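
If you want to borrow this idea for your own portfolio, grouping entries by primary library takes only a few lines of Python. What follows is a minimal sketch, not how Martin built his repo: the notebook titles, the `NOTEBOOKS` list, and the helper names `group_by_library` and `render_index` are all hypothetical and exist purely for illustration.

```python
from collections import defaultdict

# Hypothetical entries: (notebook title, primary library).
# Illustrative only; these are not taken from Martin's repository.
NOTEBOOKS = [
    ("Linear regression basics", "scikit-learn"),
    ("DataFrame joins", "pandas"),
    ("Array broadcasting", "numpy"),
    ("Random forests", "scikit-learn"),
    ("Time series resampling", "pandas"),
]


def group_by_library(notebooks):
    """Return a dict mapping each library to its sorted notebook titles."""
    groups = defaultdict(list)
    for title, library in notebooks:
        groups[library].append(title)
    # Sort libraries and titles so the index is stable between runs.
    return {lib: sorted(titles) for lib, titles in sorted(groups.items())}


def render_index(groups):
    """Render the grouped notebooks as a plain-text index."""
    lines = []
    for library, titles in groups.items():
        lines.append(f"{library} ({len(titles)})")
        lines.extend(f"  - {title}" for title in titles)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_index(group_by_library(NOTEBOOKS)))
```

Putting the count next to each library name is a deliberate choice here: it lets a visitor gauge breadth at a glance, before opening a single notebook.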

You could easily spend days, or weeks, browsing through Martin’s portfolio — and personally I don’t think it would be such a bad idea to do so. Check it out here.

Claudia Ten Hoope

https://www.claudiatenhoope.com

Hoope’s website is clean, neat and easy to read. One key difference I wanted to highlight with this portfolio site is that it explicitly doubles as a hiring/enquiry page, with her daily rates etc.

She is a freelancer, so it makes sense for her to spell out the exact services she offers to her prospective clients. The language she uses here also indicates that these services are aimed at those who may not necessarily be that familiar with data science.

It’s a good reminder for us to think about who the intended audience is for every piece of communication that we put out there, and to tailor the content accordingly.


https://www.claudiatenhoope.com

Check it out — her page is here.


Julia Nikulski

http://julianikulski.com/portfolio

This is another excellent portfolio website, this time by Julia Nikulski. As with the others, she’s got some kickass projects listed here, each one with a hero image accompanied by a short description and key skills.

I won’t write too much more about it — as the main layout seems to be similar to some of the others, and I don’t read German!


One super interesting (and very meta) highlight is a post entitled “How to Build a Data Science Portfolio Website”, which, if you are reading this, you might find relevant!

Thanks for reading. That was just a small selection of sites I’ve found online. If you have your personal favourites, or (constructive) critiques of the article, please let me know in the comments or on Twitter!

Also, if you liked this, say 👋 / follow on Twitter, or follow here for updates. ICYMI: I also wrote this article comparing Plotly Dash vs Streamlit:

And this about visualising hidden relationships in data, using data from the NBA as an example:


Translated from: https://towardsdatascience.com/these-data-science-portfolios-will-awe-and-inspire-you-mid-2020-edition-728e1021f60






