How to Build AI for Twitter, Pinterest, and Amazon

Latest recommended article published on 2024-07-23 21:45:09

weixin_26632369


Original link: https://towardsdatascience.com/how-to-build-ai-for-twitter-pinterest-and-amazon-c3d048c738af

We caught up with Douglas Mason, data scientist and CEO of Koyote Science, where he is building machine learning models to predict COVID-19 outbreaks. He freely shared his wisdom and lessons he has learned from over a decade of data science work, ranging from his PhD at Harvard to his time at large companies such as Pinterest, Twitter, and AWS (Amazon).

Douglas’s background

Douglas took a unique path to becoming a data scientist. Although computers were always part of his household growing up, he thought they were boring, and he told his family he would “never study computer science.”

Instead, Douglas went to USC to study filmmaking, thinking he’d follow his dream of becoming a film director. Soon, he realized that filmmaking school wasn’t all he’d imagined it would be, and he took up classical guitar instead. From there, he accidentally discovered a passion for theoretical physics, which he found fascinating (and which paid more than guitar playing).

Not long after, Douglas discovered his interest in data science and climbed the ranks until he was heading engineering and data science teams at Twitter, Pinterest, and AWS. He describes working in the field as feeling like he’s living in a science fiction film.

“When I work on that kind of stuff, I feel like Doctor Strange — as if I’m in the Multiverse. You actually get to live this Rick and Morty parallel-universe life.”

But Douglas wasn’t content with using his expertise to improve revenue at large corporations, so he went on to found his own business, Koyote Science. He’s currently focused on building COVID-19 models to predict outbreaks.

Lessons learned from shipping machine learning projects

Douglas’s success hasn’t come without some hard-earned lessons. He told us about some of the challenges he’s seen across many of the teams and projects he’s worked on.

Lesson 1: Use machine learning to work with users instead of taking over

At Twitter, Douglas worked on a feature called “who to follow.” This gives Twitter users personalized recommendations about which accounts might be interesting for them. As a data scientist, Douglas discovered that people used this feature a lot. At first, it seemed great — people were following nearly everyone it recommended. But in the longer term, people who used this feature visited Twitter less.

Their feeds were filled with tweets chosen by an algorithm, rather than tweets from people they chose themselves — and there were just too many of them.

By reducing the number of “who to follow” recommendations, Douglas improved long-term engagement.

It’s common knowledge that long-term and short-term goals often conflict, but Douglas discovered a deeper lesson here. As machine learning solutions become more capable, it’s often tempting to use them to do too much. This is almost always a mistake. Douglas says:

“I aim to build products that work with the user rather than trying to take over from the user.”

AI as depicted in science fiction — with human-level intelligence — is probably to blame for the fact that many people try to do too much with machine learning. In many cases, it’s best used to augment human actions rather than replace them.

Lesson 2: Data pipelines and good engineering are more important than math and algorithms

People get very excited about new machine learning algorithms. First we had neural networks (NNs), then convolutional neural networks (CNNs), then generative adversarial networks (GANs), transformers, and more. Algorithms are fun and exciting to talk about and explore.

But Douglas, a self-confessed math nerd, has learned that the math and algorithms tend to get far too much attention, while real success comes from good data, good engineering, focusing on the customer’s problem, and not getting trapped in the math. He says:

“It’s very, very rare for the algorithm to make the difference. It’s almost always the data pipeline. In my work, I have been able to reduce errors by 90% with a data pipeline, compared to 75% with a better algorithm. And yet everyone wants to talk to me about the algorithm, but no one wants to talk about the data pipeline.”
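
To see why the gap between those two figures matters, it helps to look at the error that remains rather than the error that was removed. The following is a minimal sketch of that arithmetic, assuming a hypothetical baseline of 1,000 errors; the baseline and the function name are illustrative and do not come from the interview.

```python
def remaining_errors(baseline_errors: int, reduction_pct: int) -> int:
    """Errors left after removing reduction_pct percent of the baseline."""
    return baseline_errors * (100 - reduction_pct) // 100


# Hypothetical baseline, chosen only to make the percentages concrete.
baseline = 1000

after_pipeline = remaining_errors(baseline, 90)   # data pipeline improvements
after_algorithm = remaining_errors(baseline, 75)  # better algorithm

print(after_pipeline, after_algorithm)   # 100 250
print(after_algorithm / after_pipeline)  # 2.5
```

On these assumed numbers, the better algorithm still leaves 2.5 times as many errors as the pipeline work does.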

We use metaphors that associate machine learning algorithms with neuroscience and data pipelines with plumbing, so it’s not surprising which one grabs popular attention. Douglas found success by focusing on the less glamorous aspects of machine learning. In most cases, deciding what data to use and how to present it to the algorithm is more important than the algorithm itself.
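
As a rough illustration of how the presentation of data can matter more than the algorithm, the sketch below fits the same simple least-squares line twice: once on a raw input and once on a feature prepared upstream. The synthetic data, the function names, and the choice of a squared feature are all assumptions made for this example; it is not a description of Douglas's own pipelines.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def mean_abs_error(xs, ys, a, b):
    """Average absolute gap between the fitted line and the targets."""
    return sum(abs(a * x + b - y) for x, y in zip(xs, ys)) / len(xs)


# Synthetic data (assumed for illustration): the target is quadratic in the raw input.
raw = [-3, -2, -1, 0, 1, 2, 3]
target = [x * x for x in raw]

# Same algorithm, raw feature: a straight line cannot follow a symmetric curve.
a_raw, b_raw = fit_line(raw, target)
error_raw = mean_abs_error(raw, target, a_raw, b_raw)

# Same algorithm, but the feature is squared in the data preparation step.
squared = [x * x for x in raw]
a_sq, b_sq = fit_line(squared, target)
error_sq = mean_abs_error(squared, target, a_sq, b_sq)

print(round(error_raw, 2), round(error_sq, 2))  # 2.86 0.0
```

The fitting code is identical in both runs; only the input handed to it changes.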

Focus on the customer’s goals

Many “AI startups” today talk far more about the solutions they provide than the problems they solve, and Douglas has learned to maintain a laser focus on customer goals. Sometimes this means pulling himself away from the more enticing theoretical aspects of machine learning. He says:

“As a mathematician, I love all the nuances of the math, and easily get lost in it. But the reality is that there’s an infinite amount of math out there to learn. It’s not feasible to lock myself in my room and learn all the math before I focus on customer goals.”

The truth that many data scientists don’t want to hear is that successful machine learning solutions are not usually about creating something new, powerful, and exciting. More often, seeing problems from the correct angle and using tried and tested approaches is what you need.

Work with and learn from experienced engineers

Douglas has personally engineered many successful machine learning solutions and led teams of software engineers, but he remains modest about his engineering ability and emphasizes the importance of solid engineering.

“At Amazon, I let the engineers do as much as possible, because they’re better than me at engineering. I would love to give you another answer, but they’re efficient, they’re thoughtful, they’ve seen these structures before, so they know about implementation details.”

It’s not all smooth sailing though. Douglas acknowledges the difficulties of getting different experts to work with each other, especially when highly technical people tend to have very strong opinions about tiny decisions.

The best way he’s found to get everyone on the same page is by constantly releasing Minimum Viable Products (MVPs), which takes us to our next lesson.

Lesson 3: Always build Minimum Viable Products (MVPs)

Douglas swears by MVPs, which demonstrate core pieces of a solution, even if many of the features are missing. When developing a machine learning solution, he’ll aim to deliver a new MVP every week.

He uses these to:

Avoid traps: If a project is taking too long, the difficulty of building even an MVP can be used to argue that the project should be cut early, before years of effort are wasted. Douglas says:

“If something ends up being way harder and I keep doing MVPs and never reach the goals, then that gives us information about the difficulty of what we’re attempting to do.”

Communicate: Both technical and non-technical people tend to better understand things they can see and use, rather than abstract ideas.

“People’s response to an abstract concept of something is often completely different to their response when they see something real. That’s why I’m always putting out MVPs. People who are looking from a higher-level perspective can gain the required intuition to give me feedback.”

It’s better to have to trash two weeks of work than two months, and MVPs can help with this.

MVPs have other benefits too. By releasing stripped-down versions of a solution, Douglas often discovers that less is more.

“What you end up delivering is often much simpler than the thing you originally intended to do, but it’s refined.”

Of course, customers are sometimes unhappy when it turns out that the best solution was the simplest one. Douglas compares building machine learning solutions to creating art: it’s about the time that went into development, not the effort required for the final product.

“There’s a classic Zen story about a king who hires an artist. The artist works for a year, but then paints the final painting in only three seconds. When the king complains, the artist says, ‘Oh, I spent a year trying to paint much harder things.’”

MVPs keep you open to finding a better, simpler solution, even late in the development process, and it’s important to stay agile so you can pivot to these better solutions if necessary.

People often think something has to be complicated in order to be powerful, but in fact the opposite is often true.

Lesson 4: Control and precision are more important than size and power

Large machine learning models, such as GPT-3, are exciting and often make their way into headline news. But Douglas compares large models to early (failed) attempts to build planes. These planes competed against the famous, successful plane built by the Wright Brothers. What made them different? The Wright Brothers focused on control, while their competitors were going for size and power.

“What the Wright brothers did that was so ingenious was that they didn’t go for bigger engines. They were bicycle mechanics. They didn’t even use powerful engines. And instead, what they focused on was control.”

This is similar to machine learning models. As Douglas says:

“We made the biggest model that does all this stuff. But then people ask, ‘How do I interpret this stuff?’ ‘How do I control it?’ ‘How do I make sure that my models don’t go off the rails?’”

Large machine learning models might often be more powerful, but unless they solve real problems, they’re not useful. If a model produces amazing results unpredictably and only some of the time, that’s not useful. If a model produces accurate results but we don’t understand why and can’t be sure the results will always be accurate, then that’s also not useful.

Instead, smaller, simpler, and arguably less powerful models that offer more interpretability and consistency are more valuable in nearly every case. Just like with flying, we need to be able to steer and to land, not just to go fast.
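
One way to make consistency concrete is to report the worst case and the spread of a model's errors alongside the average. The sketch below does this for two sets of made-up error values; the numbers, the model labels, and the `summarize` helper are hypothetical and serve only to illustrate the point, not to describe any real model.

```python
from statistics import mean, pstdev

# Hypothetical absolute errors on the same ten test cases (made-up numbers).
large_model_errors = [0, 0, 0, 0, 0, 0, 0, 0, 0, 30]  # brilliant, then badly wrong
small_model_errors = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]   # unremarkable but steady


def summarize(errors):
    """Average error, worst-case error, and population standard deviation."""
    return {"mean": mean(errors), "worst": max(errors), "spread": pstdev(errors)}


large = summarize(large_model_errors)
small = summarize(small_model_errors)

print(large)
print(small)
```

Both sets have the same average error, so a comparison on the mean alone would not distinguish them; the worst case and the spread are what separate the steady model from the erratic one.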

Shipping machine learning projects successfully

Douglas shared many lessons in our chat with him, but these were his most important rules for successfully shipping machine learning solutions:

Use machine learning to work with users: don’t overstep and try to take over from them.
Focus more on the problem, data, and engineering than on math and novel algorithms.
Focus on being in control of the solution you produce, rather than making it as ambitious and as large as possible.
Build MVPs and ship smaller pieces regularly. Even if they don’t have all the features, tight feedback loops are even more important here than they are in software engineering.

We (Data Revenue) have shipped dozens of successful projects. If you need help with yours, don’t hesitate to get in touch.

Translated from: https://towardsdatascience.com/how-to-build-ai-for-twitter-pinterest-and-amazon-c3d048c738af
