What every project manager should know about managing data science and AI projects

If you are a project manager, being assigned a data science or AI project may be a conflicting experience.

AI alone is slated to create up to $2.9 trillion (yes with a ‘t’) in business value by 2021, and despite the overall damper of the coronavirus, remains on the forefront of technology powering a recovery. But in the same breath, the odds of successful project delivery are not in your favour. Estimates of project failure in the field start at 80% and go downhill, with a July 2019 VentureBeat AI report estimating that 87% of data science projects never make it into production.

This is not for a lack of trying. On the topic of managing data science and AI projects, we have reputable university courses, published research, specialised tools and full-blown professional certifications. But despite this effort, we seem to be no closer to success, apart from having learned that deploying more people and money may, surprisingly, not be the answer.

And this is because resources are like computation. Throwing more resources at a problem does not get to the right answer — it just gets to the wrong answer more quickly.

We have been analysing data to support decisions since the early days of modern computation. And we have been implementing systems to help those decisions for almost as long. Why then, is bringing them together, so. darn. hard?

Project managers have a terrific skill set. Most fundamentally, they manage uncertainty through dealing with risks, issues, requests and offers — often racking them up by the dozens (or even hundreds) in a single project. In addition, they have intuition and stakeholder management skills that increase with experience, enabling them to grease the grooves of project execution processes and making them masters of going from requirements to solution.

And ironically it is this mastery in planning in an ‘end-to-end’ way from requirements to solution that is the downfall of many data science and AI projects. Because unless you have accounted for data uncertainty, starting with requirements is a trap.

The distinctive risk of data science and AI projects

The distinctive risk of data science projects is data uncertainty, which comes down to one question:

“Is there enough information in the data to develop models that are sufficiently useful?”

Without answering this question, there is a real danger of developing and scaling up a solution that does not deliver business value because model performance has not crossed a threshold that makes it worth using. If this sounds abstract, examples may include:

  • Predictive maintenance: In manufacturing environments, incidents and deviations often kick off robust mitigation strategies that ensure that modes of failure seldom repeat and are thus inherently hard to predict through traditional supervised learning approaches.

  • Sentiment analysis: Performance can vary significantly depending on the way models are developed and the nature of the data they are used on. If a positive-negative sentiment analysis tool is only 60% accurate, it may be close to useless considering a random guess gets you to 50%.

In each case, the project is ultimately infeasible due to limitations of the data at hand. And when even the feasibility of a project has not been established, conversations around architecture, automated pipelines and operations are meaningless. If a project is destined to fail due to the data, wouldn’t it be better to find out by committing one data scientist for two weeks rather than finding out six months later with a team of ten?
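
The sentiment analysis example can be made concrete by comparing a model against a baseline that ignores the input entirely. The sketch below is only an illustration: the function name `baseline_comparison`, the labels, and the test set are all hypothetical, and a majority-class baseline is one of several reasonable choices.

```python
from collections import Counter


def baseline_comparison(y_true, y_pred):
    """Compare model accuracy with always guessing the most common label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Majority-class baseline: the accuracy of a "model" that ignores its input.
    baseline = Counter(y_true).most_common(1)[0][1] / n
    # Share of the gap between the baseline and a perfect score that the model closes.
    gap_closed = (accuracy - baseline) / (1 - baseline) if baseline < 1 else 0.0
    return accuracy, baseline, gap_closed


# Hypothetical balanced test set: 50 positive and 50 negative reviews,
# with the model getting 30 of each class right (60% accuracy overall).
y_true = ["pos"] * 50 + ["neg"] * 50
y_pred = ["pos"] * 30 + ["neg"] * 20 + ["neg"] * 30 + ["pos"] * 20

accuracy, baseline, gap_closed = baseline_comparison(y_true, y_pred)
print(f"accuracy={accuracy:.2f} baseline={baseline:.2f} gap closed={gap_closed:.2f}")
```

On this made-up data the model closes only about a fifth of the gap between guessing and a perfect score, which is the kind of number worth seeing before any architecture discussion starts.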

Expressed in a different way, a data science or AI project is only valuable if you have a high predictive signal, or useful information in the data. If that is not present, the project is a lost cause regardless of the business value of the use case and the quality of the data available.

How then should we structure data science projects?

The root of the problem is actually STEM

The mess is, at its heart, a tangle that comes from taking a shortcut in thinking about the disciplines in STEM (Science, Technology, Engineering and Math). The acronym rolls off the tongue so easily that it is tempting to treat STEM professionals as a homogeneous group. But they are anything but.

Image source: Radical Abundance by Eric Drexler, from the Farnam Street blog

Referencing Eric Drexler’s explanation on science and engineering, not only are the two ‘not quite the same’, but seen through the lens of information flows, they are completely opposite. Science starts from reality, gathers data, then — looking at the problem through the lens of inquiry — ends with a new useful model. Engineering starts at the opposite endpoint. It starts with a model, adds detail through specification, then — looking at the problem through the lens of design — ends with a new useful reality.

This is relevant because data science and AI projects are effectively managed in exactly these two phases, broadly corresponding to the lenses of science and engineering:

  • ‘Science’ characterises the initial parts of data science projects where feasibility is in question. Starting from a broad problem scope or hypothesis, the main goal of this phase is to address the question of data uncertainty and determine if the project is feasible before it is scaled and ‘put into production’. Here, space is needed to be iterative and experimental. It is a world of defined effort with uncertain outcomes, and management permission must be given to try and fail in rapid succession in order to win the long game. At this point (and I say this as an IT professional) requirements and much of IT are inconsequential. In fact, requirements in the project management sense are the output of this phase, not its starting point.

  • At the point where requirements become clear, work transitions into ‘engineering’. This is the domain familiar to project managers: a world of defined outcomes, and of people, processes and technology. The vast realm of solution architecture choices invades your world here. Performance, security, automation and scalability become crucial. Managing to tight specifications and timelines is non-negotiable, and scope creep is an enemy to be squashed.

Image: The two ‘siblings’ of data science, distinct but friendly. Photo by Eye for Ebony on Unsplash

A major cause of data science project failure is this mashup of science and engineering, with long-term architectural and operational considerations being discussed before the project has even been deemed feasible and valuable. This dysfunction also extends to teams, where the idea of ‘maturing’ a data science team is equated with making it ‘more engineering’, or vice versa.

Both are needed, but each is relevant in different parts of a data science project. Each also requires distinctly different management paradigms and skill sets to thrive. Doing core research when there are available engineering solutions is unnecessary, while tighter engineering controls do not make science labs more creative.

Both science and engineering have to be done right for data science magic to happen.

In light of this perspective, there are a few implications worth noting:

  • The skill set in supporting a single decision is vastly different from the skills needed to develop systems to support the decision many times over. This is a root cause of confusion in data science roles. Doing data cleaning as part of exploring a modeling approach as a data scientist can be vastly different from setting up an automated system to execute the same task as a data engineer.

  • Training courses often cause confusion because they implicitly teach within the context of either the iterative science-like portion or the IT-heavy engineering portion.

  • Many data scientists double up as data engineers, model operations, and system testers, but the defining responsibility of the data scientist is dealing with data uncertainty and checking for the strength of the predictive signal. No other team member can fill that role.

  • It is absolutely fine for projects not to go into production if that is the result of feasibility studies failing early and sparing the business expensive failures later.

In summary, we should think of data science or AI projects in two phases. The first has a clear focus on dealing with data uncertainty by sending a small data science focused team to model with the data at hand to see if there is sufficient predictive signal for the use case to be feasible. If we fail here, we fail well.
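
One way a small team might frame that first-phase check is as an explicit go/no-go gate on held-out predictions. The sketch below is an assumption-laden illustration rather than a prescribed method: it uses a label permutation test to ask whether the model beats chance at all, then compares accuracy with a business-defined threshold. The names `permutation_p_value`, `feasibility_verdict`, `min_useful_accuracy` and `alpha`, and all the data, are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_p_value(y_true, y_pred, n_permutations=1000, seed=0):
    """Estimate how often shuffled labels score at least as well as the real ones."""
    rng = random.Random(seed)
    observed = accuracy(y_true, y_pred)
    shuffled = list(y_true)
    at_least_as_good = 0
    for _ in range(n_permutations):
        # Shuffling breaks any real link between predictions and labels.
        rng.shuffle(shuffled)
        if accuracy(shuffled, y_pred) >= observed:
            at_least_as_good += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_good + 1) / (n_permutations + 1)


def feasibility_verdict(y_true, y_pred, min_useful_accuracy, alpha=0.05):
    """Go/no-go decision for moving from the science phase to engineering."""
    if permutation_p_value(y_true, y_pred) >= alpha:
        return "stop: no signal beyond chance"
    if accuracy(y_true, y_pred) < min_useful_accuracy:
        return "stop: signal exists but is below the useful threshold"
    return "proceed to engineering"


# Hypothetical holdout set of 200 cases on which the model is 70% accurate,
# while the business (by assumption) needs at least 80% for the tool to be useful.
y_true = [1] * 100 + [0] * 100
y_pred = [1] * 70 + [0] * 30 + [0] * 70 + [1] * 30
verdict = feasibility_verdict(y_true, y_pred, min_useful_accuracy=0.8)
print(verdict)
```

Separating the two stop conditions keeps "there is nothing in the data" distinct from "there is something, but not enough yet", which lead to different follow-up decisions.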

Project management then takes the lead if and only if data uncertainty has been dealt with and a scalable IT solution is clearly the order of the day. This is where teams grow and the full range of IT competencies becomes crucial.

Data science and AI magic happens here.

Or at least it does, assuming that our application of data science and AI is responsible and ethical. But that is another story. And it's a long one.

All images displayed above are solely for non-commercial illustrative purposes. This article is written in a personal capacity and does not represent the views of the organizations I work for or am affiliated with.

Translated from: https://towardsdatascience.com/what-every-project-manager-should-know-about-managing-data-science-and-ai-projects-d13f3f8f62a
