A Little Pre-Work Can Help Sell Data Science Projects

Have you been trying to convince your bosses or colleagues that a certain data science project would benefit the business, only to have it dismissed or discounted even though you can see how valuable it would be?

Rejection is common, and the most frequent reasons are:

  • Can’t see the benefits
  • The current way is better, why would we risk change?
  • Seen as too costly or complicated to be worth the risk

The list can go on, but the fundamental result is that key stakeholders are just not engaged and on board, and you never get their sign-off.

So, What Can You Do?

As in an article I wrote previously on starting your projects by focussing only on the core technology that will drive them (rather than all the bells and whistles a finished project needs), you can increase your chances by keeping the ask simple. If you ask for a huge amount of resources or a large amount of money from people who have never seen the benefits that a well-planned, well-designed and well-executed data science project can bring, they will never agree.

The key here is to stick to:

KEEP IT SIMPLE

A nice way of doing this is, rather than delivering a full product, to stick to a Proof-of-Concept (POC) strategy. Here you try to deliver just enough to demonstrate whether a full project is feasible.

It also has the great benefit of giving you a better idea, for the full project, of:

  • What skills are needed
  • Total effort that would be required
  • Likely benefit (i.e. accuracy and competency of the solution compared to existing solutions)
  • Likely resistance (e.g. a feeling that it will put people out of a job, maybe even their own)

To help, I will give a basic example we can go through.

Tracking Those Potholes

[Image, source: Kaggle]

You work for a firm that manages roads for a local authority. As a new joiner to the firm you note that one of their main problems with maintenance on the aging roads is dealing with the number of potholes that require fixing each year.

Potholes require water ingress and traffic in order to form. Water gets into cracks in the top surface, then freezes and expands, pushing the road surface up and widening the cracks. When it thaws and contracts, a hole is left under the surface, which the weight of traffic then breaks up. This material is then lost, and the hole widens over time.

A nice illustrated graphic of this process is shown below:

[Image: illustration of the pothole formation process, source: Wikipedia]

Therefore, the number of potholes can increase over time as roads age and cracks start to form. If left to grow, a pothole can become big enough to damage vehicles (in some areas, around 40 mm deep is considered to need urgent repair), and claims can then be made against the councils, who claim the cost back from the company.

The company tries to repair potholes in a planned way, as a planned repair is 17% cheaper than an emergency repair. The issue they have is keeping track of potholes and monitoring the condition of roads.

Currently the firm has teams that drive around the road network it manages to examine and mark areas of road that require maintenance. However, they tend to spend a lot of time on high-volume roads, and the firm only has a few teams of trained personnel who can do this.

This means that many side roads are not actively monitored and rely on reports from the public to be brought to attention. Often this happens only when the potholes are already very large and require immediate repair.

Where’s the Data Science?

As a data scientist at the company we recognise that an opportunity exists. If we could get footage of the road network, we could apply a vision-based machine learning solution to recognise the holes. If we coupled it with a GPS, we could then build a map and mark where they are.
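
To make the "couple it with a GPS" idea concrete, here is a minimal sketch of how a pothole detection could be given a map position. It assumes (this is not from the original project) that the GPS logs time-stamped fixes and that each detection carries the timestamp of its video frame; the function names and the data layout are hypothetical.

```python
from bisect import bisect_left


def interpolate_position(gps_fixes, t):
    """Estimate (lat, lon) at time t by linear interpolation.

    gps_fixes is an assumed layout: a list of (timestamp_seconds, lat, lon)
    tuples sorted by timestamp.
    """
    times = [fix[0] for fix in gps_fixes]
    # Clamp to the first or last fix when t is outside the logged range.
    if t <= times[0]:
        return gps_fixes[0][1:]
    if t >= times[-1]:
        return gps_fixes[-1][1:]
    i = bisect_left(times, t)
    t0, lat0, lon0 = gps_fixes[i - 1]
    t1, lat1, lon1 = gps_fixes[i]
    w = (t - t0) / (t1 - t0)
    return (lat0 + w * (lat1 - lat0), lon0 + w * (lon1 - lon0))


def locate_detections(gps_fixes, detection_times):
    """Return one (timestamp, lat, lon) map marker per detection."""
    return [(t, *interpolate_position(gps_fixes, t)) for t in detection_times]
```

Linear interpolation between fixes is only a rough approximation, but for a POC map it is likely good enough to show roughly where along a road each pothole sits.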

Just think of it! We could get someone to drive along the network once a week or once a month with a dash-mounted camera and some electronics, and generate an up-to-date picture of the road quality. We could even add in some other issues (like foreign objects on the road or damaged manhole covers). Just think how this could revolutionise how the business works! We would go from a heavily reactive business to a proactive one.

Unfortunately, No One’s Buying Into It

[Image: photo by Tim Mossholder on Unsplash]

This is generally the point at which these projects go nowhere. While your boss or line manager may see that it is not a bad idea (after all, they wouldn’t have hired you otherwise), they probably have neither the budget nor the sign-off authority for what you are asking, nor the reach to get the different stakeholders involved, since the project crosses departments.

Those stakeholders are the ones you need to convince, and they aren’t keen. This can be because they’ve never seen what a successful data science project can do, but also think about what you are asking for:

  • Access to road vehicles that the teams need for their own jobs (perhaps for a while, to get enough example footage to train on)
  • Permission to add electronics and equipment to them
  • People (who are doing their main jobs) to run and use that equipment
  • Custom electronics and kit made that they can use (the GPS and video recording)
  • Computing resource and software design for the mapping and processing functions that generate the results dashboard (so IT may not be happy about this either)
  • Not only your time, but the time of others, for long enough to make this work. This could take a couple of months to pull together if you think about it

Add all of this together and you can see this is not a small resource request. If it doesn’t succeed, there is a lot to lose, and they would have to answer for it.

What Can We Do?

Well, this is where the POC thinking comes in. We just need to show that the key deliverable is possible.

This is a key thing to think about, because if it’s their first data science project and it goes badly wrong, they may never buy into another project again. Doing a lower-risk POC helps make sure you don’t lose the ability to try again.

What is the Key Deliverable?

[Image: photo by Kira auf der Heide on Unsplash]

Can we detect potholes to a reliable enough level that it is worth developing into a business capability?

We also want to thin this down so that we ask only for the resources we need to do this. If we think about what we need to show the key deliverable, all we need is:

  • Data we can use
  • A suitable model to use
  • Computing resource & time to run and prepare the data

This is where some resourcefulness can be useful. There are large repositories of open datasets which, while maybe not perfect, can fit the bill, and after a bit of foraging there is indeed a pothole dataset on Kaggle.

Model-wise, this is part of being a data scientist. I would probably take an existing image recognition model and use transfer learning, so that we don’t need a huge number of new images to retrain it and should also get better performance than if we trained from scratch.

This should all be possible on the computing resources you have at your disposal as a data scientist (e.g. a powerful laptop or access to a computing resource on the company server).

Note: If you don’t have access to any computing resource, now is a good time to tell your employer that you need it to do your job.

Build the Model

I used TensorFlow as my ML tool of choice and LabelImg to label the potholes in the raw images.

I labelled around 150 images and then split them into 100 for training and 50 for testing. This smaller curated set was useful, as I could select images that look similar to what I would expect from a dash camera (i.e. they look up the road in the direction a car would travel).
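
The 100/50 split can be made reproducible with a few lines of standard-library Python. This is only a sketch: the file names are hypothetical, and the fixed seed is an assumption added so that the same images land in the test set on every run.

```python
import random


def split_dataset(image_names, n_train, seed=42):
    """Shuffle deterministically, then cut into (train, test) lists."""
    shuffled = sorted(image_names)  # sort first so input order does not matter
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical file names standing in for the ~150 labelled images.
image_names = [f"pothole_{i:03d}.jpg" for i in range(150)]
train_images, test_images = split_dataset(image_names, n_train=100)
```

Keeping the test images fixed matters for a POC, because the headline figure shown to stakeholders should come from images the model never saw during training.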

[Image: Labelling potholes on our dataset]

I then set my model training and looked at the output. I’ve put an example below, and the results were pretty good. I set myself a deadline of just a day, so I had to stop the model training early, but it looked like it was achieving around 70% accuracy, which is pretty good for a draft model!

[Image: Initial POC model identifying potholes in an image it has never seen]
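
The article does not say how the 70% figure was measured. One common way to score a detector on held-out images, shown here purely as an illustrative sketch and not as the method actually used, is to count a labelled pothole as found when some predicted box overlaps it with an intersection-over-union (IoU) above a threshold. The 0.5 threshold and the function names are assumptions.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fraction_found(labelled, predicted, threshold=0.5):
    """Share of labelled boxes matched by a distinct predicted box."""
    unused = list(predicted)
    hits = 0
    for truth in labelled:
        # Greedy matching: take the best remaining prediction for this label.
        best = max(unused, key=lambda p: iou(truth, p), default=None)
        if best is not None and iou(truth, best) >= threshold:
            hits += 1
            unused.remove(best)  # each prediction can match only one label
    return hits / len(labelled) if labelled else 0.0
```

Whatever metric is used, it helps to state it alongside the number, so that stakeholders know whether "70%" means potholes found, images classified correctly, or something else.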

The Result

So, what do we do with this? What we have done is take some very rough data, do some rough cleaning and quick training on it, and get very good results. We know we could do better with more data that is in line with what we expect (a lot of the images aren’t quite representative of dash cam footage), and we only very quickly trained a model on a small dataset and still got a good result.

We can confidently say that we would expect performance to be at or above this level. We also have images and results that we can compile into a defined path forward, using real images to show the kind of results a full system might produce. For example, we can now show them a design like this:

[Image: example design]

While it may not look important, being able to show actual results and give people a better handle on things can help, and with some numbers we can start to make a business case.

In our example, we find:

  • The survey teams tend to miss the potholes on more minor roads, as priority is given to higher-volume roads
  • We would hope to identify at least 70% of the potholes just by taking dash camera footage of the roads
  • This footage can be taken by anyone driving the route and doesn’t rely on the limited number of pothole-trained team members
  • Proactive repair is 17% cheaper than reactive repair

Combining these, we can then say that we would expect to save 17% on 70% of the potholes that are there.

What Now? Ground It in Real Numbers

[Image: photo by Fabian Blank on Unsplash]

From here I would encourage getting some firm numbers, so that the stakeholders can see the potential and judge whether the cost of running a full project would be worthwhile.

For example, if we find that:

  • It costs £49 to proactively repair each pothole
  • It costs £60 to reactively repair each pothole
  • The roads that are rarely surveyed tend to have around 100 pothole repairs per month
  • 5 per month are dangerously large and incur claimable vehicle repair costs of £300 per dangerous pothole (separate from the pothole repair cost)
  • We assume all these potholes are currently reactive repairs following reports from the public

We can now say that currently the costs are:

  • The current system costs £72,000 per year to repair 1,200 potholes
  • Claimable repair costs are £18,000 per year for the 60 dangerous potholes

With this new system we can say we would hope for at least:

  • £41,160 (840 proactive repairs) and £21,600 (360 reactive repairs) for pothole repairs
  • £5,400 for claimable repair costs (18 dangerous potholes)

Costs drop from £90,000 to £68,160 per year, saving the company £21,840 per year.
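
The arithmetic behind these figures can be checked with a short script. It is a sketch using the illustrative numbers above, and it makes explicit one assumption that the figures imply: a dangerous pothole that is detected early is repaired before it leads to a claim. The function and variable names are hypothetical.

```python
def annual_cost(potholes, dangerous, detected_percent,
                reactive_cost=60, proactive_cost=49, claim_cost=300):
    """Yearly cost in pounds when detected_percent of potholes are caught early."""
    proactive = potholes * detected_percent // 100
    reactive = potholes - proactive
    # Assumption: a dangerous pothole caught early never leads to a claim.
    claims = dangerous - dangerous * detected_percent // 100
    return (proactive * proactive_cost
            + reactive * reactive_cost
            + claims * claim_cost)


potholes_per_year = 100 * 12   # around 100 repairs per month
dangerous_per_year = 5 * 12    # 5 dangerous potholes per month

current = annual_cost(potholes_per_year, dangerous_per_year, detected_percent=0)
with_detection = annual_cost(potholes_per_year, dangerous_per_year, detected_percent=70)
saving = current - with_detection
print(current, with_detection, saving)  # prints: 90000 68160 21840
```

Putting the model in a function also makes it easy to show stakeholders how the saving changes if the detection rate turns out lower or higher than 70%.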

Stakeholders can then weigh this estimated minimum saving against the risks and the project costs. You could also expand on the savings by showing how the approach could benefit the wider network. All this from a day of work!

With this sort of POC you can go from a vague proposition to a much firmer position and the start of a business case to get people on board.

Summary

[Image: photo by Daniil Kuželev on Unsplash]

We went from a grand project vision with only subjective savings and costs, to a small resource cost (which your manager can probably give you) with a POC whose results can start to give ideas of resourcing and savings. From there you can build a case that will increase your chances of selling a data science project.

This tends to be a strategy I use throughout my work. Instead of trying to sell and start with huge projects, a smaller scope project revolving around the key deliverables can often lower the risks and increase your chances of success. It will also tend to mean you do a lot of different things, which is good if you like variety.

Note: I’ve tried to use realistic numbers wherever possible. However, in some cases I’ve used numbers that seemed reasonable. These are of course for illustrative purposes only.

Original article: https://towardsdatascience.com/a-little-pre-work-can-help-sell-data-science-projects-1cd32c94fe19
