

Have you tried to convince your bosses or colleagues that a data science project would benefit the business, only to have it dismissed or discounted even though you can see how valuable it would be?


The most common reasons for rejection are:


  • Can’t see the benefits

  • The current way is better, why would we risk change?

  • Seen as too costly or complicated to be worth the risk


The list goes on, but the fundamental result is that key stakeholders are not engaged and on board, and you never get their sign-off.


So, What Can You Do?

Similar to an article I wrote previously on starting your projects by focussing only on the core technology that will drive them (rather than all the bells and whistles a finished project needs), you can increase your chances by keeping the ask simple. If you ask for a huge amount of resources or money from people who have never seen the benefits that a well-planned, well-designed and well-executed data science project can bring, then they’ll never agree.

The key here is to stick to:




A nice way of doing this is to stick to a Proof-of-Concept (POC) strategy rather than delivering a full product. Here you try to deliver just enough to demonstrate whether a full project is feasible.

It also has the great benefit of giving you a better idea of what the full project would involve:


  • What skills are needed

  • Total effort that would be required

  • Likely benefit (i.e. accuracy and competency of the solution compared to existing solutions)

  • Feel it will put people out of a job (maybe even their own)


To help, I will walk through a basic example.


Tracking Those Potholes

Image from Kaggle

You work for a firm that manages roads for a local authority. As a new joiner to the firm, you note that one of its main maintenance problems on the aging roads is dealing with the number of potholes that require fixing each year.

Potholes require water ingress and traffic in order to form. Water gets into cracks in the top surface, then freezes and expands, pushing the road surface up and widening the cracks. When it thaws and contracts, a hole is left under the surface, which the weight of traffic then breaks up. This material is then lost, and the hole widens over time.

A nice illustrated graphic of this process is shown below:


Image from Wikipedia

Therefore, potholes can increase over time as roads age and cracks start to form. If left to grow (around 40 mm deep is considered to need urgent repair in some areas), a pothole can become big enough to damage vehicles, and claims can be made against the council (which then claims the money back from the company).

The company tries to repair potholes and a planned repair is 17% cheaper than an emergency repair. The issue they have is keeping track of potholes and monitoring the condition of roads.


Currently the firm has teams that drive around the road network it manages to examine and mark areas of road that require maintenance. However, they tend to spend a lot of time on high-volume roads, and the firm only has a few teams of trained personnel that can do this.

This means that lots of side roads are not actively monitored and rely on reports from the public to bring problems to attention. Often, by that point the potholes are already very large and require immediate repair.


Where’s the Data Science?

As a data scientist at the company we recognise that an opportunity exists. If we could get footage of the road network, we could apply a vision-based machine learning solution to recognise the holes. If we coupled it with a GPS, we could then build a map and mark where they are.


Just think of it! We could get someone to drive along the network once a week or once a month with a dash-mounted camera and some electronics, and generate an up-to-date picture of the road quality. We could even add in some other issues (like foreign objects on the road or damaged manhole covers). Just think how this could revolutionise how the business works! We would go from a heavily reactive business to a proactive one.
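To make the GPS pairing a little more concrete, here is a minimal sketch of how a detection's video timestamp could be turned into a map position by interpolating between GPS fixes. The function name `position_at`, the data layout and all the coordinates are made up for illustration; a real system would need to handle clock drift, dropped fixes and so on.

```python
from bisect import bisect_left


def position_at(timestamp, gps_fixes):
    """Linearly interpolate (lat, lon) at `timestamp`.

    `gps_fixes` is a list of (time_seconds, lat, lon) tuples sorted by time.
    Timestamps outside the track are clamped to the first or last fix.
    """
    times = [t for t, _, _ in gps_fixes]
    i = bisect_left(times, timestamp)
    if i == 0:
        return gps_fixes[0][1:]
    if i == len(gps_fixes):
        return gps_fixes[-1][1:]
    t0, lat0, lon0 = gps_fixes[i - 1]
    t1, lat1, lon1 = gps_fixes[i]
    w = (timestamp - t0) / (t1 - t0)  # fraction of the way from fix 0 to fix 1
    return (lat0 + w * (lat1 - lat0), lon0 + w * (lon1 - lon0))


# Hypothetical GPS track: one fix per second (time, latitude, longitude)
gps_fixes = [
    (0.0, 51.5000, -0.1000),
    (1.0, 51.5002, -0.1000),
    (2.0, 51.5004, -0.1002),
]

# Hypothetical times (seconds into the video) at which a pothole was detected
detection_times = [0.5, 1.5]

pothole_map = [position_at(t, gps_fixes) for t in detection_times]
for t, (lat, lon) in zip(detection_times, pothole_map):
    print(f"pothole at t={t}s -> lat={lat:.4f}, lon={lon:.4f}")
```

Linear interpolation is only a reasonable assumption over short gaps between fixes, which is why the sketch uses one fix per second.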


Unfortunately, No One’s Buying Into It Though

Photo by Tim Mossholder on Unsplash

This is generally where you find these projects go nowhere. While your boss or line manager may see that it is not a bad idea (of course, they wouldn’t have hired you otherwise), they probably do not have the budget or sign-off for what you are asking for, nor the reach to get the different stakeholders involved, as the project crosses departments.

Those stakeholders are the ones you need to convince, and they aren’t keen. This can be because they’ve never seen what a successful data science project can do, but also think about what you are asking for:

  • Access to road vehicles needed for their own jobs (perhaps for a while to get enough example footage to train on)

  • Permission to add electronics and equipment to them

  • People (who are doing their main jobs) to run and use that equipment

  • Custom electronics and kit made for them to use (the GPS and video recording)

  • Computing resource and software design for the mapping and processing functions to generate the results dashboard (so IT may not be happy about this either)

  • Not only your time, but the time of others for a sufficient period to make this work. This could take a couple of months to pull together if you think about it

Add all of this together and you can see this is not a small resource request. If it doesn’t succeed, that could be a lot to lose, and they’d have to answer for it.


What Can We Do?

Well, this is where the POC thinking comes in. We just need to show that the key deliverable is possible.


This is a key thing to think about, because if it’s their first data science project and it goes badly wrong, they may never buy into another project again. Doing a lower risk POC might enable you to make sure you don’t lose the ability to try again.


What is the Key Deliverable?

Photo by Kira auf der Heide on Unsplash

Can we detect potholes to a reliable enough level that it is worth developing into a business capability?


We also want to thin this down so that we ask only for enough resources to answer that question. If we think about what we need to show the key deliverable, all we need is:

  • Data we can use

  • A suitable model to use

  • Computing resource & time to run and prepare the data


This is where some resourcefulness can be useful. There are often large repositories of open-source datasets which, while maybe not perfect, can fit the bill. After a bit of foraging, there is indeed a pothole dataset on Kaggle.

Model-wise, choosing an approach is part of being a data scientist. I would probably take an existing image recognition model and apply transfer learning, so we don’t need a huge number of new images to re-train it and should get better performance than if we trained from scratch.


This should all be possible on the computing resources you have at your disposal as a data scientist (e.g. a powerful laptop or access to a computing resource on the company server).


Note: If you don’t have access to any computing resource, now is a good time to tell your work that you need it to do your job.


Build the Model

I used TensorFlow as my ML tool of choice and LabelImg to label the raw images with potholes.
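As a rough illustration of what the labelling step produces, the sketch below reads bounding boxes from an annotation file. It assumes the labels were saved in the Pascal VOC XML format that LabelImg can write (the article doesn't say which format was used), and the sample annotation, file name and `read_boxes` helper are invented for the example.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in Pascal VOC style, with two labelled potholes
SAMPLE_ANNOTATION = """<annotation>
  <filename>road_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>pothole</name>
    <bndbox><xmin>120</xmin><ymin>300</ymin><xmax>260</xmax><ymax>380</ymax></bndbox>
  </object>
  <object>
    <name>pothole</name>
    <bndbox><xmin>400</xmin><ymin>350</ymin><xmax>520</xmax><ymax>420</ymax></bndbox>
  </object>
</annotation>"""


def read_boxes(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...]) from VOC XML."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # Pixel coordinates of the box corners, in the order used above
        coords = tuple(
            int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.findtext("name"),) + coords)
    return root.findtext("filename"), boxes


filename, boxes = read_boxes(SAMPLE_ANNOTATION)
print(filename, boxes)
```

For real files, the same function could be fed the text of each `.xml` file that sits next to its image.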


I labelled around 150 images and then split them into 100 for training and 50 for testing. This smaller curated set was useful, as I could select images that look as similar as possible to what I would expect from a dash camera (i.e. they look up the road in the direction a car would travel).
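A 100/50 split like this takes only a few lines to do reproducibly. The sketch below is one way it might look; the file names, the `split_dataset` helper and the fixed seed are assumptions for illustration, not the exact procedure used here.

```python
import random


def split_dataset(image_names, n_train, seed=42):
    """Shuffle the image names with a fixed seed and split into train/test."""
    names = sorted(image_names)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(names)
    return names[:n_train], names[n_train:]


# Hypothetical file names standing in for the 150 labelled images
image_names = [f"pothole_{i:03d}.jpg" for i in range(150)]

train_images, test_images = split_dataset(image_names, n_train=100)
print(len(train_images), "training images,", len(test_images), "test images")
```

Fixing the seed means the same images land in the test set on every run, so results from different training runs stay comparable.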


Labelling Potholes on our Dataset

I then set my model training and looked at the output. I’ve put an example below; the results were pretty good. I set myself a deadline of just a day, so I had to stop the model training early, but it looked like it was achieving around 70% accuracy, which is pretty good for a draft model!

Initial POC model identifying potholes in an image it has never seen
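The article doesn't say exactly how the 70% figure was measured. One common way to score bounding-box detections, shown here purely as an assumed sketch, is to count a labelled pothole as found when a predicted box overlaps it with an intersection-over-union (IoU) of at least 0.5. The boxes and the `detection_rate` helper below are made up for illustration.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def detection_rate(truth_boxes, predicted_boxes, threshold=0.5):
    """Share of labelled boxes matched by a prediction with IoU >= threshold.

    Each prediction can be matched to at most one labelled box.
    """
    unused = list(predicted_boxes)
    found = 0
    for truth in truth_boxes:
        best = max(unused, key=lambda pred: iou(truth, pred), default=None)
        if best is not None and iou(truth, best) >= threshold:
            found += 1
            unused.remove(best)
    return found / len(truth_boxes) if truth_boxes else 0.0


# Hypothetical test image: two labelled potholes, one of which the model found
truth_boxes = [(120, 300, 260, 380), (400, 350, 520, 420)]
predicted_boxes = [(125, 305, 265, 385)]

rate = detection_rate(truth_boxes, predicted_boxes)
print(f"detection rate: {rate:.0%}")  # prints "detection rate: 50%"
```

Averaged over the 50 test images, a number like this is what stakeholders would read as "the share of potholes we would catch".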

The Result

So, what do we do with this? Well, what we have done is take some very rough data, do some rough cleaning and quick training on it, and get very good results. We know the model had limited, imperfect data (a lot of the images aren’t quite in line with dash cam footage) and that we only very quickly trained it on a small dataset, yet it still gave a good result.

We can confidently say that we would expect performance to be at or above this level. We also have images and results that we can compile into a defined path, using real images to show comparable results to what a full system might produce. For example, we can now show them a design like this:

It may not look important, but being able to show actual results and give a better handle on things can help, and with some numbers we can start to make a business case.

In our example, we find:


  • The survey teams tend to miss potholes on more minor roads, as priority is given to higher-volume roads

  • We would hope to identify at least 70% of the potholes just by taking dash camera footage of the roads

  • This footage can be taken by anyone driving the route and doesn’t rely on the limited number of pothole-trained staff

  • Proactive repair is 17% cheaper than reactive repair


Combining these, we can say that we would expect to save 17% on 70% of the potholes that are there.


What Now? Ground It in Real Numbers

Photo by Fabian Blank on Unsplash

From here I would encourage getting some firm numbers so that the stakeholders can see the potential and judge whether the cost of running a full project would be worthwhile.


For example, suppose we find the following:


  • It costs £49 to proactively repair each pothole

  • It costs £60 to reactively repair each pothole

  • The roads that are rarely surveyed tend to have around 100 pothole repairs per month

  • 5 per month are dangerously large and incur claimable vehicle repair costs of £300 per dangerous pothole (separate from the pothole repair cost)

  • We assume all these potholes are currently repaired reactively after a report from the public


We can now say that the current costs are:


  • Repairs currently cost £72,000 per year for 1,200 potholes

  • Claimable repair costs are £18,000 per year for the 60 dangerous potholes


With this new system we can say we would hope for at least:


  • £41,160 for 840 proactive repairs and £21,600 for the remaining 360 reactive repairs

  • £5,400 for claimable repair costs


Costs fall from £90,000 to £68,160 per year, saving the company £21,840 per year.
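The arithmetic behind these figures is easy to put into a small script, so stakeholders can swap in their own numbers. The sketch below uses the illustrative values from this example; the variable and function names are my own, and it assumes dangerous potholes are caught at the same 70% rate as the rest, which is what the £5,400 figure implies.

```python
PROACTIVE_COST = 49           # £ per planned repair
REACTIVE_COST = 60            # £ per emergency repair
CLAIM_COST = 300              # £ per dangerous pothole, on top of the repair
POTHOLES_PER_YEAR = 100 * 12  # rarely surveyed roads
DANGEROUS_PER_YEAR = 5 * 12
DETECTION_RATE = 0.70         # share of potholes the POC suggests we can find


def yearly_cost(detection_rate):
    """Total yearly cost (£) of repairs plus claims for a given detection rate."""
    found = round(POTHOLES_PER_YEAR * detection_rate)
    missed = POTHOLES_PER_YEAR - found
    repairs = found * PROACTIVE_COST + missed * REACTIVE_COST
    # Assumption: dangerous potholes are detected at the same rate as the others
    dangerous_missed = round(DANGEROUS_PER_YEAR * (1 - detection_rate))
    return repairs + dangerous_missed * CLAIM_COST


current = yearly_cost(0.0)  # everything is a reactive repair today
with_system = yearly_cost(DETECTION_RATE)

print(f"current:     £{current:,}")                # £90,000
print(f"with system: £{with_system:,}")            # £68,160
print(f"saving:      £{current - with_system:,}")  # £21,840
```

Changing `DETECTION_RATE` shows how sensitive the saving is to model performance, which is a useful thing to have to hand when the 70% figure gets challenged.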


This estimated minimum saving can then be evaluated by stakeholders to weigh the risks and rewards against the project costs. You could also expand the savings by showing how the system could affect the wider network as well. All this from a day of work!

With this sort of POC you can go from a vague proposition to a much firmer position and the start of a business case to get people on board.


Summary

Photo by Daniil Kuželev on Unsplash

We went from a grand project vision with only subjective savings and costs, to a small resource cost (which your manager can probably give you) with a POC whose results can start to give ideas of resourcing and savings. From there you can build a case that will increase your chances of selling a data science project.


This tends to be a strategy I use throughout my work. Instead of trying to sell and start with huge projects, a smaller scope project revolving around the key deliverables can often lower the risks and increase your chances of success. It will also tend to mean you do a lot of different things, which is good if you like variety.


Note: I’ve tried to use realistic numbers wherever possible. However, in some cases I’ve used numbers that seemed reasonable. These are of course for illustrative purposes only.

Translated from: https://towardsdatascience.com/a-little-pre-work-can-help-sell-data-science-projects-1cd32c94fe19


