Data Science and Big Data Technology Cases: A Case Study of Solving Problems as a Data Scientist

There are two myths about how data scientists solve problems. One is that the problem simply exists, so the data scientist's challenge is to pick an algorithm and put it into production. The other is that data scientists always reach for the most advanced algorithms, as if a fancier model meant a better solution. While these myths are not entirely groundless, they represent two common misunderstandings of how data scientists work: one overemphasizes the “execution” side, and the other overstates the “algorithm” part.

Obviously, these myths are not how we actually solve problems. From my perspective, problem-solving for a data scientist is:

  • more about “how to abstract the problem out of the business context”, not just “being handed a specific task”
  • more about “solving the problem with an algorithm”, not just “using the best algorithm to solve a problem”
  • more about “iteratively delivering business value”, not just “implementing the code and calling it a day”.

With that said, I have observed that the problem-solving process usually involves four stages. I would like to share what the four stages are, how they work in action through a case study, and how we can get there with the right mindsets.

The story starts with, once upon a time …

My first job was at a company that operates an automotive pricing and information website, and it went through its initial public offering (IPO) in May 2014. It was a great experience, and I vividly remember everyone around me cheering that day for the birth of a public company. As a public company, our revenue started to receive a lot of attention, especially with the first quarterly earnings report coming out in August. In early July, the director of the revenue department came to the data scientists' seating area, and he did not look like he had good news to share.

“We are in trouble: a percentage of the sales revenue cannot be credited appropriately. We need your help.”

Here is some relevant context: the company generates revenue by bringing more sales to car dealers. To collect the commission we have earned, we need to match each vehicle sale to the correct customer. If our data providers tell us which customer bought which vehicle, the matching is done and no extra effort is needed. The problem is that one data provider decided to stop providing 1-to-1 sale records and to report sales in batches instead (a visualization of a “batch” is shown below), which makes it much harder, and much less certain, to know which customer bought which car.

[Figure: visualization of a “batch”]
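
To make the difference between a 1-to-1 sale record and a batch concrete, here is a minimal sketch in Python. The original figure is not available, so the shape of a batch used here (a group of customers and a group of sold vehicles delivered together with no link between them) is an assumption, and the names `SaleRecord`, `Batch`, and `candidate_pairs`, as well as the sample IDs, are hypothetical and for illustration only.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass(frozen=True)
class SaleRecord:
    """A 1-to-1 record: the provider states who bought which vehicle."""
    customer_id: str
    vehicle_id: str


@dataclass(frozen=True)
class Batch:
    """Assumed batch shape: customers and sold vehicles arrive together, unlinked."""
    customer_ids: Tuple[str, ...]
    vehicle_ids: Tuple[str, ...]


def candidate_pairs(batch: Batch) -> List[Tuple[str, str]]:
    """List every (customer, vehicle) pairing that the batch leaves open."""
    return list(product(batch.customer_ids, batch.vehicle_ids))


# Hypothetical data, for illustration only.
one_to_one = SaleRecord(customer_id="C1", vehicle_id="V1")
batch = Batch(customer_ids=("C1", "C2", "C3"), vehicle_ids=("V1", "V2"))

pairs = candidate_pairs(batch)
print(len(pairs))  # 3 customers x 2 vehicles = 6 candidate pairings
```

With a 1-to-1 record there is nothing left to decide, whereas even this tiny batch leaves six candidate pairings open, which hints at why matching by hand does not scale.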

The revenue team was surprised by this change. After spending the past month trying to solve the problem, they had manually recovered only 2% of the sales from that data provider. This would be bad news for the first earnings call, so they came to the data scientists for help. It was clearly an urgent problem, so we jumped right on it.

Stage 1. Understand the problem, and then redefine it using mathematical terms

This is the first stage of problem-solving in data science. For the “understand the problem” part, one needs to clearly identify the pain points, so that once the pain points are resolved, the problem goes away. The “redefine the problem” part is usually the reason a problem needs a data scientist's help.

For the specific problem raised by our revenue team, the problem is: we cannot assign each sold vehicle to a customer, so we lose the revenue.

The pain point is: finding who purchased a vehicle in a given batch is manual and inaccurate. Considering that thousands of batches need their sales matched, doing so is very time-consuming and not sustainable.

The “redefined” problem in mathematical terms is: given a batch with customer C1,
