How to Apply the Agile Framework to Data Science Projects


In this article, we'll discuss how agile principles and values can be applied to the way you approach data science projects.


Project management methodologies are commonly used to get projects done or to get a product (often referred to as a tool) produced. They are, in general, processes and frameworks that break down the overall objective into individual tasks organised on a timeline. These can be adapted and used to approach data science projects.

In the past, the traditional Waterfall methodology (dating back to 1970) was very popular. It defines all requirements and parameters of the product at the start, so that the project team can work towards this target in sequential phases.

This method has been successful in the manufacturing industry where product specifications seldom vary with time. It requires very extensive upfront planning, and ideally, the output product is exactly the same as specified in the beginning.


But the Waterfall methodology turned out to be a poor fit for software projects. Because of this, many project management methodologies have emerged over the years, especially in the software development industry. Let me share the most popular one.

Agile Framework

Agile is a way of working developed in 2001, and it is widely used to manage software development projects. It is suitable for fast-paced development cycles and makes provision for changing specifications throughout the design and build process. It is flexible, and strives for iterative, incremental improvement in the product through team collaboration. In short, Agile is to plan, build, test, learn, repeat.

Through iterative work processes, Agile teams respond to unpredictable requirements as the project unfolds. Below are the Agile principles which serve as a framework (guideline) for this way of working:

  • Customer satisfaction through early and continuous software delivery
  • Accommodate changing requirements throughout the development process
  • Frequent delivery of working software, as the working software is the primary measure of progress
  • Collaboration and interaction between the business stakeholders (client) and developers (vendor) throughout the project, including face-to-face communication within the development team
  • Support, trust, and motivate the people involved
  • Agile frameworks to support a consistent development pace
  • Attention to technical detail and design enhances agility
  • Simplicity in looking for solutions
  • Regular reflections in the self-organising team on how to become more effective

Agile projects are characterized by a series of tasks that are conceived, executed and adapted as the situation demands. However, the focus of Agile is not on what to do, but on how to think. Agile values and prioritises:

  • Individuals and interactions (rather than processes and tools)
  • Working software (rather than comprehensive documentation)
  • Customer collaboration (rather than contract negotiation)
  • Response to change (rather than following a predefined rigid plan)

Agile practices and Data Science

Agile principles and priorities were developed to make software teams more productive, and most of them can be leveraged for data science (DS) projects as well.

Moreover, data scientists often struggle to schedule a project, because it is very hard to fix a specific timeline for “research” and exploratory work. Most DS projects require trial and error: going down different paths and trying different techniques. The output carries no element of certainty, so Agile can be used to direct the workflow.

Most other projects deal with what customers want, what the developers want, and what the business seeks. When working with DS, another perspective is added: what the data is telling you.


Data scientists cannot make any sense of the data unless they develop a basic understanding of it. This takes a lot of investigation, exploration, testing and tuning. Agile uses iteration and constant feedback to refine a system under development and so move up the Data-Value Pyramid.

When working on DS projects, insights are not immediately achievable. Multiple iterations are needed before any insights can be discovered.


How agile practices can be applied

I will explain the main Agile working practices (Scrum framework), and how they can be applied to DS:


Define the business need and the project objective. This is usually driven by the product owner who is responsible for the product features and quality. It is the big picture stuff, but this is the core belief that you will refer back to as you build.


In DS, the product owner could be the client, the business, or the end customer (for example, end user of a prediction tool). Understand what problems the product owner is facing and tailor the project proposal to meet their needs.


Build the backlog. Focusing on the user requirements (“user stories” in Agile), derive a list of the tasks you need to accomplish to build product features or improve product performance.

The DS team builds the backlog together with the product owner to determine the product features and performance targets. The backlog could start with getting the data into a structured form so that it can be analysed. After that, it could hold a list of feature selection or feature engineering tasks, or a list of models to select, tune and optimise.

Prioritise the backlog. Identify the backlog tasks which will bring the most value with the least effort.

In DS, not every approach is worth trying, so cover the most promising ones first. Once the main ones are covered, you might find that the remaining ones are not as important as initially thought.
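One way to make "most value with the least effort" concrete is to score each backlog item by its value-to-effort ratio and sort on that. The sketch below is only an illustration: the `BacklogItem` class, the 1 to 10 value scale, the effort estimates in days, and the example tasks are all assumptions, not something prescribed by Scrum or by this article.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    name: str
    value: int   # assumed scale: estimated business value, 1 (low) to 10 (high)
    effort: int  # assumed unit: estimated effort in person-days


def prioritise(backlog):
    """Return backlog items sorted by value per unit of effort, highest first."""
    return sorted(backlog, key=lambda item: item.value / item.effort, reverse=True)


# Hypothetical DS backlog, for illustration only.
backlog = [
    BacklogItem("Tune gradient boosting model", value=6, effort=5),
    BacklogItem("Clean and structure raw data", value=9, effort=3),
    BacklogItem("Engineer date features", value=4, effort=1),
    BacklogItem("Try deep learning model", value=5, effort=10),
]

for item in prioritise(backlog):
    print(f"{item.name}: {item.value / item.effort:.2f} value per day")
```

In practice the value and effort numbers are rough team estimates, so the ranking is a starting point for discussion with the product owner rather than a final answer.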

Do a sprint (the actual development work). Sprints are usually two-week cycles in which the high-priority tasks on the backlog are worked on.

In DS, each sprint could be two to four weeks depending on the team size. During the sprint, always complete the task with the highest priority before moving on to the next in line.

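The rule of working strictly in priority order can be sketched as a simple sprint-planning helper. This is a minimal sketch under stated assumptions: the function name `plan_sprint`, the capacity of 10 person-days, and the task list are hypothetical, and real teams may reasonably choose to pull in a smaller lower-priority task when the next one does not fit.

```python
def plan_sprint(prioritised_tasks, capacity_days):
    """Take tasks in priority order until the next one no longer fits.

    prioritised_tasks: list of (name, effort_days), highest priority first.
    Returns the selected task names and the unused capacity.
    """
    selected = []
    remaining = capacity_days
    for name, effort_days in prioritised_tasks:
        if effort_days > remaining:
            break  # stop here instead of skipping ahead to lower-priority work
        selected.append(name)
        remaining -= effort_days
    return selected, remaining


# Hypothetical, already prioritised backlog: (task, estimated person-days).
tasks = [
    ("Clean and structure raw data", 3),
    ("Engineer date features", 1),
    ("Tune gradient boosting model", 5),
    ("Try deep learning model", 10),
]

sprint_tasks, spare_days = plan_sprint(tasks, capacity_days=10)
print(sprint_tasks, spare_days)
```

With these assumed numbers, the first three tasks fill nine of the ten days, and the fourth is left on the backlog for a later sprint.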

Have daily standups. Standup meetings are for team members to be accountable to one another on their progress in the current sprint. Team members take turns reporting their status: what was done the day before, what they will do today, and any potential obstacles. The most effective communication happens when DS team members meet face-to-face to share their work.

Review the sprint output (sprint review meeting). At the end of the two weeks, there should be a functional output for the project team to demonstrate, showing an incremental improvement in the product.

Data scientists should share the outputs before trying to perfect the processes. Get feedback from client stakeholders and prepare for the next sprint. Regular feedback is a key principle for the Agile way of iterative incremental improvement.


Prepare for the next sprint. Identify the practices that are going well and keep doing them, and identify the impediments that need to be removed.

It is important to understand that, unlike software development, DS is more experiment-based than task-based. DS helps explore data so it should be treated as multiple research experiments. Once again, build and prioritise the backlog so that the next sprint can be carried out, to work on the next improvement areas.


Roll out the final product. When all stakeholders agree that no more improvement is needed in the product, it is ready for the final deployment.


DS projects follow the “law of diminishing improvement”. For example, if a model has achieved 70% accuracy, the next 5–10% improvement will take a lot more effort than before, and it also depends on the limitations in the data set. Decide in the team whether the efforts are worth the incremental improvement.

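The decision about whether further effort is worth it can be supported by a simple check on the gain from the latest sprint. The following is a rough sketch, not a recommended rule: the function name `worth_continuing`, the one-percentage-point threshold, and the accuracy figures are all assumptions made up for illustration.

```python
def worth_continuing(accuracy_history, min_gain=0.01):
    """Return True if the latest sprint improved accuracy by at least min_gain.

    accuracy_history: accuracy after each sprint, as fractions between 0 and 1.
    min_gain: assumed threshold the team agrees on with the product owner.
    """
    if len(accuracy_history) < 2:
        return True  # not enough evidence yet to call it a plateau
    latest_gain = accuracy_history[-1] - accuracy_history[-2]
    return latest_gain >= min_gain


# Hypothetical accuracy after each of four sprints.
history = [0.55, 0.70, 0.74, 0.745]

for sprint in range(1, len(history) + 1):
    decision = worth_continuing(history[:sprint])
    print(f"After sprint {sprint}: continue tuning? {decision}")
```

A check like this only flags a plateau. Whether to stop still depends on what the extra accuracy is worth to the business and on the limitations of the data set.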

Challenges with the client

Besides having adequate communication between the DS team and the client, the client’s expectations have to be managed.


Clients generally love the idea that Agile is flexible, and that it grants them more opportunities to change their mind as the project develops. However, they might not realise that such flexibility is also costly in both time and money. Here are some things you should do:

The cost of flexibility

Get the client to understand that flexibility is inevitably expensive. Think of airline tickets: a flexible full-fare economy ticket that allows itinerary changes costs much more than a fixed one. Making changes also means that the client is paying for time and effort already spent on work that gets discarded.

Set expectations

Make sure the client expects to commit time to frequent sprint review meetings (e.g. every two weeks) to evaluate the completed sprints.

On top of that, the client representative in each meeting needs to be empowered by higher management to make decisions on product specifications. For Agile to work, the client needs to provide continuous feedback and priority setting to keep the project moving.

Trust is important

Earn the client’s trust and show them that each iteration is done with the best possible efforts to deliver value and improve the product.


While holding the decision-making power, the client may also expect every iteration to deliver a tremendous improvement.

Such an imbalance of responsibility in the client-vendor relationship should be converted into mutual trust and a willingness to experiment together. Agile’s principle of collaboration means that both making decisions and delivering value are a team effort.

Minimum Viable Product

One key feature of the Agile way of working is the development of a minimum viable product (MVP). This is the most fundamental configuration of the product (or tool).


After the project objectives have been defined, the team makes a proposal regarding the approach to the problem. This includes building the MVP within the shortest possible time (for example, one month for DS projects). The MVP has only the most important functionalities, and its performance may not be optimal.

This might seem very risky – putting a less-than-finished version up for the client to test. So the team (including the client) has to be prepared for it. The purpose is to make the MVP work, test it, and see if it is really going in the correct direction of solving the problem and helping the business case.


The MVP will get better, because the DS team will use what they have learnt from the MVP feedback to build an improved version. Agile is about continuously deploying, learning from your mistakes, and working with the client to make the product better.

Agile is to plan, build, test, learn, repeat.

DS project deliverables

The Agile way of working gives data scientists the ability to prioritize and create roadmaps based on requirements and goals. With each iteration, data scientists can learn something new, get more refined results, and build on them for the next incremental improvement.

Below are some Agile project deliverables that shape and guide the project process:

  • Project vision statement: A summary that articulates the goals for the project.
  • Project roadmap: The high-level view of the requirements needed to achieve the project vision.
  • Project backlog: Ordered by priority, this is the full list of what is needed to support your project.
  • Release plan: A timetable for the release of a working product (or tool), but not documentation. Projects should be self-documenting along the way.
  • Sprint backlog: The user stories (requirements), goals, and tasks linked to the current sprint.
  • Increment: The working product functionality that is presented to the stakeholders at the end of the sprint and could potentially be given to the client. The goal is not to deliver more but to get a higher value output.

Summary

Agile is likely to be adopted by more DS project teams in the near future. Many data scientists have reported that it makes them more productive.

This is not because the data scientists have become more skilled, but because Agile can help them optimize their projects. Instead of spending time on models that are unlikely to produce any useful results, it is better to spend that time on other result-driven work.

Being “agile” (flexible) means you need to adopt a dynamic approach in planning and be adaptable to the changing needs of the new situation when it arises.


The Agile environment calls for quick action: fail quickly, discuss and evaluate, then try again using a different approach or an improved method. It works well in dynamic environments where requirements are likely to change or evolve.

All the best to your DS projects!


Reference: Data-science? Agile? Cycles? My method for managing data-science projects in the Hi-tech industry.

Translated from: https://www.freecodecamp.org/news/applying-agile-methodology-to-data-science-projects/
