In my career, I have led many analytics transformations that allowed organisations to move into the advanced analytics and data science space. This includes leading technical teams and educating business executives. A pretty common question is:
“Is it [Data Science] hard?”
I come from a quantitative background and qualified as an actuary. I spent the early years of my career in “non-traditional” actuarial disciplines. Then I stumbled into the field of data and analytics, which has flourished with digital and technology transformation.
In my early years, most of the statistical models were already built. Our primary responsibility was to ensure that their assumptions and adjustment factors were kept up to date.
Fast forward, and most predictive modelling now comes with AutoML (Automated Machine Learning). This automates model selection and calibration based on certain business user settings.
If it is all automated, what is the actual role of a data scientist? Also, how could it be so hard?
Model Design
Every model is highly dependent on the input configuration. For example: what is the dependent variable, what are the input variables and their ranges, do we want a general or a specific model, and many others.
Whilst AutoML orchestrates the process, human decision-making remains. The data scientist is therefore expected to understand the mechanics, assumptions, and principles of the underlying statistical models. This ensures that the selected models behave as expected.
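As a rough illustration of that split between human decisions and automation, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than a real AutoML product: `ModelDesign`, `select_model`, the two candidate models, and the toy `tenure`/`spend` data are all hypothetical. A person sets the design (dependent variable, input variable, validation size); only the comparison of candidates is automated.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Predictor = Callable[[Row], float]


@dataclass(frozen=True)
class ModelDesign:
    """Human decisions the automated search takes as given (hypothetical)."""
    target: str        # the dependent variable
    feature: str       # a single input variable, to keep the sketch small
    holdout_rows: int  # number of trailing rows reserved for validation


def fit_mean(train: List[Row], design: ModelDesign) -> Predictor:
    """Baseline candidate: always predict the training mean of the target."""
    avg = mean(r[design.target] for r in train)
    return lambda row: avg


def fit_line(train: List[Row], design: ModelDesign) -> Predictor:
    """Candidate: one-variable least-squares line, fitted in closed form."""
    xs = [r[design.feature] for r in train]
    ys = [r[design.target] for r in train]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = y_bar - slope * x_bar
    return lambda row: intercept + slope * row[design.feature]


# The "automated" part: a fixed menu of candidate fitting routines.
CANDIDATES = {"mean_baseline": fit_mean, "linear": fit_line}


def mean_abs_error(model: Predictor, rows: List[Row], design: ModelDesign) -> float:
    return mean(abs(model(r) - r[design.target]) for r in rows)


def select_model(rows: List[Row], design: ModelDesign) -> Tuple[str, float]:
    """Fit each candidate on the training rows; return the best name and its validation error."""
    if not 0 < design.holdout_rows < len(rows):
        raise ValueError("holdout_rows must leave at least one row on each side")
    train, valid = rows[:-design.holdout_rows], rows[-design.holdout_rows:]
    scores = {
        name: mean_abs_error(fit(train, design), valid, design)
        for name, fit in CANDIDATES.items()
    }
    best = min(scores, key=scores.get)
    return best, scores[best]


# Toy data (made up): spend grows linearly with tenure.
rows = [{"tenure": float(t), "spend": 10.0 + 2.0 * t} for t in range(10)]
design = ModelDesign(target="spend", feature="tenure", holdout_rows=3)
best_name, best_error = select_model(rows, design)
print(best_name, best_error)
```

Changing `target`, `feature`, or `holdout_rows` changes which model wins, which is why the person setting them still needs to understand what each candidate assumes.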
Business Relevance
Data science models are built to solve business problems. The data scientist is responsible for ensuring that the models work with the business process. It is a business profession that deals with mathematics, not a mathematics profession that deals with business.
Beyond the model build, the data scientist needs to evangelise and uplift the organisation's data literacy. This includes areas such as model transparency, model data lineage, and model understanding, all of which increase the organisation's confidence in its data.
Data Cleaning
In my years of experience, the quest for clean data has been an elusive one. I believe that it is a journey, one that requires a defined process to continuously improve and integrate.
The data scientist is expected to play a major part in data cleaning. The old saying of “90% of the time in data prep and 10% of the time in modeling” still holds true. As per my reasoning above, executives trust data scientists who understand the business. That said, making it happen is the responsibility of everyone on the data pipeline, including data engineers and reporting analysts.
Model Monetisation
For organisations, a model should be treated as an asset. As with any asset, it needs to be governed and maintained. It also needs to be leveraged for various use cases rather than a single one.
For example, we created a customer attrition predictive model for one of my previous clients. Beyond identifying at-risk customers, we also used it for customer engagement segmentation and as an input to a credit risk scorecard. Building once and reusing many times drives a higher ROI for any asset, which in turn promotes more use cases for other model development.
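The build-once, reuse-many idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the client's actual model: `attrition_score` is a stand-in with made-up weights, and the field names, thresholds, and segment labels are all hypothetical. The point is only that three consumers call the same scoring function instead of each building their own.

```python
import math
from typing import Dict, List

Customer = Dict[str, float]


def attrition_score(customer: Customer) -> float:
    """Stand-in for a trained attrition model: returns a value in (0, 1).

    The weights are invented for illustration; they are not fitted to data.
    """
    z = 1.5 - 0.4 * customer["logins_per_month"] + 0.8 * customer["complaints"]
    return 1.0 / (1.0 + math.exp(-z))


def at_risk(customers: List[Customer], threshold: float = 0.5) -> List[Customer]:
    """Use case 1: flag customers whose score reaches a (hypothetical) threshold."""
    return [c for c in customers if attrition_score(c) >= threshold]


def engagement_segment(customer: Customer) -> str:
    """Use case 2: bucket the same score into engagement segments."""
    score = attrition_score(customer)
    if score < 0.3:
        return "engaged"
    if score < 0.6:
        return "drifting"
    return "disengaged"


def scorecard_features(customer: Customer) -> Dict[str, float]:
    """Use case 3: pass the score on as one input to a credit risk scorecard."""
    return {**customer, "attrition_score": attrition_score(customer)}


loyal = {"logins_per_month": 10.0, "complaints": 0.0}
unhappy = {"logins_per_month": 0.0, "complaints": 2.0}
print(engagement_segment(loyal), engagement_segment(unhappy))
print(len(at_risk([loyal, unhappy])))
```

Because every use case goes through `attrition_score`, governing and maintaining that one function keeps all three consumers consistent.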
“Is Data Science hard?”
A data scientist needs a good grasp of mathematics, business, and technology. Those who think a solid quantitative degree is enough will find it challenging to thrive in a commercial environment. Those who have the right focus will be able to embrace the data science journey and bring others along with them.
Check out my other articles if you want to learn more about practical and impactful data analytics topics. If you have further questions or topic suggestions, feel free to connect and message me on LinkedIn.
About the author: Albert Suryadi is a proven leader in enabling advanced analytics and data science capability in blue chip organisations. He is recognised as the leader of the Analytics CoP (Community of Practice), which empowers and motivates others to go beyond the status quo.
Translated from: https://towardsdatascience.com/is-data-science-hard-20a159f1e5e9