The Importance of a Proper Data Culture

Getting started with AI requires a proper data culture. AI is not magic, despite what many may still think. Before even thinking of AI, the data needs to be in order. You need documentation, policies, and most importantly a proper data culture. How to achieve this? Continue reading…

Afke Schouten in conversation with Assaad Moawad

This is the first in a series of interviews with practitioners in the field about generating business value with AI. Assaad co-founded the company DataThings in Luxembourg, which is all about translating data into actionable insights. He believes that data can help you better understand your business.

The core technology of DataThings is a Temporal Many-World Graph database. In a nutshell, it defines a graph storage and processing framework for all data that is in motion, be it communication flows, social networks, smart grids, etc. This is actually a great database to use for organizational network analysis.

In our conversation, we speak about Assaad’s experience working with clients, and his view on what is needed in the field.

Afke: “We have spoken about this so often, but once more: can you describe the difference between clients that are just starting with the topic of AI and your clients that are more advanced?”

Assaad: “When we come in with new clients, the usual expectation is that AI is a magic genius. Advanced clients have realized for themselves that this is a utopia. They now understand the investment that is needed to create a proper data infrastructure before even starting with AI.

New clients expect AI to behave like a magic hat: they expect that you can put in a mix of uncleaned data and pull a rabbit out.”

Afke: “That doesn’t sound uncommon to me; a lot of work needs to happen to pull this rabbit out of the hat. What advice would you give to people in this situation? What are the key success factors?”

Assaad: “First and foremost, a proper data culture is needed. Part of this is a unified data policy in big companies. Many big companies have sub-teams, and each team uses a different technology or format for data storage, so the cost of aggregating the data will be high. Data documentation is very important too, because over time the people who collected the data leave the company and nobody understands anymore what the data is about. The same goes for units of measurement: if they are not documented, or not uniform in an industrial context, it’s hard to use the data.”
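
As an editorial illustration of the documentation point, here is a minimal sketch of a data dictionary in Python. The field names, units, and owning team are hypothetical and do not come from the interview; the idea is only that every column has a recorded meaning, unit, and owner, and that undocumented columns can be flagged automatically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDoc:
    """One data dictionary entry: what a column means and its unit."""
    name: str
    description: str
    unit: str   # documented unit of measurement
    owner: str  # team to ask when questions come up


# Hypothetical entries, for illustration only.
DATA_DICTIONARY = {
    "temp_c": FieldDoc("temp_c", "Sensor temperature", "degC", "plant-ops"),
    "flow_lpm": FieldDoc("flow_lpm", "Coolant flow rate", "L/min", "plant-ops"),
}


def undocumented_columns(columns, dictionary=DATA_DICTIONARY):
    """Return the columns that have no entry in the data dictionary."""
    return [c for c in columns if c not in dictionary]


print(undocumented_columns(["temp_c", "flow_lpm", "p2_val"]))  # ['p2_val']
```

A check like this could run whenever a new dataset is registered, so that gaps in the documentation surface while the people who collected the data are still around to fill them.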

Assaad: “A proper data culture, a unified data policy, data documentation, and consistent terminology are what I would recommend.”

“To give you an example, one challenge of working in a multicultural environment like Luxembourg is that it gets reflected in the data; you probably know the same from Switzerland. We find several languages, abbreviations, data formats, data schemas, and terminologies within the same dataset, each coming from a different culture (French/German/English…). That is why consistent terminology, language, schema, format, and units are so important.”
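
To make the terminology problem concrete, the sketch below maps column headers from different languages to one canonical name. The synonym table is a hypothetical example, not something described in the interview; in practice it would be maintained as part of the company's data policy.

```python
# Hypothetical synonyms seen across French, German, and English sources.
CANONICAL_NAMES = {
    "customer": "customer",
    "client": "customer",
    "kunde": "customer",
    "amount": "amount",
    "montant": "amount",
    "betrag": "amount",
}


def normalize_columns(columns):
    """Map each header to its canonical name and collect unknown headers."""
    renamed, unknown = {}, []
    for col in columns:
        key = col.strip().lower()
        if key in CANONICAL_NAMES:
            renamed[col] = CANONICAL_NAMES[key]
        else:
            unknown.append(col)
    return renamed, unknown


renamed, unknown = normalize_columns(["Kunde", " Montant", "Datum"])
print(renamed)  # {'Kunde': 'customer', ' Montant': 'amount'}
print(unknown)  # ['Datum']
```

Returning the unknown headers instead of silently dropping them keeps a human in the loop: every term that is not yet in the shared vocabulary gets reviewed once and then added to the table.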

Afke: “I can relate to that. Without a proper data basis, it will be difficult to get value out of your data. What would you recommend? How should one get started?”

Assaad: “Investing in the data infrastructure is very important, especially when speed and throughput in the data processing pipeline are needed. Our temporal graph database can process about 400,000 values/second. Older technologies only do 10,000 values/second. In the banking sector, for example, if you have 1 billion transactions, it’s the difference between roughly 41 minutes and 28 hours.

Also, investing in hardware is important; AI is hungry for processing power, GPUs, and memory. When you have a fast database, you can iterate several times and test several models while wasting less of your data scientists’ time (spent just waiting for model training), using fewer servers, and reducing infrastructure costs. Tons of benefits. GPUs are very important for image processing or for very big datasets. They can accelerate machine learning by 10-20x.

Investing in the right software is just as important for getting the most out of the hardware. That’s why we’re developing a technology dedicated to AI at large scale.”
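
The throughput comparison in this answer is easy to verify. The short calculation below assumes, for simplicity, that each transaction counts as one value; under that assumption the two rates give about 41.7 minutes versus about 27.8 hours for 1 billion transactions, which matches the rounded figures quoted.

```python
def processing_time_seconds(n_values: int, values_per_second: int) -> float:
    """Time needed to process n_values at a constant throughput."""
    return n_values / values_per_second


N_TRANSACTIONS = 1_000_000_000  # assumption: one value per transaction

fast = processing_time_seconds(N_TRANSACTIONS, 400_000)  # 2,500 seconds
slow = processing_time_seconds(N_TRANSACTIONS, 10_000)   # 100,000 seconds

print(f"{fast / 60:.1f} minutes")  # 41.7 minutes
print(f"{slow / 3600:.1f} hours")  # 27.8 hours
print(f"speed-up: {slow / fast:.0f}x")  # speed-up: 40x
```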

Assaad: “Invest in your data infrastructure, hardware, and software.”

Afke: “You and I have spoken about unhappy data scientists in the past. What do you think are the main reasons that practitioners in the field are frustrated?”

Assaad: “Few people like data cleaning; it’s a tedious and time-consuming process. The same holds for data aggregation from different sources. We end up spending half of the project time writing importers and exporters to connect to all the different formats within a company.

Furthermore, data analytics by itself is not enough for a final product; it needs to be integrated into a full software environment. Many stakeholders expect it’s the data scientist’s job to do everything: from data cleaning to modeling, storage, analytics, visualization, software orchestration (Docker containers), and running in production. But that is actually the job of a full IT team. It is important to staff the team with different profiles and skills, which is what we aspire to in our projects.”

Assaad: “Many stakeholders expect it’s the data scientist’s job to do everything, but that is actually the job of a full IT team.”

Afke: “What would you suggest companies do about this?”

Assaad: “Learn about the topic of AI, implement data culture policies, and be ready to invest in the proper infrastructure.

I like to use the analogy of constructing a building: you invest in the infrastructure first, before building a fancy-looking wall. AI is just the fancy wall; behind it there is a lot of data infrastructure to be put in place.”

“No magic, no free lunch, no shortcuts.”

Afke: “And what would you suggest practitioners do about this?”

Assaad: “Be patient, be curious, learn about the different topics and the software pipeline, and work on fixing the problem of data cleaning at the source: at the collection and database level.”
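
One way to read "cleaning at the source" is to reject malformed records at collection time, before they ever reach the database. The sketch below illustrates that idea only; the record fields and the list of allowed units are hypothetical and not taken from the interview.

```python
from datetime import datetime

ALLOWED_UNITS = {"degC", "L/min"}  # hypothetical documented units


def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record is accepted."""
    problems = []
    if not str(record.get("sensor_id", "")).strip():
        problems.append("missing sensor_id")
    try:
        datetime.fromisoformat(str(record.get("timestamp", "")))
    except ValueError:
        problems.append("timestamp is not ISO 8601")
    value = record.get("value")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        problems.append("value is not numeric")
    if record.get("unit") not in ALLOWED_UNITS:
        problems.append("unknown unit")
    return problems


good = {"sensor_id": "s1", "timestamp": "2020-05-01T12:00:00",
        "value": 21.5, "unit": "degC"}
bad = {"sensor_id": "", "timestamp": "yesterday", "value": "21,5", "unit": "C"}

print(validate_record(good))  # []
print(validate_record(bad))
```

Reporting every problem at once, instead of stopping at the first one, gives whoever operates the collection step enough information to fix the upstream source in a single pass.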

In summary, you need a proper data culture. For companies that are just starting, the advice is to invest in your data infrastructure, invest in hardware, and invest in software. Educating yourself and setting the right expectations for the team is important as well. For data scientists: yes, data cleaning is part of the job. Let’s make data engineering sexy!

Thank you, Assaad, for this interesting conversation and your tips for those who want to get started with AI. I wish you all the best with DataThings. Do you want to read more from Assaad? Check out his Medium posts or the DataThings blog.

About me: I am an AI Management Consultant and Director of Studies for “AI Management” at a local business school. I am on a mission to help organizations generate business value with AI and create an environment in which data scientists can thrive. Sign up for my newsletter for new articles, insights, and offerings on AI Management.

Translated from: https://towardsdatascience.com/the-importance-of-a-proper-data-culture-48c1b19ccd82
