Data pitfalls: five pitfalls when communicating with non-data scientists

For many data scientists, the coding and the analytics are often the easy part. The challenge comes when you have to communicate the results of your work to non-data scientists. In many cases those people are clients or customers, or they are more senior than you in the organization. This makes it important to get the communication right. If they leave the room or the Zoom call with the wrong conclusions, or just plain confused, you risk all your previous work going to waste.

The goals of any communication of your work or results should be threefold:

  1. Ensure a common understanding of the problem or issue being discussed.
  2. Ensure a clear understanding of what the analysis has revealed about the problem, without overstating or overcomplicating it.
  3. Ensure confidence in the approach and results, where possible.

Over my time doing analytics in the corporate world, I’ve learned a few things to avoid if you want to maximise your chances of achieving these three goals.

1. Don’t jump straight into the analysis

Often it is assumed that everyone in the discussion is already clear on what the purpose of the discussion is and what problem is being addressed, but more often than not this common understanding does not exist. Participants often don’t have the context, or have forgotten a previous discussion, or have come out of a previous discussion with a different idea of purpose or objective.

Launching straight into analytics without ensuring that the context and objective are clear can cause all sorts of problems later in the discussion. The way people absorb content is deeply tied to how they relate it to an objective or goal. If people perceive that objective or goal differently, any material you share afterwards is likely to be absorbed differently by different people, sowing confusion and wasting discussion time.

I usually spend the first ten or so minutes of a one-hour discussion simply clarifying the context and objectives. Why are we having this discussion? What specific question or problem are we trying to solve? If possible, state the question being addressed in a single statement and make sure everyone agrees on it. If you can, also relate this statement to a previous discussion so that there is traceability in the event of a disagreement. Clarifying this up front gives you a way to bring the conversation back on track if it digresses later.

2. Don’t get too involved in the methodology and data

If I’ve used a particular methodology or data source that worked well for my problem, it often excites me and I am keen to tell others about it. But that’s because I am a data scientist. Frankly, these things are less likely to excite non-data scientists, and it’s important not to spend too much time indulging in methodology or data when your audience wants to know the results and their ‘so what’.

Keep the discussion of data limited to what’s absolutely necessary to ensure confidence in your approach. A quick description of the data source, giving people an opportunity to speak up if they are concerned about it, is all that’s needed. If there are known issues with the data, state and acknowledge these too, particularly if you will need to refer to them later when sharing results. But try to do this as succinctly as possible. A good trick is to put further detail in an appendix and point to it. Personally, I am a big user of appendices.

When it comes to methodology, if you are using a well-accepted method, naming it is usually all you need to do. I usually only go into more detail if I have had to design a bespoke methodology for the problem. Keep the description at an abstract level that illustrates the core logic of the approach, and remember to stick to only what’s absolutely necessary to ensure confidence in your approach.

Again, if there are known weaknesses or limitations in your methodology, acknowledge them where this is necessary for later parts of the discussion. Make liberal use of appendices or footnotes, and avoid formulae or code in your presentation that non-data scientists would not understand. Some data scientists think that showing code makes their work look credible, but that’s a mistake. Credibility comes from clearly communicating the logic of your approach.

3. Don’t overstate the results

It is extremely common for data scientists to overstate their results in punchy headlines. For example, a headline like “A more diverse sales team will perform better” can really capture attention and get people excited about your results, but is it actually true? Have you established statistical significance and made a reasonable case for causality, so that you can make this statement? Would it be more truthfully stated as “Among the sales teams we analyzed, we found that the better performing ones were more diverse”?
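
One way to tackle the first of those questions is a significance test. The sketch below is a hypothetical illustration, not the author’s method: a two-sided permutation test for a difference in mean performance between two groups of teams (for example, more diverse versus less diverse teams). The function name `permutation_p_value` and its parameters are assumptions made for this example, and a permutation test is only one of several reasonable choices. Note that a small p-value only suggests the difference is unlikely to be chance among the teams you analyzed; it says nothing about causality, which is why the more cautious headline is often the truthful one.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Hypothetical helper for illustration: shuffles the group labels many
    times and returns the share of shuffles whose absolute mean difference
    is at least as large as the observed one.
    """
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign teams to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding differences.
        if diff >= observed - 1e-12:
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)
```

You would call it with your own per-team performance scores, e.g. `permutation_p_value(diverse_team_scores, other_team_scores)` (placeholder names). Even when the p-value is small, the accurate phrasing is still an association among the teams analyzed, not a prediction that changing team composition will change performance.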

It’s important not to overstate the conclusions of your work in headlines or summaries. While it is tempting to generate excitement and positivity about the work, remember that the headlines and summaries are the parts your audience is most likely to absorb and take away as truth. You also need to ensure your work stands up to critique. If another analysis of sales teams in a different part of the organization reveals no relationship with diversity, you can end up looking a bit foolish if you let the first statement above catch fire in your organization.

4. Don’t give too many details in the results

Showing lots of statistics will not necessarily help your audience believe your results; in fact, it can end up creating more confusion. I often see data scientists present a big table of model statistics and then make a statement about them, where it’s not at all clear to a lay person how that statement relates to the statistics being shown. One of the most common reactions I hear when this happens is “What exactly am I supposed to be looking at here?”

If the point you are making relates to a single statistic or trend, show only the data directly related to that statistic or trend, and show it in the most intuitive graphical way possible. The goal is that a lay person can clearly see how your headline relates to what’s shown on the page. Avoid copy-pasting large tables of data where the reader has to be pointed to the subset that matters. If you are using integrated data science documents like R Markdown or Jupyter, take care to ensure that any output is appropriately filtered or graphed before it is displayed.

Try to differentiate between results that are absolutely critical to the conclusions and follow-up actions and those that are interesting but somewhat tangential. I tend to filter out the latter and move them to an appendix, or reference them in small print or footnotes. While a result can be of mathematical interest to a data scientist, if it doesn’t have a practical ‘so what?’, it is not likely to be valuable in the discussion. Slipping it into an appendix makes it available to those who are interested without wasting valuable discussion time on it.
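
If your report is generated from R Markdown or Jupyter, this filtering can be done in code before anything is displayed. The Python sketch below is one hypothetical way to do it using only the standard library: it takes a full table of model statistics (a list of dicts here, standing in for whatever your modelling tool outputs), trims the rows that support the headline down to the columns a lay reader needs, and routes everything else, untrimmed, to an appendix. The function name, column names and terms are all placeholders, not part of any particular library.

```python
def split_for_audience(results, headline_terms, columns=("term", "estimate")):
    """Split model output into a main-body table and an appendix table.

    Hypothetical helper: `results` is a list of dicts (one per model term),
    `headline_terms` names the terms your headline actually relies on, and
    `columns` lists the fields a non-specialist audience needs to see.
    """
    main, appendix = [], []
    for row in results:
        if row["term"] in headline_terms:
            # Trim to only the columns the audience needs for the headline.
            main.append({col: row[col] for col in columns})
        else:
            # Keep the full row so interested readers can still find it.
            appendix.append(dict(row))
    return main, appendix


# Placeholder model output: illustrative values only, not real results.
model_stats = [
    {"term": "team_diversity", "estimate": 0.8, "std_error": 0.3},
    {"term": "team_size", "estimate": 0.1, "std_error": 0.2},
    {"term": "region_north", "estimate": -0.2, "std_error": 0.4},
]

main_table, appendix_table = split_for_audience(model_stats, {"team_diversity"})
print(main_table)      # shown in the body of the document
print(appendix_table)  # moved to the appendix
```

Keeping the full rows in the appendix, rather than discarding them, matches the advice above: the detail stays available to anyone who wants it without taking up discussion time.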

5. Don’t be forced into a communication format

Over the years, I’ve found that the communication format you use has a big impact on how well your results are absorbed and understood. One of the biggest lessons I’ve learned is not to put things into a slide format just because that’s what everyone else has done or because that’s what is expected. One of the most important aspects of communicating data science results well is that the reader can easily follow your logical flow. Breaking your material into a horizontal slide presentation can often make this more difficult.

If you need to establish a fluid linear logical flow in your communication, where the reader can easily move from one step to the next, don’t be afraid to communicate vertically. Sometimes this can be as simple as a box note or Word document, but it’s often more efficient to build your analysis and your narrative together in an integrated R Markdown document or Jupyter notebook.

I also find that vertical documents often work better as pre-reads. Since slides are organized around how a presenter plans to communicate their work verbally, they can be ineffective when read as standalone documents. Vertical pre-reads can often save a lot of time in the actual discussion.

These are just some of the things I’ve learned to watch out for that have helped me communicate data science work more confidently. Thanks to them, I now communicate much more effectively than when I first started out in data science. If you have any other tips or tricks, do feel free to share them in the comments.

Originally I was a Pure Mathematician, then I became a Psychometrician and a Data Scientist. I am passionate about applying the rigor of all those disciplines to complex people questions. I’m also a coding geek and a massive fan of Japanese RPGs. Find me on LinkedIn or on Twitter. Also check out my blog on drkeithmcnulty.com.

Translated from: https://towardsdatascience.com/five-pitfalls-when-communicating-with-non-data-scientists-4ed4adf17ce3