How to adapt your developer relations strategy for data science and AI products

The global market for artificial intelligence products is projected to grow roughly 10 times by 2025 to almost $120 billion, according to market research firm Tractica. Many companies are attempting to capture that market, including IBM with its Watson suite of developer tools.

I spoke to my colleague Upkar Lidder about how to adapt a developer-relations strategy to current and future generations of developer-facing AI products.

Upkar Lidder is a full-stack developer and data wrangler with a decade of development experience in a variety of roles. He speaks at various conferences and participates in local tech groups and meetups. Upkar went to graduate school in Canada and currently resides in the United States.

Q: You’ve worked with developers working on all kinds of AI projects, from simple 101-style tutorials to customers implementing huge systems. How does AI development differ from more conventional programming?

There’s a lot of learning, trial, and experimentation with AI and machine learning. The goals for AI projects may be vague: “reduce the number of customer complaints,” for example.

By comparison, user requirements in classical software development may look like “give me a dialog box with a button on it”: specific and well-defined. Of course, a lot of user research and design goes into the software spec to get to that point, and as a developer, you work to that spec. As a data scientist, by contrast, you may only be pointed to an unstructured data set, and then the real fun starts: you start exploring it! I love the data-wrangling aspect of AI development. You can get into a Jupyter Notebook and start exploring specific outliers, shapes of data, and types of data, and see how the data looks through different visual representations.
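To make that first exploratory pass concrete, here is a minimal sketch of the kind of cell one might run in a notebook. It uses only the Python standard library; the records, the column names (`call_minutes`, `tier`), and the helpers `summarize` and `iqr_outliers` are hypothetical and are not taken from the interview. The outlier check is the common 1.5 × IQR rule of thumb, chosen here only as one reasonable default.

```python
import statistics

# Hypothetical support-call records; None marks a missing value.
rows = [
    {"call_minutes": 4.0, "tier": "gold"},
    {"call_minutes": 6.5, "tier": "silver"},
    {"call_minutes": None, "tier": "gold"},
    {"call_minutes": 5.0, "tier": None},
    {"call_minutes": 58.0, "tier": "bronze"},
    {"call_minutes": 7.0, "tier": "silver"},
    {"call_minutes": 5.5, "tier": "gold"},
]


def summarize(rows):
    """Return the shape, the value types per column, and missing-value counts."""
    columns = sorted({key for row in rows for key in row})
    types = {
        c: sorted({type(r[c]).__name__ for r in rows if r.get(c) is not None})
        for c in columns
    }
    missing = {c: sum(1 for r in rows if r.get(c) is None) for c in columns}
    return {"shape": (len(rows), len(columns)), "types": types, "missing": missing}


def iqr_outliers(values):
    """Return values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


summary = summarize(rows)
minutes = [r["call_minutes"] for r in rows if r["call_minutes"] is not None]
outliers = iqr_outliers(minutes)

print(summary)
print("possible outliers:", outliers)
```

In practice a data frame library and a plotting library would do most of this, but the questions being asked of the data are the same: how big is it, what types does it hold, what is missing, and what looks unusual.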

Then you make decisions. What do I do with the missing values? How is that going to affect my projected outcome? Even in these first two stages, there are a lot of unknowns. In software development, many programmers walk a well-worn path that their colleagues and predecessors have been paving for decades. In data science, you have an exploratory period where you try to find a path to take. Once you’re done cleaning and transforming, you choose an appropriate modeling technique and proceed with your analysis. A lot of that exploration is brute force. XKCD has my favorite cartoon on data science.

Like I said, some of data science is just brute force. Even with helper libraries and frameworks, you have to sketch out an educated starting point yourself and let the library do much of the rest on its own. Afterward, you’ll analyze how the results compare with your other benchmark algorithms and repeat the procedure.
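The loop described here (pick a starting point, let the machine try the options, compare against a benchmark, repeat) can be sketched in a few lines. This is an illustration only: the data, the single-feature threshold rule, and the majority-class baseline are all assumptions made for the example, and for brevity the score is computed on the same data that was searched, where a real project would hold out a test set.

```python
# Hypothetical labeled data: (call_minutes, escalated) pairs.
data = [(3, 0), (5, 0), (6, 0), (8, 0), (12, 1),
        (15, 1), (9, 1), (20, 1), (4, 0), (13, 0)]


def accuracy(predict, data):
    """Fraction of examples for which predict(x) matches the label."""
    return sum(1 for x, y in data if predict(x) == y) / len(data)


# Benchmark: always predict the majority class.
majority = int(sum(y for _, y in data) * 2 > len(data))
baseline_acc = accuracy(lambda x: majority, data)

# Brute force: try every observed value as a cut-off for "escalated".
candidates = sorted({x for x, _ in data})
best_threshold, best_acc = max(
    ((t, accuracy(lambda x, t=t: int(x >= t), data)) for t in candidates),
    key=lambda pair: pair[1],
)

print(f"baseline accuracy: {baseline_acc:.2f}")
print(f"best threshold: {best_threshold} (accuracy {best_acc:.2f})")
```

Libraries automate the search over far larger spaces, but the comparison against a simple benchmark remains the part that tells you whether the extra complexity is earning its keep.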

Q: This raises the question: How do you explain your project and model to non-technical users?

It’s a great question: how well do you want to be able to explain your thought process and decisions to business users? Some models, like decision trees, are easy to explain, whereas models built with neural networks or ensembles get more complicated and harder to explain. Compare this to traditional software development: except for some tricky bugs, problems of explanation like that just don’t happen.

Now with the more advanced systems like AutoAI, you give the data to the system, and it will take care of more of the heavy lifting on your behalf. For example, I’m working with some data scientists on a project analyzing NPS scores for some internal departments. We’re building a system where, as a support call is going on, the system can identify red flags in the call that show it “going downhill” and alert a manager while the call is still in progress. We have access to data points such as call length, customer tier, and sentiment analysis, so we can use this data to automatically flag issues before they explode. Interestingly, we tried running AutoAI on the data, and the data scientists didn’t like it! The main issue is that it can be a bit of a “black box,” and the scientists wanted to be able to explain how they reached their conclusions.
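To show what an explainable alternative to a black box can look like, here is a minimal sketch of a hand-written decision tree over the three data points mentioned above. It is not the team’s system and not AutoAI output: the `CallSnapshot` type, the `flag_call` function, the tier labels, the sentiment scale, and every threshold are hypothetical, chosen only so that each alert comes with the reasons that produced it.

```python
from dataclasses import dataclass


@dataclass
class CallSnapshot:
    minutes_elapsed: float
    customer_tier: str  # hypothetical labels such as "gold" or "silver"
    sentiment: float    # hypothetical score from -1.0 (angry) to 1.0 (happy)


def flag_call(call):
    """Walk a tiny hand-written decision tree; return (alert, reasons)."""
    if call.sentiment >= -0.5:
        return False, ["sentiment is not strongly negative"]
    reasons = ["sentiment below -0.5"]
    if call.customer_tier == "gold":
        return True, reasons + ["gold-tier customer"]
    if call.minutes_elapsed > 20:
        return True, reasons + ["call longer than 20 minutes"]
    return False, reasons + ["standard tier and call still short"]


alert, reasons = flag_call(
    CallSnapshot(minutes_elapsed=25.0, customer_tier="silver", sentiment=-0.7)
)
print(alert, reasons)
```

Because the function returns the path it took through the tree, a manager who receives an alert can see exactly why it fired, which is the property the data scientists were missing.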

According to the annual data science survey, one of the biggest gaps in data science is skill sets. So, on the one hand, we need black box systems like this, where you don’t have to have a Ph.D. in math to understand why the system works; it will do feature engineering and hyperparameter optimization for you. On the other hand, the data scientists don’t fully trust it.

Q: You’ve been working at IBM for a few years. What did you do before you got into AI, and how did you make the switch?

I joined through the support group at IBM, so I’d get calls from clients around the world with issues and try to help them out. I was Level 2-3, so the problems would be escalated to me. So the customers were already angry by the time they talked to me! In a lot of ways, I feel that the beginning role was similar to what I do now. I talk with developers and try to figure out how to help them, even though I approach that from an education perspective more than support. Then I was a Java developer, building products with Eclipse. From there I went to a client-facing technical role working on client projects, so very different from product development. From there I became a functional lead, which is essentially a project management role. I had a team of developers that I’d work with to scope solutions and ensure they were delivered on time. After two years of that, I moved into DevRel.

Before working in developer relations, I enjoyed mentoring coding school and bootcamp students on the side, so when this developer-relations job came up I thought, “Wow, it would be great to do that as a job and get paid for it!”

Q: You’ve previously advocated for products and technologies like APIs and serverless architecture. What new tactics have you developed to talk about AI and machine learning?

With AI/ML, you have to do less talking and more doing. For other software development topics like serverless, you can have a longer lecture and then get into a demo. With AI/ML, there’s an emphasis on experimentation. You have to get your hands dirty or it won’t work. I love Jupyter Notebook because you can do something, see the causation, see the result, and only then think about why.

I feel like there’s more abstract theory, math, and intuition behind data science. You can always memorize a formula, but to be able to get an intuition about something, that is ideal. And that comes from experimentation. Through visualization and plotting, you can understand the math behind the different data science concepts. Contrast that with something more DevOps-oriented — it’s a different approach. So in data science and AI developer relations, you have to make sure the attendees are doing something and engaged. Otherwise you lose them very fast — because there’s math involved!

One of the things that’s worked for me is to put a lot of time into my workshops, explaining every step in great detail. In my slides, I’ll use arrows, annotated rectangles, and the like to ensure that the students are able to follow along easily and naturally. When I teach Jupyter Notebooks, I craft half-baked solutions, where I build out a solution that works to a certain point and then the next two cells would be questions: find the frequency of the data we just queried. You can do a demo, where you do and they watch, then you can do a follow-along, where you both do at the same time, and finally, you walk through an exercise method, where they do the work first. The last two are most useful for data science concepts.
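A half-baked notebook of this kind might look like the sketch below: the first cell is supplied, and the following cell poses the question from the example above. The data and the function name `frequency` are hypothetical; the body of the exercise cell is filled in here only to show one possible answer using `collections.Counter` from the standard library.

```python
from collections import Counter

# Provided cell: a hypothetical result of the query run earlier in the notebook.
tiers = ["gold", "silver", "gold", "bronze", "silver", "gold"]


# Exercise cell: find the frequency of the data we just queried.
# In the workshop this body would be left blank for attendees to complete.
def frequency(values):
    """Return (value, count) pairs, most frequent first."""
    return Counter(values).most_common()


counts = frequency(tiers)
print(counts)
```

Leaving the exercise cell empty in the attendee copy while keeping a completed copy on hand lets the same notebook serve the demo, the follow-along, and the exercise formats.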

Q: Let’s talk more about hands-on workshops. We find ourselves doing more and more workshops at IBM. What best practices can you share?

The top five things that work for me in workshops:

  • Prerequisites — Get workshop attendees to complete some prerequisites before the workshop. If you have special codes for attendees to use, distribute them ahead of time. When they check in at registration, the first thing you do is add the code to upgrade their account. A lot of time in workshops is wasted on setting up; the speaker spends the first 10 minutes saying “Hey, follow me.” Avoid this if possible by preparing beforehand. And of course, as much as you try, it’s impossible to get everybody set up before you start; you’ll have to cater to these users before you start your presentation.

  • Step-by-step instructions — Even if the attendees have no issues following along, have a backup plan with slide numbers that they can go back to and follow. Who reads the book that comes with the vacuum cleaner? Nobody, but you may need to consult it later if you have issues.

  • Have the final solution ready — If you’re using GitHub, have different branches for the different steps; if users are less technical or need to skip a section, they can check out that branch and still be able to keep up with the workshop. This type of content takes time to develop.

  • Stretch goals — You’ll get an audience of all backgrounds and experience levels, and it’s important to cater to all of them (to the extent possible). You don’t want to lose the beginners, because it may be their first time doing something, but you don’t want to lose the intermediate and advanced users either, and this is where stretch goals are important.

  • Resources — Tell your students where to go and what to do next, outside of the logistics of the workshops. Make sure you have assistants during the sessions as a resource also.

Q: Who would you like to call out in the developer-relations world for doing a good job or stretching the boundaries of developer relations?

Fortunately, the DevRel world is filled with people I look up to! Some of the names that come to mind are:

  • Josh Gordon, Google, @random_forests

  • Paige Bailey, Google, @DynamicWebPaige

  • James Thomas, IBM, @thomasj

  • Gabriela de Queiroz, IBM, @gdequeiroz

  • Vijay Bommireddipalli, IBM, @vjbytes

  • Renee M. P. Teate, Heliocampus, @BecomingDataSci

Next steps

Translated from: https://www.freecodecamp.org/news/adapting-your-developer-relations-strategy-for-data-science-and-ai-products/
