Learn These 3 Basic Data Concepts Before Stressing About Coding Languages or Tools

I got an Instagram DM the other day that really got me thinking. This person explained that they were a data analyst by trade and had years of experience. But they also said that they felt their technical skills were slightly lacking, as they had never heard of many of the terms mentioned on my page. This person mentioned that they were looking forward to expanding their skill set by learning more technical tools (SQL, Python, R, etc.).

As I thought about how to advise this person further, I realized that they were in the perfect position to make the transition they desired. Why? They had already mastered the data skills and data mindset that are crucial to being successful in the field of data.

I (and so many others) worry about mastering every technical tool or product that is out there. I worry about only having experience with Microsoft products (SQL Server, Excel, Power BI), and feel that I need to broaden my horizons to be a better data analyst. I constantly see data scientists questioning and debating online about whether Python or R is better in their line of work.

But, speaking with my new Instagram friend helped me realize that these worries and debates are quite silly. Tools and programming languages are constantly evolving and changing, coming and going. But you know what is here to stay? The core concepts. Every tool or language that is ever built will always fall back on these core concepts.

If you understand how to take a data set, manipulate it, and present it in a way that provides genuine insight (or at least invites more questions that you didn’t have before… because that happens!!), you are on the right path to succeed as some sort of data professional.

This base understanding of data is so powerful. You can take this understanding, and combine it with any technical tool of your choice. Then, you can group and filter data for business reporting and KPI monitoring, conduct statistical tests to answer questions about data, predict future data, or even generate AI models to use data to help guide business action. And you can do all these things with huge data sets containing millions and millions of rows!

OK I know I’m selling you and selling you on this idea, so let me cut to the chase. If you understand data concepts and how to apply them, you can easily implement these concepts with any technical tool or product of your choice.

But don’t worry, I’m not just here to sell you on this and then head out. I’m going to talk about 3 basic data skills that I use daily as a data analyst, from a general perspective. NO TECHNICAL TERMS OR CODE INVOLVED. If you begin to master these (and other) data concepts, it is EASY PEASY LEMON SQUEEZY to take them and apply them with any tool. I even have a serious life hack at the end of the article that will help you further flex your new data knowledge in any tool you’ve been wanting to master. Stick with me, I got you!

#1. Filtering Data

The first data concept that is crucial in the data world is filtering data. Honestly, filtering data is a super simple concept and one that we as human beings do on a daily basis. Take this example. If you are going to get McDonald’s, you should probably ask your 3 roomies if they want some (because you don’t wanna be that roommate). But, before you go ask your roomies if they want chicken nugs, you remember that 2 out of your 3 roomies don’t even like McDonald’s, so you only end up asking one. Basically, you “filtered out” your two roommates from your “data set” based on some “attribute”, which is whether or not they like McDonald’s.

Filtering data as a data analyst or data scientist works the exact same way. If you are conducting an analysis on female customers, you will need to use whatever tool you have at your disposal to filter out the non-female customers. If you are trying to build a model that helps recommend skincare for adults, you would want to filter out any data for non-adult patients.

Long story short, filtering data is just taking away all of the undesired data from whatever data set you have, until you are left with whatever data you need for your analysis.
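
For readers who want to see the idea in one concrete tool, here is a minimal Python sketch of the roommate example. The names, the `likes_mcdonalds` field, and the `filter_rows` helper are all made up for illustration; the same pattern applies to customers, patients, or any other rows.

```python
# Hypothetical "data set": one record per roommate (names and values are invented).
roommates = [
    {"name": "Ana", "likes_mcdonalds": False},
    {"name": "Ben", "likes_mcdonalds": True},
    {"name": "Cleo", "likes_mcdonalds": False},
]


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


# Filter on the attribute we care about: does this roommate like McDonald's?
to_ask = filter_rows(roommates, lambda row: row["likes_mcdonalds"])

print([row["name"] for row in to_ask])  # ['Ben']
```

Swapping the predicate (for example, checking a hypothetical `age` field against 18) changes what gets filtered out without changing the approach.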

#2. Data Type Conversion

Another commonly used data skill is data type conversion. Data types are certain categories that data can fall into when it is stored in a spreadsheet, software, or database. Some common examples of data types are:

  • Strings (ex: “Hello, this is a string.”)
  • Integers (ex: 400)
  • Decimals (ex: 400.17)
  • Booleans (ex: TRUE)

When we are working with a data set, we want to make sure that each data attribute is stored as the correct data type.

We would not want to store the integer 123 as a string. If we store 123 as a string, the spreadsheet, software, or database would not be able to perform the necessary operations on it. The computer would get confused. If we tell the computer that we have a string (“123”), but later we want to add that “123” to something, the computer is going to say: “HOLD UP A SECOND. You taught me that ‘123’ was a STRING, which is basically a word. Ya can’t add words, crazy person! You can only add numbers!!!!”

Sorry the hypothetical computer got so aggressive there, but you get the point. In order to ensure that we can perform proper operations on our data down the road, we want to absolutely make sure that it is represented as the right type.
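
As a rough illustration in Python (one tool among many, and not every tool reacts the same way; some spreadsheets quietly convert the value for you), this is what the "123" situation can look like. The variable names are invented for the example.

```python
raw_value = "123"  # the number 123, but stored as a string

# Adding a number to a string is refused outright in Python.
try:
    raw_value + 1
except TypeError as err:
    print("The computer objects:", err)

# "Adding" two strings just glues them together, which is rarely what we meant.
glued = raw_value + "1"
print(glued)  # 1231

# Convert to the right type first, then the arithmetic behaves as expected.
as_integer = int(raw_value)
total = as_integer + 1
print(total)  # 124

# The same idea works for decimals stored as text.
as_decimal = float("400.17")
```

The conversion step is the whole point: once the value has the right type, every later operation on it works the way you expect.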

#3. Aggregating Data

The final concept that I want to touch on *for now* is aggregating data. Aggregating data is so so so SO powerful. Aggregating data can take you from a big giant text file of rows and columns of data, and turn it into a summary value or a summary table that is much more meaningful and pleasing to the eye.

Notice how I kept saying the word summary up there? It’s probably the best way to explain an aggregation, because aggregations take multiple rows of data and summarize them into a smaller number of rows.

[Image courtesy of SQLiteTutorial.Net]

If you have a data set that contains numbers that would make sense to be added (such as quantities or sales), one of the simplest ways to aggregate that data is to sum it up. In the example below, I took a data set that contained the number of coffees I drank each day. I applied an aggregation to it by summing it, which created a summary view of my data on the right. This summary shows that I drank a total of 4 coffees (in this data set at least).

[Image: daily coffee counts alongside their summed total of 4]
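
The coffee example can also be sketched in a few lines of Python. The per-day values below are assumptions chosen only so that they add up to the total of 4 described above; the original daily numbers are not given in the text.

```python
# Hypothetical daily values (assumed for illustration; only the total of 4 comes from the article).
coffees_per_day = {"Monday": 1, "Tuesday": 2, "Wednesday": 1}

# Aggregate: collapse several rows into a single summary value.
total_coffees = sum(coffees_per_day.values())

print(total_coffees)  # 4
```

Three rows go in, one summary number comes out, which is the "fewer rows" idea in its smallest form.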

There are many other aggregate operations that are pretty intuitive, even for those who are new to the data world. Each of these operations answers some question that tells us more about our data set. Some examples of other simple aggregate operations are:

  • Count (how many records are there?)
  • Maximum (what’s the biggest observation?)
  • Minimum (what’s the smallest observation?)
  • Average (what do I tend to observe?)
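
Each of the four operations above maps onto a short call in most tools. Here is a small Python sketch using only the standard library; the `observations` list is invented for illustration.

```python
from statistics import mean

# Hypothetical observations (values are made up for this sketch).
observations = [1, 2, 1, 4]

summary = {
    "count": len(observations),     # how many records are there?
    "maximum": max(observations),   # what's the biggest observation?
    "minimum": min(observations),   # what's the smallest observation?
    "average": mean(observations),  # what do I tend to observe?
}

for name, value in summary.items():
    print(name, value)
```

Four rows of raw data become a four-line summary table, one line per question.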

OK coooOooOol.. so what’s next?

I know I promised you a life hack earlier, so don’t worry — I didn’t forget. Now that you have got a firmer grasp on some of the most crucial steps in a data professional’s workflow, you can take them and apply them with any technical tool of your choice, even if you are a newbie. How? With our best friend, our ultimate savior, GOOGLE!

Whenever I want to practice any of my skills with some tool, and I need a refresher on how to execute it properly, I will Google in this format:

[insert data skill] in [insert technical tool]

I swear to you, any time I Google in this format, I always end up finding great documentation, blog posts, or other resources (such as Stack Overflow) that direct my thoughts toward the solution.

So, did you find aggregating data interesting? And are you wanting to better your SQL skills? Then I would recommend reviewing and working on:

aggregating data in SQL

Are you basically a pro at filtering data in Python, but now you would like to try it out in R? Try my life hack and Google:

filtering data in R

Take it from the girl who overwhelmed herself for months before pursuing her data career dreams. Learn the concepts first. Worry about the tech to get it done later. Technology is always evolving, but the foundations aren’t.

Originally published at https://datadreamer.io on August 7, 2020.

Translated from: https://towardsdatascience.com/learn-these-3-basic-data-concepts-before-stressing-about-coding-languages-or-tools-e599896e6d4
