Datasets and Data

The words data and dataset are often used interchangeably, as if they meant the same thing. They are related, but not the same. Data are observations or measurements (unprocessed or processed) represented as text, numbers, or multimedia. A dataset is a structured collection of data, generally associated with a unique body of work.

“THERE IS AS YET INSUFFICIENT DATA FOR A MEANINGFUL ANSWER.” Isaac Asimov, “The Last Question”

Finding the Dataset for the Right Data

After framing a question as a hypothesis, the next step is to identify which constants and variables bear on it. That is the search for the “right” data. How can data be “right”? Can it ever be “wrong”? It can, and that will be the subject of a separate post. The right data is the data that lets you show whether a hypothesis is correct or not.

Finding data that supports a hypothesis test can be simple or complex, but it is necessary, and it is often overlooked in projects. From experience, the source of the data matters and should be reviewed for relevance and plausibility. There are no real wrong answers in data: what was observed and recorded is what it is. Checking relevance ensures the result has value in determining the validity of the hypothesis. Checking plausibility ensures there are no errors in the data: values need to fall within bounds and make sense. If age is a variable in the dataset, no age entry should be negative, because age is never a negative value.
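
The plausibility check described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete validation routine: the `Bounds` class, the `find_implausible` function, the sample records, and the 0 to 120 range for age are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Bounds:
    """Inclusive plausible range for one numeric column (hypothetical helper)."""
    low: float
    high: float


def find_implausible(rows, bounds):
    """Return (row_index, column, value) for every value outside its bounds.

    Missing or non-numeric values are reported too, because they cannot be
    checked against a range.
    """
    problems = []
    for i, row in enumerate(rows):
        for column, limit in bounds.items():
            value = row.get(column)
            try:
                number = float(value)
            except (TypeError, ValueError):
                problems.append((i, column, value))
                continue
            if not (limit.low <= number <= limit.high):
                problems.append((i, column, value))
    return problems


# Hypothetical records; the 0-120 range for age is an assumption.
records = [
    {"name": "A", "age": 34},
    {"name": "B", "age": -2},
    {"name": "C", "age": "unknown"},
]
issues = find_implausible(records, {"age": Bounds(0, 120)})
print(issues)  # [(1, 'age', -2), (2, 'age', 'unknown')]
```

Flagging the rows rather than dropping them keeps the decision about what to do with each entry in the hands of whoever knows the source.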

Limitless Kinds and Types of Datasets

Searching for datasets in your favorite browser shows what is available. There are different types: spreadsheets, spatial maps, text only, and more. The range of subjects is amazing. On my GitHub, I have datasets on animals for adoption and B cell cancer. But there is much more: datasets about trout fishing in New York, counts and statistics for all public schools in Oklahoma, oncology, vital statistics for regions in Africa, and topology maps of regions around the globe.

There are public datasets and private datasets. Many sites offer “open data”: they front repositories that anyone can access 24/7 for projects, research, or general use. Private datasets are restricted to the users the owner allows. Access ranges from paying for a dataset to belonging to a single user group, which can be defined in many ways. Think of a school or university, a data company, or a business.

Unique datasets are out there. Generic datasets are out there. Datasets that you see stored in many, many places are out there. I find that a curious internet search turns up some worthwhile finds. Did you know that trout are stocked and tracked for fishing season? I did not. The data is available, and you can learn about the environment from fish. Many data science courses use common datasets, such as the Pittsburgh geospatial data used with tools like ArcGIS to understand health data by topology and its relationship to population.

There are large datasets, and websites with APIs that let you select a subset that is smaller and easier to manipulate. This is handy for the many tasks where you do not need everything, even though the subset may still be big data. A recent example: for a project, I pulled healthcare data on Medicare initiatives from data.gov, the site associated with the US Government Open Data Act, which has collected and served as a resource for datasets since 2009.
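
The same idea of keeping only a subset can be sketched locally, for the case where a large file has already been downloaded. This is an illustrative sketch using Python's standard `csv` module, not the data.gov API: the `select_subset` function, the column names, and the sample rows are hypothetical.

```python
import csv
import io


def select_subset(lines, keep_columns, predicate):
    """Yield reduced rows from CSV text, one at a time.

    lines: any iterable of CSV lines (an open file, a streamed download).
    keep_columns: names of the columns to retain.
    predicate: function that takes the full row dict and returns True to keep it.
    """
    for row in csv.DictReader(lines):
        if predicate(row):
            yield {name: row[name] for name in keep_columns}


# Hypothetical sample standing in for a much larger file.
sample = io.StringIO(
    "state,program,year,enrolled\n"
    "OK,A,2019,120\n"
    "NY,A,2019,300\n"
    "OK,B,2020,80\n"
)
subset = list(
    select_subset(sample, ["program", "enrolled"], lambda r: r["state"] == "OK")
)
print(subset)
# [{'program': 'A', 'enrolled': '120'}, {'program': 'B', 'enrolled': '80'}]
```

Because the function is a generator, rows are read and filtered one at a time, so the full file never has to sit in memory.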

Knowing the Plan to Have Resources for Hypothesis Testing

Hypothesis testing becomes important here. Working with a dataset does not guarantee results. The planning happens at the beginning: decide on the hypothesis, then run the experiment by applying analytics to data to generate a result. “Virtual experiments” are research carried out on data, for example to find and predict financial trends, track trout, or study population health. Starting with a question, turning it into a statement to prove true or false, and then working out what you need to know to prove it is the foundation of using data and datasets. After that, searching for and selecting the pieces you need is easy. Want to know whether more children in Oklahoma attend elementary school than high school? Start with the statement “More children attend elementary school than high school,” then pick datasets with the number of public schools and the type of school by grade level. That should supply the needed information. After the analysis, the statement is shown to be either true or false, with additional insights gained along the way.
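
The school example can be sketched as a direct comparison of totals. This is a minimal sketch under stated assumptions: the rows are invented for illustration, and the `level` and `enrollment` fields are hypothetical, since a real dataset may report only school counts and would need an enrollment figure to answer the question. It is a simple tally, not a statistical test.

```python
from collections import defaultdict


def enrollment_by_level(schools):
    """Sum enrollment for each school level."""
    totals = defaultdict(int)
    for school in schools:
        totals[school["level"]] += school["enrollment"]
    return dict(totals)


def statement_holds(totals):
    """True if "more children attend elementary school than high school"."""
    return totals.get("elementary", 0) > totals.get("high", 0)


# Hypothetical rows; real data would come from a public schools dataset.
schools = [
    {"name": "School 1", "level": "elementary", "enrollment": 410},
    {"name": "School 2", "level": "elementary", "enrollment": 380},
    {"name": "School 3", "level": "high", "enrollment": 650},
]
totals = enrollment_by_level(schools)
print(totals, statement_holds(totals))
# {'elementary': 790, 'high': 650} True
```

Writing the statement as a function first makes it clear which columns the dataset must contain before any searching begins.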

Translated from: https://medium.com/ai-in-plain-english/datasets-and-data-6beb85098554
