5 Datasets to Inspire Your Next Data Science Project

There are two ways to start a new data science project. Either you have an idea you want to implement, so you look for datasets that can bring it to life, or you come across an exciting dataset that inspires you to start a new project.

Often, as a beginner, you’ll probably be a little lost, looking around for a good place to start a project. For me, a good starting place was always finding an interesting dataset that triggered my curiosity.

This article is about finding datasets that inspire you: the kind of intriguing dataset you stumble across that has so much potential it sparks your creativity, and you can’t help but use it to build something great.

Wherever you are on your data science journey, whether you are just starting out or trying to grow your skills and maybe build new ones, there is no better way to improve a skill than to practice it. The more projects you build, the more fluent in data science you will become, and the stronger and more appealing your profile will be.

To grow your data science skills and build a strong profile, you need to tackle these 5 aspects of data science:

Deep Learning
Natural Language Processing
Big Data
Machine Learning
Image Processing

In this article, I will present five datasets, one for each of the 5 aspects of data science. I will talk a little about how each dataset is constructed, its format, and some ideas for how you can use it to build some fantastic projects.

These datasets can be used to build projects in more than one aspect of data science. So, use your creativity and start building up your profile, or grow the one you already have.

MNIST Datasets

There's no better place to start than with what is arguably the most famous dataset collection of them all, the MNIST datasets. Here, we will talk about two MNIST datasets:

MNIST of handwritten digits.
Fashion-MNIST.

MNIST of handwritten digits

The MNIST dataset is a collection of handwritten digits. The dataset has a training set of 60,000 images and a test set of 10,000 images that can be used for model evaluation. The digits in the dataset have been size-normalized and centered in a fixed-size image.

This dataset is excellent both for beginners who want to try learning techniques and pattern recognition methods and for intermediate data scientists who want to test their models on real-world data while spending minimal effort on preprocessing and formatting.

The original format of this dataset might be a little confusing for absolute beginners; luckily, this dataset is also available in an easy-to-handle CSV format.
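
To make the CSV format concrete, here is a minimal sketch of reading it with Python's standard library. It assumes the common layout in which each row holds the label followed by 784 pixel values (28x28, row by row); the exact layout and whether there is a header row depend on which CSV export you download, so treat those details as assumptions. The helper names `parse_mnist_row` and `read_mnist_csv` are made up for illustration.

```python
import csv
import io

IMAGE_SIDE = 28  # MNIST digits are 28x28 pixels


def parse_mnist_row(row):
    """Split one CSV row into (label, image); image is a 28x28 list of ints."""
    label = int(row[0])
    pixels = [int(value) for value in row[1:]]
    expected = IMAGE_SIDE * IMAGE_SIDE
    if len(pixels) != expected:
        raise ValueError(f"expected {expected} pixels, got {len(pixels)}")
    # Cut the flat pixel list into rows of 28 values each.
    image = [pixels[i:i + IMAGE_SIDE] for i in range(0, expected, IMAGE_SIDE)]
    return label, image


def read_mnist_csv(handle, has_header=False):
    """Yield (label, image) pairs from an open CSV file handle."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the column names, if the export has them
    for row in reader:
        if row:
            yield parse_mnist_row(row)


# Synthetic stand-in for a real file: one all-black image labelled 7.
sample = io.StringIO("7," + ",".join(["0"] * 784) + "\n")
label, image = next(read_mnist_csv(sample))
print(label, len(image), len(image[0]))
```

With a real download you would pass `open(path, newline="")` instead of the in-memory sample.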

Fashion-MNIST

Fashion-MNIST is a dataset of Zalando's article images. Just like the original dataset, this one also consists of a training set of 60,000 examples and a test set of 10,000 examples.

Each data entry is a 28x28 grayscale image, associated with a label from 10 classes. The training and test images share the same structure.

How can you use the MNIST datasets?

MNIST datasets are great for helping beginners understand and learn different machine learning and pattern recognition techniques.
Each dataset contains both training and testing data, so you won't need to split your data.
These datasets can be used to build image recognition applications, such as handwritten digit recognition with MNIST and clothing item recognition with Fashion-MNIST.
You can use these datasets to learn and practice the different methods and techniques of convolutional neural networks (CNNs). You can even use Keras to build your model.
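
As a small illustration of what a convolutional layer computes, the sketch below slides a kernel over an image in plain Python. It is not a Keras model and not a full CNN, only the core sliding-window operation (technically cross-correlation, which is what deep learning libraries usually implement under the name "convolution"). The function name `conv2d_valid` and the toy image are invented for this example.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and return the feature map."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for top in range(out_h):
        row = []
        for left in range(out_w):
            # Weighted sum of the image patch under the kernel.
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 "image": dark on the left, bright on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# This kernel responds where brightness increases from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 18, 0], [0, 18, 0], [0, 18, 0]]
```

The feature map lights up only in the column where the edge sits. A CNN learns many such kernels from the training images instead of having them written by hand.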

Amazon Product Data Dataset

The Amazon product data dataset contains product reviews and metadata from Amazon, including 142.8 million reviews from May 1996 to July 2014. This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs).

This dataset will give you a basic understanding of real business problems and help you identify and extract sales trends over the years.

How can you use the Amazon product data dataset?

You can use this dataset to analyze sentiment, which is one of the most popular applications of natural language processing (NLP).
This dataset consists largely of text, so you can use it to build various types of NLP models.
You can also use this dataset to build product trending models and predict future trends based on them.
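
A common first step for sentiment analysis on review data is to turn star ratings into sentiment labels, which can then serve as training targets for a text model. The sketch below assumes one JSON object per line with an `overall` rating and a `reviewText` field; check those field names against the files you actually download. The thresholds (4 and above is positive, 2 and below is negative) are a choice made for this example, and the three sample reviews are invented.

```python
import json


def rating_to_sentiment(overall):
    """Map a 1-5 star rating to a coarse sentiment label."""
    if overall >= 4:
        return "positive"
    if overall <= 2:
        return "negative"
    return "neutral"


def label_reviews(lines):
    """Parse JSON-lines reviews into (text, sentiment) pairs."""
    labelled = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        sentiment = rating_to_sentiment(record["overall"])
        labelled.append((record["reviewText"], sentiment))
    return labelled


# Invented records that only mimic the assumed shape of the data.
sample_lines = [
    '{"overall": 5.0, "reviewText": "Works exactly as described."}',
    '{"overall": 1.0, "reviewText": "Stopped working after a week."}',
    '{"overall": 3.0, "reviewText": "It is okay."}',
]
pairs = label_reviews(sample_lines)
for text, sentiment in pairs:
    print(sentiment, "-", text)
```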

YouTube Videos Statistics Dataset

YouTube maintains a continuously updated list of the top trending videos on the platform. To decide on the year's top trending videos, YouTube uses various factors, including measuring users' interactions (number of views, shares, comments, and likes).

The YouTube Videos Statistics Dataset is a daily record of the top trending YouTube videos.

How can you use the YouTube Videos Statistics Dataset?

Perform sentiment analysis on different types of videos and find patterns.
Categorize YouTube videos based on their comments and statistics. Using the results, you can build your own database of which videos tend to engage the audience more.
Train machine learning algorithms like RNNs to generate YouTube comments.
Use the previous years' lists of popular videos to build a machine learning model that predicts a future list of top trending videos.
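
One simple way to ask which videos engage the audience more is to compute an engagement rate from the interaction counts. The sketch below is only one possible definition, (likes + comments) / views, and the column names `video_id`, `views`, `likes`, and `comment_count` are assumptions about the CSV layout, not something the dataset description above guarantees. The three rows are invented.

```python
import csv
import io


def engagement_rate(views, likes, comments):
    """Interactions per view; 0.0 when a video has no views."""
    if views == 0:
        return 0.0
    return (likes + comments) / views


def rank_by_engagement(handle):
    """Return (video_id, rate) pairs sorted from most to least engaging."""
    reader = csv.DictReader(handle)
    scored = []
    for row in reader:
        rate = engagement_rate(
            int(row["views"]), int(row["likes"]), int(row["comment_count"])
        )
        scored.append((row["video_id"], rate))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented rows that only mimic the assumed column layout.
sample = io.StringIO(
    "video_id,views,likes,comment_count\n"
    "a1,1000,50,10\n"
    "b2,200,40,20\n"
    "c3,0,0,0\n"
)
ranking = rank_by_engagement(sample)
for video_id, rate in ranking:
    print(video_id, round(rate, 3))
```

Here the video with fewer views ranks first because a larger share of its viewers interacted with it, which is the kind of pattern raw view counts hide.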

SMS Spam Dataset

Nowadays, we are surrounded by spam: spam mail, spam advertising, and spam SMS messages. The SMS spam dataset contains 5,574 SMS messages in English, each tagged as either spam or legitimate.

This dataset represents each message as an entry of a CSV file for easy reading and extraction. The CSV file contains two columns: one classifies the message as spam or not, and the other holds the raw text of the message.

How can you use the SMS Spam Dataset?

You can use machine learning classification algorithms to build a spam message classifier, then test it on some messages to label them as safe or spam.
You can build a model and train it on this dataset to predict and detect spam messages.
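
As one example of such a classifier, here is a minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing, written with the standard library only. Naive Bayes is a choice made for this illustration, not something the dataset prescribes. The labels `"spam"` and `"ham"` and the four training messages are invented stand-ins; with the real dataset you would read the (label, text) pairs from the file instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = Counter()
        self.vocabulary = set()

    def fit(self, labelled_messages):
        """Count words per class from (label, text) pairs."""
        for label, text in labelled_messages:
            self.message_counts[label] += 1
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.vocabulary.add(token)

    def _log_score(self, label, tokens):
        # log P(label) + sum of log P(token | label), add-one smoothed.
        total_messages = sum(self.message_counts.values())
        score = math.log(self.message_counts[label] / total_messages)
        total_words = sum(self.word_counts[label].values())
        vocab_size = len(self.vocabulary)
        for token in tokens:
            if token not in self.vocabulary:
                continue  # words never seen in training carry no evidence
            count = self.word_counts[label][token]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.LABELS, key=lambda label: self._log_score(label, tokens))


# Invented toy messages; the real dataset has thousands of rows.
training = [
    ("spam", "Win a free prize now"),
    ("spam", "Free entry claim your prize"),
    ("ham", "Are we still meeting for lunch"),
    ("ham", "Call me when you get home"),
]
spam_filter = NaiveBayesSpamFilter()
spam_filter.fit(training)
print(spam_filter.predict("claim your free prize"))
print(spam_filter.predict("see you at lunch"))
```

Because the dataset does not ship with a separate test set, you would normally hold out part of the messages and measure accuracy on them rather than on the training rows.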

COCO Dataset

COCO is a large-scale object detection, segmentation, and captioning dataset created by Microsoft and sponsored by many other big companies. The acronym COCO stands for Common Objects in Context. This dataset includes many features, such as 1.5 million object instances across 80 object categories, 330K images, 91 stuff categories, and 5 captions per image.

COCO has amazing documentation, and you can explore the dataset online using an explorer before you decide to download it.

How can you use the COCO dataset?

COCO can be used to train and build machine learning models that detect and classify different objects. For example, you can use it to classify different vehicle types.
By nature, working with COCO enables you to build different image processing applications, such as image segmentation and compression.
You can also use COCO to train a model that analyzes footage from a security camera by detecting people and animals.
COCO can also be used for non-data science projects, such as object detection in robotics.
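
Before training anything, it helps to look at how COCO-style annotations are organized. The sketch below assumes the usual layout of the instance annotation files: top-level `images`, `categories`, and `annotations` lists, where each annotation carries a `category_id` and a `bbox` given as `[x, y, width, height]`. The tiny JSON document is invented and far smaller than a real annotation file, and the helper names are made up for this example.

```python
import json
from collections import Counter


def count_instances_per_category(coco):
    """Count annotated object instances for each category name."""
    id_to_name = {category["id"]: category["name"] for category in coco["categories"]}
    counts = Counter()
    for annotation in coco["annotations"]:
        counts[id_to_name[annotation["category_id"]]] += 1
    return counts


def bbox_area(bbox):
    """Area of a box stored as [x, y, width, height]."""
    _x, _y, width, height = bbox
    return width * height


# Invented miniature annotation file in the assumed layout.
sample_json = """
{
  "images": [{"id": 1, "file_name": "example.jpg", "width": 640, "height": 480}],
  "categories": [{"id": 1, "name": "person"}, {"id": 18, "name": "dog"}],
  "annotations": [
    {"id": 101, "image_id": 1, "category_id": 1, "bbox": [10, 20, 100, 200]},
    {"id": 102, "image_id": 1, "category_id": 1, "bbox": [300, 40, 80, 160]},
    {"id": 103, "image_id": 1, "category_id": 18, "bbox": [200, 300, 120, 90]}
  ]
}
"""
coco = json.loads(sample_json)
counts = count_instances_per_category(coco)
print(dict(counts))
print(bbox_area(coco["annotations"][0]["bbox"]))
```

Filtering the annotations by category name in the same way is a quick route to a smaller subset, for example only people and animals for the security camera idea above.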

Takeaway

Starting a new project can be somewhat tricky, especially if you’re just starting with data science. When I first started, I couldn’t decide on a project simply because I didn’t have enough knowledge to choose a project and a dataset.

One of the things that helped me was browsing dataset websites (e.g. Kaggle) and reading about different datasets and how they can be used. That gave me the inspiration I needed to kick-start new projects. I still do that now.

The only time I didn't need to explore and browse different datasets was when I had a set project with a specific dataset, which was the case when I was employed by a company or a client.

As my knowledge base grew, I developed an eye for good datasets and for seeing their potential. Data always tells a story; you just need to listen to it.

So, I hope this article inspires you to build a new project, or to browse the web for exciting datasets that push you to build something awesome.