

“Data! Data! Data!” he cried impatiently. “I cannot make bricks without clay.”

Sherlock Holmes in “The Adventure of the Copper Beeches,” Sir Arthur Conan Doyle


The importance of data in a data science process cannot be emphasised enough. The outcome of a data analysis task reflects the kind of data that has been fed into it. However, getting the data is sometimes a big pain point in itself. Recently, I took a short course titled Data Journalism and Visualization with Free Tools, and some great resources were shared through it. I’ll be sharing some of the useful tips in a series of articles. In these articles, I’ll try to highlight some of the ways you can find data on the internet for free and then use it to create something meaningful.

Advanced Google Search

Let’s begin with the advanced Google search, which is one of the most common ways to get access to publicly available datasets. By merely typing the name of the required dataset in the search bar, we can get access to a plethora of resources. However, here is a simple trick that can ease this process to a great extent and help you find files of a specific type on the internet.

1. Using the filename extension of the file to be downloaded

Let’s say we have a task at hand to find healthcare-related data in CSV format. CSV stands for comma-separated values, a file format that stores data in tabular form. To get such files, go to the Google search bar and type the following:

filetype:<extension of the file to be downloaded> <category of data> data

For example: filetype:csv healthcare data
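
The same query can also be assembled in code, which is handy when you want to try several file types or topics in a row. The snippet below is only a minimal sketch: the helper names `build_filetype_query` and `google_search_url` are made up for illustration, and it only builds the search URL for you to open in a browser; it does not fetch or parse any results.

```python
from urllib.parse import urlencode


def build_filetype_query(extension: str, topic: str) -> str:
    """Return a query such as 'filetype:csv healthcare data'."""
    # The operator is written without spaces around the colon.
    # A leading dot (".csv") is tolerated and stripped.
    cleaned = extension.strip().lstrip(".").lower()
    return f"filetype:{cleaned} {topic.strip()}"


def google_search_url(query: str) -> str:
    """Encode the query as the 'q' parameter of a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_filetype_query("csv", "healthcare data")
print(query)
print(google_search_url(query))
```

Pasting the printed URL into the browser's address bar should give the same results as typing the query into the search bar by hand.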

Google will list the links that most closely match the search query. Most of the time, these will be direct links to specific files on the sites, which can then be downloaded to the local system and analysed later.
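
Once a file has been downloaded, a quick first look at it might be sketched as follows. This is just an illustration using Python's standard `csv` module: the file name `healthcare_data.csv` and the helper `summarise_csv` are hypothetical, and the encoding is an assumption, since real-world files vary.

```python
import csv
from pathlib import Path


def summarise_csv(path: Path) -> dict:
    """Return the column names and the number of data rows in a CSV file."""
    # Assumption: the file is UTF-8, with or without a byte order mark.
    with path.open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)
        return {"columns": list(reader.fieldnames or []), "rows": row_count}


# Hypothetical file name; replace it with the file you actually downloaded.
downloaded = Path("healthcare_data.csv")
if downloaded.exists():
    print(summarise_csv(downloaded))
else:
    print(f"{downloaded} not found; download a CSV file first.")
```

Checking the column names and row count first is a cheap way to confirm that the download is really the tabular data you were looking for before doing any deeper analysis.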

2. Using the filename extension and site name

If you want to narrow down your search further, then this option will come in handy. Merely mentioning the file type will point to a lot of files. However, if you want to find data on a specific website, you can mention it in the search bar too, as follows:

filetype:<extension of the file to be downloaded> site:<website> <category of data>

For example: filetype:xlsx site:who.int health

All the results will now come only from the WHO website, which narrows down the search results considerably.
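
Combining the two operators can be sketched in code as well. As before, this is only an illustrative helper (the name `build_dataset_query` is made up), and it assumes `who.int` is the WHO domain you want to restrict the search to.

```python
from typing import Optional
from urllib.parse import urlencode


def build_dataset_query(extension: str, topic: str, site: Optional[str] = None) -> str:
    """Combine the filetype: operator, an optional site: operator, and a topic."""
    parts = [f"filetype:{extension.strip().lstrip('.').lower()}"]
    if site:
        # Restrict the results to a single website, e.g. site:who.int
        parts.append(f"site:{site.strip()}")
    parts.append(topic.strip())
    return " ".join(parts)


# Assumption: who.int is the site we want the Excel files to come from.
who_query = build_dataset_query("xlsx", "health", site="who.int")
print(who_query)
print("https://www.google.com/search?" + urlencode({"q": who_query}))
```

Leaving out the `site` argument falls back to the plain file type search from the first tip.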


Files compatible with the search command

So, what different kinds of files are compatible with the search command? This information can be accessed easily through the settings on the Google homepage, as follows:

  • Click Settings > Advanced Search

  • Scroll down to the file type option and look at the available types. You'll see there are a lot of options, including the pdf and ppt file types.



In this article, we looked at ways to find the datasets we want faster and more efficiently via a standard Google search. We saw how merely adding a filename extension and a site name can help filter the results more effectively. These techniques can be handy when we know what kind of data we are looking for. In the next article, I have shared some resources and useful sites which offer free and curated datasets for our data analysis tasks. Here is the link to the article:

