How to use Google Cloud Platform (GCP) as rented infrastructure

Rent-a-VM to process earthquake data

Overview

Duration is 1 min

In this lab you spin up a virtual machine, configure its security, access it remotely, and then carry out the steps of an ingest-transform-and-publish data pipeline manually.

What you learn
In this lab, you:

  • Create a Compute Engine instance with the necessary access and security settings
  • SSH into the instance
  • Install the software package Git (for source code version control)
  • Ingest data into a Compute Engine instance
  • Store the transformed data on Cloud Storage
  • Publish Cloud Storage data to the web

Introduction

Duration is 1 min

In this lab you spin up a virtual machine, install software on it, and use it to do scientific data processing. We do not recommend that you work with Compute Engine instances at such a low level, but you can!

In this lab, you will use Google Cloud Platform in a manner similar to the way you likely use clusters today. Spinning up a virtual machine and running your jobs on it is the closest you can get to working with the public cloud as simply rented infrastructure. It doesn't take advantage of the other benefits that Google Cloud Platform provides, namely the ability to forget about infrastructure and treat your scientific computation problems simply as software that needs to run.

You will ingest real-time earthquake data published by the United States Geological Survey (USGS) and use it to create a map of earthquake activity.

Create Compute Engine instance with the necessary API access

Duration is 4 min

To create a Compute Engine instance:

Step 1
Browse to https://cloud.google.com/

Step 2
Click on Console.

Step 3
Click on the Menu (three horizontal lines).

Step 4
Select Compute Engine.

Step 5
Click Create Instance and wait for a form to load. You will need to change some options on the form that comes up.

Step 6
Change Identity and API access for the Compute Engine default service account to Allow full access to all Cloud APIs.

Step 7
Now, click Create
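
Optionally, the same instance can be created from the command line (for example, from Cloud Shell) instead of the console. This is a minimal sketch; the instance name and zone are placeholders you can change, and the cloud-platform scope corresponds to allowing full access to all Cloud APIs:

# create a VM whose default service account can reach all Cloud APIs
gcloud compute instances create instance-1 --zone us-central1-a --scopes cloud-platform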

SSH into the instance

Duration is 2 min

You can remotely access your Compute Engine instance using Secure Shell (SSH):

Step 1
Click on SSH.

Note

SSH keys are transferred automatically, so you can SSH directly from the browser with no extra software needed.

Step 2
To find some information about the Compute Engine instance, type the following into the command-line:

cat /proc/cpuinfo
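
If you want more detail about the machine, the usual Linux commands also work here, for example:

cat /proc/meminfo    # memory details
df -h                # disk space on the boot disk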

Install software

Duration is 2 min

Step 1
Type the following into the command line:
sudo apt-get update
sudo apt-get -y -qq install git

Step 2
Verify that git is now installed
git --version

Ingest USGS data

Duration is 3 min

Step 1
On the command-line, type:
git clone https://github.com/GoogleCloudPlatform/training-data-analyst
This clones the code repo.

Step 2
Navigate to the folder corresponding to this lab:

cd training-data-analyst/courses/machine_learning/deepdive/01_googleml/earthquakes
Step 3
Examine the ingest code using less:

less ingest.sh
The less command allows you to view the file (Press the spacebar to scroll down; the letter b to back up a page; the letter q to quit).

The program ingest.sh downloads a dataset of earthquakes in the past 7 days from the US Geological Survey. Where is this file downloaded? To disk or to Cloud Storage? ________________

Step 4
Run the ingest code:

bash ingest.sh
Step 5
Verify that some data has been downloaded:

head earthquakes.csv
The head command shows you the first few lines of the file.
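
As an optional sanity check, you can also count how many records were downloaded:

wc -l earthquakes.csv    # total number of lines in the downloaded file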

Transform the data

Duration is 3 min

You will use a Python program to transform the raw data into a map of earthquake activity:

Step 1
The transformation code is explained in detail in this notebook:

https://github.com/GoogleCloudPlatform/datalab-samples/blob/master/basemap/earthquakes.ipynb

Feel free to read the narrative to understand what the transformation code does. The notebook itself was written in Datalab, a GCP product that you will use later in this set of labs.

Step 2
First, install the necessary Python packages on the Compute Engine instance:

bash install_missing.sh
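
The exact packages are listed in install_missing.sh. Assuming it includes the numerics and plotting libraries used by the transformation code (for example numpy and matplotlib), you can optionally confirm that they import cleanly before running the transform:

python -c "import numpy, matplotlib; print('packages OK')"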
Step 3
Then, run the transformation code:

python transform.py
Step 4
You will notice a new image file if you list the contents of the directory:

ls -l
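
If the directory is crowded, sorting by modification time makes the newly created image easier to spot:

ls -lt | head    # newest files first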

Create bucket

Duration is 2 min

Create a bucket using the GCP console:

Step 1
Browse to the GCP Console by visiting http://cloud.google.com and clicking on Go To Console.

Step 2
Click on the Menu (three horizontal bars) at the top left and select Storage.

Step 3
Click on Create Bucket.

Step 4
Choose a globally unique bucket name (your project name is unique, so you could use that). You can leave it as Multi-Regional, or improve speed and reduce costs by making it Regional. Then, click Create.

Note: Please pick a region from the following: us-east1, us-central1, asia-east1, europe-west1. These are the regions that currently support Cloud ML Engine jobs. Please verify the current list of supported regions, since it may have changed after this lab was last updated. For example, if you are in the US, you may choose us-east1 as your region.
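
If you prefer, the bucket can also be created from the SSH window with gsutil; this sketch creates a bucket in the us-east1 region and uses <YOUR-BUCKET> as a placeholder for your unique bucket name:

gsutil mb -l us-east1 gs://<YOUR-BUCKET>/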

Step 5
Note down the name of your bucket: _____________________

In this and future labs, you will insert this bucket name whenever the directions ask for <YOUR-BUCKET>.

Store data

Duration is 1 min

To store the original and transformed data in Cloud Storage

Step 1
In the SSH window of the Compute Engine instance, type:

gsutil cp earthquakes.* gs://<YOUR-BUCKET>/earthquakes/
This copies the files to Cloud Storage.
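
You can verify the upload from the same SSH window before switching back to the console:

gsutil ls -l gs://<YOUR-BUCKET>/earthquakes/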

Step 2
On the GCP console, click on your bucket name, and notice there are three new files present in the earthquakes folder.

Publish Cloud Storage files to web

Duration is 2 min

To publish Cloud Storage files to the web:

Step 1
On the GCP console, select all three earthquakes files that you uploaded to the bucket and click on Share publicly
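
If the Share publicly option is not visible in your version of the console, a roughly equivalent step can be done from the SSH window by granting read access to all users on the uploaded objects:

gsutil acl ch -u AllUsers:R gs://<YOUR-BUCKET>/earthquakes/*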

Step 2
Click on the Public link corresponding to earthquakes.htm

Step 3
What is the URL of the published Cloud Storage file? How does it relate to your bucket name and content?


Step 4
What are some advantages of publishing to Cloud Storage? ___________________________________

Clean up

Duration is 2 min

To delete the Compute Engine instance (since we won’t need it any more):

Step 1
On the GCP console, click the Menu (three horizontal bars) and select Compute Engine

Step 2
Click on the checkbox corresponding to the instance that you created (the default name was instance-1)

Step 3
Click on the Delete button in the top-right corner
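
The same cleanup can also be done from the command line; the instance name and zone below are placeholders matching the defaults used earlier in this lab:

gcloud compute instances delete instance-1 --zone us-central1-a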

Step 4
Does deleting the instance have any impact on the files that you stored on Cloud Storage? _________

Summary

Duration is 1 min

In this lab, you used Google Cloud Platform (GCP) as rented infrastructure. You can spin up a Compute Engine VM, install custom software on it, and run your processing jobs. However, using GCP in this way doesn't take advantage of the other benefits that Google Cloud Platform provides, namely the ability to forget about infrastructure and treat your scientific computation problems simply as software that needs to run.
