😶‍🌫️ Google Cloud Platform Intro

Rafael

✍️ Week 1 lesson 7 of DataTalksClub 2022 data engineering zoomcamp, ☁️ GCP intro, 🤖 service account, 🃏 roles, and 🧰 gcloud CLI

Today, we will follow DataTalksClub's videos that introduce the Google Cloud Platform (GCP):

  1. DE Zoomcamp 1.1.1 - Introduction to Google Cloud Platform.
  2. DE Zoomcamp 1.3.1 - Introduction to Terraform Concepts & GCP Pre-Requisites.

Both are part of the DataTalksClub 2022 Data Engineering Zoomcamp week 1 repo.

In our last post, we reviewed SQL basics, following DataTalksClub's video: DE Zoomcamp 1.2.6 - SQL Refresher.

💽 SQL refresher
✍️ Week 1 lesson 6 of DataTalksClub 2022 data engineering zoomcamp, reviewing 💽 SQL basics with the 🚕 NYC taxi trips data

💬 In this lesson, we will:

  1. Introduce the Google Cloud Platform.
  2. Create a GCP account.
  3. Set up our first GCP project.
  4. Review GCP Identity and Access Management concepts.
  5. Create a service account for our infrastructure application.
  6. Generate the service account key to be able to authenticate from our local computer.
  7. Install the Google Cloud CLI on our local computer.
  8. Authenticate our Google Cloud CLI installation.
  9. Add roles to the service account to create storage buckets, work with files, and interact with BigQuery.
  10. Enable the project's IAM APIs so we can access them from our local computer.

This post is part of a series. Find all the other related posts here:

DataTalksClub 2022 data engineering zoomcamp
✍️ Table of contents for my learning in public posts on DataTalksClub 2022 data engineering zoomcamp

☁️ What is the Google Cloud Platform?

The Google Cloud Platform, or GCP, is a suite of cloud computing services offered by Google that runs on the same infrastructure that Google uses internally for its end-user products (Search, Gmail, Drive, YouTube). GCP includes a range of hosted services for compute, storage, and application development.

GCP services summary.

That's a lot, and it's not even all of it. Visit the link below for a comprehensive, interactive list of services. In this zoomcamp, our focus will be on the Big Data services and the Storage and Databases services.

Google Cloud Developer Cheat Sheet
Every product in the Google Cloud family described in 4 or less words.

👤 Create a GCP account

To create a GCP account, navigate to: https://console.cloud.google.com/

GCP cloud console.

In this case, I was already logged in to Gmail, so my Google account was automatically recognized and shown in the top right corner. If you have several accounts, you can use this button to select the appropriate one, e.g., personal, work, or school. If you don't have a Gmail account, you can get one at Google Signup.

At the time of this writing, Google is running a promotion that lets new users sign up for a free trial of 90 days or $300 in credit, so let's use that to get started. Here are some of the questions asked to create your account:

  1. Country
  2. What best describes your organization's needs?
    "Class project/assignment" fits nicely.
  3. Read the Terms of service.
    Note:
    • The free trial is available only for new GCP users.
    • The service is restricted to 8 cores of Google Compute Engine.
    • No crypto mining!
    • Service level agreement and Google's indemnity don't apply, so don't keep anything too important until you have upgraded your account.
  4. Phone number
    It will be used to send you a verification code, so make sure you have access to your phone while signing up.
  5. Account type
    Individual (only one user)
  6. Payment method: This can be a credit or debit card, PayPal, or a bank account. They need this information to help them reduce fraud and abuse, with the promise that you will not be charged unless you turn automatic billing on (manual upgrade).
  7. What brought you to Google Cloud?
    Learn more / explore
  8. What are you interested in doing in Google Cloud?
    Data analytics, Data management, AI / ML, Virtual machines.
  9. What best describes your role?
    Data Scientist / Data Engineer

After completing the registration, you will be prompted to follow the tutorials for the Compute Engine, Cloud Storage, and Cloud SQL services. These tutorials are tied to your project, so I link to the reference guides here instead:

  1. Compute Engine
  2. Cloud Storage
  3. Cloud SQL

We can come back to these later.

🐣 Setting up your first GCP project

After signing up to GCP, you will land on the "getting started" page.

Getting started page.

From here, we can select a project by clicking on the project drop-down button (My First Project) in the top bar.

Select a project in GCP.

Let's create a new project for this course by clicking on the NEW PROJECT button in the top right, and name it dtc-de-zc-2022.

After clicking on CREATE (MB1, i.e., a left mouse button click), you will see a notification that the project was created.

Once the project is created, we need to switch to it by going to the top bar, MB1 on My First Project, and MB1 on the project name dtc-de-zc-2022.
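For readers who prefer the terminal, the same two steps can be sketched with the gcloud CLI once it is installed (see the install section later in this post). This is only a sketch and not what the videos do. It assumes the project ID dtc-de-zc-2022 is still available (project IDs are unique across GCP, so you may need a different one), and `run` is a hypothetical helper that only prints each command unless DRY_RUN=0 is set.

```shell
# Assumption: the project ID mirrors the project name used in this post.
PROJECT_ID="dtc-de-zc-2022"

# Hypothetical helper: prints the command by default; runs it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the project, then make it the active project for later commands.
run gcloud projects create "$PROJECT_ID" --name="$PROJECT_ID"
run gcloud config set project "$PROJECT_ID"
```

Running the snippet as is prints the two commands so you can review them before executing anything against your account.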

👮 IAM overview

IAM stands for Identity and Access Management. IAM lets you grant granular access to specific Google Cloud resources. We saw a few Google Cloud resources earlier when we talked about the Big Data services and the Storage and Databases services. With IAM, you define who (identity) has what access (role) for which resource.

In this model, permissions to access a resource aren't granted directly to the end-user. Instead, permissions are grouped into roles, and roles are given to authenticated principals. The permission management in IAM consists of three parts:

  • Principal: Answers the who. A principal can be a Google Account (for end-users), a service account (for applications and compute workloads), a Google group, a Google Workspace account, or a Cloud Identity domain that can access a resource.
  • Role: Answers the what access. A role is a collection of permissions. Permissions determine what operations are allowed on a resource.
  • Policy: The allow policy is a collection of role bindings that bind one or more principals to individual roles. When you want to define who (principal) has what type of access (role) on a resource, you create an allow policy and attach it to the resource.

An allow policy with two bindings. The role bindings associate specific members with specific roles.
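To make the who / what access / which resource model more concrete, here is a minimal sketch of an allow policy with two role bindings, written in the JSON shape that IAM uses. The member emails and the temporary file are hypothetical; the point is only the structure, a `bindings` list where each entry pairs one `role` with its `members`.

```shell
# Write a sample allow policy to a temporary file (members are hypothetical).
POLICY_FILE="$(mktemp)"
cat > "$POLICY_FILE" <<'EOF'
{
  "bindings": [
    {
      "role": "roles/viewer",
      "members": ["user:alice@example.com"]
    },
    {
      "role": "roles/storage.admin",
      "members": ["serviceAccount:my-app@my-project.iam.gserviceaccount.com"]
    }
  ]
}
EOF

# Each binding answers "what access" (role) for "who" (members).
BINDING_COUNT="$(grep -c '"role"' "$POLICY_FILE")"
echo "bindings in policy: $BINDING_COUNT"
```

Once the gcloud CLI is set up, `gcloud projects get-iam-policy PROJECT_ID --format=json` prints the real allow policy of a project in this shape, along with some extra metadata fields.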

This and more information is available in:

IAM overview | IAM Documentation | Google Cloud
How Google Cloud's Identity and Access Management (IAM) system works and how you can use it to manage access in Google Cloud.

🤖 Create a service account

A service account is an account for an application or compute workload instead of an individual end-user. When you run code that's hosted on Google Cloud, the code runs as the account you specify. You can create as many service accounts as needed to represent the different logical components of your application. For more information about using a service account in your application, see Getting started with authentication.

To create a service account, MB1 on the top bar navigation menu (left corner), and navigate to IAM & Admin > Service Accounts.

This will take you to the service account page for the current project,

where we can MB1 on the + Create a Service Account button in the top area. Then, fill in the creation form as shown below.

MB1 on Create and continue, and fill in the next form as shown to grant our service account Viewer access to most Google Cloud resources.

The last step can be used to share this service account with other users, but we won't do that here, so we can MB1 on Done to finish setting up the service account.
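The console steps above have a rough command-line equivalent, sketched below under a few assumptions: the service account name dtc-de-user is hypothetical (the post sets the real name in the console form), `run` is a hypothetical helper that only prints commands unless DRY_RUN=0 is set, and the commands would need to be run with your own user account rather than the service account.

```shell
PROJECT_ID="dtc-de-zc-2022"
# Hypothetical service account name; use the one you typed in the console.
SA_NAME="dtc-de-user"
# Service account emails follow the NAME@PROJECT_ID.iam.gserviceaccount.com pattern.
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Hypothetical helper: prints the command by default; runs it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 1: create the service account in the project.
run gcloud iam service-accounts create "$SA_NAME" \
  --project="$PROJECT_ID" \
  --display-name="$SA_NAME"

# Step 2: grant it the basic Viewer role on the project.
run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/viewer"
```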

🗝️ Generate service account key

After setting up the service account in the previous section, we can generate its key so we can use the service account remotely, i.e., from our local machine. For this, MB1 on the three vertical dots in the service account page, and select Manage keys.

This will take us to the Keys page, where we can MB1 on the Add Key button, and select Create new key.

The following message will appear, where we select the JSON key type for downloading the service account key and then MB1 on the Create button.

A file save window will appear where we can select the location to download and store the private key JSON file. After you MB1 on the Save button, the following confirmation window will appear.

🚨
Safety first: A bad actor can wreak havoc if they get access to your service account keys. Keep them safe. Never commit them to source code repositories or public cloud storage buckets. More on best practices for managing service account keys.
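As one small, local precaution, the downloaded key can be moved out of the download folder and restricted so that only your user can read it. The sketch below is an assumption about a reasonable setup, not a step from the videos: `secure_key` is a hypothetical helper, and the example paths are placeholders.

```shell
# Hypothetical helper: move a downloaded key into a private directory and
# restrict it so only the current user can read or write it.
secure_key() {
  src="$1"
  dest_dir="$2"
  mkdir -p "$dest_dir"
  chmod 700 "$dest_dir"            # only the owner may enter the directory
  mv "$src" "$dest_dir/"
  dest="$dest_dir/$(basename "$src")"
  chmod 600 "$dest"                # owner read/write, nobody else
  echo "$dest"                     # print the new location of the key
}

# Example usage (placeholder paths; adjust to your own):
# secure_key ~/Downloads/your-key.json ~/.gcp/keys
```

The function prints the new path, which is handy later when an environment variable needs to point at the key file.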

🧰 Install Google Cloud CLI

"The gcloud CLI manages authentication, local configuration, developer workflow, and general interactions with Google Cloud resources. With the Google Cloud CLI, it's easy to perform many common cloud tasks like creating a Compute Engine VM instance, managing a Google Kubernetes Engine cluster, and deploying an App Engine application, either from the command line or in scripts and other automations." (Source: cloud.google.com/sdk)

Let's follow the installation instructions on the gcloud CLI Quickstart guide.

Quickstart: Install the Google Cloud CLI | Google Cloud CLI Documentation
Learn how to install Google Cloud CLI and run a few core gcloud CLI commands.

I'm running Ubuntu, so I'll follow the steps for installing the gcloud CLI using a snap package. In a terminal window, we type:

$ snap remove google-cloud-sdk
snap "google-cloud-sdk" is not installed

This confirms that the older google-cloud-sdk snap is not installed. Then we run the install command:

$ snap install google-cloud-cli --classic
google-cloud-cli 385.0.0 from Cloud SDK (google-cloud-sdk✓) installed

We can enable bash command completion for the gcloud CLI with

source /snap/google-cloud-cli/34/completion.bash.inc

Finally, we initialize the gcloud CLI by running

gcloud init

This runs a few diagnostics and asks you to log in:

$ gcloud init
Welcome! This command will take you through the configuration of gcloud.

Your current configuration has been set to: [default]

You can skip diagnostics next time by using the following flag:
  gcloud init --skip-diagnostics

Network diagnostic detects and fixes local network connection issues.
Checking network connection...done.                                            
Reachability Check passed.
Network diagnostic passed (1/1 checks passed).

You must log in to continue. Would you like to log in (Y/n)?  y

When you hit enter after typing "y" to answer the last question, a new web browser window will pop up. There, select your Gmail account and follow the prompts until you get the You are now authenticated with the gcloud CLI! message.

Back in the terminal, you need to select which project to use. Choose the dtc-de-zc-2022 project we just created by typing the number next to it, 1 in this example, and then hit enter.

You are logged in as: [your-account@gmail.com].

Pick cloud project to use: 
 [1] dtc-de-zc-2022
 [2] stable-snowfall-348912
 [3] Enter a project ID
 [4] Create a new project
Please enter numeric choice or text value (must exactly match list item):  1

This will associate our project with the default gcloud configuration. Make sure to read through the post-installation messages, as they give you useful tips on how to work with the CLI, such as how to manage several projects or set a default compute region.

🔐 Authenticate Google Cloud CLI

Now we need to allow the gcloud CLI to use the service account credentials to make requests. For this, find the path to the JSON file with the service account key we created in the 🗝️ Generate service account key section. I placed mine in the Downloads folder, so I'll list the files that start with the project name to get the full path to the file. For security reasons, it is best practice to move this file to a different place in your system, but we leave that as homework for the reader 🙂.

$ ls ~/Downloads/dtc-de-zc-2022-*
/home/pinto/Downloads/dtc-de-zc-2022-24ec0403a138.json

Next, we need to store the path to this file (not its content) in an environment variable:

export GOOGLE_APPLICATION_CREDENTIALS=/home/pinto/Downloads/dtc-de-zc-2022-24ec0403a138.json

Finally, we can authenticate the gcloud CLI with

gcloud auth activate-service-account --key-file "$GOOGLE_APPLICATION_CREDENTIALS"

More on the authentication command here:

gcloud auth activate-service-account | Google Cloud CLI Documentation
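Before activating the service account, it can help to check that the environment variable really points at a readable key file. The sketch below is an optional sanity check, not part of the original lesson: `check_credentials` is a hypothetical helper, and it relies on service account key files containing a "type": "service_account" field.

```shell
# Hypothetical helper: check that GOOGLE_APPLICATION_CREDENTIALS points to a
# readable file that looks like a service account key.
check_credentials() {
  key_path="${GOOGLE_APPLICATION_CREDENTIALS:-}"
  if [ -z "$key_path" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -r "$key_path" ]; then
    echo "cannot read key file: $key_path" >&2
    return 1
  fi
  # Service account key files carry a "type": "service_account" field.
  if ! grep -q '"type": *"service_account"' "$key_path"; then
    echo "not a service account key: $key_path" >&2
    return 1
  fi
  echo "key file looks valid: $key_path"
}

# Example usage:
# check_credentials && gcloud auth activate-service-account \
#   --key-file "$GOOGLE_APPLICATION_CREDENTIALS"
```

After activation, `gcloud auth list` shows which account is currently active, which is a quick way to confirm that the service account was picked up.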

🃏 Add roles to service account

In a future lesson, we will use Terraform to create the infrastructure we need to run our project, e.g., create storage buckets, add files to them, and interact with BigQuery. Since Terraform will use our service account to perform these operations, in this step, we will add the necessary permissions to the service account.

⚠️
To minimize the security risk associated with a compromised service account key, it is best practice in a typical production project to create dedicated service accounts for each application, e.g., one for infrastructure creation, one for the data pipeline, and so on. However, for the sake of simplicity, in this course we will create one ⚡ almighty ⚡ service account. Similarly, we rarely use Google's default roles in production; instead, we create custom roles with specific permissions for specific resources.

Let's go back to the IAM & Admin page in the GCP console to add three more roles to the service account we created in a previous section.

On the IAM page, MB1 on the Edit Principal button (pencil icon) in the rightmost column of the row that shows the service account we created before. This will bring up the Edit permissions pane, as shown below.

MB1 on +ADD ANOTHER ROLE and select Storage Admin. This role allows the principal to create storage buckets. Now, repeat this process to add the Storage Object Admin role. This role allows the principal to create, delete, edit, and grant access to files in the storage buckets. Finally, add the BigQuery Admin role to allow interactions with BigQuery. After you are done adding the roles, the Edit permissions pane should look like the figure below.

MB1 on the SAVE button to apply the changes.
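The same three role grants can be sketched from the terminal. Treat this as an illustration under assumptions: the service account name dtc-de-user is hypothetical, `run` and `grant_roles` are hypothetical helpers, and the commands must be run while logged in with an account that is allowed to change the project's IAM policy (such as the project owner), not with the service account itself.

```shell
PROJECT_ID="dtc-de-zc-2022"
# Hypothetical service account name; replace with your own.
SA_EMAIL="dtc-de-user@${PROJECT_ID}.iam.gserviceaccount.com"

# Hypothetical helper: prints the command by default; runs it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Role IDs for Storage Admin, Storage Object Admin, and BigQuery Admin.
grant_roles() {
  for role in roles/storage.admin roles/storage.objectAdmin roles/bigquery.admin; do
    run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member="serviceAccount:${SA_EMAIL}" \
      --role="$role"
  done
}

grant_roles
```

By default this only prints the three commands, one per role, so the role IDs can be checked against the console before anything is changed.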

🤝 Enable the project's APIs

The local environment we set up interacts with the project's cloud environment through APIs. In this step, we will enable two APIs (IAM and IAM Service Account Credentials) to allow this interaction. In the cloud console, verify that the class project (e.g., dtc-de-zc-2022) is selected in the top bar, and then MB1 on the navigation button (top left corner) > APIs & Services > Enable APIs & services.

In the APIs & Services page, MB1 on the + ENABLE APIS AND SERVICES button near the top center menu bar.

This will take us to the API library page, where we can type a search query in the search bar.

Let's search for IAM API,

and MB1 on the Identity and Access Management (IAM) API result. In the IAM API page, MB1 on the ENABLE button.

Now let's repeat the process to find and enable the IAM Service Account Credentials API.

It is possible that enabling the IAM API automatically enables the IAM Service Account Credentials API as well.
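As an alternative to the console, both APIs can be enabled in one command. The sketch below assumes the service names iam.googleapis.com (IAM API) and iamcredentials.googleapis.com (IAM Service Account Credentials API); `run` and `enable_apis` are hypothetical helpers, and the command needs an account with permission to enable services on the project.

```shell
PROJECT_ID="dtc-de-zc-2022"

# Hypothetical helper: prints the command by default; runs it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Enable the IAM API and the IAM Service Account Credentials API together.
enable_apis() {
  run gcloud services enable \
    iam.googleapis.com \
    iamcredentials.googleapis.com \
    --project="$PROJECT_ID"
}

enable_apis
```

Afterwards, `gcloud services list --enabled` lists the APIs that are active for the project, which is one way to confirm that both entries are there.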


📝 Summary

In this post we:

  1. Introduced the Google Cloud Platform.
  2. Created a GCP account.
  3. Set up our first GCP project.
  4. Reviewed GCP Identity and Access Management concepts.
  5. Created a service account for our infrastructure application.
  6. Generated the service account key to facilitate authentication from our local computer.
  7. Installed the Google Cloud CLI on our local computer.
  8. Authenticated our Google Cloud CLI installation.
  9. Added roles to the service account to create storage buckets, work with files, and interact with BigQuery.
  10. Enabled the project's IAM APIs so we can access them from our local computer.

Week 1 lesson 7 visual summary
