Lesson 001 [ Google GCP ]
I have taught this lesson several times now.
Most of this lesson is done with a browser.
I offer tips for success.
- Use Google-Chrome browser.
- Dedicate a new browser window to this lesson.
- If you need a browser to check news or e-mail, do it in another window.
- Get a new window with: File -> New Window
- Why? Because you want to avoid working in a window with many tabs.
- On a piece of paper, write down a useful tab-organization:
- Tab 1: cs101.us
- Tab 2: codelabs
- Tab 3: github
- Tab 4: GCP 1
- Tab 5: GCP Cloud-Shell
- Tab 6: DataLab
- Tab 7: GCP 3
I started this lesson by creating a new Chrome-window.
I created 7 tabs in that window.
In tab 1 I loaded this URL: http://cs101.us
In tab 2 I loaded this URL: https://codelabs.developers.google.com/cpb100
In tab 3 I loaded this URL: https://github.com/GoogleCloudPlatform
In tab 4 I loaded this URL: https://console.cloud.google.com
In tabs 5, 6, and 7 I had done nothing (yet).
After I created the 7 tabs, my Chrome-window looked like this:
I clicked on tab 2, the codelabs-tab.
I continued this lesson by studying the URL listed below:
https://codelabs.developers.google.com/cpb100
That page lists the codelabs in this series; I assume it lists them in the correct order.
I discuss the codelabs, one by one, below.
Free-Trial
Next, in the codelabs-tab, I clicked this link:
https://codelabs.developers.google.com/codelabs/cpb100-free-trial
I saw a simple path through 4 pages.
The third page had a noteworthy URL I used to get $300 of cloud-credit from Google:
I loaded the URL into the GCP-tab:
https://console.developers.google.com/freetrial
I logged in using my gmail account.
It worked well; I gave it my credit-card info.
It gave me $300 of cloud-credit from Google.
I hope that you can also get that credit.
Next, I visited the URL listed below:
https://console.cloud.google.com/projectcreate
I used that page to create a project called cpb100.
Google needed 4 minutes to create project cpb100.
Although I asked for the name cpb100, Google appended a numeric suffix to form the project ID:
cpb100-195004
Google created a URL for the project:
https://console.cloud.google.com/home/activity?project=cpb100-195004
In the upper right I saw a blue-button called "UPGRADE".
I clicked the button.
Next, I accepted the upgrade offer because I wanted access to services required by future training sessions.
Then I verified that the "hamburger-link" in the upper-left behaved well when I clicked it.
It did behave well and I was ready to move on.
In the codelabs-tab, I returned to the URL listed below:
https://codelabs.developers.google.com/cpb100
[top]
Compute-Engine
Next, in the codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-compute-engine
I quickly clicked through the first two links in that page and landed at this URL:
https://codelabs.developers.google.com/codelabs/cpb100-compute-engine/#2
It asked me to click the upper-left-hamburger-link in my project page.
I went to the GCP-tab and did that.
Then I followed the advice to click the "Compute Engine" link on the left:
https://console.cloud.google.com/compute
I saw this:
I clicked the "Create" button.
It served me a form from this URL:
https://console.cloud.google.com/compute/instancesAdd
I saw this:
I clicked radio-button: "Allow full access to all Cloud APIs"
I deviated from the tutorial and clicked the link: "SSH keys"
Note: If you are on windows, you can get ssh from the URL listed below:
https://git-scm.com/download/win
My intent was to ease my ability to ssh into the GCP-Instance from a shell in my laptop.
The text I mouse-copied into the GCP SSH-key-form came from this file:
~/.ssh/id_rsa.pub
It looks like this:
ssh-rsa
AAAAB3NzaC1yc2EAAAADAQABAAABAQC8cbLxnaqPPznHz9DgMq
xg38LPxRTRT1qjyQH0cGEHHtHVYHgPdCNvW++0ArCuJVxiQ7fx
XvK2cYThurYozSkI6uwxVSPBoJgsLmLPvhc+JshDHi7SgtWl4b
8JZlnL5dMPQNo61p/qGmqZpKxXYJanY0zN4WnB17vlnVFhXL2j
3U3YKvifIC8a6gRKitG+XFGmj5sZKbJuqbnfhD93ytcRGV+rEM
VipYAl2XBs27K0VGwK+u3NOOerWXjRrgqIo9Frk7C4rps/dMYd
56QKxnVumr24TUJ0TlymsCYkhD9qDHyJHxGTyN5BAzUpryphd7
QDLZn+Rdrm4Ssu8/jclLPH
dan@ubu42.net
Notice the last line.
The last line asks GCP to create a Linux account named "dan" in the new Linux instance.
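If you do not already have such a key-pair on your laptop, one way to create it is sketched below (OpenSSH assumed; GCP reads the trailing comment field as the login name):
ssh-keygen -t rsa -b 2048 -C dan -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub    # paste this output into the GCP SSH-key-form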
I saw this:
Next, I clicked the blue "Create" button.
GCP started the instance in about 1 minute.
I saw the corresponding IP address: 35.229.101.148
I saw this:
I tried to log in with a plain ssh command; the command omitted a user name.
It failed.
Then I used a shell command which asked to log in as "dan".
The above command worked perfectly.
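My notes omit the exact commands, but they had this shape (the IP is the one GCP showed me; "dan" comes from the SSH-key comment):
ssh 35.229.101.148        # failed: my laptop user name does not exist on the instance
ssh dan@35.229.101.148    # worked: GCP created account "dan" from the key comment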
If you do not know how to start a shell on your laptop, you should click the SSH-link in your GCP-tab:
Next, I returned to the codelabs-tab; I studied the instructions in this page:
https://codelabs.developers.google.com/codelabs/cpb100-compute-engine/#4
I returned to my shell-window.
I followed instructions I had just read:
sudo apt-get update
sudo apt-get -y -qq install git
git --version
The above shell commands worked perfectly in my new GCP instance.
In the codelabs-tab, I returned to the URL listed below:
https://codelabs.developers.google.com/cpb100
That ended my interaction with the CPB100-node I call: "Compute-Engine".
[top]
Cloud-Storage
Next, in the codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-cloud-storage
I studied the page there and clicked through until I landed at this URL:
https://codelabs.developers.google.com/codelabs/cpb100-cloud-storage/#2
I verified that I could ssh into the GCP instance I had created earlier.
I saw this:
After the instance served me a shell prompt, I issued a shell command:
git clone https://github.com/GoogleCloudPlatform/training-data-analyst
That command worked well.
I ran more shell commands:
cd training-data-analyst/CPB100/lab2b
less ingest.sh
bash ingest.sh
head earthquakes.csv
In the codelabs-tab, I clicked the next node in the page and landed on this URL:
https://codelabs.developers.google.com/codelabs/cpb100-cloud-storage/#3
I read the content on that page.
Next, I clicked the github-tab.
I visited the URL below:
https://github.com/GoogleCloudPlatform/datalab-samples/blob/master/basemap/earthquakes.ipynb
I studied the above page.
I returned to the shell.
I ran two shell commands inside the GCP instance:
bash install_missing.sh
python transform.py
The last command resulted in some warnings from Python but I am confident it worked.
I noticed a new file which python created:
dan@instance-1:~/training-data-analyst/CPB100/lab2b$ ls -latr
total 632
-rw-r--r-- 1 dan dan 751 Feb 13 08:22 earthquakes.htm
-rw-r--r-- 1 dan dan 637 Feb 13 08:22 commands.sh
-rwxr-xr-x 1 dan dan 3074 Feb 13 08:22 transform.py
drwxr-xr-x 2 dan dan 4096 Feb 13 08:22 scheduled
-rwxr-xr-x 1 dan dan 680 Feb 13 08:22 install_missing.sh
-rwxr-xr-x 1 dan dan 759 Feb 13 08:22 ingest.sh
drwxr-xr-x 8 dan dan 4096 Feb 13 08:22 ..
-rw-r--r-- 1 dan dan 297347 Feb 13 08:43 earthquakes.csv
drwxr-xr-x 3 dan dan 4096 Feb 13 08:56 .
-rw-r--r-- 1 dan dan 313185 Feb 13 08:56 earthquakes.png
dan@instance-1:~/training-data-analyst/CPB100/lab2b$
dan@instance-1:~/training-data-analyst/CPB100/lab2b$
dan@instance-1:~/training-data-analyst/CPB100/lab2b$
I copied the PNG file to my laptop and looked at it:
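My notes omit the copy command; a sketch of what it probably was (scp assumed available on the laptop):
scp dan@35.229.101.148:training-data-analyst/CPB100/lab2b/earthquakes.png /tmp/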
In the codelabs-tab, I clicked the next node in the page and landed on this URL:
https://codelabs.developers.google.com/codelabs/cpb100-cloud-storage/#4
I clicked on the GCP-tab.
I visited the URL below:
https://cloud.google.com/
I hovered over the button: "GO TO CONSOLE".
I made note of its href-URL:
https://console.cloud.google.com
I clicked the button.
It sent me to a page I call: "Dashboard".
I clicked the upper-left-hamburger-link and from there clicked: "Storage".
It served me a page with blue-button: "Create bucket".
I clicked "Create bucket".
It served me a form asking me to name a new bucket.
I called my new bucket: cs101feb2018
In retrospect, I can see now that was a poor naming choice.
Now I know to create buckets with shorter names.
At the bottom of the form I clicked the blue "Create" button.
I saw evidence that it worked.
It served me a page, called "Browser", prompting me to upload files into the bucket.
The page reminds me of the page served to me by AWS when I work with S3.
I saw this:
In the codelabs-tab, I returned to the cpb100 page:
https://codelabs.developers.google.com/codelabs/cpb100-cloud-storage/#4
I clicked the next node in the page and landed on this URL:
https://codelabs.developers.google.com/codelabs/cpb100-cloud-storage/#5
I returned to the shell prompt of my GCP instance and typed a command:
gsutil cp earthquakes.* gs://cs101feb2018
I am confident it worked.
I verified by returning to the "Browser" page in the GCP-tab and inspecting the contents of cs101feb2018.
It listed three files:
- earthquakes.csv
- earthquakes.htm
- earthquakes.png
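The same check works from a shell; a sketch I did not run at the time (gsutil assumed configured):
gsutil ls -l gs://cs101feb2018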
I saw this:
In the codelabs-tab, I returned to the cpb100 page:
https://codelabs.developers.google.com/codelabs/cpb100-cloud-storage/#5
I clicked the next node in the page and landed on this URL:
https://codelabs.developers.google.com/codelabs/cpb100-cloud-storage/#6
I returned to the "Browser" page in the GCP-tab.
I checked the boxes to "Share publicly" the three files in cs101feb2018.
Sharing gave each file a public link.
Using a laptop-shell, I verified that the public could see earthquakes.png
cd /tmp
wget https://storage.cloud.google.com/cs101feb2018/earthquakes.png
I tried to use an incognito-browser to see earthquakes.png
The google server asked me to authenticate.
Google should not force me to login to gmail in order to see publicly-shared content.
Before I understood the behavior, it made AWS-S3 look clearly superior to GCP-Storage.
Later I learned the explanation: the storage.cloud.google.com domain always routes through Google authentication; publicly-shared objects are served anonymously from the storage.googleapis.com domain instead.
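A quick shell demonstration of the difference (same bucket and object; the second URL is the anonymous endpoint):
# cookie-authenticated browser endpoint; prompts for a Google login:
wget https://storage.cloud.google.com/cs101feb2018/earthquakes.png
# anonymous public endpoint; works for publicly-shared objects:
wget https://storage.googleapis.com/cs101feb2018/earthquakes.png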
In the codelabs-tab, I returned to the cpb100 page:
https://codelabs.developers.google.com/codelabs/cpb100-cloud-storage/#6
I clicked the next node in the page and landed on this URL:
https://codelabs.developers.google.com/codelabs/cpb100-cloud-storage/#7
I followed the instructions in that page to delete my GCP instance.
I saw this:
In the codelabs-tab, I returned to the URL listed below:
https://codelabs.developers.google.com/cpb100
That ended my interaction with the CPB100-node I call: "Cloud-Storage".
[top]
Cloud-Sql
This sub-lesson has many steps and offers many opportunities to get lost.
Good Luck!
In the codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-cloud-sql
I studied the page it served me:
https://codelabs.developers.google.com/codelabs/cpb100-cloud-sql/#0
I studied the next page in sequence:
https://codelabs.developers.google.com/codelabs/cpb100-cloud-sql/#1
I studied the next page in sequence:
https://codelabs.developers.google.com/codelabs/cpb100-cloud-sql/#2
In the GCP-tab, I visited this URL:
https://console.cloud.google.com
I found the tiny "Cloud-Shell" button near the upper-right.
I hovered over it and saw the tooltip: "Activate Google Cloud Shell".
I saw this:
I clicked it.
A dark window appeared at the bottom.
After 15 seconds, a white shell prompt appeared in the window.
I tried some shell commands and saw this:
Welcome to Cloud Shell! Type "help" to get started.
cs101@cpb100-195004:~$ id
uid=1000(cs101) gid=1000(cs101) groups=1000(cs101),4(adm),27(sudo),999(docker)
cs101@cpb100-195004:~$ df -h
Filesystem Size Used Avail Use% Mounted on
none 25G 19G 5.0G 79% /
tmpfs 853M 0 853M 0% /dev
tmpfs 853M 0 853M 0% /sys/fs/cgroup
/dev/sda1 25G 19G 5.0G 79% /etc
/dev/sdb1 4.8G 11M 4.6G 1% /home
shm 64M 0 64M 0% /dev/shm
cs101@cpb100-195004:~$
cs101@cpb100-195004:~$
cs101@cpb100-195004:~$
cs101@cpb100-195004:~$ ls -la
total 24
drwxr-xr-x 2 cs101 cs101 4096 Feb 13 12:06 .
drwxr-xr-x 4 root root 4096 Feb 13 12:04 ..
-rw------- 1 cs101 cs101 9 Feb 13 12:06 .bash_history
-rw-r--r-- 1 cs101 cs101 220 May 15 2017 .bash_logout
-rw-r--r-- 1 cs101 cs101 3564 Feb 9 16:03 .bashrc
-rw-r--r-- 1 cs101 cs101 675 May 15 2017 .profile
lrwxrwxrwx 1 cs101 cs101 38 Feb 13 12:04 README-cloudshell.txt -> /google/devshell/README-cloudshell.txt
cs101@cpb100-195004:~$
cs101@cpb100-195004:~$
cs101@cpb100-195004:~$
cs101@cpb100-195004:~$ ifconfig
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.18.0.1 netmask 255.255.0.0 broadcast 0.0.0.0
ether 02:42:77:65:a7:42 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1460
inet 172.17.0.2 netmask 255.255.0.0 broadcast 0.0.0.0
ether 02:42:ac:11:00:02 txqueuelen 0 (Ethernet)
RX packets 366 bytes 105824 (103.3 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 326 bytes 54222 (52.9 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 0 (Local Loopback)
RX packets 49 bytes 3813 (3.7 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 49 bytes 3813 (3.7 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
cs101@cpb100-195004:~$
cs101@cpb100-195004:~$
cs101@cpb100-195004:~$
I popped that shell window so it had its own tab.
I moved that tab so it was tab #5, the tab after GCP-tab.
After that, GCP issued an error.
I sense that it is some kind of GCP-bug:
It is unfortunate that GCP lacks reliability.
The error suggested that I send feedback.
So I did that and halted my activity in this lab for a day.
After 24 hours, I continued.
In GCP-tab I started a new Cloud-Shell.
I moved the shell to its own tab (tab #5).
Cloud-Shell stopped issuing errors.
In the codelabs-tab, I returned my attention to this page:
https://codelabs.developers.google.com/codelabs/cpb100-cloud-sql/#2
In the Cloud-Shell-tab, I typed a shell command which looked familiar:
git clone https://github.com/GoogleCloudPlatform/training-data-analyst
That worked well; I entered another shell command:
cd training-data-analyst/CPB100/lab3a
I entered another shell command:
cat cloudsql/table_creation.sql
I saw some SQL syntax:
CREATE DATABASE IF NOT EXISTS recommendation_spark;

USE recommendation_spark;

DROP TABLE IF EXISTS Recommendation;
DROP TABLE IF EXISTS Rating;
DROP TABLE IF EXISTS Accommodation;

CREATE TABLE IF NOT EXISTS Accommodation
(
  id varchar(255),
  title varchar(255),
  location varchar(255),
  price int,
  rooms int,
  rating float,
  type varchar(255),
  PRIMARY KEY (ID)
);

CREATE TABLE IF NOT EXISTS Rating
(
  userId varchar(255),
  accoId varchar(255),
  rating int,
  PRIMARY KEY(accoId, userId),
  FOREIGN KEY (accoId)
    REFERENCES Accommodation(id)
);

CREATE TABLE IF NOT EXISTS Recommendation
(
  userId varchar(255),
  accoId varchar(255),
  prediction float,
  PRIMARY KEY(userId, accoId),
  FOREIGN KEY (accoId)
    REFERENCES Accommodation(id)
);
I can see in the above syntax that the SQL script declares three tables and their columns:
- Accommodation
- id varchar(255),
- title varchar(255),
- location varchar(255),
- price int,
- rooms int,
- rating float,
- type varchar(255),
- Rating
- userId varchar(255),
- accoId varchar(255),
- rating int,
- Recommendation
- userId varchar(255),
- accoId varchar(255),
- prediction float,
I see two relationships in the above syntax:
- An Accommodation has 0 or more Ratings
- An Accommodation has 0 or more Recommendations
The tables might support these scenarios (a SQL sketch follows the list):
- When a user rates a house, the app adds a row to Rating.
- When admin adds a house for rent, the app adds a row to Accommodation.
- When ML-app predicts a user rating for a house, ML-app adds a row to Recommendation.
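A sketch of those three writes, issued through the mysql client (the rows are hypothetical; the host IP is the one my Cloud SQL instance received later in this lab):
mysql --host=104.197.219.252 --user=root --password recommendation_spark <<'SQL'
-- admin adds a house for rent:
INSERT INTO Accommodation VALUES ('100', 'Tiny Test Cabin', 'Oslo', 40, 1, 4.0, 'cottage');
-- a user rates the house:
INSERT INTO Rating VALUES ('10', '100', 4);
-- the ML-app predicts a rating for another user:
INSERT INTO Recommendation VALUES ('11', '100', 3.7);
SQL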
In the Cloud-Shell-tab I ran another command and collected the output:
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$ head cloudsql/*.csv
==> cloudsql/accommodation.csv <==
1,Comfy Quiet Chalet,Vancouver,50,3,3.1,cottage
2,Cozy Calm Hut,London,65,2,4.1,cottage
3,Agreable Calm Place,London,65,4,4.8,house
4,Colossal Quiet Chateau,Paris,3400,16,2.7,castle
5,Homy Quiet Shack,Paris,50,1,1.1,cottage
6,Pleasant Quiet Place,Dublin,35,5,4.3,house
7,Vast Peaceful Fortress,Seattle,3200,24,1.9,castle
8,Giant Quiet Fortress,San Francisco,3400,12,4.1,castle
9,Giant Peaceful Palace,London,1500,20,3.5,castle
10,Sizable Calm Country House,Auckland,650,9,4.9,mansion
==> cloudsql/rating.csv <==
10,1,1
18,1,2
13,1,1
7,2,2
4,2,2
13,2,3
19,2,2
12,2,1
11,2,1
1,2,2
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
I see in the above output that we have accommodations and ratings but no recommendations yet.
I returned to the codelabs-tab.
I navigated to the next page in the sequence:
https://codelabs.developers.google.com/codelabs/cpb100-cloud-sql/#3
I returned to the Cloud-Shell-tab.
I ran another command:
gsutil cp cloudsql/* gs://cs101feb2018/sql/
It was very slow; it needed 60 seconds to finish:
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$ gsutil cp cloudsql/* gs://cs101feb2018/sql/
Copying file://cloudsql/accommodation.csv [Content-Type=text/csv]...
Copying file://cloudsql/rating.csv [Content-Type=text/csv]...
Copying file://cloudsql/table_creation.sql [Content-Type=application/x-sql]...
- [3 files][ 14.2 KiB/ 14.2 KiB]
Operation completed over 3 objects/14.2 KiB.
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
I visited the GCP-tab.
I verified success of the above command by visiting an appropriate URL:
https://console.cloud.google.com/storage/browser/cs101feb2018/sql/
I returned to the codelabs-tab.
I navigated to the next page in the sequence:
https://codelabs.developers.google.com/codelabs/cpb100-cloud-sql/#4
I studied that page.
The page listed 10 steps.
To start step 1, I clicked the GCP-tab and visited this URL:
https://console.cloud.google.com
I clicked the upper-left-hamburger-link and from there clicked: "SQL".
It served me a page describing: "Cloud SQL Instances"
- Cloud SQL instances are fully managed relational databases:
- MySQL
- PostgreSQL
I saw this:
I clicked the blue: "Create instance" button.
In the next page I picked the MySQL radio-button.
I clicked the blue: "Next" button.
I saw this:
I clicked the blue: "Choose Second Generation" button.
The next page landed me in a field: "Instance ID".
I entered string: "rentals".
I entered root password: "root".
I saw this:
I clicked the blue "Create" button.
GCP served me a page with information about my new db instance:
- MySQL 2nd Gen 5.7
- IP: 104.197.219.252
- Instance connection name: cpb100-195004:us-central1:rentals
I saw this:
Next I clicked the Cloud-Shell-tab.
I ran shell commands:
cd ~/training-data-analyst/CPB100/lab3a
bash find_my_ip.sh
I saw this:
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$ bash find_my_ip.sh
35.227.172.20
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
I ran shell commands:
cd ~/training-data-analyst/CPB100/lab3a
bash authorize_cloudshell.sh
I saw this:
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$ cat authorize_cloudshell.sh
#!/bin/bash
gcloud sql instances patch rentals \
--authorized-networks `wget -qO - http://ipecho.net/plain`/32
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$ bash authorize_cloudshell.sh
When adding a new IP address to authorized networks, make sure to also
include any IP addresses that have already been authorized.
Otherwise, they will be overwritten and de-authorized.
Do you want to continue (Y/n)? y
The following message will be used for the patch API method.
{"project": "cpb100-195004", "name": "rentals", "settings":
{"ipConfiguration": {"authorizedNetworks": [{"value": "35.227.172.20/32"}]}}}
Patching Cloud SQL instance...done.
Updated [https://www.googleapis.com/sql/v1beta4/projects/cpb100-195004/instances/rentals].
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
The intent of the above syntax is to authorize the Cloud-Shell IP address to connect to the MySQL instance.
If you have problems getting permissions granted, GCP allows you to grant permissions in a web page.
A clue to do that is described below:
In the GCP-tab, visit this URL:
https://console.cloud.google.com/sql
Then look for a link for the MySQL instance.
Under that link find the authorization form.
I saw this:
Enough about that; back to the lesson.
In the codelabs-tab, I navigated to the next page in the sequence:
https://codelabs.developers.google.com/codelabs/cpb100-cloud-sql/#5
I studied that page.
The page listed 4 steps.
To start step 1, in the GCP-tab, I navigated to the URL listed below:
https://console.cloud.google.com/sql
I clicked the link called "rentals" which is the name of the Instance I created earlier.
I clicked "Import" at the top.
GCP served a page: "Import data from Cloud Storage"
I saw this:
I clicked the button named: "Browse".
GCP served a link named: "cs101feb2018" which is the name of the bucket I created earlier.
I clicked the "cs101feb2018" link.
GCP showed three files and a folder named "sql" in that bucket.
I clicked "table_creation.sql" in that folder.
I saw this:
GCP activated a blue button named "Import".
I saw this:
I clicked the button.
GCP got busy for 10 seconds and then indicated success with a green-check-mark.
I saw this:
In the codelabs-tab, I navigated to the next page in the sequence:
https://codelabs.developers.google.com/codelabs/cpb100-cloud-sql/#6
I studied that page.
The page listed 3 steps.
To start step 1, in the GCP-tab, I navigated to the URL listed below:
https://console.cloud.google.com/sql
I clicked the link called "rentals" which is the name of the Instance I created earlier.
I clicked "Import" at the top.
GCP served me a page: "Import data from Cloud Storage"
I browsed to accommodation.csv and selected it.
I selected radio-button: "CSV".
I selected database: "recommendation_spark".
In the "Table" field, I entered string: "Accommodation".
GCP responded by activating the blue "Import" button.
I saw this:
I clicked the button.
GCP quickly served me a page named "Instance details".
Quickly after that it served a green-check-mark to indicate success.
I clicked "Import" at the top.
GCP served me a page: "Import data from Cloud Storage"
I browsed to rating.csv and selected it.
I selected radio-button: "CSV".
I selected database: "recommendation_spark".
In the "Table" field, I entered string: "Rating".
GCP responded by activating the blue "Import" button.
I clicked the button.
GCP quickly served me a page named "Instance details".
Quickly after that it served a green-check-mark to indicate success.
In the codelabs-tab, I visited the next page in the sequence:
https://codelabs.developers.google.com/codelabs/cpb100-cloud-sql/#7
I studied that page.
The page contained 5 steps.
To start step 1, in the GCP-tab, I visited this page:
https://console.cloud.google.com/sql
I used that page to get the IP address of the MySQL instance: 104.197.219.252
Next, in the Cloud-Shell-tab, I entered a shell command:
mysql --host=104.197.219.252 --user=root --password
I saw this:
cs101@cpb100-195004:~$ mysql --host=104.197.219.252 --user=root --password
Enter password:
ERROR 2003 (HY000): Can't connect to MySQL server on '104.197.219.252' (110 "Connection timed out")
cs101@cpb100-195004:~$
cs101@cpb100-195004:~$
cs101@cpb100-195004:~$
The above error occurred because my Cloud-Shell session had been assigned a new IP address.
The new address was not on the authorized-networks list of the MySQL instance.
It was up to me to re-authorize Cloud-Shell.
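A quick way to compare the current Cloud-Shell address against the authorized list; a sketch I pieced together later (gcloud output format assumed):
wget -qO - http://ipecho.net/plain ; echo     # current Cloud-Shell egress IP
gcloud sql instances describe rentals --format="value(settings.ipConfiguration.authorizedNetworks)"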
I tried these shell commands:
cd ~/training-data-analyst/CPB100/lab3a
bash find_my_ip.sh
bash authorize_cloudshell.sh
mysql --host=104.197.219.252 --user=root --password
I saw this:
cs101@cpb100-195004:~$ mysql --host=104.197.219.252 --user=root --password
Enter password:
ERROR 2003 (HY000): Can't connect to MySQL server on '104.197.219.252' (110 "Connection timed out")
cs101@cpb100-195004:~$
cs101@cpb100-195004:~$
cs101@cpb100-195004:~$ cd ~/training-data-analyst/CPB100/lab3a
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$ bash find_my_ip.sh
35.203.155.130
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$ bash authorize_cloudshell.sh
When adding a new IP address to authorized networks, make sure to also
include any IP addresses that have already been authorized.
Otherwise, they will be overwritten and de-authorized.
Do you want to continue (Y/n)? y
The following message will be used for the patch API method.
{"project": "cpb100-195004", "name": "rentals", "settings":
{"ipConfiguration": {"authorizedNetworks": [{"value": "35.203.155.130/32"}]}}}
Patching Cloud SQL instance...done.
Updated [https://www.googleapis.com/sql/v1beta4/projects/cpb100-195004/instances/rentals].
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3a$ mysql --host=104.197.219.252 --user=root --password
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 899
Server version: 5.7.14-google-log (Google)
Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)]>
MySQL [(none)]>
MySQL [(none)]>
At the MySQL prompt I entered a command:
use recommendation_spark;
I wanted to see the tables, so I typed: show tables;
Then I verified that rating.csv had been loaded into the Rating table:
MySQL [recommendation_spark]> show tables;
+--------------------------------+
| Tables_in_recommendation_spark |
+--------------------------------+
| Accommodation |
| Rating |
| Recommendation |
+--------------------------------+
3 rows in set (0.04 sec)
MySQL [recommendation_spark]>
MySQL [recommendation_spark]>
MySQL [recommendation_spark]>
MySQL [recommendation_spark]> select count(*) from Rating;
+----------+
| count(*) |
+----------+
| 1186 |
+----------+
1 row in set (0.03 sec)
MySQL [recommendation_spark]>
MySQL [recommendation_spark]>
MySQL [recommendation_spark]>
MySQL [recommendation_spark]> select * from Rating limit 4;
+--------+--------+--------+
| userId | accoId | rating |
+--------+--------+--------+
| 10 | 1 | 1 |
| 13 | 1 | 1 |
| 18 | 1 | 2 |
| 12 | 10 | 3 |
+--------+--------+--------+
4 rows in set (0.04 sec)
MySQL [recommendation_spark]>
MySQL [recommendation_spark]>
MySQL [recommendation_spark]>
I entered a SQL command:
MySQL [recommendation_spark]>
MySQL [recommendation_spark]> select * from Accommodation where type = 'castle' and price < 1500;
+----+--------------------------+--------------+-------+-------+--------+--------+
| id | title | location | price | rooms | rating | type |
+----+--------------------------+--------------+-------+-------+--------+--------+
| 14 | Colossal Peaceful Palace | Melbourne | 1200 | 21 | 1.5 | castle |
| 15 | Vast Private Fort | London | 1300 | 18 | 2.6 | castle |
| 26 | Enormous Peaceful Palace | Paris | 1300 | 18 | 1.1 | castle |
| 31 | Colossal Private Castle | Buenos Aires | 1400 | 15 | 3.3 | castle |
| 45 | Vast Quiet Chateau | Tokyo | 1100 | 19 | 2.3 | castle |
+----+--------------------------+--------------+-------+-------+--------+--------+
5 rows in set (0.04 sec)
MySQL [recommendation_spark]>
MySQL [recommendation_spark]>
MySQL [recommendation_spark]>
That ended my interaction with the CPB100-node I call: "Setup rentals data in Cloud SQL".
[top]
Dataproc
Next, in the codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-dataproc
I studied the page it served me:
https://codelabs.developers.google.com/codelabs/cpb100-dataproc/#0
In the Cloud-Shell tab I verified that repo: "training-data-analyst" was still available.
I verified that my MySQL instance and tables were still available.
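A one-liner for that check; a sketch (the -e flag runs a single statement and exits):
mysql --host=104.197.219.252 --user=root --password -e "show tables in recommendation_spark;"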
In the codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-dataproc/#1
I studied the page it served me.
I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-dataproc/#2
I studied the page it served me.
In the GCP-tab I noted the region of my Cloud SQL instance: "us-central1".
I clicked the upper-left-hamburger-link and from there clicked: "Dataproc".
GCP served me a page with blue-button: "Enable API".
I clicked it.
GCP served me a page with blue-button: "Create cluster".
I saw this:
I clicked it.
GCP served me a form to configure the cluster.
I ensured the cluster was in the same region as my Cloud SQL instance: "us-central1".
I changed the machine type of both master and worker nodes to: "2 vCPUs 7.5 GB memory".
I specified that the cluster have two worker nodes (the minimum).
I saw this:
I clicked the blue-button: "Create".
GCP served a page indicating it was busy building the cluster.
I saw this:
Eventually, it finished.
I saw this:
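The same cluster can be created from Cloud-Shell instead of the web form; a sketch with the values I chose (the cluster name is the one the form assigned; n1-standard-2 matches "2 vCPUs 7.5 GB memory"):
gcloud dataproc clusters create cluster-d037 \
  --zone us-central1-f \
  --master-machine-type n1-standard-2 \
  --worker-machine-type n1-standard-2 \
  --num-workers 2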
In Cloud-Shell-tab I issued shell commands:
cd ~/training-data-analyst/CPB100/lab3b
bash authorize_dataproc.sh cluster-d037 us-central1 2
It served me an error:
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3b$ bash authorize_dataproc.sh cluster-d037 us-central1 2
Machines to authorize: cluster-d037-m cluster-d037-w-0 cluster-d037-w-1 in us-central1
... finding their IP addresses
ERROR: (gcloud.compute.instances.describe) Could not fetch resource:
- Invalid value for field 'zone': 'us-central1'. Unknown zone.
IP address of cluster-d037-m is /32
ERROR: (gcloud.compute.instances.describe) Could not fetch resource:
- Invalid value for field 'zone': 'us-central1'. Unknown zone.
IP address of cluster-d037-w-0 is /32
ERROR: (gcloud.compute.instances.describe) Could not fetch resource:
- Invalid value for field 'zone': 'us-central1'. Unknown zone.
IP address of cluster-d037-w-1 is /32
Authorizing [/32,/32,/32] to access cloudsql=rentals
ERROR: (gcloud.sql.instances.patch) argument --authorized-networks: Bad value [/32]:
Must be specified in CIDR notation, also known as 'slash' notation (e.g. 192.168.100.0/24).
Usage: gcloud sql instances patch INSTANCE [optional flags]
optional flags may be --activation-policy | --assign-ip | --async |
--authorized-gae-apps | --authorized-networks |
--no-backup | --backup-start-time |
--clear-authorized-networks | --clear-database-flags |
--clear-gae-apps | --cpu | --database-flags | --diff |
--enable-bin-log | --enable-database-replication |
--follow-gae-app | --gce-zone | --help |
--maintenance-release-channel |
--maintenance-window-any | --maintenance-window-day |
--maintenance-window-hour | --memory | --pricing-plan |
--replication | --require-ssl |
--storage-auto-increase | --storage-size | --tier
For detailed information on this command and its flags, run:
gcloud sql instances patch --help
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3b$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3b$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3b$
In the GCP-tab, I studied the page at this URL:
https://console.cloud.google.com/dataproc/clusters
That page showed the cluster's zone: "us-central1-f".
The script wants a zone, not a region, as its second argument.
In Cloud-Shell-tab I tried the shell command again, this time with the zone:
bash authorize_dataproc.sh cluster-d037 us-central1-f 2
I saw this:
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3b$ bash authorize_dataproc.sh cluster-d037 us-central1-f 2
Machines to authorize: cluster-d037-m cluster-d037-w-0 cluster-d037-w-1 in us-central1-f
... finding their IP addresses
IP address of cluster-d037-m is 35.224.123.22/32
IP address of cluster-d037-w-0 is 35.192.238.24/32
IP address of cluster-d037-w-1 is 35.225.251.161/32
Authorizing [35.224.123.22/32,35.192.238.24/32,35.225.251.161/32] to access cloudsql=rentals
When adding a new IP address to authorized networks, make sure to also
include any IP addresses that have already been authorized.
Otherwise, they will be overwritten and de-authorized.
Do you want to continue (Y/n)? y
The following message will be used for the patch API method.
{"project": "cpb100-195004", "name": "rentals", "settings":
{"ipConfiguration":
{"authorizedNetworks":
[{"value": "35.224.123.22/32"},
{"value": "35.192.238.24/32"}, {"value": "35.225.251.161/32"}]}}}
Patching Cloud SQL instance...done.
Updated [https://www.googleapis.com/sql/v1beta4/projects/cpb100-195004/instances/rentals].
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3b$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3b$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3b$
In the codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-dataproc/#3
I studied that page.
I ran a shell command to open an editor in Cloud-Shell-tab:
nano sparkml/train_and_apply.py
Inside the editor, I set the script's Cloud-SQL-Instance IP-address: 104.197.219.252
I saw this:
I entered a shell command:
gsutil cp sparkml/tr*.py gs://cs101feb2018/
In GCP-tab, I visited this URL:
https://console.cloud.google.com/dataproc/clusters/
I clicked the "Jobs" icon on the left.
It served me a page with blue-button: "Submit job".
I clicked "Submit job".
GCP served me a job-submission form for Hadoop.
I changed the job-type to: "PySpark"
I indicated the location of the Python script to be this:
gs://cs101feb2018/train_and_apply.py
I saw this:
Near the bottom, I clicked on the blue-button: "Submit"
GCP accepted the job and then gave it a status of: "Running".
I saw this:
After two minutes, GCP changed the status to "Succeeded".
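The same submission can be done from Cloud-Shell; a sketch equivalent to the form I filled in:
gcloud dataproc jobs submit pyspark gs://cs101feb2018/train_and_apply.py --cluster cluster-d037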
In codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-dataproc/#4
I studied that page.
In GCP-tab, I got the IP address of MySQL from this URL:
https://console.cloud.google.com/sql/instances
In Cloud-Shell-tab, I tried to connect to MySQL.
mysql --host=104.197.219.252 --user=root --password
It failed because my Cloud-Shell session again held a new, unauthorized IP address.
I got authorization again with these shell commands:
cd ~/training-data-analyst/CPB100/lab3a
bash authorize_cloudshell.sh
In Cloud-Shell-tab, I verified that I could connect to MySQL.
mysql --host=104.197.219.252 --user=root --password
I issued a command:
use recommendation_spark;
I issued SQL commands:
select count(*) from Recommendation;
select r.userid,
r.accoid,
r.prediction,
a.title,
a.location,
a.price,
a.rooms,
a.rating,
a.type
from Recommendation as r,
Accommodation as a
where r.accoid = a.id and
r.userid = 10;
I saw this:
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab3b$ mysql --host=35.192.114.78 --user=root --password
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 1121
Server version: 5.7.14-google-log (Google)
Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)]>
MySQL [(none)]>
MySQL [(none)]>
MySQL [(none)]> use recommendation_spark;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
MySQL [recommendation_spark]>
MySQL [recommendation_spark]>
MySQL [recommendation_spark]>
MySQL [recommendation_spark]> select count(*) from Recommendation;
+----------+
| count(*) |
+----------+
| 125 |
+----------+
1 row in set (0.03 sec)
MySQL [recommendation_spark]>
MySQL [recommendation_spark]>
MySQL [recommendation_spark]>
MySQL [recommendation_spark]> select r.userid,
-> r.accoid,
-> r.prediction,
-> a.title,
-> a.location,
-> a.price,
-> a.rooms,
-> a.rating,
-> a.type
-> from Recommendation as r,
-> Accommodation as a
-> where r.accoid = a.id and
-> r.userid = 10;
+--------+--------+------------+-------------------------+---------------+-------+-------+--------+--------+
| userid | accoid | prediction | title | location | price | rooms | rating | type |
+--------+--------+------------+-------------------------+---------------+-------+-------+--------+--------+
| 10 | 40 | 1.5812576 | Colossal Private Castle | Seattle | 2900 | 24 | 1.5 | castle |
| 10 | 35 | 1.5657468 | Colossal Quiet Chateau | NYC | 2300 | 14 | 4.6 | castle |
| 10 | 45 | 1.5110599 | Vast Quiet Chateau | Tokyo | 1100 | 19 | 2.3 | castle |
| 10 | 74 | 1.4872638 | Giant Calm Fort | Melbourne | 2400 | 12 | 2.3 | castle |
| 10 | 46 | 1.4497858 | Colossal Private Castle | San Francisco | 1900 | 15 | 3.7 | castle |
+--------+--------+------------+-------------------------+---------------+-------+-------+--------+--------+
5 rows in set (0.04 sec)
MySQL [recommendation_spark]>
MySQL [recommendation_spark]>
MySQL [recommendation_spark]>
In the above output I see predictions.
I take that as solid evidence that I finished the Dataproc lab.
In the codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-dataproc/#5
I studied that page.
In the GCP-tab, I visited this URL:
https://console.cloud.google.com/sql/instances
I selected my instance and asked GCP to delete it to prevent more charges against my account.
I visited this URL:
https://console.cloud.google.com/dataproc
I selected my cluster and asked GCP to delete it to prevent more charges against my account.
That ended my interaction with the CPB100-node I call: "Dataproc".
[top]
Datalab
Next, in codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-datalab
I studied the page it served me:
https://codelabs.developers.google.com/codelabs/cpb100-datalab/#0
I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-datalab/#1
In the GCP-tab, I visited the URL below:
https://console.cloud.google.com
In the Cloud-Shell-tab I entered a command:
gcloud compute zones list
I decided to use zone: "us-central1-f"
I entered a shell command:
datalab create mydatalabvm --zone us-central1-f
GCP issued an error:
cs101@cpb100-195004:~$
cs101@cpb100-195004:~$ datalab create mydatalabvm --zone us-central1-f
Creating the network datalab-network
Creating the firewall rule datalab-network-allow-ssh
Creating the disk mydatalabvm-pd
Creating the repository datalab-notebooks
ERROR: (gcloud.source.repos.create) ResponseError: status=[PERMISSION_DENIED], code=[403],
message=[User [cs101@gmail.com] does not have permission to access project [cpb100-195004]
(or it may not exist): The caller does not have permission].
details:
- Cloud Source Repositories API is not enabled. Please enable the API on the Google
Cloud console.
enable at: https://console.cloud.google.com/apis/library/sourcerepo.googleapis.com/?project=cpb100-195004
Failed to find or create the repository datalab-notebooks.
Ask a project owner to create it for you.
cs101@cpb100-195004:~$
cs101@cpb100-195004:~$
cs101@cpb100-195004:~$
In the GCP-tab, I visited the URL below:
https://console.cloud.google.com/apis/library/sourcerepo.googleapis.com
The above page served a blue-button: "ENABLE".
I clicked the button.
In Cloud-Shell-tab I entered a shell command:
datalab create mydatalabvm --zone us-central1-f
GCP offered a better response:
cs101@cpb100-195004:~$
cs101@cpb100-195004:~$ datalab create mydatalabvm --zone us-central1-f
Creating the repository datalab-notebooks
Creating the instance mydatalabvm
Created [https://www.googleapis.com/compute/v1/projects/cpb100-195004/zones/us-central1-f/instances/mydatalabvm].
Connecting to mydatalabvm.
This will create an SSH tunnel and may prompt you to create an rsa key pair.
To manage these keys, see https://cloud.google.com/compute/docs/instances/adding-removing-ssh-keys
Waiting for Datalab to be reachable at http://localhost:8081/
This tool needs to create the directory [/home/cs101/.ssh] before
being able to generate SSH keys.
Do you want to continue (Y/n)? y
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/cs101/.ssh/google_compute_engine.
Your public key has been saved in /home/cs101/.ssh/google_compute_engine.pub.
The key fingerprint is:
SHA256:/Nyvn1cGO66I19S66NaH92mwa1q1bUpTzPUH0W0DE6E cs101@cs-6000-devshell-vm-041ee3e0-98cc-4471-b54e-de69c3f602a9
The key's randomart image is:
+---[RSA 2048]----+
| ==..|
| . .o+|
| E ..o|
| . .+o|
| S . +*|
| o ...=.*|
| o+.==++|
| .oo*=+*o|
| .++.*OB+.|
+----[SHA256]-----+
Updating project ssh metadata.../Updated [https://www.googleapis.com/compute/v1/projects/cpb100-195004].
Updating project ssh metadata...done.
Waiting for SSH key to propagate.
The connection to Datalab is now open and will remain until this command is killed.
Click on the *Web Preview* (square button at top-right), select *Change port > Port 8081*, and start using Datalab.
I clicked the web preview square-button at top-right.
The button looks like this:
I opted to change the port:
I changed the port to 8081 and clicked through.
GCP loaded a page from this URL:
https://8081-dot-3516970-dot-devshell.appspot.com/tree/datalab/notebooks#
I made sure that page was in tab 6, the tab after the Cloud-Shell-tab.
I call tab 6 the DataLab-tab.
I captured an image of the page:
In the GCP-tab, I visited this URL:
https://console.cloud.google.com/compute/instances
The page told me that GCP had just started a VM instance named "mydatalabvm".
I saw this:
In codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-datalab/#2
I studied the page.
I ran an experiment to verify the information there:
- In GCP-tab, I stopped VM instance named "mydatalabvm"
- In DataLab-tab, I verified that I could not access:
- https://8081-dot-3516970-dot-devshell.appspot.com/tree/datalab#
- In Cloud-Shell tab I ran a command:
datalab connect mydatalabvm
- GCP responded to that by restarting the VM instance named "mydatalabvm".
- I waited a minute for the VM to boot.
- In DataLab-tab, I verified that I could access:
- https://8081-dot-3516970-dot-devshell.appspot.com/tree/datalab#
- It worked!
That ended my interaction with the CPB100-node I call: "Datalab".
[top]
Bigquery-Dataset
In codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-bigquery-dataset
I studied the page it served me:
https://codelabs.developers.google.com/codelabs/cpb100-bigquery-dataset/#0
I clicked the next node in the sequence:
https://codelabs.developers.google.com/codelabs/cpb100-bigquery-dataset/#1
I studied that page.
I clicked the next node in the sequence:
https://codelabs.developers.google.com/codelabs/cpb100-bigquery-dataset/#2
I studied that page.
In DataLab-tab, I verified this URL was active:
https://8081-dot-3516970-dot-devshell.appspot.com/tree/datalab#
In codelabs-tab, I clicked the next node in the sequence:
https://codelabs.developers.google.com/codelabs/cpb100-bigquery-dataset/#3
I studied that page.
In DataLab-tab, I visited this URL:
https://8081-dot-3516970-dot-devshell.appspot.com/tree/datalab#
I clicked the +notebook button.
The server responded with a page which reminds me of a Jupyter Notebook UI:
In Github-tab, I visited the notebook listed below:
demandforecast.ipynb
I copied some Python syntax from demandforecast.ipynb into the field in the new notebook UI:
So, I was bouncing between the Github-tab and DataLab-tab.
In the DataLab-tab, I asked the UI to run it.
It ran fine.
I continued the process of copy-paste-run on each section of code I found in demandforecast.ipynb
Each section ran just as expected.
Next, in GCP-tab, I visited this URL:
https://console.cloud.google.com/compute/instances
I stopped mydatalabvm to avoid adding costs to my account.
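The stop can also be done from Cloud-Shell; a sketch (the zone is the one I created the VM in):
gcloud compute instances stop mydatalabvm --zone us-central1-f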
That ended my interaction with the CPB100-node I call: "Bigquery-Dataset".
[top]
TensorFlow
In codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-tensorflow
I studied the page it served me:
https://codelabs.developers.google.com/codelabs/cpb100-tensorflow/#0
I studied the next page:
https://codelabs.developers.google.com/codelabs/cpb100-tensorflow/#1
I studied the next page:
https://codelabs.developers.google.com/codelabs/cpb100-tensorflow/#2
In GCP-tab, I visited the console-URL:
https://console.cloud.google.com
In Cloud-Shell-tab, I issued a shell command:
datalab connect mydatalabvm
GCP offered a good response:
cs101@cpb100-195004:~$
cs101@cpb100-195004:~$ datalab connect mydatalabvm
Starting instance(s) mydatalabvm...done.
Updated
[https://www.googleapis.com/compute/v1/projects/cpb100-195004/zones/us-central1-f/instances/mydatalabvm].
Connecting to mydatalabvm.
This will create an SSH tunnel and may prompt you to create an rsa key pair.
To manage these keys, see https://cloud.google.com/compute/docs/instances/adding-removing-ssh-keys
Waiting for Datalab to be reachable at http://localhost:8081/
The connection to Datalab is now open and will remain until this command is killed.
Click on the *Web Preview* (square button at top-right),
select *Change port > Port 8081*, and start using Datalab.
I clicked the Web-Preview button in upper right and changed the port to 8081.
GCP loaded this URL into my browser:
https://8081-dot-3516970-dot-devshell.appspot.com/tree/datalab/notebooks
I moved that page so it occupied DataLab-tab [ tab 6 ].
I clicked the +notebook button in upper-left.
GCP served a page with name: "Untitled Notebook1".
The page offered an input field for notebook commands.
The author of this notebook server assumes that I know how to operate a Python Jupyter notebook.
A GCP-datalab notebook offers behavior similar to a Jupyter notebook.
In Github-tab, I visited the URL listed below which offers syntax for a Jupyter notebook:
demandforecast.ipynb
I scrolled to this h2-element [about 2/3 to page-end]:
"Machine Learning with Tensorflow":
I studied four paragraphs below the h2-element.
I studied the Python syntax below the paragraphs:
import tensorflow as tf
shuffled = data2.sample(frac=1, random_state=13)
# It would be a good idea, if we had more data, to treat the days as categorical variables
# with the small amount of data, we have though, the model tends to overfit
#predictors = shuffled.iloc[:,2:5]
#for day in xrange(1,8):
# matching = shuffled['dayofweek'] == day
# key = 'day_' + str(day)
# predictors[key] = pd.Series(matching, index=predictors.index, dtype=float)
predictors = shuffled.iloc[:,1:5]
predictors[:5]
I returned to the top of demandforecast.ipynb
I copied syntax from demandforecast.ipynb into "Untitled Notebook1", in DataLab-tab.
I did the copying bit-by-bit; I was patient.
I wanted to study the output at a slow pace.
Eventually I worked all the way through the syntax in: demandforecast.ipynb
Next, in GCP-tab, I visited this URL:
https://console.cloud.google.com/compute/instances
I stopped mydatalabvm to avoid adding costs to my account.
That ended my interaction with the CPB100-node I call: "TensorFlow".
[top]
Translate-API
In codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-translate-api
I studied the page it served me:
https://codelabs.developers.google.com/codelabs/cpb100-translate-api/#0
I studied the next page:
https://codelabs.developers.google.com/codelabs/cpb100-translate-api/#1
I studied the next page:
https://codelabs.developers.google.com/codelabs/cpb100-translate-api/#2
In GCP-tab, I visited this URL:
https://console.cloud.google.com/apis
I saw a blue-button: "ENABLE APIS AND SERVICES".
I clicked it.
GCP served me a large list of APIs.
I enabled: "Google Cloud Vision API".
GCP served me this page:
https://console.cloud.google.com/apis/api/vision.googleapis.com/overview
I clicked the credentials-key-icon on the left.
It served me a blue-button: "Create credentials".
I clicked it.
At that point the UI mis-matched the instructions I was following in cpb100.
So I was confused.
I visited this URL:
https://console.cloud.google.com/apis/credentials/wizard
From select-option control I picked value: "Google Cloud Vision API".
Next I picked radio-button: "Yes I am using one or both".
I clicked blue-button: "What credentials do I need?"
GCP served me a message: "You don't need to create new credentials"
So, I clicked blue-button: "Done".
I followed the same process to enable Translate API, Speech API, and Natural Language APIs.
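The same enabling can be done from Cloud-Shell; a sketch (service names are my assumption; I used the web UI):
gcloud services enable vision.googleapis.com translate.googleapis.com \
  speech.googleapis.com language.googleapis.com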
In codelabs-tab, I studied the next codelabs-page:
https://codelabs.developers.google.com/codelabs/cpb100-translate-api/#3
In GCP-tab, I visited my console URL:
https://console.cloud.google.com
In Cloud-Shell-tab, I entered a shell command:
datalab connect mydatalabvm
GCP offered good news:
Welcome to Cloud Shell! Type "help" to get started.
cs101@cpb100-195004:~$ datalab connect mydatalabvm
Starting instance(s) mydatalabvm...done.
Updated
[https://www.googleapis.com/compute/v1/projects/cpb100-195004/zones/us-central1-f/instances/mydatalabvm].
Connecting to mydatalabvm.
This will create an SSH tunnel and may prompt you to create an rsa key pair.
To manage these keys, see https://cloud.google.com/compute/docs/instances/adding-removing-ssh-keys
Waiting for Datalab to be reachable at http://localhost:8081/
The connection to Datalab is now open and will remain until this command is killed.
Click on the *Web Preview* (square button at top-right),
select *Change port > Port 8081*, and start using Datalab.
I used the Web-Preview link in the upper-right of Cloud-Shell to load Datalab into a new tab in my browser.
The Datalab URL served to me looked like this:
https://8081-dot-3516970-dot-devshell.appspot.com/tree/datalab/notebooks#
I moved that page so it was tab 6, the DataLab-tab.
In codelabs-tab, I studied the next codelabs-page:
https://codelabs.developers.google.com/codelabs/cpb100-translate-api/#4
I visited the GCP-tab and loaded a URL:
https://console.cloud.google.com/apis
I clicked credentials-icon on left.
I clicked blue-button: "Create credentials".
I clicked API-key.
GCP gave me this key:
AIzaSyBhcruU8RW0PeOllj-yxkfLnef3YRHwCt
In Github-tab, I loaded the notebook listed below from github.com:
mlapis.ipynb
I studied comments in that page.
I returned to my Datalab-tab.
I clicked +Notebook
It served me a new notebook with a field for syntax.
I typed in this syntax:
APIKEY="AIzaSyBhcruU8RW0PeOllj-yxkfLnef3YRHwCt"
I entered Shift-Enter which activated the syntax and then opened a new syntax-field.
Next, one field at a time, I carefully copy-pasted syntax from Github-tab to my Datalab-tab.
I saw responses from GCP which matched information in the Github-tab.
So, I am confident I completed the lab with no error.
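The key can also be exercised outside a notebook; a sketch against the public Translate v2 REST endpoint (replace APIKEY with your own key):
curl "https://translation.googleapis.com/language/translate/v2?key=APIKEY&source=en&target=fr&q=hello+world"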
In my GCP-tab I loaded this URL:
https://console.cloud.google.com/compute/instances
I stopped the instance named: "mydatalabvm".
Next, I loaded this URL:
https://console.cloud.google.com/apis/credentials
I used the trash-can-icon to remove the API key I no longer needed.
That ended my interaction with the CPB100-node I call: "Translate-API".
[top]
Serverless-Ingest
In codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-serverless-ingest
I studied the page it served me:
https://codelabs.developers.google.com/codelabs/cpb100-serverless-ingest/#0
I clicked the next node in the sequence:
https://codelabs.developers.google.com/codelabs/cpb100-serverless-ingest/#1
In Cloud-Shell-tab, I issued a command:
git clone https://github.com/GoogleCloudPlatform/training-data-analyst
Cloud-Shell served me an error:
fatal: destination path 'training-data-analyst' already exists and is not an empty directory.
I see that as a good error.
It means that I had already git-cloned training-data-analyst.
I responded with a shell command:
cd ~/training-data-analyst/CPB100/lab2b/scheduled
I examined a Python script in that folder:
In my Github-tab I studied ingestapp.py by visiting the URL listed below:
https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/CPB100/lab2b/scheduled/ingestapp.py
The script is a simple Python Flask App.
This app has one interesting route: '/ingest'.
Flask uses that route to act as a proxy for the URL listed below:
http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_week.csv
Flask transforms the CSV-data into an image of earthquake activity.
This happens during a Flask response to a request of the '/ingest' route.
Near the end of the response, Flask uploads the image to GCP-Cloud-Storage.
The author of the script separated ingest_last_week() into sections:
- verify that this is a cron job request
- create png
- upload to cloud storage
- change permissions: make_public()
After I finished my study of that script, I returned to the codelabs-tab.
In codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-serverless-ingest/#2
I studied the above page.
In Cloud-Shell-tab I ran the command: cat Dockerfile
I saw this:
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab2b/scheduled$ cat Dockerfile
FROM ubuntu:latest
MAINTAINER Rajdeep Dua "dua_rajdeep@yahoo.com"
RUN apt-get update -y
RUN apt-get install -y python-pip python-dev build-essential python-mpltoolkits.basemap python-numpy python-matplotlib
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
ENTRYPOINT ["python"]
CMD ["ingestapp.py"]
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab2b/scheduled$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab2b/scheduled$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab2b/scheduled$
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab2b/scheduled$
I saw that this Dockerfile starts from an Ubuntu base image.
During the image build, root runs shell commands:
apt-get update -y
apt-get install -y python-pip python-dev build-essential python-mpltoolkits.basemap python-numpy python-matplotlib
Next, Docker copies "." into /app of the image.
Then, Docker makes /app the working directory.
Next, Docker runs a shell command:
pip install -r requirements.txt
When a container starts from the image, Docker runs the python script: ingestapp.py
I assume that after Docker runs ingestapp.py, a Flask app will be listening inside the container.
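To test the same Dockerfile locally, without App-Engine; a sketch (the port is an assumption; the codelab deploys with gcloud instead):
docker build -t ingestapp .              # build the image from this Dockerfile
docker run -p 8080:8080 ingestapp        # run ingestapp.py in a container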
In codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-serverless-ingest/#3
I studied the above page.
In Cloud-Shell-tab, I used the nano editor to enhance cron.yaml:
cron:
- description : ingest earthquake data
url : /ingest
schedule: every 2 hours
target: default
I wanted to use the bucket I had created in the Cloud-Storage lab.
That bucket is named: "cs101feb2018".
In Cloud-Shell-tab, I used the nano editor to enhance app.yaml:
runtime: custom
env: flex
entrypoint: gunicorn -b :$PORT ingestapp:app
service: default
#[START env]
env_variables:
CLOUD_STORAGE_BUCKET: cs101feb2018
#[END env]
handlers:
- url: /ingest
script: ingestapp.app
- url: /.*
script: ingestapp.app
In codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-serverless-ingest/#4
I studied the above page.
In Cloud-Shell-tab, I ran a shell command:
gcloud app deploy --quiet app.yaml cron.yaml
I saw this:
snip ...
Stored in directory: /root/.cache/pip/wheels/fc/a8/66/24d655233c757e178d45dea2de22a04c6d92766abfb741129a
Running setup.py bdist_wheel for MarkupSafe: started
Running setup.py bdist_wheel for MarkupSafe: finished with status 'done'
Stored in directory: /root/.cache/pip/wheels/88/a7/30/e39a54a87bcbe25308fa3ca64e8ddc75d9b3e5afa21ee32d57
Running setup.py bdist_wheel for googleapis-common-protos: started
Running setup.py bdist_wheel for googleapis-common-protos: finished with status 'done'
Stored in directory: /root/.cache/pip/wheels/93/e0/cb/b06866f012310b96fba79c37f199aaf973a2e97a42ca7ef696
Successfully built itsdangerous MarkupSafe googleapis-common-protos
Installing collected packages: MarkupSafe, Jinja2, Werkzeug, click, itsdangerous, Flask, gunicorn,
setuptools, protobuf, googleapis-common-protos, futures, rsa, cachetools, pyasn1-modules, google-auth,
certifi, chardet, idna, urllib3, requests, google-api-core, google-cloud-core, google-cloud-storage
Found existing installation: setuptools 20.7.0
Not uninstalling setuptools at /usr/lib/python2.7/dist-packages, outside environment /usr
Found existing installation: idna 2.0
Not uninstalling idna at /usr/lib/python2.7/dist-packages, outside environment /usr
Successfully installed Flask-0.11.1 Jinja2-2.10 MarkupSafe-1.0 Werkzeug-0.14.1 cachetools-2.0.1
certifi-2018.1.18 chardet-3.0.4 click-6.7 futures-3.2.0 google-api-core-0.1.4 google-auth-1.4.1
google-cloud-core-0.28.0 google-cloud-storage-0.21.0 googleapis-common-protos-1.5.3 gunicorn-19.6.0
idna-2.6 itsdangerous-0.24 protobuf-3.5.1 pyasn1-modules-0.2.1 requests-2.18.4 rsa-3.4.2
setuptools-38.5.1 urllib3-1.22
You are using pip version 8.1.1, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
---> 6fed58c358cf
Removing intermediate container fa0f0b0f8404
Step 8/9 : ENTRYPOINT python
---> Running in acfefa5495d2
---> dbd5b9d59bfe
Removing intermediate container acfefa5495d2
Step 9/9 : CMD ingestapp.py
---> Running in d2b66192cfa2
---> 1074a82f102e
Removing intermediate container d2b66192cfa2
Successfully built 1074a82f102e
Successfully tagged us.gcr.io/cpb100-195004/appengine/default.20180219t183619:latest
PUSH
Pushing us.gcr.io/cpb100-195004/appengine/default.20180219t183619:latest
The push refers to a repository [us.gcr.io/cpb100-195004/appengine/default.20180219t183619]
a062c5a0f6af: Preparing
64501b77450c: Preparing
b5a37c3f1cb8: Preparing
f469dd62d29b: Preparing
6f4ce6b88849: Preparing
92914665e7f6: Preparing
c98ef191df4b: Preparing
9c7183e0ea88: Preparing
ff986b10a018: Preparing
92914665e7f6: Waiting
c98ef191df4b: Waiting
9c7183e0ea88: Waiting
ff986b10a018: Waiting
6f4ce6b88849: Layer already exists
92914665e7f6: Layer already exists
c98ef191df4b: Layer already exists
9c7183e0ea88: Layer already exists
ff986b10a018: Layer already exists
64501b77450c: Pushed
f469dd62d29b: Pushed
a062c5a0f6af: Pushed
b5a37c3f1cb8: Pushed
latest: digest: sha256:5f25cb6cad3e00e27fa7caff79e71b3e22c2123932becf081a92bd5896df3470 size: 2202
DONE
-------------------------------------------------------------------------------------------
Updating service [default] (this may take several minutes)...failed.
ERROR: (gcloud.app.deploy) Error Response: [9]
Application startup error:
/usr/lib/python2.7/dist-packages/matplotlib/font_manager.py:273:
UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.
warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')
Traceback (most recent call last):
File "ingestapp.py", line 22, in <module>
import google.cloud.storage as gcs
File "/usr/local/lib/python2.7/dist-packages/google/cloud/storage/__init__.py", line 39, in <module>
from google.cloud.storage.batch import Batch
File "/usr/local/lib/python2.7/dist-packages/google/cloud/storage/batch.py", line 29, in <module>
from google.cloud.exceptions import make_exception
ImportError: cannot import name make_exception
cs101@cpb100-195004:~/training-data-analyst/CPB100/lab2b/scheduled$
The above error made little sense to me at the time.
In hindsight, the ImportError looks like a version mismatch: the deploy installed google-cloud-storage-0.21.0 next to google-cloud-core-0.28.0, and the traceback shows that old storage release importing a make_exception helper which the newer core release no longer provides.
I decided to abandon this effort.
Perhaps in the future I will encounter App-Engine training content which works better.
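If I were to retry, my first guess (untested) would be to pin a matched pair of client libraries in requirements.txt; the exact versions below are my assumption, not something the lab specifies:
# Untested assumption: choose a google-cloud-storage release that was
# built against the google-cloud-core 0.28.x line the deploy pulled in.
google-cloud-storage==1.6.0
google-cloud-core==0.28.0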
That ended my interaction with the CPB100-node I call: "Serverless-Ingest".
[top]
Distributed-Landsat
In codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-distributed-landsat
I studied the page it served me:
https://codelabs.developers.google.com/codelabs/cpb100-distributed-landsat/#0
I studied the next page it served me:
https://codelabs.developers.google.com/codelabs/cpb100-distributed-landsat/#1
In the GCP-tab I visited this URL:
https://console.cloud.google.com/apis
I clicked the blue-button: "ENABLE APIS AND SERVICES".
I searched for: "Google Dataflow API".
I found it; I enabled it.
In codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-distributed-landsat/#2
I studied that page.
In tab-7 I visited the URL below:
how-to-do-distributed-processing-of-landsat-data-in-python
In Github-tab, I visited the URL below:
dfndvi.py
In the above page I studied the run() method.
Below, I list the sections of that method that I see; a minimal Beam sketch follows the list:
- parse arguments
- create beam.Pipeline object
- Read the index file and find all scenes that cover this area
- for each month and spacecraft-coverage-pattern (given by the path and row), find clearest scene
- write out info about scene
- compute ndvi on scene
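Here is a minimal, hypothetical sketch of that run() shape using the Apache Beam Python SDK. The helper names (parse_scene_line, covers_area, clearest_scene, compute_ndvi) and the record fields (month, path, row, cloud_cover) are placeholders I made up; the real logic lives in dfndvi.py:
import apache_beam as beam

# Placeholder helpers; the real implementations live in dfndvi.py.
def parse_scene_line(line):
    pass  # parse one line of the Landsat index file into a scene record

def covers_area(scene):
    pass  # True if the scene covers the area of interest

def clearest_scene(scenes):
    # Of all scenes sharing a (month, path, row) key, keep the least cloudy.
    return min(scenes, key=lambda s: s.cloud_cover)

def compute_ndvi(scene, output_prefix):
    pass  # render the NDVI image for this scene

def run(index_file, output_prefix):
    with beam.Pipeline('DirectRunner') as p:
        scenes = (p
            | 'read_index' >> beam.io.ReadFromText(index_file)
            | 'parse' >> beam.Map(parse_scene_line)
            | 'covers_area' >> beam.Filter(covers_area))
        clearest = (scenes
            | 'by_month_path_row' >> beam.Map(lambda s: ((s.month, s.path, s.row), s))
            | 'clearest' >> beam.CombinePerKey(clearest_scene))
        (clearest
            | 'scene_info' >> beam.Map(lambda kv: str(kv[1]))
            | 'write_info' >> beam.io.WriteToText(output_prefix + '/scenes'))
        (clearest
            | 'ndvi' >> beam.Map(lambda kv: compute_ndvi(kv[1], output_prefix)))
The CombinePerKey step is what lets the job distribute well: each (month, path, row) group can be reduced on a different worker.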
In codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-distributed-landsat/#3
I studied that page.
In Cloud-Shell-tab, I ran commands:
cd ~/training-data-analyst/blogs/landsat/
cat run_oncloud.sh
Then I ran the script, passing my project ID and the bucket I wanted to use as arguments:
./run_oncloud.sh cpb100-195004 cs101feb2018
The command failed:
cs101@cpb100-195004:~/training-data-analyst/blogs/landsat$ ./run_oncloud.sh cpb100-195004 cs101feb2018
CommandException: 1 files/objects could not be removed.
Traceback (most recent call last):
File "./dfndvi.py", line 16, in <module>
import apache_beam as beam
ImportError: No module named apache_beam
cs101@cpb100-195004:~/training-data-analyst/blogs/landsat$
The above error was caused by a missing Python package: apache_beam, the SDK that Dataflow pipelines are written against.
I ran an install script I found in the folder:
sudo bash install_packages.sh
Then I re-ran the script with the same project and bucket arguments:
./run_oncloud.sh cpb100-195004 cs101feb2018
I saw this:
cs101@cpb100-195004:~/training-data-analyst/blogs/landsat$ ./run_oncloud.sh cpb100-195004 cs101feb2018
CommandException: 1 files/objects could not be removed.
No handlers could be found for logger "oauth2client.contrib.multistore_file"
/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/gcsio.py:122:
DeprecationWarning: object() takes no parameters
super(GcsIO, cls).__new__(cls, storage_client))
/usr/local/lib/python2.7/dist-packages/apache_beam/coders/typecoders.py:134:
UserWarning: Using fallback coder for typehint: Any.
warnings.warn('Using fallback coder for typehint: %r.' % typehint)
/usr/local/lib/python2.7/dist-packages/apache_beam/coders/typecoders.py:134:
UserWarning: Using fallback coder for typehint: <class __main__.SceneInfo at 0x7f9de6c73a10>.
warnings.warn('Using fallback coder for typehint: %r.' % typehint)
/usr/local/lib/python2.7/dist-packages/apache_beam/coders/typecoders.py:134:
UserWarning: Using fallback coder for typehint: <type 'NoneType'>.
warnings.warn('Using fallback coder for typehint: %r.' % typehint)
running sdist
running egg_info
creating landsatmonthly.egg-info
writing requirements to landsatmonthly.egg-info/requires.txt
writing landsatmonthly.egg-info/PKG-INFO
writing top-level names to landsatmonthly.egg-info/top_level.txt
writing dependency_links to landsatmonthly.egg-info/dependency_links.txt
writing manifest file 'landsatmonthly.egg-info/SOURCES.txt'
reading manifest file 'landsatmonthly.egg-info/SOURCES.txt'
writing manifest file 'landsatmonthly.egg-info/SOURCES.txt'
warning: sdist: standard file not found: should have one of README, README.rst, README.txt, README.md
running check
warning: check: missing required meta-data: url
warning: check: missing meta-data: either (author and author_email) or
(maintainer and maintainer_email) must be supplied
creating landsatmonthly-0.0.1
creating landsatmonthly-0.0.1/landsatmonthly.egg-info
copying files to landsatmonthly-0.0.1...
copying dfndvi.py -> landsatmonthly-0.0.1
copying ndvi.py -> landsatmonthly-0.0.1
copying setup.py -> landsatmonthly-0.0.1
copying landsatmonthly.egg-info/PKG-INFO -> landsatmonthly-0.0.1/landsatmonthly.egg-info
copying landsatmonthly.egg-info/SOURCES.txt -> landsatmonthly-0.0.1/landsatmonthly.egg-info
copying landsatmonthly.egg-info/dependency_links.txt -> landsatmonthly-0.0.1/landsatmonthly.egg-info
copying landsatmonthly.egg-info/requires.txt -> landsatmonthly-0.0.1/landsatmonthly.egg-info
copying landsatmonthly.egg-info/top_level.txt -> landsatmonthly-0.0.1/landsatmonthly.egg-info
Writing landsatmonthly-0.0.1/setup.cfg
Creating tar archive
removing 'landsatmonthly-0.0.1' (and everything under it)
DEPRECATION: pip install --download has been deprecated and will be removed in the future.
Pip now has a download command that should be used instead.
Collecting google-cloud-dataflow==2.2.0
Downloading google-cloud-dataflow-2.2.0.tar.gz
Saved /tmp/tmp6uRpmJ/google-cloud-dataflow-2.2.0.tar.gz
Successfully downloaded google-cloud-dataflow
cs101@cpb100-195004:~/training-data-analyst/blogs/landsat$
In codelabs-tab, I clicked the next link in the series:
https://codelabs.developers.google.com/codelabs/cpb100-distributed-landsat/#4
I studied that page.
In GCP-tab, I visited the URL below:
http://console.cloud.google.com/dataflow
I saw this:
I saw that as good evidence that my DataFlow job was running.
Eventually, it finished:
In GCP-tab, I visited the URL below:
http://console.cloud.google.com/storage
I browsed around and found plenty of output files from the Dataflow job:
That ended my interaction with the CPB100-node I call: "Distributed-Landsat".
That also ended my interaction with CPB100 itself.
[top]