Start a workspace in Verily Workbench

Quickstart guide to workspaces in the web UI

Purpose: This document walks you through how to use basic features of Verily Workbench, including creating a workspace, loading data and analyzing it in a notebook.



Synopsis

The goal of this walkthrough is to show you what you need to know to achieve the following through the Verily Workbench web UI:

  • Creating a workspace on your own
  • Importing some data
  • Accessing the data from a notebook running in a cloud environment
  • Performing basic data manipulation and analysis operations
  • Saving and sharing your results

Prerequisites

Before you begin, you’ll need:

  • A Workbench account
  • To be a member of at least one pod. Each pod is linked to a cloud account used for billing, allowing you to perform operations that incur costs (i.e., creating workspaces, creating and running cloud environments, and storing data on the cloud).

Please contact support if you need a Workbench account or membership to a pod.

Finally, it’s helpful, though not required, if you’re familiar with basic JupyterLab notebook usage and R code.

Verily Workbench features

Let’s get started with some of Workbench’s key features.

1. Create a workspace

To create a workspace, follow the steps below:

  1. Navigate to the Workbench dashboard here.
  2. Sign in using your registered email address.
  3. Click the Workspaces icon in the left-hand menu bar, or the Workspaces link on the welcome page.
  4. Click the + New Workspace button at the top of your screen. This will open the Create a new workspace dialog.
  5. Complete the three dialog screens. The workspace name and pod are the only fields that require your input; everything else is either optional or prefilled.
  1. Click the Create workspace button on the last screen. It should take less than a minute for the system to create your workspace. Once it’s done, your browser will load your new workspace’s overview page.

2. Add cloud resources to the workspace

The bucket resources you add to a workspace will be automounted to the file system of any workspace cloud environments that you create.

To add storage resources to the workspace, follow the steps below:

2.1. Connect to a storage bucket containing data

First, you’re going to connect an external cloud storage bucket to your workspace, which will make it easy to access data in that bucket from your workspace. This bucket will be treated as a referenced resource.

  1. Click on the Resources tab on your workspace page. Click on the + New resource button to open the menu of options, and select Reference Cloud Storage bucket to open the Adding Cloud Storage bucket dialog.
  2. Complete the form. The Resource ID, Folder path, and Bucket name fields are required. A description is not required, but recommended.
  1. Click Add to resources. You should see the new resource appear immediately in the list of resources. You can test that you can access the bucket by clicking on the white Browse button in the information panel on the right of the Resources tab.
Creating a GCS bucket reference.

ℹ️ Data resource operations > Reference a storage bucket

2.2. Create a storage bucket for outputs

Now you have a workspace with data connected to it, but nowhere to store results of your analyses. Next, you’re going to create a bucket that will be attached to the workspace itself, where you can store your outputs, as well as any other files and inputs that you might wish to associate with the workspace. This bucket will be treated as a controlled resource.

Creating a controlled storage bucket.
  1. Click on the Resources tab on your workspace page. Click on the + New resource button, then select New Cloud Storage bucket to open the Creating Cloud Storage bucket dialog.
  2. Complete the form. The Resource ID, Folder path, and Bucket name fields are required. A description is not required, but recommended.
  1. Click Create bucket. You should see the new resource appear immediately in the list of resources. You can confirm that you can access the bucket by clicking on the white Browse button on the right. Since it’s brand new, the bucket should be empty.
Creating a controlled storage bucket. Here, the resource will be added under the "experimental data" folder.

Note: You can also create a referenced resource to an object or folder in a Cloud Storage bucket, instead of pointing to the whole bucket. We’ll see that in the example notebook referenced below.

ℹ️ Data resource operations > Create a storage bucket

3. Create a reference to a Git repository

Follow the steps below to add a Git repository to your workspace’s cloud environment(s):

Adding a Git repo to a workspace.
  1. Click on the Environments tab on your workspace page. In the Git repositories card on the right-hand side, click the + Add repository button to open the Adding Git repository dialog.
  2. Complete the form. The Name and Repository URL fields are required. A description is not required, but recommended. For testing purposes, give it the Name workbench-examples and add this public repository as the Repository URL: https://github.com/verily-src/workbench-examples.git.
Adding a GitHub repo to a workspace.
  1. Click Add repository. You’ll now see the repo listed in the Git repositories card. This repo will be automatically cloned to any cloud environments that you create in this workspace, which you’ll do next.
Git repositories card showing the newly added repo.

4. Create a cloud environment

Next, you need to create and launch a cloud environment that will consist of a virtual machine (VM) with some preinstalled software and a local storage drive associated with it, called a persistent disk.

Creating a new cloud environment.
  1. Click on the Environments tab on your workspace page. Click on the + New cloud environment button to open the Creating cloud environment dialog.

  2. On the first dialog screen, select the JupyterLab option labeled Vertex AI Workbench instance. Click Next.

Select your cloud environment application.
  1. On the second dialog screen, you’ll be shown the default configuration. Click Next.
Select your cloud environment configuration.
  1. On the third dialog screen, you can customize the environment image type, the number of GPUs and CPUs (if applicable), the data disk size, and the autostop idle time. For this walkthrough, we’ll select R (Latest) for the Environment image, since the analysis example below will use R code.

    If you need more compute power, you can increase the number of CPUs to allocate; the memory allocated will scale accordingly. You can also attach GPUs to your environment (if compatible with the selected environment image). Keep in mind that the running cost of your environment will scale with the computing resources allocated to it.

    You can also change the default data disk size. 100 GB is recommended for most environments, but the size can range from 10 GB to 64,000 GB.

    To help keep compute costs down, the autostop feature will automatically turn off your cloud environment after a certain time. An autostop idle time of four hours is enabled by default when you create a cloud environment. You can adjust the idle time to any value between 1 hour and 14 days, or opt out of autostop entirely. A cloud environment is considered “idle” when the app is not open in a browser, or if the CPU load on the VM is below 10% (e.g., there’s no background job running).

    After finalizing your options, click Next.
Customize your cloud environment.
  1. On the fourth dialog screen, give your environment a memorable name in the Environment ID field. Feel free to customize the Cloud environment name or accept the one generated automatically by the system. A Description is not required, but recommended.
Review details of your cloud environment.

Click the Create environment button. You should see a card for the new environment appear in the Environments tab of your workspace, with a status indicator. It may take a few minutes for the system to get your cloud environment ready, depending on what resources you requested.

5. Open the Jupyter Notebook

Once your environment is ready, the status indicator will turn green and display Running. Now, everything is set up for you to start working.

To start a JupyterLab session, click the Environments tab on your workspace page. Ensure the cloud environment is in Running status and click the Environment ID of the notebook you’d like to access. This will open a JupyterLab session in a new browser window/tab, displaying the JupyterLab Launcher. It may take a minute to load.

Open the link to a running cloud environment.

5.1. Open a new notebook

For the purposes of this walkthrough, we’re going to use example code written in R, so click on the R logo in the list of Notebook options. This will create a new R notebook file stored on your environment’s persistent disk, and open it for editing.

Select R notebook.

Give your new notebook a meaningful name. Right-click (or control-click) on the file name (which should be Untitled.ipynb) to open the contextual menu and select the Rename option.

Rename a new notebook.

You’ll find the file in the $HOME directory, which is displayed by default in the JupyterLab file explorer (left side panel). The file explorer allows you to access and organize any files stored on the persistent disk itself, and to access resources that are mounted to your environment.

Now that you know how to create a new notebook, you won’t actually use it for the example below. Instead, you’ll open a pre-existing R notebook.

5.2. Open an existing notebook from an example repo

Next, open an existing notebook from the examples repo you added to the workspace in Step 3. This repo, workbench-examples, was automatically cloned to the JupyterLab server when you created your cloud environment. You will find it under repos in the file explorer (or /home/jupyter/repos in the Terminal).

In the file navigator, double-click repos, then workbench-examples, then 1kgenomes_examples. Double-click the R_1k_genomes.ipynb file to open it.

6. Run an example notebook

To run the **R_1k_genomes.ipynb notebook that walks through the computation of principal components (PCA) of genomic variant data across one chromosome from 2,504 people from The 1000 enomes project, follow these steps below:

  1. Open the **R_1kgenomes.ipynb **notebook.
  2. Click the Run (▶) or Forward (⏩) button to run all the available commands on the notebook. You’ll find the file in the $HOME directory, which is displayed by default in the JupyterLab file explorer (left side panel). The file explorer allows you to access and organize any files stored on the persistent disk itself, and to access resources that are mounted to your environment.

Now that you know how to create a new notebook, you won’t actually use it for the example below. Instead, you’ll open a pre-existing R notebook.

6. Run an example notebook

This example R notebook, R_1k_genomes.ipynb walks through the computation of principal components (PCA) of genomic variant data across one chromosome from 2,504 people from the 1000 genomes project. (The notebook is adapted from http://bwlewis.github.io/1000_genomes_examples/PCA.html).

To run the notebook, click the Run (▶) or Forward (⏩) button to run all the available commands in the notebook.

7. Turn down the cloud environment

Once you’ve finished your work and ensured everything is saved appropriately, you can close the JupyterLab browser window.

Return to your workspace’s Environments tab and click the Stop button in the environment card to turn down your cloud environment. This will immediately send the instruction to stop the environment; there is no confirmation step. However, there may be a lag of a few seconds before the status is updated to Stopped in the graphical user interface.

Stop a running cloud environment

ℹ️ Cloud environment operations > Stop cloud environment

When you’re ready to resume work, click the Start button in the environment card. It may take a few minutes for a cloud environment to restart after being paused.

Start a stopped cloud environment.

8. Delete the cloud environment

If you no longer need your cloud environment, you can delete it from the action menu of the environment card. To delete the environment:

  1. Click the Environments tab on your workspace page.
  2. Find the cloud environment you want to delete. Click the additional actions menu, which is represented by a “three-dot” icon in the top right corner of the environment card.
  3. Click Delete. A dialog will open asking you to confirm deletion. Check the box and click the Delete environment button.
Delete a cloud environment.

Deletion of the cloud environment deletes its underlying disk as well. Therefore, make sure that you’ve preserved your work. You can do this by writing data and notebook files to your workspace bucket; this was shown in the example notebook. You can also commit work back to a GitHub repo, or download files to your local machine.

Last Modified: 23 September 2024