Computational Resources (VMs)#

Virtual Machines (VMs) in the Secure Data Enclave (SDE) provide the processing power needed to run analyses, build models, and work with data securely. VMs run Ubuntu Linux and are preconfigured with standard research software. Each VM has attached storage and, if configured, access to storage buckets and BigQuery in the SDE environment.

VMs are part of the Google Cloud Compute Engine service.

See the SDE User Guide for details on connecting to and using VMs.

VM Availability#

VMs are available in the following projects:

| Project | Available to | Purpose | Compute Availability |
| --- | --- | --- | --- |
| Data Ingress | Data Engineer | For controlled data transfer into the SDE environment. | Single VM for all data ingress tasks. |
| Data Ops | Data Engineer | Data cleaning, curation, and management. Data analysis by Data Engineers. | 1 VM by default; additional VMs can be requested. |
| Workspace | Researcher/Data Analyst | General research tasks and data analysis. | 1 VM per workspace project; additional VMs can be requested. |

Using VMs#

  • VMs have no direct internet access. This means that users cannot directly install R, Python, or other analysis and software packages. See Available Software for the preinstalled software and the process for requesting additional packages.

  • VMs come with persistent local storage, but it is not backed up automatically. Important files should be saved to a storage bucket.

  • A Data Engineer manages VM configuration and can request additional CPU, memory, or storage if needed.

  • Access and permissions are centrally managed to maintain compliance with NIST SP 800-171 and institutional requirements.

VMs are billed for every hour they are running, including idle time. SDE users should start a VM when they are ready to use it and stop it when their work is finished. Attached storage persists when a VM is stopped, so files remain available across sessions. When a VM is deleted, all files and data on its attached storage are deleted as well.
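As a rough illustration of hours-based billing, the sketch below compares a VM left running all month with one that is stopped outside working hours. The hourly rate is a made-up placeholder, not an actual SDE or Google Cloud price, and the function name is hypothetical.

```python
def monthly_vm_cost(hours_per_day: float, days: int, hourly_rate: float) -> float:
    """Estimate the charge for a VM that runs `hours_per_day` on each of `days` days."""
    if not 0 <= hours_per_day <= 24:
        raise ValueError("hours_per_day must be between 0 and 24")
    return round(hours_per_day * days * hourly_rate, 2)


# Placeholder rate in USD per hour. This is an assumption, NOT a real price.
HYPOTHETICAL_RATE = 0.30

always_on = monthly_vm_cost(24, 30, HYPOTHETICAL_RATE)
workdays_only = monthly_vm_cost(8, 22, HYPOTHETICAL_RATE)

print(f"Left running all month:          ${always_on:.2f}")
print(f"Stopped outside 8-hour workdays: ${workdays_only:.2f}")
```

Because idle hours are billed like any other running hours, the difference comes entirely from how long the VM stays powered on.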

Users can start and stop VMs through the Google Cloud Console or the Google Cloud Command Line Interface. Users can connect to VMs using SSH-in-browser, a terminal program on their managed laptop or workstation, or remote desktop or screen sharing applications.
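For the command-line route, a minimal sketch of wrapping `gcloud compute instances start` and `gcloud compute instances stop` in Python might look like the following. The instance, zone, and project names are hypothetical, and running the command assumes the gcloud CLI is installed and authenticated on the machine you are using.

```python
import subprocess


def vm_power_command(action: str, instance: str, zone: str, project: str) -> list[str]:
    """Build a `gcloud compute instances start|stop` command as an argument list."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    return [
        "gcloud", "compute", "instances", action, instance,
        f"--zone={zone}", f"--project={project}",
    ]


def run_vm_power_command(action: str, instance: str, zone: str, project: str):
    """Execute the command. Requires an installed, authenticated gcloud CLI."""
    return subprocess.run(vm_power_command(action, instance, zone, project), check=True)


# Hypothetical names, for illustration only. This prints the command without running it.
print(" ".join(vm_power_command("stop", "my-workspace-vm", "us-central1-a", "my-sde-project")))
```

Keeping the command builder separate from the function that executes it makes it easy to review exactly what will run before touching a real VM.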

Best Practices

  • Always stop your VM at the end of your workday. This avoids charges for idle time and maintains security.

  • Store results or large datasets in buckets, not directly on the attached VM storage.
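The second practice can be sketched as a small helper that builds a `gcloud storage cp --recursive` command for copying a results directory into a bucket. The bucket name, prefix, and local path are hypothetical, and the sketch assumes the gcloud CLI is available on the VM and that your account can write to the bucket.

```python
def bucket_copy_command(local_dir: str, bucket: str, prefix: str = "") -> list[str]:
    """Build a command that recursively copies `local_dir` into gs://bucket/prefix/."""
    if not bucket:
        raise ValueError("bucket name is required")
    # Normalize slashes so the destination always ends with exactly one "/".
    destination = f"gs://{bucket}/{prefix.strip('/')}".rstrip("/") + "/"
    return ["gcloud", "storage", "cp", "--recursive", local_dir, destination]


# Hypothetical bucket and paths, for illustration only.
command = bucket_copy_command("./results", "my-sde-workspace-bucket", "analysis/run-01")
print(" ".join(command))
```

Pass the resulting list to `subprocess.run(command, check=True)` to perform the copy.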

Learn more about using VMs in the SDE User Guide.

Available VM Types#

By default, the SDE provides a standard VM configuration (E2-Standard-8) suitable for most research and data analysis tasks.
Additional VM types can be requested for projects that require more computational power, memory, or GPU capabilities. VMs with greater computational resources have higher per-hour costs.

| VM Type | vCPUs | Memory (GB) | Typical Use Case | Availability |
| --- | --- | --- | --- | --- |
| E2-Standard-8 | 8 | 32 | General data analysis, R/Python workloads, Jupyter notebooks | Default |
| E2-Standard-4 | 4 | 16 | Lightweight alternative for processing, scripting, or testing | Optional |
| N2-Highmem-16 | 16 | 128 | Memory-intensive computations, large datasets | By Request |
| N2-Highcpu-64 | 64 | 64 | CPU-heavy workloads, simulation, and parallel processing | By Request |
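When deciding which VM type to request, it can help to match rough vCPU and memory needs against the table. The sketch below does that with the values listed above. The helper name is hypothetical, and "smallest" here means fewest vCPUs and then least memory, which is only a proxy for cost since this page does not list prices.

```python
# (name, vCPUs, memory in GB, availability), copied from the table above.
VM_TYPES = [
    ("E2-Standard-4", 4, 16, "Optional"),
    ("E2-Standard-8", 8, 32, "Default"),
    ("N2-Highmem-16", 16, 128, "By Request"),
    ("N2-Highcpu-64", 64, 64, "By Request"),
]


def smallest_fit(min_vcpus: int, min_memory_gb: int):
    """Return the smallest listed VM type meeting both minimums, or None if none does."""
    candidates = [
        vm for vm in VM_TYPES
        if vm[1] >= min_vcpus and vm[2] >= min_memory_gb
    ]
    # Rank by vCPUs first, then memory. This is a size proxy, not a price comparison.
    return min(candidates, key=lambda vm: (vm[1], vm[2]), default=None)


for need in [(4, 8), (8, 100), (32, 32), (128, 1)]:
    print(need, "->", smallest_fit(*need))
```

A result of `None` means no listed type meets the requirement, in which case the need should be discussed with Northwestern IT.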

Checking your VM configuration#

Northwestern IT configures VMs based on the requirements submitted when the environment is built. To identify the type of VM configured in your SDE project:

  1. Navigate to the Google Cloud Console VM Instances page.

  2. Ensure you’re in the correct project for the VM you want to inspect (check the project selector at the top of the page).

    [Screenshot: Google Cloud Console VM Page]
  3. Open the VM details page by selecting the VM name.

    [Screenshot: Virtual Machine Select]
  4. On the details page, the VM configuration appears under the Machine configuration section.

    [Screenshot: Virtual Machine Configuration]
  5. In the Storage section, the size of the VM's persistent disk is listed under Size.

    [Screenshot: Virtual Machine Storage]
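The same details can also be read from the command line. As a sketch, the JSON returned by `gcloud compute instances describe INSTANCE --zone=ZONE --format=json` includes a `machineType` URL and a `disks` list with `diskSizeGb` values. The helper below extracts both. The sample JSON is a trimmed, hypothetical example rather than output from a real SDE project.

```python
import json


def summarize_instance(describe_json: str) -> dict:
    """Extract the machine type and disk sizes from `gcloud ... describe` JSON output."""
    info = json.loads(describe_json)
    # machineType is a full resource URL; the type name is its last path segment.
    machine_type = info["machineType"].rsplit("/", 1)[-1]
    # diskSizeGb is reported as a string, so convert it to an integer.
    disk_sizes_gb = [int(disk["diskSizeGb"]) for disk in info.get("disks", [])]
    return {"machine_type": machine_type, "disk_sizes_gb": disk_sizes_gb}


# Trimmed, hypothetical sample for illustration only.
SAMPLE = """
{
  "name": "my-workspace-vm",
  "machineType": "https://www.googleapis.com/compute/v1/projects/my-sde-project/zones/us-central1-a/machineTypes/e2-standard-8",
  "disks": [{"boot": true, "diskSizeGb": "100"}]
}
"""

summary = summarize_instance(SAMPLE)
print(summary)
```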

Available Software#

Connections between the SDE environment and external networks are tightly controlled and are blocked for most SDE resources. VMs therefore come with preinstalled software and libraries for data analysis, statistics, development, and document creation.

Additional software, packages, and libraries can be added to the VMs by request to Northwestern IT. Adding software or packages requires recreating the VMs in the SDE environment, which deletes all files stored on the attached VM storage. Before a software update, save your data and files to storage buckets in the SDE environment.

Web Browser#

Google Chrome

Office & Productivity#

LibreOffice Suite

  • Writer (Documents)

  • Calc (Spreadsheets)

  • Impress (Presentations)

  • Draw (Diagrams/Graphics)

  • Base (Databases)

  • Math (Formulas)

Technical & Statistical Computing#

MATLAB (licensed, available for installation when needed)

R/RStudio, with a variety of packages including:

  • Data analysis (tidyverse, data.table)

  • Statistics & modeling (survival, lme4, mgcv)

  • Machine learning (randomForest, xgboost, glmnet)

  • Bayesian analysis (rstan, brms)

  • Geospatial (sf, terra, spatstat)

  • Visualization (ggplot2, plotly, rgl)

  • Web apps & dashboards (shiny ecosystem)

  • Reporting (bookdown, rmarkdown, flextable, gtsummary)

Additional packages can be installed via request to Northwestern IT.

Python#

Python 3, with popular scientific and machine learning packages, including the following libraries and their dependencies:

  • Data: pandas, polars, numpy, pyarrow

  • Statistics: scipy, statsmodels

  • Machine Learning: scikit-learn, xgboost, lightgbm

  • Deep Learning: tensorflow, theano

  • NLP: nltk, spacy

  • Visualization: matplotlib, plotly, bokeh

  • Images & OCR: Pillow, scikit-image, pytesseract

  • Geospatial: geopandas, shapely

  • Networks: networkx, additional graph analysis tools

  • HTML processing: beautifulsoup4

  • Notebooks: JupyterLab

Additional libraries can be installed via request to Northwestern IT.
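Before submitting a request, it may be worth confirming what is already installed on your VM. The sketch below uses the standard library's `importlib.metadata` to look up installed versions by distribution name. The list of names is only an example, and the helper name is hypothetical.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Example names only. Use distribution names (e.g. "scikit-learn"), not import names.
wanted = ["pandas", "scikit-learn", "geopandas", "beautifulsoup4"]
for name, version in installed_versions(wanted).items():
    print(f"{name}: {version or 'not installed'}")
```

Looking up distribution names avoids confusion for packages whose import name differs, such as `scikit-learn` (imported as `sklearn`) or `Pillow` (imported as `PIL`).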

Development Tools#

  • GCC and Fortran compilers

  • CMake, Make

  • git

  • pkg-config

  • autotools