Computational Resources (VMs)#
Virtual Machines (VMs) in the Secure Data Enclave (SDE) provide the processing power needed to run analyses, build models, and work with data securely. VMs run the Ubuntu Linux operating system and are preconfigured with standard research software. Each VM has attached storage and, if configured, access to storage buckets and BigQuery in the SDE environment.
VMs are part of the Google Cloud Compute Engine service.
See the SDE User Guide for details on connecting to and using VMs.
VM Availability#
VMs are available in the following projects:
| Project | Available to | Purpose | Compute Availability |
|---|---|---|---|
| | | For controlled data transfer into the SDE environment. | Single VM for all data ingress tasks. |
| | | Data cleaning, curation, and management. Data analysis by Data Engineers. | 1 VM by default; additional VMs can be requested. |
| | | General research tasks and data analysis. | 1 VM per workspace project. Additional VMs can be requested. |
Using VMs#
VMs have no direct internet access. As a result, users cannot install R, Python, or other software packages directly. See Available Software for more information on preinstalled software and the process for requesting additional packages.
VMs come with persistent local storage, but it is not backed up automatically. Important files should be saved to a storage bucket.
A Data Engineer manages VM configuration and can request additional CPU, memory, or storage if needed.
Access and permissions are centrally managed to maintain compliance with NIST SP 800-171 and institutional requirements.
VMs can have access to authorized and configured GitHub repositories to share code between SDE users and transfer code in and out of the SDE.
VMs are billed for every hour they are running, including idle time. SDE users need to start VMs when ready to use them and stop them when their work is finished. Attached storage persists when a VM is stopped, so files can be used across multiple sessions. When a VM is deleted, all files and data on its attached storage are also deleted.
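Because billing covers every running hour, idle time included, the effect of stopping a VM outside working hours can be estimated with simple arithmetic. The sketch below is illustrative only: `HYPOTHETICAL_RATE_PER_HOUR` is a placeholder, not an actual SDE or Google Cloud price, and any charges for attached storage are not modeled.

```python
# Placeholder rate for illustration; NOT an actual SDE or Google Cloud price.
HYPOTHETICAL_RATE_PER_HOUR = 0.30


def running_cost(hours_running: float, rate_per_hour: float) -> float:
    """Return the compute cost of a VM billed for every hour it is running."""
    if hours_running < 0 or rate_per_hour < 0:
        raise ValueError("hours and rate must be non-negative")
    return hours_running * rate_per_hour


# A VM left running for a 30-day month, idle time included.
always_on = running_cost(24 * 30, HYPOTHETICAL_RATE_PER_HOUR)

# The same VM started and stopped around an 8-hour day on 22 workdays.
workday_only = running_cost(8 * 22, HYPOTHETICAL_RATE_PER_HOUR)

print(f"Always on:     {always_on:.2f}")
print(f"Workdays only: {workday_only:.2f}")
```

Substituting the real hourly rate for the VM type in use gives a rough sense of how much idle time costs over a month.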
Users can start and stop VMs through the Google Cloud Console or the Google Cloud Command Line Interface. Users can connect to VMs using SSH-in-browser, a terminal program on their managed laptop or workstation, or remote desktop or screen sharing applications.
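As a rough illustration of the command-line route, the following sketch assembles the `gcloud compute instances start` and `stop` commands in Python. The instance name, zone, and project ID shown are hypothetical placeholders; the actual values come from your SDE project. The helper functions are illustrative and not part of any SDE tooling.

```python
import subprocess


def vm_command(action: str, instance: str, zone: str, project: str) -> list[str]:
    """Build the gcloud argument list that starts or stops one VM."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action!r}")
    return [
        "gcloud", "compute", "instances", action, instance,
        f"--zone={zone}",
        f"--project={project}",
    ]


def run_vm_command(action: str, instance: str, zone: str, project: str) -> None:
    """Run the command; requires an installed and authenticated gcloud CLI."""
    subprocess.run(vm_command(action, instance, zone, project), check=True)


if __name__ == "__main__":
    # Hypothetical names, shown only to illustrate the command shape.
    cmd = vm_command("stop", "my-sde-vm", "us-central1-a", "my-sde-project")
    print(" ".join(cmd))
```

Keeping command construction separate from execution makes it easy to review the exact command before it runs.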
Best Practices
Always stop your VM when you finish your workday. This preserves resources and maintains security.
Store results or large datasets in buckets, not directly on the attached VM storage.
Regularly back up your scripts and notebooks to a storage bucket or approved GitHub repository.
Learn more about using VMs in the SDE User Guide.
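The backup practice above can be sketched as a small helper that finds scripts and notebooks under a working directory and builds one `gcloud storage cp` command per file. This is a minimal sketch under stated assumptions: the bucket URL and directory are hypothetical, the list of file suffixes is only an example, and the commands are printed rather than executed.

```python
from pathlib import Path

# Example suffixes for scripts and notebooks; adjust to the project.
BACKUP_SUFFIXES = {".py", ".R", ".Rmd", ".ipynb"}


def find_backup_candidates(work_dir: Path) -> list[Path]:
    """Return scripts and notebooks under work_dir, skipping hidden folders."""
    found = []
    for path in work_dir.rglob("*"):
        relative_parts = path.relative_to(work_dir).parts
        if any(part.startswith(".") for part in relative_parts):
            continue  # e.g. .ipynb_checkpoints or .git
        if path.is_file() and path.suffix in BACKUP_SUFFIXES:
            found.append(path)
    return sorted(found)


def copy_command(local_path: Path, work_dir: Path, bucket_url: str) -> list[str]:
    """Build a gcloud storage cp command that mirrors the local layout."""
    relative = local_path.relative_to(work_dir).as_posix()
    destination = f"{bucket_url.rstrip('/')}/{relative}"
    return ["gcloud", "storage", "cp", str(local_path), destination]


if __name__ == "__main__":
    work_dir = Path.home() / "analysis"           # hypothetical directory
    bucket = "gs://my-sde-workspace-bucket/backup"  # hypothetical bucket
    if work_dir.is_dir():
        for script in find_backup_candidates(work_dir):
            print(" ".join(copy_command(script, work_dir, bucket)))
```

Printing the commands first gives a chance to confirm which files would be copied before anything is transferred.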
Available VM Types#
By default, the SDE offers a standard VM configuration (E2-Standard-8) suitable for most research and data analysis tasks.
Additional VM types can be requested for projects requiring more computational power, memory, or GPU capabilities. VMs with greater computational resources have higher per-hour costs.
| VM Type | vCPUs | Memory (GB) | Typical Use Case | Availability |
|---|---|---|---|---|
| E2-Standard-8 | 8 | 32 | General data analysis, R/Python workloads, Jupyter notebooks | Default |
| E2-Standard-4 | 4 | 16 | Lightweight alternative for processing, scripting, or testing | Optional |
| N2-Highmem-16 | 16 | 128 | Memory-intensive computations, large datasets | By Request |
| N2-Highcpu-64 | 64 | 64 | CPU-heavy workloads, simulation, and parallel processing | By Request |
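To show how the table above might be used when sizing a request, the sketch below encodes the four listed VM types and picks the smallest one that meets a given vCPU and memory requirement. "Smallest" here means fewest vCPUs, then least memory, which is a simplifying assumption and not a cost ranking. The function and class names are illustrative only.

```python
from typing import NamedTuple, Optional


class VMType(NamedTuple):
    name: str
    vcpus: int
    memory_gb: int
    availability: str


# Values copied from the VM type table.
VM_TYPES = [
    VMType("E2-Standard-8", 8, 32, "Default"),
    VMType("E2-Standard-4", 4, 16, "Optional"),
    VMType("N2-Highmem-16", 16, 128, "By Request"),
    VMType("N2-Highcpu-64", 64, 64, "By Request"),
]


def smallest_fit(min_vcpus: int, min_memory_gb: int) -> Optional[VMType]:
    """Return the smallest listed VM type meeting both minimums, or None."""
    candidates = [
        vm for vm in VM_TYPES
        if vm.vcpus >= min_vcpus and vm.memory_gb >= min_memory_gb
    ]
    # Order by vCPUs first, then memory (an assumed proxy for "smallest").
    return min(candidates, key=lambda vm: (vm.vcpus, vm.memory_gb), default=None)


if __name__ == "__main__":
    choice = smallest_fit(min_vcpus=8, min_memory_gb=64)
    print(choice.name if choice else "No listed VM type fits")
```

A result of `None` means no listed type satisfies the requirement, in which case the needs should be discussed as part of the request.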
Available Software#
Connections between the SDE environment and outside networks are tightly controlled and blocked for most SDE resources. For this reason, VMs include preinstalled software and libraries for data analysis, statistics, development, and document creation.
Additional software, packages, and libraries can be added to the VMs by request to Northwestern IT. When new software or packages are added, VMs must be recreated in the SDE environment, which deletes all files stored on the attached VM storage. Save data and files to storage buckets in the SDE environment before any software update.
Web Browser#
Google Chrome
Office & Productivity#
LibreOffice Suite
Writer (Documents)
Calc (Spreadsheets)
Impress (Presentations)
Draw (Diagrams/Graphics)
Base (Databases)
Math (Formulas)
Technical & Statistical Computing#
MATLAB (licensed, available for installation when needed)
R/RStudio, with a variety of packages including:
Data analysis (tidyverse, data.table)
Statistics & modeling (survival, lme4, mgcv)
Machine learning (randomForest, xgboost, glmnet)
Bayesian analysis (rstan, brms)
Geospatial (sf, terra, spatstat)
Visualization (ggplot2, plotly, rgl)
Web apps & dashboards (shiny ecosystem)
Reporting (bookdown, rmarkdown, flextable, gtsummary)
Additional packages can be installed via request to Northwestern IT.
Python#
Python 3, with popular scientific and machine learning packages, including the following libraries and their dependencies:
Data: pandas, polars, numpy, pyarrow
Statistics: scipy, statsmodels
Machine Learning: scikit-learn, xgboost, lightgbm
Deep Learning: tensorflow, theano
NLP: nltk, spacy
Visualization: matplotlib, plotly, bokeh
Images & OCR: Pillow, scikit-image, pytesseract
Geospatial: geopandas, shapely
Networks: networkx, additional graph analysis tools
HTML processing: beautifulsoup4
Notebooks: JupyterLab
Additional libraries can be installed via request to Northwestern IT.
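Since packages cannot be installed directly, it can help to confirm which libraries are present on a VM before filing a request. The following is a minimal sketch using the standard library's `importlib.util.find_spec`. The mapping covers only a sample of the packages listed above, and the function name is illustrative. Note that some packages are imported under a different name than the one they are distributed under.

```python
from importlib.util import find_spec

# Package name -> top-level import name (a sample of the list above).
IMPORT_NAMES = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "scikit-image": "skimage",
    "Pillow": "PIL",
    "beautifulsoup4": "bs4",
    "geopandas": "geopandas",
}


def missing_packages(import_names: dict[str, str]) -> list[str]:
    """Return package names whose top-level module cannot be found."""
    return [
        package
        for package, module in import_names.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(IMPORT_NAMES)
    if missing:
        print("Not found on this VM:", ", ".join(missing))
    else:
        print("All checked packages are available.")
```

`find_spec` only locates a module without importing it, so the check is quick even for large libraries.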
Development Tools#
GCC and Fortran compilers
CMake, Make
git
pkg-config
autotools