Computational Resources (VMs)#
Virtual Machines (VMs) in the Secure Data Enclave (SDE) provide the processing power needed to run analyses, build models, and work with data securely. VMs run Ubuntu Linux and are preconfigured with standard research software. Each VM has attached storage and, if configured, access to storage buckets and BigQuery in the SDE environment.
VMs are part of the Google Cloud Compute Engine service.
See the SDE User Guide for details on connecting to and using VMs.
VM Availability#
VMs are available in the following projects:
| Project | Available to | Purpose | Compute Availability |
|---|---|---|---|
| | | For controlled data transfer into the SDE environment. | Single VM for all data ingress tasks. |
| | | Data cleaning, curation, and management. Data analysis by Data Engineers. | 1 VM by default; additional VMs can be requested. |
| | | General research tasks and data analysis. | 1 VM per workspace project. Additional VMs can be requested. |
Using VMs#
VMs have no direct internet access. This means that users cannot install R, Python, or other software packages directly. See Available Software for more information on preinstalled software and the process for requesting additional packages.
VMs come with persistent local storage, but it is not backed up automatically. Important files should be saved to a storage bucket.
A Data Engineer manages VM configuration and can request additional CPU, memory, or storage if needed.
Access and permissions are centrally managed to maintain compliance with NIST SP 800-171 and institutional requirements.
VMs are billed for every hour they are running, including idle time. SDE users need to start a VM when they are ready to use it and stop it when their work is finished. Attached storage persists when a VM is stopped, so files remain available across multiple sessions. When a VM is deleted, all files and data on its attached storage are also deleted.
Users can start and stop VMs through the Google Cloud Console or the Google Cloud Command Line Interface. Users can connect to VMs using SSH-in-browser, a terminal program on their managed laptop or workstation, or remote desktop or screen sharing applications.
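As an illustration of the command-line route, the sketch below builds the `gcloud compute instances start` and `stop` commands in Python. It assumes the gcloud CLI is installed and authenticated on the machine where it runs; the instance, zone, and project names are hypothetical placeholders, not real SDE values.

```python
import shlex


def vm_command(action, instance, zone, project):
    """Build a gcloud command that starts or stops one Compute Engine VM."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    return [
        "gcloud", "compute", "instances", action, instance,
        f"--zone={zone}",
        f"--project={project}",
    ]


# Hypothetical names; replace them with the values for your own SDE project.
cmd = vm_command("stop", "example-vm", "us-central1-a", "example-project")
print(shlex.join(cmd))

# To actually run the command instead of printing it:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later passed to `subprocess.run`.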
Best Practices
Always stop your VM when you finish your workday. This preserves resources and maintains security.
Store results or large datasets in buckets, not directly on the attached VM storage.
Learn more about using VMs in the SDE User Guide.
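The advice to keep results in buckets can be sketched as a small helper that builds a `gcloud storage cp` command for one file. This is only an illustration: it assumes the gcloud CLI is available where the command runs and that you have write access to the bucket. The bucket name, prefix, and file path are hypothetical.

```python
from pathlib import PurePosixPath


def bucket_copy_command(local_path, bucket, prefix=""):
    """Build a gcloud command that copies one local file into a bucket."""
    # Keep only non-empty prefix segments so stray slashes do not matter.
    parts = [bucket] + [p for p in prefix.split("/") if p]
    parts.append(PurePosixPath(local_path).name)
    destination = "gs://" + "/".join(parts)
    return ["gcloud", "storage", "cp", local_path, destination]


# Hypothetical file and bucket names.
cmd = bucket_copy_command("/home/user/results/model.rds", "example-bucket", "results/")
print(" ".join(cmd))

# To actually run the copy:
#   import subprocess
#   subprocess.run(cmd, check=True)
```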
Available VM Types#
By default, the SDE offers a standard VM configuration (E2-Standard-8) suitable for most research and data analysis tasks.
Additional VM types can be requested for projects requiring more computational power, memory, or GPU capabilities. VMs with greater computational resources have higher per-hour costs.
| VM Type | vCPUs | Memory (GB) | Typical Use Case | Availability |
|---|---|---|---|---|
| E2-Standard-8 | 8 | 32 | General data analysis, R/Python workloads, Jupyter notebooks | Default |
| E2-Standard-4 | 4 | 16 | Lightweight alternative for processing, scripting, or testing | Optional |
| N2-Highmem-16 | 16 | 128 | Memory-intensive computations, large datasets | By Request |
| N2-Highcpu-64 | 64 | 64 | CPU-heavy workloads, simulation, and parallel processing | By Request |
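When deciding which type to request, it can help to compare a workload's needs against the table. The sketch below copies the four rows above into a list and returns the smallest type that meets a minimum vCPU and memory requirement. Ranking by vCPUs and then memory is an assumption standing in for cost, since per-hour prices are not listed here.

```python
# (name, vCPUs, memory in GB, availability), copied from the table above.
VM_TYPES = [
    ("E2-Standard-4", 4, 16, "Optional"),
    ("E2-Standard-8", 8, 32, "Default"),
    ("N2-Highmem-16", 16, 128, "By Request"),
    ("N2-Highcpu-64", 64, 64, "By Request"),
]


def smallest_fit(min_vcpus, min_memory_gb):
    """Return the smallest listed VM type meeting both minimums, or None."""
    candidates = [
        vm for vm in VM_TYPES
        if vm[1] >= min_vcpus and vm[2] >= min_memory_gb
    ]
    if not candidates:
        return None
    # Assumption: fewer vCPUs, then less memory, approximates lower cost.
    return min(candidates, key=lambda vm: (vm[1], vm[2]))


print(smallest_fit(8, 24))    # fits the default type
print(smallest_fit(8, 64))    # needs more memory than the default offers
print(smallest_fit(64, 128))  # no listed type satisfies both minimums
```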
Checking your VM configuration#
Northwestern IT configures different types of VMs based on the requirements submitted at the time of building the environment. If you want to identify what type of VM has been configured in your SDE Project:
Navigate to the Google Cloud Console VM Instances Page.
Ensure you’re in the correct project (check the project selector at the top of the page) for the VM you want to connect to.
Open the VM details page by selecting the VM name.
On the details page, the VM configuration is listed under the Machine configuration section.
Under the Storage section, the VM persistent disk Size will be noted.
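The same details can also be read from the command line. The sketch below assumes you have saved the JSON printed by `gcloud compute instances describe INSTANCE --zone=ZONE --format=json` and want the machine type and disk sizes from it. The sample record is hand-written for illustration and only includes the fields the helper reads; it is not output from a real SDE project.

```python
import json


def summarize_instance(instance):
    """Extract machine type and disk sizes from an instance description."""
    # machineType is a full resource URL; the type name is its last segment.
    machine_type = instance["machineType"].rsplit("/", 1)[-1]
    # diskSizeGb is reported as a string in the JSON output.
    disk_sizes = [int(d["diskSizeGb"]) for d in instance.get("disks", [])]
    return {
        "name": instance["name"],
        "status": instance.get("status"),
        "machine_type": machine_type,
        "disk_sizes_gb": disk_sizes,
    }


# Hand-written sample with hypothetical project, zone, and VM names.
sample = json.loads("""
{
  "name": "example-vm",
  "status": "RUNNING",
  "machineType": "https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-a/machineTypes/e2-standard-8",
  "disks": [{"boot": true, "diskSizeGb": "100"}]
}
""")

print(summarize_instance(sample))
```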
Available Software#
Connections between the SDE environment and the internet are tightly controlled and blocked for most SDE resources. For this reason, VMs include preinstalled software and libraries for data analysis, statistics, development, and document creation.
Additional software, packages, and libraries can be added to the VMs by request to Northwestern IT. When new software or packages are added, VMs need to be recreated in the SDE environment, which deletes all files stored on the attached VM storage. Save data and files to storage buckets in the SDE environment before any software update.
Web Browser#
Google Chrome
Office & Productivity#
LibreOffice Suite
Writer (Documents)
Calc (Spreadsheets)
Impress (Presentations)
Draw (Diagrams/Graphics)
Base (Databases)
Math (Formulas)
Technical & Statistical Computing#
MATLAB (licensed, available for installation when needed)
R/RStudio, with a variety of packages including:
Data analysis (tidyverse, data.table)
Statistics & modeling (survival, lme4, mgcv)
Machine learning (randomForest, xgboost, glmnet)
Bayesian analysis (rstan, brms)
Geospatial (sf, terra, spatstat)
Visualization (ggplot2, plotly, rgl)
Web apps & dashboards (shiny ecosystem)
Reporting (bookdown, rmarkdown, flextable, gtsummary)
Additional packages can be installed via request to Northwestern IT.
Python#
Python 3, with popular science and ML packages including the following libraries and their dependencies:
Data: pandas, polars, numpy, pyarrow
Statistics: scipy, statsmodels
Machine Learning: scikit-learn, xgboost, lightgbm
Deep Learning: tensorflow, theano
NLP: nltk, spacy
Visualization: matplotlib, plotly, bokeh
Images & OCR: Pillow, scikit-image, pytesseract
Geospatial: geopandas, shapely
Networks: networkx, additional graph analysis tools
HTML processing: beautifulsoup4
Notebooks: JupyterLab
Additional libraries can be installed via request to Northwestern IT.
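Because packages cannot be installed from the VM, it can be useful to confirm that a library is present before writing code that depends on it. The sketch below uses the standard library's `importlib.util.find_spec` to check a few of the packages listed above without importing them. Several packages are imported under a different name than they are installed under; the selection shown is illustrative, not a full inventory of the VM image.

```python
from importlib.util import find_spec

# Package name -> top-level module name used in an import statement.
IMPORT_NAMES = {
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "Pillow": "PIL",
    "scikit-image": "skimage",
    "beautifulsoup4": "bs4",
    "geopandas": "geopandas",
}


def check_packages(import_names):
    """Return {package name: True if its module can be found, else False}."""
    return {
        package: find_spec(module) is not None
        for package, module in import_names.items()
    }


for package, available in check_packages(IMPORT_NAMES).items():
    print(f"{package}: {'available' if available else 'missing'}")
```

Anything reported as missing would need to go through the request process described above.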
Development Tools#
GCC and Fortran compilers
CMake, Make
git
pkg-config
autotools