Secure Data Enclave Glossary
This glossary defines key terms and concepts used throughout the Secure Data Enclave (SDE) documentation.
- Allowlist
For security, network connections between the SDE and the Internet are tightly restricted. Only a specific set of approved websites, resources, and endpoint devices may connect to the SDE. These approved sites and devices are placed on an allowlist.
- Bucket
A storage container in Google Cloud Storage (GCS) that holds datasets, results, or logs. Buckets are tightly controlled with project-specific permissions. See the Storage resources and Managing Buckets pages for more details.
- Cloud Logging
A centralized Google Cloud service that collects and stores audit and system logs for compliance and troubleshooting.
- Curated Data
Data that has been cleaned, validated, and approved for analysis within the SDE.
- Data Egress
The movement of data out of the SDE environment. All egress must be reviewed and approved, and only a Data Engineer can stage data for egress. See Data Egress in the SDE User Guide.
- Data Ingress
The movement of data into the SDE environment. Only a Data Engineer can bring data into the SDE, and only through approved methods from approved sources. See Data Ingress in the SDE User Guide.
- DUA
Data Use Agreement. The rules governing what is permitted and what is required when working with data in the SDE must be set out in a data use agreement between Northwestern University and the organization providing the data.
- Endpoint
A laptop or workstation. Managed endpoints, configured and managed by Northwestern IT, must be used for secure access to the SDE. Managed endpoints are monitored to ensure compliance with data protection policies.
- gcloud
The primary Google Cloud command-line tool for managing projects, compute resources, and IAM configurations.
- gsutil
A command-line tool for managing data in Google Cloud Storage (e.g., copying files between buckets).
- IAM
Identity and Access Management. The system used in Google Cloud Platform (GCP) to manage users, roles, and access policies.
- Multi-Factor Authentication (MFA)
A login process that requires two or more verification methods (e.g., a password and an authentication app) for secure access.
- NIST SP 800-171
A U.S. federal standard that defines how Controlled Unclassified Information (CUI) must be protected in non-federal systems.
- PII
Personally Identifiable Information. Examples include names, email addresses, phone numbers, and identification numbers. PII can also include combinations of information that together may identify an individual, such as a ZIP code combined with a date of birth. Consult your DUA for details of what information is considered sensitive.
- Project
A defined workspace in GCP that groups together related compute, storage, and permissions for a specific research initiative.
- Role
A role defines what a user can do within the SDE and determines that user's access to data and resources. Examples: Data Engineer and Data Analyst.
- Role-Based Access
A security model that assigns users permissions based on their function (e.g., Data Engineer, Data Analyst) to enforce least-privilege access.
- Service Account
A special GCP identity used by applications, tools, or automated workflows to access resources securely without user credentials.
- SSO
Single Sign-On. SSO allows users to log in to external services with their Northwestern NetID and NetID password. Northwestern MFA may also be used.
- Virtual Machine (VM)
A secure, isolated computing instance used for data processing or analysis within the SDE. In GCP, VMs are provided through Compute Engine.
- VPC (Virtual Private Cloud)
The secure, isolated network environment in which SDE compute and storage resources operate.
- VPC Service Controls
Google Cloud’s perimeter security feature that helps keep data inside the SDE by restricting communication between resources inside the perimeter and services outside it.
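
To illustrate the Role and Role-Based Access entries above, here is a minimal Python sketch of a least-privilege permission check. It is not how the SDE is implemented (access in the SDE is managed through IAM). The permission names, the `ROLE_PERMISSIONS` mapping, and the `is_allowed` helper are all hypothetical and exist only for illustration. The one detail taken from this glossary is that ingress and egress staging are reserved for the Data Engineer role.

```python
from enum import Enum, auto


class Permission(Enum):
    """Hypothetical permission names, chosen for illustration only."""
    READ_CURATED_DATA = auto()
    INGRESS_DATA = auto()
    STAGE_EGRESS = auto()


# Assumed role-to-permission mapping. The real SDE roles are configured
# in IAM and may grant a different set of permissions.
ROLE_PERMISSIONS = {
    "Data Engineer": {
        Permission.READ_CURATED_DATA,
        Permission.INGRESS_DATA,
        Permission.STAGE_EGRESS,
    },
    "Data Analyst": {Permission.READ_CURATED_DATA},
}


def is_allowed(role: str, permission: Permission) -> bool:
    """Return True only if the role explicitly grants the permission.

    Unknown roles map to an empty set, so they are denied by default.
    """
    return permission in ROLE_PERMISSIONS.get(role, set())


if __name__ == "__main__":
    print(is_allowed("Data Engineer", Permission.STAGE_EGRESS))
    print(is_allowed("Data Analyst", Permission.STAGE_EGRESS))
```

In this sketch, anything not explicitly granted is denied, which is the essence of least-privilege access: a Data Analyst can read curated data but cannot stage data for egress, and a role that is not listed has no permissions at all.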