User Roles within the Enclave
Users of the SDE are assigned one of two roles that determine what access, permissions, and responsibilities they have in their SDE environment.
Data Engineer: Has effectively full access to all resources in the SDE Google Cloud environment. Each SDE environment must have at least one user with the Data Engineer role, but the role should be granted to a limited set of users. Data Engineers manage data ingress and data egress, manage permissions for other SDE environment users, move data between projects, and perform data cleaning and curation tasks.
Researcher/Data Analyst: This is the default role for most SDE users. Users with the Researcher/Data Analyst role can work with data that has already been added to the SDE environment and access VMs in the Workspace Project to analyze data.
Permission Sets
In Google Cloud, a “role” is a specific set of permissions. The Data Engineer and Researcher/Data Analyst “roles” described here are each a collection of multiple Google Cloud roles.
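The distinction between an SDE role and a Google Cloud role can be made concrete with a minimal Python sketch that treats each SDE role as a set of Google Cloud role IDs. The role IDs used here are real predefined Google Cloud roles, but which ones are actually bundled into each SDE role is an assumption for illustration only; the real assignment is not described on this page.

```python
# HYPOTHETICAL mapping: the Google Cloud role IDs are real predefined roles,
# but which ones make up each SDE role is assumed for illustration only.
SDE_ROLES: dict[str, frozenset[str]] = {
    "Researcher/Data Analyst": frozenset({
        "roles/storage.objectViewer",  # read objects in Cloud Storage buckets
        "roles/bigquery.dataViewer",   # read BigQuery tables
        "roles/bigquery.jobUser",      # run BigQuery query jobs
        "roles/compute.osLogin",       # log in to Compute Engine VMs
    }),
    "Data Engineer": frozenset({
        "roles/storage.admin",                    # manage buckets and objects
        "roles/bigquery.admin",                   # manage datasets, tables, jobs
        "roles/compute.osLogin",                  # log in to Compute Engine VMs
        "roles/resourcemanager.projectIamAdmin",  # manage other users' access
    }),
}


def gcp_roles_for(sde_role: str) -> frozenset[str]:
    """Return the set of Google Cloud roles bundled into one SDE role."""
    return SDE_ROLES[sde_role]


def grants(sde_role: str, gcp_role: str) -> bool:
    """Return True if the SDE role includes the given Google Cloud role."""
    return gcp_role in gcp_roles_for(sde_role)
```

For example, `grants("Data Engineer", "roles/resourcemanager.projectIamAdmin")` is true in this sketch, while `grants("Researcher/Data Analyst", "roles/storage.admin")` is false, mirroring the fact that only Data Engineers manage permissions and storage.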
Data Engineer
Users with the Data Engineer role have access to all raw and processed data and are responsible for moving data in and out of the SDE using compliant procedures. At least one member of the research team needs to have this role, and this role should be assigned on a limited basis.
Responsibilities
Data Engineers manage the technical backbone of the enclave research project. They are responsible for:
Data Ingress: Ingest datasets from approved sources into the SDE environment's raw data storage.
Data Access: Grant and manage data access for other SDE environment users, and promptly revoke access when no longer needed.
Data Egress: Coordinate with SDE administrators to obtain approval for any data export requests.
Research Team Support: Support other members of the research team in using the SDE environment.
Requesting Support: Communicate with Northwestern IT to request additional resources or technical support.
Project Access
Data Ingress: This project is for importing data into the SDE environment. Data Engineers are responsible for managing and executing data ingress workflows, ensuring that only approved data enters the environment, and moving data to the Data Ops project for review, cleaning, and processing.
Data Ops: This project is for cleaning and curating data for use by other users. Data Engineers can also use it for their own research and analysis work (Researcher/Data Analysts use the Workspace project for this type of work).
Data Lake: The data lake project is the primary data storage location for the SDE environment. Data Engineers add data to this project after it has been cleaned and curated, and they manage the permissions that let Researcher/Data Analyst users access data in the project.
Data Egress: This project is for exporting data from the SDE environment. Data Engineers are responsible for reviewing data egress requests from other users, getting required data egress approvals, moving data into the project, and removing data after it has been downloaded by other users.
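The egress duties above describe an ordered workflow: a request is reviewed, the required approvals are obtained, the data is moved into the Data Egress project, the user downloads it, and the Data Engineer removes it. A minimal Python sketch of that ordering as a state machine follows. The state names, the `EgressRequest` class, and the explicit rejection state are hypothetical illustrations, not part of any SDE tool.

```python
from enum import Enum, auto


class EgressState(Enum):
    """Hypothetical stages of one data egress request (names are illustrative)."""
    REQUESTED = auto()   # a user has asked to export data
    REVIEWED = auto()    # a Data Engineer has reviewed the request
    APPROVED = auto()    # required approvals have been obtained
    STAGED = auto()      # data copied into the Data Egress project
    DOWNLOADED = auto()  # the requesting user has downloaded the files
    REMOVED = auto()     # a Data Engineer has deleted the files from the project
    REJECTED = auto()    # assumed outcome: request denied, nothing leaves the enclave


# Allowed transitions: every export must pass review and approval before staging.
TRANSITIONS: dict[EgressState, set[EgressState]] = {
    EgressState.REQUESTED: {EgressState.REVIEWED},
    EgressState.REVIEWED: {EgressState.APPROVED, EgressState.REJECTED},
    EgressState.APPROVED: {EgressState.STAGED},
    EgressState.STAGED: {EgressState.DOWNLOADED},
    EgressState.DOWNLOADED: {EgressState.REMOVED},
    EgressState.REMOVED: set(),
    EgressState.REJECTED: set(),
}


class EgressRequest:
    """Hypothetical record of one egress request and its audit trail."""

    def __init__(self, description: str) -> None:
        self.description = description
        self.state = EgressState.REQUESTED
        self.log: list[EgressState] = [self.state]  # every state, in order

    def advance(self, new_state: EgressState) -> None:
        """Move to new_state, refusing any step that skips review or approval."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.name} to {new_state.name}")
        self.state = new_state
        self.log.append(new_state)
```

Trying to stage files straight from `REQUESTED` raises `ValueError`, which is the property the workflow is meant to guarantee: nothing reaches the Data Egress project without review and approval.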
Researcher/Data Analyst
Users with this role can access and work with analysis-ready datasets that a Data Engineer has made available in the Data Lake Project. Most members of the research team have this role.
If different users need different permissions to files and computational resources, the SDE environment can be configured to place them in separate Researcher/Data Analyst groups. Discuss these needs with Northwestern IT.
Responsibilities
Researcher/Data Analysts are responsible for performing scientific, statistical, or exploratory analyses on curated data within the secure enclave while staying within compliance boundaries. They must coordinate data ingress, data egress, and other tasks with a project Data Engineer.
Key responsibilities include:
Accessing curated datasets provided by Data Engineers.
Developing analysis code (e.g., Python, R, SQL) in the secure enclave workspace project.
Documenting results and methods to support transparency and reproducibility.
Submitting data egress requests to a project Data Engineer when data needs to leave the enclave.
Coordinating with a project Data Engineer on any new data needs or data access permissions.
Security considerations:
Only use approved datasets from the Data Lake Project.
Never attempt to upload, download, or sync files across the enclave boundary outside the approved data ingress and egress processes.
All activities are logged for compliance and auditing.
Results must be reviewed and approved before export.
Project Access
Workspace: The workspace project includes VMs with research software for working with data. Researcher/Data Analyst users primarily work with resources in this project.
Data Lake: The data lake project is the primary data storage location for the SDE environment. Researcher/Data Analysts can access data they are approved for from storage buckets and BigQuery tables in the Data Lake Project.
Data Egress: Researcher/Data Analysts can download data from the Data Egress Project once files are reviewed, logged, and approved by a Data Engineer. See Data Egress in the SDE User Guide for more details.
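The two Project Access lists on this page can be summarized as a role-to-project lookup. The Python sketch below is illustrative only: project names follow this page, Data Engineers are given every project because the role has effectively full access, and the sketch does not model finer-grained limits such as Researcher/Data Analysts seeing only the Data Lake data they are approved for. In practice, access is controlled by the underlying Google Cloud roles, not by code like this; `PROJECT_ACCESS` and `can_access` are hypothetical names.

```python
# Projects named on this page.
PROJECTS = frozenset({"Data Ingress", "Data Ops", "Data Lake", "Data Egress", "Workspace"})

# Hypothetical lookup of which projects each SDE role works in.
PROJECT_ACCESS: dict[str, frozenset[str]] = {
    # Effectively full access to all resources in the environment.
    "Data Engineer": PROJECTS,
    # Analysis VMs, approved curated data, and approved egress downloads only.
    "Researcher/Data Analyst": frozenset({"Workspace", "Data Lake", "Data Egress"}),
}


def can_access(role: str, project: str) -> bool:
    """Return True if the given SDE role works in the given project."""
    if project not in PROJECTS:
        raise KeyError(f"unknown project: {project}")
    # Unknown roles get no projects at all.
    return project in PROJECT_ACCESS.get(role, frozenset())
```

For example, `can_access("Researcher/Data Analyst", "Data Ops")` returns `False`, reflecting that cleaning and curation happen in a project Researcher/Data Analysts do not use.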