Importing and Exporting Data

Both importing data into (ingress) and exporting data from (egress) an SDE environment must be done carefully to maintain compliance with Northwestern and NIST SP 800-171 standards.

Only Data Engineers can ingress data; other SDE users should contact their project’s Data Engineer for data imports.

SDE users can stage data for export, but it must be reviewed for compliance and logged by a Data Engineer before it can be downloaded from an SDE environment.

Data Ingress

Only data consistent with the DUA governing a project can be imported into an SDE environment.

There are two established data transfer options for importing data:

    • Globus Transfer

    • Website Transfer

If data cannot be transferred via these methods, please contact us to discuss.

Both methods must be set up before they can be used. Data ingress will be discussed during the SDE onboarding process. If an additional method is needed, please submit a support request so that the ingress method can be reviewed and set up.

Globus Transfer

Globus is a secure, high-performance data transfer service used by many research institutions. It allows you to move data directly between approved storage endpoints.

To ingress data via Globus, submit an ingress request. Once approved, Northwestern IT will configure a Globus endpoint for your SDE environment and provide you with the endpoint details.

Once the endpoint is set up, you can sign in to Globus with your Northwestern credentials and see your SDE environment as an endpoint. You can then transfer data from other approved Globus endpoints to your secure SDE Globus endpoint. See the Globus guide or the official Globus documentation for more information on using Globus.

Website Transfer

If your data source cannot use Globus, you can ingest your data by accessing a website from a VM in the Ingress Project. This method works for downloading data from SharePoint, Dropbox, or other cloud storage locations with a web interface.

External websites must be reviewed and the URL added to an allowlist. You will then be able to log in to a VM in the Ingress Project and access only the specific permitted URL via a web browser. You can download data to local storage on the VM and then transfer it to a storage bucket.

Data Egress

Only files containing de-identified or aggregated data, with PII and any other information restricted by the project DUA removed, may be exported from an SDE environment.

Data Engineers are responsible for reviewing data egress requests for compliance with the project DUA and setting up a storage bucket in the Egress Project from which a Researcher/Data Analyst can download reviewed and approved files.

This is the only supported method for data egress at this time. Other tools are not allowed. Ensure you comply with all project and security policies when exporting data.

If you encounter issues with data egress, please request support.

Researcher/Data Analyst

To request the egress of a file from the SDE environment, move the file to the “Egress Dataprep” bucket in the Data Lake project and notify a project Data Engineer of the need for egress review.

Once the Data Engineer has reviewed and approved the file, they will create a bucket in the Egress project from which you can download it. To download the file:

  1. Log in to the Google Cloud Console and navigate to the Cloud Storage Buckets page.

  2. Make sure you are in the Egress project (check the project name at the top of the console).

  3. Select the bucket that contains the data you want to download.

  4. Browse the bucket to find the specific files or folders you need.

  5. Download the files through the browser.

    • Check the box next to the file(s) or folder(s).

    • Click the Download button.

    • Save the file(s) to your approved endpoint.

Data Engineer

Researchers/Data Analysts stage files for your egress review in the “Egress Dataprep” bucket in the Data Lake project.

Use the VM in the Data Ops project to review the files to check that they comply with the project DUA and are appropriate for egress.

You must maintain a log of all data approved for egress that includes the date of egress, details about the data, and who made the request.
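As an illustration only, the record of approved egress could be kept as a simple CSV file. The sketch below is one possible layout, not a required format: the `EgressLogEntry` class, its field names, and the `append_entry` helper are hypothetical, and the `reviewed_by` field is an added assumption beyond the three items listed above. Where the file is stored is up to your project's policies.

```python
import csv
from dataclasses import dataclass, asdict, fields
from pathlib import Path


@dataclass
class EgressLogEntry:
    """One approved egress request (hypothetical field names)."""
    egress_date: str       # date of egress in ISO format, e.g. "2025-01-31"
    requested_by: str      # who made the request
    data_description: str  # details about the data
    reviewed_by: str       # assumption: the Data Engineer who approved it


def append_entry(log_path: Path, entry: EgressLogEntry) -> None:
    """Append one entry to the CSV log, writing a header row if the file is new."""
    is_new = not log_path.exists() or log_path.stat().st_size == 0
    column_names = [f.name for f in fields(EgressLogEntry)]
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=column_names)
        if is_new:
            writer.writeheader()
        writer.writerow(asdict(entry))
```

A call such as `append_entry(Path("egress_log.csv"), EgressLogEntry("2025-01-31", "analyst", "aggregated counts by year", "engineer"))` would add one row per approved request.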

Once a file is approved for egress, create a new bucket in the Egress project and give the Researcher/Data Analyst access to the bucket to download the file. You should create a new bucket for each egress request.

Data should be downloaded from the egress bucket promptly. Delete the bucket and the files within 3 days of the data being downloaded.
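To keep track of the cleanup window, a small helper can compute the latest deletion date from the download date. This is a minimal sketch that assumes "within 3 days" means calendar days counted from the download date; the function names and the `RETENTION_DAYS` constant are hypothetical, so adjust them if your project interprets the window differently.

```python
from datetime import date, timedelta

# Assumption: "within 3 days" is counted in calendar days from the download date.
RETENTION_DAYS = 3


def deletion_deadline(downloaded_on: date) -> date:
    """Return the latest date by which the egress bucket should be deleted."""
    return downloaded_on + timedelta(days=RETENTION_DAYS)


def is_overdue(downloaded_on: date, today: date) -> bool:
    """Return True if the deletion deadline has already passed."""
    return today > deletion_deadline(downloaded_on)
```

For example, `deletion_deadline(date(2025, 1, 30))` returns `date(2025, 2, 2)` under this interpretation.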

If you need to egress a file from the SDE yourself, you need another Data Engineer to review and approve the egress.