Skip to main content

Canvas

Code on GitHub

This is an early release of the connector -- as with any connector we welcome and look forward to community contributions to improve the connector.

Getting Started

Step 1: Prerequisites

Before you begin, ensure you have the following:

  1. Access to your data warehouse: Credentials and network access to your data warehouse instance (e.g. Snowflake, BigQuery).
  2. Canvas EHR data: Your raw Canvas data must be loaded into specific tables within your data warehouse.
  3. dbt CLI Installed: You need dbt (version 1.9 recommended) installed on your machine or environment where you'll run the transformations. See dbt Installation Guide for help with installation.
  4. Git: You need Git installed to clone this project repository.
  5. Authentication Details: These details will be important in connecting to dbt with a profiles.yml file.

Step 2: Clone the Repository

Open your terminal or command prompt and clone this project:

git clone https://github.com/tuva-health/canvas_connector.git
cd canvas_connector

Step 3: Create and Activate Virtual Environment

It's highly recommended to use a Python virtual environment to manage project dependencies. This isolates the project's packages from your global Python installation.

  1. Create the virtual environment (run this inside the canvas_connector directory):
# Use python3 if python defaults to Python 2
python -m venv venv

This creates a venv directory within your project folder.

  1. Activate the virtual environment:
  • macOS / Linux (bash/zsh): source venv/bin/activate
  • Windows (Command Prompt): venv\Scripts\activate.bat
  • Windows (PowerShell): .\venv\Scripts\Activate.ps1
  • Windows (Git Bash): source venv/Scripts/activate

You should see (venv) prepended to your command prompt, indicating the environment is active.

Step 4: Install Python Dependencies

With the virtual environment active, install the required Python packages, including dbt and the warehouse-specific dbt adapter (e.g. dbt-snowflake, dbt-bigquery).

Step 5: Configure profiles.yml for Data Warehouse Connection

dbt needs to know how to connect to your data warehouse. In general, this is done via a profiles.yml file, which you need to create. This file should NOT be committed to Git, as it contains sensitive credentials.

  • Location: By default, dbt looks for this file in ~/.dbt/profiles.yml (your user home directory, in a hidden .dbt folder).
  • Content: See the dbt docs.

Step 6: Install dbt Package Dependencies

This project relies on external dbt packages (The Tuva Project and dbt_utils). Run the following command in your terminal from the project directory (the one containing dbt_project.yml):

dbt deps

This command reads packages.yml and downloads the necessary code into the dbt_packages/ directory within your project.

Step 7: Test the Connection

Before running transformations, verify that dbt can connect to your data warehouse using your profiles.yml settings:

dbt debug

Look for "Connection test: OK connection ok". If you see errors, double-check your profiles.yml settings (account, user, role, warehouse, authentication details, paths).

Running the Project

Once setup is complete, you can run the dbt transformations:

Full Run (Recommended First Time), this command will:

  • Run all models (.sql files in models/).
  • Run all tests (.yml, .sql files in tests/).
  • Materialize tables/views in your target data warehouse as configured.
dbt build

This might take some time depending on the data volume and warehouse size.

Run Only Models:

If you only want to execute the transformations without running tests:

dbt run

Run Only Tests:

To execute only the data quality tests:

dbt test

Running Specific Models:

You can run specific parts of the project using dbt's node selection syntax. For example:

  • Run only the staging models: dbt run -s path:models/staging
  • Run a specific model and its upstream dependencies: dbt run -s +your_model_name