Getting Started with the Tuva Data Quality Dashboard

Welcome! If you're working with healthcare data and using the Tuva Project, ensuring the quality and accuracy of your data after mapping is crucial. We're excited to introduce a new open-source tool to help you visualize and monitor this: the Tuva Data Quality Dashboard.

This dashboard is designed to work hand-in-hand with the Tuva Health dbt package and provides a visual interface to understand the results of the data quality tests defined within the Tuva Framework. It helps answer the question: "Have I mapped my input data correctly into the Tuva Data Model?"

Repository: https://github.com/tuva-health/tuva_dqi

What Does It Do?

The Tuva Data Quality Dashboard takes the output from the data quality tests run by the Tuva dbt package and presents it in an easy-to-understand web application. Its main goals are to help you:

  1. Monitor Overall Health: Get a quick sense of your data's quality with an A-F grading system.
  2. Assess Usability: Understand if specific data marts (like Service Categories or CMS HCCs) are reliable enough to use based on test results.
  3. Investigate Issues: Drill down into specific failing tests to understand what went wrong and where.

How It Works: The Big Picture

It's important to understand that the dashboard itself doesn't run the data quality tests. Instead, it visualizes the results generated by the Tuva dbt package.

Here's the typical workflow:

  1. Run Tests: You use the Tuva Health dbt package (version 0.14.3 or later) within your dbt project to execute data quality tests against your data warehouse.
  2. Generate Outputs: The dbt package creates specific tables containing the test results (data_quality__testing_summary) and data for visualizations (data_quality__exploratory_charts).
  3. Export Data: You export these two tables from your data warehouse into CSV files (with headers).
  4. Import into Dashboard: You upload these CSV files into the running Data Quality Dashboard application.
  5. Visualize & Analyze: The dashboard processes the CSVs and displays the interactive charts, grades, and test details.

Key Features at a Glance

  • Data Quality Grade: An overall A-F grade summarizing data health.
  • Data Mart Status: Clear indicators of usability for each data mart.
  • Detailed Test Results: See which tests passed or failed, including severity.
  • Quality Dimension Analysis: Break down results by completeness, validity, etc.
  • Interactive Visualizations: Charts exploring data patterns and quality metrics over time or by category.
  • Report Generation: Create shareable PDF report cards.
  • Severity Levels: Tests are categorized by severity to help prioritize fixes:
    • Level 1: Critical (prevents dbt build)
    • Level 2: Major (affects data reliability)
    • Level 3: Moderate (use data with caution)
    • Level 4: Minor (limited impact)
    • Level 5: Informational (dbt informational tests)
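
As a rough illustration of how the severity levels can drive prioritization, the sketch below encodes the five levels listed above and sorts failing tests so the most severe come first. The record fields test_name and severity_level are hypothetical placeholders; the actual column names in data_quality__testing_summary may differ.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels as described in the list above (1 is most severe)."""
    CRITICAL = 1        # prevents dbt build
    MAJOR = 2           # affects data reliability
    MODERATE = 3        # use data with caution
    MINOR = 4           # limited impact
    INFORMATIONAL = 5   # dbt informational tests


def prioritize(failures):
    """Return failing tests ordered from most to least severe.

    Each item is a dict with the hypothetical keys 'test_name' and
    'severity_level' (an integer from 1 to 5).
    """
    return sorted(failures, key=lambda f: Severity(f["severity_level"]))


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    sample = [
        {"test_name": "example_minor_check", "severity_level": 4},
        {"test_name": "example_critical_check", "severity_level": 1},
    ]
    for failure in prioritize(sample):
        level = Severity(failure["severity_level"])
        print(f"Level {level.value} ({level.name.title()}): {failure['test_name']}")
```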

Getting Started: Running the Dashboard

Here’s how you can get the dashboard running locally. For detailed instructions, always refer to the project's README.

Prerequisites:

  • Python (version 3.11 or higher recommended)
  • Git

Steps:

  1. Clone the Repository:
    git clone https://github.com/tuva-health/tuva_dqi.git
    cd tuva_dqi
  2. Set up a Virtual Environment: (Recommended to avoid dependency conflicts)
    # Run these commands from inside the cloned tuva_dqi directory
    # (the cd in step 1 already put you there)
    # Create the environment
    python -m venv .venv # Use python3 if 'python' doesn't work
    # Activate the environment
    # On macOS/Linux:
    source .venv/bin/activate
    # On Windows:
    # .venv\Scripts\activate
  3. Install Dependencies:
    python -m pip install -r requirements.txt
  4. Run the Application:
    python app.py
  5. Access: Open your web browser and navigate to http://localhost:8080.
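
If you would rather confirm from a script that the application has started, a minimal check like the one below can help. It only assumes the default address from step 5 (http://localhost:8080); the function name dashboard_is_up is made up for this sketch and is not part of the dashboard.

```python
import urllib.error
import urllib.request


def dashboard_is_up(url="http://localhost:8080", timeout=2.0):
    """Return True if something answers HTTP requests at `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just with an error status code.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False


if __name__ == "__main__":
    if dashboard_is_up():
        print("Dashboard is reachable.")
    else:
        print("Nothing is answering yet; is 'python app.py' still running?")
```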

(Note: Docker instructions are also available in the repository's README if you prefer containerization.)

Generating the Input Data (Crucial Step!)

The dashboard needs specific data generated by the Tuva dbt package.

Prerequisites:

  • A working dbt project.
  • Your healthcare data loaded into your data warehouse.

Steps:

  1. Add the Tuva Package: Ensure your project's packages.yml includes the Tuva package, version 0.14.3 or later:

    packages:
      - package: tuva-health/the_tuva_project
        version: [">=0.14.3"] # Or specify a higher version
  2. Install/Update the Package:

    dbt clean
    dbt deps
  3. Run Tuva Models and Tests: Execute the necessary dbt commands to build Tuva models and run the data quality tests. A typical sequence is:

    dbt seed --full-refresh # Load necessary seed data
    dbt run # Build Tuva models
    dbt test # Run data quality tests
  4. Export the Results: After the dbt test command finishes successfully, two key tables will exist in your data warehouse (typically in your data quality schema, with your Tuva prefix if you have configured one):

    • data_quality__testing_summary
    • data_quality__exploratory_charts

    You need to export the data from these two tables into CSV files, including headers. The method for exporting depends on your specific data warehouse (e.g., using SQL queries, warehouse UI export features, or other tools).

  5. Upload to Dashboard: Once you have the testing_summary.csv and exploratory_charts.csv files, use the "Import Test Results" feature within the running dashboard application to upload them.
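
The export in step 4 depends on your warehouse, but if you have a Python DB-API connection to it, one possible approach looks like the sketch below. It writes the column names as the header row, followed by every row of the table. The helper name export_table_to_csv is made up for this example, and you may need to pass a schema-qualified table name that matches your own project.

```python
import csv


def export_table_to_csv(conn, table_name, csv_path, batch_size=1000):
    """Write every row of `table_name` to `csv_path`, with a header row.

    `conn` is any DB-API 2.0 connection. `table_name` is inserted into
    the query as-is, so only pass identifiers you trust.
    Returns the number of data rows written.
    """
    cur = conn.cursor()
    cur.execute(f"SELECT * FROM {table_name}")
    headers = [col[0] for col in cur.description]

    rows_written = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(headers)  # the dashboard expects headers
        while True:
            batch = cur.fetchmany(batch_size)
            if not batch:
                break
            writer.writerows(batch)
            rows_written += len(batch)
    return rows_written


# Usage, assuming `conn` is an open connection to your warehouse:
# export_table_to_csv(conn, "data_quality__testing_summary", "testing_summary.csv")
# export_table_to_csv(conn, "data_quality__exploratory_charts", "exploratory_charts.csv")
```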

Running tests just within dbt

DQI adds tags and variables to the Tuva dbt project, so you can enable or target specific DQI tests without leaving dbt. For example:

# Run all the tests associated with the HCC mart. Warnings change into errors
dbt test -s tag:dqi_cms_hccs --warn-error

# Test sev 1 and sev 2 issues only
dbt test -s tag:tuva_dqi_sev_1 -s tag:tuva_dqi_sev_2

# Run all tests that are specific to tuva dqi
dbt test -s tag:dqi

List of mart DQI test tags

  • dqi_service_categories
  • dqi_ccsr
  • dqi_cms_chronic_conditions
  • dqi_tuva_chronic_conditions
  • dqi_cms_hccs
  • dqi_ed_classification
  • dqi_financial_pmpm
  • dqi_quality_measures
  • dqi_readmission
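
If you launch dbt from a script, a small helper can assemble the test command from the mart tags listed above. This is only a sketch: the function name build_dqi_test_args is made up, and it passes several tags after a single -s, which dbt treats as a union of the selections.

```python
import shlex

# Mart tags copied from the list above.
MART_TAGS = [
    "dqi_service_categories",
    "dqi_ccsr",
    "dqi_cms_chronic_conditions",
    "dqi_tuva_chronic_conditions",
    "dqi_cms_hccs",
    "dqi_ed_classification",
    "dqi_financial_pmpm",
    "dqi_quality_measures",
    "dqi_readmission",
]


def build_dqi_test_args(mart_tags, warn_error=False):
    """Build the argument list for `dbt test` limited to the given mart tags."""
    unknown = [tag for tag in mart_tags if tag not in MART_TAGS]
    if unknown:
        raise ValueError(f"Unknown mart tag(s): {', '.join(unknown)}")
    args = ["dbt", "test", "-s"] + [f"tag:{tag}" for tag in mart_tags]
    if warn_error:
        args.append("--warn-error")  # turn warnings into errors
    return args


if __name__ == "__main__":
    args = build_dqi_test_args(["dqi_cms_hccs"], warn_error=True)
    print(shlex.join(args))
    # To actually run it: subprocess.run(args, check=True)
```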

How to disable DQI

To disable DQI, set the variable enable_input_layer_testing to false, either on the command line or in your dbt_project.yml file.
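
For the command-line route, dbt accepts variables through its --vars option as a YAML or JSON string. The sketch below builds such a command from a script; the helper name dbt_args_with_dqi_disabled and the default subcommand are choices made for this example only.

```python
import json
import shlex


def dbt_args_with_dqi_disabled(subcommand="build"):
    """Build a dbt command that sets enable_input_layer_testing to false."""
    # JSON is valid YAML, so json.dumps yields a string that --vars accepts.
    vars_arg = json.dumps({"enable_input_layer_testing": False})
    return ["dbt", subcommand, "--vars", vars_arg]


if __name__ == "__main__":
    args = dbt_args_with_dqi_disabled()
    print(shlex.join(args))
    # To actually run it: subprocess.run(args, check=True)
```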

Development Status

Please note that the Tuva Data Quality Dashboard is currently in alpha/early development. It's designed to work with outputs from the Tuva dbt package version 0.14.3 or later.
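
If you want to guard a script against an older package, a simple version comparison is enough. The sketch below assumes plain numeric versions such as 0.14.3 (no pre-release suffixes); how you obtain the installed version string is up to you, and the function name is made up for this example.

```python
def meets_minimum_version(version, minimum="0.14.3"):
    """Return True if `version` is at least `minimum`.

    Assumes dot-separated integers, optionally prefixed with 'v'.
    """
    def parse(value):
        return tuple(int(part) for part in value.strip().lstrip("v").split("."))

    return parse(version) >= parse(minimum)


if __name__ == "__main__":
    for candidate in ["0.14.2", "0.14.3", "0.14.10"]:
        status = "ok" if meets_minimum_version(candidate) else "too old"
        print(candidate, status)
```

Comparing tuples of integers rather than raw strings matters here: as strings, "0.14.10" would sort before "0.14.3".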

Get Involved!

This is an open-source project, and we welcome community feedback and contributions!

We hope this dashboard helps you gain better insights into your healthcare data quality!