Getting Started with the Tuva Data Quality Dashboard
Welcome! If you're working with healthcare data and using the Tuva Project, ensuring the quality and accuracy of your data after mapping is crucial. We're excited to introduce a new open-source tool to help you visualize and monitor this: the Tuva Data Quality Dashboard.
This dashboard is designed to work hand-in-hand with the Tuva Health dbt package and provides a visual interface to understand the results of the data quality tests defined within the Tuva Framework. It helps answer the question: "Have I mapped my input data correctly into the Tuva Data Model?"
Repository: https://github.com/tuva-health/tuva_dqi
What Does It Do?
The Tuva Data Quality Dashboard takes the output from the data quality tests run by the Tuva dbt package and presents it in an easy-to-understand web application. Its main goals are to help you:
- Monitor Overall Health: Get a quick sense of your data's quality with an A-F grading system.
- Assess Usability: Understand if specific data marts (like Service Categories or CMS HCCs) are reliable enough to use based on test results.
- Investigate Issues: Drill down into specific failing tests to understand what went wrong and where.
How It Works: The Big Picture
It's important to understand that the dashboard itself doesn't run the data quality tests. Instead, it visualizes the results generated by the Tuva dbt package.
Here's the typical workflow:
- Run Tests: You use the Tuva Health dbt package (version 0.14.3 or later) within your dbt project to execute data quality tests against your data warehouse.
- Generate Outputs: The dbt package creates specific tables containing the test results (data_quality__testing_summary) and data for visualizations (data_quality__exploratory_charts).
- Export Data: You export these two tables from your data warehouse into CSV files (with headers).
- Import into Dashboard: You upload these CSV files into the running Data Quality Dashboard application.
- Visualize & Analyze: The dashboard processes the CSVs and displays the interactive charts, grades, and test details.
Key Features at a Glance
- Data Quality Grade: An overall A-F grade summarizing data health.
- Data Mart Status: Clear indicators of usability for each data mart.
- Detailed Test Results: See which tests passed or failed, including severity.
- Quality Dimension Analysis: Break down results by completeness, validity, etc.
- Interactive Visualizations: Charts exploring data patterns and quality metrics over time or by category.
- Report Generation: Create shareable PDF report cards.
- Severity Levels: Tests are categorized by severity to help prioritize fixes:
  - Level 1: Critical (prevents dbt build)
  - Level 2: Major (affects data reliability)
  - Level 3: Moderate (use data with caution)
  - Level 4: Minor (limited impact)
  - Level 5: Informational (dbt informational tests)
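To make the severity levels concrete, here is a minimal Python sketch that maps each level to its label and orders failing tests so the most critical come first. The record keys (`test_name`, `severity_level`) and the example test names are hypothetical, chosen for illustration only; they are not the actual column names of the Tuva output tables.

```python
# Severity levels as listed above; level 1 is the most critical.
SEVERITY_LABELS = {
    1: "Critical",
    2: "Major",
    3: "Moderate",
    4: "Minor",
    5: "Informational",
}


def prioritize_failures(failures):
    """Return failing tests sorted from most to least severe.

    Each item is a dict with the hypothetical keys 'test_name' and
    'severity_level'; adapt them to your own exported data.
    """
    return sorted(failures, key=lambda failure: failure["severity_level"])


def describe(failure):
    """Format one failing test as 'Level N (Label): test_name'."""
    level = failure["severity_level"]
    return f"Level {level} ({SEVERITY_LABELS[level]}): {failure['test_name']}"


if __name__ == "__main__":
    # Made-up failures, purely for demonstration.
    example_failures = [
        {"test_name": "example_minor_check", "severity_level": 4},
        {"test_name": "example_critical_check", "severity_level": 1},
    ]
    for failure in prioritize_failures(example_failures):
        print(describe(failure))
```

Sorting by level number works because lower numbers mean higher severity, so a Level 1 failure is always listed before anything else.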
Getting Started: Running the Dashboard
Here’s how you can get the dashboard running locally. For detailed instructions, always refer to the project's README.
Prerequisites:
- Python (version 3.11 or higher recommended)
- Git
Steps:
- Clone the Repository:
git clone https://github.com/tuva-health/tuva_dqi.git
cd tuva_dqi
- Set up a Virtual Environment: (Recommended to avoid dependency conflicts)
# Navigate into the cloned directory (if you aren't already)
cd tuva_dqi
# Create the environment
python -m venv .venv # Use python3 if 'python' doesn't work
# Activate the environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
# .venv\Scripts\activate
- Install Dependencies:
python -m pip install -r requirements.txt
- Run the Application:
python app.py
- Access: Open your web browser and navigate to http://localhost:8080.
(Note: Docker instructions are also available in the repository's README if you prefer containerization.)
Generating the Input Data (Crucial Step!)
The dashboard needs specific data generated by the Tuva dbt package.
Prerequisites:
- A working dbt project.
- Your healthcare data loaded into your data warehouse.
Steps:
- Add the Tuva Package: Ensure your project's packages.yml includes the Tuva package, version 0.14.3 or later:
packages:
  - package: tuva-health/the_tuva_project
    version: [">=0.14.3"] # Or specify a higher version
- Install/Update the Package:
dbt clean
dbt deps
- Run Tuva Models and Tests: Execute the necessary dbt commands to build Tuva models and run the data quality tests. A typical sequence is:
dbt seed --full-refresh # Load necessary seed data
dbt run # Build Tuva models
dbt test # Run data quality tests
- Export the Results: After the dbt test command finishes successfully, two key tables will exist in your data warehouse (likely in your dbt data quality schema, with the Tuva prefix if you specified one):
data_quality__testing_summary
data_quality__exploratory_charts
You need to export the data from these two tables into CSV files, including headers. The method for exporting depends on your specific data warehouse (e.g., using SQL queries, warehouse UI export features, or other tools).
- Upload to Dashboard: Once you have the testing_summary.csv and exploratory_charts.csv files, use the "Import Test Results" feature within the running dashboard application to upload them.
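If your warehouse has a Python driver, the export step can be scripted. The sketch below is one possible approach, not an official Tuva utility: it assumes a DB-API 2.0 style connection, and the helper name `export_table_to_csv` is hypothetical. The demo uses an in-memory SQLite database with made-up placeholder columns as a stand-in for a real warehouse.

```python
import csv
import os
import sqlite3
import tempfile


def export_table_to_csv(conn, table_name, csv_path, batch_size=1000):
    """Write every row of table_name to csv_path, preceded by a header row.

    conn is any DB-API 2.0 connection. table_name is interpolated into the
    SQL text, so pass only trusted names (qualify with a schema if needed).
    Returns the number of data rows written.
    """
    cursor = conn.cursor()
    cursor.execute(f"SELECT * FROM {table_name}")
    # cursor.description holds one entry per column; item 0 is the name.
    header = [column[0] for column in cursor.description]
    row_count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            writer.writerows(rows)
            row_count += len(rows)
    return row_count


if __name__ == "__main__":
    # Stand-in for a warehouse: the columns here are placeholders only.
    demo_conn = sqlite3.connect(":memory:")
    demo_conn.execute(
        "CREATE TABLE data_quality__testing_summary (col_a TEXT, col_b INTEGER)"
    )
    demo_conn.execute(
        "INSERT INTO data_quality__testing_summary VALUES ('example', 1)"
    )
    out_path = os.path.join(tempfile.mkdtemp(), "testing_summary.csv")
    written = export_table_to_csv(
        demo_conn, "data_quality__testing_summary", out_path
    )
    print(f"Wrote {written} row(s) to {out_path}")
```

With a real warehouse you would call the helper once per table, producing testing_summary.csv and exploratory_charts.csv, and replace the SQLite connection with your warehouse driver's connection.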
Running tests just within dbt
DQI has introduced variables and test tags into the Tuva dbt project, so you can enable or target specific DQI tests while staying within dbt. For example:
# Run all the tests associated with the HCC mart, treating warnings as errors
dbt test -s tag:dqi_cms_hccs --warn-error
# Run severity 1 and severity 2 tests only
dbt test -s tag:tuva_dqi_sev_1 -s tag:tuva_dqi_sev_2
# Run all tests that are specific to tuva dqi
dbt test -s tag:dqi
List of mart DQI test tags
dqi_service_categories
dqi_ccsr
dqi_cms_chronic_conditions
dqi_tuva_chronic_conditions
dqi_cms_hccs
dqi_ed_classification
dqi_financial_pmpm
dqi_quality_measures
dqi_readmission
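If you run these tests from a script or CI job, the tag list above can be turned into commands programmatically. The following Python sketch is only an illustration: the helper name `build_mart_test_command` is hypothetical, and it mirrors the `-s tag:...` form used in the examples above rather than any official API.

```python
import shlex

# Mart-level DQI test tags, copied from the list above.
MART_TAGS = [
    "dqi_service_categories",
    "dqi_ccsr",
    "dqi_cms_chronic_conditions",
    "dqi_tuva_chronic_conditions",
    "dqi_cms_hccs",
    "dqi_ed_classification",
    "dqi_financial_pmpm",
    "dqi_quality_measures",
    "dqi_readmission",
]


def build_mart_test_command(tags, warn_error=False):
    """Build a `dbt test` argument list selecting the given mart tags.

    Raises ValueError for any tag that is not in MART_TAGS, which
    catches typos before dbt is invoked.
    """
    unknown = [tag for tag in tags if tag not in MART_TAGS]
    if unknown:
        raise ValueError(f"Unknown DQI mart tag(s): {unknown}")
    command = ["dbt", "test"]
    for tag in tags:
        command += ["-s", f"tag:{tag}"]
    if warn_error:
        command.append("--warn-error")
    return command


if __name__ == "__main__":
    command = build_mart_test_command(["dqi_cms_hccs"], warn_error=True)
    # Print the command; to execute it, pass the list to
    # subprocess.run(command, check=True) from your dbt project directory.
    print(shlex.join(command))
```

Returning a list rather than a single string keeps the command safe to hand to `subprocess.run` without shell quoting concerns.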
How to disable DQI
Set the variable enable_input_layer_testing to false, either on the CLI or in your dbt_project.yml file, to disable this functionality.
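On the CLI, dbt variables are passed with the `--vars` flag, which takes a YAML dictionary string (JSON is valid YAML). The Python sketch below builds such an argument list; the helper name `with_dqi_disabled` is hypothetical, and which dbt subcommand you attach it to depends on your own workflow. In dbt_project.yml, the equivalent would be an entry for the same variable under the `vars:` key.

```python
import json
import shlex


def with_dqi_disabled(dbt_command):
    """Return a copy of dbt_command with DQI input layer testing disabled.

    dbt_command is an argument list such as ["dbt", "run"]. The variable
    name comes from the text above; the value is serialized as JSON,
    which dbt can parse because JSON is a subset of YAML.
    """
    variables = {"enable_input_layer_testing": False}
    return list(dbt_command) + ["--vars", json.dumps(variables)]


if __name__ == "__main__":
    # Print a shell-quoted version of the command for copy and paste.
    print(shlex.join(with_dqi_disabled(["dbt", "run"])))
```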
Development Status
Please note that the Tuva Data Quality Dashboard is currently in alpha/early development. It's designed to work with outputs from the Tuva dbt package version 0.14.3 or later.
Get Involved!
This is an open-source project, and we welcome community feedback and contributions!
- Report Bugs or Suggest Features: Please open an issue on the tuva_dqi GitHub repository.
- Learn More about Tuva: Explore the main Tuva Project dbt package.
- Post in our Slack: Join our Slack and provide feedback on DQI.
We hope this dashboard helps you gain better insights into your healthcare data quality!