Development Style Guides

dbt Style Guide

Model Organization

The Tuva models can fit into three categories:

Category	Description	Folder in dbt	Table Prefix
Staging	Contains models	/models/staging	`_stg_`
Intermediate	Contains models with logic (usually complex) that transforms data	/models/intermediate	`_int_`
Final	Contains the models that produce the final outputs of the package	/models/final	(none)

See the Tuva package repo on GitHub as an example of model naming and organization.

Model File Naming and Coding

Model names should use the naming convention <folder_name>__<table_prefix>_<model_name> (two underscores between folder name and table prefix). For example:
- cms_hcc__int_demographic_factors
  - "cms_hcc" is the name of the mart and the parent folder of the model
  - "int" is the table prefix for models in an intermediate folder
  - "demographic_factors" is the name/description of the model
Schema, table, and column names should be in lower snake case.
- Every word or abbreviation should be separated by an underscore.
Schema names should reflect the parent folder name and be consistent throughout all the models in the parent folder.
Table names should reflect the model name and the table prefix. For example:
- cms_hcc__int_demographic_factors materializes in the database with an alias as _int_demographic_factors.
Limit the use of abbreviations. A contributor will understand current_order_status better than current_os.
Booleans column names should use the suffix "_flag".
Price/revenue fields should be in decimal currency (e.g. 19.99 for $19.99; many app databases store prices as integers in cents).
Avoid using reserved words (such as these) as column names.
Consistency is key! Use the same field names across models where possible.

Model Testing

At a minimum, unique and not_null tests should be applied to the expected primary key of each model.

Model Configurations

All model configurations for a data mart or connector are contained in their own YML within the mart folder, e.g., cms_hcc_models.yml.
The final models should be materialized as table.
Intermediate models can be tables or views based on what’s needed for performance.
Within the YML file, set the schema, alias, tags, and materialization.
Within the SQL file, set the enabled logic. Setting the enabled logic here avoids compilation issues.

YAML and Markdown style guide

Indents should use two spaces.
Items listed in a single .yml or .md file should be sorted alphabetically for ease of finding in larger files.
- It’s also preferable to sort them by category, with the final models listed first. They can be separated by comments.

SQL Style Guide

Do not optimize for fewer lines of code. New lines should be used within reason to produce code that is easily read.

Use leading commas with a space in your select statement.

select
    npi
  , provider_first_name
  , provider_last_name
from provider

When dealing with long case or where clauses, predicates should be indented on a new line. For example:
```
where 
  npi is not null
  and deactivation_flag = 0
  and entity_type_code = 1
```
Use all lowercase unless a specific scenario needs you to do otherwise. This means that keywords, field names, function names, and file names should all be lowercase.
The as keyword should be used when aliasing a field or table.

When aliasing a table, a descriptive name or abbreviation should be used rather than something generic or unrelated. For example:

-- good
select *
from encounter as enc
inner join provider as prov
  on enc.provider_id = prov.provider_id

-- bad
select *
from encounter a
inner join provider b
  on a.provider_id = b.provider_id

Aggregations should be executed as early as possible before joining to another table.
Ordering and grouping by the column name is preferred over using numbers (e.g., group by 1, 2).
Must use union all not union for BigQuery support.
If joining two or more tables, always prefix your column names with the table alias.
Be explicit about your join (i.e. write inner join instead of join). left joins are the most common, right joins often indicate that you should change which table you select from and which one you join to.

Joins should list the left table first (i.e., the table you're joining data to). For example:

select
    medical_claim.*
  , ms_drg.description as ms_drg_description
  , apr_drg.description as apr_drg_description
from medical_claim as claims
left join terminology__ms_drg as ms_drg
  on claims.ms_drg = ms_drg.ms_drg_code
left join terminology__apr_drg as apr_drg
  on claims.apr_drg = apr_drg.apr_drg_code

CTEs

Where performance permits, CTEs should perform a single, logical unit of work.
CTE names should be as verbose as needed to convey what they do.
CTEs with confusing or notable logic should be described with SQL comments as you would with any complex functions and should be located above the CTE.
CTEs fall into two main categories:
- Import: Used to bring data into a model. These are kept relatively simple and refrain from complex operations such as joins and column transformations.
- Logical: Used to perform a logical step with the data that is brought into the model toward the end result.

Example SQL with CTEs

-- Import CTEs
with claims as (
  select * from {{ ref('core__medical_claim') }}
)

, place_of_service as (
  select * from {{ ref('terminology__place_of_service') }}
)

, providers as (
  select * from {{ ref('terminology__provider') }}
)
  
-- Logical CTEs
, snf_claims as (
    select
        claims.claim_id
      , claims.claim_line_number
      , claims.facility_npi
      , place_of_service.place_of_service_description
    from claims
    left join place_of_service
      on claims.place_of_service_code = place_of_service.place_of_service_code
    where place_of_service.place_of_service_code = 31
)

, final as (
    select
        claim_id
      , claim_line_number
      , place_of_service_description
      , provider_organization_name as facility_name
    from snf_claims
    left join providers
      on snf_claims.facility_npi = providers.npi
)

-- Simple select statement
select * from final

Jinja Style Guide

Jinja delimiters should have spaces inside the delimiter between the brackets and your code. For example, {{ this }} instead of {{this}}.
Use whitespace control to make compiled SQL more readable.
An effort should be made for a good balance in readability for both templated and compiled code. However, opt for code readability over compiled SQL readability when needed.
A macro file should be named after the main macro it contains.
A file with more than one macro should follow these conventions:
- There is one macro, which is the main focal point.
- The file is named for the main macro or idea.
- All other macros within the file are only used for the purposes of the main idea and not used by other macros outside of the file.

Use new lines to visually indicate logical blocks of Jinja or to enhance readability. For example:

{%- set orig_cols = adapter.get_columns_in_relation(ref('lab_orders')) %}

{%- set new_cols = dbt_utils.star(
      from=ref('lab_result'),
      except=orig_cols
) %}

{# original columns. {{ col }} is indented here, but choose what will satisfy #} 
{# your own balance for Jinja vs. SQL readability.#} 
{%- for col in orig_cols %}
      {{ col }}
{% endfor %}

-- column difference
{{ new_cols }}

Use new lines within Jinja delimiters and arrays if there are multiple arguments. For example:

{%- dbt_utils.star(
      from=ref('core__lab_result'),
      except=[
            'encounter_id'
          , 'accession_number'
          , 'source_code'
      ],
      prefix='lab_'
) %}

Variables within macros should be descriptive for readability. For example, {%- for col in orig_cols %} rather than {%- for x in y %}.
Use comment syntax to describe complicated blocks of code or dependencies. For example, {#- this is a comment #}.

dbt Style Guide​

Model Organization​

Model File Naming and Coding​

Model Testing​

Model Configurations​

YAML and Markdown style guide​

SQL Style Guide​

CTEs​

Example SQL with CTEs​

Jinja Style Guide​