dbt Snapshot - Data Blog by Paing

- Read about the legacy snapshot configuration used before dbt v1.9 at this [link](https://docs.getdbt.com/reference/resource-configs/snapshots-jinja-legacy). Legacy snapshots were configured mainly in SQL; from v1.9 onward, dbt recommends YAML-based configuration.

### Example - SQL-based config

A legacy snapshot's `config()` must set at least `target_schema`, `unique_key` and `strategy`, plus `updated_at` or `check_cols` depending on the strategy. The values below are illustrative.

```sql
{% snapshot orders_snapshot %}
{# Illustrative values: assumes the source has id and updated_at columns #}
{{ config(
    target_schema='snapshots',
    unique_key='id',
    strategy='timestamp',
    updated_at='updated_at'
) }}
SELECT * FROM {{ source('jaffle_shop', 'orders') }}
{% endsnapshot %}
```

- Snapshots usually live in a `snapshots` folder in the project root directory.
- That location is declared in `dbt_project.yml` as `snapshot-paths: ["snapshots"]` (this is also dbt's default).
- If you write a snapshot query outside that location, dbt will raise an error.

## Declaring snapshot configs

- Snapshot configs can be declared in the three places below. They are listed in order of precedence: when the same config is set in more than one place, the lower number wins.

1. In the `config` block inside the snapshot's SQL file.
2. In the `config` resource property of the `.yml` properties file for that snapshot.
3. In `dbt_project.yml`, under the `snapshots:` key. What matters here is including the resource path (project name, then folder). For example:

```yml
# dbt_project.yml
snapshots:
  <project_name>:
    <folder_name>:
      +target_schema: <string>
      +target_database: <string>
      +unique_key: <column_name>
      +strategy: timestamp | check
      +updated_at: updated_at            # timestamp strategy only
      +check_cols: [col1, col2] | all    # check strategy only
```

## Snapshot Strategies

- A snapshot strategy tells dbt how to detect that a row has changed. There are two strategies:

1. **Timestamp strategy** - uses a timestamp column to identify changes in the source data. This is the strategy dbt prefers and the most efficient one.
2. **Check strategy** - compares a declared list of columns between the current source data and the stored history to decide which rows have changed.
    - The `check_cols` parameter is required.
    - Useful for tables that have no reliable `updated_at` column.

### Detecting changes

- Snapshots are run with the `dbt snapshot` command.
- On the first run, dbt:
    - creates the initial snapshot table,
    - with all columns from the `SELECT` statement (plus dbt's metadata columns such as `dbt_valid_from` and `dbt_valid_to`),
    - and sets `dbt_valid_to = NULL` on every row.
- On later runs, dbt:
    - checks for changed records,
    - updates `dbt_valid_to` on the previous version of each changed record,
    - inserts new records, including the new version of each changed record, with `dbt_valid_to = NULL`.

#### invalidate_hard_deletes

- This option controls what happens when a record disappears from the source table.
- A snapshot detects changes through `updated_at` or `check_cols`. Neither of these can tell it that a row has been hard-deleted from the source table entirely, and this option decides how such rows are handled.

**Case 1 - invalidate_hard_deletes=False** (default)

- Suppose the user below is deleted from the source table:

| id | first_name | last_name | email | dbt_valid_to |
| --- | ---------- | --------- | -------------- | ------------ |
| 1 | Anna | Smith | [email protected] | null |

- With the default **False**, dbt never learns that the row was deleted. It keeps the row as active, and `dbt_valid_to` stays NULL.

**Case 2 - invalidate_hard_deletes=True**

- With **True**, dbt marks the row as invalidated and sets its `dbt_valid_to` to the time of the snapshot run:

| id | first_name | last_name | email | dbt_valid_to |
| --- | ---------- | --------- | -------------- | ------------------- |
| 1 | Anna | Smith | [email protected] | 2025-10-05 12:00:00 |
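The options from this section can be combined in one legacy SQL-based snapshot for the user table in the cases above. The sketch below is a minimal example, and everything named in it is hypothetical: the snapshot `users_snapshot`, the source `source('app', 'users')`, and the column list are assumptions modelled on the example table.

It uses the check strategy, on the assumption that the table has no reliable `updated_at`. It also sets `invalidate_hard_deletes=True`, so a deleted user ends up with a non-NULL `dbt_valid_to` as in Case 2.

```sql
{% snapshot users_snapshot %}
{# Hypothetical snapshot: the source and column names are assumptions #}
{{
    config(
      target_schema='snapshots',
      unique_key='id',
      strategy='check',
      check_cols=['first_name', 'last_name', 'email'],
      invalidate_hard_deletes=True
    )
}}
{# Select only the columns the snapshot should track #}
SELECT id, first_name, last_name, email
FROM {{ source('app', 'users') }}
{% endsnapshot %}
```

Running `dbt snapshot --select users_snapshot` would build or update just this snapshot.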
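Reading the snapshot downstream shows why the setting matters. The query below is a sketch for a dbt model or analysis. It assumes a hypothetical snapshot named `users_snapshot` that tracks the user table from the two cases; snapshots can be referenced with `ref()` like models.

Filtering on `dbt_valid_to IS NULL` keeps only current rows. Under Case 1 the deleted user (id 1) would still be returned. Under Case 2 that row is filtered out because its `dbt_valid_to` is set.

```sql
{# Hypothetical: assumes a snapshot named users_snapshot exists in the project #}
{# Current (active) version of each user: dbt leaves dbt_valid_to NULL on these rows #}
SELECT
    id,
    first_name,
    last_name,
    email,
    dbt_valid_from,
    dbt_valid_to
FROM {{ ref('users_snapshot') }}
WHERE dbt_valid_to IS NULL
```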