/ blog

# Data Warehousing

Read snowflake docs

Read only data separate from operational data - not updated in place.

Historically longer than operational data - 5 to 10 years or whatever

# Fact vs Dimension

Fact is something concrete like sales, dimension is something adjacent like customer or time.

Star schema: one fact table, multiple dimension tables

Snowflake schema: one fact table, multiple levels of dimension tables

Constellation schema: multiple fact tables linked together by multiple levels of dimension tables

# Data Cube Operations

Make n-d lattice of cuboids to analyze data at a certain granularity.

ex: 3d

year, item, color

a = (2010, bikes, red)
b = (210, bikes, *) // one dimension up via aggregation (roll up)

Data cube materialization: pre-computing cuboids is expensive, on-demand is slow when you need data. Partial materialization is ideal but have to chose what cuboids and cells to pre-compute (heuristics for decision).

Iceberg cube: minimum support for frequent queries