Snowflake Warehouse Sizing: XS to 6XL – A Field Guide to Scaling Up and Out

Every data engineer faces the same question sooner or later: “What size warehouse should I use in Snowflake?”

From XS to 6XL, Snowflake offers a buffet of warehouse sizes, but picking the right one isn’t just about raw horsepower – it’s about cost efficiency, concurrency, and performance tuning. Get it wrong, and you either waste money or frustrate users. Get it right, and your pipelines and dashboards hum smoothly.


1. What Snowflake Warehouses Really Are

A warehouse in Snowflake isn't hardware you manage – it's a named cluster of compute resources provisioned on cloud infrastructure, spun up and torn down on demand.

  • You pay for compute per-second of usage (with a 60-second minimum).
  • Each size (XS through 6XL) doubles the compute resources – and the credits billed per hour – of the previous size.
  • Scaling isn’t just vertical (bigger warehouses), but also horizontal (multi-cluster warehouses).

So the decision is both art and science – balancing your workload type, concurrency, and budget.
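
Everything about a warehouse, size included, is just configuration you set in SQL. Here's a minimal sketch, assuming a hypothetical dev warehouse named dev_wh:

```sql
-- Hypothetical dev warehouse: suspends after 60s idle, resumes on demand
CREATE WAREHOUSE IF NOT EXISTS dev_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60           -- seconds of inactivity before suspending
  AUTO_RESUME = TRUE          -- wake automatically on the next query
  INITIALLY_SUSPENDED = TRUE; -- don't start billing until first use
```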


2. The Sizing Spectrum: XS → 6XL

Here’s the progression:

  • XS (Extra Small) → Perfect for lightweight transformations, dev/testing, and one-off queries.
  • S (Small) → Entry-level production jobs, light BI dashboards, ad hoc exploration.
  • M (Medium) → Mid-size ETL/ELT jobs, moderately concurrent dashboards, model training prep.
  • L (Large) → Heavier transformations, data science workloads, multiple BI tools hammering at once.
  • XL & 2XL → Enterprise-scale batch jobs, ML scoring at scale, and high-volume BI workloads.
  • 3XL to 6XL → Rare territory – used for massive joins, petabyte-scale crunching, or extreme concurrency.

💡 Rule of thumb: If a query runs slowly on XS, it won’t magically become efficient on XL. Fix bad SQL first before scaling compute.
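
To put numbers on the progression: on standard warehouses, each step doubles the hourly credit rate – XS bills 1 credit per hour, S bills 2, M bills 4, and so on up to 512 for a 6XL. A quick worked example with per-second billing: a 10-minute job on a Medium burns about 4 × 10/60 ≈ 0.67 credits, while the same job on a Large burns 8 × 10/60 ≈ 1.33 credits unless the bigger size cuts the runtime roughly in half.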


3. Scaling Up vs. Scaling Out

Scaling Up → Increase the warehouse size.

  • Use when queries are hitting compute limits (e.g., large joins, complex aggregations).
  • Cuts runtimes for compute-bound queries, but each size step doubles the hourly credit rate – so it only breaks even if runtime roughly halves (resize shown below).
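
A minimal sketch, assuming a hypothetical warehouse named etl_wh. Resizing takes effect for new queries; queries already running finish on the old size:

```sql
-- Resize an existing warehouse in place
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'XLARGE';
```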

Scaling Out → Enable multi-cluster warehouses.

  • Use when concurrency is the bottleneck.
  • Instead of making queries faster, you allow more users to run queries simultaneously without queueing.

💡 Example: A dashboard hammered by 200 users will benefit more from a multi-cluster Medium than a single XL.
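
A minimal sketch of that multi-cluster setup, assuming a hypothetical bi_wh warehouse (multi-cluster warehouses require Enterprise Edition or higher):

```sql
-- Hypothetical BI warehouse: one cluster at rest, up to four under load
ALTER WAREHOUSE bi_wh SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY = 'STANDARD'; -- spins up clusters eagerly; 'ECONOMY' favors queueing to save credits
```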


4. Field Guide to Common Scenarios

  1. ETL Pipelines (Nightly Loads)
    • Start with Medium or Large.
    • Scale up temporarily if jobs spill to disk or run past SLAs (see the sketch after this list).
  2. Ad-hoc Analytics
    • Small or Medium works fine.
    • Concurrency issues? Try multi-cluster scaling.
  3. High-Concurrency BI Dashboards
    • Multi-cluster Medium or Large is usually cost-effective.
    • Scale out before scaling up.
  4. Machine Learning Feature Engineering
    • Large to 2XL, depending on data size.
    • Consider temporary scaling up, then scale back down.
  5. Executive Dashboards During Quarterly Reviews
    • Auto-suspend + auto-resume with multi-cluster.
    • Keeps costs under control while handling spikes.
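
As a concrete version of scenario 1, here's a minimal sketch of the temporary scale-up pattern, assuming a hypothetical etl_wh warehouse that's Medium by default:

```sql
-- Bump the size just before the heavy nightly load...
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'XLARGE';

-- ... run the nightly load here ...

-- ...then drop back down so the higher credit rate doesn't linger
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'MEDIUM';
```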

5. Best Practices for Cost and Performance

  • Auto-suspend warehouses when idle – no need to burn compute dollars.
  • Auto-resume on demand – so users aren’t blocked.
  • Start small, then scale pragmatically as workloads prove they need more.
  • Use Query Profile to identify inefficiencies – don’t just throw bigger warehouses at bad SQL.
  • Test workloads across multiple sizes – with per-second billing, a Medium running for three minutes is cheaper than a Large running for two, since the Large burns credits twice as fast (see the query below).
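
To ground those tests in real numbers, you can pull actual credit burn per warehouse from Snowflake's metering view (this assumes your role has access to the SNOWFLAKE.ACCOUNT_USAGE share):

```sql
-- Credits burned per warehouse over the last 7 days
-- (ACCOUNT_USAGE views can lag by up to a few hours)
SELECT warehouse_name,
       SUM(credits_used) AS credits_last_7d
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits_last_7d DESC;
```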

6. Wrapping Up

Snowflake gives you flexibility, but it’s up to your team to pick wisely between XS and 6XL. The trick isn’t just about raw size – it’s about matching workloads to the right warehouse and knowing when to scale up or scale out.

Remember: Big warehouses solve some problems, but smart warehouses solve most problems.
