Design an ETL Pipeline
Design a complete ETL pipeline architecture with transformation logic, error handling, and monitoring.
The Prompt
Design an ETL pipeline for the following data flow. Provide:

1. Pipeline architecture diagram (text-based)
2. Extract: source systems, extraction method, frequency
3. Transform: cleaning, joining, business logic rules
4. Load: target destination, load strategy (full/incremental/upsert)
5. Error handling: what to do with malformed or missing data
6. Monitoring: what to alert on
7. Technology recommendations with trade-off notes
8. Code skeleton for the transform step

Data pipeline details:
- Source: [WHERE DATA COMES FROM]
- Target: [WHERE DATA SHOULD GO]
- Transformation needed: [WHAT NEEDS TO HAPPEN TO THE DATA]
- Frequency: [BATCH / NEAR-REAL-TIME / STREAMING]
- Volume: [ROWS PER DAY / GB]
- Tech constraints: [EXISTING STACK OR PREFERENCES]
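For item 8, a minimal sketch of what a transform-step skeleton might look like, assuming a batch job that receives raw rows as dicts. The function name, field names, and cleaning rules are illustrative placeholders, not part of the prompt; the point is the shape: clean what you can, quarantine what you cannot (item 5), never silently drop data.

```python
from typing import Iterable

def transform(rows: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Clean a batch of raw rows; quarantine malformed ones instead of failing."""
    clean, quarantined = [], []
    for row in rows:
        try:
            out = {
                # Cleaning: normalize types and trim/casefold strings.
                "user_id": int(row["user_id"]),
                "email": row["email"].strip().lower(),
                # Business rule: record a missing signup date as None rather than guessing.
                "signup_date": row.get("signup_date") or None,
            }
            clean.append(out)
        except (KeyError, TypeError, ValueError) as exc:
            # Error handling: keep the bad row plus the failure reason for later review.
            quarantined.append({"row": row, "error": str(exc)})
    return clean, quarantined
```

In a real pipeline the quarantined list would be written to a dead-letter table or bucket so malformed records can be inspected and replayed.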
Example Output
- Architecture: PostgreSQL → dbt transform → BigQuery → Looker
- Extraction: daily cron job using pg_dump → GCS staging bucket
- Transform: 4 dbt models with incremental materialization, handling SCD Type 2 for user attributes
- Load: BigQuery MERGE statement for upsert
- Alert on: row count drop >10%, transform failures, load latency >30 min
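The row-count alert in the example can be expressed as a small helper. A sketch assuming daily row counts are available from the load step; the function name and the way counts are obtained are illustrative, only the >10% threshold comes from the example:

```python
def should_alert(previous_count: int, current_count: int, max_drop: float = 0.10) -> bool:
    """Alert when today's row count drops more than max_drop vs. the previous load."""
    if previous_count <= 0:
        # No baseline: treat an empty or missing previous load as alert-worthy.
        return True
    drop = (previous_count - current_count) / previous_count
    return drop > max_drop
```

A scheduler (Airflow, cron) would run this after each load and page on a True result; the other two alerts (transform failures, load latency) come from the orchestrator's own task state and timing metrics.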
FAQ
Which AI model is best for the Design an ETL Pipeline prompt?
Claude Sonnet 4 — excellent at data engineering architecture and code scaffolding.
How do I use the Design an ETL Pipeline prompt?
Copy the prompt, replace the [BRACKETED] placeholders with your specific information, and paste into your preferred AI assistant (ChatGPT, Claude, Gemini, etc.).