Why AI Works Well for Data Engineering
Data engineering has an unusually high ratio of structured, pattern-based work to creative decision-making. Pipeline boilerplate, schema migrations, test coverage, documentation — these tasks have clear structure that AI handles well. The work that still requires your judgment: choosing the right architecture, understanding data semantics, and deciding when eventual consistency is acceptable.
The result: AI is most valuable for data engineers in the spaces between the hard problems. Use it to compress the time spent on everything that isn't system design and data modeling, and you'll be dramatically more productive.
Pipeline Design & Architecture Prompts
- Incremental load strategy: "Design an incremental load strategy for a table with these properties: [row count, update pattern, has updated_at column yes/no, partitioned yes/no, source system]. Compare full refresh vs append vs merge/upsert. Account for late-arriving data up to [X hours]. Recommend a strategy and write the SQL or config skeleton."
- DAG structure review: "Review this Airflow/Prefect/dbt DAG structure [paste config or describe]. Identify: tasks that could run in parallel, missing dependency declarations, tasks that would benefit from retries vs fail-fast, and any anti-patterns. Suggest a refactored structure with expected runtime improvement."
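To make the incremental-load prompt above concrete, here is a minimal sketch of the merge/upsert strategy it asks the AI to compare, using SQLite as a stand-in for a warehouse engine. The table, columns, and batch contents are invented for illustration; the late-arriving-data window is handled by how you select the batch, not shown here.

```python
import sqlite3

# Target table with existing rows; SQLite stands in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE target (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT);
INSERT INTO target VALUES (1, 10.0, '2024-06-01'), (2, 20.0, '2024-06-01');
""")

# Incoming batch: one updated row, one new row. In practice you would
# select source rows where updated_at >= last watermark minus the
# late-arrival window (e.g. X hours).
batch = [(2, 25.0, '2024-06-02'), (3, 30.0, '2024-06-02')]

# Merge/upsert: insert new keys, update existing ones in a single pass.
conn.executemany(
    """
    INSERT INTO target (id, amount, updated_at) VALUES (?, ?, ?)
    ON CONFLICT(id) DO UPDATE SET
        amount = excluded.amount,
        updated_at = excluded.updated_at
    """,
    batch,
)

rows = conn.execute("SELECT id, amount FROM target ORDER BY id").fetchall()
# rows -> [(1, 10.0), (2, 25.0), (3, 30.0)]
```

The same shape maps onto `MERGE` in Snowflake/BigQuery or `INSERT ... ON CONFLICT` in Postgres; full refresh and append are the degenerate cases where you skip the conflict clause entirely.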
SQL & dbt Prompts
Always include your database engine, approximate row counts, and whether the table is partitioned. A query optimization for BigQuery looks completely different from one for Redshift or Postgres. Without this context, AI gives generic advice that may make things worse.
- SQL optimization: "You are a [Snowflake/BigQuery/Postgres] query optimization expert. This query runs in [X seconds] on a table with [N rows], partitioned by [column]. [Paste EXPLAIN output if available]. Identify: the most expensive operation, whether indexes or clustering keys are being used, and 3 specific rewrites that would improve performance. Show the rewritten query for the highest-impact change."
- dbt model review: "Review this dbt model for: naming conventions (following dbt best practices), correct use of refs vs sources, missing tests (not null, unique, accepted values, relationships), and whether the grain is clearly defined. Suggest tests to add and flag any model that mixes grain levels."
- Window function design: "Write a SQL query using window functions to calculate [metric, e.g. 7-day rolling average, session boundaries, first/last event per user]. Table: [describe schema]. Engine: [Snowflake/BigQuery/Postgres]. Include comments explaining the frame clause."
- Slow query diagnosis: "Here is a query taking [X seconds] on [engine] with [N rows]: [paste query]. Walk me through a diagnostic process to find the bottleneck. What would you check first? What does each step tell you? End with the most likely culprit and the fix."
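The window-function prompt above asks for comments explaining the frame clause; here is a small sketch of what a good answer looks like for a trailing rolling average, run against SQLite as a stand-in engine. The table and metric are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_orders (day TEXT, orders INTEGER);
INSERT INTO daily_orders VALUES
  ('2024-06-01', 10), ('2024-06-02', 20), ('2024-06-03', 30);
""")

rows = conn.execute("""
    SELECT day,
           -- Frame: the current row plus the 6 preceding rows, i.e. a
           -- 7-ROW trailing window, not 7 calendar days. For a true
           -- calendar window you need a date spine (no gaps) or a
           -- RANGE frame on an engine that supports interval ranges.
           AVG(orders) OVER (
               ORDER BY day
               ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
           ) AS rolling_avg
    FROM daily_orders
    ORDER BY day
""").fetchall()
# rows -> [('2024-06-01', 10.0), ('2024-06-02', 15.0), ('2024-06-03', 20.0)]
```

Note how the frame widens until it reaches 7 rows: early rows average over whatever history exists, which is usually the behavior you want but is worth calling out in the prompt.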
Data Quality & Testing Prompts
- Data contract draft: "Write a data contract for this dataset: [name, owner team, consumers, schema]. Include: schema definition with types and nullable status, SLA (freshness, availability), quality guarantees, breaking change policy, and consumer notification process. Format: YAML data contract spec."
- Anomaly detection rules: "Design statistical anomaly detection rules for [metric/table]. The metric is [description], typical daily range is [X-Y], with [seasonality pattern]. Write rules for: volume anomalies (too many/few rows), distribution shifts, and null rate spikes. Include the SQL or Python logic for each rule."
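For the anomaly-rules prompt above, this is the kind of Python logic a good answer should produce. The thresholds (3 sigma for volume, 5% null rate) are illustrative assumptions, not recommendations; tune them to your metric's actual distribution.

```python
from statistics import mean, stdev

def volume_anomaly(history, today, z_threshold=3.0):
    """Flag today's row count if it sits more than z_threshold standard
    deviations from the historical mean (a simple z-score rule)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu  # flat history: any deviation is anomalous
    return abs(today - mu) / sigma > z_threshold

def null_rate_spike(null_count, row_count, max_rate=0.05):
    """Flag the load if the null rate exceeds the agreed threshold."""
    return row_count > 0 and null_count / row_count > max_rate

normal = volume_anomaly([100, 98, 103, 101, 99], 102)  # within 3 sigma
spike = volume_anomaly([100, 98, 103, 101, 99], 180)   # far outside
nulls = null_rate_spike(12, 100)                       # 12% > 5%
```

A plain z-score rule ignores the seasonality the prompt mentions; for seasonal metrics, compare against the same weekday's history or a decomposed baseline instead of the raw mean.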
Schema Design Prompts
- Dimensional model: "You are a data warehouse architect. Design a dimensional model for [business domain, e.g. e-commerce orders]. Identify: fact tables (with grain), dimension tables (with type 1/2/3 SCD recommendation), and the relationships between them. Justify each SCD type choice based on the analytical use cases."
- Schema migration plan: "Write a migration plan for this schema change: [describe change, e.g. splitting a column, adding a nullable field, changing a type]. The table has [N rows], [downstream consumers list]. Include: the migration SQL, rollback SQL, estimated runtime, and the zero-downtime deployment sequence if possible."
- Naming convention review: "Review these table and column names for consistency and clarity: [list names]. Apply these conventions: [your conventions]. Flag: abbreviations that should be expanded, inconsistent casing, names that obscure the data's meaning, and reserved words. Suggest replacements."
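The SCD recommendations the dimensional-model prompt asks for come down to concrete update logic. Here is a hedged sketch of a type 2 slowly changing dimension update (close the old row, insert a new current row), using SQLite and invented table/column names; real implementations usually add surrogate keys and batch this as a single MERGE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
  customer_id INTEGER,
  city TEXT,
  valid_from TEXT,
  valid_to TEXT,        -- NULL means the row is still current
  is_current INTEGER
);
INSERT INTO dim_customer VALUES (1, 'Berlin', '2024-01-01', NULL, 1);
""")

def scd2_update(conn, customer_id, new_city, load_date):
    # Only version the row if the tracked attribute actually changed.
    row = conn.execute(
        "SELECT city FROM dim_customer WHERE customer_id=? AND is_current=1",
        (customer_id,)).fetchone()
    if row and row[0] == new_city:
        return  # no change: type 2 keeps the existing current row
    # Close out the current row...
    conn.execute(
        "UPDATE dim_customer SET valid_to=?, is_current=0 "
        "WHERE customer_id=? AND is_current=1",
        (load_date, customer_id))
    # ...and insert the new current version.
    conn.execute(
        "INSERT INTO dim_customer VALUES (?, ?, ?, NULL, 1)",
        (customer_id, new_city, load_date))

scd2_update(conn, 1, 'Munich', '2024-06-01')
rows = conn.execute(
    "SELECT city, is_current FROM dim_customer ORDER BY valid_from"
).fetchall()
# rows -> [('Berlin', 0), ('Munich', 1)]
```

Type 1 would simply overwrite `city` in place; the prompt's ask to justify each choice comes down to whether analysts need the history that type 2 preserves.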
Data Documentation Prompts
- dbt model description: "Write a dbt model description for this model: [paste SQL or describe]. Include: what this model represents (grain, scope), key columns and their business meaning, how it's used downstream, and any known limitations or gotchas. Format: YAML schema.yml entry."
- Data dictionary entry: "Write a data dictionary entry for [column/metric name]. Include: definition (what it measures, not just the name), calculation logic if derived, source system, update frequency, nullability explanation, and example values. Audience: business analysts who will query this directly."
- Lineage explanation: "Explain the data lineage for [metric/report] to a non-technical stakeholder. Start from the source system, walk through each transformation step in plain language, and end with how the final number is calculated. Highlight any steps where data quality issues could affect the output."
Generate expert data engineering prompts instantly
GODLE's data engineering role includes expert templates for pipelines, SQL, quality, and documentation.
⚡ Try Data Engineering Prompts · 100% free · No signup · Works with ChatGPT and Claude