Natural Language to SQL: How Text-to-SQL Works
Natural language to SQL (text-to-SQL) translates human questions into database queries. Learn how it works, its limitations, and why semantic approaches are more reliable.
Natural language to SQL (also called text-to-SQL or NL2SQL) is a technology that translates questions asked in natural language into SQL queries that can be executed against databases. When you ask "What were sales last month?", a text-to-SQL system generates something like:
SELECT SUM(amount)
FROM sales
WHERE sale_date >= '2024-01-01'
AND sale_date < '2024-02-01'
This translation is what powers many conversational analytics tools.
How Text-to-SQL Works
Step 1: Parse the Question
The system breaks down the natural language:
- Subject: What is being asked about? ("sales")
- Action: What operation? ("sum")
- Filter: What constraints? ("last month")
- Output: What format? (single value)
Step 2: Schema Mapping
Map natural language terms to database elements:
- "Sales" →
salestable - "Amount" →
amountcolumn (guessed from context) - "Last month" → date calculation
Step 3: Query Construction
Build SQL that implements the intent:
- SELECT clause with appropriate aggregation
- FROM clause with correct tables
- WHERE clause with filters
- JOINs if multiple tables needed
Step 4: Validation
Some systems validate the generated query:
- Is the SQL syntactically correct?
- Do the referenced tables/columns exist?
- Are the joins valid?
Step 5: Execution and Response
Execute the SQL and format results for the user.
Why Text-to-SQL Is Hard
Ambiguity
"Show me sales" could mean:
- Total sales amount
- Number of sales
- Sales records
- Sales by region
- Sales for a specific period
The AI must pick an interpretation.
Schema Complexity
Real databases have:
- Hundreds of tables
- Unclear naming conventions
- Multiple possible join paths
- Undocumented business logic
Business Logic
SQL alone doesn't encode business rules:
- What's "active" vs "churned"?
- How is revenue recognized?
- What exclusions apply?
Language Variation
Users ask the same question many ways:
- "Revenue last quarter"
- "Q4 sales"
- "What did we make in the last three months?"
- "Show income for Oct-Dec"
Accuracy Challenges
Text-to-SQL accuracy is measured on benchmarks like Spider and WikiSQL. State-of-the-art systems achieve:
- 60-70% on complex multi-table queries
- 75-85% on simpler single-table queries
- Lower on real-world business schemas
Even 80% accuracy means 1 in 5 queries is wrong - often without obvious indication.
The Semantic Alternative
Instead of generating SQL from scratch, semantic approaches:
- Map questions to semantic concepts: "Revenue" → defined Revenue metric
- Query the semantic layer: Use the metric's certified definition
- Let the semantic layer generate SQL: Correct SQL is determined by the metric definition
This approach achieves 95%+ accuracy for supported queries because the AI isn't guessing - it's using explicit definitions.
When Text-to-SQL Works
- Simple schemas with clear naming
- Single-table queries
- Basic aggregations and filters
- Users who can validate results
When Text-to-SQL Struggles
- Complex business schemas
- Multi-table joins
- Business logic and rules
- High-stakes decisions
- Users who can't validate
For trustworthy analytics, semantic approaches are generally preferable to raw text-to-SQL.
Questions
Academic benchmarks show 70-85% accuracy on standardized tests. Real-world performance on complex business schemas is often lower (50-70%). Accuracy varies significantly based on query complexity, schema quality, and how well the AI understands the domain.