# How Semantic Lead Matching Works
HomeStar uses semantic search and vector embeddings to automatically route leads to the most appropriate agent. This document explains how the system works and why it’s more effective than traditional keyword-based routing.
## The Problem with Traditional Lead Routing

### Keyword-Based Matching Limitations
Traditional real estate lead routing relies on exact keyword matches:
- Client mentions “downtown condo” → Route to agent with “condo” specialty
- Client mentions “Eagle” → Route to agent serving “Eagle”
- Client mentions nothing specific → Route randomly or to primary contact
This approach fails when:
- Synonyms differ — “Luxury home” vs “high-end property” vs “executive residence”
- Context matters — “First home” (buyer) vs “first rental property” (investor)
- Implicit meaning — “Downsizing after kids left” implies senior/empty-nester
- Multiple factors — “Family relocating to Eagle for tech job, need good schools”
## How Semantic Matching Works

### Overview of the Process

1. Agent Profile Creation — Agent describes ideal clients in natural language
2. Embedding Generation — System converts preferences to numerical vectors
3. Lead Submission — Client submits inquiry with message
4. Lead Analysis — System converts inquiry to a vector
5. Similarity Calculation — System measures the “distance” between the lead and each agent
6. Routing Decision — Lead goes to the agent with the highest match score
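The six steps above can be sketched end to end in Python. This is a minimal illustration, not the production code: `embed()` here is a toy stand-in that hashes words into buckets (so it is not actually semantic), but the routing mechanics around it are the same.

```python
# Hypothetical sketch of the routing flow. embed() stands in for the
# real embedding service; here it just hashes words into a small vector.
import hashlib
import math

DIMS = 16  # the production system uses 768 dimensions

def embed(text: str) -> list[float]:
    """Toy embedding: hash each word into a bucket (illustration only)."""
    vec = [0.0] * DIMS
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIMS
        vec[bucket] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def route_lead(inquiry: str, agents: dict[str, str]) -> str:
    """Steps 3-6: embed the inquiry, score each agent, pick the best."""
    lead_vec = embed(inquiry)
    return max(agents, key=lambda name: cosine(lead_vec, embed(agents[name])))

# Steps 1-2: agent preferences in natural language, embedded on save
agents = {
    "Alice": "luxury homes in Eagle with mountain views",
    "Bob": "first-time buyers, entry-level homes",
}
print(route_lead("luxury home in Eagle with mountain views", agents))
```

Swapping the toy `embed()` for a real transformer embedding is what turns this word-overlap scorer into genuine semantic matching.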
### Vector Embeddings Explained

A vector embedding is a numerical representation of meaning. Instead of matching exact words, the system captures semantic concepts.
Example conversion:

```text
Text:   "I'm looking for a luxury home in Eagle with mountain views"
Vector: [0.23, -0.45, 0.67, ..., 0.12]  (768 numbers)
```

Each dimension represents abstract semantic features:

- Property prestige level
- Geographic preferences
- Natural feature interests
- Lifestyle indicators
- Budget signals

Agent preference:

```text
Text:   "I specialize in high-end properties in Eagle and Meridian, particularly homes with mountain views"
Vector: [0.25, -0.42, 0.69, ..., 0.15]  (768 numbers)
```

The agent’s vector is very similar to the lead’s (high match score). The system measures the mathematical “distance” between these vectors. Similar meaning = close distance = high match score.
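As a concrete (made-up) illustration of the distance calculation, here is cosine similarity over short toy vectors; real embeddings have 768 dimensions, and these 4-dimensional values are invented for the example.

```python
# Cosine similarity between small illustrative vectors (values invented).
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

lead  = [0.23, -0.45, 0.67, 0.12]   # lead inquiry embedding (toy)
agent = [0.25, -0.42, 0.69, 0.15]   # well-matched agent (toy)
other = [-0.60, 0.30, -0.10, 0.80]  # unrelated specialty (toy)

print(round(cosine_similarity(lead, agent), 3))  # close to 1.0
print(round(cosine_similarity(lead, other), 3))  # much lower
```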
## Why Semantic Matching Outperforms Keywords

### Synonym Recognition

Lead inquiry: “Looking for my first house, budget around $300k”
Keyword matching:
- ✗ No exact match for “first-time buyer”
- ✗ Might route to wrong agent
Semantic matching:
- ✓ Recognizes “first house” ≈ “first-time buyer”
- ✓ Routes to agent specializing in entry-level buyers
### Contextual Understanding

Lead inquiry: “We’re empty-nesters looking to downsize from our 4-bedroom”
Keyword matching:
- ✗ Matches “downsize” but misses demographic context
- ✗ Might route to any downsizing specialist
Semantic matching:
- ✓ Recognizes “empty-nester” context
- ✓ Routes to agent specializing in seniors/downsizing
- ✓ Considers property type transition (large → smaller)
### Multi-Factor Matching

Lead inquiry: “Military family relocating to Boise area, need to close quickly, interested in Eagle or Meridian neighborhoods with good schools”
Keyword matching:
- ✗ Matches “Eagle” and “Meridian” but misses nuance
- ✗ Can’t weigh multiple factors
Semantic matching:
- ✓ Weights: Military (high), Relocation (high), Timeline urgency (medium), School quality (high), Geographic flexibility (Eagle/Meridian both acceptable)
- ✓ Routes to agent with military + relocation experience in those areas
## The Technology Stack

### Natural Language Processing (NLP)

HomeStar uses transformer-based language models to generate embeddings. These models are trained on massive text datasets to understand:
- Word relationships (synonyms, antonyms, hierarchies)
- Contextual meaning (same word, different meanings based on context)
- Implicit signals (sentiment, urgency, sophistication level)
### Vector Similarity Calculation

The system uses cosine similarity to measure how closely two vectors align:
- Score 1.0 — Identical meaning (perfect match)
- Score 0.8-1.0 — Very similar (excellent match)
- Score 0.6-0.8 — Somewhat similar (acceptable match)
- Score < 0.6 — Different concepts (poor match)
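The bands above translate into a small interpretation helper; this is a sketch using the thresholds listed here, and the production cut-offs may differ.

```python
# Map a cosine similarity score to the interpretation bands above.
def interpret(score: float) -> str:
    if score >= 0.8:
        return "excellent match"
    if score >= 0.6:
        return "acceptable match"
    return "poor match"

print(interpret(0.92))  # excellent match
print(interpret(0.74))  # acceptable match
print(interpret(0.45))  # poor match
```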
Example scores:
| Lead Inquiry | Agent Specialty | Score | Interpretation |
|---|---|---|---|
| "Luxury condo downtown" | "High-end urban properties" | 0.92 | Excellent match |
| "Luxury condo downtown" | "Luxury homes in suburbs" | 0.74 | Partial match (luxury yes, location no) |
| "Luxury condo downtown" | "First-time buyer specialist" | 0.45 | Poor match |
## Advantages Over Other Approaches

### vs. Manual Round-Robin

Round-robin problems:
- Ignores agent expertise
- Wastes leads on poor fits
- Frustrates both agents and clients
Semantic matching benefits:
- Expertise-based routing
- Higher conversion rates
- Better client experience
### vs. Simple Rules-Based Routing

Rules-based limitations:

```text
IF inquiry contains "luxury" AND price > $500k THEN route to luxury agent
ELSE IF inquiry contains "first time" THEN route to first-time buyer agent
ELSE route to primary contact
```

Problems:
- Brittle (fails on unexpected phrasing)
- Maintenance burden (rules grow complex)
- Can’t handle nuance
Semantic matching:
- Handles any phrasing
- No rule maintenance required
- Captures subtle distinctions
### vs. Keyword Weighting

Weighted keywords approach:

```text
luxury:     +10 points
first-time: +8 points
investor:   +12 points
```

Problems:
- Still keyword-dependent
- Weights are arbitrary and hard to tune
- No context understanding
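A minimal version of such a weighted-keyword scorer (using the illustrative weights from the text) makes the brittleness concrete: any synonym the rule author did not anticipate scores zero.

```python
# Weighted-keyword scoring, as sketched in the text above.
WEIGHTS = {"luxury": 10, "first-time": 8, "investor": 12}

def keyword_score(inquiry: str) -> int:
    """Sum the points for every configured keyword found in the inquiry."""
    text = inquiry.lower()
    return sum(points for keyword, points in WEIGHTS.items() if keyword in text)

print(keyword_score("Looking for a luxury condo"))    # 10
print(keyword_score("Looking for a high-end condo"))  # 0 -- synonym missed
```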
Semantic matching:
- Learns context automatically
- Adapts to language evolution
- Weights factors naturally
## How Agents Benefit

### Better Lead Quality

Agents receive leads that:
- Match their actual expertise
- Fit their preferred price points
- Are in their service areas
- Involve client types they excel with
Result: Higher conversion rates, less wasted time
### Implicit Preference Recognition

Agents don’t need to predict every possible phrasing. The system understands:
- “Starter home” = “First-time buyer”
- “Move-up buyer” = “Upsizing”
- “Executive property” = “Luxury”
- “Investment property” = “Investor”
### Continuous Learning

As agents update their “ideal leads” descriptions, the system immediately:
- Generates new embeddings
- Adjusts matching behavior
- Routes future leads accordingly
No rule rewriting or admin intervention required.
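One way this update path could look, as a hypothetical sketch: the embedding is regenerated whenever the preference text changes, so the next lead is scored against the new description. `embed()` is a placeholder for the real embedding call.

```python
# Hypothetical sketch: re-embed an agent's preferences on every update.
def embed(text: str) -> list[float]:
    # Placeholder: the real system calls the embedding service here.
    return [float(len(word)) for word in text.split()][:3]

class Agent:
    def __init__(self, name: str, preferences: str):
        self.name = name
        self.update_preferences(preferences)

    def update_preferences(self, preferences: str) -> None:
        self.preferences = preferences
        self.embedding = embed(preferences)  # regenerated immediately

agent = Agent("Alice", "luxury homes in Eagle")
before = agent.embedding
agent.update_preferences("first-time buyers in Meridian")
print(agent.embedding != before)  # True: future routing changes at once
```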
## How Clients Benefit

### Better First Contact

Clients connect with agents who:
- Understand their specific needs
- Have relevant experience
- Know their preferred areas
- Work with their budget
### Faster Response

Leads go to agents who are genuinely interested in that type of client, leading to:
- Faster response times
- More enthusiastic engagement
- Better initial communication
### Reduced Friction

Clients don’t get passed around between agents looking for the “right fit.” First contact is usually the right contact.
## Technical Implementation

### Embedding Model

HomeStar uses the Ollama embedding service with open-source transformer models. Key characteristics:
- 768-dimensional vectors — Balance between expressiveness and performance
- Self-hosted — No third-party API dependencies
- Fast generation — Sub-second embedding creation
### Database Integration

Embeddings are stored in SQLite with the sqlite-vec extension:
```sql
-- Agent embeddings stored directly in the agent table
CREATE TABLE agents (
    id INTEGER PRIMARY KEY,
    name TEXT,
    preferences TEXT,  -- Natural language description
    embedding BLOB     -- 768-float vector
);
```
```sql
-- Similarity search. vec_distance_cosine returns a *distance*
-- (lower = more similar), so order ascending to get the best match.
SELECT id, vec_distance_cosine(embedding, :lead_embedding) AS distance
FROM agents
WHERE lead_recipient = 1 AND active = 1
ORDER BY distance ASC
LIMIT 1;
```

### Performance
Section titled “Performance”- Embedding generation: ~50-200ms per text block
- Similarity search: <10ms across hundreds of agents
- Scalability: Linear with agent count (easily handles 1000+ agents)
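The linear scan behind these numbers can be emulated with only the standard library by registering a cosine-distance UDF with `sqlite3`. This is a slow stand-in for the sqlite-vec extension's `vec_distance_cosine` (vectors stored as JSON text rather than BLOBs, and toy 3-dimensional values), shown only to make the query shape concrete.

```python
# Brute-force similarity search in plain sqlite3, as a stand-in for
# the sqlite-vec extension. Vector values are invented for illustration.
import json
import math
import sqlite3

def vec_distance_cosine(a_json: str, b_json: str) -> float:
    """Cosine distance (0 = identical direction) between JSON vectors."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

db = sqlite3.connect(":memory:")
db.create_function("vec_distance_cosine", 2, vec_distance_cosine)
db.execute("CREATE TABLE agents (id INTEGER PRIMARY KEY, name TEXT, embedding TEXT)")
db.executemany(
    "INSERT INTO agents (name, embedding) VALUES (?, ?)",
    [("Alice", json.dumps([0.25, -0.42, 0.69])),
     ("Bob",   json.dumps([-0.60, 0.30, -0.10]))],
)

lead = json.dumps([0.23, -0.45, 0.67])
row = db.execute(
    "SELECT name FROM agents "
    "ORDER BY vec_distance_cosine(embedding, ?) ASC LIMIT 1",
    (lead,),
).fetchone()
print(row[0])  # Alice: smallest cosine distance to the lead
```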
## Future Enhancements

### Potential Improvements

- Learning from outcomes — Track which matches led to successful transactions
- Seasonal adjustments — Weight vacation home specialists higher in summer
- Multi-language support — Generate embeddings for non-English inquiries
- Confidence scoring — Surface low-confidence matches for manual review
### Research Directions

- Hybrid approaches — Combine semantic matching with hard constraints (geography, price)
- Dynamic re-weighting — Adjust importance of factors based on market conditions
- Explainability — Show agents why they received a particular lead
## Conclusion

Semantic lead matching transforms lead routing from a mechanical process (keywords, rules) into an intelligent one (meaning, context, nuance). By understanding what clients actually want—not just what words they use—the system connects them with the agents best equipped to help.
Core principle: Match based on meaning, not just words.