# How Semantic Lead Matching Works
HomeStar uses semantic search and vector embeddings to automatically route leads to the most appropriate agent. This document explains how the system works and why it’s more effective than traditional keyword-based routing.
## The Problem with Traditional Lead Routing

### Keyword-Based Matching Limitations
Traditional real estate lead routing relies on exact keyword matches:
- Client mentions “downtown condo” → Route to agent with “condo” specialty
- Client mentions “Eagle” → Route to agent serving “Eagle”
- Client mentions nothing specific → Route randomly or to primary contact
This approach fails when:
- Synonyms differ — “Luxury home” vs “high-end property” vs “executive residence”
- Context matters — “First home” (buyer) vs “first rental property” (investor)
- Implicit meaning — “Downsizing after kids left” implies senior/empty-nester
- Multiple factors — “Family relocating to Eagle for tech job, need good schools”
## How Semantic Matching Works

### Overview of the Process

1. Agent Profile Creation — Agent describes ideal clients in natural language
2. Embedding Generation — System converts preferences to numerical vectors
3. Lead Submission — Client submits inquiry with message
4. Lead Analysis — System converts inquiry to a vector
5. Similarity Calculation — System measures the “distance” between the lead and each agent
6. Routing Decision — Lead goes to the agent with the highest match score
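The six steps above can be sketched end to end in Python. This is a minimal illustration, not the production code: `embed()` here is a toy stand-in that hashes words into buckets (so it is not actually semantic), but the routing mechanics around it are the same.

```python
# Hypothetical sketch of the routing flow. embed() stands in for the
# real embedding service; here it just hashes words into a small vector.
import hashlib
import math

DIMS = 16  # the production system uses 768 dimensions

def embed(text: str) -> list[float]:
    """Toy embedding: hash each word into a bucket (illustration only)."""
    vec = [0.0] * DIMS
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIMS
        vec[bucket] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def route_lead(inquiry: str, agents: dict[str, str]) -> str:
    """Steps 3-6: embed the inquiry, score each agent, pick the best."""
    lead_vec = embed(inquiry)
    return max(agents, key=lambda name: cosine(lead_vec, embed(agents[name])))

# Steps 1-2: agent preferences in natural language, embedded on save
agents = {
    "Alice": "luxury homes in Eagle with mountain views",
    "Bob": "first-time buyers, entry-level homes",
}
print(route_lead("luxury home in Eagle with mountain views", agents))
```

Swapping the toy `embed()` for a real transformer embedding is what turns this word-overlap scorer into genuine semantic matching.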
### Vector Embeddings Explained

A vector embedding is a numerical representation of meaning. Instead of matching exact words, the system captures semantic concepts.
Example conversion:

```text
Text:   "I'm looking for a luxury home in Eagle with mountain views"
Vector: [0.23, -0.45, 0.67, ..., 0.12]  (768 numbers)
```

Each dimension represents abstract semantic features:

- Property prestige level
- Geographic preferences
- Natural feature interests
- Lifestyle indicators
- Budget signals

Agent preference:

```text
Text:   "I specialize in high-end properties in Eagle and Meridian, particularly homes with mountain views"
Vector: [0.25, -0.42, 0.69, ..., 0.15]  (768 numbers)
```

The agent’s vector is very similar to the lead’s (high match score). The system measures the mathematical “distance” between these vectors. Similar meaning = close distance = high match score.
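As a concrete (made-up) illustration of the distance calculation, here is cosine similarity over short toy vectors; real embeddings have 768 dimensions, and these 4-dimensional values are invented for the example.

```python
# Cosine similarity between small illustrative vectors (values invented).
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

lead  = [0.23, -0.45, 0.67, 0.12]   # lead inquiry embedding (toy)
agent = [0.25, -0.42, 0.69, 0.15]   # well-matched agent (toy)
other = [-0.60, 0.30, -0.10, 0.80]  # unrelated specialty (toy)

print(round(cosine_similarity(lead, agent), 3))  # close to 1.0
print(round(cosine_similarity(lead, other), 3))  # much lower
```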
## Why Semantic Matching Outperforms Keywords

### Synonym Recognition

Lead inquiry: “Looking for my first house, budget around $300k”
Keyword matching:
- ✗ No exact match for “first-time buyer”
- ✗ Might route to wrong agent
Semantic matching:
- ✓ Recognizes “first house” ≈ “first-time buyer”
- ✓ Routes to agent specializing in entry-level buyers
### Contextual Understanding

Lead inquiry: “We’re empty-nesters looking to downsize from our 4-bedroom”
Keyword matching:
- ✗ Matches “downsize” but misses demographic context
- ✗ Might route to any downsizing specialist
Semantic matching:
- ✓ Recognizes “empty-nester” context
- ✓ Routes to agent specializing in seniors/downsizing
- ✓ Considers property type transition (large → smaller)
### Multi-Factor Matching

Lead inquiry: “Military family relocating to Boise area, need to close quickly, interested in Eagle or Meridian neighborhoods with good schools”
Keyword matching:
- ✗ Matches “Eagle” and “Meridian” but misses nuance
- ✗ Can’t weigh multiple factors
Semantic matching:
- ✓ Weights: Military (high), Relocation (high), Timeline urgency (medium), School quality (high), Geographic flexibility (Eagle/Meridian both acceptable)
- ✓ Routes to agent with military + relocation experience in those areas
## The Technology Stack

### Natural Language Processing (NLP)

HomeStar uses transformer-based language models to generate embeddings. These models are trained on massive text datasets to understand:
- Word relationships (synonyms, antonyms, hierarchies)
- Contextual meaning (same word, different meanings based on context)
- Implicit signals (sentiment, urgency, sophistication level)
### Vector Similarity Calculation

The system uses cosine similarity to measure how closely two vectors align:
- Score 1.0 — Identical meaning (perfect match)
- Score 0.8-1.0 — Very similar (excellent match)
- Score 0.6-0.8 — Somewhat similar (acceptable match)
- Score < 0.6 — Different concepts (poor match)
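The bands above translate into a small interpretation helper; this is a sketch using the thresholds listed here, and the production cut-offs may differ.

```python
# Map a cosine similarity score to the interpretation bands above.
def interpret(score: float) -> str:
    if score >= 0.8:
        return "excellent match"
    if score >= 0.6:
        return "acceptable match"
    return "poor match"

print(interpret(0.92))  # excellent match
print(interpret(0.74))  # acceptable match
print(interpret(0.45))  # poor match
```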
Example scores:
| Lead Inquiry | Agent Specialty | Score | Interpretation |
|---|---|---|---|
| "Luxury condo downtown" | "High-end urban properties" | 0.92 | Excellent match |
| "Luxury condo downtown" | "Luxury homes in suburbs" | 0.74 | Partial match (luxury yes, location no) |
| "Luxury condo downtown" | "First-time buyer specialist" | 0.45 | Poor match |
## Advantages Over Other Approaches

### vs. Manual Round-Robin

Round-robin problems:
- Ignores agent expertise
- Wastes leads on poor fits
- Frustrates both agents and clients
Semantic matching benefits:
- Expertise-based routing
- Higher conversion rates
- Better client experience
### vs. Simple Rules-Based Routing

Rules-based limitations:

```text
IF inquiry contains "luxury" AND price > $500k THEN route to luxury agent
ELSE IF inquiry contains "first time" THEN route to first-time buyer agent
ELSE route to primary contact
```

Problems:
- Brittle (fails on unexpected phrasing)
- Maintenance burden (rules grow complex)
- Can’t handle nuance
Semantic matching:
- Handles any phrasing
- No rule maintenance required
- Captures subtle distinctions
### vs. Keyword Weighting

Weighted keywords approach:

```text
luxury:     +10 points
first-time: +8 points
investor:   +12 points
```

Problems:
- Still keyword-dependent
- Weights are arbitrary and hard to tune
- No context understanding
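A minimal version of such a weighted-keyword scorer (using the illustrative weights from the text) makes the brittleness concrete: any synonym the rule author did not anticipate scores zero.

```python
# Weighted-keyword scoring, as sketched in the text above.
WEIGHTS = {"luxury": 10, "first-time": 8, "investor": 12}

def keyword_score(inquiry: str) -> int:
    """Sum the points for every configured keyword found in the inquiry."""
    text = inquiry.lower()
    return sum(points for keyword, points in WEIGHTS.items() if keyword in text)

print(keyword_score("Looking for a luxury condo"))    # 10
print(keyword_score("Looking for a high-end condo"))  # 0 -- synonym missed
```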
Semantic matching:
- Learns context automatically
- Adapts to language evolution
- Weights factors naturally
## How Agents Benefit

### Better Lead Quality

Agents receive leads that:
- Match their actual expertise
- Fit their preferred price points
- Are in their service areas
- Involve client types they excel with
Result: Higher conversion rates, less wasted time
### Implicit Preference Recognition

Agents don’t need to predict every possible phrasing. The system understands:
- “Starter home” = “First-time buyer”
- “Move-up buyer” = “Upsizing”
- “Executive property” = “Luxury”
- “Investment property” = “Investor”
### Continuous Learning

As agents update their “ideal leads” descriptions, the system immediately:
- Generates new embeddings
- Adjusts matching behavior
- Routes future leads accordingly
No rule rewriting or admin intervention required.
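One way this update path could look, as a hypothetical sketch: the embedding is regenerated whenever the preference text changes, so the next lead is scored against the new description. `embed()` is a placeholder for the real embedding call.

```python
# Hypothetical sketch: re-embed an agent's preferences on every update.
def embed(text: str) -> list[float]:
    # Placeholder: the real system calls the embedding service here.
    return [float(len(word)) for word in text.split()][:3]

class Agent:
    def __init__(self, name: str, preferences: str):
        self.name = name
        self.update_preferences(preferences)

    def update_preferences(self, preferences: str) -> None:
        self.preferences = preferences
        self.embedding = embed(preferences)  # regenerated immediately

agent = Agent("Alice", "luxury homes in Eagle")
before = agent.embedding
agent.update_preferences("first-time buyers in Meridian")
print(agent.embedding != before)  # True: future routing changes at once
```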
## How Clients Benefit

### Better First Contact

Clients connect with agents who:
- Understand their specific needs
- Have relevant experience
- Know their preferred areas
- Work with their budget
### Faster Response

Leads go to agents who are genuinely interested in that type of client, leading to:
- Faster response times
- More enthusiastic engagement
- Better initial communication
### Reduced Friction

Clients don’t get passed around between agents looking for the “right fit.” First contact is usually the right contact.
## Technical Implementation

### Embedding Model

HomeStar uses the Ollama embedding service with open-source transformer models. Key characteristics:
- 768-dimensional vectors — Balance between expressiveness and performance
- Self-hosted — No third-party API dependencies
- Fast generation — Sub-second embedding creation
### Database Integration

Embeddings are stored in SQLite with the sqlite-vec extension:
```sql
-- Agent embeddings stored directly in the agent table
CREATE TABLE agents (
    id INTEGER PRIMARY KEY,
    name TEXT,
    preferences TEXT,  -- Natural language description
    embedding BLOB     -- 768-float vector
);
```
```sql
-- Similarity search. vec_distance_cosine returns a *distance*
-- (lower = more similar), so order ascending to get the best match.
SELECT id, vec_distance_cosine(embedding, :lead_embedding) AS distance
FROM agents
WHERE lead_recipient = 1 AND active = 1
ORDER BY distance ASC
LIMIT 1;
```

### Performance
Section titled “Performance”- Embedding generation: ~50-200ms per text block
- Similarity search: <10ms across hundreds of agents
- Scalability: Linear with agent count (easily handles 1000+ agents)
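The linear scan behind these numbers can be emulated with only the standard library by registering a cosine-distance UDF with `sqlite3`. This is a slow stand-in for the sqlite-vec extension's `vec_distance_cosine` (vectors stored as JSON text rather than BLOBs, and toy 3-dimensional values), shown only to make the query shape concrete.

```python
# Brute-force similarity search in plain sqlite3, as a stand-in for
# the sqlite-vec extension. Vector values are invented for illustration.
import json
import math
import sqlite3

def vec_distance_cosine(a_json: str, b_json: str) -> float:
    """Cosine distance (0 = identical direction) between JSON vectors."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

db = sqlite3.connect(":memory:")
db.create_function("vec_distance_cosine", 2, vec_distance_cosine)
db.execute("CREATE TABLE agents (id INTEGER PRIMARY KEY, name TEXT, embedding TEXT)")
db.executemany(
    "INSERT INTO agents (name, embedding) VALUES (?, ?)",
    [("Alice", json.dumps([0.25, -0.42, 0.69])),
     ("Bob",   json.dumps([-0.60, 0.30, -0.10]))],
)

lead = json.dumps([0.23, -0.45, 0.67])
row = db.execute(
    "SELECT name FROM agents "
    "ORDER BY vec_distance_cosine(embedding, ?) ASC LIMIT 1",
    (lead,),
).fetchone()
print(row[0])  # Alice: smallest cosine distance to the lead
```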
## Future Enhancements

### Potential Improvements

- Learning from outcomes — Track which matches led to successful transactions
- Seasonal adjustments — Weight vacation home specialists higher in summer
- Multi-language support — Generate embeddings for non-English inquiries
- Confidence scoring — Surface low-confidence matches for manual review
### Research Directions

- Hybrid approaches — Combine semantic matching with hard constraints (geography, price)
- Dynamic re-weighting — Adjust importance of factors based on market conditions
- Explainability — Show agents why they received a particular lead
## Conclusion

Semantic lead matching transforms lead routing from a mechanical process (keywords, rules) into an intelligent one (meaning, context, nuance). By understanding what clients actually want—not just what words they use—the system connects them with the agents best equipped to help.
Core principle: Match based on meaning, not just words.