Vendors

Commercial property maintenance platform

AI-driven products, including smart bookmarking, a messaging platform, and the largest and most data-rich database of commercial property maintenance vendors in the US.

Website: discovervendors.com

Role: Founder, Product, Design, Engineering

Duration: 2025 - Present

Skills:

agentic-development ai-assisted-coding ui-ux-design frontend

Tech:

Agents SDK, Claude Code, Next.js, TypeScript, Tailwind CSS, Supabase


Introduction

discovervendors.com helps commercial property managers find and manage dedicated property maintenance vendors. I designed and built the product end to end: interface design, architecture, backend, data extraction pipeline, and search infrastructure.

Every line of code was written with Claude Code. This was not loose "vibe coding" but tightly controlled, AI-assisted development.

The two biggest challenges on this project were:

Efficiently extracting complete and precise vendor data from more than 3M web pages on a tight budget.
Designing an intuitive way to integrate an agentic assistant into the UI.

Note: This project was built in 2025. Some of the technologies and methodologies used are already obsolete.

Frontend

Next.js 16, React 19, TypeScript, Tailwind CSS, shadcn/ui, Supabase with pgvector, OpenAI Agents SDK, OpenAI ChatKit

UI/UX Design

UI

The interface is built on a custom-themed shadcn/ui library with modern, minimal colors and typography. It is a deliberate departure from the cluttered look of most property management software, which helps the product stand out. The minimalist design also supports the UX: no visual noise or gimmicks, just focused UI that guides property managers through their tasks.

UX

The product and UX were developed from over 20 hours of semi-structured and exploratory interviews with property managers and vendors. Those interviews uncovered a consistent set of pain points on both sides of the market.

For property managers:

Communication is fragmented. Coordination with vendors is spread across email threads, text messages, and phone calls. Details get lost, and digging up old information is nearly impossible.
Volume is massive. An average commercial building generates roughly 450 maintenance line items per year and requires around 300 different vendors to handle them. With clients expecting 3–5 bids per item, the number of vendor relationships a manager needs far exceeds what most teams can maintain.
Sharing preferred vendors is a mess. Teams rely on Word docs and Excel sheets shared over email. There is no single source of truth, and lists go stale quickly.
Evaluating bids is tedious. Comparing RFx responses is time-consuming because bids are rarely apples-to-apples—pricing structures differ, scopes drift, and exceptions and risk terms vary across proposals.

For vendors:

Getting invited to bid is hard. The current system is a closed network. Property managers tend to reuse the same vendors, and breaking in as a new provider is difficult regardless of qualifications.
Marketing spend is wasted. Vendors invest significant budgets in marketing and outreach, but most of it never reaches the right decision-makers. They are actively looking for better ways to generate new business.


AI-Assisted Development

I don’t write code line by line anymore. All of my recent projects are built through AI-assisted, spec-driven and test-driven development, supported by custom tooling for context management, compliance checks, and persistent memory.

Vibe coding here means letting the model “just build it” from loose prompts—minimal specs, no clear constraints, and very little structure. It’s fine for quick prototypes, but it tends to produce brittle code and inconsistent behaviour.

By contrast, AI-assisted development in this context means treating the model as a disciplined collaborator: it writes most of the code, tests, and glue, but it does so from explicit specs, skills, and tooling, with me designing the architecture and validating the output.

For 100% AI-assisted work, the setup matters more than any single prompt. My process is built around three practices:

Spec(ification)-driven development
Dynamic context management
Test-driven development

Spec(ification)-driven development

Spec-driven development has become one of the most effective parts of my AI-assisted workflow. I invest a lot of time upfront in detailed spec docs before any implementation starts.

I use a feature-interview agent that takes a feature description and interrogates it: edge cases, business rules, constraints, failure states, and UX expectations. The output is saved as spec-[feature].md and becomes the source of truth for the agents implementing the work.
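To make the spec format concrete, here is a minimal sketch of how interview answers could be rendered into spec-[feature].md. The section names mirror the categories above; the FeatureSpec type, the function names, and the example feature are hypothetical, not the actual agent's output format.

```typescript
// Hypothetical shape of the answers a feature-interview agent collects.
// Section names mirror the categories in the text; everything else is illustrative.
interface FeatureSpec {
  feature: string; // slug used in the file name, e.g. "vendor-bookmarks"
  summary: string;
  edgeCases: string[];
  businessRules: string[];
  constraints: string[];
  failureStates: string[];
  uxExpectations: string[];
}

// File name follows the spec-[feature].md convention.
function specFileName(spec: FeatureSpec): string {
  return `spec-${spec.feature}.md`;
}

// Render one titled bullet section. Empty sections are written out explicitly
// instead of being omitted, so implementing agents see that the topic was covered.
function section(title: string, items: string[]): string {
  const body =
    items.length > 0 ? items.map((item) => `- ${item}`).join("\n") : "- (none identified)";
  return `## ${title}\n\n${body}\n`;
}

// Render the full spec document that implementing agents treat as the source of truth.
function renderSpec(spec: FeatureSpec): string {
  return [
    `# Spec: ${spec.feature}\n\n${spec.summary}\n`,
    section("Edge cases", spec.edgeCases),
    section("Business rules", spec.businessRules),
    section("Constraints", spec.constraints),
    section("Failure states", spec.failureStates),
    section("UX expectations", spec.uxExpectations),
  ].join("\n");
}
```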

Backfilling specs from experiments

Not every project starts with a perfect spec. For more exploratory or experimental work, I often begin with smaller, incremental AI-assisted changes to find the right interaction, design, feature or behaviour.

Once the feature set feels right and the functionality is stable, I create a spec document from the working solution. That spec becomes the new source of truth, and I then have Claude Code completely rewrite the implementation based on the spec—treating the existing code as a loose reference, not something to preserve.

This “spec after exploration” loop has been very effective. It turns incremental experiments into clean, consistent, spec-compliant code without losing what worked.

Dynamic context management

Being deliberate about how the models get and keep context has had the biggest impact on the quality of the output.

Skills

Unlike many setups, my CLAUDE.md file is intentionally light. It only contains project-wide information like stack details, folder structure, and core conventions.

Everything else lives in skills. A sample of the skills for this stack:

Frontend
Backend
Supabase
AI-agents
Context

At the time of writing, Claude Code is still inconsistent at loading all relevant skills for a given task, even with optimized one-line skill descriptions. To work around this, I’ve created a skills script that runs on the UserPromptSubmit hook. The script uses an LLM to determine which skills are needed for the current task, and if any required skills are not already present in the conversation, it loads them automatically.
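A minimal sketch of the matching step, assuming the hook injects skill text through Claude Code's UserPromptSubmit JSON output (hookSpecificOutput.additionalContext). The skill descriptions, the marker comment used to detect already-loaded skills, and the pickSkills LLM call are hypothetical stand-ins; the real script's internals are not described above.

```typescript
// Sketch of a UserPromptSubmit hook that injects skills the conversation is missing.
// Hypothetical: skill descriptions, the marker format, and the `pickSkills` LLM call.

// Fields of the hook's stdin JSON that this sketch uses.
interface PromptHookInput {
  hook_event_name: string;
  prompt: string;
  transcript_path: string;
}

// One-line descriptions the classifier LLM sees (skill names from the list above).
const SKILL_CATALOG: Record<string, string> = {
  frontend: "React/Next.js components, shadcn/ui and Tailwind conventions",
  backend: "Route handlers, server actions, input validation",
  supabase: "Schema changes, RLS policies, pgvector queries",
  "ai-agents": "OpenAI Agents SDK patterns",
  context: "Reading and updating .context.md files",
};

// Marker written in front of each injected skill so later prompts can detect it.
const skillMarker = (name: string): string => `<!-- skill:${name} -->`;

// Keep the known skills the LLM selected that are not yet present in the conversation.
function missingSkills(selected: string[], transcript: string): string[] {
  return Array.from(new Set(selected)).filter(
    (name) =>
      Object.prototype.hasOwnProperty.call(SKILL_CATALOG, name) &&
      !transcript.includes(skillMarker(name)),
  );
}

// Build the hook's stdout: JSON whose additionalContext is added to the model's context.
// An empty string means there is nothing to add.
function hookOutput(toLoad: string[], readSkill: (name: string) => string): string {
  if (toLoad.length === 0) return "";
  const additionalContext = toLoad
    .map((name) => `${skillMarker(name)}\n${readSkill(name)}`)
    .join("\n\n");
  return JSON.stringify({
    hookSpecificOutput: { hookEventName: "UserPromptSubmit", additionalContext },
  });
}

// Glue for one prompt: ask the LLM which skills apply, then inject the missing ones.
async function handlePrompt(
  input: PromptHookInput,
  transcript: string,
  pickSkills: (prompt: string, catalog: Record<string, string>) => Promise<string[]>,
  readSkill: (name: string) => string,
): Promise<string> {
  const selected = await pickSkills(input.prompt, SKILL_CATALOG);
  return hookOutput(missingSkills(selected, transcript), readSkill);
}
```

In a real hook, the script would parse the JSON from stdin, read the transcript at transcript_path, print handlePrompt's result, and be registered under UserPromptSubmit in .claude/settings.json; readSkill could load .claude/skills/<name>/SKILL.md. Detecting loaded skills by marker assumes injected context shows up in the transcript text, which would need to be verified.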

Skills compliance

Once skills are loaded, the next challenge is keeping the model aligned with them as the conversation grows and the context gets noisy.

To keep answers aligned with the active skills, I use a second script wired into the PostToolUse hook. After the model has produced a response (and used any tools it needs), the script asks the LLM to evaluate its own answer against the loaded skills and, if the answer is not compliant, to go back and fix it. In practice, this catches around 90% of skill-level oversights.
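A minimal sketch of such a check, assuming it reviews each file-changing tool call. In Claude Code, a PostToolUse hook runs after a tool call completes, and printing {"decision": "block", "reason": ...} feeds the reason back to Claude. The reviewed tool list, the reviewer prompt, and the JSON verdict format are hypothetical; the LLM call itself is omitted.

```typescript
// Sketch of a PostToolUse compliance check. Hypothetical: the reviewed tool list,
// the reviewer prompt, and the JSON verdict format.

interface ToolHookInput {
  tool_name: string;
  tool_input: Record<string, unknown>;
}

interface ComplianceVerdict {
  compliant: boolean;
  violations: string[];
}

// Only review tools that change files; reads and searches pass through.
const REVIEWED_TOOLS = new Set(["Write", "Edit", "MultiEdit"]);

// Prompt for the reviewing LLM, or null when this tool call needs no review.
function buildReviewPrompt(input: ToolHookInput, skills: Record<string, string>): string | null {
  if (!REVIEWED_TOOLS.has(input.tool_name)) return null;
  const rules = Object.entries(skills)
    .map(([name, body]) => `### ${name}\n${body}`)
    .join("\n\n");
  return [
    "Check the change below against the active skills.",
    'Reply with JSON only: {"compliant": boolean, "violations": string[]}',
    rules,
    `Tool: ${input.tool_name}`,
    JSON.stringify(input.tool_input, null, 2),
  ].join("\n\n");
}

// Parse the reviewer's reply defensively. An unparseable reply counts as compliant,
// so a flaky review call never blocks work (a deliberate trade-off).
function parseVerdict(reply: string): ComplianceVerdict {
  try {
    const parsed = JSON.parse(reply) as Partial<ComplianceVerdict>;
    const violations = Array.isArray(parsed.violations) ? parsed.violations.map(String) : [];
    return { compliant: parsed.compliant !== false, violations };
  } catch {
    return { compliant: true, violations: [] };
  }
}

// Hook stdout: decision "block" plus a reason sends the violations back to Claude.
function toHookOutput(verdict: ComplianceVerdict): string {
  if (verdict.compliant || verdict.violations.length === 0) return "";
  const list = verdict.violations.map((v) => `- ${v}`).join("\n");
  return JSON.stringify({
    decision: "block",
    reason: `Skill violations found:\n${list}\nFix these before continuing.`,
  });
}
```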

The remaining gaps are handled by a custom version of the Claude Code code-review agent. It runs targeted checks for:

Skills and CLAUDE.md compliance
Logic errors and race conditions
Security issues (injection, XSS, the usual OWASP list)
TypeScript strictness

Each issue is scored by confidence on a 0–100 scale, and anything below 80 is filtered out. The agent also runs explicit compliance checks against the active skills and the CLAUDE.md file, which helps catch file-wide violations.
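The filtering rule is simple enough to show directly. This sketch assumes findings arrive as structured objects with a 0–100 confidence score; the ReviewFinding shape and the category names (taken from the checks listed above) are hypothetical.

```typescript
// Categories mirror the review checks listed above.
type ReviewCategory = "compliance" | "logic" | "security" | "typescript";

// Hypothetical structured finding emitted by the review agent.
interface ReviewFinding {
  category: ReviewCategory;
  file: string;
  message: string;
  confidence: number; // assumed 0-100 scale
}

const MIN_CONFIDENCE = 80;

// Drop findings below the threshold, then order the rest by confidence, highest first.
function filterFindings(findings: ReviewFinding[], min = MIN_CONFIDENCE): ReviewFinding[] {
  return findings
    .filter((f) => f.confidence >= min)
    .sort((a, b) => b.confidence - a.confidence);
}

// Count the surviving findings per category for a short report header.
function summarize(findings: ReviewFinding[]): Record<ReviewCategory, number> {
  const counts: Record<ReviewCategory, number> = { compliance: 0, logic: 0, security: 0, typescript: 0 };
  for (const f of findings) counts[f.category] += 1;
  return counts;
}
```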

Persistent memory

When working on a feature or bug within a complex feature set, Opus will load the files it thinks are relevant to get the context it needs. The problem is that it often misses important pieces, and business logic decisions are not always obvious just from reading the code.

To fix that, I use a persistent-memory skill that runs when the agent stops (the Stop hook). It inspects the changes and decides whether to create or update a .context.md file in the folder it's working in. The content is deliberately short: a few lines that capture key decisions, constraints, and non-obvious behaviour for that part of the system. On every prompt, the agent reads all .context.md files in the current folder and then walks up the tree until it hits /app or /components (similar to how nested CLAUDE.md files work at the folder level). In practice, this has reduced mistakes, regressions, and incorrect assumptions.
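The lookup half of this can be sketched as a small path walk. The stop rule (/app or /components) comes from the text; including the stop folder's own .context.md, stopping at the project root for folders outside those trees, and returning nearest-first order are assumptions, and the function names are hypothetical.

```typescript
import * as path from "node:path";
import { existsSync } from "node:fs";

// Folder names where the upward walk stops (from the text).
const STOP_DIRS = new Set(["app", "components"]);
const CONTEXT_FILE = ".context.md";

// Candidate .context.md paths from `dir` upward, nearest first. The walk ends at the
// first folder named app/components (included, an assumption) or at the project root.
function contextFileCandidates(dir: string, projectRoot: string): string[] {
  const candidates: string[] = [];
  const root = path.posix.resolve(projectRoot);
  let current = path.posix.resolve(dir);
  while (true) {
    candidates.push(path.posix.join(current, CONTEXT_FILE));
    if (STOP_DIRS.has(path.posix.basename(current)) || current === root) break;
    const parent = path.posix.dirname(current);
    if (parent === current) break; // reached the filesystem root
    current = parent;
  }
  return candidates;
}

// Keep only files that exist; `exists` is injectable so the walk can be tested offline.
function contextFiles(
  dir: string,
  projectRoot: string,
  exists: (p: string) => boolean = existsSync,
): string[] {
  return contextFileCandidates(dir, projectRoot).filter(exists);
}
```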

Future improvements

One of the biggest weaknesses in my current setup is that each task still runs inside the same long-lived agent. Ideally, every task would:

Spawn a dedicated sub-agent
Load only the skills it needs
Solve the task in isolation
Return its result back to the main agent

The Claude Code Superpowers plugin already implements much of this pattern (among other things), but it inherits the same underlying limitation around reliably loading all relevant skills. To get the best of both worlds, I would need to combine Superpowers’ sub-agent model with my skills-matcher script so each sub-agent also gets the right skills attached.

Agentic AI: Vendor Data Extraction

“We’re not doing it because it’s easy, we’re doing it because we thought it would be easy.”

Extracting the vendor data has been by far the most challenging part of this project—ironically the part I assumed would be straightforward.

The challenges fall into two areas:

Big data
Agentic AI

Big data challenges

There are an estimated 1M commercial property maintenance vendors with around 3M branches in the US. In this first iteration, I’m extracting data from 120,000 vendors.

Solution 1 — Direct URL extraction

What I tried: Call the OpenAI API with "Visit commercial vendor [URL] and extract the following fields: {name, address, …}", letting the model handle everything.

Result: The requests all succeeded, but the outputs were full of hallucinated data. Models called through the API can’t actually browse to a URL (this was early 2025), so they guessed based on the domain and prior training instead of reading the page.
Solved: Nothing.
Remaining problems: No reliable way to get real vendor data—when asked to “visit” a URL, the model happily invents details.

Solution 2 — Full-site crawl + single-pass extraction

What I tried: Since the model couldn’t access the internet directly, I did the crawling myself: crawl the entire site (all pages), feed all pages to the model, and ask it to extract the vendor fields.

Result: Around 50% of the test sites were blocked by bot protection. For the remaining (often small, local) vendors, the total token count per site ran into the millions once you included HTML, JavaScript, CSS, SVGs, and irrelevant pages like blogs. Hallucinations were still common because the model was overwhelmed with noisy input.
Solved: For sites that weren’t blocked, agents now had full access to vendor content.
Remaining problems: Bot blocking on many sites, massive token usage, and continued hallucinations from overloaded context.

Solution 3 — Adaptive scraper + context management

What I tried: Reduce token usage and avoid blocking by building a custom scraper that escalates scraping strategies based on server response (fetch → headless browser → external scraping service like Oxylabs). Strip out <script>, <head>, <svg>, and other non-content elements, then convert the remaining HTML to Markdown.

Result: Bot blocking was no longer an issue, and total tokens per site dropped significantly. But the content was still too large and too noisy to feed directly into a model for precise extraction. To keep costs down I used 4o mini, and the combination of a cheaper model and large, messy inputs still produced hallucinations.
Solved: Bot blocking and some of the worst token bloat.
Remaining problems: Total cost remained high, and the content size/noise was still causing hallucinations.
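The escalation idea and the cleanup step can be sketched together. This assumes each strategy (plain fetch, headless browser, external service) is wrapped in the same hypothetical Strategy interface; the block-detection markers are illustrative guesses, and the regex cleanup is a rough stand-in for a proper HTML parser. The remaining HTML would then go through an HTML-to-Markdown converter (a library such as turndown is one option).

```typescript
// Sketch of an escalating scraper plus a pre-conversion cleanup step.
// Hypothetical: the Strategy shape, the block heuristics, and the regex cleanup.

interface FetchResult {
  status: number;
  html: string;
}

interface Strategy {
  name: string; // e.g. "fetch", "headless", "scraping-service"
  fetchPage: (url: string) => Promise<FetchResult>;
}

// Illustrative challenge-page markers (lowercase); real detection needs tuning.
const CHALLENGE_MARKERS = ["just a moment...", "attention required!"];

function looksBlocked(result: FetchResult): boolean {
  if (result.status >= 400) return true;
  const head = result.html.slice(0, 20000).toLowerCase();
  return CHALLENGE_MARKERS.some((marker) => head.includes(marker));
}

// Try strategies from cheapest to most expensive, escalating only on failure or blocking.
async function scrapeWithEscalation(
  url: string,
  strategies: Strategy[],
): Promise<{ strategy: string; html: string }> {
  for (const strategy of strategies) {
    try {
      const result = await strategy.fetchPage(url);
      if (!looksBlocked(result)) return { strategy: strategy.name, html: result.html };
    } catch {
      // Network or browser error: fall through to the next strategy.
    }
  }
  throw new Error(`All scraping strategies failed for ${url}`);
}

// Remove comments and non-content elements before Markdown conversion.
// The \b after the tag name keeps <header> from matching <head>.
const COMMENTS = /<!--[\s\S]*?-->/g;
const NON_CONTENT = /<(script|style|head|svg|noscript|iframe)\b[\s\S]*?<\/\1\s*>/gi;

function stripNonContent(html: string): string {
  return html.replace(COMMENTS, "").replace(NON_CONTENT, "").replace(/\n{3,}/g, "\n\n").trim();
}
```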

Solution 4 — ContentLocationPredictionAgent for targeted crawling

What I tried: Instead of crawling the entire site, crawl just the landing page, extract only links, and hand those to an LLM tool (ContentLocationPredictionAgent). For each extraction area (e.g. “branding”, “operational”, “legal”, “reviews”), I pass the bucket of links plus a description of what I’m looking for, and the agent predicts which links are most likely to contain that content.

ContentLocationPredictionAgent flow diagram

Result: With ContentLocationPredictionAgent, I only extract content from relevant pages—on average 3–5 pages per data area like branding, legal, operational, reviews, etc. Token usage dropped further and general hallucinations decreased.
Solved: Token usage and general hallucinations were now at a manageable level.
Remaining problems: There were still smaller issues with the extracted data like incorrect or non‑existent phone numbers and links, information that occasionally went missing, and the whole pipeline was still slow.
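The deterministic parts of ContentLocationPredictionAgent can be sketched without the model call: collect same-site links from the landing page, show them to the model as a numbered list per extraction area, and map its answer back onto that list so a predicted URL can only be one that actually exists on the page. The numbered-index protocol, the area descriptions, and all function names are hypothetical; the regex link extraction is a simplification of real HTML parsing.

```typescript
interface PageLink {
  url: string;
  text: string;
}

// Hypothetical descriptions of what each extraction area looks for.
const EXTRACTION_AREAS: Record<string, string> = {
  branding: "Company name, logo, tagline, company story",
  operational: "Services offered, service areas, hours, contact details",
  legal: "Licenses, insurance, certifications",
  reviews: "Customer reviews, testimonials, ratings",
};

// Collect unique same-site links (fragment removed) with their visible anchor text.
function extractLinks(html: string, baseUrl: string): PageLink[] {
  const base = new URL(baseUrl);
  const seen = new Set<string>();
  const links: PageLink[] = [];
  const anchor = /<a\b[^>]*?href\s*=\s*["']([^"']+)["'][^>]*>([\s\S]*?)<\/a>/gi;
  for (const match of html.matchAll(anchor)) {
    let url: URL;
    try {
      url = new URL(match[1], base);
    } catch {
      continue; // malformed href
    }
    if (url.hostname !== base.hostname) continue; // external, tel: and mailto: links
    url.hash = "";
    const key = url.toString();
    if (seen.has(key)) continue;
    seen.add(key);
    links.push({ url: key, text: match[2].replace(/<[^>]+>/g, "").trim() });
  }
  return links;
}

// Prompt for one area: the model sees numbered links and answers with numbers only.
function buildPredictionPrompt(area: string, links: PageLink[], maxPages = 5): string {
  const list = links.map((l, i) => `${i}. ${l.url} | ${l.text}`).join("\n");
  return `Area: ${area}\nLooking for: ${EXTRACTION_AREAS[area] ?? area}\n` +
    `Return a JSON array of up to ${maxPages} link numbers most likely to contain this content.\n\n${list}`;
}

// Map the model's reply back to known URLs; invalid or out-of-range numbers are dropped.
function resolvePrediction(reply: string, links: PageLink[], maxPages = 5): string[] {
  let indices: unknown;
  try {
    indices = JSON.parse(reply);
  } catch {
    return [];
  }
  if (!Array.isArray(indices)) return [];
  const urls = indices
    .filter((i): i is number => typeof i === "number" && Number.isInteger(i) && i >= 0 && i < links.length)
    .map((i) => links[i].url);
  return Array.from(new Set(urls)).slice(0, maxPages);
}
```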

Solving these remaining problems leads to the next challenging area: Agentic AI.

Agentic AI – Data extractor

All agents, sub-agents, and tools for this pipeline are built with the OpenAI Agents SDK.

Solution 5 — Orchestrated sub-agents and parameter tuning

What I tried: With most content issues solved, the remaining problems were wrong or missing fields (phone numbers, links) that showed up in a patterned way. Instead of blaming 4o mini alone, I tightened the sampling settings and changed the architecture: I moved from an OpenAI Agent with automatic tool selection to a normal function that calls each sub-agent explicitly.

In this setup, the sampling settings mostly control how aggressive the model is about "filling in gaps" versus copying what it sees. top_p sets nucleus sampling: the model samples only from the smallest set of tokens whose cumulative probability reaches top_p, so lower values keep it closer to the most likely tokens. presence_penalty and frequency_penalty penalize tokens based on whether, and how often, they have already appeared in the text so far. In the earlier, relatively loose configuration, 4o mini was too willing to guess plausible-looking phone numbers and URLs when the input was ambiguous. Lowering top_p and increasing the penalties made it behave more extractively in practice: it tended to copy numbers and links from the scraped content instead of inventing new ones.

Result: Previously, the orchestration pattern ran everything through one OpenAI LLM Agent. 4o mini has a large initial context window, but only around a 20K token carry-over between turns. With seven sub-agents sometimes producing large outputs, their combined results could exceed that limit and trigger silent compression, which explains why details sometimes "disappeared". After the change, the main extractor is no longer an OpenAI Agent; each sub-agent is still technically an agent, but each one is invoked in isolation. This pushes the 20K carry-over limit down to each sub-agent call instead of one shared conversation.
Solved: The main sources of hallucinations and missing data in this pipeline: aggressive sampling plus a shared, overfilled context window.
Remaining problems: No structural issues left in this extractor beyond normal LLM noise.
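The architectural change can be sketched as a plain async function that calls each sub-agent explicitly with its own fresh message list and shared extractive sampling settings. The numeric values are hypothetical (the text only says top_p was lowered and the penalties raised), callModel stands in for an OpenAI request (for example Chat Completions with model "gpt-4o-mini", which accepts top_p, presence_penalty, and frequency_penalty), and running the calls in parallel is an assumption.

```typescript
// Sampling settings passed to every sub-agent call; values are hypothetical.
interface SamplingSettings {
  top_p: number;
  presence_penalty: number;
  frequency_penalty: number;
}
const EXTRACTIVE_SAMPLING: SamplingSettings = { top_p: 0.2, presence_penalty: 0.5, frequency_penalty: 0.5 };

type ChatMessage = { role: "system" | "user"; content: string };

// Stand-in for one model request; no conversation history is shared between calls.
type CallModel = (messages: ChatMessage[], sampling: SamplingSettings) => Promise<string>;

interface SubAgent {
  area: string; // e.g. "branding", "operational", "legal", "reviews"
  instructions: string;
}

// Plain function instead of an agent choosing tools: every sub-agent is called
// explicitly, sees only its own area's pages, and returns only its own JSON output.
async function extractVendor(
  pagesByArea: Record<string, string>, // Markdown of the predicted pages per area
  subAgents: SubAgent[],
  callModel: CallModel,
): Promise<Record<string, unknown>> {
  const entries = await Promise.all(
    subAgents.map(async (agent) => {
      const messages: ChatMessage[] = [
        { role: "system", content: agent.instructions },
        { role: "user", content: pagesByArea[agent.area] ?? "" },
      ];
      const raw = await callModel(messages, EXTRACTIVE_SAMPLING);
      let parsed: unknown = null; // unparseable output is recorded as missing
      try {
        parsed = JSON.parse(raw);
      } catch {
        // keep null
      }
      return [agent.area, parsed] as const;
    }),
  );
  return Object.fromEntries(entries);
}
```

Because each call starts from an empty history, one sub-agent's large output can never crowd another's context, which is the property the change above relies on.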

Agentic AI: RAG & Semantic Search

Most of the vendor data is normalized and cleaned, but “services offered” is intentionally not. In this industry there are no well-defined categories—some vendors do all of HVAC, others only a tiny subset, and they often describe the same service with very different language.

This is where semantic search comes in. It lets me store “services offered” exactly as vendors describe them, while still allowing users to search in their own words and get good matches based on meaning rather than exact wording.

RAG optimizations

Making semantic search location-aware

Plain semantic search returns the highest-scoring matches, which can be misleading. For example, a query like “Show me the top plumbers in Orange County” might return a plumbing vendor from Ohio and an HVAC vendor in Orange County: one matches “plumbing” perfectly, the other matches the location and is semantically close to “plumbing”.

To fix this, I added two constraints:

Every vendor search requires both a location (like city, street, or neighbourhood) and a service. If the user asks “Find me top plumbers”, the system responds with “In which location?”. Once we have the location, an LLM extracts the county, and the search is limited to vendors whose service areas include that county (service areas are stored as counties).
I raised the minimum similarity score for matches, so HVAC vendors aren’t returned for “plumber” queries just because the embedding space thinks they’re “close enough”.
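The two constraints can be sketched in memory: no county means a follow-up question, and vendors are filtered by service-area county before any similarity threshold is applied. In production this would run inside Postgres with pgvector (whose <=> operator returns cosine distance, so similarity is 1 minus distance); the record shape, threshold value, and function names here are hypothetical, and the LLM step that extracts the county is assumed to have already run.

```typescript
interface VendorRecord {
  id: string;
  name: string;
  serviceCounties: string[]; // service areas stored as counties
  embedding: number[];
}

type SearchOutcome =
  | { kind: "needs_location"; message: string }
  | { kind: "results"; vendors: { id: string; name: string; score: number }[] };

const MIN_SIMILARITY = 0.45; // hypothetical threshold

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// County filter first, then a minimum similarity, then rank by score.
function searchVendors(
  query: { county: string | null; embedding: number[] },
  vendors: VendorRecord[],
  limit = 10,
  minSimilarity = MIN_SIMILARITY,
): SearchOutcome {
  if (!query.county) return { kind: "needs_location", message: "In which location?" };
  const county = query.county.toLowerCase();
  const matches = vendors
    .filter((v) => v.serviceCounties.some((c) => c.toLowerCase() === county))
    .map((v) => ({ id: v.id, name: v.name, score: cosineSimilarity(query.embedding, v.embedding) }))
    .filter((m) => m.score >= minSimilarity)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
  return { kind: "results", vendors: matches };
}
```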

Embedding normalization

To make semantic search more consistent, I use two small agents:

Vendor normalizer – Converts each vendor record (location, services, size, etc.) into a standardized “embedding text” format and stores that in the database.
Query normalizer – Takes any incoming user query and rewrites it into the same structured format used for vendor embeddings.

So a query like:

“I’m looking for a great vendor in downtown San Francisco who can help with a broken fixture”

gets rewritten to something like:

“Vendor in San Francisco County offering plumbing services.”

Because both the stored vendor text and the rewritten query follow the same structure, the embeddings line up better and the search results become far more accurate.
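The key property is that both normalizers end in the same template. This sketch shows only that shared, deterministic step, assuming the LLM part of each normalizer has already reduced a vendor record or a free-text query to a county and a list of services; the field names, the service ordering, and the "at least one service" rule (mirroring the search requirement above) are assumptions.

```typescript
// Structured fields both normalizers are assumed to produce before templating.
interface EmbeddingFields {
  county: string;
  services: string[];
}

// Join items as "a", "a and b", or "a, b, and c".
function joinList(items: string[]): string {
  if (items.length === 1) return items[0];
  if (items.length === 2) return `${items[0]} and ${items[1]}`;
  return `${items.slice(0, -1).join(", ")}, and ${items[items.length - 1]}`;
}

// Shared template: vendor records and user queries both pass through this function,
// so the texts being embedded have the same structure on both sides.
function toEmbeddingText(fields: EmbeddingFields): string {
  const county = fields.county.trim().replace(/\s+county$/i, "");
  const services = Array.from(
    new Set(fields.services.map((s) => s.trim().toLowerCase()).filter((s) => s.length > 0)),
  ).sort();
  if (services.length === 0) throw new Error("At least one service is required");
  return `Vendor in ${county} County offering ${joinList(services)} services.`;
}
```

For the example above, toEmbeddingText({ county: "San Francisco", services: ["Plumbing"] }) produces "Vendor in San Francisco County offering plumbing services."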

Live at discovervendors.com.