AI-Powered Vendor Research
When you need to find vendors for a project — caterers, photographers, printing companies, whatever — the process is always the same. You search. You click through a dozen websites. You hunt for contact info buried in footers and "About" pages. You evaluate quality from reviews, portfolios, and gut instinct. Then you sit down and write personalized emails to each one, trying to sound like you didn't just copy-paste the same template eight times.
It takes hours. For a short list. I built a pipeline to automate the entire thing.
What the pipeline does
The research assistant is an interactive CLI tool that runs a four-stage process: search, scrape, qualify, and outreach. You tell it what you're looking for, where, and what matters most. It finds candidates, extracts their contact information, scores them against your criteria, and — when you're ready — sends personalized emails through Gmail.
The whole pipeline is designed to be resumable. Every session is persisted to disk, so you can search today, qualify tomorrow, and send outreach next week. No duplicate searches. Full audit trail.
SEARCH               SCRAPE               QUALIFY              OUTREACH
──────               ──────               ───────              ────────
User input:          Visit each           Score each           User selects
  business type      vendor site,         vendor (0-100):      targets +
  + location         extract contact        40% rating         email template
  + criteria         info via regex         20% reviews             │
      │              patterns               20% distance            ▼
      ▼                  │                  20% contact        Template engine
Google Search            ▼                     availability    renders {{vars}}
API (free)           emails, phones,            │                   │
      │              addresses                  ▼                   ▼
      ▼                  │                 ranked              Gmail API
candidate                ▼                 shortlist           sends via OAuth
list                 enriched
                     vendor data
Stage 1: Search
The user provides three inputs: business type, location, and any specific criteria. "Event photographers in Grand Rapids with outdoor experience." "Commercial printers within 30 miles that do large format." The search query gets constructed and sent to the Google Search API.
A key design decision here: I use the free googlesearch-python library instead of the Google Places API. Places charges between $0.20 and $2.00 per query depending on the data fields you request. For a tool that might run dozens of searches in a session, that adds up fast.
The tradeoff is speed. The free library is slower — rate-limited to avoid blocks. But it returns actual website URLs instead of just business listings, which means the next stage can scrape contact information directly from vendor sites. Places gives you a phone number and an address. Web scraping gives you email addresses, contact forms, social links, and sometimes pricing pages.
For vendor research, the richer data is worth the wait.
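A minimal sketch of the search stage, assuming the `googlesearch-python` package; the function names (`build_query`, `find_candidates`) and the explicit sleep between results are illustrative, not the post's actual implementation:

```python
import time


def build_query(business_type: str, location: str, criteria: str = "") -> str:
    """Assemble a plain search query from the three user inputs."""
    parts = [business_type, location, criteria]
    return " ".join(p.strip() for p in parts if p.strip())


def find_candidates(query: str, limit: int = 10, pause: float = 2.0) -> list:
    """Run the free Google search, pausing between results to avoid blocks.

    The import is deferred so the module loads even when the optional
    googlesearch-python dependency isn't installed.
    """
    from googlesearch import search  # pip install googlesearch-python
    results = []
    for url in search(query, num_results=limit):
        results.append(url)
        time.sleep(pause)  # rate-limit: the free endpoint blocks aggressive clients
    return results
```

The pause is the "slower" tradeoff in concrete form: each result costs a couple of seconds, but each query costs nothing.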
Stage 2: Scrape
For each candidate from the search results, the scraper fetches the vendor's website and parses it with BeautifulSoup to extract contact information. It uses regex patterns to find email addresses, phone numbers, and physical addresses embedded in page content, footers, and contact pages.
This is inherently messy work. Websites aren't structured data. Some vendors put their email in a mailto link. Others render it as an image to prevent scraping. Some have a contact form but no direct email anywhere on the site. The scraper does its best and records what it finds — along with what it couldn't find, which becomes part of the qualification score.
Every scraped result gets stored as a Python dataclass. Clean serialization, type hints, and a consistent shape that the downstream stages can rely on. If the data model needs a new field — say, a LinkedIn URL — it gets added once and flows through the entire pipeline.
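The shape of that extraction step might look like this. It's a sketch under assumptions: the dataclass fields and regex patterns are illustrative, and the real scraper would first strip tags with BeautifulSoup rather than run regex over raw HTML:

```python
import re
from dataclasses import dataclass, field

# Deliberately loose patterns: scraping favors recall over precision,
# since a false positive is cheap and a missed email is not.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}")  # US-style numbers


@dataclass
class ScrapedContact:
    """Consistent shape for downstream stages; add a field once,
    and it flows through the whole pipeline."""
    url: str
    emails: list = field(default_factory=list)
    phones: list = field(default_factory=list)


def extract_contacts(url: str, html: str) -> ScrapedContact:
    """Pull emails and phone numbers out of page text, deduplicated."""
    return ScrapedContact(
        url=url,
        emails=sorted(set(EMAIL_RE.findall(html))),
        phones=sorted(set(PHONE_RE.findall(html))),
    )
```

Empty `emails` and `phones` lists aren't discarded; they're the signal the qualification stage reads as "no discoverable contact info."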
Stage 3: Qualify
Raw search results aren't useful. You need to know which vendors are worth reaching out to. The qualification stage runs a scoring model that evaluates each candidate on a 0-100 scale with four weighted factors:
- Rating (40%): Average review score from available sources. Higher is better, obviously, but a 4.8 with 200 reviews beats a 5.0 with three.
- Review count (20%): Volume of reviews as a signal of establishment and reliability. Normalized against the highest count in the result set.
- Distance (20%): Proximity to the target location. Closer vendors score higher. Configurable radius.
- Contact availability (20%): Did the scraper find a direct email? A phone number? Both? Vendors with no discoverable contact info score zero on this factor — you can't do outreach to someone you can't reach.
The weights are configurable. If you care more about proximity than ratings, bump distance to 40% and drop something else. The defaults reflect my own priorities: quality first, reachability second.
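The four factors above can be sketched as a single weighted sum. The weights come straight from the post; the normalization choices (rating out of 5, reviews against the set maximum, linear distance decay, half credit each for email and phone) are assumptions:

```python
DEFAULT_WEIGHTS = {"rating": 0.40, "reviews": 0.20, "distance": 0.20, "contact": 0.20}


def score_vendor(rating: float, review_count: int, max_reviews: int,
                 distance_mi: float, radius_mi: float,
                 has_email: bool, has_phone: bool,
                 weights: dict = DEFAULT_WEIGHTS) -> float:
    """Combine four factors into a 0-100 score.

    Each factor is normalized to 0-1, then multiplied by its weight.
    Pass a different weights dict to reprioritize (e.g. distance-first).
    """
    rating_f = rating / 5.0
    reviews_f = review_count / max_reviews if max_reviews else 0.0
    distance_f = max(0.0, 1.0 - distance_mi / radius_mi)  # closer is better
    contact_f = (0.5 if has_email else 0.0) + (0.5 if has_phone else 0.0)
    total = (weights["rating"] * rating_f
             + weights["reviews"] * reviews_f
             + weights["distance"] * distance_f
             + weights["contact"] * contact_f)
    return round(100 * total, 1)
```

Normalizing review count against the result set's maximum means the factor ranks vendors relative to each other, which is all a shortlist needs.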
After scoring, the pipeline produces a ranked shortlist. The top candidates float up. The ones with no contact info or poor ratings sink. You review the list and decide who to contact.
Stage 4: Outreach
The outreach stage is where the human stays firmly in the loop. You select which vendors to contact and choose an email template. The template engine renders personalized messages using {{variables}} — vendor name, business type, project details, specific notes you want to include.
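A template engine like that can be very small. This sketch assumes the `{{variable}}` syntax described above; leaving unknown placeholders intact (rather than substituting an empty string) is my own design assumption, chosen so a missing variable is visible during review instead of silently blank:

```python
import re


def render(template: str, variables: dict) -> str:
    """Replace {{name}} placeholders with values from the variables dict.

    Unknown placeholders are left as-is, so "Hi {{vendor_name}}" with no
    vendor_name stands out when you proofread the message before sending.
    """
    def sub(match: re.Match) -> str:
        key = match.group(1)
        return str(variables.get(key, match.group(0)))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", sub, template)
```

Because the human reviews every rendered message anyway, a loud failure mode beats a quiet one.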
The rendered emails go through the Gmail API via OAuth. No SMTP credentials floating around. No app passwords. Authentication is handled through the OS Keychain, same as every other credential in the system.
Each sent email gets logged to the session file. When you come back next week, you can see exactly who you contacted, when, and with what message. No accidental double-sends. No lost threads.
The architecture
The stack is deliberately simple. Python 3.9+, googlesearch-python for search, BeautifulSoup4 for scraping, and the Gmail API for outreach. No database. No web framework. No deployment infrastructure. Everything runs locally through an interactive CLI.
Session data persists as JSON files. Each research session gets its own file with a timestamp. The data model uses Python dataclasses throughout — vendors, search results, qualification scores, outreach records. Type safety without the overhead of an ORM.
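The dataclass-to-JSON round trip is just `asdict` plus `json`. A sketch, with an assumed record shape (the real session files presumably hold more fields):

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class OutreachRecord:
    """One logged send: who, where, when, with which template."""
    vendor: str
    email: str
    sent_at: str   # ISO-8601 timestamp
    template: str


def save_session(path: str, records: list) -> None:
    """Persist the session as a plain, human-inspectable JSON file."""
    with open(path, "w") as f:
        json.dump([asdict(r) for r in records], f, indent=2)


def load_session(path: str) -> list:
    """Rehydrate records; dataclass equality makes round-trips easy to verify."""
    with open(path) as f:
        return [OutreachRecord(**d) for d in json.load(f)]
```

This is the whole persistence layer: no schema migrations, no connection strings, and `cat session.json` is the query interface.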
The best architecture for a tool you'll use weekly is one you can understand in five minutes and debug in ten. Complexity is not a feature.
The pipeline is also designed to be extensible without being over-engineered. The search stage is an interface. Right now it wraps Google Search. If you wanted to add LinkedIn company search or Yelp results, you'd write a new search provider that returns the same data shape. The qualification stage doesn't care where the data came from — it just scores whatever it receives.
Same principle for outreach. Right now it's email via Gmail. If you wanted to add LinkedIn InMail or SMS, you'd write a new outreach provider. The template engine and variable system stay the same.
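One way to express that interface is a `Protocol`: any backend that returns candidate URLs for a query satisfies it, and the downstream stages never know which one ran. The names here (`SearchProvider`, `FakeProvider`, `run_search`) are illustrative, not the post's actual classes:

```python
from typing import List, Protocol


class SearchProvider(Protocol):
    """Any search backend: Google today, Yelp or LinkedIn tomorrow."""
    def search(self, query: str, limit: int) -> List[str]: ...


class FakeProvider:
    """Stand-in backend; a real provider would have the same shape."""
    def search(self, query: str, limit: int) -> List[str]:
        return [f"https://example.com/vendor/{i}" for i in range(limit)]


def run_search(provider: SearchProvider, query: str, limit: int = 5) -> List[str]:
    # The pipeline only ever sees a list of URLs, never the backend behind it.
    return provider.search(query, limit)
```

Structural typing keeps this honest: `FakeProvider` never declares the protocol, it just matches the shape, which is exactly the "same data shape" contract the text describes.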
Key decisions and tradeoffs
Free search over paid API. Slower, but returns richer data (actual URLs vs. business listings) and costs nothing. For a tool that runs dozens of queries per session, the cost savings are meaningful.
Local persistence over cloud storage. Research sessions are personal work product. They don't need to live in a database. JSON files are portable, inspectable, and trivial to back up. If the tool grows to serve a team, persistence could be swapped to a database without touching the pipeline logic.
Human-in-the-loop for outreach. The pipeline could send emails automatically to every vendor above a score threshold. It doesn't. Outreach is a relationship-building action. You should read the message before it goes out. You should decide who gets it. Automation handles the research. Judgment handles the communication.
What this replaced
Before this tool, vendor research was a half-day project. Open a browser. Search. Click through sites. Copy email addresses into a spreadsheet. Write individual messages. Send them one at a time. Lose track of who you contacted and when.
Now it's a focused session. Define the search. Let the pipeline run. Review the scored results. Select targets. Send. The whole process — from "I need a photographer" to "five personalized emails sent" — takes about twenty minutes. The boring parts are automated. The judgment calls are still yours.
That's the right balance. The machine does the work it's good at — searching, scraping, scoring, templating. The human does the work that requires taste — choosing who to contact, reviewing the message, deciding whether the tone is right for this particular vendor.
The pipeline doesn't make you faster at vendor research. It makes vendor research a thing you can actually do in the time you have.