Making Blog Search Smarter with LLMs and Open WebUI
We recently released LLM Expanded Search for our blog’s vector search. It builds on what we covered in our earlier posts about AI-powered search and vector search basics. Here’s how we built it with our internal AI setup (Open WebUI running an OpenAI-compatible API), why it makes search better, and what’s coming next.
What “LLM Expanded Search” actually does
Here’s the basic idea: when you search for something, we first ask an LLM to come up with related terms and phrases. Then we search for all of those terms, not just your original query.
- Your search gets expanded by an open-source LLM through our AI portal (Open WebUI with an OpenAI-compatible API)
- Those extra terms give our vector index more ways to find posts that match what you’re looking for
- We combine the results, remove duplicates, and sort by relevance before showing the best matches with snippets and links
This really helps with short or vague searches where regular vector search might miss relevant context. For example, "S3" refers to Amazon S3, a cloud object storage service, but on its own the term doesn't give a vector search enough context to work with. An LLM can expand this short query with terms about cloud object storage in general, providing enough context to surface the posts about S3.
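To make that concrete, here's a purely hypothetical example of what an expansion for "S3" might look like (illustrative only, not actual model output):

# Hypothetical expansion of the query "S3" -- illustrative, not real model output
expanded = [
    "Amazon S3", "AWS", "object storage", "cloud object storage",
    "S3 buckets", "blob storage", "storage service",
]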
How it works
The frontend is pretty straightforward: our search bar has two options, “Search” (just hit Enter) and “LLM Expanded Search” (Shift/Ctrl/Command+Enter).
When you use expanded search, here’s what happens:
- We call our Open WebUI endpoint with a prompt that asks for 8–15 related terms
- We turn both your original query and the expanded terms into embeddings
- We search our vector store with all these terms and combine the results
- Caching and rate limiting keep things fast and cheap (see the caching sketch after the expansion code below)
Here’s a simple example of how we expand queries:
from openai import OpenAI
import os

client = OpenAI(
    base_url=os.getenv("OPENAI_BASE_URL"),  # e.g., http://openwebui.local/api/v1
    api_key=os.getenv("OPENAI_API_KEY"),    # token managed in your environment
)

def expand_query(raw_query: str) -> list[str]:
    """Ask the LLM for a comma-separated list of related search terms."""
    messages = [
        {
            "role": "system",
            "content": (
                "You expand a short search query into a concise, comma-separated list of "
                "synonyms and closely related phrases (8–15 items). No explanations."
            ),
        },
        {"role": "user", "content": raw_query},
    ]
    res = client.chat.completions.create(
        model=os.getenv("OPENAI_MODEL", "local-llm"),
        messages=messages,
        temperature=0.2,  # keep expansions focused, not creative
        max_tokens=200,
    )
    text = res.choices[0].message.content or ""  # guard against empty responses
    return [t.strip() for t in text.split(",") if t.strip()]
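The caching mentioned above can be as simple as memoizing expansions for repeated queries. Here's a minimal in-process sketch (a real deployment might prefer a shared TTL cache; rate limiting is omitted here):

from functools import lru_cache

@lru_cache(maxsize=1024)
def _expand_normalized(q: str) -> tuple[str, ...]:
    # Tuples are hashable, so results can live in the LRU cache
    return tuple(expand_query(q))

def expand_query_cached(raw_query: str) -> list[str]:
    # Normalize so "S3" and " s3 " hit the same cache entry
    return list(_expand_normalized(raw_query.strip().lower()))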
After that, we embed the original query and the expanded terms, search the vector index, then sort by score and drop duplicates so each post appears once. Finally, we render concise snippets.
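Here's a minimal sketch of that retrieval step. It assumes a LangChain-style vector store with `similarity_search_with_score` (an assumption on our part; any store that returns scored hits works the same way), and that scores are similarities where higher is better, matching the ranking code below:

def search_expanded(vectorstore, raw_query: str, k: int = 5):
    """Run a scored similarity search for the original query plus its expansions."""
    queries = [raw_query] + expand_query_cached(raw_query)
    results = []
    for q in queries:
        # similarity_search_with_score returns [(Document, score), ...]
        results.extend(vectorstore.similarity_search_with_score(q, k=k))
    return results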
For example, after a similarity search you can rank and de-duplicate like this:
# given: results = [(doc, score), ...] from a similarity search
valid = [(d, float(s)) for d, s in results if float(s) > 0.05]  # drop very weak matches
valid.sort(key=lambda x: x[1], reverse=True)  # highest score first

seen = set()
unique = []
for doc, score in valid:
    src = doc.metadata.get("source", "")
    if src not in seen:  # keep only the best-scoring hit per post
        unique.append((doc, score))
        seen.add(src)
# unique now holds top-ranked, de-duplicated posts
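Rendering the concise snippets is then a matter of trimming each document's text. A small sketch (the `page_content` attribute follows the same LangChain-style assumption as above):

def to_snippet(doc, max_len: int = 200) -> str:
    """Trim a document's text to a short, ellipsized snippet."""
    text = " ".join(doc.page_content.split())  # collapse runs of whitespace
    if len(text) <= max_len:
        return text
    return text[:max_len].rsplit(" ", 1)[0] + "…"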
Why we chose Open WebUI
A few reasons made Open WebUI the right choice:
- It’s open source and works great self-hosted
- The OpenAI-compatible API means we can drop it into existing code
- We can use whatever models and inference backends we want
- It’s easy to experiment with different prompts and workflows
What’s next: Moving more into Open WebUI
We’re looking into moving more of the search pipeline directly into Open WebUI workflows:
- Query expansion (LLM)
- Vector retrieval (custom tool that hits our index)
This would give us tighter integration, fewer network calls, and simpler deployment, while making it easier to try new approaches.
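For the retrieval piece, Open WebUI supports custom tools written as a Python `Tools` class whose type-hinted, docstring-documented methods the model can invoke. A rough sketch of what such a tool might look like (everything here is hypothetical, including the `get_vectorstore()` helper):

class Tools:
    def search_blog_index(self, query: str) -> str:
        """
        Search the blog's vector index for posts related to the query.
        :param query: The user's search query.
        """
        # Hypothetical wiring: reuse the search_expanded() helper from earlier
        # against the same vector store that powers the site's search.
        hits = search_expanded(get_vectorstore(), query, k=5)
        return "\n".join(
            f"- {doc.metadata.get('source', '')} (score: {score:.2f})"
            for doc, score in hits
        )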
What you’ll notice when using it
- Short searches work much better: you get more relevant results and fewer dead ends
- It’s still experimental, so sometimes results might drift into related topics. Stick with regular “Search” if you want more exact matches
- We cache common terms to keep things smooth
Give it a try at our blog. Just use the search bar in our header: press Enter for regular search, or Shift/Ctrl/Command+Enter for LLM Expanded Search.
Want to know more about why we built this? Check out the announcement and vector search posts linked above.
If you’re interested in setting up LLM-expanded vector search or running something similar self-hosted with Open WebUI, we’d love to help out.