<?xml version="1.0" encoding="utf-8" standalone="yes"?><feed xmlns="http://www.w3.org/2005/Atom">
  <title></title>
  <subtitle></subtitle>
  <id>https://www.endpointdev.com/blog/tags/search/</id>
  <link href="https://www.endpointdev.com/blog/tags/search/"/>
  <link href="https://www.endpointdev.com/blog/tags/search/" rel="self"/>
  <updated>2025-09-29T00:00:00+00:00</updated>
  <author>
    <name>End Point Dev</name>
  </author>
  
    <entry>
      <title>Making Blog Search Smarter with LLMs and Open WebUI</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2025/09/llm-expanded-vector-search/"/>
      <id>https://www.endpointdev.com/blog/2025/09/llm-expanded-vector-search/</id>
      <published>2025-09-29T00:00:00+00:00</published>
      <author>
        <name>Edgar Mlowe</name>
      </author>
      <content type="html">
        &lt;p&gt;&lt;img src=&#34;/blog/2025/09/llm-expanded-vector-search/stained-glass-flowering.webp&#34; alt=&#34;An ornate pattern flowers out from a circular window in the center of the image, framing plant-shaped stained glass depicting European church images&#34;&gt;&lt;/p&gt;
&lt;!-- Photo by Seth Jensen, 2024. --&gt;
&lt;p&gt;We recently released LLM Expanded Search for our blog&amp;rsquo;s vector search. It builds on what we covered in our earlier posts about &lt;a href=&#34;/blog/2025/08/vector-search-for-the-end-point-blog/&#34;&gt;AI-powered search&lt;/a&gt; and &lt;a href=&#34;/blog/2025/07/vector-search/&#34;&gt;vector search basics&lt;/a&gt;. Here&amp;rsquo;s how we built it with our internal AI setup (Open WebUI running an OpenAI-compatible API), why it makes search better, and what&amp;rsquo;s coming next.&lt;/p&gt;
&lt;h3 id=&#34;what-llm-expanded-search-actually-does&#34;&gt;What &amp;ldquo;LLM Expanded Search&amp;rdquo; actually does&lt;/h3&gt;
&lt;p&gt;Here&amp;rsquo;s the basic idea: when you search for something, we first ask an LLM to come up with related terms and phrases. Then we search for all of those terms, not just your original query.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Your search gets expanded by an open-source LLM through our AI portal (Open WebUI with an OpenAI-compatible API)&lt;/li&gt;
&lt;li&gt;Those extra terms give our vector index more ways to find posts that match what you&amp;rsquo;re looking for&lt;/li&gt;
&lt;li&gt;We combine the results, remove duplicates, and sort by relevance before showing the best matches with snippets and links&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This really helps with short or vague searches where regular vector search might miss the relevant context. For example, &amp;ldquo;S3&amp;rdquo; refers to Amazon S3, a cloud object storage service, but the query &amp;ldquo;S3&amp;rdquo; doesn&amp;rsquo;t provide enough context on its own for a useful vector search. An LLM can expand this short query with terms about cloud object storage in general, which gives the vector search enough context to return results about S3.&lt;/p&gt;
&lt;h3 id=&#34;how-it-works&#34;&gt;How it works&lt;/h3&gt;
&lt;p&gt;The frontend is pretty straightforward: our search bar has two options, &amp;ldquo;Search&amp;rdquo; (just hit Enter) and &amp;ldquo;LLM Expanded Search&amp;rdquo; (Shift/​Ctrl/​Command+Enter).&lt;/p&gt;
&lt;p&gt;When you use expanded search, here&amp;rsquo;s what happens:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We call our Open WebUI endpoint with a prompt that asks for 8–15 related terms&lt;/li&gt;
&lt;li&gt;We turn both your original query and the expanded terms into embeddings&lt;/li&gt;
&lt;li&gt;We search our vector store with all these terms and combine the results&lt;/li&gt;
&lt;li&gt;Caching and rate limiting keep things fast and cheap&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here&amp;rsquo;s a simple example of how we expand queries:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;openai&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; OpenAI
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;os&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;client = OpenAI(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    base_url=os.getenv(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;OPENAI_BASE_URL&amp;#34;&lt;/span&gt;),   &lt;span style=&#34;color:#888&#34;&gt;# e.g., http://openwebui.local/api/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    api_key=os.getenv(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;OPENAI_API_KEY&amp;#34;&lt;/span&gt;)      &lt;span style=&#34;color:#888&#34;&gt;# token managed in your environment&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;expand_query&lt;/span&gt;(raw_query: &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;) -&amp;gt; &lt;span style=&#34;color:#038&#34;&gt;list&lt;/span&gt;[&lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    messages = [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;role&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;system&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;content&amp;#34;&lt;/span&gt;: (
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;You expand a short search query into a concise, comma-separated list of &amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;synonyms and closely related phrases (8–15 items). No explanations.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        {&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;role&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;user&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;content&amp;#34;&lt;/span&gt;: raw_query}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    res = client.chat.completions.create(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        model=os.getenv(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;OPENAI_MODEL&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;local-llm&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        messages=messages,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        temperature=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.2&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        max_tokens=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;200&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    text = res.choices[&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;].message.content
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; [t.strip() &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; t &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; text.split(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;,&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; t.strip()]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;After that, we embed the original query and the expanded terms, search the vector index, then sort by score and drop duplicates so each post appears once. Finally, we render concise snippets.&lt;/p&gt;
&lt;p&gt;For example, after a similarity search you can rank and de-duplicate like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# given: results = [(doc, score), ...]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;valid = [(d, &lt;span style=&#34;color:#038&#34;&gt;float&lt;/span&gt;(s)) &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; d, s &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; results &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;float&lt;/span&gt;(s) &amp;gt; &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.05&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;valid.sort(key=&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;lambda&lt;/span&gt; x: x[&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;], reverse=&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;True&lt;/span&gt;)  &lt;span style=&#34;color:#888&#34;&gt;# highest score first&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;seen = &lt;span style=&#34;color:#038&#34;&gt;set&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;unique = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; doc, score &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; valid:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    src = doc.metadata.get(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;source&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; src &lt;span style=&#34;color:#080&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; seen:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        unique.append((doc, score))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        seen.add(src)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# unique now holds top ranked, de‑duplicated posts&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
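&lt;p&gt;The snippet above starts from an already combined &lt;code&gt;results&lt;/code&gt; list. As a rough sketch of how results for the original query and each expanded term could be gathered, the function below runs one search per term and keeps the best score seen for each post. Here &lt;code&gt;search_fn&lt;/code&gt; is a hypothetical callable standing in for the embedding and index lookup, and it is assumed to return &lt;code&gt;(source, score)&lt;/code&gt; pairs where a higher score means a closer match.&lt;/p&gt;

```python
def merge_expanded_results(terms, search_fn, k=5):
    """Search once per term and keep each post's best score."""
    best = {}
    for term in terms:
        for source, score in search_fn(term, k):
            # Keep the highest score seen so far for this post.
            best[source] = max(score, best.get(source, score))
    # Highest score first, then trim to the top k posts.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]
```

&lt;p&gt;Keeping the maximum score per post is only one possible choice; summing or averaging scores across terms would reward posts that match several expanded terms.&lt;/p&gt;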

&lt;h3 id=&#34;why-we-chose-open-webui&#34;&gt;Why we chose Open WebUI&lt;/h3&gt;
&lt;p&gt;A few reasons made Open WebUI the right choice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It&amp;rsquo;s open source and works great self-hosted&lt;/li&gt;
&lt;li&gt;The OpenAI-compatible API means we can drop it into existing code&lt;/li&gt;
&lt;li&gt;We can use whatever models and inference backends we want&lt;/li&gt;
&lt;li&gt;It&amp;rsquo;s easy to experiment with different prompts and workflows&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;whats-next-moving-more-into-open-webui&#34;&gt;What&amp;rsquo;s next: Moving more into Open WebUI&lt;/h3&gt;
&lt;p&gt;We&amp;rsquo;re looking into moving more of the search pipeline directly into Open WebUI workflows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Query expansion (LLM)&lt;/li&gt;
&lt;li&gt;Vector retrieval (custom tool that hits our index)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This would give us tighter integration, fewer network calls, and simpler deployment, and make it easier to try new approaches.&lt;/p&gt;
&lt;h3 id=&#34;what-youll-notice-when-using-it&#34;&gt;What you&amp;rsquo;ll notice when using it&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Short searches work way better: you get more relevant results and fewer dead ends&lt;/li&gt;
&lt;li&gt;It&amp;rsquo;s still experimental, so sometimes results might drift into related topics. Stick with regular &amp;ldquo;Search&amp;rdquo; if you want more exact matches&lt;/li&gt;
&lt;li&gt;We cache common terms to keep things smooth&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Give it a try at &lt;a href=&#34;/blog/&#34;&gt;our blog&lt;/a&gt;. Just use the search bar in our header: press Enter for regular search, or Shift/​Ctrl/​Command+Enter for LLM Expanded Search.&lt;/p&gt;
&lt;p&gt;Want to know more about why we built this? Check out the announcement and vector search posts linked above.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re interested in setting up LLM-expanded vector search or running something similar self-hosted with Open WebUI, we&amp;rsquo;d love to &lt;a href=&#34;/contact/&#34;&gt;help out&lt;/a&gt;.&lt;/p&gt;

      </content>
    </entry>
  
    <entry>
      <title>Vector Search: The Future of Finding What Matters</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2025/07/vector-search/"/>
      <id>https://www.endpointdev.com/blog/2025/07/vector-search/</id>
      <published>2025-07-01T00:00:00+00:00</published>
      <author>
        <name>Kürşat Kutlu Aydemir</name>
      </author>
      <content type="html">
        &lt;p&gt;&lt;img src=&#34;/blog/2025/07/vector-search/jigsaw-piece.webp&#34; alt=&#34;A blank jigsaw puzzle with one piece missing, revealing pink beneath.&#34;&gt;&lt;br&gt;
Photo by &lt;a href=&#34;https://www.pexels.com/photo/pink-jigsaw-puzzle-piece-3482441/&#34;&gt;Ann H&lt;/a&gt; on Pexels&lt;/p&gt;
&lt;p&gt;In a world flooded with data in many formats, such as images, documents, text, and videos, traditional search methods are starting to show their age. Today, vector search is changing how we retrieve and understand information. If you wonder how Spotify can recommend the perfect song or how Google can find nearly exact image matches, vector search is kind of the wizard behind the curtain. Let&amp;rsquo;s see how it has become a game changer.&lt;/p&gt;
&lt;h3 id=&#34;what-is-vector-search&#34;&gt;What Is Vector Search?&lt;/h3&gt;
&lt;p&gt;At its core, vector search is a method of finding similar items in a dataset by representing them as vectors — essentially, lists of numbers in a multi-dimensional space. Unlike keyword-based search, which relies on exact matches or predefined rules, vector search focuses on semantic similarity. This means it can understand the meaning or context behind data, not just the words or pixels on the surface.&lt;/p&gt;
&lt;p&gt;Imagine you’re searching for &lt;code&gt;a cozy cabin in the woods&lt;/code&gt;. A traditional search might get stuck on the exact words in this query, missing a listing for something similar like &lt;code&gt;a snug retreat nestled in a forest&lt;/code&gt;. Vector search, however, can connect the dots because it understands that &lt;code&gt;cozy&lt;/code&gt; and &lt;code&gt;snug&lt;/code&gt; or &lt;code&gt;woods&lt;/code&gt; and &lt;code&gt;forest&lt;/code&gt; are conceptually close.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You can try vector search on this blog! There&amp;rsquo;s a search bar in the header at the top of the page, and you can see our announcement post &lt;a href=&#34;/blog/2025/08/vector-search-for-the-end-point-blog/&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3 id=&#34;how-does-it-work&#34;&gt;How Does It Work?&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;/blog/2025/07/vector-search/vector-search-arch.webp&#34; alt=&#34;A block flow diagram. At the start is a block labeled &amp;ldquo;Raw Data&amp;rdquo;. A red arrow labeled &amp;ldquo;Embedding Model&amp;rdquo; points to stacked blocks labeled &amp;ldquo;Embedding&amp;rdquo;. Then a red arrow points to a cylinder reading &amp;ldquo;Vector Store&amp;rdquo;. Then a green arrow points to stacked blocks reading &amp;ldquo;Found Vectors&amp;rdquo;. Also pointing to the &amp;ldquo;Vector Store&amp;rdquo; cylinder, with a blue arrow labeled &amp;ldquo;Similarity Search&amp;rdquo;, is a block labeled &amp;ldquo;Query Vector&amp;rdquo;, which has another blue arrow pointing to it from a block labeled &amp;ldquo;Query&amp;rdquo;.&#34;&gt;&lt;/p&gt;
&lt;p&gt;The magic of vector search happens in three key steps:&lt;/p&gt;
&lt;h4 id=&#34;embedding-generation&#34;&gt;Embedding Generation&lt;/h4&gt;
&lt;p&gt;Once you have access to the data sources you want to search, you need to prepare and ingest the data into your system. This phase handles data clean-up and extraction from different data sources and formats. For multimodal support you may need to include multiple data formats like text, image, audio, and video.&lt;/p&gt;
&lt;p&gt;In the embedding generation phase, as depicted above, raw data (like text, images, or audio) is transformed into vectors using suitable embedding machine learning (ML) models. These models (e.g. &lt;a href=&#34;https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2&#34;&gt;all-MiniLM-L6-v2&lt;/a&gt;, &lt;a href=&#34;https://huggingface.co/sentence-transformers/all-mpnet-base-v2&#34;&gt;all-mpnet-base-v2&lt;/a&gt; for text, or &lt;a href=&#34;https://huggingface.co/docs/transformers/en/model_doc/resnet&#34;&gt;ResNet&lt;/a&gt; for images) analyze the data and spit out a numerical representation, a vector, that captures its essence. For example, the sentence &lt;code&gt;I love sunny hikes&lt;/code&gt; might become something like &lt;code&gt;[-0.07571851, -0.02147608, 0.07130147, 0.1087752, 0.02052169, ...]&lt;/code&gt;, where each number reflects a feature of its meaning.&lt;/p&gt;
&lt;h4 id=&#34;storing-vectors&#34;&gt;Storing Vectors&lt;/h4&gt;
&lt;p&gt;The generated vectors are then stored in a vector database or an index optimized for fast retrieval. Think of it as a massive, multi-dimensional map where every point represents an item or a chunk of an item. There are several options to choose from, such as &lt;a href=&#34;https://faiss.ai/&#34;&gt;FAISS&lt;/a&gt;, &lt;a href=&#34;https://www.trychroma.com/&#34;&gt;Chroma&lt;/a&gt;, &lt;a href=&#34;https://milvus.io/&#34;&gt;Milvus&lt;/a&gt;, and &lt;a href=&#34;https://github.com/pgvector/pgvector&#34;&gt;pgvector&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id=&#34;similarity-search&#34;&gt;Similarity Search&lt;/h4&gt;
&lt;p&gt;When you use this system with a query like &lt;code&gt;find me green hoodies&lt;/code&gt;, it converts your input into a vector too. Then, it searches the vector store for the closest points using a similarity or distance measure such as &lt;a href=&#34;https://en.wikipedia.org/wiki/Cosine_similarity&#34;&gt;Cosine Similarity&lt;/a&gt; or &lt;a href=&#34;https://en.wikipedia.org/wiki/Euclidean_distance&#34;&gt;Euclidean Distance&lt;/a&gt;. The closer the vectors, the more similar the items.&lt;/p&gt;
&lt;p&gt;The result? You get matches that feel right, even if they don’t share exact keywords or pixel patterns.&lt;/p&gt;
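&lt;p&gt;To make the two measures concrete, here is a small plain-Python sketch with made-up two-dimensional vectors. The vectors are illustrative only, and the cosine function assumes neither input is a zero vector.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between the points a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 1.0]
same_direction = [3.0, 3.0]       # points the same way, but is longer
different_direction = [1.0, 0.0]  # nearby point, different direction

for name, vec in [("same_direction", same_direction),
                  ("different_direction", different_direction)]:
    print(name, cosine_similarity(query, vec), euclidean_distance(query, vec))
```

&lt;p&gt;Cosine similarity ranks &lt;code&gt;same_direction&lt;/code&gt; first (1.0 versus about 0.71) because it only looks at direction, while Euclidean distance ranks &lt;code&gt;different_direction&lt;/code&gt; first (1.0 versus about 2.83) because it also accounts for magnitude. Which one fits better depends on the embedding model and on whether the vectors are normalized.&lt;/p&gt;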
&lt;h4 id=&#34;a-simple-vector-search-implementation&#34;&gt;A Simple Vector Search Implementation&lt;/h4&gt;
&lt;p&gt;Here, to demonstrate how vector search works, I implemented a simple vector search based on this architecture, using only NumPy rather than a dedicated vector search library. For production-ready applications the best practice is to use well-known implementations and tools; this is just for demonstration.&lt;/p&gt;
&lt;p&gt;Below is a very simple vector store implementation:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;numpy&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;as&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;typing&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; List, Tuple
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;SimpleVectorStore&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;__init__&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;Initialize&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.vectors = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.ids = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;add_vectors&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, vectors: List[List[&lt;span style=&#34;color:#038&#34;&gt;float&lt;/span&gt;]], ids: List[&lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;]):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;Add vectors to vector store&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;len&lt;/span&gt;(vectors) != &lt;span style=&#34;color:#038&#34;&gt;len&lt;/span&gt;(ids):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;raise&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;ValueError&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Number of vectors must match number of IDs&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        vectors = np.array(vectors)
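&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.vectors = [*&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.vectors, *vectors]  &lt;span style=&#34;color:#888&#34;&gt;# also works once self.vectors is an ndarray&lt;/span&gt;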
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.ids.extend(ids)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.vectors = np.array(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.vectors)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;cosine_similarity&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, v1: np.ndarray, v2: np.ndarray) -&amp;gt; &lt;span style=&#34;color:#038&#34;&gt;float&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;Calculate cosine similarity between two vectors&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# zero vectors&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; np.all(v1 == &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;) &lt;span style=&#34;color:#080&#34;&gt;or&lt;/span&gt; np.all(v2 == &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        dot_product = np.dot(v1, v2)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        norm_v1 = np.linalg.norm(v1)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        norm_v2 = np.linalg.norm(v2)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; dot_product / (norm_v1 * norm_v2)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;search&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, query_vector: List[&lt;span style=&#34;color:#038&#34;&gt;float&lt;/span&gt;], k: &lt;span style=&#34;color:#038&#34;&gt;int&lt;/span&gt; = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;5&lt;/span&gt;) -&amp;gt; List[Tuple[np.ndarray, &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;, &lt;span style=&#34;color:#038&#34;&gt;float&lt;/span&gt;]]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;Return the k most similar entries as (vector, id, similarity) tuples&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;len&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.vectors) == &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        query_vector = np.array(query_vector)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        similarities = [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            (&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.vectors[i], &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.ids[i], &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.cosine_similarity(query_vector, vec))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; i, vec &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;enumerate&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.vectors)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# sort and return&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        similarities.sort(key=&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;lambda&lt;/span&gt; x: x[&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;], reverse=&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;True&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; similarities[:k]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now, let&amp;rsquo;s store some sample vectors (so-called embeddings) in this simple vector store and perform an example search. In real-world applications these vectors would be produced from the raw data by an embedding model, but I used dummy embedding values in this example and skipped the embedding model phase to simplify the code.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;__name__&lt;/span&gt; == &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;__main__&amp;#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#888&#34;&gt;# vector store instance&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    vector_store = SimpleVectorStore()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#888&#34;&gt;# example embeddings&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    sample_embeddings = [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        [&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;], &lt;span style=&#34;color:#888&#34;&gt;# Vector 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        [&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;], &lt;span style=&#34;color:#888&#34;&gt;# Vector 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        [&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;], &lt;span style=&#34;color:#888&#34;&gt;# Vector 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        [&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1.0&lt;/span&gt;], &lt;span style=&#34;color:#888&#34;&gt;# Vector 4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    sample_ids = [&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;doc1&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;doc2&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;doc3&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;doc4&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#888&#34;&gt;# add sample vectors to vector store&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    vector_store.add_vectors(sample_embeddings, sample_ids)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#888&#34;&gt;# query vector&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    query = [&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.5&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#888&#34;&gt;# search for top k = 2 results&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    top_k = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    results = vector_store.search(query, k=top_k)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#038&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Query vector:&amp;#34;&lt;/span&gt;, query)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#038&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Top &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;top_k&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt; similar vectors:&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; vector, doc_id, similarity &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; results:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;DocId: &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;doc_id&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;, Vector: &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;vector&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;, Similarity: &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;similarity&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;.4f&lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The search produces the following output:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-plain&#34; data-lang=&#34;plain&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Query vector: [1.0, 0.5, 0.0]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Top 2 similar vectors:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;DocId: doc3, Vector: [1. 1. 0.], Similarity: 0.9487
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;DocId: doc1, Vector: [1. 0. 0.], Similarity: 0.8944&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now compare the vectors in the results with the query vector. A higher similarity score means the vector points in a direction closer to that of the query vector, so it is considered more similar to the query.&lt;/p&gt;
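&lt;p&gt;To see where these scores come from, here is a minimal sketch that recomputes them in plain Python, without NumPy. The &lt;code&gt;cosine_similarity&lt;/code&gt; function below is a standalone helper written for this illustration, not the &lt;code&gt;SimpleVectorStore&lt;/code&gt; method itself. It applies the standard cosine similarity formula (dot product divided by the product of the vector lengths) to the same dummy vectors.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Same dummy query and embeddings as in the example above.
query = [1.0, 0.5, 0.0]
docs = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.0, 1.0, 0.0],
    "doc3": [1.0, 1.0, 0.0],
    "doc4": [0.0, 0.0, 1.0],
}

# Score every document, then rank from most to least similar.
scores = {doc_id: cosine_similarity(query, vec) for doc_id, vec in docs.items()}
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for doc_id, score in ranked:
    print(f"{doc_id}: {score:.4f}")
```

&lt;p&gt;The first two lines printed are doc3 (0.9487) and doc1 (0.8944), matching the search output. doc2 scores 0.4472 and doc4 scores 0.0000, because doc4 is perpendicular to the query.&lt;/p&gt;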
&lt;h3 id=&#34;why-vector-search-is-powerful&#34;&gt;Why Vector Search is Powerful&lt;/h3&gt;
&lt;p&gt;Vector search offers several advantages over traditional search methods. First of all, it has a degree of semantic understanding by focusing on contextual meaning rather than keywords: vector search handles synonyms, misspellings, and contextual nuances. For instance, searching &lt;code&gt;car&lt;/code&gt; could return results about &lt;code&gt;automobiles&lt;/code&gt; or &lt;code&gt;vehicles&lt;/code&gt; (assuming you have a well-trained model).&lt;/p&gt;
&lt;p&gt;Multimodal capability is another feature which is commonly used today. Vector search isn’t limited to text. It can process images, audio, video, or even combinations of these. For example, you could search for &lt;code&gt;sunset photos&lt;/code&gt; using a text query and retrieve visually similar images.&lt;/p&gt;
&lt;p&gt;Vector search can also incorporate user preferences or behavior, such as past searches, into the vector space to deliver personalized results.&lt;/p&gt;
&lt;h3 id=&#34;real-world-examples&#34;&gt;Real-World Examples&lt;/h3&gt;
&lt;p&gt;Vector search is already everywhere:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Retrieval-Augmented Generation (RAG):&lt;/p&gt;
&lt;p&gt;Regardless of the industry that a RAG application is designed for, vector search is essential in the retrieval (&amp;ldquo;R&amp;rdquo;) phase: it is used to find relevant data for a query to build a context for the LLM.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Ecommerce:&lt;/p&gt;
&lt;p&gt;Vector search is used for recommending products based on user queries or visual similarity (e.g. &amp;ldquo;find shoes like these&amp;rdquo; using an image). Example: Searching for &amp;ldquo;red sneakers&amp;rdquo; might return visually similar items even if the product description doesn’t mention &amp;ldquo;red sneakers&amp;rdquo;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Content Discovery:&lt;/p&gt;
&lt;p&gt;Used for powering recommendation engines for streaming platforms (e.g. Netflix, Spotify) by finding movies, songs, or articles similar to a user’s interests. Example: Suggesting a sci-fi movie based on a user’s love for &amp;ldquo;Star Wars&amp;rdquo;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Customer Support:&lt;/p&gt;
&lt;p&gt;Used for enabling chatbots to retrieve relevant knowledge base articles or FAQs by understanding the intent behind a user’s question. Example: A query like &amp;ldquo;how to reset my device&amp;rdquo; could pull up guides even if phrased differently.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Image and Video Search:&lt;/p&gt;
&lt;p&gt;Allows users to search for visually similar images or videos, such as finding artwork or stock footage that matches a specific style. Example: Uploading a photo of a beach to find similar vacation destinations.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Enterprise Search:&lt;/p&gt;
&lt;p&gt;Helps employees find documents, emails, or internal resources by understanding the context of their queries. Example: Searching &amp;ldquo;project timeline&amp;rdquo; could retrieve relevant spreadsheets or emails.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Healthcare:&lt;/p&gt;
&lt;p&gt;Used for matching patient records, medical images, or research papers based on semantic similarity to aid diagnosis or treatment planning. Example: Finding studies related to a specific disease even if terminology varies.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
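&lt;p&gt;As a rough illustration of the retrieval phase mentioned in the RAG item, here is a minimal sketch that picks the most similar text chunks and pastes them into a prompt. Everything in it is an assumption made for the example: the &lt;code&gt;knowledge_base&lt;/code&gt; contents, the dummy embedding values, and the &lt;code&gt;retrieve&lt;/code&gt; and &lt;code&gt;build_prompt&lt;/code&gt; helpers. A real application would get its embeddings from an embedding model and send the prompt to an LLM.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical knowledge base: (text chunk, dummy embedding) pairs.
knowledge_base = [
    ("Hold the power button for ten seconds to reset the device.", [0.9, 0.1, 0.0]),
    ("Our store ships orders within two business days.", [0.0, 0.9, 0.1]),
    ("A factory reset erases all saved settings.", [0.8, 0.0, 0.2]),
]


def retrieve(query_embedding, k=2):
    """Return the k text chunks whose embeddings are closest to the query."""
    scored = [
        (cosine_similarity(query_embedding, embedding), text)
        for text, embedding in knowledge_base
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_prompt(question, query_embedding, k=2):
    """Assemble retrieved chunks into a context block for an LLM prompt."""
    context = "\n".join(f"- {chunk}" for chunk in retrieve(query_embedding, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Dummy vector standing in for the embedded user question.
question_embedding = [1.0, 0.1, 0.0]
prompt = build_prompt("How do I reset my device?", question_embedding)
print(prompt)
```

&lt;p&gt;With these dummy values, the two reset-related chunks end up in the prompt and the shipping chunk is left out, which is the whole point of the retrieval step: the LLM only sees context that is close to the question.&lt;/p&gt;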
&lt;h3 id=&#34;challenges-ahead&#34;&gt;Challenges Ahead&lt;/h3&gt;
&lt;p&gt;Computational cost is a real concern when performing similarity search on large datasets, but I don&amp;rsquo;t consider it the main challenge.&lt;/p&gt;
&lt;p&gt;The main challenges in vector search seem to revolve around interpretability and data quality, as is common in ML. Unlike keyword search, where matches are explicit, vector search operates as a black box, making it harder to explain why certain results were returned.&lt;/p&gt;
&lt;p&gt;As for data quality, the adage applies: garbage in, garbage out. If the input data is noisy or incomplete, the embeddings may not accurately represent the content. Extracting data from certain formats (e.g. tables) and creating embeddings for it is another challenge, and retrieving accurate values from table data is especially hard. At End Point we are using advanced methods and LLM services to overcome these challenges when preparing quality embeddings.&lt;/p&gt;
&lt;h3 id=&#34;wrapping-up&#34;&gt;Wrapping Up&lt;/h3&gt;
&lt;p&gt;Vector search isn’t just a tech buzzword — it’s a fundamental shift in how we interact with data. By moving beyond rigid keywords to a world of meaning and similarity, it’s unlocking possibilities we’re only beginning to explore. Whether you’re a developer building the next big app or just a curious soul, vector search is worth keeping an eye on. It’s not about finding exactly what you typed — it’s about finding exactly what you meant.&lt;/p&gt;

      </content>
    </entry>
  
    <entry>
      <title>Building Ecommerce Search Using Algolia</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2023/10/building-ecommerce-search-using-algolia/"/>
      <id>https://www.endpointdev.com/blog/2023/10/building-ecommerce-search-using-algolia/</id>
      <published>2023-10-12T00:00:00+00:00</published>
      <author>
        <name>Dylan Wooters</name>
      </author>
      <content type="html">
        &lt;p&gt;&lt;img src=&#34;/blog/2023/10/building-ecommerce-search-using-algolia/east-bay-hills.webp&#34; alt=&#34;Looking east from the top of the Berkeley hills over the Briones Reservoir. Rolling hills are seen in the distance with the sun setting to the west.&#34;&gt;&lt;/p&gt;
&lt;!-- Photo by Dylan Wooters, 2020 --&gt;
&lt;p&gt;A common request that developers receive when embarking on a new website project is for the website to have &amp;ldquo;Google-like search.&amp;rdquo; For many years, this meant writing custom code to replicate the intelligent and user-friendly aspects of Google search, which was no easy feat. However, now we have many search-as-a-service offerings that do the hard work for us and make this process much easier.&lt;/p&gt;
&lt;p&gt;In this blog post, we’ll dive into one of these search-as-a-service platforms, &lt;a href=&#34;https://www.algolia.com/&#34;&gt;Algolia&lt;/a&gt;. We recently worked on an ecommerce website and used Algolia in an interesting way, both as a search engine and as a lightweight backend database to hold product data managed in Salesforce. Algolia worked beautifully, offering users fast and accurate search results, and also allowing us to launch the site within a relatively short time frame.&lt;/p&gt;
&lt;p&gt;We will look at how to load Algolia with data, configure search options, and connect the search to the frontend using Algolia’s Vue library.&lt;/p&gt;
&lt;h3 id=&#34;loading-the-index-with-data&#34;&gt;Loading the index with data&lt;/h3&gt;
&lt;p&gt;To start using Algolia’s search, you need to load up an index with data. You have the option of manually uploading a JSON file, or using Algolia’s API to programmatically load records. For our backend, we chose to use Algolia&amp;rsquo;s &lt;a href=&#34;https://www.npmjs.com/package/algoliasearch&#34;&gt;JavaScript API client&lt;/a&gt; in some lightweight TypeScript scripts that are triggered by cron. These scripts allowed us to sync inventory data between Salesforce and the index in Algolia.&lt;/p&gt;
&lt;p&gt;Using the Algolia JavaScript client is quite simple. Regardless of where your data comes from—be it in a database, a platform like Salesforce, or elsewhere—once it is in JSON format, you can load it into Algolia with a few lines of code:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-javascript&#34; data-lang=&#34;javascript&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; * as algolia from &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;algoliasearch&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;const&lt;/span&gt; products = [{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    name: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Fender F-5 Acoustic&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    make: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Fender&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    model: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;F-5&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    category: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Guitars&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    status: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Used&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    objectID: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Fender-001&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  }, {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    name: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Fender Player Jaguar&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    make: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Fender&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    model: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Jaguar (Player)&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    category: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Guitars&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    status: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;New&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    objectID: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Fender-002&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  }];
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;const&lt;/span&gt; index = algolia.&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;default&lt;/span&gt;(process.env.ALGOLIA_APP_ID, process.env.ALGOLIA_API_KEY).initIndex(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;store_products&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;await&lt;/span&gt; index.saveObjects(products);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Note the &lt;code&gt;objectID&lt;/code&gt; property, which is used by Algolia as a primary key. If the &lt;code&gt;objectID&lt;/code&gt; does not exist, a new record will be created. If it does exist, the record will be updated. This makes it easy to run a data sync process using a single &lt;code&gt;saveObjects&lt;/code&gt; command, without having to worry about differentiating between create and update operations.&lt;/p&gt;
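&lt;p&gt;The create-or-update behavior keyed on &lt;code&gt;objectID&lt;/code&gt; can be pictured with a small in-memory model. This is only a sketch in Python using a plain dictionary. It does not call Algolia, and the &lt;code&gt;save_objects&lt;/code&gt; function and the third product here are hypothetical stand-ins for illustration.&lt;/p&gt;

```python
def save_objects(index, records):
    """Model of upsert-by-objectID on a dict that stands in for an index.

    A record whose objectID is new gets created; a record whose objectID
    already exists replaces the stored one.
    """
    created, updated = [], []
    for record in records:
        object_id = record["objectID"]
        if object_id in index:
            updated.append(object_id)
        else:
            created.append(object_id)
        index[object_id] = dict(record)
    return created, updated


index = {}

# First sync: both records are new, so both are created.
first_sync = save_objects(index, [
    {"objectID": "Fender-001", "name": "Fender F-5 Acoustic", "status": "Used"},
    {"objectID": "Fender-002", "name": "Fender Player Jaguar", "status": "New"},
])

# Second sync: one existing record changes, one hypothetical record is new.
second_sync = save_objects(index, [
    {"objectID": "Fender-002", "name": "Fender Player Jaguar", "status": "Used"},
    {"objectID": "Fender-003", "name": "Hypothetical Bass", "status": "New"},
])

print(first_sync)
print(second_sync)
```

&lt;p&gt;The calling code never has to decide between create and update. It hands over the full list of records on every sync and the key does the rest.&lt;/p&gt;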
&lt;h3 id=&#34;configuring-the-index&#34;&gt;Configuring the index&lt;/h3&gt;
&lt;p&gt;Once you have your index loaded, you’ll want to configure it. Algolia does a good job of walking you through this process using a built-in tutorial when you first load your index. Basically, you will be selecting the &lt;a href=&#34;https://www.algolia.com/doc/guides/sending-and-managing-data/prepare-your-data/how-to/setting-searchable-attributes/&#34;&gt;searchable properties/​attributes&lt;/a&gt; from your JSON data, setting how results are ranked and sorted, and adjusting more advanced aspects of search like typo tolerance, stop words, etc.&lt;/p&gt;
&lt;p&gt;An important feature we utilized on our recent project is &lt;a href=&#34;https://www.algolia.com/doc/guides/managing-results/refine-results/faceting/&#34;&gt;faceting&lt;/a&gt;. Faceting allows users to easily drill down and refine search by categories, and is also easy to develop using the handy frontend libraries that Algolia provides (more on that in the next section). This feature is powerful and can be used to both refine search and drive homepage category/​subcategory links. When you configure your index, you can select which attributes of your data should be used for faceting.&lt;/p&gt;
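&lt;p&gt;Conceptually, faceting boils down to two operations: counting how many records fall under each value of an attribute, and narrowing the result set to one of those values. The sketch below models that idea in plain Python. It is not Algolia code, and the &lt;code&gt;facet_counts&lt;/code&gt; and &lt;code&gt;refine&lt;/code&gt; helpers and the amplifier product are made up for the example.&lt;/p&gt;

```python
from collections import Counter

# Sample records; the amplifier is a hypothetical addition for this sketch.
products = [
    {"name": "Fender F-5 Acoustic", "category": "Guitars", "status": "Used"},
    {"name": "Fender Player Jaguar", "category": "Guitars", "status": "New"},
    {"name": "Hypothetical Practice Amp", "category": "Amplifiers", "status": "New"},
]


def facet_counts(records, attribute):
    """Count how many records carry each value of the given attribute."""
    return Counter(record[attribute] for record in records)


def refine(records, attribute, value):
    """Keep only the records whose attribute equals the chosen facet value."""
    return [record for record in records if record[attribute] == value]


# Counts like these are what a category sidebar would display.
print(facet_counts(products, "category"))

# Selecting a facet value narrows the results.
new_items = refine(products, "status", "New")
print([record["name"] for record in new_items])
```

&lt;p&gt;Refinements can be chained, for example by calling &lt;code&gt;refine&lt;/code&gt; on &lt;code&gt;category&lt;/code&gt; and then on &lt;code&gt;status&lt;/code&gt;, which mirrors how a user drills down through a category menu.&lt;/p&gt;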
&lt;h3 id=&#34;setting-up-search-on-the-frontend&#34;&gt;Setting up search on the frontend&lt;/h3&gt;
&lt;p&gt;We used &lt;a href=&#34;https://nuxt.com/&#34;&gt;Nuxt&lt;/a&gt; to build the frontend of the website, and we leveraged Algolia’s &lt;a href=&#34;https://www.algolia.com/doc/guides/building-search-ui/what-is-instantsearch/vue/&#34;&gt;Vue InstantSearch&lt;/a&gt; library for the UI. This library really speeds along development, as it wraps all of the search-related functionality in simple widgets, providing the search bar, results, refinements, filtering, and more.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://www.algolia.com/doc/api-reference/widgets/instantsearch/vue/&#34;&gt;&lt;code&gt;ais-instant-search&lt;/code&gt;&lt;/a&gt; widget is the parent widget. It serves the search state to its children, which allows you to show the search bar, search hits, hierarchical menus, etc. Here is a simple example of the &lt;code&gt;ais-instant-search&lt;/code&gt; widget with a search bar and hits (pulled directly from &lt;a href=&#34;https://www.algolia.com/doc/guides/building-search-ui/getting-started/vue/&#34;&gt;Algolia’s Vue docs&lt;/a&gt;):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-html&#34; data-lang=&#34;html&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;lt;&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;template&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &amp;lt;&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;ais-instant-search&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;:search-client&lt;/span&gt;=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;searchClient&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;index-name&lt;/span&gt;=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;demo_ecommerce&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &amp;lt;&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;ais-search-box&lt;/span&gt; /&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &amp;lt;&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;ais-hits&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &amp;lt;&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;template&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;v-slot:item&lt;/span&gt;=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;{ item }&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &amp;lt;&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;h2&lt;/span&gt;&amp;gt;{{ item.name }}&amp;lt;/&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;h2&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &amp;lt;/&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;template&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &amp;lt;/&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;ais-hits&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &amp;lt;/&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;ais-instant-search&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;lt;/&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;template&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;lt;&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;script&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; algoliasearch from &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;algoliasearch/lite&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;instantsearch.css/themes/algolia-min.css&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;export&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;default&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  data() {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      searchClient: algoliasearch(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;[Your app ID]&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;[Your API key]&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      ),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    };
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;};
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;lt;/&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;script&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;lt;&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;style&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;body&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;font-family&lt;/span&gt;: &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;sans-serif&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;padding&lt;/span&gt;: &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#888;font-weight:bold&#34;&gt;em&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;lt;/&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;style&lt;/span&gt;&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id=&#34;using-faceting-for-search-refinement-and-filtering&#34;&gt;Using faceting for search refinement and filtering&lt;/h3&gt;
&lt;p&gt;I mentioned faceting above when discussing how to configure your index. Once you have selected the attributes in your JSON data that can be used for faceting (e.g., category, subcategory), you can feed those attributes to the &lt;a href=&#34;https://www.algolia.com/doc/api-reference/widgets/hierarchical-menu/vue/&#34;&gt;&lt;code&gt;ais-hierarchical-menu&lt;/code&gt;&lt;/a&gt; widget for display on the frontend.&lt;/p&gt;
&lt;p&gt;Here is a bit of sample code from the website we built, which offers expandable category refinement via &lt;code&gt;ais-hierarchical-menu&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-html&#34; data-lang=&#34;html&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;lt;&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;div&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;class&lt;/span&gt;=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;sidebar-segment&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &amp;lt;&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;p&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;class&lt;/span&gt;=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;sidebar-segment-title&amp;#34;&lt;/span&gt;&amp;gt;Category&amp;lt;/&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;p&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &amp;lt;&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;ais-hierarchical-menu&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#369&#34;&gt;:limit&lt;/span&gt;=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;100&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#369&#34;&gt;:attributes&lt;/span&gt;=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;categoryAttrs&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#369&#34;&gt;:sort-by&lt;/span&gt;=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;hierarchicalMenuSort&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &amp;lt;&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;div&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;slot-scope&lt;/span&gt;=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;{ items, refine, createURL }&amp;#34;&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &amp;lt;&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;hierarchical-menu-list&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#369&#34;&gt;:items&lt;/span&gt;=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;items&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#369&#34;&gt;:refine&lt;/span&gt;=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;refine&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#369&#34;&gt;:create-url&lt;/span&gt;=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;createURL&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      /&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &amp;lt;/&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;div&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &amp;lt;/&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;ais-hierarchical-menu&lt;/span&gt;&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;lt;/&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;div&lt;/span&gt;&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Above, we are using the &lt;code&gt;limit&lt;/code&gt; property to set the maximum number of items to 100. The &lt;code&gt;attributes&lt;/code&gt; property, which targets the JSON attributes in your index data that represent your categories, is set to the following:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-javascript&#34; data-lang=&#34;javascript&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;categoryAttrs: [&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;categories.lvl0&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;categories.lvl1&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;categories.lvl2&amp;#39;&lt;/span&gt;],&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;These represent the three levels of categories in our data (zero-based), and are used to build the hierarchical menu.&lt;/p&gt;
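&lt;p&gt;As a rough illustration of what those attributes might hold, here is a minimal sketch that builds the three levels from a category path. The &lt;code&gt;buildCategoryLevels&lt;/code&gt; helper and the sample category names are hypothetical, not taken from the site. It assumes the default &lt;code&gt; &amp;gt; &lt;/code&gt; separator used by Algolia’s hierarchical menu, so adjust it if your index uses a different one.&lt;/p&gt;

```javascript
// Hypothetical helper: turn a category path into hierarchical levels.
// Each level stores the full path up to that depth, joined by the separator.
function buildCategoryLevels(path, separator = ' > ') {
  const categories = {}
  path.forEach((_, depth) => {
    categories[`lvl${depth}`] = path.slice(0, depth + 1).join(separator)
  })
  return { categories }
}

// Sample data for illustration only
const record = buildCategoryLevels(['Equipment', 'Cranes', 'Crawler Cranes'])
console.log(record.categories)
```

&lt;p&gt;Storing the full path at every level is what lets the menu tell apart two subcategories that share a name but sit under different parents.&lt;/p&gt;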
&lt;p&gt;Finally, the &lt;code&gt;sort-by&lt;/code&gt; property points to a simple function that uses the JavaScript &lt;code&gt;localeCompare&lt;/code&gt; method to provide natural, case-insensitive alphanumeric sorting:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-javascript&#34; data-lang=&#34;javascript&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;function&lt;/span&gt; hierarchicalMenuSort(a, b) {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; a.name.localeCompare(b.name, &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;undefined&lt;/span&gt;, {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    numeric: &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;true&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    sensitivity: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;base&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  })
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
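&lt;p&gt;To see what these options do, here is a small sketch that applies the same comparator to a few made-up items (the names are invented for illustration). With &lt;code&gt;numeric: true&lt;/code&gt;, runs of digits are compared as numbers, and with &lt;code&gt;sensitivity: &amp;#39;base&amp;#39;&lt;/code&gt;, differences in case and accents are ignored.&lt;/p&gt;

```javascript
// Same comparator as above, repeated so the sketch is self-contained
function hierarchicalMenuSort(a, b) {
  return a.name.localeCompare(b.name, undefined, {
    numeric: true,
    sensitivity: 'base',
  })
}

// Made-up items; a plain string sort would put 'Size 10' before 'size 2'
const items = [{ name: 'Size 10' }, { name: 'size 2' }, { name: 'Size 1' }]

// Copy before sorting so the original array is left untouched
const sorted = [...items].sort(hierarchicalMenuSort).map((item) => item.name)
console.log(sorted) // [ 'Size 1', 'size 2', 'Size 10' ]
```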

&lt;h3 id=&#34;wrapping-up-seeing-algolia-in-action&#34;&gt;Wrapping up: seeing Algolia in action&lt;/h3&gt;
&lt;p&gt;If you’d like to see Algolia in action on the site that we built, head over to &lt;a href=&#34;https://www.eiffeltrading.com/&#34;&gt;eiffeltrading.com&lt;/a&gt;. If you search the site, you’ll see autocomplete, fast results (thanks to both Algolia and Nuxt), and other aspects of good search that we have all come to expect from modern ecommerce sites.&lt;/p&gt;
&lt;p&gt;Next time you are faced with a build involving full-text search, consider search-as-a-service offerings like Algolia. They could save you time and headaches over rolling your own search functionality. Sometimes it’s good to let others do the hard work!&lt;/p&gt;
&lt;p&gt;Have questions or feedback on the topic? Let us know in the comments section below.&lt;/p&gt;

      </content>
    </entry>
  
    <entry>
      <title>Full-text search on a budget: Xapian</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2021/08/full-text-search-xapian/"/>
      <id>https://www.endpointdev.com/blog/2021/08/full-text-search-xapian/</id>
      <published>2021-08-19T00:00:00+00:00</published>
      <author>
        <name>Marco Pessotto</name>
      </author>
      <content type="html">
        &lt;p&gt;&lt;img src=&#34;/blog/2021/08/full-text-search-xapian/looking-glass.jpg&#34; alt=&#34;Mounted telescope monocular pointing through a wire fence&#34;&gt;&lt;/p&gt;
&lt;!-- Photo from Pixabay: https://pixabay.com/photos/looking-glass-magnification-glass-919017/ --&gt;
&lt;p&gt;Over the years I’ve seen and implemented different full-text search
applications using various technologies: plain SQL,
&lt;a href=&#34;https://www.postgresql.org/docs/13/textsearch.html&#34;&gt;PostgreSQL&lt;/a&gt;,
&lt;a href=&#34;https://www.elastic.co/elasticsearch/&#34;&gt;Elasticsearch&lt;/a&gt;,
&lt;a href=&#34;https://solr.apache.org/&#34;&gt;Solr&lt;/a&gt;, and most recently
&lt;a href=&#34;https://xapian.org/&#34;&gt;Xapian&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;While Solr and Elasticsearch are very well known, Xapian, despite the fact
that it’s available and packaged in all the major GNU/Linux
distributions, doesn’t seem to be so popular, at least not among project
managers.&lt;/p&gt;
&lt;p&gt;But Xapian is fast and advanced, can be configured to do faceted searches
(so the user can filter the search results), and, my favorite, is fast
to build and has virtually no maintenance overhead.&lt;/p&gt;
&lt;p&gt;Its main feature is that it’s not a stand-alone application, like Solr
or Elasticsearch, but instead it’s a library written in C++ which has
bindings for all the major languages (as advertised on its
&lt;a href=&#34;https://xapian.org/&#34;&gt;homepage&lt;/a&gt;). It also has great
&lt;a href=&#34;https://github.com/xapian/xapian-docsprint&#34;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Now, being in the e-commerce business, my typical use-case is that the
client’s shop needs something faster and better than a search using a
SQL query against the products table. And beware, even implementing a
non-trivial SQL-based search can burn more hours than setting up
Xapian.&lt;/p&gt;
&lt;p&gt;With Xapian you can prototype very quickly, without losing hours
wading through obscure options, setting up services and configuring
firewalls. And yet, the prototype will allow you to build more
advanced features once you need them.&lt;/p&gt;
&lt;p&gt;I’m a Perl guy, so I will show you some Perl code, but the procedure
is the same for the other languages. Even the
documentation can be built specifically for your language!&lt;/p&gt;
&lt;p&gt;Typically, to add a search engine to your site you need two pieces: an
indexer to which you feed the data (from static files, databases,
remote pages, or whatever you need) and the search code itself
on the site.&lt;/p&gt;
&lt;p&gt;Both the indexer and the search code need to load the Xapian library
and point to the same Xapian database, which is usually a directory
(or a file pointing to a directory).&lt;/p&gt;
&lt;h3 id=&#34;indexing&#34;&gt;Indexing&lt;/h3&gt;
&lt;p&gt;Now, stripped down to the minimum, this is what a typical indexer&amp;rsquo;s code
looks like:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-perl&#34; data-lang=&#34;perl&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;#!/usr/bin/env perl&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;use&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;utf8&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;use&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;strict&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;use&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;warnings&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;use&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;Search::Xapian&lt;/span&gt; (&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;:all&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;use&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;JSON&lt;/span&gt; &lt;span style=&#34;color:#2b2;background-color:#f0fff0&#34;&gt;qw/encode_json/&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;my&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;$dblocation&lt;/span&gt; = &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;xapiandb&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;my&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;$xapian&lt;/span&gt; = &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;Search::Xapian::WritableDatabase&lt;/span&gt;-&amp;gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;new&lt;/span&gt;(&lt;span style=&#34;color:#369&#34;&gt;$dblocation&lt;/span&gt;, DB_CREATE_OR_OPEN);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;my&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;$indexer&lt;/span&gt; = &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;Search::Xapian::TermGenerator&lt;/span&gt;-&amp;gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;new&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#369&#34;&gt;$indexer&lt;/span&gt;-&amp;gt;set_database(&lt;span style=&#34;color:#369&#34;&gt;$xapian&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#369&#34;&gt;$indexer&lt;/span&gt;-&amp;gt;set_stemmer(&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;Search::Xapian::Stem&lt;/span&gt;-&amp;gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;new&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;english&amp;#34;&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;my&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;@entries&lt;/span&gt; = ({ uri =&amp;gt; &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;/blog/1&amp;#39;&lt;/span&gt;, title =&amp;gt; &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;T1&amp;#39;&lt;/span&gt;, text =&amp;gt; &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;Marco loves pizza&amp;#39;&lt;/span&gt; },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;               { uri =&amp;gt; &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;/blog/2&amp;#39;&lt;/span&gt;, title =&amp;gt; &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;T2&amp;#39;&lt;/span&gt;, text =&amp;gt; &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;They love chapati&amp;#39;&lt;/span&gt; });
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;foreach&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;my&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;$data&lt;/span&gt; (&lt;span style=&#34;color:#369&#34;&gt;@entries&lt;/span&gt;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;my&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;$doc&lt;/span&gt; = &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;Search::Xapian::Document&lt;/span&gt;-&amp;gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;new&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;my&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;$qterm&lt;/span&gt; = &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;Q&amp;#39;&lt;/span&gt; . &lt;span style=&#34;color:#369&#34;&gt;$data&lt;/span&gt;-&amp;gt;{uri};
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#369&#34;&gt;$doc&lt;/span&gt;-&amp;gt;add_term(&lt;span style=&#34;color:#369&#34;&gt;$qterm&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#369&#34;&gt;$doc&lt;/span&gt;-&amp;gt;set_data(encode_json({ uri =&amp;gt; &lt;span style=&#34;color:#369&#34;&gt;$data&lt;/span&gt;-&amp;gt;{uri}, title =&amp;gt; &lt;span style=&#34;color:#369&#34;&gt;$data&lt;/span&gt;-&amp;gt;{title} }));
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#369&#34;&gt;$indexer&lt;/span&gt;-&amp;gt;set_document(&lt;span style=&#34;color:#369&#34;&gt;$doc&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#369&#34;&gt;$indexer&lt;/span&gt;-&amp;gt;index_text(&lt;span style=&#34;color:#369&#34;&gt;$data&lt;/span&gt;-&amp;gt;{text});
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#369&#34;&gt;$xapian&lt;/span&gt;-&amp;gt;replace_document_by_term(&lt;span style=&#34;color:#369&#34;&gt;$qterm&lt;/span&gt;, &lt;span style=&#34;color:#369&#34;&gt;$doc&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This code will create a &lt;code&gt;xapiandb&lt;/code&gt; directory with the Xapian database,
indexing the blog posts in the &lt;code&gt;@entries&lt;/code&gt; array. In a real script,
they would come from the database.&lt;/p&gt;
&lt;p&gt;Still, there are a couple of things worth noting in this minimal code.&lt;/p&gt;
&lt;p&gt;We set the &lt;a href=&#34;https://en.wikipedia.org/wiki/Stemming&#34;&gt;stemmer&lt;/a&gt; for the
given language, so the text passed to the indexer via the &lt;code&gt;index_text&lt;/code&gt;
call is split into terms and stemmed.&lt;/p&gt;
&lt;p&gt;Then we store the data structure we want to retrieve later with
&lt;code&gt;set_data&lt;/code&gt;. The best thing to do is probably to serialize it as
JSON. In this case I’m excluding the full text, which we don’t need in
the output (but it would be wise to add a teaser).&lt;/p&gt;
&lt;p&gt;Also, we use a &lt;code&gt;Q&lt;/code&gt;
&lt;a href=&#34;https://xapian.org/docs/omega/termprefixes.html&#34;&gt;prefix&lt;/a&gt; to produce
a unique term, which lets us update the entry in place when it already exists.&lt;/p&gt;
&lt;p&gt;Of course, the indexer will need to grow if you need more power and
more structured data (like filtering or searching a specific field),
but at this point we just want to show something to our hypothetical
client.&lt;/p&gt;
&lt;p&gt;The database can be inspected very easily. Xapian comes with a tool
called &lt;code&gt;delve&lt;/code&gt; (or &lt;code&gt;xapian-delve&lt;/code&gt;):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-plain&#34; data-lang=&#34;plain&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ xapian-delve xapiandb -a -v -1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;All terms in database (termfreq):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Q/blog/1 1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Q/blog/2 1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Zchapati 1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Zlove 2
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Zmarco 1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Zpizza 1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Zthey 1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;chapati 1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;love 1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;loves 1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;marco 1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pizza 1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;they 1&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And you can also try a search from the command line with &lt;code&gt;quest&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-plain&#34; data-lang=&#34;plain&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ quest -d xapiandb &amp;#34;loves NOT chapati&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Parsed Query: Query((Zlove@1 AND_NOT Zchapati@2))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Exactly 1 matches
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;MSet:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;1: [0.0953102]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{&amp;#34;title&amp;#34;:&amp;#34;T1&amp;#34;,&amp;#34;uri&amp;#34;:&amp;#34;/blog/1&amp;#34;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ quest -d xapiandb &amp;#34;pizza OR chapati&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Parsed Query: Query((Zpizza@1 OR Zchapati@2))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Exactly 2 matches
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;MSet:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;1: [0.405465]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{&amp;#34;title&amp;#34;:&amp;#34;T1&amp;#34;,&amp;#34;uri&amp;#34;:&amp;#34;/blog/1&amp;#34;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;2: [0.405465]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{&amp;#34;uri&amp;#34;:&amp;#34;/blog/2&amp;#34;,&amp;#34;title&amp;#34;:&amp;#34;T2&amp;#34;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The examples above show that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;the search works as you would expect (with logical operators) out of
the box&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;the stemming works: searching for &amp;ldquo;loves&amp;rdquo; and &amp;ldquo;love&amp;rdquo; gives the same results.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;the results give us back the JSON we stored in the index.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;searching&#34;&gt;Searching&lt;/h3&gt;
&lt;p&gt;So let’s call it done and move to the next part, the searcher.&lt;/p&gt;
&lt;p&gt;Now, while the indexer is a single script, the search needs to be
plugged into the live code of your site. For the purposes of this
article, I will provide a script instead, which does basically the
same thing as &lt;code&gt;quest&lt;/code&gt;. Plugging it into the web application is left as
an exercise for the reader. I would also suggest putting both the
indexing and searching code in a single shared module, keeping the
logic in a single location.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-perl&#34; data-lang=&#34;perl&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;#!/usr/bin/env perl&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;use&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;utf8&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;use&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;strict&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;use&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;warnings&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;use&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;Search::Xapian&lt;/span&gt; &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;:all&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;use&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;JSON&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;my&lt;/span&gt; (&lt;span style=&#34;color:#369&#34;&gt;$cgi&lt;/span&gt;) = &lt;span style=&#34;color:#038&#34;&gt;join&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39; &amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#369&#34;&gt;@ARGV&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;my&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;$dblocation&lt;/span&gt; = &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;xapiandb&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;my&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;$database&lt;/span&gt; = &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;Search::Xapian::Database&lt;/span&gt;-&amp;gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;new&lt;/span&gt;(&lt;span style=&#34;color:#369&#34;&gt;$dblocation&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;my&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;$enquire&lt;/span&gt; = &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;Search::Xapian::Enquire&lt;/span&gt;-&amp;gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;new&lt;/span&gt;(&lt;span style=&#34;color:#369&#34;&gt;$database&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;my&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;$qp&lt;/span&gt; = &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;Search::Xapian::QueryParser&lt;/span&gt;-&amp;gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;new&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#369&#34;&gt;$qp&lt;/span&gt;-&amp;gt;set_database(&lt;span style=&#34;color:#369&#34;&gt;$database&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#369&#34;&gt;$qp&lt;/span&gt;-&amp;gt;set_stemmer(&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;Search::Xapian::Stem&lt;/span&gt;-&amp;gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;new&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;english&amp;#34;&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#369&#34;&gt;$qp&lt;/span&gt;-&amp;gt;set_stemming_strategy(STEM_SOME);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#369&#34;&gt;$qp&lt;/span&gt;-&amp;gt;set_default_op(OP_AND);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;my&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;$query&lt;/span&gt; = &lt;span style=&#34;color:#369&#34;&gt;$qp&lt;/span&gt;-&amp;gt;parse_query(&lt;span style=&#34;color:#369&#34;&gt;$cgi&lt;/span&gt;, FLAG_PHRASE|FLAG_BOOLEAN|FLAG_WILDCARD);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#369&#34;&gt;$enquire&lt;/span&gt;-&amp;gt;set_query(&lt;span style=&#34;color:#369&#34;&gt;$query&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# fetch the first 50 results&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;my&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;$mset&lt;/span&gt; = &lt;span style=&#34;color:#369&#34;&gt;$enquire&lt;/span&gt;-&amp;gt;get_mset(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;50&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;print&lt;/span&gt; &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Total results: &amp;#34;&lt;/span&gt; . &lt;span style=&#34;color:#369&#34;&gt;$mset&lt;/span&gt;-&amp;gt;get_matches_estimated . &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;\n&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;my&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;$json_pretty&lt;/span&gt; = &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;JSON&lt;/span&gt;-&amp;gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;new&lt;/span&gt;-&amp;gt;pretty(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)-&amp;gt;utf8(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)-&amp;gt;canonical(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;foreach&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;my&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;$m&lt;/span&gt; (&lt;span style=&#34;color:#369&#34;&gt;$mset&lt;/span&gt;-&amp;gt;items) {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;my&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;$data&lt;/span&gt; = decode_json(&lt;span style=&#34;color:#369&#34;&gt;$m&lt;/span&gt;-&amp;gt;&lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;get_document&lt;/span&gt;-&amp;gt;get_data);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#888&#34;&gt;# decode and reencode the json in a human-readable fashion&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;print&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;$json_pretty&lt;/span&gt;-&amp;gt;encode(&lt;span style=&#34;color:#369&#34;&gt;$data&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If you’re wondering what those constants are and where to look for
more, they are in the module’s
&lt;a href=&#34;https://metacpan.org/pod/Search::Xapian#EXPORT&#34;&gt;documentation&lt;/a&gt;, in
plain sight (we asked for them when loading the module with the &lt;code&gt;:all&lt;/code&gt;
argument).&lt;/p&gt;
&lt;p&gt;Most of the code shown here is boilerplate, but that could change once
you build on it. Notably, we set the stemmer for the current language and
the query parser options, so we can use wildcards (e.g. &lt;code&gt;piz*&lt;/code&gt;), the
&lt;code&gt;AND&lt;/code&gt;/&lt;code&gt;OR&lt;/code&gt; operators, and quoting.&lt;/p&gt;
&lt;p&gt;Let’s see the script in action.&lt;/p&gt;
&lt;p&gt;Wildcard:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-plain&#34; data-lang=&#34;plain&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ ./search.pl &amp;#39;piz&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Total results: 0
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ ./search.pl &amp;#39;piz*&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Total results: 1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &amp;#34;title&amp;#34; : &amp;#34;T1&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &amp;#34;uri&amp;#34; : &amp;#34;/blog/1&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Operators:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-plain&#34; data-lang=&#34;plain&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ ./search.pl &amp;#39;pizza OR chapati&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Total results: 2
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &amp;#34;title&amp;#34; : &amp;#34;T1&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &amp;#34;uri&amp;#34; : &amp;#34;/blog/1&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &amp;#34;title&amp;#34; : &amp;#34;T2&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &amp;#34;uri&amp;#34; : &amp;#34;/blog/2&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ ./search.pl &amp;#39;pizza AND chapati&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Total results: 0&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Quoting (note the outer single quotes, which pass the double quotes through the shell intact):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-plain&#34; data-lang=&#34;plain&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ ./search.pl &amp;#39;&amp;#34;loves chapati&amp;#34;&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Total results: 0
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ ./search.pl &amp;#39;&amp;#34;love chapati&amp;#34;&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Total results: 1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &amp;#34;title&amp;#34; : &amp;#34;T2&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &amp;#34;uri&amp;#34; : &amp;#34;/blog/2&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The whole thing already looks pretty good. Way better (and way faster
to code and to execute) than a home-baked SQL search.&lt;/p&gt;
&lt;p&gt;As already noted, this is just scratching the surface. Xapian can
do &lt;a href=&#34;https://getting-started-with-xapian.readthedocs.io/en/latest/howtos/index.html&#34;&gt;much more&lt;/a&gt;:
filtering, range queries, facets, sorting, even spelling corrections!&lt;/p&gt;
&lt;p&gt;I don’t doubt that Solr &amp;amp; co. have their use-cases, but for the common
scenario of a small/​mid-sized e-shop or site, I think that this
solution is more affordable and maintainable than having a whole
separate application (like a Solr server) to maintain, upgrade and
secure. Don’t forget that here we haven’t done a single HTTP request.
We didn’t have to manage daemons, opening/​closing ports, and the like. We
didn’t have to configure a schema and a tokenizer in a separate
application (and keep that aligned with the handling code). It’s all
there in our (Perl) code in two files (as already noted, the logic
should live in a single module).&lt;/p&gt;
&lt;p&gt;We just installed a library (there is a very good chance that it’s already
installed) and a Perl module.&lt;/p&gt;
&lt;p&gt;The Xapian database lives on the disk and your code has full control
over it. Also it’s normally your GNU/Linux distribution taking care of
the security upgrades.&lt;/p&gt;
&lt;p&gt;If your client is on a budget, building a full-text search with Xapian can
be the right choice, and you can scale it up on the go, as more
features are required.&lt;/p&gt;

      </content>
    </entry>
  
    <entry>
      <title>Mobile-friendly sites or bust!</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2015/03/mobile-friendly-sites-or-bust/"/>
      <id>https://www.endpointdev.com/blog/2015/03/mobile-friendly-sites-or-bust/</id>
      <published>2015-03-23T00:00:00+00:00</published>
      <author>
        <name>Jon Jensen</name>
      </author>
      <content type="html">
        &lt;p&gt;A few weeks ago, &lt;a href=&#34;http://googlewebmastercentral.blogspot.com/2015/02/finding-more-mobile-friendly-search.html&#34;&gt;Google announced that&lt;/a&gt; starting on April 21 it will expand its “use of mobile-friendliness as a ranking signal” which “will have a significant impact in our search results”.&lt;/p&gt;
&lt;p&gt;The world of search engine optimization and online marketing is aflutter about this announcement, given that even subtle changes in Google’s ranking algorithm can substantially improve or worsen any particular site’s ranking. And the announcement came less than two months before the change takes effect, so there is not much time to dawdle.&lt;/p&gt;
&lt;p&gt;Google has lately been increasing its pressure on webmasters (is that still a real term‽) such as with its announcement last fall of an accelerated timetable for &lt;a href=&#34;https://security.googleblog.com/2014/09/gradually-sunsetting-sha-1.html&#34;&gt;sunsetting SSL certificates with SHA-1 signatures&lt;/a&gt;. So far these accelerated changes have been a good thing for most people on the Internet.&lt;/p&gt;
&lt;p&gt;In this case, Google provides an easy &lt;a href=&#34;https://www.google.com/webmasters/tools/mobile-friendly/&#34;&gt;Mobile-Friendly Site Test&lt;/a&gt; that you can run on your sites to see if you need to make changes or not:&lt;/p&gt;
&lt;div class=&#34;separator&#34; style=&#34;clear: both; text-align: center;&#34;&gt;&lt;a href=&#34;https://www.google.com/webmasters/tools/mobile-friendly/&#34;&gt;&lt;img border=&#34;0&#34; src=&#34;/blog/2015/03/mobile-friendly-sites-or-bust/image-0.png&#34;/&gt;&lt;/a&gt;&lt;br/&gt; &lt;/div&gt;
&lt;p&gt;So get on it and check those sites! I know we have a few that we can do some work on.&lt;/p&gt;

      </content>
    </entry>
  
    <entry>
      <title>Elasticsearch: Give me object!</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2013/04/elasticsearch-object-mapping-eof-400/"/>
      <id>https://www.endpointdev.com/blog/2013/04/elasticsearch-object-mapping-eof-400/</id>
      <published>2013-04-30T00:00:00+00:00</published>
      <author>
        <name>Miguel Alatorre</name>
      </author>
      <content type="html">
        &lt;p&gt;I’m currently working on a project where Elasticsearch is used to index copious amounts of data with sometimes deeply nested JSON. A recurring error I’ve experienced is caused by a field not conforming to the type listed in the mapping. Let’s reproduce it on a small scale.&lt;/p&gt;
&lt;p&gt;Assuming you have Elasticsearch installed, let’s create an index and mapping:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ curl -XPUT &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;http://localhost:9200/test&amp;#39;&lt;/span&gt; -d &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;    &amp;#34;mappings&amp;#34;: {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        &amp;#34;item&amp;#34;: {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;            &amp;#34;properties&amp;#34;: {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;                &amp;#34;state&amp;#34;: {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;                    &amp;#34;properties&amp;#34;: {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;                        &amp;#34;name&amp;#34;: {&amp;#34;type&amp;#34;: &amp;#34;string&amp;#34;}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;                    }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;                }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;            }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;ok&amp;#34;&lt;/span&gt;:true,&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;acknowledged&amp;#34;&lt;/span&gt;:true}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Since we’ve defined properties for the “state” field, Elasticsearch will automatically treat it as an object.* Let’s now add some documents:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ curl -XPUT &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;http://localhost:9200/test/item/1&amp;#39;&lt;/span&gt; -d &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;    &amp;#34;state&amp;#34;: {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        &amp;#34;name&amp;#34;: &amp;#34;North Carolina&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;ok&amp;#34;&lt;/span&gt;:true,&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;_index&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;test&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;_type&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;item&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;_id&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;1&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;_version&amp;#34;&lt;/span&gt;:1}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Success! Let’s now get into trouble:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ curl -XPUT &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;http://localhost:9200/test/item/2&amp;#39;&lt;/span&gt; -d &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;    &amp;#34;state&amp;#34;: &amp;#34;California&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;error&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;MapperParsingException[object mapping for [state] tried to parse as object, but got EOF, has a concrete value been provided to it?]&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;status&amp;#34;&lt;/span&gt;:400}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The solution: check the non-object values in your data against your mapping and you’ll be sure to find a mismatch, such as the plain string “California” above where the mapping expects an object.&lt;/p&gt;
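&lt;p&gt;That check can be scripted. What follows is a minimal Python sketch, not part of Elasticsearch or its client libraries: the function name &lt;code&gt;find_object_mismatches&lt;/code&gt; is made up for illustration, and it assumes you already have the &lt;code&gt;properties&lt;/code&gt; part of the mapping and the parsed document as plain dictionaries. It walks both side by side and reports every field that the mapping defines as an object but that holds something else in the document:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;def find_object_mismatches(properties, doc, path=""):
    """Return dotted paths of fields mapped as objects but holding non-objects.

    `properties` is the "properties" dict of a mapping and `doc` is the
    parsed JSON document; both are plain dicts (an assumption of this sketch).
    """
    mismatches = []
    for field, spec in properties.items():
        if field not in doc:
            continue  # a missing field cannot conflict with the mapping
        if "properties" not in spec:
            continue  # leaf field (string, integer, ...): not checked here
        full_path = path + field
        value = doc[field]
        # Assumption: a list is checked element by element and None is skipped
        items = value if isinstance(value, list) else [value]
        for item in items:
            if item is None:
                continue
            if isinstance(item, dict):
                # Recurse into the nested object with an extended path
                mismatches.extend(
                    find_object_mismatches(spec["properties"], item, full_path + ".")
                )
            else:
                mismatches.append(full_path)
    return mismatches


# The "properties" section of the mapping created for the test index above
mapping_properties = {"state": {"properties": {"name": {"type": "string"}}}}

print(find_object_mismatches(mapping_properties, {"state": {"name": "North Carolina"}}))  # prints []
print(find_object_mismatches(mapping_properties, {"state": "California"}))  # prints ['state']
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Running a check like this over a batch before indexing points at the offending field name directly, instead of leaving you to work backwards from the 400 response.&lt;/p&gt;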
&lt;p&gt;One thing to note is that the explicit creation of the mapping is unnecessary since Elasticsearch creates it using the first added document. Try this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ curl -XPUT &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;http://localhost:9200/test2/item/1&amp;#39;&lt;/span&gt; -d &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;    &amp;#34;state&amp;#34;: {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        &amp;#34;name&amp;#34;: &amp;#34;North Carolina&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;ok&amp;#34;&lt;/span&gt;:true,&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;_index&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;test2&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;_type&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;item&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;_id&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;1&amp;#34;&lt;/span&gt;,&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;_version&amp;#34;&lt;/span&gt;:1}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ curl -XGET &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;http://localhost:9200/test2/_mapping&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;test2&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;item&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;properties&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;state&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;dynamic&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;true&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;properties&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;name&amp;#34;&lt;/span&gt;: {&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;type&amp;#34;&lt;/span&gt;:&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;string&amp;#34;&lt;/span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;So, this stays true to the statement: &lt;a href=&#34;https://elasticsearch.com/products/elasticsearch/&#34;&gt;“Elasticsearch is schema-less, just toss it a typed JSON document and it will automatically index it.”&lt;/a&gt; You can throw your car keys at Elasticsearch and it will index them. However, as noted above, be sure to keep throwing nothing but car keys.&lt;/p&gt;
&lt;p&gt;*Anything with one or more nested key-value pairs is considered an object in Elasticsearch. For more on the object type, see &lt;a href=&#34;https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;

      </content>
    </entry>
  
    <entry>
      <title>Make your code search-friendly</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2012/04/make-your-code-search-friendly/"/>
      <id>https://www.endpointdev.com/blog/2012/04/make-your-code-search-friendly/</id>
      <published>2012-04-12T00:00:00+00:00</published>
      <author>
        <name>Jeff Boes</name>
      </author>
      <content type="html">
        &lt;p&gt;Here’s something about coding style that you may not have considered: is your code “search-friendly”? That is, does the format of your code help or hinder someone who might be searching it for context while debugging, extending, or just learning how it works?&lt;/p&gt;
&lt;p&gt;Seriously Contrived Example (from Perl):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-perl&#34; data-lang=&#34;perl&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;my&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;$string&lt;/span&gt; = &lt;span style=&#34;color:#2b2;background-color:#f0fff0&#34;&gt;q{Your transaction could not be}&lt;/span&gt; .
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &lt;span style=&#34;color:#2b2;background-color:#f0fff0&#34;&gt;q{ processed due to a charge}&lt;/span&gt; .
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   &lt;span style=&#34;color:#2b2;background-color:#f0fff0&#34;&gt;q{ card error.}&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;$string&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now someone’s going to experience this error and wonder where it occurs. So armed with grep, or ack, or git-grep, they set off into the wilderness:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-plain&#34; data-lang=&#34;plain&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ git grep &amp;#39;could not be processed&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ git grep &amp;#39;charge card error&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ git grep -e &amp;#39;transaction.*charge.*error&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;$ alsdkjgalkghkf&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;(The last one simulates pounding the keyboard with both fists.) None of these searches match, because the message is split across three string literals on three lines. I would humbly suggest that “strings you emit as a line should appear as a line in your code”, if for no other reason than that it makes it so much easier for you or others to find them. Thus:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-perl&#34; data-lang=&#34;perl&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;my&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;$string&lt;/span&gt; = &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;lt;&amp;lt;&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;MSG&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;Your transaction could not be processed due to a charge card error.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;MSG&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;$string&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


      </content>
    </entry>
  
    <entry>
      <title>Interchange Search Caching with “Permanent More”</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2012/01/interchange-search-caching-with/"/>
      <id>https://www.endpointdev.com/blog/2012/01/interchange-search-caching-with/</id>
      <published>2012-01-02T00:00:00+00:00</published>
      <author>
        <name>Mark Johnson</name>
      </author>
      <content type="html">
        &lt;p&gt;Most sites that use Interchange take advantage of Interchange’s “more lists”. These are built-in tools that support an Interchange “search” (either the search/scan action, or result of direct SQL via [query]) to make it very easy to paginate results. Under the hood, the more list is a drill-in to a cached “search object”, so each page brings back a slice from the cache of the original search. There are extensive ways to modify the look and behavior of more lists and, with a bit of effort, they can be configured to meet design requirements.&lt;/p&gt;
&lt;p&gt;Where more lists tend to fall short, however, is with respect to SEO. There are two primary SEO deficiencies that get business stakeholders’ attention:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;There is little control over the construction of the URLs for more lists. They leverage the scan actionmap and contain a hash key for the search object and numeric data to identify the slice and page location. They possess no intrinsic value in identifying the content they reference.&lt;/li&gt;
&lt;li&gt;The search cache by default is ephemeral and session-specific. This means all those results beyond page 1 the search engine has cataloged will result in dead links for search users who try to land directly on the more-listed pages.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is the latter issue that I wish to address because there is—​and has been for some time now—​a simple mechanism called “permanent more” to remedy the default behavior.&lt;/p&gt;
&lt;p&gt;You can leverage “permanent more” by adding the boolean &lt;strong&gt;mv_more_permanent&lt;/strong&gt;, or the shorthand &lt;strong&gt;pm&lt;/strong&gt;, to your search conditions. E.g.:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-plain&#34; data-lang=&#34;plain&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Link:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &amp;lt;a href=&amp;#34;[area search=&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        co=1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        sf=category
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        se=Foo
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        op=rm
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        more=1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ml=5
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        pm=1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &amp;#34;]&amp;#34;&amp;gt;All Foos&amp;lt;/a&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Loop:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    [loop search=&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        co=1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        sf=category
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        se=Foo
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        op=rm
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        more=1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ml=5
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        pm=1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &amp;#34;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ...loop body with [more-list]...
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    [/loop]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Query:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    [query
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        list=1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        more=1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ml=10
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &amp;lt;b&amp;gt;pm=1&amp;lt;/b&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        sql=&amp;#34;SELECT * FROM products WHERE category LIKE &amp;#39;%Foo%&amp;#39;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ...same as loop but with 10 matches/page...
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    [/query]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If the initial search is defined with the “permanent more” setting, it will produce the following adjustments:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The hash key used to store and identify the search cache is deterministic based on the search conditions. Many searches for Interchange are category driven. Thus, all end users who wish to browse a category end up clicking identical links, which create duplicate search caches, belonging uniquely to them. With permanent more, they all share the same cache, with the same identifier. As long as the search conditions don’t change, neither does the cache identifier. Even as the cache is refreshed with new executions of the search, the object remains in the same location. Thus, the results a search engine produced this morning reference links still valid now, tomorrow, or next week, provided they reference the same search conditions.&lt;/li&gt;
&lt;li&gt;The cached search object has no session affinity. Any link referencing the cache with the correct hash key has access to the content.&lt;/li&gt;
&lt;/ul&gt;
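&lt;p&gt;The effect of a deterministic key can be illustrated with a short sketch. This is not Interchange’s implementation (Interchange is written in Perl, and its actual hashing scheme is not shown in this post); the helper name &lt;code&gt;permanent_more_key&lt;/code&gt;, the choice of SHA-256, and the 16-character truncation are all assumptions made for illustration. The point is only that a key derived from the search conditions alone, with no session identifier mixed in, comes out the same for every user who runs the same search:&lt;/p&gt;

```ruby
require 'digest'

# Hypothetical helper (not part of Interchange): build a cache key from the
# search conditions only. No session ID is involved, so two users with the
# same conditions get the same key.
def permanent_more_key(conditions)
  # Sort the pairs so the key does not depend on parameter order.
  canonical = conditions.map { |name, value| [name.to_s, value.to_s] }
                        .sort
                        .map { |pair| pair.join('=') }
                        .join("\n")
  Digest::SHA256.hexdigest(canonical)[0, 16]
end

# Two visitors browsing the same category, parameters in different order.
alice = permanent_more_key(co: 1, sf: 'category', se: 'Foo', op: 'rm', ml: 5)
bob   = permanent_more_key(ml: 5, op: 'rm', se: 'Foo', sf: 'category', co: 1)

# A different search (another category) gets a different key.
other = permanent_more_key(co: 1, sf: 'category', se: 'Bar', op: 'rm', ml: 5)

puts alice == bob
puts alice == other
```

&lt;p&gt;Because the key changes only when the conditions change, a link that embeds it keeps pointing at the same cache location across refreshes of the cached results.&lt;/p&gt;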
&lt;p&gt;Taken together, these adjustments mean “permanent more” removes most dead links from more lists cataloged by search engines (the exceptions are addressed below). There are, however, other benefits to “permanent more” beyond the intended ones described above:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;As stated in passing, standard Interchange search caching produces duplicate search objects for common search conditions. For a busy site, these caches can have an impact on storage. Typically, a maintenance job removes cache files whose age exceeds the session duration (48 hours by default) by some margin. With permanent more, duplicate caches are eliminated. A cache location is reused by all users with the same search requirements, keeping data-storage requirements for caches to the minimum necessary. As searches change, orphaned caches can still easily be cleaned up, because they begin to age as soon as nothing accesses them any longer.&lt;/li&gt;
&lt;li&gt;For the same reason that “permanent more” resolves search-engine links, it also simplifies cache management for sites that use a reverse proxy for caching. Because most (and certainly the easiest) cache keys are based on the URL, the deterministic hash keys of “permanent more” help ensure that the cached content in the proxy accurately reflects the search content over time, and that all users hit the cached resource rather than generating new, unique links with varying hash keys.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One shortcoming of “permanent more” to be aware of is the impact of changing data underneath the search. Even if search conditions do not change, the count and order of matching record sets may. So, e.g., enough products may be removed from a given category to cause the last page of a more list to become empty, which would cause any specific link into that page to become dead. More minor, but still a possibility, is the introduction or removal of products so that a particularly searched-for term has been “bumped” to another page within the search cache since the last time the search engine crawled the more lists. For searches backed by particularly volatile data, “permanent more” may not be sufficient to address search-engine or caching demands.&lt;/p&gt;
&lt;p&gt;Finally, “permanent more” should be avoided for any search features that may cache data sensitive to an individual user. This is unlikely to happen as, under most circumstances, the configuration of the search itself will change based on the unique characteristics of the user executing the search (e.g., a username included in a query to review order history). However, it is still possible that context-sensitive information could be stored in the search object and, if so, all other users with access to the more lists would have access to that information.&lt;/p&gt;

      </content>
    </entry>
  
    <entry>
      <title>Sunspot, Solr, Rails: Working with Results</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2011/12/sunspot-solr-rails-working-with-results/"/>
      <id>https://www.endpointdev.com/blog/2011/12/sunspot-solr-rails-working-with-results/</id>
      <published>2011-12-12T00:00:00+00:00</published>
      <author>
        <name>Steph Skardal</name>
      </author>
      <content type="html">
        &lt;p&gt;Having worked with &lt;a href=&#34;http://sunspot.github.io/&#34;&gt;Sunspot&lt;/a&gt; and &lt;a href=&#34;https://lucene.apache.org/solr/&#34;&gt;Solr&lt;/a&gt; in several large Rails projects now, I’ve gained some knowledge about working with result sets optimally. Here’s a brief explanation on working with results or hits from a search object.&lt;/p&gt;
&lt;h3 id=&#34;mvc-setup&#34;&gt;MVC Setup&lt;/h3&gt;
&lt;p&gt;When working with Sunspot, searchable fields are defined in the model:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-ruby&#34; data-lang=&#34;ruby&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;Thing&lt;/span&gt; &amp;lt; &lt;span style=&#34;color:#036;font-weight:bold&#34;&gt;ActiveRecord&lt;/span&gt;::&lt;span style=&#34;color:#036;font-weight:bold&#34;&gt;Base&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  searchable &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;do&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    text &lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:field1&lt;/span&gt;, &lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:stored&lt;/span&gt; =&amp;gt; &lt;span style=&#34;color:#080&#34;&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    text &lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:field2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    string &lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:field3&lt;/span&gt;, &lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:stored&lt;/span&gt; =&amp;gt; &lt;span style=&#34;color:#080&#34;&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    integer &lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:field4&lt;/span&gt;, &lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:multiple&lt;/span&gt; =&amp;gt; &lt;span style=&#34;color:#080&#34;&gt;true&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;end&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;end&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The code block above will include field1, field2, field3, and field4 in the search index of &lt;strong&gt;things&lt;/strong&gt;. A keyword or text search on things will search field1 and field2 for matches. field3 and field4 may be used for scoping, that is, limiting the search result set to specific values of field3 or field4.&lt;/p&gt;
&lt;p&gt;In your controller, a new search object is created with the appropriate scoping and keyword values, shown below. Pagination is also added inside the search block.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-ruby&#34; data-lang=&#34;ruby&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;ThingsController&lt;/span&gt; &amp;lt; &lt;span style=&#34;color:#036;font-weight:bold&#34;&gt;ApplicationController&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;index&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#33b&#34;&gt;@search&lt;/span&gt; = &lt;span style=&#34;color:#036;font-weight:bold&#34;&gt;Sunspot&lt;/span&gt;.search(&lt;span style=&#34;color:#036;font-weight:bold&#34;&gt;Thing&lt;/span&gt;) &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;do&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#888&#34;&gt;#fulltext search&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      fulltext params[&lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:keyword&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#888&#34;&gt;#scoping&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; params.has_key?(&lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:field3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        with &lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:field3&lt;/span&gt;, params[&lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:field3&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;end&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; params.has_key?(&lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:field4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        with &lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:field4&lt;/span&gt;, params[&lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:field4&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;end&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      paginate &lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:page&lt;/span&gt; =&amp;gt; params[&lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:page&lt;/span&gt;], &lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:per_page&lt;/span&gt; =&amp;gt; &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;25&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;end&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#33b&#34;&gt;@search&lt;/span&gt;.execute!
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;end&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;end&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In the view, one can iterate through the result set, where results is an array of Thing instances.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-plain&#34; data-lang=&#34;plain&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;lt;% @search.results.each do |result| -%&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;lt;h2&amp;gt;&amp;lt;%= result.field3 %&amp;gt;&amp;lt;/h2&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;lt;%= result.field1 %&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;lt;% end -%&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id=&#34;working-with-hits&#34;&gt;Working with Hits&lt;/h3&gt;
&lt;p&gt;The above code works. It works nicely as long as you display only a few results per page and instantiating things is inexpensive. But the above code will run the query below for every search, and subsequently instantiate a Ruby object for each of the things found. This can become sluggish when the result set is large or the items themselves are expensive to instantiate.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-plain&#34; data-lang=&#34;plain&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;# development.log
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Thing Load (0.9ms)  SELECT &amp;#34;things&amp;#34;.* FROM &amp;#34;things&amp;#34; WHERE &amp;#34;things&amp;#34;.&amp;#34;id&amp;#34; IN (6, 12, 7, 13, 8, ...)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;An optimized way to work with search result sets is to work directly with hits. @search.hits is an array of Sunspot::Search::Hit objects, each of which represents the raw information returned by Solr for a single returned item. Hit objects provide access to stored field values, that is, fields marked with the :stored option in the model’s searchable definition. The model definition looks the same. The controller may now look like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-ruby&#34; data-lang=&#34;ruby&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;ThingsController&lt;/span&gt; &amp;lt; &lt;span style=&#34;color:#036;font-weight:bold&#34;&gt;ApplicationController&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;index&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    search = &lt;span style=&#34;color:#036;font-weight:bold&#34;&gt;Sunspot&lt;/span&gt;.search(&lt;span style=&#34;color:#036;font-weight:bold&#34;&gt;Thing&lt;/span&gt;) &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;do&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#888&#34;&gt;#fulltext search&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      fulltext params[&lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:keyword&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#888&#34;&gt;#scoping&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; params.has_key?(&lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:field3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        with &lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:field3&lt;/span&gt;, params[&lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:field3&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;end&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; params.has_key?(&lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:field4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        with &lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:field4&lt;/span&gt;, params[&lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:field4&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;end&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;end&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    search.execute!
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#33b&#34;&gt;@hits&lt;/span&gt; = search.hits.paginate &lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:page&lt;/span&gt; =&amp;gt; params[&lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:page&lt;/span&gt;], &lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:per_page&lt;/span&gt; =&amp;gt; &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;25&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;end&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;end&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And working with the data in the view may look like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-plain&#34; data-lang=&#34;plain&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;lt;% @hits.each do |hit| -%&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;lt;h2&amp;gt;&amp;lt;%= hit.stored(:field3) %&amp;gt;&amp;lt;/h2&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;lt;%= hit.stored(:field1) %&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&amp;lt;% end -%&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In some cases, you may want to introduce an additional piece of logic prior to pagination, as was the case in the most recent Rails application I’ve been working on:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-ruby&#34; data-lang=&#34;ruby&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ...
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    search.execute!
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    filtered_results = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    search.hits.each &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;do&lt;/span&gt; |hit|
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; hit.stored(&lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:field3&lt;/span&gt;) == &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;some arbitrary value&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        filtered_results &amp;lt;&amp;lt; hit
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;elsif&lt;/span&gt; hit.stored(&lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:field1&lt;/span&gt;) == &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;some other arbitrary value&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        filtered_results &amp;lt;&amp;lt; hit
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;      &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;end&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;end&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;   
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#33b&#34;&gt;@hits&lt;/span&gt; = filtered_results.paginate &lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:page&lt;/span&gt; =&amp;gt; params[&lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:page&lt;/span&gt;], &lt;span style=&#34;color:#a60;background-color:#fff0f0&#34;&gt;:per_page&lt;/span&gt; =&amp;gt; &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;25&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
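
&lt;p&gt;The filter-then-paginate step can also be tried outside a Rails application with plain Ruby. In the sketch below, &lt;code&gt;FakeHit&lt;/code&gt; is a hypothetical stand-in for a Sunspot hit that implements only &lt;code&gt;stored&lt;/code&gt;, and &lt;code&gt;filter_and_page&lt;/code&gt; is an illustrative helper that slices the array by hand rather than calling a pagination library. Neither name comes from Sunspot itself:&lt;/p&gt;

```ruby
# Hypothetical stand-in for a Sunspot hit, for illustration only: it
# answers stored(field) from a plain hash instead of a Solr response.
FakeHit = Struct.new(:fields) do
  def stored(name)
    fields[name]
  end
end

# Keep hits that match either condition, then return one page of them.
# Pages are numbered from 1; a page past the end yields an empty array.
def filter_and_page(hits, page:, per_page:)
  kept = hits.select do |hit|
    hit.stored(:field3) == 'some arbitrary value' ||
      hit.stored(:field1) == 'some other arbitrary value'
  end
  kept.each_slice(per_page).to_a[page - 1] || []
end

hits = [
  FakeHit.new({ field1: 'x', field3: 'some arbitrary value' }),
  FakeHit.new({ field1: 'some other arbitrary value', field3: 'y' }),
  FakeHit.new({ field1: 'x', field3: 'y' })
]

page_one = filter_and_page(hits, page: 1, per_page: 25)
page_two = filter_and_page(hits, page: 2, per_page: 25)

puts page_one.length
puts page_two.length
```

&lt;p&gt;No model objects are instantiated at any point here; the filter reads only stored field values, which is the reason for working with hits in the first place.&lt;/p&gt;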

&lt;p&gt;Sunspot and Solr are rich with functionality and features that can add value to a Rails application, but it’s important to identify areas of the application where database calls can be minimized and lazy loading can be optimized for better performance. The standard log file and database log file are good places to start looking.&lt;/p&gt;

      </content>
    </entry>
  
    <entry>
      <title>Creativity with fuzzy string search</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2010/08/creativity-with-fuzzy-string-search/"/>
      <id>https://www.endpointdev.com/blog/2010/08/creativity-with-fuzzy-string-search/</id>
      <published>2010-08-10T00:00:00+00:00</published>
      <author>
        <name>Josh Tolley</name>
      </author>
      <content type="html">
        &lt;p&gt;&lt;img alt=&#34;magnifying glass&#34; border=&#34;0&#34; src=&#34;/blog/2010/08/creativity-with-fuzzy-string-search/image-0.png&#34; style=&#34;float:left; margin:0 10px 10px 0; width:234px; height:320px&#34;/&gt;&lt;/p&gt;
&lt;p&gt;PostgreSQL provides a useful set of contrib modules for “fuzzy” string searching; that is, searching for something that sounds like or looks like the original search key, but that might not exactly match. One place this type of searching shows up frequently is when looking for people’s names. For instance, a receptionist at the dentist’s office doesn’t want to have to ask for the exact spelling of your name every time you call asking for an appointment, so the scheduling application allows “fuzzy” searches, and the receptionist doesn’t have to get it exactly right to find out who you really are. The PostgreSQL documentation provides an excellent introduction to the topic in terms of the available modules; &lt;a href=&#34;https://www.postgresonline.com/journal/index.php?/archives/158-Where-is-soundex-and-other-warm-and-fuzzy-string-things.html&#34;&gt;this blog post&lt;/a&gt; also demonstrates some of the things they can do.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;https://web.archive.org/web/20100612002539/http://www.trisano.org/&#34;&gt;TriSano&lt;/a&gt; application was originally written to use soundex search alone to find patient names, but that proved insufficient, particularly because common-sounding last names with unusual spellings would be ranked very poorly in the search results. Our solution, which has worked quite well in practice, involved creative use of PostgreSQL’s full-text search combined with the &lt;a href=&#34;https://www.postgresql.org/docs/current/static/pgtrgm.html&#34;&gt;pg_trgm contrib module&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A trigram is a set of three characters. In the case of pg_trgm, it’s three adjacent characters taken from a given input text. The pg_trgm module provides easy ways to extract all possible trigrams from an input, and compare them with similar sets taken from other inputs. Two strings that generate similar trigram lists are, in theory, similar strings. There’s no particular reason you couldn’t use two, four, or some other number of characters instead of &lt;strong&gt;tri&lt;/strong&gt;grams, but you’d trade sensitivity and variability. And as the name implies, pg_trgm only supports trigrams.&lt;/p&gt;
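&lt;p&gt;To make the idea concrete, here is a rough Ruby sketch of trigram extraction and comparison. It assumes the padding convention described in the pg_trgm documentation (each word is lowercased and treated as if it had two spaces in front and one behind) and scores similarity as shared trigrams divided by the size of the combined set. The function names &lt;code&gt;trigrams&lt;/code&gt; and &lt;code&gt;similarity&lt;/code&gt; are chosen for illustration, and the real module may differ in details such as how it treats non-alphanumeric characters:&lt;/p&gt;

```ruby
require 'set'

# Collect the set of three-character windows over each padded word.
def trigrams(text)
  text.downcase.scan(/[[:alnum:]]+/).each_with_object(Set.new) do |word, set|
    padded = "  #{word} " # two leading spaces, one trailing space
    padded.chars.each_cons(3) { |three| set.add(three.join) }
  end
end

# Shared trigrams divided by the size of the union of both sets.
def similarity(left, right)
  a = trigrams(left)
  b = trigrams(right)
  union = a | b
  return 0.0 if union.empty?

  a.intersection(b).size.to_f / union.size
end

p trigrams('cat').to_a.sort          # the four trigrams of "cat"
puts similarity('Elswick', 'Elswick') # identical strings
puts similarity('Elswick', 'Elswik')  # one letter dropped
```

&lt;p&gt;In this sketch, dropping a single letter from a seven-letter name still leaves half of the combined trigrams in common, which is why trigram comparison tolerates misspellings that an exact match would reject.&lt;/p&gt;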
&lt;p&gt;Straight trigram search didn’t buy us much on top of soundex, so we got a bit more creative. A trigram is just a set of three characters, which looks pretty much just like a word, so we thought we’d try using PostgreSQL’s full text search on trigram data. Typically full text search has a list of “stop words”: un-indexed words judged too common and too short to contribute meaningfully to an index. Our words would all be three characters long, so we had to create a new text search configuration using a dictionary with an empty stop word list. With that text search configuration, we could index trigrams effectively.&lt;/p&gt;
&lt;p&gt;This search helped, but wasn’t quite good enough. We finally borrowed a simplified version of a data mining technique called &lt;a href=&#34;https://en.wikipedia.org/wiki/Boosting_(machine_learning)&#34;&gt;“boosting”&lt;/a&gt;, which involves using multiple “weak” classifiers or searchers to create one relatively good result set. We combined straightforward trigram, soundex, and metaphone searches with a normal full text search of the unmodified name data and a full text search over the trigrams generated from the names. The data sizes in question aren’t particularly large, so this amount of searching hasn’t proven unsustainably taxing on processor power, and it provides excellent results. The code is &lt;a href=&#34;https://github.com/csinitiative/trisano/blob/master/webapp/db/name_search.sql&#34;&gt;on github&lt;/a&gt;; feel free to try it out.&lt;/p&gt;
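&lt;p&gt;The general shape of combining several weak searchers can be sketched in a few lines of Ruby. This is not the linked name_search.sql code; the &lt;code&gt;combine&lt;/code&gt; helper, the choice to simply add the scores together, and the sample scores are all assumptions for illustration. Each searcher contributes a list of (id, score) pairs, and a record found by more searchers accumulates both a longer list of sources and a higher rank:&lt;/p&gt;

```ruby
# Hypothetical helper: merge per-searcher results into one ranked list.
# results_by_source maps a searcher name to an array of [id, score] pairs.
def combine(results_by_source)
  merged = Hash.new { |hash, id| hash[id] = { sources: [], rank: 0.0 } }

  results_by_source.each do |source, matches|
    matches.each do |id, score|
      merged[id][:sources].push(source) # a searcher may appear more than once
      merged[id][:rank] += score
    end
  end

  # Highest combined rank first.
  merged.sort_by { |_id, row| -row[:rank] }
end

# Made-up scores, chosen only to show the mechanics.
results = {
  'soundex'     => [[298, 0.25]],
  'name_trgm'   => [[298, 0.5], [167, 0.25]],
  'trigram_fts' => [[298, 0.25], [167, 0.125]]
}

ranked = combine(results)
ranked.each { |id, row| puts "#{id} #{row[:sources].inspect} #{row[:rank]}" }
```

&lt;p&gt;A record that only one searcher finds can still appear in the list, but it tends to sink below records that several searchers agree on.&lt;/p&gt;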
&lt;p&gt;Update: One of the comments suggested a demonstration of the results, which of course makes perfect sense. So I resurrected some of the scripts I used when developing the technique. In addition to the scripts used to install the fuzzystrmatch and pg_trgm modules and the name_search.sql script linked above, I had a script that populated the &lt;em&gt;people&lt;/em&gt; table with a bunch of fake names. Then, it’s easy to test the search mechanism like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-plain&#34; data-lang=&#34;plain&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;select * from search_for_name(&amp;#39;John Doe&amp;#39;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;as a(id integer, last_name text, first_name text, sources text[], rank double precision);
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; id  |  last_name  | first_name |                     sources                     |        rank        
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;-----+-------------+------------+-------------------------------------------------+--------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; 167 | Krohn       | Javier     | {trigram_fts,name_trgm,trigram_fts,trigram_fts} |  0.281305521726608
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; 228 | Jordahl     | Javier     | {trigram_fts,name_trgm,trigram_fts}             |  0.237995445728302
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  59 | Pesce       | Dona       | {trigram_fts}                                   |  0.174265757203102
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; 185 | Finchum     | Dona       | {trigram_fts}                                   |  0.174265757203102
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; 104 | Rumore      | Dona       | {trigram_fts}                                   |  0.174265757203102
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; 250 | Dumond      | Julio      | {name_trgm,trigram_fts,trigram_fts}             |   0.16849160194397
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; 200 | Dedmon      | Javier     | {name_trgm,trigram_fts,trigram_fts}             |  0.163729697465897
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; 230 | Dossey      | Malinda    | {name_trgm,trigram_fts}                         |  0.158055320382118
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  50 | Dress       | Darren     | {name_trgm,trigram_fts}                         |  0.153293430805206
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; 136 | Doshier     | Neil       | {name_trgm,trigram_fts}                         |  0.148531511425972
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; 165 | Donatelli   | Lance      | {name_trgm,trigram_fts}                         |  0.132845237851143
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; 280 | Dollinger   | Clinton    | {name_trgm,trigram_fts}                         |  0.132845237851143
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; 273 | Dimeo       | Milagros   | {name_trgm,trigram_fts}                         | 0.0866267532110214
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  49 | Dawdy       | Christian  | {name_trgm,trigram_fts}                         | 0.0866267532110214
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; 298 | Elswick     | Jami       | {trigram_fts}                                   | 0.0845221653580666&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;These aren’t all the results it returned, but they give an idea of what the results look like. The rank value orders results based on the rankings given by each of the underlying search methods, and the sources column shows which of the search methods found each entry. A search method may show up more than once when it found multiple matches between the input text and the result record. These results don’t look particularly good, because there isn’t really a good match for “John Doe” in the data set. But if I horribly misspell “Jamie Elswick”, the search does a good job:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-plain&#34; data-lang=&#34;plain&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;select * from search_for_name(&amp;#39;Jomy Elswik&amp;#39;) as a(id integer, last_name text,                                                 
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;first_name text, sources text[], rank double precision)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; id  |  last_name  | first_name |                     sources                     |        rank        
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;-----+-------------+------------+-------------------------------------------------+--------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; 298 | Elswick     | Jami       | {trigram_fts,name_trgm,trigram_fts,trigram_fts} |  0.480943143367767
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; 312 | Elswick     | Kurt       | {name_trgm,trigram_fts}                         |  0.381967514753342
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; 228 | Jordahl     | Javier     | {trigram_fts,name_trgm,trigram_fts}             |  0.197063013911247
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; 403 | Walberg     | Erik       | {trigram_fts}                                   |  0.145491883158684
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; 309 | Hammaker    | Erik       | {trigram_fts}                                   |  0.145491883158684&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;


      </content>
    </entry>
  
    <entry>
      <title>Acts As Xapian — It Just Works</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2008/08/acts-as-xapian-it-just-works/"/>
      <id>https://www.endpointdev.com/blog/2008/08/acts-as-xapian-it-just-works/</id>
      <published>2008-08-26T00:00:00+00:00</published>
      <author>
        <name>Sean Schofield</name>
      </author>
      <content type="html">
        &lt;p&gt;I recently started listening to the podcast done by the guys at &lt;a href=&#34;https://web.archive.org/web/20080906094739/http://railsenvy.com/&#34;&gt;RailsEnvy&lt;/a&gt;. It’s an excellent resource for keeping up on what’s new in the Rails world, and it’s how I found out about the new &lt;a href=&#34;https://github.com/frabcus/acts_as_xapian/wikis&#34;&gt;acts_as_xapian&lt;/a&gt; search plugin for Rails. The podcast mentioned this &lt;a href=&#34;https://web.archive.org/web/20080902205324/http://locomotivation.com/2008/07/15/mulling-over-our-ruby-on-rails-full-text-search-options&#34;&gt;blog post&lt;/a&gt;, which contains a very thorough rundown of the full-text search options currently available for Rails. The timing of this article couldn’t have been better, since I was in the market for a new solution.&lt;/p&gt;
&lt;p&gt;I was approaching a deadline on a client project here at End Point and I was having lots of trouble with my existing search solution, &lt;a href=&#34;https://github.com/jkraemer/acts_as_ferret/&#34;&gt;acts_as_ferret&lt;/a&gt;. Setting up ferret was relatively easy and I was very impressed with the &lt;a href=&#34;https://web.archive.org/web/20080913093502/http://lucene.apache.org/java/docs/queryparsersyntax.html&#34;&gt;Lucene syntax&lt;/a&gt; that it supported. It seemed like a perfect solution at first, but then came “the troubles.”&lt;/p&gt;
&lt;p&gt;Ferret is extremely fragile. The slightest problem and your server will just crash. What was causing the crash? Unfortunately the server logs won’t give you much help there. If you’re lucky, you will receive some cryptic message from the C++ library. Note that I skipped the suggested DRb server setup since this was a development box.&lt;/p&gt;
&lt;p&gt;After a while I would notice something wrong in my model code that might have caused an error while updating the search index. Unfortunately this was impossible to verify since I could not predictably reproduce the error. So in the end, I think there may have been issues with my model fields but ferret was of no help in tracking these problems down. The final straw came when the client started testing and almost immediately crashed the server after doing a search.&lt;/p&gt;
&lt;p&gt;Enter acts_as_xapian. Jim Mulholland’s excellent &lt;a href=&#34;https://web.archive.org/web/20080826095829/http://locomotivation.com/2008/07/23/simple-ruby-on-rails-full-text-search-using-xapian&#34;&gt;tutorial&lt;/a&gt; was pretty much all I needed to get it up and running on my Mac. Documentation for acts_as_xapian is a bit thin. It consists primarily of the aforementioned tutorial and a very detailed &lt;a href=&#34;https://github.com/frabcus/acts_as_xapian/tree/master/README.txt&#34;&gt;README&lt;/a&gt;. The mailing list is starting to become more active, however, and you are likely to get a response there to any thoughtful questions you might have.&lt;/p&gt;
&lt;p&gt;One major difference with xapian (vs. ferret) is that it does not rebuild your index automatically with each model update. When you modify an ActiveRecord instance it will update the acts_as_xapian_jobs table with the id and model type of your record so that the index can be updated later. The index is then updated via a rake command that you can easily schedule via cron. You can also rebuild the entire index using a different rake command but that shouldn’t really be necessary.&lt;/p&gt;
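&lt;p&gt;The deferred approach can be sketched in plain Ruby. This is a minimal illustration, not the plugin’s code: the &lt;code&gt;DeferredIndex&lt;/code&gt; class, the &lt;code&gt;Job&lt;/code&gt; struct, and the in-memory hash standing in for the index are all hypothetical names made up for this example. Saving a record only appends a small job entry, and a separate step (the part cron would run) drains the queue and does the slow indexing work.&lt;/p&gt;

```ruby
# Hypothetical in-memory stand-ins; not the plugin's actual implementation.
Job = Struct.new(:model, :model_id, :action)

class DeferredIndex
  attr_reader :jobs, :documents

  def initialize
    @jobs = []       # plays the role of the acts_as_xapian_jobs table
    @documents = {}  # plays the role of the search index
  end

  # Called when a record is saved: cheap, it only notes what changed.
  def enqueue(model, model_id, action)
    # Assumption of this sketch: keep at most one pending job per record.
    @jobs.reject! { |job| [job.model, job.model_id] == [model, model_id] }
    @jobs.push(Job.new(model, model_id, action))
  end

  # Called later, for example from a scheduled task: does the slow work.
  def process(records)
    until @jobs.empty?
      job = @jobs.shift
      key = [job.model, job.model_id]
      if job.action == :destroy
        @documents.delete(key)
      else
        @documents[key] = records.fetch(key)
      end
    end
  end
end

index = DeferredIndex.new
records = {}
records[["Product", 1]] = "red wagon"

index.enqueue("Product", 1, :update)
index.enqueue("Product", 1, :update) # a second save of the same record
index.process(records)               # the step a cron job would trigger
```

&lt;p&gt;In this sketch, saving the same record twice before the queue is drained leaves a single pending job, so the later pass indexes each record once.&lt;/p&gt;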
&lt;p&gt;I was a bit concerned about the lack of a continuously updated index but I came to realize that it has some significant advantages. The biggest advantage is that it’s much faster to update your model records since you are not waiting for the re-indexing to complete on the same thread. It also means you can skip the step of setting up a separate DRb server for ferret in your production environment.&lt;/p&gt;
&lt;p&gt;With xapian you can index “related fields” in other models by constructing a pseudo-attribute in your model that returns the value of the associated model as a text string. Ferret allows you to do this as well, but unlike ferret, xapian gives excellent feedback about any mistakes you might have made while constructing them. If you have a nil exception somewhere in one of these related fields, xapian will complain and tell you exactly which line it’s bombing out on.&lt;/p&gt;
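&lt;p&gt;A pseudo-attribute of this kind might look roughly like the following. The &lt;code&gt;Product&lt;/code&gt; and &lt;code&gt;Category&lt;/code&gt; models and the &lt;code&gt;category_names&lt;/code&gt; method are hypothetical, and plain Structs stand in for ActiveRecord classes so the example runs on its own.&lt;/p&gt;

```ruby
# Hypothetical models; plain Structs stand in for ActiveRecord classes.
Category = Struct.new(:name)

Product = Struct.new(:name, :categories) do
  # Pseudo-attribute: flattens the associated records into one string
  # so it can be indexed like any other text field.
  # A nil entry in categories raises NoMethodError here, the kind of
  # mistake the post says xapian reports clearly.
  def category_names
    categories.map { |category| category.name }.join(" ")
  end
end

product = Product.new("Wagon", [Category.new("Toys"), Category.new("Outdoor")])
puts product.category_names # prints "Toys Outdoor"
```

&lt;p&gt;The method would then be listed alongside the model’s real columns when declaring which fields to index.&lt;/p&gt;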
&lt;p&gt;I was also able to set up paging for my search results with &lt;a href=&#34;https://web.archive.org/web/20080915144208/http://cardboardrocket.com/pages/paginating_find&#34;&gt;paginating_find&lt;/a&gt;, which I prefer to &lt;a href=&#34;https://github.com/mislav/will_paginate/wikis&#34;&gt;will_paginate&lt;/a&gt; (just a personal preference; there is nothing wrong with will_paginate). There is also a cool feature that will suggest other possible terms (“Did you mean?”) if your search returns no results. So far the only disappointment has been the lack of an obvious way to do searches on specific fields.&lt;/p&gt;
&lt;p&gt;If you are in the market for a new full-text search solution for Rails, you should really give xapian a try.&lt;/p&gt;

      </content>
    </entry>
  
</feed>
