<?xml version="1.0" encoding="utf-8" standalone="yes"?><feed xmlns="http://www.w3.org/2005/Atom">
  <title></title>
  <subtitle></subtitle>
  <id>https://www.endpointdev.com/blog/tags/artificial-intelligence/</id>
  <link href="https://www.endpointdev.com/blog/tags/artificial-intelligence/"/>
  <link href="https://www.endpointdev.com/blog/tags/artificial-intelligence/" rel="self"/>
  <updated>2025-12-01T00:00:00+00:00</updated>
  <author>
    <name>End Point Dev</name>
  </author>
  
    <entry>
      <title>Build a Smarter Telegram Bot: Integrating a RAG Pipeline for FAQ Answering</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2025/12/telegram-bot-rag-pipeline/"/>
      <id>https://www.endpointdev.com/blog/2025/12/telegram-bot-rag-pipeline/</id>
      <published>2025-12-01T00:00:00+00:00</published>
      <author>
        <name>Bimal Gharti Magar</name>
      </author>
      <content type="html">
        &lt;p&gt;&lt;img src=&#34;/blog/2025/12/telegram-bot-rag-pipeline/mountains-and-clouds.webp&#34; alt=&#34;A black-and-white image of clouds hanging over a sharp mountain, jutting from the bottom right of the image, and a longer mountain ridge further back in the bottom left of the image.&#34;&gt;&lt;/p&gt;
&lt;!-- Photo by Seth Jensen, 2025. --&gt;
&lt;p&gt;In this post, we will show you how to build an intelligent FAQ bot using Python and the Telegram Bot API. We&amp;rsquo;ll go beyond simple commands by integrating a Retrieval-Augmented Generation (RAG) pipeline with LangChain.&lt;/p&gt;
&lt;p&gt;This RAG pipeline lets our bot pull information from a custom knowledge base (in our case, a simple &lt;code&gt;faqs.json&lt;/code&gt; file) and use a local Large Language Model (LLM), served by Ollama through an interface like Open WebUI, to generate accurate answers. The best part? This approach gives you full control over your models and data, with zero API costs.&lt;/p&gt;
&lt;h3 id=&#34;what-is-telegram&#34;&gt;What is Telegram?&lt;/h3&gt;
&lt;p&gt;You&amp;rsquo;ve probably heard of &lt;a href=&#34;https://telegram.org/&#34;&gt;Telegram&lt;/a&gt;—it&amp;rsquo;s a popular, cloud-based instant messaging app. It’s fast, works everywhere (mobile, web, and desktop), and has powerful features like huge group chats and easy file sharing.&lt;/p&gt;
&lt;p&gt;One of its most powerful features for developers is the Telegram Bot API, an open platform that allows anyone to build and integrate automated applications (like ours!) directly into the chat interface.&lt;/p&gt;
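&lt;p&gt;To make this concrete: the Bot API is plain HTTPS, with every method exposed at &lt;code&gt;https://api.telegram.org/bot&amp;lt;TOKEN&amp;gt;/&amp;lt;method&amp;gt;&lt;/code&gt;. Here is a minimal, standard-library-only sketch (separate from the bot we build below) that verifies a token by calling &lt;code&gt;getMe&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;import json
import urllib.request

def get_me_url(token: str) -&amp;gt; str:
    # Every Bot API method is an HTTPS endpoint under /bot&amp;lt;TOKEN&amp;gt;/.
    return f&#34;https://api.telegram.org/bot{token}/getMe&#34;

def check_token(token: str) -&amp;gt; dict:
    # getMe returns the bot&#39;s own account info; an invalid token gives HTTP 401.
    with urllib.request.urlopen(get_me_url(token)) as resp:
        return json.load(resp)

# With a real token, check_token(token)[&#34;result&#34;][&#34;username&#34;] is your bot&#39;s username.&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;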
&lt;h3 id=&#34;a-warning-on-privacy-and-encryption&#34;&gt;A Warning on Privacy and Encryption&lt;/h3&gt;
&lt;p&gt;Before we build our bot, it is critical to understand how Telegram handles encryption, as it directly impacts user privacy.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cloud Chats (The Default)&lt;/strong&gt;: All standard chats, group chats, and all bot interactions are &amp;ldquo;Cloud Chats.&amp;rdquo; These use server-client encryption. This means your messages are encrypted between your device and Telegram&amp;rsquo;s servers, and then stored (encrypted) on their servers. This is what allows you to access your chat history from any device. However, Telegram itself holds the encryption keys and can access this data.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Secret Chats (Manual)&lt;/strong&gt;: Telegram also offers &amp;ldquo;Secret Chats,&amp;rdquo; which are end-to-end encrypted (E2EE). In this mode, only you and the recipient can read the messages. Telegram has no access. However, bots cannot operate in Secret Chats.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This means that any message a user sends to our bot is a &amp;ldquo;Cloud Chat&amp;rdquo; and is &lt;em&gt;not end-to-end encrypted&lt;/em&gt;. The data is accessible to Telegram and will be processed in plain text by our &lt;code&gt;bot.py&lt;/code&gt; script on our server.&lt;/p&gt;
&lt;p&gt;For this reason, you should never build a bot that asks for or encourages users to send sensitive private data such as passwords, financial information, or social security numbers. Always treat bot conversations as non-private.&lt;/p&gt;
&lt;h3 id=&#34;what-is-retrieval-augmented-generation-rag&#34;&gt;What is Retrieval-Augmented Generation (RAG)?&lt;/h3&gt;
&lt;p&gt;Retrieval-Augmented Generation (RAG) is a technique that makes Large Language Models (LLMs) smarter by connecting them to external, private knowledge.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The Problem: An LLM like &lt;code&gt;llama3&lt;/code&gt; only knows the information it was trained on. It has no access to your company&amp;rsquo;s internal FAQs, new documents, or any private data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The Solution (RAG): RAG solves this in two steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Retrieve: When you ask a question, the system first retrieves relevant information from your own knowledge base (for us, our &lt;code&gt;faqs.json&lt;/code&gt; file).&lt;/li&gt;
&lt;li&gt;Augment: It then augments the LLM&amp;rsquo;s prompt by pasting that retrieved information in as context, along with your original question.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In short, instead of just asking the bot &amp;ldquo;What&amp;rsquo;s the shipping policy?&amp;rdquo;, we&amp;rsquo;re effectively asking, &amp;ldquo;Based on this specific text: &amp;lsquo;&amp;hellip;We offer standard shipping&amp;hellip;&amp;rsquo; — what is the shipping policy?&amp;rdquo; This forces the LLM to base its answer on our facts, not its own general knowledge, making the response accurate and reliable.&lt;/p&gt;
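&lt;p&gt;The two steps above can be sketched in a few lines of Python. The retriever here is a deliberately naive keyword match standing in for the FAISS similarity search we set up later, and the FAQ strings are invented for illustration:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;faqs = [
    &#34;Shipping: We offer standard shipping with delivery in 3-5 business days.&#34;,
    &#34;Returns: Items may be returned within 30 days of delivery.&#34;,
]

def retrieve(question: str) -&amp;gt; list:
    # Step 1 (Retrieve): naive keyword overlap, a stand-in for vector search.
    words = set(question.lower().split())
    return [f for f in faqs if words &amp;amp; set(f.lower().split())]

def augment(question: str) -&amp;gt; str:
    # Step 2 (Augment): paste the retrieved facts into the prompt as context.
    context = &#34;\n&#34;.join(retrieve(question))
    return f&#34;Answer using only this context:\n{context}\n\nQuestion: {question}&#34;

print(augment(&#34;What is the shipping policy?&#34;))&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With real embeddings, &lt;code&gt;retrieve()&lt;/code&gt; would rank entries by semantic similarity rather than shared words, but the augmented prompt looks the same.&lt;/p&gt;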
&lt;h3 id=&#34;what-youll-build&#34;&gt;What you’ll build&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;A Telegram bot&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;faqs.json&lt;/code&gt; knowledge base&lt;/li&gt;
&lt;li&gt;A RAG pipeline with local embeddings, a FAISS vector store, and an LLM served via Open WebUI&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;prerequisites&#34;&gt;Prerequisites&lt;/h3&gt;
&lt;p&gt;You’ll need:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Python 3.12+&lt;/li&gt;
&lt;li&gt;a Telegram bot token (from BotFather)&lt;/li&gt;
&lt;li&gt;access to an LLM via a locally hosted Open WebUI instance (OpenAI-compatible API)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;setting-up-the-project-for-telegram-bot&#34;&gt;Setting up the project for the Telegram bot&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;uv&lt;/code&gt; is a high-performance Python package manager, so we&amp;rsquo;ll use it to set up our project. If you don&amp;rsquo;t have it installed, you can install it with &lt;code&gt;pip&lt;/code&gt;, or visit the &lt;a href=&#34;https://docs.astral.sh/uv/getting-started/installation/&#34;&gt;installation guide&lt;/a&gt; for other installation methods:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pip install uv&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Create a new project directory and navigate into it:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;mkdir telegram-rag-bot
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;cd telegram-rag-bot&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Initialize a new Python project:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;uv init --bare&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This command creates a minimal &lt;code&gt;pyproject.toml&lt;/code&gt; file, which will track our project&amp;rsquo;s metadata and, most importantly, its dependencies.&lt;/p&gt;
&lt;p&gt;Create a virtual environment using uv:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;uv venv&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This will create a &lt;code&gt;.venv&lt;/code&gt; directory. Activate it with the following:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;source .venv/bin/activate
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;# On Windows, use
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;# .venv\Scripts\activate&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Install the necessary Python packages using uv:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;uv add python-telegram-bot python-dotenv langchain langchain-openai langchain-community faiss-cpu jq sentence-transformers&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The key libraries are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;python-telegram-bot&lt;/code&gt;: For handling all Telegram communication&lt;/li&gt;
&lt;li&gt;&lt;code&gt;langchain&lt;/code&gt;: The primary framework for building the RAG pipeline&lt;/li&gt;
&lt;li&gt;&lt;code&gt;langchain-openai&lt;/code&gt;: Connector to Open WebUI’s OpenAI-compatible API&lt;/li&gt;
&lt;li&gt;&lt;code&gt;faiss-cpu&lt;/code&gt;: An efficient library for similarity search, used as a local vector store to quickly find relevant chunks of your FAQ data&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&#34;environment-and-configuration&#34;&gt;Environment and configuration&lt;/h4&gt;
&lt;p&gt;The bot reads the Telegram token from the environment variable &lt;code&gt;BOT_TOKEN&lt;/code&gt;. We can store it in a &lt;code&gt;.env&lt;/code&gt; file as &lt;code&gt;BOT_TOKEN=your-token-here&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# .env (Open WebUI)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# OPEN_WEBUI_URL must end with /v1 (e.g., http://localhost:3000/v1).&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#369&#34;&gt;BOT_TOKEN&lt;/span&gt;=123456:abcdefg
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#369&#34;&gt;OPEN_WEBUI_URL&lt;/span&gt;=http://localhost:3000/v1
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#369&#34;&gt;OPEN_WEBUI_API_KEY&lt;/span&gt;=your_key_here&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
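&lt;p&gt;For reference, here is a minimal sketch of reading these values at startup with &lt;code&gt;python-dotenv&lt;/code&gt;. The &lt;code&gt;OPEN_WEBUI_URL&lt;/code&gt; and &lt;code&gt;OPEN_WEBUI_API_KEY&lt;/code&gt; names are just this example&amp;rsquo;s convention; match whatever names your &lt;code&gt;.env&lt;/code&gt; file uses:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;import os

from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the environment

BOT_TOKEN = os.getenv(&#34;BOT_TOKEN&#34;, &#34;&#34;)
OPEN_WEBUI_URL = os.getenv(&#34;OPEN_WEBUI_URL&#34;, &#34;http://localhost:3000/v1&#34;)
OPEN_WEBUI_API_KEY = os.getenv(&#34;OPEN_WEBUI_API_KEY&#34;, &#34;&#34;)

# Fail fast on a missing token instead of erroring deep inside a handler.
if not BOT_TOKEN:
    raise RuntimeError(&#34;BOT_TOKEN is not set; add it to your .env file&#34;)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;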

&lt;p&gt;&lt;a href=&#34;https://core.telegram.org/bots/features#inline-requests&#34;&gt;Inline mode&lt;/a&gt; must be enabled for the bot via BotFather before inline queries will reach it.&lt;/p&gt;
&lt;p&gt;Create a new file named &lt;code&gt;bot.py&lt;/code&gt; and add the following code to set up the Telegram bot and register its message handlers.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;logging&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;os&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;uuid&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; uuid4
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;telegram&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; Update, InlineQueryResultArticle, InputTextMessageContent
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;telegram.ext&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; filters, MessageHandler, ApplicationBuilder, CommandHandler, ContextTypes, InlineQueryHandler
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;dotenv&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; load_dotenv
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# load .env variables&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;load_dotenv()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;bot_token = os.getenv(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;BOT_TOKEN&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# Setup logging&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;logging.basicConfig(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#038&#34;&gt;format&lt;/span&gt;=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;%(asctime)s&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt; - &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;%(name)s&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt; - &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;%(levelname)s&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt; - &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;%(message)s&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    level=logging.INFO
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;logger = logging.getLogger(&lt;span style=&#34;color:#369&#34;&gt;__name__&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;async&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;start&lt;/span&gt;(update: Update, context: ContextTypes.DEFAULT_TYPE):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;await&lt;/span&gt; context.bot.send_message(chat_id=update.effective_chat.id, text=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;I&amp;#39;m a bot, please talk to me!&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;async&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;echo&lt;/span&gt;(update: Update, context: ContextTypes.DEFAULT_TYPE):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;await&lt;/span&gt; context.bot.send_message(chat_id=update.effective_chat.id, text=update.message.text)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;async&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;caps&lt;/span&gt;(update: Update, context: ContextTypes.DEFAULT_TYPE):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    text_caps = &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39; &amp;#39;&lt;/span&gt;.join(context.args).upper()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;await&lt;/span&gt; context.bot.send_message(chat_id=update.effective_chat.id, text=text_caps)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;async&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;inline_caps&lt;/span&gt;(update: Update, context: ContextTypes.DEFAULT_TYPE):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    query = update.inline_query.query
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#080&#34;&gt;not&lt;/span&gt; query:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    results = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    results.append(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        InlineQueryResultArticle(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#038&#34;&gt;id&lt;/span&gt;=&lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(uuid4()),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            title=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;Caps&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            input_message_content=InputTextMessageContent(query.upper())
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;await&lt;/span&gt; context.bot.answer_inline_query(update.inline_query.id, results)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;async&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;unknown&lt;/span&gt;(update: Update, context: ContextTypes.DEFAULT_TYPE):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;await&lt;/span&gt; context.bot.send_message(chat_id=update.effective_chat.id, text=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Sorry, I didn&amp;#39;t understand that command.&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;async&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;document&lt;/span&gt;(update: Update, context: ContextTypes.DEFAULT_TYPE):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; update.message.document:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        file = &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;await&lt;/span&gt; update.message.document.get_file()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        file_name = update.message.document.file_name
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;await&lt;/span&gt; file.download_to_drive(file_name)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;elif&lt;/span&gt; update.message.photo:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# Get the largest photo size&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        file = &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;await&lt;/span&gt; update.message.photo[-&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;].get_file()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        file_name = &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;photo_&lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;file.file_unique_id&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;.jpg&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#888&#34;&gt;# Create a unique name for photos&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;await&lt;/span&gt; file.download_to_drive(file_name)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;elif&lt;/span&gt; update.message.video:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        file = &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;await&lt;/span&gt; update.message.video.get_file()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        file_name = update.message.video.file_name
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;await&lt;/span&gt; file.download_to_drive(file_name)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;else&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;await&lt;/span&gt; update.message.reply_text(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Please send a document, photo, or video.&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;main&lt;/span&gt;() -&amp;gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;None&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    start_handler = CommandHandler(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;start&amp;#39;&lt;/span&gt;, start)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    echo_handler = MessageHandler(filters.TEXT &amp;amp; (~filters.COMMAND), echo)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    caps_handler = CommandHandler(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;caps&amp;#39;&lt;/span&gt;, caps)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    inline_caps_handler = InlineQueryHandler(inline_caps)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    document_handler = MessageHandler(filters.PHOTO | filters.Document.PDF | filters.VIDEO, document)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    unknown_handler = MessageHandler(filters.COMMAND, unknown)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    application = ApplicationBuilder().token(bot_token).build()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    application.add_handler(start_handler)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    application.add_handler(echo_handler)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    application.add_handler(caps_handler)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    application.add_handler(inline_caps_handler)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    application.add_handler(document_handler)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    application.add_handler(unknown_handler)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#888&#34;&gt;# Run the bot&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    logger.info(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Starting bot polling...&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    application.run_polling()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;__name__&lt;/span&gt; == &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;__main__&amp;#39;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    main()&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4 id=&#34;try-it-out&#34;&gt;Try it out&lt;/h4&gt;
&lt;p&gt;To start the bot, run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;uv run bot.py&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Search for your bot&amp;rsquo;s username in Telegram and send it a message, or a command like &lt;code&gt;/start&lt;/code&gt; or &lt;code&gt;/caps hello&lt;/code&gt;.&lt;/p&gt;
&lt;h4 id=&#34;what-this-bot-does&#34;&gt;What this bot does&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Responds to &lt;code&gt;/start&lt;/code&gt; with a greeting&lt;/li&gt;
&lt;li&gt;Echoes back any plain text message (that isn’t a command)&lt;/li&gt;
&lt;li&gt;Converts text to uppercase via &lt;code&gt;/caps&lt;/code&gt; or inline mode&lt;/li&gt;
&lt;li&gt;Downloads files users send (photos, PDFs, and videos) to local storage&lt;/li&gt;
&lt;li&gt;Politely handles unknown commands&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&#34;core-structure&#34;&gt;Core structure&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;start()&lt;/code&gt;: Sends a simple welcome message when the user runs &lt;code&gt;/start&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;echo()&lt;/code&gt;: Replies with the exact same text the user sent (only for non-commands)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;caps()&lt;/code&gt;: Turns the arguments after &lt;code&gt;/caps&lt;/code&gt; into uppercase and sends them back&lt;/li&gt;
&lt;li&gt;&lt;code&gt;inline_caps()&lt;/code&gt;: Provides an inline result that uppercases whatever users type after &lt;code&gt;@YourBotName&lt;/code&gt; in any chat&lt;/li&gt;
&lt;li&gt;&lt;code&gt;document()&lt;/code&gt;: Saves received media to disk:
&lt;ul&gt;
&lt;li&gt;Photos: Downloads the largest size, naming it &lt;code&gt;photo_&amp;lt;unique_id&amp;gt;.jpg&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;PDFs: Downloads using the document’s file name. Note: The filter only accepts PDFs as documents&lt;/li&gt;
&lt;li&gt;Videos: Downloads using the video’s file name&lt;/li&gt;
&lt;li&gt;If none of these are present, it prompts the user to send a supported file&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;unknown()&lt;/code&gt;: Catches any unrecognized commands and replies with a friendly error&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&#34;handlers-and-filters&#34;&gt;Handlers and filters&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;CommandHandler(&#39;start&#39;, start)&lt;/code&gt; and &lt;code&gt;CommandHandler(&#39;caps&#39;, caps)&lt;/code&gt; handle &lt;a href=&#34;https://core.telegram.org/bots/features#commands&#34;&gt;commands&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;MessageHandler(filters.TEXT &amp;amp; (~filters.COMMAND), echo)&lt;/code&gt; ensures normal text (not commands) is echoed&lt;/li&gt;
&lt;li&gt;&lt;code&gt;InlineQueryHandler(inline_caps)&lt;/code&gt; answers inline queries&lt;/li&gt;
&lt;li&gt;&lt;code&gt;MessageHandler(filters.PHOTO | filters.Document.PDF | filters.VIDEO, document)&lt;/code&gt; restricts downloads to photos, PDFs, and videos&lt;/li&gt;
&lt;li&gt;&lt;code&gt;MessageHandler(filters.COMMAND, unknown)&lt;/code&gt; is added last to catch all other commands&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&#34;running-the-bot&#34;&gt;Running the bot&lt;/h4&gt;
&lt;p&gt;&lt;code&gt;main()&lt;/code&gt; wires up the handlers, builds the Application with the token, logs a startup message, and starts long polling via &lt;code&gt;application.run_polling()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;This script is a clean, async-first Telegram bot scaffold that demonstrates commands, inline mode, message filtering, and media downloads—ready to extend for more sophisticated behaviors.&lt;/p&gt;
&lt;p&gt;Now that we have our bot ready, we will extend the code to add a RAG pipeline to the bot.&lt;/p&gt;
&lt;h3 id=&#34;setting-up-the-knowledge-base&#34;&gt;Setting up the knowledge base&lt;/h3&gt;
&lt;p&gt;Let’s set up a knowledge base by creating a file named &lt;code&gt;faqs.json&lt;/code&gt; to hold our data. The RAG pipeline will load and search this content. An example structure is shown below.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;[
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;&amp;#34;category&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;General&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;&amp;#34;question&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;What are your operating hours?&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;&amp;#34;answer&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Monday to Friday, 9:00 AM–5:00 PM (local time).&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;&amp;#34;category&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Accounts&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;&amp;#34;question&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;How do I reset my password?&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;&amp;#34;answer&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Go to our website, click Login, then Forgot Password. Check your email for the reset link.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
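&lt;p&gt;As a minimal sketch of the loading step (using the field names from the example above, with hard-coded sample entries standing in for reading the file), each FAQ entry can be flattened into a single text chunk before embedding:&lt;/p&gt;

```python
# Sample entries matching the faqs.json structure shown above.
# With the real file you would use: json.loads(Path("faqs.json").read_text())
FAQS = [
    {"category": "General", "question": "What are your operating hours?",
     "answer": "Monday to Friday, 9:00 AM-5:00 PM (local time)."},
    {"category": "Accounts", "question": "How do I reset my password?",
     "answer": "Go to our website, click Login, then Forgot Password."},
]

def load_faq_documents(entries=FAQS):
    """Flatten each entry into one text chunk, so a single embedding
    covers both the question and the answer wording."""
    return [
        f"Category: {e['category']}\nQ: {e['question']}\nA: {e['answer']}"
        for e in entries
    ]

docs = load_faq_documents()
```

&lt;p&gt;Embedding the question and answer together helps the similarity search match user messages phrased like either one.&lt;/p&gt;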

&lt;h3 id=&#34;setting-up-the-rag-pipeline&#34;&gt;Setting up the RAG pipeline&lt;/h3&gt;
&lt;p&gt;The RAG pipeline is the engine that converts our static JSON file into a searchable brain for our bot. This part initializes once, at startup, and creates a vector database. In simple steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Load faqs.json&lt;/li&gt;
&lt;li&gt;Create embeddings&lt;/li&gt;
&lt;li&gt;Store them in FAISS&lt;/li&gt;
&lt;li&gt;When a user asks a question, find similar answers and ask the LLM to write a reply based only on those&lt;/li&gt;
&lt;/ul&gt;
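&lt;p&gt;The real pipeline uses sentence-transformer embeddings and a FAISS index, but the retrieve-then-answer flow above can be illustrated with a toy version that swaps in bag-of-words vectors and a plain Python list (all names here are illustrative, not the LangChain or FAISS APIs):&lt;/p&gt;

```python
import math
from collections import Counter

FAQS = [
    ("What are your operating hours?", "Monday to Friday, 9:00 AM-5:00 PM."),
    ("How do I reset my password?", "Use Forgot Password on the login page."),
]

def embed(text):
    """Toy stand-in for a sentence-transformer: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Indexing" step: embed every FAQ once at startup (FAISS plays this
# role for real, high-dimensional vectors).
index = [(embed(q + " " + a), q, a) for q, a in FAQS]

def retrieve(query, k=1):
    """Return the top-k FAQ pairs most similar to the user query; these
    become the context pasted into the LLM prompt."""
    qvec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(qvec, item[0]), reverse=True)
    return [(q, a) for _, q, a in ranked[:k]]
```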
&lt;h5 id=&#34;data-ingestion-indexing-the-faqs-handled-by-setup_rag_chain-method&#34;&gt;Data ingestion (indexing the FAQs), handled by the &lt;code&gt;setup_rag_chain()&lt;/code&gt; method&lt;/h5&gt;
&lt;p&gt;This part happens once when the bot starts. We load the &lt;code&gt;faqs.json&lt;/code&gt; file, create vector embeddings, and store them in a searchable database (FAISS).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Load Data: Read the faqs.json file&lt;/li&gt;
&lt;li&gt;Embeddings: Use an embedding model (such as the Sentence-Transformers model &lt;a href=&#34;https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2&#34;&gt;&lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt;&lt;/a&gt; from Hugging Face) to convert the text into numerical vectors&lt;/li&gt;
&lt;li&gt;Vector Store: Store these vectors in a FAISS index for fast retrieval&lt;/li&gt;
&lt;/ul&gt;
&lt;h5 id=&#34;the-rag-retrieval-logic-handled-by-handle_message-method&#34;&gt;The RAG retrieval logic, handled by the &lt;code&gt;handle_message()&lt;/code&gt; method&lt;/h5&gt;
&lt;p&gt;When a user asks a question:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Embed Query: The user&amp;rsquo;s question is converted into an embedding vector&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Retrieve Context: The query vector is used to perform a similarity search against the FAISS index. This returns the top K most relevant FAQs (question and answer pairs)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Construct Prompt: A final prompt is built, containing the user&amp;rsquo;s question and the retrieved relevant context&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Example Prompt Template:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;You are an expert FAQ assistant. Use the following context to answer
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;the user&amp;#39;s question. If the context does not contain the answer,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;state that you cannot help with this specific question. Context:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;[Retrieved FAQs] Question: [User&amp;#39;s message]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Generate Response: The complete prompt is sent via Open WebUI&amp;rsquo;s OpenAI-compatible API to a model it exposes (e.g. &lt;code&gt;gpt-5&lt;/code&gt;), which generates a coherent, context-grounded final answer&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Send to Telegram: The bot sends the LLM&amp;rsquo;s final response back to the user&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
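&lt;p&gt;Putting the retrieval steps above together, the final prompt can be assembled with plain string formatting (a sketch based on the template shown above; the helper name is illustrative):&lt;/p&gt;

```python
PROMPT_TEMPLATE = (
    "You are an expert FAQ assistant. Use the following context to answer "
    "the user's question. If the context does not contain the answer, "
    "state that you cannot help with this specific question.\n"
    "Context:\n{context}\n"
    "Question: {question}"
)

def build_prompt(retrieved_faqs, question):
    """Fill the template with retrieved Q/A pairs and the user's message."""
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in retrieved_faqs)
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt(
    [("What are your operating hours?", "Monday to Friday, 9-5.")],
    "When are you open?",
)
```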
&lt;h5 id=&#34;new-capabilities-added-to-botpy&#34;&gt;New capabilities added to &lt;code&gt;bot.py&lt;/code&gt;&lt;/h5&gt;
&lt;ul&gt;
&lt;li&gt;RAG pipeline: Loads FAQs from a local JSON file, embeds them with HuggingFace, retrieves the most relevant entries via FAISS, and drafts answers with an LLM served by Open WebUI.&lt;/li&gt;
&lt;li&gt;UX polish: Sends a “typing…” chat action while the model thinks.&lt;/li&gt;
&lt;li&gt;Persisted chain: The RAG chain is built once at startup and stored in &lt;code&gt;bot_data&lt;/code&gt; for reuse across messages.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The core logic of our bot will revolve around an update to the standard message handler. When a user sends a question, the bot no longer looks for a simple command; instead, it passes the question to the RAG pipeline.&lt;/p&gt;
&lt;h4 id=&#34;try-it-out-1&#34;&gt;Try it out&lt;/h4&gt;
&lt;p&gt;To run the application, simply run:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;uv run bot.py&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Search for your bot name in Telegram and send the bot a message like &lt;code&gt;What are your operating hours?&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The changes for the RAG pipeline setup are available &lt;a href=&#34;https://github.com/bimalghartimagar/telegram-rag-bot/commit/4afac21e085c2782f98fffc66bb2cca27e6c7f50&#34;&gt;in this commit&lt;/a&gt;, and the full source code is available &lt;a href=&#34;https://github.com/bimalghartimagar/telegram-rag-bot&#34;&gt;in the telegram-rag-bot repository&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;By integrating a RAG pipeline, we&amp;rsquo;ve leveled up our Telegram bot from a simple command processor to a knowledge-aware assistant. This approach ensures our bot&amp;rsquo;s answers are accurate, grounded in our provided &lt;code&gt;faqs.json&lt;/code&gt; data, and consistent, dramatically reducing the chance of &amp;ldquo;hallucinations&amp;rdquo; from the underlying LLM.&lt;/p&gt;
&lt;p&gt;This architecture is powerful and scalable. To expand its capabilities, we only need to update the &lt;code&gt;faqs.json&lt;/code&gt; file and re-run the indexing step—no need to retrain or modify the core LLM!&lt;/p&gt;

      </content>
    </entry>
  
    <entry>
      <title>Adventures in Vibe-Coding with Replit</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2025/10/adventures-in-vibe-coding-with-replit/"/>
      <id>https://www.endpointdev.com/blog/2025/10/adventures-in-vibe-coding-with-replit/</id>
      <published>2025-10-29T00:00:00+00:00</published>
      <author>
        <name>Seth Jensen</name>
      </author>
      <content type="html">
        &lt;p&gt;&lt;img src=&#34;/blog/2025/10/adventures-in-vibe-coding-with-replit/nyc-streets-from-above.webp&#34; alt=&#34;A New York City intersection viewed from above, at a 40 degree angle&#34;&gt;&lt;/p&gt;
&lt;!-- Photo by Seth Jensen, 2025, shot on Kodak 200 with a Nikon FE --&gt;
&lt;p&gt;A few weeks back, I tried out Replit Agent to see how viable it is as a development tool. I built two apps from scratch, which took 10–15 hours total, with a decent amount of human input along the way to clarify the agent&amp;rsquo;s questions.&lt;/p&gt;
&lt;h3 id=&#34;project-1-multi-llm-code-review-api&#34;&gt;Project 1: Multi-LLM code review API&lt;/h3&gt;
&lt;p&gt;The idea for the first project was to have a multi-LLM code review app, where you would pass a repository on GitHub or locally to an API, and it would run several full-repo code reviews, ranking and deduplicating them before returning a response. This setup should allow me to have a web frontend or CLI without much added complexity.&lt;/p&gt;
&lt;h4 id=&#34;the-prompt&#34;&gt;The prompt&lt;/h4&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Please build an app based on this app specification: app_spec.yaml (460 lines)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4 id=&#34;the-results&#34;&gt;The results&lt;/h4&gt;
&lt;p&gt;After much back and forth (and about 40 agent-generated Git commits), I got this message:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;🎉 SUCCESS! The Multi-LLM Code Review Assistant is now fully working!&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Well, I actually got this message several times after sending just about any command to the agent. It&amp;rsquo;s very excited about its own work, even when it doesn&amp;rsquo;t work! In this case, the API connections were working, and testing with curl gave output, but I couldn&amp;rsquo;t get it to give more than one code review tip for a large codebase — not a very effective code-review app.&lt;/p&gt;
&lt;p&gt;I tried asking the agent to diagnose and fix this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;So, this seems to not be returning very many code review suggestions. If I go
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;look at the repos I&amp;#39;m putting in, there are plenty of code issues that could be
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;returned, especially with documentation. How can we make this system find more
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;issues?&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In response, Replit&amp;rsquo;s agent created a full React frontend for the API! Neat, but far from what I asked — the app still doesn&amp;rsquo;t return substantial code reviews.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/blog/2025/10/adventures-in-vibe-coding-with-replit/code-review-unwanted-frontend.webp&#34; alt=&#34;Code Review AI Multi-LLM Assistant interface showing dark-themed navigation sidebar with options for Submit Code, Dashboard, Analytics, and Settings. Main content area displays &amp;lsquo;Multi-LLM Code Review Assistant&amp;rsquo; title with description and two buttons: GitHub Repository and Upload ZIP File.&#34;&gt;&lt;/p&gt;
&lt;p&gt;So after ~$100 of agent credits, I found that Replit was not the tool for this app — I&amp;rsquo;m now trying to build a &lt;a href=&#34;/blog/2025/10/creating-agentic-ai-apps/&#34;&gt;CrewAI&lt;/a&gt; app to accomplish the same thing (stay tuned for a post about that).&lt;/p&gt;
&lt;h3 id=&#34;project-2-end-point-ecommerce--hugo-site&#34;&gt;Project 2: End Point Ecommerce + Hugo site&lt;/h3&gt;
&lt;p&gt;A common theme in Replit apps we&amp;rsquo;ve experimented with is that it prefers to make everything a web app with a React frontend. I was curious how it&amp;rsquo;d do with a different web stack: The &lt;a href=&#34;https://gohugo.io/&#34;&gt;Hugo&lt;/a&gt; static site generator for the frontend and our recently launched &lt;a href=&#34;/expertise/end-point-ecommerce/&#34;&gt;End Point Ecommerce&lt;/a&gt; for the backend.&lt;/p&gt;
&lt;p&gt;This is an interesting problem for an LLM, since Hugo is widely used (including on &lt;a href=&#34;/blog/2021/08/converting-to-hugo/&#34;&gt;this site&lt;/a&gt;), but still small compared to React, Vue, and similar frameworks, and since End Point Ecommerce is well-documented but too new for any online discussion. However, it&amp;rsquo;s a modern .NET ecommerce framework and the code is readable and straightforward.&lt;/p&gt;
&lt;h4 id=&#34;the-prompt-1&#34;&gt;The prompt&lt;/h4&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Build me a hugo website frontend to interact with end point ecommerce:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;https://ecommerce.endpointdev.com/
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;For the UI, make a halloween-themed storefront with orange text, black and grey
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;backgrounds, and halloween-themed test products. Use the Ballast font for
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;headers, and charter for body text for now.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;After providing this initial prompt, I followed up with these:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Here is the github link for end point ecommerce, which this site will connect to:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;https://github.com/EndPointCorp/end-point-ecommerce
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Please make the app as simple as possible, while still working
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Oh, and please use this demo API to connect to for now, so you don&amp;#39;t have to
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;build any backend at all, just interface with this:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;https://demo.ecommerce.endpointdev.com/swagger/index.html&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4 id=&#34;the-results-version-1&#34;&gt;The results (version 1)&lt;/h4&gt;
&lt;p&gt;&lt;img src=&#34;/blog/2025/10/adventures-in-vibe-coding-with-replit/site-v1.png&#34; alt=&#34;Spooky Store Halloween ecommerce website showing three product cards featuring jack-o&amp;rsquo;-lantern pumpkin carving designs. Products listed are: Pumpkin Spice Treats with pumpkin emoji ($3.99), Spider Web Candy with spider emoji ($2.49), and Ghost Marshmallows with skull emoji ($4.99). Dark background with orange Halloween-themed branding.&#34;&gt;&lt;/p&gt;
&lt;p&gt;It generated a nice, simple ecommerce storefront, using Hugo and End Point Ecommerce. Impressive! However, there&amp;rsquo;s one red flag: if you look at the products &lt;a href=&#34;https://demo.ecommerce.endpointdev.com/&#34;&gt;in the demo API&lt;/a&gt; we&amp;rsquo;re using, they are simple groceries — Apple, Banana, etc. So where did &amp;ldquo;Pumpkin spice treats,&amp;rdquo; &amp;ldquo;Spider Web Candy,&amp;rdquo; and &amp;ldquo;Ghost Marshmallows&amp;rdquo; come from?&lt;/p&gt;
&lt;p&gt;It turns out these are &amp;ldquo;fallback products&amp;rdquo; Replit had made when it failed to connect to the End Point Ecommerce demo API. Not great for an ecommerce site! Your users would see bogus but real-looking products when they&amp;rsquo;re unable to see or buy real products (I didn&amp;rsquo;t test this far, but the site may have even accepted payment info for fake products).&lt;/p&gt;
&lt;p&gt;So I followed up:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Please remove the fallback products&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4 id=&#34;the-results-version-2&#34;&gt;The results (version 2)&lt;/h4&gt;
&lt;p&gt;&lt;img src=&#34;/blog/2025/10/adventures-in-vibe-coding-with-replit/site-v2.png&#34; alt=&#34;Spooky Store Halloween ecommerce website with jack-o&amp;rsquo;-lantern decorations. Features welcome banner with spider icons and orange border. Error message displayed stating &amp;lsquo;Something spooky happened! Could not load products&amp;rsquo; with skull icons, indicating products failed to load in the Haunting Collection section.&#34;&gt;&lt;/p&gt;
&lt;p&gt;This looks more promising! We get an error message written by the frontend. In production we might want to display more specific messages based on the response, but for this demo, I&amp;rsquo;m happy with this way of displaying errors.&lt;/p&gt;
&lt;p&gt;In true &amp;ldquo;vibe-coding&amp;rdquo; fashion, I asked the Replit agent to diagnose the network error:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;There is a network error while loading products, what is the issue?&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;It responded with a couple of options, the main one being to assume it&amp;rsquo;s a CORS issue.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Possible Solutions
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Option 1: Server-Side Proxy (Recommended)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;- Create a simple server endpoint that fetches the products
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;- Your frontend calls your server, your server calls the external API
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;- No CORS issues since server-to-server requests aren&amp;#39;t restricted&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;I was suspicious of it being a CORS issue since everything was hosted in the Replit container, so after poking around myself, I wrote this prompt:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Please use the web api to sign up for an account by POSTing to the /api/User
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;end point and getting an API key&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;However, after reviewing the project for this post, I found that I misunderstood the issue in this prompt — no authentication is needed to GET &lt;code&gt;/api/products&lt;/code&gt; on the demo API of End Point Ecommerce, the site was just disallowing this cross-origin request. Replit&amp;rsquo;s agent went ahead and implemented the CORS change anyway, though it didn&amp;rsquo;t correct my prompt.&lt;/p&gt;
&lt;p&gt;After a few more followup prompts&amp;hellip;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Please use the &amp;#34;basePrice&amp;#34;, &amp;#34;discountAmount&amp;#34;, and &amp;#34;discountedPrice&amp;#34; keys that
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;are returned from the API to show the real prices of the items
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Now, please fetch the images from the API as well&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&amp;hellip;we have a decent working Hugo site connected with the End Point Ecommerce demo API!&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/blog/2025/10/adventures-in-vibe-coding-with-replit/site-v3.png&#34; alt=&#34;Spooky Store Halloween ecommerce website displaying three product cards in a row: red apple for $1.99, banana bunch on yellow background for $0.25, and raw beef for $7.99. Shopping cart button visible in top right corner. Dark background with orange Halloween-themed header.&#34;&gt;&lt;/p&gt;
&lt;h4 id=&#34;feature-addition-cart&#34;&gt;Feature addition: cart&lt;/h4&gt;
&lt;p&gt;I wrote this prompt next, without having done my research on the backend (which made for an interesting unintentional experiment):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Please add a lightweight cart management system which uses localStorage, with
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;options to add to, remove from, or update quantity of items in the cart. Add a
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;simple checkout form with contact information, shipping address, and billing
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;address. For now, instead of payment information, just have a button which adds
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;a commonly used dummy credit card number.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;There&amp;rsquo;s an issue here: I specified using &lt;code&gt;localStorage&lt;/code&gt;, which is a fine way to set up a frontend cart system, but it ignores the existing system End Point Ecommerce already supplies. I hadn&amp;rsquo;t checked the documentation before making this prompt, and neither did Replit; it happily obliged and created a cart management system on the frontend. It&amp;rsquo;s nice that the agent did what I asked, but if I were trying to work with the agent, without cross-referencing documentation myself, I would be disappointed.&lt;/p&gt;
&lt;p&gt;That&amp;rsquo;s the interesting part about using a non-deterministic system like AI agents: responding to one prompt, it might ignore my specific (and misguided) requests, but in another it might blindly execute the tasks it&amp;rsquo;s given. By the nature of LLMs, you can&amp;rsquo;t predict what you&amp;rsquo;ll get.&lt;/p&gt;
&lt;h4 id=&#34;bonus-a-few-extra-notes&#34;&gt;Bonus: a few extra notes&lt;/h4&gt;
&lt;p&gt;A couple final notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No .gitignore was generated, so the Hugo-generated &lt;code&gt;public/&lt;/code&gt; folder is tracked in Git&lt;/li&gt;
&lt;li&gt;The agent used plain CSS, without SCSS or a framework — I&amp;rsquo;d recommend specifying the stack further in the prompt if you want an app that&amp;rsquo;ll be nice to maintain. You don&amp;rsquo;t want to assume the agent will make smart coding decisions on its own.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&#34;when-id-use-replit-and-similar-do-it-all-ai-agents&#34;&gt;When I&amp;rsquo;d use Replit (and similar do-it-all AI agents)&lt;/h4&gt;
&lt;p&gt;I would use Replit for very quick-and-dirty web development, for sites that shouldn&amp;rsquo;t need to last a long time — although I&amp;rsquo;ve seen too many projects originally intended to be short-lived become important for many years&amp;hellip;&lt;/p&gt;
&lt;p&gt;Replit handles its containerized deployments well; if I needed to share a working demo site within a few hours, Replit would be a good pick. Especially if you like React frontends, even when you might not have asked for one! But if I were dealing with customer data, orders, or my own data I cared about, I would want to be heavily involved throughout the development process.&lt;/p&gt;
&lt;p&gt;For projects where performance and accuracy matter (read: most projects), I will be taking a more directed approach. I&amp;rsquo;ve had some good results using Continue or GitHub Copilot as a coding assistant for small tasks in existing repos, rather than letting AI take the wheel completely.&lt;/p&gt;

      </content>
    </entry>
  
    <entry>
      <title>Creating Agentic AI Applications</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2025/10/creating-agentic-ai-apps/"/>
      <id>https://www.endpointdev.com/blog/2025/10/creating-agentic-ai-apps/</id>
      <published>2025-10-14T00:00:00+00:00</published>
      <author>
        <name>Kürşat Kutlu Aydemir</name>
      </author>
      <content type="html">
        &lt;p&gt;&lt;img src=&#34;/blog/2025/10/creating-agentic-ai-apps/futuristic-maze.webp&#34; alt=&#34;Futuristic Abstract Maze with Colorful Lights&#34;&gt;&lt;br&gt;
Photo by &lt;a href=&#34;https://www.pexels.com/photo/futuristic-abstract-maze-with-colorful-lights-28494630/&#34;&gt;Steve Johnson&lt;/a&gt; on Pexels&lt;/p&gt;
&lt;p&gt;In the rapidly evolving world of AI, agentic AI is emerging as a game-changer. These systems go beyond simple chatbots or predictive models: they&amp;rsquo;re designed to act autonomously, make decisions, and interact with the real world to accomplish goals. In this blog post, we&amp;rsquo;ll dive into what agents are, explore agentic applications and orchestration, and walk through how to build your own agentic applications with open-source examples.&lt;/p&gt;
&lt;h3 id=&#34;what-are-agents&#34;&gt;What Are Agents?&lt;/h3&gt;
&lt;p&gt;AI agents are autonomous entities that can perceive their environment, reason about it, and take actions to achieve specific objectives. Think of an AI agent as more than just a chatbot: it’s like a digital teammate that can look around, make decisions, and take action on its own. Unlike traditional AI models that respond passively to inputs, agents are proactive: they can break down goals into sub-tasks, use tools (like APIs or databases), maintain memory across interactions, and adapt based on feedback. This makes AI agents ideal for applications requiring independence, such as automation, research, or problem-solving.&lt;/p&gt;
&lt;p&gt;At their core, agents typically integrate large language models (LLMs) like GPT-4 or Llama for reasoning, combined with mechanisms for tool invocation and state management. They can operate in loops, iteratively refining their approach until the goal is met.&lt;/p&gt;
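&lt;p&gt;That perceive-reason-act loop can be sketched in a few lines of plain Python (a toy illustration, not the API of any particular framework; the &amp;ldquo;reason&amp;rdquo; step here is a simple lookup standing in for an LLM call):&lt;/p&gt;

```python
def run_agent(goal, tools, max_steps=5):
    """Toy agent loop: pick a tool, act, observe, repeat until done.

    In a real agent the "reason" step is an LLM call that chooses the
    next tool based on the observations so far; here it simply picks
    the first tool that has not produced an observation yet.
    """
    observations = []
    for _ in range(max_steps):
        done = dict(observations)
        pending = [name for name in tools if name not in done]
        if not pending:
            break  # goal met: every sub-task has an observation
        name = pending[0]
        # Act: invoke the tool and record the observation for the next pass.
        observations.append((name, tools[name](goal)))
    return dict(observations)

# Hypothetical tools; real agents would call search APIs, code runners, etc.
tools = {
    "search": lambda goal: f"notes about {goal}",
    "summarize": lambda goal: f"summary of {goal}",
}
result = run_agent("solar panels", tools)
```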
&lt;h4 id=&#34;open-source-examples-of-ai-agents&#34;&gt;Open-Source Examples of AI Agents&lt;/h4&gt;
&lt;p&gt;If you want to get hands-on with agents, there are already plenty of open-source projects you can experiment with and build on. Here are a couple of the most notable:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://github.com/Significant-Gravitas/AutoGPT&#34;&gt;AutoGPT&lt;/a&gt;&lt;/strong&gt;: One of the earliest and best-known autonomous agents. AutoGPT takes a high-level goal, breaks it down into steps, and tries to complete them—whether that means researching a topic, generating code, or managing a workflow. It comes with a simple frontend for building agents, plus a library of ready-made ones (including some quirky ones, like creating viral videos from Reddit trends).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://github.com/reworkd/AgentGPT&#34;&gt;AgentGPT&lt;/a&gt;&lt;/strong&gt;: A browser-based platform that lets you spin up and deploy agents without extra setup. Just type in a goal and watch the agent go—browsing the web, running code, or connecting with external services to move toward the objective.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For a longer list of projects, check out this &lt;a href=&#34;https://huggingface.co/blog/tegridydev/open-source-ai-agents-directory&#34;&gt;open-source AI agents directory on Hugging Face&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;what-are-agentic-applications-and-agent-orchestration&#34;&gt;What Are Agentic Applications and Agent Orchestration?&lt;/h3&gt;
&lt;p&gt;An agentic application is simply an app that uses one or more AI agents to take on complex tasks by itself. Instead of coding every step by hand, you can plug agents together like modules to tackle real-world problems, whether that’s analyzing data, generating content, or building software. The word &amp;ldquo;agentic&amp;rdquo; really just highlights one thing: these systems can act on their own with little human input, following the prompts they are given.&lt;/p&gt;
&lt;p&gt;When you bring more than one agent into the mix, you need a way to coordinate them. This is where orchestration comes in—getting multiple agents to work together smoothly. That might mean giving them specific roles (say, a researcher and a writer), setting up how they communicate, keeping track of what they’ve learned, and making sure they stay on task.&lt;/p&gt;
&lt;p&gt;To make this manageable, orchestration frameworks provide ready-made structures, like graphs, crews, or conversation flows, that organize the collaboration. With these in place, it becomes much easier to grow from a single helpful agent to an entire team working together.&lt;/p&gt;
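&lt;p&gt;As a toy illustration of that idea, here is a framework-free sketch in which the &amp;ldquo;orchestrator&amp;rdquo; is just plain Python passing one agent&amp;rsquo;s output to the next. The &lt;code&gt;fake_llm&lt;/code&gt; function is a placeholder for a real model call, and the role names are arbitrary examples.&lt;/p&gt;

```python
def fake_llm(role, prompt):
    # Placeholder for a real LLM call; returns a deterministic string.
    return f"[{role}] output for: {prompt}"

class Agent:
    """An 'agent' here is just a role prompt plus a callable model."""
    def __init__(self, role):
        self.role = role

    def run(self, task, context=""):
        return fake_llm(self.role, f"{task}\n{context}")

# Orchestration: the researcher's output becomes the writer's context.
researcher = Agent("researcher")
writer = Agent("writer")

notes = researcher.run("Find facts about topic X")
report = writer.run("Write a summary", context=notes)
print(report)
```

&lt;p&gt;Frameworks like the ones below add exactly what this sketch lacks: shared memory, retries, tool access, and control over how agents hand work back and forth.&lt;/p&gt;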
&lt;h4 id=&#34;open-source-examples-of-agentic-applications-and-orchestration&#34;&gt;Open-Source Examples of Agentic Applications and Orchestration&lt;/h4&gt;
&lt;p&gt;If you want to start building agentic applications yourself, there are already a number of open-source frameworks that can help. Here are some of the most popular:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://github.com/crewaiinc/crewai&#34;&gt;CrewAI&lt;/a&gt;&lt;/strong&gt;: A lightweight Python framework for orchestrating agents that work together in teams. You can assign roles, define tasks, and let the agents collaborate—without the bulk of larger frameworks like LangChain. It’s simple, fast, and flexible enough for everything from small experiments to enterprise automations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://github.com/microsoft/autogen&#34;&gt;AutoGen&lt;/a&gt;&lt;/strong&gt;: Created by Microsoft, AutoGen is all about multi-agent conversations. Agents can chat with each other, work independently, or take direction from a human. It also supports tool use like code execution and web browsing, making it handy for more specialized domains like math or chemistry.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&#34;https://github.com/langchain-ai/langgraph&#34;&gt;LangGraph&lt;/a&gt;&lt;/strong&gt;: Part of the LangChain ecosystem, LangGraph lets you design agents as graphs, which makes them resilient and stateful. It’s well-suited for long-running workflows, offering built-in memory, human-in-the-loop controls, and debugging support through LangSmith—perfect for production-grade projects.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Other frameworks worth exploring include OpenAI’s Agents SDK (the production-ready successor to Swarm) for multi-agent workflows, and MetaGPT, which simulates role-based teams of agents.&lt;/p&gt;
&lt;h3 id=&#34;writing-agentic-applications&#34;&gt;Writing Agentic Applications&lt;/h3&gt;
&lt;p&gt;To build an agentic application, the first step is choosing the right framework for your needs: for instance, CrewAI if you want simplicity and lightweight orchestration, or AutoGen if you need extensibility and layered APIs.&lt;/p&gt;
&lt;p&gt;Once you’ve selected a framework, you’ll need to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Define agent roles, e.g., researcher, writer, reviewer. Each agent is an actor responsible for completing one or more tasks.&lt;/li&gt;
&lt;li&gt;Equip agents with tools, like search APIs, code interpreters, or custom functions. In principle, any function you can call can become a tool, so there is little limit on what agents can do.&lt;/li&gt;
&lt;li&gt;Assign tasks and orchestration rules, deciding how agents will interact, share state, and collaborate.&lt;/li&gt;
&lt;li&gt;Add safeguards, error handling, memory management, and human oversight to ensure reliability and trustworthiness.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With those elements in place, you can start assembling workflows. Below, we’ll look at how to get started with one of the most widely used frameworks (CrewAI).&lt;/p&gt;
&lt;h5 id=&#34;building-with-crewai&#34;&gt;Building with CrewAI&lt;/h5&gt;
&lt;p&gt;First, install CrewAI and ddgs (DuckDuckGo search), a free search tool used in this example.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;pip install crewai &amp;#39;crewai[tools]&amp;#39; ddgs&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;CrewAI uses decorators for agents and tasks. Below is a simple example of a crew for researching and reporting on a given topic. It defines two agents (a &lt;code&gt;researcher&lt;/code&gt; and a &lt;code&gt;reporting_analyst&lt;/code&gt;) and one task for each agent (&lt;code&gt;research_task&lt;/code&gt; and &lt;code&gt;reporting_task&lt;/code&gt;). This code creates a sequential workflow:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The researcher gathers data&lt;/li&gt;
&lt;li&gt;Then, the analyst compiles a report&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can extend it with more agents, more tasks, or parallel processes.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;To run this script, you need to set your LLM API key. See how to do that &lt;a href=&#34;https://docs.crewai.com/en/concepts/llms#setting-up-your-llm&#34;&gt;in the CrewAI docs&lt;/a&gt;.&lt;/p&gt;&lt;/blockquote&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;typing&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; Type, Optional
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;json&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;crewai&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; Agent, Crew, Process, Task
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;crewai.project&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; CrewBase, agent, crew, task
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;crewai.tools&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; BaseTool
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;pydantic&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; BaseModel, Field
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;ddgs&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; DDGS
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;DDGSearchInput&lt;/span&gt;(BaseModel):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    query: &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt; = Field(..., description=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Search query&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;DDGSearchTool&lt;/span&gt;(BaseTool):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    name: &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt; = &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;DuckDuckGo Search&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    description: &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt; = &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Web search via DuckDuckGo. Returns a JSON list of results.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    args_schema: Type[BaseModel] = DDGSearchInput
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#888&#34;&gt;# declare this as pydantic field&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    max_results: &lt;span style=&#34;color:#038&#34;&gt;int&lt;/span&gt; = Field(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;8&lt;/span&gt;, description=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Maximum number of results to return&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;_run&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, query: Optional[&lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;] = &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;None&lt;/span&gt;, **kwargs) -&amp;gt; &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; query &lt;span style=&#34;color:#080&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;None&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            query = kwargs.get(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;query&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#080&#34;&gt;not&lt;/span&gt; query:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; json.dumps([], ensure_ascii=&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;False&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        results = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;with&lt;/span&gt; DDGS() &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;as&lt;/span&gt; ddgs:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; r &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; ddgs.text(query, max_results=&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.max_results):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                results.append({
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;title&amp;#34;&lt;/span&gt;: r.get(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;title&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;url&amp;#34;&lt;/span&gt;: r.get(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;href&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;snippet&amp;#34;&lt;/span&gt;: r.get(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;body&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                })
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; json.dumps(results, ensure_ascii=&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;False&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#555&#34;&gt;@CrewBase&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;ResearchCrew&lt;/span&gt;():
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;    Crew of agents for researching and reporting on given topic.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;    &amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;__init__&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, topic):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.topic = topic
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#555&#34;&gt;@agent&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;researcher&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;) -&amp;gt; Agent:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; Agent(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            role=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;Senior Data Researcher&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            goal=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;Uncover cutting-edge developments in &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.topic&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            backstory=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;Seasoned researcher skilled in finding relevant information about &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.topic&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;.&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            verbose=&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;True&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            tools=[DDGSearchTool(max_results=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;10&lt;/span&gt;)]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#555&#34;&gt;@agent&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;reporting_analyst&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;) -&amp;gt; Agent:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; Agent(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            role=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;Reporting Analyst&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            goal=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;Create detailed reports from &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.topic&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt; research findings&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            backstory=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;Meticulous analyst who turns &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.topic&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt; data into clear reports.&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            verbose=&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#555&#34;&gt;@task&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;research_task&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;) -&amp;gt; Task:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; Task(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            description=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;Conduct thorough research on &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.topic&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;.&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            expected_output=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;A list of 10 bullet points with key findings.&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            agent=&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.researcher()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#555&#34;&gt;@task&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;reporting_task&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;) -&amp;gt; Task:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; Task(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            description=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;Expand the research on &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.topic&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt; into a full report.&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            expected_output=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;A markdown-formatted report with detailed sections.&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            agent=&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.reporting_analyst(),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            output_file=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;ai_report-&lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.topic&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;.md&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#555&#34;&gt;@crew&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;crew&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;) -&amp;gt; Crew:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;Assembles the crew.&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; Crew(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            agents=[&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.researcher(), &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.reporting_analyst()],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            tasks=[&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.research_task(), &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.reporting_task()],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            process=Process.sequential,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            verbose=&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;## run the crew of agents&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;__name__&lt;/span&gt; == &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;__main__&amp;#39;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    topic = &lt;span style=&#34;color:#038&#34;&gt;input&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;Topic: &amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    crew = ResearchCrew(topic).crew()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    result = crew.kickoff(inputs={&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;topic&amp;#39;&lt;/span&gt;: topic})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#038&#34;&gt;print&lt;/span&gt;(result)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4 id=&#34;building-a-dynamic-agent&#34;&gt;Building A Dynamic Agent&lt;/h4&gt;
&lt;p&gt;We can also take a dynamic approach to creating an agentic AI application. In this approach, we create tasks and agentic processes on the fly, without any third-party agent orchestration framework. Below is a simple dynamic agent that works on any given goal until the goal is met, using OpenAI as the LLM. The agent first extracts sub-goals from the given main goal, then works through the resulting tasks and decides whether the goals have been met.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;os&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;json&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;openai&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; OpenAI
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;DynamicAgent&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;__init__&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, goal: &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;, model: &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt; = &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;gpt-4o&amp;#34;&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.goal = goal
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.model = model
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.llm = OpenAI(api_key=os.getenv(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;OPENAI_API_KEY&amp;#34;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.memory = []  &lt;span style=&#34;color:#888&#34;&gt;# store context and past decisions&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.max_iterations = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;5&lt;/span&gt;  &lt;span style=&#34;color:#888&#34;&gt;# prevent infinite loops&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;get_llm_response&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, prompt: &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;) -&amp;gt; &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        resp = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.llm.chat.completions.create(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            model=&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.model,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            messages=[
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                {&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;role&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;system&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;content&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;You are an autonomous agent that creates plans and conditions to achieve goals.&amp;#34;&lt;/span&gt;},
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                {&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;role&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;user&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;content&amp;#34;&lt;/span&gt;: prompt}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            ],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            temperature=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.2&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        result = &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;None&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;try&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            result = resp.choices[&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;].message.content
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;except&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;Exception&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#888&#34;&gt;# fallback&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            result = &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(resp)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;llm result: &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;result&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; result
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;generate_plan&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;) -&amp;gt; &lt;span style=&#34;color:#038&#34;&gt;list&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        prompt = (
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Given the goal &amp;#39;&lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.goal&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;, generate a pure JSON list of actionable steps with conditions for success.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Don&amp;#39;t use any formatting like markdown outside of the JSON content.&lt;/span&gt;&lt;span style=&#34;color:#04d;background-color:#fff0f0&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Each step should include: {&amp;#39;step&amp;#39;: &amp;#39;description&amp;#39;, &amp;#39;condition&amp;#39;: &amp;#39;success criteria&amp;#39;}.&lt;/span&gt;&lt;span style=&#34;color:#04d;background-color:#fff0f0&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Do not rely on predefined rules; infer the steps and conditions from the goal.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        plan = json.loads(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.get_llm_response(prompt))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.memory.append({&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;action&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;planning&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;output&amp;#34;&lt;/span&gt;: plan})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; plan
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;execute_step&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, step: &lt;span style=&#34;color:#038&#34;&gt;dict&lt;/span&gt;) -&amp;gt; &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        prompt = (
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Execute this step: &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;step[&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;step&amp;#39;&lt;/span&gt;]&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;.&lt;/span&gt;&lt;span style=&#34;color:#04d;background-color:#fff0f0&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Success condition: &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;step[&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;condition&amp;#39;&lt;/span&gt;]&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;.&lt;/span&gt;&lt;span style=&#34;color:#04d;background-color:#fff0f0&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Provide the result and whether the condition was met.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        result = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.get_llm_response(prompt)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.memory.append({&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;step&amp;#34;&lt;/span&gt;: step[&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;step&amp;#39;&lt;/span&gt;], &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;result&amp;#34;&lt;/span&gt;: result})
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; result
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;run&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Starting agent with goal: &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.goal&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        plan = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.generate_plan()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        results = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; i, step &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;enumerate&lt;/span&gt;(plan[:&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.max_iterations]):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#038&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Executing step &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;i+&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;: &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;step[&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;step&amp;#39;&lt;/span&gt;]&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            result = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.execute_step(step)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            results.append(result)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#038&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Result: &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;result&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#888&#34;&gt;# check if goal met&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; result &lt;span style=&#34;color:#080&#34;&gt;and&lt;/span&gt; &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Goal achieved&amp;#34;&lt;/span&gt; &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; result:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;break&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.save_results_to_markdown(results)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;save_results_to_markdown&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, results: &lt;span style=&#34;color:#038&#34;&gt;list&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        filename = &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;dynamic_agent_report-&lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.goal&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;.md&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;with&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;open&lt;/span&gt;(filename, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;w&amp;#34;&lt;/span&gt;, encoding=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;utf-8&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;as&lt;/span&gt; f:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            f.write(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span style=&#34;color:#04d;background-color:#fff0f0&#34;&gt;\n&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&lt;/span&gt;.join(results))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Results saved to &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;filename&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;__name__&lt;/span&gt; == &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;__main__&amp;#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    goal = &lt;span style=&#34;color:#038&#34;&gt;input&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Goal: &amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    agent = DynamicAgent(goal=goal)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    agent.run()&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id=&#34;use-cases-of-agentic-ai-applications&#34;&gt;Use Cases of Agentic AI Applications&lt;/h3&gt;
&lt;p&gt;Agentic AI applications are already finding their way into practical scenarios. By combining autonomy with orchestration, these systems go beyond demos and research projects to deliver measurable impact.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Customer Support Automation&lt;/strong&gt;: Instead of static chatbots, agentic systems can act as full-service assistants—resolving customer queries, escalating complex cases, and even initiating refunds or ticket creation. They maintain context across long interactions, providing a more human-like experience.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Research and Analysis&lt;/strong&gt;: Enterprises and individuals use research agents to autonomously scan news sources, academic databases, or financial reports. For example, a financial analyst could deploy agents to track company earnings, summarize investor calls, and produce actionable insights daily.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Content Creation Pipelines&lt;/strong&gt;: Marketing teams benefit from multi-agent setups where one agent researches a topic, another drafts a blog post, and another optimizes it for SEO. The result is faster, higher-quality content production with less manual effort.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Software Development Assistants&lt;/strong&gt;: Multi-agent workflows can handle bug triaging, code generation, testing, and documentation. For instance, one agent detects issues, another suggests fixes, while another runs automated tests and updates docs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Healthcare and Biomedicine&lt;/strong&gt;: Specialized agents can search the latest medical literature, cross-reference patient data, and generate preliminary diagnostic reports, supporting clinicians in decision-making while reducing information overload.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At End Point, we use agents to speed up development, with human review to uphold our company&amp;rsquo;s long-standing commitment to high-quality products. We also build agentic AI solutions for clients, speeding up and modernizing legacy workflows.&lt;/p&gt;
&lt;h3 id=&#34;challenges-ahead&#34;&gt;Challenges Ahead&lt;/h3&gt;
&lt;p&gt;While agentic AI holds promise, several challenges remain before widespread adoption becomes seamless:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Reliability and Hallucination&lt;/strong&gt;: LLM-based agents can still generate incorrect or fabricated outputs. Ensuring trustworthiness through validation, feedback loops, and human-in-the-loop oversight is crucial. That said, I&amp;rsquo;m not entirely against hallucination; we humans hallucinate too.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Security and Safety&lt;/strong&gt;: Autonomous agents that can browse the web, execute code, or control external systems pose risks if misconfigured or exploited. Sandboxing, permissions, and guardrails are essential to prevent harmful actions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Evaluation and Benchmarking&lt;/strong&gt;: Unlike traditional ML models, measuring the success of agentic systems is difficult. Metrics must capture not only accuracy but also task completion, collaboration quality, and user satisfaction.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ethics and Alignment&lt;/strong&gt;: As agents gain autonomy, aligning them with human values, legal frameworks, and ethical principles becomes critical. Misaligned objectives could cause unintended consequences.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Integration with Legacy Systems&lt;/strong&gt;: Many organizations still rely on older infrastructure. Seamlessly embedding agentic AI into these environments can be a non-trivial engineering challenge.&lt;/li&gt;
&lt;/ul&gt;
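&lt;p&gt;As a concrete example of the validation loops mentioned above, an LLM&amp;rsquo;s output can be checked against the expected plan schema before it is trusted, with a bounded retry. This is a minimal sketch, not a complete guardrail system; &lt;code&gt;regenerate&lt;/code&gt; is a hypothetical callable that re-queries the model:&lt;/p&gt;

```python
import json

def validate_plan(raw, regenerate=None, max_retries=2):
    """Parse and validate an LLM-generated plan, retrying on failure.

    Each step must be a dict with 'step' and 'condition' keys, matching
    the plan format the agent's prompt asks for. `regenerate` is a
    hypothetical callable that asks the model for a fresh plan.
    """
    for attempt in range(max_retries + 1):
        try:
            plan = json.loads(raw)
            if isinstance(plan, list) and all(
                isinstance(s, dict) and {"step", "condition"}.issubset(s)
                for s in plan
            ):
                return plan
        except (json.JSONDecodeError, TypeError):
            pass  # malformed output: fall through to a retry
        if regenerate is None or attempt == max_retries:
            break
        raw = regenerate()  # ask the model again before giving up
    raise ValueError("could not obtain a valid plan from the LLM")
```

&lt;p&gt;In an agent like the &lt;code&gt;DynamicAgent&lt;/code&gt; above, a check like this would wrap the bare &lt;code&gt;json.loads&lt;/code&gt; call in &lt;code&gt;generate_plan&lt;/code&gt;.&lt;/p&gt;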
&lt;h3 id=&#34;wrapping-up&#34;&gt;Wrapping up&lt;/h3&gt;
&lt;p&gt;Agentic AI applications represent the future of intelligent software, blending autonomy with collaboration. By leveraging open-source tools like CrewAI and LangGraph or specialized custom agents, you can create powerful solutions tailored to your needs. As AI evolves, we can expect even more sophisticated orchestration capabilities.&lt;/p&gt;

      </content>
    </entry>
  
    <entry>
      <title>Making Blog Search Smarter with LLMs and Open WebUI</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2025/09/llm-expanded-vector-search/"/>
      <id>https://www.endpointdev.com/blog/2025/09/llm-expanded-vector-search/</id>
      <published>2025-09-29T00:00:00+00:00</published>
      <author>
        <name>Edgar Mlowe</name>
      </author>
      <content type="html">
        &lt;p&gt;&lt;img src=&#34;/blog/2025/09/llm-expanded-vector-search/stained-glass-flowering.webp&#34; alt=&#34;An ornate pattern flowers out from a circular window in the center of the image, framing plant-shaped stained glass depicting European church images&#34;&gt;&lt;/p&gt;
&lt;!-- Photo by Seth Jensen, 2024. --&gt;
&lt;p&gt;We recently released LLM Expanded Search for our blog&amp;rsquo;s vector search. It builds on what we covered in our earlier posts about &lt;a href=&#34;/blog/2025/08/vector-search-for-the-end-point-blog/&#34;&gt;AI-powered search&lt;/a&gt; and &lt;a href=&#34;/blog/2025/07/vector-search/&#34;&gt;vector search basics&lt;/a&gt;. Here&amp;rsquo;s how we built it with our internal AI setup (Open WebUI running an OpenAI-compatible API), why it makes search better, and what&amp;rsquo;s coming next.&lt;/p&gt;
&lt;h3 id=&#34;what-llm-expanded-search-actually-does&#34;&gt;What &amp;ldquo;LLM Expanded Search&amp;rdquo; actually does&lt;/h3&gt;
&lt;p&gt;Here&amp;rsquo;s the basic idea: when you search for something, we first ask an LLM to come up with related terms and phrases. Then we search for all of those terms, not just your original query.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Your search gets expanded by an open-source LLM through our AI portal (Open WebUI with an OpenAI-compatible API)&lt;/li&gt;
&lt;li&gt;Those extra terms give our vector index more ways to find posts that match what you&amp;rsquo;re looking for&lt;/li&gt;
&lt;li&gt;We combine the results, remove duplicates, and sort by relevance before showing the best matches with snippets and links&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This really helps with short or vague searches where regular vector search might miss the relevant context. For example, &amp;ldquo;S3&amp;rdquo; refers to Amazon S3, a cloud object storage service, but the bare term &amp;ldquo;S3&amp;rdquo; doesn&amp;rsquo;t provide enough context for a useful vector search. An LLM can expand this short query with context about cloud object storage in general, giving the search enough context to return relevant results about S3.&lt;/p&gt;
&lt;h3 id=&#34;how-it-works&#34;&gt;How it works&lt;/h3&gt;
&lt;p&gt;The frontend is pretty straightforward: our search bar has two options, &amp;ldquo;Search&amp;rdquo; (just hit Enter) and &amp;ldquo;LLM Expanded Search&amp;rdquo; (Shift/​Ctrl/​Command+Enter).&lt;/p&gt;
&lt;p&gt;When you use expanded search, here&amp;rsquo;s what happens:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We call our Open WebUI endpoint with a prompt that asks for 8–15 related terms&lt;/li&gt;
&lt;li&gt;We turn both your original query and the expanded terms into embeddings&lt;/li&gt;
&lt;li&gt;We search our vector store with all these terms and combine the results&lt;/li&gt;
&lt;li&gt;Caching and rate limiting keep things fast and cheap&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here&amp;rsquo;s a simple example of how we expand queries:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;openai&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; OpenAI
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;os&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;client = OpenAI(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    base_url=os.getenv(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;OPENAI_BASE_URL&amp;#34;&lt;/span&gt;),   &lt;span style=&#34;color:#888&#34;&gt;# e.g., http://openwebui.local/api/v1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    api_key=os.getenv(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;OPENAI_API_KEY&amp;#34;&lt;/span&gt;)      &lt;span style=&#34;color:#888&#34;&gt;# token managed in your environment&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;expand_query&lt;/span&gt;(raw_query: &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;) -&amp;gt; &lt;span style=&#34;color:#038&#34;&gt;list&lt;/span&gt;[&lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    messages = [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;role&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;system&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;content&amp;#34;&lt;/span&gt;: (
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;You expand a short search query into a concise, comma-separated list of &amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;synonyms and closely related phrases (8–15 items). No explanations.&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        {&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;role&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;user&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;content&amp;#34;&lt;/span&gt;: raw_query}
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    res = client.chat.completions.create(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        model=os.getenv(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;OPENAI_MODEL&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;local-llm&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        messages=messages,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        temperature=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.2&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        max_tokens=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;200&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    text = res.choices[&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;].message.content
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; [t.strip() &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; t &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; text.split(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;,&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; t.strip()]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
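&lt;p&gt;The caching mentioned earlier can be as simple as a TTL wrapper around &lt;code&gt;expand_query&lt;/code&gt;. This is a sketch, not our exact implementation; the helper name and the one-hour TTL are illustrative:&lt;/p&gt;

```python
import time
from typing import Callable

def make_cached(expand: Callable[[str], list], ttl: float = 3600.0):
    """Wrap a query-expansion function with a small in-memory TTL cache.

    The one-hour `ttl` is illustrative; tune it to how often the model or
    corpus changes. Keys are normalized so "S3" and "s3" share an entry.
    """
    cache = {}  # key: (timestamp, expanded terms)

    def cached(raw_query: str) -> list:
        key = raw_query.strip().lower()
        hit = cache.get(key)
        if hit and ttl > time.monotonic() - hit[0]:
            return hit[1]  # fresh hit: skip the LLM round trip
        terms = expand(raw_query)
        cache[key] = (time.monotonic(), terms)
        return terms

    return cached
```

&lt;p&gt;Repeated searches for the same query then cost a dictionary lookup instead of an LLM call.&lt;/p&gt;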

&lt;p&gt;After that, we embed the original query and the expanded terms, search the vector index, then sort by score and drop duplicates so each post appears once. Finally, we render concise snippets.&lt;/p&gt;
&lt;p&gt;For example, after a similarity search you can rank and de-duplicate like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# given: results = [(doc, score), ...]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;valid = [(d, &lt;span style=&#34;color:#038&#34;&gt;float&lt;/span&gt;(s)) &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; d, s &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; results &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;float&lt;/span&gt;(s) &amp;gt; &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.05&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;valid.sort(key=&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;lambda&lt;/span&gt; x: x[&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;], reverse=&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;True&lt;/span&gt;)  &lt;span style=&#34;color:#888&#34;&gt;# highest score first&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;seen = &lt;span style=&#34;color:#038&#34;&gt;set&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;unique = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; doc, score &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; valid:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    src = doc.metadata.get(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;source&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; src &lt;span style=&#34;color:#080&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; seen:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        unique.append((doc, score))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        seen.add(src)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# unique now holds top ranked, de‑duplicated posts&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

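&lt;p&gt;As a minimal illustration of that last snippet-rendering step (the &lt;code&gt;snippet&lt;/code&gt; helper below is a hypothetical sketch, not our production code), you could collapse whitespace and truncate at a word boundary:&lt;/p&gt;

```python
def snippet(text: str, limit: int = 160) -> str:
    """Collapse whitespace and truncate at a word boundary."""
    text = " ".join(text.split())  # normalize runs of whitespace
    if len(text) <= limit:
        return text
    # cut at the limit, then drop the (possibly partial) final word
    return text[:limit].rsplit(" ", 1)[0] + "..."
```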
&lt;h3 id=&#34;why-we-chose-open-webui&#34;&gt;Why we chose Open WebUI&lt;/h3&gt;
&lt;p&gt;A few reasons made Open WebUI the right choice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It&amp;rsquo;s open source and works great self-hosted&lt;/li&gt;
&lt;li&gt;The OpenAI-compatible API means we can drop it into existing code&lt;/li&gt;
&lt;li&gt;We can use whatever models and inference backends we want&lt;/li&gt;
&lt;li&gt;It&amp;rsquo;s easy to experiment with different prompts and workflows&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;whats-next-moving-more-into-open-webui&#34;&gt;What&amp;rsquo;s next: Moving more into Open WebUI&lt;/h3&gt;
&lt;p&gt;We&amp;rsquo;re looking into moving more of the search pipeline directly into Open WebUI workflows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Query expansion (LLM)&lt;/li&gt;
&lt;li&gt;Vector retrieval (custom tool that hits our index)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This would give us tighter integration, fewer network calls, and simpler deployment, and make it easier to try new approaches.&lt;/p&gt;
&lt;h3 id=&#34;what-youll-notice-when-using-it&#34;&gt;What you&amp;rsquo;ll notice when using it&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Short searches work much better: you get more relevant results and fewer dead ends&lt;/li&gt;
&lt;li&gt;It&amp;rsquo;s still experimental, so results may sometimes drift into related topics. Stick with regular &amp;ldquo;Search&amp;rdquo; if you want more exact matches&lt;/li&gt;
&lt;li&gt;We cache common terms to keep things smooth&lt;/li&gt;
&lt;/ul&gt;
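&lt;p&gt;The caching mentioned above can be as simple as memoizing the expansion call. This is a hedged sketch: &lt;code&gt;fake_llm_expand&lt;/code&gt; is a stand-in for the real LLM request, and all the names are hypothetical:&lt;/p&gt;

```python
from functools import lru_cache

calls = 0  # counts how many times the "LLM" is actually hit

def fake_llm_expand(query: str) -> list:
    """Stand-in for the real LLM expansion call."""
    global calls
    calls += 1
    return [query, query + " storage", query + " cloud"]

@lru_cache(maxsize=1024)
def cached_expand(query: str) -> tuple:
    # tuples are hashable and immutable, so they cache safely
    return tuple(fake_llm_expand(query))

cached_expand("s3")
cached_expand("s3")  # second call is served from the cache
```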
&lt;p&gt;Give it a try at &lt;a href=&#34;/blog/&#34;&gt;our blog&lt;/a&gt;. Just use the search bar in our header: press Enter for regular search, or Shift/​Ctrl/​Command+Enter for LLM Expanded Search.&lt;/p&gt;
&lt;p&gt;Want to know more about why we built this? Check out the announcement and vector search posts linked above.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re interested in setting up LLM-expanded vector search or running something similar self-hosted with Open WebUI, we&amp;rsquo;d love to &lt;a href=&#34;/contact/&#34;&gt;help out&lt;/a&gt;.&lt;/p&gt;

      </content>
    </entry>
  
    <entry>
      <title>Vector Search for the End Point Blog</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2025/08/vector-search-for-the-end-point-blog/"/>
      <id>https://www.endpointdev.com/blog/2025/08/vector-search-for-the-end-point-blog/</id>
      <published>2025-08-13T00:00:00+00:00</published>
      <author>
        <name>Seth Jensen</name>
      </author>
      <content type="html">
        &lt;p&gt;&lt;img src=&#34;/blog/2025/08/vector-search-for-the-end-point-blog/trees-by-sidewalk.webp&#34; alt=&#34;A sidewalk and road lead to the left of the image, across from an industrial area with train tracks. Along the sidewalk are trees with white petals, many of which have fallen to the sidewalk.&#34;&gt;&lt;/p&gt;
&lt;!-- Photo by Seth Jensen, 2025 --&gt;
&lt;p&gt;We&amp;rsquo;re excited to announce a new feature on the End Point Blog: AI-powered vector search.&lt;/p&gt;
&lt;p&gt;Below the &amp;ldquo;Our Blog&amp;rdquo; header at the top of this page, there is a new search bar with two adjacent buttons: &amp;ldquo;Search&amp;rdquo; and &amp;ldquo;LLM Expanded Search.&amp;rdquo; If you click &amp;ldquo;Search&amp;rdquo; (or press Enter), your search will be fed directly to our vector search/​similarity search engine. If you click &amp;ldquo;LLM Expanded Search&amp;rdquo; (or press Shift+Enter, Control+Enter, or Command+Enter on macOS) your query will first be expanded by an open-source LLM, then sent to the similarity search engine.&lt;/p&gt;
&lt;p&gt;The LLM is trained to expand the query to include similar terms, keywords, etc., before sending it to the similarity search engine. For example, if I search &lt;code&gt;S3&lt;/code&gt;, similarity search alone returns no results — there isn&amp;rsquo;t enough semantic information for vector search to make useful connections. However, an LLM can expand this to &lt;code&gt;s3, simple storage service, amazon s3, object storage, cloud storage...&lt;/code&gt;, providing more anchor points for vector search to connect to results.&lt;/p&gt;
&lt;p&gt;The model improves results fairly well, but it is still an experimental technology, so results will vary. In the &lt;code&gt;S3&lt;/code&gt; example, you could get posts which don&amp;rsquo;t directly relate to Amazon S3 but do relate to cloud object storage, or you might miss posts which contain an exact match for &lt;code&gt;S3&lt;/code&gt; but little other semantic similarity.&lt;/p&gt;
&lt;p&gt;You can read more about the technical side of vector search in our &lt;a href=&#34;/blog/2025/07/vector-search/&#34;&gt;recent blog post&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Happy searching!&lt;/p&gt;

      </content>
    </entry>
  
    <entry>
      <title>Vector Search: The Future of Finding What Matters</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2025/07/vector-search/"/>
      <id>https://www.endpointdev.com/blog/2025/07/vector-search/</id>
      <published>2025-07-01T00:00:00+00:00</published>
      <author>
        <name>Kürşat Kutlu Aydemir</name>
      </author>
      <content type="html">
        &lt;p&gt;&lt;img src=&#34;/blog/2025/07/vector-search/jigsaw-piece.webp&#34; alt=&#34;A blank jigsaw puzzle with one piece missing, revealing pink beneath.&#34;&gt;&lt;br&gt;
Photo by &lt;a href=&#34;https://www.pexels.com/photo/pink-jigsaw-puzzle-piece-3482441/&#34;&gt;Ann H&lt;/a&gt; on Pexels&lt;/p&gt;
&lt;p&gt;In a world flooded with data in many different formats, like images, documents, text, and videos, traditional search methods are starting to show their age. Today, the vector search technique is revolutionizing how we retrieve and understand information. If you wonder how Spotify can recommend the perfect song or how Google can find almost perfectly accurate image matches, vector search is the wizard behind the curtain. Let&amp;rsquo;s see how it has become a game changer.&lt;/p&gt;
&lt;h3 id=&#34;what-is-vector-search&#34;&gt;What Is Vector Search?&lt;/h3&gt;
&lt;p&gt;At its core, vector search is a method of finding similar items in a dataset by representing them as vectors — essentially, lists of numbers in a multi-dimensional space. Unlike keyword-based search, which relies on exact matches or predefined rules, vector search focuses on semantic similarity. This means it can understand the meaning or context behind data, not just the words or pixels on the surface.&lt;/p&gt;
&lt;p&gt;Imagine you’re searching for &lt;code&gt;a cozy cabin in the woods&lt;/code&gt;. A traditional search might get stuck on the exact words in this query, missing a listing for something similar like &lt;code&gt;a snug retreat nestled in a forest&lt;/code&gt;. Vector search, however, can connect the dots because it understands that &lt;code&gt;cozy&lt;/code&gt; and &lt;code&gt;snug&lt;/code&gt; or &lt;code&gt;woods&lt;/code&gt; and &lt;code&gt;forest&lt;/code&gt; are conceptually close.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;You can try vector search on this blog! There&amp;rsquo;s a search bar in the header at the top of the page, and you can see our announcement post &lt;a href=&#34;/blog/2025/08/vector-search-for-the-end-point-blog/&#34;&gt;here&lt;/a&gt;.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3 id=&#34;how-does-it-work&#34;&gt;How Does It Work?&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;/blog/2025/07/vector-search/vector-search-arch.webp&#34; alt=&#34;A block flow diagram. At the start is a block labeled &amp;ldquo;Raw Data&amp;rdquo;. A red arrow labeled &amp;ldquo;Embedding Model&amp;rdquo; points to stacked blocks labeled &amp;ldquo;Embedding&amp;rdquo;. Then a red arrow points to a cylinder reading &amp;ldquo;Vector Store&amp;rdquo;. Then a green arrow points to stacked blocks reading &amp;ldquo;Found Vectors&amp;rdquo;. Also pointing to the &amp;ldquo;Vector Store&amp;rdquo; cylinder, with a blue arrow labeled &amp;ldquo;Similarity Search&amp;rdquo;, is a block labeled &amp;ldquo;Query Vector&amp;rdquo;, which has another blue arrow pointing to it from a block labeled &amp;ldquo;Query&amp;rdquo;.&#34;&gt;&lt;/p&gt;
&lt;p&gt;The magic of vector search happens in three key steps:&lt;/p&gt;
&lt;h4 id=&#34;embedding-generation&#34;&gt;Embedding Generation&lt;/h4&gt;
&lt;p&gt;Once you have access to the data sources you want to apply vector search to, you will need to prepare and ingest the data into your system. This phase handles data clean-up and extraction from different data sources and formats. For multi-modality support you may need to include multiple data formats, like text, image, audio, and video.&lt;/p&gt;
&lt;p&gt;In the embedding generation phase, as depicted above, raw data (like text, images, or audio) is transformed into vectors using suitable embedding machine learning (ML) models. These models (e.g. &lt;a href=&#34;https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2&#34;&gt;all-MiniLM-L6-v2&lt;/a&gt;, &lt;a href=&#34;https://huggingface.co/sentence-transformers/all-mpnet-base-v2&#34;&gt;all-mpnet-base-v2&lt;/a&gt; for text, or &lt;a href=&#34;https://huggingface.co/docs/transformers/en/model_doc/resnet&#34;&gt;ResNet&lt;/a&gt; for images) analyze the data and spit out a numerical representation, a vector, that captures its essence. For example, the sentence &lt;code&gt;I love sunny hikes&lt;/code&gt; might become something like &lt;code&gt;[-0.07571851, -0.02147608, 0.07130147, 0.1087752, 0.02052169, ...]&lt;/code&gt;, where each number reflects a feature of its meaning.&lt;/p&gt;
&lt;h4 id=&#34;storing-vectors&#34;&gt;Storing Vectors&lt;/h4&gt;
&lt;p&gt;The generated vectors are then stored in a vector database or an index optimized for fast retrieval. Think of it as a massive, multi-dimensional map where every point represents an item or a chunk of an item. There are several options to choose from, such as &lt;a href=&#34;https://faiss.ai/&#34;&gt;FAISS&lt;/a&gt;, &lt;a href=&#34;https://www.trychroma.com/&#34;&gt;Chroma&lt;/a&gt;, &lt;a href=&#34;https://milvus.io/&#34;&gt;Milvus&lt;/a&gt;, and &lt;a href=&#34;https://github.com/pgvector/pgvector&#34;&gt;pgvector&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id=&#34;similarity-search&#34;&gt;Similarity Search&lt;/h4&gt;
&lt;p&gt;When you use this system with a query like &lt;code&gt;find me green hoodies&lt;/code&gt;, it converts your input into a vector too. Then, it searches the vector store for the closest points using mathematical distance metrics like &lt;a href=&#34;https://en.wikipedia.org/wiki/Cosine_similarity&#34;&gt;Cosine Similarity&lt;/a&gt; or &lt;a href=&#34;https://en.wikipedia.org/wiki/Euclidean_distance&#34;&gt;Euclidean Distance&lt;/a&gt;. The closer the vectors, the more similar the items.&lt;/p&gt;
&lt;p&gt;The result? You get matches that feel right, even if they don’t share exact keywords or pixel patterns.&lt;/p&gt;
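&lt;p&gt;To make the two distance metrics concrete, here is a tiny NumPy sketch (with toy vectors chosen by hand) comparing cosine similarity and Euclidean distance for the same pair:&lt;/p&gt;

```python
import numpy as np

# two toy vectors pointing in similar directions
a = np.array([1.0, 0.0])
b = np.array([0.8, 0.6])

# cosine similarity: 1.0 means identical direction, 0.0 means orthogonal
cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Euclidean distance: 0.0 means identical points; larger means farther apart
euclidean = float(np.linalg.norm(a - b))
```

Cosine similarity only cares about direction, so it is a common default for text embeddings; Euclidean distance also reflects vector magnitude.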
&lt;h4 id=&#34;a-simple-vector-search-implementation&#34;&gt;A Simple Vector Search Implementation&lt;/h4&gt;
&lt;p&gt;Here, to demonstrate how vector search works, I implemented a simple vector search based on this architecture without using third-party libraries. For production-ready applications you should of course use well-known implementations and tools; this is just for demonstration.&lt;/p&gt;
&lt;p&gt;Below is a very simple vector store implementation:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;numpy&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;as&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;typing&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; List, Tuple
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;SimpleVectorStore&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;__init__&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;Initialize&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.vectors = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.ids = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;add_vectors&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, vectors: List[List[&lt;span style=&#34;color:#038&#34;&gt;float&lt;/span&gt;]], ids: List[&lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;]):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;Add vectors to vector store&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;len&lt;/span&gt;(vectors) != &lt;span style=&#34;color:#038&#34;&gt;len&lt;/span&gt;(ids):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;raise&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;ValueError&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Number of vectors must match number of IDs&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# keep a list of arrays so add_vectors can be called more than once&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.vectors.extend(np.array(v) &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; v &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; vectors)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.ids.extend(ids)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;cosine_similarity&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, v1: np.ndarray, v2: np.ndarray) -&amp;gt; &lt;span style=&#34;color:#038&#34;&gt;float&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;Calculate cosine similarity between two vectors&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# zero vectors&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; np.all(v1 == &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;) &lt;span style=&#34;color:#080&#34;&gt;or&lt;/span&gt; np.all(v2 == &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        dot_product = np.dot(v1, v2)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        norm_v1 = np.linalg.norm(v1)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        norm_v2 = np.linalg.norm(v2)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; dot_product / (norm_v1 * norm_v2)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;search&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, query_vector: List[&lt;span style=&#34;color:#038&#34;&gt;float&lt;/span&gt;], k: &lt;span style=&#34;color:#038&#34;&gt;int&lt;/span&gt; = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;5&lt;/span&gt;) -&amp;gt; List[Tuple[np.ndarray, &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;, &lt;span style=&#34;color:#038&#34;&gt;float&lt;/span&gt;]]:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;Search for the k most similar vectors&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;len&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.vectors) == &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        query_vector = np.array(query_vector)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        similarities = [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            (&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.vectors[i], &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.ids[i], &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.cosine_similarity(query_vector, vec))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; i, vec &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;enumerate&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.vectors)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# sort and return&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        similarities.sort(key=&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;lambda&lt;/span&gt; x: x[&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;], reverse=&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;True&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; similarities[:k]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now, let&amp;rsquo;s store some sample vectors (so-called embeddings) in this simple vector store and perform an example search. In real-world applications these represent the raw data in vector form, but I used dummy embedding values in this example. To keep the code simple, I also skipped the embedding model phase when creating these embedding values.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;__name__&lt;/span&gt; == &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;__main__&amp;#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#888&#34;&gt;# vector store instance&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    vector_store = SimpleVectorStore()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#888&#34;&gt;# example embeddings&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    sample_embeddings = [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        [&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;], &lt;span style=&#34;color:#888&#34;&gt;# Vector 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        [&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;], &lt;span style=&#34;color:#888&#34;&gt;# Vector 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        [&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;], &lt;span style=&#34;color:#888&#34;&gt;# Vector 3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        [&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1.0&lt;/span&gt;], &lt;span style=&#34;color:#888&#34;&gt;# Vector 4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    sample_ids = [&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;doc1&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;doc2&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;doc3&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;doc4&amp;#34;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#888&#34;&gt;# add sample vectors to vector store&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    vector_store.add_vectors(sample_embeddings, sample_ids)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#888&#34;&gt;# query vector&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    query = [&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.5&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#888&#34;&gt;# search for top k = 2 results&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    top_k = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    results = vector_store.search(query, k=top_k)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#038&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Query vector:&amp;#34;&lt;/span&gt;, query)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#038&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Top &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;top_k&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt; similar vectors:&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; vector, doc_id, similarity &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; results:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;DocId: &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;doc_id&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;, Vector: &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;vector&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;, Similarity: &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;similarity&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;:&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;.4f&lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The output of this search looks like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-plain&#34; data-lang=&#34;plain&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Query vector: [1.0, 0.5, 0.0]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Top 2 similar vectors:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;DocId: doc3, Vector: [1. 1. 0.], Similarity: 0.9487
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;DocId: doc1, Vector: [1. 0. 0.], Similarity: 0.8944&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now compare the vectors in the results with the query vector: the vectors with the higher similarity scores are closer to, and therefore more similar to, the query vector.&lt;/p&gt;
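&lt;p&gt;You can reproduce the similarity scores above with a few lines of NumPy. This is a minimal sketch of the cosine similarity the search is based on, using the vectors from the example output:&lt;/p&gt;

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of the vector norms
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([1.0, 0.5, 0.0])
docs = {"doc3": np.array([1.0, 1.0, 0.0]), "doc1": np.array([1.0, 0.0, 0.0])}

for doc_id, vec in docs.items():
    print(f"DocId: {doc_id}, Similarity: {cosine_similarity(query, vec):.4f}")
```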
&lt;h3 id=&#34;why-vector-search-is-powerful&#34;&gt;Why Vector Search is Powerful&lt;/h3&gt;
&lt;p&gt;Vector search offers several advantages over traditional search methods. First, it provides a degree of semantic understanding: by focusing on contextual meaning rather than exact keywords, it handles synonyms, misspellings, and contextual nuance. For instance, searching &lt;code&gt;car&lt;/code&gt; could return results about &lt;code&gt;automobiles&lt;/code&gt; or &lt;code&gt;vehicles&lt;/code&gt; (assuming you have a well-trained model).&lt;/p&gt;
&lt;p&gt;Multimodal capability is another widely used strength. Vector search isn’t limited to text: it can process images, audio, video, or even combinations of these. For example, you could search for &lt;code&gt;sunset photos&lt;/code&gt; using a text query and retrieve visually similar images.&lt;/p&gt;
&lt;p&gt;Finally, vector search can incorporate user preferences or behavior, such as past searches, into the vector space and deliver personalized results.&lt;/p&gt;
&lt;h3 id=&#34;real-world-examples&#34;&gt;Real-World Examples&lt;/h3&gt;
&lt;p&gt;Vector search is already everywhere:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Retrieval-Augmented Generation (RAG)&lt;/p&gt;
&lt;p&gt;Regardless of the industry that a RAG application is designed for, vector search is essential in the retrieval (&amp;ldquo;R&amp;rdquo;) phase: it is used to find relevant data for a query to build a context for the LLM.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Ecommerce:&lt;/p&gt;
&lt;p&gt;Vector search is used for recommending products based on user queries or visual similarity (e.g. &amp;ldquo;find shoes like these&amp;rdquo; using an image). Example: Searching for &amp;ldquo;red sneakers&amp;rdquo; might return visually similar items even if the product description doesn’t mention &amp;ldquo;red sneakers&amp;rdquo;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Content Discovery:&lt;/p&gt;
&lt;p&gt;Used for powering recommendation engines for streaming platforms (e.g. Netflix, Spotify) by finding movies, songs, or articles similar to a user’s interests. Example: Suggesting a sci-fi movie based on a user’s love for &amp;ldquo;Star Wars&amp;rdquo;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Customer Support:&lt;/p&gt;
&lt;p&gt;Used for enabling chatbots to retrieve relevant knowledge base articles or FAQs by understanding the intent behind a user’s question. Example: A query like &amp;ldquo;how to reset my device&amp;rdquo; could pull up guides even if phrased differently.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Image and Video Search:&lt;/p&gt;
&lt;p&gt;Allows users to search for visually similar images or videos, such as finding artwork or stock footage that matches a specific style. Example: Uploading a photo of a beach to find similar vacation destinations.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Enterprise Search:&lt;/p&gt;
&lt;p&gt;Helps employees find documents, emails, or internal resources by understanding the context of their queries. Example: Searching &amp;ldquo;project timeline&amp;rdquo; could retrieve relevant spreadsheets or emails.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Healthcare:&lt;/p&gt;
&lt;p&gt;Used for matching patient records, medical images, or research papers based on semantic similarity to aid diagnosis or treatment planning. Example: Finding studies related to a specific disease even if terminology varies.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;challenges-ahead&#34;&gt;Challenges Ahead&lt;/h3&gt;
&lt;p&gt;Computational cost is one of the challenges of performing similarity search on large datasets, but I won&amp;rsquo;t count it as a major one.&lt;/p&gt;
&lt;p&gt;The main challenges in vector search seem to revolve around interpretability and data quality, as is common in ML. Unlike keyword search, where matches are explicit, vector search operates in a black box, making it harder to explain why certain results were returned.&lt;/p&gt;
&lt;p&gt;As for data quality, the adage applies: garbage in, garbage out. If the input data is noisy or incomplete, the embeddings may not accurately represent the content. Extracting data from certain formats (e.g. tables) and creating embeddings for it is another challenge; searching for accurate table data values is an especially difficult task. At End Point we use advanced methods and LLM services to overcome these challenges when preparing quality embeddings.&lt;/p&gt;
&lt;h3 id=&#34;wrapping-up&#34;&gt;Wrapping Up&lt;/h3&gt;
&lt;p&gt;Vector search isn’t just a tech buzzword — it’s a fundamental shift in how we interact with data. By moving beyond rigid keywords to a world of meaning and similarity, it’s unlocking possibilities we’re only beginning to explore. Whether you’re a developer building the next big app or just a curious soul, vector search is worth keeping an eye on. It’s not about finding exactly what you typed — it’s about finding exactly what you meant.&lt;/p&gt;

      </content>
    </entry>
  
    <entry>
      <title>Deploying LLMs Efficiently with Mixture of Experts</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2025/06/deploying-llms-efficiently-with-mixture-of-experts/"/>
      <id>https://www.endpointdev.com/blog/2025/06/deploying-llms-efficiently-with-mixture-of-experts/</id>
      <published>2025-06-02T00:00:00+00:00</published>
      <author>
        <name>Edgar Mlowe</name>
      </author>
      <content type="html">
        &lt;p&gt;&lt;img src=&#34;/blog/2025/06/deploying-llms-efficiently-with-mixture-of-experts/moe-illusraction.webp&#34; alt=&#34;Illustration of a neural network with three highlighted experts (blue, yellow, and green) and three arrows matching the three different-colored nodes pointing to a router, which has one blue dot, two green dots, and one grey dot. To the right are three GPU icons, one of which is highlighted in green.&#34;&gt;&lt;/p&gt;
&lt;!-- Illustration by Edgar Mlowe, 2025. --&gt;
&lt;h3 id=&#34;1-why-moe&#34;&gt;1. Why MoE?&lt;/h3&gt;
&lt;p&gt;Modern language models can have &lt;em&gt;hundreds of billions&lt;/em&gt; of parameters. That power comes with a cost: high latency, high memory, and high energy use. &lt;em&gt;Mixture‑of‑Experts (MoE)&lt;/em&gt; tackles the problem by letting only a few specialised sub‑networks run for each token, cutting compute while keeping quality.&lt;/p&gt;
&lt;p&gt;In this post you’ll get:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A short intro to MoE&lt;/li&gt;
&lt;li&gt;A simple diagram that shows how it works&lt;/li&gt;
&lt;li&gt;A look at Open‑Source MoE Models&lt;/li&gt;
&lt;li&gt;A quick guide to running one on your own machine with Docker + Ollama&lt;/li&gt;
&lt;li&gt;Deployment tips and extra resources&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;2-key-ideas&#34;&gt;2. Key Ideas&lt;/h3&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Term&lt;/th&gt;
          &lt;th&gt;Quick meaning&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Dense model&lt;/td&gt;
          &lt;td&gt;Every weight is used for every token.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Expert&lt;/td&gt;
          &lt;td&gt;A feed‑forward network inside the layer.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Router&lt;/td&gt;
          &lt;td&gt;Tiny layer that scores experts for each token.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;MoE layer&lt;/td&gt;
          &lt;td&gt;Router + experts; only the top‑k experts run.&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;Sparse activation&lt;/td&gt;
          &lt;td&gt;Most weights sleep for most tokens.&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;blockquote&gt;
&lt;p&gt;Analogy: Think of triage in a hospital. The nurse (router) sends you to the right specialist (expert) instead of paging every doctor.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3 id=&#34;3-how-a-token-moves-through-an-moe-layer&#34;&gt;3. How a Token Moves Through an MoE Layer&lt;/h3&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-plain&#34; data-lang=&#34;plain&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;         Input Token
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;              │
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;              ▼
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;         ┌────────┐
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;         │ Router │  (scores all experts)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;         └────────┘
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;              │  selects top‑k
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;              ▼
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; ┌────────┐   ┌────────┐
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; │Expert 1│… │Expert k│   (inactive experts ≈ greyed‑out)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt; └────────┘   └────────┘
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;              │
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;              ▼
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        Combined Output&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;The router scores all experts.&lt;/li&gt;
&lt;li&gt;It picks the best one or two.&lt;/li&gt;
&lt;li&gt;Only those experts process the token.&lt;/li&gt;
&lt;li&gt;Their outputs are combined and passed to the next layer.&lt;/li&gt;
&lt;li&gt;During training, a small penalty is added so the router spreads tokens evenly among the experts.&lt;/li&gt;
&lt;/ol&gt;
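&lt;p&gt;The five steps above can be sketched in a few lines of NumPy. Everything here is illustrative: the dimensions, the random weights, and the softmax over the chosen experts&amp;rsquo; scores are stand-ins, not the routing of any particular model:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 8, 4, 2

# Router: a tiny linear layer that scores every expert for a token
router_w = rng.normal(size=(d_model, n_experts))
# Each expert: a small feed-forward network (a single weight matrix here, for brevity)
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(token):
    scores = token @ router_w             # 1. the router scores all experts
    chosen = np.argsort(scores)[-top_k:]  # 2. pick the top-k experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()              # softmax over only the chosen scores
    # 3-4. only the chosen experts process the token; combine their outputs
    return sum(w * (token @ experts[i]) for w, i in zip(weights, chosen))

token = rng.normal(size=d_model)
out = moe_layer(token)
print(out.shape)
```

&lt;p&gt;The other &lt;code&gt;n_experts - top_k&lt;/code&gt; weight matrices are never touched for this token, which is exactly where the compute savings come from.&lt;/p&gt;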
&lt;h3 id=&#34;4-why-it-saves-compute&#34;&gt;4. Why It Saves Compute&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Fewer active weights:&lt;/strong&gt; DeepSeek‑R1 activates only &lt;strong&gt;6 %&lt;/strong&gt; of its weights per token (so 94 % stay idle), while Grok‑1 activates about &lt;strong&gt;25 %&lt;/strong&gt;. Because fewer weights run, the model performs fewer multiply‑add operations, directly cutting computation time and energy.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scale without extra cost:&lt;/strong&gt; You can add more experts to grow the model’s capacity, and the router still activates only a few per token—so compute cost and latency remain almost unchanged.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Focused fine‑tuning:&lt;/strong&gt; You can fine‑tune a single expert to adapt the model to a new topic.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example: Mixtral‑8×7B runs only 13 B parameters per token yet matches Llama‑2‑70B, while generating ~6× faster on the same GPU.&lt;/p&gt;
&lt;h3 id=&#34;5-quick-start-with-ollama&#34;&gt;5. Quick Start with Ollama&lt;/h3&gt;
&lt;p&gt;Run a Mixture‑of‑Experts model in one line:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;ollama run mixtral:8x7b &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Why is MoE efficient?&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If the model is not yet on your machine, Ollama will download a quantized copy automatically.&lt;/p&gt;
&lt;p&gt;Need Ollama? Install it in either of these ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Native binary (macOS/​Linux/​Windows): &lt;a href=&#34;https://ollama.com/download&#34;&gt;&lt;code&gt;curl -fsSL https://ollama.com/install.sh | sh&lt;/code&gt;&lt;/a&gt;  → then use the &lt;code&gt;ollama run&lt;/code&gt; command above.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Docker container:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;docker &lt;span style=&#34;color:#038&#34;&gt;exec&lt;/span&gt; -it ollama ollama run mixtral:8x7b &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Why is MoE efficient?&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Both methods expose a local REST endpoint on port 11434, so you can integrate the model into scripts or back‑end services.&lt;/p&gt;
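&lt;p&gt;For example, you can call that endpoint from Python with nothing but the standard library. This sketch uses Ollama&amp;rsquo;s &lt;code&gt;/api/generate&lt;/code&gt; route with &lt;code&gt;stream&lt;/code&gt; set to false so the reply comes back as a single JSON object (adjust the model name to whatever you have pulled):&lt;/p&gt;

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama port

def ask_ollama(prompt, model="mixtral:8x7b"):
    # stream=False tells Ollama to return one JSON object instead of chunks
    payload = {"model": model, "prompt": prompt, "stream": False}
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With Ollama running locally: print(ask_ollama("Why is MoE efficient?"))
```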
&lt;h4 id=&#34;opensource-moe-models--at-a-glance&#34;&gt;Open‑Source MoE Models — At a Glance&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Entry‑level (single‑GPU):&lt;/strong&gt; Mixtral‑8×7B, Qwen3‑30B‑A3B — fit in 12–16 GB of VRAM and are ideal for prototyping.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mid‑range (workstation‑class):&lt;/strong&gt; Mixtral‑8×22B, DeepSeek‑R1‑32B — need ~32 GB of VRAM and provide near‑frontier accuracy with long context windows.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Research‑scale (multi‑GPU):&lt;/strong&gt; Grok‑1, DeepSeek‑R1‑671B — require 64 GB+ of VRAM or multi‑GPU clusters but offer state‑of‑the‑art performance.&lt;/li&gt;
&lt;li&gt;Models ship in a space‑saving 4‑bit form, so they use roughly a quarter of the memory of the standard 16‑bit (FP16) version—helpful if your GPU VRAM is tight.&lt;/li&gt;
&lt;li&gt;Start with an entry‑level model, validate your pipeline, and scale up only when the use‑case justifies the added cost.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;6-deployment-tips&#34;&gt;6. Deployment Tips&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Check GPU memory first. All model weights must fit into GPU VRAM during inference. If they don’t:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use the 4‑bit download — it needs roughly a quarter of the FP16 memory with only a small quality trade‑off.&lt;/li&gt;
&lt;li&gt;Off‑load to CPU RAM — frameworks such as DeepSpeed‑MoE or vLLM can park less‑used weights on the CPU; throughput drops, but the model still runs.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Spread the work. While fine‑tuning, watch router stats to confirm every expert is being used; add a load‑balancing loss if a few dominate.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Batch your prompts if experts sit on different GPUs or machines.  When the model has to jump between devices, every prompt makes a short “network trip.” Sending many prompts together means fewer trips, so the overall run is faster.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&#34;useful-links&#34;&gt;Useful links&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1&#34;&gt;Mixtral‑8×7B Instruct model card&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/deepseek-ai/DeepSeek-LLM&#34;&gt;DeepSeek‑R1 technical report&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://huggingface.co/blog/moe&#34;&gt;Hugging Face MoE tutorial&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/vllm-project/vllm&#34;&gt;vLLM inference engine&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://deepspeed.readthedocs.io/en/latest/moe.html&#34;&gt;DeepSpeed‑MoE documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;7-takeaway&#34;&gt;7. Takeaway&lt;/h3&gt;
&lt;p&gt;Mixture‑of‑Experts lets you keep big‑model quality without the big‑model bill. Thanks to open models and tools like Ollama, you can spin up an MoE LLM on a single machine, test your ideas, and scale when you’re ready.&lt;/p&gt;
&lt;p&gt;Got questions or feedback? Drop a comment below.&lt;/p&gt;

      </content>
    </entry>
  
    <entry>
      <title>Converting MIDI to KML using AI: Bach’s Notes in the Hills of Greenland</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2025/05/midi-to-kml-bachs-notes-in-the-hills-of-greenland/"/>
      <id>https://www.endpointdev.com/blog/2025/05/midi-to-kml-bachs-notes-in-the-hills-of-greenland/</id>
      <published>2025-05-02T00:00:00+00:00</published>
      <author>
        <name>Darius Clynes</name>
      </author>
      <content type="html">
        &lt;p&gt;&lt;img src=&#34;/blog/2025/05/midi-to-kml-bachs-notes-in-the-hills-of-greenland/bach-in-kml.webp&#34; alt=&#34;A 3D globe visualization at an oblique angle, with hills, lakes, and mountains, with several pins in the foreground reading &amp;ldquo;acoustic grand piano&amp;rdquo;. There are extruded triangles which are green and red extending away from the viewpoint, regularly spaced in multiple straight lines, and varying in size.&#34;&gt;&lt;/p&gt;
&lt;p&gt;I have always been interested in ways of representing music visually. Aside from conventional music notation, I imagined other cross-modal generation methods that could take a sound and generate an image. In the same vein, I have frequently envisioned a 3D landscape in which you could discover musical “objects”.&lt;/p&gt;
&lt;p&gt;Well, now I&amp;rsquo;ve realized a version of this dream — with caveats which will be mentioned later. In this blog I would like to demonstrate how I used AI (in my case ChatGPT using GPT-4 Turbo) to create an interesting JavaScript application from just a few phrases. In this case, we will be making an application that can take as input an existing piece of music represented by a MIDI file and as output, create a KML file that you can view as 3D objects somewhere on the globe.&lt;/p&gt;
&lt;p&gt;Here is how I enlisted ChatGPT to help me:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-plain&#34; data-lang=&#34;plain&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;please make a javascript application that can take a MIDI file and covert it to extruded polygons in a kml file&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Here is a part of its response:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/blog/2025/05/midi-to-kml-bachs-notes-in-the-hills-of-greenland/make-a-javascript-application.webp&#34; alt=&#34;The response to the above ChatGPT query. Chat breaks the process down into 3 steps, &amp;ldquo;Parse the MIDI File&amp;rdquo;, &amp;ldquo;Generate Data for Polygons&amp;rdquo;, and &amp;ldquo;Create a KML File&amp;rdquo;.&#34;&gt;&lt;/p&gt;
&lt;p&gt;I was amazed. It included code to select the MIDI file, convert it to KML, and generate an output file. Plus, ChatGPT correctly interpreted my request despite my “covert” typo. :-)&lt;/p&gt;
&lt;p&gt;Before testing it I was interested in having the color of the extruded polygon be dependent on the pitch of the note. So next I entered:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-plain&#34; data-lang=&#34;plain&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;please change the color of the extruded polygon dependent on the pitch&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Here is what it said:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/blog/2025/05/midi-to-kml-bachs-notes-in-the-hills-of-greenland/change-the-color.webp&#34; alt=&#34;Chat&amp;rsquo;s response to the above query: &amp;ldquo;To change the color of the extruded polygons based on the pitch, we can modify the KML content to include a Style element for each Placemark. We&amp;rsquo;ll map MIDI note pitches to different colors.&#34;&gt;&lt;/p&gt;
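&lt;p&gt;ChatGPT&amp;rsquo;s exact mapping isn&amp;rsquo;t reproduced here, but the idea is simple to sketch. KML expresses colors as eight hex digits in &lt;code&gt;aabbggrr&lt;/code&gt; order (alpha, blue, green, red), so a pitch-dependent style could be generated like this (the green-to-red gradient is my own illustrative choice, not ChatGPT&amp;rsquo;s):&lt;/p&gt;

```python
def pitch_to_kml_color(pitch, alpha=0xFF):
    """Map a MIDI pitch (0-127) onto a green-to-red gradient.

    KML colors are hex strings in aabbggrr order (alpha, blue, green, red).
    """
    t = max(0, min(127, pitch)) / 127
    red = int(255 * t)          # higher pitches shift toward red
    green = int(255 * (1 - t))  # lower pitches shift toward green
    return f"{alpha:02x}00{green:02x}{red:02x}"

print(pitch_to_kml_color(60))  # middle C
```

&lt;p&gt;The returned string can be dropped into the &amp;lt;color&amp;gt; element of each Placemark&amp;rsquo;s style.&lt;/p&gt;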
&lt;p&gt;It was implemented using npm and Node.js, which seemed excessive for this small application. To make an easier-to-run version in a single HTML file, I entered:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-plain&#34; data-lang=&#34;plain&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;please rewrite it so it is in an html page without node.js&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;img src=&#34;/blog/2025/05/midi-to-kml-bachs-notes-in-the-hills-of-greenland/rewrite-it.webp&#34; alt=&#34;Chat&amp;rsquo;s response to the above query: &amp;ldquo;To create a web-based application that converts a MIDI file to extruded polygons in a KML file, we can use JavaScript and the Web MIDI API along with a client-side library for parsing MIDI files, such as @tonejs/midi.&#34;&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/blog/2025/05/midi-to-kml-bachs-notes-in-the-hills-of-greenland/plain-html-app.webp&#34; alt=&#34;An unstyled HTML app reading &amp;ldquo;MIDI to KML Converter&amp;rdquo;. There is a &amp;ldquo;choose file&amp;rdquo; button, with adjacent text reading &amp;ldquo;bach_846.mid&amp;rdquo;. To the right is a &amp;ldquo;Convert to KML&amp;rdquo; button. Below is a &amp;ldquo;Download KML&amp;rdquo; link. Below that, a &amp;ldquo;play&amp;rdquo; button, then three text inputs reading &amp;ldquo;Longitude&amp;rdquo;, &amp;ldquo;Latitude&amp;rdquo;, and &amp;ldquo;Direction&amp;rdquo;, respectively.&#34;&gt;&lt;/p&gt;
&lt;p&gt;I thought it would be nice to be able to easily place the polygons wherever you wanted in the world.&lt;/p&gt;
&lt;p&gt;I asked it to take a Google Street View position from Google Maps and decode the latitude, longitude, and heading.&lt;/p&gt;
&lt;p&gt;Next, I asked it to make a nicer user interface. It “understood” the purpose of the application very well and came up with a very nice interface explaining what it does.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/blog/2025/05/midi-to-kml-bachs-notes-in-the-hills-of-greenland/styled-app.webp&#34; alt=&#34;The same app as above, but now styled using simple web design, including making the &amp;ldquo;Convert to KML&amp;rdquo; button Green, and including an explanation paragraph: &amp;ldquo;Upload your MIDI file below, and the application will convert it into a KML file with 3D polygons. Each note in the MIDI file will be represented by an extruded polygon, and different musical instruments will be visually distinct with images on each polygon. Download the KML file and view it in Google Earth.&#34;&gt;&lt;/p&gt;
&lt;h3 id=&#34;how-to-try-out-the-application&#34;&gt;How to try out the application&lt;/h3&gt;
&lt;p&gt;You can try out a version of the application at &lt;a href=&#34;https://darius.endpointdev.com/midi2kml/midi2kml_improved.html&#34;&gt;https://darius.endpointdev.com/midi2kml/midi2kml_improved.html&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/blog/2025/05/midi-to-kml-bachs-notes-in-the-hills-of-greenland/test-coordinates.webp&#34; alt=&#34;The same app as above, now with a &amp;ldquo;Play MIDI&amp;rdquo; button visible, the &amp;ldquo;choose file&amp;rdquo; box filled, and values entered for coordinates.&#34;&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Find a nice place to have the MIDI data displayed. Copy the URL and paste it into the Street View URL field.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For example, here is &lt;a href=&#34;https://www.google.com/maps/@60.8287681,-45.7810281,2a,66.4y,280.22h,98.22t/data=!3m7!1e1!3m5!1sPgzLk0iAbXx_eGh1Z7pS0g!2e0!6shttps:%2F%2Fstreetviewpixels-pa.googleapis.com%2Fv1%2Fthumbnail%3Fcb_client%3Dmaps_sv.tactile%26w%3D900%26h%3D600%26pitch%3D-8.219999999999999%26panoid%3DPgzLk0iAbXx_eGh1Z7pS0g%26yaw%3D280.22!7i13312!8i6656?entry=ttu&amp;amp;g_ep=EgoyMDI0MTIxMS4wIKXMDSoASAFQAw%3D%3D&#34;&gt;a place&lt;/a&gt; in Greenland:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/blog/2025/05/midi-to-kml-bachs-notes-in-the-hills-of-greenland/greenland-sphere.webp&#34; alt=&#34;A Google Street View spherical image of hills in Greenland, with a Google Maps label reading &amp;ldquo;Hvalsey Church&amp;rdquo;, with a brick building and the ocean visible&#34;&gt;&lt;/p&gt;
&lt;p&gt;And here&amp;rsquo;s one &lt;a href=&#34;https://www.google.com/maps/@53.1000141,4.7522293,3a,75y,270.07h,90t/data=!3m8!1e1!3m6!1sAF1QipO8lOwNwAosMdcm3YTQT2CQleKuRXNRc59MsmA-!2e10!3e11!6shttps:%2F%2Flh3.googleusercontent.com%2Fp%2FAF1QipO8lOwNwAosMdcm3YTQT2CQleKuRXNRc59MsmA-%3Dw900-h600-k-no-pi0-ya271.06580195437505-ro0-fo100!7i8704!8i4352?entry=ttu&amp;amp;g_ep=EgoyMDI1MDMxMi4wIKXMDSoASAFQAw%3D%3D&#34;&gt;on the beach&lt;/a&gt; in Texel in the Netherlands:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/blog/2025/05/midi-to-kml-bachs-notes-in-the-hills-of-greenland/beach-netherlands.webp&#34; alt=&#34;A Google Street View spherical image of a beach, with a sunset, with a Google Maps label reading &amp;ldquo;Beachclub Texel&amp;rdquo;&#34;&gt;&lt;/p&gt;
&lt;ol start=&#34;2&#34;&gt;
&lt;li&gt;
&lt;p&gt;Find a nice MIDI file like &lt;a href=&#34;https://darius.endpointdev.com/midi2kml/bach_846.mid&#34;&gt;Bach’s prelude and fugue in C major&lt;/a&gt;. Download it and put it in the &amp;ldquo;choose MIDI file&amp;rdquo; field.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Press &amp;ldquo;Convert to KML&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Click on the &amp;ldquo;download KML&amp;rdquo; link&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Open the KML file in a KML viewer, or manually load it into a program like Google Earth or Cesium.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&#34;/blog/2025/05/midi-to-kml-bachs-notes-in-the-hills-of-greenland/midi-visualization-triangles.webp&#34; alt=&#34;A closer view of the triangles extruded from Google Earth imagery, both labeled &amp;ldquo;acoustic grand piano&amp;rdquo;, and with some triangles smaller on the right, while the ones on the left have some duplicate triangles superimposed on others.&#34;&gt;&lt;/p&gt;
&lt;h3 id=&#34;issues&#34;&gt;Issues&lt;/h3&gt;
&lt;p&gt;This application works at least partially — it places triangular prisms for each MIDI track over time, and shows different colors for different notes (though the color–note correlation is not clear). There are plenty of issues and questions, however:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;There are lots of overlapping polygons. It&amp;rsquo;s unclear whether these are just errors, or represent different dynamics or articulations&lt;/li&gt;
&lt;li&gt;The &amp;ldquo;Play MIDI&amp;rdquo; button doesn&amp;rsquo;t work — it plays tones of some kind, but in my testing it was either a single bell sound, or computer noise reminiscent of an AOL dial-up modem.&lt;/li&gt;
&lt;li&gt;When I tested it on an orchestral score (Mozart&amp;rsquo;s Requiem, Kyrie), the instrument label pins were off to the side, not indicating which line of notes they corresponded to, and were mostly labeled &amp;ldquo;acoustic grand piano,&amp;rdquo; which is not an instrument included in the MIDI file.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;using-ai-for-blog-post-image-processing&#34;&gt;Using AI for blog post image processing&lt;/h3&gt;
&lt;p&gt;As an aside, I created this blog using Google Docs with PNG pictures embedded in it. Our blog structure requires a Markdown document with separate WebP images. Google Docs has a nice Markdown export function, but it converts the PNG images into embedded Base64-encoded PNG. So I asked ChatGPT to extract the Base64-encoded PNG and create WebP files. It created a Python script that did this perfectly:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;os&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;base64&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;re&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;PIL&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; Image
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;from&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;io&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;import&lt;/span&gt; BytesIO
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# Path to your Markdown file&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;md_file_path = &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;example.md&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# Directory to store extracted and converted WEBP images&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;output_dir = &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;converted_images&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;os.makedirs(output_dir, exist_ok=&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;True&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# Load the Markdown content&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;with&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;open&lt;/span&gt;(md_file_path, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;r&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;as&lt;/span&gt; md_file:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    content = md_file.read()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# Regex to find base64-encoded PNGs in the Markdown&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;base64_pattern = re.compile(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;r&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;\[.*?\]:\s*&amp;lt;data:image/png;base64,([A-Za-z0-9+/=]+)&amp;gt;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# Initialize a counter for naming the files&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;counter = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;match&lt;/span&gt; &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; base64_pattern.finditer(content):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#038&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Match found:&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    base64_data = &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;match&lt;/span&gt;.group(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#038&#34;&gt;print&lt;/span&gt;(base64_data)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#888&#34;&gt;# Decode the base64 string to a PNG image&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    png_data = base64.b64decode(base64_data)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    png_image = Image.open(BytesIO(png_data))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#888&#34;&gt;# Save the PNG as a WEBP file&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    webp_name = &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;image_&lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;counter&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;.webp&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    webp_path = os.path.join(output_dir, webp_name)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    png_image.save(webp_path, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;WEBP&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#888&#34;&gt;# Replace the base64 PNG in the Markdown with a reference to the WEBP file&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    content = content.replace(&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;match&lt;/span&gt;.group(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;), &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;![Image &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;counter&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;](&lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;webp_path&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;)&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    counter += &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# Save the updated Markdown file&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;updated_md_path = os.path.splitext(md_file_path)[&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;] + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;_updated.md&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;with&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;open&lt;/span&gt;(updated_md_path, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;w&amp;#34;&lt;/span&gt;) &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;as&lt;/span&gt; updated_md_file:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    updated_md_file.write(content)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#038&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;Updated Markdown file saved: &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;updated_md_path&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#038&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;f&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;WEBP images saved in: &lt;/span&gt;&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;{&lt;/span&gt;output_dir&lt;span style=&#34;color:#33b;background-color:#fff0f0&#34;&gt;}&lt;/span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Unfortunately, when Google Docs embeds the images, it seems to downsize the resolution, so for this post, old-school image processing it is!&lt;/p&gt;
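As a quick aside, the base64 payloads in those data URIs are easy to sanity-check before handing them to Pillow: every PNG file begins with the same fixed 8-byte signature, which is also why the payloads all start with the familiar iVBORw0KGgo prefix. A minimal stdlib-only sketch (our own, not part of the script above):

```python
import base64

# Every PNG file begins with the same fixed 8-byte signature, so a decoded
# data-URI payload can be sanity-checked before any image processing.
PNG_SIGNATURE = bytes([137, 80, 78, 71, 13, 10, 26, 10])

payload = base64.b64encode(PNG_SIGNATURE).decode("ascii")
print(payload)  # iVBORw0KGgo= is the prefix seen in PNG data URIs
print(base64.b64decode(payload) == PNG_SIGNATURE)  # True
```

If the decoded bytes do not start with that signature, the reference was probably not a PNG in the first place and is better skipped than converted.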

      </content>
    </entry>
  
    <entry>
      <title>Introducing the End Point AI Portal for Our Customers</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2025/03/ai-portal-for-end-point-customers/"/>
      <id>https://www.endpointdev.com/blog/2025/03/ai-portal-for-end-point-customers/</id>
      <published>2025-03-12T00:00:00+00:00</published>
      <author>
        <name>Benjamin Goldstein</name>
      </author>
      <content type="html">
        &lt;p&gt;&lt;img src=&#34;/blog/2025/03/ai-portal-for-end-point-customers/glass-dome-top.webp&#34; alt=&#34;A low angle view of a gridded ceiling, leading to a large stained glass cylindrical top, which is lined with multicolored stained glass in a highly varied arrangement.&#34;&gt;&lt;/p&gt;
&lt;!-- Photo by Seth Jensen, 2025. --&gt;
&lt;p&gt;Today, End Point is opening up the End Point AI Portal for our customers to use. As a technology consulting company that gets in the trenches with our customers to integrate, manage, and support the information technology they need to operate and transform their businesses, we believe this is a great service for us to provide.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;We are offering our AI Portal to our customers at no charge.&lt;/strong&gt; The price is certainly right, so we think our customers are very much going to want to use it. 🙂&lt;/p&gt;
&lt;h3 id=&#34;whats-in-the-portal&#34;&gt;What’s in the Portal&lt;/h3&gt;
&lt;p&gt;End Point’s AI Portal provides an interface to interact with many different LLM AI chatbot services all in one place. It even gives users the ability to simultaneously submit queries and instructions to multiple LLMs and see their responses side by side.&lt;/p&gt;
&lt;p&gt;Our Portal is a customized implementation of the &lt;a href=&#34;https://openwebui.com/&#34;&gt;Open WebUI&lt;/a&gt; AI interface and connects via APIs to various AI services. Because of this architecture, our analysis has shown that unless there is ongoing high-intensity usage, the cost per user will be very low — much lower than a typical end-user subscription to a single LLM provider. In the future we may implement an accounting system for the portal to pass usage costs onto our users, but our first priority is to get this capability into our clients’ hands as soon as possible to help them be early adopters of this transformative tech.&lt;/p&gt;
&lt;p&gt;Our AI Portal connects to a range of proprietary AI-powered chatbots served up by Anthropic, OpenAI, Google, and Grok, as well as to several open-source AI-powered chatbots hosted on servers in the United States. This includes the Deepseek R1 model developed in China that rocked the tech world with its release on January 20. It’s a remarkable model, but for businesses in the US having their chat histories stored on servers in China is generally not acceptable. Having the service hosted in the US helps address privacy concerns. We run the open-source models on our own servers and on third-party hosting providers such as &lt;a href=&#34;https://www.together.ai&#34;&gt;together.ai&lt;/a&gt; and &lt;a href=&#34;https://groq.com&#34;&gt;groq.com&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To provide a more coherent user experience, the Open WebUI interface maintains each user’s chat history and provides the ability to manage, share, and download them.&lt;/p&gt;
&lt;p&gt;The portal has already helped End Point with internal projects such as experimenting with how best to process various data sets to make them useful for AI-powered applications. The portal hosts several RAG (Retrieval Augmented Generation) applications that make use of various LLMs to search and answer questions about End Point’s blog posts—we are working to identify the best solution, so results may vary. Please don’t give up on reading our blog just yet.&lt;/p&gt;
&lt;p&gt;Below are screenshots of our AI Portal’s interface showing some of its features (click on the images to view at full size).&lt;/p&gt;

&lt;p&gt;
  &lt;div class=&#34;carousel-container border&#34;&gt;
    &lt;div class=&#34;carousel&#34; id=&#34;image-carousel&#34;&gt;
      &lt;div class=&#34;carousel-nav&#34; data-for=&#34;image-carousel&#34;&gt;&lt;button class=&#34;left-arrow&#34;&gt;
            &lt;img width=&#34;40&#34; height=&#34;48&#34; src=&#34;/images/arrow.svg&#34; alt=&#34;Left arrow&#34; /&gt;
          &lt;/button&gt;&lt;button aria-label=&#34;Carousel select 1&#34; class=&#34;indicator&#34;&gt;&lt;/button&gt;&lt;button aria-label=&#34;Carousel select 2&#34; class=&#34;indicator&#34;&gt;&lt;/button&gt;&lt;button aria-label=&#34;Carousel select 3&#34; class=&#34;indicator&#34;&gt;&lt;/button&gt;&lt;button aria-label=&#34;Carousel select 4&#34; class=&#34;indicator&#34;&gt;&lt;/button&gt;&lt;button aria-label=&#34;Carousel select 5&#34; class=&#34;indicator&#34;&gt;&lt;/button&gt;&lt;button aria-label=&#34;Carousel select 6&#34; class=&#34;indicator&#34;&gt;&lt;/button&gt;&lt;button aria-label=&#34;Carousel select 7&#34; class=&#34;indicator&#34;&gt;&lt;/button&gt;&lt;button aria-label=&#34;Carousel select 8&#34; class=&#34;indicator&#34;&gt;&lt;/button&gt;&lt;button aria-label=&#34;Carousel select 9&#34; class=&#34;indicator&#34;&gt;&lt;/button&gt;&lt;button aria-label=&#34;Carousel select 10&#34; class=&#34;indicator&#34;&gt;&lt;/button&gt;&lt;button aria-label=&#34;Carousel select 11&#34; class=&#34;indicator&#34;&gt;&lt;/button&gt;&lt;button aria-label=&#34;Carousel select 12&#34; class=&#34;indicator&#34;&gt;&lt;/button&gt;&lt;button class=&#34;right-arrow&#34;&gt;
            &lt;img width=&#34;40&#34; height=&#34;48&#34; src=&#34;/images/arrow.svg&#34; alt=&#34;Right arrow&#34; /&gt;
          &lt;/button&gt;&lt;/div&gt;&lt;div class=&#34;carousel-item&#34;&gt;
          &lt;a target=&#34;_blank&#34; href=&#34;/blog/2025/03/ai-portal-for-end-point-customers/1.webp&#34;&gt;
            &lt;img src=&#34;/blog/2025/03/ai-portal-for-end-point-customers/1.webp&#34;&gt;
          &lt;/a&gt;
        &lt;/div&gt;&lt;div class=&#34;carousel-item&#34;&gt;
          &lt;a target=&#34;_blank&#34; href=&#34;/blog/2025/03/ai-portal-for-end-point-customers/2.webp&#34;&gt;
            &lt;img src=&#34;/blog/2025/03/ai-portal-for-end-point-customers/2.webp&#34;&gt;
          &lt;/a&gt;
        &lt;/div&gt;&lt;div class=&#34;carousel-item&#34;&gt;
          &lt;a target=&#34;_blank&#34; href=&#34;/blog/2025/03/ai-portal-for-end-point-customers/3.webp&#34;&gt;
            &lt;img src=&#34;/blog/2025/03/ai-portal-for-end-point-customers/3.webp&#34;&gt;
          &lt;/a&gt;
        &lt;/div&gt;&lt;div class=&#34;carousel-item&#34;&gt;
          &lt;a target=&#34;_blank&#34; href=&#34;/blog/2025/03/ai-portal-for-end-point-customers/4.webp&#34;&gt;
            &lt;img src=&#34;/blog/2025/03/ai-portal-for-end-point-customers/4.webp&#34;&gt;
          &lt;/a&gt;
        &lt;/div&gt;&lt;div class=&#34;carousel-item&#34;&gt;
          &lt;a target=&#34;_blank&#34; href=&#34;/blog/2025/03/ai-portal-for-end-point-customers/5.webp&#34;&gt;
            &lt;img src=&#34;/blog/2025/03/ai-portal-for-end-point-customers/5.webp&#34;&gt;
          &lt;/a&gt;
        &lt;/div&gt;&lt;div class=&#34;carousel-item&#34;&gt;
          &lt;a target=&#34;_blank&#34; href=&#34;/blog/2025/03/ai-portal-for-end-point-customers/6.webp&#34;&gt;
            &lt;img src=&#34;/blog/2025/03/ai-portal-for-end-point-customers/6.webp&#34;&gt;
          &lt;/a&gt;
        &lt;/div&gt;&lt;div class=&#34;carousel-item&#34;&gt;
          &lt;a target=&#34;_blank&#34; href=&#34;/blog/2025/03/ai-portal-for-end-point-customers/7.webp&#34;&gt;
            &lt;img src=&#34;/blog/2025/03/ai-portal-for-end-point-customers/7.webp&#34;&gt;
          &lt;/a&gt;
        &lt;/div&gt;&lt;div class=&#34;carousel-item&#34;&gt;
          &lt;a target=&#34;_blank&#34; href=&#34;/blog/2025/03/ai-portal-for-end-point-customers/8.webp&#34;&gt;
            &lt;img src=&#34;/blog/2025/03/ai-portal-for-end-point-customers/8.webp&#34;&gt;
          &lt;/a&gt;
        &lt;/div&gt;&lt;div class=&#34;carousel-item&#34;&gt;
          &lt;a target=&#34;_blank&#34; href=&#34;/blog/2025/03/ai-portal-for-end-point-customers/9.webp&#34;&gt;
            &lt;img src=&#34;/blog/2025/03/ai-portal-for-end-point-customers/9.webp&#34;&gt;
          &lt;/a&gt;
        &lt;/div&gt;&lt;div class=&#34;carousel-item&#34;&gt;
          &lt;a target=&#34;_blank&#34; href=&#34;/blog/2025/03/ai-portal-for-end-point-customers/10.webp&#34;&gt;
            &lt;img src=&#34;/blog/2025/03/ai-portal-for-end-point-customers/10.webp&#34;&gt;
          &lt;/a&gt;
        &lt;/div&gt;&lt;div class=&#34;carousel-item&#34;&gt;
          &lt;a target=&#34;_blank&#34; href=&#34;/blog/2025/03/ai-portal-for-end-point-customers/11.webp&#34;&gt;
            &lt;img src=&#34;/blog/2025/03/ai-portal-for-end-point-customers/11.webp&#34;&gt;
          &lt;/a&gt;
        &lt;/div&gt;&lt;div class=&#34;carousel-item&#34;&gt;
          &lt;a target=&#34;_blank&#34; href=&#34;/blog/2025/03/ai-portal-for-end-point-customers/12.webp&#34;&gt;
            &lt;img src=&#34;/blog/2025/03/ai-portal-for-end-point-customers/12.webp&#34;&gt;
          &lt;/a&gt;
        &lt;/div&gt;&lt;/div&gt;
  &lt;/div&gt;
&lt;/p&gt;

&lt;p&gt;We require users to accept our AI portal&amp;rsquo;s standard Terms of Service to use the portal. Additionally, each AI-powered chatbot accessible through our portal has its own Terms of Service, to which we provide links for your reference. Our primary concern is whether the information users input into these chatbots might be used by providers for training or other purposes. We strive to avoid chatbots that engage in such practices, and as of the launch of our portal, we&amp;rsquo;re not aware of any chatbot providers that claim rights to user input data. However, it’s common for Terms of Service documents to include clauses allowing providers to modify their terms without prior notice, so we can’t guarantee ongoing compliance with our standards. Regardless, users should always exercise caution and good judgment when entering confidential information into any online service.&lt;/p&gt;
&lt;h3 id=&#34;why-we-are-doing-this&#34;&gt;Why We Are Doing This&lt;/h3&gt;
&lt;p&gt;At End Point, we’ve built our reputation for almost 30 years by providing expert consulting, development, and systems support. As AI technology rapidly evolves, we see it as both an opportunity and a necessity to stay ahead—helping our customers integrate AI into their own systems while deepening our expertise in the field.&lt;/p&gt;
&lt;p&gt;By opening up our Open WebUI Portal to customers at no charge, we’re reinforcing our commitment to innovation, transparency, value, and great customer service. We intend this not only to benefit our clients, but to highlight our ability to integrate AI-powered solutions into your business. We stand ready to help you enhance your existing workflows, products, and services—as well as to create entirely new, AI-powered applications.&lt;/p&gt;
&lt;p&gt;AI use cases include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;User-facing enhancements&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Intelligent similarity search&lt;/li&gt;
&lt;li&gt;Recommendation systems&lt;/li&gt;
&lt;li&gt;AI-powered insights from your data&lt;/li&gt;
&lt;li&gt;Dynamic FAQ generation based on past inquiries&lt;/li&gt;
&lt;li&gt;Customer support chatbots&lt;/li&gt;
&lt;li&gt;Faster and smarter documentation creation&lt;/li&gt;
&lt;li&gt;You name it. The possibilities are endless!&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;AI-driven development &amp;amp; optimization&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;System testing and automated quality assurance&lt;/li&gt;
&lt;li&gt;AI-assisted code reviews and refactoring&lt;/li&gt;
&lt;li&gt;Automated code generation for improved efficiency&lt;/li&gt;
&lt;li&gt;…and more&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We’re also using AI to refine our own processes, improve our code, and enhance the value we deliver to our customers. This initiative is part of our commitment to ensure that our clients continue to benefit from the best technology solutions available.&lt;/p&gt;
&lt;h3 id=&#34;how-to-get-an-account&#34;&gt;How to Get an Account&lt;/h3&gt;
&lt;p&gt;Reach out to your End Point client rep or use &lt;a href=&#34;/contact/&#34;&gt;our contact form&lt;/a&gt; to get in touch.&lt;/p&gt;

      </content>
    </entry>
  
    <entry>
      <title>Implementing SummAE neural text summarization with a denoising auto-encoder</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2020/05/summae-neural-text-summarization-denoising-autoencoder/"/>
      <id>https://www.endpointdev.com/blog/2020/05/summae-neural-text-summarization-denoising-autoencoder/</id>
      <published>2020-05-28T00:00:00+00:00</published>
      <author>
        <name>Kamil Ciemniewski</name>
      </author>
      <content type="html">
        &lt;p&gt;&lt;img src=&#34;/blog/2020/05/summae-neural-text-summarization-denoising-autoencoder/book.jpg&#34; alt=&#34;Book open on lawn with dandelions&#34;&gt;&lt;/p&gt;
&lt;p&gt;If there’s any problem space in machine learning with no shortage of (unlabelled) data to train on, it’s natural language processing (NLP).&lt;/p&gt;
&lt;p&gt;In this article, I’d like to take on the challenge of taking a paper that came from Google Research in late 2019 and implementing it. It’s going to be a fun trip into the world of neural text summarization. We’re going to go through the basics, the coding, and then we’ll look at what the results actually are in the end.&lt;/p&gt;
&lt;p&gt;The paper we’re going to implement here is: &lt;a href=&#34;https://arxiv.org/abs/1910.00998&#34;&gt;Peter J. Liu, Yu-An Chung, Jie Ren (2019) SummAE: Zero-Shot Abstractive Text Summarization using Length-Agnostic Auto-Encoders&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here’s the paper’s abstract:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We propose an end-to-end neural model for zero-shot abstractive text summarization of paragraphs, and introduce a benchmark task, ROCSumm, based on ROCStories, a subset for which we collected human summaries. In this task, five-sentence stories (paragraphs) are summarized with one sentence, using human summaries only for evaluation. We show results for extractive and human baselines to demonstrate a large abstractive gap in performance. Our model, SummAE, consists of a denoising auto-encoder that embeds sentences and paragraphs in a common space, from which either can be decoded. Summaries for paragraphs are generated by decoding a sentence from the paragraph representations. We find that traditional sequence-to-sequence auto-encoders fail to produce good summaries and describe how specific architectural choices and pre-training techniques can significantly improve performance, outperforming extractive baselines. The data, training, evaluation code, and best model weights are open-sourced.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3 id=&#34;preliminaries&#34;&gt;Preliminaries&lt;/h3&gt;
&lt;p&gt;Before we go any further, let’s talk a little bit about neural summarization in general. There are two main approaches to it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Automatic_summarization#Extraction-based_summarization&#34;&gt;Extractive&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Automatic_summarization#Abstraction-based_summarization&#34;&gt;Abstractive&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The first approach makes the model “focus” on the most important parts of the longer text, extracting them to form a summary.&lt;/p&gt;
&lt;p&gt;Let’s take a recent article, &lt;a href=&#34;/blog/2020/05/shopify-product-creation/&#34;&gt;“Shopify Admin API: Importing Products in Bulk”&lt;/a&gt;, by one of my great co-workers, &lt;a href=&#34;/team/patrick-lewis/&#34;&gt;Patrick Lewis&lt;/a&gt;, as an example and see what the extractive summarization would look like. Let’s take the first two paragraphs:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I recently worked on an interesting project for a store owner who was facing a daunting task: he had an inventory of hundreds of thousands of Magic: The Gathering (MTG) cards that he wanted to sell online through his Shopify store. The logistics of tracking down artwork and current market pricing for each card made it impossible to do manually.&lt;/p&gt;
&lt;p&gt;My solution was to create a custom Rails application that retrieves inventory data from a combination of APIs and then automatically creates products for each card in Shopify. The resulting project turned what would have been a months- or years-long task into a bulk upload that only took a few hours to complete and allowed the store owner to immediately start selling his inventory online. The online store launch turned out to be even more important than initially expected due to current closures of physical stores.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;An extractive model could summarize it as follows:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I recently worked on an interesting project for a store owner who had an inventory of hundreds of thousands of cards that he wanted to sell through his store. The logistics and current pricing for each card made it impossible to do manually. My solution was to create a custom Rails application that retrieves inventory data from a combination of APIs and then automatically creates products for each card. The store launch turned out to be even more important than expected due to current closures of physical stores.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;See how it does the copying and pasting? The big advantage of these types of models is that they are generally easier to create and the resulting summaries tend to faithfully reflect the facts included in the source.&lt;/p&gt;
&lt;p&gt;The downside though is that it’s not how a human would do it. We do a lot of paraphrasing, for instance. We use different words and tend to form sentences less rigidly following the original ones. The need for the summaries to feel more natural made the second type — abstractive — into this subfield’s holy grail.&lt;/p&gt;
&lt;h3 id=&#34;datasets&#34;&gt;Datasets&lt;/h3&gt;
&lt;p&gt;The paper’s authors used the so-called &lt;a href=&#34;https://cs.rochester.edu/nlp/rocstories/&#34;&gt;“ROCStories” dataset&lt;/a&gt; (&lt;a href=&#34;https://www.aclweb.org/anthology/P18-2119/&#34;&gt;“Tackling The Story Ending Biases in The Story Cloze Test”. Rishi Sharma, James Allen, Omid Bakhshandeh, Nasrin Mostafazadeh. In Proceedings of the 2018 Conference of the Association for Computational Linguistics (ACL), 2018&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;In my experiments, I’ve also tried the model against one that’s quite a bit more difficult: &lt;a href=&#34;https://github.com/mahnazkoupaee/WikiHow-Dataset&#34;&gt;WikiHow&lt;/a&gt; (&lt;a href=&#34;https://arxiv.org/abs/1810.09305&#34;&gt;Mahnaz Koupaee, William Yang Wang (2018) WikiHow: A Large Scale Text Summarization Dataset&lt;/a&gt;).&lt;/p&gt;
&lt;h4 id=&#34;rocstories&#34;&gt;ROCStories&lt;/h4&gt;
&lt;p&gt;The dataset consists of 98,162 stories, each five sentences long. It’s incredibly clean. The only step I needed to take was to split the stories between the train, eval, and test sets.&lt;/p&gt;
&lt;p&gt;Some example stories:&lt;/p&gt;
&lt;p&gt;Example 1:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;My retired coworker turned 69 in July. I went net surfing to get her a gift. She loves Diana Ross. I got two newly released cds and mailed them to her. She sent me an email thanking me.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Example 2:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Tom alerted the government he expected a guest. When she didn’t come he got in a lot of trouble. They talked about revoking his doctor&amp;rsquo;s license. And charging him a huge fee! Tom&amp;rsquo;s life was destroyed because of his act of kindness.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Example 3:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I went to see the doctor when I knew it was bad. I hadn&amp;rsquo;t eaten in nearly a week. I told him I felt afraid of food in my body. He told me I was developing an eating disorder. He instructed me to get some help.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h4 id=&#34;wikihow&#34;&gt;WikiHow&lt;/h4&gt;
&lt;p&gt;This is one of the most challenging openly available datasets for neural summarization. It consists of more than 200,000 long-sequence pairs of text + headline scraped from &lt;a href=&#34;https://www.wikihow.com/Main-Page&#34;&gt;WikiHow’s website&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Some examples:&lt;/p&gt;
&lt;p&gt;Text:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;One easy way to conserve water is to cut down on your shower time. Practice cutting your showers down to 10 minutes, then 7, then 5. Challenge yourself to take a shorter shower every day. Washing machines take up a lot of water and electricity, so running a cycle for a couple of articles of clothing is inefficient. Hold off on laundry until you can fill the machine. Avoid letting the water run while you&amp;rsquo;re brushing your teeth or shaving. Keep your hoses and faucets turned off as much as possible. When you need them, use them sparingly.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Headline:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Take quicker showers to conserve water. Wait for a full load of clothing before running a washing machine. Turn off the water when you&amp;rsquo;re not using it.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;The main challenge for the summarization model here is that the headline &lt;strong&gt;was actually created by humans&lt;/strong&gt; and is not just “extracted” from the text. Any model performing well on this dataset needs to model the language itself reasonably well. The headlines can still be used for computing evaluation metrics, but traditional n-gram overlap metrics like &lt;a href=&#34;https://en.wikipedia.org/wiki/ROUGE_(metric)&#34;&gt;ROUGE&lt;/a&gt; are bound to undervalue good abstractive summaries here.&lt;/p&gt;
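To see why n-gram overlap misses the point, here is a toy ROUGE-1 F-score (our own minimal sketch, not the official scorer): it rewards only shared unigrams, so a faithful human-style paraphrase can score zero.

```python
def rouge1_f(candidate, reference):
    # ROUGE-1 F-score: harmonic mean of unigram precision and recall.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    overlap = sum(min(cand.count(w), ref.count(w)) for w in set(cand))
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

reference = "take quicker showers to conserve water"
print(round(rouge1_f("take shorter showers to save water", reference), 2))  # 0.67
print(rouge1_f("reduce shower time", reference))  # 0.0
```

The second candidate is a perfectly reasonable headline for the passage above, yet it shares no words with the reference and scores 0.0.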
&lt;h3 id=&#34;basics-of-the-sequence-to-sequence-modeling&#34;&gt;Basics of the sequence-to-sequence modeling&lt;/h3&gt;
&lt;p&gt;Most sequence-to-sequence models are based on the “next token prediction” workflow.&lt;/p&gt;
&lt;p&gt;The general idea can be expressed with P(token | context) — where the task is to model this conditional probability distribution. The “context” here depends on the approach.&lt;/p&gt;
&lt;p&gt;Those models are also called “auto-regressive” because during inference they need to consume their own predictions from previous steps:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;predict([&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;lt;start&amp;gt;&amp;#34;&lt;/span&gt;], context)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# &amp;#34;I&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;predict([&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;lt;start&amp;gt;&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;I&amp;#34;&lt;/span&gt;], context)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# &amp;#34;love&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;predict([&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;lt;start&amp;gt;&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;I&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;love&amp;#34;&lt;/span&gt;], context)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# &amp;#34;biking&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;predict([&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;lt;start&amp;gt;&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;I&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;love&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;biking&amp;#34;&lt;/span&gt;], context)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# &amp;#34;&amp;lt;end&amp;gt;&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4 id=&#34;naively-simple-modeling-markov-model&#34;&gt;Naively simple modeling: Markov Model&lt;/h4&gt;
&lt;p&gt;In this model, the approach is to take on a bold assumption: that the probability of the next token is conditioned &lt;strong&gt;only&lt;/strong&gt; on the previous token.&lt;/p&gt;
&lt;p&gt;The Markov Model is elegantly introduced in the blog post &lt;a href=&#34;https://medium.com/ymedialabs-innovation/next-word-prediction-using-markov-model-570fc0475f96&#34;&gt;Next Word Prediction using Markov Model&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Why is it naive? Because we know that the probability of the word “love” depends on the word “I” &lt;strong&gt;together with the broader context&lt;/strong&gt;. A model that always outputs “roses” after “love”, just because that pair dominated its training data, would miss the best word more often than not.&lt;/p&gt;
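&lt;p&gt;To make the Markov assumption concrete, here is a toy bigram model of my own (not code from the linked post); &lt;code&gt;START&lt;/code&gt; and &lt;code&gt;END&lt;/code&gt; stand in for the start and end markers:&lt;/p&gt;

```python
from collections import Counter, defaultdict

def train_markov(corpus):
    """Count bigram transitions: how often each word follows another."""
    transitions = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["START"] + sentence.split() + ["END"]
        for prev, nxt in zip(tokens, tokens[1:]):
            transitions[prev][nxt] += 1
    return transitions

def predict_next(transitions, word):
    # The Markov assumption: only `word` matters, never the wider context.
    return transitions[word].most_common(1)[0][0]

corpus = ["I love biking", "I love hiking", "I am hungry"]
model = train_markov(corpus)
print(predict_next(model, "I"))  # "love" (2 occurrences vs. 1 for "am")
```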
&lt;h4 id=&#34;modeling-with-neural-networks&#34;&gt;Modeling with neural networks&lt;/h4&gt;
&lt;p&gt;Usually, sequence-to-sequence neural network models consist of two parts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;encoder&lt;/li&gt;
&lt;li&gt;decoder&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The encoder is there to build a “gist” representation of the input sequence. The gist and the previous token become our “context” to do the inference. This fits in well within the P(token | context) modeling I described above. That distribution can be expressed more clearly as P(token | previous; gist).&lt;/p&gt;
&lt;p&gt;There are other approaches too, one of them being &lt;a href=&#34;https://arxiv.org/pdf/2001.04063v2.pdf&#34;&gt;ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training - 2020 - Yan, Yu and Qi, Weizhen and Gong, Yeyun and Liu, Dayiheng and Duan, Nan and Chen, Jiusheng and Zhang, Ruofei and Zhou, Ming&lt;/a&gt;. The difference there is that the model predicts the next n tokens at once.&lt;/p&gt;
&lt;h3 id=&#34;teacher-forcing&#34;&gt;Teacher-forcing&lt;/h3&gt;
&lt;p&gt;Let’s see how we could go about teaching the model the next token’s conditional distribution.&lt;/p&gt;
&lt;p&gt;Imagine that the model’s parameters aren’t performing well yet. We have an input sequence of: &lt;code&gt;[&amp;quot;&amp;lt;start&amp;gt;&amp;quot;, &amp;quot;I&amp;quot;, &amp;quot;love&amp;quot;, &amp;quot;biking&amp;quot;, &amp;quot;during&amp;quot;, &amp;quot;the&amp;quot;, &amp;quot;summer&amp;quot;, &amp;quot;&amp;lt;end&amp;gt;&amp;quot;]&lt;/code&gt;. We’re training the model giving it the first token:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;model([&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;lt;start&amp;gt;&amp;#34;&lt;/span&gt;], context)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# &amp;#34;I&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Great, now let’s ask it for another one:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;model([&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;lt;start&amp;gt;&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;I&amp;#34;&lt;/span&gt;], context)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# &amp;#34;wonder&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Hmmm that’s not what we wanted, but let’s naively continue:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;model([&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;lt;start&amp;gt;&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;I&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;wonder&amp;#34;&lt;/span&gt;], context)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# &amp;#34;why&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We could continue gathering predictions and compute the loss at the end. The loss would really only be able to tell the model about its first mistake (“love” vs. “wonder”); the rest of the errors just accumulate from there. This would hinder learning considerably, adding noise from the accumulated errors.&lt;/p&gt;
&lt;p&gt;There’s a better approach called &lt;a href=&#34;https://machinelearningmastery.com/teacher-forcing-for-recurrent-neural-networks/&#34;&gt;Teacher Forcing&lt;/a&gt;. In this approach, you’re telling the model the true answer after each of its guesses. The last example would look like the following:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;model([&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;lt;start&amp;gt;&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;I&amp;#34;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;love&amp;#34;&lt;/span&gt;], context)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#888&#34;&gt;# &amp;#34;watching&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You’d continue the process, feeding it the full input sequence and the loss term would be computed based on all its guesses.&lt;/p&gt;
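&lt;p&gt;A schematic of that training step (my own sketch; the 0/1 loss stands in for the cross-entropy over the vocabulary used in practice):&lt;/p&gt;

```python
# Schematic teacher-forcing step (illustrative only; `model_step` is a stand-in
# for a real network that predicts the next token given the true prefix).
def teacher_forced_loss(model_step, tokens, context):
    loss = 0.0
    for i in range(1, len(tokens)):
        true_prefix = tokens[:i]          # ground truth, not the model's guesses
        predicted = model_step(true_prefix, context)
        # Toy 0/1 loss: in practice this is cross-entropy over the vocabulary.
        loss += 0.0 if predicted == tokens[i] else 1.0
    return loss

# A fake model that answers "I" first and "wonder" afterwards: only the first
# target token is matched, so three of the four steps contribute to the loss.
def fake_model(prefix, context):
    return "I" if len(prefix) == 1 else "wonder"

tokens = ["START", "I", "love", "biking", "END"]
print(teacher_forced_loss(fake_model, tokens, context=None))  # 3.0
```

&lt;p&gt;Note that each step is conditioned on the true prefix, so the model’s earlier mistakes never leak into later predictions.&lt;/p&gt;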
&lt;h3 id=&#34;compute-friendly-representation-for-tokens-and-gists&#34;&gt;Compute-friendly representation for tokens and gists&lt;/h3&gt;
&lt;p&gt;Some of the readers might want to skip this section. I’d like to describe quickly here the concept of the &lt;a href=&#34;https://towardsdatascience.com/understanding-latent-space-in-machine-learning-de5a7c687d8d&#34;&gt;latent space&lt;/a&gt; and &lt;a href=&#34;https://towardsdatascience.com/introduction-to-word-embedding-and-word2vec-652d0c2060fa&#34;&gt;vector embeddings&lt;/a&gt;. This is to keep the matters relatively palatable for the broader audience.&lt;/p&gt;
&lt;h4 id=&#34;representing-words-naively&#34;&gt;Representing words naively&lt;/h4&gt;
&lt;p&gt;How do we turn words (strings) into numbers to input into our machine learning models? A software developer might think of assigning each word a unique integer. This works well for databases, but in a machine learning model consecutive integers implicitly encode a relation (an ordering, and a distance between words) that doesn’t actually exist. This works poorly for almost any problem in data science.&lt;/p&gt;
&lt;p&gt;Traditionally, the problem is solved by “&lt;a href=&#34;https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/&#34;&gt;one-hot encoding&lt;/a&gt;”. This means turning each integer into a vector whose values are all zero except for a single one at the index corresponding to the encoded value. Example: &lt;code&gt;3 =&amp;gt; [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]&lt;/code&gt; when the total number of classes to encode is 10.&lt;/p&gt;
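&lt;p&gt;The encoding from the example above takes a couple of lines of Python:&lt;/p&gt;

```python
def one_hot(index, num_classes):
    # All zeros except a single 1 at the given index.
    vec = [0] * num_classes
    vec[index] = 1
    return vec

print(one_hot(3, 10))  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```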
&lt;p&gt;This is better, as it breaks the ordering and distance assumptions. It doesn’t encode anything about the words themselves, though, except the arbitrary number we’ve assigned to them: we no longer have a spurious ordering, but we don’t have any meaningful distance either. Empirically, we just know that the word “love” is much closer to “enjoy” than it is to “helicopter”.&lt;/p&gt;
&lt;h4 id=&#34;a-better-approach-word-embeddings&#34;&gt;A better approach: word embeddings&lt;/h4&gt;
&lt;p&gt;How could we keep our vector representation (as in one-hot encoding) but also introduce a meaningful distance? I’ve already touched on this concept in my &lt;a href=&#34;/blog/2018/07/recommender-mxnet/&#34;&gt;post about the simple recommender system&lt;/a&gt;. The idea is to represent each word as a vector of floating-point values such that the closer two words are in meaning, the smaller the angle between their vectors. We can easily compute a metric following this logic by measuring the &lt;a href=&#34;http://blog.christianperone.com/2013/09/machine-learning-cosine-similarity-for-vector-space-models-part-iii/&#34;&gt;cosine distance&lt;/a&gt;. This way, the word representations are easy to feed into the encoder, and they already carry a lot of information in themselves.&lt;/p&gt;
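&lt;p&gt;A sketch of that metric, using tiny made-up three-dimensional “embeddings” (real embeddings have hundreds of dimensions, and the values below are invented for illustration):&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means "same direction".
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up vectors: "love" and "enjoy" point in similar directions,
# "helicopter" points elsewhere.
love = [0.9, 0.8, 0.1]
enjoy = [0.85, 0.75, 0.2]
helicopter = [0.1, 0.2, 0.9]

print(cosine_similarity(love, enjoy) > cosine_similarity(love, helicopter))  # True
```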
&lt;h4 id=&#34;not-only-words&#34;&gt;Not only words&lt;/h4&gt;
&lt;p&gt;Can we only have vectors for words? Couldn’t we have vectors for paragraphs, so that the closer they are in their meaning, the smaller some vector space metric between them? Of course we can. This is, in fact, what will allow us in this article’s model to encode the “gist” that we talked about. The “encoder” part of the model is going to learn the most convenient way of turning the input sequence into the floating-point numbers vector.&lt;/p&gt;
&lt;h3 id=&#34;auto-encoders&#34;&gt;Auto-encoders&lt;/h3&gt;
&lt;p&gt;We’re slowly approaching the model from the paper. There’s still one more concept that’s vital to understanding why the model works.&lt;/p&gt;
&lt;p&gt;Up until now, we talked about the following structure of the typical sequence-to-sequence neural network model:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/blog/2020/05/summae-neural-text-summarization-denoising-autoencoder/seq-to-seq.png&#34; alt=&#34;Sequence To Sequence Neural Nets&#34;&gt;&lt;/p&gt;
&lt;p&gt;This is true e.g. for translation models where the input sequence is in English and the output is in Greek. It’s also true for this article’s model &lt;strong&gt;during the inference&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;What if we made the input and the output the same sequence? We’d turn the model into a so-called &lt;a href=&#34;https://en.wikipedia.org/wiki/Autoencoder&#34;&gt;auto-encoder&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The output of course isn’t all that useful — we already know what the input sequence is. The true value is in the model’s ability to encode the input into a &lt;strong&gt;gist&lt;/strong&gt;.&lt;/p&gt;
&lt;h4 id=&#34;adding-the-noise&#34;&gt;Adding the noise&lt;/h4&gt;
&lt;p&gt;A very interesting type of auto-encoder is the &lt;a href=&#34;https://towardsdatascience.com/denoising-autoencoders-explained-dbb82467fc2&#34;&gt;denoising auto-encoder&lt;/a&gt;. The idea is that the input sequence gets randomly corrupted, and the network learns to still produce a good gist and reconstruct the sequence as it was before corruption. This training “teaches” the network about the deeper connections in the data, instead of letting it just “memorize” as much as it can.&lt;/p&gt;
&lt;h3 id=&#34;the-summae-model&#34;&gt;The SummAE model&lt;/h3&gt;
&lt;p&gt;We’re now ready to talk about the architecture from the paper. Given what we’ve already learned, it is going to be very simple: the SummAE model is just a denoising auto-encoder that is trained in a special way.&lt;/p&gt;
&lt;h4 id=&#34;auto-encoding-paragraphs-and-sentences&#34;&gt;Auto-encoding paragraphs and sentences&lt;/h4&gt;
&lt;p&gt;The authors were training the model on both single sentences and full paragraphs. In all cases the task was to reproduce the uncorrupted input.&lt;/p&gt;
&lt;p&gt;The first part of the approach is about having two special “start tokens” to signal the mode: paragraph vs. sentence. In my code, I’ve used “&amp;lt;start-full&amp;gt;” and “&amp;lt;start-short&amp;gt;”.&lt;/p&gt;
&lt;p&gt;During the training, the model learns the conditional distributions given those two tokens and the ones that follow, for any given token in the sequence.&lt;/p&gt;
&lt;h4 id=&#34;adding-the-noise-1&#34;&gt;Adding the noise&lt;/h4&gt;
&lt;p&gt;The sentences are simply concatenated to form a paragraph. The input then gets corrupted at random by means of:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;masking the input tokens&lt;/li&gt;
&lt;li&gt;shuffling the order of the sentences within the paragraph&lt;/li&gt;
&lt;/ul&gt;
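&lt;p&gt;A rough sketch of such a corruption step (my own illustration, not the paper’s code; &lt;code&gt;MASKED&lt;/code&gt; is a stand-in mask token):&lt;/p&gt;

```python
import random

MASK = "MASKED"  # stand-in for the mask token

def corrupt(sentences, mask_prob, rng):
    # First shuffle the sentence order within the paragraph...
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    # ...then mask random tokens inside each sentence.
    noisy = []
    for sentence in shuffled:
        tokens = [MASK if rng.random() > 1.0 - mask_prob else tok
                  for tok in sentence.split()]
        noisy.append(" ".join(tokens))
    return noisy

rng = random.Random(42)
paragraph = ["I love biking", "It keeps me fit", "Summer is the best"]
print(corrupt(paragraph, mask_prob=0.3, rng=rng))
```

&lt;p&gt;The auto-encoder then has to reconstruct the original, uncorrupted paragraph from this noisy input.&lt;/p&gt;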
&lt;p&gt;The authors claim that the latter helped solve the issue of the network just memorizing the first sentence. What I have found, though, is that this model is generally prone to memorizing concrete sentences from the paragraph. Sometimes it’s the first, and sometimes one of the others. I’ve found this to be true even when adding a lot of noise to the input.&lt;/p&gt;
&lt;h4 id=&#34;the-code&#34;&gt;The code&lt;/h4&gt;
&lt;p&gt;The full PyTorch implementation described in this blog post is available at &lt;a href=&#34;https://github.com/kamilc/neural-text-summarization&#34;&gt;https://github.com/kamilc/neural-text-summarization&lt;/a&gt;. You may find some of its parts less clean than others — it’s a work in progress. In particular, the data-downloading code is mostly left out.&lt;/p&gt;
&lt;p&gt;You can find the WikiData preprocessing in a notebook in the repository. For the ROCStories, I just downloaded the CSV files and concatenated with Unix &lt;code&gt;cat&lt;/code&gt;. There’s an additional &lt;code&gt;process.py&lt;/code&gt; file generated from a very simple &lt;code&gt;IPython&lt;/code&gt; session.&lt;/p&gt;
&lt;p&gt;Let’s have a very brief look at some of the most interesting parts of the code:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;SummarizeNet&lt;/span&gt;(NNModel):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;encode&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, embeddings, lengths):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;decode&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, embeddings, encoded, lengths, modes):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;forward&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, embeddings, clean_embeddings, lengths, modes):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# ...&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;predict&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, vocabulary, embeddings, lengths):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# ...&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can notice separate methods for &lt;code&gt;forward&lt;/code&gt; and &lt;code&gt;predict&lt;/code&gt;. I chose the &lt;a href=&#34;https://jalammar.github.io/illustrated-transformer/&#34;&gt;Transformer&lt;/a&gt; over recurrent neural networks for both the encoder and the decoder. The &lt;a href=&#34;https://pytorch.org/docs/master/generated/torch.nn.TransformerDecoder.html&#34;&gt;PyTorch implementation of the transformer decoder&lt;/a&gt; already includes teacher forcing in its &lt;code&gt;forward&lt;/code&gt; method, which makes training convenient: we just feed it the full, uncorrupted sequence of embeddings as the “target”. During inference, though, we need to do the “auto-regressive” part by hand, feeding the previous predictions back in a loop — hence the need for two distinct methods.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;forward&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, embeddings, clean_embeddings, lengths, modes):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    noisy_embeddings = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.mask_dropout(embeddings, lengths)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    encoded = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.encode(noisy_embeddings[:, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;:, :], lengths-&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    decoded = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.decode(clean_embeddings, encoded, lengths, modes)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        decoded,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        encoded
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    )&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can notice that I’m doing the token masking at the model level during the training. The code also shows cleanly the structure of this seq2seq model — with the encoder and the decoder.&lt;/p&gt;
&lt;p&gt;The encoder part looks simple as long as you’re familiar with the transformers:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;encode&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, embeddings, lengths):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    batch_size, seq_len, _ = embeddings.shape
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    embeddings = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.encode_positions(embeddings)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    paddings_mask = torch.arange(end=seq_len).unsqueeze(dim=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;).expand((batch_size, seq_len)).to(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.device)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    paddings_mask = (paddings_mask + &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;) &amp;gt; lengths.unsqueeze(dim=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;).expand((batch_size, seq_len))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    encoded = embeddings.transpose(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;,&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; ix, encoder &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;enumerate&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.encoders):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        encoded = encoder(encoded, src_key_padding_mask=paddings_mask)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        encoded = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.encode_batch_norms[ix](encoded.transpose(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;,&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)).transpose(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;,&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    last_encoded = encoded
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    encoded = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.pool_encoded(encoded, lengths)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    encoded = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.to_hidden(encoded)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; encoded&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We’re first encoding the positions as in the “Attention Is All You Need” paper and then feeding the embeddings into a stack of encoder layers. At the end, we project the tensor so that its final dimension equals the size given as the model’s parameter.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;decode&lt;/code&gt; sits on PyTorch’s shoulders too:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;decode&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, embeddings, encoded, lengths, modes):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    batch_size, seq_len, _ = embeddings.shape
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    embeddings = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.encode_positions(embeddings)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    mask = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.mask_for(embeddings)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    encoded = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.from_hidden(encoded)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    encoded = encoded.unsqueeze(dim=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;).expand(seq_len, batch_size, -&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    decoded = embeddings.transpose(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;,&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    decoded = torch.cat(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            encoded,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            decoded
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        ],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        axis=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    decoded = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.combine_decoded(decoded)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    decoded = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.combine_batch_norm(decoded.transpose(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;,&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)).transpose(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;,&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    paddings_mask = torch.arange(end=seq_len).unsqueeze(dim=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;).expand((batch_size, seq_len)).to(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.device)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    paddings_mask = paddings_mask &amp;gt; lengths.unsqueeze(dim=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;).expand((batch_size, seq_len))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; ix, decoder &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;enumerate&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.decoders):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        decoded = decoder(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            decoded,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            torch.ones_like(decoded),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            tgt_mask=mask,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            tgt_key_padding_mask=paddings_mask
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        decoded = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.decode_batch_norms[ix](decoded.transpose(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;,&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)).transpose(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;,&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    decoded = decoded.transpose(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;,&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.linear_logits(decoded)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can notice that I’m combining the gist received from the encoder with each word’s embedding — as this is how it was described in the paper.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;predict&lt;/code&gt; is very similar to &lt;code&gt;forward&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;predict&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, vocabulary, embeddings, lengths):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;    Caller should include the start and end tokens here
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;    but we’re going to ensure the start one is replaced by &amp;lt;start-short&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;    &amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    previous_mode = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.training
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.eval()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    batch_size, _, _ = embeddings.shape
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    results = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; row &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;range&lt;/span&gt;(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;, batch_size):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        row_embeddings = embeddings[row, :, :].unsqueeze(dim=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        row_embeddings[&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;] = vocabulary.token_vector(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;lt;start-short&amp;gt;&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        encoded = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.encode(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            row_embeddings[:, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;:, :],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            lengths[row].unsqueeze(dim=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        results.append(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.decode_prediction(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                vocabulary,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                encoded,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                lengths[row].unsqueeze(dim=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.training = previous_mode
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; results&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The workhorse behind decoding at inference time looks as follows:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;decode_prediction&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, vocabulary, encoded1xH, lengths1x):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    tokens = [&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;&amp;lt;start-short&amp;gt;&amp;#39;&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    last_token = &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;None&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    seq_len = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    encoded1xH = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.from_hidden(encoded1xH)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;while&lt;/span&gt; last_token != &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;&amp;lt;end&amp;gt;&amp;#39;&lt;/span&gt; &lt;span style=&#34;color:#080&#34;&gt;and&lt;/span&gt; seq_len &amp;lt; &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;50&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        embeddings1xSxD = vocabulary.embed(tokens).unsqueeze(dim=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;).to(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.device)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        embeddings1xSxD = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.encode_positions(embeddings1xSxD)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        maskSxS = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.mask_for(embeddings1xSxD)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        encodedSx1xH = encoded1xH.unsqueeze(dim=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;).expand(seq_len, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;, -&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        decodedSx1xD = embeddings1xSxD.transpose(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;,&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        decodedSx1xD = torch.cat(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            [
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                encodedSx1xH,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                decodedSx1xD
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            ],
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            axis=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        decodedSx1xD = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.combine_decoded(decodedSx1xD)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        decodedSx1xD = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.combine_batch_norm(decodedSx1xD.transpose(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;,&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)).transpose(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;,&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; ix, decoder &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;enumerate&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.decoders):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            decodedSx1xD = decoder(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                decodedSx1xD,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                torch.ones_like(decodedSx1xD),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                tgt_mask=maskSxS,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            decodedSx1xD = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.decode_batch_norms[ix](decodedSx1xD.transpose(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;,&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            decodedSx1xD = decodedSx1xD.transpose(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;,&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        decoded1x1xD = decodedSx1xD.transpose(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;,&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;)[:, (seq_len-&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;):seq_len, :]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        decoded1x1xV = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.linear_logits(decoded1x1xD)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        word_id = F.softmax(decoded1x1xV[&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;, :], dim=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;).argmax().cpu().item()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        last_token = vocabulary.words[word_id]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        tokens.append(last_token)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        seq_len += &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39; &amp;#39;&lt;/span&gt;.join(tokens[&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;:])&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Notice that we start with the “start short” token and then loop: getting a prediction, appending it, and feeding it back in until we produce the “end” token.&lt;/p&gt;
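&lt;p&gt;Stripped of the tensor bookkeeping, that control flow is plain greedy decoding. A toy sketch (the &lt;code&gt;next_token&lt;/code&gt; predictor here is a made-up stand-in for the real decoder pass):&lt;/p&gt;

```python
def greedy_decode(next_token, start='<start-short>', end='<end>', max_len=50):
    """Repeatedly predict the next token from everything decoded so far."""
    tokens = [start]
    while tokens[-1] != end and len(tokens) < max_len:
        tokens.append(next_token(tokens))  # feed the growing prefix back in
    return ' '.join(tokens[1:])            # drop the start token, like the model does

# a toy predictor that emits three words and then the end token
script = iter(['a', 'b', 'c', '<end>'])
print(greedy_decode(lambda toks: next(script)))  # → a b c <end>
```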
&lt;p&gt;Again, the model itself is very simple. What makes the difference is how it’s trained: it all comes down to the training-data corruption and the model pre-training.&lt;/p&gt;
&lt;p&gt;This is already a long article, so I encourage curious readers to look at the code in &lt;a href=&#34;https://github.com/kamilc/neural-text-summarization&#34;&gt;my GitHub repo&lt;/a&gt; for more details.&lt;/p&gt;
&lt;h4 id=&#34;my-experiment-with-the-wikihow-dataset&#34;&gt;My experiment with the WikiHow dataset&lt;/h4&gt;
&lt;p&gt;In my WikiHow experiment I wanted to see what the results would look like if I fed in the full articles and their headlines as the network’s two modes. The same data-corruption regime was used in this case.&lt;/p&gt;
&lt;p&gt;Some of the results looked &lt;strong&gt;almost&lt;/strong&gt; good:&lt;/p&gt;
&lt;p&gt;Text:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;for a savory flavor, mix in 1/2 teaspoon ground cumin, ground turmeric, or masala powder.this works best when added to the traditional salty lassi. for a flavorful addition to the traditional sweet lassi, add 1/2 teaspoon of ground cardamom powder or ginger, for some kick.  , start with a traditional sweet lassi and blend in some of your favorite fruits. consider mixing in strawberries, papaya, bananas, or coconut.try chopping and freezing the fruit before blending it into the lassi. this will make your drink colder and frothier.  , while most lassi drinks are yogurt based, you can swap out the yogurt and water or milk for coconut milk. this will give a slightly tropical flavor to the drink. or you could flavor the lassi with rose water syrup, vanilla extract, or honey.don’t choose too many flavors or they could make the drink too sweet. if you stick to one or two flavors, they’ll be more pronounced.  , top your lassi with any of the following for extra flavor and a more polished look:   chopped pistachios sprigs of mint sprinkle of turmeric or cumin chopped almonds fruit sliver&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Headline:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;add a spice., blend in a fruit., flavor with a syrup or milk., garnish.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Predicted summary:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;blend vanilla in a sweeter flavor . , add a sugary fruit . , do a spicy twist . eat with dessert . , revise . &lt;end&gt;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;It’s not 100% faithful to the original text even though it seems to “read” well.&lt;/p&gt;
&lt;p&gt;My suspicion is that pre-training on a much larger corpus of text might help. There’s an obvious issue here: the network lacks the specific knowledge it would need to summarize better. Here’s another of those examples:&lt;/p&gt;
&lt;p&gt;Text:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;the settings app looks like a gray gear icon on your iphone&amp;rsquo;s home screen.; , this option is listed next to a blue &amp;ldquo;a&amp;rdquo; icon below general.  , this option will be at the bottom of the display &amp;amp; brightness menu.  , the right-hand side of the slider will give you bigger font size in all menus and apps that support dynamic type, including the mail app. you can preview the corresponding text size by looking at the menu texts located above and below the text size slider.  , the left-hand side of the slider will make all dynamic type text smaller, including all menus and mailboxes in the mail app.  , tap the back button twice in the upper-left corner of your screen. it will save your text size settings and take you back to your settings menu.  , this option is listed next to a gray gear icon above display &amp;amp; brightness.  , it&amp;rsquo;s halfway through the general menu.  ,, the switch will turn green. the text size slider below the switch will allow for even bigger fonts.  , the text size in all menus and apps that support dynamic type will increase as you go towards the right-hand side of the slider. this is the largest text size you can get on an iphone.  , it will save your settings.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Headline:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;open your iphone&amp;rsquo;s settings., scroll down and tap display &amp;amp; brightness., tap text size., tap and drag the slider to the right for bigger text., tap and drag the slider to the left for smaller text., go back to the settings menu., tap general., tap accessibility., tap larger text.  , slide the larger accessibility sizes switch to on position., tap and drag the slider to the right., tap the back button in the upper-left corner.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Predicted summary:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;open your iphone &amp;rsquo;s settings . , tap general . , scroll down and tap accessibility . , tap larger accessibility . , tap and larger text for the iphone to highlight the text you want to close . , tap the larger text - colored contacts app .&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;It might be interesting to train against this dataset again while:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;utilizing some pre-trained, large scale model as part of the encoder&lt;/li&gt;
&lt;li&gt;using a large corpus of text to still pre-train the auto-encoder&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This would likely take a long time to train on my GPU (even with the pre-trained part of the encoder), so I didn’t pursue the idea further at this time.&lt;/p&gt;
&lt;h4 id=&#34;the-problem-with-getting-paragraphs-when-we-want-the-sentences&#34;&gt;The problem with getting paragraphs when we want the sentences&lt;/h4&gt;
&lt;p&gt;One of the biggest problems the authors ran into was the decoder outputting the long version of the text even when it was asked for a sentence-long summary.&lt;/p&gt;
&lt;p&gt;The authors called this phenomenon the “segregation issue”. They found that the encoder was mapping paragraphs and sentences into completely separate regions of the latent space. The solution was to trick the encoder into making the two representations indistinguishable. The following figure from the paper visualizes the issue:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/blog/2020/05/summae-neural-text-summarization-denoising-autoencoder/segregation.jpg&#34; alt=&#34;Segregation problem&#34;&gt;&lt;/p&gt;
&lt;h4 id=&#34;better-gists-by-using-the-critic&#34;&gt;Better gists by using the “critic”&lt;/h4&gt;
&lt;p&gt;The idea of a “critic” has been popularized along with the fantastic results produced by &lt;a href=&#34;https://en.wikipedia.org/wiki/Generative_adversarial_network&#34;&gt;Generative Adversarial Networks&lt;/a&gt;. The general workflow is to have the main network generate output while a second network tries to guess some of its properties.&lt;/p&gt;
&lt;p&gt;For GANs that generate realistic photos, the critic guesses whether a photo is generated or real. A loss term is added based on how well it does, penalizing the main network for generating photos the critic can call out as fake.&lt;/p&gt;
&lt;p&gt;A similar idea was used in the A3C algorithm I blogged about (&lt;a href=&#34;/blog/2018/08/self-driving-toy-car-using-the-a3c-algorithm/&#34;&gt;Self-driving toy car using the Asynchronous Advantage Actor-Critic algorithm&lt;/a&gt;). The “critic” part penalized the AI agent for taking steps that were on average less advantageous.&lt;/p&gt;
&lt;p&gt;Here, in the SummAE model, the critic adds a penalty to the loss proportional to how well it can guess whether a gist comes from a paragraph or a sentence.&lt;/p&gt;
&lt;p&gt;Training with a critic can get tricky. The cleanest way I’ve found is to use two different optimizers: one updates the main network’s parameters while the other updates the critic itself:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; batch &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; batches:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; mode == &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;train&amp;#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.model.train()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.discriminator.train()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;else&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.model.eval()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.discriminator.eval()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.optimizer.zero_grad()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.discriminator_optimizer.zero_grad()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    logits, state = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.model(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        batch.word_embeddings.to(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.device),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        batch.clean_word_embeddings.to(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.device),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        batch.lengths.to(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.device),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        batch.mode.to(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.device)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    mode_probs_disc = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.discriminator(state.detach())
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    mode_probs = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.discriminator(state)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    discriminator_loss = F.binary_cross_entropy(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        mode_probs_disc,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        batch.mode
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    discriminator_loss.backward(retain_graph=&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;True&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; mode == &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;train&amp;#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.discriminator_optimizer.step()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    text = batch.text.copy()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.no_period_trick:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        text = [txt.replace(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;.&amp;#39;&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;&amp;#39;&lt;/span&gt;) &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; txt &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; text]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    classes = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.vocabulary.encode(text, modes=batch.mode)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    classes = classes.roll(-&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;, dims=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    classes[:,classes.shape[&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;]-&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;] = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    model_loss = torch.tensor(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;).cuda()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; logits.shape[&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;:&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;] == classes.shape:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        model_loss = F.cross_entropy(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            logits.reshape(-&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;, logits.shape[&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;]).to(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.device),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            classes.long().reshape(-&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;).to(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.device),
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            ignore_index=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;else&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;WARNING: Skipping model loss for inconsistency between logits and classes shapes&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    fooling_loss = F.binary_cross_entropy(
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        mode_probs,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        torch.ones_like(batch.mode).to(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.device)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    loss = model_loss + (&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.1&lt;/span&gt; * fooling_loss)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    loss.backward()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; mode == &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;train&amp;#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.optimizer.step()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.optimizer.zero_grad()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.discriminator_optimizer.zero_grad()&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The main idea is to treat the main network’s encoded gist as constant with respect to the updates to the critic’s parameters, and vice versa.&lt;/p&gt;
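&lt;p&gt;The &lt;code&gt;detach()&lt;/code&gt; call in the listing above is what implements this: the critic’s own loss sees a detached copy of the gist, so its gradients never reach the encoder, while the fooling loss uses the attached tensor. A tiny demonstration of the asymmetry (assuming PyTorch; the names are made up for illustration):&lt;/p&gt;

```python
import torch

# stands in for the encoder's gist; requires_grad lets us inspect its gradient
state = torch.randn(4, 8, requires_grad=True)
critic = torch.nn.Linear(8, 1)

# critic update: detached input, so no gradient flows back into `state`
critic(state.detach()).sum().backward()
assert state.grad is None  # the encoder is untouched by the critic's own loss

# fooling update: attached input, so the encoder does receive a gradient
critic(state).sum().backward()
assert state.grad is not None
```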
&lt;h3 id=&#34;results&#34;&gt;Results&lt;/h3&gt;
&lt;p&gt;I’ve found that some of the results look really exceptional:&lt;/p&gt;
&lt;p&gt;Text:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;lynn is unhappy in her marriage. her husband is never good to her and shows her no attention. one evening lynn tells her husband she is going out with her friends. she really goes out with a man from work and has a great time. lynn continues dating him and starts having an affair.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Predicted summary:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;lynn starts dating him and has an affair . &amp;lt;end&amp;gt;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Text:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;cedric was hoping to get a big bonus at work. he had worked hard at the office all year. cedric&amp;rsquo;s boss called him into his office. cedric was disappointed when told there would be no bonus. cedric&amp;rsquo;s boss surprised cedric with a big raise instead of a bonus.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Predicted summary:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;cedric had a big deal at his boss &amp;rsquo;s office . &amp;lt;end&amp;gt;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Some others showed how the model attends to single sentences though:&lt;/p&gt;
&lt;p&gt;Text:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;i lost my job. i was having trouble affording my necessities. i didn&amp;rsquo;t have enough money to pay rent. i searched online for money making opportunities. i discovered amazon mechanical turk.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Predicted summary:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;i did n&amp;rsquo;t have enough money to pay rent . &amp;lt;end&amp;gt;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;While a sentence like this one might make a good headline, it’s definitely not the best summary, as it loses the vital information found in the other sentences.&lt;/p&gt;
&lt;h3 id=&#34;final-words&#34;&gt;Final words&lt;/h3&gt;
&lt;p&gt;First of all, let me thank the paper’s authors for their exceptional work. It was a great read and great fun implementing!&lt;/p&gt;
&lt;p&gt;Abstractive text summarization remains very difficult. The model trained for this blog post has very limited use in practice. There’s a lot of room for improvement though, which makes the future of abstractive summaries very promising.&lt;/p&gt;

      </content>
    </entry>
  
    <entry>
      <title>An Introduction to Neural Networks</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2019/07/an-introduction-to-neural-networks/"/>
      <id>https://www.endpointdev.com/blog/2019/07/an-introduction-to-neural-networks/</id>
      <published>2019-07-01T00:00:00+00:00</published>
      <author>
        <name>Ben Ironside Goldstein</name>
      </author>
      <content type="html">
        &lt;p&gt;&lt;img src=&#34;/blog/2019/07/an-introduction-to-neural-networks/image-0.jpg&#34; alt=&#34;Weird Tree Art (Neural Network)&#34; /&gt; &lt;a href=&#34;https://flic.kr/p/5eL8Ag&#34;&gt;Photo&lt;/a&gt; by &lt;a href=&#34;https://www.flickr.com/photos/sudhamshu/&#34;&gt;Sudhamshu Hebbar&lt;/a&gt;, used under &lt;a href=&#34;https://creativecommons.org/licenses/by/2.0/&#34;&gt;CC BY 2.0&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Earlier this year I wrote a &lt;a href=&#34;/blog/2019/05/facial-recognition-amazon-deeplens/&#34;&gt;post&lt;/a&gt; about my work with a machine-learning camera, the &lt;a href=&#34;https://aws.amazon.com/deeplens/&#34;&gt;AWS DeepLens&lt;/a&gt;, which has onboard processing power to enable AI capabilities without sending data to the cloud. Neural networks are a type of ML model which achieves very impressive results on certain problems (including computer vision), so in this post I give a more thorough introduction to neural networks, and share some useful resources for those who want to dig deeper.&lt;/p&gt;
&lt;h3 id=&#34;neurons-and-nodes&#34;&gt;Neurons and Nodes&lt;/h3&gt;
&lt;p&gt;Neural networks are models inspired by the function of biological neural networks. They consist of nodes (arranged in layers) and the connections between those nodes. Each connection between two nodes enables one-way information transfer: a node either receives input from, or sends output to, each node to which it is connected. Each node typically has an “activation function”, parameterized by the node’s inputs, and the node’s output is the result of this function.&lt;/p&gt;
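&lt;p&gt;To make this concrete, here is a toy sigmoid node in plain Python (purely illustrative; real frameworks vectorize this across whole layers):&lt;/p&gt;

```python
import math

def node_output(inputs, weights, bias):
    # Weighted sum of the node's inputs, passed through a
    # sigmoid activation function.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-z))

print(node_output([1.0, 2.0], [0.5, -0.25], 0.0))  # 0.5
```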
&lt;p&gt;As with the function of biological neural networks, the emergence of information processing from these mathematical operations is opaque. Nevertheless, complex artificial neural networks are capable of feats such as vision, language translation, and winning competitive games. As the technology improves, even more impressive tasks will become possible. As with organic brains, neural networks can achieve complex tasks only as a result of appropriate architecture, constraints, and training—for machine learning, humans must (for now) design it all.&lt;/p&gt;
&lt;h3 id=&#34;neural-network-architecture&#34;&gt;Neural Network Architecture&lt;/h3&gt;
&lt;img src=&#34;/blog/2019/07/an-introduction-to-neural-networks/image-1.png&#34; style=&#34;float: right; max-width: 200px&#34; /&gt; 
&lt;p&gt;Nodes are grouped in layers: the input layer, the output layer, and all the layers between them, known as hidden layers. Nodes can be networked in a variety of ways within and between layers, and sophisticated neural network models can include dozens of layers configured in various ways. These include layers which summarize, combine, eliminate, direct, or transform information. Each receives its input from the previous layer, and passes its output to the next layer. The last layer is designed such that its output answers the relevant question (for example, it would offer 9 options if the goal were to identify the hand-written numbers 1–9).&lt;/p&gt;
&lt;p&gt;For all this information processing to achieve a given task, the parameters of each node need appropriate values. The process of choosing those values is called training. In order to train a neural network, one needs to provide examples of what the network should do. (For example, to train it to write requires examples of writing. To train it to identify objects in images requires images and their appropriately labeled counterparts.) The more data a model can learn from, the better it can work. Gathering enough data is typically a major undertaking.&lt;/p&gt;
&lt;h3 id=&#34;training-a-neural-network&#34;&gt;Training a Neural Network&lt;/h3&gt;
&lt;p&gt;Before training, models have random parameters for all nodes. Each time data is passed through the model, the effectiveness of the model is measured using a “loss function”. Loss functions measure how wrong a model’s output is. Different loss functions (also known as cost functions or error functions) measure this in different ways, but in general, the more wrong a model is, the higher its loss/error/cost. Loss functions thus summarize the quality of a model’s output with a single number. Models are optimized to minimize the loss. (For more on the role of loss functions in neural networks, I suggest &lt;a href=&#34;https://machinelearningmastery.com/loss-and-loss-functions-for-training-deep-learning-neural-networks/&#34;&gt;this excellent article&lt;/a&gt;.)&lt;/p&gt;
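&lt;p&gt;For instance, mean squared error, one of the simplest loss functions, fits in a few lines (a toy version, for illustration):&lt;/p&gt;

```python
def mse(predictions, targets):
    # The more wrong the predictions, the larger the loss.
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

print(mse([1.0, 2.0], [1.0, 2.0]))  # 0.0: a perfect model
print(mse([0.0, 0.0], [1.0, 2.0]))  # 2.5: a worse model, higher loss
```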
&lt;p&gt;One of the most interesting details of the entire process has to do with how the parameters are tuned. Model optimization relies on variations of a process called gradient descent, in which parameter values are adjusted by small intervals in an attempt to minimize the loss. Over many thousands of repetitions, the training program uses calculus to pick values that help to minimize the loss. As you can imagine, this process becomes extremely computationally intensive when the neural network is large and complex. However, in order to solve hard problems, networks must be large and complex. This is why training neural networks requires substantial computing power, and often takes place in the cloud. (For more on stochastic gradient descent, I suggest &lt;a href=&#34;https://www.youtube.com/watch?v=vMh0zPT0tLI&#34;&gt;this video&lt;/a&gt; as a great starting point, or &lt;a href=&#34;http://ruder.io/optimizing-gradient-descent/&#34;&gt;this review&lt;/a&gt; for a more advanced overview.)&lt;/p&gt;
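&lt;p&gt;The core loop of gradient descent is short. Here it minimizes a simple one-dimensional function (real training applies the same idea to millions of parameters at once):&lt;/p&gt;

```python
def gradient_descent(grad, x, lr=0.1, steps=100):
    # Repeatedly nudge x by a small step against the gradient.
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x=0.0)
print(round(x_min, 3))  # 3.0
```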
&lt;h3 id=&#34;further-reading&#34;&gt;Further reading&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;It turns out that a relatively simple neural network can approximate any function. This remarkable &lt;a href=&#34;https://towardsdatascience.com/can-neural-networks-really-learn-any-function-65e106617fc6&#34;&gt;demonstration&lt;/a&gt; is quite accessible.&lt;/li&gt;
&lt;li&gt;There are countless useful implementations of neural network models. End Pointer &lt;a href=&#34;/blog/authors/kamil-ciemniewski/&#34;&gt;Kamil Ciemniewski&lt;/a&gt; wrote two in-depth and fascinating blogs about neural network projects which he completed in the past year: &lt;a href=&#34;/blog/2019/01/speech-recognition-with-tensorflow/&#34;&gt;Speech Recognition From Scratch&lt;/a&gt;, and &lt;a href=&#34;/blog/2018/08/self-driving-toy-car-using-the-a3c-algorithm/&#34;&gt;Self-Driving Toy Car&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;If you’re interested in getting a sense for the general state of the art, &lt;a href=&#34;https://www.topbots.com/most-important-ai-research-papers-2018/&#34;&gt;here&lt;/a&gt; are summaries of some of the most influential papers in machine learning since 2018.&lt;/li&gt;
&lt;li&gt;For those curious about the inner workings of the training process, here’s one about &lt;a href=&#34;http://neuralnetworksanddeeplearning.com/chap2.html&#34;&gt;back-propagation&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;This blog post describes “densely connected” network layers; here’s an article about &lt;a href=&#34;https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53&#34;&gt;convolutional layers&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;And finally, this article describes &lt;a href=&#34;https://medium.com/explore-artificial-intelligence/an-introduction-to-recurrent-neural-networks-72c97bf0912&#34;&gt;recurrent neural networks&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

      </content>
    </entry>
  
    <entry>
      <title>Facial Recognition Using Amazon DeepLens: Counting Liquid Galaxy Interactions</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2019/05/facial-recognition-amazon-deeplens/"/>
      <id>https://www.endpointdev.com/blog/2019/05/facial-recognition-amazon-deeplens/</id>
      <published>2019-05-01T00:00:00+00:00</published>
      <author>
        <name>Ben Ironside Goldstein</name>
      </author>
      <content type="html">
        &lt;p&gt;I have been exploring the possible uses of a machine-learning-enabled camera for the Liquid Galaxy. The Amazon Web Services (AWS) &lt;a href=&#34;https://aws.amazon.com/deeplens/&#34;&gt;DeepLens&lt;/a&gt; is a camera that can receive and transmit data over wifi, and that has computing hardware built in. Since its hardware enables it to use machine learning models, it can perform computer vision tasks in the field.&lt;/p&gt;
&lt;h3 id=&#34;the-amazon-deeplens-camera&#34;&gt;The Amazon DeepLens camera&lt;/h3&gt;
&lt;img style=&#34;float: left; width: 400px; padding-right: 2em;&#34; src=&#34;/blog/2019/05/facial-recognition-amazon-deeplens/deeplens-front-angle.jpg&#34; alt=&#34;DeepLens&#34; /&gt;
&lt;p&gt;This camera is the first of its kind—likely the first of many, given the ongoing rapid adoption of Internet of Things (IoT) devices and computer vision. It came to End Point’s attention as hardware that could potentially interface with and extend End Point’s immersive visualization platform, the &lt;a href=&#34;https://www.visionport.com/&#34;&gt;Liquid Galaxy&lt;/a&gt;. We’ve thought of several ways computer vision could potentially work to enhance the platform, for example:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Monitoring users’ reactions&lt;/li&gt;
&lt;li&gt;Counting unique visitors to the LG&lt;/li&gt;
&lt;li&gt;Counting the number of people using an LG at a given time&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first idea would depend on parsing facial expressions. Perhaps a certain moment in a user experience causes people to look confused, or particularly delighted—valuable insights. The second idea would generate data that could help us assess the platform’s impact, using a metric crucial to any potential clients whose goals involve engaging audiences. The third idea would create a simpler metric: the average number of people engaging with the system over a period of time. Nevertheless, this idea has a key advantage over the second: it doesn’t require distinguishing between people, which makes it a much more tractable project. This post focuses on the third idea.&lt;/p&gt;
&lt;p&gt;To set up the camera, the user has to plug it into a power outlet and connect it to wifi. The camera will still work even with a slow network connection, though the slower the connection, the longer the delay between the camera seeing something and reporting it. However, this delay was hardly noticeable on my home network, which has slow-to-moderate speeds of about 17 Mbps down and 33 Mbps up.&lt;/p&gt;
&lt;h3 id=&#34;computer-vision-and-the-amazon-deeplens&#34;&gt;Computer Vision and the Amazon DeepLens&lt;/h3&gt;
&lt;p&gt;A &lt;a href=&#34;https://en.wikipedia.org/wiki/Deep_learning&#34;&gt;deep learning model&lt;/a&gt; is a neural network with multiple layers of processing units, which is what makes it “deep”. The inputs and outputs of each processing unit are numbers. These units are roughly analogous to neurons: each receives input from units in the previous layer, transforms it based on a function, and passes the result to units in the next layer. These “activation functions” can take a variety of forms. The last layer’s outputs translate into the results. These models work because the functions are tuned based on how well the model performs. For example, to make a model that labels each human face in a picture and draws a box around it, we would start with a corpus of pictures with boxes drawn around faces, as well as versions of the pictures without the boxes. We would test the model on the non-labeled images by checking, for each picture, whether the output generated by the model is correct. If not, the training process adjusts the unit functions, tries again, and compares the results. Repeating this process thousands of times yields models which work remarkably well for a wide range of tasks, including computer vision.&lt;/p&gt;
&lt;p&gt;In deep learning for computer vision, training on large sets of labeled images enables models to generalize about visual characteristics. The training process takes a lot of computing resources, but once models are trained, they can produce results quickly and with relative ease. This is why the DeepLens is able to perform computer vision with its limited computing resources.&lt;/p&gt;
&lt;p&gt;Since the DeepLens is an Amazon product, it comes as no surprise that the user interface and backend for DeepLens consist of AWS services. One of the most important is &lt;a href=&#34;https://aws.amazon.com/sagemaker/&#34;&gt;SageMaker&lt;/a&gt;, which is used to train, manage, optimize, and deploy machine learning models such as neural networks. It includes hosted Jupyter notebooks (&lt;a href=&#34;https://jupyter.org/&#34;&gt;Jupyter&lt;/a&gt; is a development environment for data science), as well as the computing resources required for model training and storage. With SageMaker, users can train computer vision models for deployment to DeepLens, or import and adjust pretrained models from various sources.&lt;/p&gt;
&lt;p&gt;Remote management of the DeepLens depends on &lt;a href=&#34;https://aws.amazon.com/lambda/&#34;&gt;AWS Lambda&lt;/a&gt;, a “serverless” cloud service that provides an environment to run backend code and integrate with other cloud services. It runs the show, allowing users to manage everything from the camera’s behavior to what happens to gathered data. Another service, &lt;a href=&#34;https://aws.amazon.com/greengrass/&#34;&gt;AWS Greengrass&lt;/a&gt;, connects the instructions from AWS Lambda to the DeepLens, managing tasks like authentication, updates, and reactions to local events.&lt;/p&gt;
&lt;p&gt;Amazon’s IoT service saves information about each DeepLens, and allows users to manage their devices, for example by choosing which model is active on the device, or viewing a live stream from the camera. It also keeps track of what’s going on with the hardware, even when it’s off. When a model is running on the DeepLens, you can view a live stream of its inferences about what it’s seeing (the labeled images). Amazon has released various pretrained models designed to work on the DeepLens. Using a model for detecting faces, we can get a live stream that looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/blog/2019/05/facial-recognition-amazon-deeplens/one-face-recognition.jpg&#34; alt=&#34;one-face-recognition&#34;&gt;
&lt;br&gt;Me looking at the DeepLens in my kitchen&lt;/p&gt;
&lt;p&gt;&lt;img src=&#34;/blog/2019/05/facial-recognition-amazon-deeplens/multi-face-recognition.jpg&#34; alt=&#34;multi-face-recognition&#34;&gt;
&lt;br&gt;Facial recognition inferences on multiple people. (Witness my smile of satisfaction at finally finding enthusiastic subjects of facial recognition.)&lt;/p&gt;
&lt;p&gt;Each face that the camera detects gets a box around it, along with the model’s level of certainty that it is a face. The above pictures were the results of an attempt to simulate the conditions where this could be used.&lt;/p&gt;
&lt;h3 id=&#34;the-model&#34;&gt;The Model&lt;/h3&gt;
&lt;p&gt;The model I used was trained on data from &lt;a href=&#34;http://www.image-net.org/&#34;&gt;ImageNet&lt;/a&gt;, a public database with hundreds or thousands of images associated with nouns. (For example they have 1537 &lt;a href=&#34;http://www.image-net.org/synset?wnid=n03376595&#34;&gt;pictures of folding chairs&lt;/a&gt;.) ImageNet is &lt;a href=&#34;https://arxiv.org/search/?query=imagenet&amp;amp;searchtype=all&amp;amp;source=header&#34;&gt;commonly&lt;/a&gt; used to train and test computer vision models.&lt;/p&gt;
&lt;p&gt;However, the training for this model didn’t stop there: Amazon used transfer learning from another large image dataset, &lt;a href=&#34;http://cocodataset.org/#home&#34;&gt;MS-COCO&lt;/a&gt;, to fine-tune the model for face detection. Transfer learning works essentially by retraining the last layer of an already-trained model. In this way it harnesses the “insights” of the existing model (e.g. about shapes, colors, and positions) by repurposing this information to make predictions about something else. In this case, whether something is a face.&lt;/p&gt;
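&lt;p&gt;In a framework like PyTorch, this “retrain only the last layer” idea amounts to freezing the pretrained parameters and optimizing just a new head. A sketch with a made-up tiny backbone (not Amazon’s actual model):&lt;/p&gt;

```python
import torch
import torch.nn as nn

# A stand-in for a pretrained backbone; the real one would be loaded
# from a model zoo. This tiny network is purely illustrative.
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
head = nn.Linear(16, 1)  # new last layer: face / not-face score

for p in backbone.parameters():
    p.requires_grad = False  # keep the pretrained "insights" fixed

# Only the head's parameters get updated during fine-tuning.
optimizer = torch.optim.SGD(head.parameters(), lr=0.01)
score = head(backbone(torch.randn(1, 8)))
```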
&lt;p&gt;Since this model was pretrained and optimized by Amazon for the DeepLens, it provides a low effort route to implementing a computer vision model on the DeepLens. I didn’t have to do any of the processing on my own hardware. The DeepLens hardware took care of all the predictions, though the biggest resource savings were from not having to train the model myself (which can take days, or longer).&lt;/p&gt;
&lt;p&gt;When the facial recognition model is deployed and the DeepLens is on, an AWS Lambda function written in Python repeatedly prompts the camera for frames:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;frame = awscam.getLastFrame()&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&amp;hellip;to resize the frames before inference (the model accepts frames of particular size):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;frame_resize = cv2.resize(frame, (input_height, input_width))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&amp;hellip;to pass the frames to the model:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;parsed_inference_results = model.parseResult(model_type, model.doInference(frame_resize))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&amp;hellip;and to use the results to draw boxes around the faces:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;255&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;165&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;20&lt;/span&gt;), &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;10&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;As you can see from how often “cv2” appears in the code above, this implementation relies heavily on code from &lt;a href=&#34;https://opencv.org&#34;&gt;OpenCV&lt;/a&gt;, an open source computer vision framework. Finally, the results are sent to the cloud:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;client.publish(topic=iot_topic, payload=json.dumps(cloud_output))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In the last code snippet above, iot_topic refers to an Amazon “MQTT topic” (Message Queuing Telemetry Transport), for IoT devices. &lt;a href=&#34;https://en.wikipedia.org/wiki/MQTT&#34;&gt;MQTT&lt;/a&gt; is the standard connectivity framework for DeepLens and many other IoT devices. One of its advantages for this context is that it can handle situations with intermittent connectivity, by smoothly queueing messages for when the network connection is stable. The essence of MQTT is to enable publishing and subscribing to different topics. The system of topics enables results from a DeepLens to trigger other processes. For example, the DeepLens could publish a message when it sees a face, and this could prompt another cloud service to do something else, such as save what time and how long the face appeared.&lt;/p&gt;
&lt;p&gt;I wanted to test how data from this model would compare to a human’s perception. The first step was to understand what data the camera offers. It produces data about each frame analyzed: a timestamp (in 13-digit &lt;a href=&#34;https://en.wikipedia.org/wiki/Unix_time&#34;&gt;Unix time&lt;/a&gt;), and the predicted probability that something it identifies is a face. To gather this data, I used the AWS IoT service to manually subscribe to a secure MQTT topic where the DeepLens published its predictions. Each frame processed produces data like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-json&#34; data-lang=&#34;json&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;&amp;#34;format&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;json&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;&amp;#34;payload&amp;#34;&lt;/span&gt;: {
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;&amp;#34;face&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.5654296875&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  },
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;&amp;#34;qos&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;&amp;#34;timestamp&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1554853281975&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;  &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;&amp;#34;topic&amp;#34;&lt;/span&gt;: &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;$aws/things/deeplens_bnU5sr2sSD2ecW5YkfJZtw/infer&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The data generated by a single frame (with one face) when processed by the DeepLens.&lt;/p&gt;
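&lt;p&gt;For later analysis, each message can be parsed to pull out the timestamp and the face probabilities. A small sketch of my own (not code from the DeepLens tooling), using the sample message above:&lt;/p&gt;

```python
import json

message = '{"format": "json", "payload": {"face": 0.5654296875}, "qos": 0, "timestamp": 1554853281975}'

data = json.loads(message)
faces = data["payload"]  # one probability per detection in this sample

print(data["timestamp"])    # 1554853281975
print(len(faces))           # 1 face in this frame
print(max(faces.values()))  # 0.5654296875
```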
&lt;p&gt;For my purposes, I was only interested in the timestamps and payloads (which contain the number of faces identified, and their probabilities). I decided to test the facial recognition model under several different conditions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;No faces present&lt;/li&gt;
&lt;li&gt;One face present&lt;/li&gt;
&lt;li&gt;Multiple faces present&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For condition 1 I just aimed it at an empty room for 20 minutes, and for condition 2 I sat in front of the camera for 20 minutes. For condition 3, I aimed the camera at a public space for 20 minutes, and while it was running I kept an ongoing count of the number of people looking in the general direction of the camera (I put the camera in front of a wall with a TV on it so people would be more likely to look towards it). Then I averaged my count over the duration of the sample, which resulted in an average engagement number of 2.5 people, meaning that on average, 2.5 people were looking at the camera. In an attempt to minimize bias, I made my human-eye assessment before looking at any of the data.&lt;/p&gt;
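&lt;p&gt;The averaging itself is simple: given the counts sampled over the window, the engagement number is just their mean (the counts below are hypothetical, not my actual observations):&lt;/p&gt;

```python
# Hypothetical per-observation counts of people looking at the camera,
# sampled over the 20-minute window.
counts = [2, 3, 2, 3, 2, 3, 2, 3]

average_engagement = sum(counts) / len(counts)
print(average_engagement)  # 2.5 people on average
```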
&lt;p&gt;I’ll spoil one aspect of the results right away: there were no false positives under any condition. Even the lower-probability guesses corresponded to actual faces. This result might not hold true in a room with lots of face-like art, but that’s not a common scenario. This simplified things, since it meant there was no need to set a lower bound on the probabilities which we should count—any face detected by the camera is a face. This also highlights one of my remaining questions about the model: is there useful information to be gained from the probabilities?&lt;/p&gt;
&lt;p&gt;Another important note: I noticed early in the experiment that it almost never detects a face farther than 15 feet away. For the use case of a Liquid Galaxy, the 15-foot range is too short to capture all types of engagement (some people look at it across the room), but from my experience with the system I think that users within this range could be accurately described as focused users—something worth measuring, but certainly not everything worth measuring. After noticing this, I retested condition 2 with my face about 5 feet from the DeepLens, after initially trying it from across a room.&lt;/p&gt;
&lt;h3 id=&#34;how-did-the-deeplens-counts-compare-to-my-counts&#34;&gt;How did the DeepLens counts compare to my counts?&lt;/h3&gt;
&lt;p&gt;&lt;img src=&#34;/blog/2019/05/facial-recognition-amazon-deeplens/results.png&#34; alt=&#34;results&#34;&gt;&lt;/p&gt;
&lt;p&gt;The model matched my performance in conditions 1 and 2, which makes a strong statement about its reliability in relatively static and close-up conditions such as looking at an empty room, or looking at someone stare at their laptop across a small table. In contrast, it did not count as many faces as I did in condition 3—so I’m happy to report I can still outperform A.I. on something.&lt;/p&gt;
&lt;p&gt;Anyway, this suggests that the model is somewhat conservative, at least compared to my count (likely partly due to my eyes having a range larger than 15 feet). Therefore, when considering usage statistics gathered by a similar method, it might make most sense to think of the results as a lower bound, e.g. “the average number of people focused on the system was more than 2.1”.&lt;/p&gt;
&lt;p&gt;It would be useful to experiment with the multiple faces condition again, to see how robust these findings are. It would also be helpful to keep track of factors like how much people move, the lighting, and the orientation of the camera, to see if they might impact the results. It would also be useful to automate the data collection and analysis.&lt;/p&gt;
&lt;p&gt;This investigation has shown me that the DeepLens has a lot of potential as a tool for measuring engagement. Perhaps a future post will examine how it can be used to count users.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Thanks for reading! You are welcome to learn more about &lt;a href=&#34;https://www.visionport.com/&#34;&gt;End Point Liquid Galaxy&lt;/a&gt; and &lt;a href=&#34;https://aws.amazon.com/deeplens/&#34;&gt;AWS DeepLens&lt;/a&gt;.&lt;/p&gt;

      </content>
    </entry>
  
    <entry>
      <title>Self-driving toy car using the Asynchronous Advantage Actor-Critic algorithm</title>
      <link rel="alternate" href="https://www.endpointdev.com/blog/2018/08/self-driving-toy-car-using-the-a3c-algorithm/"/>
      <id>https://www.endpointdev.com/blog/2018/08/self-driving-toy-car-using-the-a3c-algorithm/</id>
      <published>2018-08-29T00:00:00+00:00</published>
      <author>
        <name>Kamil Ciemniewski</name>
      </author>
      <content type="html">
        &lt;link rel=&#34;stylesheet&#34; href=&#34;https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.7.1/katex.min.css&#34;&gt;
&lt;script src=&#34;https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.7.1/katex.min.js&#34;&gt;&lt;/script&gt;
&lt;script src=&#34;https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.7.1/contrib/auto-render.min.js&#34;&gt;&lt;/script&gt;
&lt;style&gt;
.katex .op-symbol.large-op {
    line-height: 1.2 !important;
}

.mtight {
    font-size: 0.95em;
}
&lt;/style&gt;
&lt;center&gt;
  &lt;video width=&#34;100%&#34; controls poster=&#34;/blog/2018/08/self-driving-toy-car-using-the-a3c-algorithm/poster.png&#34;&gt;
    &lt;source src=&#34;/blog/2018/08/self-driving-toy-car-using-the-a3c-algorithm/892-openaigym.video.90.68.video000000.mp4&#34; type=&#34;video/mp4&#34;&gt;
  &lt;/video&gt;
&lt;/center&gt;
&lt;p&gt;The field of &lt;a href=&#34;https://en.wikipedia.org/wiki/Reinforcement_learning&#34;&gt;Reinforcement Learning&lt;/a&gt; has seen great improvement in recent years. Researchers at universities and companies like &lt;a href=&#34;https://deepmind.com/&#34;&gt;Deep Mind&lt;/a&gt; have been developing new and better ways to train intelligent, artificial agents to solve more and more difficult tasks. The algorithms being developed require less time to train and make the training much more stable.&lt;/p&gt;
&lt;p&gt;This article is about an algorithm that’s one of the most cited lately: A3C — Asynchronous Advantage Actor-Critic.&lt;/p&gt;
&lt;p&gt;As the subject is both wide and deep, I’m assuming the reader has already mastered the relevant background. The article may still be interesting without understanding most of the notions in use, but a good grasp of them will help you get the most out of it.&lt;/p&gt;
&lt;p&gt;Because we’re looking at Deep Reinforcement Learning, the obvious requirement is to be acquainted with &lt;a href=&#34;https://en.wikipedia.org/wiki/Artificial_neural_network&#34;&gt;neural networks&lt;/a&gt;. I’m also using notions known from the field of &lt;a href=&#34;https://en.wikipedia.org/wiki/Reinforcement_learning&#34;&gt;Reinforcement Learning&lt;/a&gt; overall, like the $Q(a, s)$ and $V(s)$ functions or the n-step return. The mathematical expressions, in particular, are given assuming that the reader already knows what the symbols stand for. Some notions known from other families of RL algorithms are touched on as well (e.g. experience replay), to contrast them with the A3C way of solving the same kind of problems. The article along with the source code uses the &lt;a href=&#34;https://gym.openai.com&#34;&gt;OpenAI gym&lt;/a&gt;, Python, and &lt;a href=&#34;https://pytorch.org&#34;&gt;PyTorch&lt;/a&gt; among other Python-related libraries.&lt;/p&gt;
&lt;h3 id=&#34;theory&#34;&gt;Theory&lt;/h3&gt;
&lt;p&gt;The A3C algorithm is a part of the greater class of RL algorithms called &lt;a href=&#34;http://www.scholarpedia.org/article/Policy_gradient_methods&#34;&gt;Policy Gradients&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In this approach, we’re creating a model that &lt;strong&gt;approximates the action-choosing policy itself&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Let’s contrast it with &lt;a href=&#34;https://en.wikipedia.org/wiki/Markov_decision_process#Value_iteration&#34;&gt;value iteration&lt;/a&gt;, the goal of which is to learn the &lt;a href=&#34;https://en.wikipedia.org/wiki/Reinforcement_learning#Value_function&#34;&gt;value function&lt;/a&gt; and have policy emerge as the function that chooses an action transitioning to the state of the greatest value.&lt;/p&gt;
&lt;p&gt;With the policy gradient approach, we’re approximating the policy with a differentiable function. Stated this way, the problem requires only a good approximation of the gradient that over time will maximize the rewards.&lt;/p&gt;
&lt;p&gt;The unique approach of A3C adds a very clever twist: we’re also learning an approximation of the value function at the same time. This helps us in getting the variance of the gradient down considerably, making the training much more stable.&lt;/p&gt;
&lt;p&gt;These two aspects of the algorithm are personified in its name: actor-critic. The policy function approximation is called the actor, while the value function approximation is called the critic.&lt;/p&gt;
&lt;h4 id=&#34;the-policy-gradient&#34;&gt;The policy gradient&lt;/h4&gt;
&lt;p&gt;As we’ve noticed already, in order to improve our policy function approximation, we need a gradient that points at the direction that maximizes the rewards.&lt;/p&gt;
&lt;p&gt;I’m not going to reinvent the wheel here. There are some great resources the reader can access to dig deep into the Mathematics of what’s called the Policy Gradient Theorem:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html&#34;&gt;Lilian Weng’s excellent article&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;http://incompleteideas.net/book/bookdraft2017nov5.pdf&#34;&gt;Sutton &amp;amp; Barto — Reinforcement Learning: An Introduction&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The following equation presents the basic form of the gradient of the policy function:&lt;/p&gt;
&lt;p&gt;$$\nabla_{\theta} J(\theta) = E_{\tau}\left[R_{\tau}\cdot\nabla_\theta\sum_{t=0}^{T-1}\log\pi(a_t|s_t;\theta)\right]$$&lt;/p&gt;
&lt;p&gt;This states that for each sampled trajectory $\tau$, the correct estimate of the gradient is the expected value of the rewards times the action probabilities moved into the log space. Ascending in this direction makes our rewards greater and greater over time.&lt;/p&gt;
&lt;p&gt;We &lt;strong&gt;can&lt;/strong&gt; derive all the needed intermediary gradients ourselves by hand of course. Because we’re using &lt;a href=&#34;https://pytorch.org&#34;&gt;PyTorch&lt;/a&gt; though, we only need the right loss function.&lt;/p&gt;
&lt;p&gt;Let’s figure out the right loss function formula that will produce the gradient as shown above:&lt;/p&gt;
&lt;p&gt;$$L_\theta=-J(\theta)$$&lt;/p&gt;
&lt;p&gt;Also:&lt;/p&gt;
&lt;p&gt;$$J(\theta)=E_\tau\left[R_\tau\cdot\sum_{t=0}^{T-1}\log\pi(a_t|s_t;\theta)\right]$$&lt;/p&gt;
&lt;p&gt;Hence:&lt;/p&gt;
&lt;p&gt;$$L_\theta=-\frac{1}{n}\sum_{t=0}^{n-1}R_t\cdot\log\pi(a_t|s_t;\theta)$$&lt;/p&gt;
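&lt;p&gt;As a minimal PyTorch sketch of this loss (the function name and the toy numbers are illustrative, not taken from the final implementation shown later):&lt;/p&gt;

```python
import torch

def policy_gradient_loss(log_probs, returns):
    # log_probs: log pi(a_t | s_t; theta) for the sampled actions, shape (n,)
    # returns:   accumulated rewards R_t for the same steps, shape (n,)
    # The minus sign turns gradient ascent on J(theta) into a loss to minimize.
    return -(returns * log_probs).mean()

# Toy usage with made-up numbers:
probs = torch.tensor([0.2, 0.5, 0.3], requires_grad=True)
loss = policy_gradient_loss(torch.log(probs), torch.tensor([1.0, -0.5, 2.0]))
loss.backward()  # probs.grad now holds d(loss)/d(probs)
```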
&lt;h4 id=&#34;formalizing-the-accumulation-of-rewards&#34;&gt;Formalizing the accumulation of rewards&lt;/h4&gt;
&lt;p&gt;For now, we’ve been using the $R_\tau$ and $R_t$ terms very abstractly. Let’s make this part more intuitive and concrete now.&lt;/p&gt;
&lt;p&gt;Their true meaning is “the quality of the sampled trajectory”. Consider the following equation:&lt;/p&gt;
&lt;p&gt;$$R_t=\sum_{i=t}^{t+N}\gamma^{i-t}r_i+\gamma^{N+1}V(s_{t+N+1})$$&lt;/p&gt;
&lt;p&gt;Each $r_i$ is the reward received from the environment after each step. Each trajectory consists of multiple steps. Each time, we’re sampling actions based on our policy function. This gives probabilities of a given action being best given the state.&lt;/p&gt;
&lt;p&gt;What if we take 5 actions for which we’re not given any reward, but which overall help us get rewarded in the 6th step? This is exactly the case we’ll be dealing with later in this article when training a toy car to drive based only on the pixel values of the scene. In that environment, we’ll be given a $-0.1$ “negative” reward each step and something close to $7$ for each new “tile” the car reaches while staying on the road.&lt;/p&gt;
&lt;p&gt;We need a way to still encourage actions that make us earn rewards in a not too distant future. We also need to be smart and &lt;strong&gt;discount&lt;/strong&gt; future rewards somewhat so that the more immediate the reward is to our action, the more emphasis we put on it.&lt;/p&gt;
&lt;p&gt;That’s exactly what the above equation does. Notice that $\gamma$ becomes a hyperparameter. It makes sense to give it a value from $(0, 1)$. Let’s consider the following list of rewards: $[r_1, r_2, r_3, r_4]$. For $r_1$, the formula for the discounted accumulated reward is:&lt;/p&gt;
&lt;p&gt;$$R_1=r_1+\gamma r_2+\gamma^2r_3+\gamma^3r_4+\gamma^4V(s_5)$$&lt;/p&gt;
&lt;p&gt;For $r_2$ it’s:&lt;/p&gt;
&lt;p&gt;$$R_2=r_2+\gamma r_3+\gamma^2r_4+\gamma^3V(s_5)$$&lt;/p&gt;
&lt;p&gt;And so on&amp;hellip; In the case where we hit the terminal state, having no “next” state, we substitute $0$ for $V(s_{t+N+1})$.&lt;/p&gt;
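&lt;p&gt;Concretely, this accumulation is usually computed by walking the collected rewards backwards, starting from the bootstrap value. A small sketch (the function name is mine; &lt;code&gt;v_last&lt;/code&gt; stands for the bootstrap value, or $0$ at a terminal state):&lt;/p&gt;

```python
def discounted_returns(rewards, v_last, gamma=0.99):
    # rewards: the r_i collected over an n-step rollout
    # v_last:  the critic's estimate V(s) for the state after the last step
    #          (use 0.0 when the rollout ended in a terminal state)
    returns = []
    R = v_last
    for r in reversed(rewards):
        R = r + gamma * R          # R_t = r_t + gamma * R_{t+1}
        returns.append(R)
    returns.reverse()
    return returns

# Four rewards of 1 with a bootstrap value of 10 and gamma = 0.5:
discounted_returns([1.0, 1.0, 1.0, 1.0], 10.0, gamma=0.5)  # -> [2.5, 3.0, 4.0, 6.0]
```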
&lt;p&gt;We’ve said that in A3C we’re learning the value function at the same time. The $R_t$ as described above becomes the target value when training our $V(s)$. The value function becomes an approximation of the average of the rewards given the state (because $R_t$ depends on us sampling actions in this state).&lt;/p&gt;
&lt;h4 id=&#34;making-the-gradients-more-stable&#34;&gt;Making the gradients more stable&lt;/h4&gt;
&lt;p&gt;One of the greatest inhibitors of the policy gradient performance is what’s broadly called “high variance”.&lt;/p&gt;
&lt;p&gt;I have to admit, the first time I saw that term in this context, I was disoriented. I knew what “variance” was. It’s the “variance of what” that was not clear to me.&lt;/p&gt;
&lt;p&gt;Thankfully I found &lt;a href=&#34;https://www.quora.com/Why-does-the-policy-gradient-method-have-a-high-variance?share=1&#34;&gt;a brilliant answer to this question&lt;/a&gt;. It explains the issue simply yet in detail.&lt;/p&gt;
&lt;p&gt;Let me cite it here:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;When we talk about high variance in the policy gradient method, we’re specifically talking about the facts that the variance of the gradients are high — namely, that $Var(\nabla_{\theta} J(\theta))$ is big.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;To put it in simple terms: because we’re &lt;strong&gt;sampling&lt;/strong&gt; trajectories from the space that is stochastic in nature, we’re bound to have those samples give gradients that disagree a lot on the best direction to take our model’s parameters into.&lt;/p&gt;
&lt;p&gt;I encourage the reader to pause now and read the above-mentioned answer, as it’s vital. The gist of the solution described in it is that we can &lt;strong&gt;subtract a baseline value from each $R_t$&lt;/strong&gt;. An example of a good baseline given there is an &lt;strong&gt;average of the sampled accumulated rewards&lt;/strong&gt;. The A3C algorithm uses this insight in a very clever way.&lt;/p&gt;
&lt;h4 id=&#34;value-function-as-a-baseline&#34;&gt;Value function as a baseline&lt;/h4&gt;
&lt;p&gt;To learn the $V(s)$ we’re typically using the MSE or Huber loss against the accumulated rewards for each step. This means that over time we’re &lt;strong&gt;averaging those rewards out based on the state we’re finding ourselves in&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Improving our gradient formula with those ideas we now get:&lt;/p&gt;
&lt;p&gt;$$\nabla_{\theta} J(\theta) = E_{\tau}\left[\nabla_\theta\sum_{t=0}^{T-1}\log\pi(a_t|s_t;\theta)\cdot(R_t-V(s_t))\right]$$&lt;/p&gt;
&lt;p&gt;It’s important to treat the $(R_t-V(s_t))$ term &lt;strong&gt;as a constant&lt;/strong&gt;. This means that when using PyTorch or any other deep learning framework, the computation of it should occur &lt;strong&gt;outside the graph that influences the gradients&lt;/strong&gt;.&lt;/p&gt;
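&lt;p&gt;In PyTorch, that typically amounts to calling &lt;code&gt;.detach()&lt;/code&gt; on the advantage before it multiplies the log-probabilities. A sketch with illustrative tensors:&lt;/p&gt;

```python
import torch

returns = torch.tensor([2.5, 3.0, 4.0])                      # R_t, treated as data
values = torch.tensor([2.0, 3.5, 3.0], requires_grad=True)   # critic outputs V(s_t)
log_probs = torch.tensor([-1.2, -0.7, -0.9], requires_grad=True)

# Detaching makes the advantage a constant with respect to theta, so no
# gradient flows back into the critic through the policy term.
advantage = (returns - values).detach()
policy_loss = -(log_probs * advantage).mean()
policy_loss.backward()  # updates log_probs.grad only; values.grad stays empty
```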
&lt;p&gt;The enhanced part of the equation is where we get the word “advantage” in the algorithm’s name. The &lt;strong&gt;advantage&lt;/strong&gt; is simply the difference between the accumulated rewards and what those rewards are &lt;strong&gt;on average&lt;/strong&gt; for the given state:&lt;/p&gt;
&lt;p&gt;$$A(a_{t..t+n},s_{t..t+n})=R_t(a_{t..t+n},s_{t..t+n})-V(s_t)$$&lt;/p&gt;
&lt;p&gt;If we rewrite $R_t$ as $Q(s,a)$, as it’s commonly written in the literature, we arrive at the formula:&lt;/p&gt;
&lt;p&gt;$$A(s,a)=Q(s,a) - V(s)$$&lt;/p&gt;
&lt;p&gt;What’s the intuition here? Imagine that you’re playing chess with a 5-year-old. You win by a huge margin. Your friend who’s watched lots of master-level games observed this one as well. His take is that even though you scored positively, you still made lots of mistakes. You’ve got your &lt;strong&gt;critic&lt;/strong&gt; here. Your score and what it looks like for the “observing critic” combined is what we call the advantage of the actions you took.&lt;/p&gt;
&lt;h4 id=&#34;guarding-against-the-models-overconfidence&#34;&gt;Guarding against the model’s overconfidence&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;Although he was warned, Icarus was too young and too enthusiastic about flying. He got excited by the thrill of flying and carried away by the amazing feeling of freedom and started flying high to salute the sun, diving low to the sea, and then up high again.
His father Daedalus was trying in vain to make young Icarus to understand that his behavior was dangerous, and Icarus soon saw his wings melting.
Icarus fell into the sea and drowned.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://www.greekmyths-greekmythology.com/myth-of-daedalus-and-icarus/&#34;&gt;The Myth Of Daedalus And Icarus&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The job of an “actor” is to output probability values for each possible action the agent can take. The greater the probability, the greater the model’s confidence that this action will result in the highest reward.&lt;/p&gt;
&lt;p&gt;What if, at some point, the weights are steered in a way that makes the model &lt;em&gt;overconfident&lt;/em&gt; about some particular action? If this happens before the model has learned much, it becomes a huge problem.&lt;/p&gt;
&lt;p&gt;Because we’re using the $\pi(a|s;\theta)$ distribution to sample trajectories with, we’re not sampling totally at random. In other words, for $\pi(a|s;\theta) = [0.1, 0.4, 0.2, 0.3]$ our sampling chooses the second option 40% of the time. With any action overwhelming the others, we’re losing the ability to &lt;strong&gt;explore&lt;/strong&gt; different paths and thus learn valuable lessons.&lt;/p&gt;
&lt;p&gt;Empirically, I have sometimes seen the process unable to escape the “overconfidence” area for long, long hours.&lt;/p&gt;
&lt;h4 id=&#34;regularizing-with-entropy&#34;&gt;Regularizing with entropy&lt;/h4&gt;
&lt;p&gt;Let’s introduce the notion of &lt;a href=&#34;https://en.wikipedia.org/wiki/Entropy_(information_theory)&#34;&gt;entropy&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In simple words, in our case it’s a measure of how much uncertainty a given probability distribution possesses. It’s maximized for the uniform distribution. Here’s the formula:&lt;/p&gt;
&lt;p&gt;$$H(X)=E[-\log_b(P(X))]$$&lt;/p&gt;
&lt;p&gt;This expands to the following:&lt;/p&gt;
&lt;p&gt;$$H(X)=-\sum_{i=1}^{n}P(x_i)\log_b(P(x_i))$$&lt;/p&gt;
&lt;p&gt;Let’s look closer at the values this function produces using the following simple &lt;a href=&#34;https://calca.io&#34;&gt;Calca&lt;/a&gt; code:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;uniform = [0.25, 0.25, 0.25, 0.25]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;more confident = [0.5, 0.25, 0.15, 0.10]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;over confident = [0.95, 0.01, 0.01, 0.03]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;super over confident = [0.99, 0.003, 0.004, 0.003]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;y(x) = x*log(x, 10)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;entropy(dist) = -sum(map(y, dist))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;entropy (uniform) =&amp;gt; 0.6021
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;entropy (more confident) =&amp;gt; 0.5246
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;entropy (over confident) =&amp;gt; 0.1068
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;entropy (super over confident) =&amp;gt; 0.0291&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
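&lt;p&gt;The same check can be reproduced in Python, using base-10 logarithms to match the Calca values above:&lt;/p&gt;

```python
from math import log10

def entropy(dist):
    # H(X) = -sum p * log10(p); base 10 matches the Calca snippet above
    return -sum(p * log10(p) for p in dist)

entropy([0.25, 0.25, 0.25, 0.25])   # uniform: about 0.6021, the maximum
entropy([0.95, 0.01, 0.01, 0.03])   # over confident: about 0.1068
```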

&lt;p&gt;We can use the above to “punish” the model whenever it’s too confident in its choices. As we’re going to use gradient descent, we’ll be minimizing terms that appear in our loss function. Minimizing the entropy as shown above would encourage more confidence though. We’ll need to negate it in the loss to make it work the way we intend:&lt;/p&gt;
&lt;p&gt;$$L_\theta=-\frac{1}{n}\sum_{t=0}^{n-1}\log\pi(a_t|s_t;\theta)\cdot(R_t-V(s_t))-\beta\,H(\pi(a_t|s_t;\theta))$$&lt;/p&gt;
&lt;p&gt;Where $\beta$ is a hyperparameter scaling the effect of the entropy penalty on the gradients. Choosing the right value for $\beta$ is vital for the model’s convergence. In this article, I’m using $0.01$, as with $0.001$ I was still observing the process stuck being overconfident.&lt;/p&gt;
&lt;p&gt;Let’s include the value loss $L_v$ in the loss function formula making it full and ready to be implemented:&lt;/p&gt;
&lt;p&gt;$$L_\theta=-\frac{1}{n}\sum_{t=0}^{n-1}\log\pi(a_t|s_t;\theta)\cdot(R_t-V(s_t))+\alpha L_v-\beta\,H(\pi(a_t|s_t;\theta))$$&lt;/p&gt;
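&lt;p&gt;Putting the three terms together, a sketch of the full loss in PyTorch (the function and argument names are mine, and the mean squared error stands in for $L_v$; the final implementation later in the article may differ in details):&lt;/p&gt;

```python
import torch
import torch.nn.functional as F

def a3c_loss(log_probs, probs_all, values, returns, alpha=0.5, beta=0.01):
    # log_probs: log pi(a_t|s_t; theta) of the taken actions, shape (n,)
    # probs_all: full action distributions pi(.|s_t; theta), shape (n, num_actions)
    # values:    critic outputs V(s_t), shape (n,)
    # returns:   n-step accumulated rewards R_t, shape (n,)
    advantage = (returns - values).detach()                     # constant w.r.t. theta
    policy_loss = -(log_probs * advantage).mean()
    value_loss = F.mse_loss(values, returns)                    # L_v
    entropy = -(probs_all * probs_all.log()).sum(dim=1).mean()  # H(pi)
    return policy_loss + alpha * value_loss - beta * entropy

# Toy call with a single 4-action step:
loss = a3c_loss(torch.log(torch.tensor([0.25])),
                torch.tensor([[0.25, 0.25, 0.25, 0.25]]),
                torch.tensor([1.0]),
                torch.tensor([2.0]))
```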
&lt;h4 id=&#34;the-last-a-in-a3c&#34;&gt;The last A in A3C&lt;/h4&gt;
&lt;p&gt;So far we’ve gone from the vanilla policy gradients to using the notion of an advantage. We’ve also improved it with the baseline that intuitively makes the model consist of two parts: the actor and the critic. At this point, we have what’s sometimes called the A2C — Advantage Actor-Critic.&lt;/p&gt;
&lt;p&gt;Let us now focus on the last piece of the puzzle: the last A. This last A comes from the word “asynchronous”. It’s been explained very clearly in the &lt;a href=&#34;https://arxiv.org/pdf/1602.01783&#34;&gt;original paper on A3C&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I think this idea is the least complex of all those that have a place in the approach. I’ll just comment on what was already written:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;These approaches share a common idea: the sequence of observed data encountered by an online RL agent is non-stationary, and online RL updates are strongly correlated. By storing the agent’s data in an experience replay memory, the data can be batched (Riedmiller, 2005; Schulman et al., 2015a) or randomly sampled (Mnih et al., 2013; 2015; Van Hasselt et al., 2015) from different time-steps. Aggregating over memory in this way reduces non-stationarity and decorrelates updates, but at the same time limits the methods to off-policy reinforcement learning algorithms.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;A3C’s unique approach is that it doesn’t use experience replay to de-correlate the updates to the model’s weights. Instead, we sample many different trajectories &lt;strong&gt;at the same time&lt;/strong&gt; in an &lt;strong&gt;asynchronous&lt;/strong&gt; manner.&lt;/p&gt;
&lt;p&gt;This means that we create many clones of the environment and let our agents experience them at the same time. Separate agents share their weights in one way or another. There are implementations with agents sharing those weights very &lt;strong&gt;literally&lt;/strong&gt;, performing the updates to the weights on their own whenever they need to. There are also implementations with one main agent holding the main weights and doing the updates based on the gradients reported by the “worker” agents. The worker agents are then updated with the evolved weights. The environments and agents are not directly synchronized, each working at its own speed. As soon as any of them collects the rewards needed to perform the n-step gradient calculations, the gradients are applied in one way or another.&lt;/p&gt;
&lt;p&gt;In this article, I prefer the second approach: having one “main” agent and making workers synchronize their weights with it each n-step period.&lt;/p&gt;
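&lt;p&gt;A stripped-down sketch of that synchronization pattern, with a toy linear model standing in for the full agent (in real training each worker would run in its own process, e.g. with &lt;code&gt;torch.multiprocessing&lt;/code&gt;, and the loss would come from rollouts rather than the dummy one here):&lt;/p&gt;

```python
import torch
import torch.nn as nn

# A toy linear model stands in for the full actor-critic agent.
main_agent = nn.Linear(4, 2)
optimizer = torch.optim.Adam(main_agent.parameters(), lr=1e-3)

worker = nn.Linear(4, 2)
worker.load_state_dict(main_agent.state_dict())  # start of a period: sync down

# The worker rolls out n steps and computes its loss locally (dummy loss here):
loss = worker(torch.randn(8, 4)).pow(2).mean()
loss.backward()

# Report the worker's gradients to the main agent and evolve the main weights:
for main_p, worker_p in zip(main_agent.parameters(), worker.parameters()):
    main_p.grad = worker_p.grad.clone()
optimizer.step()

# Next n-step period: the worker picks up the evolved weights again.
worker.load_state_dict(main_agent.state_dict())
```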
&lt;h3 id=&#34;practice&#34;&gt;Practice&lt;/h3&gt;
&lt;h4 id=&#34;the-challenge&#34;&gt;The challenge&lt;/h4&gt;
&lt;p&gt;To present the above theory in practical terms, we’re going to code the A3C to train a toy self-driving game car. The algorithm will only have the game’s pixels as inputs. We’re also going to collect rewards.&lt;/p&gt;
&lt;p&gt;Each step, the player decides how to move the steering wheel, how much throttle to apply, and how much brake.&lt;/p&gt;
&lt;p&gt;Points are assigned for each new “tile” that the car enters while staying within the road. Every other case incurs a small penalty of $-0.1$ points.&lt;/p&gt;
&lt;p&gt;We’re going to use &lt;a href=&#34;https://gym.openai.com&#34;&gt;OpenAI Gym&lt;/a&gt; and the environment’s called &lt;a href=&#34;https://gym.openai.com/envs/CarRacing-v0/&#34;&gt;CarRacing&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;You can read a bit more about the setup in the environment’s source code on &lt;a href=&#34;https://github.com/openai/gym/blob/master/gym/envs/box2d/car_racing.py&#34;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id=&#34;coding-the-agent&#34;&gt;Coding the Agent&lt;/h4&gt;
&lt;p&gt;Our agent is going to output both $\pi(a|s;\theta)$ as well as $V(s)$. We’re going to use a GRU unit to give the agent the ability to remember its previous actions and the environment’s previous features.&lt;/p&gt;
&lt;p&gt;I’ve also decided to use PReLU instead of ReLU activations, as it &lt;strong&gt;appeared&lt;/strong&gt; to me that the agent was learning much quicker this way (although I don’t have any numbers to back this impression up).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt;: the code presented below &lt;strong&gt;has not been refactored&lt;/strong&gt; in any way. If this was going to be used in production I’d certainly hugely clean it up.&lt;/p&gt;
&lt;p&gt;Here’s the full listing of the agent’s class:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;Agent&lt;/span&gt;(nn.Module):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;__init__&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, **kwargs):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;super&lt;/span&gt;(Agent, &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;).&lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;__init__&lt;/span&gt;(**kwargs)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.init_args = kwargs
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.h = torch.zeros(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;256&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.norm1 = nn.BatchNorm2d(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.norm2 = nn.BatchNorm2d(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;32&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.conv1 = nn.Conv2d(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;4&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;32&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;4&lt;/span&gt;, stride=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;, padding=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.conv2 = nn.Conv2d(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;32&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;32&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;3&lt;/span&gt;, stride=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;, padding=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.conv3 = nn.Conv2d(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;32&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;32&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;3&lt;/span&gt;, stride=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;, padding=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.conv4 = nn.Conv2d(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;32&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;32&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;3&lt;/span&gt;, stride=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt;, padding=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.gru = nn.GRUCell(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1152&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;256&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.policy = nn.Linear(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;256&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.value = nn.Linear(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;256&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.prelu1 = nn.PReLU()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.prelu2 = nn.PReLU()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.prelu3 = nn.PReLU()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.prelu4 = nn.PReLU()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        nn.init.xavier_uniform_(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.conv1.weight, gain=nn.init.calculate_gain(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;leaky_relu&amp;#39;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        nn.init.constant_(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.conv1.bias, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.01&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        nn.init.xavier_uniform_(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.conv2.weight, gain=nn.init.calculate_gain(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;leaky_relu&amp;#39;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        nn.init.constant_(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.conv2.bias, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.01&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        nn.init.xavier_uniform_(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.conv3.weight, gain=nn.init.calculate_gain(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;leaky_relu&amp;#39;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        nn.init.constant_(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.conv3.bias, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.01&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        nn.init.xavier_uniform_(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.conv4.weight, gain=nn.init.calculate_gain(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;leaky_relu&amp;#39;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        nn.init.constant_(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.conv4.bias, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.01&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        nn.init.constant_(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.gru.bias_ih, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        nn.init.constant_(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.gru.bias_hh, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        nn.init.xavier_uniform_(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.policy.weight, gain=nn.init.calculate_gain(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;leaky_relu&amp;#39;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        nn.init.constant_(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.policy.bias, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.01&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        nn.init.xavier_uniform_(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.value.weight, gain=nn.init.calculate_gain(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;leaky_relu&amp;#39;&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        nn.init.constant_(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.value.bias, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.01&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.train()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;reset&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.h = torch.zeros(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;256&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;clone&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, num=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; [ &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.clone_one() &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; _ &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;range&lt;/span&gt;(num) ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;clone_one&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; Agent(**&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.init_args)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;forward&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, state):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        state = state.view(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;4&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;96&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;96&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        state = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.norm1(state)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        data = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.prelu1(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.conv1(state))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        data = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.prelu2(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.conv2(data))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        data = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.prelu3(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.conv3(data))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        data = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.prelu4(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.conv4(data))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        data = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.norm2(data)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        data = data.view(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;, -&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        h = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.gru(data, &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.h)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.h = h.detach()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        pre_policy = h.view(-&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        policy = F.softmax(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.policy(pre_policy), dim=-&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        value = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.value(pre_policy)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; policy, value&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can immediately notice that the actor and critic share most of their weights; they differ only in their final layers.&lt;/p&gt;
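&lt;p&gt;The shared-trunk idea can be boiled down to a minimal sketch. This is not the network above: &lt;code&gt;TinyActorCritic&lt;/code&gt; and its layer sizes are made up for illustration, but the shape is the same — one shared body, with the &lt;code&gt;policy&lt;/code&gt; and &lt;code&gt;value&lt;/code&gt; heads as the only unshared layers:&lt;/p&gt;

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyActorCritic(nn.Module):
    """Minimal shared-trunk actor-critic: one body, two heads."""

    def __init__(self, obs_dim=8, hidden=32, num_actions=4):
        super().__init__()
        # shared trunk: every layer here serves both the actor and the critic
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # the only unshared layers:
        self.policy = nn.Linear(hidden, num_actions)  # actor head
        self.value = nn.Linear(hidden, 1)             # critic head

    def forward(self, state):
        features = self.trunk(state)
        policy = F.softmax(self.policy(features), dim=-1)
        value = self.value(features)
        return policy, value

net = TinyActorCritic()
policy, value = net(torch.zeros(1, 8))
print(policy.shape, value.shape)  # torch.Size([1, 4]) torch.Size([1, 1])
```

&lt;p&gt;Gradients from both the policy loss and the value loss flow into the same trunk, which is what makes the sharing pay off.&lt;/p&gt;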
&lt;p&gt;Next, I wanted to abstract out the notion of the “runner”, which encapsulates the idea of a “running agent”. Think of it as the game player: the joystick plus the brain deciding how to score points. I’m discretizing the continuous action space as follows:&lt;/p&gt;
&lt;table&gt;
  &lt;tr&gt;
    &lt;th&gt;Action name&lt;/th&gt;
    &lt;th&gt;Value [steering, gas, brake]&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Turn left&lt;/td&gt;
    &lt;td&gt;[-0.8, 0.0, 0.0]&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Turn right&lt;/td&gt;
    &lt;td&gt;[0.8, 0.0, 0.0]&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Full throttle&lt;/td&gt;
    &lt;td&gt;[0.0, 0.1, 0.0]&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;Brake&lt;/td&gt;
    &lt;td&gt;[0.0, 0.0, 0.6]&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
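&lt;p&gt;Acting then reduces to sampling one of the four indices from the policy and looking up its control vector. A small sketch of that lookup — the &lt;code&gt;ACTIONS&lt;/code&gt; list and the &lt;code&gt;sample_action&lt;/code&gt; helper are illustrative names, mirroring what &lt;code&gt;Runner.choose_action&lt;/code&gt; does below:&lt;/p&gt;

```python
import numpy as np

# the four discretized actions from the table above: [steering, gas, brake]
ACTIONS = [
    [-0.8, 0.0, 0.0],  # turn left
    [0.8, 0.0, 0.0],   # turn right
    [0.0, 0.1, 0.0],   # full throttle
    [0.0, 0.0, 0.6],   # brake
]

def sample_action(policy, train=True):
    """Sample from the policy during training; act greedily during evaluation."""
    probs = np.asarray(policy, dtype=np.float64)
    probs /= probs.sum()  # guard against tiny float drift before np.random.choice
    if train:
        action_ix = int(np.random.choice(len(ACTIONS), p=probs))
    else:
        action_ix = int(np.argmax(probs))
    return ACTIONS[action_ix], action_ix

action, ix = sample_action([0.1, 0.1, 0.7, 0.1], train=False)
print(ix, action)  # 2 [0.0, 0.1, 0.0]
```

&lt;p&gt;Sampling during training keeps exploration alive; the greedy argmax is reserved for evaluation runs.&lt;/p&gt;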
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;Runner&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;__init__&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, agent, ix, train = &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;True&lt;/span&gt;, **kwargs):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.agent = agent
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.train = train
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.ix = ix
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.reset = &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;False&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.states = []
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# each runner has its own environment:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.env = gym.make(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;CarRacing-v0&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;get_value&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        Returns just the current state&amp;#39;s value.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        This is used when approximating the R.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        If the last step was
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        not terminal, then we&amp;#39;re substituting the &amp;#34;r&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        with V(s) - hence, we need a way to just
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        get that V(s) without moving forward yet.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        &amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        _input = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.preprocess(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.states)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        _, _, _, value = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.decide(_input)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; value
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;run_episode&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, yield_every = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;10&lt;/span&gt;, do_render = &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;False&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        The episode runner written in the generator style.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        This is meant to be used in a &amp;#34;for (...) in run_episode(...):&amp;#34; manner.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        Each value generated is a tuple of:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        step_ix: the current &amp;#34;step&amp;#34; number
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        rewards: the list of rewards as received from the environment (without discounting yet)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        values: the list of V(s) values, as predicted by the &amp;#34;critic&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        policies: the list of policies as received from the &amp;#34;actor&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        actions: the list of actions as sampled based on policies
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        terminal: whether we&amp;#39;re in a &amp;#34;terminal&amp;#34; state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        &amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.reset = &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;False&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        step_ix = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        rewards, values, policies, actions = [[], [], [], []]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.env.reset()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# we&amp;#39;re going to feed the last 4 frames to the neural network that acts as the &amp;#34;actor-critic&amp;#34; duo. We&amp;#39;ll use the &amp;#34;deque&amp;#34; to efficiently drop too old frames always keeping its length at 4:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        states = deque([ ])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# we&amp;#39;re pre-populating the states deque by taking first 4 steps as &amp;#34;full throttle forward&amp;#34;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;while&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;len&lt;/span&gt;(states) &amp;lt; &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;4&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            _, r, _, _ = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.env.step([&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            state = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.env.render(mode=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;rgb_array&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            states.append(state)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            logger.info(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;Init reward &amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(r) )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# we need to repeat the following as long as the game is not over yet:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;while&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;True&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#888&#34;&gt;# the frames need to be preprocessed (I&amp;#39;m explaining the reasons later in the article)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            _input = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.preprocess(states)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#888&#34;&gt;# asking the neural network for the policy and value predictions:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            action, action_ix, policy, value = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.decide(_input, step_ix)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#888&#34;&gt;# taking the step and receiving the reward along with info if the game is over:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            _, reward, terminal, _ = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.env.step(action)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#888&#34;&gt;# explicitly rendering the scene (again, this will be explained later)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            state = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.env.render(mode=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;rgb_array&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#888&#34;&gt;# update the last 4 states deque:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            states.append(state)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;while&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;len&lt;/span&gt;(states) &amp;gt; &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;4&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                states.popleft()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#888&#34;&gt;# if we&amp;#39;ve been asked to render into the window (e. g. to capture the video):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; do_render:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.env.render()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.states = states
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            step_ix += &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            rewards.append(reward)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            values.append(value)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            policies.append(policy)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            actions.append(action_ix)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#888&#34;&gt;# periodically save the state&amp;#39;s screenshot along with the numerical values in an easy to read way:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.ix == &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;2&lt;/span&gt; &lt;span style=&#34;color:#080&#34;&gt;and&lt;/span&gt; step_ix % &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;200&lt;/span&gt; == &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                fname = &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;./screens/car-racing/screen-&amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(step_ix) + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;-&amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;int&lt;/span&gt;(time.time())) + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;.jpg&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                im = Image.fromarray(state)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                im.save(fname)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                state.tofile(fname + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;.txt&amp;#39;&lt;/span&gt;, sep=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34; &amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                _input.numpy().tofile(fname + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;.input.txt&amp;#39;&lt;/span&gt;, sep=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34; &amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#888&#34;&gt;# if it&amp;#39;s game over or we hit the &amp;#34;yield every&amp;#34; value, yield the values from this generator:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; terminal &lt;span style=&#34;color:#080&#34;&gt;or&lt;/span&gt; step_ix % yield_every == &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;yield&lt;/span&gt; step_ix, rewards, values, policies, actions, terminal
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                rewards, values, policies, actions = [[], [], [], []]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#888&#34;&gt;# following is a very tacky way to allow external using code to mark that it wants us to reset the environment, finishing the episode prematurely. (this would be hugely refactored in the production code but for the sake of playing with the algorithm itself, it&amp;#39;s good enough):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.reset:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.reset = &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;False&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.agent.reset()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                states = deque([ ])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.states = deque([ ])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; terminal:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.agent.reset()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                states = deque([ ])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;ask_reset&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.reset = &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;preprocess&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, states):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; torch.stack([ torch.tensor(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.preprocess_one(image_data), dtype=torch.float32) &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; image_data &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; states ])
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;preprocess_one&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, image):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        Scales the rendered image and makes it grayscale
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        &amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; rescale(rgb2gray(image), (&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.24&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.16&lt;/span&gt;), anti_aliasing=&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;False&lt;/span&gt;, mode=&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;edge&amp;#39;&lt;/span&gt;, multichannel=&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;False&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;choose_action&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, policy, step_ix):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        Chooses an action to take based on the policy and whether we&amp;#39;re in the training mode or not. During training, it samples based on the probability values in the policy. During the evaluation, it takes the most probable action in a greedy way.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        &amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        policies = [[-&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.8&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;], [&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.8&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;], [&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.1&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;], [&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.0&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.6&lt;/span&gt;]]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.train:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            action_ix = np.random.choice(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;4&lt;/span&gt;, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;, p=torch.tensor(policy).detach().numpy())[&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;else&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            action_ix = np.argmax(torch.tensor(policy).detach().numpy())
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        logger.info(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;Step &amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(step_ix) + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39; Runner &amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.ix) + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39; Action ix: &amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(action_ix) + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39; From: &amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(policy))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; np.array(policies[action_ix], dtype=np.float32), action_ix
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;decide&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, state, step_ix = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;999&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        policy, value = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.agent(state)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        action, action_ix = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.choose_action(policy, step_ix)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; action, action_ix, policy, value
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;load_state_dict&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, state):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        As we&amp;#39;ll have multiple &amp;#34;worker&amp;#34; runners, they will need to be able to sync their agents&amp;#39; weights with the main agent.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        This function loads the weights into this runner&amp;#39;s agent.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        &amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.agent.load_state_dict(state)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;I’m also encapsulating the training process in a class of its own. Notice that the gradients are clipped before being applied. I’m also clipping the rewards into the range of $&amp;lt;-3, 3&amp;gt;$ to help keep the variance low.&lt;/p&gt;
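&lt;p&gt;The two kinds of clipping can be sketched in plain Python. This is only an illustration of the idea; the &lt;code&gt;max_norm&lt;/code&gt; value below is an arbitrary choice for the sketch, not necessarily what this project uses:&lt;/p&gt;

```python
import math

def clip_reward(reward, low=-3.0, high=3.0):
    # Squash a raw environment reward into the (-3, 3) range
    # to keep the variance of the returns low.
    return max(low, min(high, reward))

def clip_gradients(grads, max_norm=40.0):
    # Scale the whole gradient vector down when its L2 norm exceeds
    # max_norm -- the same idea as torch.nn.utils.clip_grad_norm_.
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        scale = max_norm / norm
        grads = [g * scale for g in grads]
    return grads
```

&lt;p&gt;For example, &lt;code&gt;clip_reward(-7.5)&lt;/code&gt; yields &lt;code&gt;-3.0&lt;/code&gt;, and &lt;code&gt;clip_gradients([30.0, 40.0])&lt;/code&gt; rescales the vector so its norm is exactly 40.&lt;/p&gt;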
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;Trainer&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;__init__&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, gamma, agent, window = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;15&lt;/span&gt;, workers = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;8&lt;/span&gt;, **kwargs):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;super&lt;/span&gt;().&lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;__init__&lt;/span&gt;(**kwargs)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.agent = agent
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.window = window
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.gamma = gamma
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.optimizer = optim.Adam(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.agent.parameters(), lr=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1e-4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.workers = workers
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# even though we&amp;#39;re loading the weights into the worker agents explicitly, I found that without also sharing the weights as follows, the algorithm was not converging:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.agent.share_memory()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;fit&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, episodes = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1000&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        The higher-level method for training the agents.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        It calls into the lower-level &amp;#34;train&amp;#34;, which orchestrates the process itself.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        &amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        last_update = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        updates = &lt;span style=&#34;color:#038&#34;&gt;dict&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; ix &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;range&lt;/span&gt;(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;, &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.workers + &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            updates[ ix ] = { &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;episode&amp;#39;&lt;/span&gt;: &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;step&amp;#39;&lt;/span&gt;: &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;rewards&amp;#39;&lt;/span&gt;: deque(), &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;losses&amp;#39;&lt;/span&gt;: deque(), &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;points&amp;#39;&lt;/span&gt;: &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;mean_reward&amp;#39;&lt;/span&gt;: &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;, &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;mean_loss&amp;#39;&lt;/span&gt;: &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; update &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.train(episodes):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            now = time.time()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#888&#34;&gt;# you could do something useful here with the updates dict.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#888&#34;&gt;# I&amp;#39;ve opted not to, as I&amp;#39;m using logging anyway and get more value from just watching the log file and grepping for the desired values&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#888&#34;&gt;# save the current model&amp;#39;s weights every minute:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; now - last_update &amp;gt; &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;60&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                torch.save(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.agent.state_dict(), &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;./checkpoints/car-racing/&amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;int&lt;/span&gt;(now)) + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;-.pytorch&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                last_update = now
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;train&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, episodes = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1000&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        Lower level training orchestration method. Written in the generator style. Intended to be used with &amp;#34;for update in train(...):&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        &amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# create the requested number of background agents and runners:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        worker_agents = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.agent.clone(num = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.workers)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        runners = [ Runner(agent=agent, ix = ix + &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;, train = &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;True&lt;/span&gt;) &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; ix, agent &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;enumerate&lt;/span&gt;(worker_agents) ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# we&amp;#39;re going to communicate the workers&amp;#39; updates via the thread-safe queue:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        queue = mp.SimpleQueue()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# if we haven&amp;#39;t been given a number of episodes, assume the process will run until the user (us) stops it with a keyboard interrupt:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; episodes &lt;span style=&#34;color:#080&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;None&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#038&#34;&gt;print&lt;/span&gt;(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;Starting out an infinite training process&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# create the actual background processes, making their entry be the train_one method:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        processes = [ mp.Process(target=&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.train_one, args=(runners[ix - &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;], queue, episodes, ix)) &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; ix &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;range&lt;/span&gt;(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;, &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.workers + &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;) ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# run those processes:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; process &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; processes:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            process.start()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;try&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#888&#34;&gt;# what follows is a rather naive implementation of listening to the workers&amp;#39; updates. it works for our purposes, though:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;while&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;any&lt;/span&gt;([ process.is_alive() &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; process &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; processes ]):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                results = queue.get()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;yield&lt;/span&gt; results
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;except&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;Exception&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;as&lt;/span&gt; e:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            logger.error(&lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(e))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;train_one&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, runner, queue, episodes = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1000&lt;/span&gt;, ix = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        Orchestrate the training for a single worker runner and agent. This is intended to run in its own background process.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        &amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# a possibly naive way of trying to de-correlate the weight updates further (I have no hard evidence that it works, only my subjective observation):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        time.sleep(ix)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;try&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#888&#34;&gt;# we are going to request that the episode be reset whenever our agent scores lower than its max points. the same will happen if the agent scores a total of -10 points:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            max_points = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            max_eval_points = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            min_points = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            max_episode = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; episode_ix &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; itertools.count(start=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;, step=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; episodes &lt;span style=&#34;color:#080&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#080&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;None&lt;/span&gt; &lt;span style=&#34;color:#080&#34;&gt;and&lt;/span&gt; episode_ix &amp;gt;= episodes:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                max_episode_points = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                points = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#888&#34;&gt;# load up the newest weights every new episode:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                runner.load_state_dict(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.agent.state_dict())
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#888&#34;&gt;# every 5 episodes, let&amp;#39;s evaluate the weights we&amp;#39;ve learned so far by recording a run of the car using the greedy strategy:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; ix == &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#080&#34;&gt;and&lt;/span&gt; episode_ix % &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;5&lt;/span&gt; == &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    eval_points = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.record_greedy(episode_ix)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; eval_points &amp;gt; max_eval_points:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                        torch.save(runner.agent.state_dict(), &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;./checkpoints/car-racing/&amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(eval_points) + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;-eval-points.pytorch&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                        max_eval_points = eval_points
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#888&#34;&gt;# each n-step window, compute the gradients and apply&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#888&#34;&gt;# also: decide if we shouldn&amp;#39;t restart the episode if we don&amp;#39;t want to explore too much of the not-useful state space:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; step, rewards, values, policies, action_ixs, terminal &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; runner.run_episode(yield_every=&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.window):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    points += &lt;span style=&#34;color:#038&#34;&gt;sum&lt;/span&gt;(rewards)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; ix == &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#080&#34;&gt;and&lt;/span&gt; points &amp;gt; max_points:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                        torch.save(runner.agent.state_dict(), &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;./checkpoints/car-racing/&amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(points) + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;-points.pytorch&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                        max_points = points
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; ix == &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt; &lt;span style=&#34;color:#080&#34;&gt;and&lt;/span&gt; episode_ix &amp;gt; max_episode:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                        torch.save(runner.agent.state_dict(), &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;./checkpoints/car-racing/&amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(episode_ix) + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;-episode.pytorch&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                        max_episode = episode_ix
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; points &amp;lt; -&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;10&lt;/span&gt; &lt;span style=&#34;color:#080&#34;&gt;or&lt;/span&gt; (max_episode_points &amp;gt; min_points &lt;span style=&#34;color:#080&#34;&gt;and&lt;/span&gt; points &amp;lt; min_points):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                        terminal = &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                        max_episode_points = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                        points = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                        runner.ask_reset()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; terminal:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                        logger.info(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;TERMINAL for &amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(ix) + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39; at step &amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(step) + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39; with total points &amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(points) + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39; max: &amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(max_episode_points) )
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#888&#34;&gt;# if we&amp;#39;re learning, then compute and apply the gradients and load the newest weights:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; runner.train:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                        loss = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.apply_gradients(policies, action_ixs, rewards, values, terminal, runner)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                        runner.load_state_dict(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.agent.state_dict())
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    max_episode_points = &lt;span style=&#34;color:#038&#34;&gt;max&lt;/span&gt;(max_episode_points, points)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    min_points = &lt;span style=&#34;color:#038&#34;&gt;max&lt;/span&gt;(min_points, points)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    &lt;span style=&#34;color:#888&#34;&gt;# communicate the gathered values to the main process:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    queue.put((ix, episode_ix, step, rewards, loss, points, terminal))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;except&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;Exception&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;as&lt;/span&gt; e:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            string = traceback.format_exc()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            logger.error(&lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(e) + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39; → &amp;#39;&lt;/span&gt; + string)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            queue.put((ix, -&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;, -&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;, [-&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;], -&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;, &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(e) + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;&amp;lt;br /&amp;gt;&amp;#39;&lt;/span&gt; + string, &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;True&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;record_greedy&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, episode_ix):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        Records the video of the &amp;#34;greedy&amp;#34; run based on the current weights.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        &amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        directory = &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;./videos/car-racing/episode-&amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(episode_ix) + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;-&amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;int&lt;/span&gt;(time.time()))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        player = Player(agent=&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.agent, directory=directory, train=&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;False&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        points = player.play()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        logger.info(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;Evaluation at episode &amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(episode_ix) + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;: &amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(points) + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39; points (&amp;#39;&lt;/span&gt; + directory + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;)&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; points
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;apply_gradients&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, policies, actions, rewards, values, terminal, runner):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        worker_agent = runner.agent
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        actions_one_hot = torch.tensor([[ &lt;span style=&#34;color:#038&#34;&gt;int&lt;/span&gt;(i == action) &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; i &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;range&lt;/span&gt;(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;4&lt;/span&gt;) ] &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; action &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; actions], dtype=torch.float32)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        policies = torch.stack(policies)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        values = torch.cat(values)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        values_nograd = torch.zeros_like(values.detach(), requires_grad=&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;False&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        values_nograd.copy_(values)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        discounted_rewards = &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.discount_rewards(runner, rewards, values_nograd[-&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;], terminal)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        advantages = discounted_rewards - values_nograd
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        logger.info(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;Runner &amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(runner.ix) + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39; Rewards: &amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(rewards))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        logger.info(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;Runner &amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(runner.ix) + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39; Discounted Rewards: &amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(discounted_rewards.numpy()))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        log_policies = torch.log(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.00000001&lt;/span&gt; + policies)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        one_log_policies = torch.sum(log_policies * actions_one_hot, dim=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        entropy = torch.sum(policies * -log_policies)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        policy_loss = -torch.mean(one_log_policies * advantages)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        value_loss = F.mse_loss(values, discounted_rewards)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        value_loss_nograd = torch.zeros_like(value_loss)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        value_loss_nograd.copy_(value_loss)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        policy_loss_nograd = torch.zeros_like(policy_loss)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        policy_loss_nograd.copy_(policy_loss)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        logger.info(&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39;Value Loss: &amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;float&lt;/span&gt;(value_loss_nograd)) + &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#39; Policy Loss: &amp;#39;&lt;/span&gt; + &lt;span style=&#34;color:#038&#34;&gt;str&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;float&lt;/span&gt;(policy_loss_nograd)))
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        loss = policy_loss + &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.5&lt;/span&gt; * value_loss - &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.01&lt;/span&gt; * entropy
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.agent.zero_grad()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        loss.backward()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        torch.nn.utils.clip_grad_norm_(worker_agent.parameters(), &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;40&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# the following step is crucial. at this point, all the info about the gradients reside in the worker agent&amp;#39;s memory. We need to &amp;#34;move&amp;#34; those gradients into the main agent&amp;#39;s memory:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.share_gradients(worker_agent)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#888&#34;&gt;# update the weights with the computed gradients:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.optimizer.step()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        worker_agent.zero_grad()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;float&lt;/span&gt;(loss.detach())
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;share_gradients&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, worker_agent):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; param, shared_param &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;zip&lt;/span&gt;(worker_agent.parameters(), &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.agent.parameters()):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; shared_param.grad &lt;span style=&#34;color:#080&#34;&gt;is&lt;/span&gt; &lt;span style=&#34;color:#080&#34;&gt;not&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;None&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            shared_param._grad = param.grad
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;clip_reward&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, reward):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;&amp;#34;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        Clips the rewards into the &amp;lt;-3, 3&amp;gt; range, preventing too large a variance in the gradients.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;        &amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;max&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;min&lt;/span&gt;(reward, &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;3&lt;/span&gt;), -&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;discount_rewards&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, runner, rewards, last_value, terminal):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        discounted_rewards = [&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt; &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; _ &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; rewards]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        loop_rewards = [ &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.clip_reward(reward) &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; reward &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; rewards ]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; terminal:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            loop_rewards.append(&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;else&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            loop_rewards.append(runner.get_value())
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; main_ix &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;range&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;len&lt;/span&gt;(discounted_rewards) - &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;, -&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;, -&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; inside_ix &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;range&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;len&lt;/span&gt;(loop_rewards) - &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;, -&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;, -&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; inside_ix &amp;gt;= main_ix:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    reward = loop_rewards[inside_ix]
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;                    discounted_rewards[main_ix] += &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.gamma**(inside_ix - main_ix) * reward
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; torch.tensor(discounted_rewards)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For the &lt;code&gt;record_greedy&lt;/code&gt; method to work, we need the following class:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;class&lt;/span&gt; &lt;span style=&#34;color:#b06;font-weight:bold&#34;&gt;Player&lt;/span&gt;(Runner):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;__init__&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;, directory, **kwargs):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;super&lt;/span&gt;().&lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;__init__&lt;/span&gt;(ix=&lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;999&lt;/span&gt;, **kwargs)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.env = Monitor(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.env, directory)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;def&lt;/span&gt; &lt;span style=&#34;color:#06b;font-weight:bold&#34;&gt;play&lt;/span&gt;(&lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        points = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;for&lt;/span&gt; step, rewards, values, policies, actions, terminal &lt;span style=&#34;color:#080&#34;&gt;in&lt;/span&gt; &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.run_episode(yield_every = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;1&lt;/span&gt;, do_render = &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;True&lt;/span&gt;):
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;            points += &lt;span style=&#34;color:#038&#34;&gt;sum&lt;/span&gt;(rewards)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#038&#34;&gt;self&lt;/span&gt;.env.close()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;        &lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;return&lt;/span&gt; points&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;All the above code can be used as follows (in the Python script):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#fff;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;if&lt;/span&gt; &lt;span style=&#34;color:#369&#34;&gt;__name__&lt;/span&gt; == &lt;span style=&#34;color:#d20;background-color:#fff0f0&#34;&gt;&amp;#34;__main__&amp;#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    agent = Agent()
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    trainer = Trainer(gamma = &lt;span style=&#34;color:#00d;font-weight:bold&#34;&gt;0.99&lt;/span&gt;, agent = agent)
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;    trainer.fit(episodes=&lt;span style=&#34;color:#080;font-weight:bold&#34;&gt;None&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4 id=&#34;the-importance-of-tuning-of-the-n-step-window-size&#34;&gt;The importance of tuning the n-step window size&lt;/h4&gt;
&lt;p&gt;Reading the code, you may notice that we’ve chosen $15$ as the size of the n-step window, and $\gamma=0.99$ as the discount factor. Getting those values right is a matter of tuning: the ones that work for one game or challenge will not necessarily work well for another.&lt;/p&gt;
&lt;p&gt;Here’s a quick way to think about them: we’re going to be penalized most of the time, so it’s important to give the algorithm a chance to actually find trajectories that score positively. In the “CarRacing” challenge, I found that it can take 10 steps of moving “full throttle” in the correct direction before we’re rewarded by entering a new “tile”. I simply added a safety margin of $5$ to that number. No mathematical proof backs this reasoning, but I can tell you that it made a &lt;strong&gt;huge&lt;/strong&gt; difference in training time for me. The version of the code presented above starts to score above 700 points after approximately 10 hours on my Ryzen 7 based computing box.&lt;/p&gt;
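&lt;p&gt;To make this concrete, here’s a small, standalone sketch of the n-step discounted return with the values from the post ($n = 15$, $\gamma = 0.99$). The reward numbers are invented for illustration; they are not taken from the environment:&lt;/p&gt;

```python
# A standalone sketch (not part of the training code above) of the n-step
# discounted return. The window size n=15 and gamma=0.99 match the values
# discussed in the post; the reward values are invented for illustration.

def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """R_t = r_t + gamma * R_{t+1}, seeded with the bootstrap value."""
    returns = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

# 14 small off-track penalties followed by a +1 reward for a new tile:
rewards = [-0.1] * 14 + [1.0]
rs = discounted_returns(rewards, bootstrap_value=0.0)
print([round(r, 3) for r in rs])
```

&lt;p&gt;With a window shorter than the 10-or-so steps it takes to reach a new tile, every return in the window would be negative; the extra headroom lets the positive tile reward show up in the returns of the steps leading into it.&lt;/p&gt;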
&lt;h4 id=&#34;problems-with-the-state-being-returned-from-the-environment---overcoming-with-the-explicit-render&#34;&gt;Problems with the state returned from the environment: overcoming them with an explicit render&lt;/h4&gt;
&lt;p&gt;You might also have noticed that I’m not using the state values returned by the &lt;code&gt;step&lt;/code&gt; method of the gym environment. This may seem contrary to how gym is typically used. After &lt;strong&gt;days&lt;/strong&gt; of not seeing my model converge, though, I found that the &lt;code&gt;step&lt;/code&gt; method was returning &lt;strong&gt;one and the same&lt;/strong&gt; numpy array &lt;strong&gt;on every call&lt;/strong&gt;. You can imagine that it was the absolute &lt;strong&gt;last&lt;/strong&gt; thing I checked when hunting for that bug.&lt;/p&gt;
&lt;p&gt;I’ve found that &lt;code&gt;render(mode=&#39;rgb_array&#39;)&lt;/code&gt; works as intended every time. I just needed to write my own preprocessing code to scale the frames down and convert them to grayscale.&lt;/p&gt;
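&lt;p&gt;For reference, here’s a minimal sketch of what such preprocessing can look like, using NumPy only. The 96x96 target size and the grayscale weights are my assumptions, not values taken from the post:&lt;/p&gt;

```python
import numpy as np

# A minimal preprocessing sketch. Assumptions: 96x96 output and standard
# luminance weights. The random input stands in for the RGB array that
# env.render(mode='rgb_array') returns.

def preprocess(frame, size=96):
    # grayscale via the usual luminance weights
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = frame.astype(np.float32) @ weights
    # crop to a multiple of the target size, then block-average to downscale
    h, w = gray.shape
    step_h, step_w = h // size, w // size
    gray = gray[: step_h * size, : step_w * size]
    small = gray.reshape(size, step_h, size, step_w).mean(axis=(1, 3))
    return small / 255.0

# a stand-in for the rendered frame:
frame = np.random.randint(0, 256, size=(384, 384, 3), dtype=np.uint8)
state = preprocess(frame)
print(state.shape)  # (96, 96)
```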
&lt;h4 id=&#34;how-to-know-when-the-algorithm-converges&#34;&gt;How to know when the algorithm converges&lt;/h4&gt;
&lt;p&gt;I’ve seen people conclude that their A3C implementation does not converge: the resulting policy didn’t seem to work all that well, and the training process took a bit longer than “some other implementation”. I fell for this kind of thinking myself. My humble bit of advice is to stick to what makes sense mathematically. Someone else’s model might be converging faster simply because of the hardware being used, or because of some slight difference in the code &lt;strong&gt;around&lt;/strong&gt; the training (e.g. the explicit render needed in my case). This might have nothing to do with the A3C part at all.&lt;/p&gt;
&lt;p&gt;How do we “stick to what makes sense mathematically”? Simply by logging the value loss and observing it as training continues. Intuitively, for a model that has converged, we should see that it has already learned the value function. Those values — representing the expectation of the discounted rewards — should keep the loss fairly small most of the time. Still, for some states the best action will make $R_t$ much bigger than $V(s_t)$, which means we should still see the loss spiking from time to time.&lt;/p&gt;
&lt;p&gt;Again, the above bit of advice doesn’t come with any mathematical proofs. It’s what I found working and making sense &lt;strong&gt;in my case&lt;/strong&gt;.&lt;/p&gt;
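&lt;p&gt;As a sketch of this kind of monitoring, here’s a tiny helper that keeps a running average of the logged value losses. The window size of 100 and the synthetic loss curve are arbitrary choices of mine, not taken from the training code:&lt;/p&gt;

```python
from collections import deque

# A sketch of watching the value loss for convergence. Nothing here is
# taken from the training code above; the loss curve is synthetic.

class LossMonitor:
    def __init__(self, window=100):
        self.losses = deque(maxlen=window)

    def update(self, loss):
        self.losses.append(loss)
        return sum(self.losses) / len(self.losses)

monitor = LossMonitor()
smoothed = 0.0
for step in range(300):
    # a made-up curve: a decaying trend plus an occasional spike,
    # mimicking the behaviour described in the text
    loss = 10.0 * 0.99 ** step + (5.0 if step % 97 == 0 else 0.0)
    smoothed = monitor.update(loss)
print(round(smoothed, 3))
```

&lt;p&gt;A flattening running average with occasional spikes poking through is consistent with a converged value function; a running average that keeps drifting is not.&lt;/p&gt;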
&lt;h3 id=&#34;the-results&#34;&gt;The Results&lt;/h3&gt;
&lt;p&gt;Instead of presenting hard-core statistics about the model’s performance — which wouldn’t make much sense, because I stopped it as soon as the “evaluation” videos started looking cool enough — I’ll just post three videos of the car driving on its own through three randomly generated tracks.&lt;/p&gt;
&lt;p&gt;Have fun watching and even more fun coding it yourself too!&lt;/p&gt;
&lt;center&gt;
  &lt;video width=&#34;100%&#34; controls poster=&#34;/blog/2018/08/self-driving-toy-car-using-the-a3c-algorithm/poster.png&#34;&gt;
    &lt;source src=&#34;/blog/2018/08/self-driving-toy-car-using-the-a3c-algorithm/873-openaigym.video.92.68.video000000.mp4&#34; type=&#34;video/mp4&#34;&gt;
  &lt;/video&gt;
&lt;/center&gt;
&lt;center&gt;
  &lt;video width=&#34;100%&#34; controls poster=&#34;/blog/2018/08/self-driving-toy-car-using-the-a3c-algorithm/poster.png&#34;&gt;
    &lt;source src=&#34;/blog/2018/08/self-driving-toy-car-using-the-a3c-algorithm/892-openaigym.video.90.68.video000000.mp4&#34; type=&#34;video/mp4&#34;&gt;
  &lt;/video&gt;
&lt;/center&gt;
&lt;center&gt;
  &lt;video width=&#34;100%&#34; controls poster=&#34;/blog/2018/08/self-driving-toy-car-using-the-a3c-algorithm/poster.png&#34;&gt;
    &lt;source src=&#34;/blog/2018/08/self-driving-toy-car-using-the-a3c-algorithm/904-openaigym.video.77.68.video000000.mp4&#34; type=&#34;video/mp4&#34;&gt;
  &lt;/video&gt;
&lt;/center&gt;
&lt;script&gt;
    renderMathInElement(
        document.body,
        {
            delimiters: [
                {left: &#34;$$&#34;, right: &#34;$$&#34;, display: true},
                {left: &#34;\\[&#34;, right: &#34;\\]&#34;, display: true},
                {left: &#34;$&#34;, right: &#34;$&#34;, display: false},
                {left: &#34;\\(&#34;, right: &#34;\\)&#34;, display: false}
            ]
        }
    );
&lt;/script&gt;

      </content>
    </entry>
  
</feed>
