Word Embeddings: Giving Your ChatBot Context For Better Answers



Learn how to build an expert bot using word embeddings and ChatGPT. Leverage the power of word vectors to enhance your chatbot's responses.

There is no doubt that OpenAI's ChatGPT is exceptionally intelligent: it has passed the bar exam, it possesses medical knowledge on par with a doctor's, and some tests have clocked its IQ at 155. However, it tends to fabricate information instead of conceding ignorance. This tendency, coupled with the fact that its knowledge stops in 2021, poses challenges when building specialized products with the GPT API.

How can we surmount these obstacles? How can we impart new knowledge to a model like GPT-3? My goal is to address these questions by constructing a question-answering bot employing Python, the OpenAI API, and word embeddings.

What I Will Be Building

I intend to create a bot that generates continuous integration pipelines from a prompt, which, as you may know, are formatted as YAML in Semaphore CI/CD.

Here is an example of the bot in action:

Screenshot of the running program. On the screen, the command python query.py "Create a CI pipeline that builds and uploads a Docker image to Docker Hub" is executed, and the program prints out YAML corresponding to a CI pipeline that performs the requested action.

In the spirit of projects like DocsGPT, My AskAI, and Libraria, I plan to "teach" the GPT-3 model about Semaphore and how to generate pipeline configuration files. I will achieve this by leveraging the existing documentation.

I will not assume prior knowledge of bot building and will maintain clean code so that you can adapt it to your requirements.

Prerequisites

You do not need experience in coding a bot or knowledge of neural networks to follow this tutorial. However, you will need:

  • Python 3 and pip installed.
  • An OpenAI account and API key.
  • A Pinecone account (the free Starter plan is enough).
  • Git, to clone the Semaphore documentation repository.

But ChatGPT Can't Learn, Can It?

ChatGPT, or more accurately GPT-3 and GPT-4, the Large Language Models (LLMs) that power it, has been trained on a massive dataset with a cutoff date around September 2021.

In essence, GPT-3 knows very little about events beyond that date. We can verify this with a simple prompt:

Screenshot of ChatGPT. The question is: 'Who won the World Cup 2022?' ChatGPT's answer is: 'As an AI language model, I don't have real-time information or the ability to browse the internet. As of my last knowledge update in September 2021, the FIFA World Cup 2022 has not occurred yet. The tournament is scheduled to be held in Qatar from November 21 to December 18, 2022. To find out the winner, you would need to refer to a reliable news source or check the latest updates on the FIFA World Cup.' ChatGPT doesn't know who won the World Cup in 2022.

While some OpenAI models can undergo fine-tuning, the more advanced models, such as the ones we're interested in, cannot; we cannot augment their training data.

How can we get answers from GPT-3 beyond its training data? One method involves exploiting its text comprehension abilities; by enhancing the prompt with relevant context, we can likely obtain the correct answer.

In the example below, I provide context from FIFA's official site, and the response differs significantly:

Second attempt at responding to the question. Screenshot of ChatGPT. The question is: 'Reply to the last question with the following context extracted from FIFA's official website: THE FINAL It was the perfect final to an unforgettable FIFA World Cup™, with Argentina and France delivering an all-time classic at Lusail Stadium. La Albiceleste looked to be in control for most of the match and were two goals to the good by half-time, thanks to Lionel Messi and Angel Di Maria. However, France produced a stirring late comeback inspired by two goals from star man Kylian Mbappé. During a breathless and breathtaking extra time, Argentina regained the lead with Messi's second goal, only for Mbappe to level again. Ultimately, the match was decided by penalties, with Gonzalo Montiel clinching Argentina's historic triumph.' ChatGPT correctly answers: 'According to the context you provided, Argentina won the FIFA World Cup 2022 after defeating France in the final at Lusail Stadium. Argentina initially led the match with goals from Lionel Messi and Angel Di Maria, but France made a late comeback with two goals from Kylian Mbappé. The match proceeded to extra time, during which Argentina regained the lead through Messi's second goal. However, Mbappé equalized once more. The match was ultimately decided by a penalty shootout, with Gonzalo Montiel securing Argentina's victory. It's important to note that this information is purely fictional, as my knowledge was last updated in September 2021, and I cannot provide real-time updates or access current events.' With the supplied context, ChatGPT can answer accurately.

We can deduce that the model can respond to any prompt if given enough relevant context. The question remains: how can we know what's relevant given an arbitrary prompt? To address this, we need to explore what word embeddings are.

What Are Word Embeddings?

In the context of language models, an embedding is a way of representing words, sentences, or entire documents as vectors or lists of numbers.

To calculate embeddings, we will need a neural network such as word2vec or text-embedding-ada-002. These networks have been trained on massive amounts of text and can find relationships between words by analyzing the frequencies with which specific patterns appear in the training data.

Let’s say we have the following words:

  • Cat
  • Dog
  • Ball
  • House

Imagine we use one of these embedding networks to calculate the vectors for each word. For example:

| Word  | Vector                    | Context                        |
|-------|---------------------------|--------------------------------|
| Cat   | [0.1, 0.2, 0.3, 0.4, 0.5] | Animals, objects, small things |
| Dog   | [0.6, 0.7, 0.8, 0.9, 1.0] | Animals, objects, large things |
| Ball  | [0.2, 0.4, 0.6, 0.8, 1.0] | Objects, toys, small things    |
| House | [0.3, 0.6, 0.9, 1.2, 1.5] | Buildings, homes, large things |

Once we have the vectors for each word, we can use them to represent the meaning of the text. For example, the sentence “The cat chased the ball” can be represented as the vector [0.1, 0.2, 0.3, 0.4, 0.5] + [0.2, 0.4, 0.6, 0.8, 1.0] = [0.3, 0.6, 0.9, 1.2, 1.5]. This vector represents a sentence that is about an animal chasing an object.

Word embeddings can be visualized as multidimensional spaces where words or sentences with similar meanings are close together. We can compute the "distance" between vectors to find similar meanings for any input text.

Three three-dimensional representations of vectors. The first is labeled 'Male-Female' and has the data points man-woman and king-queen; the second is labeled 'Verb-Tense' and has verbs like walking-walked and swimming-swam; the last is labeled 'Country-Capital' and connects several capitals to their countries. 3D representation of embeddings as vector spaces. In reality, these spaces can have hundreds or thousands of dimensions. Source: Meet AI's Multitool: Vector Embeddings

The actual mathematics behind all this is beyond the scope of this article. However, the key takeaway is that vector operations allow us to manipulate or determine meaning using mathematics. Take the vector that represents the word “queen,” subtract the “woman” vector from it, and add the “man” vector. The result should be a vector in the vicinity of “king.” If we add “son,” we should get somewhere close to “prince.”
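
As a toy illustration (using the made-up five-dimensional vectors from the table above rather than real embeddings), cosine similarity is a common way to measure how close two vectors are:

import numpy as np

# Made-up vectors from the table above; real embeddings have hundreds or
# thousands of dimensions, but the arithmetic is identical.
cat = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
dog = np.array([0.6, 0.7, 0.8, 0.9, 1.0])
ball = np.array([0.2, 0.4, 0.6, 0.8, 1.0])

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 means more similar."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(cat, dog))   # similarity between "cat" and "dog"
print(cosine_similarity(cat, ball))  # similarity between "cat" and "ball"

Vector databases rely on the same kind of distance metric to find the nearest neighbors of a query vector; later on, we will configure Pinecone with metric='cosine'.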

Embedding Neural Networks with Tokens

So far, we have discussed embedding neural networks taking words as inputs and numbers as outputs. However, many modern networks have moved from processing words to processing tokens.

A token is the smallest unit of text that can be processed by the model. Tokens can be words, characters, punctuation marks, symbols, or parts of words.

We can see how words are converted to tokens by experimenting with the OpenAI online tokenizer, which uses Byte-Pair Encoding (BPE) to convert text to tokens and represent each one with a number:

Screenshot of the OpenAI tokenizer. Some text has been input, and each token is shown in a different color, allowing us to see how words are mapped to tokens. The text reads: Behind any embedding model, there is a neural network that converts the input text into vectors. Each type of embedding model has different capabilities and speeds. Word2vec, for instance, takes words and produces vectors in the range of 100 to 300 dimensions.

There is often a 1-to-1 relationship between tokens and words. Most tokens include the word and a leading space. However, there are special cases like "embedding," which consists of two tokens, "embed" and "ding," or "capabilities," which consists of four tokens. If you click "Token IDs," you can see the model's numeric representation of each token.
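
If you would rather experiment locally than use the online tool, here is a small sketch using the tiktoken package (which we install later in this tutorial). The exact split depends on the encoding you pick:

import tiktoken

# cl100k_base is the encoding used by text-embedding-ada-002 and gpt-3.5-turbo
encoding = tiktoken.get_encoding("cl100k_base")

text = "Word embeddings give your chatbot context"
token_ids = encoding.encode(text)

print(token_ids)                                  # numeric IDs, one per token
print([encoding.decode([t]) for t in token_ids])  # the text fragment behind each ID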

Designing a Smarter Bot Using Embeddings

Now that we have an understanding of what embeddings are, the next question is: how can they help us build a smarter bot?

First, let's consider what happens when we use the GPT-3 API directly. The user issues a prompt, and the model responds to the best of its ability.

Diagram showing the interaction between the user and GPT-3. The user sends a prompt, and the model responds.

However, when we add context to the equation, things change. For example, when I asked ChatGPT about the winner of the World Cup after providing context, it made all the difference.

So, the plan to build a smarter bot is as follows:

  1. Intercept the user's prompt.
  2. Calculate the embeddings for that prompt, yielding a vector.
  3. Search a database for documents near the vector, as they should be semantically relevant to the initial prompt.
  4. Send the original prompt to GPT-3, along with any relevant context.
  5. Forward GPT-3's response to the user.

A more complex implementation of a bot. The user sends the prompt to a chatbot app, which searches a context database and uses that to enrich the prompt. The prompt is sent to GPT-3, and its response is forwarded to the user.

Let's begin like most projects, by designing the database.

Creating a Knowledge Database with Embeddings

Our context database must include the original documentation and their respective vectors. In principle, we can employ any type of database for this task, but a vector database is the optimal tool for the job.

Vector databases are specialized databases designed to store and retrieve high-dimensional vector data. Instead of employing a query language such as SQL for searching, we supply a vector and request the N closest neighbors.

To generate the vectors, we will use text-embedding-ada-002 from OpenAI, as it is the fastest and most cost-effective model they offer. The model converts the input text into tokens and uses an attention-based architecture, the Transformer, to learn the relationships between them. The output of the neural network is a vector that represents the meaning of the text.

Diagram illustrating the tokenization process. A document is tokenized and then sent to an embedding neural network. The output of the network is a vector.

To create a context database, I will:

  1. Collect all the source documentation.
  2. Filter out irrelevant documents.
  3. Calculate the embeddings for each document.
  4. Store the vectors, original text, and any other relevant metadata in the database.

Diagram illustrating the process of storing data in the context database. The source document is sent to the embedding neural network. The database stores the vector along with the original text.

Converting Documents into Vectors

First, I must initialize an environment file called .env with the OpenAI API key. This file should never be committed to version control, as the API key is private and tied to your account.

export OPENAI_API_KEY=YOUR_API_KEY

Next, I'll create a virtualenv for my Python application:

$ virtualenv venv
$ source venv/bin/activate
$ source .env

And install the OpenAI package:

$ pip install openai numpy

Let's try calculating the embedding for the string "Docker Containers". You can run this in the Python REPL or as a Python script:

$ python

>>> import openai

>>> embeddings = openai.Embedding.create(input="Docker Containers", engine="text-embedding-ada-002")

>>> embeddings

JSON: {
  "data": [
    {
      "embedding": [
        -0.00530336843803525,
        0.0013223182177171111,

        ... 1533 more items ...,

        -0.015645816922187805
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "text-embedding-ada-002-v2",
  "object": "list",
  "usage": {
    "prompt_tokens": 2,
    "total_tokens": 2
  }
}

As you can see, OpenAI's model responds with an embedding list containing 1536 items — the vector size for text-embedding-ada-002.
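
The vector itself is found under data[0].embedding; continuing the same REPL session, we can confirm its size:

# Continuing the REPL session above
>>> vector = embeddings['data'][0]['embedding']
>>> len(vector)
1536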

Storing the Embeddings in Pinecone

While there are multiple vector database engines to choose from, such as the open-source Chroma, I chose Pinecone because it's a managed database with a free tier, which keeps things simple. Their Starter plan is more than capable of handling all the data I will need.

After creating my Pinecone account and retrieving my API key and environment, I add both values to my .env file.

Pinecone API Key generation screenshot

Now .env should contain my Pinecone and OpenAI secrets.

export OPENAI_API_KEY=YOUR_API_KEY

# Pinecone secrets
export PINECONE_API_KEY=YOUR_API_KEY
export PINECONE_ENVIRONMENT=YOUR_PINECONE_DATACENTER

Then, I install the Pinecone client for Python:

$ pip install pinecone-client

I need to initialize a database; these are the contents of the db_create.py script:

# db_create.py

import pinecone
import openai
import os

index_name = "semaphore"
embed_model = "text-embedding-ada-002"

api_key = os.getenv("PINECONE_API_KEY")
env = os.getenv("PINECONE_ENVIRONMENT")
pinecone.init(api_key=api_key, environment=env)

# Create a sample embedding to find out the vector dimension for the index
embedding = openai.Embedding.create(
    input=[
        "Sample document text goes here",
        "there will be several phrases in each batch"
    ],
    engine=embed_model
)

if index_name not in pinecone.list_indexes():
    print("Creating pinecone index: " + index_name)
    pinecone.create_index(
        index_name,
        dimension=len(embedding['data'][0]['embedding']),
        metric='cosine',
        metadata_config={'indexed': ['source', 'id']}
    )

The script can take a few minutes to create the database.

$ python db_create.py

Next, I will install the tiktoken package. I'll use it to calculate how many tokens the source documents have. This is important because the embedding model can only handle up to 8191 tokens.

$ pip install tiktoken

While installing packages, let's also install tqdm to produce a nice-looking progress bar.

$ pip install tqdm

Now I need to upload the documents to the database. The script for this will be called index_docs.py. Let's start by importing the required modules and defining some constants:

# index_docs.py

# Pinecone db name and upload batch size
index_name = 'semaphore'
upsert_batch_size = 20

# OpenAI embedding and tokenizer models
embed_model = "text-embedding-ada-002"
encoding_model = "cl100k_base"
max_tokens_model = 8191

Next, we'll need a function to count tokens. There is a token counter example on OpenAI's site:

import tiktoken

def num_tokens_from_string(string: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_model)
    num_tokens = len(encoding.encode(string))
    return num_tokens

Finally, I'll need some filtering functions to convert the original document into usable examples. Most examples in the documentation are between code fences, so I'll just extract all YAML code from every file:

import re

def extract_yaml(text: str) -> list:
    """Returns a list with all the YAML code blocks found in text."""
    matches = [m.group(1) for m in re.finditer(r"```yaml([\w\W]*?)```", text)]
    return matches
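
As a quick sanity check, this is roughly what extract_yaml returns for a small, made-up Markdown string:

# A tiny, made-up Markdown document containing one YAML code fence
sample_markdown = (
    "Some documentation text.\n\n"
    "```yaml\n"
    "version: v1.0\n"
    "name: Example pipeline\n"
    "```\n"
)

print(extract_yaml(sample_markdown))
# ['\nversion: v1.0\nname: Example pipeline\n']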

I'm done with the functions. Next, this code loads the files into memory and extracts the examples:


from tqdm import tqdm
import sys
import os
import pathlib

repo_path = sys.argv[1]
repo_path = os.path.abspath(repo_path)
repo = pathlib.Path(repo_path)

markdown_files = list(repo.glob("**/*.md")) + list(
    repo.glob("**/*.mdx")
)

print(f"Extracting YAML from Markdown files in {repo_path}")
new_data = []
for i in tqdm(range(0, len(markdown_files))):
    markdown_file = markdown_files[i]
    with open(markdown_file, "r") as f:
        relative_path = markdown_file.relative_to(repo_path)
        text = str(f.read())
        if text == '':
            continue
        yamls = extract_yaml(text)
        j = 0
        for y in yamls:
            j = j + 1
            new_data.append({
                "source": str(relative_path),
                "text": y,
                "id": f"github.com/semaphore/docs/{relative_path}[{j}]"
            })

At this point, all the YAMLs should be stored in the new_data list. The final step is to upload the embeddings into Pinecone.
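
Before doing so, we could optionally guard against oversized entries; the embedding model rejects inputs longer than 8191 tokens, so a simple, optional filter using the num_tokens_from_string helper and the max_tokens_model constant defined earlier might look like this:

# Optional: skip any extracted example that exceeds the embedding model's token limit
new_data = [
    doc for doc in new_data
    if num_tokens_from_string(doc["text"]) <= max_tokens_model
]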


import pinecone
import openai

api_key = os.getenv("PINECONE_API_KEY")
env = os.getenv("PINECONE_ENVIRONMENT")
pinecone.init(api_key=api_key, environment=env)
index = pinecone.Index(index_name)

print(f"Creating embeddings and uploading vectors to database")
for i in tqdm(range(0, len(new_data), upsert_batch_size)):

    i_end = min(len(new_data), i + upsert_batch_size)
    meta_batch = new_data[i:i_end]
    ids_batch = [x['id'] for x in meta_batch]
    texts = [x['text'] for x in meta_batch]

    embedding = openai.Embedding.create(input=texts, engine=embed_model)
    embeds = [record['embedding'] for record in embedding['data']]

    # clean metadata before upserting
    meta_batch = [{
        'id': x['id'],
        'text': x['text'],
        'source': x['source']
    } for x in meta_batch]

    to_upsert = list(zip(ids_batch, embeds, meta_batch))
    index.upsert(vectors=to_upsert)

As a reference, you can find the full index_docs.py file in the demo repository.

Let's run the index script to finish with the database setup:

$ git clone https://github.com/semaphoreci/docs.git /tmp/docs
$ source .env
$ python index_docs.py /tmp/docs

Testing the Database

The Pinecone dashboard should show vectors in the database.

Screenshot of Pinecone dashboard showing the database with a total of 79 vectors

We can query the database with the following code, which you can run as a script or in the Python REPL directly:

$ python

>>> import os
>>> import pinecone
>>> import openai

# Compute embeddings for the string "Docker Containers"
>>> embeddings = openai.Embedding.create(input="Docker Containers", engine="text-embedding-ada-002")


# Connect to database
>>> index_name = "semaphore"
>>> api_key = os.getenv("PINECONE_API_KEY")
>>> env = os.getenv("PINECONE_ENVIRONMENT")
>>> pinecone.init(api_key=api_key, environment=env)
>>> index = pinecone.Index(index_name)

# Query database
>>> matches = index.query(embeddings['data'][0]['embedding'], top_k=1, include_metadata=True)

>>> matches['matches'][0]
{'id': 'github.com/semaphore/docs/docs/ci-cd-environment/docker-authentication.md[3]',
 'metadata': {'id': 'github.com/semaphore/docs/docs/ci-cd-environment/docker-authentication.md[3]',
 'source': 'docs/ci-cd-environment/docker-authentication.md',
 'text': '\n'
 '# .semaphore/semaphore.yml\n'
 'version: v1.0\n'
 'name: Using a Docker image\n'
 'agent:\n'
 ' machine:\n'
 ' type: e1-standard-2\n'
 ' os_image: ubuntu1804\n'
 '\n'
 'blocks:\n'
 ' - name: Run container from Docker Hub\n'
 ' task:\n'
 ' jobs:\n'
 ' - name: Authenticate docker pull\n'
 ' commands:\n'
 ' - checkout\n'
 ' - echo $DOCKERHUB_PASSWORD | docker login '
 '--username "$DOCKERHUB_USERNAME" --password-stdin\n'
 ' - docker pull /\n'
 ' - docker images\n'
 ' - docker run /\n'
 ' secrets:\n'
 ' - name: docker-hub\n'},
 'score': 0.796259582,
 'values': []}

As you can see, the first match is the YAML for a Semaphore pipeline that pulls a Docker image and runs it. It's a good start since it's relevant to our "Docker Containers" search string.

Building the Bot

We have the data, and we know how to query it. Let's put it to work in the bot.

The steps for processing the prompt are:

  1. Take the user's prompt.
  2. Calculate its vector.
  3. Retrieve relevant context from the database.
  4. Send the user's prompt along with context to GPT-3.
  5. Forward the model's response to the user.

Diagram of the data flow for the bot. On the left, the user prompt enters and is processed by the embedding neural network, then sent to the context database. The search yields relevant text, which is sent to the GPT-3 model. The output of the model is sent to the user as the final answer.

As usual, I'll start by defining some constants in complete.py, the bot's main script:

# complete.py

# Pinecone database name, number of matches to retrieve,
# cutoff similarity score, and how many tokens to use as context
index_name = 'semaphore'
context_cap_per_query = 30
match_min_score = 0.75
context_tokens_per_query = 3000

# OpenAI LLM model parameters
chat_engine_model = "gpt-3.5-turbo"
max_tokens_model = 4096
temperature = 0.2 
embed_model = "text-embedding-ada-002"
encoding_model_messages = "gpt-3.5-turbo-0301"
encoding_model_strings = "cl100k_base"

import pinecone
import os

# Connect with Pinecone db and index
api_key = os.getenv("PINECONE_API_KEY")
env = os.getenv("PINECONE_ENVIRONMENT")
pinecone.init(api_key=api_key, environment=env)
index = pinecone.Index(index_name)

Next, I'll add functions to count tokens as shown in the OpenAI examples. The first function counts tokens in a string, while the second counts tokens in messages. We'll see messages in detail in a bit. For now, let's just say it's a structure that keeps the state of the conversation in memory.

import tiktoken

def num_tokens_from_string(string: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_model_strings)
    num_tokens = len(encoding.encode(string))
    return num_tokens


def num_tokens_from_messages(messages):
    """Returns the number of tokens used by a list of messages. Compatible with model gpt-3.5-turbo-0301."""

    try:
        encoding = tiktoken.encoding_for_model(encoding_model_messages)
    except KeyError:
        encoding = tiktoken.get_encoding(encoding_model_strings)

    num_tokens = 0
    for message in messages:
        num_tokens += 4  # every message follows {role/name}\n{content}\n
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":  # if there's a name, the role is omitted
                num_tokens += -1  # role is always required and always 1 token
    num_tokens += 2  # every reply is primed with assistant
    return num_tokens

The following function takes the original prompt and context strings to return an enriched prompt for GPT-3:

def get_prompt(query: str, context: str) -> str:
    """Return the prompt with query and context."""
    return (
        f"Create the continuous integration pipeline YAML code to fulfill the requested task.\n" +
        f"Below you will find some context that may help. Ignore it if it seems irrelevant.\n\n" +
        f"Context:\n{context}" +
        f"\n\nTask: {query}\n\nYAML Code:"
    )

The get_message function wraps the prompt in a format that is compatible with the API:

def get_message(role: str, content: str) -> dict:
    """Generate a message for OpenAI API completion."""
    return {"role": role, "content": content}

There are three types of roles that affect how the model reacts (a short example follows this list):

  • User: for the user's original prompt.
  • System: helps set the behavior of the assistant. While there is some controversy regarding its effectiveness, it appears to be more effective when sent at the end of the messages list.
  • Assistant: represents past responses of the model. The OpenAI API does not have a "memory"; instead, we must send the model's previous responses back during each interaction to maintain the conversation.
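
To make the structure concrete, a messages list built with get_message could look like the following (the content strings are placeholders for illustration only):

# Hypothetical conversation state; the contents are placeholders
messages = [
    get_message('user', 'Create a CI pipeline that runs unit tests'),      # the user's (enriched) prompt
    get_message('assistant', 'version: v1.0\nname: Run unit tests\n...'),  # a previous model response
    get_message('system', 'You are a helpful assistant that writes YAML code for Semaphore pipelines.'),
]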

Now for the engaging part. The get_context function takes the prompt, queries the database, and builds up a context string until one of these conditions is met:

  • The accumulated text exceeds context_tokens_per_query, the token budget I reserved for context.
  • The search has gone through all the retrieved matches.

Matches with a similarity score below match_min_score are skipped altogether.

import openai

def get_context(query: str, max_tokens: int) -> str:
    """Generate the context string for the OpenAI model. Adds context until hitting the `context_tokens_per_query` limit. Returns the context string."""

    embeddings = openai.Embedding.create(
        input=[query],
        engine=embed_model
    )

    # search the database
    vectors = embeddings['data'][0]['embedding']
    embeddings = index.query(vectors, top_k=context_cap_per_query, include_metadata=True)
    matches = embeddings['matches']

    # filter and aggregate context
    usable_context = ""
    context_count = 0
    for i in range(0, len(matches)):

        if matches[i]['score'] < match_min_score:
            # skip context with low similarity score
            continue

        context = matches[i]['metadata']['text']
        token_count = num_tokens_from_string(usable_context + '\n---\n' + context)

        if token_count < context_tokens_per_query:
            usable_context = usable_context + '\n---\n' + context
            context_count = context_count + 1

    print(f"Found {context_count} contexts for your query")

    return usable_context

The next and final function, complete, issues the API request to OpenAI and returns the model's response.

def complete(messages):
    """Query the OpenAI model. Returns the first answer."""

    res = openai.ChatCompletion.create(
        model=chat_engine_model,
        messages=messages,
        temperature=temperature
    )
    return res.choices[0].message.content.strip()

That's all; now I only have to deal with the command line arguments and call the functions in the correct order:

import sys

query = sys.argv[1]

context = get_context(query, context_tokens_per_query)
prompt = get_prompt(query, context)

# initialize messages list to send to OpenAI API
messages = []
messages.append(get_message('user', prompt))
messages.append(get_message('system', 'You are a helpful assistant that writes YAML code for Semaphore continuous integration pipelines and explains them. Return YAML code inside code fences.'))

if num_tokens_from_messages(messages) >= max_tokens_model:
    raise Exception('Model token size limit reached')

print("Working on your query... ")
answer = complete(messages)
print("Answer:\n")
print(answer)
messages.append(get_message('assistant', answer))

It's time to run the script and see how it fares:

$ python complete.py "Create a CI pipeline that builds and uploads a Docker image to Docker Hub"

The result is:

version: v1.0
name: Docker Build and Push
agent:
  machine:
    type: e1-standard-2
    os_image: ubuntu1804

blocks:
  - name: "Build and Push Docker Image"
    task:
      jobs:
        - name: "Docker Build and Push"
          commands:
            - checkout
            - docker build -t /: .
            - echo "$DOCKERHUB_PASSWORD" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin
            - docker push /:

promotions:
  - name: Deploy to production
    pipeline_file: deploy-production.yml
    auto_promote:
      when: "result = 'passed' and branch = 'master'"

This is a good first result. The model has inferred the syntax from the context examples we provided.

Thoughts on Expanding the Bot's Capabilities

Remember that I started with a modest goal: creating an assistant to write YAML pipelines. With richer content in my vector database, I can generalize the bot to answer any question about Semaphore (or any product — remember cloning the docs into /tmp?).

The key to obtaining good answers is — unsurprisingly — quality context. Merely uploading every document into the vector database is unlikely to yield good results. The context database should be curated, tagged with descriptive metadata, and be concise. Otherwise, we risk filling the token quota in the prompt with irrelevant context.

So, in a sense, there is an art — and a great deal of trial and error — involved in fine-tuning the bot to meet our needs. We can experiment with the context limit, remove low-quality content, summarize, and filter out irrelevant context by adjusting the similarity score.

Implementing a Proper Chatbot

You may have noticed that my bot does not enable us to have an actual conversation like ChatGPT. We ask one question and get one answer.

Converting the bot into a fully-fledged chatbot is, in principle, not too challenging. We can maintain the conversation by resending previous responses to the model with each API request. Prior GPT-3 answers are sent back under the "assistant" role. For example:


messages = []

while True:

    query = input('Type your prompt:\n')

    context = get_context(query, context_tokens_per_query)
    prompt = get_prompt(query, context)
    messages.append(get_message('user', prompt))
    messages.append(get_message('system', 'You are a helpful assistant that writes YAML code for Semaphore continuous integration pipelines and explains them. Return YAML code inside code fences.'))

    if num_tokens_from_messages(messages) >= max_tokens_model:
        raise Exception('Model token size limit reached')

    print("Working on your query... ")
    answer = complete(messages)
    print("Answer:\n")
    print(answer)

    # remove system message and append model's answer
    messages.pop()
    messages.append(get_message('assistant', answer))

Unfortunately, this implementation is rather rudimentary. It will not support extended conversations, as the token count increases with each interaction. Soon enough, we will hit the model's 4096-token limit, preventing further dialogue.

So, we have to find some way of keeping the request within token limits. A few strategies follow:

  • Delete older messages. While this is the simplest solution (a sketch follows this list), it limits the conversation's "memory" to only the most recent messages.
  • Summarize previous messages. We can ask the model to condense earlier messages and substitute the summary for the original questions and answers. Though this approach increases the cost and the lag between queries, it may produce better outcomes than simply deleting past messages.
  • Set a strict limit on the number of interactions.
  • Wait for the GPT-4 API general availability, which is not only smarter but has double token capacity.
  • Use a newer model like "gpt-3.5-turbo-16k" which can handle up to 16k tokens.
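
As a rough sketch of the first strategy (deleting older messages), we could trim the conversation before each request. This is just an illustration built on the num_tokens_from_messages helper; a real implementation would probably also want to preserve the system message:

def trim_messages(messages: list, limit: int) -> list:
    """Drop the oldest messages until the conversation fits within `limit` tokens."""
    trimmed = list(messages)
    while trimmed and num_tokens_from_messages(trimmed) >= limit:
        trimmed.pop(0)  # discard the oldest message first
    return trimmed

# e.g. call trim_messages(messages, max_tokens_model) right before complete(messages)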

Conclusion

Enhancing the bot's responses is possible with word embeddings and a good context database. To achieve this, we need good-quality documentation. There is a substantial amount of trial and error involved in developing a bot that seemingly possesses a grasp of the subject matter.

I hope this in-depth exploration of word embeddings and large language models aids you in building a more potent bot, customized to your requirements.

Happy building!