Chatbot interactions have been revolutionized with advancements in AI and NLP like OpenAI's GPT and LangChain. In this post, we'll explore how to use Shakudo to simplify and enhance the process of building a Q&A app for an Internal Knowledge base from conceptualization to deployment. We chose Confluence for this tutorial because it's an optimal platform for creating internal knowledge bases. Its intuitive interface supports efficient information management, and its advanced search capabilities ensure that you find what you need without unnecessary delays.
Want to skip to the code? It’s available on GitHub.
ChatGPT's human-like capability to extract information from vast data has truly transformed the field of language models. But with a 4096-token context limit, extracting details from extensive text documents is still a challenge. There are multiple ways to get around this problem.
Option one involves generating text snippets and sequentially prompting the large language model (LLM), refining the answer step by step. Although this method covers the text effectively, it falls short when it comes to time and cost efficiency due to its resource-intensive nature.
Option two involves utilizing LLMs with larger context windows, such as the Claude model by Anthropic, offering a 100k-token window. However, it partially solves the problem as we need to ensure that the model can accurately and comprehensively extract from our extensive knowledge base.
Option three capitalizes on the power of embeddings and similarity search and is the one we chose for the tutorial. It maintains an embedding vector store for each text snippet, calculates question embeddings, and retrieves the nearest text snippets via a similarity search on embedding. The retrieved text snippets are used to query the LLM with by constructing a prompt to obtain an answer
Our proposed architecture operates as a pipeline that efficiently retrieves information from a knowledge base (in this case, Confluence) in response to user queries. It includes four main steps:
Step 1: Knowledge base processing
This step involves transforming information from a knowledge base into a more manageable format for subsequent stages. Information is segmented into smaller text snippets and vector representations (embeddings) of these snippets are generated for quick and easy comparison and retrieval. Here, we use Langchain's ConfluenceLoader with TextSplitter and TokenSplitter to efficiently split the documents into text snippets. Then, we create embeddings using OpenAI's ada-v2 model.
Step 2: User query processing
When a user submits a question, it is transformed into an embedding using the same process applied to the text snippets. Langchain's RetrievalQA, in conjunction with ChromaDB, then identifies the most relevant text snippets based on their embeddings.
Step 3: Answer generation
Relevant text snippets, together with the user's question, are used to generate a prompt. This prompt is processed by our chosen LLM to generate an appropriate response to the user's query.
Step 4: Streamlit service and Shakudo deployment
Finally, we package everything into a Streamlit application, expose the endpoint, and deploy it on a cluster using Shakudo. This step ensures a seamless transition from development to production quickly and reliably, as Shakudo automates DevOps tasks and lets developers use Langchain, Hugging Face pipelines, and LLM models effortlessly with pre-built images.
We use OpenAI's adav2 for text embeddings and OpenAI's gpt-3.5-turbo as our LLM. OpenAI offers a range of embedding models and LLMs. The ones that we have chosen balance efficiency and cost-effectiveness, but depending on your needs, other models might be more suitable.
This stage involves preparing the embedding model for text snippet processing and the LLM model for the final query response. Our go-to models are ada-v2 for embeddings and gpt-3.5-turbo for text generation, respectively. Learn more about embeddings from OpenAI Documentation.
After that, let’s initialize the LLM model to be used for the final LLM call to query with prompt:
The 'temperature' parameter in the LLM initialization impacts the randomness of the model's responses, with higher values producing more random responses and lower values producing more deterministic ones. Here, we've set it to 0, which makes the output entirely deterministic.
To wrap up the environment setup, we specify the OpenAI API key, a prerequisite for LangChain's functionality. Make sure the API environment key is named OPENAI_API_KEY – it's a requirement for LangChain.
For security, we store the key in a .env file, and ensure the key is correctly recognized by our application. Never print or share your keys as this can expose them to potential security threats.
With our environment set up, we are now ready to start building our Confluence Q&A application.
In this step, we will extract the documents from the Confluence knowledge base, transform these documents into text snippets, generate embeddings for these snippets, and store these embeddings in a Chroma store.
ConfluenceLoader is a powerful tool that allows us to extract documents from a Confluence site using login credentials. It currently supports username/api_key and OAuth2 authentication methods. Be careful when handling these credentials, as they are sensitive information.
Next, we split these documents into smaller, manageable text snippets. We employ CharacterTextSplitter and TokenTextSplitter from LangChain for this task.
Lastly, we generate embeddings for these text snippets and store them in a Chroma database. The Chroma class handles this with the help of an embedding function.
We have now successfully transformed our knowledge base into a store of embedded text snippets, ready for efficient querying in the subsequent stages of our pipeline.
For a dynamic confluence pages, the vector store creation process can be scheduled with the help of Shakudo jobs pipeline.
ChromaDB is an advanced indexing system that significantly accelerates retrieval based on what we call ‘semantic similarity’. In other words, it’s really good at finding and matching things that have the same meaning, making our process quicker and more accurate.
Next, we deal with ‘questions embeddings’. This is a way for us to understand the meaning behind the questions. You could say it’s like converting the questions into a language that ChromaDB speaks fluently. Now, ChromaDB is able to pinpoint the most useful snippets of information. These snippets form the foundation of smart and detailed responses in our app.
These steps enhance the effectiveness of our RetrievalQA chain. We’re making sure that our app is delivering fast, accurate and useful answers to the questions received.
In this step, we construct a prompt for our LLM. A prompt is a message that sets the context and asks the question that we want the LLM to answer. To pass a custom prompt with context and question, you can define your own template as follows:
In the following class, ConfluenceQA, we package all the necessary steps that include initializing the models, embedding, and combining the retriever and answer generator into one organized module. This encapsulation improves code readability and reusability.
Once the ConfluenceQA class is set up, you can initialize and run it as follows:
Remember that the above approach is a structured way to access the Confluence knowledge base and get your desired information using a combination of embeddings, retrieval, and prompt engineering. However, the success of the approach would largely depend on the quality of the knowledge base and the prompt that is used to question the LLM.
Let’s wrap our solution in a Streamlit app and deploy it as a service. This will make it accessible either locally or on a cloud-based cluster.
To create an interactive web application around our ConfluenceQA class, we use Streamlit, a Python library that simplifies app creation. Below is the breakdown of the code:
We start by importing necessary modules and initializing our ConfluenceQA instance:
We then define a sidebar form for user inputs:
And finally, we provide a user interface for asking questions and getting answers:
This Streamlit app can be launched locally or on a cluster and allows us to interact with our Confluence Q&A system in a user-friendly manner.
Finally our app is ready, and we can deploy it as a service on Shakudo. The platform makes the process of deployment easier, allowing you to quickly put your app online.
To deploy our app on Shakudo, we need two key files: pipeline.yaml, which describes our deployment pipeline, and run.sh, a bash script to set up and run our application. Here's what these files look like:
In this script:
After you have these files ready, navigate to the service tab and click the '+' button to create a new service. Fill in the details in the service creation panel and click 'CREATE'. On the services dashboard, click on the ‘Endpoint URL’ button to access the application. Check out the Shakudo docs for a more detailed explanation.
Now, your application is live! You can browse through the user interface to see how it works.
Shakudo is designed to facilitate the entire lifecycle of data and AI applications. Automating all stages of the development process, including the stack integrations, deployment, and the ongoing management of data-driven applications.
By using Shakudo, you’ll take advantage of:
This blog provides a comprehensive guide to developing a Confluence Q&A application utilizing the power of Shakudo, Langchain, and ChatGPT, aiming to resolve the challenge posed by ChatGPT's token limit when extracting information from extensive text documents. We used a new method leveraging embeddings and similarity search, ensuring a more efficient process of retrieving accurate information from a knowledge base.
Are you ready to start building more efficient data-driven applications? Check out our blog post on “How to Easily and Securely Integrate LLMs for Enterprise Data Initiatives”. Feel free to reach out to our team to check out how you can start using Shakudo to help your business. Thank you for reading and happy coding!
This code is adapted based on the work in LLM-WikipediaQA, where the author compares FastChat-T5, Flan-T5 with ChatGPT running a Q&A on Wikipedia Articles.
Buster: Overview figure inspired from Buster’s demo. Buster is a QA bot that can be used to answer from any source of documentation.
Claude model: 100K Context Window model from Anthropic AI