Welcome to our latest tutorial! In our previous blog post, we explored how to implement vector search using the ELSER model in Elasticsearch. Building on that foundation, we’ll now take the next step and create an AI-powered search application. This app will leverage large language models (LLM) and Elasticsearch to deliver insightful and detailed responses based on user queries.
By the end of this guide, you’ll have a fully functional application that combines the advanced search capabilities of Elasticsearch with the nuanced understanding of LLMs, all while using Flask to manage the backend and Bulma for a sleek frontend.
Overview
The app will:
- Accept user input via a web form.
- Query an Elasticsearch index to retrieve relevant documents.
- Use an LLM (such as OpenAI’s GPT models, or models served through Groq) to process those documents and answer the user’s query based on the returned data.
- Display the results on the web interface.
What will we be using?
In this tutorial, we’ll use several key technologies to build our AI-powered search app:
- Flask: A lightweight Python web framework to create a simple API that interacts with Elasticsearch and the LLM.
- Elasticsearch: A powerful search engine used to store and retrieve documents based on the user’s query.
- OpenAI or Groq LLMs: Large language models that will process the retrieved documents and generate meaningful insights.
- HTML and JavaScript: For building a basic frontend to interact with the Flask API and display results.
What is Flask and what will it be used for?
Flask is a lightweight web framework for Python that makes it easy to build web applications and APIs. It’s known for its simplicity and flexibility, allowing developers to quickly set up servers with minimal configuration.
In our application, Flask will serve as the backend framework. It will:
- Handle incoming requests from the frontend (such as user queries).
- Interact with Elasticsearch to retrieve relevant documents.
- Send the retrieved documents to the LLM for processing.
- Return the results back to the frontend for display.
Flask’s minimalistic design makes it an ideal choice for this project, allowing us to focus on integrating Elasticsearch and LLMs efficiently.
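As a quick illustration of that minimalism, here is a self-contained sketch (separate from the app we will build) of a complete Flask server with a single JSON route:

from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    # Flask converts a returned dict into a JSON response automatically
    return {"message": "Hello from Flask!"}

if __name__ == "__main__":
    app.run(port=5000)

Saved as, say, hello.py, this runs with python hello.py and starts a development server on port 5000.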
Prerequisites
Before starting, ensure you have the following:
- Python 3.8+ installed
- Docker (for running Elasticsearch)
- Elasticsearch cluster set up with your data
- Access to an LLM API (OpenAI or Groq)
Initializing our project
Before we write the files for our application, we will need to create a virtual environment for our software.
Setting Up a Python Virtual Environment
A Python virtual environment is an isolated environment where you can install packages and dependencies for your project without affecting your global Python setup. This helps avoid conflicts between package versions across different projects and keeps everything organized.
Install venv (if not already installed)
If you’re using Python 3, the venv module should come pre-installed. To check, run:
python3 -m venv --help
Create a Virtual Environment
Navigate to your project directory and run the following command to create a virtual environment:
python3 -m venv .venv
Activate the Virtual Environment
Once the virtual environment is created, you need to activate it. The activation command depends on your operating system. On a UNIX-based OS, you will need to execute:
source .venv/bin/activate
If you are on a Windows machine, run .venv\Scripts\activate from the Command Prompt (or .venv\Scripts\Activate.ps1 from PowerShell) instead.
Install Dependencies
Now that the virtual environment is active, you can install any required packages for your project (like Flask, python-dotenv, etc.) without affecting your global Python setup. Below is what we need for our first steps:
pip install Flask python-dotenv elasticsearch
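If you want to keep the setup reproducible, you can also record these dependencies in a requirements.txt file (an optional convention; the rest of the tutorial just uses the pip install commands shown):

Flask
python-dotenv
elasticsearch

Then install them in one go with pip install -r requirements.txt.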
Setting up Flask and Elasticsearch
We will begin by creating the core Flask application and connecting it to our Elasticsearch instance. Let’s start small by creating the app.py file:
from elasticsearch import Elasticsearch
from flask import Flask, jsonify, make_response, request, send_from_directory
import os
from dotenv import load_dotenv

load_dotenv()

app = Flask(__name__)

# Connect to Elasticsearch
client_el = Elasticsearch(
    os.environ.get('ELASTICSEARCH_URI'),
    basic_auth=(
        os.environ.get('ELASTICSEARCH_USER'),
        os.environ.get('ELASTICSEARCH_PASSWORD')
    )
)

@app.route("/", methods=['POST', 'GET'])
def query_view():
    if request.method == 'POST':
        # Get the user's query from the POST request
        prompt = request.json['prompt']

        # Perform a search on Elasticsearch
        el_resp = client_el.search(index='enwikiquote', body={
            "query": {
                "match": {
                    "text": prompt
                }
            }
        })

        # Return the search result as JSON
        return el_resp["hits"]["hits"]

    # Serve the frontend
    return make_response(send_from_directory(".", path="index.html"))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get('PORT', 5000)))
And before we test it, we need to add our Elasticsearch credentials! We can achieve this by creating a new .env file containing the values that app.py reads from the environment:
ELASTICSEARCH_URI=http://localhost:9200
ELASTICSEARCH_USER=elastic
ELASTICSEARCH_PASSWORD=changeme
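If you want to confirm those credentials before testing the endpoint, a minimal standalone check, using the same environment variables that app.py reads, could look like this (the snippet is just for verification and not part of the app):

import os
from dotenv import load_dotenv
from elasticsearch import Elasticsearch

load_dotenv()

# Build a client exactly the way app.py does
client = Elasticsearch(
    os.environ.get('ELASTICSEARCH_URI'),
    basic_auth=(
        os.environ.get('ELASTICSEARCH_USER'),
        os.environ.get('ELASTICSEARCH_PASSWORD')
    )
)

# info() returns cluster metadata; a connection or auth problem raises an error here
print(client.info())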
To test our Flask app’s endpoint and simulate a user query, we can use a cURL request. Here’s an example of how to send a request to the / endpoint with a user prompt.
Explanation
- We use Flask to create the web server.
- The Elasticsearch client connects to an Elasticsearch instance using environment variables.
- The / endpoint allows users to submit search queries via POST requests, and it fetches the matching documents from the Elasticsearch index.
Example cURL Command
~/ai-test$ curl -X POST http://localhost:5000/ -H "Content-Type: application/json" -d '{"prompt": "What romantic dramas were released after 2003?"}'
[
{
"_id": "119043",
"_ignored": [
"opening_text.keyword",
"source_text.keyword",
"text.keyword"
],
"_index": "enwikiquote",
"_score": 14.329657,
"_source": {
"auxiliary_text": [
"Wikipedia",
"This film article is a stub. You can help out with Wikiquote by expanding it!"
],
"category": [
"Film stubs",
"1984 films",
...
Integrating LLMs for AI-Powered Responses
Now that our basic Flask app can communicate with Elasticsearch, let’s enhance it by incorporating LLMs to analyze the documents returned by Elasticsearch and generate a more insightful answer.
First, set up new environment variables to store your LLM API key and choice of model (shown here for OpenAI):
OPENAI_API_KEY=your-openai-api-key
MODEL=gpt-3.5-turbo
For this blog post, we are using OpenAI’s models for the AI insights. But you can use any other model (even local ones!) that is more convenient for you. For example, Groq (not sponsored, by the way) has a much more forgiving free tier, which can be perfect for this tutorial, since our documents will consume a large number of tokens per request.
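Because Groq exposes an OpenAI-compatible API, switching providers is mostly a matter of changing the client’s base URL and model name. A minimal sketch, assuming a GROQ_API_KEY environment variable (a name chosen for this example) and a model that Groq currently serves:

from openai import OpenAI
import os

# Point the OpenAI client at Groq's OpenAI-compatible endpoint
client_groq = OpenAI(
    api_key=os.environ.get("GROQ_API_KEY"),  # illustrative env var for this sketch
    base_url="https://api.groq.com/openai/v1"
)
# The rest of the code stays the same; only the MODEL value in .env
# needs to change to a model name available on Groq.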
Update app.py
Now, let’s install the OpenAI Python package before making any changes to our script:
pip install openai
Let’s modify app.py to include API calls to OpenAI for document analysis:
# ...
from openai import OpenAI

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
MODEL = os.environ.get("MODEL")

# Initialize OpenAI based on available keys
client_openai = OpenAI(api_key=OPENAI_API_KEY)

def get_completion(prompt, docs):
    # Craft the input for the LLM
    query = f"""
    TOP DOCUMENTS FOR USER'S QUESTION:
    {docs}
    ORIGINAL USER'S QUESTION: {prompt}
    """

    # Choose the right LLM
    client = client_openai

    # Send the query to the LLM
    message = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": """Your objective is to answer
            the user's question based on the documents retrieved from
            the prompt.
            If you don't know the answer or it's missing
            from the documents, CLARIFY it to the user and don't make
            up any nonexistent information outside of the documents
            provided, and don't mention it.
            There can be multiple answers to the user's question.
            Don't include [...] in your answers. Detail the
            items (example: including the director, synopsis, writer,
            main plot, etc.) based on what's written in the documents. Don't use
            italics or bold text.
            Only prompt your answer.
            At the end of your answer, ALWAYS provide the _id(s) from the
            document(s) that most fit the user's question and your answer
            AND documents you end up citing in your answer
            (example: "_id:12345", "_id:12345,678891,234567").
            PROMPT EXAMPLE:
            TOP DOCUMENTS FOR USER'S QUESTION:
            {
                "_id": "100090",
                "_source": {
                    "content_model": "wikitext",
                    "opening_text": "Before Sunset is a 2004 sequel to the 1995 romantic drama film Before Sunrise. Directed by Richard Linklater. Written by Richard Linklater, Ethan Hawke, Julie Delpy, and Kim Krizan. What if you had a second chance with the one that got away? (taglines)",
                    "wiki": "enwikiquote",
                    "auxiliary_text": [
                        "Wikipedia"
                    ],
                }
            }
            {
                "_id": "104240",
                "_source": {
                    "content_model": "wikitext",
                    "opening_text": "Dedication is a 2007 romantic dramedy about a misogynistic children's book author who is forced to work closely with a female illustrator instead of his long-time collaborator and only friend. Directed by Justin Theroux. Written by David Bromberg. With each moment we write our story.",
                    "wiki": "enwikiquote",
                    "auxiliary_text": [
                        "Wikipedia",
                        "This film article is a stub. You can help out with Wikiquote by expanding it!"
                    ],
                }
            }
            ORIGINAL USER'S QUESTION: Are there any romantic dramas written after 2003?
            YOUR ANSWER:
            There are several romantic dramas that were written or filmed after 2003, including:
            Before Sunrise, a 1995 romantic drama film that Before Sunset is a sequel to, The Mirror Has Two Faces, a 1996 American romantic dramedy film, and Get Real, a 1998 British romantic comedy-drama film about the coming of age of a gay teen.
            The Mirror Has Two Faces is a 1996 American romantic dramedy film written by Richard LaGravenese, based on the 1958 French film Le Miroir à Deux Faces.
            _id:100090,104240
            """},
            {"role": "user", "content": query}
        ]
    )

    return {"message": message.choices[0].message.content, "docs": docs}

# ...
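Before wiring this into the endpoint, you can sanity-check get_completion on its own. A quick sketch with stub documents (the dicts below are invented purely for this test, shaped like Elasticsearch hits; a valid OPENAI_API_KEY is still required):

# Stub documents shaped like Elasticsearch hits, invented for this test
stub_docs = [
    {"_id": "100090", "_source": {"opening_text": "Before Sunset is a 2004 sequel to the 1995 romantic drama film Before Sunrise."}},
    {"_id": "104240", "_source": {"opening_text": "Dedication is a 2007 romantic dramedy."}},
]

result = get_completion("What romantic dramas were released after 2003?", stub_docs)
print(result["message"])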
And in our query_view() function, we need to update the response so that it is complemented with the AI insight:
# ...
@app.route("/", methods=['POST', 'GET'])
def query_view():
    if request.method == 'POST':
        prompt = request.json['prompt']

        el_resp = client_el.search(index='enwikiquote_vectorized', source={
            "excludes": [ "source_text", "text", "text_embedding" ]
        }, query={
            "sparse_vector": {
                "field": "text_embedding",
                "inference_id": "my-elser-model",
                "query": prompt
            }
        })

        response = get_completion(prompt, el_resp["hits"]["hits"])

        return jsonify({'response': response["message"], "docs": response["docs"]})

    return make_response(send_from_directory(".", path="index.html"))
# ...
Explanation
- The function get_completion is responsible for querying the LLM API (either OpenAI or Groq) with the user’s prompt and the documents retrieved from Elasticsearch.
- It creates a formatted query that sends both the user’s question and the relevant documents to the LLM.
Example cURL Command
~/ai-test$ curl -X POST http://localhost:5000/ -H "Content-Type: application/json" -d '{"prompt": "What romantic dramas were released after 2003?"}'
{
"docs": [
{
"_id": "100090",
"_ignored": [
"source_text.keyword",
"text.keyword"
],
"_index": "enwikiquote_vectorized",
"_score": 13.876076,
"_source": {
"auxiliary_text": [
"Wikipedia"
],
"category": [
"2004 films",
"American films",
"Romantic drama films",
"Sequel films",
"Films directed by Richard Linklater"
]
...
],
"response": "There are several romantic dramas that were released after 2003, including:
Before Sunset, a 2004 sequel to the 1995 romantic drama film Before Sunrise directed by Richard Linklater, written by Richard Linklater, Ethan Hawke, Julie Delpy, and Kim Krizan
15, a 2002 Singaporean film expanded version directed by Royston Tan that is about teenage gangsters, however it technically comes before the requested date, however the source says it's an award-winning 'short film', that means is possible the version that matches user question can be before or after 2002, and the source could not confirm which was first.,
though it is about teenage gangsters it shares elements with romantic drama
You & Me, this is a 2008 romantic drama film directed by Dou Yang and Dong Yu, not included in the top result documents.
_id:100090, 103638"
}
Modifying the Frontend to Handle Search Input
In this step, we will build the frontend for our application. The frontend is responsible for collecting the user’s query, sending it to the Flask API, and displaying the AI-generated results. To make this process smoother, we’ll use two popular libraries: Bulma for styling and Axios for making HTTP requests.
Bulma
Bulma is a modern CSS framework that helps developers build clean and responsive web interfaces quickly. It provides pre-built classes for layout, typography, buttons, forms, and much more, making it easy to design a polished user interface without writing much custom CSS.
In our project, Bulma will:
- Style the search form and results.
- Create responsive layouts that look great on all screen sizes.
- Add smooth transitions and hover effects to the results boxes.
Axios
Axios is a promise-based HTTP client for JavaScript, making it easier to send asynchronous requests to servers. Unlike the traditional XMLHttpRequest or the fetch API, Axios offers a more concise syntax and automatic handling of JSON data.
Let’s add a simple index.html file to serve as the front end for our application, allowing users to submit queries and view results.
<html>
<head>
    <title>AI-Powered Search</title>
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bulma/css/bulma.min.css">
</head>
<body>
    <section class="section">
        <div class="container">
            <h1 class="title">Search the Wiki</h1>
            <form id="search-form">
                <div class="field">
                    <label class="label">Enter your query</label>
                    <div class="control">
                        <input id="prompt" class="input" type="text" placeholder="Search for something...">
                    </div>
                </div>
                <button class="button is-primary" type="submit">Search</button>
            </form>
            <div id="results" class="mt-5"></div>
        </div>
    </section>

    <script src="https://cdn.jsdelivr.net/npm/axios/dist/axios.min.js"></script>
    <script>
        document.getElementById('search-form').addEventListener('submit', function (e) {
            e.preventDefault();
            const prompt = document.getElementById('prompt').value;

            axios.post('/', { prompt: prompt })
                .then(function (response) {
                    document.getElementById('results').innerHTML = response.data.response;

                    response.data.docs.forEach(({ _source }) => {
                        // Build one Bulma tag per category
                        let categories_spans = [];
                        _source.category.forEach(element => {
                            categories_spans.push(`<span class="tag is-dark">${element}</span>`);
                        });

                        document.getElementById('results').innerHTML += `
                            <div class="box">
                                <h4 class="title is-4">${_source.title}</h4>
                                ${categories_spans.join(' ')}
                                <p>${_source.opening_text}</p>
                            </div>
                        `;
                    });
                })
                .catch(function (error) {
                    console.error(error);
                });
        });
    </script>
</body>
</html>
Running the Application
With the app set up, we can now run the Flask application with:
flask run --debug
Visit http://localhost:5000 to interact with your AI-powered search app!
Conclusion
In this tutorial, we built an AI-powered search application using large language models (LLMs) and Elasticsearch. We covered how to set up the backend with Flask, connect to Elasticsearch, integrate LLMs for AI insights, and create a responsive frontend using Bulma and Axios. This application allows users to query a vectorized Elasticsearch database and receive detailed answers generated by an AI model, all while maintaining an intuitive user interface.
By following these steps, you’ve not only learned how to create a powerful search engine but also how to leverage the combined power of LLMs and Elasticsearch to deliver rich, document-based insights.
If you’d like to explore the full source code or contribute to the project, you can find the repository here: https://github.com/EdRamos12/ai-insight-search-w-elasticsearch.
Feel free to customize and expand on this application to suit your needs. Happy coding!