

In this blog, we delve into the fascinating realm of Retrieval-Augmented Generation (RAG) and its application with Azure OpenAI and LangChain. Discover how this innovative approach harnesses the power of pre-trained large language models (LLMs) to generate informative and human-like responses, enhancing real-world applications.
Retrieval-Augmented Generation (RAG)
The concept of RAG, applied to large language models (LLMs) and vector databases (vector DBs), is much like how a lawyer draws on past judgments for a new case. Here’s the breakdown:
- Knowledge: The lawyer possesses a vast legal knowledge base gained through education and experience. This is similar to the LLM’s general knowledge acquired from its training data.
- Judgments (Vector DB): The lawyer retrieves relevant past judgments (documents) from a legal database. This is analogous to the Vector DB storing documents and their vector representations, which capture the semantic meaning.
The Case:
The client presents the lawyer with a new case (user query).
RAG in Action:
Retrieval (Similar to Lawyer Research): The user query is processed (similar to the lawyer understanding the case). Techniques like vector search are used to search the Vector DB for the documents (judgments) most relevant to the query (case). These documents become the retrieved passages.
Generation (Similar to Lawyer Argument): The LLM is presented with the user query and the retrieved passages (similar to the lawyer considering both the case and relevant past judgments). The LLM uses its knowledge and the context provided by the retrieved passages to generate a response (similar to the lawyer formulating an argument based on their knowledge and past cases).
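To make the two stages concrete, here is a minimal sketch of the retrieve-then-generate loop in plain Python. The `vector_db` and `llm` objects are hypothetical stand-ins; the real Azure OpenAI and LangChain wiring follows later in this post.
```python
def rag_answer(query: str, vector_db, llm, k: int = 3) -> str:
    # Retrieval: find the k documents most similar to the query
    passages = vector_db.similarity_search(query, k=k)

    # Generation: ground the LLM's answer in the retrieved passages
    context = "\n\n".join(p.page_content for p in passages)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm.invoke(prompt).content
```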
Benefits (Like a Stronger Case):
- Accuracy: The retrieved passages provide factual grounding, reducing the LLM’s tendency towards “hallucinations” (making things up). This ensures a more accurate and relevant response, just like a lawyer’s argument is strengthened by relevant precedents.
- Efficiency: The LLM focuses on the retrieved passages, which contain the most relevant information, instead of sifting through vast amounts of data. This improves efficiency, similar to how a lawyer doesn’t need to research every single law for a case.
Now that we understand the basics, let’s explore the technical details.
Tools:
- Azure OpenAI: This Microsoft service provides access to cutting-edge LLMs capable of generating human-quality text, making it ideal for the generation stage of the RAG pipeline. [Azure OpenAI]
- LangChain: This open-source library provides tools and components for building complex NLP pipelines, including retrievers and chains that facilitate conversational retrieval. [LangChain RAG]
- Azure AI Search: This cloud-based service acts as the “library” in our analogy. We’ll use it to store and search for relevant information based on user queries. [Azure AI Search]
Building the RAG System:
Here’s a step-by-step breakdown of the process:
1. Data Preparation:
- We’ll need a dataset of documents containing the knowledge base for our system. This could include FAQs, employee handbooks, policies and procedures documents, or any relevant corpus.
- Each document needs to be preprocessed and cleaned for better retrieval and generation (a minimal cleaning sketch follows this list).
2. Embedding and Indexing:
- We’ll employ the “text-embedding-3-large” model from Azure OpenAI to generate vector embeddings for each document in our dataset. These embeddings capture the semantic meaning of the text in a numerical format.
- We’ll then store these embeddings along with the corresponding documents in Azure AI Search. This effectively creates a searchable “library” of knowledge represented by vectors.
3. LangChain’s Retriever:
- LangChain’s Retriever feature helps build the retrieval component. It takes a user query and retrieves the most relevant documents from the Azure AI Search index based on a combination of vector similarity (using embeddings) and keyword matching (full-text search).
4. Prompt Engineering and Generation:
- We’ll define prompts (templates) that guide the GPT model in generating responses. These prompts will incorporate the retrieved information.
- The GPT model will then use this prompt and its internal knowledge to generate a comprehensive response.
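As referenced in step 1, here is a minimal preprocessing sketch. The specific cleanup rules are assumptions for illustration; tailor them to your own corpus.
```python
import re

def clean_text(raw: str) -> str:
    """Basic cleanup before embedding: strip markup and normalize whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)   # drop stray HTML tags
    text = re.sub(r"\s+", " ", text)      # collapse runs of whitespace
    return text.strip()

cleaned = [clean_text(doc) for doc in ["<p>Example   policy text.</p>"]]
print(cleaned)  # ['Example policy text.']
```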
Sample Workbook:
The following Python notebook scripts load a JSON file, ingest it into Azure AI Search, and use LangChain to query and generate responses based on the ingested data. To get started, work through the steps below.
## Import section
```python
import os
from typing import Dict

from langchain_community.document_loaders import JSONLoader
from langchain_community.vectorstores.azuresearch import AzureSearch
from langchain_openai import AzureOpenAIEmbeddings, AzureChatOpenAI
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage
from langchain_core.runnables import RunnablePassthrough
from langchain.chains.combine_documents import create_stuff_documents_chain
```
## Loader
```python
# "data.json" is a placeholder path; point it at your own file.
# jq_schema=".[]" treats each element of a top-level JSON array as one document.
loader = JSONLoader(file_path="data.json", jq_schema=".[]", text_content=False)
docs = loader.load()
docs  # inspect the loaded documents
```
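The `jq_schema=".[]"` above assumes the file contains a top-level JSON array, with each element becoming one document. A hypothetical `data.json` with that shape could be generated like this, purely for illustration:
```python
import json

# Hypothetical knowledge-base records matching the ".[]" jq schema
sample = [
    {"title": "Leave policy", "content": "Employees accrue 20 days of leave per year."},
    {"title": "Remote work", "content": "Remote work requires manager approval."},
]
with open("data.json", "w") as f:
    json.dump(sample, f, indent=2)
```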
## Embedding
```python
# openai_endpoint and openai_key hold your Azure OpenAI endpoint URL and API key
embeddings: AzureOpenAIEmbeddings = AzureOpenAIEmbeddings(
    azure_endpoint=openai_endpoint,
    api_key=openai_key,
    azure_deployment="text-embedding-3-large",
)
```
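As a quick sanity check, you can embed a sample string; text-embedding-3-large produces 3072-dimensional vectors by default:
```python
vector = embeddings.embed_query("What is the leave policy?")
print(len(vector))  # 3072 for text-embedding-3-large
```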
## Vector store
```python
# ai_search_endpoint, ai_search_key, and index_name identify your Azure AI Search
# service; the index is created automatically if it does not already exist
vector_store: AzureSearch = AzureSearch(
    azure_search_endpoint=ai_search_endpoint,
    azure_search_key=ai_search_key,
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)
```
## Processing the bulk data into small chunks
```python
# Split documents into overlapping chunks for finer-grained retrieval
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_docs = text_splitter.split_documents(docs)
vector_store.add_documents(split_docs)

# Quick sanity check: pull back a couple of indexed chunks
docs = vector_store.hybrid_search(
    query="*",
    k=2,
)
for d in docs:
    print(d.page_content)
```
## LLM Configuration
```python
# Assumes AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT_NAME, and
# AZURE_OPENAI_API_VERSION are set as environment variables
llm = AzureChatOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
    openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)
```
## Converting the vector store into a retriever
```python
retriever = vector_store.as_retriever()
```
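The retriever exposes LangChain's standard Runnable interface, so you can sanity-check it directly. The query string here is illustrative:
```python
# Each returned item is a Document with page_content and metadata
docs = retriever.invoke("What is the leave policy?")
for doc in docs:
    print(doc.page_content[:100])
```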
## Setting up the context
```python
SYSTEM_TEMPLATE = """
Answer the user's questions based on the below context.
If the context doesn't contain any relevant information to the question, don't make something up and just say "I don't know":
{context}
"""
question_answering_prompt = ChatPromptTemplate.from_messages(
[
(
"system",
SYSTEM_TEMPLATE,
),
MessagesPlaceholder(variable_name="messages"),
]
)
```
## Chain configuration
```python
document_chain = create_stuff_documents_chain(llm, question_answering_prompt)

def parse_retriever_input(params: Dict):
    # Use the latest user message as the retrieval query
    return params["messages"][-1].content

retrieval_chain = RunnablePassthrough.assign(
    context=parse_retriever_input | retriever,
).assign(
    answer=document_chain,
)
```
## Invoke the chain
```python
retrieval_chain.invoke(
    {
        "messages": [
            HumanMessage(content="What does John Smith state about his experiment?")
        ],
    }
)
```
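Because the chain is built with `RunnablePassthrough.assign`, the result is a dict carrying the original `messages`, the retrieved `context` documents, and the generated `answer`:
```python
result = retrieval_chain.invoke(
    {"messages": [HumanMessage(content="What does John Smith state about his experiment?")]}
)
print(result["answer"])        # the generated response
print(len(result["context"]))  # how many documents were retrieved
```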
Data Privacy and Security:
Security and privacy are crucial in a Retrieval-Augmented Generation (RAG) implementation, particularly when it comes to handling and protecting data. Azure OpenAI Service, integrated with Azure’s comprehensive security framework, provides specific assurances that align with these needs, letting businesses harness powerful AI models without compromising confidentiality. This is a win-win for businesses with sensitive data.
Security Measures in Azure OpenAI Service for RAG Implementation:
1. Data Encryption: Data is encrypted at rest and in transit.
2. Access Controls
- Role-Based Access Control (RBAC)
- Azure Active Directory (AAD)
3. Network Security
- Virtual Networks (VNets)
- Private Endpoints
4. Compliance and Certifications
- Regulatory Compliance: Azure OpenAI Service complies with numerous regulatory standards like GDPR, HIPAA, ISO/IEC 27001, and SOC 1/2/3, ensuring that data handling practices meet stringent regulatory requirements.
- Audit and Monitoring: Continuous monitoring and logging are in place to detect and respond to security incidents.
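To put the access-control point into practice, the workbook's clients can authenticate with Azure Active Directory instead of API keys. This is a minimal sketch assuming the azure-identity package is installed and your identity holds an appropriate Azure OpenAI role assignment:
```python
import os

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from langchain_openai import AzureChatOpenAI

# Request AAD tokens for the Cognitive Services scope instead of passing an API key
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

llm = AzureChatOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
    openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
    azure_ad_token_provider=token_provider,
)
```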
Privacy Measures in Azure OpenAI Service for RAG Implementation:
- No Learning from Customer Data: Azure OpenAI Service has strict policies ensuring that customer data, including documents used in RAG implementations, is not used to train or improve the models. This is critical for maintaining the privacy and confidentiality of the data.
- Data Processing Agreements: These agreements outline how customer data is handled, processed, and protected, providing transparency and assurance regarding data usage.
What makes LangChain stand out?
LangChain helps RAG-based implementations work better with Azure OpenAI by providing a complete framework that combines components for document storage, search, and text generation. It enhances workflow management, ensures data security and privacy, simplifies development and deployment, and optimizes performance, making it easier to build and maintain powerful RAG systems on Azure. Here’s how LangChain facilitates this process:
Hierarchical Indexing
Hierarchical indexing breaks the data into manageable chunks, allowing for efficient retrieval and better handling of large datasets, and improving the performance and accuracy of search queries. LangChain’s indexing API lets you load and keep documents from any source in sync with a vector store. Specifically, it helps:
- Avoid writing duplicated content into the vector store
- Avoid re-writing unchanged content
- Avoid re-computing embeddings over unchanged content
For Azure environments, LangChain can be configured to work with Azure AI Search, which supports vector search capabilities. This integration makes it straightforward to work with LLM-generated embeddings and ensures that relevant documents are retrieved quickly and accurately.
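Here is a minimal sketch of LangChain's indexing API, assuming a local SQLite record manager and reusing `split_docs`, `vector_store`, and `index_name` from the workbook; `index()` is what skips duplicated and unchanged content on re-runs:
```python
from langchain.indexes import SQLRecordManager, index

# The record manager tracks document hashes so re-runs skip unchanged content
record_manager = SQLRecordManager(
    namespace="azure_ai_search/" + index_name, db_url="sqlite:///record_manager.db"
)
record_manager.create_schema()

result = index(
    split_docs,              # the chunked documents from earlier
    record_manager,
    vector_store,            # the AzureSearch store from earlier
    cleanup="incremental",   # remove stale versions of changed documents
    source_id_key="source",  # metadata field identifying each source document
)
print(result)  # counts of docs added, updated, skipped, deleted
```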
Retriever
LangChain supports different types of retrievers, including BM25, dense retrieval (vector-based), and hybrid approaches. Custom retrievers can be implemented to leverage the specific capabilities of Azure AI Search, allowing for fine-tuned search and retrieval processes tailored to the application’s requirements.
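For instance, with the `AzureSearch` store from the workbook, the retrieval mode can be chosen when converting the store; `search_type` and `k` are options of the AzureSearch retriever in recent langchain-community versions, and the query below is illustrative:
```python
# Hybrid retrieval blends vector similarity with full-text keyword matching
hybrid_retriever = vector_store.as_retriever(search_type="hybrid", k=4)

docs = hybrid_retriever.invoke("leave policy for new employees")
```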
Benefits
- Simplified Development: LangChain abstracts many complexities involved in building RAG-based applications, providing high-level interfaces and utilities.
- Flexibility: Developers can customize various components, such as retrievers and vector stores, to meet specific requirements.
- Scalability: Leveraging Azure’s cloud infrastructure, LangChain can scale to handle large datasets and high query volumes.
- Performance: Hierarchical indexing and efficient vector search integration ensure fast and accurate retrieval, enhancing the overall performance.
Real-World Applications:
Onboarding Assistant: Imagine a new employee joining a company. A RAG-powered assistant, built with this approach, can answer their questions about company policies and procedures in a natural, conversational manner. This not only streamlines onboarding but also fosters a more engaging experience for new hires.
Data Briefing Assistant: Researchers and analysts often face the challenge of quickly grasping key insights from large datasets. A RAG-based assistant can act as a valuable tool here. By retrieving relevant data summaries based on user queries and leveraging the LLM’s ability to synthesize different pieces of information, the assistant can deliver briefings that are both clear and useful, greatly speeding up the research process.
Conclusion
Using Retrieval-Augmented Generation (RAG) with Azure OpenAI and LangChain provides a robust framework for creating intelligent assistants that give accurate, relevant, and natural responses. This approach improves the precision and efficiency of large language models (LLMs) by grounding them in a large, pre-processed knowledge base, making it less likely that they produce irrelevant or fabricated information. Combining Azure AI Search for vector storage and retrieval with LangChain’s flexible retrieval mechanisms yields RAG implementations that are secure, scalable, and well suited to a wide range of real-world uses.
With Azure’s security measures, including encryption, role-based access, and compliance with regulations like GDPR and HIPAA, RAG systems can securely handle sensitive data, making them ideal for enterprise environments. LangChain further simplifies development, enabling rapid deployment and customization of RAG-based applications. This opens up many possibilities, from data briefing tools to onboarding assistants.