RagChatbot / README.md
sami606713's picture
Upload 17 files
27a8994 verified

Backend

This folder contains the backend services for the Document Chat App.

app.py

This file is the main entry point for the Streamlit web application. It handles the user interface, chat history management, and interacts with the agent to process user queries and generate responses.

main.py

This script is responsible for processing documents. It loads and extracts data (tables, texts, images) from PDF files in the data directory, summarizes them using the summerizer module, and then chunks and adds the processed documents to the vector store. It keeps track of processed files in processed_files.txt to avoid reprocessing.

data/

This directory is intended to store the raw PDF documents that need to be processed by the system.

vectorStore/

This directory stores the generated vector embeddings of the processed documents. These embeddings are used by the agent for retrieving relevant information during the chat.

agent/

This module contains the logic for the conversational agent, which uses the vector store to answer questions based on the processed documents.

summerizer/

This module provides functionalities for summarizing different types of content (text, images) extracted from the documents.

utils/

This module contains utility functions, such as helper.py for loading and extracting data from documents.

tool/

This module likely contains tools or functions used by the agent to perform specific tasks.

generator.py

This file likely contains code related to generating responses or content within the application.

How to Run

To run the backend application, you will typically run app.py using Streamlit after ensuring all dependencies are installed and documents are processed by main.py.

streamlit run app.py

Running main.py for Document Embedding

To process and embed documents, run the main.py script. This script will load PDF files from the data directory, extract and summarize their contents, and then add them to the vector store.

python main.py

Make sure that the data directory contains the PDF files you want to process. The script will log processed files in processed_files.txt to avoid reprocessing them.