CDRC Semantic Search System
Overview
The CDRC Semantic Search System is a project designed to enhance the search capabilities of the Centre for Consumer Data Research (CDRC) data catalogue. The goal is to implement a semantic search approach that goes beyond traditional keyword-based searches, providing users with more accurate and relevant results.
Features
- Semantic Search: Uses natural language processing to interpret the meaning of user queries rather than matching keywords alone, so relevant datasets can be found even when their descriptions use different wording.
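To show how this differs from keyword matching, here is a minimal Python sketch of embedding-based ranking. It assumes that the query and each catalogue entry have already been turned into numeric vectors by some embedding model. The three-dimensional vectors and dataset titles below are hypothetical and hand-picked for illustration. They are not taken from the CDRC catalogue or from this project's code.

```python
import math

# Hypothetical, hand-made 3-dimensional "embeddings" for illustration only.
# A real system would obtain these vectors from a trained embedding model.
CATALOGUE = {
    "Retail footfall counts": [0.9, 0.1, 0.0],
    "Household energy consumption": [0.1, 0.9, 0.1],
    "Online grocery orders": [0.7, 0.2, 0.3],
}


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vector, catalogue, top_k=2):
    """Rank catalogue entries by cosine similarity to the query embedding."""
    scored = [
        (title, cosine_similarity(query_vector, vector))
        for title, vector in catalogue.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embedding of the query "shopping activity": it shares no
# words with any title, but its vector points close to the retail entries.
query = [0.8, 0.1, 0.1]
for title, score in semantic_search(query, CATALOGUE):
    print(f"{score:.3f}  {title}")
```

With these toy vectors, the retail footfall and online grocery entries rank above the energy dataset even though the query shares no words with any title. Cosine similarity is used here because it is a common way to compare embeddings. The project itself may use a different measure or index.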
System Architecture
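The CDRC Semantic Search System follows a standard Retrieval-Augmented Generation (RAG) architecture. In this architecture, documents relevant to a query are first retrieved and then passed to a large language model as context for generating a response.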
Credit to Heiko Hotz (https://towardsdatascience.com/rag-vs-finetuning-which-is-the-best-tool-to-boost-your-llm-application-94654b1eaba7)
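The RAG flow can be sketched in Python as three steps: retrieve relevant entries, augment the prompt with them, and generate a response. This is an illustrative sketch, not the project's implementation. `fake_retrieve` and `fake_generate` are hypothetical stand-ins for a vector-store query and a large language model call, and the prompt wording is an assumption.

```python
from typing import Callable, List

# Hypothetical function types: a retriever maps (query, k) to passages,
# and a generator maps a prompt to model output.
Retriever = Callable[[str, int], List[str]]
Generator = Callable[[str], str]


def build_prompt(query: str, passages: List[str]) -> str:
    """Augment the user query with numbered retrieved passages."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the catalogue entries below.\n"
        f"{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


def rag_answer(query: str, retrieve: Retriever, generate: Generator, top_k: int = 3) -> str:
    passages = retrieve(query, top_k)       # 1. retrieval
    prompt = build_prompt(query, passages)  # 2. augmentation
    return generate(prompt)                 # 3. generation


# Hypothetical stand-ins so the sketch runs without external services.
def fake_retrieve(query: str, k: int) -> List[str]:
    docs = ["Retail footfall counts by town centre", "Online grocery orders by region"]
    return docs[:k]


def fake_generate(prompt: str) -> str:
    # Count the numbered context lines instead of calling a real model.
    n = sum(1 for line in prompt.splitlines() if line.startswith("["))
    return f"(model output based on {n} retrieved entries)"


print(rag_answer("Which datasets describe shopping behaviour?", fake_retrieve, fake_generate))
```

Passing the retriever and generator in as functions keeps the flow independent of any particular vector store or model, so either can be swapped without changing `rag_answer`.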
Installation
To get started with the CDRC Semantic Search System, follow these steps:
1. Clone the repository:
   git clone https://github.com/cjber/cdrc-semantic-search.git
2. Install the dependencies, either with pip:
   cd cdrc-semantic-search
   pip install -r requirements.txt
   or with pdm:
   cd cdrc-semantic-search
   pdm install
3. Configure the system: edit the config/config.toml file to customize settings such as API keys or model settings.
4. Run the system using the DVC pipeline:
   dvc repro
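DVC pipelines are normally defined as stages in a `dvc.yaml` file. `dvc repro` re-runs a stage only when its dependencies or command have changed since the hashes recorded in `dvc.lock`, among other checks. The toy Python model below sketches that idea only; it is not DVC's implementation. The stage names (`embed`, `index`) and file contents are hypothetical rather than taken from this project's pipeline. Stages are assumed to be listed in dependency order, whereas DVC works the order out from its dependency graph.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Stage:
    name: str
    deps: List[str]  # names of the "files" this stage reads
    run: Callable[[Dict[str, str]], Dict[str, str]]  # returns the files it writes


def digest(files: Dict[str, str], names: List[str]) -> str:
    """Hash the names and contents of the given dependency files."""
    h = hashlib.sha256()
    for name in names:
        h.update(name.encode())
        h.update(files[name].encode())
    return h.hexdigest()


def repro(stages: List[Stage], files: Dict[str, str], lock: Dict[str, str]) -> List[str]:
    """Run stages in order, skipping any whose dependency hash matches the lock."""
    executed = []
    for stage in stages:
        current = digest(files, stage.deps)
        if lock.get(stage.name) == current:
            continue  # dependencies unchanged since the last run: skip
        files.update(stage.run(files))  # (re)compute this stage's outputs
        lock[stage.name] = current  # record the hash, loosely like dvc.lock
        executed.append(stage.name)
    return executed


# Hypothetical two-stage pipeline over in-memory "files".
files = {"catalogue.json": "raw metadata", "config.toml": "settings"}
stages = [
    Stage("embed", ["catalogue.json", "config.toml"],
          lambda f: {"embeddings.bin": "emb(" + f["catalogue.json"] + ")"}),
    Stage("index", ["embeddings.bin"],
          lambda f: {"index.bin": "idx(" + f["embeddings.bin"] + ")"}),
]
lock: Dict[str, str] = {}
print(repro(stages, files, lock))  # first run: ['embed', 'index']
print(repro(stages, files, lock))  # nothing changed: []
files["catalogue.json"] = "updated metadata"
print(repro(stages, files, lock))  # input changed: ['embed', 'index']
```

Because a downstream stage hashes the outputs of the stage before it, a change to an early input propagates through the pipeline, while an unchanged pipeline does no work on a second run.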
Note: The CDRC Semantic Search System is an ongoing project, and we appreciate your feedback and support in making it a valuable tool for researchers at the Centre for Consumer Data Research.