CDRC Semantic Search System

Published: May 23, 2024

GitHub README

Overview

The CDRC Semantic Search System is a project to improve search over the Consumer Data Research Centre (CDRC) data catalogue. It implements semantic search that goes beyond traditional keyword matching, with the aim of returning more accurate and relevant results.

Features

  • Semantic Search: Uses natural language processing to interpret the meaning of a user query rather than matching its exact keywords, giving a more intuitive and precise search experience.
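To illustrate the difference between keyword matching and semantic search, here is a minimal sketch. It is not the project's code: the dataset titles, the query, and the hand-written three-dimensional vectors are all made up for illustration, and a real system would obtain its vectors from an embedding model.

```python
import math

# Hypothetical dataset titles with hand-written toy "embeddings".
# A real system would compute these with an embedding model.
TOY_EMBEDDINGS = {
    "High street footfall counts": [0.9, 0.1, 0.0],
    "Residential broadband speeds": [0.0, 0.2, 0.9],
    "Retail centre boundaries": [0.5, 0.8, 0.1],
}

QUERY = "how many people walk past stores"
QUERY_VECTOR = [0.85, 0.2, 0.05]  # toy embedding of QUERY (assumed)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_match(query, title):
    """True if the query and the title share at least one word."""
    return bool(set(query.lower().split()) & set(title.lower().split()))


def semantic_rank(query_vector, embeddings):
    """Return titles ordered from most to least similar to the query."""
    return sorted(
        embeddings,
        key=lambda title: cosine(query_vector, embeddings[title]),
        reverse=True,
    )


ranked = semantic_rank(QUERY_VECTOR, TOY_EMBEDDINGS)
print("keyword hits:", [t for t in TOY_EMBEDDINGS if keyword_match(QUERY, t)])
print("semantic ranking:", ranked)
```

The query shares no words with any title, so keyword matching finds nothing, while ranking by vector similarity still puts the footfall dataset first.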

Streamlit demo (19/12/23)

System Architecture

The CDRC Semantic Search System follows a standard Retrieval Augmented Generation (RAG) architecture:

Credit to Heiko Hotz (https://towardsdatascience.com/rag-vs-finetuning-which-is-the-best-tool-to-boost-your-llm-application-94654b1eaba7)
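The general shape of a RAG pipeline (retrieve relevant documents, add them to the prompt, then generate) can be sketched as follows. This is a toy illustration under stated assumptions, not the system's implementation: `toy_embed` uses plain word counts as a stand-in for a real embedding model, `stub_llm` stands in for a language model call, and the document strings are invented.

```python
import math
from collections import Counter
from typing import Callable

# Hypothetical catalogue entries, for illustration only.
DOCUMENTS = [
    "Footfall counts for retail locations",
    "Broadband speed measurements by postcode",
    "Retail centre boundary polygons",
]


def toy_embed(text: str) -> Counter:
    """Stand-in embedding: lower-cased word counts, not a real model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Retrieval step: return the k documents most similar to the query."""
    query_vector = toy_embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query_vector, toy_embed(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Augmentation step: place the retrieved text ahead of the question."""
    lines = "\n".join(f"- {item}" for item in context)
    return f"Answer using only this context:\n{lines}\n\nQuestion: {query}"


def rag_answer(
    query: str, documents: list[str], generate: Callable[[str], str], k: int = 2
) -> str:
    """Generation step: pass the augmented prompt to a text generator."""
    return generate(build_prompt(query, retrieve(query, documents, k)))


def stub_llm(prompt: str) -> str:
    """Placeholder generator; a real system would call a language model."""
    return f"[stub reply to a {len(prompt)}-character prompt]"


print(rag_answer("broadband speed data", DOCUMENTS, stub_llm, k=1))
```

Passing the generator in as a function keeps the retrieval and prompt-building steps testable without access to a language model.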

Installation

To get started with the CDRC Semantic Search System, follow these steps:

  1. Clone the repository:

    git clone https://github.com/cjber/cdrc-semantic-search.git

  2. Install dependencies:

    With pip:

        cd cdrc-semantic-search
        pip install -r requirements.txt

    With pdm:

        cd cdrc-semantic-search
        pdm install

  3. Configure the system:

    Edit the config/config.toml file to customize settings such as API keys or model settings.

  4. Run the system using a DVC pipeline:

    dvc repro
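For the configuration step above, the sketch below shows one way a TOML settings file can be read and checked from Python. The section and key names (`api.key`, `model.name`, `model.top_k`) are hypothetical and may not match the real config/config.toml; check the file in the repository for the actual names.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older Pythons

# Hypothetical file contents; the real key names may differ.
EXAMPLE_CONFIG = """
[api]
key = "YOUR-API-KEY"

[model]
name = "example-model"
top_k = 5
"""

# Settings this sketch treats as mandatory (an assumption).
REQUIRED = {"api": ["key"], "model": ["name"]}


def parse_config(text: str) -> dict:
    """Parse TOML text and fail early if a required setting is missing."""
    settings = tomllib.loads(text)
    for section, keys in REQUIRED.items():
        for key in keys:
            if key not in settings.get(section, {}):
                raise KeyError(f"missing [{section}] {key}")
    return settings


settings = parse_config(EXAMPLE_CONFIG)
print(settings["model"]["name"], settings["model"]["top_k"])
```

To read a file on disk rather than a string, open it in binary mode and pass the handle to `tomllib.load`.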

Note: The CDRC Semantic Search System is an ongoing project. We appreciate your feedback and support in making it a valuable tool for researchers at the Consumer Data Research Centre.