About

Software Engineer at the Consumer Data Research Centre (CDRC), with over 5 years of experience programming in Python. PhD with published research papers in geographic information science and NLP.

In my current role, I have worked with LlamaIndex and LangChain to build an LLM-powered search system for the CDRC Data Catalogue, based on Retrieval-Augmented Generation (RAG). I am currently developing a data ingestion pipeline using Prefect to automate data checks, while building various data products, including a large-scale UK geocoded corpus from the Common Crawl.

Extensive experience with PyTorch (Lightning), Hugging Face Transformers, and the OpenAI API. MLOps and reproducibility with DVC, Prefect, GitHub Actions and Docker/Podman. Machine learning and data processing with scikit-learn, Pandas and Polars [1].


Footnotes

  1. My new favourite dataframe library.