UROP Proceeding 2024-25

School of Engineering
Department of Computer Science and Engineering

Retrieval Augmented Generation with Vector Database
Supervisor: ZHOU Xiaofang / CSE
Student: LI Xinwei / COGBM
Course: UROP 2100, Fall

Rapid progress in deep learning has made vector embeddings central to modern applications such as retrieval-augmented generation (RAG) and similarity search. This has driven the widespread adoption of vector databases and indices, which enable efficient approximate nearest neighbor (ANN) search over large datasets. PostgreSQL is a robust open-source object-relational database system, and pgvector is an extension that adds vector similarity search to PostgreSQL. This paper covers two topics: the processing of the HK_data_600M_embeddings_146358 dataset, and an experimental exploration of pgvector. The experiments measure the time taken by hybrid searches (vector similarity combined with an attribute filter) under predicates of different selectivity, check whether the searches return the requested number of neighbors, and assess recall.
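
To make the evaluation terms concrete, the following is a minimal sketch of how a hybrid query and its recall might be expressed. It is not the setup used in this project: the table name `items`, the columns `id`, `attr`, and `embedding`, the helper names, and the toy rows are all hypothetical. The only pgvector feature assumed is the `<->` L2 distance operator. The ground truth is computed by brute force (filter first, then rank by distance), and recall is the fraction of those true neighbors that an approximate search returns.

```python
import math

# Hypothetical hybrid query: table and column names are assumptions,
# "<->" is pgvector's L2 distance operator. Shown for illustration only;
# it is not executed in this sketch.
HYBRID_SQL = (
    "SELECT id FROM items WHERE attr < %s "
    "ORDER BY embedding <-> %s LIMIT %s"
)

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_filtered_knn(rows, query, k, predicate):
    """Brute-force ground truth: apply the filter, then rank by L2 distance.

    rows is an iterable of (id, vector, attr) tuples.
    """
    candidates = [(l2(vec, query), rid) for rid, vec, attr in rows if predicate(attr)]
    candidates.sort()
    return [rid for _, rid in candidates[:k]]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true neighbors that the approximate search returned."""
    if not exact_ids:
        return 1.0  # nothing to find, so nothing was missed
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Toy data (made up): (id, embedding, attr)
rows = [
    (1, [0.0, 0.0], 5),
    (2, [1.0, 0.0], 50),   # excluded by the predicate attr < 10
    (3, [0.0, 2.0], 7),
    (4, [3.0, 3.0], 9),
]
exact = exact_filtered_knn(rows, [0.0, 0.0], 2, lambda a: a < 10)
```

In this sketch, an approximate search that returned ids `[1, 4]` would share one of the two true neighbors with `exact`. An approximate result shorter than `k` would correspond to the "too few neighbors" case that the experiments check for.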
