Our proposal for an open source Datafari focusing on AI has been accepted by the NGI Search consortium. It receives €150,000 in funding to move Datafari towards the latest AI search technologies.
Nine years after its birth, and traditionally oriented towards keyword-based search technologies such as BM25, Datafari Enterprise Search is moving towards vector search and large language models (LLMs). The objective is for users to be able to converse with their documents: to ask questions in natural language and to get either the most relevant documents or generated answers, depending on their needs. To support these technological efforts, and because these enhancements will be fully available in the open source edition of Datafari Enterprise Search, the NGI Search EU consortium welcomed the Neural Datafari proposal as one of its beneficiaries. The funded Neural Datafari project will fulfill the following major promises:
For Apache Solr:
- Enhancing the embedding phase for incoming documents. This means that for Apache Solr users, it will be simpler to enrich documents with vector embeddings at indexing time. The actual LLM-based vector embedding will be external to Solr, but its usage will be simplified as much as possible.
- Adding automatic embedding of user queries for k-nearest-neighbor (KNN) search. The embedding itself will be external to Solr, as above.
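To make the two Solr items above more concrete, here is a minimal sketch of what embedding at indexing time and at query time can look like today when done by hand. It is not the Neural Datafari implementation. The `toy_embed` function is a hypothetical stand-in for an external embedding model, and the field names `vector` and `content` are assumptions; the only Solr-specific element is the `{!knn}` query parser syntax available in recent Solr versions.

```python
import json


def toy_embed(text, dims=4):
    """Hypothetical stand-in for an external embedding model.

    A real setup would call an LLM-based embedding service instead.
    Returns a unit-length vector of `dims` floats.
    """
    vec = [0.0] * dims
    for i, ch in enumerate(text.lower()):
        vec[i % dims] += ord(ch) / 1000.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [round(v / norm, 4) for v in vec]


def build_indexed_doc(doc_id, text, field="vector"):
    """Enrich a document with its embedding before sending it to Solr.

    Assumes `field` is declared as a dense vector field in the schema,
    with a dimension matching the embedding size.
    """
    return {"id": doc_id, "content": text, field: toy_embed(text)}


def build_knn_params(query_text, field="vector", top_k=10):
    """Embed the user query and wrap it in Solr's {!knn} query parser."""
    vector = toy_embed(query_text)
    knn_query = "{!knn f=%s topK=%d}%s" % (field, top_k, json.dumps(vector))
    return {"q": knn_query, "fl": "id,score"}


if __name__ == "__main__":
    print(build_indexed_doc("doc-1", "Datafari moves towards vector search"))
    print(build_knn_params("how does vector search work?", top_k=5))
```

The resulting dictionaries would typically be sent to Solr's update and select handlers respectively. The point of the project items above is to remove the need for this kind of manual glue on both the indexing and the query side.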
For Datafari Community Edition:
- Integration of the new vector search capabilities of Apache Solr into Datafari
- Addition of RAG capabilities to Datafari
This project is led by France Labs, makers of Datafari, and is carried out together with Sease.io, a UK-based company specializing in open source search, in particular Apache Solr and OpenSearch. They are already major contributors to Apache Solr in the AI domain.
The NGI Search consortium is composed of two universities (Aarhus and Murcia), two SMEs (FundingBox and Linknovate Science), and one open source community (OW2).