While waiting for version 2.0 of Constellio, we decided to draw and explain the system architecture of Constellio 1.3.
We thought it could help you better understand how things work. This entry does not cover how these components map to Java classes, servlets, files and databases, but it gives a good overview of how the system works.
This is the first entry in a series, since explaining all the components in a single entry would take too much time and space.
Before we explain the components, let us recall that search engines have three layers: the data retrieval layer (aka connectors), the indexing layer, and the searching layer. Wherever applicable, I'll use this terminology to describe the components.
This first entry explains the data retrieval part.
On the left, you have everything related to the connectors, i.e. the data retrieval layer. Historically, Constellio relies on the Google Connector Framework. This may change in version 2.0, but up to 1.3 Constellio relies on the Google connectors (BTW, thanks again to Google for releasing these connectors under the Apache License 2.0!).
We grouped these elements in the Google Connector Framework box, although the box does not correspond to Java code per se.
This box is made up of the Google Connector Manager and the set of connectors available in Constellio. The Google Connector Manager (version 2.6 for Constellio 1.3) is the entry point for interacting with the connectors. The connectors do the real work: they connect to the source systems, fetch the data and send it back to the Manager. The Connector Manager is also the access point for all configuration information, which is why the Constellio admin component makes calls to the Google Connector Manager: to read or update the connectors' configuration.
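To make the roles in this box more concrete, here is a minimal Java sketch of the interactions just described: a manager that holds each connector together with its configuration, lets the admin component read and update that configuration, and runs a connector so that the fetched documents are pushed downstream. All names here (`ConnectorManagerSketch`, `Connector`, `FetchedDocument`, `fetch`, `traverse`, `getConfig`, `updateConfig`) are hypothetical. This is not the actual Google Connector Manager SPI, only an illustration of the responsibilities under our own assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class ConnectorManagerSketch {

    // Hypothetical document returned by a connector: an id, its content and its metadata.
    record FetchedDocument(String id, String content, Map<String, String> metadata) {}

    // Hypothetical connector contract: connect to a source system and return what was found.
    interface Connector {
        List<FetchedDocument> fetch(Map<String, String> config);
    }

    // Hypothetical manager: the single entry point holding connectors and their configuration.
    static class ConnectorManager {
        private final Map<String, Connector> connectors = new HashMap<>();
        private final Map<String, Map<String, String>> configs = new HashMap<>();

        void register(String name, Connector connector, Map<String, String> config) {
            connectors.put(name, connector);
            configs.put(name, new HashMap<>(config)); // mutable copy so the admin can update it
        }

        // Called by the admin component to read a connector's configuration.
        Map<String, String> getConfig(String name) {
            return Map.copyOf(configs.get(name));
        }

        // Called by the admin component to update one configuration entry.
        void updateConfig(String name, String key, String value) {
            configs.get(name).put(key, value);
        }

        // Runs one connector with its current configuration and hands each document to the sink.
        int traverse(String name, Consumer<FetchedDocument> sink) {
            List<FetchedDocument> docs = connectors.get(name).fetch(configs.get(name));
            docs.forEach(sink);
            return docs.size();
        }
    }

    public static void main(String[] args) {
        ConnectorManager manager = new ConnectorManager();
        // A toy "files" connector that pretends the configured root folder holds two files.
        manager.register("files", config -> List.of(
                new FetchedDocument(config.get("root") + "/a.txt", "hello", Map.of("type", "text")),
                new FetchedDocument(config.get("root") + "/b.txt", "world", Map.of("type", "text"))),
                Map.of("root", "/data"));

        // The admin component changes the configuration through the manager, not the connector.
        manager.updateConfig("files", "root", "/shared");

        List<FetchedDocument> received = new ArrayList<>();
        int count = manager.traverse("files", received::add);
        System.out.println(count + " documents, first id = " + received.get(0).id());
    }
}
```

Running it prints `2 documents, first id = /shared/a.txt`. The design point is that neither the admin component nor the downstream consumer talks to a connector directly; both go through the manager.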
Constellio provides a set of connectors: some are provided directly by Google under the Apache license, and others are implemented by Doculibre while following the Google Connector Framework so that the Google Connector Manager can manage them.
The most popular connectors are: Files (provided by Google), Web (provided by Doculibre), DB (provided by Google), SharePoint (provided by Google) and Email (provided by Doculibre).
Once the connectors have retrieved the data, the Connector Manager pushes it through the Constellio Feed Component. This component stores every retrieved document, together with its metadata, in a local store (a DB table); each entry is called a record. This stored content is what the indexing manager uses for indexing, which will be explained in Part 2 of this blog series.
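The feed step described above can be illustrated with a minimal Java sketch. It is a hedged, in-memory illustration: a map stands in for the DB table, and we assume that a re-fetched document replaces its earlier record. The names (`FeedComponentSketch`, `StoredRecord`, `RecordStore`, `Feed`, `push`, `save`, `all`) are hypothetical and do not come from Constellio's code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FeedComponentSketch {

    // Hypothetical shape of one stored entry (a "record"): record id, document id, content, metadata.
    record StoredRecord(long recordId, String documentId, String content, Map<String, String> metadata) {}

    // Stand-in for the local DB table, keyed by document id; insertion order is preserved.
    static class RecordStore {
        private final Map<String, StoredRecord> table = new LinkedHashMap<>();
        private long nextId = 1;

        // Saves a document; assumption: a re-fetched document keeps its record id and is overwritten.
        StoredRecord save(String documentId, String content, Map<String, String> metadata) {
            StoredRecord previous = table.get(documentId);
            long id = previous != null ? previous.recordId() : nextId++;
            StoredRecord rec = new StoredRecord(id, documentId, content, Map.copyOf(metadata));
            table.put(documentId, rec);
            return rec;
        }

        // What a downstream indexing step would read back.
        List<StoredRecord> all() {
            return List.copyOf(table.values());
        }
    }

    // Hypothetical feed entry point, called by the connector manager for each fetched document.
    static class Feed {
        private final RecordStore store;

        Feed(RecordStore store) {
            this.store = store;
        }

        void push(String documentId, String content, Map<String, String> metadata) {
            store.save(documentId, content, metadata);
        }
    }

    public static void main(String[] args) {
        RecordStore store = new RecordStore();
        Feed feed = new Feed(store);
        feed.push("http://intranet/page1", "v1", Map.of("connector", "web"));
        feed.push("http://intranet/page2", "text", Map.of("connector", "web"));
        feed.push("http://intranet/page1", "v2", Map.of("connector", "web")); // re-fetch of page1
        for (StoredRecord r : store.all()) {
            System.out.println(r.recordId() + " " + r.documentId() + " " + r.content());
        }
    }
}
```

Running it prints two lines, `1 http://intranet/page1 v2` and `2 http://intranet/page2 text`: the second fetch of page1 updates its existing record instead of creating a new one. Decoupling the feed from indexing this way means the indexing side only ever reads from the store, never from the connectors.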