Tutorial on Authorizations for Manifold CF and Solr

NOTE: If you are interested in using ManifoldCF with Solr, you may want to look at our Datafari software, which combines Apache ManifoldCF with Solr and so eases this kind of integration. The code is available on GitHub: https://github.com/francelabs/datafari

Manifold CF (MCF) provides an early-binding authorization mechanism for file searches. The aim of this entry is to describe this mechanism, and then to show you the different steps needed to configure MCF and Solr to use this functionality.

MCF extracts ACLs from files at crawl time and injects them into Solr as dedicated fields on each Solr document.
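As a rough illustration of the idea, here is a minimal sketch of what a Solr document carrying its ACLs as fields could look like. The field names `allow_token_document` and `deny_token_document`, the helper name and the sample token are assumptions made for this example; check your own MCF and Solr configuration for the names actually in use.

```python
import json


def build_solr_document(doc_id, content, allow_tokens, deny_tokens):
    """Build a Solr document that stores its ACLs as multi-valued fields.

    The two ACL field names below are assumptions for illustration only.
    """
    return {
        "id": doc_id,
        "content": content,
        # Tokens (users or groups) allowed to see the document.
        "allow_token_document": sorted(allow_tokens),
        # Tokens explicitly denied access to the document.
        "deny_token_document": sorted(deny_tokens),
    }


doc = build_solr_document(
    "file://server/share/report.doc",
    "Quarterly report",
    allow_tokens={"S-1-5-21-1001"},
    deny_tokens=set(),
)
print(json.dumps(doc, indent=2))
```

Because the ACLs travel with the document, they are only as fresh as the last crawl.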

Tutorial for combining ManifoldCF and Solr for file search

With the arrival of Manifold CF 1.0 (now already in v2.5), the open source community is looking for tutorials on combining it with Solr 4. That’s the intent of this tutorial, which will walk you through the different steps required to make it work.

First, we’ll recap the installation process of Manifold CF (we’ll call it MCF later on) and of Solr. Second, we’ll configure both tools so that they can interact with each other. Third, we’ll configure MCF so that it crawls a Windows file share. In this tutorial, when I mention an installation directory such as solr-4.1.0, you have to replace it with the absolute path of that installation directory.

Slow Constellio admin interface

During a customer installation of Constellio 1.3, we noticed that the admin interface loaded extremely slowly, which is rather unusual (it is not that slow on other installs with the same amount of indexed content).
After some analysis, we found that the Constellio admin UI sends a query to its Solr instance every 5 seconds to get the number of indexed files and refresh that figure in the UI. In our particular installation, this query took a very long time to process (although user queries were very fast). We could have reduced the query frequency, but the query itself was slow. So we decided to change the query in order to get a much better query time.
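The excerpt above does not show the original query or its replacement. Purely as a general illustration, and not as the fix described in the post, one common way to ask Solr for a document count cheaply is to request zero rows and read `numFound` from the response. The base URL, the core name and both helper names below are assumptions for this sketch.

```python
from urllib.parse import urlencode


def build_count_url(solr_base_url, core):
    """Build a select URL that asks only for the hit count (no documents)."""
    params = {"q": "*:*", "rows": 0, "wt": "json"}
    return f"{solr_base_url}/{core}/select?{urlencode(params)}"


def extract_count(response_json):
    """Read the total number of matching documents from a Solr JSON response."""
    return response_json["response"]["numFound"]


# Hypothetical host and core name, for illustration only.
url = build_count_url("http://localhost:8983/solr", "collection1")

# Sample response shaped like Solr's JSON output, used instead of a live call.
sample_response = {"response": {"numFound": 1234, "start": 0, "docs": []}}
count = extract_count(sample_response)
```

With `rows=0`, Solr still computes the total but does not have to fetch and return any stored fields.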

Searching everything = Talend + Constellio + Solr

We are preparing a series of blog entries for January/February 2013 on combining Talend, Constellio and Solr, so that Constellio can benefit from the many additional connectors that Talend provides. We haven’t yet had time to work on a pure Talend + Solr solution, which would leverage ManifoldCF, so our entries will be about using the Google Connector Manager used in Constellio 1.3.

Don’t hesitate to let us know if this blog series interests you.

Constellio 1.3 architecture part 3

While waiting for the 2.0 version of Constellio, we have decided to draw and explain the system architecture of Constellio 1.3.
We thought it could help readers better understand the way things work. This entry does not cover how these components are mapped to Java classes, servlets, files and databases, but it gives a good overview of how it works.

This is the third and last entry of a series, as it would take too much time and space to explain all the components in one entry.

Activating early binding in Constellio 1.3

While waiting for Constellio V2.0, we thought you might be interested in seeing how to activate early binding in Constellio 1.3.
As a reminder, there are two ways to manage security for document search: early binding and late binding. By security management, we mean that an authenticated user of the search engine sees, in response to a search request, only the results he is actually allowed to see.
Early binding is the recommended way as it provides the fastest response time. It consists of storing the ACL (Access Control List) of each indexed document as part of the index, as an additional field of the Lucene index. Thus, when someone does a search, his username is appended to the search query, and the results are filtered on that field based on his username. The advantage is that it only adds to the search time the time it takes to filter on a field (which means a very small overhead). The drawback is that the document ACLs are only synchronised when the documents are recrawled and reindexed. So if you schedule a crawl every night, your indexed ACLs will only be updated every night, hence a potential one-day discrepancy. Still, this is the recommended way for standard scenarios, as most enterprise needs don’t require a to-the-minute update of file ACLs.
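The query-time side of early binding can be sketched as follows. This is a minimal illustration under stated assumptions, not Constellio's actual implementation: the field name `acl`, the helper names and the optional list of groups are all hypothetical, and the filter is expressed as a Solr `fq` parameter.

```python
def quote_term(term):
    """Quote a value for use in a Solr query, escaping backslashes and quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_acl_filter(username, groups=(), acl_field="acl"):
    """Build a filter that matches documents whose ACL field lists the user.

    `acl_field` is a hypothetical field name; groups are an optional extension.
    """
    tokens = [username, *groups]
    return f"{acl_field}:(" + " OR ".join(quote_term(t) for t in tokens) + ")"


def build_search_params(user_query, username, groups=()):
    """Combine the user's query with the ACL filter added by the search engine."""
    return {"q": user_query, "fq": build_acl_filter(username, groups)}


params = build_search_params("budget 2013", "alice", ["finance"])
print(params)
```

The user's own query is left untouched in `q`, while the restriction lives in `fq`, so the only extra cost at search time is one field filter.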

Constellio 1.3 architecture part 2

While waiting for the 2.0 version of Constellio, we have decided to draw and explain the system architecture of Constellio 1.3.
We thought it could help readers better understand the way things work. This entry does not cover how these components are mapped to Java classes, servlets, files and databases, but it gives a good overview of how it works.

This is the second entry of a series, as it would take too much time and space to explain all the components in one entry.

Constellio 1.3 architecture Part 1

While waiting for the 2.0 version of Constellio, we have decided to draw and explain the system architecture of Constellio 1.3.
We thought it could help readers better understand the way things work. This entry does not cover how these components are mapped to Java classes, servlets, files and databases, but it gives a good overview of how it works.

This is the first entry of a series, as it would take too much time and space to explain all the components in one entry.