Solr

Solr is the popular, blazing fast, open source enterprise search platform built on Apache Lucene™. Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Solr powers the search and navigation features of many of the world's largest internet sites.

Solr indexing in shanoir-ng

Data to be indexed

One Solr document corresponds to one dataset in Shanoir. We index information about the dataset, examination, subject and study, so that Solr can search across all of them. Each Solr document has the following indexed fields:

datasetId
datasetName
datasetType (to be confirmed: MR, CT, PET?)
datasetNature (to be confirmed: T1, T2, Diff, etc.?)
datasetCreationDate
examinationComment
examinationDate
subjectName
studyId
studyName

Since Shanoir-ng is built on a microservice architecture, dataset and examination information is available directly in the datasets microservice, whereas study and subject information must be obtained from the studies microservice. Because Solr works at the document (dataset) level, the Solr indexer is implemented inside the datasets microservice. The information needed from the studies service is replicated into the datasets microservice via a message queue.

datasetId can be defined as the uniqueKey in the Solr schema, so that re-indexing a dataset replaces its existing document instead of creating a duplicate. studyId is indexed so that Shanoir's access rights can be applied to Solr searches.
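To make the document layout concrete, here is a minimal sketch of how one dataset and its context could be flattened into a single Solr document. The input shapes (plain dictionaries with keys such as `id`, `name`, `creation_date`) and the function name `build_solr_document` are assumptions made for illustration; they are not taken from the Shanoir code base. Only the output field names come from the list above.

```python
from datetime import date


def solr_date(day):
    """Format a date as the UTC timestamp string that Solr date fields expect."""
    return f"{day.isoformat()}T00:00:00Z"


def build_solr_document(dataset, examination, subject_name, study):
    """Flatten one dataset and its context into one Solr document.

    `dataset` and `examination` would come from the datasets microservice;
    `subject_name` and `study` from the data replicated from the studies
    microservice. All input shapes here are hypothetical.
    """
    return {
        "datasetId": dataset["id"],  # candidate uniqueKey
        "datasetName": dataset["name"],
        "datasetType": dataset.get("type"),
        "datasetNature": dataset.get("nature"),
        "datasetCreationDate": solr_date(dataset["creation_date"]),
        "examinationComment": examination["comment"],
        "examinationDate": solr_date(examination["date"]),
        "subjectName": subject_name,
        "studyId": study["id"],  # used later to filter by access rights
        "studyName": study["name"],
    }
```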

Indexing schedule

A full re-index (delete all documents, then index all datasets) runs every day at 6 am.
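The daily full re-index can be sketched as two small pieces: computing the next 6 am run, and building the sequence of JSON bodies sent to Solr's `/update` handler (delete everything, add all documents, commit). The function names are hypothetical, and the sketch only builds the payloads; how they are posted and how the job is actually scheduled inside the datasets microservice is not specified here.

```python
from datetime import datetime, time, timedelta


def next_full_reindex(now, hour=6):
    """Return the next daily run time strictly after `now`."""
    candidate = datetime.combine(now.date(), time(hour=hour), tzinfo=now.tzinfo)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def full_reindex_payloads(documents):
    """Build the JSON bodies for Solr's /update handler, in sending order."""
    return [
        {"delete": {"query": "*:*"}},  # remove all documents
        list(documents),               # index all: a JSON array of documents
        {"commit": {}},                # make the new index visible to searches
    ]
```

Sending the delete and the add before a single commit keeps the previous index searchable until the new one is complete.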

Solr search in shanoir-ng

The Solr page is accessible from a menu entry. On the left side, facet fields are displayed to configure the search and to refine the results.

Facet fields

studyName
subjectName
datasetName
datasetCreationDate
examinationComment

Facet search

The studyName facet narrows the results to one study selected from the full study list. The other facet fields also narrow the results, but through a text field, since each of their lists would contain many values.
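As an illustration, a faceted search can be expressed with Solr's standard `facet`, `facet.field` and `fq` request parameters. The sketch below builds the query string for a selection of facet values; the helper names and the exact-phrase matching of selected values are assumptions, not the behavior of the actual Shanoir front end.

```python
from urllib.parse import urlencode

FACET_FIELDS = [
    "studyName",
    "subjectName",
    "datasetName",
    "datasetCreationDate",
    "examinationComment",
]


def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_facet_query(selected):
    """Build a Solr query string with facets and one filter per selected value.

    `selected` maps a facet field name to the value chosen by the user.
    """
    params = [("q", "*:*"), ("facet", "true")]
    params += [("facet.field", field) for field in FACET_FIELDS]
    for field, value in selected.items():
        params.append(("fq", f"{field}:{quote_term(value)}"))
    return urlencode(params)
```

For example, `build_facet_query({"studyName": "Study A"})` requests facet counts for all five fields while restricting the results to one study.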

Full Text search

Maybe not required?

Results layout

The results are presented on the right side of the Solr page. They are displayed as a tree with four levels: studyName, then subjectName, then examinationComment and examinationDate, then datasetName.
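The tree layout amounts to grouping the flat list of Solr documents by study, subject and examination. A minimal sketch follows, assuming each result is a dictionary carrying the indexed fields; the function name and the way the examination label combines comment and date are illustrative choices.

```python
def build_result_tree(documents):
    """Group flat Solr results into study > subject > examination > datasets."""
    tree = {}
    for doc in documents:
        # One examination node shows both its comment and its date.
        exam_label = f'{doc["examinationComment"]} ({doc["examinationDate"]})'
        datasets = (
            tree.setdefault(doc["studyName"], {})
            .setdefault(doc["subjectName"], {})
            .setdefault(exam_label, [])
        )
        datasets.append(doc["datasetName"])
    return tree
```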

Actions

See dataset details

The dataset details page is accessed by clicking on the datasetName.

Download dataset

Download datasets in bulk

Access rights

Access to Solr documents follows Shanoir's roles and rights specification. A user can only see the datasets they have the right to see; download rights depend entirely on the user's role and rights in Shanoir. A user's access rights derive from their role and from the rights recorded in the study_user table (the can_see and can_download labels), so the Solr query is restricted to the studies the user has rights on. To build that list, we read the user's identifier from the Keycloak token and check their rights in the study_user table and the role table. The resulting list of studyId values is added to the Solr query, so that only datasets from studies the user is a member of can be seen and/or downloaded.
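The restriction by study can be sketched as a Solr filter query on studyId. In the sketch below, the row shape (`user_id`, `study_id`, and a `rights` collection holding labels such as can_see) is a hypothetical stand-in for the study_user table, and role-based rules from the role table are left out.

```python
def allowed_study_ids(study_user_rows, user_id, right="can_see"):
    """Return the sorted study ids on which the user holds the given right."""
    return sorted(
        {
            row["study_id"]
            for row in study_user_rows
            if row["user_id"] == user_id and right in row["rights"]
        }
    )


def access_filter(study_ids):
    """Build the Solr `fq` value limiting results to the given studies.

    Returns None when the list is empty: the caller should then return an
    empty result without querying Solr at all.
    """
    if not study_ids:
        return None
    return "studyId:(" + " OR ".join(str(i) for i in study_ids) + ")"
```

Passing `right="can_download"` to `allowed_study_ids` gives the corresponding list for download actions.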