Monthly MeSH Update – Summary

Summary of the monthly MeSH update (“Download Meshterms”) procedure: steps, resources (remote/local files), database tables, triggers, and success/error flows.

Code: XDPS/CochraneCMS (MedlineDownloader, MeshtermDownloader, MeshtermManager, MonthlyMeshtermExecutor, MonthlyMeshTermsPackageGenerator, etc.).
Relation to annual update: The tables filled by this flow (COCHRANE_MESHTERM_RECORD_DATES, COCHRANE_MESHTERM_RECORDS, COCHRANE_MESHTERM_DESCRIPTORS, COCHRANE_MESHTERM_QUALIFIERS) are read by Step 3 of the annual MeSH update (Update MeSH in WML3G). See ANNUAL_MESH_UPDATE_SUMMARY.md.


1. Clear steps (triggers + code)

| Step | Name | What happens | Code / trigger |
|---|---|---|---|
| 1 | Start Download Meshterms | User or scheduler starts the flow for an issue | Manual: Issue List → select issue → action "Download Meshterms" (IssueWrapper.MeshtermsDownloadAction). Scheduled: MonthlyMeshtermExecutor (cron cochrane.meshterm-updater.monthly-pattern, e.g. 0 20 0 1 * ? = 00:20 on 1st of month) finds open issue and runs same action if MeSH not already downloaded. |
| 2 | Medline download (PubMed) | For each CDSR review, fetch MeSH XML from PubMed and save to issue-specific folder | MedlineDownloader (JMS queue medline-downloader-service) → timer → MeshtermDownloader.download(params) per record. Params from getParams() = one entry per CDSR review (findSysrevReviewRecordNames()); destination = cms.resources.medline.downloader.downloads + currentIssue (e.g. …/202401). |
| 3 | Callback to MeshtermManager | After all downloads for the batch, send list of downloaded files to processor | MeshtermDownloaderCallback.sendCallback(destinationDirectory) → lists .xml files (excl. search.xml) in directory → creates MeshtermManagerParameters(files, issue, issueId, title) → sends to JMS queue meshterm-manager-service. |
| 4 | Process MeSH and fill DB | Parse each XML, create/update MeSH rows per record | MeshtermManager (JMS queue meshterm-manager-service) → process() → for each file (up to MAX_FILES_ALLOWED=10000): updateMeshterms(file, recordName, issue) → parse MeshHeadings, create/find COCHRANE_MESHTERM_RECORD_DATES, DESCRIPTORS, QUALIFIERS, COCHRANE_MESHTERM_RECORDS. If more files remain, re-queue; else finish(). |
| 5 | Mark issue and deliver DS MeSH | Set issue “meshterms downloaded”, trigger initial DS MeSH package | MeshtermManager.finish() → setIssueMeshtermsDownloaded(issueId, true); deliverInitialPackageForDS(issueId) → publish TYPE_DS_MESH for CCA DB (generate + send). |
| 6 | Generate and upload WML3G packages | Build zip(s) of WML3G for records with MeSH in this issue; deliver to content manager | findUpdatedRecords(issue, issueId) → meshtermStorage.findUpdatedRecords(issue) (record names where COCHRANE_MESHTERM_RECORD_DATES.date ≥ issue) → MonthlyMeshTermsPackageGenerator.generateAndUpload(recordNames, issueId, CDSR, WML3G_POSTFIX) → batches by cochrane.meshterm-updater.monthly-batch (e.g. 50), delay between batches cochrane.meshterm-updater.monthly-batch-interval (e.g. 30 min) → zip + deliverPackage / schedule PackageUploader. |

2. What happens on each step (inputs, process, outputs)

Step 1: Start Download Meshterms

  • Input: Selected issue (manual) or open issue for configured date (scheduler).
  • Process:
    • Set issue.meshtermsDownloading = true.
    • Build MeshtermDownloaderRequest(issue, issueId, title) with plan MESHTERM_DOWNLOAD and callback MeshtermDownloaderCallback.
    • Params: getParams() → EntireDBStorage.findSysrevReviewRecordNames() = all distinct CDSR record names (productSubtitle = REVIEWS). One download param per record: revmanId = record name, writeSearchResult = "false".
    • Destination directory: cms.resources.medline.downloader.downloads + currentIssue (e.g. …/cms/medline/downloaded/202401).
    • Send request to JMS queue queue/medline-downloader-service.
  • Output: Message in Medline download queue; issue flagged as “meshterms downloading”.
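
The request assembly in Step 1 can be sketched as follows. This is a minimal Python illustration, not the Java code in IssueWrapper: the function name `build_download_request` and the dictionary layout are hypothetical, and only the keys revmanId, destinationDirectory and writeSearchResult come from the description above.

```python
from pathlib import PurePosixPath


def build_download_request(record_names, downloads_base, issue):
    """Build one download param per distinct record name (hypothetical layout)."""
    # Destination = configured base directory + issue number, e.g. .../202401
    destination = str(PurePosixPath(downloads_base) / str(issue))
    params = [
        {
            "revmanId": name,
            "destinationDirectory": destination,
            "writeSearchResult": "false",
        }
        # Deduplicate, since the source describes "all distinct" record names.
        for name in sorted(set(record_names))
    ]
    return {"issue": issue, "destinationDirectory": destination, "params": params}
```

For example, passing two distinct record names and issue 202401 yields two param maps that share the same issue folder as destination.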

Step 2: Medline download (PubMed)

  • Input: IMedlineRequest (params list, destination directory). Remote: PubMed E-utilities (cms.cochrane.medline.esearch.url, cms.cochrane.medline.efetch.url – e.g. esearch.fcgi, efetch.fcgi).
  • Process:
    • MedlineDownloader.onMessage → init MeshtermDownloader, create timer. If cms.cochrane.meshterm.record.update.calendar is true, getWaitTime() restricts processing to 5:00–21:00 weekdays (GMT-5); else immediate.
    • On timer: for each param map (revmanId, destinationDirectory): MeshtermDownloader.download(p):
      • ESearch: esearch.fcgi?db=pubmed&retmode=xml&term=<revmanId>[page] AND medline[sb]&sort=pub+date → take first PMID from IdList.
      • EFetch: efetch.fcgi?db=pubmed&retmode=xml&id=<id> → write response to <destinationDirectory>/<revmanId>.xml. Optionally write search result to <revmanId>.search.xml if writeSearchResult=true.
    • If a batch fails or there are remaining params, re-queue request with remaining params. If all done, finishProcess() → callback.sendCallback(destinationDirectory).
  • Output (local): One XML file per record in <cms.resources.medline.downloader.downloads><currentIssue>/ (e.g. CD001234.xml). DB: None in this step.
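
The two E-utilities calls can be illustrated with a short Python sketch. It only builds the URLs and extracts the first PMID from an ESearch response; it performs no network access. The helper names are hypothetical, and the query parameters are taken from the description above rather than from the configured URL properties (whose full values are not shown here).

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base endpoints only; the configured properties may carry extra parameters.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Shortened, made-up ESearch response used only for illustration.
SAMPLE_ESEARCH = (
    "<eSearchResult><Count>2</Count>"
    "<IdList><Id>111</Id><Id>222</Id></IdList></eSearchResult>"
)


def esearch_url(record_name):
    """URL that searches PubMed for the record name in the [page] field."""
    query = {
        "db": "pubmed",
        "retmode": "xml",
        "term": f"{record_name}[page] AND medline[sb]",
        "sort": "pub date",  # encoded as pub+date
    }
    return f"{ESEARCH_URL}?{urlencode(query)}"


def first_pmid(esearch_xml):
    """Return the first Id in IdList, or None when the search found nothing."""
    node = ET.fromstring(esearch_xml).find("./IdList/Id")
    return node.text if node is not None else None


def efetch_url(pmid):
    """URL that fetches the full PubMed XML (including MeSH) for one PMID."""
    return f"{EFETCH_URL}?{urlencode({'db': 'pubmed', 'retmode': 'xml', 'id': pmid})}"
```

A record with no ESearch hit yields no PMID, so in this sketch the caller would skip the EFetch call for it.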

Step 3: Callback to MeshtermManager

  • Input: destinationDirectory (path to issue folder).
  • Process: MeshtermDownloaderCallback.sendCallback(path) → list files with .xml and without search.xml → MeshtermManagerParameters(files, issue, issueId, title) → send to JMS queue meshterm-manager-service.
  • Output: Message to MeshtermManager queue.
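
The file selection in this step can be sketched in Python as below. The function name is hypothetical, and "without search.xml" is interpreted here as "file name does not end in search.xml", which is an assumption about the actual filter.

```python
from pathlib import Path


def list_mesh_files(directory):
    """List record XML files in the issue folder, skipping search results."""
    folder = Path(directory)
    if not folder.is_dir():
        # Mirrors the error case where the directory is missing or unreadable.
        raise FileNotFoundError(f"issue folder not found: {directory}")
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file()
        and p.name.endswith(".xml")
        and not p.name.endswith("search.xml")
    )
```

With CD001234.xml, CD001234.search.xml and notes.txt in the folder, only CD001234.xml would be passed on to MeshtermManager.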

Step 4: Process MeSH and fill DB

  • Input (local): XML files in issue folder (PubMed MeSH XML with <MeshHeading>, <DescriptorName>, <QualifierName>, MajorTopicYN).
  • Process: MeshtermManager.process(parameters):
    • For each file (batch of up to 10000): recordName = filename without extension; updateMeshterms(file, recordName, issue):
      • If record not in COCHRANE_MESHTERM_RECORD_DATES: createRecordDate(recordName, issue).
      • Parse MeshHeadings → for each descriptor/qualifier: findDescriptorId / findQualifierId (create COCHRANE_MESHTERM_DESCRIPTORS / COCHRANE_MESHTERM_QUALIFIERS if new); findMeshtermRecordId (create COCHRANE_MESHTERM_RECORDS row if new).
      • Remove COCHRANE_MESHTERM_RECORDS rows for this record that are no longer in the new set.
      • If any insert/update: updateRecordDate(recordName, issue).
    • If more files: re-queue parameters with incremented iterator. Else: finish(parameters).
  • Output (DB): Rows in COCHRANE_MESHTERM_RECORD_DATES, COCHRANE_MESHTERM_RECORDS, COCHRANE_MESHTERM_DESCRIPTORS, COCHRANE_MESHTERM_QUALIFIERS.
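
To make the parsing part of Step 4 concrete, here is a minimal Python sketch that flattens the MeshHeading elements of a PubMed XML document into (descriptor, descriptor is major, qualifier, qualifier is major) tuples. The tuple layout and the function name are illustrative assumptions; the element and attribute names are those listed in the input above.

```python
import xml.etree.ElementTree as ET

# Shortened, made-up PubMed record used only for illustration.
SAMPLE_PUBMED = """<PubmedArticleSet><PubmedArticle><MedlineCitation><MeshHeadingList>
<MeshHeading><DescriptorName MajorTopicYN="N">Humans</DescriptorName></MeshHeading>
<MeshHeading>
  <DescriptorName MajorTopicYN="N">Asthma</DescriptorName>
  <QualifierName MajorTopicYN="Y">drug therapy</QualifierName>
  <QualifierName MajorTopicYN="N">therapy</QualifierName>
</MeshHeading>
</MeshHeadingList></MedlineCitation></PubmedArticle></PubmedArticleSet>"""


def parse_mesh_headings(xml_text):
    """Return one tuple per descriptor/qualifier pair found in the document."""
    rows = []
    for heading in ET.fromstring(xml_text).iter("MeshHeading"):
        descriptor = heading.find("DescriptorName")
        if descriptor is None:
            continue  # malformed heading, nothing to record
        d_major = descriptor.get("MajorTopicYN") == "Y"
        qualifiers = heading.findall("QualifierName")
        if not qualifiers:
            # Descriptor without qualifier: keep it with an empty qualifier slot.
            rows.append((descriptor.text, d_major, None, False))
        for qualifier in qualifiers:
            q_major = qualifier.get("MajorTopicYN") == "Y"
            rows.append((descriptor.text, d_major, qualifier.text, q_major))
    return rows
```

Each resulting tuple corresponds to one candidate COCHRANE_MESHTERM_RECORDS row for the record whose file was parsed.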

Step 5: Mark issue and deliver DS MeSH

  • Input: issueId (and issue metadata).
  • Process: setIssueMeshtermsDownloaded(issueId, true); deliverInitialPackageForDS(issueId) → create PublishWrapper(TYPE_DS_MESH, CCA db), setGenerate(true), setSend(true), publishDb → generate and send DS MeSH package for the issue.
  • Output: Issue marked “meshterms downloaded”; DS MeSH package generated and sent (downstream).

Step 6: Generate and upload WML3G packages

  • Input: issue, issueId. DB read: findUpdatedRecords(issue) = record names from COCHRANE_MESHTERM_RECORD_DATES where date >= :date (issue number).
  • Process: MonthlyMeshTermsPackageGenerator.generateAndUpload(recordNames, issueId, CDSR, WML3G_POSTFIX):
    • Split recordNames into batches of size cochrane.meshterm-updater.monthly-batch (e.g. 50).
    • For each batch: generatePackage → build zip of WML3G files for those records; filename pattern clsysrev/clsysrev_<year>_<month>_mu_aut_<suffix>_<timestamp>.zip. Then deliverPackage (immediate or schedule PackageUploader with delay cochrane.meshterm-updater.monthly-batch-interval ms, e.g. 30 min).
    • deliverPackage → ContentManager.newPackageReceived(file); set initial package delivered for issue.
  • Output (local): Zip file(s) under package path. Downstream: Package delivered to content manager.
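
The batching in Step 6 can be sketched as a small planning function. This Python illustration assumes that the first batch is delivered immediately and that each later batch is delayed by one more interval; the description only states that there is a delay between batches, so the exact schedule is an assumption, as are the function and key names.

```python
def plan_batches(record_names, batch_size=50, interval_ms=1_800_000):
    """Split record names into packages and assign each an upload delay."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    plan = []
    for index, start in enumerate(range(0, len(record_names), batch_size)):
        plan.append(
            {
                "records": record_names[start:start + batch_size],
                # Assumed schedule: batch 0 now, batch 1 after one interval, ...
                "delay_ms": index * interval_ms,
            }
        )
    return plan
```

Under these assumptions, 120 updated records with the example settings produce three packages of 50, 50 and 20 records, scheduled at 0, 30 and 60 minutes.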

3. Structures involved

Remote resources

| Resource | Step | Purpose |
|---|---|---|
| PubMed E-utilities (NCBI) | 2 | ESearch: find PMID for query <recordName>[page] AND medline[sb]. EFetch: get MeSH XML for that PMID. URLs from cms.cochrane.medline.esearch.url, cms.cochrane.medline.efetch.url. |

Local files

| Path / file | Step | Direction | Purpose |
|---|---|---|---|
| cms.resources.medline.downloader.downloads + <issue> (e.g. …/cms/medline/downloaded/202401) | 2 | out | Directory for PubMed MeSH XML per record (<recordName>.xml; optional <recordName>.search.xml). |
| Same directory | 3 | in | MeshtermDownloaderCallback lists .xml files (excl. search.xml) for MeshtermManager. |
| Same directory | 4 | in | MeshtermManager reads each file to parse MeshHeadings and fill MeSH tables. |
| Package path (from FilePathCreator.buildPackagePath) | 6 | out | Zip file(s) clsysrev_<year>_<month>_mu_aut_..._<timestamp>.zip for WML3G delivery. |

Database tables

| Table | Step | Role |
|---|---|---|
| COCHRANE_MESHTERM_RECORD_DATES | 4 write, 6 read | One row per record name with MeSH; date = issue number. Filled in Step 4 (createRecordDate / setIssue). Step 6: findUpdatedRecords(issue) reads record names where date >= issue. |
| COCHRANE_MESHTERM_RECORDS | 4 write | Links record name to descriptor id + qualifier id. Filled/updated in Step 4. |
| COCHRANE_MESHTERM_DESCRIPTORS | 4 write | Descriptor text + major topic. Filled when new descriptor seen in Step 4. |
| COCHRANE_MESHTERM_QUALIFIERS | 4 write | Qualifier text + major topic. Filled when new qualifier seen in Step 4. |
| COCHRANE_ENTIRE_DB | 1 read | findSysrevReviewRecordNames() = distinct CDSR record names (productSubtitle = REVIEWS) for download params. |
| Issue (meshterms flags) | 1, 4, 5 | meshtermsDownloading, meshtermsDownloaded set/read by IssueStorage. |
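
The Step 6 read on COCHRANE_MESHTERM_RECORD_DATES can be illustrated with an in-memory SQLite table. This is a sketch only: the column name record_name, the integer type of date and the demo rows are assumptions made for the example, and the real query lives in the Java storage layer.

```python
import sqlite3


def demo_connection():
    """Create an in-memory stand-in for the record-dates table (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE COCHRANE_MESHTERM_RECORD_DATES "
        "(record_name TEXT PRIMARY KEY, date INTEGER)"
    )
    conn.executemany(
        "INSERT INTO COCHRANE_MESHTERM_RECORD_DATES VALUES (?, ?)",
        [("CD000001", 202312), ("CD000002", 202401), ("CD000003", 202402)],
    )
    return conn


def find_updated_records(conn, issue):
    """Record names whose date (issue number) is at or after the given issue."""
    rows = conn.execute(
        "SELECT record_name FROM COCHRANE_MESHTERM_RECORD_DATES "
        "WHERE date >= :date ORDER BY record_name",
        {"date": issue},
    )
    return [row[0] for row in rows]
```

With the demo rows, querying for issue 202401 returns the two records dated 202401 and 202402 and leaves out the one last touched in 202312.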

Do rows from these tables ever get deleted?

  • COCHRANE_MESHTERM_RECORD_DATES, DESCRIPTORS, QUALIFIERS: No. Rows are never deleted; only created or updated (RECORD_DATES: setIssue).
  • COCHRANE_MESHTERM_RECORDS: Yes, per record. In Step 4, when MeSH is re-downloaded for a record, MeshtermManager.updateMeshterms() deletes any COCHRANE_MESHTERM_RECORDS row for that record whose (descriptor, qualifier) pair is no longer in the new PubMed XML. So obsolete headings for a record are removed on each Download Meshterms run for that record.
  • (The annual flow deletes rows from COCHRANE_MESHTERM_CHANGED_DESCRIPTORS, RECORD_4_CHECK, RECORD_DESCRIPTORS at start/end of Step 2; see ANNUAL_MESH_UPDATE_SUMMARY.md.)
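
The per-record cleanup of COCHRANE_MESHTERM_RECORDS amounts to a set difference between the stored (descriptor, qualifier) pairs and the pairs in the newly downloaded XML. A minimal Python sketch, with a hypothetical function name and plain tuples standing in for database rows:

```python
def reconcile_headings(existing_pairs, incoming_pairs):
    """Return (pairs to insert, pairs to delete) for one record."""
    existing = set(existing_pairs)
    incoming = set(incoming_pairs)
    to_insert = incoming - existing  # headings new in this download
    to_delete = existing - incoming  # headings no longer present in PubMed
    return to_insert, to_delete
```

For instance, if a record previously had (Asthma, therapy) and (Humans, no qualifier) and the new XML contains (Asthma, drug therapy) and (Humans, no qualifier), the first pair is deleted, the new pair is inserted, and the Humans row is left untouched.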

JMS queues

| Queue | Producer | Consumer |
|---|---|---|
| queue/medline-downloader-service | IssueWrapper (Step 1); MedlineDownloader (re-queue on incomplete) | MedlineDownloader |
| queue/meshterm-manager-service | MeshtermDownloaderCallback (Step 3); MeshtermManager (re-queue when more files) | MeshtermManager |
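
Both consumers re-queue their own message when work remains. The pattern can be sketched in Python with a plain deque standing in for the JMS queue. The message layout, the meaning of the iterator (here: index of the next unprocessed file) and the function name are assumptions made for this illustration.

```python
from collections import deque

MAX_FILES_ALLOWED = 10000  # per-message limit named in Step 4


def process_message(message, queue, handle_file, max_files=MAX_FILES_ALLOWED):
    """Handle up to max_files files, then re-queue if more remain.

    Returns True when the whole file list is done (the caller would then
    run the finish step), False when a follow-up message was queued.
    """
    files = message["files"]
    start = message.get("iterator", 0)
    end = min(start + max_files, len(files))
    for name in files[start:end]:
        handle_file(name)
    if end < len(files):
        # Same parameters, advanced iterator: the next message resumes here.
        queue.append({**message, "iterator": end})
        return False
    return True
```

A consumer loop would pop one message at a time and call `process_message` until it returns True; with seven files and a limit of three, that takes three messages.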

4. Configuration properties

| Property | Default / example | Purpose |
|---|---|---|
| cochrane.meshterm-updater.monthly-pattern | 0 20 0 1 * ? | Cron for MonthlyMeshtermExecutor (e.g. 00:20 on 1st of month). |
| cochrane.meshterm-updater.monthly-batch | 50 | Max records per WML3G package in Step 6. |
| cochrane.meshterm-updater.monthly-batch-interval | 1800000 | Delay (ms) between uploading multiple packages (e.g. 30 min). |
| cms.resources.medline.downloader.downloads | ${cms_resources}/cms/medline/downloaded/ | Base directory for downloaded MeSH XML; + issue number (e.g. 202401) per run. |
| cms.cochrane.medline.esearch.url | https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?... | PubMed ESearch URL. |
| cms.cochrane.medline.efetch.url | https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?... | PubMed EFetch URL. |
| cms.cochrane.meshterm.record.update.calendar | false | If true, MedlineDownloader only processes during 5:00–21:00 weekdays (GMT-5). |
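
The calendar restriction controlled by the last property can be illustrated with a wait-time calculation. This Python sketch assumes a fixed GMT-5 offset without daylight saving, a window that includes 5:00 and excludes 21:00, and Monday to Friday as weekdays; the actual getWaitTime() may treat these boundaries differently.

```python
from datetime import datetime, timedelta, timezone

WINDOW_TZ = timezone(timedelta(hours=-5))  # fixed GMT-5, no DST handling
OPEN_HOUR, CLOSE_HOUR = 5, 21


def wait_seconds(now_utc):
    """Seconds to wait until processing is allowed; 0 if already inside."""
    local = now_utc.astimezone(WINDOW_TZ)
    if local.weekday() < 5 and OPEN_HOUR <= local.hour < CLOSE_HOUR:
        return 0
    # Next candidate opening: today at 5:00, or tomorrow if that has passed.
    candidate = local.replace(hour=OPEN_HOUR, minute=0, second=0, microsecond=0)
    if local >= candidate:
        candidate += timedelta(days=1)
    # Skip Saturday (5) and Sunday (6).
    while candidate.weekday() >= 5:
        candidate += timedelta(days=1)
    return int((candidate - local).total_seconds())
```

Under these assumptions, a message arriving on Friday at 22:00 GMT-5 waits 55 hours, until Monday 5:00.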

5. Success flow

  1. Step 1: User or MonthlyMeshtermExecutor triggers “Download Meshterms” for an issue → request with params (all CDSR review names) and destination …/downloads/<issue> sent to medline-downloader queue.
  2. Step 2: MedlineDownloader runs (after optional time window); for each record, MeshtermDownloader fetches PMID and MeSH XML, writes <recordName>.xml into issue folder. No fatal errors → sendCallback(destinationDirectory).
  3. Step 3: Callback lists XML files, sends MeshtermManagerParameters to meshterm-manager queue.
  4. Step 4: MeshtermManager processes all files (in batches of 10000); creates/updates RECORD_DATES, RECORDS, DESCRIPTORS, QUALIFIERS. No exception → finish().
  5. Step 5: Issue marked meshterms downloaded; DS MeSH package generated and sent for CCA.
  6. Step 6: findUpdatedRecords(issue) returns record names; MonthlyMeshTermsPackageGenerator builds zip(s), delivers to ContentManager (and optionally schedules delayed uploads for later batches). Process completes successfully.

6. Error flow

  • Step 1: JMS/configuration failure → message not sent; issue may stay in “meshterms downloading” until manual reset or retry.
  • Step 2: PubMed unavailable, HTTP error, or download exception → MedlineDownloader logs a warning; on a runtime exception, the remaining params are re-queued. If all batches fail or only some complete, finishIncompleteProcess re-queues the remaining params; the callback is not sent until every single-record download for the request has completed. Non-fatal errors are reported via MessageSender.sendReport(MESHTERM_WARNINGS, err).
  • Step 3: Directory missing or not readable → MeshtermDownloaderCallback.sendCallback throws → MedlineDownloaderException; finishProcess may not be called; issue can stay “downloading”.
  • Step 4: MeshtermManager exception (e.g. DB, parse) → issueStorage.setIssueMeshtermsDownloaded(issueId, false), setIssueMeshtermsDownloading(issueId, false); processing stops and finish() is not called, so Steps 5 and 6 do not run for that run. If only some of the files have been processed, the params are re-queued with the iterator for the next batch.
  • Step 5: Publish/delivery failure → logged; setIssueMeshtermsDownloading(issueId, false); findUpdatedRecords still runs (Step 6).
  • Step 6: findUpdatedRecords(issue) empty → “no records with meshterms” exception in MeshtermManager.finish. Package generation or upload failure → logged; zip path or delivery may be null.

7. Quick reference

  • Trigger: “Download Meshterms” (manual per issue or MonthlyMeshtermExecutor).
  • Steps: Start → Medline download (PubMed) → Callback → MeshtermManager (parse + DB) → Mark issue + DS MeSH delivery → Generate/upload WML3G packages.
  • Resources: PubMed E-utilities (remote); cms.resources.medline.downloader.downloads + issue (local XML); package path (local zip).
  • DB written: COCHRANE_MESHTERM_RECORD_DATES, COCHRANE_MESHTERM_RECORDS, COCHRANE_MESHTERM_DESCRIPTORS, COCHRANE_MESHTERM_QUALIFIERS (Step 4). DB read: COCHRANE_ENTIRE_DB (Step 1), COCHRANE_MESHTERM_RECORD_DATES (Step 6).
  • Key classes: IssueWrapper.MeshtermsDownloadAction, MonthlyMeshtermExecutor, MedlineDownloader, MeshtermDownloader, MeshtermDownloaderCallback, MeshtermManager, MonthlyMeshTermsPackageGenerator, MeshtermStorage.