Monthly MeSH Update – Summary - dedreval/xdps-docs GitHub Wiki
Summary of the monthly MeSH update (“Download Meshterms”) procedure: steps, resources (remote/local files), database tables, triggers, and success/error flows.
Code: XDPS/CochraneCMS (MedlineDownloader, MeshtermDownloader, MeshtermManager, MonthlyMeshtermExecutor, MonthlyMeshTermsPackageGenerator, etc.).
Relation to annual update: The tables filled by this flow (COCHRANE_MESHTERM_RECORD_DATES, COCHRANE_MESHTERM_RECORDS, COCHRANE_MESHTERM_DESCRIPTORS, COCHRANE_MESHTERM_QUALIFIERS) are read by Step 3 of the annual MeSH update (Update MeSH in WML3G). See ANNUAL_MESH_UPDATE_SUMMARY.md.
1. Clear steps (triggers + code)

| Step | Name | What happens | Code / trigger |
|---|---|---|---|
| 1 | Start Download Meshterms | User or scheduler starts the flow for an issue | Manual: Issue List → select issue → action "Download Meshterms" (IssueWrapper.MeshtermsDownloadAction). Scheduled: MonthlyMeshtermExecutor (cron cochrane.meshterm-updater.monthly-pattern, e.g. 0 20 0 1 * ? = 00:20 on 1st of month) finds the open issue and runs the same action if MeSH has not already been downloaded. |
| 2 | Medline download (PubMed) | For each CDSR review, fetch MeSH XML from PubMed and save to issue-specific folder | MedlineDownloader (JMS queue medline-downloader-service) → timer → MeshtermDownloader.download(params) per record. Params from getParams() = one entry per CDSR review (findSysrevReviewRecordNames()); destination = cms.resources.medline.downloader.downloads + currentIssue (e.g. …/202401). |
| 3 | Callback to MeshtermManager | After all downloads for the batch, send list of downloaded files to processor | MeshtermDownloaderCallback.sendCallback(path) → MeshtermManagerParameters(files, issue, issueId, title) → JMS queue meshterm-manager-service. |
| 4 | Process MeSH and fill DB | Parse each XML, create/update MeSH rows per record | MeshtermManager (JMS queue meshterm-manager-service) → process() → for each file (up to MAX_FILES_ALLOWED=10000): updateMeshterms(file, recordName, issue) → parse MeshHeadings, create/find COCHRANE_MESHTERM_RECORD_DATES, DESCRIPTORS, QUALIFIERS, COCHRANE_MESHTERM_RECORDS. If more files remain, re-queue; else finish(). |
| 5 | Mark issue and deliver DS MeSH | Set issue “meshterms downloaded”, trigger initial DS MeSH package | MeshtermManager.finish() → setIssueMeshtermsDownloaded(issueId, true); deliverInitialPackageForDS(issueId) → publish TYPE_DS_MESH for CCA DB (generate + send). |
| 6 | Generate and upload WML3G packages | Build zip(s) of WML3G for records with MeSH in this issue; deliver to content manager | findUpdatedRecords(issue, issueId) → meshtermStorage.findUpdatedRecords(issue) (record names where COCHRANE_MESHTERM_RECORD_DATES.date ≥ issue) → MonthlyMeshTermsPackageGenerator.generateAndUpload(recordNames, issueId, CDSR, WML3G_POSTFIX) → batches by cochrane.meshterm-updater.monthly-batch (e.g. 50), delay between batches cochrane.meshterm-updater.monthly-batch-interval (e.g. 30 min) → zip + deliverPackage / schedule PackageUploader. |

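
The example pattern 0 20 0 1 * ? reads like a Quartz-style cron expression (seconds, minutes, hours, day of month, month, day of week). As a rough illustration of the schedule it describes, the sketch below computes the next 00:20 on the 1st of a month with java.time. The class and method names are hypothetical; the sketch does not parse cron syntax and says nothing about how MonthlyMeshtermExecutor is actually implemented.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical helper: mirrors the schedule "00:20 on the 1st of each month".
final class MonthlyScheduleSketch {
    private static final LocalTime RUN_TIME = LocalTime.of(0, 20);

    /** Returns the first run time strictly after {@code now}. */
    static LocalDateTime nextRun(LocalDateTime now) {
        // Candidate: the 1st of the current month at 00:20.
        LocalDateTime candidate = now.toLocalDate().withDayOfMonth(1).atTime(RUN_TIME);
        // If that moment has passed (or is exactly now), move to next month.
        return candidate.isAfter(now) ? candidate : candidate.plusMonths(1);
    }
}
```

For example, calling nextRun with 15 January 2024, 10:00 yields 1 February 2024, 00:20.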
2. What happens on each step (inputs, process, outputs)
Step 1: Start Download Meshterms
Input: Selected issue (manual) or open issue for configured date (scheduler).
Process:
Set issue.meshtermsDownloading = true.
Build MeshtermDownloaderRequest(issue, issueId, title) with plan MESHTERM_DOWNLOAD and callback MeshtermDownloaderCallback.
Params: getParams() → EntireDBStorage.findSysrevReviewRecordNames() = all distinct CDSR record names (productSubtitle = REVIEWS). One download param per record: revmanId = record name, writeSearchResult = "false".
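
The parameter list described above can be pictured as one small map per record. This is a minimal sketch, assuming the params are plain string maps keyed by revmanId, destinationDirectory, and writeSearchResult; the class name, method signature, and the example base path are hypothetical and not taken from the code base.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds one download param map per CDSR record name.
final class DownloadParamsSketch {

    static List<Map<String, String>> buildParams(List<String> recordNames,
                                                 String downloadsDir,
                                                 String issue) {
        // Destination = configured downloads directory + issue number, e.g. .../202401
        String destination = downloadsDir + issue;
        List<Map<String, String>> params = new ArrayList<>();
        for (String recordName : recordNames) {
            Map<String, String> p = new LinkedHashMap<>();
            p.put("revmanId", recordName);            // record name doubles as the search id
            p.put("destinationDirectory", destination);
            p.put("writeSearchResult", "false");      // search result is not stored by default
            params.add(p);
        }
        return params;
    }
}
```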
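Step 2: Medline download (PubMed)
Input: MeshtermDownloaderRequest received from JMS queue medline-downloader-service.
Process:
MedlineDownloader.onMessage → init MeshtermDownloader, create timer. If cms.cochrane.meshterm.record.update.calendar is true, getWaitTime() restricts processing to 5:00–21:00 weekdays (GMT-5); else immediate.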
On timer: for each param map (revmanId, destinationDirectory): MeshtermDownloader.download(p):
ESearch: esearch.fcgi?db=pubmed&retmode=xml&term=<revmanId>[page] AND medline[sb]&sort=pub+date → take first PMID from IdList.
EFetch: efetch.fcgi?db=pubmed&retmode=xml&id=<id> → write response to <destinationDirectory>/<revmanId>.xml. Optionally write search result to <revmanId>.search.xml if writeSearchResult=true.
If a batch fails or there are remaining params, re-queue request with remaining params. If all done, finishProcess() → callback.sendCallback(destinationDirectory).
Output (local): One XML file per record in <cms.resources.medline.downloader.downloads><currentIssue>/ (e.g. CD001234.xml). DB: None in this step.
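
The two E-utilities calls above can be sketched as URL construction plus a small XML lookup. This is an illustrative sketch only: the class and method names are hypothetical, the base URLs are passed in because the real ones come from configuration, and the parsing assumes the usual ESearch response layout with Id elements inside IdList.

```java
import java.io.ByteArrayInputStream;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for the ESearch -> EFetch sequence.
final class PubMedQuerySketch {

    static String esearchUrl(String esearchBase, String revmanId) {
        String term = revmanId + "[page] AND medline[sb]";
        // Brackets and spaces in the term must be URL-encoded.
        return esearchBase + "?db=pubmed&retmode=xml&term="
                + URLEncoder.encode(term, StandardCharsets.UTF_8)
                + "&sort=pub+date";
    }

    static String efetchUrl(String efetchBase, String pmid) {
        return efetchBase + "?db=pubmed&retmode=xml&id=" + pmid;
    }

    /** Returns the first Id under IdList, or null when the search found nothing. */
    static String firstPmid(String esearchXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Avoid fetching the DTD named in the response's DOCTYPE.
            factory.setFeature(
                    "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(esearchXml.getBytes(StandardCharsets.UTF_8)));
            NodeList lists = doc.getElementsByTagName("IdList");
            if (lists.getLength() == 0) {
                return null;
            }
            NodeList ids = ((Element) lists.item(0)).getElementsByTagName("Id");
            return ids.getLength() == 0 ? null : ids.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Unparseable ESearch response", e);
        }
    }
}
```

A null result from firstPmid corresponds to a record with no matching PubMed entry; how the real downloader treats that case is not covered here.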
Step 3: Callback to MeshtermManager
Input: destinationDirectory (path to issue folder).
Process: MeshtermDownloaderCallback.sendCallback(path) → list files with .xml and without search.xml → MeshtermManagerParameters(files, issue, issueId, title) → send to JMS queue meshterm-manager-service.
Output: Message to MeshtermManager queue.
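
The file selection in this step amounts to a simple name filter. A minimal sketch, assuming the filter works on file names only; the class and method names are hypothetical. The recordName helper reflects the rule used later in Step 4 (record name = filename without extension).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: picks MeSH XML files and derives record names.
final class DownloadedFilesSketch {

    /** Keeps "<record>.xml" and skips "<record>.search.xml" and non-XML files. */
    static boolean isMeshFile(String fileName) {
        return fileName.endsWith(".xml") && !fileName.endsWith("search.xml");
    }

    static List<String> selectMeshFiles(List<String> fileNames) {
        List<String> selected = new ArrayList<>();
        for (String name : fileNames) {
            if (isMeshFile(name)) {
                selected.add(name);
            }
        }
        return selected;
    }

    /** Record name = file name without its extension. */
    static String recordName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }
}
```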
Step 4: Process MeSH and fill DB
Input (local): XML files in issue folder (PubMed MeSH XML with <MeshHeading>, <DescriptorName>, <QualifierName>, MajorTopicYN).
Process: MeshtermManager.process(parameters):
For each file (batch of up to 10000): recordName = filename without extension; updateMeshterms(file, recordName, issue):
If record not in COCHRANE_MESHTERM_RECORD_DATES: createRecordDate(recordName, issue).
Parse MeshHeadings → for each descriptor/qualifier: findDescriptorId / findQualifierId (create COCHRANE_MESHTERM_DESCRIPTORS / COCHRANE_MESHTERM_QUALIFIERS if new); findMeshtermRecordId (create COCHRANE_MESHTERM_RECORDS row if new).
Remove COCHRANE_MESHTERM_RECORDS rows for this record that are no longer in the new set.
If any insert/update: updateRecordDate(recordName, issue).
If more files: re-queue parameters with incremented iterator. Else: finish(parameters).
Output (DB): Rows in COCHRANE_MESHTERM_RECORD_DATES, COCHRANE_MESHTERM_RECORDS, COCHRANE_MESHTERM_DESCRIPTORS, COCHRANE_MESHTERM_QUALIFIERS.
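
To make the parsing part of this step concrete, the sketch below extracts descriptor/qualifier pairs from PubMed XML with the JDK's DOM parser. It is an illustration under assumptions: the Heading class and all method names are hypothetical, a heading without qualifiers is represented with a null qualifier, and none of the database lookups (findDescriptorId and so on) are modelled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical parser for <MeshHeading> elements.
final class MeshHeadingSketch {

    static final class Heading {
        final String descriptor;
        final boolean descriptorMajor;
        final String qualifier;       // null when the heading has no qualifier
        final boolean qualifierMajor;

        Heading(String descriptor, boolean descriptorMajor,
                String qualifier, boolean qualifierMajor) {
            this.descriptor = descriptor;
            this.descriptorMajor = descriptorMajor;
            this.qualifier = qualifier;
            this.qualifierMajor = qualifierMajor;
        }

        @Override
        public String toString() {
            // "*" marks a major topic, e.g. "Asthma/drug therapy*".
            return descriptor + (descriptorMajor ? "*" : "")
                    + (qualifier == null ? "" : "/" + qualifier + (qualifierMajor ? "*" : ""));
        }
    }

    static List<Heading> parse(String pubmedXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Avoid fetching the DTD named in the response's DOCTYPE.
            factory.setFeature(
                    "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(pubmedXml.getBytes(StandardCharsets.UTF_8)));
            List<Heading> result = new ArrayList<>();
            NodeList headings = doc.getElementsByTagName("MeshHeading");
            for (int i = 0; i < headings.getLength(); i++) {
                Element heading = (Element) headings.item(i);
                NodeList descriptors = heading.getElementsByTagName("DescriptorName");
                if (descriptors.getLength() == 0) {
                    continue; // malformed heading, nothing to record
                }
                Element d = (Element) descriptors.item(0);
                NodeList qualifiers = heading.getElementsByTagName("QualifierName");
                if (qualifiers.getLength() == 0) {
                    result.add(new Heading(d.getTextContent().trim(), isMajor(d), null, false));
                }
                for (int q = 0; q < qualifiers.getLength(); q++) {
                    Element qe = (Element) qualifiers.item(q);
                    result.add(new Heading(d.getTextContent().trim(), isMajor(d),
                            qe.getTextContent().trim(), isMajor(qe)));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Unparseable PubMed XML", e);
        }
    }

    private static boolean isMajor(Element e) {
        return "Y".equals(e.getAttribute("MajorTopicYN"));
    }
}
```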
Step 5: Mark issue and deliver DS MeSH
Input: issueId (and issue metadata).
Process: setIssueMeshtermsDownloaded(issueId, true); deliverInitialPackageForDS(issueId) → create PublishWrapper(TYPE_DS_MESH, CCA db), setGenerate(true), setSend(true), publishDb → generate and send DS MeSH package for the issue.
Output: Issue marked “meshterms downloaded”; DS MeSH package generated and sent (downstream).
Step 6: Generate and upload WML3G packages
Input: issue, issueId. DB read: findUpdatedRecords(issue) = record names from COCHRANE_MESHTERM_RECORD_DATES where date >= :date (issue number).
Process:
Split recordNames into batches of size cochrane.meshterm-updater.monthly-batch (e.g. 50).
For each batch: generatePackage → build zip of WML3G files for those records; filename pattern clsysrev/clsysrev_<year>_<month>_mu_aut_<suffix>_<timestamp>.zip. Then deliverPackage (immediate or schedule PackageUploader with delay cochrane.meshterm-updater.monthly-batch-interval ms, e.g. 30 min).
deliverPackage → ContentManager.newPackageReceived(file); set initial package delivered for issue.
Output (local): Zip file(s) under package path. Downstream: Package delivered to content manager.
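
The batching in this step can be sketched as a list split plus a per-batch delay. This is a hedged illustration: the class and method names are hypothetical, and the delay rule (first batch immediate, batch i delayed by i × interval) is an assumption about how the interval is applied, not something confirmed by the code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits record names into package batches.
final class PackageBatchSketch {

    static <T> List<List<T>> split(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    /** Assumed rule: batch 0 is delivered at once, batch i after i intervals. */
    static long delayMillis(int batchIndex, long intervalMillis) {
        return batchIndex * intervalMillis;
    }
}
```

With a batch size of 50 and an interval of 1800000 ms, 120 updated records would give three packages; under the assumed rule they would go out at 0, 30, and 60 minutes.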
3. Structures involved
Remote resources

| Resource | Step | Purpose |
|---|---|---|
| PubMed E-utilities (NCBI) | 2 | ESearch: find PMID for query <recordName>[page] AND medline[sb]. EFetch: get MeSH XML for that PMID. URLs from cms.cochrane.medline.esearch.url, cms.cochrane.medline.efetch.url. |

Local files (Step 6): zip file(s) clsysrev_<year>_<month>_mu_aut_..._<timestamp>.zip for WML3G delivery.
Database tables

| Table | Step | Role |
|---|---|---|
| COCHRANE_MESHTERM_RECORD_DATES | 4 write, 6 read | One row per record name with MeSH; date = issue number. Filled in Step 4 (createRecordDate / setIssue). Step 6: findUpdatedRecords(issue) reads record names where date >= issue. |
| COCHRANE_MESHTERM_RECORDS | 4 write | Links record name to descriptor id + qualifier id. Filled/updated in Step 4. |
| COCHRANE_MESHTERM_DESCRIPTORS | 4 write | Descriptor text + major topic. Filled when new descriptor seen in Step 4. |
| COCHRANE_MESHTERM_QUALIFIERS | 4 write | Qualifier text + major topic. Filled when new qualifier seen in Step 4. |
| COCHRANE_ENTIRE_DB | 1 read | findSysrevReviewRecordNames() = distinct CDSR record names (productSubtitle = REVIEWS) for download params. |
| Issue (meshterms flags) | 1, 4, 5 | meshtermsDownloading, meshtermsDownloaded set/read by IssueStorage. |

Do rows from these tables ever get deleted?
COCHRANE_MESHTERM_RECORD_DATES, DESCRIPTORS, QUALIFIERS: No. Rows are never deleted; only created or updated (RECORD_DATES: setIssue).
COCHRANE_MESHTERM_RECORDS: Yes, per record. In Step 4, when MeSH is re-downloaded for a record, MeshtermManager.updateMeshterms() deletes any COCHRANE_MESHTERM_RECORDS row for that record whose (descriptor, qualifier) pair is no longer in the new PubMed XML. Obsolete headings for a record are therefore removed on each Download Meshterms run for that record.
(The annual flow deletes rows from COCHRANE_MESHTERM_CHANGED_DESCRIPTORS, RECORD_4_CHECK, RECORD_DESCRIPTORS at start/end of Step 2; see ANNUAL_MESH_UPDATE_SUMMARY.md.)
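
The per-record cleanup of COCHRANE_MESHTERM_RECORDS is essentially a set difference between the stored pairs and the pairs found in the fresh PubMed XML. A minimal sketch, assuming each (descriptor, qualifier) pair is encoded as a single "descriptor/qualifier" string; that encoding and all names below are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: works out which rows to remove and which to add for one record.
final class RecordReconcileSketch {

    /** Pairs currently stored but absent from the new download: rows to delete. */
    static Set<String> toDelete(Set<String> stored, Set<String> fresh) {
        Set<String> obsolete = new LinkedHashSet<>(stored);
        obsolete.removeAll(fresh);
        return obsolete;
    }

    /** Pairs in the new download that are not stored yet: rows to insert. */
    static Set<String> toInsert(Set<String> stored, Set<String> fresh) {
        Set<String> added = new LinkedHashSet<>(fresh);
        added.removeAll(stored);
        return added;
    }
}
```

A record whose stored and fresh sets are equal produces two empty results, which matches the case where no insert or delete is needed.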
JMS queues

| Queue | Producer | Consumer |
|---|---|---|
| queue/medline-downloader-service | IssueWrapper (Step 1); MedlineDownloader (re-queue on incomplete) | MedlineDownloader |
| queue/meshterm-manager-service | MeshtermDownloaderCallback (Step 3); MeshtermManager (re-queue when more files) | MeshtermManager |

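
Both consumers re-queue their own message when work remains. The sketch below imitates that pattern with an in-memory queue instead of JMS, so it can run standalone. The Message class, the meaning of the iterator (treated here as a batch index), and all method names are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the "process a batch, re-queue the rest" pattern.
final class RequeueSketch {

    static final class Message {
        final List<String> files;
        final int iterator; // assumed to be the index of the batch to process

        Message(List<String> files, int iterator) {
            this.files = files;
            this.iterator = iterator;
        }
    }

    /** Drains the queue and returns the number of messages consumed. */
    static int run(List<String> files, int maxFilesPerMessage, List<String> processed) {
        Deque<Message> queue = new ArrayDeque<>();
        queue.add(new Message(files, 0));
        int consumed = 0;
        while (!queue.isEmpty()) {
            Message m = queue.poll();
            consumed++;
            int from = m.iterator * maxFilesPerMessage;
            int to = Math.min(from + maxFilesPerMessage, m.files.size());
            for (int i = from; i < to; i++) {
                processed.add(m.files.get(i)); // stands in for updateMeshterms(...)
            }
            if (to < m.files.size()) {
                // More files remain: send the same parameters with the next iterator.
                queue.add(new Message(m.files, m.iterator + 1));
            }
            // Otherwise this is the point where finish() would be called.
        }
        return consumed;
    }
}
```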
4. Configuration properties

| Property | Default / example | Purpose |
|---|---|---|
| cochrane.meshterm-updater.monthly-pattern | 0 20 0 1 * ? | Cron for MonthlyMeshtermExecutor (e.g. 00:20 on 1st of month). |
| cochrane.meshterm-updater.monthly-batch | 50 | Max records per WML3G package in Step 6. |
| cochrane.meshterm-updater.monthly-batch-interval | 1800000 | Delay (ms) between uploading multiple packages (e.g. 30 min). |
| cms.resources.medline.downloader.downloads | ${cms_resources}/cms/medline/downloaded/ | Base directory for downloaded MeSH XML; + issue number (e.g. 202401) per run. |
| cms.cochrane.meshterm.record.update.calendar | true or false | If true, MedlineDownloader only processes during 5:00–21:00 weekdays (GMT-5). |

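
The processing window controlled by the calendar flag can be sketched as a wait-time calculation. This is an illustration rather than the actual getWaitTime(): the class and method names are hypothetical, GMT-5 is modelled as a fixed offset with no daylight saving, and treating 21:00 as the exclusive end of the window is an assumption.

```java
import java.time.DayOfWeek;
import java.time.Duration;
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical helper: how long to wait until the next allowed processing time.
final class ProcessingWindowSketch {
    private static final ZoneOffset ZONE = ZoneOffset.ofHours(-5);
    private static final LocalTime OPEN = LocalTime.of(5, 0);
    private static final LocalTime CLOSE = LocalTime.of(21, 0);

    static Duration waitTime(Instant now) {
        ZonedDateTime local = now.atZone(ZONE);
        LocalTime time = local.toLocalTime();
        if (isWeekday(local) && !time.isBefore(OPEN) && time.isBefore(CLOSE)) {
            return Duration.ZERO; // inside the window: start immediately
        }
        // Next opening: today at 05:00 if still ahead, otherwise a later day.
        ZonedDateTime next = local.toLocalDate().atTime(OPEN).atZone(ZONE);
        if (!next.isAfter(local)) {
            next = next.plusDays(1);
        }
        while (!isWeekday(next)) {
            next = next.plusDays(1); // skip Saturday and Sunday
        }
        return Duration.between(local, next);
    }

    private static boolean isWeekday(ZonedDateTime t) {
        DayOfWeek day = t.getDayOfWeek();
        return day != DayOfWeek.SATURDAY && day != DayOfWeek.SUNDAY;
    }
}
```

For instance, a request arriving on a Friday at 22:00 (GMT-5) would wait 55 hours, until Monday 05:00.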
5. Success flow
Step 1: User or MonthlyMeshtermExecutor triggers “Download Meshterms” for an issue → request with params (all CDSR review names) and destination …/downloads/<issue> sent to medline-downloader queue.
Step 2: MedlineDownloader runs (after optional time window); for each record, MeshtermDownloader fetches PMID and MeSH XML, writes <recordName>.xml into issue folder. No fatal errors → sendCallback(destinationDirectory).
Step 3: Callback lists XML files, sends MeshtermManagerParameters to meshterm-manager queue.
Step 4: MeshtermManager processes all files (in batches of 10000); creates/updates RECORD_DATES, RECORDS, DESCRIPTORS, QUALIFIERS. No exception → finish().
Step 5: Issue marked meshterms downloaded; DS MeSH package generated and sent for CCA.
Step 6: findUpdatedRecords(issue) returns record names; MonthlyMeshTermsPackageGenerator builds zip(s), delivers to ContentManager (and optionally schedules delayed uploads for later batches). Process completes successfully.
6. Error flow
Step 1: JMS/configuration failure → message not sent; issue may stay in “meshterms downloading” until manual reset or retry.
Step 2: PubMed unavailable, HTTP error, or download exception → MedlineDownloader logs warning; if runtime exception, remaining params re-queued. If all batches fail or only some complete, finishIncompleteProcess re-queues remaining params; no callback until all single-record downloads for the request complete. MessageSender.sendReport(MESHTERM_WARNINGS, err) on non-fatal errors.
Step 3: Directory missing or not readable → MeshtermDownloaderCallback.sendCallback throws → MedlineDownloaderException; finishProcess may not be called; issue can stay “downloading”.
Step 4: MeshtermManager exception (e.g. DB, parse) → issueStorage.setIssueMeshtermsDownloaded(issueId, false), setIssueMeshtermsDownloading(issueId, false); process stops; no finish → no Step 5 or 6 for that run. If only part of files processed, params re-queued with iterator for next batch.
Step 6: findUpdatedRecords(issue) empty → “no records with meshterms” exception in MeshtermManager.finish. Package generation or upload failure → logged; zip path or delivery may be null.
7. Quick reference
Trigger: “Download Meshterms” (manual per issue or MonthlyMeshtermExecutor).