Annual MeSH Update – Summary - dedreval/xdps-docs GitHub Wiki
Summary of the annual MeSH update procedure: clear steps, what happens at each step, structures involved (remote files, local files, database tables/rows), and success vs error flows.
Source: Wiki "How to perform annual MeSH update" (e.g. saved as C:\Tickets\XDPS-2646\How to perform annual MeSH update · wiley_XDPS Wiki.html).
Code: XDPS/CochraneCMS (Term2NumHelper, MeshtermCodesUpdaterManager, Wml3gValidationHandler, etc.).
| Step | Wiki name | What you do | Code / trigger |
|---|---|---|---|
| 1 | Download and Rebuild the MeSH resources | Load term2num resources (FTP or manual), then Rebuild | UI: Issue List → Term2Num section → "Load by FTP and Rebuild" or "Rebuild". Term2NumHelper.makeTerm2Num(downloadMesh). |
| 2 | Check changed MeSH codes | Run "Check Changed Mesh Codes" from UI | MeshtermCodesUpdaterManager.prepareMeshtermCodes(user) or process start → onStart → initChangedDescriptors, prepareData, createUpdateProcess. |
| 3 | Update MeSH in WML3G (CDSR) | Use meshterm_outdated_records; run WML3G re-conversion | If CDSR and property enabled: startWML3GToWML3GConversionProcess → Wml3gValidationHandler (process type Ml3gMeshUpdateSelective). |
| 4 | Sending re-converted CDSR DOIs to HW | Downstream delivery | Outside this flow. |
| 5 | Sending processed CENTRAL DOIs | Downstream delivery | Outside this flow. |
Note: The wiki does not describe "Download Meshterms" (per-issue NLM/PubMed download that fills COCHRANE_MESHTERM_RECORDS etc.). Step 3 reads those tables; they are filled only by "Download Meshterms" (manual or MonthlyMeshtermExecutor). See ANNUAL_MESH_UPDATE_AND_TABLE_POPULATION.md.

Step 1: Download and Rebuild the MeSH resources

1.1 Load term2num resources
- Remote: NLM FTP `nlmpubs.nlm.nih.gov` (or a property-configured host). Files: `descYYYY.zip`/`descYYYY.xml`, `qYYYY.bin`, `qualYYYY.xml`, `mtreesYYYY.bin` (YYYY = current year).
- Local (download): `RESOURCES_TERM2NUM_DOWNLOADS` (e.g. `/CT/CochraneCMS/cms/term2num/downloaded`). The FTP download via `Term2NumHelper.downloadMeshResources()` places the files (possibly `.zip`) in this directory. Zips are unzipped into the same directory during the Rebuild stage (at the start of `makeTerm2Num()`, before the mtree map is built and Perl runs), not during download.
- DB: None.
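A quick pre-flight check before Rebuild can confirm that all four files for the year are in the downloads directory. The Java sketch below is hypothetical: neither `Term2NumResources` nor `missing` exists in Term2NumHelper. It treats a file as present under its own name or as a `.zip` with the same base name, because zips are only unpacked at the start of Rebuild.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check; not part of Term2NumHelper.
class Term2NumResources {

    // Expected NLM file names for a MeSH year (YYYY = year).
    static List<String> expectedFiles(int year) {
        return List.of("desc" + year + ".xml", "q" + year + ".bin",
                "qual" + year + ".xml", "mtrees" + year + ".bin");
    }

    // Returns the expected files that are neither present nor available as a same-named .zip.
    static List<String> missing(Path downloadsDir, int year) {
        List<String> missing = new ArrayList<>();
        for (String name : expectedFiles(year)) {
            String zipName = name.substring(0, name.lastIndexOf('.')) + ".zip";
            boolean present = Files.exists(downloadsDir.resolve(name))
                    || Files.exists(downloadsDir.resolve(zipName));
            if (!present) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

For example, `Term2NumResources.missing(Path.of("/CT/CochraneCMS/cms/term2num/downloaded"), 2025)` returns an empty list when all four files are in place.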

1.2 Rebuild term2num
- Input (local): downloaded files in term2num/downloaded + Perl scripts dir (`TERM2NUM_PERLSCRIPTS`), e.g. `map_desc2treenum.pl`, `gen_qualifier_config.pl`.
- Process: build the mtree nums map from `mtreesYYYY.bin`, the descriptor list from `descYYYY.xml`, the PA tree file and the full tree file; run `gen_qualifier_config.pl` (qualifiers), then `map_desc2treenum.pl` → produces term2num.xml and term2num.xsl.

What is the mtree nums map? An in-memory map built from mtreesYYYY.bin in Java (`Term2NumHelper.createMTreeNumsMap()`):
- Source: `mtreesYYYY.bin`, one line per (descriptor, tree number) pair. Each line is split on `;`: first part = key, second part = tree number.
- Keys: the first field of each line (before `;`). In NLM's format this is typically the descriptor identifier (e.g. descriptor name or Descriptor UI) as it appears in the file.
- Values: a list of MeSH tree numbers that start with `D27` (Pharmacological Action subtree only). For each line, if the tree number starts with `D27`, it is added to the list for that key; other tree numbers are ignored.
- Type: `Map<String, ArrayList<String>>`: key = descriptor (string from the file), value = list of D27 tree numbers (e.g. `D27.505.954.122.085`).
- Use: the map is written (as a string) into an intermediate PA tree file (`mtreesYYYY.pa.bin`). That file is concatenated with the original `mtreesYYYY.bin` to form `mtreesYYYY.full.bin`, which is the tree source passed to `map_desc2treenum.pl`. In Java, the map was also intended for use when parsing `descYYYY.xml` to attach PA tree numbers to descriptors (in `DescriptorRecordParser.supTreeNumbers()`), but that path is currently disabled.
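The map construction described above can be sketched as follows. This is a simplified reconstruction, not the actual `Term2NumHelper.createMTreeNumsMap()`. It takes the file's lines as a `List<String>` (for example from `Files.readAllLines`) instead of reading the file itself. Skipping lines that contain no `;` is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified reconstruction of the mtree nums map; not the actual Term2NumHelper code.
class MTreeNumsMap {

    // lines: contents of mtreesYYYY.bin, one "key;treeNumber" pair per line
    static Map<String, ArrayList<String>> build(List<String> lines) {
        // LinkedHashMap keeps keys in file order (the real map type is only known to be a Map)
        Map<String, ArrayList<String>> map = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split(";");
            if (parts.length < 2) {
                continue; // assumption: lines without ';' are skipped
            }
            String key = parts[0];
            String treeNumber = parts[1];
            // keep only Pharmacological Action tree numbers (D27 subtree)
            if (treeNumber.startsWith("D27")) {
                map.computeIfAbsent(key, k -> new ArrayList<>()).add(treeNumber);
            }
        }
        return map;
    }
}
```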

What is the descriptor list from descYYYY.xml? It is built by parsing descYYYY.xml with `DescriptorRecordParser` (SAX) in Java. In the current code, the only place that adds to this list is `supTreeNumbers()`, which fires when a descriptor has a `PharmacologicalActionList` and the map has PA numbers for the referred-to descriptor. That method is disabled (`if (true) return;`), so the descriptor list is always empty. If it were enabled, each entry would be a line of the form `descriptorName;paTreeNumber.descUI`. These lines would be appended to the PA tree file, so the Perl script would see supplementary (PA) descriptor–treeNumber lines in the full tree file. The "descriptor list" is therefore not a list of all descriptors from descYYYY.xml. It is a list of extra lines (the PA supplement) for the combined tree file.

- Output (local): `RESOURCES_TERM2NUM_OUTPUT` (e.g. `/CT/CochraneCMS/rendering/CDTS/xdps`): term2num.xml, term2num.xsl. Manual: copy to the render server; rename the previous term2num.xml → term2num_old.xml.
- DB: None.

Obsolete / legacy steps (Step 1 Rebuild)
The following parts of the Rebuild flow are legacy and add no useful data in the current code. They can be simplified or dropped:

- PA tree file (`mtreesYYYY.pa.bin`) and full tree file (`mtreesYYYY.full.bin`): The `.pa.bin` file contains only `map.toString()` (the Java mtree nums map). That map is a D27-only subset of data already present in mtrees.bin. Its format (e.g. `{key=value, ...}`) is not something the Perl script parses as valid `Term;Code` lines. The Perl script gets all descriptor → tree-number data from the mtrees.bin part of the full file. The `.pa.bin` block adds no correct data and can produce garbage entries. You can drop `.pa.bin` from the pipeline and pass only mtrees.bin (no concatenation with `.pa.bin`) to `map_desc2treenum.pl`. term2num.xml/xsl would then be the same or cleaner, with no garbage from the map dump.
- Descriptor list from descYYYY.xml: The only code that adds to this list is `DescriptorRecordParser.supTreeNumbers()` (PA supplement lines). That method is disabled (`if (true) return;`), so the descriptor list is always empty and no supplementary lines are written to `.pa.bin`. The PA-supplement design is therefore unused. It was meant to give descriptors extra D27 tree numbers from `PharmacologicalActionList` in descYYYY.xml.
- Building the mtree nums map (D27-only): The map's only use in the current flow is to write `.pa.bin`. The Perl script does not consume the map in a useful way; it reads mtrees.bin directly. If `.pa.bin` is dropped, building this map can be skipped as well, unless there are plans to re-enable the PA supplement (`supTreeNumbers`).
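The sketch below shows why the map dump is not usable Perl input by comparing what `map.toString()` writes with a `Term;Code` line. The regular expression is an approximation: it assumes a usable line looks like `<descriptor>;<tree number>` (e.g. `Body Regions;A01`). The exact rule applied by map_desc2treenum.pl is not documented here, and the class name is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical check; approximates what a Term;Code reader would accept.
class PaBinCheck {

    // Text without ';', then one MeSH tree number such as A01 or D27.505.954.122.085.
    private static final Pattern TERM_CODE =
            Pattern.compile("[^;]+;[A-Z]\\d{2}(\\.\\d{3})*");

    static boolean isTermCodeLine(String line) {
        return TERM_CODE.matcher(line).matches();
    }

    // What .pa.bin receives: the whole map rendered by AbstractMap.toString()
    // (sorted copy only so that the output is deterministic).
    static String paBinContent(Map<String, List<String>> mtreeNumsMap) {
        return new TreeMap<>(mtreeNumsMap).toString();
    }
}
```

A single-entry map dumps as `{Example Descriptor=[D27.505.954.122.085]}`, which contains no `;` and so never matches.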
-
What is in term2num.xml and term2num.xsl, and how are they used?
Both files are produced by the Perl script map_desc2treenum.pl and contain the same mapping data in XML form: MeSH descriptor/qualifier text → tree numbers or qualifier codes. The text is normalized (spaces → underscores), and a leading `a` is added to descriptors and `b` to qualifiers so that the IDs are valid in XML.

term2num.xml – standalone XML:
- Root: `<term2num>` with two sections.
- `<hSet>` – descriptor set: one `<h term="...">` per MeSH descriptor (e.g. `a_humans`, `a_body_regions`). Each `<h>` contains one or more `<n num="..." />` with the MeSH tree number (dots replaced by underscores, e.g. `D06_472_040_585_353`).
- `<qSet>` – qualifier set: one `<q str="..." code="..." />` per qualifier. `str` = normalized qualifier text (e.g. `banatomy__histology`), `code` = qualifier code.

Short sample – real snippet from term2num.xml (elements `<term2num>`, `<hSet>`, `<h>`, `<n>`, `<qSet>`, `<q>`):

```xml
<term2num>
  <hSet>
    <h term="a_11hydroxycorticosteroids">
      <n num="D06_472_040_585_353" />
    </h>
    <h term="a_12e7_antigen">
      <n num="D09_400_430_890_200_049" />
      <n num="D12_776_395_550_200_049" />
      <n num="D12_776_543_550_200_062" />
      <n num="D23_050_301_350_049" />
    </h>
    <h term="a_14alpha_demethylase_inhibitors">
      <n num="D27_505_389_500_059" />
      <n num="D27_505_519_389_335_059" />
    </h>
  </hSet>
  <qSet>
    <q str="banalogs__derivatives" code="AA" />
    <q str="banatomy__histology" code="AH" />
    <q str="bblood_supply" code="BS" />
  </qSet>
</term2num>
```

(term2num.xml also has a DTD and a placeholder `<h term="a">` with an empty `<n num="" />`. The sample above shows the real descriptor/qualifier structure.)

term2num.xsl – XSLT stylesheet that embeds the same structure inside an `<xsl:variable name="meshindex">`, so the mapping is inline (no separate XML file). The structure under the variable is identical to term2num.xml (`<term2num><hSet>…</hSet><qSet>…</qSet></term2num>`).

How they are used:
- XDPS / CochraneCMS (Step 2 – Check changed MeSH codes): Only term2num.xml (and term2num_old.xml) is used. The Java code parses both as DOM and reads `<hSet>` and its `<h>` children (attribute `term`, child `<n>` attribute `num`). It then diffs them to get the list of changed descriptor terms, which is written to COCHRANE_MESHTERM_CHANGED_DESCRIPTORS. term2num.xsl is not used here.
- CmsRenderService (rendering – WileyML, HTML, PDF): The render pipeline needs to resolve MeSH descriptor/qualifier text to tree numbers or codes (e.g. for display or linking). It loads the mapping in one of two ways:
  - term2num.xml is loaded via `document($paramfile)` into a variable (e.g. `meshindex`) in stylesheets such as cochrane.xsl or mesh.xsl (`paramfile` points to `../../term2num.xml` or `../term2num.xml`).
  - term2num.xsl is used as a stylesheet that already contains the same mapping in `<xsl:variable name="meshindex">`, so no separate XML file is read at transform time.

So: term2num.xml = the mapping as a standalone data file; term2num.xsl = the same mapping embedded in XSL, for pipelines that prefer a single stylesheet with inline data.
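The Step 2 read-and-diff of the two `<hSet>` sections can be sketched with the JDK's DOM parser. This is an illustration, not the `MeshtermCodesUpdaterManager` code, and the class and method names are hypothetical. Prefix stripping follows the `a_`/`a` rule described for COCHRANE_MESHTERM_CHANGED_DESCRIPTORS. Counting added or removed terms as "changed" (not only terms whose tree numbers differ) is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical sketch of the Step 2 diff; not the MeshtermCodesUpdaterManager code.
class Term2NumDiff {

    // Reads <h term="..."><n num="..."/></h> elements into: descriptor term -> set of tree numbers.
    static Map<String, Set<String>> readHSet(String term2numXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(term2numXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse term2num XML", e);
        }
        Map<String, Set<String>> result = new TreeMap<>();
        NodeList hs = doc.getElementsByTagName("h");
        for (int i = 0; i < hs.getLength(); i++) {
            Element h = (Element) hs.item(i);
            Set<String> nums = new TreeSet<>();
            NodeList ns = h.getElementsByTagName("n");
            for (int j = 0; j < ns.getLength(); j++) {
                nums.add(((Element) ns.item(j)).getAttribute("num"));
            }
            result.put(stripPrefix(h.getAttribute("term")), nums);
        }
        return result;
    }

    // Removes the Perl ID prefix: leading "a_" or "a".
    static String stripPrefix(String term) {
        if (term.startsWith("a_")) {
            return term.substring(2);
        }
        return term.startsWith("a") ? term.substring(1) : term;
    }

    // Terms whose tree numbers differ, plus (assumption) terms present in only one file.
    static Set<String> changedTerms(Map<String, Set<String>> oldMap, Map<String, Set<String>> newMap) {
        Set<String> allTerms = new HashSet<>(oldMap.keySet());
        allTerms.addAll(newMap.keySet());
        Set<String> changed = new TreeSet<>();
        for (String term : allTerms) {
            if (!Objects.equals(oldMap.get(term), newMap.get(term))) {
                changed.add(term);
            }
        }
        return changed;
    }
}
```

In the real flow, the two inputs would be the files at `getTerm2NumFilePath("term2num_old.xml")` and `getTerm2NumFilePath("term2num.xml")`.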

XSL – what files it applies to and why

| File | Applies to (input) | Why |
|---|---|---|
| term2num.xsl | Nothing (not a transform). | It is a data file in XSL form: it only defines `<xsl:variable name="meshindex">` with the same `<term2num>`/`<hSet>`/`<h>`/`<n>`/`<qSet>`/`<q>` structure. Pipelines that want a single stylesheet can include/use it so that no separate term2num.xml is loaded. |
| cochrane.xsl | WileyML (WML3G) source for CDSR. | Used when rendering CDSR content (e.g. to HTML/PDF). It loads term2num via `document($paramfile)` (`paramfile=../../term2num.xml`) and uses `$meshindex` to resolve MeSH descriptor/qualifier text to tree numbers or codes for output (e.g. links). Locations: `CmsRenderService/.../xdps/wileyml/lib/cochrane.xsl`, `wileyml_wol/lib/cochrane.xsl`, `diamond/wileyml/lib/cochrane.xsl`. |
| mesh.xsl | CENTRAL legacy XML (e.g. `<D>`, `<MCW>`). | Used when rendering CENTRAL content. It loads term2num via `document($paramfile)` (`paramfile=../term2num.xml`) and uses `$meshindex` to resolve MeSH terms to tree numbers for output. Locations: `CmsRenderService/.../xdps/central/mesh.xsl`, `diamond/central/mesh.xsl`. |

Step 2: Check changed MeSH codes

- Input (local):
  - `FilePathCreator.getTerm2NumFilePath("term2num.xml")`
  - `FilePathCreator.getTerm2NumFilePath("term2num_old.xml")`

  Both must exist; otherwise an `IOException` is thrown (e.g. "term2num.xml doesn't exist").
- Process:
  - `clearDescriptors()` – delete rows from COCHRANE_MESHTERM_CHANGED_DESCRIPTORS, COCHRANE_MESHTERM_RECORD_DESCRIPTORS, COCHRANE_MESHTERM_RECORD_4_CHECK.
  - `initChangedDescriptors()` – diff term2num.xml vs term2num_old.xml (descriptor terms + tree numbers); insert the changed descriptor terms into COCHRANE_MESHTERM_CHANGED_DESCRIPTORS.
    - If there are no changed descriptors → "Descriptors didn't change. Mesh update aborted." (exception or early end).
  - `prepareData(dbData)` per DB (from `cms.cochrane.meshterm.record.update.dbs`, e.g. CDSR, CENTRAL):
    - `initRecords4Check` – for each record in the DB, insert one or more rows into COCHRANE_MESHTERM_RECORD_4_CHECK (record id, name, doi, version, latest-version flag).
      - DB = one of the configured Cochrane content databases (from `cms.cochrane.meshterm.record.update.dbs`), e.g. CDSR (Cochrane Database of Systematic Reviews) or CENTRAL (Cochrane Central Register of Controlled Trials).
      - Record = one content item in that database, i.e. one row in COCHRANE_ENTIRE_DB for that database. For CDSR this is one systematic review (one article); for CENTRAL, one trial/citation. The list of records is obtained via `IEntireDBStorage.getRecordIdsAndNames(dbName, …)` (returns recordId → name). For CDSR, version info from `IVersionManager.getVersions(recordName)` is used to add a row per version (latest + previous versions). For CENTRAL there is one row per record (no version/doi).
    - `initRecordDescriptors` – for each "not yet checked" record in RECORD_4_CHECK:
      - read the WML3G/legacy source from the repository;
      - regex-extract the MeSH descriptor text (CDSR: `<MeSHdescriptor>…</MeSHdescriptor>` / `<MeSHcheckWord>…</MeSHcheckWord>`; CENTRAL: `<D>…</D>` / `<MCW>…</MCW>`);
      - normalize it (replace symbols, whitespace → underscore);
      - insert into COCHRANE_MESHTERM_RECORD_DESCRIPTORS (record_id, descriptor text).
  - `createUpdateProcess(dbData, user)` – find "outdated" records, i.e. records that have at least one descriptor in RECORD_DESCRIPTORS that also appears in CHANGED_DESCRIPTORS. Save the list to the file meshterm_outdated_records. For CDSR, if the property `cms.cochrane.meshterm.record.update.ml3g_to_ml3g.conversion.enabled` is true, start the Update MeSH in WML3G process for those record IDs.
- Output (local):
  - meshterm_outdated_records – path `FilePathCreator.getMeshtermRecordUpdatedFilePath(dbName)` = `<entire-dir>/<dbName>/meshterm_outdated_records` (file name from `cms.cochrane.meshterm.record.update.file_name`, default `meshterm_outdated_records`). One line per outdated record (doi or name).
- Output (DB): rows in COCHRANE_MESHTERM_CHANGED_DESCRIPTORS, COCHRANE_MESHTERM_RECORD_4_CHECK, COCHRANE_MESHTERM_RECORD_DESCRIPTORS.
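The "outdated" rule used by `createUpdateProcess` is a set intersection per record. The in-memory sketch below shows that rule and writes one line per outdated record. In the real code, the inputs come from COCHRANE_MESHTERM_RECORD_DESCRIPTORS and COCHRANE_MESHTERM_CHANGED_DESCRIPTORS rather than Java maps. The class name and record keys are illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory version of the "outdated records" rule.
class OutdatedRecords {

    // descriptorsByRecord: record key (doi or name) -> normalized descriptor texts
    static List<String> find(Map<String, Set<String>> descriptorsByRecord, Set<String> changedDescriptors) {
        List<String> outdated = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : descriptorsByRecord.entrySet()) {
            // outdated = at least one of the record's descriptors is in the changed set
            if (!Collections.disjoint(e.getValue(), changedDescriptors)) {
                outdated.add(e.getKey());
            }
        }
        return outdated;
    }

    // One line per outdated record, like meshterm_outdated_records.
    static void write(Path file, List<String> outdated) throws IOException {
        Files.write(file, outdated);
    }
}
```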

Where column values come from:

COCHRANE_MESHTERM_CHANGED_DESCRIPTORS

| Column | Source |
|---|---|
| id | Generated (AUTO). |
| descriptor | One distinct descriptor term (text) from the diff of term2num.xml vs term2num_old.xml. Terms come from `<hSet>` child nodes, attribute `term`; the leading `a_` or `a` is stripped (Perl ID normalization). One row per changed descriptor term. |

COCHRANE_MESHTERM_RECORD_4_CHECK

| Column | Source |
|---|---|
| id | Generated (AUTO). |
| checked | Default `false`; set to `true` by `updateRecords4CheckStatus` after descriptors are parsed for that row. |
| record_id | Record ID from `IEntireDBStorage.getRecordIdsAndNames(dbName, …)` (key in the map: ID of the record in the DB). |
| name | CDSR: from `IVersionManager.getVersions(recordName)` → `PrevVO.name` (latest-version name or previous-version name). CENTRAL: from `idsAndNames.get(recordId)` (record name). |
| version | CDSR: from `PrevVO.version` (previous-version number; null for latest). CENTRAL: null. |
| latest_version | CDSR: `true` for the current/latest version row, `false` for each previous-version row. CENTRAL: always `true`. |
| doi | CDSR: from `PrevVO.buildDoi()` (e.g. `CD012345.pub3`). CENTRAL: null. |
| database_id | From `ResultStorage.getDatabaseEntity(dbName)` when saving (DB row for CDSR or CENTRAL). |

COCHRANE_MESHTERM_RECORD_DESCRIPTORS

| Column | Source |
|---|---|
| id | Generated (AUTO). |
| descriptor | Text extracted from the WML3G or legacy source by regex: CDSR pattern `<MeSHdescriptor>…</MeSHdescriptor>` / `<MeSHcheckWord>…</MeSHcheckWord>`, CENTRAL `<D>…</D>` / `<MCW>…</MCW>`. The text is then lowercased and passed through `replaceSymbols()` (e.g. `&amp;` → `&`, spaces → underscore, strip `-'();&+,‐`). One row per (record, descriptor text) pair. |
| record_id | FK to COCHRANE_MESHTERM_RECORD_4_CHECK: the row for the record whose WML3G/source was parsed to get this descriptor. |

Sample XML context (what the regex matches):

CDSR (WML3G) – inside `<MeSHterms>`/`<MeSHheading>`; the captured text is the content of the element (e.g. `Body Regions`, `Humans`):

```xml
<MeSHterms>
  <MeSHheading>
    <MeSHdescriptor>Body Regions</MeSHdescriptor>
    <MeSHqualifier>anatomy &amp; histology</MeSHqualifier>
  </MeSHheading>
  <MeSHheading>
    <MeSHcheckWord>Humans</MeSHcheckWord>
  </MeSHheading>
</MeSHterms>
```

CENTRAL (legacy source) – `<D>` = descriptor, `<MCW>` = check word; the captured text is the element content:

```xml
<D>Randomized controlled trials</D>
<MCW>Meta-analysis</MCW>
```
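For the CDSR patterns, the extraction and normalization done by `initRecordDescriptors` can be approximated as below; CENTRAL would use `<D>` and `<MCW>` instead. This is a sketch, not the actual code, and the class name is hypothetical. The order of the normalization steps is an assumption based on the `replaceSymbols()` description in the table above: lowercase, decode `&amp;`, whitespace → underscore, then strip `-'();&+,‐`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the CDSR extraction in initRecordDescriptors; not the actual code.
class MeshDescriptorExtractor {

    // CDSR descriptor and check-word elements (CENTRAL would use <D> and <MCW>).
    private static final Pattern CDSR_TERMS = Pattern.compile(
            "<MeSHdescriptor>(.*?)</MeSHdescriptor>|<MeSHcheckWord>(.*?)</MeSHcheckWord>",
            Pattern.DOTALL);

    // Characters stripped after whitespace replacement (assumed set; \u2010 is the Unicode hyphen).
    private static final Pattern STRIPPED = Pattern.compile("[-'();&+,\u2010]");

    static List<String> extractCdsr(String wml3g) {
        List<String> descriptors = new ArrayList<>();
        Matcher m = CDSR_TERMS.matcher(wml3g);
        while (m.find()) {
            String text = m.group(1) != null ? m.group(1) : m.group(2);
            descriptors.add(normalize(text));
        }
        return descriptors;
    }

    // Assumed order: lowercase, decode &amp;, whitespace -> '_', then strip symbols.
    static String normalize(String text) {
        String s = text.toLowerCase(Locale.ROOT).replace("&amp;", "&");
        s = s.replaceAll("\\s", "_");
        return STRIPPED.matcher(s).replaceAll("");
    }
}
```

With the CDSR sample above as input, `extractCdsr` returns `[body_regions, humans]`. The qualifier is not captured, because only descriptor and check-word elements are matched.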

Step 3: Update MeSH in WML3G (CDSR)

- Input: record IDs of outdated records (from Step 2); `ContentLocation` = `ENTIRE` (no delivery file).
- Per record:
  - Read WML3G: path = `ContentLocation.ENTIRE.getPathToMl3g(…, cdNumber, false)` → e.g. `<content-root>/clsysrev/entire/ml3g/<cdNumber>.xml`.
  - MeSH from DB: `JatsMeshtermManager.generateMeshTerms(record, MeshtermStorage)` reads COCHRANE_MESHTERM_RECORD_DATES, COCHRANE_MESHTERM_RECORDS, COCHRANE_MESHTERM_DESCRIPTORS and COCHRANE_MESHTERM_QUALIFIERS (by record name). It then builds the MeshHeadingList XML (descriptors + qualifiers + major-topic flags).
  - Merge: `ContentHandler.updateWML3G(record, ml3gSource, meshTerms, …)` → XSLT (WileyML-meshterm-insert) replaces `<MeSHterms>` in the WML3G with a new block built from the MeshHeadingList.
  - Validate: WML3G grammar (`WILEY_ML3GV2_GRAMMAR`).
  - Write: same path as read, i.e. overwrite `<content-root>/clsysrev/entire/ml3g/<cdNumber>.xml`. Also write the assets path and the optional CCA/ml2.1 paths; optional tmp copy.
- Structures: no new DB rows; the MeSH tables are only read. Local files: the same WML3G XML file is overwritten in place.
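The production merge is an XSLT (WileyML-meshterm-insert). The DOM sketch below illustrates the same replace-the-block idea: it swaps the first `<MeSHterms>` element for a new one and serializes the result. It is not the project's code. It assumes unprefixed element names, ignoring WML3G namespaces. It leaves a document without `<MeSHterms>` unchanged, because this page does not describe how the real XSLT handles that case.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical DOM illustration of the MeSH merge; the real code uses an XSLT.
class MeshTermsMerge {

    static String replaceMeshTerms(String wml3g, String newMeshTerms) {
        try {
            Document doc = parse(wml3g);
            NodeList existing = doc.getElementsByTagName("MeSHterms");
            if (existing.getLength() == 0) {
                return wml3g; // sketch: documents without <MeSHterms> are left untouched
            }
            Node target = existing.item(0);
            // copy the new block into this document, then swap it in place of the old one
            Node fresh = doc.importNode(parse(newMeshTerms).getDocumentElement(), true);
            target.getParentNode().replaceChild(fresh, target);
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalArgumentException("MeSH merge failed", e);
        }
    }

    private static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```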

Remote files

| File / location | Step | Purpose |
|---|---|---|
| NLM FTP (e.g. nlmpubs.nlm.nih.gov) | 1 | descYYYY.xml/zip, qYYYY.bin, qualYYYY.xml, mtreesYYYY.bin – annual MeSH/qualifier/tree data for term2num. |

Local files

| Path / file | Step | Direction | Purpose |
|---|---|---|---|
| RESOURCES_TERM2NUM_DOWNLOADS (e.g. …/term2num/downloaded/) | 1 | in | Downloaded NLM files (possibly .zip); unzipped here during Rebuild, not during download. |
| RESOURCES_TERM2NUM_OUTPUT (e.g. …/rendering/CDTS/xdps/) | 1 | out | term2num.xml, term2num.xsl. |
| Same dir (manual) | 1 | in/out | term2num_old.xml = backup of previous term2num.xml. |
| getTerm2NumFilePath("term2num.xml") | 2 | in | New descriptor/tree map. |
| getTerm2NumFilePath("term2num_old.xml") | 2 | in | Previous map for diff. |
| getFilePathForEntireMl3gXml(dbName, recName) (CDSR) | 2 | in | WML3G XML to extract current MeSH text (regex). |
| getFilePathToSourceEntire(dbName, recName) (CENTRAL) | 2 | in | Legacy source for CENTRAL MeSH extraction. |
| getPreviousMl3gXmlPath(name, version) | 2 | in | Previous-version WML3G for non-latest CDSR versions. |
| getMeshtermRecordUpdatedFilePath(dbName) | 2 | out | meshterm_outdated_records list (one line per record). |
| ContentLocation.ENTIRE.getPathToMl3g(…, cdNumber, false) | 3 | in + out | WML3G XML read then overwritten with merged MeSH. |

Database tables

| Table | Step | Role |
|---|---|---|
| COCHRANE_MESHTERM_CHANGED_DESCRIPTORS | 2 | One row per descriptor term that changed between term2num_old and term2num (text only). Filled in initChangedDescriptors; cleared at start of Step 2 and at process end. |
| COCHRANE_MESHTERM_RECORD_4_CHECK | 2 | One row per record to check (id, name, doi, version, latest-version). Filled from entire DB record list + CDSR versions; cleared at start/end. |
| COCHRANE_MESHTERM_RECORD_DESCRIPTORS | 2 | One row per record + descriptor text extracted from WML3G/legacy source. Used to decide "outdated" (descriptor in CHANGED_DESCRIPTORS). Cleared at start/end. |
| COCHRANE_MESHTERM_RECORD_DATES | 3 read | Step 3 only reads. Populated by "Download Meshterms" (not by wiki steps). |
| COCHRANE_MESHTERM_RECORDS | 3 read | Step 3 only reads. Links record name to MeSH data. |
| COCHRANE_MESHTERM_DESCRIPTORS | 3 read | Step 3 only reads. Descriptor text + major. |
| COCHRANE_MESHTERM_QUALIFIERS | 3 read | Step 3 only reads. Qualifier text + major. |
| COCHRANE_ENTIRE_DB / record list | 2 | Read for list of records (getRecordIdsAndNames, getRecordListCount). |
| Version / previous-version metadata | 2 | CDSR: getVersions(recordName) for RECORD_4_CHECK versions. |
When are COCHRANE_MESHTERM_RECORD_DATES, RECORDS, DESCRIPTORS, QUALIFIERS filled?
These four tables are filled only by the "Download Meshterms" flow (they are not written by the annual MeSH update Steps 1–3).

When the flow runs:

- Manually: from the UI – Issue List → select an issue → action "Download Meshterms" (`IssueWrapper` → `MeshtermDownloaderRequest`).
- Scheduled: `MonthlyMeshtermExecutor` (scheduled task, property `cochrane.meshterm-updater.monthly-pattern`) finds the open issue for the configured date. It runs the same "Download Meshterms" action if MeSH terms have not already been downloaded for that issue.

How they get filled:

- For the chosen issue, the system downloads MeSH data from PubMed/NLM (Medline plan: `MeshtermDownloader`), one XML file per record (article) in that issue.
- `MeshtermDownloaderCallback` receives the list of downloaded files and sends `MeshtermManagerParameters` (files, issue, issueId, title) to a JMS queue.
- `MeshtermManager` (message listener) processes each file. It parses the MeSH headings from the XML (descriptor name, qualifiers, major-topic flags) and, for each record, fills:
  - COCHRANE_MESHTERM_RECORD_DATES: one row per record name (with issue date), created when the record is first seen (`MeshtermStorage.createRecordDate`).
  - COCHRANE_MESHTERM_DESCRIPTORS / COCHRANE_MESHTERM_QUALIFIERS: rows created when a descriptor or qualifier text (+ major) is new (`createDescriptor` / `createQualifier`).
  - COCHRANE_MESHTERM_RECORDS: rows linking a record name to a descriptor id + qualifier id, created when that (record, descriptor, qualifier) combination is new (`createMeshtermRecord`).

So the tables are filled per issue, per record in that issue, as the MeSH XML files are processed. Code: `MeshtermManager.process()` → `updateMeshterms()` → `parseMeshterms()` + `MeshtermStorage` (`createRecordDate`, `findDescriptorId`/`createDescriptor`, `findQualifierId`/`createQualifier`, `findMeshtermRecordId`/`createMeshtermRecord`).
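The find-or-create pattern behind `findDescriptorId`/`createDescriptor` (and the qualifier equivalents) can be illustrated with an in-memory registry keyed by (text, major). `DescriptorRegistry` and its sequential ids are hypothetical stand-ins for `MeshtermStorage` and database-generated ids.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for descriptor find-or-create.
class DescriptorRegistry {
    private final Map<List<String>, Integer> ids = new HashMap<>();
    private int nextId = 1; // stands in for a DB-generated id

    // Returns the existing id for (text, major), or creates one if this pair is new.
    int findOrCreate(String text, boolean major) {
        List<String> key = List.of(text, Boolean.toString(major));
        Integer id = ids.get(key);
        if (id == null) {
            id = nextId++;
            ids.put(key, id);
        }
        return id;
    }

    int size() {
        return ids.size();
    }
}
```

Nothing is ever removed from the registry. This mirrors the fact that descriptor and qualifier rows are never deleted, so orphans can remain.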
Do rows from these tables ever get deleted?
| Table | Deleted? | When / how |
|---|---|---|
| COCHRANE_MESHTERM_CHANGED_DESCRIPTORS | Yes | Cleared at start of Step 2 and at end of the Check Changed MeSH codes process. MeshtermCodesUpdaterManager.clearDescriptors() → MeshtermStorage.deleteChangedDescriptors() (DELETE FROM MeshtermChangedDescriptorEntity). |
| COCHRANE_MESHTERM_RECORD_4_CHECK | Yes | Cleared at start/end of Step 2; also when record count ≠ RECORD_4_CHECK count (then RECORD_4_CHECK + RECORD_DESCRIPTORS deleted and refilled). clearDescriptors() → deleteRecords4Check() (all rows); or deleteRecords4Check(dbId) when re-init. |
| COCHRANE_MESHTERM_RECORD_DESCRIPTORS | Yes | Same as RECORD_4_CHECK: cleared at start/end of Step 2 and when count mismatch. clearDescriptors() → deleteRecordDescriptors(); or deleteRecordDescriptors(dbId) when re-init. |
| COCHRANE_MESHTERM_RECORDS | Yes (per record) | Not bulk-deleted. When Download Meshterms runs for a record, MeshtermManager.updateMeshterms() compares the existing COCHRANE_MESHTERM_RECORDS rows for that record with the new MeSH XML. Any row for a (record, descriptor, qualifier) combination that is no longer in the new XML is deleted (deleteMeshterms → MeshtermStorage.deleteMeshtermRecord(meshtermRecordId)). So obsolete MeSH headings for a record are removed on re-download. |
| COCHRANE_MESHTERM_RECORD_DATES | No | Rows are never deleted. Only created or updated (setIssue). |
| COCHRANE_MESHTERM_DESCRIPTORS | No | Rows are never deleted. Only created when a new descriptor text (+ major) is seen. Orphaned descriptor rows can remain if no record references them anymore. |
| COCHRANE_MESHTERM_QUALIFIERS | No | Rows are never deleted. Only created when a new qualifier text (+ major) is seen. Orphaned qualifier rows can remain. |
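The per-record reconciliation in `updateMeshterms()` comes down to two set differences over (descriptor, qualifier) pairs. Pairs that appear only in the new XML are created, and pairs that appear only in the existing COCHRANE_MESHTERM_RECORDS rows are deleted. The sketch simplifies in three ways: the class name is illustrative, a pair is represented as `List.of(descriptor, qualifier)` with `""` for no qualifier, and major-topic flags are left out.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of per-record MeSH reconciliation.
class MeshtermRecordSync {

    // Pairs stored for the record but absent from the new XML -> rows to delete.
    static Set<List<String>> toDelete(Set<List<String>> existing, Set<List<String>> incoming) {
        Set<List<String>> result = new LinkedHashSet<>(existing);
        result.removeAll(incoming);
        return result;
    }

    // Pairs in the new XML that are not stored yet -> rows to create.
    static Set<List<String>> toCreate(Set<List<String>> existing, Set<List<String>> incoming) {
        Set<List<String>> result = new LinkedHashSet<>(incoming);
        result.removeAll(existing);
        return result;
    }
}
```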

Success flow

Step 1
- FTP (or manual) places NLM files in term2num/downloaded; Rebuild runs without exception → term2num.xml and term2num.xsl in OUTPUT dir. On render server, previous term2num.xml renamed to term2num_old.xml.

Step 2
- term2num.xml and term2num_old.xml exist.
- Diff yields at least one changed descriptor → rows in COCHRANE_MESHTERM_CHANGED_DESCRIPTORS.
- prepareData runs for each configured DB: RECORD_4_CHECK and RECORD_DESCRIPTORS filled; no fatal IO on WML3G/source read.
- Outdated records found → meshterm_outdated_records file written; for CDSR, WML3G update process started with those record IDs.

Step 3
- For each record ID: WML3G file exists at ENTIRE path; MeSH data present in COCHRANE_MESHTERM_RECORDS/DESCRIPTORS/QUALIFIERS (from prior "Download Meshterms"); merge produces valid WML3G; file overwritten; validation passes. Process completes with success; optional downstream Steps 4–5 (sending DOIs) run.

Error flow

Step 1
- FTP / network: Download fails → ProcessException, "Downloading mesh resources failed".
- Missing files (not downloaded or not unzipped): Rebuild uses a missing file → ProcessException (e.g. while creating the map or running Perl).
- Perl / script: map_desc2treenum.pl or gen_qualifier_config.pl fails → ProcessException, "Creating term2num.xml and term2num.xsl completed with exception".

Step 2

- term2num.xml or term2num_old.xml missing: `checkTerm2NumExisting` → `IOException`, "term2num.xml doesn't exist" (or term2num_old.xml); the message says to put the file in `RESOURCES_TERM2NUM_OUTPUT`. The process aborts: `prepareMeshtermCodes` throws, while the `onStart` path can end successfully with "Descriptors didn't change" if there is nothing to do.
- No descriptor changes: diff empty → "Descriptors didn't change. Mesh update aborted." (exception from `prepareMeshtermCodes`; in `onStart`, `endProcess(SUCCESSFUL)` and return).
- DB / RECORD_4_CHECK mismatch: If record count != RECORD_4_CHECK count, RECORD_4_CHECK and RECORD_DESCRIPTORS deleted and refilled (initRecords4Check).
- WML3G/source read failure: getSource() catches IOException, logs "Could not read source file from …", returns empty string → no descriptors extracted for that record (possible "outdated" miss or empty list).
- createUpdateProcess: If no outdated records, no file written, no Step 3 started ("records with outdated mesh terms were not found").
- Step 3 start failure: If startWML3GToWML3GConversionProcess throws (e.g. process creation), error logged; "Failed to create process … Update MeSH in WML3G".

Step 3 (per record)
- Missing WML3G file: Read fails → record fails, error collector, process continues other records.
- No MeSH in DB: generateMeshTerms returns null/empty → merge can produce empty MeSH block; if XSL/schema requires non-empty elements, validation can fail (e.g. GR28 "Empty <MeSHdescriptor>").
- Validation failure: conv.validate(ml3gSource, WILEY_ML3GV2_GRAMMAR) non-empty → CmsException, record marked failed, error collector.
- Asset/path errors: `getAssetsUris` or `putFile` failures → record fails.
- At the end of the process, `clearDescriptors()` runs (Step 2 tables cleared).
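The Step 3 behaviour follows a common pattern: a failing record is collected, the remaining records are still processed, and clean-up runs at the end. `RecordStep` in the sketch below is a hypothetical stand-in for the read/merge/validate/write work on one record. Running the clean-up in a `finally` block is an assumption about how `clearDescriptors()` is guaranteed to run.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-record processing with an error collector.
class PerRecordRunner {

    // Stand-in for the per-record work (read WML3G, merge MeSH, validate, write).
    interface RecordStep {
        void apply(String cdNumber) throws Exception;
    }

    // Runs the step for every record and collects failures instead of stopping.
    // cleanup (standing in for clearDescriptors()) runs once at the end, even if an Error escapes.
    static Map<String, String> runAll(List<String> cdNumbers, RecordStep step, Runnable cleanup) {
        Map<String, String> errors = new LinkedHashMap<>();
        try {
            for (String cdNumber : cdNumbers) {
                try {
                    step.apply(cdNumber);
                } catch (Exception e) {
                    errors.put(cdNumber, String.valueOf(e.getMessage()));
                }
            }
        } finally {
            cleanup.run();
        }
        return errors;
    }
}
```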

Quick reference

- Wiki: "How to perform annual MeSH update" (e.g. `C:\Tickets\XDPS-2646\How to perform annual MeSH update · wiley_XDPS Wiki.html`).
- Step 1: Term2NumHelper – FTP/download → term2num/downloaded; Rebuild → term2num.xml/xsl in OUTPUT; manually back up the previous file as term2num_old.xml.
- Step 2: MeshtermCodesUpdaterManager – term2num vs term2num_old → CHANGED_DESCRIPTORS; RECORD_4_CHECK + RECORD_DESCRIPTORS from WML3G/source; outdated list → meshterm_outdated_records; optionally start Step 3 for CDSR.
- Step 3: Wml3gValidationHandler – read WML3G from ENTIRE path, merge MeSH from COCHRANE_MESHTERM_* tables, validate, overwrite same WML3G file.
- MeSH tables for Step 3 are populated only by "Download Meshterms" (or monthly job), not by Steps 1–2.