Matlab xmlread DTD issues - nickcounts/MDRT GitHub Wiki
Matlab xmlread() DTD Issues
Parsing XML files that contain a DOCTYPE (DTD) declaration causes Matlab's xmlread() function to fail. The best explanation of, and solution to, this issue came from user Suever on stackexchange.com. It is reproduced here:
Solution
You need to disable external DTD loading for the parser. To accomplish this, you can create a custom DocumentBuilder object, disable the external DTD loading, and pass this as the second input to xmlread.
From the hidden xmlread documentation (visible if you open the file with edit xmlread):
% Advanced use:
% Note that FILENAME can also be an InputSource, File, or InputStream object
% DOMNODE = XMLREAD(FILENAME,...,P,...) where P is a DocumentBuilder object
% DOMNODE = XMLREAD(FILENAME,...,'-validating',...) will create a validating
% parser if one was not provided.
% DOMNODE = XMLREAD(FILENAME,...,ER,...) where ER is an EntityResolver will
% will set the EntityResolver before parsing
% DOMNODE = XMLREAD(FILENAME,...,EH,...) where EH is an ErrorHandler will
% will set the ErrorHandler before parsing
% [DOMNODE,P] = XMLREAD(FILENAME,...) will return a parser suitable for passing
% back to XMLREAD for future parses.
%
So this ends up looking something like this:
% Create the DocumentBuilderFactory that configures the parser
builder = javax.xml.parsers.DocumentBuilderFactory.newInstance;
% Disable loading of external DTDs
builder.setFeature('http://apache.org/xml/features/nonvalidating/load-external-dtd', false);
% Read your file, passing the configured object as the second input
xml = xmlread(filename, builder);
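Keep in mind that this could result in your file being parsed incorrectly: with the external DTD skipped, anything that DTD defines, such as entities or default attribute values, is not available to the parser.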
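The snippet above can be tried end to end with a throwaway file. The following is only a sketch under a few assumptions: the file contents, the DTD name missing.dtd, and the variable names are invented for the example, and it builds an explicit DocumentBuilder from the factory (the "P" argument in the help text quoted earlier) rather than passing the factory itself.

```matlab
% Write a small XML file whose DOCTYPE points at a DTD that does not exist.
% The file name and contents are made up for this example.
xmlFile = [tempname '.xml'];
fid = fopen(xmlFile, 'wt');
fprintf(fid, '<?xml version="1.0"?>\n');
fprintf(fid, '<!DOCTYPE note SYSTEM "missing.dtd">\n');
fprintf(fid, '<note><to>reader</to></note>\n');
fclose(fid);

% Configure the factory so the parser does not fetch external DTDs
factory = javax.xml.parsers.DocumentBuilderFactory.newInstance();
factory.setFeature('http://apache.org/xml/features/nonvalidating/load-external-dtd', false);

% Build an actual DocumentBuilder from the configured factory
builder = factory.newDocumentBuilder();

% Parse with the custom builder and read back the root element name
dom = xmlread(xmlFile, builder);
rootName = char(dom.getDocumentElement().getNodeName());

% Clean up the temporary file
delete(xmlFile);
```

If the feature is honored by the parser that ships with your Matlab release, the parse should complete without trying to open missing.dtd, and rootName holds the name of the root element.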
Update
So looking into this a little closer, once we get past the DTD validation failing, the FEX xml2struct doesn't handle the DOCTYPE entry in the XML correctly and tries to process it just like a normal node. You could modify the source of xml2struct to detect this internally:
if node.getNodeType == node.DOCUMENT_TYPE_NODE
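As a rough illustration of where such a check would sit, the sketch below walks the top-level children of a parsed document and skips the DOCTYPE node. It is not the xml2struct source: the loop, the variable names, and the inline XML string are assumptions made for this example. The DOCTYPE uses an internal subset so that no external DTD is involved.

```matlab
% Parse a small document from a string; xmlread accepts an InputSource
xmlText = '<?xml version="1.0"?><!DOCTYPE note [<!ELEMENT note ANY>]><note>hi</note>';
source = org.xml.sax.InputSource(java.io.StringReader(xmlText));
dom = xmlread(source);

% Collect the names of the top-level nodes, ignoring the DOCTYPE entry
children = dom.getChildNodes();
keptNames = {};
for k = 0:children.getLength() - 1      % DOM node lists are zero-based
    node = children.item(k);
    if node.getNodeType() == node.DOCUMENT_TYPE_NODE
        continue                        % skip the DOCTYPE node
    end
    keptNames{end+1} = char(node.getNodeName()); %#ok<SAGROW>
end
```

A converter that recurses over child nodes could apply the same test before processing each node, so the DOCTYPE never reaches the code that expects elements or text.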
However, it would probably be easier to just remove all the DOCTYPEs for all of your XML files. The following script should be able to do this.
folder = 'directory/where/all/files/live';
files = dir(fullfile(folder, '*.xml'));
for k = 1:numel(files)
    filename = fullfile(folder, files(k).name);
    % Read the whole file as one row of characters
    fid = fopen(filename, 'rt');
    content = fread(fid, '*char')';
    fclose(fid);
    % Remove any DOCTYPE line (the match runs to the end of that line)
    newcontent = regexprep(content, '\n\s*?<!DOCTYPE.*?(?=\n)', '');
    % Overwrite the original file with the stripped content
    fout = fopen(filename, 'wt');
    fwrite(fout, newcontent);
    fclose(fout);
end
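Because the script overwrites files in place, it can help to try the regular expression on a string first. The sketch below applies the same pattern to two made-up inputs (the XML text and variable names are invented for this example). It also shows one limitation that follows from the pattern itself: it requires a newline before the DOCTYPE, so a declaration on the very first line of a file is left untouched.

```matlab
pattern = '\n\s*?<!DOCTYPE.*?(?=\n)';

% Case 1: DOCTYPE on its own line after the XML declaration
content = sprintf('<?xml version="1.0"?>\n<!DOCTYPE note SYSTEM "note.dtd">\n<note/>\n');
stripped = regexprep(content, pattern, '');
expected = sprintf('<?xml version="1.0"?>\n<note/>\n');

% Case 2: DOCTYPE on the first line, with no newline in front of it
firstLine = sprintf('<!DOCTYPE note SYSTEM "note.dtd">\n<note/>\n');
unchanged = regexprep(firstLine, pattern, '');

% Compare the results against what the pattern is expected to do
case1Ok = strcmp(stripped, expected);
case2Ok = strcmp(unchanged, firstLine);
```

A DOCTYPE with an internal subset that spans several lines would also only lose its first line, so it is worth checking a sample of your files before running the script over a whole folder.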