Archived Forum Post

Question:

Parse XML with HTML tags

Jan 02 '17 at 13:02

I have the following problem. I received an XML document that contains inline tags inside its text:

<descripgrp> <descrip type="Notes"> Connect the AC power cord to the AC power adapter, then to the back of the <xref tlink="English:photo printer">photo printer</xref>. Please use "cable" instead of "cord". </descrip> </descripgrp>

CkXml_content correctly gives me all of the text, but not the <xref> element. Is there any way to get the full content exactly as it is?

I also tried:

CkXml_GetXmlSb $xml $sb
set description [CkStringBuilder_getBetween $sb "<descrip type=\"Notes\">" "</descrip>"]

That somewhat worked, but it put the <xref> element onto a new line after the text, like this:

Connect the AC power cord to the AC power adapter, then to the back of the . Please use "cable" instead of "cord". <xref tlink="English:photo printer">photo printer</xref>

Therefore the round trip using LoadXmlFile and GetXml does not work in this case (bummer).

Any ideas how to solve this?


Answer

Thanks. This is due to a limitation of the Chilkat XML parser -- it's not designed for "marked up text". In other words, the XML parsing is designed to handle XML data where nodes contain either child nodes, or text. It does not maintain the location of child nodes within text content.
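To make the distinction concrete, here is a small sketch of what a parser that does track mixed content keeps. It uses Python's standard xml.dom.minidom purely as an illustration (it is not part of Chilkat, and the SAMPLE string is just the XML from the question with the outer whitespace removed). The children of <descrip> come back in document order, with the text nodes on either side of the <xref> element:

```python
from xml.dom import minidom

# The XML from the question (assumption: outer whitespace trimmed for brevity).
SAMPLE = (
    '<descripgrp><descrip type="Notes">Connect the AC power cord to the AC '
    'power adapter, then to the back of the '
    '<xref tlink="English:photo printer">photo printer</xref>. '
    'Please use "cable" instead of "cord".</descrip></descripgrp>'
)

doc = minidom.parseString(SAMPLE)
descrip = doc.getElementsByTagName("descrip")[0]

# Each child is either a text node or an element, kept in document order.
kinds = []
for node in descrip.childNodes:
    if node.nodeType == node.TEXT_NODE:
        kinds.append(("text", node.data))
    elif node.nodeType == node.ELEMENT_NODE:
        kinds.append(("element", node.tagName))

for kind, value in kinds:
    print(kind, repr(value))
```

A parser that stores only "child nodes, or text" for each node has no place to record that the <xref> sits between the two text segments, which is why it ends up detached from the surrounding sentence.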

There are two workarounds: (1) use a different XML parser, one that is better at dealing with marked-up text, or (2) try using Chilkat.HtmlToXml to "convert" to XML. This would result in each text segment being placed within its own "<text>...</text>" node. Your original XML would get converted to:

<?xml version="1.0" encoding="iso-8859-1" ?>
<root>
    <descripgrp>
        <descrip type="Notes">
            <text>Connect the AC power cord to the AC power adapter, then to the back of the </text>
            <xref tlink="English:photo printer">
                <text>photo printer</text>
            </xref>
            <text>. Please use "cable" instead of "cord". </text>
        </descrip>
    </descripgrp>
</root>
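
For workaround (1), a minimal sketch is shown here, assuming that handing this step to Python's standard xml.etree.ElementTree is an option. The helper name inner_xml is hypothetical (it is not a library function), and SAMPLE is the XML from the question with the outer whitespace removed. ElementTree stores the text before the first child in .text and the text after each child in that child's .tail, so the marked-up content can be put back together in order:

```python
import xml.etree.ElementTree as ET

# The XML from the question (assumption: outer whitespace trimmed for brevity).
SAMPLE = (
    '<descripgrp><descrip type="Notes">Connect the AC power cord to the AC '
    'power adapter, then to the back of the '
    '<xref tlink="English:photo printer">photo printer</xref>. '
    'Please use "cable" instead of "cord".</descrip></descripgrp>'
)


def inner_xml(elem):
    """Return the content of elem as a string, with child markup left in place.

    Hypothetical helper, not part of ElementTree itself.
    """
    parts = [elem.text or ""]
    for child in elem:
        # tostring() serializes the child element and also appends its tail,
        # i.e. the text that follows the child inside the parent.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


root = ET.fromstring(SAMPLE)
descrip = root.find("descrip[@type='Notes']")
description = inner_xml(descrip)
print(description)
```

The result keeps the <xref> element inline at its original position. Note that the output is re-serialized rather than copied byte for byte, so details such as entity escaping may differ from the input.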