Archived Forum Post

Question:

Does your HTML to XML Conversion Library support HTML5?

Oct 14 '13 at 12:39

Do you plan to provide support?


Answer

HTML5 is intended to subsume not only HTML 4, but also XHTML 1 and DOM Level 2 HTML (http://en.wikipedia.org/wiki/HTML5).

Therefore, given that XHTML is just an application of XML (as a SOAP request is, for example), it follows that an HTML5 document written in the XML syntax (XHTML) is already an XML document.

The purpose of the HTML-to-XML conversion is to turn HTML that may not already be valid XML into valid XML, so that it can be parsed programmatically. Part of the benefit is that errors, unterminated tags, etc. are fixed automatically in a reasonable way that maintains the originally intended structure. In addition, the text becomes encapsulated in "text" nodes.
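To make the idea concrete, here is a toy sketch in Python of what such a conversion does: it closes unterminated tags and wraps text in "text" elements. It uses only the Python standard library and is not Chilkat's implementation or API. The names ToyHtmlToXml and toy_html_to_xml, the "root" wrapper element, and the short list of void elements are assumptions made for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have a closing tag in HTML (partial list, for illustration).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class ToyHtmlToXml(HTMLParser):
    """Toy converter: builds a well-formed XML tree from lenient HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("root")  # hypothetical wrapper element
        self.stack = [self.root]        # currently open elements

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "disabled") get an empty string value.
        elem = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close everything back to the nearest matching open tag.
        # A stray end tag with no matching open tag is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Wrap non-whitespace text in its own "text" element.
        if data.strip():
            ET.SubElement(self.stack[-1], "text").text = data.strip()


def toy_html_to_xml(html: str) -> str:
    parser = ToyHtmlToXml()
    parser.feed(html)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")


if __name__ == "__main__":
    # <b> is never closed and <br> has no end tag; both are repaired.
    print(toy_html_to_xml("<p>Hello <b>world<br></p>"))
```

A real converter handles far more cases (implied end tags, character encodings, scripts, invalid attribute names), so this is only meant to show the shape of the output.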

If your HTML5 is already well-formed XML (that is, written in the XHTML syntax), then technically there's no need to convert to XML because you already have XML. If, however, you have a mixture of HTML and HTML5 and wish to parse all of these documents in the same way, then you could certainly convert all of them to XML using Chilkat HTML-to-XML. The only benefit of converting the HTML5 is that the text will be broken out into "text" nodes.
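When working with a mixed collection, one way to find out which documents are in fact already well-formed XML is to try parsing each one with an XML parser first. The following is a minimal sketch using Python's standard library, not Chilkat's API. The function name is_well_formed_xml and the two sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if the markup parses as XML without errors."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML5 in the XML (XHTML) syntax: every element is explicitly closed.
xhtml_syntax = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>Hi<br/></p></body></html>"
)

# HTML5 in the HTML syntax: <br> and <p> are left unclosed, which is
# acceptable to a browser but not to an XML parser.
html_syntax = "<!DOCTYPE html><html><body><p>Hi<br></body></html>"

if __name__ == "__main__":
    for name, doc in [("xhtml_syntax", xhtml_syntax), ("html_syntax", html_syntax)]:
        verdict = "already XML" if is_well_formed_xml(doc) else "needs conversion"
        print(f"{name}: {verdict}")
```

Documents that fail this check are the ones that need to go through the HTML-to-XML conversion. Documents that pass can be parsed as they are, or converted anyway if you want a uniform structure with the text in "text" nodes.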