Rescue terrible HTML with TagSoup XHTML
XHTML is a friendly enough format for parsing and screen-scraping, but the Web is still mostly populated by the scary legacy of poorly structured HTML, much of it not even compliant with the more lenient SGML standard. In this tip, Uche Ogbuji demonstrates how to use TagSoup to turn just about any HTML into neat XHTML.
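TagSoup itself is a SAX-compliant parser written in Java, and the tip's own code is not reproduced here. As a rough illustration of the kind of repair involved, here is a minimal toy sketch in Python. It is not TagSoup and does not use its API; it relies only on the standard library's `html.parser`. It lowercases element and attribute names, quotes attribute values, self-closes void elements, escapes text, drops stray end tags and closes anything left open. Unlike TagSoup, it knows nothing about HTML content models, so, for example, a second `<p>` ends up nested inside the first instead of closing it. The names `SoupToXhtml` and `to_xhtml` are hypothetical.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# HTML elements that never have content; XHTML-style output self-closes them.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class SoupToXhtml(HTMLParser):
    """Hypothetical toy tag balancer: messy HTML in, well-formed markup out.

    Comments, doctypes and processing instructions are silently dropped,
    because HTMLParser's default handlers for them do nothing.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []  # currently open (non-void) elements, outermost first

    def _attrs(self, attrs):
        # Valueless attributes such as <input checked> become checked="checked".
        return "".join(f" {name}={quoteattr(name if value is None else value)}"
                       for name, value in attrs)

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lowercased tag and attribute names.
        if tag in VOID:
            self.out.append(f"<{tag}{self._attrs(attrs)}/>")
        else:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Input already written as <tag/> stays self-closed.
        self.out.append(f"<{tag}{self._attrs(attrs)}/>")

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag with no matching open element: drop it
        # Close any elements left open inside this one, innermost first.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Escape &, < and > so the text is safe inside XML.
        self.out.append(escape(data))

    def result(self):
        self.close()  # flush any buffered trailing text
        while self.stack:  # close whatever is still open at end of input
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


def to_xhtml(html):
    parser = SoupToXhtml()
    parser.feed(html)
    return parser.result()
```

For example, `to_xhtml('<HTML><BODY><P CLASS=note>Tom & Jerry<BR>line two</BODY></HTML>')` yields `<html><body><p class="note">Tom &amp; Jerry<br/>line two</p></body></html>`, which an XML parser accepts. A real tool like TagSoup goes much further, using knowledge of HTML's structure to decide where elements should be closed or inserted.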