[gull] Bug report of Python module
Daniel Cordey
dc at mjt.ch
Tue Feb 19 16:06:00 CET 2008
On Tuesday 19 February 2008, Nicolas Borboën wrote:
> http://bugs.python.org/
Thanks, that's exactly it!
That aside, I found bug reports for all of my problems. These bugs
date back to 2004... The following comment enlightened me about what
HTMLParser is really good for. That's what happens when you try to do syntactic processing
with regexp() and algorithms... :-)
###########
HTMLParser (and lots of other parsers I tried) definitely
has limits when it comes to error recovery. I don't
know if it's worth putting further development effort into
HTMLParser, as IMHO it will never be able to cope
with all the crappy HTML out there.
If you really want an HTML parser in Python, I
suggest you look at my htmlsax module packaged with
linkchecker (linkchecker.sf.net) and webcleaner
(webcleaner.sf.net); the parser is tested with lots of real-world
examples.
The parser packaged with linkchecker has line counting; the
one packaged with webcleaner does not.
#################
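The comment's point about error recovery can be illustrated with a minimal sketch, assuming current Python 3, where the module lives at `html.parser` (in the Python 2 of 2008 it was `HTMLParser`). The class `EventRecorder` and the function `tag_events` are hypothetical names introduced only for this example. It shows that HTMLParser simply reports tag events in the order they appear: it does not close unclosed tags or reject stray end tags, so any repair of broken markup is left to the caller.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper: records the start/end tag events HTMLParser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Called for every start tag, whether or not it is ever closed.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        # Called for every end tag, even one with no matching start tag.
        self.events.append(("end", tag))


def tag_events(markup):
    """Feed markup to the parser and return the recorded tag events."""
    parser = EventRecorder()
    parser.feed(markup)
    parser.close()
    return parser.events


# Malformed input: <p> and <b> are never closed, and </i> was never opened.
print(tag_events("<p><b>bold<p>next</i>"))
```

The event list contains three start tags and a lone `("end", "i")`, with no implied closing of `<p>` or `<b>`. A tree-building, error-recovering parser would have to add that logic on top of these events.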
dc