Changeset 2271:4ff55e355ff8 in livinglogic.python.xist

Timestamp: 02/22/05 16:15:21 (15 years ago)
Author: Walter Doerwald <walter@…>
Branch: default
Message:

The encoding used in parsing defaults to None now. If no encoding
is specified, parseURL falls back to the one from the Content-Type header.
If the encoding still can't be determined, "utf-8" will be used for
"normal" XML parsing and "iso-8859-1" for parsing with tidy.

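The fallback order described in the message can be sketched as a small standalone function. This is only an illustration: `resolve_encoding` and its parameter names are hypothetical and are not part of XIST, where the logic is spread over several methods as the diff below shows.

```python
def resolve_encoding(explicit=None, header_encoding=None, tidy=False):
    """Pick the encoding to parse with (hypothetical helper, not XIST API)."""
    # 1. An explicitly specified encoding always wins.
    if explicit is not None:
        return explicit
    # 2. Otherwise use the charset from the Content-Type header, if there is one.
    if header_encoding is not None:
        return header_encoding
    # 3. Last resort: the default depends on the parsing mode.
    return "iso-8859-1" if tidy else "utf-8"
```

With no explicit encoding and no header charset, the result depends only on `tidy`.
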
Files: 2 modified

  • NEWS.xml

    r2270  r2271
        8      8  have been merged into <module>ll.xist.xsc</module> (This avoids
        9      9  import problems).</item>
              10  <item>The encoding used for parsing now default to <lit>None</lit>. When
              11  reading from an &url; and no default encoding has been specified the one
              12  from the <lit>Content-Type</lit> header is used. If this still doesn't
              13  result in a usable encoding, <lit>"utf-8"</lit> is used when parsing &xml;
              14  and <lit>iso-8859-1</lit> is used when parsing broken &html;.</item>
       10     15  </ulist>
       11     16  </section>
  • _xist/parsers.py

    r2270  r2271
      440    440      """
      441    441
      442             def __init__(self, saxparser=SGMLOPParser, nspool=None, prefixes=None, tidy=False, loc=True, validate=True, encoding="utf-8"):
             442      def __init__(self, saxparser=SGMLOPParser, nspool=None, prefixes=None, tidy=False, loc=True, validate=True, encoding=None):
      443    443          """
      444    444          <par>Create a new <class>Parser</class> instance.</par>

      473    473          <term><arg>validate</arg></term><item>Should the parsed &xml; nodes be validated after parsing?</item>
      474    474
      475                 <term><arg>encoding</arg></term><item>The default encoding to use, when to source doesn't provide an &xml; header.</item>
             475          <term><arg>encoding</arg></term><item>The default encoding to use, when the
             476          source doesn't provide an &xml; header. The default <lit>None</lit> results in
             477          <lit>"utf-8"</lit> for parsing &xml; and <lit>"iso-8859-1"</lit> when parsing
             478          broken &html; (when <lit><arg>tidy</arg></lit> is true).</item>
      476    479          </dlist>
      477    480          """

      579    582
      580    583          if self.tidy:
             584              if encoding is None:
             585                  encoding = "iso-8859-1"
      581    586              return self._parseHTML(stream, base, sysid, encoding)
             587
             588          if encoding is None:
             589              encoding = "utf-8"
      582    590
      583    591          source = sax.xmlreader.InputSource(sysid)

      630    638          if sysid is None:
      631    639              sysid = str(base)
      632                 return self._parse(stream, base, sysid, self.encoding)
             640          encoding = self.encoding
             641          if encoding is None:
             642              encoding = stream.encoding
             643          return self._parse(stream, base, sysid, encoding)
      633    644
      634    645      def parseFile(self, name, base=None, sysid=None):
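
The last hunk takes the fallback encoding from `stream.encoding`, which according to the commit message comes from the Content-Type header. How XIST fills that attribute is not shown in this changeset. As a rough illustration only, a charset can be extracted from a Content-Type value with the standard library like this (Python 3; `charset_from_content_type` is a hypothetical name, not XIST code):

```python
from email.message import Message


def charset_from_content_type(value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset()
```

A `None` result corresponds to the case where the encoding "still can't be determined", so the `"utf-8"` or `"iso-8859-1"` default applies.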