
import BeautifulSoup  # BeautifulSoup 3.x; this code is Python 2 (it uses unicode())

def entities_to_unicode(text):
    """Converts HTML entities to unicode.  For example '&amp;' becomes '&'.

    FIXME:
    WARNING: There is a bug between sgmllib.SGMLParser.goahead() and
    BeautifulSoup.BeautifulStoneSoup.handle_entityref() where entity-like
    strings that don't match known entities are guessed at (if they come in
    the middle of the text) or are omitted (if they come at the end of the
    text).

    Further, unrecognized entities will have their leading ampersand escaped
    and trailing semicolon (if it exists) stripped. Examples:

    Inputs "...&bob;...", "...&bob&...", "...&bob;", and "...&bob" will give
    outputs "...&amp;bob...", "...&amp;bob&...", "...&amp;bob", and "...",
    respectively.
    """
    soup = BeautifulSoup.BeautifulStoneSoup(text,
        convertEntities=BeautifulSoup.BeautifulStoneSoup.ALL_ENTITIES)
    string = unicode(soup)
    # For some reason plain old instances of &amp; aren't converted to '&',
    # so replace them by hand.
    string = string.replace('&amp;', '&')
    return string
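On Python 3 the same job can be done without BeautifulSoup 3 or sgmllib, neither of which exists there. A minimal sketch, assuming Python 3.4+, where the standard library provides `html.unescape`; the wrapper name `entities_to_unicode_py3` is hypothetical, and this is a swapped-in technique rather than mediadrop's implementation. One behavioral difference worth knowing: `html.unescape` leaves unrecognized references such as `&bob;` untouched instead of escaping or dropping them as the docstring above describes.

```python
import html


def entities_to_unicode_py3(text):
    """Hypothetical stand-in for entities_to_unicode() using only the stdlib."""
    # html.unescape decodes named (&amp;), decimal (&#38;) and hex (&#x26;)
    # character references using the HTML5 list of named references.
    # Unknown names like &bob; are returned unchanged.
    return html.unescape(text)


print(entities_to_unicode_py3('Tom &amp; Jerry'))         # Tom & Jerry
print(entities_to_unicode_py3('&lt;b&gt; &#38; &#x26;'))  # <b> & &
print(entities_to_unicode_py3('...&bob;...'))             # ...&bob;...
```

Because nothing is guessed at or omitted, the extra `replace('&amp;', '&')` step is unnecessary with this approach.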
        


src/m/e/mediadrop-HEAD/mediadrop/lib/xhtml/__init__.py   mediadrop
from webhelpers import text
 
from mediadrop.lib.xhtml.htmlsanitizer import (Cleaner,
    entities_to_unicode as decode_entities,
    encode_xhtml_entities as encode_entities)
# ...
        string = strip_xhtml(string)
 
    string = decode_entities(string)
 
    if len(string) > size:
# ...
    if not string:
        return u''
    new_str = decode_entities(string)
    if len(new_str) <= size + buffer:
        return string
 
    if _decode_entities:
        string = decode_entities(string)
 
    return string
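The excerpt decodes entities before comparing the text against `size` (`new_str = decode_entities(string)`), so the length check presumably counts visible characters: `&amp;` is five characters of markup but one character on screen. A small sketch of that idea, assuming Python 3 with `html.unescape` standing in for `decode_entities`; the function name `truncate_visible` and its word-boundary rule are hypothetical simplifications, not mediadrop's code.

```python
from html import unescape


def truncate_visible(string, size, buffer=0):
    """Hypothetical helper: shorten `string` to about `size` visible characters.

    Lengths are measured on the entity-decoded text, so '&amp;' counts as one
    character rather than five.
    """
    if not string:
        return ''
    decoded = unescape(string)
    # Short enough once decoded: hand back the original, still-encoded string.
    if len(decoded) <= size + buffer:
        return string
    cut = decoded[:size]
    # If the cut lands mid-word, back up to the previous space (when there is one).
    if decoded[size] != ' ' and ' ' in cut:
        cut = cut[:cut.rfind(' ')]
    # Note: the truncated result is decoded text, not XHTML.
    return cut.rstrip()


print(truncate_visible('Tom &amp; Jerry', 11))         # Tom &amp; Jerry
print(truncate_visible('Tom &amp; Jerry go home', 9))  # Tom &
```

Measured on the raw markup, `'Tom &amp; Jerry'` is 15 characters and would have been truncated at `size=11`; decoded, it fits.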

src/m/e/mediadrop-HEAD/mediadrop/model/__init__.py   mediadrop
from unidecode import unidecode
 
from mediadrop.lib.xhtml.htmlsanitizer import entities_to_unicode
from mediadrop.model.meta import DBSession, metadata
# ...
    string = unicode(string).lower()
    # Replace xhtml entities
    string = entities_to_unicode(string)
    # Transliterate to ASCII, as best as possible:
    string = unidecode(string)
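The model excerpt lowercases the string, decodes entities, then transliterates with `unidecode`, which is the usual front half of building a URL slug. A self-contained sketch of such a pipeline in that order, assuming Python 3: `html.unescape` stands in for `entities_to_unicode`, and Unicode NFKD decomposition (which only strips accents from Latin-style letters) stands in for `unidecode`. The final hyphenation step and the name `slugify_sketch` are hypothetical, since the excerpt stops after transliteration.

```python
import html
import re
import unicodedata


def slugify_sketch(string):
    """Hypothetical slug builder modeled on the steps in the excerpt above."""
    string = str(string).lower()
    # Replace xhtml entities, e.g. '&eacute;' -> 'é'
    string = html.unescape(string)
    # Rough transliteration: split accented letters into base letter plus
    # combining mark, then drop everything that isn't ASCII
    # (unidecode handles far more scripts than this)
    string = unicodedata.normalize('NFKD', string)
    string = string.encode('ascii', 'ignore').decode('ascii')
    # Collapse each run of non-alphanumeric characters into one hyphen
    string = re.sub(r'[^a-z0-9]+', '-', string)
    return string.strip('-')


print(slugify_sketch('Caf&eacute; &amp; Cr&egrave;me Br&ucirc;l&eacute;e'))  # cafe-creme-brulee
```

Decoding entities before transliterating matters here: without it, `&eacute;` would be reduced to the literal letters `eacute` rather than to `e`.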