Source Code for Module

"""A cleanup tool for HTML.
"""

import re
import copy
try:
    from urlparse import urlsplit
except ImportError:
    # Python 3
    from urllib.parse import urlsplit
from lxml import etree
from lxml.html import fromstring, tostring, XHTML_NAMESPACE
from lxml.html import xhtml_to_html, _transform_result

try:
    unichr
except NameError:
    # Python 3
    unichr = chr
try:
    unicode
except NameError:
    # Python 3
    unicode = str
try:
    bytes
except NameError:
    # Python < 2.6
    bytes = str

# That's the worst of the
#   metas.
# UTF-7 detections?  Example:
#     +ADw-SCRIPT+AD4-alert('XSS') +ADw-/SCRIPT+AD4-
#   you don't always have to have the charset set, if the page has no charset
#   and there's UTF7-like code in it.
# Look at these tests:


# This is an IE-specific construct you can have in a stylesheet to
# run some Javascript:
_css_javascript_re = re. ... I)

# Do I have to worry about
_css_import_re = re. ... I)

# All kinds of schemes besides just javascript: that can cause
# execution:
_is_image_dataurl = re. ... search
_is_possibly_malicious_scheme = re.compile(
    r'(?:javascript|jscript|livescript|vbscript|data|about|mocha):',
    re.I).search

    """
    Instances clean the document of each of the possible offending
    elements.  The cleaning is controlled by attributes; you can
    override attributes in a subclass, or set them in the constructor.

    ``javascript``:
        Removes any Javascript, like an ``onclick`` attribute. Also removes stylesheets
        as they could contain Javascript.

    ``comments``:
        Removes any comments.

    ``inline_style``
        Removes any style attributes.  Defaults to the value of the ``style`` option.

    ``processing_instructions``:
        Removes any processing instructions.

    ``embedded``:
        Removes any embedded objects (flash, iframes)

    ``frames``:
        Removes any frame-related tags

    ``forms``:
        Removes any form tags

    ``annoying_tags``:
        Tags that aren't *wrong*, but are annoying.

    ``remove_tags``:
        A list of tags to remove.  Only the tags will be removed,
        their content will get pulled up into the parent tag.

    ``kill_tags``:
        A list of tags to kill.  Killing also removes the tag's content,
        i.e. the whole subtree, not just the tag itself.

    ``allow_tags``:
        A list of tags to include (default include all).

    ``remove_unknown_tags``:
        Remove any tags that aren't standard parts of HTML.

    ``safe_attrs_only``:
        If true, only include 'safe' attributes (specifically the list
        from the feedparser HTML sanitisation web site).

    ``safe_attrs``:
        A set of attribute names to override the default list of attributes
        considered 'safe' (when safe_attrs_only=True).

    ``add_nofollow``:
        If true, then any <a> tags will have ``rel="nofollow"`` added to them.
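The `_is_possibly_malicious_scheme` pattern in the listing can be tried in isolation. The following is a minimal sketch using only the standard `re` module; the helper name `looks_executable` and the sample URLs are invented for illustration and are not part of the module. How the module itself applies the pattern (for example, to which attributes) is not visible in this listing, so that part is left out.

```python
import re

# Same pattern as in the listing: URL schemes that may lead to script execution.
_is_possibly_malicious_scheme = re.compile(
    r'(?:javascript|jscript|livescript|vbscript|data|about|mocha):',
    re.I).search


def looks_executable(url):
    """Hypothetical helper: True if any listed scheme occurs in the string."""
    return _is_possibly_malicious_scheme(url) is not None


samples = [
    "javascript:alert('XSS')",
    "JaVaScRiPt:alert(1)",         # re.I makes the match case-insensitive
    "http://example.com/page",
    "data:text/html;base64,AAAA",
    "/relative/path",
]
for url in samples:
    print(url, "->", looks_executable(url))
```

Because the compiled pattern is used through `.search` rather than `.match`, a scheme is detected anywhere in the string, not only at the start. That errs on the side of caution but can over-match: a URL such as `http://example.com/?next=about:blank` is flagged as well.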