zimscraperlib.rewriting.css
CSS Rewriting
This module contains tools to rewrite CSS retrieved from an online source so that it can safely operate within a ZIM, linking only to ZIM entries every time a URL is used.
The rewriter needs an article URL rewriter to rewrite URLs found in CSS, an optional base href if the CSS to rewrite was found inline in an HTML document which has a base href set, and an optional flag indicating whether, on a parsing error, we fall back to simple regex rewriting or drop the offending rule.
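To illustrate why a base href matters, here is a minimal sketch of resolving a URL found in CSS against the document URL and an optional base href. It uses only the standard library; `resolve_css_url` and the example URLs are hypothetical and are not part of zimscraperlib, whose URL rewriter may proceed differently.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_css_url(
    url: str, document_url: str, base_href: Optional[str] = None
) -> str:
    """Resolve a URL found in CSS to an absolute URL (illustrative helper)."""
    # The base href may itself be relative, so resolve it against the
    # document URL first; without a base href, the document URL is the base.
    base = urljoin(document_url, base_href) if base_href else document_url
    return urljoin(base, url)


without_base = resolve_css_url("img/bg.png", "https://example.com/a/page.html")
with_base = resolve_css_url(
    "img/bg.png", "https://example.com/a/page.html", base_href="/static/"
)
```

The same relative URL points to two different resources depending on whether a base href is set, which is why it has to be known before rewriting.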
Classes:
- CssRewriter – CSS rewriting class
- FallbackRegexCssRewriter – Fallback CSS rewriting based on regular expression.
CssRewriter
CssRewriter(
url_rewriter: ArticleUrlRewriter,
base_href: str | None,
*,
remove_errors: bool = False,
)
CSS rewriting class
Parameters:
- url_rewriter (ArticleUrlRewriter) – the rewriter of URLs
- base_href (str | None) – if the CSS to rewrite has been found inline on an HTML page, this is the potential base href found in the HTML document
- remove_errors (bool) – if True, we just drop bad CSS rules; if False, we fall back to regex-based rewriting of the whole CSS document
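The two behaviours selected by remove_errors can be sketched as follows. This is a toy model under stated assumptions, not the class's actual code: `rewrite_with_policy`, `strict_rule` and `loose_document` are hypothetical names, rules are assumed to be already split into strings, and a `ValueError` stands in for whatever signals a parsing problem.

```python
from typing import Callable, List


def rewrite_with_policy(
    rules: List[str],
    rewrite_rule: Callable[[str], str],
    fallback: Callable[[str], str],
    *,
    remove_errors: bool = False,
) -> str:
    """Rewrite rules one by one, applying the chosen policy on failure."""
    rewritten = []
    for rule in rules:
        try:
            rewritten.append(rewrite_rule(rule))
        except ValueError:
            if remove_errors:
                continue  # drop only the offending rule
            # Otherwise give up on per-rule rewriting and process the whole
            # document with the (less precise) fallback.
            return fallback("\n".join(rules))
    return "\n".join(rewritten)


def strict_rule(rule: str) -> str:
    # Toy stand-in for a real parser: reject rules without a closing brace.
    if not rule.rstrip().endswith("}"):
        raise ValueError(f"cannot parse: {rule!r}")
    return rule.replace("http://example.com/", "../")


def loose_document(css: str) -> str:
    # Toy stand-in for regex-based rewriting of the full text.
    return css.replace("http://example.com/", "../")


rules = ["a { background: url(http://example.com/a.png) }", "b { color: red"]
dropped = rewrite_with_policy(rules, strict_rule, loose_document, remove_errors=True)
kept = rewrite_with_policy(rules, strict_rule, loose_document)
```

With remove_errors=True the malformed rule disappears from the output; with the default, every rule survives but the whole document goes through the fallback.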
Methods:
- rewrite – Rewrite a 'standalone' CSS document
- rewrite_inline – Rewrite an 'inline' CSS document
Attributes:
Source code in src/zimscraperlib/rewriting/css.py
base_href
instance-attribute
base_href = base_href
fallback_rewriter
instance-attribute
fallback_rewriter = FallbackRegexCssRewriter(
url_rewriter, base_href
)
remove_errors
instance-attribute
remove_errors = remove_errors
url_rewriter
instance-attribute
url_rewriter = url_rewriter
rewrite
Rewrite a 'standalone' CSS document
'standalone' means "not inline in an HTML document"
Source code in src/zimscraperlib/rewriting/css.py
rewrite_inline
Rewrite an 'inline' CSS document
'inline' means "inline in an HTML document"
Source code in src/zimscraperlib/rewriting/css.py
FallbackRegexCssRewriter
FallbackRegexCssRewriter(
url_rewriter: ArticleUrlRewriter, base_href: str | None
)
Bases: RxRewriter
Fallback CSS rewriting based on regular expression.
This is obviously far less powerful than real CSS parsing, but it lets us cope with CSS we failed to parse without dropping any CSS rule (the problem could be just a parsing issue, not necessarily a bad CSS rule).
Create a RxRewriter adapted for CSS rules rewriting
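As a rough idea of what regex-based rewriting can look like, the sketch below finds `url(...)` tokens and passes each URL through a callback. The pattern and the names `CSS_URL_RE` and `regex_rewrite_css` are assumptions made for this example; they are not the rules this class actually registers.

```python
import re
from typing import Callable

# Matches url(...) with optional single or double quotes around the URL.
CSS_URL_RE = re.compile(
    r"""url\(\s*(?P<quote>['"]?)(?P<url>.*?)(?P=quote)\s*\)""",
    re.IGNORECASE,
)


def regex_rewrite_css(css: str, rewrite_url: Callable[[str], str]) -> str:
    """Rewrite every url(...) in css without parsing the CSS."""

    def _replace(match: "re.Match[str]") -> str:
        quote = match.group("quote")
        # Keep the original quoting style, swap only the URL itself.
        return f"url({quote}{rewrite_url(match.group('url'))}{quote})"

    return CSS_URL_RE.sub(_replace, css)


sample = 'a { background: url("img/a.png") } b { x: url(b.png) }'
result = regex_rewrite_css(sample, lambda url: "../" + url)
```

Because nothing is parsed, such a pattern also rewrites URLs inside comments or strings, which is part of why it is kept as a fallback only.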
Methods:
- rewrite – Apply the unique compiled_rules pattern and replace the content.
Attributes:
- compiled_rule (Pattern[str] | None)
- rules
Source code in src/zimscraperlib/rewriting/css.py
rules
instance-attribute
rules = rules or []
rewrite
Apply the unique compiled_rules pattern and replace the content.
Source code in src/zimscraperlib/rewriting/rx_replacer.py