zimscraperlib.rewriting.rx_replacer

Classes:

  • RxRewriter

    RxRewriter is a generic rewriter based on regular expressions.

Functions:

  • add_around

    Create a rewrite_function which adds a prefix and a suffix around the match.

  • add_prefix

    Create a rewrite_function which adds the prefix to the matched string.

  • add_suffix

    Create a rewrite_function which adds the suffix to the matched string.

  • m2str

    Call a rewrite_function with a string instead of a match object.

  • replace

    Create a rewrite_function which replaces src with target in the matched string.

  • replace_all

    Create a rewrite_function which replaces the whole match with text.

  • replace_prefix_from

    Returns a function which keeps the part of the matched string before match and replaces the rest, from match onward, with prefix.

Attributes:

  • TransformationAction

  • TransformationRule

TransformationAction module-attribute

TransformationAction = Callable[
    [Match[str], dict[str, Any] | None], str
]

TransformationRule module-attribute

TransformationRule = tuple[
    Pattern[str], TransformationAction
]
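
To illustrate how the two aliases fit together, here is a minimal sketch that builds one rule by hand. The `upper_case` action and the `shout` rule are hypothetical names chosen for the example, and the aliases are re-declared locally (with `Optional` instead of `| None`) so the snippet runs without the library installed.

```python
import re
from collections.abc import Callable
from re import Match, Pattern
from typing import Any, Optional

# Local copies of the aliases documented above.
TransformationAction = Callable[[Match[str], Optional[dict[str, Any]]], str]
TransformationRule = tuple[Pattern[str], TransformationAction]


def upper_case(m_object: Match[str], _opts: Optional[dict[str, Any]]) -> str:
    """Hypothetical action: return the matched text in upper case."""
    return m_object[0].upper()


# A rule pairs a compiled pattern with the action to run on each match.
shout: TransformationRule = (re.compile(r"\bfoo\b"), upper_case)

pattern, action = shout
# Apply the rule on its own; no options are passed to the action.
result = pattern.sub(lambda m: action(m, None), "foo and food")
print(result)
```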

RxRewriter

RxRewriter(
    rules: Iterable[TransformationRule] | None = None,
)

RxRewriter is a generic rewriter based on regular expressions.

The main input is a list of rules, each rule being a tuple (regex, rewriting_function), and every rule has to be applied to the content. Applying the rules one after another would require N replacement passes over the content (N being the number of rules). To avoid that, we build a single regex (compiled_rule) equivalent to (regex0|regex1|regex2|...) and run only one replacement with it. For each match, we then go through the rules (up to N checks) to find the one that produced it and apply the associated rewriting_function.
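
The single-pass idea described above can be sketched with the standard `re` module alone. This is not the library's code: `_compile_rules` is not shown on this page, so the way the combined pattern is built here (one capturing group per rule) is an assumption, and the two rules are hypothetical.

```python
import re

# Hypothetical rules: (pattern source, action taking the matched text).
rules = [
    (r"http://", lambda s: "https://"),
    (r"\bcolour\b", lambda s: "color"),
]

# Wrap each rule in its own capturing group and join them with "|".
# Assumption: the individual patterns contain no capturing groups of
# their own, so group i always belongs to rule i.
combined = re.compile("|".join(f"({src})" for src, _ in rules))


def dispatch(m: re.Match[str]) -> str:
    """Find the alternative that produced this match and run its action."""
    for i, (_, action) in enumerate(rules, 1):
        if m.group(i):
            return action(m.group(i))
    return m[0]  # leave the match untouched if no group is set


# One pass over the text handles both rules.
result = combined.sub(dispatch, "http://example.org has colour")
print(result)
```

The text is scanned once whatever the number of rules; only the dispatch step grows with N, and it runs once per match rather than once per character of content.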

Methods:

  • rewrite

    Apply the single compiled_rule pattern to the text and replace every match.

Attributes:

  • compiled_rule

  • rules

Source code in src/zimscraperlib/rewriting/rx_replacer.py
def __init__(
    self,
    rules: Iterable[TransformationRule] | None = None,
):
    self.rules = rules or []
    self.compiled_rule: re.Pattern[str] | None = None
    if self.rules:
        self._compile_rules(self.rules)

compiled_rule instance-attribute

compiled_rule: Pattern[str] | None = None

rules instance-attribute

rules = rules or []

rewrite

rewrite(
    text: str | bytes, opts: dict[str, Any] | None = None
) -> str

Apply the single compiled_rule pattern to the text and replace every match.

Source code in src/zimscraperlib/rewriting/rx_replacer.py
def rewrite(
    self,
    text: str | bytes,
    opts: dict[str, Any] | None = None,
) -> str:
    """
    Apply the unique `compiled_rules` pattern and replace the content.
    """
    if isinstance(text, bytes):
        text = text.decode()

    def replace(m_object: re.Match[str]) -> str:
        """
        This method search for the specific rule which have matched and apply it.
        """
        for i, rule in enumerate(self.rules, 1):
            if not m_object.group(i):
                # This is not the ith rules which match
                continue
            result = rule[1](m_object, opts)
            return result
        # fallback never supposed to be reached since this method is called
        # by Pattern.sub which already checks there is a match
        return text  # pragma: no cover

    assert self.compiled_rule is not None  # noqa
    return self.compiled_rule.sub(replace, text)
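
The listing above shows two details that are easy to miss: `bytes` input is decoded before matching, and `opts` is forwarded unchanged to the action of whichever rule matched. The standalone sketch below mirrors that flow. `rewrite_sketch` and `add_base` are hypothetical names, and the combined pattern is built under the assumption of one capturing group per rule.

```python
import re
from typing import Any, Optional, Union


def add_base(m_object: re.Match[str], opts: Optional[dict[str, Any]]) -> str:
    """Hypothetical action: prepend the `base` option to the match."""
    base = (opts or {}).get("base", "")
    return base + m_object[0]


rules = [(re.compile(r"/img/\w+\.png"), add_base)]
# Assumption: each rule becomes exactly one capturing group.
compiled = re.compile("|".join(f"({p.pattern})" for p, _ in rules))


def rewrite_sketch(
    text: Union[str, bytes], opts: Optional[dict[str, Any]] = None
) -> str:
    if isinstance(text, bytes):
        text = text.decode()  # bytes.decode() defaults to UTF-8

    def replace(m: re.Match[str]) -> str:
        for i, (_, action) in enumerate(rules, 1):
            if m.group(i):
                return action(m, opts)  # opts reaches the action as-is
        return m[0]

    return compiled.sub(replace, text)


result = rewrite_sketch(
    b'<img src="/img/logo.png">', {"base": "https://example.org"}
)
print(result)
```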

add_around

add_around(
    prefix: str, suffix: str
) -> TransformationAction

Create a rewrite_function which adds a prefix and a suffix around the match.

Source code in src/zimscraperlib/rewriting/rx_replacer.py
def add_around(prefix: str, suffix: str) -> TransformationAction:
    """
    Create a rewrite_function which add a `prefix` and a `suffix` around the match.
    """

    @m2str
    def f(x: str) -> str:
        return prefix + x + suffix

    return f
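
A short usage sketch for `add_around`. The two helpers are re-declared locally, following the listings on this page, so the snippet runs without the library; the pattern and the input string are made up for the example.

```python
import re
from typing import Any, Callable, Optional


def m2str(function: Callable[[str], str]):
    # Local copy: adapt a str -> str function to the (match, opts) signature.
    def wrapper(m_object: re.Match[str], _opts: Optional[dict[str, Any]]) -> str:
        return function(m_object[0])

    return wrapper


def add_around(prefix: str, suffix: str):
    # Local copy of the function documented above.
    @m2str
    def f(x: str) -> str:
        return prefix + x + suffix

    return f


action = add_around("<b>", "</b>")
# Wrap every run of digits in bold tags.
result = re.sub(r"\d+", lambda m: action(m, None), "order 42 of 7")
print(result)
```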

add_prefix

add_prefix(prefix: str) -> TransformationAction

Create a rewrite_function which adds the prefix to the matched string.

Source code in src/zimscraperlib/rewriting/rx_replacer.py
def add_prefix(prefix: str) -> TransformationAction:
    """
    Create a rewrite_function which add the `prefix` to the matching str.
    """

    return add_around(prefix, "")

add_suffix

add_suffix(suffix: str) -> TransformationAction

Create a rewrite_function which adds the suffix to the matched string.

Source code in src/zimscraperlib/rewriting/rx_replacer.py
def add_suffix(suffix: str) -> TransformationAction:
    """
    Create a rewrite_function which add the `suffix` to the matching str.
    """

    return add_around("", suffix)
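
Both `add_prefix` and `add_suffix` are thin wrappers around `add_around` with one side left empty. The sketch below calls each action directly on a match object to show the difference; the helpers are local copies of the listings on this page, and the URL and query string are hypothetical.

```python
import re
from typing import Any, Callable, Optional


def m2str(function: Callable[[str], str]):
    def wrapper(m_object: re.Match[str], _opts: Optional[dict[str, Any]]) -> str:
        return function(m_object[0])

    return wrapper


def add_around(prefix: str, suffix: str):
    @m2str
    def f(x: str) -> str:
        return prefix + x + suffix

    return f


def add_prefix(prefix: str):
    return add_around(prefix, "")


def add_suffix(suffix: str):
    return add_around("", suffix)


m = re.search(r"/\w+\.css", 'href="/style.css"')
assert m is not None  # the pattern matches "/style.css"

with_prefix = add_prefix("https://example.org")(m, None)
with_suffix = add_suffix("?cached=1")(m, None)
print(with_prefix, with_suffix)
```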

m2str

m2str(
    function: Callable[[str], str],
) -> TransformationAction

Call a rewrite_function with a string instead of a match object. Many rewrite functions do not need the match object because they work directly on the text. This decorator adapts a rewrite_function that takes a str so that it can be used as a TransformationAction.

Source code in src/zimscraperlib/rewriting/rx_replacer.py
def m2str(function: Callable[[str], str]) -> TransformationAction:
    """
    Call a rewrite_function with a string instead of a match object.
    A lot of rewrite function don't need the match object as they are working
    directly on text. This decorator can be used on rewrite_function taking a str.
    """

    def wrapper(m_object: re.Match[str], _opts: dict[str, Any] | None) -> str:
        return function(m_object[0])

    return wrapper
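
As a sketch of the decorator in use, the hypothetical `swap_case` function below only knows about strings, and `m2str` turns it into an action that accepts a match object and an options dictionary. `m2str` is re-declared locally, following the listing above, so the snippet is self-contained.

```python
import re
from typing import Any, Callable, Optional


def m2str(function: Callable[[str], str]):
    # Local copy of the decorator documented above.
    def wrapper(m_object: re.Match[str], _opts: Optional[dict[str, Any]]) -> str:
        # m_object[0] is the whole matched text; opts is ignored.
        return function(m_object[0])

    return wrapper


@m2str
def swap_case(x: str) -> str:
    """Hypothetical string-only rewrite function."""
    return x.swapcase()


# After decoration, swap_case takes (match, opts) like any other action.
result = re.sub(r"[A-Za-z]+", lambda m: swap_case(m, None), "Hello World 42")
print(result)
```

Because the wrapper discards `opts`, a function decorated this way behaves the same whatever options are passed to the rewriter.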

replace

replace(src: str, target: str) -> TransformationAction

Create a rewrite_function which replaces src with target in the matched string.

Source code in src/zimscraperlib/rewriting/rx_replacer.py
def replace(src: str, target: str) -> TransformationAction:
    """
    Create a rewrite_function replacing `src` by `target` in the matching str.
    """

    @m2str
    def f(x: str) -> str:
        return x.replace(src, target)

    return f
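
The substitution made by `replace` is limited to the text matched by the rule's pattern, which the following sketch tries to make visible: only the `&` inside the quoted part is escaped. The helpers are local copies of the listings on this page, and the pattern and input are hypothetical.

```python
import re
from typing import Any, Callable, Optional


def m2str(function: Callable[[str], str]):
    def wrapper(m_object: re.Match[str], _opts: Optional[dict[str, Any]]) -> str:
        return function(m_object[0])

    return wrapper


def replace(src: str, target: str):
    # Local copy of the function documented above.
    @m2str
    def f(x: str) -> str:
        return x.replace(src, target)

    return f


action = replace("&", "&amp;")
# The pattern matches double-quoted segments only.
result = re.sub(r'"[^"]*"', lambda m: action(m, None), 'a&b "c&d" e&f')
print(result)
```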

replace_all

replace_all(text: str) -> TransformationAction

Create a rewrite_function which replaces the whole match with text.

Source code in src/zimscraperlib/rewriting/rx_replacer.py
def replace_all(text: str) -> TransformationAction:
    """
    Create a rewrite_function which replace the whole match with text.
    """

    @m2str
    def f(_x: str) -> str:
        return text

    return f
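
A minimal usage sketch for `replace_all`, with local copies of the helpers from the listings on this page. The pattern, the replacement text and the input are made up for the example.

```python
import re
from typing import Any, Callable, Optional


def m2str(function: Callable[[str], str]):
    def wrapper(m_object: re.Match[str], _opts: Optional[dict[str, Any]]) -> str:
        return function(m_object[0])

    return wrapper


def replace_all(text: str):
    # Local copy of the function documented above.
    @m2str
    def f(_x: str) -> str:
        return text  # the matched text itself is discarded

    return f


action = replace_all("[redacted]")
result = re.sub(
    r"\b\d{4}-\d{4}\b", lambda m: action(m, None), "card 1234-5678 on file"
)
print(result)
```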

replace_prefix_from

replace_prefix_from(
    prefix: str, match: str
) -> TransformationAction

Returns a function which keeps the part of the matched string before match and replaces the rest, from match onward, with prefix.

Source code in src/zimscraperlib/rewriting/rx_replacer.py
def replace_prefix_from(prefix: str, match: str) -> TransformationAction:
    """
    Returns a function which replaces everything before `match` with `prefix`.
    """

    @m2str
    def f(x: str) -> str:
        match_index = x.index(match)
        if match_index == 0:
            return prefix
        return x[:match_index] + prefix

    return f
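
Tracing the listing above on a concrete string may help: the text that precedes `match` inside the matched string is kept, and everything from `match` onward is swapped for `prefix`. The sketch below uses local copies of the helpers; the host names and the pattern are hypothetical.

```python
import re
from typing import Any, Callable, Optional


def m2str(function: Callable[[str], str]):
    def wrapper(m_object: re.Match[str], _opts: Optional[dict[str, Any]]) -> str:
        return function(m_object[0])

    return wrapper


def replace_prefix_from(prefix: str, match: str):
    # Local copy of the function documented above.
    @m2str
    def f(x: str) -> str:
        # str.index raises ValueError when `match` is not found in x.
        match_index = x.index(match)
        if match_index == 0:
            return prefix
        return x[:match_index] + prefix

    return f


action = replace_prefix_from("https://new.example/", "http")

# The rule matches "src=http://old.example/"; "src=" is kept and the
# rest of the match is replaced by the prefix.
result = re.sub(
    r"src=http://[^/]+/",
    lambda m: action(m, None),
    "src=http://old.example/logo.png",
)
print(result)

# When the matched string starts with `match`, only the prefix remains.
at_start = action(re.match(r"http://[^/]+/", "http://old.example/a"), None)
print(at_start)
```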