Regular Expressions
The Wiki of Unify contains information on clients and devices, communications systems and unified communications. - Unify GmbH & Co. KG is a Trademark Licensee of Siemens AG.
Revision as of 09:46, 18 May 2026
Contents
- 1 Introduction
- 2 Elements of a regular expression
- 2.1 Literal characters
- 2.2 Anchors (position markers)
- 2.3 Character classes
- 2.4 Negated character classes
- 2.5 Special (shorthand) character classes
- 2.6 Quantifiers (repetition)
- 2.7 Greedy vs. lazy quantifiers
- 2.8 Groups
- 2.9 Back-references
- 2.10 Alternation
- 2.11 Escape character
- 2.12 Lookahead and lookbehind (assertions)
- 2.13 Flags (modifiers)
- 3 Examples
- 4 Testing your regular expressions
Introduction
Regular expressions (often called regex or regexp) are sequences of characters that define a search pattern. They are used to find strings in text that match this pattern, for example to validate input or to select items by name. A regular expression may contain literal characters, which match themselves, and special characters (metacharacters) with a predefined meaning.
Elements of a regular expression
Literal characters
Most characters simply match themselves. The letter `a` matches the letter "a" in the text, but not "A" unless case-insensitive matching is enabled.
Anchors (position markers)
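Anchors do not match a character; they match a position in the text.
- `^` (caret): matches the start of the string, or the start of each line when multiline mode (flag `m`) is enabled.
- `$` (dollar sign): matches the end of the string, or the end of each line in multiline mode.
- `\b`: matches a word boundary, i.e. the position between a word character and a non-word character (or the start or end of the text when it is next to a word character).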
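A minimal sketch using Python's standard `re` module (chosen here only for illustration; the page does not prescribe a language) shows the anchors and a word boundary. `re.search` scans the whole string, so without `^` and `$` the first pattern would also be found inside "CP700X".

```python
import re

# ^ and $ pin the pattern to the start and end of the string,
# so only the exact value "CP700" is accepted.
exact = re.search(r"^CP700$", "CP700")
partial = re.search(r"^CP700$", "CP700X")

# \b marks word boundaries, so "cat" is found only as a whole word
# ("concat" and "scatter" are skipped).
words = re.findall(r"\bcat\b", "cat concat cat. scatter")

print(exact is not None, partial is not None)  # True False
print(words)                                   # ['cat', 'cat']
```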
Character classes
Enclosed in square brackets [ ], a character class matches one character from a defined set.
- `[aeiou]`: matches any single vowel.
- `[a-z]`: matches any lowercase letter from a to z (a range).
- `[0-9a-fA-F]`: matches any hexadecimal digit.
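As an illustrative sketch with Python's `re` module (an assumption; any regex engine behaves the same here), each character class consumes exactly one character per match; the `+` quantifier is used to accept a whole run of hex digits.

```python
import re

# [aeiou] matches exactly one vowel at a time.
vowels = re.findall(r"[aeiou]", "regular")

# [0-9a-fA-F] matches one hex digit; the + quantifier repeats the class.
hex_ok = re.fullmatch(r"[0-9a-fA-F]+", "1A3f") is not None
hex_bad = re.fullmatch(r"[0-9a-fA-F]+", "12G4") is not None

print(vowels)           # ['e', 'u', 'a']
print(hex_ok, hex_bad)  # True False
```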
Negated character classes
A caret ^ placed immediately inside a character class negates it, matching any character not in the set.
- `[^aeiou]`: matches any character that is not a vowel.
- `[^0-9]`: matches any character that is not a digit.
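A short sketch in Python (used here only as an example language) shows a negated class in use, both for finding and for stripping unwanted characters with `re.sub`.

```python
import re

# [^0-9] matches any single character that is not a digit.
non_digits = re.findall(r"[^0-9]", "CP700-X")

# Replacing every non-digit with "" keeps only the digits.
digits_only = re.sub(r"[^0-9]", "", "Tel: +49 (89) 700")

print(non_digits)   # ['C', 'P', '-', 'X']
print(digits_only)  # 4989700
```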
Special (shorthand) character classes
Predefined shortcuts for common character sets.
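- `.` (dot): matches any single character except a newline (unless dotall mode, flag `s`, is enabled).
- `\d`: matches any digit (same as `[0-9]` in ASCII mode; Unicode-aware engines such as Python 3 also match other Unicode digits).
- `\D`: matches any non-digit.
- `\w`: matches any word character: letter, digit, or underscore (same as `[a-zA-Z0-9_]` in ASCII mode; Unicode-aware engines also include non-ASCII letters).
- `\W`: matches any non-word character.
- `\s`: matches any whitespace character, such as space, tab, carriage return, or newline.
- `\S`: matches any non-whitespace character.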
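The following sketch, assuming Python's `re` module, exercises the shorthand classes and shows the Unicode caveat: in Python 3, `\d` on a `str` pattern also matches non-ASCII digits unless the `re.ASCII` flag is given.

```python
import re

text = "Port 8080\tready"
numbers = re.findall(r"\d+", text)   # ['8080']
words = re.findall(r"\w+", text)     # ['Port', '8080', 'ready']
spaces = re.findall(r"\s", text)     # [' ', '\t']

# The dot matches any character except a newline (by default).
dot_letter = re.fullmatch(r"a.c", "abc") is not None    # True
dot_newline = re.fullmatch(r"a.c", "a\nc") is not None  # False

# Python 3 str patterns are Unicode-aware: \d also matches the
# Arabic-Indic digit "٣"; re.ASCII restricts \d to [0-9].
unicode_digits = re.findall(r"\d", "٣3")          # ['٣', '3']
ascii_digits = re.findall(r"\d", "٣3", re.ASCII)  # ['3']
```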
Quantifiers (repetition)
Quantifiers specify how many times the preceding element (a character, character class, or group) may occur.
- `*`: zero or more times.
- `+`: one or more times.
- `?`: zero or one time (makes the element optional).
- `{n}`: exactly n times.
- `{n,}`: n or more times.
- `{n,m}`: between n and m times (inclusive).
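A minimal sketch with Python's `re` module (an illustrative choice) applies each kind of quantifier. `re.fullmatch` is used where the whole string must satisfy the count.

```python
import re

# ? makes the "u" optional: both spellings match, "colouur" does not.
spellings = re.findall(r"colou?r", "color colour colouur")  # ['color', 'colour']

# * allows zero occurrences, + requires at least one.
star = re.fullmatch(r"ab*c", "ac") is not None   # True
plus = re.fullmatch(r"ab+c", "ac") is not None   # False

# {n} and {n,m} give exact counts and ranges.
three = re.fullmatch(r"\d{3}", "700") is not None   # True
four = re.fullmatch(r"\d{3}", "7000") is not None   # False
runs = re.findall(r"\d{2,4}", "1 22 333 55555")     # ['22', '333', '5555']
```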
Greedy vs. lazy quantifiers
By default, quantifiers are greedy — they match as much text as possible. Adding a ? after a quantifier makes it lazy (matches as little as possible).
- `.*`: greedy; matches as many characters as possible.
- `.*?`: lazy; matches as few characters as possible.

The other quantifiers can be made lazy the same way, e.g. `+?`, `??` or `{n,m}?`.
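The difference is easiest to see on text containing several closing characters. This sketch uses Python's `re` module (an illustrative choice) on a small HTML-like string.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last ">" in the string.
greedy = re.findall(r"<.*>", html)
# Lazy: .*? stops at the first ">" that lets the match succeed.
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```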
Groups
Parentheses ( ) group multiple characters or sub-patterns into a single unit. Groups can be quantified, and their matched content can be referenced later.
- Capturing group: `(abc)` matches "abc" and remembers the match for later use (back-reference or replacement). Capturing groups are numbered from left to right by their opening parenthesis.
- Non-capturing group: `(?:abc)` groups the pattern without remembering the match (useful for performance or clarity).
- Named group: `(?P<name>abc)` or `(?<name>abc)`, depending on the regex engine: a capturing group that can be accessed by name instead of number.
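A sketch in Python's `re` module (an assumption; Python uses the `(?P<name>...)` form for named groups) shows a quantified non-capturing group, numbered capturing groups, and a named group.

```python
import re

# A quantified non-capturing group: (?:ab)+ repeats the unit "ab".
repeated = re.fullmatch(r"(?:ab)+", "ababab") is not None  # True

# Capturing groups are numbered 1, 2, ... from left to right.
version = re.search(r"(\d+)\.(\d+)", "version 10.4")
major, minor = version.group(1), version.group(2)          # '10', '4'

# Named group in Python's (?P<name>...) syntax, read back by name.
device = re.search(r"(?P<model>CP\d+)", "Device: CP710")
model = device.group("model")                               # 'CP710'
```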
Back-references
Refer back to the content matched by a previous capturing group.
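- `\1`: matches the same text that was matched by the first capturing group.
- `\2`: matches the same text as the second group, and so on.

A back-reference matches the exact text captured by the group, not any text that fits the group's pattern.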
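As a sketch using Python's `re` module (an illustrative choice), a back-reference can detect a doubled word, and the same numbering is used in replacement strings.

```python
import re

# (\w+) captures a word; \1 requires exactly the same word again.
doubled = re.findall(r"\b(\w+)\s+\1\b", "this is is a test test case")
print(doubled)  # ['is', 'test']

# In a replacement string, \1 and \2 insert the captured text.
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")
print(swapped)  # host@user
```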
Alternation
The pipe | acts as an "or" operator, matching either the pattern on the left or the pattern on the right.
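- `cat|dog`: matches "cat" or "dog".
- `(red|blue) car`: matches "red car" or "blue car".

Alternation has the lowest precedence of all operators: without the parentheses, `red|blue car` matches "red" or "blue car".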
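The precedence rule can be checked with a short Python sketch (the `re` module is an illustrative choice). A non-capturing group is used in the second pattern because `re.findall` would otherwise return only the captured colour.

```python
import re

text = "red car, blue car"

# Without parentheses the alternatives are "red" and "blue car".
loose = re.findall(r"red|blue car", text)        # ['red', 'blue car']

# Parentheses restrict the alternation to the colour. A non-capturing
# group is used because re.findall() returns only the captured text
# when the pattern contains capturing groups.
grouped = re.findall(r"(?:red|blue) car", text)  # ['red car', 'blue car']
```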
Escape character
The backslash \ removes the special meaning of the following character, allowing it to be matched literally.
- `\.`: matches a literal dot (instead of "any character").
- `\[`: matches a literal opening bracket.
- `\\`: matches a literal backslash.

The backslash also introduces escape sequences for characters that are hard to type:
- `\n`: newline.
- `\t`: tab.
- `\r`: carriage return.
- `\x20`: the character with hexadecimal code 20 (a space).
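A sketch in Python (an illustrative choice) shows why escaping matters. Python raw strings (`r"..."`) are used so that backslashes reach the regex engine unchanged, and `re.escape` escapes a literal string automatically.

```python
import re

# Unescaped, the dot accepts any character, so "192x168" matches too.
unescaped = re.fullmatch(r"192.168", "192x168") is not None  # True
# Escaped, \. accepts only a literal dot.
escaped = re.fullmatch(r"192\.168", "192x168") is not None   # False

# \x20 is the space character.
spaced = re.fullmatch(r"a\x20b", "a b") is not None          # True

# re.escape() escapes the special characters in a literal string,
# so the result matches that string exactly.
literal = re.escape("1+1=2?")
exact_text = re.fullmatch(literal, "1+1=2?") is not None
```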
Lookahead and lookbehind (assertions)
These check whether a pattern exists before or after the current position, without consuming any characters.
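- `(?=abc)`: positive lookahead; succeeds if "abc" follows.
- `(?!abc)`: negative lookahead; succeeds if "abc" does not follow.
- `(?<=abc)`: positive lookbehind; succeeds if "abc" precedes.
- `(?<!abc)`: negative lookbehind; succeeds if "abc" does not precede.

Not every engine supports lookbehind, and some (for example Python's `re` module) only allow fixed-length patterns inside it.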
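A sketch using Python's `re` module (an illustrative choice) applies a negative lookahead and a positive lookbehind to device names. Neither assertion consumes characters, so "X" and "CP" are not part of the returned matches.

```python
import re

text = "CP700 CP700X CP710"

# Negative lookahead: "CP700" only when it is NOT followed by "X".
plain_700 = re.findall(r"CP700(?!X)", text)  # ['CP700']

# Positive lookbehind: digits preceded by "CP"; "CP" is checked but not
# included in the match. Python requires a fixed-length lookbehind.
numbers = re.findall(r"(?<=CP)\d+", text)    # ['700', '700', '710']
```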
Flags (modifiers)
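Flags change how the entire expression behaves. In JavaScript and Perl syntax they are placed after the closing delimiter (e.g. `/pattern/gi`); other languages usually pass them as options, and many engines also accept inline modifiers such as `(?i)` at the start of the pattern.
- `i`: case-insensitive matching.
- `g`: global; find all matches, not just the first.
- `m`: multiline; `^` and `$` match at the start/end of each line, not just of the whole string.
- `s`: single-line (dotall); `.` also matches newline characters.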
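In Python's `re` module (used here as an example), flags are passed as arguments such as `re.IGNORECASE`, `re.MULTILINE` and `re.DOTALL`. Python has no `g` flag; `re.findall` and `re.finditer` return all matches instead.

```python
import re

text = "First line\nsecond LINE"

# re.IGNORECASE corresponds to the i flag.
ci = re.findall(r"line", text, re.IGNORECASE)         # ['line', 'LINE']

# re.MULTILINE corresponds to m: ^ matches at the start of each line.
first_only = re.findall(r"^\w+", text)                # ['First']
each_line = re.findall(r"^\w+", text, re.MULTILINE)   # ['First', 'second']

# re.DOTALL corresponds to s: the dot also matches "\n".
no_dotall = re.search(r"line.second", text) is None                # True
with_dotall = re.search(r"line.second", text, re.DOTALL) is not None  # True

# Inline modifier: (?i) at the start of the pattern.
inline = re.findall(r"(?i)line", text)                # ['line', 'LINE']
```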
Examples
The following regular expressions are examples for use within OpenScape Endpoint Management.
IP address range
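The following examples can be used to match IP address ranges:
| Regular Expression | Description |
|---|---|
| `192\.168\.0\.((2[5-9])\|(3[0-9]))` | 192.168.0.25 up to and including 192.168.0.39 |
| `192\.168\.1\.[0-9]{1,3}` | All addresses within subnet 192.168.1.0/24 (also accepts invalid values such as 192.168.1.999) |

Without the anchors `^` and `$`, a pattern can also match part of a longer value; for example, the first pattern finds 192.168.0.25 inside 192.168.0.250. Whether anchors are needed depends on whether the application compares the complete value.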
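The page does not state whether OpenScape Endpoint Management compares the whole value or searches inside it, so this Python sketch (an illustrative choice) tests the two table patterns both ways. The helper names `whole_match` and `contains_match` are hypothetical.

```python
import re

# Patterns copied from the IP address table.
RANGE_25_39 = r"192\.168\.0\.((2[5-9])|(3[0-9]))"
SUBNET_1 = r"192\.168\.1\.[0-9]{1,3}"

def whole_match(pattern: str, value: str) -> bool:
    """Hypothetical helper: True if the complete value matches (like ^...$)."""
    return re.fullmatch(pattern, value) is not None

def contains_match(pattern: str, value: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in the value."""
    return re.search(pattern, value) is not None

in_range = whole_match(RANGE_25_39, "192.168.0.30")       # True
below = whole_match(RANGE_25_39, "192.168.0.24")          # False
too_long = whole_match(RANGE_25_39, "192.168.0.250")      # False
substring = contains_match(RANGE_25_39, "192.168.0.250")  # True: finds "192.168.0.25"
invalid = whole_match(SUBNET_1, "192.168.1.999")          # True: not a valid address
```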
Device types
The following examples can be used to match specific device types:
| Regular Expression | Description |
|---|---|
| `CP[67].*` | Matches device types starting with CP6 or CP7, e.g. CP600, CP700, CP700X and CP710 |
| `^CP700$` | Matches exactly CP700, but not CP700X |
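A quick Python sketch (an illustrative choice) filters a hypothetical list of device type names with the two table patterns, using `re.search` so that only explicit anchors restrict the match.

```python
import re

# Hypothetical sample of device type names.
devices = ["CP600", "CP700", "CP700X", "CP710", "CP400", "CP205"]

# CP[67].* : "CP", then 6 or 7, then anything.
cp6_cp7 = [d for d in devices if re.search(r"CP[67].*", d)]
# ^CP700$ : exactly "CP700", nothing before or after.
only_700 = [d for d in devices if re.search(r"^CP700$", d)]

print(cp6_cp7)   # ['CP600', 'CP700', 'CP700X', 'CP710']
print(only_700)  # ['CP700']
```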
Testing your regular expressions
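Many websites let you test regular expressions online. Select the regex flavor (engine) that corresponds to your application, because details such as named-group syntax, lookbehind support, and flags differ between engines.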



