Regular Expressions

Introduction

Regular expressions (often called regex or regexp) are sequences of characters that define a search pattern. They are used to find and match strings in text. A regular expression may contain literal characters and special characters with a predefined meaning.

Elements of a regular expression

Literal characters

Most characters simply match themselves. The letter a matches the letter "a" in the text.

Anchors (position markers)

Anchors do not match a character; they match a position in the text.

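^ (caret) — matches the start of the text, or the start of each line when the multiline flag (m) is set.
$ (dollar sign) — matches the end of the text, or the end of each line when the multiline flag (m) is set.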
\b — matches a word boundary (the position between a word character and a non-word character).
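
As a small illustration of the anchors above, the following sketch uses Python's re module. This is an assumption made for the example only; the regex engine in your application may differ in details. The sample text and variable names are made up.

```python
import re

text = "cat concatenate cat"

# ^cat matches only at the very start of the text.
start = [m.start() for m in re.finditer(r"^cat", text)]

# cat$ matches only at the very end of the text.
end = [m.start() for m in re.finditer(r"cat$", text)]

# \bcat\b matches "cat" as a whole word, not the "cat" inside "concatenate".
words = [m.start() for m in re.finditer(r"\bcat\b", text)]

print(start, end, words)
```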

Character classes

Enclosed in square brackets [ ], a character class matches one character from a defined set.

[aeiou] — matches any single vowel.
[a-z] — matches any lowercase letter from a to z (range).
[0-9a-fA-F] — matches any hexadecimal digit.

Negated character classes

A caret ^ placed immediately inside a character class negates it, matching any character not in the set.

[^aeiou] — matches any character that is not a lowercase vowel (including digits, spaces and punctuation).
[^0-9] — matches any character that is not a digit.
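
The character classes and negated classes listed above can be tried out with a short sketch. Python's re module is used here only as an illustration (an assumption, not necessarily the engine your application uses), and the sample text is invented.

```python
import re

text = "Cafe 2F"

# Plain character classes: one character out of the listed set.
vowels = re.findall(r"[aeiou]", text)
hex_digits = re.findall(r"[0-9a-fA-F]", text)

# Negated classes: one character that is NOT in the listed set.
non_vowels = re.findall(r"[^aeiou]", text)
non_digits = re.findall(r"[^0-9]", text)

print(vowels, hex_digits, non_vowels, non_digits)
```

Note that the negated classes also match the space, because a space is neither a vowel nor a digit.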

Special (shorthand) character classes

Predefined shortcuts for common character sets.

. (dot) — matches any single character except a newline.
\d — matches any digit (same as [0-9] in most engines; Unicode-aware engines may also match digits from other scripts).
\D — matches any non-digit.
\w — matches any word character: letter, digit, or underscore (same as [a-zA-Z0-9_] for ASCII text).
\W — matches any non-word character.
\s — matches any whitespace character (space, tab, newline, carriage return, and similar).
\S — matches any non-whitespace character.
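
A minimal sketch of the shorthand classes, assuming Python's re module purely for illustration; the sample text is made up and contains only ASCII characters.

```python
import re

text = "id_7 = 42\n"

digits = re.findall(r"\d", text)       # single digits
words = re.findall(r"\w+", text)       # runs of letters, digits, underscore
spaces = re.findall(r"\s", text)       # whitespace, including the newline
non_spaces = re.findall(r"\S+", text)  # runs of non-whitespace
any_char = re.findall(r".", text)      # everything except the newline

print(digits, words, spaces, non_spaces, len(any_char))
```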

Quantifiers (repetition)

Specify how many times the preceding element must occur.

* — zero or more times.
+ — one or more times.
? — zero or one time (makes the element optional).
{n} — exactly n times.
{n,} — n or more times.
{n,m} — between n and m times (inclusive).
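
The quantifiers above can be checked with a short sketch. It assumes Python's re module for illustration; the helper name full is hypothetical and simply reports whether the whole string matches the pattern.

```python
import re

def full(pattern: str, s: str) -> bool:
    """Return True if the entire string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

results = {
    "ab* on 'a'": full(r"ab*", "a"),                 # zero b's is allowed
    "ab+ on 'a'": full(r"ab+", "a"),                 # at least one b is required
    "colou?r on 'color'": full(r"colou?r", "color"), # the u is optional
    "colou?r on 'colour'": full(r"colou?r", "colour"),
    r"\d{3} on '1234'": full(r"\d{3}", "1234"),      # exactly three digits
    r"\d{2,} on '12345'": full(r"\d{2,}", "12345"),  # two or more digits
    r"\d{2,4} on '12345'": full(r"\d{2,4}", "12345") # two to four digits
}

for label, ok in results.items():
    print(label, "->", ok)
```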

Greedy vs. lazy quantifiers

By default, quantifiers are greedy — they match as much text as possible. Adding a ? after a quantifier makes it lazy (matches as little as possible).

.* — greedy: matches as many characters as possible.
.*? — lazy: matches as few characters as possible.
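
The difference between greedy and lazy matching is easiest to see on text with repeated delimiters. The sketch below assumes Python's re module for illustration; the sample text is invented.

```python
import re

text = "<b>one</b> <b>two</b>"

# Greedy: .* runs to the LAST </b>, so both elements end up in one match.
greedy = re.findall(r"<b>.*</b>", text)

# Lazy: .*? stops at the FIRST </b>, giving one match per element.
lazy = re.findall(r"<b>.*?</b>", text)

print(greedy)
print(lazy)
```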

Groups

Parentheses ( ) group multiple characters or sub-patterns into a single unit. Groups can be quantified, and their matched content can be referenced later.

Capturing group: (abc) — matches "abc" and remembers the match for later use (back-reference or replacement).
Non-capturing group: (?:abc) — groups the pattern without remembering the match (useful for performance or clarity).
Named group: (?P<name>abc) or (?<name>abc), depending on the regex engine — a capturing group accessible by name instead of number.
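
The following sketch shows the three kinds of group side by side. It assumes Python's re module, which uses the (?P<name>...) spelling for named groups; the group name family and the sample string are made up for the example.

```python
import re

# One named group, one numbered capturing group, one non-capturing group.
m = re.fullmatch(r"(?P<family>CP)(\d+)(?:X)?", "CP700X")

family = m.group("family")  # access by name
number = m.group(2)         # access by position
captured = m.groups()       # the non-capturing (?:X)? is not listed

# A group can be quantified as a single unit.
repeated = re.fullmatch(r"(?:ab)+", "ababab") is not None

print(family, number, captured, repeated)
```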

Back-references

Refer back to the content matched by a previous capturing group.

\1 — matches the same text that was matched by the first capturing group.
\2 — matches the same text as the second group, and so on.
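
A common use of back-references is finding accidentally repeated words. The sketch below assumes Python's re module for illustration; the sample sentence is invented.

```python
import re

text = "this is is a test"

# (\w+) captures a word; \1 requires the same word again after a space.
m = re.search(r"\b(\w+) \1\b", text)
doubled = m.group(0)
word = m.group(1)

# In the replacement string, \1 inserts the captured word once.
fixed = re.sub(r"\b(\w+) \1\b", r"\1", text)

print(doubled, "|", word, "|", fixed)
```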

Alternation

The pipe | acts as an "or" operator, matching either the pattern on the left or the pattern on the right.

cat|dog — matches "cat" or "dog".
(red|blue) car — matches "red car" or "blue car".
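
The sketch below illustrates alternation and why the parentheses in the second example matter. It assumes Python's re module for illustration only.

```python
import re

# Plain alternation: either word is found.
animals = re.findall(r"cat|dog", "cat, dog, bird")

# With a group, the alternation is limited to the colour.
grouped_red = re.fullmatch(r"(red|blue) car", "red car") is not None
grouped_blue = re.fullmatch(r"(red|blue) car", "blue car") is not None

# Without a group, the pattern means "red" OR "blue car",
# so "red car" as a whole no longer matches.
ungrouped_red = re.fullmatch(r"red|blue car", "red car") is not None

print(animals, grouped_red, grouped_blue, ungrouped_red)
```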

Escape character

The backslash \ removes the special meaning of the following character, allowing it to be matched literally.

\. — matches a literal dot (instead of "any character").
\[ — matches a literal opening bracket.
\\ — matches a literal backslash.

The backslash also introduces escape sequences for characters that are hard to type directly:

\n — newline.
\t — tab.
\r — carriage return.
\x20 — the character with hexadecimal code 20 (a space).
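
The effect of escaping can be seen in a short sketch, assuming Python's re module for illustration. The function re.escape is part of that module and escapes special characters in a literal string for you.

```python
import re

# An unescaped dot matches any character, so "1.5" also matches "125".
unescaped = re.search(r"1.5", "125") is not None

# An escaped dot matches only a literal dot.
escaped = re.search(r"1\.5", "125") is not None
dots = re.findall(r"\.", "a.b.c")

# Encoded characters: \x20 is a space, \t is a tab.
space = re.fullmatch(r"a\x20b", "a b") is not None
tab = re.fullmatch(r"a\tb", "a\tb") is not None

# re.escape builds the escaped form of a literal string.
literal = re.escape("192.168.0.1")

print(unescaped, escaped, dots, space, tab, literal)
```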

Lookahead and lookbehind (assertions)

These check whether a pattern exists before or after the current position, without consuming any characters.

(?=abc) — positive lookahead: succeeds if "abc" follows.
(?!abc) — negative lookahead: succeeds if "abc" does not follow.
(?<=abc) — positive lookbehind: succeeds if "abc" precedes.
(?<!abc) — negative lookbehind: succeeds if "abc" does not precede.
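
The four assertions can be compared in a small sketch. It assumes Python's re module for illustration; the sample strings are invented. The word boundaries (\b) in the negative cases keep the engine from matching only part of a number.

```python
import re

prices = "10 EUR, 20 USD"
costs = "cost $30 or 40"

# Positive lookahead: digits that are followed by " EUR".
followed = re.findall(r"\d+(?= EUR)", prices)

# Negative lookahead: whole numbers NOT followed by " EUR".
not_followed = re.findall(r"\b\d+\b(?! EUR)", prices)

# Positive lookbehind: digits that are preceded by "$".
preceded = re.findall(r"(?<=\$)\d+", costs)

# Negative lookbehind: whole numbers NOT preceded by "$".
not_preceded = re.findall(r"(?<!\$)\b\d+", costs)

print(followed, not_followed, preceded, not_preceded)
```

In each case the match contains only the digits; the text checked by the assertion is not part of the result.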

Flags (modifiers)

Flags change how the entire expression behaves. In engines that use delimiter syntax, they are placed after the closing delimiter (e.g. /pattern/gi); other engines accept them as separate options.

i — case-insensitive matching.
g — global: find all matches, not just the first.
m — multiline: ^ and $ match the start/end of each line, not just the whole string.
s — single-line (dotall): . also matches newline characters.
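
The sketch below shows the effect of these flags, assuming Python's re module for illustration. Python has no delimiter syntax: flags are passed as re.I, re.M and re.S, and there is no g flag because re.findall already returns all matches. The sample text is made up.

```python
import re

text = "CP700\ncp710"

# Without flags the match is case-sensitive.
plain = re.findall(r"cp\d+", text)

# i: case-insensitive.
insensitive = re.findall(r"cp\d+", text, re.I)

# Without m, ^ and $ refer to the whole text, so nothing matches.
whole_text = re.findall(r"^cp\d+$", text, re.I)

# m: ^ and $ match at the start and end of each line.
per_line = re.findall(r"^cp\d+$", text, re.I | re.M)

# s: the dot also matches the newline between the two lines.
dot_default = re.search(r"CP700.cp710", text) is not None
dot_all = re.search(r"CP700.cp710", text, re.S) is not None

print(plain, insensitive, whole_text, per_line, dot_default, dot_all)
```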

Examples

The following example regular expressions can be used within OpenScape Endpoint Management.

IP address range

The following examples can be used for matching IP address ranges:

Regular Expression	Description
192\.168\.0\.((2[5-9])|(3[0-9]))	Addresses 192.168.0.25 through 192.168.0.39
192\.168\.1\.[0-9]{1,3}	All addresses within subnet 192.168.1.0/24
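
The IP range patterns can be checked before use with a sketch like the one below. It assumes Python's re module, which may not behave exactly like the engine in your application, and it writes the alternation with a plain | as that module requires. Whether your application compares the pattern against the whole value or only part of it is not stated here, so the sketch shows both cases.

```python
import re

range_25_39 = r"192\.168\.0\.((2[5-9])|(3[0-9]))"
subnet_1 = r"192\.168\.1\.[0-9]{1,3}"

def whole(pattern: str, address: str) -> bool:
    """True if the entire address matches the pattern."""
    return re.fullmatch(pattern, address) is not None

in_range = [whole(range_25_39, a) for a in ("192.168.0.25", "192.168.0.39")]
out_of_range = [whole(range_25_39, a) for a in ("192.168.0.24", "192.168.0.40")]

# If only part of the value has to match, 192.168.0.250 is accepted too,
# because it begins with 192.168.0.25.
partial = re.search(range_25_39, "192.168.0.250") is not None

in_subnet = [whole(subnet_1, a) for a in ("192.168.1.7", "192.168.1.255")]
other_subnet = whole(subnet_1, "192.168.2.7")

# [0-9]{1,3} accepts any one to three digits, so 999 also passes.
not_range_checked = whole(subnet_1, "192.168.1.999")

print(in_range, out_of_range, partial, in_subnet, other_subnet, not_range_checked)
```

If partial matching applies, enclosing the pattern in ^ and $ restricts it to the whole address.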

Device types

The following examples can be used for matching specific device types:

Regular Expression	Description
CP[67].*	Matches CP600, CP700, CP700X and CP710
^CP700$	Matches CP700 but not CP700X
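
The two device type patterns can be compared against a list of names, as in the sketch below. It assumes Python's re module for illustration; the list of device names is taken from the table plus one invented counter-example (CP400).

```python
import re

devices = ["CP400", "CP600", "CP700", "CP700X", "CP710"]

# CP[67].* accepts every name that starts with CP6 or CP7.
cp6_cp7 = [d for d in devices if re.fullmatch(r"CP[67].*", d)]

# ^CP700$ accepts only the exact name CP700.
exact = [d for d in devices if re.search(r"^CP700$", d)]

print(cp6_cp7)
print(exact)
```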

Testing your regular expressions

Many websites let you test a regular expression online.
