Regular Expressions (Regex) Complete Guide: Patterns, Groups, Flags

Regular expressions — commonly called regex or regexp — are one of the most powerful and versatile tools in a developer's toolkit. A regex is a sequence of characters that defines a search pattern. In this guide you will learn regex from the ground up, with practical examples you can use immediately.

What are Regular Expressions?

A regular expression is a pattern that describes a set of strings. Regexes are used for searching, validating, extracting, and replacing text. Every major programming language supports them (JavaScript, Python, Java, PHP, Go, Ruby), and the core syntax is largely the same across all of them, although details such as flags, named-group syntax, and lookaround support vary by engine.

For example, the regex \d{3}-\d{4} matches three digits, a hyphen, and four digits, such as the 7-digit phone number 123-4567.
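
To see the pattern in action, here is a minimal JavaScript sketch. The sample sentences and variable names are invented for illustration; only the pattern comes from the text above.

```javascript
// Pattern from the guide: three digits, a hyphen, four digits.
const phonePattern = /\d{3}-\d{4}/;

// Hypothetical sample inputs.
const hit = "Call 555-0199 today".match(phonePattern);
const miss = "Call 5550199 today".match(phonePattern);

console.log(hit[0]); // "555-0199"
console.log(miss);   // null (no hyphen, so no match)
```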

Basic Characters and Literals

Any letter, digit, or symbol that is not a special character matches itself literally:

hello     — Matches the exact string "hello"
abc123    — Matches "abc123"
bytekit   — Matches "bytekit"

Special characters (called metacharacters) have special meaning and must be escaped with a backslash \ to match literally:

. ^ $ * + ? { } [ ] \ | ( )
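
The effect of escaping can be sketched as follows. The helper name escapeRegex is hypothetical (it is not a built-in); it shows one common way to escape every metacharacter in arbitrary text before embedding it in a pattern.

```javascript
// Unescaped, "." matches any character.
const loose = /3.14/;
// Escaped, "\." matches only a literal dot.
const strict = /3\.14/;

console.log(loose.test("3x14"));  // true
console.log(strict.test("3x14")); // false
console.log(strict.test("3.14")); // true

// Hypothetical helper: prefix each metacharacter with a backslash
// so the text is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const literal = new RegExp(escapeRegex("1+1=2?"));
console.log(literal.test("is 1+1=2? yes")); // true
```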

Character Classes

Character classes match any one character from a set:

[abc]       — Matches 'a', 'b', or 'c'
[a-z]       — Any lowercase letter
[A-Z]       — Any uppercase letter
[0-9]       — Any digit
[a-zA-Z]    — Any letter (upper or lower)
[^abc]      — Any character EXCEPT 'a', 'b', 'c' (negation with ^)
[a-z0-9]    — Any lowercase letter or digit

Shorthand Character Classes

\d    — Any digit [0-9]
\D    — Any non-digit [^0-9]
\w    — Any word character [a-zA-Z0-9_]
\W    — Any non-word character
\s    — Any whitespace (space, tab, newline)
\S    — Any non-whitespace
.     — Any character EXCEPT newline (with default flags)
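
A short JavaScript sketch of character classes and their shorthands, using an invented sample string:

```javascript
// Hypothetical sample text.
const text = "Order A7 shipped to Bay 12";

// [A-Z]\d : one uppercase letter followed by one digit
console.log(text.match(/[A-Z]\d/)[0]);  // "A7"

// [^a-z ] : first character that is neither a lowercase letter nor a space
console.log(text.match(/[^a-z ]/)[0]);  // "O"

// \s+ splits on runs of whitespace; \w+ collects runs of word characters
console.log(text.split(/\s+/).length);  // 6
console.log(text.match(/\w+/g));        // ["Order", "A7", "shipped", "to", "Bay", "12"]

// \D+ matches runs of non-digits, so removing them leaves only the digits
console.log(text.replace(/\D+/g, ""));  // "712"
```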

Anchors

Anchors do not match characters — they match positions:

^     — Start of string (or line with multiline flag)
$     — End of string (or line with multiline flag)
\b    — Word boundary (position between word and non-word char)
\B    — Non-word boundary

^hello      — "hello" only at the start of the string
world$      — "world" only at the end of the string
^hello$     — Exactly "hello" — nothing before or after
\bcat\b     — The word "cat" as a whole word (not "concatenate")
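
The anchor examples above can be checked with a few calls to test(). The input strings are illustrative assumptions.

```javascript
console.log(/^hello/.test("hello world"));  // true
console.log(/^hello/.test("say hello"));    // false
console.log(/world$/.test("hello world"));  // true
console.log(/^hello$/.test("hello world")); // false (extra text after "hello")

// \b requires a word boundary on both sides of "cat".
const wholeWord = /\bcat\b/;
console.log(wholeWord.test("the cat sat")); // true
console.log(wholeWord.test("concatenate")); // false
```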

Quantifiers

Quantifiers specify how many times a pattern should repeat:

*      — Zero or more times
+      — One or more times
?      — Zero or one time (optional)
{n}    — Exactly n times
{n,}   — At least n times
{n,m}  — Between n and m times

\d+        — One or more digits
\d*        — Zero or more digits
\d?        — Zero or one digit
\d{4}      — Exactly 4 digits (e.g., a year)
\d{2,4}    — Between 2 and 4 digits
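
As a rough illustration of how quantifiers behave, the sketch below anchors a four-digit year and then collects runs of two to four digits. The inputs are made up for the example.

```javascript
// Exactly four digits, nothing else.
const year = /^\d{4}$/;
console.log(year.test("2025"));  // true
console.log(year.test("202"));   // false
console.log(year.test("20250")); // false

// ? makes the preceding character optional.
const colour = /^colou?r$/;
console.log(colour.test("color"));  // true
console.log(colour.test("colour")); // true

// {2,4} takes up to four digits at a time; a lone digit is skipped.
const runs = "7 42 1234567".match(/\d{2,4}/g);
console.log(runs); // ["42", "1234", "567"]
```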

Greedy vs Lazy

By default quantifiers are greedy — they match as much as possible. Add ? after any quantifier to make it lazy (match as little as possible):

<.+>    — Greedy: matches "<b>bold</b>" as one match
<.+?>   — Lazy: matches "<b>" and "</b>" separately
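
The difference is easy to observe in JavaScript. This sketch runs both patterns against the same sample string with the g flag so that every match is returned.

```javascript
const html = "<b>bold</b>";

// Greedy: .+ runs to the last ">" in the string.
const greedy = html.match(/<.+>/g);
console.log(greedy); // ["<b>bold</b>"]

// Lazy: .+? stops at the first ">" it can reach.
const lazy = html.match(/<.+?>/g);
console.log(lazy);   // ["<b>", "</b>"]
```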

Groups and Capturing

Parentheses () create groups. Groups capture the matched text for later use:

(\d{4})-(\d{2})-(\d{2})   — Captures year, month, day from a date

// Input: "2025-01-15"
// Group 1: "2025"
// Group 2: "01"
// Group 3: "15"
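
In JavaScript, the captured groups appear after the full match in the array returned by match(), and can be referenced as $1, $2, $3 in a replacement string. A minimal sketch, with variable names chosen for illustration:

```javascript
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;

// Index 0 is the whole match; groups start at index 1.
const found = "2025-01-15".match(datePattern);
const [, year, month, day] = found;
console.log(year, month, day); // 2025 01 15

// Reorder the parts using numbered references.
const reordered = "2025-01-15".replace(datePattern, "$3/$2/$1");
console.log(reordered); // "15/01/2025"
```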

Non-Capturing Groups

(?:abc)    — Groups without capturing (for grouping only, no $1 reference)
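
One way to see the difference is to compare the match arrays for a capturing and a non-capturing version of the same pattern. The input string is an assumption for the example.

```javascript
const capturing = "abcabc".match(/(abc)+/);
const nonCapturing = "abcabc".match(/(?:abc)+/);

// Both match the same text...
console.log(capturing[0]);        // "abcabc"
console.log(nonCapturing[0]);     // "abcabc"

// ...but only the capturing version stores a group.
console.log(capturing.length);    // 2 (full match + group 1)
console.log(capturing[1]);        // "abc"
console.log(nonCapturing.length); // 1 (full match only)
```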

Named Groups

(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
// Named groups: year, month, day
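
In JavaScript, named groups are exposed on the match's groups property and can be referenced as $<name> in a replacement string. The sketch below assumes the same sample date as earlier. Note that the syntax differs by engine; Python, for example, writes named groups as (?P<year>...).

```javascript
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

const parts = "2025-01-15".match(isoDate).groups;
console.log(parts.year);  // "2025"
console.log(parts.month); // "01"
console.log(parts.day);   // "15"

// Named references in a replacement string.
const dotted = "2025-01-15".replace(isoDate, "$<day>.$<month>.$<year>");
console.log(dotted); // "15.01.2025"
```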

Alternation (OR)

cat|dog       — Matches "cat" or "dog"
yes|no|maybe  — Matches "yes", "no", or "maybe"
(jpg|png|gif) — Matches image extensions
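
Alternation has the lowest precedence of any regex operator, which is why it is usually wrapped in a group. The following sketch illustrates this; the file names and test words are invented.

```javascript
// Group the alternatives so the dot and the $ apply to all of them.
const imageExt = /\.(jpg|png|gif)$/i;
console.log(imageExt.test("photo.PNG")); // true
console.log(imageExt.test("notes.txt")); // false

// Without a group, ^ binds only to "cat" and $ only to "dog".
console.log(/^cat|dog$/.test("catalog"));     // true  (starts with "cat")
console.log(/^(?:cat|dog)$/.test("catalog")); // false (must be exactly "cat" or "dog")
```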

Lookaheads and Lookbehinds

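Lookaheads and lookbehinds are zero-width assertions: they check what comes before or after a position without consuming characters. A negative assertion only rules out one position, so the engine may backtrack to a shorter match; on "100 dollars", \d+(?! dollars) still matches "10". Support also varies by engine (Go's regexp package, for example, has no lookarounds). The four forms are: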

\d+(?= dollars)    — Positive lookahead: digits followed by " dollars"
\d+(?! dollars)    — Negative lookahead: digits NOT followed by " dollars"
(?<=₹)\d+         — Positive lookbehind: digits preceded by "₹"
(?<!₹)\d+         — Negative lookbehind: digits NOT preceded by "₹"
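
The sketch below tries the lookaround patterns in JavaScript on invented sample strings. It also shows a pitfall with negative lookarounds: the quantifier can give back characters until the assertion passes. The tightened pattern at the end is one possible fix, not the only one.

```javascript
const prices = "pay 100 dollars or ₹500";

// Positive lookahead and lookbehind: the currency text is checked but not captured.
console.log(prices.match(/\d+(?= dollars)/)[0]); // "100"
console.log(prices.match(/(?<=₹)\d+/)[0]);       // "500"

// Pitfall: \d+ backtracks from "100" to "10", which is followed by "0", not " dollars".
console.log("100 dollars".match(/\d+(?! dollars)/)[0]); // "10"
// Same effect with lookbehind: the match simply starts one digit later.
console.log("₹500".match(/(?<!₹)\d+/)[0]);              // "00"

// One possible fix: also forbid a following digit, so the number stays whole.
const notDollars = /\d+(?!\d| dollars)/;
console.log("100 dollars".match(notDollars));  // null
console.log("100 euros".match(notDollars)[0]); // "100"
```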

Regex Flags

Flags change how a pattern is applied. The letters below are the JavaScript spellings; other languages expose the same options under different names. Python, for example, uses re.IGNORECASE, re.MULTILINE, and re.DOTALL, and finds all matches with re.findall or re.finditer instead of a g flag.

g  — Global: find all matches (not just the first)
i  — Case-insensitive: "Hello" matches "hello", "HELLO"
m  — Multiline: ^ and $ match start/end of each line
s  — Dotall: . matches newlines too
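
A small JavaScript sketch of how each flag changes the result on the same three-line sample string (the string itself is an assumption for the example):

```javascript
const lines = "Hello\nhello\nHELLO";

// g: all matches instead of just the first; i: ignore case.
console.log(lines.match(/hello/g));  // ["hello"]
console.log(lines.match(/hello/gi)); // ["Hello", "hello", "HELLO"]

// m: ^ and $ match at every line break, not only at the ends of the string.
console.log(lines.match(/^hello$/gi));  // null
console.log(lines.match(/^hello$/gim)); // ["Hello", "hello", "HELLO"]

// s: "." also matches the newline between the first two lines.
console.log(/Hello.hello/.test(lines));  // false
console.log(/Hello.hello/s.test(lines)); // true
```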

Practical Regex Examples

Email Validation

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

// Matches: arjun@example.com, user.name+tag@domain.co.in
// Rejects: @example.com, user@, plainaddress
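
The listed cases can be checked with a short script. This is a pragmatic format check under the pattern above, not a full implementation of the email address standards, and it does not confirm that an address exists.

```javascript
const emailPattern = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

const samples = [
  "arjun@example.com",
  "user.name+tag@domain.co.in",
  "@example.com",
  "user@",
  "plainaddress",
];

// Map each sample to true (accepted) or false (rejected).
const emailResults = samples.map((s) => emailPattern.test(s));
console.log(emailResults); // [true, true, false, false, false]
```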

Indian Mobile Number

^[6-9]\d{9}$

// Matches: 9876543210, 6512345678
// Rejects: 1234567890 (doesn't start with 6-9), 98765432 (too short)
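
Wrapped in a function, the check might look like this. The name isIndianMobile is hypothetical, and the sketch assumes the input has already been stripped of spaces and any +91 prefix.

```javascript
const mobilePattern = /^[6-9]\d{9}$/;

// Hypothetical helper name, used only for this illustration.
function isIndianMobile(input) {
  return mobilePattern.test(input);
}

console.log(isIndianMobile("9876543210")); // true
console.log(isIndianMobile("6512345678")); // true
console.log(isIndianMobile("1234567890")); // false (starts with 1)
console.log(isIndianMobile("98765432"));   // false (only 8 digits)
```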

URL Validation

^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{2,256}\.[a-z]{2,6}\b[-a-zA-Z0-9@:%_+.~#?&\/=]*$
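
A quick sketch of what this pattern accepts and rejects, using invented URLs. Note that it requires a dot followed by a lowercase top-level domain, so hosts such as localhost are rejected.

```javascript
const urlPattern =
  /^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{2,256}\.[a-z]{2,6}\b[-a-zA-Z0-9@:%_+.~#?&\/=]*$/;

console.log(urlPattern.test("https://www.example.com/path?x=1")); // true
console.log(urlPattern.test("http://example.org"));               // true
console.log(urlPattern.test("ftp://example.com"));                // false (scheme must be http or https)
console.log(urlPattern.test("http://localhost"));                 // false (no dot and TLD)
```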

Strong Password

^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
// At least 8 chars, 1 uppercase, 1 lowercase, 1 digit, 1 special char
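
Each lookahead in this pattern checks one requirement from the start of the string, and the final character class limits which characters are allowed at all. A minimal sketch with made-up passwords:

```javascript
const strongPassword =
  /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/;

console.log(strongPassword.test("Passw0rd!"));  // true
console.log(strongPassword.test("password1!")); // false (no uppercase letter)
console.log(strongPassword.test("Pass0!"));     // false (shorter than 8)
console.log(strongPassword.test("Passw0rd#"));  // false ("#" is not in the allowed special set)
```

The last case shows a side effect of this pattern: characters outside the listed set are rejected outright rather than ignored.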

Extract All HTML Tags

<[^>]+>
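
With the g flag, the pattern can list every tag in a fragment or strip them out. The sample markup is invented; for anything beyond simple, well-formed fragments, an HTML parser is the safer choice.

```javascript
const fragment = '<p class="x">Hi <b>there</b></p>';

// Collect every tag.
const tags = fragment.match(/<[^>]+>/g);
console.log(tags); // ['<p class="x">', '<b>', '</b>', '</p>']

// Remove every tag, leaving the text content.
const plainText = fragment.replace(/<[^>]+>/g, "");
console.log(plainText); // "Hi there"
```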

Find IP Addresses

\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b
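
Each octet alternative limits the value to 0-255, so out-of-range addresses are skipped. A small sketch over an invented log line:

```javascript
const ipPattern =
  /\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b/g;

// Hypothetical log line; 10.0.0.256 has an octet above 255.
const logLine = "from 192.168.1.10 to 10.0.0.256 via 8.8.8.8";

const addresses = logLine.match(ipPattern);
console.log(addresses); // ["192.168.1.10", "8.8.8.8"]
```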

Summary

Regular expressions are immensely powerful once you understand the building blocks: literals, character classes, quantifiers, groups, anchors and flags. Practice with real text to build confidence. Use our free Regex Tester to test your patterns live in your browser — see all matches, capture groups and positions instantly.
