Regular expressions (commonly called regex or regexp) are one of the most powerful and versatile tools in a developer's toolkit. A regex is a sequence of characters that defines a search pattern. In this guide you will learn regex from the ground up, with practical examples you can use immediately.
What are Regular Expressions?
A regular expression is a pattern that describes a set of strings. Regexes are used to search, validate, extract, and replace text. Every major programming language supports them, including JavaScript, Python, Java, PHP, Go, and Ruby, and the core syntax is largely the same across all of them. Advanced features differ more: Go's regexp package, for example, does not support lookaheads or lookbehinds.
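For example, the regex \d{3}-\d{4} matches three digits, a hyphen, and four digits, such as the 7-digit phone number 123-4567. Without anchors, it also finds that sequence inside longer text.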
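A minimal sketch using Python's standard re module (Python is only an illustrative choice; the pattern is written the same way in other languages). It shows that re.search() finds the pattern anywhere in a string, while re.fullmatch() requires the whole string to match. The name PHONE is illustrative.

```python
import re

PHONE = re.compile(r"\d{3}-\d{4}")  # illustrative name: 3 digits, hyphen, 4 digits

# search() scans the string and returns the first match (or None).
m = PHONE.search("Call 123-4567 today")
print(m.group())  # 123-4567

# fullmatch() succeeds only when the entire string fits the pattern.
print(PHONE.fullmatch("123-4567") is not None)       # True
print(PHONE.fullmatch("Call 123-4567") is not None)  # False

# Unanchored, the pattern also fires inside longer digit runs.
print(PHONE.search("9123-45678").group())  # 123-4567
```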
Basic Characters and Literals
Any letter, digit, or symbol that is not a special character matches itself literally:
hello → Matches the exact string "hello"
abc123 → Matches "abc123"
bytekit → Matches "bytekit"
Special characters (called metacharacters) have special meanings. To match one literally, escape it with a backslash: \. matches a literal dot and \$ a literal dollar sign. The metacharacters are:
. ^ $ * + ? { } [ ] \ | ( )
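A short Python sketch (standard re module) of why escaping matters. An unescaped dot matches any character. re.escape() adds the backslashes for you, which helps when the search text comes from user input.

```python
import re

text = "file.txt and fileXtxt"

# Unescaped "." is a metacharacter, so it also matches the "X".
print(re.findall(r"file.txt", text))   # ['file.txt', 'fileXtxt']

# "\." matches only a literal dot.
print(re.findall(r"file\.txt", text))  # ['file.txt']

# re.escape() backslash-escapes metacharacters in arbitrary text,
# so "$", "." and the parentheses below are matched literally.
needle = "$5.00 (approx)"
pattern = re.escape(needle)
print(re.search(pattern, "Total: $5.00 (approx)") is not None)  # True
```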
Character Classes
Character classes match any one character from a set:
[abc] → Matches 'a', 'b', or 'c'
[a-z] → Any lowercase letter
[A-Z] → Any uppercase letter
[0-9] → Any digit
[a-zA-Z] → Any letter (upper or lower case)
[^abc] → Any character except 'a', 'b', or 'c' (a ^ right after [ negates the class)
[a-z0-9] → Any lowercase letter or digit
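A quick Python check (standard re module) of a few classes from the list. re.findall() returns every single-character match, so you can see exactly what each class accepts.

```python
import re

word = "Regex 2025!"

print(re.findall(r"[aeiou]", word))       # ['e', 'e']
print(re.findall(r"[A-Z]", word))         # ['R']
print(re.findall(r"[0-9]", word))         # ['2', '0', '2', '5']
# Negated class: anything that is not a letter or digit.
print(re.findall(r"[^a-zA-Z0-9]", word))  # [' ', '!']
```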
Shorthand Character Classes
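\d → Any digit [0-9]
\D → Any non-digit [^0-9]
\w → Any word character [a-zA-Z0-9_]
\W → Any non-word character [^a-zA-Z0-9_]
\s → Any whitespace character (space, tab, newline, and similar)
\S → Any non-whitespace character
. → Any character except a newline (unless the s/dotall flag is set)
The bracket equivalents describe ASCII behavior. In Unicode-aware engines such as Python 3's re, \d and \w also match non-ASCII digits and letters.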
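A Python sketch (standard re module) of the shorthand classes. It also shows a Python-specific detail: on str patterns, \w is Unicode-aware by default, and the re.ASCII flag restricts it to [a-zA-Z0-9_].

```python
import re

text = "Order #42 shipped\tto café_7"

print(re.findall(r"\d", text))   # ['4', '2', '7']
print(re.findall(r"\s", text))   # [' ', ' ', '\t', ' ']
print(re.findall(r"\w+", text))  # ['Order', '42', 'shipped', 'to', 'café_7']

# With re.ASCII, "é" is no longer a word character, so "café_7" splits.
print(re.findall(r"\w+", text, flags=re.ASCII))
# ['Order', '42', 'shipped', 'to', 'caf', '_7']
```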
Anchors
Anchors do not match characters; they match positions:
^ → Start of the string (or of each line with the multiline flag)
$ → End of the string (or of each line with the multiline flag)
\b → Word boundary (a position between a word character and a non-word character, or between a word character and the start or end of the string)
\B → Any position that is not a word boundary
^hello → "hello" only at the start of the string
world$ → "world" only at the end of the string
^hello$ → Exactly "hello", with nothing before or after
\bcat\b → "cat" as a whole word (not the "cat" inside "concatenate")
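The anchor examples can be checked with Python's re module in a few lines. The patterns use raw strings (r"..."), because in an ordinary Python string \b would be a backspace character rather than a word boundary.

```python
import re

print(re.search(r"^hello", "hello world") is not None)  # True
print(re.search(r"^hello", "say hello") is not None)    # False
print(re.search(r"world$", "hello world") is not None)  # True
print(re.search(r"^hello$", "hello!") is not None)      # False

# \b needs a word/non-word transition, so "cat" inside a longer word is skipped.
print(re.findall(r"\bcat\b", "cat, concatenate, bobcat, cat"))  # ['cat', 'cat']
```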
Quantifiers
Quantifiers specify how many times a pattern should repeat:
* → Zero or more times
+ → One or more times
? → Zero or one time (optional)
{n} → Exactly n times
{n,} → At least n times
{n,m} → Between n and m times
\d+ → One or more digits
\d* → Zero or more digits
\d? → Zero or one digit
\d{4} → Exactly 4 digits (e.g., a year)
\d{2,4} → Between 2 and 4 digits
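A Python sketch (standard re module) of the quantifiers on one sample string. The \d{2,4} line shows that, without word boundaries, a bounded quantifier splits a longer number into several matches.

```python
import re

text = "Years: 7, 42, 1999, 123456"

print(re.findall(r"\d+", text))        # ['7', '42', '1999', '123456']
# \b on both sides: only standalone 4-digit numbers.
print(re.findall(r"\b\d{4}\b", text))  # ['1999']
# No boundaries: "123456" is cut into "1234" and "56".
print(re.findall(r"\d{2,4}", text))    # ['42', '1999', '1234', '56']
# ? makes the "u" optional.
print(re.findall(r"colou?r", "color colour"))  # ['color', 'colour']
```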
Greedy vs Lazy
By default, quantifiers are greedy: they match as much as possible. Add ? after any quantifier to make it lazy, so it matches as little as possible. For the input "<b>bold</b>":
<.+> → Greedy: matches the whole string "<b>bold</b>" as one match
<.+?> → Lazy: matches "<b>" and "</b>" separately
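The greedy and lazy examples can be reproduced with Python's re.findall():

```python
import re

html = "<b>bold</b>"

# Greedy: .+ runs to the last ">" in the string.
print(re.findall(r"<.+>", html))   # ['<b>bold</b>']
# Lazy: .+? stops at the first ">" that lets the match succeed.
print(re.findall(r"<.+?>", html))  # ['<b>', '</b>']
```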
Groups and Capturing
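Parentheses () create groups. A group captures the text it matched, so you can read it after the match or reuse it in a replacement (written $1, $2, ... in JavaScript and many other languages, and \1 in Python):
(\d{4})-(\d{2})-(\d{2}) → Captures year, month, and day from a date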
// Input: "2025-01-15"
// Group 1: "2025"
// Group 2: "01"
// Group 3: "15"
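A Python sketch of capturing groups with the standard re module. Python refers to groups in a replacement string as \1 (or \g<1>), not $1. The name DATE is illustrative.

```python
import re

DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")  # illustrative name

m = DATE.search("Released on 2025-01-15.")
print(m.group(0))  # 2025-01-15  (group 0 is the whole match)
print(m.groups())  # ('2025', '01', '15')

# Reorder the captured parts in a replacement: YYYY-MM-DD -> DD/MM/YYYY.
print(DATE.sub(r"\3/\2/\1", "Released on 2025-01-15."))  # Released on 15/01/2025.
```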
Non-Capturing Groups
(?:abc) → Groups without capturing: useful for applying a quantifier or alternation to several characters, but the group gets no number, so there is no $1 reference to it
Named Groups
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
// Named groups: year, month, day (Python's re module writes these as (?P<year>...) and so on)
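A Python sketch of non-capturing and named groups. Python's re module writes named groups as (?P<name>...), so the date pattern is adapted to that spelling. DATE and d are illustrative names.

```python
import re

# (?:ab)+ repeats "ab" as a unit without creating a group,
# so the only captured group is (c).
m = re.fullmatch(r"(?:ab)+(c)", "ababc")
print(m.groups())  # ('c',)

# Named groups in Python's (?P<name>...) spelling.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")  # illustrative name
d = DATE.fullmatch("2025-01-15")
print(d.group("year"))  # 2025
print(d.groupdict())    # {'year': '2025', 'month': '01', 'day': '15'}
```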
Alternation (OR)
cat|dog → Matches "cat" or "dog"
yes|no|maybe → Matches "yes", "no", or "maybe"
(jpg|png|gif) → Matches, and captures, "jpg", "png", or "gif" (e.g., image file extensions)
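A Python sketch (standard re module) of two alternation details. First, alternatives also match inside longer words. Second, | has the lowest precedence, so anchors apply only to the alternative they touch unless the alternatives are grouped.

```python
import re

# Alternatives are tried at every position, so they match inside words too.
print(re.findall(r"cat|dog", "hotdog and catfish"))  # ['dog', 'cat']

# ^yes|no|maybe$ means (^yes) OR (no anywhere) OR (maybe$),
# so "nope" slips through via the unanchored "no".
print(re.search(r"^yes|no|maybe$", "nope") is not None)      # True
# Grouping the alternatives makes both anchors apply to all of them.
print(re.search(r"^(?:yes|no|maybe)$", "nope") is not None)  # False
```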
Lookaheads and Lookbehinds
Zero-width assertions that check what comes before or after a position without consuming characters:
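\d+(?= dollars) → Positive lookahead: digits followed by " dollars"
\d+(?! dollars) → Negative lookahead: digits NOT followed by " dollars"
(?<=₹)\d+ → Positive lookbehind: digits preceded by "₹"
(?<!₹)\d+ → Negative lookbehind: digits NOT preceded by "₹"
Watch out for backtracking with the negative forms. In "100 dollars", \d+(?! dollars) still matches "10", because "10" is followed by "0". Writing \d+(?!\d| dollars) rejects the whole number, and (?<![₹\d])\d+ does the same for the lookbehind.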
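A Python sketch (standard re module) that runs the four lookarounds and shows the backtracking pitfall: a negative assertion can succeed on part of a number unless it also forbids an adjacent digit. Python requires lookbehind patterns to have a fixed width; single characters such as ₹ are fine.

```python
import re

text = "Paid 100 dollars and 250 euros"
print(re.findall(r"\d+(?= dollars)", text))     # ['100']
# "100" fails the assertion, but the engine backtracks to "10" (next char "0").
print(re.findall(r"\d+(?! dollars)", text))     # ['10', '250']
print(re.findall(r"\d+(?!\d| dollars)", text))  # ['250']

prices = "₹500 and $300"
print(re.findall(r"(?<=₹)\d+", prices))      # ['500']
# Same pitfall: "00" is preceded by "5", not by "₹".
print(re.findall(r"(?<!₹)\d+", prices))      # ['00', '300']
print(re.findall(r"(?<![₹\d])\d+", prices))  # ['300']
```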
Regex Flags
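g → Global: find all matches (not just the first)
i → Case-insensitive: the pattern hello also matches "Hello" and "HELLO"
m → Multiline: ^ and $ match at the start and end of each line
s → Dotall: . matches newlines too
These letters are the JavaScript spelling (as in /hello/gi). Other languages expose the same options differently, for example Python's re.IGNORECASE, re.MULTILINE, and re.DOTALL.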
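A Python sketch of the same options in the standard re module. Python spells them as constants (re.IGNORECASE, re.MULTILINE, re.DOTALL, with short aliases re.I, re.M, re.S) and has no g flag: re.findall() returns every match, while re.search() stops at the first.

```python
import re

text = "Hello\nhello world"

print(re.findall(r"hello", text))                 # ['hello']
print(re.findall(r"hello", text, re.IGNORECASE))  # ['Hello', 'hello']

# MULTILINE lets ^ match right after each newline.
print(re.findall(r"^\w+", text))                # ['Hello']
print(re.findall(r"^\w+", text, re.MULTILINE))  # ['Hello', 'hello']

# DOTALL lets "." match the newline.
print(re.search(r"Hello.hello", text) is not None)             # False
print(re.search(r"Hello.hello", text, re.DOTALL) is not None)  # True

# Flags combine with |, much like the letters in /.../im.
print(re.findall(r"^h\w+", text, re.I | re.M))  # ['Hello', 'hello']
```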
Practical Regex Examples
Email Validation
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
// Matches: arjun@example.com, user.name+tag@domain.co.in
// Rejects: @example.com, user@, plainaddress
// A practical shape check, not a full RFC 5322 validator
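A Python sketch that runs the email pattern against the samples listed above. is_email and EMAIL are hypothetical names. The helper uses fullmatch() because in Python a bare $ also matches just before a trailing newline.

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def is_email(s: str) -> bool:
    # Hypothetical helper. fullmatch() rejects a trailing "\n",
    # which "$" alone would accept in Python.
    return EMAIL.fullmatch(s) is not None

samples = ["arjun@example.com", "user.name+tag@domain.co.in",
           "@example.com", "user@", "plainaddress"]
for s in samples:
    print(s, is_email(s))
# True, True, False, False, False
```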
Indian Mobile Number
^[6-9]\d{9}$
// Matches: 9876543210, 6512345678
// Rejects: 1234567890 (doesn't start with 6-9), 98765432 (too short)
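A Python sketch that checks the listed numbers with this pattern. is_indian_mobile is a hypothetical helper name. The pattern accepts only the bare 10-digit form, so a number written with a +91 country code is rejected unless the prefix is stripped first.

```python
import re

MOBILE = re.compile(r"^[6-9]\d{9}$")

def is_indian_mobile(s: str) -> bool:
    # Hypothetical helper; fullmatch() also rejects a trailing "\n".
    return MOBILE.fullmatch(s) is not None

for s in ["9876543210", "6512345678", "1234567890", "98765432", "+919876543210"]:
    print(s, is_indian_mobile(s))
# True, True, False, False, False
```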
URL Validation
^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{2,256}\.[a-z]{2,6}\b[-a-zA-Z0-9@:%_+.~#?&\/=]*$
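A Python sketch that runs the URL pattern against a few illustrative strings; the samples and the name URL are assumptions, not from the original. The pattern checks shape only: it requires http or https, and it limits the top-level domain to 2 to 6 lowercase letters. For real URL handling, the standard urllib.parse module is the sturdier tool.

```python
import re

# Same pattern as above, split across two raw strings for readability.
URL = re.compile(
    r"^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{2,256}\.[a-z]{2,6}\b"
    r"[-a-zA-Z0-9@:%_+.~#?&\/=]*$"
)

for s in ["https://example.com", "http://www.example.org/path?q=1",
          "ftp://example.com", "https://example"]:
    print(s, URL.fullmatch(s) is not None)
# True, True, False, False
```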
Strong Password
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
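// At least 8 chars, 1 uppercase, 1 lowercase, 1 digit, 1 special char from @$!%*?&
// Only letters, digits, and @$!%*?& are allowed, so a password containing # or a space fails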
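A Python sketch that tests the password pattern on a few illustrative inputs (the sample passwords and the name STRONG are assumptions). Each (?=...) lookahead scans from the start without consuming anything. The final character class then decides which characters are allowed at all, which is why the last sample fails.

```python
import re

STRONG = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

for pw in ["Passw0rd!", "password1!", "Pa1!", "Passw0rd!#"]:
    print(pw, STRONG.fullmatch(pw) is not None)
# True   (all four requirements met)
# False  (no uppercase letter)
# False  (shorter than 8 characters)
# False  ("#" is not in the allowed set)
```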
Extract All HTML Tags
<[^>]+>
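A Python sketch of the tag pattern on two illustrative snippets (TAG is an illustrative name). The negated class [^>]+ avoids the greedy-dot problem, but a ">" inside an attribute value still ends the match early. For real HTML, Python's html.parser module is more robust.

```python
import re

TAG = re.compile(r"<[^>]+>")  # illustrative name

html = '<p class="intro">Hi <b>there</b></p>'
print(TAG.findall(html))
# ['<p class="intro">', '<b>', '</b>', '</p>']

# Limitation: the ">" inside the attribute value cuts the first tag short.
print(TAG.findall('<a title="1>0">link</a>'))
# ['<a title="1>', '</a>']
```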
Find IPv4 Addresses
\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b
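A Python sketch that scans a made-up log line with the IPv4 pattern (the log text and the name IPV4 are illustrative). Each octet alternative accepts only 0 to 255, so 256.1.1.1 is skipped. The [01]? branch also accepts leading zeros such as 010.

```python
import re

IPV4 = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}"
    r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\b"
)

log = "ok from 192.168.1.10, bad 256.1.1.1, dns 8.8.8.8"
# All groups are non-capturing, so findall() returns whole matches.
print(IPV4.findall(log))  # ['192.168.1.10', '8.8.8.8']
```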
Summary
Regular expressions are immensely powerful once you understand the building blocks: literals, character classes, quantifiers, groups, anchors, and flags. Practice with real text to build confidence. Use our free Regex Tester to test your patterns live in your browser and see all matches, capture groups, and positions instantly.