JavaScript Regular Expressions: Pattern Matching Guide
Master regular expressions in JavaScript. Learn pattern matching, regex methods, capturing groups, and advanced techniques.
By JavaScriptDoc Team
Regular expressions (regex) are powerful patterns used for matching, searching, and manipulating text. JavaScript provides robust regex support for text processing and validation.
Introduction to Regular Expressions
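A regular expression describes a set of strings. JavaScript lets you write one as a literal between slashes or build one with the RegExp constructor, and optional flags such as g (global) and i (case-insensitive) change how it matches.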
// Creating regular expressions
// Literal notation
const pattern1 = /hello/;
const pattern2 = /hello/gi; // With flags
// Constructor notation
const pattern3 = new RegExp('hello');
const pattern4 = new RegExp('hello', 'gi');
// Dynamic patterns
const searchTerm = 'world';
const dynamicPattern = new RegExp(searchTerm, 'i');
// Testing patterns
console.log(pattern1.test('hello world')); // true
console.log(pattern1.test('Hello world')); // false (case sensitive)
console.log(pattern2.test('Hello world')); // true (case insensitive)
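One practical difference between the two notations: the constructor receives an ordinary string, so every backslash has to be escaped once for the string parser before the regex engine sees it. A minimal sketch of that pitfall (the variable names are illustrative and not part of the examples above):

```javascript
// Literal notation: a single backslash is enough.
const digitsLiteral = /\d+/;

// Constructor notation: the pattern is a string, so the backslash is doubled.
const digitsConstructor = new RegExp('\\d+');

// String.raw keeps backslashes as written, which avoids the doubling.
const digitsRaw = new RegExp(String.raw`\d+`);

// With a single backslash, '\d' in a string is just 'd', so this is /d+/.
const digitsBroken = new RegExp('\d+');

console.log(digitsConstructor.source); // '\d+'
console.log(digitsRaw.source); // '\d+'
console.log(digitsBroken.source); // 'd+'
console.log(digitsBroken.test('123')); // false
```

Printing the source property is a quick way to check what pattern the engine actually received.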
Basic Pattern Matching
Character Classes
// Literal characters
/cat/.test('cat'); // true
/cat/.test('concatenate'); // true
// Character classes
/[aeiou]/.test('hello'); // true (contains vowel)
/[0-9]/.test('abc123'); // true (contains digit)
/[a-z]/.test('Hello'); // true (contains lowercase)
/[A-Z]/.test('Hello'); // true (contains uppercase)
/[a-zA-Z]/.test('123'); // false (no letters)
// Negated character classes
/[^aeiou]/.test('xyz'); // true (contains non-vowel)
/[^0-9]/.test('123'); // false (only digits)
// Predefined character classes
/\d/.test('123'); // true (digit)
/\D/.test('abc'); // true (non-digit)
/\w/.test('hello_123'); // true (word character)
/\W/.test('!@#'); // true (non-word character)
/\s/.test('hello world'); // true (whitespace)
/\S/.test(' '); // false (only whitespace)
// Dot matches any character except line terminators (\n, \r, \u2028, \u2029)
/./.test('a'); // true
/./.test('\n'); // false
/[\s\S]/.test('\n'); // true (matches anything including newline)
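As an alternative to the [\s\S] workaround, the s (dotAll) flag makes the dot match line terminators as well. A short sketch, assuming an ES2018 or newer runtime (the sample string is made up for illustration):

```javascript
const twoLines = 'first\nsecond';

// Without the s flag, the dot stops at the line break.
console.log(/first.second/.test(twoLines)); // false

// With the s (dotAll) flag, the dot also matches the line break.
console.log(/first.second/s.test(twoLines)); // true

// The [\s\S] class gives the same result without the flag.
console.log(/first[\s\S]second/.test(twoLines)); // true
```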
Quantifiers
// Basic quantifiers
/a*/.test(''); // true (0 or more)
/a+/.test(''); // false (1 or more)
/a?/.test(''); // true (0 or 1)
/a{3}/.test('aaa'); // true (exactly 3)
/a{2,4}/.test('aaa'); // true (2 to 4)
/a{2,}/.test('aa'); // true (2 or more)
// Greedy vs lazy quantifiers
const text = '<div>content</div>';
/<.*>/.exec(text)[0]; // '<div>content</div>' (greedy)
/<.*?>/.exec(text)[0]; // '<div>' (lazy)
// Common patterns
const patterns = {
// Phone number formats
phone: /\d{3}-\d{3}-\d{4}/,
phoneAlt: /\(\d{3}\) \d{3}-\d{4}/,
// Email (simplified)
email: /[\w._%+-]+@[\w.-]+\.[A-Za-z]{2,}/,
// URL (simplified)
url: /https?:\/\/(www\.)?[\w-]+(\.[\w-]+)+([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?/,
// Date formats
dateUS: /\d{1,2}\/\d{1,2}\/\d{4}/,
dateISO: /\d{4}-\d{2}-\d{2}/,
};
// Testing patterns
console.log(patterns.phone.test('123-456-7890')); // true
console.log(patterns.email.test('user@example.com')); // true
Anchors and Boundaries
// Start and end anchors
/^hello/.test('hello world'); // true (starts with)
/world$/.test('hello world'); // true (ends with)
/^hello$/.test('hello'); // true (exact match)
/^hello$/.test('hello world'); // false
// Word boundaries
/\bcat\b/.test('cat'); // true
/\bcat\b/.test('concatenate'); // false
/\bcat\b/.test('the cat sat'); // true
/\Bcat/.test('concatenate'); // true (non-boundary)
// Multiline mode
const multiline = `first line
second line
third line`;
// Without multiline flag
/^second/.test(multiline); // false
// With multiline flag
/^second/m.test(multiline); // true
/line$/m.test(multiline); // true
// Line break patterns
/\n/.test(multiline); // true
/\r\n|\r|\n/.test('Windows\r\nUnix\nMac\r'); // true
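Anchors are also what separate searching from validating. An unanchored pattern succeeds if a match occurs anywhere in the input, which is rarely what a validation check wants. A short sketch using the phone pattern from the Quantifiers section (the anchored variant and the variable names are additions for illustration):

```javascript
// Unanchored: succeeds if a phone number appears anywhere in the input.
const phoneAnywhere = /\d{3}-\d{3}-\d{4}/;

// Anchored: succeeds only if the whole input is a phone number.
const phoneExact = /^\d{3}-\d{3}-\d{4}$/;

const sentence = 'call 123-456-7890 now';

console.log(phoneAnywhere.test(sentence)); // true
console.log(phoneExact.test(sentence)); // false
console.log(phoneExact.test('123-456-7890')); // true
```

The unanchored form suits extraction from longer text; the anchored form suits form fields.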
Groups and Capturing
Basic Groups
// Capturing groups
const pattern = /(\d{4})-(\d{2})-(\d{2})/;
const match = pattern.exec('2024-03-15');
console.log(match[0]); // '2024-03-15' (full match)
console.log(match[1]); // '2024' (first group)
console.log(match[2]); // '03' (second group)
console.log(match[3]); // '15' (third group)
// Non-capturing groups
const nonCapturing = /(?:https?|ftp):\/\//;
'https://'.match(nonCapturing); // ['https://'] (no groups)
// Named capturing groups
const namedPattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const namedMatch = namedPattern.exec('2024-03-15');
console.log(namedMatch.groups.year); // '2024'
console.log(namedMatch.groups.month); // '03'
console.log(namedMatch.groups.day); // '15'
// Backreferences
const duplicate = /(\w+) \1/; // Matches repeated words
duplicate.test('the the'); // true
duplicate.test('the cat'); // false
// Named backreferences
const quote = /(?<quote>['"]).*?\k<quote>/;
quote.test('"hello"'); // true
quote.test("'world'"); // true
quote.test('"mixed'); // false
Alternation
// Basic alternation
/cat|dog/.test('cat'); // true
/cat|dog/.test('dog'); // true
/cat|dog/.test('bird'); // false
// Grouping with alternation
/gr(a|e)y/.test('gray'); // true
/gr(a|e)y/.test('grey'); // true
// Complex patterns
const fileExtension = /\.(jpg|jpeg|png|gif|webp)$/i;
fileExtension.test('image.jpg'); // true
fileExtension.test('image.PNG'); // true
fileExtension.test('image.pdf'); // false
// Multiple options
const sizes = /^(small|medium|large|x-large|xx-large)$/;
const colors = /^(red|green|blue|rgb\(\d+,\s*\d+,\s*\d+\)|#[0-9a-f]{6})$/i;
String Methods with Regex
Search and Match
// String.match()
const text = 'The price is $10.99 and $25.50';
// Without g flag - returns first match with groups
const firstMatch = text.match(/\$(\d+\.\d{2})/);
console.log(firstMatch[0]); // '$10.99'
console.log(firstMatch[1]); // '10.99'
// With g flag - returns all matches (no groups)
const allMatches = text.match(/\$\d+\.\d{2}/g);
console.log(allMatches); // ['$10.99', '$25.50']
// String.matchAll() - returns iterator with all matches and groups
const matches = [...text.matchAll(/\$(\d+\.\d{2})/g)];
matches.forEach((match) => {
console.log(`Found ${match[0]} with value ${match[1]}`);
});
// String.search() - returns index of first match
const index = text.search(/\$\d+/);
console.log(index); // 13
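Because matchAll() keeps groups and positions for every match, it combines well with named groups when the goal is structured data rather than raw strings. A minimal sketch under that assumption (the group names and the cents conversion are illustrative choices):

```javascript
const priceText = 'The price is $10.99 and $25.50';
const pricePattern = /\$(?<dollars>\d+)\.(?<cents>\d{2})/g;

// Map each match to a plain object; m.index is the match position.
const found = Array.from(priceText.matchAll(pricePattern), (m) => ({
  raw: m[0],
  index: m.index,
  cents: Number(m.groups.dollars) * 100 + Number(m.groups.cents),
}));

console.log(found);
// [
//   { raw: '$10.99', index: 13, cents: 1099 },
//   { raw: '$25.50', index: 24, cents: 2550 }
// ]
```

Working in integer cents sidesteps floating-point rounding when the values are later summed.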
Replace Operations
// Basic replace
const text = 'Hello World';
text.replace(/world/i, 'JavaScript'); // 'Hello JavaScript'
// Global replace
const repeated = 'cat cat cat';
repeated.replace(/cat/g, 'dog'); // 'dog dog dog'
// Using capture groups
const date = '2024-03-15';
date.replace(/(\d{4})-(\d{2})-(\d{2})/, '$2/$3/$1'); // '03/15/2024'
// Function replacer
const prices = 'Items cost $10, $20, and $30';
const updated = prices.replace(/\$(\d+)/g, (match, price) => {
return `$${parseInt(price) * 1.1}`; // 10% increase
});
console.log(updated); // 'Items cost $11, $22, and $33'
// Named groups in replacement
const swap = 'John Doe';
swap.replace(/(?<first>\w+) (?<last>\w+)/, '$<last>, $<first>'); // 'Doe, John'
// Complex transformations
function titleCase(str) {
return str.replace(/\b\w+/g, (word) => {
return word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();
});
}
console.log(titleCase('hello world from javascript')); // 'Hello World From Javascript'
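For plain global replacement there is also String.prototype.replaceAll(). The sketch below assumes an ES2021 or newer runtime; the sample strings are made up for illustration:

```javascript
const repeatedWords = 'cat cat cat';

// A string pattern is matched literally, so no escaping is needed.
console.log(repeatedWords.replaceAll('cat', 'dog')); // 'dog dog dog'
console.log('a.b.c'.replaceAll('.', '-')); // 'a-b-c'

// A regex pattern must carry the g flag.
console.log(repeatedWords.replaceAll(/cat/g, 'dog')); // 'dog dog dog'

// Without the g flag, replaceAll() throws a TypeError.
let threwTypeError = false;
try {
  repeatedWords.replaceAll(/cat/, 'dog');
} catch (error) {
  threwTypeError = error instanceof TypeError;
}
console.log(threwTypeError); // true
```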
Split Operations
// Basic split
'a,b,c'.split(/,/); // ['a', 'b', 'c']
'a, b , c'.split(/\s*,\s*/); // ['a', 'b', 'c'] (trim spaces)
// Split with limit
'a,b,c,d,e'.split(/,/, 3); // ['a', 'b', 'c']
// Split with capturing groups
'a1b2c3'.split(/(\d)/); // ['a', '1', 'b', '2', 'c', '3', '']
// Complex splitting
const text = 'Hello. How are you? I am fine!';
const sentences = text.split(/[.!?]+\s*/);
console.log(sentences); // ['Hello', 'How are you', 'I am fine', '']
// CSV parsing
function parseCSV(csv) {
return csv.split('\n').map((row) => {
return row.split(/,(?=(?:[^"]*"[^"]*")*[^"]*$)/);
});
}
const csv = 'name,age,city\n"Doe, John",30,"New York"';
console.log(parseCSV(csv));
Advanced Patterns
Lookahead and Lookbehind
// Positive lookahead (?=)
const passwordPattern = /^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$/;
// Requires uppercase, lowercase, digit, min 8 chars
passwordPattern.test('Pass123word'); // true
passwordPattern.test('password123'); // false (no uppercase)
// Negative lookahead (?!)
const notEndingWith = /\w+(?!\.com)/;
'example'.match(notEndingWith); // ['example']
'test.com'.match(notEndingWith); // ['tes'] (\w+ backtracks one character so the lookahead passes)
// Positive lookbehind (?<=)
const afterDollar = /(?<=\$)\d+\.\d{2}/;
'Price: $10.99'.match(afterDollar); // ['10.99']
// Negative lookbehind (?<!)
const notAfterDollar = /(?<!\$)\d+\.\d{2}/;
'10.99 vs $10.99'.match(notAfterDollar); // ['10.99'] (first one)
// Complex password validation
const strongPassword =
/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/;
// Email validation with lookahead
const emailPattern = /^(?!.*\.\.)([\w.%+-]+)@([\w.-]+\.[A-Za-z]{2,})$/;
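Where a lookahead sits matters. Placed after a greedy quantifier, a negative lookahead can be satisfied by backtracking, so it does not reliably reject an unwanted ending. Placing it at the start of an anchored pattern checks the whole string instead. A small sketch (both patterns are illustrative):

```javascript
// After a greedy quantifier, the engine backtracks until the lookahead passes.
const trailingCheck = /\w+(?!\.com)/;
console.log(trailingCheck.exec('test.com')[0]); // 'tes'

// Checking from the start of an anchored pattern closes that escape hatch.
const notDotCom = /^(?!.*\.com$)\w+(?:\.\w+)*$/;
console.log(notDotCom.test('example.org')); // true
console.log(notDotCom.test('example.com')); // false
console.log(notDotCom.test('example')); // true
```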
Unicode and Special Characters
// Unicode property escapes
/\p{Letter}/u.test('A'); // true
/\p{Number}/u.test('5'); // true
/\p{Emoji}/u.test('😀'); // true
/\p{Script=Greek}/u.test('Ω'); // true
// Unicode categories
/\p{Uppercase_Letter}/u.test('A'); // true
/\p{Lowercase_Letter}/u.test('a'); // true
/\p{Currency_Symbol}/u.test('$'); // true
// Matching emojis
const emojiPattern = /\p{Emoji_Presentation}/gu;
const text = 'Hello 😀 World 🌍!';
const emojis = text.match(emojiPattern);
console.log(emojis); // ['😀', '🌍']
// Escaping special characters
function escapeRegex(string) {
return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
const userInput = 'Price is $10.99 (on sale)';
const escaped = escapeRegex(userInput);
const pattern = new RegExp(escaped);
console.log(pattern.test('Price is $10.99 (on sale)')); // true
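The u flag does more than enable \p{...} escapes: it also makes the engine treat a character outside the Basic Multilingual Plane as one unit instead of two UTF-16 code units. A brief sketch of the difference (the variable name is illustrative):

```javascript
const face = '😀';

console.log(face.length); // 2 (two UTF-16 code units)

// Without u, the dot matches a single code unit, so one emoji is "two characters".
console.log(/^.$/.test(face)); // false

// With u, the dot matches the whole code point.
console.log(/^.$/u.test(face)); // true

// Code point escapes with braces also require the u flag.
console.log(/\u{1F600}/u.test(face)); // true

// Iterating with the u flag yields one match per code point.
console.log([...face.matchAll(/./gu)].length); // 1
```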
Practical Examples
Form Validation
class FormValidator {
constructor() {
this.patterns = {
email: /^[\w._%+-]+@[\w.-]+\.[A-Za-z]{2,}$/,
phone: /^\+?1?\s*\(?\d{3}\)?[-\s.]?\d{3}[-\s.]?\d{4}$/,
zip: /^\d{5}(-\d{4})?$/,
username: /^[a-zA-Z0-9_]{3,20}$/,
password:
/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/,
url: /^(https?:\/\/)?(www\.)?[\w-]+(\.[\w-]+)+([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?$/,
creditCard: /^\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}$/,
date: /^(0[1-9]|1[0-2])\/(0[1-9]|[12]\d|3[01])\/(19|20)\d{2}$/,
};
}
validate(type, value) {
if (!this.patterns[type]) {
throw new Error(`Unknown validation type: ${type}`);
}
return this.patterns[type].test(value);
}
getErrorMessage(type, value) {
if (this.validate(type, value)) return null;
const messages = {
email: 'Please enter a valid email address',
phone: 'Please enter a valid phone number',
zip: 'Please enter a valid ZIP code',
username:
'Username must be 3-20 characters, letters, numbers, and underscores only',
password:
'Password must be at least 8 characters with uppercase, lowercase, number, and special character',
url: 'Please enter a valid URL',
creditCard: 'Please enter a valid credit card number',
date: 'Please enter a date in MM/DD/YYYY format',
};
return messages[type] || 'Invalid input';
}
sanitize(type, value) {
switch (type) {
case 'phone':
return value.replace(/\D/g, '');
case 'creditCard':
return value.replace(/[\s-]/g, '');
default:
return value;
}
}
}
// Usage
const validator = new FormValidator();
console.log(validator.validate('email', 'user@example.com')); // true
console.log(validator.getErrorMessage('password', 'weak')); // 'Password must be at least 8 characters with uppercase, lowercase, number, and special character'
Text Processing
class TextProcessor {
// Extract all URLs from text
extractURLs(text) {
const urlPattern =
/https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)/gi;
return text.match(urlPattern) || [];
}
// Extract hashtags
extractHashtags(text) {
const hashtagPattern = /#\w+/g;
return text.match(hashtagPattern) || [];
}
// Extract mentions
extractMentions(text) {
const mentionPattern = /@\w+/g;
return text.match(mentionPattern) || [];
}
// Clean text
cleanText(text) {
return text
.replace(/[^\w\s]/gi, '') // Remove special characters
.replace(/\s+/g, ' ') // Normalize whitespace
.trim();
}
// Highlight search terms
highlight(text, searchTerms) {
const pattern = new RegExp(
`(${searchTerms.map(escapeRegex).join('|')})`,
'gi'
);
return text.replace(pattern, '<mark>$1</mark>');
}
// Word frequency
wordFrequency(text) {
const words = text.toLowerCase().match(/\b\w+\b/g) || [];
return words.reduce((freq, word) => {
freq[word] = (freq[word] || 0) + 1;
return freq;
}, {});
}
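// Smart truncate
truncate(text, maxLength, suffix = '...') {
if (text.length <= maxLength) return text;
// slice() replaces the deprecated substr(); clamp so the end index is never negative
const truncated = text.slice(0, Math.max(0, maxLength - suffix.length));
// Cut at the last complete word, or keep the hard cut when there is no space
const lastSpace = truncated.lastIndexOf(' ');
return (lastSpace > 0 ? truncated.slice(0, lastSpace) : truncated) + suffix;
}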
}
// Markdown parser example
class SimpleMarkdownParser {
constructor() {
this.rules = [
{ pattern: /^### (.+)$/gm, replacement: '<h3>$1</h3>' },
{ pattern: /^## (.+)$/gm, replacement: '<h2>$1</h2>' },
{ pattern: /^# (.+)$/gm, replacement: '<h1>$1</h1>' },
{ pattern: /\*\*(.+?)\*\*/g, replacement: '<strong>$1</strong>' },
{ pattern: /\*(.+?)\*/g, replacement: '<em>$1</em>' },
{
pattern: /\[([^\]]+)\]\(([^)]+)\)/g,
replacement: '<a href="$2">$1</a>',
},
{ pattern: /`([^`]+)`/g, replacement: '<code>$1</code>' },
{ pattern: /^- (.+)$/gm, replacement: '<li>$1</li>' },
];
}
parse(markdown) {
let html = markdown;
this.rules.forEach((rule) => {
html = html.replace(rule.pattern, rule.replacement);
});
// Wrap list items in ul
html = html.replace(/(<li>.*<\/li>\s*)+/g, (match) => {
return `<ul>${match}</ul>`;
});
// Convert line breaks to paragraphs
html = html
.split('\n\n')
.map((para) => {
if (!para.match(/^<[^>]+>/)) {
return `<p>${para}</p>`;
}
return para;
})
.join('\n');
return html;
}
}
Data Extraction
// Log file parser
class LogParser {
parseApacheLog(log) {
const pattern =
/^(\S+) \S+ \S+ \[([^\]]+)\] "(\w+) ([^"]+)" (\d+) (\d+|-) "([^"]*)" "([^"]*)"$/;
const match = log.match(pattern);
if (!match) return null;
return {
ip: match[1],
timestamp: match[2],
method: match[3],
path: match[4],
status: parseInt(match[5]),
size: match[6] === '-' ? 0 : parseInt(match[6]),
referer: match[7],
userAgent: match[8],
};
}
parseErrorLog(log) {
const pattern =
/^\[(\w+) (\w+) (\d+) ([\d:]+) (\d+)\] \[(\w+)\] \[client ([\d.]+)\] (.+)$/;
const match = log.match(pattern);
if (!match) return null;
return {
timestamp: `${match[2]} ${match[3]} ${match[4]} ${match[5]}`,
level: match[6],
clientIP: match[7],
message: match[8],
};
}
}
// CSV parser with quoted fields
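function parseCSVLine(line) {
// Prefix a comma so every field, including an empty first one, starts with a
// delimiter. Each match then consumes at least one character, so the exec()
// loop cannot stall on a zero-length match (for example on '' or ',a').
const input = ',' + line;
const pattern = /,("(?:[^"]|"")*"|[^,]*)/g;
const fields = [];
let match;
while ((match = pattern.exec(input)) !== null) {
let field = match[1];
if (field.startsWith('"') && field.endsWith('"')) {
// Strip the surrounding quotes and unescape doubled quotes
field = field.slice(1, -1).replace(/""/g, '"');
}
fields.push(field);
}
return fields;
}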
// JSON extractor from mixed content
function extractJSON(text) {
const jsonPattern = /{[^{}]*(?:{[^{}]*}[^{}]*)*}/g;
const matches = text.match(jsonPattern) || [];
const validJSON = [];
matches.forEach((match) => {
try {
const parsed = JSON.parse(match);
validJSON.push(parsed);
} catch (e) {
// Not valid JSON
}
});
return validJSON;
}
Performance Optimization
// Compile once, use many times
class RegexCache {
constructor() {
this.cache = new Map();
}
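get(pattern, flags = '') {
const key = `${pattern}:::${flags}`;
if (!this.cache.has(key)) {
this.cache.set(key, new RegExp(pattern, flags));
}
const regex = this.cache.get(key);
// A shared regex with the g or y flag keeps lastIndex between calls; reset it
regex.lastIndex = 0;
return regex;
}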
test(pattern, string, flags) {
return this.get(pattern, flags).test(string);
}
exec(pattern, string, flags) {
return this.get(pattern, flags).exec(string);
}
}
// Optimize complex patterns
function optimizeEmailValidation(email) {
// Quick checks before regex
if (!email || email.length < 3) return false;
if (!email.includes('@')) return false;
if (email.startsWith('@') || email.endsWith('@')) return false;
// Now use regex for detailed validation
return /^[\w._%+-]+@[\w.-]+\.[A-Za-z]{2,}$/.test(email);
}
// Benchmark regex operations
function benchmarkRegex(pattern, testStrings, iterations = 10000) {
const regex = new RegExp(pattern);
const results = {};
testStrings.forEach((str) => {
const start = performance.now();
for (let i = 0; i < iterations; i++) {
regex.test(str);
}
const end = performance.now();
results[str] = (end - start) / iterations;
});
return results;
}
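One performance risk that a benchmark like the one above can help expose is catastrophic backtracking, where nested quantifiers make a failing match take exponentially longer as the input grows. The patterns below are contrived examples rather than patterns from this guide, and the timings depend on the engine and machine, so no timing output is shown:

```javascript
// Contrived example: nested quantifiers let a backtracking engine split a run
// of "a" characters in exponentially many ways before reporting a failure.
const nested = /^(a+)+$/;

// Same accepted strings without the nesting, so failure is found quickly.
const flat = /^a+$/;

// A near-match is the slow case: many "a" characters, then one bad character.
const nearMatch = 'a'.repeat(20) + '!';

function timeTest(regex, input) {
  const start = performance.now();
  const result = regex.test(input);
  return { result, ms: performance.now() - start };
}

const nestedRun = timeTest(nested, nearMatch);
const flatRun = timeTest(flat, nearMatch);

console.log(nestedRun.result, flatRun.result); // false false
console.log(nestedRun.ms, flatRun.ms); // timings vary by engine and machine
```

In typical backtracking engines each extra "a" roughly doubles the work for the nested pattern, which is why such patterns are risky on untrusted input.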
Best Practices
- Use the right tool
// Simple string operations are faster
// Bad: Using regex for simple checks
/^hello/.test(str);
// Good: Using string methods
str.startsWith('hello');
- Escape user input
function escapeRegex(str) {
return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
const userInput = 'user.name';
const pattern = new RegExp(escapeRegex(userInput));
- Use non-capturing groups when appropriate
// Bad: Capturing when not needed
/(https?|ftp):\/\//
// Good: Non-capturing group
/(?:https?|ftp):\/\//
- Be careful with global flag
const regex = /test/g;
console.log(regex.test('test')); // true
console.log(regex.test('test')); // false (lastIndex changed)
// Reset lastIndex or create new regex
regex.lastIndex = 0;
Conclusion
Regular expressions are powerful tools for text processing:
- Pattern matching for validation and search
- Text manipulation with replace and split
- Data extraction from structured text
- Advanced features like lookarounds and Unicode
- Performance techniques such as caching compiled patterns and cheap pre-checks
Key takeaways:
- Start simple and build complexity gradually
- Test patterns thoroughly with edge cases
- Use online regex testers for debugging
- Consider performance for complex patterns
- Escape user input before building a pattern from it, to prevent regex injection
- Document complex patterns for maintainability
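Two of these takeaways, testing with edge cases and documenting complex patterns, can be sketched together. The example below builds a date pattern from named parts and checks it against a small table of inputs; the part names and the test cases are hypothetical choices for illustration:

```javascript
// Each part is named and can be read (and commented) on its own.
const year = String.raw`(?<year>\d{4})`;
const month = String.raw`(?<month>0[1-9]|1[0-2])`; // 01 to 12
const day = String.raw`(?<day>0[1-9]|[12]\d|3[01])`; // 01 to 31

const isoDate = new RegExp(`^${year}-${month}-${day}$`);

// Table of [input, expected result], including edge cases.
const cases = [
  ['2024-03-15', true],
  ['2024-13-01', false], // month out of range
  ['2024-00-10', false], // month zero
  ['2024-3-15', false], // missing leading zero
  ['x2024-03-15', false], // leading junk
  ['', false], // empty input
];

const failures = cases.filter(
  ([input, expected]) => isoDate.test(input) !== expected
);

console.log(failures.length); // 0
```

The pattern checks format and ranges only; it does not know that February has fewer than 31 days, which is the kind of limit worth writing down next to the pattern.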
Master regular expressions to handle complex text processing tasks efficiently!