JavaScript Regular Expressions: Pattern Matching Guide

Regular expressions (regex) are patterns for matching, searching, and manipulating text. JavaScript supports them through regex literals, the RegExp constructor, and String methods such as match, replace, and split.

Introduction to Regular Expressions

Regular expressions are patterns that describe sets of strings. They're used for pattern matching, validation, and text manipulation.

// Creating regular expressions
// Literal notation
const pattern1 = /hello/;
const pattern2 = /hello/gi; // With flags

// Constructor notation
const pattern3 = new RegExp('hello');
const pattern4 = new RegExp('hello', 'gi');

// Dynamic patterns (escape regex metacharacters first if the term comes from user input)
const searchTerm = 'world';
const dynamicPattern = new RegExp(searchTerm, 'i');

// Testing patterns
console.log(pattern1.test('hello world')); // true
console.log(pattern1.test('Hello world')); // false (case sensitive)
console.log(pattern2.test('Hello world')); // true (case insensitive)
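
One difference between the two notations is easy to miss: the constructor takes an ordinary string, so each backslash meant for the regex engine has to be doubled. The following is a minimal sketch of that behavior; the variable names are made up for this example.

```javascript
// Literal notation: backslash sequences are written once
const digitsLiteral = /\d+/;

// Constructor notation: the string parser consumes one level of backslashes,
// so '\\d' is needed to deliver \d to the regex engine
const digitsFromString = new RegExp('\\d+');

// With a single backslash the string parser drops it: '\d' is just 'd'
const accidental = new RegExp('\d+');

console.log(digitsLiteral.source); // \d+
console.log(digitsFromString.source); // \d+
console.log(accidental.source); // d+

console.log(digitsFromString.test('abc123')); // true
console.log(accidental.test('abc123')); // false (looks for the letter "d")
console.log(accidental.test('add')); // true
```

When the pattern is fixed at write time, literal notation avoids the issue entirely; the constructor is mainly useful when part of the pattern is only known at run time.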

Basic Pattern Matching

Character Classes

// Literal characters
/cat/.test('cat'); // true
/cat/.test('concatenate'); // true

// Character classes
/[aeiou]/.test('hello'); // true (contains vowel)
/[0-9]/.test('abc123'); // true (contains digit)
/[a-z]/.test('Hello'); // true (contains lowercase)
/[A-Z]/.test('Hello'); // true (contains uppercase)
/[a-zA-Z]/.test('123'); // false (no letters)

// Negated character classes
/[^aeiou]/.test('xyz'); // true (contains non-vowel)
/[^0-9]/.test('123'); // false (only digits)

// Predefined character classes
/\d/.test('123'); // true (digit)
/\D/.test('abc'); // true (non-digit)
/\w/.test('hello_123'); // true (word character)
/\W/.test('!@#'); // true (non-word character)
/\s/.test('hello world'); // true (whitespace)
/\S/.test('   '); // false (only whitespace)

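// Dot matches any character except line terminators (\n, \r, \u2028, \u2029)
/./.test('a'); // true
/./.test('\n'); // false
/[\s\S]/.test('\n'); // true (matches anything including newline)
/./s.test('\n'); // true (the s flag, dotAll, lets the dot match line terminators)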

Quantifiers

// Basic quantifiers
/a*/.test(''); // true (0 or more)
/a+/.test(''); // false (1 or more)
/a?/.test(''); // true (0 or 1)
/a{3}/.test('aaa'); // true (exactly 3)
/a{2,4}/.test('aaa'); // true (2 to 4)
/a{2,}/.test('aa'); // true (2 or more)

// Greedy vs lazy quantifiers
const text = '<div>content</div>';
/<.*>/.exec(text)[0]; // '<div>content</div>' (greedy)
/<.*?>/.exec(text)[0]; // '<div>' (lazy)

// Common patterns
const patterns = {
  // Phone number formats
  phone: /\d{3}-\d{3}-\d{4}/,
  phoneAlt: /\(\d{3}\) \d{3}-\d{4}/,

  // Email (simplified)
  email: /[\w._%+-]+@[\w.-]+\.[A-Za-z]{2,}/,

  // URL (simplified)
  url: /https?:\/\/(www\.)?[\w-]+(\.[\w-]+)+([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?/,

  // Date formats
  dateUS: /\d{1,2}\/\d{1,2}\/\d{4}/,
  dateISO: /\d{4}-\d{2}-\d{2}/,
};

// Testing patterns
console.log(patterns.phone.test('123-456-7890')); // true
console.log(patterns.email.test('user@example.com')); // true

Anchors and Boundaries

// Start and end anchors
/^hello/.test('hello world'); // true (starts with)
/world$/.test('hello world'); // true (ends with)
/^hello$/.test('hello'); // true (exact match)
/^hello$/.test('hello world'); // false

// Word boundaries
/\bcat\b/.test('cat'); // true
/\bcat\b/.test('concatenate'); // false
/\bcat\b/.test('the cat sat'); // true
/\Bcat/.test('concatenate'); // true (non-boundary)

// Multiline mode
const multiline = `first line
second line
third line`;

// Without multiline flag
/^second/.test(multiline); // false

// With multiline flag
/^second/m.test(multiline); // true
/line$/m.test(multiline); // true

// Line break patterns
/\n/.test(multiline); // true
/\r\n|\r|\n/.test('Windows\r\nUnix\nMac\r'); // true
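
Anchors matter most when a pattern is used for validation: without them, test() succeeds as soon as the pattern matches anywhere in the input. The sketch below reuses the phone format from the Quantifiers section; the variable names are hypothetical.

```javascript
// Same phone format, with and without anchors
const phoneAnywhere = /\d{3}-\d{3}-\d{4}/;
const phoneExact = /^\d{3}-\d{3}-\d{4}$/;

const message = 'call 123-456-7890 now';

console.log(phoneAnywhere.test(message)); // true (a match exists somewhere)
console.log(phoneExact.test(message)); // false (extra text around the number)
console.log(phoneExact.test('123-456-7890')); // true

// The unanchored form is still the right choice for extraction
console.log(message.match(phoneAnywhere)[0]); // '123-456-7890'
```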

Groups and Capturing

Basic Groups

// Capturing groups
const pattern = /(\d{4})-(\d{2})-(\d{2})/;
const match = pattern.exec('2024-03-15');
console.log(match[0]); // '2024-03-15' (full match)
console.log(match[1]); // '2024' (first group)
console.log(match[2]); // '03' (second group)
console.log(match[3]); // '15' (third group)

// Non-capturing groups
const nonCapturing = /(?:https?|ftp):\/\//;
'https://'.match(nonCapturing); // ['https://'] (no groups)

// Named capturing groups
const namedPattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const namedMatch = namedPattern.exec('2024-03-15');
console.log(namedMatch.groups.year); // '2024'
console.log(namedMatch.groups.month); // '03'
console.log(namedMatch.groups.day); // '15'

// Backreferences
const duplicate = /(\w+) \1/; // Matches repeated words
duplicate.test('the the'); // true
duplicate.test('the cat'); // false

// Named backreferences
const quote = /(?<quote>['"]).*?\k<quote>/;
quote.test('"hello"'); // true
quote.test("'world'"); // true
quote.test(`"mixed'`); // false (closing quote must match the opening one)
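
Named groups combine well with destructuring when a match needs to be turned into a structured value. The helper below is a sketch under that assumption; parseIsoDate is a hypothetical name, and it only checks the shape of the input, not whether the date exists.

```javascript
const isoDate = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;

// Returns numeric date parts, or null when the input is not in YYYY-MM-DD form
function parseIsoDate(input) {
  const match = isoDate.exec(input);
  if (match === null) return null;

  // match.groups holds one string property per named group
  const { year, month, day } = match.groups;
  return { year: Number(year), month: Number(month), day: Number(day) };
}

console.log(parseIsoDate('2024-03-15')); // { year: 2024, month: 3, day: 15 }
console.log(parseIsoDate('15/03/2024')); // null
```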

Alternation

// Basic alternation
/cat|dog/.test('cat'); // true
/cat|dog/.test('dog'); // true
/cat|dog/.test('bird'); // false

// Grouping with alternation
/gr(a|e)y/.test('gray'); // true
/gr(a|e)y/.test('grey'); // true

// Complex patterns
const fileExtension = /\.(jpg|jpeg|png|gif|webp)$/i;
fileExtension.test('image.jpg'); // true
fileExtension.test('image.PNG'); // true
fileExtension.test('image.pdf'); // false

// Multiple options
const sizes = /^(small|medium|large|x-large|xx-large)$/;
const colors = /^(red|green|blue|rgb\(\d+,\s*\d+,\s*\d+\)|#[0-9a-f]{6})$/i;
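
The order of alternatives can change the result, because the engine tries them left to right and keeps the first one that lets the overall match succeed. A small sketch (the pattern names are made up for illustration):

```javascript
const shortFirst = /cat|category/;
const longFirst = /category|cat/;

// The first alternative that matches wins, even if a longer one exists
console.log('category'.match(shortFirst)[0]); // 'cat'
console.log('category'.match(longFirst)[0]); // 'category'

// With anchors, 'cat' cannot satisfy $, so the engine falls back to 'category'
console.log(/^(?:cat|category)$/.test('category')); // true
```

This is why the anchored sizes and colors patterns above are not sensitive to ordering, while an unanchored list of overlapping options usually is.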

String Methods with Regex

Search and Match

// String.match()
const text = 'The price is $10.99 and $25.50';

// Without g flag - returns first match with groups
const firstMatch = text.match(/\$(\d+\.\d{2})/);
console.log(firstMatch[0]); // '$10.99'
console.log(firstMatch[1]); // '10.99'

// With g flag - returns all matches (no groups)
const allMatches = text.match(/\$\d+\.\d{2}/g);
console.log(allMatches); // ['$10.99', '$25.50']

// String.matchAll() - returns iterator with all matches and groups
const matches = [...text.matchAll(/\$(\d+\.\d{2})/g)];
matches.forEach((match) => {
  console.log(`Found ${match[0]} with value ${match[1]}`);
});

// String.search() - returns index of first match
const index = text.search(/\$\d+/);
console.log(index); // 13
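
Each item produced by matchAll() is a full match object, so it carries the match position and any named groups in addition to the matched text. The following sketch assumes the same price sentence as above; the variable names are illustrative.

```javascript
const priceText = 'The price is $10.99 and $25.50';

// matchAll() requires the g flag and throws a TypeError without it
const pricePattern = /\$(?<amount>\d+\.\d{2})/g;

// Array.from accepts the iterator plus a mapping function
const found = Array.from(priceText.matchAll(pricePattern), (m) => ({
  text: m[0],
  amount: Number(m.groups.amount),
  index: m.index,
}));

console.log(found);
// [
//   { text: '$10.99', amount: 10.99, index: 13 },
//   { text: '$25.50', amount: 25.5, index: 24 }
// ]
```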

Replace Operations

// Basic replace
const text = 'Hello World';
text.replace(/world/i, 'JavaScript'); // 'Hello JavaScript'

// Global replace
const repeated = 'cat cat cat';
repeated.replace(/cat/g, 'dog'); // 'dog dog dog'

// Using capture groups
const date = '2024-03-15';
date.replace(/(\d{4})-(\d{2})-(\d{2})/, '$2/$3/$1'); // '03/15/2024'

// Function replacer
const prices = 'Items cost $10, $20, and $30';
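const updated = prices.replace(/\$(\d+)/g, (match, price) => {
  // Round so floating-point error (10 * 1.1 === 11.000000000000002) does not leak into the output
  return `$${Math.round(parseInt(price, 10) * 1.1)}`; // 10% increase
});
console.log(updated); // 'Items cost $11, $22, and $33'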

// Named groups in replacement
const swap = 'John Doe';
swap.replace(/(?<first>\w+) (?<last>\w+)/, '$<last>, $<first>'); // 'Doe, John'

// Complex transformations
function titleCase(str) {
  return str.replace(/\b\w+/g, (word) => {
    return word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();
  });
}

console.log(titleCase('hello world from javascript')); // 'Hello World From Javascript'

Split Operations

// Basic split
'a,b,c'.split(/,/); // ['a', 'b', 'c']
'a, b , c'.split(/\s*,\s*/); // ['a', 'b', 'c'] (trim spaces)

// Split with limit
'a,b,c,d,e'.split(/,/, 3); // ['a', 'b', 'c']

// Split with capturing groups
'a1b2c3'.split(/(\d)/); // ['a', '1', 'b', '2', 'c', '3', '']

// Complex splitting
const text = 'Hello. How are you? I am fine!';
const sentences = text.split(/[.!?]+\s*/);
console.log(sentences); // ['Hello', 'How are you', 'I am fine', '']

// CSV parsing
function parseCSV(csv) {
  return csv.split('\n').map((row) => {
    return row.split(/,(?=(?:[^"]*"[^"]*")*[^"]*$)/);
  });
}

const csv = 'name,age,city\n"Doe, John",30,"New York"';
console.log(parseCSV(csv));
// [['name', 'age', 'city'], ['"Doe, John"', '30', '"New York"']] (quotes are kept)

Advanced Patterns

Lookahead and Lookbehind

// Positive lookahead (?=)
const passwordPattern = /^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$/;
// Requires uppercase, lowercase, digit, min 8 chars
passwordPattern.test('Pass123word'); // true
passwordPattern.test('password123'); // false (no uppercase)

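// Negative lookahead (?!)
const notEndingWith = /\w+(?!\.com)/;
'example'.match(notEndingWith); // ['example']
'test.com'.match(notEndingWith); // ['tes'] (\w+ backtracks one character so the lookahead passes)

// Word boundaries stop the engine from shortening the word
const wordNotBeforeCom = /\b\w+\b(?!\.com)/;
'test.com'.match(wordNotBeforeCom); // ['com'] ('test' is rejected)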

// Positive lookbehind (?<=)
const afterDollar = /(?<=\$)\d+\.\d{2}/;
'Price: $10.99'.match(afterDollar); // ['10.99']

// Negative lookbehind (?<!)
const notAfterDollar = /(?<!\$)\d+\.\d{2}/;
'10.99 vs $10.99'.match(notAfterDollar); // ['10.99'] (first one)

// Complex password validation
const strongPassword =
  /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/;

// Email validation with lookahead
const emailPattern = /^(?!.*\.\.)([\w.%+-]+)@([\w.-]+\.[A-Za-z]{2,})$/;
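
Because lookarounds match a position without consuming characters, they can also be used to insert text between characters. A common illustration is adding thousands separators; the sketch below assumes the input is a plain string of digits with no decimal part, and addThousandsSeparators is a hypothetical helper name.

```javascript
function addThousandsSeparators(digits) {
  // \B           : a position between two word characters (never the start)
  // (?=(\d{3})+  : followed by one or more complete groups of three digits
  // (?!\d))      : and then no further digit
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
}

console.log(addThousandsSeparators('1234567')); // '1,234,567'
console.log(addThousandsSeparators('1000')); // '1,000'
console.log(addThousandsSeparators('999')); // '999'
```

The pattern matches only empty positions, so replace() inserts a comma at each one and leaves every digit in place.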

Unicode and Special Characters

// Unicode property escapes
/\p{Letter}/u.test('A'); // true
/\p{Number}/u.test('5'); // true
/\p{Emoji}/u.test('😀'); // true
/\p{Script=Greek}/u.test('Ω'); // true

// Unicode categories
/\p{Uppercase_Letter}/u.test('A'); // true
/\p{Lowercase_Letter}/u.test('a'); // true
/\p{Currency_Symbol}/u.test('$'); // true

// Matching emojis
const emojiPattern = /\p{Emoji_Presentation}/gu;
const text = 'Hello 😀 World 🌍!';
const emojis = text.match(emojiPattern);
console.log(emojis); // ['😀', '🌍']

// Escaping special characters
function escapeRegex(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = 'Price is $10.99 (on sale)';
const escaped = escapeRegex(userInput);
const pattern = new RegExp(escaped);
console.log(pattern.test('Price is $10.99 (on sale)')); // true
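
Unicode property escapes are also a way around the fact that \w only covers ASCII letters, digits, and underscore. The comparison below is a sketch; the sample sentence is written with escape sequences so that it does not depend on how this file is encoded.

```javascript
// 'Café naïve Ωmega' spelled with escapes for the non-ASCII letters
const sentence = 'Caf\u00e9 na\u00efve \u03a9mega';

// \w stops at every non-ASCII letter
const asciiWords = sentence.match(/\w+/g);
console.log(asciiWords); // ['Caf', 'na', 've', 'mega']

// Letters and numbers from any script, plus underscore (requires the u flag)
const unicodeWords = sentence.match(/[\p{L}\p{N}_]+/gu);
console.log(unicodeWords); // ['Café', 'naïve', 'Ωmega']
```

Text that uses combining marks instead of precomposed letters would additionally need \p{M} in the character class.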

Practical Examples

Form Validation

class FormValidator {
  constructor() {
    this.patterns = {
      email: /^[\w._%+-]+@[\w.-]+\.[A-Za-z]{2,}$/,
      phone: /^\+?1?\s*\(?\d{3}\)?[-\s.]?\d{3}[-\s.]?\d{4}$/,
      zip: /^\d{5}(-\d{4})?$/,
      username: /^[a-zA-Z0-9_]{3,20}$/,
      password:
        /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/,
      url: /^(https?:\/\/)?(www\.)?[\w-]+(\.[\w-]+)+([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?$/,
      creditCard: /^\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}$/,
      date: /^(0[1-9]|1[0-2])\/(0[1-9]|[12]\d|3[01])\/(19|20)\d{2}$/,
    };
  }

  validate(type, value) {
    if (!this.patterns[type]) {
      throw new Error(`Unknown validation type: ${type}`);
    }
    return this.patterns[type].test(value);
  }

  getErrorMessage(type, value) {
    if (this.validate(type, value)) return null;

    const messages = {
      email: 'Please enter a valid email address',
      phone: 'Please enter a valid phone number',
      zip: 'Please enter a valid ZIP code',
      username:
        'Username must be 3-20 characters, letters, numbers, and underscores only',
      password:
        'Password must be at least 8 characters with uppercase, lowercase, number, and special character',
      url: 'Please enter a valid URL',
      creditCard: 'Please enter a valid credit card number',
      date: 'Please enter a date in MM/DD/YYYY format',
    };

    return messages[type] || 'Invalid input';
  }

  sanitize(type, value) {
    switch (type) {
      case 'phone':
        return value.replace(/\D/g, '');
      case 'creditCard':
        return value.replace(/[\s-]/g, '');
      default:
        return value;
    }
  }
}

// Usage
const validator = new FormValidator();
console.log(validator.validate('email', 'user@example.com')); // true
console.log(validator.getErrorMessage('password', 'weak')); // Error message
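
A pattern like the date rule above checks the shape of the input, not whether the value makes sense: 02/31/2024 passes even though the date does not exist. One possible way to close that gap is to follow the regex with a Date round trip. The sketch below uses a variant of the date pattern that captures the full year; usDate and isRealDate are hypothetical names, not part of FormValidator.

```javascript
// MM/DD/YYYY with the whole year in group 3
const usDate = /^(0[1-9]|1[0-2])\/(0[1-9]|[12]\d|3[01])\/((?:19|20)\d{2})$/;

function isRealDate(value) {
  const match = usDate.exec(value);
  if (match === null) return false;

  const month = Number(match[1]);
  const day = Number(match[2]);
  const year = Number(match[3]);

  // Date.UTC rolls impossible dates forward (Feb 31 lands in March),
  // so a round trip that changes any part means the date does not exist
  const date = new Date(Date.UTC(year, month - 1, day));
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

console.log(usDate.test('02/31/2024')); // true (shape is fine)
console.log(isRealDate('02/31/2024')); // false
console.log(isRealDate('02/29/2024')); // true (leap year)
console.log(isRealDate('02/29/2023')); // false
```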

Text Processing

class TextProcessor {
  // Extract all URLs from text
  extractURLs(text) {
    const urlPattern =
      /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)/gi;
    return text.match(urlPattern) || [];
  }

  // Extract hashtags
  extractHashtags(text) {
    const hashtagPattern = /#\w+/g;
    return text.match(hashtagPattern) || [];
  }

  // Extract mentions
  extractMentions(text) {
    const mentionPattern = /@\w+/g;
    return text.match(mentionPattern) || [];
  }

  // Clean text
  cleanText(text) {
    return text
      .replace(/[^\w\s]/gi, '') // Remove special characters
      .replace(/\s+/g, ' ') // Normalize whitespace
      .trim();
  }

  // Highlight search terms
  highlight(text, searchTerms) {
    const pattern = new RegExp(
      `(${searchTerms.map(escapeRegex).join('|')})`,
      'gi'
    );
    return text.replace(pattern, '<mark>$1</mark>');
  }

  // Word frequency
  wordFrequency(text) {
    const words = text.toLowerCase().match(/\b\w+\b/g) || [];
    return words.reduce((freq, word) => {
      freq[word] = (freq[word] || 0) + 1;
      return freq;
    }, {});
  }

  // Smart truncate
  truncate(text, maxLength, suffix = '...') {
    if (text.length <= maxLength) return text;

    const truncated = text.slice(0, maxLength - suffix.length);
    // Cut at the last complete word; if there is no space, keep the hard cut
    const lastSpace = truncated.lastIndexOf(' ');
    return (lastSpace > 0 ? truncated.slice(0, lastSpace) : truncated) + suffix;
  }
}

// Markdown parser example
class SimpleMarkdownParser {
  constructor() {
    this.rules = [
      { pattern: /^### (.+)$/gm, replacement: '<h3>$1</h3>' },
      { pattern: /^## (.+)$/gm, replacement: '<h2>$1</h2>' },
      { pattern: /^# (.+)$/gm, replacement: '<h1>$1</h1>' },
      { pattern: /\*\*(.+?)\*\*/g, replacement: '<strong>$1</strong>' },
      { pattern: /\*(.+?)\*/g, replacement: '<em>$1</em>' },
      {
        pattern: /\[([^\]]+)\]\(([^)]+)\)/g,
        replacement: '<a href="$2">$1</a>',
      },
      { pattern: /`([^`]+)`/g, replacement: '<code>$1</code>' },
      { pattern: /^- (.+)$/gm, replacement: '<li>$1</li>' },
    ];
  }

  parse(markdown) {
    let html = markdown;

    this.rules.forEach((rule) => {
      html = html.replace(rule.pattern, rule.replacement);
    });

    // Wrap list items in ul
    html = html.replace(/(<li>.*<\/li>\s*)+/g, (match) => {
      return `<ul>${match}</ul>`;
    });

    // Convert line breaks to paragraphs
    html = html
      .split('\n\n')
      .map((para) => {
        if (!para.match(/^<[^>]+>/)) {
          return `<p>${para}</p>`;
        }
        return para;
      })
      .join('\n');

    return html;
  }
}

Data Extraction

// Log file parser
class LogParser {
  parseApacheLog(log) {
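    // The request line is "METHOD PATH PROTOCOL"; the protocol is matched but not captured,
    // so group 4 holds only the path
    const pattern =
      /^(\S+) \S+ \S+ \[([^\]]+)\] "(\w+) ([^\s"]+)(?: [^"]*)?" (\d+) (\d+|-) "([^"]*)" "([^"]*)"$/;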
    const match = log.match(pattern);

    if (!match) return null;

    return {
      ip: match[1],
      timestamp: match[2],
      method: match[3],
      path: match[4],
      status: parseInt(match[5]),
      size: match[6] === '-' ? 0 : parseInt(match[6]),
      referer: match[7],
      userAgent: match[8],
    };
  }

  parseErrorLog(log) {
    const pattern =
      /^\[(\w+) (\w+) (\d+) ([\d:]+) (\d+)\] \[(\w+)\] \[client ([\d.]+)\] (.+)$/;
    const match = log.match(pattern);

    if (!match) return null;

    return {
      timestamp: `${match[2]} ${match[3]} ${match[4]} ${match[5]}`,
      level: match[6],
      clientIP: match[7],
      message: match[8],
    };
  }
}

// CSV parser with quoted fields
function parseCSVLine(line) {
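  // Each match is one field plus the separator that ends it (a comma or end of line).
  // Matching the trailing separator instead of a leading one avoids a zero-length
  // match at index 0, which would make the exec loop spin forever on '' or ',a'.
  const pattern = /("(?:[^"]|"")*"|[^,]*)(,|$)/g;
  const fields = [];
  let match;

  while ((match = pattern.exec(line)) !== null) {
    let field = match[1];
    if (field.length > 1 && field.startsWith('"') && field.endsWith('"')) {
      // Strip the surrounding quotes and unescape doubled quotes
      field = field.slice(1, -1).replace(/""/g, '"');
    }
    fields.push(field);
    if (match[2] === '') break; // reached the end of the line
  }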

  return fields;
}

// JSON extractor from mixed content
function extractJSON(text) {
  const jsonPattern = /{[^{}]*(?:{[^{}]*}[^{}]*)*}/g;
  const matches = text.match(jsonPattern) || [];
  const validJSON = [];

  matches.forEach((match) => {
    try {
      const parsed = JSON.parse(match);
      validJSON.push(parsed);
    } catch (e) {
      // Not valid JSON
    }
  });

  return validJSON;
}

Performance Optimization

// Compile once, use many times
class RegexCache {
  constructor() {
    this.cache = new Map();
  }

  get(pattern, flags = '') {
    const key = `${pattern}:::${flags}`;

    if (!this.cache.has(key)) {
      this.cache.set(key, new RegExp(pattern, flags));
    }

    return this.cache.get(key);
  }

  test(pattern, string, flags) {
    return this.get(pattern, flags).test(string);
  }

  exec(pattern, string, flags) {
    return this.get(pattern, flags).exec(string);
  }
}

// Optimize complex patterns
function optimizeEmailValidation(email) {
  // Quick checks before regex
  if (!email || email.length < 3) return false;
  if (!email.includes('@')) return false;
  if (email.startsWith('@') || email.endsWith('@')) return false;

  // Now use regex for detailed validation
  return /^[\w._%+-]+@[\w.-]+\.[A-Za-z]{2,}$/.test(email);
}

// Benchmark regex operations
function benchmarkRegex(pattern, testStrings, iterations = 10000) {
  const regex = new RegExp(pattern);
  const results = {};

  testStrings.forEach((str) => {
    const start = performance.now();

    for (let i = 0; i < iterations; i++) {
      regex.test(str);
    }

    const end = performance.now();
    results[str] = (end - start) / iterations;
  });

  return results;
}
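
Pattern structure can matter more than caching or pre-checks. In backtracking engines, which is what mainstream JavaScript runtimes use by default, a quantifier nested inside another quantifier can make a failing match take time that grows exponentially with input length. The sketch below shows a nested pattern and a flat pattern that accept the same strings; the inputs are kept short on purpose so that both run instantly.

```javascript
// Nested quantifier: (a+)+ can split a run of "a" characters in many ways
const nested = /^(a+)+$/;

// Same set of accepted strings with a single quantifier
const flat = /^a+$/;

const samples = ['a', 'aaaa', 'aaaa!', ''];
const agree = samples.every((s) => nested.test(s) === flat.test(s));
console.log(agree); // true

// On a long failing input such as 'a'.repeat(30) + '!', the nested form may
// retry every possible split before giving up; the flat form fails quickly.
console.log(flat.test('a'.repeat(30) + '!')); // false
```

When a pattern with nested or overlapping quantifiers has to run on untrusted input, rewriting it in a flat form or limiting the input length first are two options worth considering.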

Best Practices

Use the right tool

// Simple string operations are faster
// Bad: Using regex for simple checks
/^hello/.test(str);

// Good: Using string methods
str.startsWith('hello');

Escape user input

function escapeRegex(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = 'user.name';
const pattern = new RegExp(escapeRegex(userInput));

Use non-capturing groups when appropriate

// Bad: Capturing when not needed
/(https?|ftp):\/\//

// Good: Non-capturing group
/(?:https?|ftp):\/\//

Be careful with global flag

const regex = /test/g;
console.log(regex.test('test')); // true
console.log(regex.test('test')); // false (lastIndex changed)

// Reset lastIndex or create new regex
regex.lastIndex = 0;
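
The same statefulness shows up whenever one global regex is reused across several inputs, for example inside map() or a validation loop. A short sketch of the effect and two ways around it (variable names are illustrative):

```javascript
const inputs = ['test', 'test', 'test'];

// With g, test() resumes from lastIndex, so the results alternate
const statefulRegex = /test/g;
const statefulResults = inputs.map((s) => statefulRegex.test(s));
console.log(statefulResults); // [true, false, true]

// Without g, every call starts at index 0
const statelessRegex = /test/;
const statelessResults = inputs.map((s) => statelessRegex.test(s));
console.log(statelessResults); // [true, true, true]

// String.prototype.match with a g regex resets lastIndex itself
statefulRegex.lastIndex = 2;
console.log('test test'.match(statefulRegex).length); // 2
console.log(statefulRegex.lastIndex); // 0
```

For yes/no checks the g flag is rarely needed, so leaving it off is usually the simplest fix.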

Conclusion

Regular expressions are powerful tools for text processing:

Pattern matching for validation and search
Text manipulation with replace and split
Data extraction from structured text
Advanced features like lookarounds and Unicode
Performance considerations for optimization

Key takeaways:

Start simple and build complexity gradually
Test patterns thoroughly with edge cases
Use online regex testers for debugging
Consider performance for complex patterns
Escape user input to prevent injection
Document complex patterns for maintainability

Master regular expressions to handle complex text processing tasks efficiently!