Regular Expressions (Regex) in Python

Regular expressions are powerful tools for searching, matching, and manipulating strings based on patterns.

1. Introduction to Regular Expressions

  • The re module in Python provides functions for working with regular expressions.
  • You can use regular expressions to:
    • Search for specific patterns within a string.
    • Replace parts of a string that match a pattern.
    • Split strings based on patterns.
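
Section 2 below covers searching and replacing, but splitting is never shown. Here is a minimal sketch using re.split(), which returns the pieces of the string that lie between matches of the pattern. The sample text is made up for illustration, and \s (any whitespace character) is not in the special-character list in section 3:

```python
import re

# Split wherever there is a run of one or more commas, semicolons, or whitespace.
text = "apples, oranges;bananas  pears"
parts = re.split(r'[,;\s]+', text)
print(parts)  # Output: ['apples', 'oranges', 'bananas', 'pears']
```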

2. Basic Functions in the re Module

  • re.match(): Checks whether the pattern matches at the beginning of the string. Returns a Match object, or None if there is no match.
    import re
    result = re.match(r'Hello', 'Hello, world!')
    print(result)  # Output: <re.Match object; span=(0, 5), match='Hello'>
    
  • re.search(): Scans the whole string and returns a Match object for the first location where the pattern matches, or None if it matches nowhere.
    result = re.search(r'world', 'Hello, world!')
    print(result)  # Output: <re.Match object; span=(7, 12), match='world'>
    
  • re.findall(): Returns a list of all non-overlapping matches of the regular expression in the string.
    result = re.findall(r'\d+', 'There are 12 apples and 34 oranges')
    print(result)  # Output: ['12', '34']
    
  • re.sub(): Replaces every occurrence of the pattern with a replacement string (the optional count argument limits how many are replaced).
    result = re.sub(r'apples', 'bananas', 'There are 12 apples and 34 oranges')
    print(result)  # Output: There are 12 bananas and 34 oranges
    
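The difference between re.match() and re.search() matters in practice. Both return None on failure, but only re.search() looks beyond the start of the string. The short comparison below reuses the string from the examples above and shows how to read the matched text and its position from the Match object:

```python
import re

text = 'Hello, world!'

# re.match() only tries the start of the string, so 'world' is not found.
print(re.match(r'world', text))  # Output: None

# re.search() scans the whole string and finds 'world' at index 7.
m = re.search(r'world', text)

# Both functions return None when nothing matches, so check before using the result.
if m:
    print(m.group())           # Output: world
    print(m.start(), m.end())  # Output: 7 12
```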

3. Special Characters in Regular Expressions

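  • .: Matches any character except a newline (unless the re.DOTALL flag is set).
  • ^: Matches the start of the string (and the start of each line with the re.MULTILINE flag).
  • $: Matches the end of the string, or just before a newline at the end of the string (and the end of each line with re.MULTILINE).
  • []: Matches any single character listed inside the brackets; ranges such as [a-z] are allowed.
  • |: Alternation; matches either the expression on its left or the one on its right.
  • \d: Matches any digit. For str patterns this includes all Unicode decimal digits; with the re.ASCII flag it is equivalent to [0-9].
  • \w: Matches any word character: letters, digits, and the underscore. With re.ASCII it is equivalent to [a-zA-Z0-9_]; by default it also matches Unicode letters and digits.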
  • +: Matches 1 or more occurrences of the preceding character or group.
  • *: Matches 0 or more occurrences of the preceding character or group.
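
The quick sketch below exercises each of the special characters above with re.findall() and re.search(). The sample strings are invented for illustration, and the comment after each line shows the value it produces:

```python
import re

# .  any character except a newline, so 'c\nt' is not matched
dot = re.findall(r'c.t', 'cat cut c\nt')                      # ['cat', 'cut']

# ^ and $  anchor the pattern so the whole string must be digits
all_digits = bool(re.search(r'^\d+$', '2024'))                # True
not_digits = bool(re.search(r'^\d+$', '2024a'))               # False

# []  any one character from the set
vowels = re.findall(r'[aeiou]', 'regex')                      # ['e', 'e']

# |  either alternative
pets = re.findall(r'cat|dog', 'cat, dog, bird')               # ['cat', 'dog']

# \d  Unicode decimal digits by default; re.ASCII restricts it to [0-9]
# ('\u0662' is ARABIC-INDIC DIGIT TWO)
digits_unicode = re.findall(r'\d', '4\u0662')                 # ['4', '\u0662']
digits_ascii = re.findall(r'\d', '4\u0662', flags=re.ASCII)   # ['4']

# \w  word characters: letters, digits, and underscore
words = re.findall(r'\w+', 'snake_case and x2')               # ['snake_case', 'and', 'x2']

# +  needs at least one 'b'; *  also accepts zero
plus = re.findall(r'ab+', 'a ab abb')                         # ['ab', 'abb']
star = re.findall(r'ab*', 'a ab abb')                         # ['a', 'ab', 'abb']
```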

4. Examples

  • Extracting digits from a string:
    text = "The price is 50 dollars"
    numbers = re.findall(r'\d+', text)
    print(numbers)  # Output: ['50']
    
  • Checking if a string starts with a certain word:
    text = "Hello, world!"
    # re.match() already anchors at the start of the string, so the ^ is optional here
    if re.match(r'^Hello', text):
        print("String starts with 'Hello'")
    
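  • Checking that an email address has a plausible format:
    email = "test@example.com"
    # A basic format check, not full validation against the email address specification
    if re.match(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$', email):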
        print("Valid email")
    else:
        print("Invalid email")
    
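In Python, $ also matches just before a newline at the end of the string, so re.match() with ^...$ accepts "test@example.com\n". A stricter variant uses fullmatch(), which requires the entire string to match. The sketch below compiles the pattern from the email example once and wraps it in a helper; the name is_valid_email is hypothetical:

```python
import re

# Compiled once and reused; still only a basic format check.
EMAIL_PATTERN = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

def is_valid_email(email: str) -> bool:
    """Hypothetical helper: True if the whole string looks like an email address."""
    # fullmatch() must consume the entire string, so ^ and $ are not needed.
    return EMAIL_PATTERN.fullmatch(email) is not None

print(is_valid_email('test@example.com'))    # True
print(is_valid_email('no-at-sign.com'))      # False
print(is_valid_email('test@example.com\n'))  # False: the trailing newline is rejected
```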

5. Exercises

  • Exercise 1: Write a function that extracts all phone numbers from a given string. Assume phone numbers follow the format XXX-XXX-XXXX, where each X is a digit.
  • Exercise 2: Create a program that validates a given username. The username must start with a letter and contain only letters, digits, and underscores.
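
If you want to compare your answers, here is one possible solution sketch. The function names extract_phone_numbers and is_valid_username are hypothetical, and the username check assumes only ASCII letters count as letters, which the exercise does not specify:

```python
import re

def extract_phone_numbers(text: str) -> list:
    """Hypothetical solution to Exercise 1: every XXX-XXX-XXXX number in text."""
    # \b stops a match from starting or ending inside a longer run of digits.
    return re.findall(r'\b\d{3}-\d{3}-\d{4}\b', text)

def is_valid_username(username: str) -> bool:
    """Hypothetical solution to Exercise 2: a letter, then letters, digits, or underscores."""
    # [A-Za-z] for the first character, then zero or more word characters.
    # re.ASCII limits \w to [a-zA-Z0-9_] (an assumption; see the lead-in).
    return re.fullmatch(r'[A-Za-z]\w*', username, flags=re.ASCII) is not None

print(extract_phone_numbers('Call 555-123-4567 or 555-987-6543 today.'))
# Output: ['555-123-4567', '555-987-6543']
print(is_valid_username('user_01'))  # True
print(is_valid_username('1user'))    # False
```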
