
presentations

Presentation notes from JMU Unix Users Group meetings

Regular Expressions

Connor Sample - https://tabulate.tech


https://regex101.com

Regex 101 documentation


What are regular expressions?

Common use cases


Basic Syntax


Basic Syntax - Character classes


Basic Syntax - Quantifiers


Escaping special characters


Examples

  1. Finding patterns in files:
    • Use grep to search for specific strings or patterns
    • grep 'error' logfile.txt
    • grep -E '([0-9]{1,3}\.){3}[0-9]{1,3}' access.log
  2. Replacing text in files:
    • Use sed or awk to perform find and replace operations
    • sed -i 's/\.html"/"/g' file.txt
  3. Validating input:
    • Ensure input adheres to specific formats or constraints
    • <input type="text" pattern="[A-Za-z]{3}">
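
The three use cases above can also be sketched with Python's `re` module. This is a rough illustration, not a replacement for grep or sed: the sample log lines, the HTML snippet, and the helper name `is_three_letters` are assumptions made up for the example, while the patterns are the ones from the list.

```python
import re

# Hypothetical sample data standing in for access.log and file.txt.
log_lines = [
    "192.168.0.1 - - GET /index.html",
    "no address on this line",
]

# 1. Finding: same pattern as the grep -E example
#    (four dot-separated groups of 1-3 digits).
ip_pattern = re.compile(r"([0-9]{1,3}\.){3}[0-9]{1,3}")
matching_lines = [line for line in log_lines if ip_pattern.search(line)]

# 2. Replacing: same substitution as the sed example
#    (drop a .html suffix that sits right before a closing quote).
html = '<a href="about.html">About</a>'
rewritten = re.sub(r'\.html"', '"', html)

# 3. Validating: the HTML pattern attribute must match the whole value,
#    so fullmatch is the closest equivalent.
def is_three_letters(value):
    return re.fullmatch(r"[A-Za-z]{3}", value) is not None

print(matching_lines)
print(rewritten)
print(is_three_letters("abc"), is_three_letters("abcd"))
```

Note that the IP pattern only checks the shape of the address; it would also accept out-of-range values such as 999.999.999.999.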

Advanced Techniques


Word Boundary Marker


More Advanced Techniques


Example 1: Matching Digits

Text: “I have 3 apples and 5 oranges.”


Example 1: Solution

Regex: \d+
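
One way to try this solution is with Python's `re.findall`, which returns every non-overlapping match as a string. A minimal sketch using the text from the example:

```python
import re

text = "I have 3 apples and 5 oranges."

# \d+ matches each run of one or more digits.
numbers = re.findall(r"\d+", text)
print(numbers)  # ['3', '5']
```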


Example 2: Matching Words with 3 characters

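\b matches a “word boundary”: the zero-width position between a word character (\w) and a non-word character, or the start or end of the text next to a word character. It does not consume any characters itself.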

Text: “The quick brown fox jumps over the lazy dog.”


Example 2: Solution

Regex: \b\w{3}\b
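
A quick way to check this solution, sketched with Python's `re.findall` on the text from the example. The two `\b` anchors keep `\w{3}` from matching three characters inside a longer word such as "quick".

```python
import re

text = "The quick brown fox jumps over the lazy dog."

# \b on both sides forces the three word characters to be a whole word.
three_letter_words = re.findall(r"\b\w{3}\b", text)
print(three_letter_words)  # ['The', 'fox', 'the', 'dog']
```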


Example 3: Matching Parts of a Date

Use capture groups to extract March, 5, and 2024.

Text: “March 5th, 2024”


Example 3: Solution

Regex: ([A-Za-z]+)\s+(\d{1,2})(?:[A-Za-z]*),\s+(\d{4})
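
The capture groups can be read back in Python with `match.groups()`. A minimal sketch, assuming the text from the example; the variable names are only illustrative. The `(?:[A-Za-z]*)` group is non-capturing, so the ordinal suffix "th" is matched but not returned.

```python
import re

text = "March 5th, 2024"
date_pattern = re.compile(r"([A-Za-z]+)\s+(\d{1,2})(?:[A-Za-z]*),\s+(\d{4})")

match = date_pattern.search(text)
# groups() returns only the three capturing groups, in order.
month, day, year = match.groups()
print(month, day, year)  # March 5 2024
```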


Example 4: Matching Email Addresses

Text: “Contact us at payment@uugix.org.”

Simplified email specification:

Domain specification:

Use \b to ensure the email is standalone


Example 4: Solution

Regex: \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
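
A short Python check of this solution against the text from the example. The trailing `\b` is what keeps the sentence-ending period out of the match, since the domain class `[A-Za-z0-9.-]` would otherwise be free to include it.

```python
import re

text = "Contact us at payment@uugix.org."
email_pattern = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

# The pattern has no capture groups, so findall returns whole matches.
emails = email_pattern.findall(text)
print(emails)  # ['payment@uugix.org']
```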


Best practices




(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:
[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\
[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)
+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|
[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|
[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\
[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])



import re

pattern = re.compile(r"""
\d+           # match one or more digits
\.            # match the `.` character
[A-Za-z]{5}   # match 5 letters of any case 
""", re.X)

pattern2 = re.compile(r"""(?x)  # verbose mode (an inline global flag must be at the very start)
\d*   # match zero or more digits
""")

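As a rough check that verbose mode changes only how a pattern is written, not what it matches, the sketch below compares a commented pattern with its compact form. The sample string and the names `verbose_pattern` and `compact_pattern` are assumptions for illustration.

```python
import re

# Verbose form: whitespace and # comments inside the pattern are ignored.
verbose_pattern = re.compile(r"""
    \d+           # one or more digits
    \.            # a literal dot
    [A-Za-z]{5}   # exactly five letters
""", re.X)

# The same pattern written on one line.
compact_pattern = re.compile(r"\d+\.[A-Za-z]{5}")

sample = "build 12.alpha is out"  # hypothetical input
verbose_match = verbose_pattern.search(sample)
compact_match = compact_pattern.search(sample)
print(verbose_match.group())  # 12.alpha
print(verbose_match.group() == compact_match.group())  # True
```

In verbose mode a literal space has to be escaped or placed inside a character class, because unescaped whitespace is dropped from the pattern.
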
Q&A/Practice