Regex

Posted by John Liu on Saturday, November 27, 2021

Some Regex syntax

Find quoted text

"(.*?)"

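A quick way to see why the lazy `.*?` matters is to run it next to the greedy form. This is only a sketch: Python's `re` module and the sample sentence are assumptions for illustration, since these notes do not name a regex engine.

```python
import re

sample = 'say "hello" and "bye"'  # hypothetical sample text

# Lazy .*? stops at the nearest closing quote, so each quoted run is captured on its own.
lazy = re.findall(r'"(.*?)"', sample)

# Greedy .* runs to the last quote in the text and swallows everything in between.
greedy = re.findall(r'"(.*)"', sample)

print(lazy)    # ['hello', 'bye']
print(greedy)  # ['hello" and "bye']
```
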
Lookahead: matches “x” only if “x” is followed by “y”

x(?=y)

Negative lookahead: matches “x” only if “x” is NOT followed by “y”

x(?!y)

Lookbehind: matches “x” only if “x” is preceded by “y”

(?<=y)x

Negative lookbehind: matches “x” only if “x” is NOT preceded by “y”

(?<!y)x
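
The four lookaround forms can be compared side by side on one string. A minimal sketch, assuming Python's `re` module; the sample text and the `starts` helper are made up for the demo. Note that lookbehind is written `(?<=y)` and negative lookbehind `(?<!y)`.

```python
import re

text = "xy xz yx zx"  # hypothetical sample text

def starts(pattern):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

lookahead = starts(r"x(?=y)")        # x followed by y
neg_lookahead = starts(r"x(?!y)")    # x not followed by y
lookbehind = starts(r"(?<=y)x")      # x preceded by y
neg_lookbehind = starts(r"(?<!y)x")  # x not preceded by y

print(lookahead, neg_lookahead, lookbehind, neg_lookbehind)
# [0] [3, 7, 10] [7] [0, 3, 10]
```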

Count the columns in a CSV line

Using comma as delimiter.

,|(".*?"(,(?!$))?)

The pattern breaks down as:

,       a literal comma
|       or
(       start of group
".*?"   a quoted string; when a value contains a comma, CSV quotes the entire value
(       start of inner group
,(?!$)  a comma that is not the last character of the line. A comma at the end of the line has to be counted separately
)?      end of inner group; the group is optional
)       end of group
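
To see what the pattern actually matches, here is a small sketch assuming Python's `re` module, with the standard `csv` module as a cross-check. The sample line is hypothetical.

```python
import csv
import re

pattern = re.compile(r',|(".*?"(,(?!$))?)')
line = 'a,"b,c",d'  # hypothetical sample line with a comma inside quotes

# Each match is either a bare comma or a quoted value with its trailing comma.
matches = [m.group(0) for m in pattern.finditer(line)]
print(matches)  # [',', '"b,c",']

# Cross-check the field count with the standard csv parser.
fields = next(csv.reader([line]))
print(fields)   # ['a', 'b,c', 'd']
```

For this line there are 2 matches and 3 fields. A line that ends with a quoted value (for example `a,"b"`) gives one match per field instead, so it is worth checking the count against real data before relying on it.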

Match each comma-delimited section of a CSV line

[^,"]*,|(".*?"(,(?!$))?)

Find a line that contains exactly 5 commas, counting commas inside quoted strings

^([^,]*,){5}[^,]*$

The pattern breaks down as:

^       #Start of string
(       #Start of group
[^,]*   #Any character except comma, zero or more times
,       #A comma
){5}    #End and repeat the group 5 times
[^,]*   #Any character except comma, zero or more times again
$       #End of string
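
A short check of this pattern, sketched with Python's `re` module (an assumption; the sample lines are invented):

```python
import re

five_commas = re.compile(r'^([^,]*,){5}[^,]*$')

plain = bool(five_commas.match('a,b,c,d,e,f'))      # 5 commas
short = bool(five_commas.match('a,b,c'))            # only 2 commas
quoted = bool(five_commas.match('a,"b,c",d,e,f'))   # 5 commas, one of them inside quotes

print(plain, short, quoted)  # True False True
```

The last line shows the limitation: the comma inside the quoted value is counted like any other.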

Find a string that contains 5 comma-terminated values, not counting commas inside quoted strings

(([^,"]*,)|(".*?",)){5}

The pattern breaks down as:

(                   #start of group
([^,"]*,)           #an unquoted value (no comma or quote) ending with a comma
|                   #or
(".*?",)            #a quoted value ending with a comma
)                   #end of group
{5}                 #repeat the group 5 times
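
A sketch of how this behaves, assuming Python's `re` module. It uses a variant with the unquoted part written as `[^,"]*` (any run of characters that are neither comma nor quote); the sample lines are hypothetical.

```python
import re

# Variant with the unquoted part written as [^,"]* (an assumption for this demo).
five_values = re.compile(r'(([^,"]*,)|(".*?",)){5}')

# Five comma-terminated values: a, / "b,c", / d, / e, / f,
ok = five_values.match('a,"b,c",d,e,f,g')

# Only four comma-terminated values once the quoted comma is ignored.
too_short = five_values.match('a,"b,c",d,e,f')

print(ok.group(0))   # a,"b,c",d,e,f,
print(too_short)     # None
```

The second line has 5 commas in total, so a plain comma count would accept it; here the quoted comma is skipped and the match fails.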

Find the text between two specific words

(?s)^To: (.*)(?=^Subject:)

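The pattern breaks down as:

(?s)                #Singleline mode: . also matches newlines
^To:                #a line starting with To: (^ matches at each line start only when Multiline mode is also on)
(.*)                #any characters, greedy
(?=^Subject:)       #lookahead for a line starting with Subject:; because .* is greedy, the match stops just before the last Subject: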
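
Here is a minimal sketch in Python (an assumed engine). In Python `^` only matches at line starts when the `m` flag is set, so the inline flags become `(?sm)`; the sample message is invented.

```python
import re

message = (
    "To: a@example.com,\n"
    " b@example.com\n"
    "Subject: Hi\n"
    "\n"
    "Body text\n"
)

# (?s) lets . cross newlines; (?m) makes ^ match at every line start (added for Python).
match = re.search(r"(?sm)^To: (.*)(?=^Subject:)", message)
recipients = match.group(1)

print(repr(recipients))  # 'a@example.com,\n b@example.com\n'
```

If the text can contain more than one line starting with `Subject:`, switching `(.*)` to the lazy `(.*?)` stops before the first one instead of the last.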

Extract all To and Cc email addresses

^To: .*(\n|\r|\r\n)(^\s+.*(\n|\r|\r\n))*(^CC: .*(\n|\r|\r\n)(^\s+.*(\n|\r|\r\n)?)*)?(?=^Subject:)?

The pattern breaks down as:

^To: .*(\n|\r|\r\n)     #a line starting with To: (line breaks are matched explicitly with \n|\r|\r\n, so (?s) Singleline mode is not needed)
(^\s+.*(\n|\r|\r\n))*   #zero or more following lines that start with whitespace, for To: addresses that have been split across multiple lines
(^CC: .*(\n|\r|\r\n)(^\s+.*(\n|\r|\r\n)?)*)?    #zero or one line starting with CC:, including CC: addresses that have been split across multiple lines
(?=^Subject:)?           #look ahead and stop at the line starting with Subject:
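
The same idea can be sketched in Python in two steps: grab each To:/Cc: header block with its continuation lines, then pull the addresses out of it. This is a simplified variant, not the exact pattern above; `block_re`, `address_re` and the sample headers are hypothetical, and the address pattern is deliberately loose.

```python
import re

headers = (
    "From: me@example.com\n"
    "To: a@example.com,\n"
    " b@example.com\n"
    "Cc: c@example.com\n"
    "Subject: Hi\n"
)

# A To: or Cc: line plus any continuation lines that start with a space or tab.
block_re = re.compile(r"^(?:To|Cc): .*(?:\r?\n[ \t]+.*)*", re.MULTILINE | re.IGNORECASE)

# Loose address pattern, good enough for a demo only.
address_re = re.compile(r"[\w.+-]+@[\w.-]+")

addresses = [
    addr
    for block in block_re.finditer(headers)
    for addr in address_re.findall(block.group(0))
]

print(addresses)  # ['a@example.com', 'b@example.com', 'c@example.com']
```

The From: address is skipped because only blocks that start with To: or Cc: are scanned.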

Useful cheatsheet sites:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Cheatsheet
https://stackoverflow.com/questions/39636124/regular-expression-works-on-regex101-com-but-not-on-prod