ABAP Regular Expressions: Pattern Search and Replace

Category: ABAP-Statements
Author: Johannes

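Regular expressions (regex) enable complex pattern search and replacement in strings. ABAP supports them in the statements FIND and REPLACE, in built-in string functions, and in the classes cl_abap_regex and cl_abap_matcher. Two syntax flavors exist: POSIX (the REGEX addition and regex parameter used in most examples below) and, from release 7.55, PCRE (the PCRE addition, the pcre parameter, and cl_abap_regex=>create_pcre( )).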

Usage Options

  1. FIND … REGEX – Search for a pattern in a string
  2. REPLACE … REGEX – Replace occurrences of a pattern
  3. Built-in functions matches(), contains(), count() – Check, test, or count matches inside expressions
  4. cl_abap_regex / cl_abap_matcher – Object-oriented API
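To make the four options concrete, here is a minimal side-by-side sketch that applies one illustrative pattern (\d+) through each of them. It assumes a release with inline declarations and backquote string literals; the object-oriented part uses create_pcre( ), which is only available from release 7.55.

```abap
DATA(lv_text) = `Order 4711 shipped`.

" 1. FIND ... REGEX: position and length of the first match
FIND REGEX '\d+' IN lv_text MATCH OFFSET DATA(lv_off) MATCH LENGTH DATA(lv_len).
DATA(lv_found) = substring( val = lv_text off = lv_off len = lv_len ).   " 4711

" 2. REPLACE ... REGEX: change the match in place (on a copy)
DATA(lv_masked) = lv_text.
REPLACE FIRST OCCURRENCE OF REGEX '\d+' IN lv_masked WITH `####`.        " Order #### shipped

" 3. Built-in function: does the whole value consist of digits?
DATA(lv_is_number) = xsdbool( matches( val = `4711` regex = '\d+' ) ).   " abap_true

" 4. Object-oriented API (create_pcre requires release 7.55 or later)
DATA(lo_matcher) = cl_abap_regex=>create_pcre( pattern = '\d+' )->create_matcher( text = lv_text ).
IF lo_matcher->find_next( ) = abap_true.
  " get_submatch( 0 ) returns the text of the whole match
  DATA(lv_oo_hit) = lo_matcher->get_submatch( 0 ).                        " 4711
ENDIF.
```

The statements suit one-off searches, the functions fit inside expressions and conditions, and the matcher is useful when iterating over matches or reusing a pattern.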

Regex Syntax Overview

Character   Meaning                        Example
.           Any character                  a.c -> abc, aXc
*           0 or more times                ab*c -> ac, abc, abbc
+           1 or more times                ab+c -> abc, abbc
?           0 or 1 times                   ab?c -> ac, abc
^           Start anchor                   ^Hello
$           End anchor                     World$
[abc]       Character class                [aeiou] -> vowels
[^abc]      Negated class                  [^0-9] -> non-digits
[a-z]       Range                          [A-Za-z] -> letters
\d          Digit [0-9]                    \d{4} -> 4 digits
\w          Word character [a-zA-Z0-9_]    \w+
\s          Whitespace                     \s+ -> spaces
\b          Word boundary                  \bword\b
{n}         Exactly n times                a{3} -> aaa
{n,m}       n to m times                   a{2,4} -> aa, aaa, aaaa
(...)       Group                          (ab)+ -> ab, abab
|           Or (alternation)               cat|dog
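A few rows of the table can be checked directly in code. The sketch below relies on matches(), which succeeds only if the whole value fits the pattern, and on contains() and count(); the sample values are made up for illustration, and the expected results follow from the definitions in the table.

```abap
" Quantifiers: * (0 or more), + (1 or more), ? (0 or 1)
DATA(lv_star)  = xsdbool( matches( val = `ac`     regex = 'ab*c' ) ).   " abap_true: zero b's allowed
DATA(lv_plus)  = xsdbool( matches( val = `ac`     regex = 'ab+c' ) ).   " abap_false: at least one b needed
DATA(lv_opt)   = xsdbool( matches( val = `abbc`   regex = 'ab?c' ) ).   " abap_false: at most one b
" Exact and bounded repetition
DATA(lv_exact) = xsdbool( matches( val = `aaa`    regex = 'a{3}' ) ).   " abap_true
DATA(lv_range) = xsdbool( matches( val = `aaaaa`  regex = 'a{2,4}' ) ). " abap_false: 5 is more than 4
" Group with a quantifier
DATA(lv_group) = xsdbool( matches( val = `ababab` regex = '(ab)+' ) ).  " abap_true
" Anchors: contains() searches anywhere, ^ and $ pin the position
DATA(lv_start) = xsdbool( contains( val = `Hello World` regex = '^Hello' ) ). " abap_true
DATA(lv_end)   = xsdbool( contains( val = `Hello World` regex = 'Hello$' ) ). " abap_false
" Alternation and word boundary, counted with count()
DATA(lv_pets)  = count( val = `cat, dog, cow` regex = 'cat|dog' ).            " 2
DATA(lv_words) = count( val = `word sword words word` regex = '\bword\b' ).   " 2
```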

Examples

1. FIND with REGEX

DATA: lv_text TYPE string VALUE 'Order 12345 from 2024-11-15'.
" Find the first number
FIND REGEX '\d+' IN lv_text MATCH OFFSET DATA(lv_offset)
                            MATCH LENGTH DATA(lv_length).
IF sy-subrc = 0.
  DATA(lv_number) = substring( val = lv_text off = lv_offset len = lv_length ).
  WRITE: / 'Found:', lv_number. " 12345
ENDIF.

2. Find All Matches (FIND ALL OCCURRENCES)

DATA: lv_text TYPE string VALUE 'Tel: 030-12345, Fax: 040-67890, Mobile: 0170-9876543'.
" Find all phone numbers
FIND ALL OCCURRENCES OF REGEX '\d{3,4}-\d+'
     IN lv_text
     RESULTS DATA(lt_results).
LOOP AT lt_results INTO DATA(ls_result).
  DATA(lv_phone) = substring( val = lv_text
                              off = ls_result-offset
                              len = ls_result-length ).
  WRITE: / 'Phone:', lv_phone.
ENDLOOP.
" Output:
" Phone: 030-12345
" Phone: 040-67890
" Phone: 0170-9876543

3. Extract Groups (Submatches)

DATA: lv_date TYPE string VALUE 'Date: 2024-11-15'.
" Extract the date parts with groups
FIND REGEX '(\d{4})-(\d{2})-(\d{2})'
     IN lv_date
     SUBMATCHES DATA(lv_year) DATA(lv_month) DATA(lv_day).
IF sy-subrc = 0.
  WRITE: / 'Year:', lv_year.   " 2024
  WRITE: / 'Month:', lv_month. " 11
  WRITE: / 'Day:', lv_day.     " 15
ENDIF.

4. REPLACE with REGEX

DATA: lv_text TYPE string VALUE 'Price: 123.45 EUR, Discount: 10.00 EUR'.
" Replace every number (with optional decimal part) with XXX
REPLACE ALL OCCURRENCES OF REGEX '\d+\.?\d*'
        IN lv_text WITH 'XXX'.
WRITE: / lv_text. " Price: XXX EUR, Discount: XXX EUR

5. Backreferences

DATA: lv_text TYPE string VALUE 'The the cat sits on on the roof.'.
" Remove duplicate words: \1 refers back to the word captured by (\w+).
" IGNORING CASE is needed so that "The the" also counts as a duplicate.
REPLACE ALL OCCURRENCES OF REGEX '\b(\w+)\s+\1\b'
        IN lv_text WITH '$1'
        IGNORING CASE.
WRITE: / lv_text. " The cat sits on the roof.

6. matches() – Check If Pattern Matches

DATA: lv_email TYPE string VALUE 'max@example.com'. " placeholder address
" Simple email validation (\w+ allows no dots, so first.last@... is rejected)
IF matches( val = lv_email regex = '^\w+@\w+\.\w+$' ).
  WRITE: / 'Valid email'.
ELSE.
  WRITE: / 'Invalid email'.
ENDIF.
" Store the result of a check as abap_bool
DATA: lv_phone TYPE string VALUE '+49-170-1234567'.
DATA(lv_valid_phone) = xsdbool(
  matches( val = lv_phone regex = '^\+?\d{2,3}-\d{2,4}-\d{4,}$' ) ).
" lv_valid_phone = abap_true

7. contains() with Regex

DATA: lv_text TYPE string VALUE 'Order No. 12345 has been shipped'.
" Contains a number?
IF contains( val = lv_text regex = '\d+' ).
  WRITE: / 'Text contains numbers'.
ENDIF.
" Starts with pattern
IF contains( val = lv_text regex = '^Order' ).
  WRITE: / 'This is an order'.
ENDIF.

8. count() with Regex

DATA: lv_text TYPE string VALUE 'a1b2c3d4e5'.
" Count of digits
DATA(lv_digit_count) = count( val = lv_text regex = '\d' ).
WRITE: / 'Digit count:', lv_digit_count. " 5
" Count of words
DATA: lv_sentence TYPE string VALUE 'This is an example sentence with words'.
DATA(lv_word_count) = count( val = lv_sentence regex = '\b\w+\b' ).
WRITE: / 'Word count:', lv_word_count. " 7

9. cl_abap_regex – Object-oriented

DATA: lv_text TYPE string VALUE 'Name: Max Mustermann, Age: 30'.
" Create regex object (PCRE syntax, release 7.55 or later)
DATA(lo_regex) = cl_abap_regex=>create_pcre( pattern = '(\w+):\s*(\S+)' ).
" Create matcher
DATA(lo_matcher) = lo_regex->create_matcher( text = lv_text ).
" Iterate through all matches
WHILE lo_matcher->find_next( ) = abap_true.
  " get_submatch( 0 ) returns the text of the whole match; get_match( ) would
  " return a MATCH_RESULT structure (offset, length, submatches), not text
  DATA(lv_full)  = lo_matcher->get_submatch( 0 ).
  DATA(lv_key)   = lo_matcher->get_submatch( 1 ).
  DATA(lv_value) = lo_matcher->get_submatch( 2 ).
  WRITE: / 'Match:', lv_full.
  WRITE: / ' Key:', lv_key, 'Value:', lv_value.
ENDWHILE.
" Output:
" Match: Name: Max
" Key: Name Value: Max
" Match: Age: 30
" Key: Age Value: 30

10. Practical: Extract Emails

" Sample text with placeholder addresses
DATA: lv_text TYPE string VALUE 'Contact: info@example.com, support@example.org'.
DATA: lt_emails TYPE string_table.
" Find all emails
FIND ALL OCCURRENCES OF REGEX '[\w.+-]+@[\w.-]+\.\w{2,}'
     IN lv_text
     RESULTS DATA(lt_results).
LOOP AT lt_results INTO DATA(ls_result).
  APPEND substring( val = lv_text
                    off = ls_result-offset
                    len = ls_result-length ) TO lt_emails.
ENDLOOP.
LOOP AT lt_emails INTO DATA(lv_email).
  WRITE: / lv_email.
ENDLOOP.
" Output:
" info@example.com
" support@example.org

11. Practical: Clean Data

" Normalize phone number
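DATA: lv_phone TYPE string VALUE '+49 (0) 170 / 123 45 67'.
" Drop the trunk prefix "(0)" first, otherwise its 0 would be kept as a digit
REPLACE ALL OCCURRENCES OF REGEX '\(0\)' IN lv_phone WITH ``.
" Remove all non-digits except +
REPLACE ALL OCCURRENCES OF REGEX '[^\d+]' IN lv_phone WITH ``.
WRITE: / lv_phone. " +491701234567
" Reduce runs of whitespace to one blank (` ` is a string literal, so the blank is kept)
DATA: lv_text TYPE string VALUE 'Too   many    spaces   here'.
REPLACE ALL OCCURRENCES OF REGEX '\s{2,}' IN lv_text WITH ` `.
WRITE: / lv_text. " Too many spaces here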

12. Practical: Validations

" Validate IBAN (simplified)
DATA: lv_iban TYPE string VALUE 'DE89370400440532013000'.
" Format check only; the IBAN check digits are not verified
IF matches( val = lv_iban regex = '^[A-Z]{2}\d{2}[A-Z0-9]{4}\d{7}[A-Z0-9]{0,16}$' ).
  WRITE: / 'Valid IBAN format'.
ENDIF.
" Validate ZIP code (Germany)
DATA: lv_plz TYPE string VALUE '12345'.
IF matches( val = lv_plz regex = '^\d{5}$' ).
  WRITE: / 'Valid German ZIP code'.
ENDIF.
" Validate date (YYYY-MM-DD); format only, so 2024-02-31 also passes
DATA: lv_date TYPE string VALUE '2024-11-15'.
IF matches( val = lv_date regex = '^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$' ).
  WRITE: / 'Valid date format'.
ENDIF.

13. Practical: Parsing

" Parse log line
DATA: lv_log TYPE string VALUE '2024-11-15 10:30:45 [ERROR] Database connection failed'.
FIND REGEX '^(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})\s+\[(\w+)\]\s+(.+)$'
     IN lv_log
     SUBMATCHES DATA(lv_date) DATA(lv_time) DATA(lv_level) DATA(lv_message).
IF sy-subrc = 0.
  WRITE: / 'Date:', lv_date.
  WRITE: / 'Time:', lv_time.
  WRITE: / 'Level:', lv_level.
  WRITE: / 'Message:', lv_message.
ENDIF.

14. Practical: CSV Parsing

DATA: lv_csv TYPE string VALUE 'Max;Mustermann;30;Berlin'.
DATA: lt_fields TYPE string_table.
" Split by semicolon (alternative to SPLIT); '[^;]+' never matches an empty field
FIND ALL OCCURRENCES OF REGEX '[^;]+' IN lv_csv RESULTS DATA(lt_matches).
LOOP AT lt_matches INTO DATA(ls_match).
  APPEND substring( val = lv_csv off = ls_match-offset len = ls_match-length )
         TO lt_fields.
ENDLOOP.
" Or simpler with SPLIT:
SPLIT lv_csv AT ';' INTO TABLE lt_fields.
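The regex variant and SPLIT are not fully interchangeable when a field is empty. A small sketch with a made-up record 'Max;;30' counts the fields both ways:

```abap
DATA: lv_csv   TYPE string VALUE 'Max;;30'.
DATA: lt_split TYPE string_table.

" Regex: each match is a run of non-semicolons, so the empty field yields no match
FIND ALL OCCURRENCES OF REGEX '[^;]+' IN lv_csv MATCH COUNT DATA(lv_regex_fields).

" SPLIT: one row per field, including the empty one between ';;'
SPLIT lv_csv AT ';' INTO TABLE lt_split.
DATA(lv_split_fields) = lines( lt_split ).

" lv_regex_fields = 2, lv_split_fields = 3
```

For CSV-like data where empty fields carry meaning, SPLIT keeps the column positions intact.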
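
15. Case-Insensitive Search

DATA: lv_text TYPE string VALUE 'ABAP is great, abap is cool'.
" Case-insensitive search with IGNORING CASE
FIND ALL OCCURRENCES OF REGEX 'abap'
     IN lv_text
     IGNORING CASE
     MATCH COUNT DATA(lv_count).
WRITE: / 'Found:', lv_count, 'times'. " 2
" With PCRE (release 7.55 or later), the inline modifier (?i) is an alternative:
" FIND ALL OCCURRENCES OF PCRE '(?i)abap' IN lv_text MATCH COUNT lv_count.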

16. Escape Function

DATA: lv_search TYPE string VALUE 'a.b*c?'.
" Escape special characters for a literal search
DATA(lv_escaped) = escape( val = lv_search format = cl_abap_format=>e_regex ).
WRITE: / 'Escaped:', lv_escaped. " a\.b\*c\?
" Now lv_escaped can be used in REGEX
DATA: lv_text TYPE string VALUE 'Test a.b*c? end'.
FIND REGEX lv_escaped IN lv_text.
IF sy-subrc = 0.
  WRITE: / 'Found!'.
ENDIF.

Common Regex Patterns

Purpose             Pattern
Number              \d+ or [0-9]+
Decimal number      \d+\.?\d*
Word                \w+ or [A-Za-z]+
Email (simple)      [\w.+-]+@[\w.-]+\.\w{2,}
URL                 https?://[\w./%-]+
Date (YYYY-MM-DD)   \d{4}-\d{2}-\d{2}
ZIP code (DE)       \d{5}
Phone (DE)          (\+49|0)\d{2,4}[-/]?\d+
IP address          \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} (format only, no 0 to 255 check)
Remove whitespace   \s+ (replace with an empty string)
HTML tags           <[^>]+>
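Because these patterns are plain strings, they can be used in FIND, REPLACE, or matches() unchanged. The sketch below applies two of them to made-up sample values; it also shows that the IP pattern only checks the shape, so an out-of-range address such as 999.1.1.1 is accepted as well.

```abap
" Strip HTML tags with '<[^>]+>' (`` is an empty string literal)
DATA(lv_html) = `<p>Hello <b>World</b></p>`.
REPLACE ALL OCCURRENCES OF REGEX '<[^>]+>' IN lv_html WITH ``.
" lv_html = 'Hello World'

" The IP pattern checks digits and dots, not the value range of each part
DATA(lv_ip_ok)  = xsdbool( matches( val = `192.168.0.1`
                                    regex = '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}' ) ).
DATA(lv_ip_bad) = xsdbool( matches( val = `999.1.1.1`
                                    regex = '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}' ) ).
" both are abap_true
```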

FIND/REPLACE Options

FIND [ FIRST OCCURRENCE OF | ALL OCCURRENCES OF ] REGEX pattern IN text
  [ IGNORING CASE ]                  " Ignore case
  [ MATCH COUNT cnt ]                " Number of matches
  [ MATCH OFFSET off ]               " Start position of match
  [ MATCH LENGTH len ]               " Length of match
  [ RESULTS result_tab | result_wa ] " All matches as table, or one match as structure
                                     " (alternative to MATCH OFFSET/LENGTH)
  [ SUBMATCHES s1 s2 ... ].          " Group contents
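RESULTS gives more than MATCH OFFSET/LENGTH: with FIRST OCCURRENCE, an inline declaration yields a single structure of type MATCH_RESULT whose SUBMATCHES table holds the offset and length of each group. A minimal sketch with an invented sample text:

```abap
DATA(lv_text) = `Invoice 2024-11 and 2025-02`.

" FIRST OCCURRENCE + RESULTS: one structure of type MATCH_RESULT
FIND FIRST OCCURRENCE OF REGEX '(\d{4})-(\d{2})' IN lv_text
     RESULTS DATA(ls_first).
IF sy-subrc = 0.
  " OFFSET and LENGTH describe the whole match ...
  DATA(lv_match) = substring( val = lv_text off = ls_first-offset len = ls_first-length ).
  " ... while SUBMATCHES has one row (offset, length) per group
  DATA(ls_group1) = ls_first-submatches[ 1 ].
  DATA(lv_year) = substring( val = lv_text off = ls_group1-offset len = ls_group1-length ).
ENDIF.
" lv_match = '2024-11', lv_year = '2024'

" Options combined in the documented order: IGNORING CASE before MATCH COUNT
FIND ALL OCCURRENCES OF REGEX 'invoice' IN lv_text IGNORING CASE MATCH COUNT DATA(lv_count).
" lv_count = 1
```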

Important Notes / Best Practice

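  • PCRE syntax (Perl-compatible) is available from release 7.55 via the PCRE addition of FIND/REPLACE, the pcre parameter of the built-in functions, and cl_abap_regex=>create_pcre( ).
  • The REGEX addition and the regex parameter use POSIX syntax; SAP documents POSIX regular expressions as obsolete from release 7.55, so PCRE is preferable in new code.
  • Use escape( ... format = cl_abap_format=>e_regex ) when a search term must be matched literally.
  • Groups (...) provide submatches: SUBMATCHES, get_submatch( ), and $1, $2, ... in replacements.
  • Case-insensitive search: IGNORING CASE in FIND/REPLACE, case = abap_false in built-in functions, or (?i) at the start of a PCRE pattern.
  • Performance: when the same pattern is applied many times, create a cl_abap_regex object once and reuse it.
  • matches() checks whether the entire string matches the pattern.
  • contains( ... regex = ... ) checks whether the pattern occurs anywhere in the string.
  • \d, \w, \s are shorthand for character classes.
  • Test regex with online tools (regex101.com) before implementing; for PCRE patterns, select the PCRE flavor there.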
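For the performance note, a minimal sketch of compiling one pattern and reusing it across many values. The ZIP-code list is hypothetical, and create_pcre( ) assumes release 7.55 or later; on older releases, NEW cl_abap_regex( pattern = ... ) creates a POSIX regex object instead.

```abap
" Hypothetical input values for illustration
DATA(lt_codes) = VALUE string_table( ( `12345` ) ( `1234` ) ( `ABCDE` ) ( `54321` ) ).
DATA(lv_valid) = 0.

" Compile the pattern once ...
DATA(lo_zip_regex) = cl_abap_regex=>create_pcre( pattern = '\d{5}' ).

" ... and reuse it for every value
LOOP AT lt_codes INTO DATA(lv_code).
  DATA(lo_matcher) = lo_zip_regex->create_matcher( text = lv_code ).
  " match( ) succeeds only if the entire text matches the pattern
  IF lo_matcher->match( ) = abap_true.
    lv_valid = lv_valid + 1.
  ENDIF.
ENDLOOP.
" lv_valid = 2
```

Only the lightweight matcher is created per value; the regex object, and with it the parsed pattern, is shared.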