escape - Escape Function
Syntax
... escape( val = text format = format ) ...
Effect
This function takes the content of the character string in text and
escapes certain special characters, replacing them with escape
sequences according to the rule specified in format.
The possible values of format are defined as constants with the
prefix "E_" in the class CL_ABAP_FORMAT. Each value defines which
special characters are replaced, and how. There are rules for special
characters in markup languages (XML and HTML), in URIs and URLs, in
JSON, in regular expressions, and in character string templates. Rules
that protect Web applications against Cross Site Scripting (XSS)
attacks also play an important part.
format expects data objects of type i. An invalid value for format
raises an exception of the class CX_SY_STRG_PAR_VAL. The program
DEMO_ESCAPE demonstrates the effect of all formats in the class
CL_ABAP_FORMAT on all characters with codes between x00 and xFF. In
its output, the top row contains the names of the constants from the
class CL_ABAP_FORMAT without the prefix "E_", and the other rows show
the effect on the characters specified in the first two columns.
This function can be specified in general and character-like
expression positions. The return value has the type string.
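The calling pattern can be modeled outside ABAP as a table-driven function: an integer format selects a replacement rule, each input character is looked up once, and an unknown format raises an error. The following Python sketch is only an illustration. The class Fmt, its numeric values, and the exception InvalidFormat are hypothetical stand-ins for the constants of CL_ABAP_FORMAT and for CX_SY_STRG_PAR_VAL, and only two of the markup-language rules described below are modeled.

```python
from enum import IntEnum


class Fmt(IntEnum):
    """Hypothetical stand-ins for CL_ABAP_FORMAT=>E_... (ABAP type i)."""
    XML_TEXT = 1
    HTML_TEXT = 2


class InvalidFormat(ValueError):
    """Plays the role of CX_SY_STRG_PAR_VAL in this sketch."""


# One replacement table per format; characters not listed stay unchanged.
_RULES = {
    Fmt.XML_TEXT: {"&": "&amp;", "<": "&lt;"},
    Fmt.HTML_TEXT: {"&": "&amp;", "<": "&lt;", ">": "&gt;"},
}


def escape(val: str, fmt: int) -> str:
    """Model of escape( val = val format = fmt ); returns a new string."""
    rules = _RULES.get(fmt)
    if rules is None:
        raise InvalidFormat(f"invalid value for format: {fmt}")
    # Single pass: each input character is replaced at most once,
    # so an "&" produced by a replacement is never escaped again.
    return "".join(rules.get(ch, ch) for ch in val)
```

For example, escape("a < b & c", Fmt.XML_TEXT) returns "a &lt; b &amp; c", while escape("x", 99) raises InvalidFormat.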
Rules for Markup Languages (Including JavaScript)
The program DEMO_ESCAPE_MARKUP
demonstrates the escape rules for markup languages. Formats with
"_JS" in their name are intended for content with JavaScript
components. The following table summarizes the escape rules:
Format          &      <     >     "       '       TAB   LF     CR     BS  FF  \   ctrl-char
E_XML_TEXT      &amp;  &lt;  -     -       -       -     -      -      -   -   -   -
E_XML_ATTR      &amp;  &lt;  -     &quot;  &apos;  &#9;  &#xA;  &#xD;  -   -   -   -
E_XML_ATTR_SQ   &amp;  &lt;  -     -       &apos;  &#9;  &#xA;  &#xD;  -   -   -   -
E_HTML_TEXT     &amp;  &lt;  &gt;  -       -       -     -      -      -   -   -   -
E_HTML_ATTR     &amp;  &lt;  &gt;  &quot;  &#39;   -     -      -      -   -   -   -
E_HTML_ATTR_DQ  &amp;  &lt;  &gt;  &quot;  -       -     -      -      -   -   -   -
E_HTML_ATTR_SQ  &amp;  &lt;  &gt;  -       &#39;   -     -      -      -   -   -   -
E_HTML_JS       -      -     -     \"      \'      \t    \n     \r     \b  \f  \\  \xhh
E_HTML_JS_HTML  &amp;  &lt;  &gt;  &quot;  &#39;   \t    \n     \r     \b  \f  \\  \xhh
The first column contains the names of the formats from the class
CL_ABAP_FORMAT. The other columns show the escape sequences that
replace the special characters in the first row. No other characters
are affected. TAB, LF, CR, BS, and FF are the control characters
Tabulator, Line Feed, Carriage Return, Backspace, and Form Feed, which
are assigned the codes x09, x0A, x0D, x08, and x0C in 7-bit ASCII.
ctrl-char represents all control characters with codes less than x20
that are not covered by the other columns. Some of these can be
converted to \xhh, where "hh" is the hexadecimal value of the code. A
dash (-) means that the special character is not affected.
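To make the table concrete, the Python sketch below encodes four of its rows as lookup tables and applies them in a single pass over the input. It is a model of the documented rules, not SAP's implementation. The format names serve only as dictionary keys, and the ctrl-char column (\xhh) is deliberately left out because the table does not say exactly which control characters it covers.

```python
# Replacement tables copied from four rows of the markup table.
# The ctrl-char column is intentionally not modeled.
TABLE = {
    "E_XML_TEXT": {"&": "&amp;", "<": "&lt;"},
    "E_XML_ATTR": {"&": "&amp;", "<": "&lt;", '"': "&quot;",
                   "'": "&apos;", "\t": "&#9;", "\n": "&#xA;",
                   "\r": "&#xD;"},
    "E_HTML_ATTR": {"&": "&amp;", "<": "&lt;", ">": "&gt;",
                    '"': "&quot;", "'": "&#39;"},
    "E_HTML_JS": {'"': '\\"', "'": "\\'", "\t": "\\t", "\n": "\\n",
                  "\r": "\\r", "\b": "\\b", "\f": "\\f", "\\": "\\\\"},
}


def escape_markup(text: str, fmt: str) -> str:
    """Replace each listed special character; leave all others unchanged."""
    rules = TABLE[fmt]  # KeyError for a format name not modeled here
    return "".join(rules.get(ch, ch) for ch in text)
```

Applying the replacements in one pass matters: chaining ordinary string replacements could escape the & of an already inserted &lt; a second time.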
Rules for URL/URIs
The program DEMO_ESCAPE_URL_URI demonstrates the escape rules for URLs
and URIs. All characters with codes between x00 and x7F, except for the
characters listed in the following table, are converted to %hh, where
hh is the hexadecimal value of the code.
Format       Unconverted characters (separated by blanks)
E_URL        [0-9] [a-z] [A-Z] ! $ ' ( ) * + , - . _ & / : ; = ? @
E_URL_FULL   [0-9] [a-z] [A-Z] ! $ ' ( ) * + , - . _
E_URI        [0-9] [a-z] [A-Z] ! $ ' ( ) * + , - . _ & / : ; = ? @ ~ # [ ]
E_URI_FULL   [0-9] [a-z] [A-Z] - . _ ~
All characters with codes from x80 upwards are converted to their
UTF-8 representation. Depending on the character, this produces two to
four bytes, each represented in the form %hh, where hh is the
hexadecimal value of the byte.
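The two _FULL formats can be modeled in Python as percent-encoding of the UTF-8 bytes of every character outside the format's unconverted set. This is a sketch under assumptions: uppercase hexadecimal digits are assumed for %hh, and only E_URL_FULL and E_URI_FULL from the table are modeled.

```python
import string

_ALNUM = string.ascii_letters + string.digits

# Unconverted characters per format, taken from the table of URL/URI rules.
UNCONVERTED = {
    "E_URL_FULL": frozenset(_ALNUM + "!$'()*+,-._"),
    "E_URI_FULL": frozenset(_ALNUM + "-._~"),
}


def escape_url(text: str, fmt: str) -> str:
    """Percent-encode every character outside the format's unconverted set."""
    keep = UNCONVERTED[fmt]
    out = []
    for ch in text:
        if ch in keep:
            out.append(ch)
        else:
            # Codes x00-x7F yield one byte; higher codes yield their
            # multi-byte UTF-8 sequence. Uppercase hex is an assumption.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)
```

For E_URI_FULL, the unconverted set is the same as the set of characters that Python's urllib.parse.quote never encodes (Python 3.7 and later), so quote(text, safe="") should give the same result under the uppercase assumption. A lone surrogate makes encode("utf-8") raise UnicodeEncodeError, which loosely corresponds to CX_SY_CONVERSION_CODEPAGE_EX under Runtime Exceptions.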
Rules for JSON
The program DEMO_ESCAPE_JSON demonstrates the escape rules of the
format E_JSON_STRING for JSON. The special characters " and \ are
prefixed with the escape character \. The control characters with the
codes x08, x09, x0A, x0C, and x0D are escaped as \b, \t, \n, \f, and
\r respectively. All other codes less than x20 are converted to a
four-character hexadecimal representation prefixed by \u. No other
characters are affected.
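The E_JSON_STRING rules translate directly into a small Python function. In this sketch the hexadecimal digits after \u are assumed to be lowercase, since the rules above do not fix the case.

```python
# Named escapes for E_JSON_STRING as described above.
_NAMED = {'"': '\\"', "\\": "\\\\", "\b": "\\b", "\t": "\\t",
          "\n": "\\n", "\f": "\\f", "\r": "\\r"}


def escape_json_string(text: str) -> str:
    """Escape text for use between the quotes of a JSON string."""
    out = []
    for ch in text:
        if ch in _NAMED:
            out.append(_NAMED[ch])
        elif ord(ch) < 0x20:
            # Remaining control characters: \u plus four hex digits
            # (lowercase is an assumption).
            out.append(f"\\u{ord(ch):04x}")
        else:
            out.append(ch)
    return "".join(out)
```

Under that assumption, the result should coincide with what Python's json.dumps(text, ensure_ascii=False) produces between its enclosing quotes, because that function also escapes exactly ", \, and the codes below x20.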
Rules for Regular Expressions
The program DEMO_ESCAPE_REGEX demonstrates the escape rules of the
format E_REGEX for regular expressions. The special characters of
regular expressions are prefixed with the escape character \. The
control characters with the codes x08, x09, x0A, x0B, x0C, and x0D are
escaped as \b, \t, \n, \v, \f, and \r respectively.
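A Python sketch of these rules follows. The text above does not list the special characters of regular expressions, so ASSUMED_META is an assumption based on the usual POSIX extended-regex metacharacters and may differ from what E_REGEX actually escapes. The control-character mapping follows the rules above.

```python
# ASSUMPTION: the metacharacter set below is the usual POSIX ERE set;
# the characters that E_REGEX actually escapes may differ.
ASSUMED_META = frozenset("\\.[](){}^$|*+?")

# Control-character mapping as listed for E_REGEX.
_CTRL = {"\b": "\\b", "\t": "\\t", "\n": "\\n",
         "\v": "\\v", "\f": "\\f", "\r": "\\r"}


def escape_regex(text: str, meta: frozenset = ASSUMED_META) -> str:
    """Prefix metacharacters with a backslash and name the control characters."""
    out = []
    for ch in text:
        if ch in _CTRL:
            out.append(_CTRL[ch])
        elif ch in meta:
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)
```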
Rules for String Templates
The program DEMO_ESCAPE_STRING_TEMPLATE demonstrates the escape rules
of the format E_STRING_TPL for string templates. The special characters
of string templates (|, \, {, }) are prefixed with the escape
character \. The control characters with the codes x09, x0A, and x0D
are replaced by \t, \n, and \r respectively.
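Because the special characters are fully listed here, E_STRING_TPL can be modeled with a single lookup table. The Python function below is a sketch of that mapping; its name is hypothetical.

```python
# Replacement table for E_STRING_TPL as described above.
_TPL = {"|": "\\|", "\\": "\\\\", "{": "\\{", "}": "\\}",
        "\t": "\\t", "\n": "\\n", "\r": "\\r"}


def escape_string_tpl(text: str) -> str:
    """Escape the template delimiters, the backslash, and TAB/LF/CR."""
    return "".join(_TPL.get(ch, ch) for ch in text)
```

The intent of such a mapping is that the result can be placed between the | delimiters of a string template and still denote the original text.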
Rules for Cross Site Scripting
The program DEMO_ESCAPE_XSS demonstrates the escape rules of the
formats E_XSS_..., which are used to prevent Cross Site Scripting (XSS)
attacks on Web applications. There are rules for XML/HTML content,
JavaScript content, Cascading Style Sheets (CSS), and URL content.
The rules for XSS are more comprehensive than the rules for the
individual formats and differ in particular from the rules for markup
languages, including JavaScript (see above). These extended rules are
designed to protect ABAP programs against Cross Site Scripting when
content can be constructed from non-secure sources. They replace or
modify the transformations listed above as follows:
Markup languages: Format E_XSS_ML. All characters except [0-9],
[a-z], [A-Z], ",", "-", ".", "_", and control characters are
transformed to &#xhh; or &#xhhhh;, where hh or hhhh is the hexadecimal
value of the code. All control characters are transformed to &#xfffd;.
JavaScript: Format E_XSS_JS. All characters except [0-9], [a-z],
[A-Z], ",", ".", and "_" are transformed to \xhh or \uhhhh, where hh or
hhhh is the hexadecimal value of the code.
URL/URIs: Format E_XSS_URL. All characters except [0-9], [a-z],
[A-Z], "*", "-", ".", and "_" are transformed to %hh, where hh is the
hexadecimal value of the code. All characters with codes from x80
upwards are converted to their UTF-8 representation. Depending on the
character, this produces two to four bytes, each represented in the
form %hh, where hh is the hexadecimal value of the byte.
CSS: Format E_XSS_CSS. All characters except [0-9], [a-z], and [A-Z]
are transformed to \hh or \hhhh, where hh or hhhh is the hexadecimal
value of the code. A blank is inserted after hh or hhhh if the
following character is a valid hexadecimal digit.
If the name of a format from the class CL_ABAP_FORMAT has the
additional ending "_NU", all characters with codes greater than xFF
are converted to a four-character hexadecimal representation, marked
in a way that depends on the type of content.
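The E_XSS_ML and E_XSS_CSS rules can be sketched in Python as follows. This is an illustration under several assumptions that the descriptions above leave open, repeated in the code comment: lowercase hexadecimal digits, two digits for codes up to xFF and four above, and "control characters" meaning codes below x20. The function names are hypothetical, and how the formats without "_NU" treat codes above xFF is not spelled out, so the four-digit branch is an assumption too.

```python
import string

# Assumptions (not stated in the rules): lowercase hex digits,
# two digits for codes up to xFF and four above, "control characters"
# meaning codes below x20, and characters from the Basic Multilingual Plane.
_HEX_DIGITS = frozenset(string.hexdigits)
_ML_KEEP = frozenset(string.ascii_letters + string.digits + ",-._")
_CSS_KEEP = frozenset(string.ascii_letters + string.digits)


def _hex(code: int) -> str:
    return f"{code:02x}" if code <= 0xFF else f"{code:04x}"


def xss_ml(text: str) -> str:
    """Sketch of E_XSS_ML: numeric character references for other characters."""
    out = []
    for ch in text:
        if ord(ch) < 0x20:
            out.append("&#xfffd;")  # control characters become U+FFFD
        elif ch in _ML_KEEP:
            out.append(ch)
        else:
            out.append(f"&#x{_hex(ord(ch))};")
    return "".join(out)


def xss_css(text: str) -> str:
    """Sketch of E_XSS_CSS: backslash plus hex code for non-alphanumerics."""
    out = []
    for i, ch in enumerate(text):
        if ch in _CSS_KEEP:
            out.append(ch)
            continue
        out.append("\\" + _hex(ord(ch)))
        # A following hex digit would be read as part of the code,
        # so the escape is terminated with a blank in that case.
        if i + 1 < len(text) and text[i + 1] in _HEX_DIGITS:
            out.append(" ")
    return "".join(out)
```

The blank in the CSS case is needed because a CSS escape consumes up to six hexadecimal digits: without it, \3b1 would be read as the single code x3B1 rather than ";" followed by "1".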
Notes
The class CL_ABAP_DYN_PRG contains methods ESCAPE_XSS_... that wrap
calls of the predefined function escape with the E_XSS_... formats. It
is generally recommended to use the predefined function directly.
Using escape with the XSS rules is recommended as protection against
Cross Site Scripting, but it might not be secure enough in all cases.
For example, it may be best to check an unsafe URL against a whitelist,
so that phishing attacks can be detected as well as XSS. To rule out
code injection, never generate JavaScript dynamically from unsafe
sources.
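A whitelist check of the kind suggested above might look like the following Python sketch, which accepts a URL only if it uses HTTPS and its host is on a fixed list. ALLOWED_HOSTS and is_trusted_url are hypothetical names, and the actual whitelist would depend on the application.

```python
from urllib.parse import urlsplit

# Hypothetical whitelist: the only hosts this application links to.
ALLOWED_HOSTS = frozenset({"www.example.com", "help.example.com"})


def is_trusted_url(url: str) -> bool:
    """Accept only https URLs whose host is on the whitelist."""
    parts = urlsplit(url)
    # urlsplit lowercases the scheme; hostname is lowercased, excludes
    # user information and port, and is None if there is no host.
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS
```

For example, is_trusted_url("https://www.example.com@evil.test/") returns False: the part before @ is only user information, and the actual host is evil.test. Escaping such a URL would not reveal the phishing attempt, because it contains nothing that is dangerous in markup.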
Example
See String Functions, escape for HTML and String Functions, escape for
XSS.
Runtime Exceptions
Catchable Exceptions
CX_SY_CONVERSION_CODEPAGE_EX
Reason for error: A character cannot be converted to UTF-8. This can
only occur with characters from the surrogate area. The position and
code of the character are stored in the exception object.
Runtime error: CONVT_CHARACTER
CX_SY_STRG_PAR_VAL
Reason for error: Invalid value in format.
Runtime error: STRG_ILLEGAL_PAR
Documentation extract taken from SAP system, © Copyright SAP AG. All rights reserved.