
ABAP Keyword Documentation

SAP NetWeaver AS ABAP Release 731, Copyright 2015 SAP AG. All rights reserved.

Differences between Unicode and Non-Unicode Programs


The ABAP keyword documentation describes the ABAP statements for both Unicode and non-Unicode
systems. Only Unicode programs can be compiled and executed in Unicode systems. In non-Unicode systems,
non-Unicode programs can be compiled and executed as well. However, Unicode programs should also be used in non-Unicode systems, for the following reasons:

Static type checks are executed in Unicode programs.

Byte string processing and character string processing are separated in Unicode programs.

Structures are always handled as structures in Unicode programs.

Uncontrolled access to segments of the working memory is not possible in Unicode programs.

This makes Unicode programs easier to understand, more robust, and easier to maintain than non-Unicode
programs.
The following section lists the language constructs and statements for which there are differences between
Unicode and non-Unicode programs:

Comments and Literals in Non-Unicode Programs

Names in Unicode Programs

Program Structure of Unicode Programs

Operand Types in Unicode Programs

Alignment in Unicode Systems

Offset and Length Specifications in Unicode Programs

Access to Memory Sequences in Unicode Programs

Conversion of Structures in Unicode Programs

Structure Typing in Unicode Programs

Structure Enhancements and Unicode Programs

Character String and Byte String Processing in Unicode Programs

Function Module Calls in Unicode Programs

Open SQL in Unicode Programs

File Interface in Unicode Programs

Lists in Unicode Systems

Comments and Literals in Non-Unicode Programs


In non-Unicode systems, comments and literals should only contain characters that are available in all code
pages supported by SAP. In the worst case, a program can no longer be executed when it is used with a code page other
than the one in which it was created. We recommend using 7-bit ASCII characters only.
Note
In a Unicode system, all source code is stored in Unicode, so this problem does not occur there.
However, even in Unicode programs, do not use characters in comments and literals that cannot be
represented in non-Unicode systems, so that programs can be transported from a Unicode system to a non-Unicode system without losses during conversion.

Names in Unicode Programs


Only the following characters are allowed in names in Unicode programs:

1. The letters "A" through "Z"

2. The digits "0" through "9"

3. Underscores ("_")

For compatibility reasons, you can also use the characters "%", "$", "?", "-", "#", and "*", but these should be
used only in exceptional cases (for example, for existing program generations) and with good justification. You
can also use forward slashes ("/") for namespace prefixes.
Note
Outside of ABAP Objects, non-Unicode programs can also use characters other than the ones listed above.
This can cause the following problems in these programs:

If characters are used that are not available in all code pages supported by SAP, it might not be possible to run
certain programs when using a code page other than the one in which they were created.

String templates cannot be used in a non-Unicode program.

Program Structure of Unicode Programs


Unreachable statements (statements that are not assigned to any processing block) cause a syntax error in
Unicode programs. In non-Unicode programs, only a syntax warning is currently issued.

Operand Types in Unicode Programs


One of the most important differences between Unicode and non-Unicode programs is the clear distinction
between character-type data objects and byte-type data objects, and the restriction of data types whose objects
can be viewed as character-type. This has an influence on all statements in which character-type operands are
expected, and in particular on character string and byte string processing.

Character-type data objects


In Unicode programs, only the following elementary data objects are now character-type:
Data type   Meaning
c           Text field
d           Date field
n           Numeric text
t           Time field
string      Text string

In addition, structures are character-type if they contain only flat character-type components (only components
from the above table with the exception of text strings).
In Unicode programs, a structure can now essentially only be used at an operand position that expects a single
field if the structure is character-type. It is then handled in the same way as a data object of type c.
In non-Unicode programs, all flat structures and byte-type data objects are also still handled as character-type
data objects (implicit casting).
Note
The incorrect use of structures at operand positions is greatly restricted in Unicode programs. For example, a
structure that contains a numeric component can no longer be used at a numeric operand position.
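The following sketch illustrates the rule, assuming a Unicode program on release 7.31; the program name and the structures char_struc and mixed_struc are hypothetical. The purely character-like structure is accepted as the search area of FIND, whereas the same statement with the structure containing an i component would be rejected, so that line is left as a comment.

```abap
REPORT z_demo_char_struc.

" Flat structure with character-like components only (n, c, d)
DATA: BEGIN OF char_struc,
        id   TYPE n LENGTH 4  VALUE '0042',
        name TYPE c LENGTH 10 VALUE 'ABAP',
        date TYPE d           VALUE '20150101',
      END OF char_struc.

" Flat structure with a numeric component: not character-like
DATA: BEGIN OF mixed_struc,
        name  TYPE c LENGTH 10 VALUE 'ABAP',
        count TYPE i           VALUE 1,
      END OF mixed_struc.

START-OF-SELECTION.
  " Accepted: char_struc is handled like a single field of type c
  FIND 'ABAP' IN char_struc.
  IF sy-subrc = 0.
    WRITE / 'ABAP found in char_struc'.
  ENDIF.

  " Syntax error in a Unicode program, so it is commented out:
* FIND 'ABAP' IN mixed_struc.
```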

Byte-type data objects


In Unicode programs, elementary data objects of types x and xstring are byte-type. In non-Unicode
programs, data objects of these types are generally handled as character-type. Conversely, in non-Unicode
programs, character-type data objects are still expected at positions in which byte processing takes place (SET BIT, GET BIT, and the logical operators O,
Z, and M), while in Unicode programs only byte-type data objects are
permitted there.
Note
In Unicode programs, the storage of byte strings in character-type containers causes problems, as the byte
order of character-type data objects in Unicode systems is platform dependent. In non-Unicode systems, this
only applies to data objects of numeric data types. The content of the data objects is interpreted incorrectly if a
container of this type is stored persistently and is then imported to an application server with a different byte
order.
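A minimal sketch of byte processing with a byte-type operand, assuming a Unicode program; the program and variable names are hypothetical. Bit positions are counted from 1, starting at the leftmost bit.

```abap
REPORT z_demo_bits.

DATA: hex TYPE x LENGTH 1 VALUE '00',
      bit TYPE i.

START-OF-SELECTION.
  " Bit 8 is the lowest-order bit of the one-byte field
  SET BIT 8 OF hex.            " hex now contains 01
  GET BIT 8 OF hex INTO bit.   " bit now contains 1
  WRITE: / hex, bit.

  " With an operand of type c instead of x, SET BIT and
  " GET BIT are syntax errors in a Unicode program
```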

Alignment in Unicode Systems


In Unicode systems, alignment requirements apply not only to numeric data objects of types i, decfloat16,
decfloat34, f, and s and to deep data objects, but also to all character-like data types. For these, the alignment is
determined by the memory required by a single character.
As a consequence, in structures with components of different data types, the alignment gaps in Unicode
systems can differ from those in non-Unicode systems. For assignments and comparisons between structures, the
Unicode fragment view concept has been introduced, which divides a structure into fragments according to its
alignment gaps.
Note
Alignment gaps can also occur at the end of structures, as the overall length of the structure is determined by
the component with the largest alignment requirement.
Example
In the following structure, alignment gaps (A) occur in Unicode systems that are not present in non-Unicode
systems. The first alignment gap is caused by the alignment of the substructure struc2, the second
by the alignment of the component c of type c, and the third by the component d of type i.
DATA:
  BEGIN OF struc1,
    a TYPE x LENGTH 1,
    BEGIN OF struc2,
      b TYPE x LENGTH 1,
      c TYPE c LENGTH 6,
    END OF struc2,
    d TYPE i,
  END OF struc1.
Non-Unicode system: [ a | b | cccccc | dddd ]
Unicode system:     [ a | A | b | A | cccccccccccc | AA | dddd ]
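To inspect the alignment gaps on a given system, the byte offsets of the components can be determined with DESCRIBE DISTANCE. This sketch reuses the structure from the example; the program and variable names are hypothetical. The values printed depend on the system (Unicode or non-Unicode), so none are stated here.

```abap
REPORT z_demo_alignment.

DATA:
  BEGIN OF struc1,
    a TYPE x LENGTH 1,
    BEGIN OF struc2,
      b TYPE x LENGTH 1,
      c TYPE c LENGTH 6,
    END OF struc2,
    d TYPE i,
  END OF struc1.

DATA: off_c TYPE i,
      off_d TYPE i,
      len   TYPE i.

START-OF-SELECTION.
  " Byte offsets of components c and d from the start of struc1
  DESCRIBE DISTANCE BETWEEN struc1 AND struc1-struc2-c
           INTO off_c IN BYTE MODE.
  DESCRIBE DISTANCE BETWEEN struc1 AND struc1-d
           INTO off_d IN BYTE MODE.
  " Overall length of struc1 in bytes, including alignment gaps
  DESCRIBE FIELD struc1 LENGTH len IN BYTE MODE.
  WRITE: / off_c, off_d, len.
```

Running the same program in a Unicode and a non-Unicode system and comparing the three numbers shows where the gaps were inserted.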

Offset and Length Specifications in Unicode Programs


Offset/length specifications are made by appending [+off][(len)] to the name of a data object in an operand
position, and are used to access subareas of a data object. This type of programming is only partially
possible in Unicode systems because, for example, when structures with components of different data types are accessed,
it is not possible to define whether offset and length should be specified in
characters or in bytes. Furthermore, restrictions have been introduced that forbid access to memory areas outside
of flat data objects.

Offset/Length Specifications for Elementary Data Objects


Offset/length specifications are permitted for character-like data objects and byte-like data objects. The
specification of offset and length is interpreted either as a number of characters or as a number of bytes. The
rules that determine which data objects in Unicode programs count as character-like or byte-like objects do not
allow for offset/length specifications for data objects of numeric data types.
Note
In Unicode programs, data objects of type c can no longer be used as containers for structures of different types
(which are often not known until runtime) whose components are accessed using offset/length specifications. Instead of such
containers, the statement CREATE DATA can be used to generate data objects of any structure. To access existing containers,
they can be assigned to a field symbol using the CASTING addition of the statement ASSIGN. The COMPONENT addition of ASSIGN
can then be used to access components.
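The following sketch shows the CREATE DATA alternative mentioned in the note, assuming a Unicode program; the type t_flight, the program name, and the variables are hypothetical. The type is named dynamically to mimic a structure that is only known at runtime, and components are accessed by name with ASSIGN COMPONENT rather than by offset/length.

```abap
REPORT z_demo_create_data.

" Hypothetical structure that a type c container used to hold
TYPES: BEGIN OF t_flight,
         carrid TYPE c LENGTH 3,
         seats  TYPE i,
       END OF t_flight.

DATA: type_name TYPE string VALUE `T_FLIGHT`,
      dref      TYPE REF TO data.
FIELD-SYMBOLS: <struc> TYPE any,
               <comp>  TYPE any.

START-OF-SELECTION.
  " Create a data object of the actual structure type
  CREATE DATA dref TYPE (type_name).
  ASSIGN dref->* TO <struc>.

  " Access components by name instead of by offset/length
  ASSIGN COMPONENT 'CARRID' OF STRUCTURE <struc> TO <comp>.
  IF sy-subrc = 0.
    <comp> = 'LH'.
  ENDIF.
  ASSIGN COMPONENT 'SEATS' OF STRUCTURE <struc> TO <comp>.
  IF sy-subrc = 0.
    <comp> = 180.
  ENDIF.
```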

Offset/Length Specifications for Structures


An offset/length specification for a structure is only permitted in Unicode systems if the structure is either

character-like (meaning it only contains flat character-like components), or it is

flat, has a character-like initial fragment according to the Unicode fragment view, and the offset/length
specification accesses this initial fragment.

In both cases, the specification of offset and length is interpreted as a number of characters.
Example
The following structure has both character-like and non-character-like components:
DATA:
  BEGIN OF struc,
    a TYPE c LENGTH 3,      "Length 3 characters
    b TYPE n LENGTH 4,      "Length 4 characters
    c TYPE d,               "Length 8 characters
    d TYPE t,               "Length 6 characters
    e TYPE decfloat16,      "Length 8 bytes
    f TYPE c LENGTH 28,     "Length 28 characters
    g TYPE x LENGTH 2,      "Length 2 bytes
  END OF struc.

The Unicode fragment view splits the structure into five areas, F1 - F5.
[ aaa | bbbb | cccccccc | ddd | AAA | eeee | fffffffffffff | gg ]
[ F1                          | F2  | F3   | F4            | F5 ]
Offset/length access is only possible for the character-like initial fragment F1. Specifications such as
struc(21) or struc+7(14) are accepted and are handled as a single field of type c. An access such as
struc+57(2), for example, is not permitted in Unicode systems.

Offset/Length Specifications for Actual Parameters


For actual parameters specified in PERFORM, in Unicode programs, it is not possible to specify a memory area
outside of the actual parameter using offset/length specifications. In particular, it is no longer possible to specify
an offset without a length, as this would implicitly set the length of the actual parameter.
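A minimal sketch of the PERFORM rule, assuming a Unicode program; the program, the subroutine show, and the field text are hypothetical. The commented-out call is the form that is rejected.

```abap
REPORT z_demo_perform_offset.

DATA text TYPE c LENGTH 10 VALUE 'ABCDEFGHIJ'.

START-OF-SELECTION.
  " Offset and length both stay inside text: allowed
  PERFORM show USING text+2(3).

  " Offset without length: syntax error in a Unicode program
* PERFORM show USING text+2.

FORM show USING p TYPE c.
  WRITE / p.                       " CDE
ENDFORM.
```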

Offset/Length Specification for Field Symbols

When a memory area is assigned to a field symbol using the ASSIGN statement, in Unicode programs offset/length
specifications can only be used to access memory within the data object. The RANGE addition can be used to
specify this data object explicitly.
Field symbols themselves are also allocated an assignable memory area. This is effective if a field symbol is
used as a source in the ASSIGN statement.
In non-Unicode programs, the assignable area is defined by the data area of the current program, which can
lead to references being overwritten.
If a data object is entered as a source in ASSIGN, no offset can be specified without a length unless the explicit
RANGE addition is specified. Otherwise, this would implicitly set the length of the data object. If the name of a
field symbol is specified, its data type in Unicode programs must be flat and elementary if an offset is specified
without a length.
Note
Previously, cross-field offset/length accesses could be usefully implemented in the ASSIGN statement for
processing repeating groups in structures. In order to enable this in Unicode systems, the ASSIGN statement
has been enhanced with the additions RANGE and INCREMENT.
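The following sketch processes a repeating group with ASSIGN ... INCREMENT ... RANGE, assuming a Unicode program; the structure week and all other names are hypothetical. It also assumes, as we read the ASSIGN documentation, that an assignment whose memory area would lie outside the RANGE area is not executed and sets sy-subrc to 4 instead of raising an exception.

```abap
REPORT z_demo_assign_increment.

" Repeating group: four components of the same type and length
DATA: BEGIN OF week,
        day1 TYPE c LENGTH 3 VALUE 'MON',
        day2 TYPE c LENGTH 3 VALUE 'TUE',
        day3 TYPE c LENGTH 3 VALUE 'WED',
        day4 TYPE c LENGTH 3 VALUE 'THU',
      END OF week.
DATA idx TYPE i.
FIELD-SYMBOLS <day> TYPE c.

START-OF-SELECTION.
  DO 10 TIMES.
    idx = sy-index - 1.
    " Memory area: length of week-day1, moved idx times that length;
    " RANGE week is the area the assignment must stay within
    ASSIGN week-day1 INCREMENT idx TO <day> RANGE week.
    IF sy-subrc <> 0.
      EXIT.                        " the next step would leave week
    ENDIF.
    WRITE / <day>.
  ENDDO.
```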

Access to Memory Sequences in Unicode Programs


The following (obsolete) statements access data objects that are stored in the memory as an equally spaced
sequence:

DO ... VARYING

WHILE ... VARY

ADD ... THEN ... UNTIL

ADD ... FROM ... TO

In the DO and WHILE loops in Unicode programs, all data objects of the sequence must be compatible and
either be structure components that belong to the same structure, or subareas of the same data object
specified using offset/length specifications. In Unicode programs, a RANGE must also be entered if it cannot be
statically recognized that the data objects involved are components of the same structure. Otherwise, the
permitted memory area is determined from the smallest possible substructure.
When memory sequences are added using ADD, in Unicode programs, all data objects of the sequence must
be components of a structure. If this cannot be statically recognized in the syntax check, a structure must be
specified using the addition RANGE.
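A sketch of an (obsolete) DO ... VARYING loop that satisfies the Unicode rules, assuming a Unicode program; the structure words and the field word are hypothetical. All elements of the sequence are compatible components of the same structure, and RANGE names that structure explicitly, although here it could also be recognized statically.

```abap
REPORT z_demo_do_varying.

" Sequence of equally spaced, compatible components
DATA: BEGIN OF words,
        w1 TYPE c LENGTH 4 VALUE 'AAAA',
        w2 TYPE c LENGTH 4 VALUE 'BBBB',
        w3 TYPE c LENGTH 4 VALUE 'CCCC',
      END OF words.
DATA word TYPE c LENGTH 4.

START-OF-SELECTION.
  " FROM/NEXT define the first element and the step size;
  " RANGE words restricts the sequence to the structure words
  DO 3 TIMES VARYING word FROM words-w1 NEXT words-w2 RANGE words.
    WRITE / word.
  ENDDO.
```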

Conversion of Structures in Unicode Programs


The most important differences between the behaviors of Unicode programs and non-Unicode programs are
the changed conversion rules for structures for assignments and for comparisons.
Note

Two structures in Unicode programs are only compatible when all alignment gaps are identical on all platforms.
This applies in particular to alignment gaps that are created by included structures (INCLUDE).

Assignments Between Flat Structures


In non-Unicode programs, incompatible flat structures are treated as data objects of the type c, whereas in
Unicode programs, conversion rules apply which assign the most important role to the Unicode fragment view
of the structures.

Assignments Between Flat Structures and Single Fields


Non-Unicode programs always handle flat structures as data objects of the type c when assigning from and to
elementary data objects. In Unicode programs, however, a conversion rule applies, stating that the structure
must be character-like (at the very least in its initial fragment).
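A minimal sketch of an assignment between a flat structure and a single field, assuming a Unicode program; address, line, and the program name are hypothetical. Because the structure is purely character-like, it is handled like a field of type c in both directions.

```abap
REPORT z_demo_struc_to_field.

" Purely character-like flat structure
DATA: BEGIN OF address,
        name TYPE c LENGTH 20 VALUE 'Miller',
        city TYPE c LENGTH 20 VALUE 'Walldorf',
      END OF address,
      line TYPE c LENGTH 40.

START-OF-SELECTION.
  " Structure to single field: name fills the first 20 characters
  line = address.
  WRITE / line.

  " Single field to structure: the content is split up again
  CLEAR address.
  address = line.
  WRITE: / address-name, address-city.
```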

Comparisons Between Incompatible Flat Structures


As is the case with assignments, the structures are not handled as c fields, but in accordance with their
Unicode fragment view (see Comparison Rules Between Operands).

Comparisons Between Flat Structures and Single Fields


As is the case with assignments, the system checks whether the structure is character-like, at the very least in
its initial fragment (see Comparison Operators for All Data Types).

Structure Typing in Unicode Programs


For downward compatibility reasons, a structure can still be cast for field symbols and parameters of function
modules and subroutines using the obsolete addition STRUCTURE.
When assigning a data object to such a field symbol or passing an actual parameter to such a formal
parameter, in non-Unicode programs, the system only checks whether the length of the data object or actual
parameter has at least the length of the structure and whether the alignment is identical at runtime. Unicode
programs distinguish between structured and elementary data objects or actual parameters. For a
structured data object or actual parameter, its Unicode fragment view must match that of the cast structure, including
all alignment gaps (including those at the end). An elementary data object or actual parameter must
be character-like and flat.
When a formal parameter of a function module is typed with a flat structure using LIKE instead of TYPE, LIKE
has the same effect as STRUCTURE. However, in non-Unicode programs, the system checks the exact length when the
parameters are passed.
Note
The check of the Unicode fragment view can avoid problems that occur in non-Unicode systems due to closing
alignment gaps. This can include the non-type-compliant filling of actual parameters with the content of an
alignment gap.
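The following sketch shows the obsolete STRUCTURE typing with an elementary actual parameter, assuming a Unicode program; pair, text, and the subroutine show are hypothetical. The actual parameter is character-like and flat and has the same length as the structure, so it is cast to pair inside the subroutine.

```abap
REPORT z_demo_structure_typing.

" Program-local structure used for the STRUCTURE typing
DATA: BEGIN OF pair,
        key TYPE c LENGTH 4,
        val TYPE c LENGTH 8,
      END OF pair.

" Elementary actual parameter: character-like and flat
DATA text TYPE c LENGTH 12 VALUE 'K001VALUE001'.

START-OF-SELECTION.
  PERFORM show USING text.

FORM show USING p STRUCTURE pair.
  " p is cast to the structure of pair
  WRITE: / p-key, p-val.           " K001 and VALUE001
ENDFORM.
```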

Structure Enhancements and Unicode Programs

ABAP Dictionary structures and database tables that are delivered by SAP can be enhanced using customizing
includes or append structures. These types of changes cause problems in Unicode programs if the
enhancements change the Unicode fragment view.
For this reason, structures and database tables can be classified, which makes it possible
to recognize and handle problems related to structure enhancements. The program check uses this classification
to issue a warning at all points where the program works with structures and where later
structure enhancements can cause syntax errors or changes in program behavior. When you define a structure
or a database table in ABAP Dictionary, you can assign one of the enhancement categories listed in the
following table.
Level  Category                                         Meaning
1      Unclassified                                     The structure does not have an enhancement category.
2      Cannot be enhanced                               The structure must not be enhanced.
3      Can be enhanced and character-like               All structure components and their enhancements must be character-like and flat.
4      Can be enhanced and character-like or numeric    All structure components and their enhancements must be flat.
5      Can be enhanced in any way                       All structure components and their enhancements can have any data type.

The warnings displayed after the program check are classified into the three levels in the following table,
depending on the consequences of the permitted structure enhancements.
Level  Type of check    Meaning
A      Syntax check     An enhancement that fully utilizes the enhancement category of the structure in question leads to a syntax error.
B      Extended check   Permitted enhancements can lead to syntax errors, but not always.
C      Extended check   Permitted enhancements cannot lead to syntax errors, but they can change program behavior and so cause semantic problems.
Example
If the structure ddic_struc in ABAP Dictionary is defined only with flat components but is classified as Can
be enhanced in any way, then the following program section leads to a warning in the syntax check. If the
structure were to be enhanced by a deep component after the program was delivered, the program would be
syntactically incorrect and no longer executable. This is why in this case you must either classify the
structure ddic_struc in ABAP Dictionary as Can be enhanced and character-like or avoid offset/length
specifications for it in the program.
DATA: my_struc TYPE ddic_struc,
str TYPE string,
off TYPE i,
len TYPE i.
...
str = my_struc+off(len).

Character String and Byte String Processing in Unicode Programs

In Unicode programs, character string and byte string processing are strictly separated. The operands of
character string processing must be character-like data objects, and operands in byte string processing must
be byte-like data objects. In non-Unicode programs, byte strings are normally handled in the same way as
character strings.

Syntactic Separation

Statements for Character String and Byte String Processing


In Unicode programs, statements that can be used for both character string and byte string processing
use the optional addition IN CHARACTER|BYTE MODE to distinguish between the two. IN CHARACTER MODE is the default.
Note
The addition IN CHARACTER|BYTE MODE is also used in the statements for determining length and offset:

DESCRIBE FIELD ... LENGTH

DESCRIBE DISTANCE

In Unicode programs, the addition is mandatory in these statements.
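A sketch of the mode additions, assuming a Unicode program; all names are hypothetical. The byte length of a character-like field depends on how many bytes a character occupies on the system, so no value is stated for it.

```abap
REPORT z_demo_modes.

DATA: text TYPE c LENGTH 10 VALUE 'ABAP',
      hex  TYPE x LENGTH 2  VALUE 'CAFE',
      xstr TYPE xstring,
      len  TYPE i.

START-OF-SELECTION.
  " Length in characters: only allowed for character-like fields
  DESCRIBE FIELD text LENGTH len IN CHARACTER MODE.   " 10
  WRITE / len.

  " Length in bytes: depends on the bytes per character
  DESCRIBE FIELD text LENGTH len IN BYTE MODE.
  WRITE / len.

  " Byte-like operands are processed IN BYTE MODE
  DESCRIBE FIELD hex LENGTH len IN BYTE MODE.         " 2
  WRITE / len.
  CONCATENATE hex hex INTO xstr IN BYTE MODE.         " CAFECAFE
  WRITE / xstr.
```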

Relational Operators for Character Strings and Byte Strings


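Relational operators exist both for character strings (CO, CN, CA, NA, CS, NS, CP, NP) and for byte strings
(BYTE-CO, BYTE-CN, BYTE-CA, BYTE-NA, BYTE-CS, BYTE-NS). In Unicode programs, the operators for character strings
can no longer be used for byte strings.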
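A minimal sketch of a byte string comparison, assuming a Unicode program; the names are hypothetical. The comparison operand is declared as a byte-like constant, because a text field literal would be character-like.

```abap
REPORT z_demo_byte_compare.

CONSTANTS c_byte TYPE x LENGTH 1 VALUE '0B'.
DATA xstr TYPE xstring.

START-OF-SELECTION.
  xstr = '0A0B0C'.     " hexadecimal digits converted to 3 bytes
  " BYTE-CA: true if xstr contains at least one byte of c_byte
  IF xstr BYTE-CA c_byte.
    WRITE / 'xstr contains byte 0B'.
  ENDIF.
  " CA (for character strings) with these byte-like operands
  " would be a syntax error in a Unicode program
```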

Functions for Character Strings and Byte Strings


The description functions are divided into description functions for character strings and description functions
for byte strings. In particular, in Unicode programs, strlen can now only be used for character-like arguments,
while xstrlen is available for byte-like arguments.
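A sketch contrasting strlen and xstrlen, assuming a Unicode program; the variable names are hypothetical. The text literal '414241' is converted to three bytes when it is assigned to the byte string.

```abap
REPORT z_demo_strlen.

DATA: str  TYPE string VALUE `ABAP`,
      xstr TYPE xstring,
      clen TYPE i,
      blen TYPE i.

START-OF-SELECTION.
  xstr = '414241'.
  clen = strlen( str ).      " 4 characters
  blen = xstrlen( xstr ).    " 3 bytes
  WRITE: / clen, blen.
```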

Function Module Calls in Unicode Programs


In Unicode programs, a handleable exception is raised in a general function module call if an incorrect formal
parameter is specified and the name of the function module is specified as a constant or a literal. If the
name of the function module is specified by a variable, or in non-Unicode programs, an incorrect formal
parameter is ignored.
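A sketch of the exception described above, assuming a Unicode program. Z_DEMO_FUNCTION is a hypothetical function module that is assumed to exist without a formal parameter named WRONG_PARAM. To our knowledge, CX_SY_DYN_CALL_PARAM_NOT_FOUND is the exception class raised in this case, but check it against your release.

```abap
REPORT z_demo_fm_param.

START-OF-SELECTION.
  TRY.
      " Name specified as a literal: the wrong parameter is detected
      CALL FUNCTION 'Z_DEMO_FUNCTION'
        EXPORTING
          wrong_param = 1.
    CATCH cx_sy_dyn_call_param_not_found.
      WRITE / 'Unknown formal parameter WRONG_PARAM'.
  ENDTRY.
```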

Open SQL in Unicode Programs


When work areas are used in Open SQL statements, in non-Unicode programs, their structure is not taken into
account. Only the length and the alignment are checked.
In Unicode programs, for structured work areas the Unicode fragment view must be correct, and elementary
work areas must be character-type.
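A minimal sketch of a work area whose Unicode fragment view matches by construction, assuming a Unicode program and the table SCARR of the flight data model. SCARR is present in many systems but is an assumption here. Typing the work area with the table itself avoids any mismatch.

```abap
REPORT z_demo_open_sql.

" Work area with exactly the structure of the database table
DATA wa TYPE scarr.

START-OF-SELECTION.
  SELECT SINGLE * FROM scarr INTO wa WHERE carrid = 'LH'.
  IF sy-subrc = 0.
    WRITE: / wa-carrid, wa-carrname.
  ENDIF.
```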

The File Interface in Unicode Programs


Since the content of files frequently reflects the structure of data in the working memory, the file interface in a
Unicode system must fulfill the following requirements:

It must be possible to exchange data between Unicode and non-Unicode systems.

It must be possible to exchange data between different Unicode systems.

It must be possible to exchange data between different non-Unicode systems that use different code pages.

For this reason, in Unicode programs, you must always define the code page used to encode the character-type data that is written to or read from text files.
You must also consider that a Unicode program must be executable in a non-Unicode system as well as a
Unicode system. Some of the syntax rules for the file interface have therefore been modified so that
programming data access in Unicode programs is less prone to errors than in non-Unicode programs.

Before every read or write access, a file must be opened explicitly using OPEN DATASET. Furthermore, a file
that is already open cannot be opened again. In non-Unicode programs, a file that has not been opened is
opened implicitly with the standard settings the first time it is accessed. Also in non-Unicode programs, the statement
for opening a file can be applied to a file that is already open, although a file can only be open once within a program.

When opening the file, the access type and type of file storage must be specified explicitly using the following
additions:

INPUT|OUTPUT|APPENDING|UPDATE

[LEGACY] BINARY|TEXT MODE

When opening a file in TEXT MODE, the ENCODING addition must be used to specify the character
representation. When opening a file in LEGACY MODE, the byte order (endian) and a non-Unicode code page
must be specified.
In non-Unicode programs, if nothing is entered, a file is opened with implicit standard settings.

If a file is opened for reading, the content can only be read. In non-Unicode programs, it is also possible to gain
write access to these files.

If a file is opened as a text file, only the content of character-type data objects can be read or written. In non-Unicode programs, byte-type and numeric data objects are also allowed.

Note
In Unicode programs, file names can also contain blank characters.
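The following sketch applies these rules, assuming a Unicode program; the file path and program name are hypothetical, and the path must point to a directory that the application server may write to. Each access is preceded by an explicit OPEN DATASET that states the access type, the storage type, and the encoding.

```abap
REPORT z_demo_dataset.

" Hypothetical path on the application server
DATA: file TYPE string VALUE `/tmp/z_demo_unicode.txt`,
      line TYPE string.

START-OF-SELECTION.
  " Access type, storage type, and encoding are all explicit
  OPEN DATASET file FOR OUTPUT IN TEXT MODE ENCODING UTF-8.
  IF sy-subrc = 0.
    line = `Hello`.
    TRANSFER line TO file.      " TEXT MODE: character-like data only
    CLOSE DATASET file.
  ENDIF.

  " The file is opened again explicitly, this time for reading
  OPEN DATASET file FOR INPUT IN TEXT MODE ENCODING UTF-8.
  IF sy-subrc = 0.
    READ DATASET file INTO line.
    WRITE / line.
    CLOSE DATASET file.
  ENDIF.
```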

Lists in Unicode Systems


Introduction
A WRITE statement writes the content of data objects to a list. When data is written with a WRITE statement,
the output is stored in the list buffer and accessed from there for display when the list is called.

Each time a data object is produced by WRITE, the system defines an output length either implicitly or explicitly;
the implicit output length depends on the data type. The output length defines the following two attributes:

Number of positions or memory spaces available for characters in the list buffer

Number of columns or cells available in the actual list

If the output length is shorter than the length of the data object, the system shortens its content according to
certain rules when writing the data to the list buffer. Any values lost in numeric fields are indicated by a *.
When displaying or printing a list, the content stored in the list buffer is transferred to the list as follows:

In non-Unicode systems, each character occupies the same amount of space in the list buffer as it requires
columns in the list. In single-byte systems, a character occupies one byte in the list buffer and one column in the
list, while a character that occupies several bytes in the list buffer in multi-byte systems also occupies the same
number of columns in the list. For this reason, all the characters stored in the list buffer are displayed in the list in
non-Unicode systems.

In Unicode systems, every character usually occupies one position in the list buffer. However, a character can also
occupy more than one column in the list, as is the case for East Asian characters. Since the list only contains
the same number of columns as there are positions in the list buffer, the number of characters that can be
displayed in the list is then smaller than the number of characters stored in the list buffer. List output is
truncated accordingly, in line with the specified alignment, and the truncation is marked with the
characters > or <. You can then display the entire content of the list only by choosing the menu path System →
List → Unicode Display.

For this reason, the horizontal position of the list cursor corresponds to the output column of a displayed or printed
list only in non-Unicode systems. In Unicode systems, this correspondence is only guaranteed for the upper and lower
output limits of a field.

Rules for WRITE Statements


To avoid cutting off values unintentionally as far as possible, the rules for WRITE statements in Unicode
programs have been modified and extended.

Operands in the WRITE Statement


If the data object specified in WRITE is a flat structure, this must be purely character-like in Unicode programs.
Note
This also applies for the statement WRITE TO, in which the target field must also be character-like.

WRITE Statements with Implicit Output Length


In Unicode programs, WRITE statements without an explicitly specified output length behave in the same way as in
non-Unicode programs for all data objects except text field literals and data objects of type string.
This means that fewer characters may be displayed in the list than are stored in the list buffer.
In the case of text field literals and data objects of the type string, the system assumes that all characters are
to be displayed. For this reason the implicit output length is calculated using the characters contained in the
data object so that it corresponds to the number of columns needed in the list. If this output length is greater
than the length of the data object, surplus positions are filled with blanks when the data is written to the list
buffer. When displaying the data in the list, the system removes these blanks, since the character
representation fills the output length exactly.

WRITE Statements with Explicit Output Length


If a numeric data object is specified as an explicit output length after the AT addition of a WRITE statement, the
value of this number is used as the output length, both in Unicode and non-Unicode systems. In Unicode
systems, the number of characters displayed in the list can differ from the number of characters stored in the
list buffer. Instead of using numeric data objects, you can specify the output length in the following ways:

1. WRITE AT (*) ...

   1. For data objects of types c and string, the output length is set to the number of columns required to
      display the entire content in the list; trailing blanks are ignored for type c. For data objects of
      type string, this has the same meaning as the implicit length.

   2. For data objects of types d and t, the output length is set to 10 and 8, respectively.

   3. For data objects of the numeric types i, f, and p, the output length is set to the value required to display
      the current value including thousands separators. This rule is applied to the value after any CURRENCY,
      DECIMALS, NO-SIGN, ROUND, or UNIT additions have been applied.

   4. The implicit output length is used for data objects of types n, x, and xstring.

2. WRITE AT (**) ...

   1. For data objects of type c, the output length is set to twice the length of the data object, and for data
      objects of type string, to twice the number of characters contained in the object.

   2. For data objects of types d and t, the output length is set to 10 and 8, respectively.

   3. For data objects of the numeric types i, f, and p, the output length is set to the value required to
      display the maximum possible values for these types, including plus and minus signs and thousands
      separators. This rule is applied to the value after any CURRENCY, DECIMALS, NO-SIGN, ROUND, or
      UNIT additions have been applied.

   4. The implicit output length is used for data objects of types n, x, and xstring.

The behavior of the output lengths (*) and (**) when using the addition USING EDIT MASK and the
templates for date fields is described in Formatting Options.
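A sketch of the different output lengths, assuming a Unicode program; the names are hypothetical. The comments restate the rules above rather than show verified list output.

```abap
REPORT z_demo_write_length.

DATA: text TYPE c LENGTH 20 VALUE 'ABAP',
      str  TYPE string VALUE `ABAP`,
      num  TYPE i VALUE 1234567.

START-OF-SELECTION.
  WRITE: / text,         " implicit output length: 20
         /(*)  text,     " columns needed for 'ABAP', trailing blanks ignored
         /(**) text,     " twice the length of text: 40
         /(*)  str,      " same as the implicit length for strings
         /(*)  num,      " current value incl. thousands separators
         /(**) num.      " maximum value of type i incl. sign and separators
```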

Additions for GET/SET CURSOR FIELD/LINE


The additions DISPLAY OFFSET and MEMORY OFFSET take account of the fact that data objects can occupy
different lengths when displayed in a list and when stored temporarily in the list buffer.
Accordingly, in the statement SET CURSOR { FIELD f | LINE l }, the addition DISPLAY OFFSET off positions the cursor
in the column of the output area specified in off. The addition MEMORY OFFSET off positions the cursor on the
character of the output area that is stored at the position in the list buffer (of the data object in f) specified in off.
In the same way, in a GET CURSOR { FIELD f | LINE l } statement, the addition DISPLAY OFFSET off returns the
cursor position in the output area in the data object off. With the addition MEMORY OFFSET off, the position in the
list buffer of the character under the cursor is returned in the data object off. DISPLAY is the default and can be omitted.
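A minimal sketch of the two offset additions in an interactive list, assuming a Unicode program; the names are hypothetical. Double-clicking the list line triggers AT LINE-SELECTION, which reads the cursor position both ways.

```abap
REPORT z_demo_cursor.

DATA: text  TYPE c LENGTH 20 VALUE 'ABAP list output',
      fname TYPE c LENGTH 30,
      doff  TYPE i,
      moff  TYPE i.

START-OF-SELECTION.
  WRITE / text.

AT LINE-SELECTION.
  " Column within the displayed output area
  GET CURSOR FIELD fname DISPLAY OFFSET doff.
  " Position within the field content in the list buffer
  GET CURSOR FIELD fname MEMORY OFFSET moff.
  WRITE: / fname, doff, moff.
```

For characters that occupy one column each, the two values coincide; they can differ when the field contains characters that occupy more than one column.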

Class for Formatting Lists


Class CL_ABAP_LIST_UTILITIES has been introduced to calculate output lengths, convert values from the list
buffer, and determine field limits. The return values of the methods of this class can be used to program correct
column alignment in ABAP lists, even for output of East Asian characters.
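A sketch of using the class for a column-exact output length, assuming a Unicode program; the program and variable names are hypothetical. The static method DYNAMIC_OUTPUT_LENGTH is, to our knowledge, part of CL_ABAP_LIST_UTILITIES, but its exact signature is an assumption here and should be checked in the Class Builder.

```abap
REPORT z_demo_list_utilities.

DATA: text TYPE string VALUE `ABAP`,
      len  TYPE i.

START-OF-SELECTION.
  " Number of list columns needed to display text completely
  " (assumed method; verify its signature in your system)
  len = cl_abap_list_utilities=>dynamic_output_length( text ).
  WRITE AT /(len) text.
```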

List Settings
The objects in a list can be displayed in different output lengths by specifying the desired length under the menu path
System → List → Unicode Display. This is particularly useful for screen lists in Unicode systems
where the output is truncated, as indicated by the characters > or <.

Recommendations
We recommend that you adhere to the following rules when programming lists, to ensure that they have the
same appearance and functions both in Unicode and non-Unicode systems:

Specify an adequate output length

Do not overwrite parts of a field

Do not use the additions RIGHT-JUSTIFIED or CENTERED for WRITE TO if this statement is followed by
list output with WRITE.

In customer-programmed horizontal scrolling with a SCROLL statement, you should only specify the upper or
lower limit of data objects displayed, since the positions in the list buffer and in the list displayed are only certain
to match for these field limits in Unicode systems.