Contents
• Unicode Overview
• What is Unicode
• Need for Unicode
• Unicode Character formats
• Unicode @ SAP
• SAP Pre-Unicode solutions
• Why SAP adopted Unicode
What is Unicode?
• Unicode Standard:
The Unicode Standard is a character coding system designed to support the
worldwide interchange, processing and display of written text of the diverse
languages and technical disciplines of the modern world.
In addition, it supports classical and historical texts of many written
languages.
• The Unicode Consortium:
The Consortium cooperates with W3C and ISO, and has liaison status "C" with
ISO/IEC JTC1/SC2/WG2, which is responsible for refining the specification and
expanding the character set of ISO/IEC 10646.
What is Unicode? (contd.)
– It is an open character set, which means that it keeps growing and adding
less frequently used characters.
– The remaining 75% of the code space is open for future use and is not
expected to fill up: at last, a character set with plenty of space!
Unicode Overview
• Even for a single language like English, no single encoding was adequate for
all the letters, punctuation, and technical symbols in common use.
• These encoding systems conflict with one another. That is, two encodings
can use the same number for two different characters, or use different
numbers for the same character.
• Programs are written either to handle a single encoding at a time and
switch between them, or to convert between external and internal encodings.
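The conflict between legacy encodings described above can be reproduced with a few well-known code pages. The following is a minimal Python sketch; the code pages chosen (Latin-1, ISO 8859-7, Windows-1251, CP437) are illustrative assumptions and do not come from the slides.

```python
# The same number stands for different characters in different code pages.
raw = bytes([0xE4])
as_latin1 = raw.decode("latin-1")      # Western European
as_greek = raw.decode("iso8859_7")     # Greek
as_cyrillic = raw.decode("cp1251")     # Cyrillic (Windows)

# The same character gets different numbers in different code pages.
a_umlaut = "\u00e4"                    # LATIN SMALL LETTER A WITH DIAERESIS
a_umlaut_latin1 = a_umlaut.encode("latin-1")
a_umlaut_cp437 = a_umlaut.encode("cp437")

# In Unicode, each of these characters has exactly one code point.
code_points = {ch: f"U+{ord(ch):04X}" for ch in (as_latin1, as_greek, as_cyrillic)}
print(code_points)
```

A program that reads the byte 0xE4 therefore cannot know which character is meant unless it also knows the code page, which is the situation Unicode removes.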
Need For Unicode (contd.)
• For strings, however, storing 32 bits for each character takes up too much
space, especially considering that the highest value, 0x10FFFF, takes up only
21 bits. 11 bits are always unused in a 32-bit word that stores a Unicode code
point.
• Therefore, you will find that software generally uses 16-bit or 8-bit units as a
compromise, with a variable number of code units per Unicode code point.
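The trade-off between 8-bit, 16-bit and 32-bit code units can be checked directly. The sketch below uses Python's built-in codecs; the helper name code_units and the sample code points are assumptions chosen for illustration.

```python
def code_units(ch: str) -> dict:
    """Count the code units needed for one code point in each encoding form."""
    return {
        "UTF-8": len(ch.encode("utf-8")),            # 8-bit units
        "UTF-16": len(ch.encode("utf-16-le")) // 2,  # 16-bit units
        "UTF-32": len(ch.encode("utf-32-le")) // 4,  # 32-bit units
    }

# The highest code point fits in 21 bits, leaving 11 of 32 bits unused.
highest_bits = (0x10FFFF).bit_length()

# Sample code points: a Latin letter, an umlaut, the euro sign, an emoji.
for cp in (0x41, 0xE4, 0x20AC, 0x1F600):
    print(f"U+{cp:04X}", code_units(chr(cp)))
```

UTF-32 always needs one unit per code point, while UTF-8 and UTF-16 need more units only for code points with higher values.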
Unicode Character Formats (contd.)
Unicode @ SAP
• Each user can access only one code page at a time: a user who logs on as a
Japanese user cannot enter German characters, and German characters in
the database will not be displayed correctly.
SAP Pre-Unicode Solutions (contd.)
• It is possible for a user to log on with German and then manipulate the character
set and font settings so that they can enter what appear to be Japanese characters;
these characters will not be stored correctly in the database, and the data will be
corrupt.
• If a user wants to enter, for instance, Japanese, he/she must log on in Japanese.
• To ensure that no data corruption occurs, the following restrictions must be
observed:
– Global data must contain only 7-bit ASCII characters, which are in all
code pages
– Users may use only the characters of their log-in language or 7-bit ASCII
– Batch processes must be assigned the correct user ID and language
– EBCDIC code pages are not supported
View in Different Code Pages
Recommendations From SAP (Pre-Unicode)
• In general, using a single standard code page for new installations and
upgrades is the optimal decision
• If additional languages or language combinations are needed, SAP
recommends Unambiguous Blended Code Pages for new installations and
MDMP for existing installations.
• Unambiguous Blended Code Pages only support certain language
combinations and therefore an MDMP setup may be the only possibility for
new installations as well.
Unicode @ SAP
• Enable a single set of source code to be written to process data in virtually all
languages
Overview
• Each character is mapped using 16 bits (= 2 bytes), which offers a maximum of
2^16 (65,536) bit combinations
• Data Types
• Data Layout of Structures
• Unicode Fragment Views
Data Types
• For several data types (like F and I), the memory address must be a
multiple of 4 or 8; for character-type fields, it must be a multiple of 2 or 4,
depending on the Unicode representation.
• Examples:
BEGIN OF struc2,
  a(1) TYPE X,
  BEGIN OF struc3,
    b(1) TYPE X,
    c(6) TYPE C,
  END OF struc3,
  d TYPE I,
END OF struc2.
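To see why alignment matters for a structure such as struc2, the padding can be worked out by hand. The following Python sketch models the layout with C-like alignment rules. This is an assumption made for illustration: the actual layout in an ABAP system depends on the platform and Unicode representation. The functions layout and struc2_layout are hypothetical helpers, and type I is assumed to occupy 4 bytes with 4-byte alignment.

```python
def layout(fields):
    """Place (name, size, alignment) fields in order, padding to each alignment.

    Returns (offsets, total_size, alignment_of_the_whole_structure).
    """
    offset, offsets, max_align = 0, {}, 1
    for name, size, align in fields:
        offset = -(-offset // align) * align     # round up to a multiple of align
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    total = -(-offset // max_align) * max_align  # trailing padding
    return offsets, total, max_align


def struc2_layout(char_size):
    """Model struc2 when one character occupies char_size bytes."""
    # struc3: b(1) TYPE X, c(6) TYPE C
    s3_offsets, s3_size, s3_align = layout(
        [("b", 1, 1), ("c", 6 * char_size, char_size)]
    )
    # struc2: a(1) TYPE X, struc3, d TYPE I (assumed 4 bytes, 4-byte aligned)
    offsets, size, _ = layout(
        [("a", 1, 1), ("struc3", s3_size, s3_align), ("d", 4, 4)]
    )
    return offsets, s3_offsets, size


print("1 byte per character: ", struc2_layout(1))
print("2 bytes per character:", struc2_layout(2))
```

With 2-byte characters, the model inserts a gap before struc3 and before c, so offsets that were valid in a single-byte system no longer point at the same components.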
Unicode Fragment View
Restrictions in Unicode
Overview
• Assignments
• Comparison operators (character-type operands / byte-type operands):
CO / BYTE-CO
CN / BYTE-CN
CA / BYTE-CA
NA / BYTE-NA
CS / BYTE-CS
NS / BYTE-NS
CP / BYTE-CP
NP / BYTE-NP
Processing Strings (contd.)
STRLEN: can be used only on character-type fields and returns the length in characters
• Opening Files:
The OPEN DATASET dsn ... statement must include at least one of the
following additions:
• Bit Statements:
SET BIT i OF f [TO g].
GET BIT i OF f [INTO g].
f O x, f Z x, and f M x .
The field f must be either X or XSTRING in all the above bit operations.
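The effect of SET BIT and GET BIT on a byte field can be emulated outside ABAP. The Python sketch below is only an analogy: the functions set_bit and get_bit are hypothetical, and the numbering convention (bit positions start at 1 and count from the most significant bit of the first byte) is an assumption about how the ABAP statements address bits.

```python
def get_bit(data: bytes, pos: int) -> int:
    """Return bit `pos` (1-based, counted from the first byte's top bit)."""
    byte_index, bit_index = divmod(pos - 1, 8)
    return (data[byte_index] >> (7 - bit_index)) & 1


def set_bit(data: bytearray, pos: int, value: int = 1) -> None:
    """Set bit `pos` of `data` to `value` (default 1), in place."""
    byte_index, bit_index = divmod(pos - 1, 8)
    mask = 1 << (7 - bit_index)
    if value:
        data[byte_index] |= mask
    else:
        data[byte_index] &= ~mask & 0xFF


buf = bytearray(2)        # plays the role of a 2-byte field of type X
set_bit(buf, 1)           # first bit of the first byte
set_bit(buf, 16)          # last bit of the second byte
print(buf.hex())
```

The operations work on raw bytes and are independent of any character encoding, which is consistent with the restriction of f to types X and XSTRING.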
Other Changes relating to Unicode (contd.)
• Generating Subroutine:
GENERATE SUBROUTINE POOL itab NAME name
The generated program inherits the contents of the Unicode flag of the
generating program.
• Saving Programs:
INSERT REPORT prog FROM itab.
This statement now has a new addition, UNICODE ENABLING uc,
which sets the Unicode flag of the inserted report to the value of uc.
Other Changes relating to Unicode (contd.)
• A syntax or runtime error occurs when you access the database with a generic key
unless the key is purely character-type. This affects the following commands:
Please note that this statement is obsolete and should no longer be used.
Database Operations
• The fragment views of the work area and the database table must be
identical with regard to the length of the database table.
• If the work area is a single field, the field must be character-type and the
database table purely character-type
Storing Data Clusters in Database Tables
• Data clusters are not converted when they are migrated from a non-
Unicode database to a Unicode system. For this reason, there may be
ABAP cluster tables in a Unicode system that contain non-Unicode
characters. These characters are automatically converted during each
import.
• When the data is exported, any Unicode characters that may be contained in
the data objects are stored in a platform-specific format in the Unicode
system.
Structure Enhancements
• Range:
ASSIGN feld1 TO <fs> RANGE feld2.
Sets the range limits, making it possible to define addresses past field limits
• Increment:
ASSIGN field INCREMENT n TO <fs>.
First, the permitted range for the access is determined from the length of
field and the INCREMENT value. The area assigned is then the same as for
ASSIGN field+n*sizeof[field](sizeof[field]) TO <fs>.
If sy-subrc > 0, the field symbol is not changed.
Assignments to Field Symbols (contd.)
• Casting:
ASSIGN field TO <fs> CASTING.
ASSIGN field TO <fs> CASTING TYPE type.
ASSIGN field TO <fs> CASTING TYPE (typename).
ASSIGN field TO <fs> CASTING LIKE fld.
ASSIGN field TO <fs> CASTING DECIMALS dec.
Provides different views on a structure by casting to different types.
Treats the contents of a field as a value of another type by means of a field symbol.
The field must be at least as long as the type assigned to the field
symbol <fs>.
If the field symbol type is a deep structure, the system checks that the offsets
and types of the reference components match in the area covered by <fs>.
Includes with Group Names
Creating Data Objects Dynamically
• Data Objects:
Creating and Accessing data objects on the heap
Creating Data Objects Dynamically (contd.)
• Table Objects:
Creation of Table Objects at run time.
The line type and table key can be entered statically or dynamically
Storing Data Clusters
• Character Utilities:
Class CL_ABAP_CHAR_UTILITIES
New Classes For UNICODE (CONTD.)
• Conversion Classes:
Unicode & ABAP
• From the list of Unicode syntax errors, one can go directly to the affected
programs and remove the errors.
• Persistently traces the execution of all program objects within one system.
• Traces all processing blocks i.e. forms, methods, modules and ABAP events.
• Collects information
– number of calls
– number of runtime errors
– number of program changes
Transaction SCOV
THANK YOU