
Unicode Enabling of ABAP

Contents

• Unicode Overview
• What is Unicode
• Need for Unicode
• Unicode Character formats

• Unicode @ SAP
• SAP Pre-Unicode solutions
• Why SAP adopted Unicode

• Impact of Unicode on ABAP


• Overview
• Concepts and Conventions
• Restrictions in Unicode
• New ABAP Features
• Tools for Unicode enabling
Unicode Overview

What Is Unicode?

• Fundamentally, computers store letters and other characters by assigning a
number to each one.
• Unicode provides a unique number (Code point) for every character, no
matter what the platform, no matter what the program, no matter what the
language.
– Notation U+nnnn (where nnnn are hexadecimal digits)

A character encoding scheme for (nearly) all characters used worldwide
What is Unicode? (contd.)

• Unicode = universally encoded character set to store information from any
language
• The Unicode standard primarily encodes scripts rather than languages
• Scripts comprise several languages that historically share the same set of
symbols
• In many cases a script may serve to write dozens of languages (e.g. the Latin
script)
• In other cases one script corresponds to one language (e.g. Hangul)
• It also includes punctuation marks, diacritics, mathematical
symbols, technical symbols, musical symbols, arrows, etc.
• In all, the Unicode Standard comprises more than 95,000 characters, ideograph
sets, and symbols (version 4.0)
What is Unicode? (contd.)

• Unicode Standard:
The Unicode Standard is a character coding system designed to support the
worldwide interchange, processing and display of written text of the diverse
languages and technical disciplines of the modern world.
In addition, it supports classical and historical texts of many written
languages.
• The Unicode Consortium:
The Consortium cooperates with W3C and ISO, and has liaison status "C" with
ISO/IEC JTC1/SC2/WG2, which is responsible for refining the specification and
expanding the character set of ISO/IEC 10646.
What is Unicode? (contd.)

• Unicode: The last character set?

– It is an open character set, which means that it keeps growing and adding
less frequently used characters.

– The standard assigns numbers from 0 to 0x10FFFF, which is more than a
million possible numbers for characters.

– About 5% of this space is used, 5% is in preparation, about 13% is reserved
for private use, and about 2% is reserved and not to be used.

– The remaining 75% is open for future use but is not by any means expected
to be filled up. Finally there is a character set with plenty of space!
Unicode Overview

Need For Unicode

• Hundreds of encodings have been developed, each for small groups of
languages and special purposes.

• There is no single, authoritative source of precise definitions of many of the
encodings and their names.

• No single encoding could contain enough characters: for example, the
European Union alone requires several different encodings to cover all its
languages.

• Even for a single language like English, no single encoding was adequate for
all the letters, punctuation, and technical symbols in common use.

• Incompatibilities between different code pages


Need For Unicode (contd.)

• These encoding systems conflict with one another. That is, two encodings
can use the same number for two different characters, or use different
numbers for the same character.

• Any given computer (especially servers) needs to support many different
encodings; yet whenever data is passed between different encodings or
platforms, that data always runs the risk of corruption.

• Programs are written either to handle one single encoding at a time and
switch between them, or to convert between external and internal encodings.
Need For Unicode (contd.)

• Languages & Code Pages


Unicode, One Code Page For All Scripts
Unicode Overview

Unicode Character Formats

Representation of Unicode characters:

• UTF-16: Unicode Transformation Format, 16-bit encoding
- 1 character = 2 bytes for the Basic Multilingual Plane; supplementary
characters use a surrogate pair (4 bytes)
- Platform-dependent byte order
- 2-byte alignment restriction
• UTF-8: Unicode Transformation Format, 8-bit encoding
- Variable length, 1 character = 1 to 4 bytes
- Platform independent
- No alignment restriction
- 7-bit US-ASCII compatible
Unicode Character Formats (contd.)

• UTF-32: Unicode Transformation Format, 32-bit encoding

• For single characters, 32-bit integer variables are most appropriate for the
value range of Unicode.

• For strings, however, storing 32 bits for each character takes up too much
space, especially considering that the highest value 0x10FFFF, takes up only
21 bits. 11 bits are always unused in a 32-bit word storing a Unicode code
point.

• Therefore, you will find that software generally uses 16-bit or 8-bit units as a
compromise, with a variable number of code units per Unicode code point.
Unicode Character Formats (contd.)
Unicode @ SAP

SAP Pre-Unicode Solutions

• Single Code Page System
System using one standard code page, which can support a specific set
of languages.

• Blended Code Page System (Release 3.0D)
Multi-byte blended code pages, which contain characters from several
standard code pages. Blended code pages are not standard code pages, but
SAP-customized code pages that were devised to support an increased
number of possible language combinations in a single code page.
a) Ambiguous Blended Code Page System: Two characters can share
the same code point.
b) Unambiguous Blended Code Page System: Each code point refers
to exactly one character.
SAP Pre-Unicode Solutions (contd.)

• MDMP System Configuration (Release 3.1I)
Multiple Display/Multiple Processing.
System using more than one system code page on the application
server.
Allows languages to be used together in one system although the
characters of those languages are not in the same code page.
SAP Pre-Unicode Solutions (contd.)

• Language Combinations Before Unicode

• Each user can only access one code page at a time: a user who logs in as a
Japanese user cannot enter German characters, and German characters in
the database will not be displayed correctly.
SAP Pre-Unicode Solutions (contd.)

• It is possible for a user to log on with German and then manipulate the character
set and font settings so that he can enter what appear to be Japanese characters;
these characters will not be correctly stored in the database and this data will be
corrupt
• If a user wants to enter, for instance, Japanese, he/she must log on in Japanese
• To ensure that no data corruption occurs, the following restrictions must be
followed:
- Global data must contain only 7-bit ASCII characters, which are in all
code pages
- Users may use only the characters of their log-in language or 7-bit ASCII
- Batch processes must be assigned the correct user ID and language
- EBCDIC code pages are not supported
View in Different Code Pages
Recommendations From SAP (Pre-Unicode)

• In general, using a single standard code page for new installations and
upgrades is the optimal decision
• If additional languages or language combinations are needed, SAP
recommends Unambiguous Blended Code Pages for new installations and
MDMP for existing installations.
• Unambiguous Blended Code Pages only support certain language
combinations and therefore an MDMP setup may be the only possibility for
new installations as well.
Unicode @ SAP

Why SAP adopted Unicode?

• Globalization = Internationalization + Localization

• The Unicode Standard has already been adopted by industry leaders such as
Apple, HP, IBM, JustSystem, Microsoft, Oracle, Sun, Sybase, Unisys and
many others.

• Unicode is required by modern standards such as XML, Java, ECMAScript
(JavaScript), LDAP, CORBA 3.0, WML, etc.

• It is the official way to implement ISO/IEC 10646.


Why SAP adopted Unicode? (contd.)

• Allows text data from different languages to be stored in one repository

• Enables a single set of source code to be written to process data in virtually all
languages

• Simplifies addition of new language support to an e-business application
since character processing and storage remain unchanged

• Lowers cost of implementation

• Faster speed to market

• Better customer satisfaction


Why SAP adopted Unicode? (contd.)
View In UNICODE System
Unicode - SAP
Platforms supported by SAP for Unicode systems

• The following indicates the current development status and availability of
different OS and database combinations for Unicode-based mySAP
technical components:
Unicode & ABAP

Overview

• Non-Unicode versions of SAP: versions prior to 4.7

• Unicode versions of SAP: version 4.7 and above

• Each character is mapped using 16 bits (= 2 bytes), which offers a maximum of
2^16 (65,536) bit combinations

• Affects any older ABAP program in which an explicit or implicit assumption is
made about the internal length of a character
Overview (contd.)

Character Expansion Model


- Separate Unicode & Non-Unicode versions of R/3
- No explicit Unicode Data type in ABAP
- Single ABAP source for Unicode & Non-Unicode systems
Overview (contd.)

• Program attribute ‘Unicode checks active’
- Required to run on a Unicode system
• If the attribute is set, additional restrictions:
- Apply at run time and compile time
- Apply in Unicode as well as non-Unicode systems
- Ensure that the program will run on both US & NUS with identical behavior

                                          Non-Unicode system   Unicode system
Attribute set (Unicode enabled)           OK                   OK
Attribute not set (not Unicode enabled)   OK                   Not allowed
Overview (contd.)

Screen Shot of Program Attribute Screen


Unicode & ABAP

Concepts & Conventions

• Data Types
• Data Layout of Structures
• Unicode Fragment Views
Data Types

• The following data types can be interpreted as character type in a
Unicode program:
- C: Character
- N: Numeric character
- D: Date
- T: Time
- STRING: String
- Character-type structures: Structures which either directly or in
substructures contain only fields of types C, N, D or T.

• Byte type: X, XSTRING (for bit operations)


Data Layout of Structures

• For several data types (like F and I) the memory address must be a
multiple of 4 or 8; for character types it must be a multiple of 2 or 4,
depending on the Unicode representation.

• Within structures, bytes are inserted before or after components with
alignment requirements to achieve the necessary alignment.
• Examples:
BEGIN OF struc1,
a(1) TYPE X,
b(1) TYPE X,
c(6) TYPE C,
END OF struc1.

For struc1 there is no alignment gap for NUS or US.


Data layout of structures (Contd..)

• Examples:
BEGIN OF struc2,
a(1) TYPE X,
BEGIN OF struc3,
b(1) TYPE X,
c(6) TYPE C,
END OF struc3,
d TYPE I,
END OF struc2.
Unicode Fragment View

• BEGIN OF struc,
a(2) TYPE C,
b(4) TYPE N,
c TYPE D,
d TYPE T,
e TYPE F,
f(2) TYPE X,
g(4) TYPE X,
h(8) TYPE C,
i(8) TYPE C,
END OF struc.

Unicode Fragment Views: F1, F2, F3, F4, F5, F6
Unicode & ABAP

Restrictions in Unicode
Overview

• Access Using Offset & Length Addressing

• Assignments

• Casting Data Objects

• Processing Strings in UNICODE

• Determining Length & Distance

• Specifying Key for Table Access


Access Using Offset & Length Addressing

• Accessing single fields:
- Offset- or length-based access is supported for single fields of character
type, strings, X, and XSTRING
• Accessing structures:
- This access type results in errors if both character-type and non-
character-type fields are present in the area addressed by the offset and the
length
• Passing parameters to subroutines:
- Passing parameters using PERFORM with an offset and length specification
beyond field boundaries is not allowed in Unicode programs
Access Using Offset & Length Addressing (contd.)

• Accessing Field Symbols:
- Offset- or length-based access with ASSIGN is only permitted within a
predefined range
- ASSIGN field[+off(len)] TO <f>. : The range corresponds to the field
boundaries in the case of elementary fields or, in the case of flat structures, to the
purely character-type starting fragment
- ASSIGN <g>[+off(len)] TO <f>. : Range of the target field symbol = range of the
source field symbol
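The offset and length rules above can be sketched as follows. This is a minimal illustration, not taken from the original slides; the report name, the structure ls_rec and its components are hypothetical.

```abap
REPORT z_uc_offset_demo.

* Hypothetical flat structure: a character-type starting fragment
* (id + name = 10 characters) followed by an integer component.
DATA: BEGIN OF ls_rec,
        id(4)   TYPE c,
        name(6) TYPE c,
        cnt     TYPE i,
      END OF ls_rec.
DATA lv_part(3) TYPE c.

ls_rec-id   = 'AB12'.
ls_rec-name = 'MILLER'.

* Offset/length access on a single character-type field is allowed.
lv_part = ls_rec-name+0(3).

* Access on the structure stays inside the character-type starting
* fragment (positions 0 to 9), so it passes the Unicode checks.
lv_part = ls_rec+4(3).

* The following would reach into the integer component cnt and is
* rejected when the Unicode checks are active:
* lv_part = ls_rec+8(4).
```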
Assignments

• Conversion between flat structures (MOVE) is possible if:
- The fragments of both structures up to the second-last fragment of the
shorter structure are identical
- The last fragment of the shorter structure is a character-type or
byte-type group
- The corresponding fragment of the longer structure is a character-type or
byte-type group with a greater length
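As a hedged illustration of these conditions, the two hypothetical structures below share identical leading fragments and differ only in the length of their final character-type fragment, so the assignment should pass the Unicode checks. All names are assumptions made for this sketch.

```abap
REPORT z_uc_move_demo.

* Shorter structure: C(2), I, C(4)
DATA: BEGIN OF ls_short,
        code(2) TYPE c,
        cnt     TYPE i,
        text(4) TYPE c,
      END OF ls_short.

* Longer structure: same leading fragments, last fragment C(8)
DATA: BEGIN OF ls_long,
        code(2) TYPE c,
        cnt     TYPE i,
        text(8) TYPE c,
      END OF ls_long.

ls_short-code = 'DE'.
ls_short-cnt  = 42.
ls_short-text = 'ABCD'.

* Fragments up to the second-last one are identical; the last
* fragment is character-type in both and longer in ls_long.
ls_long = ls_short.
```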
Assignments (contd.)

• Conversion between internal tables


- Tables can be converted if their row types are convertible. The restrictions
described for structures also apply for the conversion of tables.
• Implicit Conversions:
- The same rules also apply to all ABAP statements that use implicit
conversions according to the MOVE semantics.
APPEND wa TO itab.
INSERT wa INTO itab.
MODIFY itab FROM wa.
READ TABLE itab ...INTO wa.
LOOP AT itab INTO wa.
Processing Strings

• Statements for processing strings:


CLEAR ... WITH
CONCATENATE
CONDENSE
CONVERT TEXT ... INTO SORTABLE CODE
FIND
OVERLAY
REPLACE
SEARCH
SHIFT
SPLIT
TRANSLATE ... TO UPPER/LOWER CASE
TRANSLATE ... USING
• FROM CODEPAGE and FROM NUMBER FORMAT are not allowed with TRANSLATE
• The arguments must be single fields of type C, N, D, T, STRING or purely
character-type structures
• CONCATENATE a x b INTO c statement is not possible if a, b, and c are character-
type but x has type X.
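A short sketch of the separation between character and byte processing: character-type operands are concatenated in the default character mode, while byte-type operands need the IN BYTE MODE addition. Variable names are hypothetical.

```abap
REPORT z_uc_concat_demo.

DATA: lv_a(2)  TYPE c VALUE 'AB',
      lv_b(2)  TYPE c VALUE 'CD',
      lv_c(4)  TYPE c,
      lv_x1(1) TYPE x VALUE '0A',
      lv_x2(1) TYPE x VALUE 'FF',
      lv_x(2)  TYPE x.

* Character mode (the default): all operands are character-type.
CONCATENATE lv_a lv_b INTO lv_c.

* Byte mode: all operands are byte-type (X or XSTRING).
CONCATENATE lv_x1 lv_x2 INTO lv_x IN BYTE MODE.

* Mixing both kinds, e.g. CONCATENATE lv_a lv_x1 lv_b INTO lv_c,
* is rejected when the Unicode checks are active.
```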
Processing Strings (contd.)

• Comparison operators for processing byte and character strings:

- The character operators require single fields of type C, N, D, T, STRING or
purely character-type structures as arguments; the BYTE- variants are used for
byte-type fields (X, XSTRING)

Character   Byte
CO          BYTE-CO
CN          BYTE-CN
CA          BYTE-CA
NA          BYTE-NA
CS          BYTE-CS
NS          BYTE-NS
CP          BYTE-CP
NP          BYTE-NP
Processing Strings (contd.)

• Functions for processing byte and character strings:

STRLEN – only for character-type fields and returns the length in characters

XSTRLEN – for finding length of byte strings

NUMOFCHAR - returns the number of characters in a character-type field
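The length functions can be tried with a small sketch like the one below; the variable names are hypothetical, and the hexadecimal value is simply the ASCII byte sequence for the same five letters.

```abap
REPORT z_uc_strlen_demo.

DATA: lv_text TYPE string VALUE 'Hello',
      lv_raw  TYPE xstring,
      lv_len  TYPE i.

* STRLEN counts characters of a character-type field.
lv_len = strlen( lv_text ).
WRITE: / 'Characters:', lv_len.

* XSTRLEN counts bytes of a byte-type field.
lv_raw = '48656C6C6F'.
lv_len = xstrlen( lv_raw ).
WRITE: / 'Bytes:', lv_len.
```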


Processing Strings (contd.)

• Output in fields and lists:
If the source field in a WRITE statement is a flat structure, it must be
purely character-type in a Unicode system. This affects the following
statements:
WRITE f.
WRITE f TO g[+off][(len)].
WRITE (name) TO g.
WRITE f TO itab[+off][(len)] INDEX idx.
WRITE (name) TO itab[+off][(len)] INDEX idx.
Establishing Length & Distance

• NUS – DESCRIBE DISTANCE BETWEEN ...
US - Addition of IN BYTE MODE / IN CHARACTER MODE

• NUS – DESCRIBE FIELD ... LENGTH
US - Addition of IN BYTE MODE / IN CHARACTER MODE
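A minimal sketch of the two additions, using hypothetical field names. On a Unicode system, where a character occupies two bytes internally, the byte-mode results are expected to be twice the character-mode results.

```abap
REPORT z_uc_describe_demo.

DATA: BEGIN OF ls_pair,
        code1(4) TYPE c,
        code2(4) TYPE c,
      END OF ls_pair.
DATA: lv_chars TYPE i,
      lv_bytes TYPE i,
      lv_dist  TYPE i.

* Length of a character field, counted in characters and in bytes.
DESCRIBE FIELD ls_pair-code1 LENGTH lv_chars IN CHARACTER MODE.
DESCRIBE FIELD ls_pair-code1 LENGTH lv_bytes IN BYTE MODE.

* Distance between two components, counted in characters.
DESCRIBE DISTANCE BETWEEN ls_pair-code1 AND ls_pair-code2
         INTO lv_dist IN CHARACTER MODE.

WRITE: / lv_chars, lv_bytes, lv_dist.
```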
ABAP File Interface

• Opening Files:
The OPEN DATASET dsn ... statement must include one of the
following additions:

... IN TEXT MODE ENCODING ... (e.g. UTF-8)

... IN BINARY MODE ... (no conversion, data is transferred byte by byte)

... IN LEGACY TEXT MODE ... (non-Unicode format)

... IN LEGACY BINARY MODE ... (non-Unicode format)


ABAP File Interface (contd.)

• Reading & Writing Files:

READ DATASET dsn INTO f : for reading from a file
TRANSFER f TO dsn : for writing to a file

• If the file is opened in TEXT MODE, f must be a character-type field (C, N, D,
or T), a string, or a purely character-type structure

• If the file is opened in LEGACY TEXT MODE or LEGACY BINARY MODE,
conversion errors may occur
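The file interface rules can be sketched as below. The file path is a hypothetical example on the application server, and error handling is reduced to a check of sy-subrc.

```abap
REPORT z_uc_dataset_demo.

* Hypothetical application server path, adjust as needed.
DATA: lv_file TYPE string VALUE '/tmp/unicode_demo.txt',
      lv_line TYPE string VALUE 'Unicode test line',
      lv_read TYPE string.

* Write: the mode and encoding are stated explicitly.
OPEN DATASET lv_file FOR OUTPUT IN TEXT MODE ENCODING UTF-8.
IF sy-subrc = 0.
  TRANSFER lv_line TO lv_file.
  CLOSE DATASET lv_file.
ENDIF.

* Read: the target field is character-type (a string).
OPEN DATASET lv_file FOR INPUT IN TEXT MODE ENCODING UTF-8.
IF sy-subrc = 0.
  READ DATASET lv_file INTO lv_read.
  CLOSE DATASET lv_file.
  WRITE / lv_read.
ENDIF.
```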
Other Changes relating to Unicode

• Bit Statements:
SET BIT i OF f [TO g].
GET BIT i OF f [INTO g].
f O x, f Z x, and f M x .

The field f must be either X or XSTRING in all the above bit operations.
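A brief sketch of the bit statements on a byte-type field; lv_flags and lv_mask are hypothetical names. Bits are counted from the left starting at 1, so setting bit 1 of a one-byte field yields the hexadecimal value 80.

```abap
REPORT z_uc_bit_demo.

DATA: lv_flags(1) TYPE x,
      lv_mask(1)  TYPE x VALUE '80',
      lv_bit      TYPE i.

* Set and read the first (leftmost) bit of a byte field.
SET BIT 1 OF lv_flags TO 1.
GET BIT 1 OF lv_flags INTO lv_bit.

* Bit comparison: are all bits of the mask set in lv_flags?
IF lv_flags O lv_mask.
  WRITE / 'Highest bit is set'.
ENDIF.
```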
Other Changes relating to Unicode (contd.)

• ADD n1 THEN n2 UNTIL nz [ ACCORDING TO sel ] GIVING m [ RANGE str ].
ADD n1 THEN n2 UNTIL nz TO m [ RANGE str ].
• Operands n1, n2, and nz must be of the same type and length.
• The distance between nz and n1 must be an integral multiple of the distance
between n2 and n1.
• The fields n1, n2, and nz must be in one structure. If this is not statically
recognizable, you must use the RANGE str addition to explicitly specify a
structure as a valid area.
• If it is discovered that the addressed fields are not within the structure
specified using the RANGE addition, a syntax or runtime error occurs.
Other Changes relating to Unicode (contd.)

• Loops with the VARY & VARYING additions:
DO ... VARYING f FROM f1 NEXT f2 [ RANGE f3 ].
WHILE ... VARY f FROM f1 NEXT f2 [ RANGE f3 ].
• The fields f, f1, and f2 must be type-compatible with one another
• The RANGE for valid accesses must be specified implicitly or explicitly
Other Changes relating to Unicode (contd.)

• Generating Subroutine:
GENERATE SUBROUTINE POOL itab NAME name
The generated program inherits the contents of the Unicode flag of the
generating program.

• Saving Programs:
INSERT REPORT prog FROM itab.
This statement now includes a new addition, UNICODE ENABLING uc,
with which the Unicode flag of the inserted report is given the value of uc
Other Changes relating to Unicode (contd.)

• Types and GET/SET PARAMETER:


GET PARAMETER ID pid FIELD f
SET PARAMETER ID pid FIELD f
The field f must be of character type. For non-character type we use
IMPORT & EXPORT statements.
Specifying the Key for Tables Accesses

• A syntax or runtime error occurs when you access the database with a generic key
unless the key is purely character-type. This affects the following commands:

READ TABLE dbtab ... SEARCH GKEQ ...
READ TABLE dbtab ... SEARCH GKGE ...
LOOP AT dbtab ...
REFRESH itab FROM TABLE dbtab.

Note that these statements are obsolete and should no longer be used.
Database Operations

SELECT * FROM dbtab ... INTO wa / INTO TABLE itab ...
FETCH NEXT CURSOR c ... INTO wa / INTO TABLE itab.
INSERT INTO dbtab ... FROM wa / FROM TABLE itab.
UPDATE dbtab ... FROM wa / FROM TABLE itab.
MODIFY dbtab ... FROM wa / FROM TABLE itab.
DELETE dbtab FROM wa / FROM TABLE itab.

• The fragment views of the work area and the database table must be
identical with regard to the length of the database table.

• If the work area is a single field, the field must be character-type and the
database table purely character-type
Storing Data Clusters in Database Tables

• Data clusters are not converted when they are migrated from a non-
Unicode database to a Unicode system. For this reason, there may be
ABAP cluster tables in a Unicode system that contain non-Unicode
characters. These characters are automatically converted during each
import.
• When the data is exported, any Unicode characters that may be contained in
the data objects are stored in a platform-specific format in the Unicode
system.
Structure Enhancements

• Problems caused by structure enhancements:
Enhancements change the fragment views and hence affect the checks for
assignments and comparisons
• Enhancement classification in ABAP Dictionary:
Structure Enhancements (contd.)

• Enhancement handling in program check:


Unicode & RFC
Unicode & ABAP

New ABAP Features


Assignments to Field Symbols

• Range:
ASSIGN feld1 TO <fs> RANGE feld2.
Sets the range limits explicitly, making it possible to address memory past
the limits of feld1 (within feld2)
• Increment:
ASSIGN field INCREMENT n TO <fs>.
The field symbol is set to the area that starts n times the length of field
after field and has the same length as field, as if defined by
ASSIGN field+n*sizeof[field](sizeof[field]) TO <fs>.
If the assignment is not possible (sy-subrc > 0), the field symbol is not changed
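The RANGE and INCREMENT additions can be combined to step through equally typed components of a structure. The sketch below uses a hypothetical structure ls_week; the loop ends as soon as the increment would leave the range.

```abap
REPORT z_uc_increment_demo.

DATA: BEGIN OF ls_week,
        day1(2) TYPE c VALUE 'MO',
        day2(2) TYPE c VALUE 'TU',
        day3(2) TYPE c VALUE 'WE',
      END OF ls_week.
DATA lv_inc TYPE i.
FIELD-SYMBOLS <lv_day> LIKE ls_week-day1.

* Step through the components in units of the length of day1.
* RANGE ls_week widens the permitted area to the whole structure.
DO.
  lv_inc = sy-index - 1.
  ASSIGN ls_week-day1 INCREMENT lv_inc TO <lv_day> RANGE ls_week.
  IF sy-subrc <> 0.
    EXIT.  " increment would leave the range
  ENDIF.
  WRITE / <lv_day>.
ENDDO.
```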
Assignments to Field Symbols (contd.)

• Casting:
ASSIGN field TO <fs> CASTING.
ASSIGN field TO <fs> CASTING TYPE type.
ASSIGN field TO <fs> CASTING TYPE (typename).
ASSIGN field TO <fs> CASTING LIKE fld.
ASSIGN field TO <fs> CASTING DECIMALS dec.
- Provides different views on a structure with casts on different types.
- Treats the contents of a field as a value of another type using a field symbol.
- field must be at least as long as the type that was assigned to the field
symbol <fs>.
- If the field symbol type is a deep structure, the system checks that the offsets
and types of the reference components match in the area covered by <fs>.
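A common use of casting is to view a date field through a structured type. The sketch below assumes a hypothetical type ty_date whose total length matches that of a type D field.

```abap
REPORT z_uc_casting_demo.

* Hypothetical structured view on a date (8 characters in total).
TYPES: BEGIN OF ty_date,
         year(4)  TYPE n,
         month(2) TYPE n,
         day(2)   TYPE n,
       END OF ty_date.

DATA lv_date TYPE d VALUE '20240131'.
FIELD-SYMBOLS <ls_date> TYPE ty_date.

* The field symbol is fully typed, so CASTING needs no type addition.
ASSIGN lv_date TO <ls_date> CASTING.

WRITE: / <ls_date>-year, <ls_date>-month, <ls_date>-day.
```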
Includes with Group Names
Creating Data Objects Dynamically

• Data Objects:
Creating and Accessing data objects on the heap
Creating Data Objects Dynamically (contd.)

• Table Objects:
Creation of Table Objects at run time.

CREATE DATA dref
  ( TYPE [STANDARD|SORTED|HASHED] TABLE OF (LineType | (Name) | REF TO DATA | REF TO Obj) )
  | ( LIKE [STANDARD|SORTED|HASHED] TABLE OF LineObj )
  [ WITH (UNIQUE|NON-UNIQUE) ( KEY (K1 ... Kn | (KEYTAB) | TABLE_LINE) | DEFAULT KEY ) ]
  [ INITIAL SIZE M ]
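A minimal sketch of creating a table object with a dynamically specified line type. It assumes that the ABAP Dictionary table T000 is available as a line type; the variable names are hypothetical.

```abap
REPORT z_uc_create_data_demo.

* Assumption: T000 exists in the ABAP Dictionary of the system.
DATA: lv_linetype TYPE string VALUE 'T000',
      lr_table    TYPE REF TO data,
      lv_lines    TYPE i.
FIELD-SYMBOLS <lt_table> TYPE STANDARD TABLE.

* The line type is specified dynamically at run time.
CREATE DATA lr_table TYPE STANDARD TABLE OF (lv_linetype).

* Dereference the data reference to work with the table.
ASSIGN lr_table->* TO <lt_table>.
APPEND INITIAL LINE TO <lt_table>.

DESCRIBE TABLE <lt_table> LINES lv_lines.
WRITE: / 'Lines:', lv_lines.
```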

The line type and table key can be entered statically or dynamically
Storing Data Clusters

• Using fields of type XSTRING as data containers

Writing data to an XSTRING (EXPORT):


Storing Data Clusters (contd.)

• Reading Data from an XSTRING (IMPORT):

• Automatic Conversion of data during IMPORT.
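Writing to and reading from an XSTRING container can be sketched with the DATA BUFFER variants of EXPORT and IMPORT. The parameter name text and the variable names are chosen for illustration only.

```abap
REPORT z_uc_cluster_demo.

DATA: lv_buffer TYPE xstring,
      lv_text   TYPE string VALUE 'Unicode',
      lv_copy   TYPE string.

* Write the data object into an XSTRING used as a data container.
EXPORT text = lv_text TO DATA BUFFER lv_buffer.

* Read it back; any required conversion happens during IMPORT.
IMPORT text = lv_copy FROM DATA BUFFER lv_buffer.

WRITE / lv_copy.
```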


Generic Types for Field Symbols & Parameters
New Classes For UNICODE

• Character Utilities:
Class CL_ABAP_CHAR_UTILITIES
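As a hedged example of what the class offers, the sketch below uses two of its attributes: HORIZONTAL_TAB as a platform-independent separator and CHARSIZE, the length of one character in bytes. The variable names are hypothetical.

```abap
REPORT z_uc_char_util_demo.

DATA lv_line TYPE string.

* Use the class constant instead of a hard-coded hex value
* for the tab character.
CONCATENATE 'A' 'B' INTO lv_line
  SEPARATED BY cl_abap_char_utilities=>horizontal_tab.

* CHARSIZE is greater than 1 on a Unicode system.
IF cl_abap_char_utilities=>charsize > 1.
  WRITE / 'Running on a Unicode system'.
ELSE.
  WRITE / 'Running on a non-Unicode system'.
ENDIF.
```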
New Classes For UNICODE (CONTD.)

• Conversion Classes:
Unicode & ABAP

Tools for Unicode Enabling


ABAP Unicode Scan Tool UCCHECK

• Transaction UCCHECK is used to examine a set of programs for Unicode
syntax errors without having to set the program attribute "Unicode checks
active" for every individual program.

• From the list of Unicode syntax errors, one can go directly to the affected
programs and remove the errors.

• It is also possible to automatically create transport requests and set the
Unicode program attribute for a program set.
Transaction UCCHECK
ABAP Coverage Analyzer

• Coverage Analyzer (transaction SCOV):

• Persistently traces the execution of all program objects within one system.

• Traces all processing blocks i.e. forms, methods, modules and ABAP events.

• Collects information
– number of calls
– number of runtime errors
– number of program changes
Transaction SCOV
THANK YOU
