J. Kelley, Migrating to Unicode, Part II
C++Builder Developer's Journal (www.bcbjournal.com), ISSN 1093-2097
Volume 14, Number 5, May 2010 (Special Issue)

Migrating to Unicode, Part II
By Josh Kelley

Versions: C++Builder 2010, 2009

Part I introduced Unicode and covered the various options for working with Unicode text in C, C++, the Windows API, and the VCL. Now, in Part II, I will specifically discuss how to migrate a C++Builder application to Unicode.

Migrating C++Builder applications to Unicode

There are two basic approaches to handling Unicode issues when migrating a pre-2009 C++Builder application to C++Builder 2009 and above.

1. You can do a complete Unicode migration. Use the Windows Unicode APIs instead of ANSI APIs, replace char with wchar_t, replace std::string with std::wstring, and use UnicodeString instead of AnsiString. Depending on how your C and C++ code is written, this could be a major undertaking.
2. You can leave your application using ANSI and convert to Unicode only where it's absolutely necessary. Because only the VCL portion of C++Builder requires the use of Unicode, your C and C++ string manipulation and Windows API interaction can continue to use char and std::string. Even the portions of your code that interact with the VCL can often get away with continuing to use ANSI strings, thanks to the implicit conversions between AnsiString and UnicodeString that the C++ VCL provides (described in more detail below).

Of course, these two approaches aren't mutually exclusive. You can do an initial, "quick-and-dirty" migration using the minimal approach, and then gradually implement the complete approach for the portions of your application that would most benefit from Unicode support. And even a completely migrated application may still need to deal with ANSI or UTF-8 when interfacing with legacy file formats or APIs, or when reading or writing data from or to the disk or network.

Regardless of which approach you choose, there are a number of C and C++ techniques that can be used to help with a Unicode migration:

- The Windows API includes functions for converting between ANSI and Unicode, and the VCL provides conversion constructors to easily convert between AnsiString and UnicodeString values.
- The standard Windows header tchar.h includes macros designed to let you write code that compiles as either ANSI or Unicode. This can help when converting code one portion at a time.
- C++-specific typedefs, and the use of C++ features such as function overloading, can fill in the gaps left by tchar.h in writing code that compiles as either ANSI or Unicode.
- The C++ concept known as "shims" (as described in Matthew Wilson's Imperfect C++ [1] and as used in the STLSoft library [2]), combined with the use of C++ templates, can make it simple to write generic code that works with both AnsiString and UnicodeString (and C-style strings, and std::string, and anything else you care to support).
- The use of variadic functions such as printf() and sprintf() presents a special challenge for migrating to Unicode, since the compiler is unable to catch ANSI-versus-Unicode issues with these. Standalone scripts can be used to transform these variadic function calls into a format that the compiler can check and then revert them to their normal format after all issues are addressed.

Converting text to and from Unicode

Before any migration can proceed, you need to know how to convert between the various Unicode encodings and the various ANSI encodings. The two easiest ways are using the Windows API and using the VCL.

The relevant Windows API functions are WideCharToMultiByte() [3], which, despite its name, converts from UTF-16 to the encoding of your choice (UTF-8 or any of the various ANSI encodings); and MultiByteToWideChar() [4], which converts from the encoding of your choice to UTF-16. MSDN has full documentation on using these functions.

Converting using the VCL is even easier. The VCL provides C++ conversion constructors (constructors that can be called with only a single argument) so that you can construct a UnicodeString from an AnsiString or UTF8String, or vice versa. Because C++ conversion constructors are implicitly invoked as needed, this also lets you provide an AnsiString wherever a UnicodeString is needed, or vice versa. (For example, this lets you assign a UnicodeString to an AnsiString.) See Listing 1 for example code.

Listing 1: Converting to and from Unicode

    // Sample C++ functions for doing Unicode<->ANSI conversions using the
    // Windows API. Note the use of boost::scoped_array to dynamically
    // allocate memory and automatically clean it up once we're done.
    #include <windows.h>
    #include <string>
    #include <boost/scoped_array.hpp>

    std::wstring AnsiToUnicode(const char *s)
    {
      // First call: ask how many wchar_t units are needed
      // (including the terminating null).
      int size = MultiByteToWideChar(CP_ACP, 0, s, -1, NULL, 0);
      if (size == 0) {
        return std::wstring();
      }
      boost::scoped_array<wchar_t> buffer(new wchar_t[size]);
      // Second call: do the actual conversion.
      MultiByteToWideChar(CP_ACP, 0, s, -1, buffer.get(), size);
      return std::wstring(buffer.get());
    }

    std::string UnicodeToAnsi(const wchar_t *s)
    {
      // First call: ask how many bytes are needed
      // (including the terminating null).
      int size = WideCharToMultiByte(CP_ACP, 0, s, -1, NULL, 0, NULL, NULL);
      if (size == 0) {
        return std::string();
      }
      boost::scoped_array<char> buffer(new char[size]);
      // Second call: do the actual conversion.
      WideCharToMultiByte(CP_ACP, 0, s, -1, buffer.get(), size, NULL, NULL);
      return std::string(buffer.get());
    }

    // VCL form code showing the implicit and explicit conversions.
    __fastcall TForm1::TForm1(TComponent* Owner)
      : TForm(Owner)
    {
      // Implicit Unicode-to-ANSI conversion:
      AnsiString s1 = L"Hello, world!";
      // Implicit ANSI-to-Unicode conversion:
      UnicodeString s2 = "Hello, world!";

      // This works without modification in C++Builder 2009, even though
      // Caption is Unicode and s1 is ANSI.
      Label1->Caption = s1;
      // An implicit conversion from Unicode to ANSI.
      // NOTE: This could lose data.
      s1 = Label2->Caption;
      // Identical to the above, but explicit.
      s1 = AnsiString(Label2->Caption);
      // We can also use a temporary AnsiString or UnicodeString to do
      // Unicode<->ANSI conversions.
      MessageBoxA(Handle, AnsiString(Label1->Caption).c_str(),
        "Demo", MB_OK);
    }

The ease with which conversions can be done in the VCL can have drawbacks. Because the assignment operators look just like regular assignment and the conversion constructors can be implicitly invoked, your code may be converting between ANSI and UTF-16 without your even being aware of it. This can add runtime overhead, but more importantly, it can result in loss of data when converting from UTF-16 to an ANSI encoding that cannot represent all of the Unicode characters. Delphi includes a compiler warning when this happens ("W1058: Implicit string cast with potential data loss from 'string' to 'AnsiString.'"), but C++Builder will silently accept it.

Ideally there would be an option to have the C++Builder compiler emit a warning any time these ANSI-Unicode conversions are implicitly invoked, but as far as I can tell, no such option exists. If having these functions implicitly invoked is a concern for you, then the only solution is to modify C++Builder's header files.

To do this, open the file "include\vcl\dstring.h" and find the following lines:

    __fastcall AnsiStringT(
      const WideString &src) :
      AnsiStringBase(src, CP){}
    __fastcall AnsiStringT(
      const UnicodeString &src) :
      AnsiStringBase(src, CP){}

Change them to the following:

    explicit __fastcall AnsiStringT(
      const WideString &src) :
      AnsiStringBase(src, CP){}
    explicit __fastcall AnsiStringT(
      const UnicodeString &src) :
      AnsiStringBase(src, CP){}

This will let you explicitly do a Unicode-to-ANSI conversion when needed without the possibility of losing data in implicit conversions; see the explicit conversion example in Listing 1.

If you want to disable implicit ANSI-to-Unicode conversions (which cannot result in data loss but can be a performance loss), open the file "include\vcl\ustring.h" and find the following lines:

    template <unsigned short codePage>
    __fastcall UnicodeString(
      const AnsiStringT<codePage> &src) :
      Data(0)
    {
      Strhlpr::UnicodeFromAnsi(*this,
        *PrawByteString(&src));
    }

Change them to the following:

    template <unsigned short codePage>
    explicit __fastcall UnicodeString(
      const AnsiStringT<codePage> &src) :
      Data(0)
    {
      Strhlpr::UnicodeFromAnsi(*this,
        *PrawByteString(&src));
    }

Note that modifying system headers like this may cause problems when installing RAD Studio updates. See [5] for details.

Using tchar.h

It can be valuable to write character-width-agnostic code that can compile as ANSI or Unicode.

If you're planning on a complete Unicode migration as part of an upgrade to C++Builder 2009 or 2010, character-width-agnostic code lets you prepare for the migration in your previous version of C++Builder without breaking compilation.

If portions of your C++ code base are cross-platform, you may want those portions to use wide characters (UTF-16) on Windows but narrow characters (ANSI or UTF-8) on other platforms.

Windows provides the tchar.h header file to help with this. Depending on whether the _UNICODE preprocessor macro is defined (which, as discussed earlier, is controlled by C++Builder's "_TCHAR maps to" option), tchar.h defines the following macros:

- TCHAR, which is defined as char for non-Unicode builds and wchar_t for Unicode builds.
- _T, which the preprocessor removes for non-Unicode builds and which prepends L for Unicode builds. This means that you can write _T("Hello, world!") and have the preprocessor convert it to a char literal ("Hello, world!") or wchar_t literal (L"Hello, world!") as appropriate.
- _tcscat(), _tcscpy(), _tcscmp(), etc., which are defined as strcat(), strcpy(), strcmp(), etc. for non-Unicode builds and wcscat(), wcscpy(), wcscmp(), etc. for Unicode builds.
- _tprintf(), _tscanf(), _stprintf(), etc., which are defined as printf(), scanf(), sprintf(), etc. for non-Unicode builds and wprintf(), wscanf(), swprintf(), etc. for Unicode builds.
- _fgettc(), _fgetts(), _fputtc(), etc., which are defined as fgetc(), fgets(), fputc(), etc. for non-Unicode builds and fgetwc(), fgetws(), fputwc(), etc. for Unicode builds.
- _ttoi(), _ttof(), _tcstol(), etc., which are defined as atoi(), atof(), strtol(), etc. for non-Unicode builds and _wtoi(), _wtof(), wcstol(), etc. for Unicode builds.

Macros are also provided for file- and directory-manipulation functions (so that you can manipulate ANSI or Unicode filenames as appropriate). See include\tchar.h for a complete list of available macros.

Using these macros is very simple:

    TCHAR buffer[100];
    _tcscpy(buffer, _T("Hello"));
    _tcscat(buffer, _T(" world!"));
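
To make the mechanism behind these macros concrete, here is a minimal, portable sketch of the same idea. It does not use tchar.h itself: DEMO_UNICODE, demo_tchar, DEMO_T, demo_tcscpy, demo_tcscat, demo_tcslen, and build_greeting are all hypothetical names invented for this illustration, standing in for _UNICODE, TCHAR, _T, _tcscpy(), _tcscat(), and _tcslen() so that the example compiles without Windows headers.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-ins for what tchar.h provides. Define DEMO_UNICODE
// before this point to select the wide-character variants.
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
#define DEMO_T(x) L##x            // prepend L: "abc" becomes L"abc"
#define demo_tcscpy std::wcscpy
#define demo_tcscat std::wcscat
#define demo_tcslen std::wcslen
#else
typedef char demo_tchar;
#define DEMO_T(x) x               // leave the literal unchanged
#define demo_tcscpy std::strcpy
#define demo_tcscat std::strcat
#define demo_tcslen std::strlen
#endif

// Builds "Hello world!" in the caller's buffer and returns its length in
// characters (not bytes). The same source compiles in either mode.
inline std::size_t build_greeting(demo_tchar *buffer)
{
    demo_tcscpy(buffer, DEMO_T("Hello"));
    demo_tcscat(buffer, DEMO_T(" world!"));
    return demo_tcslen(buffer);
}
```

Calling build_greeting() on a demo_tchar buffer[100] yields a 12-character string in both modes; only the element type of the buffer, and therefore its size in bytes, differs.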

Using C++ typedefs

Windows' tchar.h provides mappings only for C functions and types, but it's trivial to extend the same concept to C++:

    #ifndef _UNICODE
    typedef std::string tstring;
    typedef std::fstream tfstream;
    typedef boost::format tformat;
    static std::istream& tcin(std::cin);
    static std::ostream& tcout(std::cout);
    #else
    typedef std::wstring tstring;
    typedef std::wfstream tfstream;
    typedef boost::wformat tformat;
    static std::wistream& tcin(std::wcin);
    static std::wostream& tcout(std::wcout);
    #endif

Extend as needed by creating typedefs for the Standard Library and Boost classes and references for the variables (such as cout and cin) which are actually used in your code.

Using ANSI and Unicode variants of the Windows API

As already discussed, Windows provides ANSI and Unicode variants of its API. For example, MessageBox() is actually a macro that's defined as MessageBoxA() (ANSI) or MessageBoxW() (Unicode), depending on your project settings. As part of your Unicode migration, you can explicitly call the ANSI or Unicode variant from different parts of your code, as different parts are migrated.

Using C++ overloads

Using tchar.h and the Windows API ANSI and Unicode variants, you can convert much of your existing code to be character-width-agnostic using only some search-and-replace operations in your IDE editor or in a script. If you want to avoid doing even this much, or if you have some third-party APIs that don't support both ANSI and Unicode, you can use C++ overloading to create your own functions that support both ANSI and Unicode. (Keep in mind that tchar.h was designed for C programs; as a C++ developer, you have more powerful tools available.)

For example, instead of replacing strcpy() with _tcscpy(), you could define a new function:

    inline wchar_t *strcpy(wchar_t *dest,
      const wchar_t *src)
    {
      return wcscpy(dest, src);
    }

Include this function throughout your project, and repeat as needed for other functions. The overloads that you define may be able to call a wide or narrow character equivalent (as this example strcpy() overload does with wcscpy()), or they may need to do a Unicode-to-ANSI conversion themselves for APIs that have no Unicode support.

Overloading other libraries' functions in this manner can hamper the readability of your code. This technique should probably be viewed as a quick and temporary way of getting up and running in C++Builder 2009 and 2010 rather than as a long-term solution.

Special techniques for migrating to Unicode

Using string shims

The techniques discussed so far for calling the right functions with the right kind of text data all have various shortcomings. As an example, consider how to most easily migrate ANSI code that calls TApplication::MessageBox() for user feedback.

- The VCL offers convenient conversions between ANSI and Unicode, but having to use these at every call site can be tedious (especially if adding these conversions is a requirement for the initial migration to C++Builder 2009 or 2010) and can hamper readability. TApplication::MessageBox() takes wchar_t* parameters instead of UnicodeStrings, so it cannot benefit from implicit VCL ANSI-to-Unicode conversions. Neither can various non-VCL C and C++ APIs.
- Using macros, as tchar.h does, lets you select between ANSI and Unicode at compile time, but it doesn't let you use both within a single build, and it requires that you clutter your source code's namespace with extra macros.
- Creating C++ overloads for ANSI and Unicode variations is perhaps the easiest from the caller's perspective, but it requires that you create separate functions for every argument combination. For example, TApplication::MessageBox() takes two arguments (a message and a caption), and so a TApplication::MessageBox-style function would need four overloads (ANSI message and caption; Unicode message and caption; ANSI message and Unicode caption; Unicode message and ANSI caption).

And this is only for a bare-bones TApplication::MessageBox-style function. Most other VCL functions take Strings, not wchar_t*, as parameters; it would be convenient if we had overloads to do the same for our hypothetical MessageBox() replacement, but that adds even more overloads. It would be even more convenient if we could also support C++ Standard Library types like std::string or COM-related types like WideString or BSTR. The number of overloads required to cover all of these combinations of parameters for even a single function quickly becomes prohibitive. Clearly, a better approach is needed.

The concept of shims, as promoted by C++ author and developer Matthew Wilson, offers a solution. Shims "are small, lightweight (in most all cases having zero runtime cost) components that help types 'fit or align' into algorithms and client code" [6]. For example, suppose you had a function that, if given any string-like object, gave you a pointer to a C-style string. (Since this function gets a pointer to a wide C-style string, and following the convention of Matthew Wilson's STLSoft library, we'll call this function c_str_ptr_w().) Then you could write the following TApplication::MessageBox() replacement:

    template <typename T1, typename T2>
    int AppMessageBox(const T1& Text,
      const T2& Caption, int Flags = MB_OK)
    {
      return Application->MessageBox(
        c_str_ptr_w(Text),
        c_str_ptr_w(Caption), Flags);
    }

Now we simply need to make sure that c_str_ptr_w() results in a valid argument for every parameter type that we use for AppMessageBox(). The first few parameter types are easy:

    inline const wchar_t *c_str_ptr_w(
      const wchar_t *s)
    {
      return s;
    }

    inline const wchar_t *c_str_ptr_w(
      const UnicodeString& s)
    {
      return s.c_str();
    }

    inline const wchar_t *c_str_ptr_w(
      const std::wstring& s)
    {
      return s.c_str();
    }

So far we're following the practice described by Matthew Wilson in [6] and [1] and implemented in the STLSoft library [2]. We have a replacement for TApplication::MessageBox() that we can switch to with a simple search-and-replace (just replace "Application->MessageBox" with "AppMessageBox") and that takes any of several types of Unicode arguments without excessive overloads or extra function calls.

For the purpose of quickly migrating to Unicode, however, it's useful to have a TApplication::MessageBox() replacement that can also take ANSI arguments. In his article on shims and in his work on the STLSoft library, Matthew Wilson explicitly avoids providing shims that convert between ANSI and Unicode, since those introduce (in his words) "semi-implicit" conversion operations that introduce a performance penalty and violate the expectation that shims be lightweight. However, as part of a C++Builder 2009 or 2010 Unicode migration, it's more useful to accept a (possibly negligible) performance penalty in order to complete the initial migration as soon as possible, then address performance and "proper" Unicode handling as needed.

Therefore, we need to provide c_str_ptr_w shims that take ANSI arguments (const char *, AnsiString, and std::string). This is harder than the previous cases. Our code will have to take the following approach:

- We need to somehow provide a const wchar_t pointer. We can't simply return a const wchar_t* from c_str_ptr_w(), because we need to allocate memory to store the results of the ANSI-to-Unicode conversion, and returning a raw pointer to that allocated memory would constitute a memory leak.
- We can, however, define a class that contains the allocated memory and return a copy of (not a reference to nor a pointer to) that class. The C++ language guarantees that it will properly clean up this temporary class instance in that case, and as long as we make everything inline, a good compiler will avoid the overhead of actually making the spurious copies of the class instance.
- Next, we define a C++ conversion operator that lets the class be implicitly converted to const wchar_t*.

The final code looks like this:

    class c_str_ptr_w_string_proxy
    {
    public:
      explicit c_str_ptr_w_string_proxy(
        const char *s)
        : mString(s) {}
      // The above line does the actual ANSI-
      // to-Unicode conversion, using the VCL.
      // This is the C++ conversion operator:
      operator const wchar_t * () const
      {
        return mString.c_str();
      }

      // This is the buffer to store the
      // conversion:
      UnicodeString mString;
    };

    inline c_str_ptr_w_string_proxy
    c_str_ptr_w(const char *s)
    {
      return c_str_ptr_w_string_proxy(s);
    }

    inline c_str_ptr_w_string_proxy
    c_str_ptr_w(const std::string& s)
    {
      return
        c_str_ptr_w_string_proxy(s.c_str());
    }

There are two caveats with using a shim of this sort. First, because the wchar_t pointer that's returned points into a temporary object, code such as the following results in undefined behavior:

    void DoSomething(const AnsiString& s)
    {
      const wchar_t *msg = c_str_ptr_w(s);
      // Invalid! The temporary proxy has
      // already been destroyed, so msg's
      // contents are undefined at this point.
    }

Second, the conversion operator is only invoked when the C++ compiler knows that it needs to be invoked. In particular, the compiler does not know to invoke it when it's used as part of a variadic function:

    UnicodeString s;
    // Dubious code; it pushes a copy of a
    // c_str_ptr_w_string_proxy on the stack
    // instead of calling the conversion
    // operator.
    s.sprintf(L"%s",
      c_str_ptr_w(some_ansi_string));

As it so happens, because of how c_str_ptr_w_string_proxy and UnicodeString are implemented, pushing a copy of a c_str_ptr_w_string_proxy on the stack behaves identically to calling the conversion operator, but it's a very bad idea to rely on implementation details like this. To avoid this problem:

1. Explicitly invoke the conversion operator when using a shim with a variadic function.
2. In C++Builder, turn on warning 8074 ("Structure passed by value") under Project, under Options, under C++ Compiler, under Warnings, to catch any dubious function calls such as this.

A complete set of appropriate shim functions is provided in the source code that accompanies this article.

Handling variadic functions

So now you're almost done. You've planned whether to do a complete or minimal migration. You've applied techniques such as tchar.h, C++ overloading, and string shims to ease the migration. You're using RawByteString and UTF8String where needed for dealing with external APIs or data. Everything compiles without warnings, and no immediate problems show when you run your application.

Unfortunately, one potentially major problem remains: variadic functions, primarily the printf() and scanf() family of functions (including the printf() family of methods on the VCL's String classes). Because these functions take variable arguments, the compiler does no checking on their arguments. The lack of type safety with these functions can be a recurring problem in C and C++ development but is a particular risk during a Unicode migration. For example, the following code will fail at runtime following a migration to C++Builder 2009 or 2010, because Title.c_str() now returns a wchar_t pointer while "%s" still expects a char pointer:

    fprintf(logfile, "%s: Startup\n",
      Application->Title.c_str());

There are several approaches for providing type safety for variadic functions:

- Some compilers (such as GCC) analyze printf() and scanf() format strings at compile time and check arguments' types against the types specified by the format string. Unfortunately, C++Builder does not do this. Even compilers that do provide this feature often fail to check arguments for the wide character variations of printf() and scanf().
- Variadic type safety issues can be avoided by switching to a C++ alternative to printf() and scanf(), which is iostreams, or one of the newer libraries such as Boost.Format. However, many developers find printf- and scanf-style format strings preferable to iostream's syntax. Although libraries such as Boost.Format offer a good combination of printf-style format strings and C++ type safety, switching an entire codebase to a new formatting and I/O library just to complete a Unicode migration is not an option.

Nick Galbreath, in his blog posting "Type safe printf" [7], offers an intriguing alternative. He proposes a simple source code transformer that takes a C++ source file, searches for printf- and scanf-style functions, and transforms them. He gives the example of a program containing several bad arguments to printf():

    int main()
    {
      int a = -1;
      printf("%d %u %hhd\n", a, a, a);

      unsigned int b = 512;
      printf("%d %u %hhd\n", b, b, b);

      double c = 2.0;
      printf("%d %f %u %hhd\n", c, c, c, c);

      return 0;
    }

The type-safe printf() transformer turns each printf() call into its own unique function, whose arguments are specified as part of the function definition and therefore can be checked by the compiler:

    /* VARARG TRANSFORMATION START */
    /* This is autogenerated */

    static void printf_1(const char* format,
      int a0, unsigned int a1, char a2) {
      printf("%d %u %hhd\n", a0, a1, a2);
    }

    static void printf_2(const char* format,
      int a0, unsigned int a1, char a2) {
      printf("%d %u %hhd\n", a0, a1, a2);
    }

    static void printf_3(const char* format,
      int a0, double a1, unsigned int a2,
      char a3) {
      printf("%d %f %u %hhd\n",
        a0, a1, a2, a3);
    }
    /* VARARG TRANSFORMATION END */

    int main()
    {
      int a = -1;
      printf_1("%d %u %hhd\n", a, a, a);

      unsigned int b = 512;
      printf_2("%d %u %hhd\n", b, b, b);

      double c = 2.0;
      printf_3("%d %f %u %hhd\n", c, c, c, c);

      return 0;
    }

Because the transformed function calls fit such an obvious pattern, the transformation can be easily reversed after running the transformed code through the compiler and catching any problems that it finds.

Python scripts that implement this type-safe printf() transformer (easily runnable with Python for Windows) are available at [8].

This technique obviously has broader applications beyond aiding in Unicode migrations.

Other issues

These are merely the issues and techniques that I've found most relevant in my own Unicode migrations. Different applications will have different needs.

For example, third-party libraries may need to be upgraded or customized to support Unicode and C++Builder 2009 or 2010. (The most popular libraries, such as JCL, JVCL, DevExpress, and TMS, long ago released Unicode-capable versions.)

Delphi applications that use String, Char, or PChar as byte buffers are heavily affected by the change in size of Delphi's String, Char, and PChar types. C and C++ applications usually use char and char arrays for byte buffers and so are unaffected, but C++Builder developers who also work in Delphi or who work with heavily Delphi-influenced code may need to be aware of this.
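
The byte-buffer concern can also surface in C++ whenever a buffer is declared through a character type whose width depends on the build, rather than through plain char. The following is only a sketch to illustrate the arithmetic; buffer_bytes and chars_that_fit are hypothetical helper names, not part of the VCL or any library discussed here. Note that sizeof(wchar_t) is 2 on Windows but is larger on some other platforms.

```cpp
#include <cstddef>

// Hypothetical helper: number of bytes occupied by a buffer holding
// `count` characters of type CharT.
template <typename CharT>
std::size_t buffer_bytes(std::size_t count)
{
    return count * sizeof(CharT);
}

// Hypothetical helper: number of whole CharT characters that fit in a
// raw buffer of `bytes` bytes.
template <typename CharT>
std::size_t chars_that_fit(std::size_t bytes)
{
    return bytes / sizeof(CharT);
}
```

With char, a 100-character buffer is 100 bytes and a 64-byte block holds 64 characters. With wchar_t, the same declarations yield 100 * sizeof(wchar_t) bytes and 64 / sizeof(wchar_t) characters, which is why code that treats a character count as a byte count (or the reverse) needs review when the character type changes.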

Reading and writing external data now requires attention both to in-memory storage (RawByteString, AnsiString, or UTF8String) and to encodings. (Using the system default ANSI encoding may lose data when transferring from UTF-16. UTF-8 is often preferable.) Code that writes external data also needs to consider writing a Byte Order Mark (BOM), a special sequence of bytes at the beginning of a file that indicates the file's endianness and Unicode encoding.

Finally, database tools and database interactions may require additional attention, depending on your database's capabilities.

Some of these issues are discussed in more detail in [9].

Putting it all together

Unicode is a very broad topic, and even the sub-topic of migrating to Unicode for C++Builder 2009 and 2010 touches upon many techniques. As a review, here's an overview of one approach to migrating your application to C++Builder 2009 or 2010:

First, decide on whether you're going to do a complete migration (to gain the full benefits of Unicode) or a minimal migration (to get up and running in the new IDE as soon as possible).

If you're doing a complete migration:

1. Check your third-party libraries and make sure that they're compatible with C++Builder 2009 and 2010.
2. Before switching to C++Builder 2009 or 2010:
   a. Replace AnsiString with String.
   b. Mark string literals ("Hello") with tchar.h's _T macro.
   c. Replace C library routines with their tchar.h equivalents.
3. Add C++ typedefs such as tstring so that C++ string manipulation will work after the switch to Unicode.
4. Convert your project to C++Builder 2009 or 2010. Under Project, Options, Directories and Conditionals, make sure that "_TCHAR maps to" is set to "wchar_t."
5. Introduce AnsiString, RawByteString, and UTF8String in places where you need to continue to manipulate narrow character text.
6. Use string shims, C++ overloading, and similar techniques as needed to handle remaining ANSI versus Unicode issues.
7. Run the type-safe printf() transformer on your code to catch any issues with variadic functions.
8. Review your code for places where you assume that strings can be arbitrarily indexed or split; this is no longer the case with Unicode.

If you're doing a minimal migration:

1. Check your third-party libraries and make sure that they're compatible with C++Builder 2009 and 2010.
2. Convert your project to C++Builder 2009 or 2010. Under Project, Options, Directories and Conditionals, make sure that "_TCHAR maps to" is set to "char."
3. Replace String with AnsiString.
4. Use string shims, C++ overloading, and similar techniques to handle interactions between Unicode VCL code and your ANSI application code.
5. Run the type-safe printf() transformer on your code to catch any issues with variadic functions.

Gradually switch to Unicode, as time and business cases permit, to gain the full benefits of Unicode.

Contact Josh at joshkel@gmail.com.

References

1. Matthew Wilson. Imperfect C++. Addison-Wesley, 2004.
2. Matthew Wilson et al. "STLSoft – Robust, Lightweight, Cross-platform, Template Software." http://www.stlsoft.org/.
3. WideCharToMultiByte. http://msdn.microsoft.com/en-us/library/dd374130%28VS.85%29.aspx.
4. MultiByteToWideChar. http://msdn.microsoft.com/en-us/library/dd319072%28VS.85%29.aspx.
5. "Nick Hodges > Blog Archive > We Won't Overwrite Your Changes." http://blogs.embarcadero.com/nickhodges/2009/05/29/39245.
6. Matthew Wilson. "Generalized String Manipulation: Access Shims and Type Tunneling." http://www.drdobbs.com/cpp/184401689. Retrieved 4/12/2010. (Several of the links within this article are broken; switch to the print view for the complete article.)
7. Nick Galbreath. "Type safe printf." http://blog.client9.com/2008/10/type-safe-printf.html. Retrieved 4/13/2010.
8. Nick Galbreath. "typesafeprintf: type safe printf transformation." http://code.google.com/p/typesafeprintf/. Retrieved 4/13/2010.
9. Cary Jensen. "Delphi Unicode Migration for Mere Mortals: Stories and Advice from the Front Lines." http://edn.embarcadero.com/article/40307. Retrieved 4/13/2010.
