WORKSHOP AGREEMENT
April 2012
ICS 03.100.10; 01.140.20
English version
© 2012 CEN All rights of exploitation in any form and by any means reserved worldwide for CEN national Members.
CEN WS eCAT
Date: 2012-04-23
Secretariat: AFNOR
Contents
Foreword
Introduction
1. Scope
2. Normative References
3. Definitions and abbreviations
3.1 Definitions
3.2 Abbreviations
4. Methodologies for product classification system mapping
4.1 Ontologies
4.1.1 Semantic heterogeneity
4.1.2 Ontology matching problem
4.1.3 Areas of ontology mapping / matching
4.2 Product classification system mapping methodologies
4.2.1 A canonical process model for ontology mapping
4.2.2 CC3P
4.3 Elementary schema-based matching / mapping approaches
4.4 Architecture
4.4.1 Centralized architecture
4.4.2 Distributed architecture
4.5 Tools supporting ontology development and mapping
4.6 Exchange formats for ontologies
4.6.1 XML
4.6.2 RDF and RDFS
4.6.3 OWL
4.6.4 SKOS
4.6.5 BMEcat
4.7 Summary and Recommendations
5. The cMap Overall Mapping Methodology
5.1 Requirements
5.1.1 Product Classification System versions used in cMap
5.1.2 General requirements about the cMap mapping / matching methodology
5.1.3 Mapping challenges
5.1.4 Mapping relationships
5.2 Design of the cMap Mapping Methodology
5.2.1 Design of the mapping methodology
5.2.2 The cMap platform architecture
5.2.3 Selection of an appropriate tool for the mapping
5.2.4 Import and Export format
5.3 Usage of the cMap Mapping Methodology: Mapping results statistics
5.3.1 Interpretation for the CPV system (I. Vertical Section)
5.3.2 Interpretation for the eCl@ss system (II. Vertical Section)
5.3.3 Interpretation for the GPC system (III. Vertical Section)
5.3.4 Interpretation for the UNSPSC system (IV. Vertical Section)
5.4 Summary and Recommendations
6. Description of the classification systems
6.1 Introduction
6.2 Release policy and roadmap
6.3 Maintenance process
6.4 Version compatibility
6.4.1 CPV
6.4.2 UNSPSC
6.4.3 GPC
6.4.4 eCl@ss
6.5 Summary
6.5.1 Differences and Similarities
6.5.2 Identified problems for the common maintenance of the mapping
6.5.3 Recommendations
7. Definition of the architecture for an open standardized classification collaboration platform
7.1 Introduction
7.2 Business use cases
7.3 Actors who require the mapping
7.4 cMap platform roles
7.4.1 End-user
7.4.2 Platform administration authority
7.4.3 Platform Provider
7.4.4 Classification Authority
7.4.5 Mapping Proposer
7.4.6 Quality manager
7.4.7 Apply officially at the administration Release manager
7.5 Business objects
7.5.1 Representing classifications
7.5.2 Representing mappings
7.5.3 Bringing in line the cMap mapping cases
7.6 Use cases
7.6.1 Use Case 1: Query for mapping
7.6.2 Use Case 2: Manage mapping
7.6.3 Use Case 3: Load classification system
7.7 Requirement analysis
7.7.1 Architectural requirements
7.7.2 Platform requirements
7.7.3 Process-related requirements
7.7.4 End-user requirements
7.8 Data Quality
7.8.1 Unique Identification
7.8.2 Tracking and Tracing
7.8.3 Change Management
7.8.4 Intellectual property rights and conditions of use
8. Definition of a synchronization process
8.1 Introduction
8.2 The basis for a sustainable process: Gen-ePDC and its adaption for cMap
8.3 cMap processes
8.3.1 Process 1: Query for mapping result
8.3.2 Process 2: Apply Release Update Information
8.3.3 Process 3: Manage Mapping
8.4 Maintenance strategy
8.5 Governance models
8.5.1 Governance model 1: Community-driven
8.5.2 Governance model 2: Classification authority-driven
8.5.3 Governance model 3: Administration authority-driven
8.5.4 Pros and Cons
8.6 Business models (high level)
8.6.1 Proposal fee
8.6.2 Mapping result fee
8.6.3 Membership restriction
8.6.4 Classification authority financed
8.6.5 Third party financed
8.6.6 cMap as a service
8.6.7 Comparison
9. Conclusion and recommendation
Foreword
The production of this CEN Workshop Agreement on Classification Mapping for open and standardized
product classification usage in eBusiness (cMap) was decided at the CEN Workshop eCAT plenary meeting
on 2 March 2011.
The project team started work in May 2011.
CEN Workshop eCAT was launched in 2002 and brings together experts in electronic catalogues and classification
systems used in eBusiness and eProcurement, from both the public and the private sector.
The Workshop has produced eight CWAs, which are available at:
http://www.cen.eu/cen/Sectors/Sectors/ISSS/Workshops/Pages/eCAT.aspx
The list of companies supporting this CWA will be added in the final document.
Introduction
The cMap project (Classification and Mapping for eBusiness and eProcurement) is a follow-up to the
CC3P project (Classification and catalogue systems for public and private procurement), which was
closed in 2010 with CWA 16138. The CC3P project analysed how different product classification systems can
be aligned with each other, in order to find out whether these systems can be mapped or aligned.
The basic requirement behind such an alignment was that product data classified in one product classification
system can be classified manually, semi-automatically or even automatically in another product classification
system. Such a mapping or alignment would facilitate business processes such as electronic procurement or
tendering, even when different enterprises use different classification systems.
To gain this knowledge, four main product classification systems were assessed, since they are widely
used within companies and by the public sector: CPV, eCl@ss, GPC and UNSPSC. A trial mapping was
undertaken for six domains to analyse the differences and similarities between the product classification
systems in order to extract basic rules for alignment or mapping. These six domains (namely Clothing; Food,
Beverage & Tobacco; Furniture; Electronics; Laboratory; and Energy) have been updated in the cMap project
and will be made available together with 45 files.
In addition to the analysis of the structure of the four product classification systems, the maintenance
processes of the classification authorities responsible for each product classification system were assessed.
The goal was to understand the differences and similarities between these maintenance processes in order
to define an overall maintenance process for a product classification authority that would be responsible for
the alignment or mapping of the four main product classification systems.
The main result of the CC3P project is a set of recommendations. They are intended to help harmonize the
product classification systems and to facilitate their alignment or mapping in the future. In addition,
recommendations were defined for a high-level mapping platform that can be used to align or map the four
product classification systems with each other, and so reach the goal of "classify once, use in different
product classification systems" for product data.
The cMap project follows on from the CC3P project and extends its results in two main areas:
Finishing a full mapping of all domains of the four product classification systems
Defining an architecture and a governance mechanism for a mapping platform in terms of its building
blocks and requirements.
In addition, an analysis has been carried out to investigate methods and methodologies for a semi-automatic
or even automatic mapping among the four main product classification systems used in CC3P. The resulting
methodology can serve as the core of the classification platform and provides the glue between the four
product classification authorities to support mappings among their product classification systems. Both
technical and organizational aspects are taken into account.
Reflecting all these points, CC3P recommended the following overall harmonization strategy to reach the
goal of product classification system alignment or mapping.
1. Scope
This document studies four product classification systems used in eBusiness in Europe (and beyond) in
order to reach the overall goals stated in the Introduction, building on the initial mapping carried out in the
CC3P project and on research into mapping methods, methodologies and platforms.
The versions of the product classification systems used here are:
UNSPSC v11 English
eCl@ss 6.0.1 English
GPC 30062008 English (as at 31 August 2009)
CPV 2008 English
2. Normative References
The following normative documents contain provisions which, through reference in this text, constitute
provisions of this CWA. For dated references, subsequent amendments to, or revisions of, any of these
publications do not apply. However, parties to agreements based on this CWA are encouraged to investigate
the possibility of applying the most recent editions of the normative documents indicated below. For undated
references, the latest edition of the normative document referred to applies.
CWA 15294:2005, Dictionary of Terminology for Product Classification and Description
CWA 15295:2005, Description of References and Data Models for Classification
CWA 15556-3:2006, Product Description and Classification - Part 3: Results of development in harmonization of product classification and in multilingual electronic catalogues and their respective data modelling
CWA 16100:2010, Guidelines for the design, implementation and operation of a product property server (ePPS)
CWA 16138:2010, Classification and catalogue systems used in electronic public and private procurement
DIN 4002-100, Properties and their scopes for product data exchange - Part 100: Properties on www.DINsml.net
IEC 61360, Standard data element types with associated classification scheme for electric components
ISO/IEC 6523, Information technology - Structure for the identification of organizations and organization parts
ISO/IEC 11179, Metadata registries (MDR)
ISO 13584, Industrial automation systems and integration - Parts library (PLIB)
ISO/DIS 22274:2011, Systems to manage terminology, knowledge and content - Concept-related aspects for developing and internationalizing classification systems
ISO/TS 29002-5:2009, Industrial automation systems and integration - Exchange of characteristic data - Part 5: Identification scheme
ISO 8000, Data Quality
ISO 22745, Industrial automation systems and integration - Open technical dictionaries and their application to master data
3. Definitions and abbreviations
3.1 Definitions
For the purposes of the present document, the following terms and definitions apply:
3.1.1
Attribute
data element for the computer-sensible description of a property, a relation or a class
EXAMPLE: Creation date of a class object in a computer system
Source: ISO/DIS 22274
3.1.2
Backward Compatibility
the ability of software and hardware to use data produced by a previous generation of software and
hardware
Source: ISO Concept Database, 2011 (http://cdb.iso.org), ISO 12651:1999
Alternative definition:
a newer coding standard is backward compatible with an older coding standard if decoders designed to
operate with the older coding standard are able to continue to operate by decoding all or part of a bitstream
produced according to the newer coding standard
Source: ISO Concept Database, 2011 (http://cdb.iso.org), ISO/IEC 13818
3.1.3
Brick
term used in the GPC for a product class
3.1.4
Characteristic
distinguishing trait or quality
NOTE: Characteristics that apply to concepts are called feature specifications (3.11), whereas
characteristics of classes are called properties.
Source: ISO/DIS 22274
3.1.5
Class
description of a set of objects that share the same characteristics
NOTE: The characteristics may include properties, operations, methods, relationships and semantics.
Source: ISO/DIS 22274
NOTE: A class is usually described by a class name and a class code that identifies the class's hierarchical
position within a classification system.
3.1.6
Classification
process of assigning phenomena to classes according to criteria
Source: ISO/DIS 22274
3.1.7
Classification System
systematic collection of classes organized according to a known set of rules, and into which objects
may be grouped
NOTE: This CWA considers both the classification system with properties and the classification system
without properties
EXAMPLE 1: UNSPSC is an example of a classification system without properties.
EXAMPLE 2: eCl@ss is an example of a classification system with properties.
Source: ISO/DIS 22274
3.1.8
Commodity class
term used in the UNSPSC and eCl@ss for a product class
3.1.9
Concept
unit of knowledge created by a unique combination of () characteristics.
NOTE: Concepts are not necessarily bound to particular languages. They are, however, influenced by the
social or cultural background, which often leads to different classifications.
Source: ISO/DIS 22274
3.1.10
Data provenance
a record of the ultimate derivation and passage of a piece of data through its various owners or custodians
Source: ISO 8000-102
3.1.11
Four-eye-principle
a security precaution that requires at least two people to approve a particular activity
Source: http://en.wikipedia.org/wiki/Four_eyes
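A minimal sketch of how a mapping platform might enforce the four-eye principle before a change takes effect. The `ChangeRequest` class and its methods are hypothetical; the sketch only counts distinct approvers and does not model roles such as proposer or quality manager.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """Hypothetical change (e.g. a proposed mapping) awaiting approval."""
    description: str
    approvers: set[str] = field(default_factory=set)

    def approve(self, person: str) -> None:
        # A set ignores repeated approvals by the same person.
        self.approvers.add(person)

    def is_released(self, required: int = 2) -> bool:
        # Four-eye principle: at least `required` distinct people must approve.
        return len(self.approvers) >= required
```

Storing approvers in a set is the design choice that matters here: a second approval by the same person does not count towards the threshold.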
3.1.12
Identifier
a character or group of characters constituting a data element value used to identify or name an object and
possibly to indicate certain properties of that object
Source: ISO Concept Database, 2011 (http://cdb.iso.org), ISO/IEC 6523-1:1998, cited ISO 13584-26:2000
Alternative definition:
linguistically independent sequence of characters capable of uniquely and permanently identifying that with
which it is associated.
Source: ISO Concept Database, 2011 (http://cdb.iso.org), adapted from ISO/IEC 11179-3:2003
3.1.13
Object
anything perceivable or conceivable
NOTE: Objects may be material (e.g. an engine, a sheet of paper, a diamond), immaterial (e.g. a conversion
ratio, a project plan) or imagined (e.g. a unicorn).
Source: ISO/DIS 22274
3.1.14
Ontologist
someone who professionally deals with shared formal conceptualizations
3.1.15
Predecessor
the class of a classified product before upgrading to a new release. In the new release, the product will be
classified with a successor class code
EXAMPLE: A user has assigned the class 21150301 (predecessor) to their product according to eCl@ss
Release 6.2. Due to a change in the classification system, this class code is changed to 23170203
(successor) in eCl@ss Release 7.0. If exactly one class of the existing (source) release corresponds to one
class in the new (target) release, there is a 121-relation. If more than one class is merged into one class,
there is an M21-relation; if one class is split into more than one class, there is a 12M-relation.
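The 121, M21 and 12M relations can be derived mechanically once a release update file has been reduced to (predecessor, successor) pairs. The sketch below assumes that input format; the function name `classify_relations` is hypothetical, and the label M2M for a pair that is both split and merged is an addition of this sketch, not a term defined in this CWA.

```python
from collections import defaultdict


def classify_relations(pairs):
    """Label each (predecessor, successor) pair as 121, M21, 12M or M2M.

    `pairs` is an iterable of (old_code, new_code) tuples, an assumed
    simplification of a release update file.
    """
    successors = defaultdict(set)    # predecessor -> its successors
    predecessors = defaultdict(set)  # successor -> its predecessors
    for old, new in pairs:
        successors[old].add(new)
        predecessors[new].add(old)

    labels = {}
    for old, news in successors.items():
        for new in news:
            split = len(news) > 1               # one old class -> several new classes
            merged = len(predecessors[new]) > 1  # several old classes -> one new class
            if split and merged:
                labels[(old, new)] = "M2M"
            elif split:
                labels[(old, new)] = "12M"
            elif merged:
                labels[(old, new)] = "M21"
            else:
                labels[(old, new)] = "121"
    return labels
```

With the example above, the pair `("21150301", "23170203")` is labelled `"121"` as long as no other class maps to or from either code.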
3.1.16
Property
defined characteristic suitable for the description and differentiation of the objects in a class
EXAMPLE: Ambient temperature may be a property of a class comprising geographical locations.
Source: ISO/DIS 22274
3.1.17
Release Policy
rules and principles that define the criteria for releasing new versions, e.g. the frequency, the content
scope, the validity, etc.
3.1.18
Release Update File
information on the changes made between two consecutive releases of a classification system,
usually published by the classification authority
EXAMPLE: For the GPC a delta report is published, for the UNSPSC an audit trail, for the CPV
correspondence tables, and for eCl@ss Release Update Files (mapping tables).
3.1.19
Roadmap
a plan that applies to a new product or process, or to an emerging technology
Source: http://en.wikipedia.org/wiki/Technology_roadmap (2011-09-05)
3.1.20
Successor
the new class of a classified product when upgrading to a new release. The product was already classified
with a predecessor class code in the previous release
EXAMPLE: A user has assigned the class 21150301 (predecessor) to their product according to eCl@ss
Release 6.2. Due to a change in the classification system, this class code is changed to 23170203
(successor) in eCl@ss Release 7.0. If exactly one class of the existing (source) release corresponds to one
class in the new (target) release, there is a 121-relation. If more than one class is merged into one class,
there is an M21-relation; if one class is split into more than one class, there is a 12M-relation.
3.1.21
Upward Compatibility
ability to move data from a more advanced version of a system or software package to a less advanced
version
Source: ISO Concept Database, 2011 (http://cdb.iso.org), ISO 12651:1999
3.1.22
Value
part of an attribute specification which specifies one possible content of an attribute compliant with the
domain of the attribute
Source: CWA 15294:2005
NOTE: The quoted term attribute is replaced here by the term property, in line with the ISO definition, i.e.
a value is one possible content of a property. The domain of a property is the set of allowed or valid
values for that property.
EXAMPLE: For a class traffic light, the property colour would have the allowed values red, yellow and
green, which form the property's domain.
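The relationship between class, property, domain and value can be made concrete with the traffic light example. In this minimal sketch a domain is assumed to be an enumerated set of allowed values (real classification systems also use numeric ranges and units, which are not modelled); the names `Property` and `validate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    """A property whose domain is an enumerated set of allowed values."""
    name: str
    domain: frozenset[str]

    def is_valid(self, value: str) -> bool:
        # A value is valid only if it belongs to the property's domain.
        return value in self.domain


def validate(product: dict[str, str], properties: dict[str, Property]) -> list[str]:
    """Return one message per value that is unknown or outside its domain."""
    errors = []
    for name, value in product.items():
        prop = properties.get(name)
        if prop is None:
            errors.append(f"unknown property: {name}")
        elif not prop.is_valid(value):
            errors.append(f"{value!r} is not in the domain of {name}")
    return errors


# The traffic light class: its properties, keyed by property name.
traffic_light = {"colour": Property("colour", frozenset({"red", "yellow", "green"}))}
```

Here `validate({"colour": "blue"}, traffic_light)` reports that `'blue'` is not in the domain of colour, while `"red"` passes.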
3.2
Abbreviations
CEN
CPV
CWA
DIN
DTD
eCat
ePDC
GPC
GDSN
GTIN
GUI
IEC
ISO
PCS
PLIB
UNSPSC
4. Methodologies for product classification system mapping
4.1 Ontologies
An ontology typically provides a vocabulary that describes a domain of interest and a specification of the
meaning of the terms used in that vocabulary [8]. Depending on the precision of the specification, the notion of
ontology encompasses several data and conceptual models, including sets of terms, classifications,
thesauri, database schemas or fully axiomatised theories [7]. When several competing ontologies are used
in different applications, these applications usually cannot interoperate directly.
Ontologies serve to structure and exchange data or information. They typically consist of:
Concepts or classes
Types
Instances
Relations
Inheritance
Axioms
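To show how these components fit together, the toy ontology below has classes (concepts), instances, is-a relations with multiple inheritance, and one kind of axiom (class disjointness). It is a minimal sketch for illustration, not an OWL implementation; the class name `Ontology` and its methods are hypothetical.

```python
class Ontology:
    """Toy ontology: classes, instances, is-a inheritance and disjointness axioms."""

    def __init__(self):
        self.parents = {}      # class -> set of direct superclasses (inheritance)
        self.instance_of = {}  # instance -> class
        self.disjoint = []     # axioms: pairs of classes that may share no instance

    def add_class(self, name, *superclasses):
        self.parents[name] = set(superclasses)

    def add_instance(self, individual, cls):
        self.instance_of[individual] = cls

    def ancestors(self, cls):
        """Return cls plus every class it inherits from, directly or indirectly."""
        seen, stack = set(), [cls]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return seen

    def is_a(self, individual, cls):
        # An instance belongs to its own class and to all inherited classes.
        return cls in self.ancestors(self.instance_of[individual])

    def violations(self):
        """List instances whose inherited classes break a disjointness axiom."""
        bad = []
        for individual, cls in self.instance_of.items():
            classes = self.ancestors(cls)
            if any(a in classes and b in classes for a, b in self.disjoint):
                bad.append(individual)
        return bad
```

With classes Product, Furniture and Food, an axiom stating that Furniture and Food are disjoint, and an instance of a Chair class inheriting from Furniture, `is_a` confirms that the chair is a Product and `violations()` is empty; a class inheriting from both Chair and Food would be reported.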
For further information on ontologies, see http://semanticweb.org/wiki/Ontology.
4.1.1 Semantic heterogeneity
There are many potential circumstances in which semantic heterogeneity may arise:
Enterprise information integration
Querying and indexing the deep Web (which is a classic data federation problem in that there are
literally tens to hundreds of thousands of separate Web databases)
Merchant catalogue mapping
Schema versus data heterogeneity
Schema heterogeneity and semi-structured data
1 http://techwiki.openstructs.org/index.php/Classification_of_Semantic_Heterogeneity
Naturally, there will always be differences in how different authors or sponsors construct their own particular
world view, which, if transmitted in XML or expressed through an ontology language such as OWL, may
also result in differences of expression or syntax. Indeed, the ease of conveying these schemas as
semi-structured XML, RDF or OWL is in itself a source of potential expression heterogeneities. Simple
schema use and versioning can also create mismatches [10]. Thus, semantic mismatches can be driven by
world view, perspective, syntax, structure, versioning and timing:
One schema may express a similar world view with different syntax, grammar or structure
One schema may be a new version of the other
Two or more schemas may be evolutions of the same original schema
There may be many sources modelling the same aspects of the underlying domain (horizontal
resolution such as for competing trade associations or standards bodies), or
There may be many sources that cover different domains but overlap at the seams (vertical
resolution such as between pharmaceuticals and basic medicine)
Heterogeneities can be classified into three broad classes [11]:
Structural conflicts arise when the schema of the sources representing related or overlapping data
exhibit discrepancies. Structural conflicts can be detected when comparing the underlying DTDs 2.
The class of structural conflicts includes generalization conflicts, aggregation conflicts, internal path
discrepancy, missing items, element ordering, constraint and type mismatch, and naming conflicts
between the element types and attribute names.
Domain conflicts arise when the semantic of the data sources that will be integrated exhibit
discrepancies. Domain conflicts can be detected by looking at the information contained in the
DTDs and using knowledge about the underlying data domains. The class of domain conflicts
includes schematic discrepancy, scale or unit, precision, and data representation conflicts.
Data conflicts refer to discrepancies among similar or related data values across multiple sources.
Data conflicts can only be detected by comparing the underlying documents. The class of data
conflicts includes ID-value, missing data, incorrect spelling, and naming conflicts between the
element contents and the attribute values.
Moreover, mismatches or conflicts can occur between set elements (a population mismatch) or attributes (a
description mismatch).
The figure below shows about 40 distinct potential sources of semantic heterogeneities:
Even if the correct encoding is detected, there are significant differences between language
sources in parsing (white space, for example), syntax and semantics, which can also lead to many
error types.
3 Because the referenced authors use different terminology, three different terms are used: matching, mapping and alignment;
they have the same meaning.
4 To be understood as aggregation
Although different formalisations of the matching operation exist, for the purposes of this document the
matching operation determines an alignment A for a pair of ontologies O1 and O2. Hence, given a pair of
ontologies (which can be very simple and contain one entity each), the matching task is that of finding an
alignment between these ontologies. Some further parameters can extend the definition of
matching [8]:
the use of an input alignment A, which is to be extended
the matching parameters, for instance, weights, or thresholds
external resources, such as common knowledge and domain specific thesauri
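The extended definition of matching can be sketched in a few lines. This is a minimal illustration, not the method of [8]: entities are reduced to plain labels, `difflib.SequenceMatcher` stands in for a real similarity measure, and the function name `match`, the `threshold` parameter and the `thesaurus` dictionary are hypothetical. The three optional arguments correspond to the input alignment, the matching parameters and the external resources listed above.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Set, Tuple

# A correspondence: (entity of O1, entity of O2, relation, confidence)
Correspondence = Tuple[str, str, str, float]


def match(o1: List[str], o2: List[str],
          input_alignment: Optional[Set[Correspondence]] = None,
          threshold: float = 0.8,
          thesaurus: Optional[Dict[str, str]] = None) -> Set[Correspondence]:
    """Return an alignment between O1 and O2, extending an optional input alignment.

    input_alignment -- existing correspondences that are kept and extended
    threshold       -- matching parameter: minimum confidence for a new correspondence
    thesaurus       -- external resource mapping a lower-case term to a preferred term
    """
    thesaurus = thesaurus or {}
    alignment = set(input_alignment or set())
    already_matched = {c[0] for c in alignment}

    def norm(label: str) -> str:
        # normalise a label with the help of the external resource
        key = label.lower()
        return thesaurus.get(key, key)

    for e1 in o1:
        if e1 in already_matched:
            continue  # keep the decision taken in the input alignment
        best, best_score = None, 0.0
        for e2 in o2:
            score = SequenceMatcher(None, norm(e1), norm(e2)).ratio()
            if score > best_score:
                best, best_score = e2, score
        if best is not None and best_score >= threshold:
            alignment.add((e1, best, "=", round(best_score, 2)))
    return alignment
```

For example, with the hypothetical thesaurus entry `{"notebook": "laptop computer"}`, the class "Notebook" is aligned with "Laptop computer" with confidence 1.0 even though the two labels share few characters.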
Schema objects include class definitions (e.g. table definitions in a relational model), entity types,
relationship types and their relationships. Hence, a potential user is responsible for understanding the
semantics of the objects in the database schema.
A second direction for approaching the problem of product classification mapping is ontologies. The
ontology mapping problem is very closely related to the product classification mapping problem,
since a product classification can be seen as a specialized ontology using a (mono-)hierarchical
structure for the concepts, e.g. product classes, within the ontology. In computer science, ontologies
are defined as formal representation systems of knowledge about a domain of
discourse, e.g. products.
They are characterized as follows:
- They are typically an explicit, formalized and ordered representation of a set of terms and their
relationships in a specific domain of discourse.
- They contain inference and integrity rules.
- They represent a network of information with logical relations.
Database schemas and ontologies are similar in that both provide a vocabulary of terms and, to some extent,
constrain the meaning of the terms used in that vocabulary. Hence, they often share similar matching
solutions [8].
Overcoming semantic heterogeneity is typically achieved in two steps [8]:
matching entities to determine an alignment, i.e., a set of correspondences
interpreting an alignment according to application needs, such as data translation or query
answering.
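The two steps can be illustrated with a minimal sketch that assumes the first step has already produced an alignment, represented here as a set of (source class, target class, relation) correspondences. The class codes and the `translate` helper are hypothetical. The second step interprets the alignment for data translation: only equivalence correspondences ("=") are applied automatically, and everything else is set aside for review.

```python
from typing import Dict, List, Set, Tuple

# Result of step 1 (matching): a set of correspondences.
# Codes are hypothetical; "=" means equivalence, "<" means "more specific than".
Alignment = Set[Tuple[str, str, str]]


def translate(records: List[Dict[str, str]], alignment: Alignment,
              field: str = "class") -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Step 2: interpret the alignment to translate classified product records.

    Records whose class has an equivalence correspondence are re-coded;
    all other records are returned separately for manual review.
    """
    lookup = {src: tgt for src, tgt, rel in alignment if rel == "="}
    translated, unresolved = [], []
    for rec in records:
        code = rec[field]
        if code in lookup:
            translated.append({**rec, field: lookup[code]})
        else:
            unresolved.append(rec)
    return translated, unresolved
```

The same alignment could equally be interpreted for query answering, for instance by rewriting a query on source classes into one on target classes; only the interpretation step changes.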
Taking into account these formal parts of ontologies, a product classification system can be seen as a
lightweight ontology containing terms, a taxonomy (a classification framework for all products and
services), and relations between terms, such as classes, and the properties which describe them.
another ontology
If the semantics of the same language constructs vary in their
implementation
Mismatch on the ontological level, which occurs if two languages are using
the same term to denote different concepts
different terms to denote the same concepts
different modelling paradigms, conventions or level of granularity
using constructs that cover different ranges of the domain.
o Automated and semi-automated mapping systems must be able to
identify the variations in concepts as represented by various
ontologies and take appropriate steps to normalize the meaning of
those concepts.
In such cases, support should be provided for identifying the incompleteness that may be
present within the mapping.
Accommodation for heterogeneity
o Mappings between domains will likely entail the use of multiple representational languages.
Consequently, mapping representations will either need to support multiple languages, or a
common representation language should be employed and the mappings between models
described in this language.
This short comparison shows the problems of product classification mapping in the research field of
ontologies. It is therefore more appropriate to investigate possible solutions with ontologies than with database
schemas. For this reason, this work concentrates on mapping solutions from ontology research to
find a practical solution to the product classification mapping problems within a mapping methodology.
4.2
Based upon the research on methodologies regarding product classification systems, or more generally
ontologies, a discipline called ontology engineering sheds light on the topic of product classification system
mapping. It covers the systematic creation, development, maintenance and mapping of ontologies for
different application domains.
In ontology engineering the process of ontology development can be split into different phases which are:
Requirement engineering for the ontology development
Design of the ontology according to the requirements
Development of the ontology and the maintenance process
Usage of the ontology
Maintenance of the ontology
Retiring the ontology
Mapping of ontologies, or of product classification systems as lightweight ontologies, mainly concerns
the design phase (phase 2) and the development phase (phase 3) of an ontology, since these phases lay
the groundwork for the mapping methods applied between different ontologies or product classification
systems.
[Figure: Phases of ontology development: Phase 1 Requirement engineering; Phase 2 Design of the ontology; Phase 3 Development of the ontology; Phase 4 Usage of the ontology; Phase 5 Maintenance of the ontology; Phase 6 Placing out of order of the ontology]
Phase 1: Requirement engineering
In addition, the view on the ontology, or possibly several different views, has to be decided, since views are
used to present the information structured within the ontology or product classification system to its
audience or users. Since product classification systems are typically modelled as lightweight ontologies
consisting of at least two different parts, a description of the concrete products as classes or concepts of the
ontology and a hierarchical tree-based taxonomy, it has to be determined which information must be
integrated into the product description to reflect the content of the different views, and which taxonomies are
suitable for grouping the classes of the product classification system according to a specific view.
For example, different pieces of information are needed to represent a product classification system
from the purchasing perspective than from the sales perspective. In addition, different taxonomies are
needed to facilitate its use in different application areas such as purchasing or sales. All information
needed for the different views per product class or concept has to be derived from the requirements
engineering process and from the requirements of the corresponding taxonomy. Of the four horizontal
product classification systems investigated in this document, UNSPSC, eCl@ss, CPV and GPC, the
UNSPSC product classification system consists only of a taxonomy, whereas eCl@ss consists of a
taxonomy and a product description given by classes as concepts and their related properties. The
requirements for these two product classification systems accordingly differ greatly in the amount of
information to be represented within the product classification system. Besides this, UNSPSC can be seen
as mainly driven by a sales view, whereas eCl@ss can be seen as mainly driven by a purchasing view.
Because of these different views, the taxonomies of the two product classification systems are very
different and not directly comparable.
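The two-part structure described above, classes with properties on one side and one or more taxonomies acting as views on the other, can be pictured as a small data model. This is a hypothetical sketch, not the data model of UNSPSC or eCl@ss; the class codes, group codes and field names are invented for illustration. A taxonomy-only system would simply leave `properties` empty.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProductClass:
    """A class (concept) of a lightweight ontology, optionally with properties."""
    code: str
    name: str
    properties: List[str] = field(default_factory=list)


@dataclass
class Taxonomy:
    """One view on the classes: a tree stored as child code -> parent code."""
    view: str
    parent: Dict[str, Optional[str]] = field(default_factory=dict)

    def path(self, code: str) -> List[str]:
        """Return the path from the root down to the given class.

        Assumes the parent links form a tree (no cycles).
        """
        chain: List[str] = []
        node: Optional[str] = code
        while node is not None:
            chain.append(node)
            node = self.parent.get(node)
        return list(reversed(chain))
```

The same class can then be grouped differently per view, e.g. under a furniture group in a purchasing taxonomy and under an office equipment group in a sales taxonomy, without duplicating its property description.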
Phase 2: Design of the ontology
After defining the general requirements, the representation formalism for the description of the ontology or
product classification system shall be selected. In addition, the granularity of the description must be
defined. Today, product classification systems are designed for one specific usage, expressed by a single
taxonomy and by descriptions of classes or concepts oriented towards that intended usage. Product
classification systems as lightweight ontologies are in general defined for one intended usage, namely the
description of products and services. To cover different scopes, however, different taxonomies shall be
defined to meet the requirements of the different views, and the different product properties used by these
views shall also be defined.
Phase 3: Development of the ontology and maintenance process
In this phase the design principles given by the requirements have to be elaborated into a concrete ontology
or product classification system. That means, the concrete classes based on the meta model of the ontology
must be defined. Additionally, it has to be decided which kind of tool shall be used to support the
development of the ontology or product classification system. Different tools and representation formats are
available (e.g. Protégé as an ontology editor, SKOS as a representation format). The choice of tool also
determines how well different maintenance philosophies are supported, such as centralized versus
distributed development and maintenance of ontologies.
Phase 4: Usage of the ontology
Once the ontology or product classification system has been designed according to the requirements and
developed using a problem-specific representation formalism, possibly supported by a specific
tool, it can be used by companies for describing products and services.
Phase 5: Maintenance of the ontology
Since ontology development is not a one-off task and ontologies may change over time, the ontology
has to be adapted to the needs of the application domain. In the area of product classification systems,
thousands of new products are developed every year, and these products must also be described according
to the product classification system used by companies. If some information introduced by new products
and services cannot be covered by the ontology, the ontology itself, i.e. its metamodel, has to be adapted
to meet these new requirements. This can include adapting the product representation capabilities or adding
new views, that is to say new taxonomies.
Phase 6: Retiring the ontology
If a product classification system or ontology is no longer used, it has to be retired. This happens when a
specific product classification system is abandoned or replaced by a successor. In the second
case, the transition to the new product classification system should be supported by mapping product data
classified according to the old system to the successor system, to avoid breaks and
additional work in the electronic supply chain.
Based on the overall methodology for product classification system engineering, a large number of
methodologies have been developed to satisfy different usages of ontologies in different, but mostly specific,
application domains. Within the design phase of these methodologies, different approaches exist to support the
development of ontologies, such as:
Architecture of a platform and tools supporting the development and mapping of ontologies or
product classification systems
Formal representation mechanisms and formats/notations for ontologies
Methods for representing the rules for the ontologies
Import and export formats for ontologies
Methods to support mapping between ontologies
[Figure: matching process steps Feature engineering, Selection of next steps, Similarity computation, Similarity aggregation, Interpretation]
4.2.2 CC3P
Within the CC3P project a first analysis of the four main product classification systems, CPV, eCl@ss, GPC
and UNSPSC, has been undertaken. This analysis followed a four-step approach consisting of four so-called
phases [CWA 16138, p.94 ff.]:
Phase 1: Numeric analysis
Phase 2: Syntactical analysis
Phase 3: Semantic analysis
Phase 4: Summary and recommendations
Figure 10: Comparison of naming schemas for CPV, eCl@ss, GPC and UNSPSC
In the third phase of the CWA 16138 methodology, the semantic analysis took the numerical and
syntactical analysis phases as starting points. A comparison was made to get a rough overview of the
overlap between the classification systems in terms of commodity class names. Identical and similar
commodity class names in the different classification systems were investigated. This analysis
was mainly done on the class level without taking into account the properties of the different classes.
Consequently, it is not an in-depth analysis of the semantics.
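A class-level name comparison of this kind could be approximated as follows. This is a hypothetical sketch, not the tooling used in CC3P: "identical" is taken to mean equal after lower-casing, "similar" is an assumed notion based on `difflib.get_close_matches` with a cutoff of 0.8, and the class names are invented.

```python
import difflib
from typing import Dict, List


def compare_class_names(system_a: List[str], system_b: List[str],
                        cutoff: float = 0.8) -> Dict[str, list]:
    """Sort the class names of system A into identical, similar and unmatched.

    Identical: equal to a class name of B, ignoring case.
    Similar:   closest name of B has a difflib ratio of at least `cutoff`
               (reported as (name in A, name in B) pairs).
    """
    lower_b = {name.lower(): name for name in system_b}
    result: Dict[str, list] = {"identical": [], "similar": [], "unmatched": []}
    for name in system_a:
        key = name.lower()
        if key in lower_b:
            result["identical"].append(name)
            continue
        close = difflib.get_close_matches(key, list(lower_b), n=1, cutoff=cutoff)
        if close:
            result["similar"].append((name, lower_b[close[0]]))
        else:
            result["unmatched"].append(name)
    return result
```

As in the analysis described above, such a comparison only looks at names; classes with the same name but different properties, or different names but equivalent meaning, are not detected.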
Within the CC3P project, the description of product classification systems is based on a data model approach
taken from the Gen-ePDC project.
4.3
According to the classification of elementary schema-based matching approaches, the following matchers8
can be differentiated [5]. Many of these matching approaches are used within learning and heuristic
techniques:
Element level matchers
o This kind of matching approach includes techniques such as:
String-based matching
8 A matcher is a function that fulfils the mapping operation
This technique is based on the assumption that the more similar two strings
are, the more likely it is that they denote the same concept.
The similarity is often measured using distance functions or variants of
distance functions.
Language-based matching
This technique is based on morphological properties of words to identify
important concepts within a source and is widely used in natural language
processing.
o The first step is the tokenizing of an input stream to locate potential
words of relevance within the data source. In the application area of
product classification systems these tokens are concept or class
names and property names and values.
o In the next step, lemmatization, i.e. the process of grouping
together the different inflected forms of a word so that they can be
analysed as a single item, examines each candidate word and finds
all its inflected forms (e.g. dog, dogs).
o In the last step, words of the investigated language resource,
such as prepositions and conjunctions, are flagged for
elimination, since they do not denote concepts.
Constraint-based matching
These techniques evaluate entities based on internal
constraints that exist within an entity, like data types or cardinality of
attributes.
Linguistic resource matching
In this case, common knowledge resources such as thesauri maintain
information that can be used to ascertain whether two concepts are equal or
similar.
Alignment reuse
This technique exploits the intuition that many schemas or ontologies to be
matched are similar to schemas or ontologies that have already been matched
(especially if they are in the same application domain).
Existing schema or ontology alignments can then be reused to facilitate the
mappings to new domains.
Upper level formal ontologies
The upper level ontologies are a form of external knowledge resource that
can be used to ground ontologies which are under investigation for mapping
in a shared semantic context.
Typically the formalization is done using logic-based systems.
Structural level matchers
o This kind of matching approach includes techniques such as:
Graph-based techniques
These techniques treat a data source as a labelled graph and assume that if
nodes from two separate ontologies are related or similar, then the nodes
around them are likely to be similar as well.
This matching is typically computationally expensive and relies on
approximation.
Taxonomy based techniques
These techniques exploit taxonomic relations, e.g. the is-a relationship, and
assume that if an is-a relation exists between two nodes that are already
similar, their neighbours are likely to be similar too.
Repository of structures
A repository is used to store ontologies and their fragments.
When new structures are to be matched, their similarity to the structures
already stored in the repository is checked first.
Model-based techniques
Matching of concepts is handled based on model-theoretic semantics of
these concepts, like description logic.
Combined matchers
o These approaches combine element-level and structure-level matchers.
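Several of the matchers listed above can be illustrated together in one small sketch. It is a toy under stated assumptions, not a reference implementation from [5]: Levenshtein distance stands in for string-based matching; a tokenizer with a tiny, assumed stop-word list and naive plural stripping stands in for language-based matching (real lemmatization needs a morphological lexicon); comparing the labels of parent classes stands in for a taxonomy-based technique; and the weights (0.3, 0.4, 0.3) of the combined matcher are arbitrary.

```python
import re
from typing import Dict, Set, Tuple

# Assumption: a tiny, illustrative stop-word list (prepositions, conjunctions, ...)
STOPWORDS = {"a", "and", "for", "of", "or", "the", "with"}


def levenshtein(s: str, t: str) -> int:
    """String-based: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def string_sim(s: str, t: str) -> float:
    """Normalise the edit distance to a similarity in [0, 1]."""
    s, t = s.lower(), t.lower()
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


def lemmas(label: str) -> Set[str]:
    """Language-based: tokenize, drop stop words, crudely lemmatize.

    Stripping a plural 's' is a naive stand-in for real lemmatization.
    """
    tokens = re.findall(r"[a-z0-9]+", label.lower())
    return {tok[:-1] if tok.endswith("s") and len(tok) > 3 else tok
            for tok in tokens if tok not in STOPWORDS}


def token_sim(s: str, t: str) -> float:
    """Jaccard overlap of the two lemma sets."""
    a, b = lemmas(s), lemmas(t)
    return len(a & b) / len(a | b) if a | b else 1.0


def taxonomy_sim(s: str, t: str,
                 parents1: Dict[str, str], parents2: Dict[str, str]) -> float:
    """Structure level: classes whose parent classes are similar score higher."""
    p1, p2 = parents1.get(s), parents2.get(t)
    return token_sim(p1, p2) if p1 and p2 else 0.0


def combined_sim(s: str, t: str,
                 parents1: Dict[str, str], parents2: Dict[str, str],
                 weights: Tuple[float, float, float] = (0.3, 0.4, 0.3)) -> float:
    """Combined matcher: weighted sum of element- and structure-level scores."""
    w_str, w_tok, w_tax = weights
    return (w_str * string_sim(s, t)
            + w_tok * token_sim(s, t)
            + w_tax * taxonomy_sim(s, t, parents1, parents2))
```

With the hypothetical parents "Office furniture" and "Furniture for offices", the classes "Office chairs" and "Chair for office" score well above an unrelated pair, although their raw strings differ considerably; this is the motivation for combining matchers rather than relying on string similarity alone.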
4.4 Architecture
The development of ontologies, or even lightweight ontologies such as thesauri, terminology systems or product
classification systems, is a challenging task. Since the development process is distributed over a large
number of participants and/or organisations, the architecture of a suitable platform for developing and
mapping product classification systems should be open to the integration of the different parties involved.
In ontology matching research, there are in general two overall architectures for finding
correspondences between ontologies [3]:
Reuse of a shared ontology (upper ontology) as a general ontology which has to be extended with
concepts and properties specific to an application area.
o As long as the extensions e.g. concepts are defined consistently with the definitions of the
shared ontology, finding correspondence between concepts can be facilitated.
o This architecture is used in language dictionaries.
Using learning and heuristic techniques, which is applied in cases where no upper ontology is
available or where none is to be used.
When focusing on a platform architecture for the mapping of product classification systems in general,
there are two meaningful approaches, which are outlined in the next two subsections:
Centralized architecture and
Distributed architecture.
In both cases, the overall platform architecture is as shown in Figure 12.
Figure 12: Overall platform architecture for the product classification mapping
Users in public and private organisations work with one or more product classification systems. If a
mapping is to be established, these product classification systems have to be imported into the platform, or
they have to be accessed online, that is, directly, without storing copies of them within the platform.
The platform has to support a mapping engine which contains the rules for performing the mapping between the
different product classification systems.
After two or more product classification systems have been mapped to each other, on the one hand a mapping file
containing the mapping results shall be generated and supported, and on the other hand the concrete
mapping of the different product classification systems has to be generated and delivered to the user and/or
product classification authority requesting it.
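The interplay between the mapping engine and the generated mapping file can be sketched as follows. This is an illustration under assumptions: the rule base is reduced to a dictionary from source class code to (target class code, relation type), the codes are hypothetical, and CSV is only one possible mapping-file format. Unmapped classes are reported in the file instead of being silently dropped.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical rule base: source class code -> (target class code, relation type)
Rules = Dict[str, Tuple[str, str]]


def run_mapping(source_codes: List[str], rules: Rules) -> List[Dict[str, str]]:
    """Apply the mapping rules to each source class code."""
    rows = []
    for code in source_codes:
        target, relation = rules.get(code, ("", "unmapped"))
        rows.append({"source": code, "target": target, "relation": relation})
    return rows


def write_mapping_file(rows: List[Dict[str, str]]) -> str:
    """Serialise the mapping result as CSV text, one row per source class."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["source", "target", "relation"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

In a platform as outlined above, the same rows could also be used to produce the concrete mapped data delivered to the requesting user, so that the mapping file and the delivered data stay consistent.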
4.5
When talking about methodologies for product classification systems, one has to distinguish between the
development of a single product classification system and the mapping of two or more classification systems
to each other.
For the development of product classification systems, the same distinction between centralized and
distributed platform architectures applies. In both cases, the systematic development of large product
classification systems, or their mapping, is only possible with appropriate tools.
Many tools on the market support the development of ontologies or product classification systems, and
they are based on different representation mechanisms and architectures. These kinds of tools
have to be integrated into, or at least connected to, the mapping platform to facilitate the development and
mapping of product classification systems.
Besides commercial tools, open-source tools are also available. Good overviews of available tools are
given in an online ontology editor survey9 and in the ontology wiki10.
9 http://www.xml.com/2004/07/14/examples/Ontology_Editor_Survey_2004_Table_-_Michael_Denny.pdf
4.5.1.2 Category A: Mapping tools to map from one classification system to another classification system
The tools available for the mapping of product classification systems can be classified according to different
criteria:
Technology
o Stand-alone tool
These tools are completely stand-alone, which means that they have to be installed
separately and read the product classification systems in order to provide a mapping.
The product classification systems to be mapped are provided in some data format, e.g.
Excel sheets, database systems or CSV files.
The mapping result is exported in some export format.
o Online-platform
These tools are available online to support the mapping of product classification
systems, e.g. on the internet or intra-/extranet of some enterprises.
The product classification systems can be imported in some input format.
The mapping result can be exported in some export formats.
o Integrated tool
These tools are typically extensions of business software such as ERP systems
and support the mapping of master data to one or more product classification
systems.
The mapping result will be available within the master data of the enterprise inside
the business software and can be exported in some format.
Interface support
o Proprietary interfaces
10 http://techwiki.openstructs.org/index.php/Ontology_Tools
Depending on the technologies mentioned above, most tools
support proprietary import and export formats for the mapping.
o Standard interfaces
Some tools support standard-based import and export interfaces, e.g. XML-based interfaces or even the formats of the product classification systems themselves.
Automation
o Manually
Some tools provide no functions for automating the mapping between
product classification systems. Consequently, the mapping is done
manually by one or more users with domain expertise. In some cases this manual
mapping is supported by a graphical user interface.
o Semi-automatically
Most of the tools available support some kind of semi-automatic mapping between
product classification systems.
The mapping mechanisms used inside these tools rely on rule-based systems
where these rules are built up and extended during the mapping process.
In a first step, these tools are able to support an automatic One-to-One mapping if
related classes are present in the different product classification systems.
For other kinds of mapping relations, like One-to-Many or Many-to-One, the user has
to interact with the system manually to solve mapping conflicts.
This manual conflict resolution leads to an extension of the rule base to facilitate later
mappings in an automatic way.
o Automatically
None of the tools on the market supports automatic mapping between product
classification systems.
In some specific application domains, ontology mapping tools exist which are able to
support an automatic mapping between ontologies. However, a huge effort
is required to prepare the import formats and the mapping rules for the
automatic mapping process.
User support
o Textual user interface
Some tools only support a script-based or textual user interface to support the
mapping of product classification systems. If an automatic mapping is supported this
kind of interface is acceptable.
o Graphical user interface
Some tools support a graphical user interface to facilitate the mapping of product
classification systems in a drag-and-drop manner. After importing, the product
classification systems are shown in two different windows on the screen and the
user can drag a product class from one window to a product class in the other
window to define the mapping between these classes. This can also be done on a
property or value level.
The result of this mapping process will lead to a rule base for later classification
mapping, e.g. creating some kind of regular expression to describe the mapping
rule.
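The semi-automatic, rule-based mapping described above, where regular-expression rules fire automatically and manual conflict resolution extends the rule base, can be sketched as follows. The `RuleBase` class, its method names, the patterns and the target codes are hypothetical. The `ask_user` callback stands in for the graphical or textual user interaction.

```python
import re
from typing import Callable, Dict, List


class RuleBase:
    """Hypothetical rule base for semi-automatic class mapping.

    A rule pairs a regular expression over source class names with a target class.
    """

    def __init__(self) -> None:
        self.rules = []                      # list of (compiled regex, target class)
        self.resolved: Dict[str, str] = {}   # decisions learned from manual resolution

    def add_rule(self, pattern: str, target: str) -> None:
        self.rules.append((re.compile(pattern, re.IGNORECASE), target))

    def candidates(self, source: str) -> List[str]:
        """Distinct targets of all rules whose pattern occurs in the source name."""
        found: List[str] = []
        for regex, target in self.rules:
            if regex.search(source) and target not in found:
                found.append(target)
        return found

    def map_class(self, source: str, ask_user: Callable[[str, List[str]], str]) -> str:
        """Map automatically if exactly one target applies; otherwise ask the user.

        The user's decision is stored, so the same conflict is resolved
        automatically the next time this source class is mapped.
        """
        if source in self.resolved:
            return self.resolved[source]
        found = self.candidates(source)
        if len(found) == 1:
            return found[0]
        choice = ask_user(source, found)  # zero or several candidates: manual step
        self.resolved[source] = choice
        return choice
```

For instance, with the rules `\bchair` → "T-Seating" and `office` → "T-OfficeFurniture", "Dining chair" is mapped automatically, while "Office chair" triggers one manual decision; afterwards it is mapped without user interaction.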
4.6
In the area of exchanging ontologies a lot of different exchange formats are used by the different tools.
The most commonly used formats are:
XML
RDF and RDFS
OWL
BMEcat
4.6.1 XML
The acronym XML stands for eXtensible Markup Language and is an official recommendation of the W3C. It
is a markup language used to describe content in a platform-independent manner by using tags for different
content items like the hypertext markup language (HTML). Unlike HTML, XML is not designed to present the
content items to a human user but to transport and store data in a machine-readable form without any layout
information since this kind of information is not necessary for data exchange between machines or software
systems.
One further major difference between HTML and XML is the fact that XML can be used to define a specific
dialect for different application areas, like BMEcat for the exchange of product catalogues between software
systems as will be explained later. In this sense, XML can be used as meta language for the definition of e.g.
application oriented exchange formats. Tags used in XML are not predefined but can be defined by the users
according to their specific needs. These tags describe the transported content in a self-descriptive manner.
Since XML essentially provides the means to define custom languages for the exchange and storage of
data in a platform- and system-independent way, it is widely used as a basis for the definition of languages
for the description of resources and ontologies. For example, it provides an exchange syntax (RDF/XML) for the
Resource Description Framework (RDF) and the Web Ontology Language (OWL). To describe resources and ontologies, new tags
are introduced for the semantic description of information items in RDF and OWL. XML defines basic
mechanisms such as namespaces and basic data types, which can be used within these specific languages.
Each XML document defines a tree structure starting with a root element. The tags
(elements) that can be used within an XML document can be defined using a document type
definition (DTD) or an XML Schema Definition (XSD). Each element within an XML document can have
attributes and text content.
<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>
Figure 13: Example of a simple XML document
All XML documents shall be well formed and valid. To be well formed, an XML document has to follow
correct XML syntax. To be valid, an XML document must also conform to a DTD or XSD.
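The difference between well-formedness and validity can be made concrete with Python's standard `xml.etree.ElementTree` module, which checks well-formedness while parsing but does not validate against a DTD or XSD; validation needs a validating parser (for example lxml). The sketch below parses the document of Figure 13; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# The document of Figure 13
BOOKSTORE = """<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""


def is_well_formed(text: str) -> bool:
    """True if the text parses as XML. This does NOT check validity against a DTD/XSD."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def titles_by_category(text: str) -> List[Tuple[str, str]]:
    """Walk the element tree and collect (category attribute, title text) pairs."""
    root = ET.fromstring(text)
    return [(book.get("category", ""), book.findtext("title", ""))
            for book in root.iter("book")]
```

A document with an unclosed `<book>` element is rejected as not well formed, whereas a well-formed document with an unexpected element would still parse here and would only be rejected by a validating parser.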
<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:cd="http://www.recshop.fake/cd#">
<rdf:Description
rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
<cd:artist>Bob Dylan</cd:artist>
<cd:country>USA</cd:country>
<cd:company>Columbia</cd:company>
<cd:price>10.90</cd:price>
<cd:year>1985</cd:year>
</rdf:Description>
<rdf:Description
rdf:about="http://www.recshop.fake/cd/Hide your heart">
<cd:artist>Bonnie Tyler</cd:artist>
</rdf:Description>
</rdf:RDF>
The xmlns:cd declaration specifies that elements with the cd prefix are from the namespace "http://www.recshop.fake/cd#". The
<rdf:Description> element contains the description of the resource identified by the rdf:about attribute. The
elements <cd:artist>, <cd:country>, <cd:company>, etc. are properties of the resource.
The main elements of RDF are the root element, <RDF>, and the <Description> element, which identifies a
resource.
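To show how the <Description> element and its properties become statements about a resource, the following sketch reads (subject, predicate, object) triples from a shortened version of the CD example above using Python's standard `xml.etree.ElementTree`. It is deliberately simplified and is not a complete RDF/XML parser: only literal-valued properties of elements with an rdf:about attribute are handled, and the function name `literal_triples` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Shortened version of the CD example (first resource only)
CD_DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cd="http://www.recshop.fake/cd#">
  <rdf:Description rdf:about="http://www.recshop.fake/cd/Empire Burlesque">
    <cd:artist>Bob Dylan</cd:artist>
    <cd:country>USA</cd:country>
    <cd:year>1985</cd:year>
  </rdf:Description>
</rdf:RDF>"""


def literal_triples(text: str) -> List[Tuple[str, str, str]]:
    """Read (subject, predicate, object) triples from rdf:Description elements.

    Simplification: only literal-valued properties and rdf:about subjects are
    handled; a full RDF/XML parser supports many more forms.
    """
    root = ET.fromstring(text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about", "")
        for prop in desc:
            # ElementTree names tags "{namespace}local"; the predicate IRI is namespace + local
            namespace, local = prop.tag[1:].split("}", 1)
            triples.append((subject, namespace + local, (prop.text or "").strip()))
    return triples
```

The first triple obtained is ("http://www.recshop.fake/cd/Empire Burlesque", "http://www.recshop.fake/cd#artist", "Bob Dylan"): the cd prefix is expanded to its namespace, which is why namespaces matter for identifying properties unambiguously.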
RDF describes resources with classes, properties, and values. In addition, RDF also needs a way to define
application-specific classes and properties. Application-specific classes and properties must be defined using
extensions to RDF. One such extension is RDF Schema (RDFS). RDF Schema does not provide actual
application-specific classes and properties. Instead RDF Schema provides the framework to describe
application-specific classes and properties. Classes in RDF Schema are much like classes in object oriented
programming languages. This allows resources to be defined as instances of classes, and subclasses of
classes.
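The effect of defining subclasses can be illustrated with two of the RDFS entailment rules from the W3C RDF Semantics specification: rdfs:subClassOf is transitive (rule rdfs11), and an instance of a subclass is also an instance of its superclass (rule rdfs9). The sketch below applies just these two rules to a small set of triples until nothing new is derived. The animal/horse classes follow the RDFS examples in this section; "#livingThing" and the instance "#blackBeauty" are hypothetical additions, and prefixed names are written as plain strings for brevity.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(graph: Set[Triple]) -> Set[Triple]:
    """Apply RDFS rules rdfs11 and rdfs9 until no new triple is derived."""
    g = set(graph)
    while True:
        new: Set[Triple] = set()
        for s, p, o in g:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in g:
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))   # rdfs11: subClassOf is transitive
                if p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))       # rdfs9: instances inherit superclasses
        if new <= g:
            return g
        g |= new
```

Given that "#horse" is a subclass of "#animal" and "#animal" a subclass of "#livingThing", a resource typed as "#horse" is inferred to be both an "#animal" and a "#livingThing"; this kind of inference is what distinguishes RDFS class definitions from plain XML tags.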
<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xml:base="http://www.animals.fake/animals#">
<rdf:Description rdf:ID="animal">
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
</rdf:Description>
<rdf:Description rdf:ID="horse">
<rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
<rdfs:subClassOf rdf:resource="#animal"/>
</rdf:Description>
</rdf:RDF>
Figure 15: RDFS example12
<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xml:base="http://www.animals.fake/animals#">
<rdfs:Class rdf:ID="animal" />
<rdfs:Class rdf:ID="horse">
<rdfs:subClassOf rdf:resource="#animal"/>
</rdfs:Class>
</rdf:RDF>
Figure 16: RDFS example abbreviated13
12 http://www.w3schools.com/rdf/rdf_example.asp
13 http://www.w3schools.com/rdf/rdf_schema.asp
RDF is metadata (data about data) and is used to describe information resources. The Dublin Core is a set
of predefined properties for describing documents.
<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://www.w3schools.com">
<dc:description>W3Schools - Free tutorials</dc:description>
<dc:publisher>Refsnes Data as</dc:publisher>
<dc:date>2008-09-01</dc:date>
<dc:type>Web Development</dc:type>
<dc:format>text/html</dc:format>
<dc:language>en</dc:language>
</rdf:Description>
</rdf:RDF>
Figure 17: RDF and Dublin Core example14
4.6.3 OWL
The Web Ontology Language (OWL) is a language for processing web information. Since an ontology is about
the exact description of things and their relationships, OWL is about the exact description of web
information and the relationships between pieces of web information.
OWL is designed on top of RDF to facilitate the processing of ontologies by software systems. In this sense it
is intended to be processed by machines rather than by human users. OWL documents are commonly written in an
XML syntax (RDF/XML), and the W3C has standardized OWL to semantically describe resources on the web. It is
widespread within the web community. The original OWL specification defines three sublanguages, which differ in their
expressiveness. These sublanguages are:
OWL Lite
OWL DL, which includes OWL Lite, and
OWL Full, which includes OWL DL.
OWL and RDF are very similar, but OWL is a stronger language with greater machine interpretability than
RDF. It comes with a larger vocabulary and stronger syntax than RDF.
A totally different approach for describing the semantics of product classification systems and the mapping
between these systems is driven by the ontology engineering application area. Many different
representation formalisms of differing expressiveness have been developed and investigated for the
description of ontologies.
Starting with XML as a general notation to describe resources in a technology-independent way, it has become
obvious that XML alone is not expressive enough to describe the semantics of ontologies, including
product classification systems as lightweight ontologies. To narrow this gap, RDF and RDFS were
introduced. RDF and RDFS offer considerably more capabilities to describe the semantic information
necessary for ontologies.
14 http://www.w3schools.com/rdf/rdf_dublin.asp
Since the semantic web is the major research field driving ontologies, the W3C has developed OWL as the standard language, or representation mechanism, for describing content items on the World Wide Web. In this sense, OWL can be seen as the next-generation technology for the semantic description of content.
OWL is a widely used, standardized format to describe semantics.
Ontologies and product classification systems can be expressed using OWL as formal representation
mechanism and description language.
4.6.4 SKOS
The Simple Knowledge Organization System (SKOS) is a data-sharing standard, bridging several different
fields of knowledge, technology and practice.
In the library and information sciences, a long and distinguished heritage is devoted to developing tools for organizing large collections of objects such as books or museum artefacts. These tools are known generally as "knowledge organization systems" (KOS) or sometimes as "controlled structured vocabularies"¹⁵. Different families of knowledge organization systems, including thesauri, classification schemes, subject heading systems and taxonomies, are widely recognized and applied in both modern and traditional information systems. In practice, it can be hard to draw an absolute distinction between thesauri and classification schemes or taxonomies, although some properties can be used to broadly characterize these different families¹⁶.
The Simple Knowledge Organization System is a common data model for knowledge organization systems such as thesauri, classification schemes, subject heading systems and taxonomies. Using SKOS, a knowledge organization system can be expressed as machine-readable data. It can then be exchanged between computer applications and published in a machine-readable format on the Web. The SKOS data model is formally defined as an OWL Full ontology, whereas SKOS data are expressed as RDF triples and may be encoded using any concrete RDF syntax (such as RDF/XML)¹⁷.
The SKOS data model views a knowledge organization system as a concept scheme comprising a set of
concepts. These SKOS concept schemes and SKOS concepts are identified by URIs, enabling anyone to
refer to them unambiguously from any context, and making them a part of the World Wide Web.
SKOS concepts can be labelled with any number of lexical (UNICODE) strings, such as "romantic love" or
"れんあい", in any given natural language, such as English or Japanese (written here in hiragana). One of
these labels in any given language can be indicated as the preferred label for that language, and the others
as alternative labels. Labels may also be "hidden", which is useful where a knowledge organization system is
being queried via a text index.
SKOS concepts can be assigned one or more notations, which are lexical codes used to uniquely identify the
concept within the scope of a given concept scheme. While URIs are the preferred means of identifying
SKOS concepts within computer systems, notations provide a bridge to other systems of identification
already in use such as classification codes used in library catalogues.
SKOS concepts can be documented with notes of various types. The SKOS data model provides a basic set
of documentation properties, supporting scope notes, definitions and editorial notes, among others. This set
is not meant to be exhaustive, but rather to provide a framework that can be extended by third parties to
provide support for more specific types of notes.
SKOS concepts can be linked to other SKOS concepts via semantic relation properties. The SKOS data model provides support for hierarchical and associative links between SKOS concepts. Again, as with any part of the SKOS data model, these can be extended by third parties to provide support for more specific needs.
15 http://www.w3.org/TR/2009/REC-skos-reference-20090818/
16 http://www.w3.org/TR/2009/REC-skos-reference-20090818/
17 http://www.w3.org/TR/2009/REC-skos-reference-20090818/
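To illustrate the SKOS features described so far, the sketch below serializes one product class as a skos:Concept in Turtle. It covers identification by URI, preferred, alternative and hidden labels, a notation, a definition note and a hierarchical link. The namespace http://example.org/pcs/, the class codes C100 and C10, and the labels are hypothetical. Only the skos: property names are taken from the SKOS Reference. The dataclass ProductClass and the functions literal and to_turtle are illustrative helpers, not part of any SKOS tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "http://example.org/pcs/"  # hypothetical namespace for one classification system


@dataclass
class ProductClass:
    code: str                          # class code, published as skos:notation
    pref_label: str                    # one preferred label per language
    lang: str = "en"
    alt_labels: List[str] = field(default_factory=list)
    hidden_labels: List[str] = field(default_factory=list)  # e.g. misspellings for text search
    definition: Optional[str] = None   # documentation note
    broader: Optional[str] = None      # code of the parent class, if any


def literal(text: str, lang: Optional[str] = None) -> str:
    """Turtle string literal, optionally language-tagged."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def to_turtle(c: ProductClass, scheme: str = "scheme") -> str:
    """Serialize one product class as a skos:Concept in Turtle."""
    props = [
        ("a", "skos:Concept"),
        ("skos:inScheme", f"<{BASE}{scheme}>"),
        ("skos:notation", literal(c.code)),
        ("skos:prefLabel", literal(c.pref_label, c.lang)),
    ]
    props += [("skos:altLabel", literal(x, c.lang)) for x in c.alt_labels]
    props += [("skos:hiddenLabel", literal(x, c.lang)) for x in c.hidden_labels]
    if c.definition:
        props.append(("skos:definition", literal(c.definition, c.lang)))
    if c.broader:
        # Hierarchical link; unlike rdfs:subClassOf it implies no class inheritance.
        props.append(("skos:broader", f"<{BASE}{c.broader}>"))
    body = " ;\n    ".join(f"{p} {o}" for p, o in props)
    return f"@prefix skos: <{SKOS}> .\n\n<{BASE}{c.code}>\n    {body} .\n"


print(to_turtle(ProductClass(code="C100", pref_label="Colour pencils",
                             alt_labels=["Color pencils"], broader="C10")))
```

Using skos:broader rather than rdfs:subClassOf keeps the hierarchy a navigational link. This fits classification schemes whose parent-child relations are not always strict inheritance.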
SKOS concepts can be grouped into collections, which can be labelled and/or ordered. This feature of the
SKOS data model is intended to provide support for node labels within thesauri, and for situations where the
ordering of a set of concepts is meaningful or provides some useful information.
SKOS concepts can be mapped to other SKOS concepts in different concept schemes. The SKOS data
model provides support for four basic types of mapping link: hierarchical, associative, close equivalent and
exact equivalent.
The SKOS Mapping Vocabulary contains a set of properties for specifying mapping relations between concepts from different domain ontologies (broadMatch, narrowMatch, exactMatch, majorMatch, minorMatch). Such a rich set of semantic relations for expressing mappings is useful for ranking search results according to the weight of the mapping. Apart from these properties, the SKOS Mapping Vocabulary has three classes for defining the intersection of concepts (the AND class), the union of concepts (OR) and negation (NOT).
The search system takes two ontologies (a source and a target) and a concept from the source ontology as its initial input. The result of the search algorithm is a set of concepts in the target ontology. For each input concept, the system searches for a mapping to the target ontology. When a matching target concept is found, it is added to one of five search result lists (clusters), depending on which SKOS Mapping relation applies. Next, the children of each matched concept are searched for further matches in the same way. The algorithm limits this search to a predefined depth. Again, matching children are added to one of the five search result clusters based on their SKOS Mapping relation. Finally, the search results are ranked using the SKOS Mapping relation properties (broadMatch, narrowMatch, exactMatch, majorMatch, minorMatch) and their weights.
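The search procedure just described can be sketched as a depth-limited breadth-first traversal. In this sketch, several elements are assumptions and not part of the SKOS Mapping Vocabulary:

- the target ontology is a plain parent-to-children dictionary;
- the initial candidates would come from, for example, a label lookup;
- `relation_of` is a hypothetical stand-in for whatever matcher decides which SKOS Mapping relation, if any, holds between two concepts;
- the numeric weights are illustrative.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

RELATIONS = ("exactMatch", "majorMatch", "minorMatch", "broadMatch", "narrowMatch")
# Illustrative weights only (assumption): stronger relations rank higher.
WEIGHTS = {"exactMatch": 1.0, "majorMatch": 0.8, "minorMatch": 0.5,
           "broadMatch": 0.4, "narrowMatch": 0.4}


def search(source: str,
           candidates: List[str],
           children: Dict[str, List[str]],
           relation_of: Callable[[str, str], Optional[str]],
           max_depth: int = 2) -> List[Tuple[str, str, int]]:
    """Cluster target concepts by SKOS Mapping relation, then rank them.

    candidates:  target concepts compared first
    children:    target hierarchy as parent -> list of children
    relation_of: matcher returning one of RELATIONS, or None for no match
    """
    clusters: Dict[str, List[Tuple[str, int]]] = {r: [] for r in RELATIONS}
    queue = deque((c, 0) for c in candidates)
    seen = set()
    while queue:
        concept, depth = queue.popleft()
        if concept in seen:
            continue
        seen.add(concept)
        relation = relation_of(source, concept)
        if relation is None:
            continue                      # only children of matches are explored
        clusters[relation].append((concept, depth))
        if depth < max_depth:             # depth limit for the child search
            queue.extend((child, depth + 1) for child in children.get(concept, []))
    ranked = [(c, r, d) for r, hits in clusters.items() for c, d in hits]
    # Rank by relation weight, then prefer matches found closer to the start.
    ranked.sort(key=lambda hit: (-WEIGHTS[hit[1]], hit[2]))
    return ranked


children = {"T1": ["T1.1", "T1.2"], "T1.1": ["T1.1.1"]}
rels = {("S", "T1"): "exactMatch", ("S", "T2"): "minorMatch",
        ("S", "T1.1"): "narrowMatch", ("S", "T1.1.1"): "narrowMatch"}
print(search("S", ["T1", "T2"], children, lambda s, t: rels.get((s, t)), max_depth=1))
```

With max_depth=1, the grandchild T1.1.1 is never compared, even though the toy matcher would relate it to S.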
4.6.5 BMEcat
BMEcat is an XML-based standard for the transfer of electronic product catalogues and is available as an XML Schema (XSD). The German Association of Materials Management, Purchasing and Logistics (BME), the umbrella organization of German purchasing and logistics agents, developed this standard in close cooperation with industry and academia. The standard is based on the experience gathered from co-operation activities at global level¹⁸.
In BMEcat, data are categorized into data areas, which differ according to data content, entry, management and complexity:
Identification (article number, GTIN (formerly EAN number), …)
Description (short and long description, …)
Grouping (ERP product category number, …)
Classification data
Properties (weight, colour, …)
Order information (order unit, minimum order quantity, …)
Prices (customer price, list price, …)
Logistics information (delivery times, packaging information, …)
Additional multimedia data (images, PDF files, …)
References to other products
Qualifiers (special offer, discontinued model, …)
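Of these data areas, the classification data are the part that a classification mapping works with. The sketch below extracts the classification system name and class ID for each article from a BMEcat-style fragment. It then shows how one article classified in two systems yields a candidate correspondence between them.

The element names (ARTICLE, SUPPLIER_AID, ARTICLE_DETAILS, DESCRIPTION_SHORT, ARTICLE_FEATURES, REFERENCE_FEATURE_SYSTEM_NAME, REFERENCE_FEATURE_GROUP_ID, FEATURE, FNAME, FVALUE) are modelled on BMEcat 1.2. The fragment is simplified, with no header and no namespace. The article number, system names and class IDs are hypothetical placeholders. The element structure should be checked against the BMEcat specification before any reuse.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment modelled on BMEcat 1.2 element names
# (no header, no namespace; codes and system names are placeholders).
FRAGMENT = """<T_NEW_CATALOG>
  <ARTICLE>
    <SUPPLIER_AID>4711</SUPPLIER_AID>
    <ARTICLE_DETAILS>
      <DESCRIPTION_SHORT>Ballpoint pen, blue</DESCRIPTION_SHORT>
    </ARTICLE_DETAILS>
    <ARTICLE_FEATURES>
      <REFERENCE_FEATURE_SYSTEM_NAME>ECLASS-x.y</REFERENCE_FEATURE_SYSTEM_NAME>
      <REFERENCE_FEATURE_GROUP_ID>AA-BB-CC-DD</REFERENCE_FEATURE_GROUP_ID>
      <FEATURE><FNAME>Colour</FNAME><FVALUE>blue</FVALUE></FEATURE>
    </ARTICLE_FEATURES>
    <ARTICLE_FEATURES>
      <REFERENCE_FEATURE_SYSTEM_NAME>UNSPSC-x.y</REFERENCE_FEATURE_SYSTEM_NAME>
      <REFERENCE_FEATURE_GROUP_ID>00000000</REFERENCE_FEATURE_GROUP_ID>
    </ARTICLE_FEATURES>
  </ARTICLE>
</T_NEW_CATALOG>"""


def classification_data(xml_text):
    """Return (article id, classification system, class id, features) per ARTICLE_FEATURES block."""
    root = ET.fromstring(xml_text)
    rows = []
    for article in root.iter("ARTICLE"):
        article_id = article.findtext("SUPPLIER_AID")
        # In this sketch, each ARTICLE_FEATURES block refers to one classification system.
        for block in article.findall("ARTICLE_FEATURES"):
            features = {f.findtext("FNAME"): f.findtext("FVALUE") for f in block.findall("FEATURE")}
            rows.append((article_id,
                         block.findtext("REFERENCE_FEATURE_SYSTEM_NAME"),
                         block.findtext("REFERENCE_FEATURE_GROUP_ID"),
                         features))
    return rows


for row in classification_data(FRAGMENT):
    print(row)
```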
4.7
Since the interoperability and integration of data are becoming increasingly important, the problem of semantic heterogeneity has to be solved in the area of product classification systems. One way to address this problem is to represent product classification systems and their taxonomies as lightweight ontologies, that is, ontologies without reasoning capabilities.
18 http://www.bme.de/fileadmin/bilder/PDF/BMEcat_en.pdf
Different types of heterogeneity must be addressed to solve the problem of ontology mismatch. One general point is that product classification systems, as lightweight ontologies, must be represented in a formal language like RDF(S) or OWL. This allows tools available on the market to be used to align or match different product classification systems. If formal representations of product classification systems are available, the ontology matching operation can also be formalized and reused for different versions of the product classification systems. Once an alignment is available, it can be used to repeat the mapping operation. It can also serve as the basis for an alignment adapted to new product classification system versions. Without this formalization, an ongoing mapping between different versions of product classification systems can only be achieved with huge manual effort.
In any case, the product classification mapping process shall follow a methodology that facilitates the process and makes it transparent to the user. Different approaches and models have been developed in the area of ontology development and matching.
The most promising model is the canonical process model for ontology mapping described in subsection
4.2.1. This canonical model should be included within the overall methodology for product classification
system engineering to reach a complete methodology for the ontology and consequently the product
classification system mapping process.
To perform the ontology matching or mapping process, two types of matchers, namely element-level and structural matchers, shall be used within the methodology. The combination of both types promises the best results for the mapping process.
To facilitate the use of this methodology, a platform supporting it shall be provided. There are some tools on the market, commercial as well as open-source, dealing with different aspects of ontology engineering. As a general architecture, both suggested architectures are suitable for the product classification system mapping platform. The centralized platform architecture has some advantages over the distributed architecture, since exchange formats (for both import and export) are not needed. This limits the impact of the lack of an industry standard format.
Additionally, data files for product classification systems might be very large. Limiting the traffic to the import of product classification systems into the mapping platform therefore keeps network traffic lower than the online access of a distributed architecture would. The processing capabilities of the centralized platform are independent of the processing capabilities of the different development platforms hosted by the different product classification system authorities.
Because of these criteria, it is recommended to use a centralized architecture for the mapping platform. In addition, a meaningful import and export format for the exchange of the product classification systems has to be selected or developed. A representation also has to be selected or developed for the mapping rules and for the mapping file containing the mapping between concepts of the different product classification systems.
The overview of the exchange formats for ontologies, and for product classification systems as lightweight ontologies, has shown that OWL is the most promising candidate for the exchange of ontologies. Therefore, product classification systems as lightweight ontologies shall be represented using OWL. Some work has been done, and is still ongoing, to represent product classification systems such as eCl@ss and UNSPSC in OWL notation. SKOS uses OWL Full to describe its data model for ontologies and product classification systems. Because SKOS introduces specific mapping elements, it shall be used as the basic system for mapping product classification systems on a semantic level. To facilitate the use of product classification systems when exchanging and mapping them, SKOS offers the possibility to convert SKOS-based ontology representations into other formats, such as RDF and BMEcat.
5.
5.1 Requirements
For the design of the overall cMap mapping methodology, some general requirements have been taken into account. These requirements include:

- the information given by the investigated product classification systems;
- the use or computation of these product classification systems;
- some restrictions set by the project for performing the mapping on a useful platform architecture.
Tobacco domain there is a huge proliferation of the commodity codes, unlike in other domains.
The extra semantic richness that helps the mapping is not available for all classes in all classification systems. Definitions are sometimes missing, and properties that help to understand the class concept are only available in eCl@ss and GPC.
Different terminologies arising from the different varieties of English are in use, and the language-based search has to cope with the differences: CPV and GPC use UK English, UNSPSC uses American English, and eCl@ss is translated from German into American English.
Property vocabularies: in product classification systems that use class properties, these properties have to be analysed when searching for the correct equivalent class. The amount of information to be investigated is therefore much larger and more resource-consuming.
In the same way, in product classification systems that use keywords (eCl@ss), these keywords have to be investigated, which requires extra resources.
Terms with identical names are often semantically different.
The differences between classification systems are not as simple as the equivalence or subclass
relations between named classes typically found by such systems.
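One way a language-based matcher can cope with the UK/US English differences listed above is to normalize labels before comparing them. The sketch below makes two choices that are not prescribed by the cMap methodology. First, it uses a small, hypothetical variant table; a production table would be much larger. Second, it compares the normalized word sets with a Jaccard coefficient.

```python
import re
from typing import Set

# Hypothetical excerpt of a UK -> US spelling table; a production table would be much larger.
UK_TO_US = {"colour": "color", "aluminium": "aluminum", "tyre": "tire", "tyres": "tires",
            "fibre": "fiber", "jewellery": "jewelry"}


def normalize(label: str) -> Set[str]:
    """Lower-case a label, split it into words and map UK spellings to US spellings."""
    return {UK_TO_US.get(word, word) for word in re.findall(r"[a-z]+", label.lower())}


def label_similarity(a: str, b: str) -> float:
    """Jaccard coefficient of the normalized word sets (1.0 = same words)."""
    wa, wb = normalize(a), normalize(b)
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


print(label_similarity("Aluminium tyres", "Tires, aluminum"))  # 1.0
print(label_similarity("Colour printers", "Color printer"))    # plural/singular not handled
```

The second call scores only 1/3 because "printers" and "printer" are different words here. Stemming or a synonym resource would be needed on top of spelling normalization.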
Due to the complexity of these mapping challenges, fully automated alignment approaches seem unrealistic and are not available on the market.
For these reasons, the mapping mainly followed a semi-automatic semantic mapping process. This process examined names and structures and, where possible, definitions, synonyms and actual product properties.
The entire classification systems were used to find the equivalent codes for each mapping table since the
domain scopes (see clothing example above) were very different among those four classification systems.
The mapping exercise was conducted largely manually, with the help of some semi-automatic search and browse features and automated data extraction from the MS Excel tables and associated documents.
Plausible relationships spanning multiple edges of the hierarchy have been analysed to find the best available match. The original CPV hierarchical structure has been kept as far as possible.
5.2
[Figure: mapping process steps: feature engineering; syntactical, semantic and numeric analysis; selection of next steps]
With respect to the ontology matching operation described in section 4, no matching parameters and no additional resources have been used within the cMap mapping methodology to perform the mapping. Additional resources are only given implicitly, through the domain knowledge of the mapping experts. They are not given in a formal way or through external thesauri. The input alignment A used for the mapping is given by the context information derived from the hierarchy and the application domain of the specific concepts (e.g. product classes) to be mapped, but again not in a formal way.
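This sketch assumes that section 4 uses the common formulation of the matching operation. In that formulation, an output alignment A′ is computed by a function f from two ontologies o and o′, an input alignment A, matching parameters p and additional resources r. Under that assumption, the cMap setting described above can be summarized as follows. The symbols f, o, o′, p and r are introduced here for illustration and are not taken from the text above.

```latex
A' = f(o,\, o',\, A,\, p,\, r), \qquad p = \emptyset, \qquad r = \emptyset
```

Here, o and o′ are the two product classification systems being mapped. A is not a formal input; it is supplied only implicitly, through the hierarchy context and the domain knowledge of the mapping experts.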
The general mapping approach taken by the cMap project is the combined matching approach described in
subsection 4.3. This combined matching contains the following matchers:
Element level matchers
String based matching
Language based matching
Constraint based matching
Alignment reuse (implicitly given by experts)
Structural level matchers
Taxonomy based techniques
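As an illustration of how the element-level and structural matchers listed above can be combined, the sketch below scores a candidate class pair with a weighted sum of two similarities:

- a string-based similarity of the class labels;
- a taxonomy-based similarity of their parent class labels.

difflib's SequenceMatcher ratio serves as the string matcher. The weights (0.7 and 0.3) and the acceptance threshold (0.8) are assumptions. Classes are keyed by label rather than by code, purely for brevity. As in cMap, the proposals are meant for expert confirmation, not as final decisions.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


def string_sim(a: str, b: str) -> float:
    """Element-level, string-based matcher: similarity ratio of two labels in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def taxonomy_sim(parent_a: Optional[str], parent_b: Optional[str]) -> float:
    """Structural, taxonomy-based matcher: compare the labels of the parent classes."""
    if parent_a is None or parent_b is None:
        return 0.0
    return string_sim(parent_a, parent_b)


def combined_score(name_a: str, parent_a: Optional[str],
                   name_b: str, parent_b: Optional[str],
                   w_element: float = 0.7, w_structure: float = 0.3) -> float:
    """Weighted combination of the element-level and the structural score."""
    return w_element * string_sim(name_a, name_b) + w_structure * taxonomy_sim(parent_a, parent_b)


def propose(source: Dict[str, Optional[str]], target: Dict[str, Optional[str]],
            threshold: float = 0.8) -> Dict[str, Tuple[str, float]]:
    """For each source class (label -> parent label), propose the best-scoring target class.

    Proposals below the threshold are dropped; all proposals still need expert confirmation.
    """
    proposals = {}
    for name_s, parent_s in source.items():
        best_name, best_parent = max(
            target.items(), key=lambda kv: combined_score(name_s, parent_s, kv[0], kv[1]))
        score = combined_score(name_s, parent_s, best_name, best_parent)
        if score >= threshold:
            proposals[name_s] = (best_name, round(score, 2))
    return proposals


target = {"Locksmith services": "Other services", "Tailoring services": "Other services"}
print(propose({"Locksmith services": "Other services", "Biodiesel": "Fuels"}, target))
```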
on OWL but better suited to the representation of product classification systems as lightweight ontologies. This is because hierarchy relations can be expressed in different ways, not only as inheritance. It is also because restricted reasoning capabilities are not a drawback in the application area of taxonomies, where reasoning is not relevant.
5.3
There are no cases (0 %) in which more than one class is available in the CPV system and there is no equivalent class in any of the three target systems.
N21 + N2M cardinality: no class is available in CPV and there is one or more than one equivalent class(es) in the target system:
eCl@ss: 23 %
GPC: 13 %
UNSPSC: 33 %
In only 13 % of the cases is no class available in the CPV system while one or more equivalent classes exist in GPC.
121 + 12M cardinality: one class is available in GPC and there is one or more than one equivalent class(es) in the target system:
CPV: 14 %
eCl@ss: 11 %
UNSPSC: 18 %
The highest 121 + 12M share is between GPC and UNSPSC, at 18 %. In other words, in 18 % of the GPC cases there are one or more equivalent classes in UNSPSC.
In 36 % of the cases, no classes are available in either the eCl@ss or the GPC system, but there are classes in CPV and/or UNSPSC.
N21 + N2M cardinality: no class is available in GPC and there is one or more than one equivalent class(es) in the target system:
CPV: 35 %
eCl@ss: 33 %
UNSPSC: 37 %
In only 33 % of the cases is no class available in the GPC system while one or more equivalent classes exist in eCl@ss.
M21 cardinality: more than one class in UNSPSC matches one class in the target system:
In CPV: 7 %
In eCl@ss: 3 %
In GPC: 5 %
M2M cardinality: more than one class in UNSPSC matches more than one class in the target system:
In CPV: 0 %
In eCl@ss: 5 %
In GPC: 4 %
M21 + M2M cardinality: more than one class is available in UNSPSC and there is one or more than one equivalent class(es) in the target system:
In CPV: 7 %
In eCl@ss: 8 %
In GPC: 9 %
In 9 % of the cases, more than one class is available in the UNSPSC system and there are one or more equivalent classes in GPC.
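For each target system, the M21 + M2M shares above are the sums of the separate M21 and M2M shares. The short check below recomputes them from the reported percentages; the numbers are copied from the lists above.

```python
# Reported shares (%) with UNSPSC as source system, copied from the lists above.
m21 = {"CPV": 7, "eCl@ss": 3, "GPC": 5}
m2m = {"CPV": 0, "eCl@ss": 5, "GPC": 4}

# M21 + M2M share per target system.
combined = {system: m21[system] + m2m[system] for system in m21}
print(combined)
```

The result matches the reported M21 + M2M values of 7 %, 8 % and 9 %.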
In eCl@ss: 20 %
In GPC: 9 %
In only 9 % of the cases is no class available in the UNSPSC system while one or more equivalent classes exist in GPC.
5.4
The starting point for the mapping of the four product classification systems within this document is the set of versions given in subsection 5.1.1. CPV is the basic product classification system, as stated in Figure 18.
For the development of the cMap mapping methodology, the investigations of section 4 have been taken into account. The recommendations drawn from section 4 have been adapted to the general requirements given by the project context for product classification mapping. They have also been adapted to the characteristics of the product classification systems themselves, as stated in the mapping challenges in subsection 5.1.3.
It was shown that the different product classification systems require several types of mapping relations to be taken into account during the mapping process.
Taking the results from section 4, a mapping methodology has been derived in subsection 5.2 as a combination of the canonical ontology mapping methodology and the steps taken from the CC3P mapping methodology. This methodology should be integrated into the overall methodology for product classification system development. That overall methodology is not described again here, in order to focus on the mapping process itself.
The mapping of the product classification systems has shown that a combined matcher best fits the needs of product classification mapping. This is a combination of element-level and structural matchers, as stated in subsection 5.2.1.
To fulfil the mapping between the product classification systems, the general requirements have shown that
a centralized architecture for the classification platform is the most appropriate architecture.
To facilitate use by both machines and human users, Microsoft Excel is used as the import and export format. This format best suits the needs of the project, since none of the product classification systems is given in a formal representation suitable for deriving similarities between them. Because of the different scopes of the given product classification systems, hierarchical information must be taken into account during the mapping process.
To facilitate this mapping process, formal representations of the product classification systems, enhanced by context information, should be available so that tools can be used to support the mapping process. In any case, a semi-automated mapping process has to be followed. Currently, no tool is able to make the final decision about the mapping between two concepts or classes of the given product classification systems in every case.
The user has to interact with the system to make the final decision. Once the user has made this decision, formal representations of the product classification systems support its reuse for new versions of the product classification systems. In addition, these decisions should be represented in a formal language, so that they can be adapted as the product classification systems change.
Table 22 shows and describes the overall results of the mapping process.
The results of the mapping process have shown that the different product classification systems should be represented in a formal way, so that tool-based mapping can be supported; this is strongly recommended. Promising tools available on the market are Protégé with its SKOS and Prompt extensions. A short overview of these tools is given in Annexes A to C.
Some work is ongoing to represent eCl@ss and UNSPSC as OWL-based ontologies, but the other product classification systems must also be supported formally. A formalization of mapping rules is only possible, and useful for reuse during product classification system development, when it is based on a formal description of the product classification systems.
As long as there is no commitment between the classification authorities to this formal representation, or at least to representations that can be transformed into each other, the mapping between the product classification systems will remain a time-consuming and mainly manual process. This applies even when the cMap mapping methodology is used.
6.
6.1 Introduction
This chapter extends what was already described in CC3P CWA 16138 Part 4. It starts with some comments on the different release policies (section 6.2) and maintenance processes (section 6.3). These are followed by an analysis of the version compatibility of the four classification systems (section 6.4). These issues are crucial not only for upgrading to later versions of a classification system but also for maintaining a mapping that is based on recent versions. This chapter therefore goes into detail and describes all available upgrade information delivered by the classification authorities, which is a main building block for maintaining the mapping.
6.2
As described in the CC3P CWA 16138 Part 4, the four classification authorities have established different
release policies according to the specific needs of their users and customers, i.e. they have developed
certain rules and principles that define the criteria for releasing new versions, e.g. the frequency, the content
scope, the validity etc. These policies have grown over the years and were established by each one of the
different initiatives for different purposes.
Whereas the CPV is not published on a regular basis, eCl@ss, the UNSPSC and the GPC have defined roadmaps that include at least one new release per year. Furthermore, eCl@ss distinguishes between ServicePacks²³, MinorReleases²⁴ and MajorReleases²⁵.
A third issue is the different validity policies of the classification authorities:

- The CPV publishes only its current release, which is then obligatory for all users and makes old releases invalid.
- The UNSPSC and eCl@ss publish all their versions without limitation, so that users can decide which release best suits their needs.
- When GPC publishes a new release, it takes a couple of months to put it into GDSN²⁶ production. Normally, therefore, two GPC releases are available: the latest GPC publication (currently 01 June 2011) and the GPC in production in GDSN (currently 01 December 2010). The GDSN production version of GPC is mandatory according to the GDSN rules.
Therefore, user requirements for a mapping could range from mapping just the latest release of each standard to the complex task of mapping all versions of these standards with each other. Aligning these different schedules would be challenging, since the needs of the users of the classification standards are diverse.
The following table compares the policies of the four standards:
Table 23: Comparison of release policies
Criterion | CPV | UNSPSC | GPC | eCl@ss
Release frequency | Not defined | At least one release per year | At least one release per year | At least one release per year
Release types | No distinction | No distinction | No distinction | Three different types of releases defined, depending on the change types included
Release validity | Only current release is valid, i.e. only current version is supported and has to be maintained | Any release is supported, published and in use, i.e. the mapping from/to any UNSPSC release will have interested parties and could be supported / maintained | After 6 months old version is invalid | Any release is supported, published and in use, i.e. the mapping from/to any eCl@ss release will have interested parties and could be supported / maintained
23 eCl@ss ServicePack: contains only textual corrections and translations, therefore being downwards and upwards compatible
24 eCl@ss MinorRelease: contains content of a ServicePack, add-ons and slight changes that only change an element's version, but not the concept, therefore being downwards compatible
25 eCl@ss MajorRelease: contains any sort of change including deletions, re-structuring and generally changes, therefore being incompatible, but including detailed release update information
26 GDSN = Global Data Synchronisation Network by GS1 (see www.gs1.org/gdsn)
As shown, the product classification authorities have different ideas about three points: the frequency of new versions, the distinction between different types of new versions, and whether a version is still published and supported after a later version has been released.
For the scope of the cMap project, there is no need to go into further details.
For the development of the cMap platform architecture the following decisions have to be taken into account:
-
6.3 Maintenance process
As described in CC3P CWA 16138 Part 4, the four classification authorities have established different maintenance processes according to their specific requirements. For example, change requests for the UNSPSC can only be created by members of the UNSPSC organization, whereas change requests for eCl@ss can be submitted by anyone via a cost-free online portal. For the GPC, a GS1-wide process is established that is used for the maintenance of all GS1 standards. Also, different bodies are involved in each case.
A short description of the maintenance processes can be found in CWA 16138. Further details are not given in the current CWA, because this project focuses on the maintenance of the mapping between the classification systems, not on the maintenance processes of the systems themselves. A deeper analysis of the recommendations of CWA 16138 is conducted and leads to the definition of what can still be used, which is further elaborated in section 6.5. In addition, two aspects are more in focus for the maintenance of the mapping, because they directly influence the maintenance process: the version compatibility of the releases, and the release update information delivered by the different classification systems.
6.4 Version compatibility
In order to find a suitable way to maintain the mapping of the standards, the version compatibility of the different standards plays a major role. If, for example, a full mapping were established today, it would become invalid as soon as a new UNSPSC, eCl@ss, GPC or CPV version is published.
To give a clearer view of the possibilities, a short summary of the main version compatibility issues of the four classification standards is provided in the following sections.
6.4.1 CPV
6.4.1.1 General structure
There is no version compatibility policy applied today. From one version to the next version, codes can be
added, transferred, removed and even reused after being deleted. The structure can also be changed.
27 one of them is used by the GDSN user community, however outside the GDSN user communities several versions could be in use
As there is only one version of CPV valid at a time, any reused code should be placed in context to
understand its meaning. As CPV is meant to be used for public procurement notices and procedures, each
use of a code is by default linked to a specific date and time, and thus to a specific version. When moving
from one version to the following one, a transition procedure is set in place to avoid conflicting use of codes.
The CPV does not use any other kind of code or identifier for their classification classes apart from the
classification code. Therefore, there is no versioning and no change management. Any kind of change can
take place in any release.
The Commission provides release update files called correspondence tables. The structure of these files is rather simple. They include all codes of the previous release (here: CPV Code 2003) and all codes of the following, new release (here: CPV Code 2007), including the class information (class code and class name). There is no code that describes the type of change (e.g. NEW, EDIT, SPLIT, MOVE), although several types of change are described (see below). Unfortunately, only a few changes are machine-readable. This is a real disadvantage with respect to the cMap goal of maintaining a mapping with other standards, and it is described in detail below. The correspondence tables (available at http://ec.europa.eu/internal_market/publicprocurement/rules/current/index_en.htm) document the changes described in the following sections.
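Because a correspondence table pairs CPV 2003 codes with CPV 2007 codes, an existing mapping keyed by CPV 2003 codes can in principle be carried over semi-automatically. The sketch below re-keys the mapping entries through the correspondence table. It reports two groups for expert review: old codes without a successor, and rows with an empty 2003 code that remain unmapped. The function name `migrate`, the mapping targets and the code 99999999-9 are hypothetical.

Not every row with an empty 2003 code is a new class. As the following paragraphs show, such rows can also carry secondary-vocabulary additions, so the unmapped list needs review as well.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def migrate(mapping: Dict[str, str], correspondence: List[Tuple[str, str]]):
    """Carry a mapping keyed by CPV 2003 codes over to CPV 2007 codes.

    correspondence: (CPV 2003 code, CPV 2007 code) pairs; an empty 2003 code marks a new entry.
    Returns (migrated mapping, 2003 codes without successor, new 2007 codes still unmapped).
    """
    successors: Dict[str, List[str]] = defaultdict(list)
    new_entries: List[str] = []
    for old, new in correspondence:
        if old:
            successors[old].append(new)
        else:
            new_entries.append(new)

    migrated: Dict[str, str] = {}
    orphaned: List[str] = []
    for old, target in mapping.items():
        if old not in successors:
            orphaned.append(old)        # deleted or undocumented: needs expert review
            continue
        for new in successors[old]:
            # A split copies the decision to every successor; such entries should be re-checked.
            migrated[new] = target
    unmapped = [code for code in new_entries if code not in migrated]
    return migrated, orphaned, unmapped


corr = [("93940000-9", "98394000-1"), ("93950000-2", "98395000-8"), ("", "09134230-8")]
mapping = {"93950000-2": "TARGET-A", "99999999-9": "TARGET-B"}  # hypothetical entries
print(migrate(mapping, corr))
```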
CPV code 2003 | Description 2003 | CPV code 2007 | Description 2007
(empty) | (empty) | 09134230-8 | Biodiesel.
(empty) | (empty) | 09134231-5 | Biodiesel (B20).
(empty) | (empty) | 09134232-2 | Biodiesel (B100).
The new classes Biodiesel, Biodiesel (B20) and Biodiesel (B100) can be found in the correspondence table. They are added at the end of the table, i.e. the last lines of the CPV's correspondence table are filled with new classes.
Table 25: Other CPV 2003 empty lines in correspondence table
CPV code 2003 | Description 2003 | CPV code 2007 | Description 2007
(empty) | (empty) | 44163000-0 | + AA05-3 Iron
(empty) | (empty) | 44163000-0 | + AA02-4 Aluminium
(empty) | (empty) | 44163000-0 | + AA06-6 Lead
The user has to distinguish these lines from the lines containing new items of the secondary vocabulary manually. This requires filtering not only the empty CPV code 2003 lines but also the other filled columns, which leads to some filtering problems. This problem could simply be solved by adding a description of the type of change, as in the following example.
28 The CPVs secondary vocabulary contains additional information to describe classes. It is a mixture of properties and values.
CPV code 2003 | CPV code 2007 | Description 2007
NEW | 09134230-8 | Biodiesel.
NEW | 09134231-5 | Biodiesel (B20).
NEW | 09134232-2 | Biodiesel (B100).
ADD VALUE | 44163000-0 | Pipes and fittings. + AA05-3 Iron
ADD VALUE | 44163000-0 | Pipes and fittings. + AA02-4 Aluminium
ADD VALUE | 44163000-0 | Pipes and fittings. + AA06-6 Lead
6.4.1.3 Re-coding
Some codes are changed in a one-to-one (121) relation between predecessor and successor. Table 27 below shows two examples of the re-coding of a class within the hierarchy. In the first line, a textual change is included as well but is not documented. In theory, this change is machine-readable by comparing the two names. However, due to the problems described for the split of classes (see below), the information is not machine-readable at all.
Table 27: Modified CPV codes in correspondence table
CPV code 2003 | Description 2003 | CPV code 2007 | Description 2007
93900000-7 | … | 98390000-3 | Other services.
93910000-0 | Decommissioning services. | 98391000-0 | Decommissioning services.
93920000-3 | Relocation services. | 98392000-7 | Relocation services.
93930000-6 | Tailoring services. | 98393000-4 | Tailoring services.
93940000-9 | Upholstering services. | 98394000-1 | Upholstering services.
93950000-2 | Locksmith services. | 98395000-8 | Locksmith services.
The first line of Table 27 includes two changes: first, the class name is changed; second, the class is
moved within the hierarchy and a new class code is assigned. The two changes are not easy to identify. This
problem could be solved simply by adding a column describing the type of change and including a line for
each single change, as shown in the following table.
Table 28: Machine-readable documentation of modified CPV codes in correspondence table
Type of change | CPV code 2003 | Description 2003 | CPV code 2007 | Description 2007
MOVE | 93900000-7 | | 98390000-3 | Other services.
MOVE | 93910000-0 | Decommissioning services. | 98391000-0 | Decommissioning services.
MOVE | 93920000-3 | Relocation services. | 98392000-7 | Relocation services.
MOVE | 93930000-6 | Tailoring services. | 98393000-4 | Tailoring services.
MOVE | 93940000-9 | Upholstering services. | 98394000-1 | Upholstering services.
MOVE | 93950000-2 | Locksmith services. | 98395000-8 | Locksmith services.
EDIT | 93900000-7 | | 98390000-3 | Other services.
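The proposal to record one line per single change can be sketched in Python. The function `split_into_single_changes` and the `Change` record are hypothetical; the sketch compares code and name of a predecessor-successor pair and emits a MOVE line when the code differs and an EDIT line when the name differs. The old name used in the second call is a placeholder, as the CPV 2003 name of 93900000-7 is not shown above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Change:
    change_type: str  # "MOVE" (new code) or "EDIT" (new name)
    code_2003: str
    name_2003: str
    code_2007: str
    name_2007: str


def split_into_single_changes(code_2003: str, name_2003: str,
                              code_2007: str, name_2007: str) -> List[Change]:
    """Decompose one predecessor-successor line into one line per single change."""
    changes = []
    if code_2003 != code_2007:
        changes.append(Change("MOVE", code_2003, name_2003, code_2007, name_2007))
    if name_2003 != name_2007:
        changes.append(Change("EDIT", code_2003, name_2003, code_2007, name_2007))
    return changes


# Re-coded only: the name is unchanged.
tailoring = split_into_single_changes("93930000-6", "Tailoring services.",
                                      "98393000-4", "Tailoring services.")
# Re-coded and renamed ("Old name (placeholder)." is hypothetical).
other = split_into_single_changes("93900000-7", "Old name (placeholder).",
                                  "98390000-3", "Other services.")
print([c.change_type for c in tailoring], [c.change_type for c in other])
```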
In some cases a class was re-coded and renamed, and product properties were added to carry the information
that had been removed from the class name. Table 29 shows the addition of the properties Fresh and
Chilled, which preserve the distinction formerly expressed in the name of the class Fish, fresh or chilled, etc.
Unfortunately, this is not machine-readable because the CPV does not create relationships between the CPV
classes and the secondary vocabulary that includes properties and values.
05121100-0
Description 2003
Fish, fresh or chilled.
Flat fish, fresh or
chilled.
CPV code
2007
03311000-2
Description 2007
Fish.
03311100-3
+ BA04-1
Fresh
+ BA33-8
Chilled
+ BA04-1
Fresh
+ BA33-8
Chilled
+ BA04-1
Fresh
+ BA33-8
Chilled
Flat fish.
03311110-6
Sole.
In some cases this is done in combination with more than one product property as shown in Table 30.
Table 30: Addition of product properties to CPV codes in correspondence table
CPV codes 2003: 25122400-6, 25122410-9, 25122420-2
CPV code 2007 | Description 2007
44112200-0 | Floor coverings. + AB12-5 Rubber + BA41-2 Vulcanised
39532000-0 | Mats. + AB12-5 Rubber + BA41-2 Vulcanised
44112200-0 | Floor coverings. + AB12-5 Rubber + BA41-2 Vulcanised
39532000-0 | Mats. + AB12-5 Rubber + BA41-2 Vulcanised
Description 2003 | Description 2007
Vulcanised rubber conveyor or transmission belts or belting. | Gaskets.
 | Rubber conveyor belts.
 | Rubber transmission belts.
Type of change | CPV code 2003 | Description 2003 | CPV code 2007 | Description 2007
SPLIT | 25122200-4 | | 34312500-2 | Gaskets.
SPLIT | 25122200-4 | | 34312600-3 | Rubber conveyor belts.
SPLIT | 25122200-4 | | 34312700-4 | Rubber transmission belts.
Because the split is not documented in a proper machine-readable way, not even the addition of new classes
as explained in 4.3.1.2 can be interpreted by a machine. Both changes are documented only by naming the
new or successor class in the CPV 2007 code column, without naming a predecessor or the type of change
(NEW vs. SPLIT).
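If each line named both a predecessor (or none) and a successor, the type of change could be derived mechanically from the cardinality of the relations. The following Python sketch shows one such heuristic; it is an illustration, not a CPV rule, and the function name `classify_relations` is hypothetical. A successor without a predecessor is NEW, a predecessor with several successors is SPLIT, a successor with several predecessors is JOIN, and a remaining 1:1 pair is treated as a re-coding (MOVE).

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Pair = Tuple[Optional[str], str]  # (predecessor or None, successor)


def classify_relations(pairs: List[Pair]) -> Dict[Pair, str]:
    successors = defaultdict(set)    # predecessor -> successors
    predecessors = defaultdict(set)  # successor -> predecessors
    for pred, succ in pairs:
        if pred is not None:
            successors[pred].add(succ)
            predecessors[succ].add(pred)
    result = {}
    for pred, succ in pairs:
        if pred is None:
            result[(pred, succ)] = "NEW"
        elif len(successors[pred]) > 1:
            result[(pred, succ)] = "SPLIT"  # n:m relations are reported as SPLIT here
        elif len(predecessors[succ]) > 1:
            result[(pred, succ)] = "JOIN"
        else:
            result[(pred, succ)] = "MOVE"   # 1:1 re-coding
    return result


pairs = [
    (None, "09134230-8"),                                      # new class
    ("25122200-4", "34312500-2"), ("25122200-4", "34312600-3"),
    ("25122200-4", "34312700-4"),                              # split
    ("35211000-6", "34611000-3"), ("35212000-3", "34611000-3"),
    ("35213000-0", "34611000-3"),                              # join (Table 34)
    ("93930000-6", "98393000-4"),                              # re-coding
]
types = classify_relations(pairs)
print(sorted(set(types.values())))
```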
Description 2003 | Description 2007
05100000-6 Fish. | 03311000-2 Fish.
 | 03311000-2 Fish. + BA01-2 Live
 | 03311000-2 Fish. + BA04-1 Fresh + BA33-8 Chilled
Table 34 shows a similar example in which classes are joined and product properties are added to preserve
the distinctions that were formerly expressed by separate classes. Unfortunately, this kind of documentation
is not machine-readable either, because the CPV does not create relationships between the CPV classes and
the secondary vocabulary that includes properties and values.
Table 34: Joined CPV classes in correspondence table example two
CPV code 2003 | Description 2003 | CPV code 2007 | Description 2007
35211000-6 | Electrically-powered rail locomotives. | 34611000-3 | Locomotives. + CB10-1 Electrically powered
35212000-3 | Diesel-electric locomotives. | 34611000-3 | Locomotives. + CB41-4 Hybrid powered
35213000-0 | Diesel locomotives. | 34611000-3 | Locomotives. + CB09-8 Diesel-powered
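What the missing relationship between classes and secondary vocabulary could look like can be sketched as a lookup in which each CPV 2003 code points to a CPV 2007 class plus the supplementary codes that preserve its former distinction. The data comes from Table 34; the structure, `JOINED` and `predecessors_of` are hypothetical and not part of the CPV publication.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Target(NamedTuple):
    cpv_2007: str
    supplementary: FrozenSet[str]  # secondary-vocabulary codes qualifying the class


# Hypothetical machine-readable form of Table 34.
JOINED = {
    "35211000-6": Target("34611000-3", frozenset({"CB10-1"})),  # Electrically powered
    "35212000-3": Target("34611000-3", frozenset({"CB41-4"})),  # Hybrid powered
    "35213000-0": Target("34611000-3", frozenset({"CB09-8"})),  # Diesel-powered
}


def predecessors_of(cpv_2007: str, supplementary: Iterable[str]) -> List[str]:
    """CPV 2003 classes whose qualifiers are all present on the given 2007 class."""
    given = frozenset(supplementary)
    return sorted(code for code, t in JOINED.items()
                  if t.cpv_2007 == cpv_2007 and t.supplementary <= given)


print(predecessors_of("34611000-3", ["CB09-8"]))
```

Without the supplementary codes, all three predecessors collapse into the single class 34611000-3, which is why the relationship has to be explicit to be machine-readable.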
6.4.2 UNSPSC
For all components of the UNSPSC, downward compatibility is always guaranteed. Upward compatibility is
guaranteed for the portions of a release that concern added classes. For classes that are modified or
deleted, a remapping process is required if a member wishes to upgrade to that version, i.e. users have
to decide which codes they want to use when existing codes are modified or deleted.
Release Update Files (the audit trail), provided in Excel format with each release, include version
parameters recording when an entry was originally added, when it was last changed and/or when it was deleted.
In addition to the class code, the UNSPSC uses unique 6-digit identifiers. This way, the move of a class
within the hierarchy can be documented without changing the identity of the class, as shown below.
6.4.2.1 ADD
Each class that is added to the UNSPSC is represented as a new class (ADD) in the audit trail. This way,
users can easily identify the need to update their product information with new commodity classes that
might be better suited to their products.
Table 35: New UNSPSC codes in audit trail
effective_version | change_type | effective_id | effective_code | effective_title | effective_date | effective_definition
UNv131201 | ADD | 174585 | | | 01.12.2010 | Feedstuff that is blended from various raw materials and additives. These blends are formulated according to the specific requirements of the target animal.
UNv131201 | ADD | 174586 | | | 01.12.2010 | A pad that is placed under the saddle when riding a horse.
UNv131201 | ADD | 174587 | | | 01.12.2010 | A young tree that is grown in a nursery for cultivation of broad leaved trees.
UNv131201 | ADD | 174588 | | | 01.12.2010 | A young tree that is grown in a nursery for cultivation of acicular trees, e.g. trees with the shape of leaves looking like needles.
6.4.2.2 EDIT
The textual modification of a class is marked in the audit trail (EDIT), but affects neither the class code
nor the unique 6-digit ID number. The fact that a change has happened is therefore documented for the
user, but two states of the same object in successive releases cannot be told apart by comparing the ID,
as no version number or similar mark exists.
Table 36: Edited UNSPSC codes in audit trail
effective_version | change_type | changed_version | changed_id | changed_code | changed_title | effective_id | effective_code | effective_title
UNv131201 | EDIT | UNv130601 | 170607 | 10316700 | Fresh cut snap dragons | 170607 | 10316700 | Fresh cut snapdragons
UNv131201 | EDIT | UNv130601 | 170608 | | | 170608 | 10316701 | Fresh cut bi color snapdragon
UNv131201 | EDIT | UNv130601 | 170609 | | | 170609 | 10316702 | Fresh cut burgundy snapdragon
UNv131201 | EDIT | UNv130601 | 170610 | | | 170610 | 10316703 | Fresh cut hot pink snapdragon
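Because an EDIT leaves both the class code and the unique ID untouched, a user who wants to know what actually changed has to keep a copy of the previous release and compare the texts. A minimal Python sketch of that comparison, assuming each release has been loaded as a dictionary from unique ID to title (the function name `find_title_edits` is hypothetical):

```python
from typing import Dict, Tuple


def find_title_edits(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {unique_id: (old_title, new_title)} for IDs whose title differs."""
    return {uid: (old[uid], new[uid])
            for uid in old.keys() & new.keys()
            if old[uid] != new[uid]}


unv130601 = {"170607": "Fresh cut snap dragons"}
unv131201 = {"170607": "Fresh cut snapdragons"}
print(find_title_edits(unv130601, unv131201))
```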
6.4.2.3 DELETE
A commodity class can be deleted and thus removed from the standard. It does, however, have a successor
relation to another class that shall be used instead. The deletion is marked in the audit trail (DELETE). The
availability of a successor is marked in the column map_to; the successor itself is identified in the audit trail
by its class code (effective_code).
effective_version | change_type | changed_version | changed_id | changed_code | changed_title | map_to | effective_id | effective_code
UNv131201 | DELETE | UNv130601 | 168281 | | | | | 10251800
UNv131201 | DELETE | UNv130601 | 168290 | | | | | 10252200
UNv131201 | DELETE | UNv130601 | 168292 | | | | | 10252200
UNv131201 | DELETE | UNv130601 | 170475 | | | | | 10361800
UNv131201 | DELETE | UNv130601 | 170484 | | | | | 10362200
UNv131201 | DELETE | UNv130601 | 170486 | | | | | 10362200
UNv131201 | DELETE | UNv130601 | 172686 | | | | | 10451800
UNv131201 | DELETE | UNv130601 | 172695 | | | | | 10452200
UNv131201 | DELETE | UNv130601 | 172697 | | | | | 10452200
6.4.2.4 MOVE
As the UNSPSC has a unique 6-digit ID number for each of its classes in addition to the class code, an
existing class can be moved to another position in the hierarchy without changing the class itself. The class
code changes as the class is relocated, but the unique ID does not. The predecessor-successor relation is
included in the audit trail, so the change is documented and traceable by the user (MOVE). The difference
from DELETE is that the class remains the same, so no successor is needed.
Table 38: Moved UNSPSC codes in audit trail
effective_version | change_type | changed_version | changed_id | changed_code | changed_title | effective_id | effective_code | effective_title
UNv131201 | MOVE | UNv130601 | 168337 | | | 168337 | |
UNv131201 | MOVE | UNv130601 | 168338 | | | 168338 | |
UNv131201 | MOVE | UNv130601 | 168339 | | | 168339 | |
UNv131201 | MOVE | UNv130601 | 168341 | | | 168341 | |
UNv131201 | MOVE | UNv130601 | 168348 | | | 168348 | |
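Since the unique ID survives ADD, EDIT and MOVE, a local copy of the UNSPSC can be brought up to date by keying it on that ID. The Python sketch below is a simplified illustration, not the UNSPSC file format: `AuditRow` holds only the columns discussed above, `apply_audit_trail` is a hypothetical helper, and the class codes 9900000x in the example are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Entry = Tuple[Optional[str], Optional[str]]  # (class code, title)


@dataclass
class AuditRow:
    change_type: str                      # "ADD", "EDIT", "MOVE" or "DELETE"
    changed_id: Optional[str] = None      # unique ID before the change
    effective_id: Optional[str] = None    # unique ID after the change
    effective_code: Optional[str] = None
    effective_title: Optional[str] = None
    map_to: bool = False                  # DELETE only: a successor is available


def apply_audit_trail(catalogue: Dict[str, Entry],
                      rows: List[AuditRow]) -> Tuple[Dict[str, Entry], Dict[str, str]]:
    """Return the updated catalogue and a map old class code -> successor class code."""
    cat = dict(catalogue)
    successors: Dict[str, str] = {}
    for r in rows:
        if r.change_type in ("ADD", "EDIT", "MOVE"):
            # The unique ID is stable, so all three reduce to an update keyed by ID;
            # fields missing from the row keep their previous value.
            code, title = cat.get(r.effective_id, (None, None))
            cat[r.effective_id] = (r.effective_code or code, r.effective_title or title)
        elif r.change_type == "DELETE":
            old_code, _ = cat.pop(r.changed_id, (None, None))
            if r.map_to and old_code is not None:
                successors[old_code] = r.effective_code
    return cat, successors


catalogue = {"168337": ("99000001", "Example A"), "168281": ("99000002", "Example B")}
rows = [
    AuditRow("MOVE", changed_id="168337", effective_id="168337", effective_code="99000003"),
    AuditRow("DELETE", changed_id="168281", effective_code="99000005", map_to=True),
    AuditRow("ADD", effective_id="174585", effective_code="99000004", effective_title="Example C"),
]
cat, successors = apply_audit_trail(catalogue, rows)
print(cat, successors)
```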
6.4.3 GPC
Release Update Files (delta reports) between two consecutive releases are available for all updates.
Companies can use several versions; however, to achieve master data synchronisation, GDSN users
should in practice migrate to the current GDSN production version once or twice a year. Codes that have
once been used are marked in the database as unavailable for future releases; this ensures that a code
cannot be reused.
The GPC does not use an additional unique identifier apart from the classification code. But as a code
cannot be used again once it has been marked as unavailable, it in fact serves as a unique identifier. Version
changes are not visible in the code 29.
The delta report displays all changes of all elements (classes, attributes, values) in their context: attributes
and values are displayed in the context of the class they are assigned to. Both the element and the changed
text field are named (see Table 39 below). Only the class changes are relevant for this project.
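The rule that a GPC code is never reissued is what lets the code double as a unique identifier. A minimal Python sketch of such a registry, in which the hypothetical class `CodeRegistry` stands in for the GPC database and the code 99999999 is a placeholder:

```python
from typing import Dict, Set


class CodeRegistry:
    """Hypothetical registry: a code, once used, is never issued again."""

    def __init__(self) -> None:
        self.active: Dict[str, str] = {}  # code -> title
        self.retired: Set[str] = set()    # codes marked unavailable for future releases

    def add(self, code: str, title: str) -> None:
        if code in self.active or code in self.retired:
            raise ValueError(f"code {code} has already been used")
        self.active[code] = title

    def delete(self, code: str) -> None:
        del self.active[code]
        self.retired.add(code)


registry = CodeRegistry()
registry.add("99999999", "Example brick")
registry.delete("99999999")
try:
    registry.add("99999999", "Another brick")
except ValueError as err:
    print(err)
```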
6.4.3.1 Addition
New elements, be it classes, attributes or values, are simply marked with an A for ADDITION.
Table 39: Example of additions as documented in GPC delta report
6.4.3.3 Deletion
A brick can be deleted without a successor relation. Its code is marked as unavailable for future releases
and will not be used again. When a class on the 4th level (brick) is deleted, all its attributes and values are
deleted as well, as shown in the example. The deletion is marked with the code D.
6.4.3.4 Move
In the GPC, the move of a class within the hierarchy is possible and is documented in great detail. The
change codes explained above are extended with an M for MOVE. The old position of a moved brick (the
predecessor) is marked DM, which stands for DELETE/MOVE, i.e. the brick has been deleted from its
previous position in the hierarchy. The brick's new position in the hierarchy is marked AM, which stands for
ADD/MOVE, i.e. the brick has changed its place in the hierarchy but no other change occurred. If, in
addition to being moved, the brick itself was modified, it is also marked AMM (ADD/MOVE/MODIFY). The
following examples illustrate this.
(Figure: brick 10001761 Refuse bags in 73000000 Household Kitchen Merchandise > 73040000 Household Kitchen Merchandise > 73040100 Household Kitchen Storage)
(Figure: brick 10001761 Refuse bags in 47000000 Cleaning/Hygiene Products > 47210000 Waste Management Products > 47210100 Waste Storage Products)
(Figure 45: brick 10002125 Household Refuse Bins (Indoor) in 73040100 Household Kitchen Storage)
(Figure: brick 10002125 Refuse / Waste Bins in 47210100 Waste Storage Products)
Figure 46: Household Refuse bins (indoor) renamed as refuse / waste bins in GPC version 01062011
In this example, Household Refuse Bins (Indoor) were also moved from 73040100 Household Kitchen
Storage (see Figure 45) to 47210100 Waste Storage Products (Figure 46) and renamed to Refuse / Waste
Bins. The brick code stays the same and the changes are marked as DM in 73040100, AM in 47210100 and
additionally as AMM in 47210100, as the name has changed.
6.4.3.6 Summary
The following changes are documented in the delta report and were explained in the sections above.
Table 47: GPC changes as documented in the delta report
Change code | Change | Valid for element | Description
A | ADD | Brick, Attribute, Value | a new segment, family, class, brick, attribute or value was added into the hierarchy
 | MODIFY | Brick, Attribute, Value | a segment, family, class, brick, attribute or value was changed (e.g. the title or definition)
D | DELETE | Attribute, Value | a segment, family, class, brick, attribute or value was deleted; the attribute code or attribute value code will not be used again
AM | ADD MOVE | Brick | a brick has changed its place in the hierarchy, but no other change occurred; the successor position is marked
AMM | ADD MOVE MODIFY | Brick | a brick has changed its place in the hierarchy and has been modified; this is documented in addition to AM
DM | DELETE MOVE | Brick | a brick has been deleted from its previous position in the hierarchy; the predecessor position is marked
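Because the brick code stays the same when a brick is moved, a DM line and an AM line for the same code can be paired to recover the move. The Python sketch below assumes the delta report has been reduced to simple tuples (change code, brick code, parent class code); that row shape and the function name `reconstruct_moves` are hypothetical. The data follows the refuse bag and refuse bin example above.

```python
from typing import Dict, List, Tuple

DeltaRow = Tuple[str, str, str]  # (change code, brick code, parent class code)


def reconstruct_moves(rows: List[DeltaRow]) -> List[Dict[str, object]]:
    """Pair DM (old position) and AM (new position) lines by brick code."""
    old_parent = {brick: parent for code, brick, parent in rows if code == "DM"}
    new_parent = {brick: parent for code, brick, parent in rows if code == "AM"}
    modified = {brick for code, brick, _ in rows if code == "AMM"}
    return [
        {"brick": brick, "from": old_parent[brick], "to": new_parent[brick],
         "modified": brick in modified}
        for brick in sorted(old_parent.keys() & new_parent.keys())
    ]


delta = [
    ("DM", "10001761", "73040100"), ("AM", "10001761", "47210100"),
    ("DM", "10002125", "73040100"), ("AM", "10002125", "47210100"),
    ("AMM", "10002125", "47210100"),  # additionally renamed
]
moves = reconstruct_moves(delta)
for move in moves:
    print(move)
```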
6.4.4 eCl@ss
Within the scope of the cMap project, eCl@ss is the only classification system that distinguishes between
different types of releases. The distinction criterion is the type of compatibility:
eCl@ss ServicePacks are backward and upward compatible, as only textual changes are allowed. ServicePacks are used for translations and other textual corrections.
eCl@ss MinorReleases are compatible within the same MajorRelease cycle, i.e. all 6.x or 7.x versions are backward compatible, as only new or changed elements are included. Changes can only be textual modifications that do not change the concept of an element.
eCl@ss MajorReleases are not backward compatible, as changes such as the restructuring of the class hierarchy or the deletion of properties and values are allowed. In some cases a successor is obligatory, but in the case of a correction there is sometimes no successor at all.
These changes are documented with the help of two things:
a combination of a unique identifier and a version number for each object;
Release Update Files (called mapping tables)30 that have been published separately for each
eCl@ss MajorRelease since eCl@ss version 4.0.
eCl@ss uses the class code to identify the location in the hierarchy, but every class (and every object in
general) has a unique identifier called IRDI (International Registration Data Identifier)31, which is based on
ISO standards32. The IRDI includes both an object identifier and a version number. The version number
documents minor changes that do not change the concept of an object: e.g. if a class is moved in the
hierarchy or a definition is added, the version number is raised, but the object keeps its identifier. If an
object's concept is changed, a new object with a new unique identifier replaces the old one.
This way, a classification class can be moved in the hierarchy, changing the class code and the version
number, but not the object identifier. New classes are simply added; this is not documented separately.
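Comparing two IRDIs for identity therefore means comparing everything except the version number. The Python sketch below splits an IRDI into the parts visible in the mapping tables of this section (e.g. 0173-1#01-AKH756#006: the prefix 0173-1, the two-digit code space 01 used by all class IRDIs shown here, the object identifier AKH756 and the version 006). The regular expression is an assumption derived from those examples, not a complete implementation of the ISO 29002 syntax; it also tolerates the missing hyphen found in some extracted values. `parse_irdi` and `same_object` are hypothetical names.

```python
import re
from typing import NamedTuple


class Irdi(NamedTuple):
    prefix: str       # e.g. "0173-1"
    code_space: str   # e.g. "01"
    item: str         # object identifier, e.g. "AKH756"
    version: int      # e.g. 6 for "#006"


_IRDI = re.compile(r"^(?P<prefix>[^#]+)#(?P<cs>\d{2})-?(?P<item>[A-Z0-9]+)#(?P<ver>\d{3})$")


def parse_irdi(text: str) -> Irdi:
    m = _IRDI.match(text)
    if m is None:
        raise ValueError(f"not an IRDI: {text!r}")
    return Irdi(m["prefix"], m["cs"], m["item"], int(m["ver"]))


def same_object(a: str, b: str) -> bool:
    """True if both IRDIs identify the same object, whatever their versions."""
    return parse_irdi(a)[:3] == parse_irdi(b)[:3]


print(same_object("0173-1#01-AKH756#006", "0173-1#01-AKH756#007"))  # a moved class
```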
The Release Update Files document changes in a MajorRelease. The following changes are documented in
the Release Update Files:
Table 48: eCl@ss changes as documented in the Release Update Files
Change code | Valid for | Description
NEW | | New element in TargetRelease
VERSION NUMBER | | The element was changed without changing the concept (e.g. textual correction). Identifiers do not change, only the Version Number is raised.
CLOSED | All |
REPLACE | Property |
SUBSTITUTE | Property |
MOVE | Class |
SPLIT | Class |
JOIN | Class |
30 Starting with eCl@ss MajorRelease 7.0, in addition to the Release Update Files (including the predecessor-successor relations of
classes), eCl@ss publishes Transaction Update Files that enable users to update the evaluation, i.e. the product descriptions of their
products, automatically. For the cMap project only the Release Update Files are relevant.
31 Source: http://wiki.eclass.de/wiki/IRDI
32 ISO/IEC 11179, ISO 29002, ISO/IEC 6523
eCl@ss delivers nine files in total that include release update information (shown in Table 49). The first and
the third are relevant for the class mapping as they document the class changes in a MajorRelease.
Table 49: eCl@ss deliverables as documented in the Release Update Files (RUF)
Name of file | Content
eClass-RUF-TU-CC_6_x_to_7_x.csv | Class Update Table
eClass-RUF-TU-PR_6_x_to_7_x.csv |
eClass-RUF-CC_6_x_to_7_x_EN.csv | Table of Classification Classes
eClass-RUF-KWSY_6_x_to_7_x_EN.csv |
eClass-RUF-PR_6_x_to_7_x_EN.csv | Table of Properties
eClass-RUF-VA_6_x_to_7_x_EN.csv | Table of Values
eClass-RUF-CC_PR_6_x_to_7_x.csv | Relations Classes-Properties
eClass-RUF-PR_VA_6_x_to_7_x.csv | Relations Properties-Values
eClass-RUF-UN_6_x_to_7_x_en_US.csv | Table of units
In the RUF-Table of Classification Classes all new classes in the target release (NEW) and all changes of
existing classes are listed, whether they are closed, changed (VERSION NUMBER) or restructured
(MOVE, SPLIT, JOIN).
Additionally, the predecessor-successor-relation of the restructuring changes MOVE, SPLIT, JOIN is listed in
the Transaction Upgrade File of Classes (Class-Update-Table).
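With the Class Update Table, an old class can be resolved to its successor(s) by IRDI. The Python sketch below uses rows taken from Tables 54 to 56 and hypothetical function names; it returns a single successor automatically where there is exactly one (MOVE, JOIN) and otherwise leaves the choice among the candidates to the user (SPLIT).

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

UpdateRow = Tuple[str, str, str]  # (Command, IrdiSourceRelease, IrdiTargetRelease)


def build_index(rows: List[UpdateRow]) -> Dict[str, List[str]]:
    index: Dict[str, List[str]] = defaultdict(list)
    for _command, source, target in rows:
        index[source].append(target)
    return index


def resolve(index: Dict[str, List[str]], source: str) -> Tuple[Optional[str], List[str]]:
    """Return (automatic successor or None, all candidate successors)."""
    candidates = list(index.get(source, []))
    automatic = candidates[0] if len(candidates) == 1 else None
    return automatic, candidates


rows = [
    ("MOVE", "0173-1#01-AKH756#006", "0173-1#01-AKH756#007"),
    ("SPLIT", "0173-1#01-AKJ667#005", "0173-1#01-ADV551#001"),
    ("SPLIT", "0173-1#01-AKJ667#005", "0173-1#01-ADV552#001"),
    ("JOIN", "0173-1#01-ACF464#004", "0173-1#01-ADT413#001"),
    ("JOIN", "0173-1#01-ACF466#004", "0173-1#01-ADT413#001"),
]
index = build_index(rows)
print(resolve(index, "0173-1#01-AKJ667#005"))  # SPLIT: the user must choose
```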
6.4.4.1 NEW
New classes are marked with the change code NEW in the Table of Classification Classes of the RUF, as
follows. Their preferred names are not included in the Release Update Files, as the names are an additional
product that is only applicable within the eCl@ss standard.
Command IrdiCC
Coded
Name
Source
Release
Level PreferredName
Target
Release
NEW
NEW
NEW
NEW
IrdiCC
0173-1#01-ABR658#003
0173-1#01-ABR485#002
0173-1#01-ABR918#003
0173-1#01-ABS068#003
0173-1#01-ABS203#003
Version
Date
Coded
Name
ISO
Country
Code
ISO
Language
Code
Command
VERSION
NUMBER
VERSION
NUMBER
VERSION
NUMBER
VERSION
NUMBER
VERSION
NUMBER
Level
11.02.2011 23380201
PreferredName
Source
Release
Target
Release
en
US
eCl@ss6.2.1
eCl@ss7.0
en
US
eCl@ss6.2.1
eCl@ss7.0
11.02.2011 23389205
4 Impeller
Rotor and flowguide
3 component (accessories)
Hub cap (pinion shaft,
4 accessories)
en
US
eCl@ss6.2.1
eCl@ss7.0
11.02.2011 23389204
en
US
eCl@ss6.2.1
eCl@ss7.0
11.02.2011 23389203
en
US
eCl@ss6.2.1
eCl@ss7.0
11.02.2011 23389200
6.4.4.3 CLOSED
In eCl@ss, classes can be closed in a MajorRelease, i.e. they are still part of older releases but are no
longer part of the target release. Elements that are no longer part of a new eCl@ss release are marked as
DEPRECATED=TRUE (see Table 52 below). Closed classes on the 4th level always need a successor, which is
additionally listed in the Class Update Table (Table 53).
Table 52: Closed eCl@ss classes as documented in the mapping tables
Command | IrdiCC | Version Date | Coded Name | Level | Preferred Name | ISO Country Code | ISO Language Code | Deprecated | Source Release | Target Release
CLOSED | 0173-1#01-AAA359#009 | 11.02.2011 | 21040602 | 4 | Knife | US | en | TRUE | eCl@ss6.2.1 | eCl@ss7.0
CLOSED | 0173-1#01-AAA361#009 | 11.02.2011 | 21040604 | 4 | Trowel | US | en | TRUE | eCl@ss6.2.1 | eCl@ss7.0
CLOSED | 0173-1#01-AAA380#008 | 11.02.2011 | 21041005 | 4 | Fork (tool) | US | en | TRUE | eCl@ss6.2.1 | eCl@ss7.0
CLOSED | 0173-1#01-AAB007#007 | 11.02.2011 | 23110603 | 4 | Ring bolt | US | en | TRUE | eCl@ss6.2.1 | eCl@ss7.0
CLOSED | 0173-1#01-ACH692#005 | 11.02.2011 | 22390602 | 4 | Solar collector | US | en | TRUE | eCl@ss6.2.1 | eCl@ss7.0
CLOSED | 0173-1#01-AAA377#008 | 11.02.2011 | 21041002 | 4 | Hoe | US | en | TRUE | eCl@ss6.2.1 | eCl@ss7.0
Table 53: Successor-relation of closed eCl@ss classes as documented in the mapping tables
Command | IrdiSourceRelease | CodedName SourceRelease | IrdiTargetRelease | CodedName TargetRelease | Source Release | Target Release
SPLIT | 0173-1#01-AAA359#008 | 21040602 | 0173-1#01-ADS512#001 | | |
SPLIT | 0173-1#01-AAA359#008 | 21040602 | 0173-1#01-ADS513#001 | | |
SPLIT | 0173-1#01-AAA359#008 | 21040602 | 0173-1#01-ADS514#001 | | |
SPLIT | 0173-1#01-AAA359#008 | 21040602 | 0173-1#01-ADS515#001 | | |
SPLIT | 0173-1#01-AAA361#008 | 21040604 | 0173-1#01-ADS545#001 | | |
SPLIT | 0173-1#01-AAA361#008 | 21040604 | 0173-1#01-ADS546#001 | | |
SPLIT | 0173-1#01-AAA361#008 | 21040604 | 0173-1#01-ADS547#001 | | |
SPLIT | 0173-1#01-AAA361#008 | 21040604 | 0173-1#01-ADS548#001 | | |
SPLIT | 0173-1#01-AAA361#008 | 21040604 | 0173-1#01-ADS549#001 | | |
SPLIT | 0173-1#01-AAA361#008 | 21040604 | 0173-1#01-ADS550#001 | | |
SPLIT | 0173-1#01-AAA361#008 | 21040604 | 0173-1#01-ADS551#001 | | |
SPLIT | 0173-1#01-AAA361#008 | 21040604 | 0173-1#01-ADS552#001 | | |
SPLIT | 0173-1#01-AAA361#008 | 21040604 | 0173-1#01-ADS553#001 | | |
SPLIT | 0173-1#01-AAA361#008 | 21040604 | 0173-1#01-ADS554#001 | | |
SPLIT | 0173-1#01-AAA361#008 | 21040604 | 0173-1#01-ADS555#001 | | |
SPLIT | 0173-1#01-AAA361#008 | 21040604 | 0173-1#01-ADS556#001 | | |
6.4.4.4 REPLACE
The REPLACE function is only valid for properties: an eCl@ss property can be replaced by an identical one
(compatible substitution). As only class changes are relevant for the cMap project, REPLACE will not be
considered here.
6.4.4.5 SUBSTITUTE
The SUBSTITUTE function is only valid for properties: an eCl@ss property can be substituted by a similar
one (incompatible substitution). As only class changes are relevant for the cMap project, SUBSTITUTE will
not be considered here.
6.4.4.6 MOVE
As in all the classification systems discussed, classes can be moved from one position in the hierarchy to
another. The change is documented by raising the version number of the identifier (e.g. from #006 to
#007): the class code changes, but the identifier stays the same.
Table 54: Example of moved eCl@ss classes as documented in the mapping tables
Command | IrdiSourceRelease | CodedName SourceRelease | IrdiTargetRelease | CodedName TargetRelease | SourceRelease | TargetRelease
MOVE | 0173-1#01-AKH756#006 | 21170314 | 0173-1#01-AKH756#007 | 21170535 | eCl@ss6.2.1 | eCl@ss7.0
MOVE | 0173-1#01-AKH764#007 | 21170323 | 0173-1#01-AKH764#008 | 21170536 | eCl@ss6.2.1 | eCl@ss7.0
MOVE | 0173-1#01-AKG104#008 | 25020805 | 0173-1#01-AKG104#009 | 25021104 | eCl@ss6.2.1 | eCl@ss7.0
6.4.4.7 SPLIT
An eCl@ss class can be split into several classes, i.e. one predecessor has more than one successor. This
is documented by listing each predecessor-successor relation in a line of its own. This way, a user can
directly choose the right successor for his/her product, usually going from more general to more specific.
Table 55: Example of split eCl@ss classes as documented in the mapping tables
Command | IrdiSourceRelease | CodedName SourceRelease | IrdiTargetRelease | CodedName TargetRelease | SourceRelease | TargetRelease
SPLIT | 0173-1#01-AKJ667#005 | 17019890 | 0173-1#01-ADV551#001 | 15320101 | eCl@ss6.2.1 | eCl@ss7.0
SPLIT | 0173-1#01-AKJ667#005 | 17019890 | 0173-1#01-ADV552#001 | 15320102 | eCl@ss6.2.1 | eCl@ss7.0
SPLIT | 0173-1#01-AKJ708#005 | 17029890 | 0173-1#01-ADV556#001 | 15320201 | eCl@ss6.2.1 | eCl@ss7.0
SPLIT | 0173-1#01-AKJ708#005 | 17029890 | 0173-1#01-ADV557#001 | 15320202 | eCl@ss6.2.1 | eCl@ss7.0
6.4.4.8 JOIN
Several eCl@ss classes can be joined into one class, i.e. several predecessors have the same successor.
This is documented by listing each predecessor-successor relation in a line of its own. As each predecessor
has only one successor, the right successor for a product can be determined automatically, usually going
from more specific to more general.
Table 56: Example of joined eCl@ss classes as documented in the mapping tables
Command | IrdiSourceRelease | CodedName SourceRelease | IrdiTargetRelease | CodedName TargetRelease | SourceRelease | TargetRelease
JOIN | 0173-1#01-AKG133#006 | 25041402 | 0173-1#01-BAC097#005 | 25209090 | eCl@ss6.2.1 | eCl@ss7.0
JOIN | 0173-1#01-BAC008#004 | 25041490 | 0173-1#01-ADT413#001 | 25299090 | eCl@ss6.2.1 | eCl@ss7.0
JOIN | 0173-1#01-ACF464#004 | 25049890 | 0173-1#01-ADT413#001 | 25299090 | eCl@ss6.2.1 | eCl@ss7.0
JOIN | 0173-1#01-ACF466#004 | 25049990 | 0173-1#01-ADT413#001 | 25299090 | eCl@ss6.2.1 | eCl@ss7.0
6.5 Summary
As shown in the previous sections, all classification authorities publish additional upgrade information when
publishing a new release. The following table lists the availability of upgrade information and the kind of
changes that are possible in each classification system.
Table 57: Possible Changes and compatibility: UNSPSC, eCl@ss, GPC, CPV
Classification system | Release Update Files available | Supports automatic update | Codes can be re-used | Unique identifier used | Possible changes
UNSPSC | Yes | Yes | No | Yes | Any
eCl@ss | Yes | Yes | Yes33 | Yes | Any
GPC | Yes | Yes | No | No, but not reused | Any
CPV | Yes | Restricted | Yes | No | Any
The following section summarizes and compares what was shown above and lists recommendations on how
to further proceed.
33 eCl@ss allows the reuse of classification codes, but uses additional class identifiers that include a version and revision number. This
way, a class with the same classification code (coded name) cannot be confused with the old one as the identifiers differ.
Second, the validity of the classification systems varies widely. Whereas only the current release of the
CPV is valid due to its legal policy (with a transition period of six months), all versions of the UNSPSC
and eCl@ss are valid, published and used by their customers. In GPC, data synchronisation uses the GDSN
production version, which is not necessarily the very latest version.
The following thoughts lead to the conclusion that only the latest release of each standard should be taken
into consideration as the basis for the mapping updates:
Users of any classification system classify their product according to their own scheme and would
like to find a corresponding code in another classification system.
Each class once created in a classification system can have a successor class code, i.e. the user of
an older version should be able to find an appropriate new class code for his/her product in the latest
release of a particular classification system.
Therefore, a mapping need only be maintained for the latest release of any standard.
Consequently, only the latest releases have to be supported and displayed.
This would make the platform much simpler and easier to maintain.
Apart from that, the problems identified lead to the conclusion that the release update information has to be
comparable and machine-readable so that updates can be imported much more easily.
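The argument can be illustrated with a short Python sketch: an old code is first upgraded release by release along its successor links until a code of the latest release is reached, and only then is the mapping, maintained for the latest release alone, consulted. All codes and the helper names `upgrade_to_latest` and `map_code` are hypothetical; the sketch follows only unambiguous 1:1 links, since a SPLIT requires a choice by the user.

```python
from typing import Dict, List


def upgrade_to_latest(code: str, successor_of: Dict[str, str]) -> str:
    """Follow 1:1 successor links until a code without successor is reached."""
    seen = set()
    while code in successor_of:
        if code in seen:
            raise ValueError(f"cycle in successor links at {code}")
        seen.add(code)
        code = successor_of[code]
    return code


def map_code(code: str, successor_of: Dict[str, str],
             mapping: Dict[str, List[str]]) -> List[str]:
    """Upgrade first, then look the code up in the latest-release mapping."""
    return mapping.get(upgrade_to_latest(code, successor_of), [])


successor_of = {"A-2003": "A-2007", "A-2007": "A-2012"}  # release update information
mapping = {"A-2012": ["B-17"]}                          # maintained for the latest release only
print(map_code("A-2003", successor_of, mapping))
```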
6.5.3 Recommendations
As shown above, the goal cannot be to synchronize the maintenance processes of the individual classification
authorities, as they will all remain self-governed due to the diversity of objectives, use cases, processes,
timelines and applications. The goal is simply to maintain the mapping between them (see chapter 8), i.e. to
lay the foundation for interoperability between the standards and thereby help users of various classification
systems reduce processing costs.
To do so, the quality of the classification systems is crucial: the better the quality of a classification
system, the more easily the mapping between systems can be created and maintained.
Therefore, the following recommendations are addressed to the classification authorities to help improve the
quality of their classification systems.
three domains chosen by the ePDC project. The problems are described very well, but only for these three
domains; their number would surely multiply if the analysis were extended to all relevant domains.
In fact, each of the four classification systems named here is driven by different users for different
purposes, so their maintenance processes are not comparable. Moreover, as the cMap methodology is meant
as a basis to facilitate the integration of other product classification systems, synchronization would require
an even higher organizational effort. At that point, the whole operation would be endangered by an
organization that could no longer be managed. Even within a single classification organization, maintenance
is hard organizational work, and the final result can only be seen on the day of publication, not at some
defined synchronization point before publication.
From today's point of view, each standard will remain self-governed, and the organizations already have a
lot to do to get their change requests processed through their whole internal workflow.
Comparing the maintenance processes again in order to establish synchronization points as defined above is
therefore not a fruitful task. The major goal is not to synchronize the maintenance of different classification
systems but rather to maintain interoperability between the releases published by self-governed
authorities.
The mapping itself cannot be delivered by the classification authorities themselves, but mainly by the
users of the standards. In fact, most companies use various standards in their IT systems, mostly in addition
to their internal company classification system. They are therefore used to mapping different classification
systems and can already deliver useful input. Moreover, the specialists who drive the classification systems
work in these companies. A mapping authority could only do the administrative work and double-check the
quality, but not do the whole mapping. The mapping delivered as a main output of the cMap project was
created by two experts and can only be meant as a solid basis for further changes that will be requested by
users. It is this process, rather than a synchronized process between the classification authorities that might
not lead to a maintained mapping, that shall be defined here. This will be elaborated in chapters 7 and 8.
7. Definition of the architecture for an open standardized classification collaboration platform
7.1 Introduction
The scope of this chapter is to describe the technical level of the cMap platform. The strategic level, including
process descriptions, follows in chapter 8. To start with, the building blocks of the cMap architecture, such as
the defined user roles, business objects, use cases, a requirements analysis and thoughts on data quality, are
documented.
To define an appropriate cMap architecture, sections 7.2 and 7.3 describe the possible business use cases
of the mapping results and the interested actors. Based on these, section 7.4 defines the necessary roles on
the cMap platform, kept as simple and basic as possible. Section 7.5 addresses the business objects of the
cMap platform, before use cases of the platform (section 7.6) and the requirements of the architecture
(section 7.7) are described and some thoughts on data quality are given (section 7.8).
7.2
Wherever electronic product data is exchanged, classification systems play a major role. On the one hand,
companies might have developed their own internal classification systems in order to get an overview of
similar products, or they might use an existing classification system. On the other hand, when exchanging
product data with business partners, they have to be sure that they are talking about the same product
classes as their partners, who themselves might use a different classification system or a classification of
their own. Companies therefore already need to map the classification system they use internally to the one
they use to exchange data.
The usage of mapped classification systems is very diverse, but can be illustrated in a rather simple way.
The following figure sums up the problem of using different classification systems.
Figure 58: Business use case of mapping user: the problem of using different classification systems
The solution for business partner 2 is to provide their product data classified according to the classification
system used by business partner 1 who requests the data. cMap will deliver the solution shown in the
following figure.
Figure 59: Business use case mapping user: the solution for using different classification systems
The following figure shows an example in the context of public procurement in Europe: for the bidding
process a public procurement agency needs product data classified according to the European mandatory
CPV. A supplier already uses GPC in his/her system to classify his/her data and needs to translate the GPC
codes into CPV codes to take part in the bidding process.
Figure 60: Business use case mapping user: example for the usage of different classification
systems
7.3
Many different business users could be identified. As mentioned above, any user exchanging classified
product data is a potential user of both the cMap mapping results and the cMap platform to maintain the
mapping. Among them are representatives along the whole supply chain and throughout all business
processes. It is therefore relevant for manufacturers, suppliers, public and private procurement and
everybody who participates in the supply chain.
Suppliers who are obliged to deliver their product data in the form requested by their customers are
particularly likely users. They might have to deliver data classified according to CPV to public procurers and
according to UNSPSC, GPC and eCl@ss to three different customers.
The classification system end-users do not care about databases, systems and content. They have
information needs that they expect to be met rapidly, helpfully and potentially in large volumes.
Figure 61: Business use case mapping user: example for the usage of different classification systems
7.4
The business users in the market who actually use the cMap mapping results might be diverse. The roles they could play in maintaining the cMap mapping on the cMap online platform should be kept simple; they are explained in the following subsections.
The following minimum set of roles that need to be involved with the cMap platform has been identified:
1. the end-user of the cMap mapping (i.e. somebody who queries a mapping result)
2. the cMap platform administration authority (i.e. somebody who governs the platform)
3. the cMap platform provider (i.e. somebody who runs the platform)
4. the classification authorities (i.e. somebody who delivers the input for the mapping)
5. the mapping proposers (i.e. somebody who creates a mapping change request)
6. the quality managers (i.e. somebody who approves a mapping change request)
7. the cMap release manager (i.e. somebody who manages the mapping tables and, depending on the strategy, the cMap classification)
For cMap, roles 2 and 3 provide the technical basis and role 4 provides the content basis. Roles 5 and 6 actually do the mapping work, i.e. they provide the maintained mapping, ensuring a four-eyes principle, whereas role 7 is in charge of publishing the mapping results. More roles can be imagined, but to keep the process lean only the basic requirements are described.
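The four-eyes principle between roles 5 and 6 can be expressed as a simple check. The following Python sketch is illustrative only: the enum member names and the `may_approve` function are hypothetical, and it assumes that each person may hold several roles and is identified by a string ID.

```python
from enum import Enum, auto


class Role(Enum):
    """The seven minimum roles listed above (identifier names are illustrative)."""
    END_USER = auto()
    ADMINISTRATION_AUTHORITY = auto()
    PLATFORM_PROVIDER = auto()
    CLASSIFICATION_AUTHORITY = auto()
    MAPPING_PROPOSER = auto()
    QUALITY_MANAGER = auto()
    RELEASE_MANAGER = auto()


def may_approve(approver_id, approver_roles, proposer_id):
    """Four-eyes check: the approver must hold the quality manager role
    and must not be the person who proposed the mapping change."""
    return Role.QUALITY_MANAGER in approver_roles and approver_id != proposer_id
```

For example, `may_approve("qm1", {Role.QUALITY_MANAGER}, "p1")` is true, while a quality manager who also proposed the change cannot approve it.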
The following figure gives an overview of these roles:
7.4.1 End-user
The cMap platform is open for users who require a mapping result for their classification code. They submit
queries to the platform sending a specific classification code in one classification system to receive a
mapping result, i.e. the relevant classification code in another classification system. The end-user could be
anybody.
Rights
o Should be closely integrated into the information workflow of the cMap platform
o Should be involved in a kind of advisory or supervising board
o They should receive release and update information from the other classification authorities
right after publication, so that they are informed about new classes in the other classification systems
(by the cMap Platform Administrator or the classification authorities themselves)
7.5 Business objects
An additional and major challenge arises in classifications with properties, especially when a certain depth
of classification is enforced. Such systems have two principal means of distinction: the
classification and the properties. For obvious or practical reasons this can lead to different modelling
approaches, such that in some areas of a standard, classes may contain large numbers of product sub-types
distinguished by certain special properties, while other areas tend to have a more granular
classification.
NOTE: Involving properties in the mapping tables was not within the scope of the cMap project. But it
seems obvious that involving them in the future could improve both the mapping results and their applicability,
where properties fully distinguish between, or at least narrow down, the available options in One-to-Many and
Many-to-Many mapping cases. It would thus seem negligent not to consider this in the proposed system
architecture.
scope, i.e. whether the mapping is intra- or inter-system and whether it is within or
between different releases of a classification system
type of element, i.e. whether classes, characteristics or property values are mapped
1. With Release 7.0, eCl@ss introduced two models for products, a basic and an advanced description.
For each property in the basic model there is at least one equivalent property in the advanced
model. This is an example of intra-system mapping of properties within the same release. The
mapping is complete and injective from basic to advanced; from advanced to basic it is incomplete and,
in some exceptional cases, not unique.
2. With Release 7.0, eCl@ss introduced the concept of classification update. For each change
performed at classification level, a predecessor-successor relation is given and exported in machine-readable
format so that classifications can be updated automatically. This is an example
of intra-system, cross-release mapping of classes. The mapping is injective (from old to new
release), complete (but exported only for changed elements) and not unique (because of lossy join
and ambiguous split operations).
3. With Release NE100 3.2, PROLIST introduced the concept of transaction update. For each change
performed at the level of the (complex) list of properties, a predecessor-successor relation is given and
exported in machine-readable format so that transactions based on the data
dictionary can be updated automatically. eCl@ss adapted the concept with Release 7.0. This is an example of
intra-system, cross-release mapping of characteristics and property values. The mapping is injective
(from old to new release), complete (but exported only for those elements where needed) and
unique (except for some lossy cases of error corrections).
4. In Release 7.0, eCl@ss absorbed the PROLIST NE100 data dictionary, which contains
sophisticated, elaborate lists of properties of devices used in process engineering and plant
automation (mostly in the chemical industry). So far only an inter-system mapping of classes has
been released; a mapping of characteristics and property values is in progress. The mapping is injective
and is intended to become complete and unique.
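The properties "complete", "unique" and "injective" used in the four examples above can be checked mechanically once a mapping is available as pairs of elements. The Python sketch below uses the standard set-theoretic reading of these terms, which is an assumption and may not match the CWA's usage in every case; the function name and the pair-based input are hypothetical.

```python
def mapping_properties(sources, pairs):
    """Characterise a mapping given as (source, target) pairs.

    Assumed definitions:
      complete  - every element of `sources` has at least one target
      unique    - no source element has more than one target
      injective - no target is shared by two different source elements
    """
    targets_of = {s: set() for s in sources}
    sources_of = {}
    for s, t in pairs:
        targets_of.setdefault(s, set()).add(t)
        sources_of.setdefault(t, set()).add(s)
    complete = all(targets_of[s] for s in sources)
    unique = all(len(ts) <= 1 for ts in targets_of.values())
    injective = all(len(ss) == 1 for ss in sources_of.values())
    return {"complete": complete, "unique": unique, "injective": injective}
```

With hypothetical element names, `mapping_properties(["a1", "a2", "a3"], [("a1", "b1"), ("a2", "b2"), ("a3", "b2")])` reports a complete and unique but not injective mapping, because two source elements share the target `b2`.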
The scope of the current cMap project is to enable semantic interoperability (cf. CWA 16100, p. 45) by producing
an inter-release mapping at the level of categorization classes of selected releases of the four classification
systems CPV, eCl@ss, GPC and UNSPSC. The scope of the cMap platform architecture is broader, though:
besides the variables and criteria described above, it also has to consider two alternative approaches:
1. For each new release or classification system, new tables are produced that contain the mapping
between the previously mapped systems and/or releases and the new one.
2. A set of intermediary elements is introduced (for simplicity's sake called the cMap dictionary in the rest
of this text), which helps abstract the mapping and serves as a medium of exchange between the
classification systems and releases. For each new classification system or release, only the
mappings to and from the cMap dictionary are then created; mappings to the other systems are generated
automatically.
Both approaches have pros and cons, as follows:
[Table: pros and cons of approach 1 and approach 2]
Nota bene: Mapping every release of every classification standard with all other releases
very quickly leads to a combinatorial explosion; the initial cMap mapping already has to consider 12 mapping
tables for only four classification systems. Adding systems one at a time increases this number each time by
twice the new number of systems minus 2, i.e. to 20, 30 and 42 tables for 5, 6 and 7 classification systems.
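The arithmetic of this nota bene can be checked in a few lines of Python. Mapping every system to every other system in both directions needs n(n-1) directed tables, so going from n-1 to n systems adds 2n-2 tables. The hub count of 2n (one table to and one from the cMap dictionary per system, following approach 2) is an assumption for comparison; both functions are hypothetical and count systems, not individual releases.

```python
def pairwise_tables(n):
    """Directed mapping tables when every system is mapped to every other one (approach 1)."""
    return n * (n - 1)


def hub_tables(n):
    """Directed tables when each system is only mapped to and from the cMap dictionary (approach 2)."""
    return 2 * n


for n in range(4, 8):
    print(n, pairwise_tables(n), hub_tables(n))
```

For four systems this gives 12 pairwise tables against 8 hub tables; for seven systems, 42 against 14.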
Approach 1
Possible goals:
Figure 66: Example for a 121 mapping of classes from all four systems from the cMap mapping tables
NOTE: In an actual implementation there will probably be a second level of intermediate classes to
make the handling of sets and combinations of is_case_of relationships between multiple systems more
manageable and easier to maintain.
of any class from S1, but it is case of every member of the set {C2_1..n} (but only these in regard to S2). A
"no mapping found" relation is recorded between Ci and S1.
7.5.3.5 Representation of One to Many (12M) and Many to One (M21) mapping
A 12M mapping between C1 of S1 and a set of classes {C2_1..m} from S2 (with m ≥ 2), or an M21 mapping
between a set of classes {C2_1..n} from S2 (with n ≥ 2) and C1 from S1, is expressed as follows: there is exactly
one intermediate class Ci which is case of only C1 (in regard to S1) and of every member of the respective set
{C2_1..m} or {C2_1..n} (but only these in regard to S2).
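The intermediate-class representation can be sketched in Python: each intermediate class records, per classification system, the set of classes it "is case of", and a direct mapping between two systems is derived by joining over the intermediate classes. The store layout, the class names and the use of an empty set to stand for a recorded "no mapping found" relation are all hypothetical.

```python
from collections import defaultdict

# Hypothetical store: intermediate class -> {system: classes it "is case of"}.
# An empty set stands in for a recorded "no mapping found" relation.
INTERMEDIATE = {
    "Ci_1": {"S1": {"C1"}, "S2": {"C2_1", "C2_2", "C2_3"}},  # 12M / M21
    "Ci_2": {"S1": {"C3"}, "S2": {"C2_4"}},                  # 121
    "Ci_3": {"S1": set(), "S2": {"C2_5"}},                   # no mapping found in S1
}


def derive_mapping(intermediate, source_system, target_system):
    """Derive source class -> set of target classes via shared intermediate classes."""
    result = defaultdict(set)
    for links in intermediate.values():
        for source_class in links.get(source_system, set()):
            result[source_class] |= links.get(target_system, set())
    return dict(result)
```

Deriving S1 to S2 yields `C1` mapped to the three S2 classes; deriving S2 to S1 maps each of those classes back to `C1` and gives `C2_5` an empty target set. The same function serves both directions, which reflects the idea that mappings between systems are generated from the intermediate level rather than stored pairwise.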
7.6 Use cases
There are several possible use cases for a change in the cMap platform. The ePPS documentation already
describes the use cases of a classification process; it is the basis for this section, which focuses only on this
CWA's objective: the description of the mapping.
First, the mapping status has to be defined. It applies to each class in each classification system, in relation
to any target classification system, as shown in the following figure:
One person requests a change and another person evaluates it. This is how a mapping is
corrected and completed.
The workflow is based on the ePDC Maintenance Procedure as described in CWA 15295:2005, p. 35:
Figure 76: UC02.04: Process mapping change request (based on ePDC maintenance)
Figure 81: UC03.04 Example: Apply release update information of mapped CPV classes to GPC
Figure 82: UC03.04 Example: Apply release update information of mapped GPC classes to CPV
As the table shows, EDIT and MOVE are changes that keep the MAPPED status. A JOIN keeps
the MAPPED status, but might increase the number of target classes. A SPLIT requires a check, as
the correct successor relation has to be identified. A NEW class requires an initial mapping, whereas a
DELETED class results in a TO BE CHECKED status, as the target class was deleted and a new mapping
target has to be checked.
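The rules above can be encoded as a lookup table plus a small amount of logic for carrying target classes over to the successor. The following Python sketch assumes one change record per affected class and a list of target-class sets for its predecessor(s). The names are hypothetical, and treating NEW as BLANK and DELETED as having no carried-over targets are assumptions: the text only says that an initial mapping, or a new target, is needed.

```python
# Hypothetical encoding of the status rules for one change record.
STATUS_AFTER_CHANGE = {
    "EDIT": "MAPPED",
    "MOVE": "MAPPED",
    "JOIN": "MAPPED",
    "SPLIT": "TO BE CHECKED",
    "NEW": "BLANK",            # assumption: a new class starts without a mapping
    "DELETED": "TO BE CHECKED",
}


def apply_change(change_type, predecessor_targets):
    """Return (status, candidate target classes) after one change.

    predecessor_targets: list of target-class sets, one per predecessor class.
    """
    status = STATUS_AFTER_CHANGE[change_type]
    if change_type in ("NEW", "DELETED"):
        # nothing to carry over: an initial mapping or a new target is needed
        return status, set()
    # EDIT/MOVE keep the predecessor's targets; JOIN takes the union of all
    # predecessors' targets; SPLIT keeps them only as candidates to be checked
    return status, set().union(*predecessor_targets)
```

For example, `apply_change("JOIN", [{"g1"}, {"g2"}])` returns `("MAPPED", {"g1", "g2"})`, which is why a join can increase the number of target classes.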
7.7 Requirement analysis
For each collection of entities kept or maintained on the platform, it must be possible to express
requirements regarding the content language separately.
See also: Section 4 of CWA 16100 (ePPS).
| System | Initial mapping on version (n) | Following release n+1 | Following release n+2 | Following release n+3 | Following release n+4 | Following release n+5 |
| CPV | 2008 | | | | | |
| GPC | 2009-08-31 | 2010-06-01 | 2010-12-01 | 2011-06-01 | 2011-12-01 (planned) | |
| UNSPSC | v11 | v12 | v13 | v14 | | |
| eCl@ss | 6.0.1 | 6.1 | 6.2 | 7.0 | 7.1 | 8.0 (planned) |
This means that since the initial mapping 11 new releases of the classification systems have been published.
How to handle this was described in use case 3 (section 7.6.3).
The initial mapping will simply be an upload of the information recorded in the Excel spreadsheets that
were produced in the framework of this project. No process has to be defined for this initial upload.
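As an illustration only, such an upload could read a CSV export of the spreadsheets into the platform's mapping store. The column names `source_system`, `source_code`, `target_system` and `target_code` are hypothetical (the layout of the project spreadsheets is not described here), as is the convention that an empty target code records "no mapping found".

```python
import csv
import io
from collections import defaultdict


def load_initial_mapping(csv_text):
    """Build {(source_system, source_code, target_system): set of target codes}.

    Assumes hypothetical columns source_system, source_code, target_system and
    target_code; an empty target_code is kept as an empty set ("no mapping found").
    """
    mapping = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["source_system"], row["source_code"], row["target_system"])
        targets = mapping[key]  # creates the entry even if no target is given
        if row["target_code"]:
            targets.add(row["target_code"])
    return dict(mapping)
```

Rows with the same key are merged, so a One-to-Many mapping spread over several spreadsheet rows ends up as one entry with several target codes.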
Furthermore, the mapping architecture has to be designed in a way that enables classification systems other
than the four chosen ones to be added. The class mapping process has to take into account that a 4-level class
hierarchy cannot be taken for granted.
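One way to avoid hard-coding four levels is to store each class with a link to its parent and derive the level when needed. The Python sketch below shows this with a hypothetical `parent_of` dictionary; it treats hierarchies of any depth the same way and rejects cyclic parent links.

```python
def depth(class_id, parent_of):
    """Return the level of class_id in its hierarchy (a top-level class has level 1).

    parent_of maps each class identifier to its parent identifier, or to None
    for a top-level class, so hierarchies of any depth are handled alike.
    """
    level = 1
    seen = {class_id}
    while parent_of.get(class_id) is not None:
        class_id = parent_of[class_id]
        if class_id in seen:
            raise ValueError("cycle in class hierarchy")
        seen.add(class_id)
        level += 1
    return level
```

With the hypothetical hierarchy `{"seg": None, "fam": "seg", "cls": "fam", "brick": "cls"}`, `depth("brick", ...)` is 4, while a two-level system works without any change to the code.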
7.8 Data Quality
See chapter 7 of CWA 16100 (ePPS) for an introduction to ISO 8000, a standard on data quality that is
currently under development. This short chapter repeats some fundamental requirements that
underlie meaningful maintenance of classification systems, ontologies or master data in general, as well
as of the mappings between them.
unique references to the authorities providing the identification and defining the entity
a unique reference to the entity that is assigned according to the facets supplying the entity with
identity
a possibility to express predecessor-successor relationships as well as the degrees of compatibility
between the evolution states of an entity
As already mentioned by the ePDC project in CWA 15295:2005, the mapping of product [sic!] to classes
(classification) should use the class identifier only (not the classification code!), in order to prevent reclassification when the hierarchy changes (p. 24). This implies that either a unique identifier is used by
each of the authorities, or the cMap platform has to assign a cMap identifier itself according to international
standards (e.g. ISO 29002, already in use by systems like eCl@ss, DIN and PROLIST and by companies such as
SIEMENS). The recommendation is to register for an IRDI code and to assign distinct cMap identifiers,
to be used at least in the background as the basis of the cMap database.
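The benefit of keying the mapping on identifiers rather than codes can be shown with a small Python sketch. The record fields, the identifier strings and the function name are hypothetical placeholders (they do not follow the real IRDI syntax); the point is that a MOVE that changes a class's hierarchical code leaves a mapping keyed by the stable identifier intact.

```python
from dataclasses import dataclass


@dataclass
class ClassRecord:
    identifier: str  # stable identifier (e.g. an IRDI); never changes
    code: str        # hierarchical classification code; may change on MOVE
    name: str


# Hypothetical mapping keyed by stable identifiers, not by classification codes.
MAPPING = {"SRC-ID-1": {"TGT-ID-9"}}


def targets_for_code(code, records, mapping):
    """Resolve a code in the given release to its identifier, then look up the mapping."""
    by_code = {r.code: r.identifier for r in records}
    return mapping.get(by_code.get(code), set())
```

If release n lists the class under code `10203040` and release n+1 moves it to `10253040` with the same identifier, both lookups return the same target set without any reclassification of the mapping.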
themselves, i.e. the IPR of their content and their release update information. Also, when content is reused,
its origin has to be indicated to the user (content coming from CPV, eCl@ss, GPC, UNSPSC).
CEN will not charge for the published mapping tables of the cMap project, but will refer to the necessity of
respecting the conditions of use of the classification authorities (see footnote 38). If a cMap platform is
implemented, further discussion will be needed, depending on the underlying business model.
38 The mapping results of the CEN WS eCAT/cMap that refer to the eCl@ss classification system can only be used by registered
eCl@ss users that have accepted the eCl@ss conditions of use. A registration is provided by a registered download via the eCl@ss
website, a cooperation agreement or the membership in the eCl@ss association. Further information is available at www.eclass.eu. The
eCl@ss conditions of use will be added to the annex of the final CWA.
8.
8.1 Introduction
Based on the results of chapter 7 (the identified actors and their business use cases, and the technical level
including roles, business objects and use cases of the platform), the strategic level can now be addressed and
an applicable synchronization process proposed. The current CWA intends to develop a process that is
manageable and practicable, in other words: realistically suitable for all involved parties, and not just a highly
elaborate (academic) process that will never be used due to its complexity or the lack of existing actors.
The proposed process has therefore been designed with the possible business models (see 8.6) in mind,
so as to keep it as practicable and simple as possible.
The objective is to develop a long-term strategy that embraces the need for interoperability: one that is
sustainable, clearly communicated and supported with education. Associated technical, business and
marketing plans will need to be developed in detail once implementation begins of the processes, which can
only be outlined at a high level here.
8.2 The basis for a sustainable process: GEN-ePDC and its adaptation for cMap
GEN-ePDC is already described in detail in CWA 15295:2005 and does not need to be described again.
The proposed ePDC maintenance procedure is focused on interoperability and is very similar to ISO
directive 1 (CWA 15295:2005, p.35).
It also deals with rules for the operation of a joint committee and refers to factors like members, voting
process, quality assurance, referencing mechanism, copyrights and translations.
All processes described in this CWA are based on the GEN-ePDC maintenance procedure. The different
processes are defined in the relevant parts of chapter 7 where different use cases were described. The basic
requirements and recommendations of GEN-ePDC are taken into account as they are still state of the art.
8.3 cMap processes
This section summarizes the processes of the cMap platform that were already partly mentioned in
the use cases of the platform described in chapter 7.2. The three basic processes identified are:
Process 1: Query for Mapping Result
Process 2: Apply Release Update Information
Process 3: Manage Mappings
The end-user has classified his/her product data, visits the cMap platform and sends a query on a
specific classification code of one classification system to receive the result in another specific
classification system
The cMap platform checks the mapping status and sends back the result (see 7.6)
NO MAP: the source class code could not be mapped to the target classification system. The process ends.
BLANK: no attempt has yet been made to map the source class code to the target classification system. The
process ends here, but the user could automatically be offered the possibility to request a new
mapping for the relevant class, i.e. he/she could be forwarded directly, in the role of mapping proposer, to
Process 3: Manage Mapping
MAPPED: the source class code has been mapped to the target classification system. The user decides
whether he/she accepts the mapping result. If yes, the query result is accepted and the process
ends. If not, the process ends here, but the user could automatically be offered the possibility to
request a new mapping for the relevant class, i.e. he/she could be forwarded directly, in the role of
mapping proposer, to Process 3: Manage Mapping
TO BE CHECKED: the source class code was mapped to the target classification system before an
update and something changed in the last update; the mapping has not been updated yet, so no guarantee
is given for its correctness. The user decides whether he/she accepts the mapping
result nonetheless. If yes, the query result is accepted although it still has to be checked, and the process
ends. If not, the process ends here, but the user could automatically be offered the possibility to
request a new mapping for the relevant class, i.e. he/she could be forwarded directly, in the role of
mapping proposer, to Process 3: Manage Mapping
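The query logic of Process 1 can be sketched in Python as follows. The store layout (a dictionary keyed by source system, source code and target system), the enum and the function name are hypothetical, and treating a code that is absent from the store as BLANK is an assumption. The third element of the result says whether the user may be forwarded as mapping proposer to Process 3.

```python
from enum import Enum


class MappingStatus(Enum):
    BLANK = "BLANK"
    NO_MAP = "NO MAP"
    MAPPED = "MAPPED"
    TO_BE_CHECKED = "TO BE CHECKED"


def query(store, source_system, code, target_system):
    """Process 1 sketch: return (status, target codes, may_request_new_mapping)."""
    status, targets = store.get((source_system, code, target_system),
                                (MappingStatus.BLANK, set()))
    if status is MappingStatus.NO_MAP:
        # the process simply ends; no forwarding to Process 3 is offered
        return status, set(), False
    # BLANK, MAPPED and TO BE CHECKED: if the result is missing or not accepted,
    # the user may continue as mapping proposer in Process 3
    return status, targets, True
```

For a store containing `("GPC", "G1", "CPV"): (MappingStatus.MAPPED, {"C1"})`, the query for `G1` returns `(MappingStatus.MAPPED, {"C1"}, True)`; accepting or rejecting the result remains the user's decision.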
A classification authority delivers a new release version including its release update information
The cMap Platform Provider uploads both the new release and the release update information to
automatically update the mapping to other classification systems
The cMap platform checks whether the current mapping status is MAPPED
If the mapping status is not MAPPED: no update can be done, the mapping status remains unchanged
(BLANK, NO MAP or TO BE CHECKED) and e-mail notifications are sent to the platform provider, the
responsible quality manager and interested mapping proposers. The process ends here. The
notification of all relevant bodies could be linked directly to Process 3 (Manage Mapping), so that they
act in the role of mapping proposer and can react directly to the status changes.
If the mapping status is MAPPED: the source class code has been mapped to the target classification system.
The system checks whether the update of the mapping can be done automatically (see 7.6.2.5)
If the update can be done automatically: set mapping status = MAPPED and send an e-mail notification
to the platform provider, the responsible quality manager and interested mapping proposers.
If the update cannot be done automatically: set mapping status = TO BE CHECKED and send an e-mail
notification to the platform provider, the responsible quality manager and interested mapping
proposers. The mapping proposer can then start Process 3: Manage Mapping
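The status handling of Process 2 can be sketched in Python for the mapping entries affected by a release update. The automatic-update check referenced in 7.6.2.5 is represented by a hypothetical predicate `can_update_automatically`, and the e-mail notifications by a hypothetical `notify` callback; statuses are plain strings as used in the text.

```python
def apply_release_update(entries, can_update_automatically, notify):
    """Process 2 sketch for the mapping entries affected by a release update.

    entries: dict key -> status ("MAPPED", "BLANK", "NO MAP", "TO BE CHECKED")
    can_update_automatically: hypothetical predicate standing in for the check in 7.6.2.5
    notify: hypothetical callback standing in for the e-mail notifications
    """
    for key, status in entries.items():
        if status != "MAPPED":
            notify(key, status)  # no update possible: status stays unchanged
            continue
        new_status = "MAPPED" if can_update_automatically(key) else "TO BE CHECKED"
        entries[key] = new_status  # only values change, so iterating is safe
        notify(key, new_status)
    return entries
```

Every affected entry triggers a notification; only entries that were MAPPED can change status, and they either stay MAPPED or drop to TO BE CHECKED for manual review in Process 3.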
mapping relation will be published in the cMap platform and can be queried.
REWORK: The mapping request has to be edited, the mapping request status is set to REWORK
and the mapping proposer is informed about the status change automatically. The Mapping
Proposer can edit the mapping request and start again at process step 2.
REJECT: The mapping request is considered to be incorrect and is rejected, the mapping request
status is set to REJECT and the mapping proposer is informed about the status change
automatically.
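The request statuses of Process 3 behave like a small state machine. In the Python sketch below, REWORK and REJECT are the statuses named in the text, whereas SUBMITTED and ACCEPTED are hypothetical names for the initial and the publishing status; the function name and the request dictionary are likewise illustrative. The sketch also enforces the four-eyes principle by refusing decisions taken by the proposer.

```python
# SUBMITTED and ACCEPTED are hypothetical names; REWORK and REJECT are from the text.
DECISIONS = {"ACCEPTED", "REWORK", "REJECT"}
ALLOWED = {
    "SUBMITTED": DECISIONS,    # the quality manager decides
    "REWORK": {"SUBMITTED"},   # the proposer edits and starts again at step 2
    "ACCEPTED": set(),         # the mapping relation is published
    "REJECT": set(),
}


def change_status(request, new_status, actor):
    """Move a mapping change request to new_status, enforcing the four-eyes principle."""
    if new_status not in ALLOWED[request["status"]]:
        raise ValueError(f"{request['status']} -> {new_status} is not allowed")
    if new_status in DECISIONS and actor == request["proposer"]:
        raise PermissionError("the proposer cannot decide on his/her own request")
    request["status"] = new_status
    # the log can drive the automatic notification of the mapping proposer
    request.setdefault("log", []).append((actor, new_status))
    return request
```

A request can thus cycle between SUBMITTED and REWORK any number of times before it ends in ACCEPTED or REJECT.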
8.4 Maintenance strategy
As mentioned above, all classification authorities will independently maintain their classification systems
according to the requirements of their specific user groups, their established processes and release roadmaps,
and based on their specific business models.
The mapping will be maintained in the separate database of the cMap platform. By establishing import
interfaces for the deliverables of the classification authorities, the mapping between CPV and the three
commercial systems can be maintained. Chapter 7 gives a detailed description of the cMap architecture,
user roles, their use cases and the processes.
The benefits of this synchronization process are:
it acknowledges different business needs being met by different classification systems
it avoids governance and change management conflicts between the classification
authorities
it enables translations between classifications to meet the business need
it provides cost effective access to the codes of other classification systems
it provides interoperability, integration and migration advantages
it satisfies user communities
it accelerates the return on investment
it improves scheme quality, ease of search, code assignment and coverage
The following risks are identified:
the setting up and ongoing maintenance of the cMap platform needs funding
as it is another distinct platform, users of classification systems do not have a single source
of information and yet another process is created for users to implement
mapping maintenance is complex and difficult; it involves a high workload but cannot guarantee
a high degree of correctness
there will always be a time lag between new releases of the classifications and the
maintained mapping to the new release
When implementing the cMap platform the following issues need to be defined in detail:
Business and governance rules and principles for the mapping and the overall process.
o The owner of these rules should be the administration authority
o The maintenance of the rules does not necessarily have to be in the hands of the
administration authority, but could as well be delegated to either the platform provider, the
release manager or the quality manager
Conflict resolution processes, to answer questions such as:
o If more than one mapping is requested for an item what will happen?
o Does the mapping proposer have the right and possibility to appeal?
o Does anyone else have the right to appeal?
o Who takes the final decision after an appeal or is the owner of the mapping?
o In which timeline are decisions to be taken?
o How is quality defined?
Change Management Processes, to answer questions such as:
o Who will manage the change request, i.e. govern and control that a certain change request
is being assessed?
o How will change requests be assessed?
o What criteria will be used to assess change requests?
o Who will be responsible for change request communication, internally and externally?
To implement the synchronization process, several different governance models are possible; they
are described in the following section.
8.5 Governance models
Taking into account the different roles that were identified (see section 7.4) and the resources that will be
needed to maintain the mapping, three different manageable governance models are described hereafter.
Each model puts a different actor in focus, and each would require a different business model to be developed. The three
governance models are distinguished by their respective driving actor:
Governance model 1: Community-driven
Governance model 2: Classification authority-driven
Governance model 3: Administration authority-driven
These governance models will be described in detail in the following section. For each model, the CWA will
answer the following questions: Who does? Who manages? Who pays?
This way, the contribution of users can be enhanced at a very early stage of their own maintenance
processes.
Funding the work of the administration authority, or of an authorized 3rd party (e.g. a contractor) doing the
maintenance work, is still an open financial question.
The CWA cannot finally solve this crucial question, but only contributes some ideas on what a business model
could be built on. Both the resources for the mapping maintenance and the platform administration are to be
financed entirely by the administration authority. Business model ideas are listed in section 8.6;
financial advantages and disadvantages will be summarized at the end of this chapter.
[Table: pros and cons of the governance models, including Governance model 2: Classification authority-driven and Governance model 3: Administration authority-driven]
8.6
The CWA only contributes some ideas on what kind of business models the operation of the cMap
platform and the maintenance work of the mapping could be built on.
There is only one way to find out if and how the cMap platform could be financed: analyse the market
requirements. First of all, a survey should be financed to ask users of classification systems, and especially the
service providers involved, whether they need the mapping, to what extent, and what they
would be willing to invest to contribute to and/or have access to the results. This will lead to an understanding of
how the cMap platform could be financed, who should participate and what the exact requirements are. Only
the benefits of the mapping for the users can justify the costs involved.
In order to draw more users' attention to cMap, awareness of the role, contribution and value of the
mapping tables has to be raised first. This could be done by:
showing the big picture and how classification mapping is used across different functions. This could
be done in conjunction with a training programme that can be financially driven
finding and promoting practical case studies that demonstrate the value of using standards
Money to finance the platform could be raised in different ways. Some ideas are described hereafter.
8.6.7 Comparison
The business models identified above are compared in the following figure:
Business
Model
Explanation
Pro
Con
Proposal
Fee
+ Well-proven strategy
Mapping
Query Fee
Membership
Connect access to
membership fee
+ Continuous financing
secured
Classification
Authority
Financed
+ classification authorities
represent the end-users
3rd Party
Financed
Public funding
cMap as a
Service
9.
This part comprises the conclusion and recommendations provided in each section. It will be consolidated at
the time of publication.
Annex A
(informative)
The SKOS platform
SKOS - Simple Knowledge Organization System - provides a model for expressing the basic structure and
content of concept schemes such as thesauri, classification schemes, subject heading lists, taxonomies,
folksonomies, and other similar types of controlled vocabulary. As an application of the Resource Description
Framework (RDF), SKOS allows concepts to be composed and published on the World Wide Web, linked
with data on the Web and integrated into other concept schemes.
In basic SKOS, conceptual resources (concepts) are identified with URIs, labelled with strings in one or more
natural languages, documented with various types of note, semantically related to each other in informal
hierarchies and association networks and aggregated into concept schemes.
In advanced SKOS, conceptual resources can be mapped across concept schemes and grouped into
labelled or ordered collections. Relationships can be specified between concept labels. Finally, the SKOS
vocabulary itself can be extended to suit the needs of particular communities of practice or combined with
other modelling vocabularies.
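To illustrate how a cMap mapping result could be published with SKOS, the following Python sketch serializes one mapping as Turtle using the SKOS mapping properties `skos:exactMatch` and `skos:narrowMatch` from the SKOS namespace. Choosing `exactMatch` for a 121 mapping and `narrowMatch` for a 12M mapping is an assumption (it treats each target class as narrower than the source class), and all URIs and the function name are hypothetical.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def mapping_to_turtle(source_uri, target_uris):
    """Serialize one mapping result as Turtle using SKOS mapping properties.

    Assumption: a 121 mapping becomes skos:exactMatch; a 12M mapping becomes
    skos:narrowMatch, i.e. each target class is treated as narrower than the source.
    """
    if not target_uris:
        raise ValueError("no target classes: nothing to serialize")
    prop = "skos:exactMatch" if len(target_uris) == 1 else "skos:narrowMatch"
    objects = ", ".join(f"<{uri}>" for uri in sorted(target_uris))
    return f"@prefix skos: <{SKOS_NS}> .\n<{source_uri}> {prop} {objects} .\n"


print(mapping_to_turtle("http://example.org/gpc/G1", {"http://example.org/cpv/C1"}))
```

A real platform would more likely use an RDF library and decide per mapping which SKOS property fits; the sketch only shows that the One-to-One and One-to-Many cases map naturally onto existing SKOS vocabulary.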
Annex B
(informative)
The Protégé Platform
Protégé is a free, open-source platform that provides a growing user community with a suite of tools to
construct domain models and knowledge-based applications with ontologies. At its core, Protégé implements
a rich set of knowledge-modelling structures and actions that support the creation, visualization and
manipulation of ontologies in various representation formats. Protégé can be customized to provide
domain-friendly support for creating knowledge models and entering data. Further, Protégé can be extended by way
of a plug-in architecture and a Java-based Application Programming Interface (API) for building
knowledge-based tools and applications.
An ontology describes the concepts and relationships that are important in a particular domain, providing a
vocabulary for that domain as well as a computerized specification of the meaning of terms used in the
vocabulary. Ontologies range from taxonomies and classifications, database schemas, to fully axiomatized
theories. In recent years, ontologies have been adopted in many business and scientific communities as a
way to share, reuse and process domain knowledge. Ontologies are now central to many applications such
as scientific knowledge portals, information management and integration systems, electronic commerce and
semantic web services.
The Protégé platform supports two main ways of modelling ontologies:
The Protégé-Frames editor enables users to build and populate ontologies that are frame-based, in
accordance with the Open Knowledge Base Connectivity protocol (OKBC). In this model, an
ontology consists of a set of classes organized in a subsumption hierarchy to represent a domain's
salient concepts, a set of slots associated to classes to describe their properties and relationships,
and a set of instances of those classes (individual exemplars of the concepts that hold specific
values for their properties).
The Protégé-OWL editor enables users to build ontologies for the Semantic Web, in particular in the
W3C's Web Ontology Language (OWL). "An OWL ontology may include descriptions of classes,
properties and their instances. Given such an ontology, the OWL formal semantics specifies how to
derive its logical consequences, i.e. facts not literally present in the ontology, but entailed by the
semantics. These entailments may be based on a single document or multiple distributed documents
that have been combined using defined OWL mechanisms" (see the OWL Web Ontology Language
Guide).
Protégé-OWL
The Protégé-OWL editor is an extension of Protégé that supports the Web Ontology Language (OWL). OWL
is the most recent development in standard ontology languages, endorsed by the World Wide Web
Consortium (W3C) to promote the Semantic Web vision.
The Protégé-OWL editor enables users to:
Load and save OWL and RDF ontologies.
Edit and visualize classes, properties, and SWRL rules.
Define logical class characteristics as OWL expressions.
Execute reasoners such as description logic classifiers.
Edit OWL individuals for Semantic Web markup.
Protégé-OWL's flexible architecture makes it easy to configure and extend the tool. Protégé-OWL is tightly
integrated with Jena and has an open-source Java API for the development of custom-tailored user interface
components or arbitrary Semantic Web services.
Figure 90: The OWLClasses view can be used to edit hierarchies of concepts
Annex C
(informative)
The Prompt Tool
PROMPT implements an algorithm that provides a semi-automatic approach to ontology merging and
alignment. PROMPT performs some tasks automatically and guides the user in performing other tasks for
which his/her intervention is required. It also determines possible inconsistencies in the state of the
ontology which result from the user's actions, and suggests ways to remedy these inconsistencies [2]. It is
based on a general knowledge model and can therefore be applied across various platforms.
39 http://protege.stanford.edu/plugins/prompt/PromptDiff.html
40 http://protege.stanford.edu/plugins/prompt/PromptDiff.html
41 http://protege.stanford.edu/plugins/prompt/Suggestions.html
42 http://protege.stanford.edu/plugins/prompt/merging.html
43 http://protege.stanford.edu/plugins/prompt/merging.html
44 http://protege.stanford.edu/plugins/prompt/operations.html
Annex D
(informative)
Mapping Tables
Mapping tables can be accessed at
ftp://ftp.cen.eu/PUBLIC/CWAs/eCAT-CC3P/CWA/cmap.zip
Bibliography
[1] D. Marques, A Survey of Recent Research in Ontology Mapping.
[2] N. F. Noy and M. A. Musen, PROMPT: Algorithm and Tool for Automated Ontology Merging and
Alignment. http://www.cs.uga.edu/~kochut/Teaching/8350/Papers/Ontologies/PROMPT.pdf
[3] N. F. Noy, Semantic Integration: A Survey Of Ontology Based Approaches. Stanford Medical
Informatics, Stanford University. Downloaded from
http://smiweb.stanford.edu/people/noy/papers/SigmodRecordReview.pdf on October 14, 2005.
[4] M. Ehrig and S. Staab, QOM - Quick Ontology Mapping. In S. A. McIlraith et al. (Eds.): ISWC
2004, LNCS 3298, pp. 683-697, 2004.
[5] P. Shvaiko and J. Euzenat, A Survey of Schema-based Matching Approaches. Technical
Report DIT-04-087, Informatica e Telecomunicazioni, University of Trento, 2004.
[6] J. Madhavan, P. A. Bernstein, P. Domingos and A. Halevy, Representing and reasoning about mappings
between domain models. In Eighteenth National Conference on Artificial Intelligence (AAAI 2002),
Edmonton, Canada, 2002.
[7] J. Euzenat and P. Shvaiko, Ontology matching. Springer, 2007.
[8] P. Shvaiko and J. Euzenat, Ontology matching: state of the art and future challenges. IEEE Transactions on
Knowledge and Data Engineering, 2012.
[9] M. K. Bergman, "Sources and Classification of Semantic Heterogeneity", AI3:::Adaptive Information
blog, June 6, 2006.
[10] A. Halevy, Why Your Data Won't Mix. ACM Queue vol. 3, no. 8, October 2005.
[11] C. Pluempitiwiriyawej and J. Hammer, A Classification Scheme for Semantic and
Schematic Heterogeneities in XML Data Sources. Technical Report TR00-004, University of Florida,
Gainesville, FL, 36 pp., September 2000.