
SLOVAK UNIVERSITY OF TECHNOLOGY

Faculty of Material Science and Technology in Trnava

DATA VISUALIZATION
Course Textbook

Jozef Vaský

TRNAVA 2007

Table of Contents

Table of Contents ............................................................................................................. 2


List of Figures................................................................................................................ 4
1 Introduction .............................................................................................................. 6
2 Historical Milestones.............................................................................................. 10
3 Data to Visualization .............................................................................................. 13
3.1 Classification of data ...................................................................................... 14
3.2 Data grids........................................................................................................ 21
3.3 Data file formats ............................................................................................. 24
4 Visualization Process.............................................................................................. 25
5 Scientific Visualization........................................................................................... 28
6 Information Visualization....................................................................................... 30
7 Process Visualization.............................................................................................. 33
7.1 SCADA systems ............................................................................................. 34
7.1.1 Hardware architecture............................................................................. 36
7.1.2 Software architecture.............................................................................. 41
7.1.3 SCADA and Internet .............................................................................. 43
8 Visualization Techniques and Algorithms ............................................................. 45
8.1 Visualization features ..................................................................................... 47
8.2 Classification of visualization techniques ...................................................... 50
8.3 Volume Visualization..................................................................................... 56
8.3.1 Surface based techniques........................................................................ 56
8.3.2 Volume based techniques ....................................................................... 63
8.4 Multidimensional visualization ...................................................................... 68
9 Visualization Systems and Tools............................................................................ 76
9.1 Key requirements............................................................................................ 77
9.2 The iconographic technique ........................................................................... 79
9.3 Using glyphs for visualization........................................................................ 81
9.4 Interaction techniques..................................................................................... 83
9.5 Contemporary Visualization Systems ............................................................ 83
9.5.1 Application Visualization System (AVS)............................................... 84
9.5.2 General Visualization System (GVS)..................................................... 86
9.5.3 COVISE.................................................................................................. 87
9.5.4 OpenDX.................................................................................................. 87
9.5.5 ParaView (Parallel Visualization Application) ....................................... 89
9.5.6 SCIRun ................................................................................................... 89
9.5.7 WebWinds .............................................................................................. 90
9.6 OpenGL .......................................................................................................... 91
9.6.1 OpenGL as a state machine .................................................................... 93
9.6.2 Execution model and processing pipeline .............................................. 93
9.6.3 Related libraries and utilities .................................................................. 96
9.6.4 OpenGL for Windows ............................................................................ 97
9.6.5 Describing points, lines, polygons and other geometric objects ............ 99
9.6.6 Geometric drawing primitives .............................................................. 101
9.6.7 Displaying points, lines, and polygons................................................. 102
9.7 The Visualization Toolkit (VTK) ................................................................. 103

9.7.1 VTK vs. OpenGL ................................................................................. 110


9.8 The Prefuse visualization toolkit .................................................................. 110
9.9 Ferret............................................................................................................. 112
9.10 The Persistence of Vision Ray-tracer (POV-Ray)........................................ 113
10 Bibliography ..................................................................................................... 115

List of Figures

Figure 1 From data to visualization sequence ................................................................ 14


Figure 2 Interactive computational steering ................................................................... 20
Figure 3 A taxonomy of grids (Caumon G. et al., 2005)................................................ 21
Figure 4 Taxonomy of grids by Speray and Kennon ..................................................... 23
Figure 5 Some data grids ................................................................................................ 24
Figure 6 The data visualization triangle ......................................................................... 26
Figure 7 Simplified visualization pipeline ..................................................................... 26
Figure 8 Visualization as a mapping process (Owen G. S., 1999) ................................. 27
Figure 9 Information Visualization Data State Reference Model (Chi Ed H.,1999) ..... 27
Figure 10 Typical SCADA HW architecture ................................................................. 37
Figure 11 Typical control system design and three-levels of SCADA .......................... 37
Figure 12 Typical RTU .................................................................................................. 38
Figure 13 The three-layer model with the addition of business systems and process
regulation........................................................................................................................ 40
Figure 14 Business system defines the overall policy of SCADA .................................. 40
Figure 15 Generic software architecture ........................................................................ 42
Figure 16 Internet SCADA architecture......................................................................... 44
Figure 17 Example of glyphs for scalar (L) and vector (R) data.................................... 48
Figure 18 Contours and isosurface ................................................................................. 49
Figure 19 Rubbersheet with / without animation ........................................................... 49
Figure 20 An example of a glyph that can display 12 D data ........................................ 49
Figure 21 Visualization using hedgehogs and oriented glyphs ....................................... 54
Figure 22 Visualization using warping........................................................................... 55
Figure 23 Visualization using streamlines...................................................................... 55
Figure 24 Tensor visualizations with ellipsoids (L) and hyperstreamlines (R).............. 55
Figure 25 The principle of contouring (object order)................................... 57
Figure 26 Connecting slices by contouring .................................................................... 58
Figure 27 2D grid ........................................................................................................... 59
Figure 28 All cases for marching square algorithm ....................................................... 59
Figure 29 Ambiguous cases............................................................................................ 60
Figure 30 The different configurations for marching cubes algorithm .......................... 62
Figure 31 Notation for cube index.................................................................................. 62
Figure 32 Comparison of surface (L) and direct volume rendering (R)......................... 63
Figure 33 Illustration of ray casting ............................................................................... 65
Figure 34 Illustration of splatting ................................................................................... 66
Figure 35 Illustration of shear-warp ............................................................................. 67
Figure 36 Illustration of DVR using 2D textures ............................................................ 68
Figure 37 Scatterplot matrix ........................................................................................... 70
Figure 38 Parallel coordinates ........................................................................................ 71
Figure 39 Example of glyphs ......................................................................................... 72
Figure 40 Direct visualization of a vector field.............................................................. 74
Figure 41 Integral objects as a basis for visualization.................................................... 75
Figure 42 Basic data flow within a visualization system ............................................... 76
Figure 43 An integrated visualization model (Owen G. S., 1999) ................................. 79
Figure 44 One member of the stick-figure icon family .................................................. 80

Figure 45 An overview of the L-systems-based iconographic visualization system ..... 81


Figure 46 A general data visualization environment...................................................... 82
Figure 47 Visual interface of AVS/Express ................................................................... 85
Figure 48 AVS/Express application ............................................................................... 86
Figure 49 Visual interface of Open/DX ......................................................................... 88
Figure 50 View from Open/DX...................................................................................... 88
Figure 51 OpenGL block diagram.................................................................................. 94
Figure 52 OpenGL processing pipeline.......................................................................... 95
Figure 53 Approximating Curves................................................................................. 101
Figure 54 Drawing a polygon or a set of points ........................................................... 102
Figure 55 Geometric primitive types............................................................................ 102
Figure 56 VTK system architecture ............................................................................. 104
Figure 57 The graphics model in VTK......................................................................... 105
Figure 58 The visualization model in VTK.................................................................. 106
Figure 59 VTK visualization pipeline .......................................................................... 108
Figure 60 The visualization pipeline of prefuse ........................................................... 111

List of Tables

Table 1 Specific techniques for InfoVis ......................................................................... 32


Table 2 Taxonomy of SciVis for scalar entities according to Ken Brodlie.................... 51
Table 3 Taxonomy of SciVis for vector entities according to Ken Brodlie ................... 52
Table 4 Variations of glyphs .......................................................................................... 73
Table 5 Geometric primitive names and meanings ...................................................... 102

Note:
The screenshots are copyright of their respective authors and are taken from the cited
publications. This textbook was produced as course documentation only and is therefore
provided exclusively to students attending the course as study material.

1 Introduction

Data visualization is currently a very active area of research and teaching. The
term unites primarily the established fields of scientific visualization and information
visualization. Many further fields of visualization have emerged recently, such as process
visualization, product visualization, software visualization, illustrative visualization,
uncertainty visualization, visual analytics, etc.
The success of data visualization is due to the basic idea behind it: the use of
computer-generated images to gain insight from data and its relationships. A second
premise is the utilization of the broad bandwidth of the human sensory system in
interpreting complex processes and simulations involving data sets from diverse
disciplines, as well as large collections of abstract data from many sources.
There are many situations in the real world where we try to understand phenomena,
data, and events using graphics. Some things, such as a route through a city, stock
market trends during a certain period, or the weather forecast, may be understood
better from graphics than from text. Graphical representation of data, compared to a
textual or tabular (in the case of numbers) one, takes advantage of human visual
perception, which is very powerful: graphics instantly convey large amounts of
information to our mind and allow us to recognize essential features and to carry out
important inferential processes. This is possible thanks to a series of identification
and recognition operations that our brain performs in an "automatic" way, without the
need to focus our attention or even be conscious of them. Perceptual tasks that can be
performed in a very short time span (typically between 200 and 250 milliseconds or
less) are called pre-attentive, since they occur without the intervention of
consciousness (Ware C., 2004).
Graphics use visual representations to help amplify cognition. They convey
information to our minds that allows us to search for patterns, recognize
relationships between data and make some inferences more easily.
Visualization is a link between data and information on one side and the most powerful
information-processing system, the human mind, on the other. It is a process of
transforming data, information, and knowledge into a visual form, exploiting people's
natural strengths in
rapid visual pattern recognition and understanding relationships. In our increasingly
information-based society, visualization research and development has fundamentally
changed the way we present and understand large complex data sets. The impact of
visualization has been widespread and fundamental, leading to new insights and more
efficient decision making.
Visualization is more than a method of computing. It is a process of transforming
information into a visual form enabling the user to observe the information. Visual
representation of information requires merging of data visualization methods, computer
graphics, design, and imagination.
Visualization is any technique for creating images, diagrams, or animations to
communicate a message. Visualization through visual imagery has been an effective
way to communicate both abstract and concrete ideas since the dawn of man. Examples
from history include cave paintings, Egyptian hieroglyphs, Greek geometry, and
Leonardo da Vinci's revolutionary methods of technical drawing for engineering and
scientific purposes.
The recent emphasis on visualization started in 1987 with the special issue of
Computer Graphics on Visualization in Scientific Computing. Since then there have
been several conferences and workshops, co-sponsored by the IEEE Computer Society
and ACM SIGGRAPH, devoted to the general topic, and special areas in the field, for
example volume visualization.
Visualization is not only a set of techniques to communicate the known as is
done with traditional graphics and maps, but a means, a mechanism, for exploring data
sets and structures, to generate ideas and explore alternatives. This has been defined as
'exploring data and information graphically, as a means of gaining understanding and
insight into the data' (Earnshaw and Wiseman, 1992; Brodlie et al., 1993), and is often
referred to as visual data analysis.
On the computer science side, it uses techniques of computer graphics and
imaging. Besides relying on visual computing and display it involves human beings.
Thus, we need to take into account human perceptual and cognitive capabilities, human
variations, and task characteristics.
It is important to differentiate between visualization and presentation graphics.
Presentation graphics is primarily concerned with the communication of information
and results in ways that are easily understood. In visualization, we seek to understand
the data. However, often the two terms are intertwined. Often one would like the ability
to do real-time visualization of data from any source. Thus our purview is information,
scientific, or engineering visualization and closely related problems such as
computational steering or multivariate analysis.
Visualization involves research in computer graphics, image processing, high
performance computing, and other areas. The same tools that are used for visualization
may be applied to animation, or multimedia presentation, for example.
The main reasons for visualization are the following: it compresses a lot of data into
one picture (data browsing); it can reveal correlations between different quantities in
both space and time; it can furnish new space-like structures besides the ones already
known from previous calculations; and it opens up the possibility of viewing the data
selectively and interactively in real time.
The application fields for visualization include: Engineering, Computational Fluid
Dynamics, Finite Element Analysis, Electronic Design Automation, Simulation,
Medical Imaging, Geospatial RF Propagation, Meteorology, Hydrology, Data Fusion,
Ground Water Modeling, Oil and Gas Exploration and Production, Finance, Data
Mining/OLAP and so on.
Among the hottest topics in the field of visualization are Illustrative Visualization
and Uncertainty Visualization.
Illustrative visualization is concerned with computer-supported interactive and
expressive visualization through abstractions, as in traditional illustrations. Illustrative
visualization uses several non-photorealistic rendering techniques: smart visibility,
silhouettes, hatching, tone shading, and focus-and-context techniques (context-preserving
volume rendering).
Visualized data often have dubious origins and quality. Different forms of
uncertainty and errors are also introduced as the data are derived, transformed,
interpolated, and finally rendered. In the absence of integrated presentation of data and
uncertainty, the analysis of the visualization is incomplete at best and often leads to
inaccurate or incorrect conclusions. For example, environmental data have inherent
uncertainty which is often ignored in visualization: meteorological stations measure
wind with good accuracy, but winds are often averaged over minutes or hours.
Uncertainty visualization strives to present data together with auxiliary uncertainty
information. These visualizations present a more complete and accurate rendition of
data to analyze.
As a subject in computer science, data visualization is the use of interactive,
sensory representations, typically visual, of abstract data to reinforce cognition,
hypothesis building and reasoning.
There are many fields of visualization e.g. information visualization, scientific
visualization, process visualization, product visualization, software visualization, visual
analytics etc.
Product Visualization involves visualization software technology for the viewing
and manipulation of 3D models, technical drawing and other related documentation of
manufactured components and large assemblies of products. It is a key part of Product
Lifecycle Management. Product visualization software typically provides high levels of
photorealism so that a product can be viewed before it is actually manufactured. This
supports functions ranging from design and styling to sales and marketing. Technical
visualization is an important aspect of product development. Originally technical
drawings were made by hand, but with the rise of advanced computer graphics the
drawing board has been replaced by computer-aided design (CAD). CAD-drawings and
models have several advantages over hand-made drawings such as the possibility of 3-D
modeling, rapid prototyping and simulation.
Software Visualization is a branch of data visualization dealing with the
visualization of software objects (algorithms, programs, parallel processes etc.).
Software visualization is the use of computer graphics and animation to help illustrate
and present computer programs, processes, and algorithms. Software visualization
systems can be used in teaching to help students understand how algorithms work, and
they can be used in program development as a way to help programmers understand
their code better.
Visual analytics focuses on human interaction with visualization systems as part
of a larger process of data analysis. Visual analytics has been defined as "the science of
analytical reasoning supported by the interactive visual interface". Its focus is on human
information discourse (interaction) within massive, dynamically changing information
spaces. Visual analytics research concentrates on support for perceptual and cognitive
operations that enable users to detect the expected and discover the unexpected in
complex information spaces. Technologies resulting from visual analytics find their
application in almost all fields.
Knowledge visualization is the use of visual representations to transfer
knowledge between at least two persons (Burkhard and Meier, 2004). It aims to improve
the transfer of knowledge by using computer and non-computer based visualization
methods complementarily. Examples of such visual formats are sketches, diagrams,
images, objects, interactive visualizations, information visualization applications and
imaginary visualizations as in stories. While information visualization concentrates on
the use of computer-supported tools to derive new insights, knowledge visualization
focuses on transferring insights and creating new knowledge in groups. Beyond the
mere transfer of facts, knowledge visualization aims to further transfer insights,
experiences, attitudes, values, expectations, perspectives, opinions, and predictions by
using various complementary visualizations.

2 Historical Milestones

As a discipline, visualization is still emerging, tracking the revolution in
networking and computing. It is being practiced by skilled practitioners who are hand-
crafting the current systems. An emerging discipline progresses through four stages. It
starts as a craft, practiced by artisans using heuristics. Later, researchers formulate
scientific principles and theories to gain insights about the processes. Eventually
engineers refine these principles and insights into production rules. Finally, the
technology becomes widely available. For information visualization, many of these
stages are happening in parallel.
Computer Graphics has been used from the beginning of the discipline to study
scientific problems. However, in its early days the lack of graphics power often limited
its usefulness. The recent emphasis on visualization started in 1987 with the special
issue of Computer Graphics on Visualization in Scientific Computing. Since then there have
been several conferences and workshops, co-sponsored by the IEEE and ACM
SIGGRAPH.
Visualization, in the presentation sense, is not a new phenomenon. It has been
used in maps, scientific drawings, and data plots for over a thousand years. Examples of
this are the map of China from 1137 AD and the famous map of Napoleon's invasion of
Russia in 1812 by Charles Joseph Minard. Most of the concepts learned in devising these
images carry over in a straightforward manner to computer visualization.
Visualization has previously been defined as the "formation of visual images; the
act or process of interpreting in visual terms or of putting into visual form"
(McCormick B. H., 1987).
Here are some milestones in the development of the visualization discipline.

Mathematical graphic
The French engineer, Charles Minard (1781-1870), illustrated the disastrous
result of Napoleon's failed Russian campaign of 1812. The graph shows the size of the
army by the width of the band across the map of the campaign on its outward and return
legs, with temperature on the retreat shown on the line graph at the bottom. The images
were drawn by a Mathematical function.

Playfair's charts
William Playfair (1759-1823) is generally viewed as the inventor of most of the
common graphical forms used to display data: line plots, bar chart and pie chart. His
The Commercial and Political Atlas, published in 1786, contained a number of
interesting time-series charts.

Florence Nightingale's Coxcomb diagrams


Florence Nightingale is remembered as the mother of modern nursing. But few
realize that her place in history is at least partly linked to her use, following William
Farr, Playfair and others, of graphical methods to convey complex statistical
information dramatically to a broad audience. After witnessing deplorable sanitary
conditions in the Crimea, she wrote Notes on Matters Affecting the Health, Efficiency
and Hospital Administration of the British Army (1858). The text is complemented by
several graphs, which she called "Coxcombs".

Dr. John Snow's map


The most famous early example of mapping epidemiological data is Dr. John
Snow's map of deaths from the 1854 cholera outbreak in London, shown in relation to the
locations of public water pumps. Snow (1813 - 1858) observed that cholera occurred
almost entirely among those who lived near (and drank from) the Broad Street water
pump. He had the handle of the contaminated pump removed, ending the neighborhood
epidemic which had taken more than 500 lives.

Album de statistique graphique


By the mid 1800s, many new forms of statistical graphics were being used to
display data of economic and national interest in England, France, and elsewhere. Many
of the graphs were designed based on graphical innovations by Charles Minard.

Moseley's X-rays and the concept of atomic number (The Monogram)


In 1913-14 Henry Moseley investigated the characteristic frequencies of X-rays
produced by bombarding each of the elements in turn with high-energy electrons, and
plotted the results. He discovered that if the serial numbers of the elements were plotted
against the square root of the frequencies in the X-ray spectra emitted by these elements,
all the points fell on a series of straight lines. Moseley's graph represents an outstanding
piece of numerical and graphical work. He noted that there were slight departures from
linearity which he could not explain, nor could he explain the multiple lines at the top
and bottom of the figure. The explanation came later with the discovery of the spin of the
electron.

From 2D to 3D Representation: Stereogram and Contour plot


By the end of the 19th century, as more statistical data became available, the
limitations of the 2D plane for the representation of data were becoming more
apparent. Several systems for representing 3D data were developed between 1869 and
1880. The figure (showing the population of Sweden from 1750-1875 by age groups) by
Luigi Perozzo, from the Annali di Statistica, 1880, is probably the first example of a
stereogram. Perozzo credits Gustav Zeuner (1869) and Lewin with the invention of the
(axonometric projection) representation in three dimensions, and also describes other
related forms of perspective drawings by Becker and by Lexis, shown in his
construction illustration. Perozzo's figure is also notable for being printed in color in a
statistics journal, and in a way which enhances the perception of depth.
A second approach to representing multivariate data arose from the use of
contour maps in physical geography showing surface elevation (first published in 1752
by Buache), which became common in the early 19th century. It was not until 1843,
however, that this idea was applied to data visualization, when Léon Lalanne
constructed the first contour plot, showing the mean temperature, by hour of the day and
by month at Halle. Lalanne's data formed a regularly-spaced grid, and it was fairly easy
to determine the isolines of constant temperature. Vauthier generalized the idea to three-
way data with arbitrary (x,y) values in his map of the population density of Paris.
Galton later cited this as one of the inspirations for his normal correlation surface.

Dynamic graphics
Among the pioneers of dynamic graphics and the graphical representation of
movement and dynamic phenomena was Etienne-Jules Marey (1830-1906). Marey used
and developed many devices to record and visualize motion and dynamic phenomena:
walking, running, jumping, falling of humans, horses, cats...; heart rate, pulse rate,
breathing, etc.

3 Data to Visualization

One of the major problems in science and engineering is the massive amount of
data that is collected or generated. Visualization systems must be able to enter, store,
and retrieve this data. The data also must be converted into one or more internal formats
for the visualization system. Visualization, on the other hand, uses the human perceptual
system to extract meaning from the data, focus attention, and reveal structure and
patterns.
The term data itself can take a wide variety of forms. We can distinguish
between data that have a physical correspondence and are closely related to mathematical
structures and models (e.g. the airflow around the wing of an airplane), and data that are
more abstract in nature (e.g. stock market fluctuations).
In general, the term data is interpreted as:
1. Factual information, especially information organized for analysis or used to
reason or make decisions.
2. In computer science: numerical or other information represented in a form suitable
for processing by computer.
3. Values derived from scientific experiments.
4. Plural of datum.
Data are stored in computer memory in some form of an internal representation.
The criteria for an internal representation are:

• Compact - the representation must use memory efficiently.
For example: unstructured schemes, sparse matrices, shared vertices (nodes); a
small shared-vertex example is sketched in code below.

• Efficient - a method of storage which is computationally accessible, easy to
retrieve, and stores in constant time.

• Mappable - a native representation, straightforward conversion of the data,
no lost information, a representation using graphics primitives. This is especially
useful for interactive display with minimal coverage and has a manageable
number of options.
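
As a hedged illustration of the "compact" criterion above, the following sketch
(hypothetical Python, not taken from any particular system) stores a small triangle
mesh with shared vertices: each vertex coordinate is stored once and every triangle
refers to its corners by index, so no coordinate is duplicated.

# A minimal shared-vertex (indexed) mesh: vertices are stored once and
# triangles refer to them by index, which saves memory compared with
# repeating the coordinates of every corner of every triangle.
vertices = [
    (0.0, 0.0, 0.0),  # vertex 0
    (1.0, 0.0, 0.0),  # vertex 1
    (1.0, 1.0, 0.0),  # vertex 2
    (0.0, 1.0, 0.0),  # vertex 3
]
triangles = [
    (0, 1, 2),  # first triangle references vertices 0, 1, 2
    (0, 2, 3),  # second triangle shares the edge (0, 2) with the first
]

# World coordinates of the corners of the second triangle:
corners = [vertices[i] for i in triangles[1]]
print(corners)  # [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]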

Figure 1 From data to visualization sequence

3.1 Classification of data


Data are plain facts. When data are processed, organized, structured or presented
in a given context so as to make them useful, they are called information. It is not
enough to have data (such as statistics on the economy). Data themselves are fairly
useless. But when these data are interpreted and processed to determine their true
meaning, they become useful and can be called information.
Here are some different views to the data:
1. Distinct pieces of information usually formatted in a special way. All software
is divided into two general categories: data and programs. Programs are
collections of instructions for manipulating data.

2. Data can exist in a variety of forms - as numbers or text on pieces of paper, as
bits and bytes stored in electronic memory, or as facts stored in a person's
mind.
3. Strictly speaking, data is the plural of datum, a single piece of information. In
practice, however, people use data as both the singular and plural form of the
word.
4. The term data is often used to distinguish binary machine-readable
information from textual human-readable information. For example, some
applications make a distinction between data files (files that contain binary
data) and text files (files that contain ASCII data).
5. In database management systems, data files are the files that store the
database information, whereas other files, such as index files and data
dictionaries, store administrative information, known as metadata.
The process begins with data collection, followed by some form of problem
decomposition (we cannot use or handle all the data), typically centered around the
differences in the data and the data distribution. In many cases this is followed by some
data reduction. Also part of this first pass is the issue of the required turnaround time,
which will lead either to a scaling back of the information or to some hierarchical
arrangement of the data. Understanding the data is a two-stage process:
1. Exploratory - searching the data using all available tools.
2. Confirmatory - evaluating the strength of a thesis based on the data.
The basic data types may correspond to different levels of measurement or
method for acquiring data. Using this idea, we can characterize data in two ways (this
will not be the last type of classification). Data can be:
1. Categorical - data values with labels.
• Nominal
There is no sense of order, either implicit or explicit. A scheme which allows
differentiating data. Used to state "this item is different from that one". For
example: county names, land use (such as residential), ethnicity, and tissue
type.

• Ordinal
Ordered in a particular sequence. A preference or some type of ranking (such
as height). Used to state "this item is bigger than that item" or "the difference
between these two items is the same as the difference between those other two
items". For example: intervals with a constant step size.
This can also include:

− Simple integer
Some constant step size for the values. For example: the temperature in
degrees Fahrenheit or wind speed.
− Meaningful zero
Zero is often ignored or counted as missing data. For example: temperature
in degrees Kelvin or a student's income.
The numerical data, which consists just of numerical values, can be:
• Continuous - a representation with some arbitrary precision, such as real
numbers or complex numbers.

• Discrete - typically integer numbers.


Discrete measurements of some continuous phenomenon provide the basis for a
model of that phenomenon. The same data can be visualized either discretely
(as unconnected points) or continuously (using straight line segments or by constructing a
model); a small code sketch follows the list below. Issues for discrete data are:
• Interpolation

• Aggregation

• Smoothing

• Simplification
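
The sketch below (hypothetical Python using NumPy, assumed available) shows the same
discrete samples treated two ways: interpolated into a continuous piecewise-linear model
and smoothed by a simple moving average.

import numpy as np

# Discrete measurements of a continuous phenomenon (e.g. temperature over time).
t_samples = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
temp_samples = np.array([20.1, 21.4, 23.0, 22.2, 20.8])

# Continuous view: linear interpolation between the discrete samples.
t_fine = np.linspace(0.0, 4.0, 41)
temp_interp = np.interp(t_fine, t_samples, temp_samples)

# Smoothing: a three-point moving average of the raw samples.
kernel = np.ones(3) / 3.0
temp_smooth = np.convolve(temp_samples, kernel, mode="same")

print(temp_interp[:5])   # densely sampled, piecewise-linear model
print(temp_smooth)       # smoothed version of the original samples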
In most cases of physical systems, for which there is an underlying continuous
phenomenon, the question of data quality is important. This question can be expressed as
follows:
• Are the measurements, or the data collected from a simulation, sufficient, or
are there missing data? In general, does the grid (computational or experimental)
give rise to sampling artifacts?

• How accurate is the simulation in terms of representing the phenomena under
study, and how accurate are the measuring instruments? How certain or
uncertain are the data?

• Does the visualization technique lead to any representation errors?


Classification of the data can help with these issues. Most importantly, only some
tools work with certain data classifications and some file formats. No visualization
technique works across the whole range of data types and visualization goals, and there
is growing evidence that there cannot be one. Each technique is good for some tasks on
some data types. There is a difference between multivariate (a number of independent
variables) and multidimensional. The first term refers to the data and the second refers
to the geometry. Some examples of multidimensional data are: finance, sound and
music, images, video, medical imagery, document retrieval, etc.
We can classify the data used for visualization into the following groups (a small
data-structure sketch in code follows the list):
• External and internal data;
External data is data that is external to the visualization system while internal
data is data that is internal to the system. A visualization system must be able
to import data from many different sources and in many different formats. It
must also have the capability to export data.

• Original and derived data;


We need to differentiate between the original experimental or computed data
and any new data that is derived from the original. Especially for experimental
data, the original will likely have noise and need to be filtered. A record
should be kept of how the derived data was obtained and what operations have
been applied to it. For example, in image processing, the record of what image
operations have been applied might be kept with the image, or in an associated
file.

• Geometric data;
Geometric data is used to represent the shape of objects. This may be in the
form of polygons, surface patches, or coordinates.

• Property Data;
Property Data is non-geometric data that represents specific properties of the
objects or properties measured at certain coordinates, such as temperature,
pressure, electron density, etc.

• Metadata;
Metadata is data about the data. It can include the following:

− the type and structure of the associated dataset


− attributes of the dataset such as units, scales, etc.
− comments and records of the operations performed on the dataset
− color tables or any other mapping attributes
• Command Data;
A system may be command line driven or, more generally, an interactive
mouse driven system. It is useful to have some method of automating certain
repetitive tasks, perhaps as a script file, or an automatic recording and
playback system.

• Control Data;
Control Data is used to specify the parameters needed for proper execution of
the modules in the system. This can be stored and read into the system each
time, as for example a color map.

• Data Relationships;
There may be certain relations in the data. This can be internal to a dataset or
between different data sets. An example of internally linked data is a
molecular structure where certain atoms are bonded together.
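
As a hedged sketch (hypothetical Python, not the API of any real visualization system),
the record below bundles geometric data, property data and metadata, and keeps a
processing history so that derived data can be distinguished from the original.

from dataclasses import dataclass, field

@dataclass
class Dataset:
    """A toy internal record bundling the data groups described above."""
    name: str
    geometry: list          # e.g. coordinates or polygons describing shape
    properties: dict        # non-geometric values, e.g. {"temperature": [...]}
    metadata: dict = field(default_factory=dict)  # units, scales, processing history

# Original experimental data ...
raw = Dataset(
    name="probe_scan_raw",
    geometry=[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)],
    properties={"temperature": [20.3, 21.1, 22.7]},
    metadata={"units": {"temperature": "degC"}, "history": ["acquired"]},
)

# ... and data derived from it, with a record of the operations applied.
derived = Dataset(
    name="probe_scan_smoothed",
    geometry=raw.geometry,
    properties={"temperature": [20.5, 21.4, 22.2]},
    metadata={**raw.metadata, "history": raw.metadata["history"] + ["smoothed, window=3"]},
)
print(derived.metadata["history"])  # ['acquired', 'smoothed, window=3']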
Volume data are 3D geometric entities that may have information inside them,
might not consist of surfaces and edges, or might be too voluminous to be represented
geometrically.
Multidimensional data (also called multivariate or n-dimensional) consists of
some number of points, each of which is defined by an n-vector of values.
Mathematicians consider dimension as the number of independent variables in an
algebraic equation. Engineers take dimension as measurements of any sort (breadth,
length, height, and thickness).
Special types of data represent fields. Data fields occur in many typical physics
or engineering problems; in these contexts they usually appear as the solution to some
partial differential equation. A typical example would be the velocity field of a liquid.
Depending on the type of object described by the field we distinguish the following
kinds, illustrated by a small code sketch after the list:

• Scalar - a field which associates a single number with each point in space;
examples are temperature, pressure in a gas or liquid, concentration, a wave
function, etc.

• Vector - a field which associates a vector with each point in space; examples
are the velocity field in a gas or liquid, the magnetic field, the electric field, etc.

• Tensor - a field which associates a tensor (matrix) with each point in space;
the typical example is the stress (and strain) tensor in a solid.
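
A hedged illustration (hypothetical Python with NumPy) of how the three kinds of fields
can be stored on a small regular grid: one number, one 3-vector, or one 3x3 matrix per
grid point.

import numpy as np

nx, ny, nz = 8, 8, 8                            # a small regular grid

scalar_field = np.zeros((nx, ny, nz))           # one number per point, e.g. temperature
vector_field = np.zeros((nx, ny, nz, 3))        # one 3-vector per point, e.g. velocity
tensor_field = np.zeros((nx, ny, nz, 3, 3))     # one 3x3 matrix per point, e.g. stress

# Fill the vector field with a simple rotational flow around the z axis.
x, y, z = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz), indexing="ij")
vector_field[..., 0] = -(y - ny / 2)
vector_field[..., 1] = x - nx / 2

# Derive a scalar field (the speed) from the vector field.
scalar_field = np.linalg.norm(vector_field, axis=-1)
print(scalar_field.shape, vector_field.shape, tensor_field.shape)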
Data sources can be external or internal to the visualization system. External
sources would be data collected from experimental measurement or generated by a
simulation external to the system. Internal sources would be data that was passed
between modules, generated by simulations internal to the system, or stored and then
retrieved from the system. Visualization systems should be able to support the real time
collection, storage, and analysis of data.
Data can be acquired by measurement of real processes or by simulation on a model of
the process. The following levels of visualization have been identified according to the
possibility of controlling the simulation process (Nielson G.M., 1990):
• Post-processing:

− Movie mode - consists of acquiring the data, producing an animation tape of
the data and then analyzing the data.
− Interactive post-processing - introduces user interaction in that the user is
able to interactively control the visualization parameters.
• Tracking mode - consists of acquiring the data, visualizing it, and observing
it directly on the computer.

• Interactive steering - allows the user to interactively control both the actual
computation of the data, e.g., by changing parameters as the computation
progresses, and the visualization of the data.

Figure 2 Interactive computational steering

Data management is a very important issue in the visualization field. As a large
amount of raw data is collected or computed in the data acquisition stage, and derived
data, reports, images, etc. are then generated, the management of data becomes crucial.
Database management systems (DBMS) are used to store large amounts of data. Most
DBMS have been developed for commercial use and are not very appropriate for data
visualization. One solution could be Object-Oriented DBMS. In these systems, an object
could consist of all the different types of data that are associated with a particular
experiment or set of experiments. Distributed (over a network) OO-DBMS are also
becoming available.
Before or during the analysis of raw data, the user will usually want to perform
some type of transformation on the data. There are many types of data transformations,
as discussed below (two of them are sketched in code after the list):
• Data normalization is used to scale data to a range of values, e.g. from 0.0 to
1.0. This may be necessary so that a graphics or mathematics package can be
used to operate on the data.

• Filtering and smoothing of data is frequently necessary since raw data that
comes from experimental measurement usually has noise. The noise may be
introduced by the measurement process or it may be integral to the data. The
field of Digital Signal Processing (DSP) is the study of these types of
techniques.

• Grid rezoning is the mapping of data from one type of grid to another, for
example, to a rectangular grid of pixels for display.

• Coordinate transformations may be performed on the data to facilitate the data
analysis or display of results. Examples of this might be to transform to a
logarithmic scale, or from Cartesian to polar or spherical coordinates.
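
Two of these transformations are sketched below (hypothetical Python with NumPy,
assumed available): normalization of raw values to the range 0.0-1.0 and a
Cartesian-to-polar coordinate transformation.

import numpy as np

def normalize(values):
    # Data normalization: scale raw data linearly to the range [0.0, 1.0].
    values = np.asarray(values, dtype=float)
    vmin, vmax = values.min(), values.max()
    return (values - vmin) / (vmax - vmin)

def cartesian_to_polar(x, y):
    # Coordinate transformation: Cartesian (x, y) -> polar (r, theta).
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)
    return r, theta

raw = [12.0, 47.5, 33.1, 8.9]
print(normalize(raw))                 # values mapped into [0, 1]
print(cartesian_to_polar(1.0, 1.0))   # (1.414..., 0.785...)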

3.2 Data grids


Applications producing vector data usually present them as a field of vector
quantities aligned to grid vertices. There are of course many types of grids with
different properties. Grid type is an important property of each vector field data set as it
significantly influences the effectiveness of some algorithms.
A grid is generally defined by three components:
1. The geometry, which consists of a set of n vertices, each defined by a
coordinate vector (x,y,z) in 3D space.
2. The topological model, which describes how these vertices are connected to
each other, defining a partition of the space into polyhedral 3-cells (or more
simply cells); each cell is defined by its faces, each face by its edges, and each
edge by its bounding vertices. As opposed to a structured grid, in which
connection patterns are repeated as in a crystal lattice, an unstructured grid must
have its topology explicitly defined. A zoo grid is an unstructured grid
containing only a small number of cell types (Figure 3).
3. The property model, which assigns property values to the vertices, the edges,
the faces or the cells of the grid (e.g., pressure, concentration).

Figure 3 A taxonomy of grids (Caumon G. et al., 2005)

Another taxonomy of data grids was created by Speray and Kennon. It is presented
in order of increasing generality (and complexity). Each type is given with the indexing
required to find a point's world coordinates. Neighboring points delineate sub-volumes
known as cells or elements. They classified the following types of data grids:
Cartesian (i, j, k )
This is typically a 3D matrix with no intended world coordinates, so subscripts
map identically to space. If the cells are small and numerous (as to be almost atomic in
practice, like 2D pixels), then it is known as a voxel grid; however, the term is often
loosely applied. The geometric representation of this grid is depicted in Figure 4a.
Regular (i * dx, j * dy, k * dz )
Cells are identical rectangular prisms (bricks) aligned with the axes. See Figure 4b.
Rectilinear (x[i], y[j], z[k] )
Distances between points along an axis are arbitrary. Cells are still rectangular
prisms and axis - aligned. See Figure 4c.
Structured (x[i, j, k] , y[i, j, k], z[i, j, k] )
This type, also known as curvilinear, allows non-boxy volumes to be gridded.
Logically, it is a Cartesian grid which is subjected to non-linear transformations so as to
fill a volume or wrap around an object. Cells are hexahedra (warped bricks). These
grids are commonly used in computational fluid dynamics (CFD). See Figure 4d.
Block structured (xb[i, j, k], yb[i, j, k], zb[i, j, k] )
Recognizing the convenience of structured grids, but the limited range of
topologies that they handle, researchers may choose to use several structured grids
(blocks) and sew them together to fill the volume of interest. The grid is depicted in
Figure 4e.
Unstructured (x[i], y[i], z[i])
Unlike the previous types, where connectivity is implicit, there is no geometric
information implied by this list of points, and edge/face/cell connectivity must be
supplied in some form. Cells may be tetrahedra, hexahedra, prisms, pyramids, etc., and
they may be linear (straight edges, planar faces) or higher-order (e.g. cubic edges, with
two interior points on each edge). Tetrahedral grids are particularly useful because they
allow better boundary fitting, can be built automatically, and are often simpler to work
with graphically. Unstructured grids are standard in finite-element analysis (FEA) and
finite-volume analysis (FVA) and are becoming common in CFD.

Figure 4 Taxonomy of grids by Speray and Kennon

A Cartesian grid is a special case where the elements are unit squares or unit
cubes, and the vertices are integer points.
A rectilinear grid is a tessellation by rectangles or parallelepipeds that are not all
congruent to each other. The cells may still be indexed by integers as above, but the
mapping from indexes to vertex coordinates is less uniform than in a regular grid. An
example of a rectilinear grid that is not regular appears on logarithmic scale graph
paper.
A curvilinear grid or structured grid is a grid with the same combinatorial
structure as a regular grid, in which the cells are quadrilaterals or cuboids rather than
rectangles or rectangular parallelepipeds.
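
The indexing formulas of the taxonomy above can be summarized in code. The helpers
below are a hedged sketch (hypothetical Python with NumPy), converting an (i, j, k)
index to world coordinates for a regular, a rectilinear and a structured (curvilinear)
grid.

import numpy as np

def regular_point(i, j, k, dx, dy, dz):
    # Regular grid: world coordinates are the indices times constant spacings.
    return (i * dx, j * dy, k * dz)

def rectilinear_point(i, j, k, x, y, z):
    # Rectilinear grid: arbitrary spacing along each axis, stored in 1D arrays.
    return (x[i], y[j], z[k])

def structured_point(i, j, k, x, y, z):
    # Structured (curvilinear) grid: full 3D coordinate arrays, one per axis.
    return (x[i, j, k], y[i, j, k], z[i, j, k])

# Example: a rectilinear grid with logarithmic spacing along the x axis.
x = np.logspace(0, 2, 5)            # 1, ~3.16, 10, ~31.6, 100
y = np.linspace(0.0, 1.0, 4)
z = np.linspace(0.0, 2.0, 3)
print(regular_point(2, 1, 0, dx=0.5, dy=0.5, dz=1.0))   # (1.0, 0.5, 0.0)
print(rectilinear_point(2, 1, 0, x, y, z))              # (10.0, 0.333..., 0.0)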

Figure 5 Some data grids: Cartesian; regular (uniform); rectilinear; structured (curvilinear)

3.3 Data file formats


A different but related issue is file format. Most output files are meant to be
human readable. There are many attempts to standardize output from simulations -
without much success. There are particular domain areas where there have been some
successes, namely CFD (Computational Fluid Dynamics).
Hierarchical Data Format (HDF) is a technology suite that makes it possible to
manage extremely large and complex data collections. The HDF5 technology suite
includes the following (a minimal usage sketch in code follows the list):
• A versatile data model that can represent very complex data objects and a wide
variety of metadata.

• A completely portable file format with no limit on the number or size of data
objects in the collection.

• A software library that runs on a range of computational platforms, from
laptops to massively parallel systems, and implements a high-level API with
C, C++, Fortran 90, and Java interfaces.

• A rich set of integrated performance features that allow for access time and
storage space optimizations.

• Tools and applications for managing, manipulating, viewing, and analyzing
the data in the collection.
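
A minimal sketch of using HDF5 from Python through the widely used h5py binding
(assumed to be installed; the file and dataset names are invented for illustration):

import numpy as np
import h5py  # a common Python binding for the HDF5 library (assumed installed)

data = np.random.rand(64, 64)

# Write a dataset together with some metadata attributes.
with h5py.File("example.h5", "w") as f:
    dset = f.create_dataset("temperature", data=data)
    dset.attrs["units"] = "K"
    dset.attrs["description"] = "synthetic test field"

# Read it back.
with h5py.File("example.h5", "r") as f:
    temp = f["temperature"][...]
    print(temp.shape, f["temperature"].attrs["units"])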
Network Common Data Format (NetCDF) is a set of data formats, programming
interfaces, and software libraries that help read and write scientific data files. NetCDF
was developed and is maintained at Unidata, which is funded primarily by the National
Science Foundation and is one of eight programs in the University Corporation for
Atmospheric Research (UCAR) Office of Programs (UOP). A netCDF dataset contains dimensions,
variables, and attributes, which all have both a name and an ID number by which they
are identified. These components can be used together to capture the meaning of data
and relations among data fields in an array-oriented dataset. The netCDF library allows
simultaneous access to multiple netCDF datasets which are identified by dataset ID
numbers, in addition to ordinary file names.
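
For comparison, a minimal sketch of the dimension / variable / attribute structure
described above, using the netCDF4 Python library (assumed installed; the file name and
variable names are invented):

import numpy as np
from netCDF4 import Dataset  # Python interface to the netCDF library (assumed installed)

with Dataset("example.nc", "w") as ds:
    # Dimensions, variables and attributes are the three netCDF building blocks.
    ds.createDimension("time", 4)
    temp = ds.createVariable("temperature", "f4", ("time",))
    temp[:] = np.array([280.1, 281.3, 282.0, 281.7], dtype="f4")
    temp.units = "K"
    ds.title = "synthetic example dataset"

with Dataset("example.nc", "r") as ds:
    print(ds.variables["temperature"][:], ds.variables["temperature"].units)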
The currently accepted storage method for most scientific data is the Relational
Database Management System. This is the format used by many commercial databases,
such as Oracle. Data can be extracted using Structured Query Language (SQL)
commands.

4 Visualization Process

Models are important for visualization: they help developers and users to
understand the visualization process; to follow the connections and the data paths
through the system; and to reference and compare the functionality and limitations
of different systems or techniques. Display models specifically classify the data by what
type of output can be created. Bertin (1962), for example, described a symbolic reference
model that he used to describe images and displays.
There are four logical operations in the visualization process (a minimal pipeline
sketch in code follows the list):
• Data Selection - choosing a portion of the data to analyze, that is, extracting a part
of the data.

• Data Enrichment - interpolating or approximating the raw data, effectively
model creation.

• Mapping - converting the data into a geometric representation. The mapping
stage is where various visualization techniques are applied to the enriched
data. There is a great range of these techniques to choose from.

• Rendering - assigning the visual properties to the geometrical objects (e.g.
color, texture) and creating the display.
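
A minimal sketch of the four operations chained as plain Python functions (a
hypothetical illustration, not the pipeline of any particular system; NumPy is assumed
available):

import numpy as np

def select(data, region):
    # Data selection: extract the part of the data we want to analyze.
    lo, hi = region
    return data[lo:hi]

def enrich(samples, factor=4):
    # Data enrichment: interpolate the raw samples onto a finer model.
    x = np.arange(len(samples))
    x_fine = np.linspace(0, len(samples) - 1, factor * len(samples))
    return x_fine, np.interp(x_fine, x, samples)

def map_to_geometry(x, values):
    # Mapping: convert the enriched data into geometry (here, 2D polyline points).
    return list(zip(x, values))

def render(points):
    # Rendering: assign visual properties and create the display (stubbed as text).
    for px, py in points[:5]:
        print(f"vertex at ({px:.2f}, {py:.2f}), color mapped from value {py:.2f}")

raw = np.array([3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0])
x, model = enrich(select(raw, (1, 6)))
render(map_to_geometry(x, model))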

Figure 6 The data visualization triangle

Visualization process may occur in several different types of modes (Hearn D.D,
1991). The "movie mode" consists of acquiring the data, producing an animation tape of
the data and then analyzing the data. The "tracking" mode consists of acquiring the data
and visualizing it and observing it directly on the computer. Neither of these two modes
includes any user interaction. The third mode, "interactive post-processing", introduces
user interaction in that the user is able to interactively control the visualization
parameters. The final mode, "interactive steering", allows the user to interactively
control both the actual computation of the data, e.g., by changing parameters as the
computation progresses, and the visualization of the data. These four modes provide
increasing support for analysis but also require increasing technology support.

Figure 7 Simplified visualization pipeline

The visualization process can be seen as a pipeline consisting of simple steps as
depicted in Figure 7 and Figure 8. In general, visualization is essentially a mapping
process from computer representations to perceptual representations, choosing encoding
techniques to maximize human understanding and communication. The goal of a viewer
might be a deeper understanding of physical phenomena or mathematical concepts, but
it also might be a visual proof of computer representations derived from such an initial
stage.

Figure 8 Visualization as a mapping process (Owen G. S., 1999)

In 1999 Ed Chi developed a reference model for information visualization, under the
name of the data state model. Chi showed that the framework successfully modeled a
wide array of visualization applications, and later showed that the model was
functionally equivalent to the data flow model used in existing graphics toolkits such
as VTK.

Figure 9 Information Visualization Data State Reference Model (Chi Ed H.,1999)

In 1999 Stuart Card, Jock D. Mackinlay, and Ben Shneiderman presented their
own interpretation of this pattern, dubbing it the information visualization reference
model.

5 Scientific Visualization

Visualization of observed data or simulation output is important to science and
engineering. Scientific visualization was developed in response to the needs of scientists
and engineers to view experimental or phenomenal data in graphical formats. Scientific
data visualization should provide methods for visual analysis and interpretation through
the generation of images from complex multidimensional data sets. This will enable
researchers to ‘observe’ their simulations, computations, and experiments. Rendering
and geometric modeling are two basic research areas for scientific data visualization.
Scientific visualization is interdisciplinary. Fields of application include engineering,
natural and medical sciences.
Scientific visualization, sometimes referred to in shorthand as SciVis, is the
representation of data graphically as a means of gaining understanding and insight into
the data. It is sometimes referred to as visual data analysis. From a computing
perspective, SciVis is part of a greater field called visualization. As a science, scientific
visualization is the study concerned with the interactive display and analysis of data.
The approaches developed are general, and the goal is to make them applicable to
datasets of any size whatever while still retaining high interactivity. As an emerging
science, its strategy is to develop fundamental ideas leading to general tools for real
applications. This pursuit is multidisciplinary in that it uses the same techniques across
many areas of study.
Important characteristics of scientific visualization are:
1. The dimensionality of the data is at least 3.
2. Visualization concerns itself with data transformation.
3. Visualization is naturally interactive.
In general, raw scientific data can be categorized into a hierarchy of data types.
The most general and the lowest of the hierarchy is the nominal data, whose values have
no inherent ordering. For example, the names of the fifty states are nominal data. The
next higher type of the hierarchy is ordinal data, whose values are ordered, but for
which no meaningful distance metric exists. The seven rainbow colors (red, orange, etc.)
belong to this category. The highest of the hierarchy is metric data, which has a
meaningful distance metric between any two values. Times, distances, and temperatures
are examples. If we bin metric data into ranges, it becomes ordinal data. If we further
remove the ordering constraints, the data is nominal.
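
As a minimal sketch of this hierarchy (plain Python; the temperature values and bin boundaries are hypothetical), metric data can be binned into ordinal categories, and the ordering can then be discarded to leave nominal labels:

    # Metric data: temperatures in degrees Celsius (meaningful distances between values).
    temperatures = [12.4, 18.9, 25.3, 31.0, 7.8]

    # Binning metric data into ranges yields ordinal data: the categories are ordered,
    # but the "distance" between 'cold' and 'mild' is no longer meaningful.
    def to_ordinal(t):
        if t < 10.0:
            return "cold"
        elif t < 20.0:
            return "mild"
        elif t < 30.0:
            return "warm"
        else:
            return "hot"

    ordinal = [to_ordinal(t) for t in temperatures]

    # Dropping the ordering constraint leaves nominal data: a plain set of labels
    # with no inherent order, like the names of the fifty states.
    nominal = set(ordinal)

    print(ordinal)  # ['mild', 'mild', 'warm', 'hot', 'cold']
    print(nominal)  # e.g. {'cold', 'mild', 'warm', 'hot'}
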
Within the applied sciences there are numerous sources of huge amounts of data,
for example:
• simulations running on supercomputers;

• satellites;

• medical scanners (CAT, MRI);

• experiments such as wind tunnel tests, test firings of rocket engines, etc.
These sources may produce data in several formats, such as 2D images, sets of
2D images, and scalars and vectors as a function of several variables.
In order to obtain more insight into a specific problem it is important to be able to
present and analyze the data. The interpretation of the data can be very difficult,
depending on its complexity.
The data visualization process, from original data to final image, is depicted in
Figure 1. For certain applications the original data are fed into the computer as an image
or series of images and can be manipulated and displayed directly, without leaving the
imaging domain. This is called ‘imaging’ and typical imaging operations are (higher
dimensional) FFT, gradient determination, etc. For other applications a transformation
of the numeric entity into a geometric entity is required. A geometric representation of
the data allows rendering on a computer screen (with standard geometric primitives),
shape analysis, and, depending on the particular application, extraction of other features.
The conversion of data into a geometric representation is called ‘geometric modeling’.
Subsequently, the rendering process transforms a geometric model into an image on a
computer screen.
Scientific visualization is mostly concerned with 2-, 3-, or 4-dimensional, spatial or
spatio-temporal data.
In scientific visualization, different representations and visual techniques are applied,
such as color, texture, multiple windows, time-sequenced data, or any combination of
techniques. This can greatly aid in interpreting the numbers and finding


the relationships among the data. A scientific visualization can show true spatial fidelity,
that is, the relative sizes and the relative positions of the objects in the display. This
helps in interpreting the data. For example, seeing that one object is closer (to the
camera viewpoint) than another can be crucial as the viewpoint or the objects move
around. Several techniques which are commonly employed are:
• Perspective display in which objects farther away in the display appear
smaller. This aids in understanding distances represented in the display.

• The information is overlaid on images of maps or other kinds of background


images for points of reference.

• Grid lines are displayed to help identify the positions of the objects.

6 Information Visualization

Information visualization (InfoVis or InfoViz) concentrates on the use of


computer-supported tools to explore large amounts of abstract data. The term
"information visualization" was originally coined by the User Interface Research Group
at Xerox PARC, which included Dr. Jock Mackinlay. Practical application of information
visualization in computer programs involves selecting, transforming and representing
abstract data in a form that facilitates human interaction for exploration and
understanding. Important aspects of information visualization are the interactivity and
dynamics of visual representation. Strong techniques enable the user to modify the
visualization in real-time, thus affording unparalleled perception of patterns and
structural relations in the abstract data. Information visualization deals with
unstructured data sets.
The rise of the information age and the ascendancy of computer graphics come
together in the area of information visualization, where interactive graphical interfaces
are used for revealing structure, extracting meaning, and navigating large and complex
information worlds. Increasing amounts of data and the availability of fast digital
network access have created a demand for querying, accessing, and retrieving
information. One of the concerns of this field is the human-information interface, and
how advances in interactive computer graphics hardware, mass storage, and data
visualization could be used to visualize information. Information visualization focuses


on high-dimensional, abstract, and discrete data (financial, statistical, etc.); the visualization
of large trees, networks, and graphs; and data mining (finding patterns, clusters, voids, outliers).
Information visualization, as a named area of research and development, was
originally an outgrowth of the pragmatics of contemporary science and engineering.
Faced with huge volumes of data, scientists and engineers write computer programs to
render data as images making it possible to visually search for and scrutinize patterns in
the data. Generally speaking, as an area of investigation and experimentation,
information visualization comprises a set of tools and techniques for distinguishing the
“forest” from the “trees,” for literally drawing out the big picture from data details. It is,
in short, a means for providing context.
Averbuch (2004) defines the term information visualization as follows:
Information visualization is an increasingly important subdiscipline within HCI that
focuses on graphical mechanisms designed to show the structure of information and
reduce the cost of access to large data repositories. In printed form, information
visualization has included the display of numerical data (e.g., bar charts, plot charts, pie
charts), combinatorial relations (e.g., drawings of graphs), and geographic data (e.g.,
encoded maps). Computer-based systems, such as the information visualizer and
dynamic queries have added interactivity and new visualization techniques (e.g., 3D,
animation).
A key question in InfoVis is how we convert abstract data into a graphical
representation, preserving the underlying meaning and, at the same time, providing new
insight (Hearst, 2003). There is no "magic formula" that helps the researchers to build
systematically a graphical representation starting from a raw set of data. It depends on
the nature of the data, the type of information to be represented and its use, but more
consistently, it depends on the creativity of the designer of the graphical representation.
Some interesting ideas, even if innovative, have often failed in practice. Tufte (1983)
and Bertin (1981) list a number of examples of graphics that distort the underlying data
or communicate incorrect ideas. They indicate some principles that should be followed
to build effective well designed graphics. In particular, a graphic should:
• show the data;

• avoid distorting what the data have to say;

• present many data in a small space;


• make large data sets coherent;

• encourage inferential processes, such as comparing different pieces of data;

• give different perspectives on the data - from broad overview to the fine
structure.
Graphics facilitate understanding of information, but a number of issues must be
considered (Shneiderman 2002; Tufte 1983; Spence 2001):
1. Data is nearly always multidimensional, while graphics represented on a computer
screen or on paper are presented on a 2D surface.
2. Sometimes we need to represent a huge dataset, while the number of data items
representable on a computer screen or on paper is limited.
3. Data may vary over time, while graphics are static.
4. Humans have remarkable abilities to select, manipulate and rearrange data, so the
graphical representations should provide users with these features.
A number of methods and techniques have been proposed to meet these
requirements. Card et al. (1999) give a comprehensive list of eight types of data, eleven
visual structures, four views, three types of human interaction, eleven tasks and eleven
levels that a user might want to accomplish with a visualization tool (see Table 1).

Table 1 Specific techniques for InfoVis


• Data Types (8): Spatial, Scientific, Geographic, Documents, Time, Hierarchies, Networks, WWW
• Visual Structures (11): Position; Marks; Properties: Connection, Enclosure, Retinal, Time; Axes: Composition, Alignment, Folding, Recursion, Overloading
• Views (4): Brushing, Zooming, Overview + detail, Focus + context
• Human interaction (3): Dynamic queries, Direct manipulation, Magic lenses
• Tasks (11): Overview, Zoom, Filter, Details-on-demand, Browse, Search, Read fact, Read comparison, Read pattern, Manipulate, Create
• Levels (11): Delete, Reorder, Cluster, Class, Promote, Average, Abstract, Instantiate, Extract, Compose, Organize

Information visualization is applied in countless areas covering every industry


and all tasks where understanding of the intrinsic structure in data is crucial.
Some examples are:
• Economical / financial analysis


• Representation of large hierarchies

• Medical training / assistance

• Engineering / Physics
Application of information visualization on the computer involves providing
means to transform and represent data in a form that allows and encourages human
interaction. Data can therefore be analyzed by exploration rather than pure reasoning;
users can develop understanding for structures and connections in the data by observing
the immediate effects their interaction has upon the visualization.
As a subject in computer science, information visualization is the use of
interactive, sensory representations, typically visual, of abstract data to reinforce
cognition. Information visualization is a complex research area. It builds on theory in
information design, computer graphics, human-computer interaction and cognitive
science. Practical application of information visualization in computer programs
involves selecting, transforming and representing abstract data in a form that facilitates
human interaction for exploration and understanding. Important aspects of information
visualization are the interactivity and dynamics of the visual representation. Strong
techniques enable the user to modify the visualization in real-time, thus affording
unparalleled perception of patterns and structural relations in the abstract data in
question. Although much work in information visualization concerns visual forms,
auditory and other sensory representations are also of concern.

7 Process Visualization

A process can generally be defined as a series of actions, changes, or functions


bringing about a result or a series of operations performed in the making or treatment of
a product.
The process can be industrial, infrastructure or facility based as described below:
• Industrial processes include those of manufacturing, production, power
generation, fabrication, and refining, and may run in continuous, batch,
repetitive, or discrete modes.

• Infrastructure processes may be public or private, and include water treatment


and distribution, wastewater collection and treatment, oil and gas pipelines,


electrical power transmission and distribution, and large communication


systems.

• Facility processes occur both in public facilities and private ones, including
buildings, airports, ships, and space stations. They monitor and control HVAC,
access, and energy consumption.
The increasing level of automation of production systems has led to reduced
numbers of operators and, at the same time, to an increase in the amount of process
information each operator has to supervise and control. Supervisory systems will have
to be able to integrate large volumes of information and knowledge coming both from
local and remote points of large processes. These systems will therefore require new
tools for management and integration of information and knowledge.
Process visualization provides users with graphics views of processes which can
be navigationally traversed, interactively edited, or animated. Process visualizations
enable intuitive analysis and discovery of processes.

7.1 SCADA systems


Because of the ubiquitous nature of control systems, a variety of terms have
originated to describe them: process control systems, distributed control systems,
automation control systems, industrial control systems, and supervisory control and data
acquisition systems. All the terms refer in the broadest sense to control systems. Some
focus on manufacturing and the actions that take place on the factory floor, and each
industry sector will define its own terms.
SCADA stands for Supervisory Control And Data Acquisition. As the name
indicates, it is not a full control system, but rather focuses on the supervisory level. As
such, it is purely a software package that is positioned on top of hardware to which it is
interfaced, in general via Remote Terminal Units (RTUs), or other hardware
components such as: Programmable Logic Controllers (PLCs), intelligent electronic
devices (IED), Master Terminal Units (MTUs). Supervisory control and data acquisition
allows an operator to monitor and control processes that are distributed among various
remote sites. The term "supervisory control and data acquisition" (SCADA), however, is
generally accepted to mean the systems that control the distribution of critical
infrastructure public utilities (water, sewer, electricity, and oil and gas).


SCADA systems were first used in the 1960s. The use of the term SCADA
varies, depending on location. In North America, SCADA refers to a distributed
measurement and management system that operates on a large-scale basis. For the rest
of the world, SCADA refers to a system that performs the same basic functions, but
operates in a number of different environments as well as a multiplicity of scales. While
the use of the term SCADA may not be uniform, many components are the same,
regardless of the scale of the process. SCADA generally refers to an industrial control
system which is meant to function across a wide area with an autonomous Remote
Terminal Unit (RTU).
A SCADA system is expected to have open loop controls (meaning that a human
operator watches near real time data and issues commands). By comparison, a
distributed control system (DCS) is expected to have closed loop controls (meaning that
real-time loop data is applied directly to an industrial controller without human
intervention). From its inception in the 1960s, SCADA was understood as a system that
was primarily concerned with I/O from Remote Terminal Units. In the early 1970s,
DCS was developed. The ISA S5.1 standard defines a distributed control system as a
system which while being functionally integrated, consists of subsystems which may be
physically separate and remotely located from one another. DCS were originally
developed to meet the requirements of large manufacturing and process facilities that
required significant amounts of analogue control. These differences are primarily design
philosophies, not mandates of definition.
There are three main elements to a SCADA system: various RTUs,
communications and an HMI (Human Machine Interface). Each RTU effectively
collects information at a site, while communications bring that information from the
various plant or regional RTU sites to a central location, and occasionally returns
instructions to the RTU. Data acquisition begins at the RTU or PLC level and includes
meter readings and equipment status reports that are communicated to SCADA as
required. Data is then compiled and formatted in such a way that a control room
operator using the HMI can make supervisory decisions to adjust or override normal
RTU (PLC) controls. Data may also be fed to a historian, often built on a commodity
database management system, to allow trending and other analytical auditing.


SCADA systems used to run on DOS, VMS and UNIX; in recent years all
SCADA vendors have moved to Windows, some also to Linux platform.
SCADA systems are used not only in industrial processes: e.g. steel making,
power generation (conventional and nuclear) and distribution, chemistry, but also in
some experimental facilities such as nuclear fusion. The size of such plants ranges from
a few thousand to several tens of thousands of input/output (I/O) channels. However, SCADA
systems evolve rapidly and are now penetrating the market of plants with a number of
I/O channels in the several hundred thousand range.
Originally, SCADA systems were designed for Supervisory Control and Data
Acquisition, providing a reliable means of aggregating the analysis being performed by
multiple RTUs. But with today's high speed production demands, SCADAs are required
to perform Calculation and Analysis in real time on the plant floor, effectively
combining the once disparate worlds of HMI and SCADA.
The design of HMI has become quite complicated and can no longer be handled
in an intuitive fashion. The designer needs to possess a huge amount of
multidisciplinary knowledge and experience with respect to the application domain of
the respective technical process, the available automation and information technologies,
the capabilities and limitations of the human operators and maintenance personnel,
work psychological and organizational matters as well as ergonomic and cognitive
engineering principles of good HMI design.

7.1.1 Hardware architecture


A typical control system consists of one or more remote terminal units (RTU)
connected to a variety of sensors and actuators, and relaying information to a master
station. Figure 11 illustrates a typical control system design. The design and function of
the RTUs, sensors, actuators, and master station, as well as the means of communication
between components, are implementation details that will vary depending on the
manufacturing or industrial process being controlled. A distributed control system may
have multiple master stations or layers of master stations.
A SCADA system usually includes signal hardware (input and output),
controllers, networks, user interface (HMI), communications equipment and software.
One distinguishes two basic layers in a SCADA system: the "client layer" which
caters for the man machine interaction and the "data server layer" which handles most


of the process data control activities. The data servers communicate with devices in the
field through process controllers. Process controllers, e.g. PLCs, or RTUs are connected
to the data servers either directly or via networks or field buses that are proprietary (e.g.
Siemens H1), or non-proprietary (e.g. Profibus).

Figure 10 Typical SCADA HW architecture

Data servers are connected to each other and to client stations via an Ethernet
LAN. The data servers and client stations are mostly Windows platforms, but for many
products the client stations may also run on another platform.

Figure 11 Typical control system design and three-levels of SCADA

A Remote Terminal Unit (RTU) is a standalone unit used to monitor and control
sensors and actuators at a remote location, and to transmit data and control signals to a
central master monitoring station. Depending on the sophistication of the
microcontroller in the RTU, it can be configured to act as a relay station for other RTUs
which cannot communicate directly with a master station, or the microcontroller can


communicate on a peer-to-peer basis with other RTUs. RTUs are generally remotely
programmable, although many can also be programmed directly from a panel on the
RTU. Remote terminal units gather information from their remote site from various
input devices, like valves, pumps, alarms, meters, etc. Essentially, data is either analog
(real numbers), digital (on/off), or pulse data (e.g., counting revolutions of meters).
Many remote terminal units hold the information gathered in their memory and wait for
a request to transmit the data. Other more sophisticated remote terminal units have
microcomputers and programmable logic controllers (PLC) that can perform direct
control over a remote site without the direction of the master terminal unit. Small size
RTUs generally have less than 20 analog or digital inputs and medium size RTUs
typically have 100 digital and up to 40 analog inputs, while an RTU with greater than
100 digital or 40 analog inputs is considered large. Many RTUs are modular and thus
expandable, and several RTUs can be logically combined as one, depending on the
model and manufacturer. Figure 12 shows a typical RTU. A RTU consists of a power
supply, a central processing unit (CPU), memory (both volatile and non-volatile), and a
series of inputs and outputs. The CPU controls communications with the sensors and
actuators through the inputs and outputs, and with the master station through a serial
port, an Ethernet port, or some other interface. A programming interface can also be
connected to any of these interfaces. The Central Bus serves as the conduit for
communications between the components of the RTU.

Figure 12 Typical RTU


Advances in CPUs and the programming capabilities of RTUs have allowed for
more sophisticated monitoring and control. Applications that had previously been
programmed at the central master station can now be programmed at the RTU. These
modern RTUs typically use a ladder-logic approach to programming due to its
similarity to standard electrical circuits - the majority of RTU programmers are
engineers, not computer programmers. A RTU that employs this ladder logic
programming is called a Programmable Logic Controller (PLC).
Modern RTUs and PLCs offer a wide variety of communications means, either
built in directly or through a module. The following list represents a variety of
transmission methods supported:
− RS-232/RS-442/RS-485
− Dialup telephone lines
− Dedicated telephone lines
− Microwave
− Satellite
− X.25
− Ethernet
− 802.11a/b/g
− Radio (VHF, UHF, etc)
Master stations have two main functions:
1. Periodically obtain data from RTUs/PLCs (and other master or sub-master
stations).
2. Control remote devices through the operator station.
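
The supervisory scan performed by a master station can be pictured as a simple polling loop. The sketch below is only illustrative: read_points and apply_setpoint are hypothetical stand-ins for a real protocol driver, not the API of any actual SCADA product.

    import time

    def read_points(rtu_id):
        # Placeholder for a real protocol driver (e.g. DNP3 or Modbus polling).
        # Here we simply return a fixed set of point values for illustration.
        return {"tank_level": 72.5, "pump_running": True}

    def apply_setpoint(rtu_id, point, value):
        # Placeholder for sending a control command back to the RTU/PLC.
        print(f"RTU {rtu_id}: set {point} = {value}")

    def poll_cycle(rtu_ids, historian):
        """One supervisory scan: gather data from every RTU and archive it."""
        for rtu_id in rtu_ids:
            data = read_points(rtu_id)
            historian.setdefault(rtu_id, []).append((time.time(), data))
            # Example of a supervisory decision made at the master station:
            if data["tank_level"] > 90.0 and data["pump_running"]:
                apply_setpoint(rtu_id, "pump_running", False)

    historian = {}
    poll_cycle(["RTU-1", "RTU-2"], historian)
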
Master stations consist of one or more personal computers (PC), which, although
they can function in a multi-purpose mode (email, word processing, etc), are configured
to be dedicated to master station duties. These duties include trending, alarm handling,
logging and archiving, report generation, and facilitation of automation. These duties
may be distributed across multiple PCs, either standalone or networked.


Figure 13 The three-layer model with the addition of business systems and process
regulation

Figure 11 illustrates a generic three-tiered approach to SCADA control system


design incorporating the three main components. In any organization, business systems
dictate policy and procedures relevant to the control and monitoring of the process;
conceptually, these reside above the Master Station. Similarly, the sensors and actuators
directly act upon the physical objects.

Figure 14 Business system defines the overall policy of SCADA


A typical organization will generate policies and procedures that define the
process that must be monitored and controlled, allocate resources to it, and dictate how
collected data will be distributed and audited. A management information system (MIS)
may facilitate access to the data supplied by the process, and can be used for
forecasting, trending and optimization. Figure 14 illustrates some components of a
Business System that will affect a SCADA implementation.

7.1.2 Software architecture


SCADA software is specialized. A Human-Machine Interface or HMI is the type
of software which presents process data to a human operator, and through which the
human operator controls the process. An HMI is usually linked to the SCADA system's
databases and software programs, to provide trending, diagnostic data, and management
information such as scheduled maintenance procedures, logistic information, detailed
schematics for a particular sensor or machine, and expert-system troubleshooting
guides. The HMI system usually presents the information to the operating personnel
graphically, in the form of a mimic diagram. This means that the operator can see a
visual representation of the plant being controlled. For example, the HMI software
could show the flow rate of the fluid in the pipe in real time. Mimic diagrams may
consist of line graphics and schematic symbols to represent process elements, or may
consist of digital photographs of the process equipment overlain with animated
symbols. The HMI package for the SCADA system typically includes a drawing
program that the operators or system maintenance personnel use to change the way
these points are represented in the interface. These representations can be as simple as
an on-screen traffic light, which represents the state of an actual traffic light in the field,
or as complex as a multi-projector display representing the position of all of the
elevators in a skyscraper or all of the trains on a railway.
An important part of most SCADA implementations is alarm handling. An alarm is a
digital status point that has either the value NORMAL or ALARM. Alarms can be
created in such a way that when their requirements are met, they are activated. An
example of an alarm is the "fuel tank empty" light in a car. The SCADA operator's
attention is drawn to the part of the system requiring attention by the alarm. Emails and
text messages are often sent along with an alarm activation, alerting managers as well
as the SCADA operator.
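
A digital alarm point of this kind can be sketched as a simple limit check; the point name, limit, and notify helper below are hypothetical and serve only to illustrate the NORMAL/ALARM behaviour described above:

    NORMAL, ALARM = "NORMAL", "ALARM"

    def evaluate_alarm(value, low_limit):
        """Return ALARM when the monitored value drops below its limit, else NORMAL."""
        return ALARM if value < low_limit else NORMAL

    def notify(recipients, point_name, state):
        # Stand-in for e-mail / text-message dispatch tied to alarm activation.
        for person in recipients:
            print(f"{person}: point '{point_name}' is {state}")

    # "Fuel tank empty" style alarm: activates when the level falls below 5 %.
    state = evaluate_alarm(value=3.2, low_limit=5.0)
    if state == ALARM:
        notify(["operator", "shift manager"], "fuel_tank_level", state)
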


Figure 15 Generic software architecture

The SCADA software products are multi-tasking and are based upon a real-time
database (RTDB) located in one or more servers. Servers are responsible for data
acquisition and handling (e.g. polling controllers, alarm checking, calculations, logging
and archiving) on a set of parameters, typically those they are connected to. However, it
is possible to have dedicated servers for particular tasks, e.g. historian, datalogger,
alarm handler. Figure 15 shows a SCADA architecture that is generic for the products
that were evaluated.
In SCADA systems, the three major categories of protocols involve the
specifications for design and manufacture of sensors and actuators, specifications for
RTUs, and the specifications for communications between components of a control
system.
The prevalent standard for industrial control RTU design and programming is the
IEC 61131 series, developed by the two IEC working groups, the Industrial Process
Measurement And Control group and the IT Applications In Industry group. It is a
series of seven publications that serve to standardize the programming languages,
instruction sets, and concepts used in industrial control devices such as RTUs and PLCs.


There are two major protocol descriptions for SCADA component


communications, designed specifically for the purpose of process control applications.
The first is IEC 60870. IEC 60870 was defined primarily for the telecommunications of
electrical system and control information and its data structures are geared to that
application. It is the favored standard in Europe for electrical power grid SCADA
systems, but is not as popular in North America.
The second protocol specifically designed for SCADA communications is the
Distributed Network Protocol Version 3 (DNP3), the leading protocol employed
in North America for most SCADA applications. DNP3 defines four layers: physical, data link,
pseudo-transport, and application.
Several other SCADA standards exist, primarily High Level Data Link Control
(HDLC) and Modbus. HDLC, defined by ISO for point-to-point and multi-point links,
is also known as Synchronous Data Link Control (SDLC) and Advanced Data
Communication Control Procedure (ADCCP). It is a bit-based protocol, the precursor to
Ethernet, and is rapidly being replaced by DNP3 and TCP/IP.
Profibus is a German standard that defines three types: Fieldbus Message
Specification (FMS) for use in general data acquisition systems, Decentralized
Peripherals (DP) for use when fast communication is required, and Process Automation
(PA) for use when highly reliable and safe communication is required. It defines three
layers: physical, data link and application.
Foundation Fieldbus is an extension to the 4-20mA standard to take advantage of
digital technologies. It defines 3+1 layers (physical, data link, application, and user).
The Utility Communications Architecture (UCA) is a new initiative from the
Electric Power Research Institute (EPRI) designed for the electrical industry. It is more
than just a protocol definition; it is a comprehensive set of standards designed to allow
"plug and play" integration into systems, allowing manufacturers to design off-theshelf
compliant devices. IEEE assumed the UCA standards process in 1999 and has
developed extensions for the water industry. Other industries are also examining UCA
for suitability.

7.1.3 SCADA and Internet


Many companies are considering using the Internet for supervisory control and
data acquisition (SCADA) to provide access to real-time data display, alarming,


trending, and reporting from remote equipment. However, there are three significant
problems to overcome when implementing an Internet-based SCADA system.
The first is that most devices used to control remote equipment and processes do
not have Internet-communications capability already incorporated in their operating
systems. The second is that the device still has to be physically connected to the
Internet, even when equipped through retrofit or in the factory with the necessary
communications protocols and the third is assurance of data protection and access
control.

Figure 16 Internet SCADA architecture

One solution to these problems is to connect the device to a PC and have the PC
make the connection to the Internet via an Internet service provider using Secure Socket
Layer.
An alternative to using a PC is an embedded solution: a small, rugged, low-cost
device that provides connectivity capabilities of a PC at a lower cost and higher
reliability. This device (sometimes referred to as an Internet gateway) is connected to
the equipment via a serial port, communicates with the equipment in the required native
protocol, and converts data to HTML or XML format. The gateway has an IP address
and supports all or at least parts of the TCP/IP stack—typically at least HTTP, TCP/IP,
UDP, and PPP. Once connected to the Internet, the gateway responds to an HTTP
request with an HTML or XML file, just as if it were any PC server on the World Wide
Web. In cases where the equipment incorporates an electronic controller, it may be
possible to simply add Web-enabled functionality into the existing microcontroller.
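
The gateway behaviour described above can be sketched with Python's standard http.server module; the hard-coded reading stands in for the native serial protocol, so this is only an illustration of responding to an HTTP request with an XML document, not a description of any particular product:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    def read_equipment():
        # Placeholder for talking to the equipment over a serial port in its native protocol.
        return {"flow_rate": 12.7, "valve_open": 1}

    class GatewayHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            data = read_equipment()
            xml = (
                "<?xml version='1.0'?>"
                f"<equipment><flow_rate>{data['flow_rate']}</flow_rate>"
                f"<valve_open>{data['valve_open']}</valve_open></equipment>"
            )
            body = xml.encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/xml")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        # Serve readings on port 8080, as any web server on the plant network would.
        HTTPServer(("", 8080), GatewayHandler).serve_forever()
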


There are five phases to creating a functional SCADA system:

Phase 1: design of the system architecture
This includes the all-important communication system and, for a regional system
utilizing radio communication, often involves a radio path survey. Also involved will be
any site instrumentation that is not presently in existence but will be required to
monitor the desired parameters.

Phase 2: supply of hardware
This includes the supply of RTUs, communication and HMI equipment, the latter
consisting of a PC system and the necessary powerful graphic and alarm software
programs.

Phase 3: supply of software or programming
This covers the programming of the communication equipment and of the HMI graphic
and alarm software programs.

Phase 4: installation
Installation of the communication equipment and the PC system.

Phase 5: commissioning
Commissioning of the system, during which communication and HMI programming
problems are solved, the system is proven to the client, and operator training and system
documentation are provided.

8 Visualization Techniques and Algorithms

Many of the general principles of computer graphics and design carry over into
the area of data visualization. The essential idea is to provide as much information as
possible without confusing or distracting the viewer with inappropriate color schemes.
For the types of data discussed in section 3.1 there exist many visualization
techniques; in general, however, each is applicable to just a single data type.
A visualization technique is used to create and manipulate a graphic
representation from a set of data. Some techniques will be appropriate only for specific
applications while others are more generic and can be used in many applications. It
should always be kept in mind that the goal of visualization is not to understand the data
but to understand the underlying phenomenon. Visualization must be useful to the users


and should be able to be used to present information to others. There is a variety of


conventional ways to visualize data - tables, histograms, pie charts and bar graphs are
being used every day, in every project and on every possible occasion.
Information visualization, scientific visualization and visual analytics have lots
of overlapping goals and techniques. There are three stages for using a visualization
technique:
• The construction of an empirical model from the data. This construction may
involve sampling theory considerations, such as the Nyquist theorem, and
general mathematical interpolation schemes. If the data contains errors then
this must be taken into account.

• The selection of some schematic means of depicting the model as some


abstract visualization object, such as an image of a contour map.

• The rendering of the image on a graphics display.


The answer to the question of which visualization techniques are suitable for one's data
is not unambiguous. Should, e.g., direct volume rendering techniques be preferred over
surface rendering techniques? Can current techniques, like streamline and particle
advection methods, be used to appropriately outline the known visual phenomena in the
system? The choice of visualization technique matters. When using a visualization
technique we must take into consideration that:
• Some things can jump out as part of the visualization. Unexpected things can
be seen.

• Some things which are seen in the visualization can generate new questions.

• Sometimes it is not easy to analyze the data, and a variety of visualization
techniques may be required; no single technique suffices for analysis. It is
sometimes impossible to avoid using multiple techniques.

• Some details cannot be seen at all (or all together). To address this, add
multiple displays and techniques for multiple questions.

• Some false things may be seen in the image. Be careful to verify all your
conclusions.
Special techniques must be used to facilitate the understanding of 3D data, since
it is mapped to a 2D output device. Attribute mapping, such as color, is one method.


Another method is brightness (closer points are brighter). But the best method appears
to be animation, allowing the user to rotate the 3D point cloud about the axes.
The visualization technique selected depends on many circumstances, some of
them are:
• The dimensionality of the dataset.

• The number of variables in the dataset, the nature of the “data objects”.

• The geometry on which the data is given, “structure of the coordinates”.

• The software available for visualization.

• The hardware available.

In visualization algorithms basically two sets of transformations are involved.


The first set of transformations is used to convert the data or subsets of the data into a
virtual scene. These transformations are applied to the structure/topology of the data and
to the data-type/data-quantity. Once the scene has been composed of geometrical
objects and textures, computer graphics transformations are applied to form images that
can be output to a computer screen or video.
Common features of all visualization techniques can be summarized in following
stages of data manipulation:
1. Collection and archiving of data – modeling or simulation.
2. Data pre-processing – transformation of data for better execution of
visualization technique(s).
3. Data visualization - using visualization software and hardware for a visual
representation of data.
4. Human interaction and data analysis – taking advantage of a human
perception of reality.

8.1 Visualization features


There are features which are important in visualizing data, the use of which
depends on the shape and dimension of the data and on what one wants to show. Some
of these features are:
• Color - scalar degree of freedom; e.g. high temperature represented by red.


• Glyphs - geometric objects (icons) representing multiple features of data


locally, i.e. on discrete positions. There are degrees of freedom due to size and
color. Shape distinguishes different fields, e.g. spheres for a scalar field and
arrows for a vector field, or spheres for one scalar field and cubes for the
other.

• Contours / Isosurfaces - scalar data level values in two or higher dimensional


space. Representing the data as surfaces in three dimensional space is called
surface rendering.

• Rubbersheet - rendering of scalar data not by colors (or the special case of
glyphs) but as the height of a deformed surface. In that way one can add
other scalar data to this sheet by using color. An important aspect of the rubber
sheet is that the clarity of the data representation depends on the light that
falls on it. Shadows indicate height (shading).

• Animation - change of data, position, or whatever as a function of time or


another independent parameter.

• Volume rendering - a technique for directly showing scalar data in a three
dimensional space, that is, without first mapping it onto a geometric figure
like a plane, as in isosurfacing. Values of color and opacity are assigned to
each data point.

Figure 17 Example of glyphs for scalar (L) and vector (R) data

An isosurface is a three-dimensional analog of an isocontour. It is a surface that


represents points of a constant value (e.g. pressure, temperature, velocity, density)


within a volume of space; in other words, it is a level set of a continuous function whose
domain is 3D-space.

Figure 18 Contours and isosurface

Figure 19 Rubbersheet with / without animation

Figure 20 An example of a glyph that can display 12 D data


8.2 Classification of visualization techniques


When implementing a visualization technique, developers have to think about how
people extract meaning from pictures (psychophysics), what people understand from a
picture (cognition), how pictures are imbued with meaning (semiotics), and how in
some cases that meaning arises within a social and cultural context.
The selection of technique will depend on the type of data. It is useful to classify
data sets in the following way; think of the empirical model to be constructed, and
identify:
• The number of independent variables, x1,...xn;

• The number of components of function F, i.e. number of dependent variables.


These define a number of classes, and for each class there will be a set of
appropriate visualization techniques.
Ken Brodlie et al. developed a taxonomy for the classification of scientific
visualization techniques based on the visualization technique entities. The entity that is
created from the data can be expressed as a function F(X). The domain is X =
(x1,x2,x3,...xn) and has dimension n. The function F can be a scalar, a vector, or a tensor.
Then the classification technique will be based on the type of function and the
dimension of the domain. Time may be one of the independent variables, and then X =
(x1, x2, x3,...xn;t). Time varying phenomena are handled using a sequence of frames
which are converted into an animation. For example, a scalar entity S with a domain
dimension of 3 will be E3S. Note that the entity E is defined on the domain and it yields
a result (scalar, vector, or tensor). A vector could be denoted by V and a tensor by T. A
vector of dimensionality n will be denoted by Vn.
There are three sub cases according to the nature of the domain:
• The entity is defined point wise over a continuous domain. An example is the
electron density of a molecule.

• The entity is defined over regions of a continuous domain, e.g., a population


density map. For this we can use the notation E[2], to indicate a 2D domain
with the entity defined over regions in the domain.

• The entity is defined over an enumerated set, e.g., the number of cars sold in
each country in a given year. Then we can use the notation E{1} to indicate a
1D domain consisting of the set of enumerated countries.


Another special case is when we want to show a set of values over some domain,
e.g., temperature and pressure over a 3D domain. This notation is E32S, or for the
general case EdnS.
In the Table 2, entities are given, defined on a d-dimensional domain with n-
scalar data as symbol EdnS. So this denotes a (multiple) scalar entity on a d-dimensional
domain. When the entity's state changes with time (like in an animation) an extra
subscript t is given. With (other) colors or color maps and glyph size extra scalar
degrees of freedom can be gained. By the shape of glyphs one can distinguish different
(local) fields. With each entity some possible applications are given. The applications
are far from complete. In the last column of the Table 2 some techniques are given
which are used in the visualization of the entity.

Table 2 Taxonomy of SciVis for scalar entities according to Ken Brodlie


Taxonomy: E1S
  Applications: a lot! (function y = f(x))
  Techniques: Line graph

Taxonomy: E2S
  Applications: Meteorology; Aerodynamics/Aeroplane industry; Physics, astronomy
  Techniques: Contouring; Image display; Surface view; Carpet plot

Taxonomy: E2nS
  Applications: Geography (Cartography, Elevation data); Physics, astronomy; Medical sciences
  Techniques: Height-field plot (Combinations with color and glyphs)

Taxonomy: E3S
  Applications: Remote Sensing; Physics, astronomy, chemistry; Medical sciences
  Techniques: Isosurfaces + surface rendering; Basket weave; Volume rendering; Slicing; Clipping; Capping

Taxonomy: E3nS
  Applications: Physics; (Physical and Biological) Chemistry; Drug design; Finite element analysis; Medical sciences; Archeological reconstruction; Oil Reservoir Techniques
  Techniques: See E3S (Combinations with color and glyphs)

Taxonomy: E2;tnS
  Applications: (Astro) Physics; Meteorology; CFD (Computational Fluid Dynamics)
  Techniques: See E2nS

Taxonomy: E3;tnS
  Applications: (Astro) Physics; Chemistry; CFD; Oceanography
  Techniques: See E3nS

Taxonomy: EmnS
  Applications: Physics (Dynamical systems); Computer Science (Algorithm visualisation)
  Techniques: Projections to lower dimension: 3D-curves + surfaces; Ribbons; 2D-contours

Table 3 is organized in the same way as Table 2, but now the entities E
have n-vector data on the d-dimensional domain, EdVn, or an n x n tensor EdTn;n (mostly
with n=3). As for the scalar data, a subscript t indicates time dependency. Of course
visualization is not limited to scalar or vector data alone. Combinations are possible,
and are not at all exceptional.

Table 3 Taxonomy of SciVis for vector entities according to Ken Brodlie


Taxonomy: E2V2
  Applications: Physics; Oceanography
  Techniques: Arrow plots; Particle tracing; Time lines; Streak lines

Taxonomy: E2;tV2
  Applications: CFD
  Techniques: Stream lines; Stream polygons; Glyphs

Taxonomy: E2V3
  Applications: Physics
  Techniques: See E2V2

Taxonomy: E2;tV3
  Applications: Meteorology
  Techniques: Hedgehogs

Taxonomy: E3V3
  Applications: Physics; Meteorology
  Techniques: See E2V3 + streamribbons and streamsurfaces

Taxonomy: E3;tV3
  Applications: Aerodynamics; CFD
  Techniques: Tubes; Tufts; Texture based methods + Critical point methods

Taxonomy: E3T3;3
  Applications: Finite Element Analysis
  Techniques: Glyphs; Hyper streamlines

Taxonomy: E3;tT3;3
  Applications: Strain-stress Analysis
  Techniques: Stream polygons

Other researchers have developed different taxonomies. In 1998 E.H. Chi and J.T. Riedl
extended and proposed a new way to taxonomize information visualization
techniques by using the Data State Model. Many of the techniques share similar
operating steps that can easily be reused. The Data State Model not only helps
researchers understand the space of design, but also helps implementers understand how
information visualization techniques can be applied more broadly. Bergeron and
Grinstein developed a data oriented classification using the concept of a lattice.
Another approach was presented by Ralph Lengler and Dr. Martin J. Eppler from
visual-literacy.org. They have developed a chart, the so-called "Periodic Table of
Visualization Methods", that organizes a large number of ways to present information
visually. The web page uses a Javascript library to display an example of a diagram type
when we mouse-over its box. Not only can we hover over each of the methods and see


examples, but the chart itself helps to see connections between different approaches.
The table itself is an example of how the right visual can not only present information
but actually create knowledge by organizing material. The table displays around 100 diagram
types, with examples and a multi-faceted classification. There are visualization methods
for data, information, concept, strategy, metaphor and compound. Chris Wallace has
implemented an XML page from this table on which we can see and print the
mouseover pictures individually.
The most elementary techniques and algorithms deal with scalar data or data
that can be made into scalars. For this type of data we can use the following visualization
techniques:
• Colour mapping - maps the scalar to a colour, and then displays that colour.
The scalar mapping is implemented using a colour lookup table that is
indexed with the scalar (see the sketch after this list). A good choice of the
"transfer function" is important for the final result of the visualization.

• Contouring - contouring is a natural idea for scalars, in 2D we get contour


lines (like the elevation curves on a map), whereas in 3D we get isosurfaces.

• Scalar generation - since scalar visualization techniques are simple and


effective methods, we might wish to use these techniques even when
visualizing something that isn't a scalar field. Then we need to generate a
scalar.
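
A minimal sketch of colour mapping through a lookup table (plain Python; the table entries and scalar range are hypothetical) might look as follows:

    # A small colour lookup table from "cold" blue to "hot" red (RGB triples).
    LUT = [
        (0, 0, 255),     # lowest scalar values
        (0, 255, 255),
        (0, 255, 0),
        (255, 255, 0),
        (255, 0, 0),     # highest scalar values
    ]

    def map_scalar_to_colour(value, vmin, vmax, lut=LUT):
        """Normalize the scalar into [0, 1] and index the lookup table with it."""
        t = (value - vmin) / (vmax - vmin)
        t = min(max(t, 0.0), 1.0)                 # clamp out-of-range values
        index = int(round(t * (len(lut) - 1)))    # this mapping plays the role of the transfer function
        return lut[index]

    # Example: temperatures between 0 and 100 degrees mapped to colours.
    print(map_scalar_to_colour(85.0, 0.0, 100.0))   # -> (255, 255, 0), near the warm end of the table
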
Visualizing vector fields can be done with a variety of methods:
• Hedgehogs and Oriented Glyphs - display an arrow or an oriented glyph at
(selected) points.

• Warping.

• Displacement plots.

• Time animations - let "massless" particles trace the vector field.

• Streamlines - these are lines parallel to the vector field at all points (see the sketch below).
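
As a hedged sketch of the streamline idea (not taken from any particular toolkit; the vector field is an arbitrary analytic example), a streamline can be traced by repeatedly stepping a point along the local field direction with simple Euler integration:

    def vector_field(x, y):
        # A hypothetical 2D vector field (circular flow around the origin).
        return -y, x

    def trace_streamline(x0, y0, step=0.01, n_steps=1000):
        """Trace a streamline: a curve parallel to the vector field at every point."""
        points = [(x0, y0)]
        x, y = x0, y0
        for _ in range(n_steps):
            vx, vy = vector_field(x, y)
            x, y = x + step * vx, y + step * vy   # forward Euler step
            points.append((x, y))
        return points

    line = trace_streamline(1.0, 0.0)
    print(line[:3])   # first few points of the traced curve
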
Glyphs are objects that are affected by input data, for example:
• The size could vary with a scalar value,

• The size and orientation could vary according to a vector value,


• The colour could vary with a scalar value.


Glyphs can be simple geometric objects (sphere, arrow, etc.) but can also be
more elaborate.
The traditional two-dimensional point and line plots are among the most
commonly used visualization techniques for data with a lower number of variates.

The following are examples of some common visualization techniques:


• Constructing isolines (contouring) and isosurfaces

• Direct volume rendering

• Streamlines, streaklines, and pathlines

• Table, matrix

• Charts (pie chart, bar chart, histogram, function graph, scatter plot, etc.)

• Graphs (tree diagram, network diagram, flowchart, existential graph, etc.)

• Maps

• Parallel coordinates - a visualization technique aimed at multidimensional data

• Treemap - a visualization technique aimed at hierarchical data

• Venn diagram

• Euler diagram

• Chernoff faces

• Hyperbolic trees

Figure 21 Visualization using hedgehogs and oriented glyphs


Figure 22 Visualization using warping

Figure 23 Visualization using streamlines

Figure 24 Tensor visualizations with ellipsoids (L) and hyperstreamlines (R)


8.3 Volume Visualization


Volume visualization is a field within data visualization, which is concerned
with volume data. Volume data are 3D entities that may have information inside them,
might not consist of surfaces and edges, or might be too voluminous to be represented
geometrically. Volume visualization is a method of extracting meaningful information
from volumetric data using interactive graphics and imaging, and it is concerned with
volume data representation, modeling, manipulation, and rendering.
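
As a small illustration (NumPy; the grid resolution and the scalar field are hypothetical), a volume dataset can be represented as scalar samples on a regular 3D grid:

    import numpy as np

    # A 64^3 regular grid of scalar samples: distance from the grid centre.
    n = 64
    z, y, x = np.mgrid[0:n, 0:n, 0:n]
    centre = (n - 1) / 2.0
    volume = np.sqrt((x - centre) ** 2 + (y - centre) ** 2 + (z - centre) ** 2)

    # Each element volume[k, j, i] is one sample (voxel value); an isosurface at
    # radius 20 would be the set of points where the sampled field equals 20.
    print(volume.shape, volume.min(), volume.max())
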
Volume rendering techniques are conventionally classified as either direct or
indirect methods. Indirect methods require transforming the initial volumetric model into
an intermediate geometrical model in order to visualize it efficiently. In contrast, direct
volume rendering (DVR) methods can directly process the volumetric data.
Volume datasets are obtained by sampling, simulation, or modeling techniques.
For example, a sequence of 2D slices obtained from Computer Tomography (CT) or
Magnetic Resonance Imaging (MRI) is three dimensionally reconstructed into a volume
model and visualized for diagnostic purposes, planning of treatment, or surgery. The
same technology is often used for non-destructive inspection of composite materials or
mechanical parts. Recently, many traditional geometric computer graphics applications,
such as CAD systems and simulation, have been exploiting the advantages of volume
techniques called volume graphics for modeling, manipulation, and visualization.
Currently, the major application area of volume rendering is medical imaging,
where volume data is available from X-ray Computer Tomography (CT) scanners and
Positron Emission Tomography (PET) scanners.

8.3.1 Surface based techniques


Over the years many techniques have been developed to visualize volumetric
data. Since methods for displaying geometric primitives were already well-established,
most of the early methods involve approximating a surface contained within the data
using geometric primitives. Common methods include contour tracking, opaque cubes,
marching cubes, marching tetrahedra, dividing cubes, and others. These algorithms fit
geometric primitives, such as polygons or patches, to constant-value contour surfaces in
volumetric datasets and are also known as surface-fitting algorithms. As information
about the interior of objects is generally not retained, a basic drawback of these methods
is that one dimension of information is essentially lost.


Surface-fitting advantages are:


• Rendering methods are known: shadows, depth cueing, reflections, etc.

• Render in hardware is possible.

• Display list is in memory.

• Changing view/light(s) requires only rendering.

• Compact storage & transmission.

• Render in object order or image order.

• Good spatial coherence for efficient rendering.


Surface-fitting disadvantages are:
• Requires binary classification.

• Throws away data between surfaces.

• False positives and negatives.

• Handles small features poorly.

• Can't handle branching.

• User intervention sometimes required.

• Amorphous data doesn't have "thin surfaces".

Figure 25 The principle of contouring (object order)

The main assumption and procedures for surface based techniques are:
• Assume that volume contains thin boundary surfaces.

• Classify all cells as inside, outside, or "on" the surface.

• Fit constant-value surfaces to all "on" cells.


• Render surfaces.

Figure 26 Connecting slices by contouring

Contour tracking is an object ordered surface fitting method. Given a threshold


value, a closed contour is traced for each data slice and then the contours in adjacent
slices are connected and a tessellation, usually of triangles, is performed.

8.3.1.1 The Marching Cubes Algorithm


Marching cubes is a computer graphics algorithm for extracting a polygonal
mesh of an isosurface from a 3D volume. The marching cubes algorithm was published
in the 1987 SIGGRAPH proceedings by William Lorensen and Harvey Cline. It is used
for rendering isosurfaces in volumetric data (3D scalar field). Isosurface extraction
remains one of the most common visualization methods in use today. Marching cubes
algorithm is mainly used to process medical data. Typically, these data are acquired by
computed tomography (CT), magnetic resonance imaging (MRI) or single photon
emission computed tomography (SPECT), and the marching cubes algorithm allows
visualization of complex models. Nevertheless, it can be used in many other fields.
The marching cubes algorithm is a 3D isosurface representation technique. In
order to explain this technique, we are going to introduce the marching squares
algorithm, which uses the same approach in 2D. The marching squares algorithm aims at
drawing lines between interpolated values along the edges of a square, considering
given weights of the corners and a reference value. Let's consider a 2D grid as shown in
the Figure 27.


Figure 27 2D grid

Each point of this grid has a weight, and here the reference value is taken to be 5.
To draw the curve whose value is constant and equal to the reference one, different kinds
of interpolation can be used. The most commonly used is linear interpolation. In order to
display this curve, different methods can be used. One of them consists of considering
each square of the grid individually. This is the marching squares method. For this
method 16 (2^4) configurations have been enumerated, which allows the representation
of all kinds of lines in 2D space.

Figure 28 All cases for marching square algorithm
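
A minimal sketch of the marching squares bookkeeping (plain Python; the corner values are hypothetical and the full 16-entry segment table is omitted) classifies one cell and interpolates one edge crossing:

    def square_index(corner_values, isovalue):
        """Build the 4-bit case index: one bit per corner that lies below the isovalue."""
        index = 0
        for bit, value in enumerate(corner_values):   # corners ordered c0..c3
            if value < isovalue:
                index |= 1 << bit
        return index

    def interpolate(p1, p2, v1, v2, isovalue):
        """Linearly interpolate the crossing point on an edge between two corners."""
        t = (isovalue - v1) / (v2 - v1)
        return (p1[0] + t * (p2[0] - p1[0]), p1[1] + t * (p2[1] - p1[1]))

    # One grid cell with hypothetical corner weights; the reference (iso) value 5 follows the text.
    corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
    values = [2.0, 6.0, 7.0, 3.0]
    case = square_index(values, isovalue=5.0)         # -> 0b1001 = 9
    # In a full implementation, 'case' would index a 16-entry table listing which
    # edges to connect; here we just show the index and one interpolated crossing.
    print(case, interpolate(corners[0], corners[1], values[0], values[1], 5.0))
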

Some cases may be ambiguous due to symmetry. That is the situation for
cases 5 and 10. As we can see in Figure 29, we are not able to take a decision on the
interpretation of this kind of situation. However, these exceptions do not imply any real
error because the contours remain closed.


Figure 29 Ambiguous cases

In 3D space we enumerate 256 different situations for the marching cubes
representation. There are 2^8 = 256 ways the surface may intersect the cube. Using the
symmetries reduces those 256 cases to 15 patterns (Figure 30). The simplest pattern
(case 0 in Figure 30) occurs if all vertex values are above (or below) the selected value
and produces no triangles. The next pattern (case 1) occurs if the surface separates one
vertex from the other seven, resulting in one triangle defined by the three edge
intersections. Other patterns produce multiple triangles. Permutation of these 15 (in fact
14) basic patterns using complementary and rotational symmetry produces the 256
cases.
Marching cubes only works with cubic cells (commonly called voxels). The
input data set can represent anything from medical imaging data to geological scans.
The algorithm proceeds through the scalar field, taking eight neighbor locations at a
time (thus forming an imaginary cube), then determining the polygon(s) needed to
represent the part of the isosurface that passes through this cube. The individual
polygons are then fused into the desired surface. By connecting the patches from all
cubes on the isosurface boundary, we get a surface representation. In this algorithm the
user first specifies a threshold value. For this value, some voxels will be entirely inside
or outside the corresponding isosurface and some voxels will be intersected by the
isosurface. The fundamental problem is to form a facet approximation to an isosurface
through a scalar field sampled on a rectangular 3D grid. Given one grid cell defined by
its vertices and scalar values at each vertex, it is necessary to create planar facets that
best represent the isosurface through that grid cell. The isosurface may not be pass


through the grid cell, it may cut off any one of the vertices, or it may pass through in
any one of a number of more complicated ways. Each possibility will be characterised
by the number of vertices that have values above or below the isosurface. If one vertex
is above the isosurface and an adjacent vertex is below the isosurface then we know the
isosurface cuts the edge between these two vertices. The position at which it cuts the edge
is determined by linear interpolation; the ratio of the length between the two vertices is the
same as the ratio of the isosurface value to the values at the vertices of the grid cell.
In order to be able to determine each real case, a notation has been adopted. It
refers to each case by an index (the cube index), based on the state of the vertices and
created from a binary interpretation of the corner weights (Figure 31). Using the vertex
numbering in Figure 31, the eight-bit index contains one bit for each vertex. This index
serves as a pointer into an edge table that gives all edge intersections for a given cube
configuration. If, for example, the value at vertex v1 is below the isosurface value and
the values at all the other vertices are above the isosurface value, then we would create
a triangular facet which cuts through edges e1, e4, and e9 (case 1 in Figure 30). The
exact positions of the vertices of the triangular facet depend on the relationship of the
isosurface value to the values at the vertices v1-v2, v1-v5, v1-v4 respectively. What
makes the algorithm difficult is the large number (256) of possible combinations and
the need to derive a consistent facet combination for each solution so that facets from
adjacent grid cells connect together correctly. The first part of the algorithm uses a table
(the edge table) which maps the vertices under the isosurface to the intersecting edges. An
8-bit index is formed where each bit corresponds to a vertex. The edge table returns a 12-bit
number, each bit corresponding to an edge: 0 if the edge is not cut by the isosurface, 1 if
the edge is cut by the isosurface. If none of the edges are cut, the table returns 0; this
occurs when the cube index is 0 (all vertices below the isosurface) or 0xff (all vertices
above the isosurface). Using the example earlier where only vertex v1 was below the
isosurface, the cube index would equal 0000 1000, or 8. The edge table for index 8 returns
the number 0001 0000 1001. This means that edges 1, 4, and 9 are intersected by the
isosurface.
The intersection points are now calculated by linear interpolation. If P1 and P2
are the vertices of a cut edge and V1 and V2 are the scalar values at each vertex, the
intersection point P is given by


P = P1 + (isovalue - V1) * (P2 - P1) / (V2 - V1)


The last part of the algorithm involves forming the correct facets from the positions at which the isosurface intersects the edges of the grid cell. Again a table (the triangulation table) can be used, which uses the same cube index but allows the vertex sequence to be looked up for as many triangular facets as are necessary to represent the isosurface within the grid cell. At most five triangular facets are necessary.
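The table-driven structure of the algorithm can be summarised in a short C++ sketch of the per-cell step. This is an illustration only: the lookup tables edgeTable, triTable and edgeCorner, the vertex ordering and the function names are assumptions of the sketch and are assumed to be supplied elsewhere, following the notation of Figure 31.

#include <array>
#include <vector>

struct Vec3 { float x, y, z; };

// Linear interpolation of the intersection point on a cut edge,
// P = P1 + (isovalue - V1) * (P2 - P1) / (V2 - V1).
Vec3 interpolate(float isovalue, Vec3 p1, Vec3 p2, float v1, float v2)
{
    float t = (isovalue - v1) / (v2 - v1);
    return { p1.x + t * (p2.x - p1.x),
             p1.y + t * (p2.y - p1.y),
             p1.z + t * (p2.z - p1.z) };
}

extern const int edgeTable[256];      // 12-bit mask of cut edges per case (assumed)
extern const int triTable[256][16];   // up to 5 triangles per case, -1 terminated (assumed)
extern const int edgeCorner[12][2];   // the two cell corners joined by each edge (assumed)

// Emits the triangles of one cell into 'triangles' (three vertices per triangle).
void polygoniseCell(const std::array<Vec3, 8>& cornerPos,
                    const std::array<float, 8>& cornerValue,
                    float isovalue,
                    std::vector<Vec3>& triangles)
{
    // Build the 8-bit cube index: one bit per corner below the isovalue.
    int cubeIndex = 0;
    for (int i = 0; i < 8; ++i)
        if (cornerValue[i] < isovalue) cubeIndex |= (1 << i);

    if (edgeTable[cubeIndex] == 0) return;   // cell entirely inside or outside

    // Compute the intersection point on every cut edge.
    Vec3 edgeVertex[12];
    for (int e = 0; e < 12; ++e)
        if (edgeTable[cubeIndex] & (1 << e)) {
            int a = edgeCorner[e][0], b = edgeCorner[e][1];
            edgeVertex[e] = interpolate(isovalue, cornerPos[a], cornerPos[b],
                                        cornerValue[a], cornerValue[b]);
        }

    // Assemble up to five triangles from the triangulation table.
    for (int i = 0; triTable[cubeIndex][i] != -1; i += 3) {
        triangles.push_back(edgeVertex[triTable[cubeIndex][i]]);
        triangles.push_back(edgeVertex[triTable[cubeIndex][i + 1]]);
        triangles.push_back(edgeVertex[triTable[cubeIndex][i + 2]]);
    }
}

Running this routine over every cell of the grid and concatenating the emitted triangles yields the facet approximation of the isosurface described above.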

Figure 30 The different configurations (cases 0-14) for the marching cubes algorithm

Figure 31 Notation for cube index


8.3.2 Volume based techniques


Volume rendering techniques convey more information than surface rendering
methods, but at the cost of increased algorithm complexity, and consequently increased
rendering times. Direct volume rendering (DVR) techniques were developed that
attempt to capture the entire 3D data in a single 2D image. Direct volume rendering
algorithms include approaches such as raycasting, splatting, and shear-warp. Instead of
extracting an intermediate representation, volume rendering provides a method for
directly displaying the volumetric data. The original samples are projected onto the
image plane in a process which interprets the data as an amorphous cloud of particles. It
is thus possible to simultaneously visualize information about surfaces and interior
structures without making any assumptions about the underlying structure of the data.
Volume rendering conveys more information in a single image than traditional surface
representations (Figure 32), and is thus a valuable tool for the exploration and analysis
of data. However, due to the increased computational effort required and the enormous
size of volumetric datasets, the ongoing challenge of research in volume rendering is to
achieve fully interactive performance.

Figure 32 Comparison of surface (L) and direct volume rendering (R)

Optical models for direct volume rendering view the volume as a cloud of
particles. Light from a source can either be scattered or absorbed by particles. In
practice, models that take into account all the phenomena tend to be very complicated.
Therefore, practical models use several simplifications.
In general, a volumetric dataset consists of samples arranged on a regular grid.
These samples are also referred to as voxels. While most volume rendering techniques


are based on the theoretical optical model for volume rendering, several different
techniques implementing this optical model have emerged.
In the following, we use a taxonomy based on the processing order of the data.
We distinguish between image based, object based, and hybrid based methods. Image
based (order) methods start from the pixels on the image plane and compute the
contribution of the appropriate voxels to these pixels. Object based techniques traverse
the voxels and compute what their contribution to the image is. Hybrid based methods
try to combine both approaches. Techniques based on the texture mapping capabilities
of the graphics hardware as well as dedicated volume rendering hardware solutions are
possible too.

8.3.2.1 Ray Casting


Ray casting is a method used to render high-quality images of solid objects. The
term was first used in computer graphics in a 1982 paper by Scott Roth to describe a
method for rendering CSG models. Ray casting is not a synonym for ray tracing, but
can be thought of as an abridged, and significantly faster, version of the ray tracing
algorithm. The basic goal of ray casting is to allow the best use of the three-dimensional
data and not attempt to impose any geometric structure on it.
In nature, a light source emits a ray of light which travels, eventually, to a
surface that interrupts its progress. One can think of this "ray" as a stream of photons
travelling along the same path. At this point, any combination of three things might
happen with this light ray: absorption, reflection, and refraction. The surface may reflect
all or part of the light ray, in one or more directions. It might also absorb part of the
light ray, resulting in a loss of intensity of the reflected and/or refracted light.
The idea behind ray casting is to shoot rays from the eye, one per pixel, and find
the closest object blocking the path of that ray - think of an image as a screen-door, with
each square in the screen being a pixel. This is then the object the eye normally sees
through that pixel.
Ray casting is an image order algorithm that casts viewing rays through the
volume. The image based approach to volume rendering determines, for each pixel on
the image plane, the data samples which contribute to it. At discrete intervals along the
ray, the three-dimensional function is reconstructed from the samples and the optical
model is evaluated.


Figure 33 Illustration of ray casting

As the accumulation
is performed in front-to-back order, viewing rays that have accumulated full opacity can
be terminated. This very effectively avoids processing of occluded regions and is one of
the main advantages of ray casting.
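The front-to-back accumulation with early ray termination can be sketched in a few lines of C++. This is a hedged illustration, not a complete renderer: sampleScalar() and classify() are hypothetical helpers standing in for reconstruction (e.g. trilinear interpolation) and for the transfer function.

struct Vec3 { float x, y, z; };
struct RGBA { float r, g, b, a; };

// Hypothetical helpers, assumed to exist elsewhere:
float sampleScalar(const Vec3& p);   // reconstructs the scalar field at p
RGBA  classify(float s);             // transfer function: scalar -> colour and opacity

RGBA castRay(const Vec3& origin, const Vec3& dir, float tMax, float stepSize)
{
    RGBA accum = {0.0f, 0.0f, 0.0f, 0.0f};
    for (float t = 0.0f; t < tMax; t += stepSize) {
        Vec3 p = { origin.x + t * dir.x,
                   origin.y + t * dir.y,
                   origin.z + t * dir.z };
        RGBA src = classify(sampleScalar(p));

        // Front-to-back "over" compositing.
        float w = (1.0f - accum.a) * src.a;
        accum.r += w * src.r;
        accum.g += w * src.g;
        accum.b += w * src.b;
        accum.a += w;

        if (accum.a > 0.99f)      // early ray termination:
            break;                // the remaining samples are occluded
    }
    return accum;
}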
At each point along the ray there is an illumination I(x,y,z) reaching the point
(x,y,z) from the light source. The intensity scattered along the ray to the eye depends on
this value, a reflection function or phase function P, and the local density D(x,y,z). The
dependence on density expresses the fact that a few bright particles will scatter less light
in the eye direction than a number of dimmer particles. The density function is
parameterized along the ray as D(x(t), y(t), z(t)) = D(t) and the illumination from the source as I(x(t), y(t), z(t)) = I(t). The illumination scattered along R from a point at distance t along the ray is then I(t)D(t)P(cos θ), where θ is the angle between R and L, the light vector, from the point of interest.
Determining I(t) is not trivial. It involves computing how the radiation from the
light sources is attenuated and/or shadowed due to its passing through the volume to the
point of interest. This calculation is the same as the computation of how the light
scattered at point (x,y,z) is affected in its journey along R to the eye. In most
algorithms, however, this calculation is ignored and I(x,y,z) is set to be uniform
throughout the volume. For most practical applications we're interested in visualization,
and including the line integral from a point (x,y,z) to the light source may actually be


undesirable. In medical imaging, for example, it would be impossible to see into areas
surrounded by bone if the bone were considered dense enough to shadow light. On the
other hand, in applications where internal shadows are desired, this integral has to be
computed.

8.3.2.2 Splatting
Splatting is an object order technique that traverses the volume and projects footprints (known
as splats) onto the image plane. In contrast to image order techniques, object order
methods determine, for each data sample, how it affects the pixels on the image plane.
In its simplest form, an object order algorithm loops through the data samples,
projecting each sample onto the image plane. Voxels that have zero opacity, and thus do
not contribute to the image, can be skipped. This is one of the greatest advantages of
splatting, as it can tremendously reduce the amount of data that has to be processed. But
there are also disadvantages. Using pre-integrated kernels introduces inaccuracies into
the compositing process, since the 3D reconstruction kernel is composited as a whole.
This can cause color bleeding artifacts (i.e. the colors of hidden background objects may "bleed" into the final image). To remedy these artifacts, an approach has been developed
which sums voxel kernels within volume slices most parallel to the image plane.
However, this leads to severe brightness variations in interactive viewing.
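As a rough illustration of the object order idea, the following C++ fragment splats one already projected and classified voxel into the image using a simple Gaussian footprint. The names, the footprint radius and the assumption that the caller traverses the voxels back to front are simplifications of this sketch, not part of any particular published splatting algorithm.

#include <cmath>
#include <vector>

struct Voxel { float x, y;              // projected position on the image plane
               float r, g, b, a; };     // classified colour and opacity
struct Pixel { float r = 0, g = 0, b = 0, a = 0; };

// Splats one (already projected and classified) voxel into the image.
void splatVoxel(const Voxel& v, std::vector<Pixel>& image,
                int width, int height, float kernelRadius)
{
    if (v.a <= 0.0f) return;                       // transparent voxels are skipped

    int cx = static_cast<int>(v.x);
    int cy = static_cast<int>(v.y);
    int r  = static_cast<int>(std::ceil(kernelRadius));

    for (int dy = -r; dy <= r; ++dy)
        for (int dx = -r; dx <= r; ++dx) {
            int px = cx + dx, py = cy + dy;
            if (px < 0 || px >= width || py < 0 || py >= height) continue;

            // Pre-integrated (Gaussian) footprint weight for this pixel.
            float d2 = static_cast<float>(dx * dx + dy * dy);
            float w  = std::exp(-d2 / (2.0f * kernelRadius * kernelRadius));

            // Back-to-front "over" compositing into the image.
            Pixel& p = image[py * width + px];
            float alpha = w * v.a;
            p.r = alpha * v.r + (1.0f - alpha) * p.r;
            p.g = alpha * v.g + (1.0f - alpha) * p.g;
            p.b = alpha * v.b + (1.0f - alpha) * p.b;
            p.a = alpha + (1.0f - alpha) * p.a;
        }
}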

Figure 34 Illustration of splatting


8.3.2.3 Shear-Warp
Image order and object order algorithms have very distinct advantages and
disadvantages. Therefore, some effort has been spent on combining the advantages of
both approaches. Shear-warp is such an algorithm. It is considered to be the fastest
software based volume rendering algorithm. It is based on a factorization of the viewing
transformation into a shear and a warp transformation. The shear transformation has the
property that all viewing rays are parallel to the principal viewing axis in sheared-
object-space. This allows volume and image to be traversed simultaneously.
Compositing is performed into an intermediate image. A 2D warp transformation is then
applied to the intermediate image, producing the final image.
The problem of shear-warp is the low image quality caused by using only
bilinear interpolation for reconstruction, a varying sample rate which is dependent on
the viewing direction, and the use of pre-classification. Some of these problems have
been solved; however, the image quality is still inferior when compared to other
methods, such as ray casting.

Figure 35 Illustration of shear-warp

8.3.2.4 Texture Mapping


With graphics hardware becoming increasingly powerful, researchers have
started to utilize the features of commodity graphics hardware to perform volume
rendering. These approaches exploit the increasing processing power and flexibility of
the Graphics Processing Unit (GPU). GPU accelerated solutions are capable of
performing volume rendering at interactive frame rates for medium-sized datasets on
commodity hardware.


One method to exploit graphics hardware is based on 2D texture mapping. This method stores stacks of slices for each major viewing axis in memory as two-
dimensional textures. The stack most parallel to the current viewing direction is chosen.
These textures are then mapped on object aligned proxy geometry which is rendered in
back-to-front order using alpha blending. This approach corresponds to shear-warp
factorization and suffers from the same problems, i.e., only bilinear interpolation within
the slices and varying sampling rates depending on the viewing direction.
Approaches that use 3D texture mapping upload the whole volume to the
graphics hardware as a three-dimensional texture. The hardware is then used to map this
texture onto polygons parallel to the viewing plane which are rendered in back-to-front
order using alpha blending. 3D texture mapping allows the use of trilinear interpolation supported by the graphics hardware and provides a consistent sampling rate. A problem
of these approaches is the limited amount of video memory. If a dataset does not fit into
this memory, it has to be subdivided.
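A minimal sketch of the 3D-texture approach in legacy OpenGL (version 1.2 or later) is shown below: object-aligned textured quads are drawn back to front with alpha blending. Texture upload with glTexImage3D, trilinear filtering state and the choice of slice count are assumed to be set up elsewhere; a production renderer would use view-aligned slices and a transfer function.

#include <GL/gl.h>

void drawVolumeSlices(GLuint volumeTexture, int numSlices)
{
    glEnable(GL_TEXTURE_3D);
    glBindTexture(GL_TEXTURE_3D, volumeTexture);

    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);   // back-to-front blending

    for (int i = 0; i < numSlices; ++i) {
        // Slice position in texture space, drawn from back (0) to front (1).
        float z = (i + 0.5f) / numSlices;

        glBegin(GL_QUADS);
        glTexCoord3f(0.0f, 0.0f, z); glVertex3f(-0.5f, -0.5f, z - 0.5f);
        glTexCoord3f(1.0f, 0.0f, z); glVertex3f( 0.5f, -0.5f, z - 0.5f);
        glTexCoord3f(1.0f, 1.0f, z); glVertex3f( 0.5f,  0.5f, z - 0.5f);
        glTexCoord3f(0.0f, 1.0f, z); glVertex3f(-0.5f,  0.5f, z - 0.5f);
        glEnd();
    }

    glDisable(GL_BLEND);
    glDisable(GL_TEXTURE_3D);
}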

Figure 36 Illustration of DVR using 2D textures

8.4 Multidimensional visualization


Multidimensional visualization is an important subfield of data visualization.
Visual exploration of multidimensional data is of great interest in Statistics and
Information Visualization. Multidimensional volumetric data (also called multivariate
or n-dimensional) are often used for storing information in various fields of science
such as physics, engineering, metallurgy, astronomy, medical science, etc. because they
best match the character of the underlying phenomena. The term multidimensional
covers several types of data, the most frequently used being vector fields, tensor fields
and multi-scalar datasets. The information contained in such data is usually very dense
and thus difficult to understand without subsidiary tools. Multidimensional data can be


sorted according to various criteria. First, there is the domain over which the data are defined, which is usually two or three dimensional. Second, there is the dimension of the data values themselves, which is theoretically unlimited and depends on the application. Two or three dimensional vector fields are encountered most frequently, but fields of quadratic tensors are also quite common. It is, however, necessary to realize that the character of the data must be taken into account as well. Three
dimensional vectors need to be treated in a different way than a set of three scalar
values. The third important criterion is, whether the data vary in time. If so, they are
usually called time dependent. Otherwise, we speak of time independent data. Such a
variety of kinds of data implies an even larger variety of visualization techniques.
Multidimensional data visualization was studied separately by statisticians and
psychologists long before computer science was deemed a discipline. The appearance of
personal computers and workstations during the 1980’s breathed new life into graphical
analysis of multidimensional data. Scientists have studied multidimensional and
multivariate visualization since 1782 when Crome used point symbols to show the
geographical distribution in Europe of 56 commodities. In 1950, Gibson started the
research on visual texture perception. Later, Pickett and White proposed mapping data
sets onto artificial graphical objects composed of lines. This texture mapping work was
further investigated by Pickett, and was eventually computerized. Chernoff presented
his arrays of cartoon faces for multivariate data in 1973. In this well-known technique,
variables are mapped to the shape of the cartoon faces and their facial features including
nose, mouth, and eyes (see Figure 39). These faces are then displayed in a two
dimensional graph (Wong P. Ch., 1997).
Before we can start to visualize general multidimensional or multivariate data we need to assign a coordinate to each data point, i.e. we need to define x, y and z for each data point j. Typically we do this through some functions:

xj = x(d1j, d2j, ..., dmj)
yj = y(d1j, d2j, ..., dmj)
zj = z(d1j, d2j, ..., dmj)

After this has been done we can apply the same visualization techniques to this kind of data.
Multidimensional data consists of some number of points, each of which is
defined by an n-vector of values. Such data can be viewed as an m x n matrix, where


each row represents a data point and each column represents an observation (also called
variable or dimension). An observation may be scalar or vector, nominal or ordinal, and
may or may not have a distance metric, ordering relation, or absolute zero. Each
variable/dimension may be independent or dependent. The problem with
multidimensional data is that we only have three spatial dimensions available onto
which to map the attributes. In practice we are even limited to two dimensions, as three
dimensional visualization is tricky and usually only works well for the data that has an
intrinsic spatial structure. One common strategy to deal with that problem is
parallelization.
The general idea of parallelization is to subdivide the (two dimensional) space
into an appropriate number of sub-spaces. Each of the sub-spaces is then used to show a
two dimensional representation of a selected aspect of the data. By showing all these
sub-spaces in parallel and at the same time, correlations and patterns become evident
that go beyond two dimensions. One example of the parallelization strategy is the so-called scatterplot matrix shown in Figure 37.

Figure 37 Scatterplot matrix

This technique constructs a matrix of small scatter plots with all possible
combinations of pairs of attributes. The scatterplot matrix can also be regarded as an
instance of the rule of small multiples that we know from information design.


Figure 38 Parallel coordinates

Another technique shown in Figure 38 breaks the dilemma of flatland by using one axis for each attribute, but arranging them in parallel in the plane, instead of
orthogonal to each other as in the Cartesian way. An object in this parallel coordinate
system is then represented not by a point, but a polygonal line that is constructed by
connecting the values for all the attributes by a line between neighboring axes.
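The geometry behind a parallel coordinate plot can be sketched as follows (C++): every record becomes a polyline with one vertex per axis, placed at the normalised attribute value. The axis spacing, plot height and the per-attribute minimum and maximum vectors are assumptions of this illustration.

#include <cstddef>
#include <vector>

struct Point2D { float x, y; };

std::vector<Point2D> recordToPolyline(const std::vector<float>& record,
                                      const std::vector<float>& minValue,
                                      const std::vector<float>& maxValue,
                                      float axisSpacing, float plotHeight)
{
    std::vector<Point2D> polyline;
    for (std::size_t i = 0; i < record.size(); ++i) {
        // Normalise the i-th attribute to [0, 1] along its own axis.
        float t = (record[i] - minValue[i]) / (maxValue[i] - minValue[i]);
        polyline.push_back({ i * axisSpacing, t * plotHeight });
    }
    return polyline;   // drawn by connecting consecutive vertices with line segments
}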
Glyphs (also referred to as icons) are another technique. Glyphs are graphical
entities that convey one or more data values via attributes such as shape, size, color, and
position. They have been widely used in the visualization of data, and are especially
well suited for displaying complex, multivariate data sets. The placement or layout of
glyphs on a display can communicate significant information regarding the data values
themselves as well as relationships between data points, and a wide assortment of placement strategies has been developed to date. Methods range from simply using
data dimensions as positional attributes to basing placement on implicit or explicit
structure within the data set.
Individual dimensions (variables) for a given data point are mapped to attributes of a particular shape or symbol, and variations, clusterings, and anomalies among these graphical entities may be readily perceived. The particular mappings may also be customized to reflect semantics relevant to specific domains, which can greatly facilitate the interpretation process. Once a glyph has been designed and generated from a data entry, it must be placed at a location in display space (2D or 3D).


Figure 39 Examples of glyphs: profile glyphs, stars and Anderson/metro glyphs, sticks and trees, face glyphs, auto and box glyphs, arrows and weathervanes

The position attribute can be very effective in communicating data attributes or improving the detection of similarities, differences, clustering, outliers, or relations in
the data. Many strategies exist for setting the position attribute; some are based on the
raw data (or information derived from the data), while others are based on the structure
of the data set (e.g., ordered or hierarchical). Methods also vary as to whether they
allow overlapping glyphs, whether they are space-filling, or whether they employ empty
space to highlight differences. A glyph consists of a graphical entity with p components,
each of which may have r geometric attributes and s appearance attributes. Typical
geometric attributes include shape, size, orientation, position, and direction/magnitude
of motion, while appearance attributes include color, texture, and transparency.
Attributes can be discrete or continuous, scalar or vector, and may or may not have a
distance metric, ordering relation, or absolute zero.
The process of creating a glyph thus becomes one of mapping one or more data dimensions of a data point to one or more geometric and/or appearance attributes of one or more components of the graphical entity. A glyph may also contain components or attributes that are independent of the data being used in its formation. For example, in a scatterplot the size and color of the plotting symbol may be data-independent.
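As a small illustration of such a mapping, the fragment below binds the five dimensions of a hypothetical data point to the position, geometry and colour of a simple arrow glyph. The normalisation constants and the choice of attributes are assumptions of this sketch.

struct ArrowGlyph {
    float x, y;            // position on the display (from dimensions 0 and 1)
    float length, width;   // geometric attributes (from dimensions 2 and 3)
    float r, g, b;         // appearance attribute: colour (from dimension 4)
};

ArrowGlyph makeArrowGlyph(const float d[5])
{
    ArrowGlyph g;
    g.x = d[0];  g.y = d[1];                        // data-driven placement
    g.length = 5.0f + 20.0f * d[2];                 // d[2] assumed normalised to [0, 1]
    g.width  = 1.0f +  4.0f * d[3];                 // d[3] assumed normalised to [0, 1]
    g.r = d[4];  g.g = 0.2f;  g.b = 1.0f - d[4];    // simple colour ramp on d[4]
    return g;
}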
Figure 39 shows a variety of examples of glyphs that have been proposed and used in the past. Table 4 describes, in alphabetical order, glyphs from the literature (Ward


M.O., 1999) in terms of the graphical entities and attributes controlled by the
multivariate data point.

Table 4 Variations of glyphs


Graphical Entity: Attributes controlled by the data
Anderson/metroglyphs: Length of rays
Arrows: Length, width, taper, and color of base and head
Autoglyph: Color of boxes
Boids: Shape and orientation of primitives moving through time-varying field
Boxes: Height, width, depth of first box; height of successive boxes
Bugs: Wing shapes controlled by time series; length of head spikes (antennae); size and color of tail; size of body markings
Circular profiles: Distance from center to vertices at equal angles
Color icons: Colored lines across box
Dashtubes: Texture and opacity to convey vector field data
Faces: Size and position of eyes, nose, mouth; curvature of mouth; angle of eyebrows
Glyphmaker: User-controlled mappings
Hedgehogs: Spikes on vector field, with variation in orientation, thickness, and taper
Icon Modeling Language: Attributes of 2D contour and the parameters that extrude it to 3D and further transform/deform it
Polygons: Conveying local deformation in a vector field via orientation and shape changes
Procedural shapes: Blobby objects controlled by up to 14 dimensions
Profiles: Height and color of bars
Stars: Length of evenly spaced rays emanating from center
Stick figure icons: Length, angle, color of limbs
Trees: Length, thickness, angles of branches; branch structure derived from analyzing relations between dimensions
Weathervanes: Level in bulb, length of flags
Wheels: Time wheels create ring of time series plots, value controls distance from base ring; 3D wheel maps time to height, variable value to radius

Glyphs are not without limitations in the communication of multivariate data. Most mappings introduce biases in the process of interpreting relationships between
dimensions. Some relations are much easier to perceive (e.g., data dimensions mapped
to adjacent components) than others. There are also limitations based on the media
being used to communicate the information. Screen space and resolution are limited,
and displaying too many glyphs at once can lead to either overlaps or very small glyphs.
The large spectrum of users' needs has led to the development of numerous approaches for vector field visualization, which can be sorted into four main categories.
The first is direct flow visualization. Methods from this category use a
direct translation of the flow data into visualization cues, such as by drawing arrows.


Flow visualization solutions of this kind allow immediate investigation of the vector
data, without a lot of mental translation effort.

Figure 40 Direct visualization of a vector field

For a better illustration of the long-term behavior induced by flow dynamics, integration based approaches first integrate the flow data and use the resulting integral objects as a basis for visualization. Displaying streamlines, streamlets, streaklines, timelines etc. are good examples of integration based techniques. Line integral convolution (LIC) is another example of this approach. LIC is a texture-based integration technique for visualizing vector fields and has the advantage of being able to visualize large and detailed vector fields in a reasonable display area. Like the geometric visualization techniques, texture based algorithms utilize integral curves; instead of displaying them as individual geometric entities, however, a convolution with some kind of input texture is performed. The type of the input texture as well as the integral curve used then makes the difference between individual techniques. Texture based techniques therefore need two input structures: the vector field and the texture to convolve it with.
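The integration step behind streamlines can be sketched very simply, here with a first-order Euler scheme over a steady 2D field; sampleVector() is a hypothetical helper that interpolates the field at an arbitrary position, and practical systems usually use higher-order Runge-Kutta integration and additional stopping criteria.

#include <vector>

struct Vec2 { float x, y; };

Vec2 sampleVector(Vec2 p);   // bilinear interpolation of the vector field (assumed)

std::vector<Vec2> traceStreamline(Vec2 seed, float stepSize, int maxSteps)
{
    std::vector<Vec2> line { seed };
    Vec2 p = seed;
    for (int i = 0; i < maxSteps; ++i) {
        Vec2 v = sampleVector(p);
        p = { p.x + stepSize * v.x, p.y + stepSize * v.y };   // Euler step
        line.push_back(p);
    }
    return line;   // the polyline approximating the streamline through the seed point
}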


Figure 41 Integral objects as a basis for visualization: streamlines, streamlets with direction, and line integral convolution with texture

Another approach for visualizing flow data is the feature based approach, in
which an abstraction step is performed first. From the original data set, interesting
objects are extracted, such as important phenomena or topological information of the
flow. These flow features represent an abstraction of the data, and can be visualized
efficiently and without the presence of the original data, thus achieving a huge data
reduction, which makes this approach very suitable for large (time-dependent) data sets,
acquired from computational fluid dynamics simulations. These data sets are simply too
large to visualize directly, and therefore, a lot of time is required in preprocessing, for
computing the features (feature extraction). But once this preprocessing has been
performed, visualization can be done very quickly.
The general idea behind the fourth group of methods consists in deriving scalar quantities from the vector data first and then visualizing them via approaches like isosurface extraction or direct volume visualization.


9 Visualization Systems and Tools

A visualization system is not just a system to create an image of the data but can
be used to manipulate the data to create different types of images. A visualization system should link with the model of scientific investigation. Visualization can help form the
link between hypothesis and experiment and between insight and revised hypothesis.
The human visual perception system is very complex and must be taken into account in
the design and use of data visualization systems. Visualization systems include software
applications such as for example: LYMB, Iris Explorer, Data Explorer and AVS.
Systems are self-contained, often turn-key, offer great reuse, have an integrated user
interface, but they are an all-or-nothing approach. They do not readily embed within
another application framework.
Tools and Toolkits are more adaptable, allowing us to use only what we need, and
are generally independent of the user interface. Toolkits offer the possibility to develop
visualization applications using a variety of visualization techniques. Toolkits include
packages such as Iris Inventor, ISG’s IAP, and the Visualization Toolkit (VTK).

Figure 42 Basic data flow within a visualization system

There are specialized applications for a certain type of data visualization. Two
examples are:
• RasMol – a tool for viewing large molecules,

• Vis5D - a system for interactive visualization of large 5-D gridded data sets
such as those produced by numerical weather models.
The success of visualization not only depends on the results which it produces,
but also depends on the environment in which it has to be done. This environment is
determined by the available hardware, like graphical workstations, disk space, color


printers, video editing hardware, and network bandwidth, and by the visualization
software. For example, the graphical hardware imposes constraints on interactive speed
of visualization and on the size of the data sets which can be handled. Many different
problems encountered with visualization software must be taken into account. The user
interface, programming model, data input, and data output, data manipulation facilities,
and other related items are all important. The way in which these items are implemented
determines the convenience and effectiveness of the use of the software package.
Furthermore, whether the software supports distributed processing and computational steering must also be considered.
The lack of a clear reference visualization model makes it difficult to design
improved approaches in a systematic manner. The task of generating robust default
visualizations of data under exploratory or directed investigation, or automating the
production of such defaults and assisting the user to refine them, relies on having
standard representations that satisfy established or specified criteria for interpretation.
This task also points to the need for a reference model that is formalized.

9.1 Key requirements


Several researchers have pointed out key requirements for the visualization
systems.
For example Foley and Ribarsky suggested in 1994 a general visualization
environment in which all the modules are in a feedback loop that processes through the
user's brain to refine the visualization iteratively or to focus on certain aspects of the
data. Flexibility and integration with data analysis are two key components in the model
of Foley and Ribarsky. Another aspect that has received special attention refers to the
use of effective and efficient graphical representations capable of displaying high
dimensional data. Glyphmaker was a visualization system in the direction suggested by
the model of Foley and Ribarsky. Glyphmaker allows nonexpert users to customize
their own graphical representations using a simple glyph editor and a point-and-click
binding mechanism.
Lee and Grinstein in 1996 pointed out the need of integrating database and
visualization systems in order to provide more flexible and powerful data selection
support. In this integration, the visual nature of the database exploration is emphasized.
Users are also part of the feedback loop in the iterative process of data exploration.


Users perceive visual data representations and use them to guide the exploration
process. Exbase was a system for exploring databases developed on the basis of this
integration. It provided support for the visual exploration of databases by integrating
visualization techniques and a real database management system. The search for
structure inside the database was focused on manual mechanisms but it did not exclude
support for automatic discovery tools.
Goldstein, Roth and Mattis proposed a framework for interactive data
exploration. There are several key points in that framework. First, they pointed out that,
given the growth of data sets in both complexity and amount of information, the design
of graphic displays is requiring more expertise from users. Second, visualization
systems must provide explicit support for the data archaeology process. The data
archaeology process refers to the use of users' discovered relationships to guide the data
exploration process. This process, iterative and interactive in nature, was initiated and
controlled by people. Third, tools in the visualization system must be oriented to
provide support for the kind of subtasks usually involved in exploring large databases,
including explicit support for a dynamic specification of the user's information-seeking
goals. Visage, a user interface environment for exploring information, was based on
these principles and on a new paradigm for user interfaces that they have called an
information-centric approach.
Robertson and De Ferrari proposed a reference model for data visualization
within which the scope and limitations of existing tools and systems can be identified.
In this model, the user is again part of the exploration data analysis loop. The end goal
is the automatic generation of visual representations given a description of all the
important data characteristics and the user's interpretation aims. User interpretation aims
refers to the specification of characteristics of the data or relations between data
variables the user was interested in analyzing by means of visual representations. There were four basic components in the Robertson and De Ferrari model (see Figure 43):
1. The data model.
2. Visualization specification.
3. Visualization representation.
4. The matching procedure.


The model also specified desired requirements for all of these components. The
data model should be precise and functionally-rich with the data structure and the data
fields explicitly specified. The visualization specification should support both user
directives (requirements explicitly defined by the user) and user interpretation aims
(requirements implicitly defined by user specified criteria). In reference to the visual
representation, Robertson and De Ferrari suggested that many possible visual
representations should be described based on some criteria. At that time there were two approaches for describing visual representations: the bottom-up or
expressiveness/effectiveness approach defined by Mackinlay and extended by other
researchers, and the top-down approach defined by Robertson which is based on scene
properties and criteria for matching scene properties to data variables. The matching
procedure refers to mechanisms for encoding and decoding information. Robertson and
De Ferrari suggested that for each visualization technique those mechanisms must be
explicitly stated by giving a formal model which describes the encoding mechanism and
explicitly states what decoding capabilities it assumes.

Figure 43 An integrated visualization model (Owen G. S., 1999)

9.2 The iconographic technique


The iconographic technique intends to exploit the texture perception mechanism
of the human visual system. The Exvis (Exploratory Visualization) visualization system


developed at the University of Massachusetts at Lowell used icons as basic primitives.


The visual attributes of icons were driven by the data. Exvis offered several icons to
users. Each of these icons had several visual attributes. By representing each record or
row of a multidimensional database as an icon whose features (e.g. color, geometry,
position) were under the control of the various fields of the record (dimensions), and by
placing those icons densely enough in the display, a textural representation of the
database was obtained. Exvis was a static visualization system in which users'
interactions were basically limited to examining the actual values associated with the
icons (in a one by one style). Users could also change the association between fields in
the record and icon visual attributes, but these changes were implemented in a batch
style. Most of the work developed with the iconographic technique had been based on
the stick figure icon. The stick figure icon consisted of five connected segments, called
"limbs". One of the segments, called the "body", serves as a reference for the various
geometric transformations that an icon could undergo. Each limb had three parameters
that could be bound to data attributes: the angle, intensity, and length. Figure 44
shows one member of the family of stick figure icons. The figure shows the icon in its
base configuration (theoretical icon, no data mapped to limb's parameters) and also a
sample icon (data mapped to limbs' parameters).

Figure 44 One member of the stick-figure icon family

The L-systems-based visualization prototype that Pinkney developed is a descendant of the Exvis system. It used the stick figure and color icons, but these icons
were formally represented using Open L-systems grammar notation. The grammar
consists of an axiom, the initial string, and a variable number of productions (rewriting
rules). Each icon was modeled by setting an initial string and a set of rules defining the
way in which the initial string will evolve along the grammar derivation process. The L-
systems grammar representation was also used to represent interaction techniques. One


of the advantages of open L-systems is that grammars are able to invoke external
modules. Pinkney uses this facility to alter either the derived string or the course of
derivation itself in order to implement different behaviors of icons. Figure 45 shows an
overview of the L-systems-based visualization system developed by Pinkney. As seen
in this figure, interactions are implemented in one of two places: at the level of the
grammar, by including productions defining the desired behavior; or at the level of the
string, by changing values of the derived string.

Figure 45 An overview of the L-systems-based iconographic visualization system

9.3 Using glyphs for visualization


Glyphmaker is an exploratory tool developed at The Georgia Institute of
Technology. It was created in 1994 as a prototype environment and provided a general
approach to data visualization and analysis based on the use of glyphs as graphical
elements for representing data. Glyphmaker used glyphs in order to exploit the ability of
the human eye-brain system to identify finely resolved spatial relationships and
differences in shape. The idea was to bind visual attributes of the glyphs, such as
position, size, shape, color, orientation, etc. to data attributes in order to get a glyph-
based representation of the data. Glyphmaker provides a flexible environment in which
even non-expert users could create customized visualizations using their own graphical
representations (glyphs). Three main components provided the functionality required
for such purposes: the read module, the glyph editor module, and the glyph binder
module. The read module is able to read a variety of data input formats and convert
them to the format required by other modules. The glyph editor module provides a
library of basic glyphs (points, lines, spheres, cuboids, cylinders, cones, and arrows) that
can be used in combination to develop other variations. The glyph editor provides a 3D
graphical environment in which users not only can develop new glyphs but also see how
they will look in the finished visualization. Visual attributes of the glyph were


associated with data attributes. Glyphmaker is an interactive system for data visualization and analysis that is built in a dataflow environment and provides a quite
general platform for user-defined visualizations, one that can be extended in many ways
and that, in particular, offers the first stage for an integrated visualization/analysis
system. It is well-suited for the exploration and analysis of multivariate, highly
correlated 3D data. This is both because of the glyph-based visual representations and
the interactivity that are built into the system. Glyphmaker allows the user to: (a) design
glyphs by assembling them from different shapes; (b) choose binding sites on these
shapes; (c) bind user data attributes to glyph elements; (d) conditionally isolate certain
variable ranges; (e) interactively view, explore, remove, or rebind the resulting glyph
sets as solid 3D objects. Each of these capabilities has a graphical user interface and is
built into Glyphmaker in a modular form. The Glyphmaker approach of arranging data
into data objects that are then bound to 3D graphical objects (glyphs) or manipulated by
interactive tools is quite extensible in terms of visual representations and analysis
methods. Other visualization system with data flow approach are SGI Explorer, IBM
Explorer, Khoros, apE.

Figure 46 A general data visualization environment

IVEE (Information Visualization and Exploration Environment) was a system for automatic creation of dynamic query applications.
Visage is a system for exploring information whose user interface is based on a
new paradigm that authors have called an "information-centric paradigm". Information
objects, represented as first class interface objects, could be manipulated using a
common set of basic operations universally applicable across the environment. The


system includes over 500 classes ranging from visualization and graphics to Xlib and
Motif user interface. Objects are created using compiled C and interact through an
interpreted scripting language.

9.4 Interaction techniques


Several approaches have been used for implementing interactions in
visualization systems. Chuah et al. defined two categories of these techniques: those
following a spatial metaphor, and those following an object metaphor. Five dimensions
or features are considered in the classification that Chuah defined: method of control,
targets of control, method of maintaining context, control operations, and scope of
operations. Method of control refers to how users control changes in the visualization.
Targets of control refer to the type of target entities the user can control. Method of
maintaining context refers to mechanisms that interaction techniques provide in order to
keep users informed of their relative position with respect to the whole set. Control
operations refer to the operations allowed over the target of control. Scope of operations
refers to elements affected by a user action.
Dynamic querying is a direct manipulation technique that allows users to
formulate queries to databases by manipulating graphical objects (typically sliders).
Even users with little knowledge about the logical structure of the database are able to
formulate complex queries. Dynamic querying is still powerful enough to be used by
advanced users with a deep knowledge about the database structure. One of the things
that make dynamic querying a powerful tool for exploring databases is that it not only
allows users to graphically formulate queries, but it also provides a graphical
representation of the database and the query result.
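The filtering step behind dynamic queries can be sketched as follows; the record layout (one row of floats per record, one slider range per attribute) is an assumption of this illustration, and a real system would re-evaluate the filter and redraw continuously while a slider is dragged.

#include <cstddef>
#include <vector>

struct Range { float low, high; };    // one range per slider/attribute

std::vector<int> dynamicQuery(const std::vector<std::vector<float>>& records,
                              const std::vector<Range>& sliders)
{
    std::vector<int> visible;
    for (std::size_t r = 0; r < records.size(); ++r) {
        bool match = true;
        for (std::size_t a = 0; a < sliders.size() && match; ++a)
            match = records[r][a] >= sliders[a].low &&
                    records[r][a] <= sliders[a].high;
        if (match) visible.push_back(static_cast<int>(r));    // record passes all sliders
    }
    return visible;   // indices of records to display or highlight
}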
A number of other interaction techniques are used in the visualization system development process, such as Movable Filters as a User Interface Tool, Dynamic Queries, Selective Dynamic Manipulation and so on.

9.5 Contemporary Visualization Systems


Many modern visualization systems are designed as so called modular
visualization environments (MVE) with data flow architecture, visual programming
capabilities and API (e. g. C++) for adding new modules. Examples of such commercial
products are: AVS, COVISE and open source products OpenDX, VTK/Paraview,
SCIRun. Other (non MVE) visualization systems are for example: commercial - Amira


(Zuse Institute Berlin), SimVis (VRVis Research Vienna), EnSight (CEI, originally by
Cray), TecPlot (TecPlot Inc.) and open source - VisIt (Lawrence Livermore National
Laboratory).
Components of an MVE are:
• Visual programming editor;

• Modules:

− Typical categories:
− Input (reading, generating data);
− Filters (mapping to the same data type);
− Mappers (mapping to a different data type);
− Output (3D graphics, image, or file);
− Module libraries:
− ordered by category, author, etc.;
− users' community contributed modules;
• UI widgets (parameters, status, viewers, etc.)
Grid computing presents a further challenge for visualization system designers. Grid-enabled visualization will provide a transparent interconnect fabric linking data sources, computing (visualization) resources and users into widely distributed Virtual Organizations for the purpose of tackling increasingly complex problems.

9.5.1 Application Visualization System (AVS)


AVS is a software system for developing interactive scientific visualization with
a minimum of programming effort. It is a commercial product from Advanced Visual
Systems Inc. (originally Stardent Computer, Inc.). Application visualization system is
an application framework targeted at scientists and engineers. The goal of the system is
to make applications that combine interactive graphics and high computational
requirements easier to develop for both programmers and nonprogrammers. AVS is
designed around the concept of software building blocks, or modules, which can be
interconnected to form visualization applications. AVS allows flow networks of existing
modules to be constructed using a direct-manipulation user interface, and it
automatically generates a simple user interface to each module.


The main product editions are AVS5, AVS/Express and AVS/Powerviz. AVS5
consists of a comprehensive suite of data visualization and analysis techniques that
incorporate both traditional visualization tools (such as 2D plots and graphs and image
processing) as well as advanced tools (such as 3D interactive rendering and volume
visualization). It is available for Unix/Linux and Mac OS X platforms. AVS/Express is a
visualization toolkit which supports a visual programming interface. The AVS/Express
interface provides tools to import data, build a visualization, interact with the display
and generate images and animations. AVS/PowerViz is a comprehensive solution that
enables real-time businesses to better manage their critical networks through a
customizable Web portal that integrates data and applications from across the entire
corporate enterprise into a single, graphically enhanced real-time management tool.

Figure 47 Visual interface of AVS/Express


Figure 48 AVS/Express application

9.5.2 General Visualization System (GVS)


The primary purpose of GVS is to support scientific visualization of data output
by the panel method PMARC_12 on the Silicon Graphics Iris computer with IRIX.
GVS allows the user to view PMARC geometries and wakes as wire frames or as light
shaded objects. Additionally, geometries can be color shaded according to phenomena
such as pressure coefficient or velocity. Screen objects can be interactively translated
and/or rotated to permit easy viewing. Keyframe animation is also available for
studying unsteady cases.
The purpose of scientific visualization is to allow the investigator to gain insight
into the phenomena they are examining; therefore GVS emphasizes analysis, not artistic
quality. GVS uses existing IRIX 4.0 image processing tools to allow for conversion of
SGI RGB files to other formats. GVS is a self-contained program which contains all the
necessary interfaces to control interaction with PMARC data.
This includes:
• GVS Tool Box, which supports color histogram analysis, lighting control,
rendering control, animation, and positioning,

• GVS on-line help, which allows the user to access control elements and get
information about each control simultaneously, and


• a limited set of basic GVS data conversion filters, which allows for the display
of data requiring simpler data formats.
Specialized controls for handling PMARC data include animation and wakes,
and visualization of off-body scan volumes.

9.5.3 COVISE
COVISE stands for COllaborative VIsualization and Simulation Environment.
The product is developed at The High Performance Computing Center Stuttgart (HLRS)
of the University of Stuttgart. The company Visenso GmbH sells the commercial
version of COVISE. It is an extendable distributed software environment to integrate
simulations, postprocessing and visualization functionalities. From the beginning
COVISE was designed for collaborative work, allowing engineers and scientists spread across a network infrastructure to cooperate. In COVISE an application is divided into several
processing steps, which are represented by COVISE modules. These modules, being
implemented as separate processes, can be arbitrarily spread across different
heterogeneous machine platforms. COVISE rendering modules support virtual
environments ranging from workbenches over powerwalls and curved screens up to full
domes or CAVEs. The users can analyze datasets intuitively in a fully immersive
environment through state of the art visualization techniques including volume
rendering and fast sphere rendering. Physical prototypes or experiments can be included
into the analysis process through augmented reality techniques.

9.5.4 OpenDX
OpenDX is a uniquely powerful, full-featured software package for the
visualization of scientific, engineering and analytical data. Its open system design is built on a standard interface environment (OSF/Motif and the X Window System), and its sophisticated data model provides users with great flexibility in creating visualizations. The current version
supports software-rendered images on 8-, 12-, 16-, 24-, and 32-bit windows. OpenDX is
based on IBM’s Visualization Data Explorer. One of the most distinctive characteristics
of OpenDX is its object-oriented, self-describing Data Model. The DX data model and
the many available filters for third party data formats enable users to quickly import disparate
data sets, in most cases, without changing the way the original data is organized. The
currently supported platforms in version 4.4 of OpenDX (binaries) are: Irix 6.5, HP-UX


11.22 (Itanium), HP-UX 11.11 (PA RISC), Redhat Linux-FC 4.0 and FC 5.0 ix86,
Solaris 10 Sparc and ix86, Windows 2000, XP, 2003 (this version of OpenDX still requires an X-Server to be running on the local machine). Some commercial versions for
Mac OS X are also available.

Figure 49 Visual interface of Open/DX

Figure 50 View from Open/DX


9.5.5 ParaView (Parallel Visualization Application)


ParaView is an open-source, multi-platform application designed to visualize
data sets of size varying from small to very large. ParaView runs on distributed and
shared memory parallel as well as single processor systems and has been tested on
Windows, Mac OS X, Linux and various Unix workstations, clusters and
supercomputers. ParaView uses the Visualization Toolkit as the data processing and
rendering engine and has a user interface written using Qt.

9.5.6 SCIRun
SCIRun is more properly considered a problem solving environment (PSE)
framework upon which application specific PSEs are built. Each specific PSE is a
package with SCIRun. PSEs use and build upon data types, algorithms, and modules
provided by the SCIRun framework. PSEs provide application specific data types, algorithms, and modules. SCIRun is a completely open source product.
SCIRun is a modular dataflow programming PSE. SCIRun has a set of modules
that perform specific functions on a data stream. Each module reads data from its input
ports, performs calculations on the data, and sends new data through its output ports. In SCIRun, a module is represented by a rectangular box on the Network Editor canvas. Data flowing between modules is represented by pipes connecting the modules. A group of connected modules is called a Dataflow Network, or Net.
The most frequently used SCIRun module is the ViewScene module, which
displays interactive graphical output to the computer screen. ViewScene is used any
time the user wants to see a geometry or spatial data. The ViewScene module also
provides access to many simulation parameters and controls, and indirectly initiates new
iterations of simulation steps, which is important for computational steering (see
Computational Steering). Multiple ViewScene windows can be created. Each window is
independent of the others.
Another example is SCIRun/BioPSE, a Problem Solving Environment for BioMedical Applications.
SCIRun consists of packages, which are collections of modules organized by
category. Because SCIRun core is required, it is not technically a package. Like a
package, SCIRun core provides a set of datatypes, algorithms, and modules. Unlike


packages, SCIRun core is required and SCIRun would not function without it. Its
modules are divided into the following categories:
• Bundle

• ChangeFieldData

• ChangeMesh

• Converters

• DataArrayMath

• DataIO

• Examples

• Math

• MiscField

• NewField

• Render

• String

• Visualization

9.5.7 WebWinds
WebWinds is a visualization program developed originally by the Jet Propulsion
Lab. WebWinds is freely available software that allows atmospheric scientists,
educators, students and the general public to quickly and easily visualize and analyze
data on many of the computer platforms in use today
(http://www.openchannelsoftware.com/projects/WebWinds). WebWinds is written in
Java and able to ingest files from local disk or the WWW. It is designed to eventually
be distributed over the Internet and operate outside of WWW browsers entirely,
allowing fewer restrictions as to where data and applications will be required to be
stored.
WebWinds is an interactive science data visualization system available for all
major computer platforms. Its use does not require any user programming experience
since sessions are created by assembling components on the screen via 'point and click'.
WebWinds is modular, allowing flexibility in tool construction and application. It allows


internet-based distributed processing for the first time so that it can fulfill the needs of
data providers as well as data consumers.
Because it is written in Java, WebWinds is modular, allowing flexibility in tool
construction and application. It is also largely platform and operating system
independent so that it functions efficiently in today's heterogeneous environment.
WebWinds is also object-oriented, but the objects are provided with a complete
interface and a visual programming approach was adopted.

9.6 OpenGL
OpenGL is strictly defined as a software interface to graphics hardware. In
essence, it is a 3D graphics and modeling library that is extremely portable and very
fast. It uses algorithms developed and optimized by Silicon Graphics, Inc. (SGI).
Generic (software only) implementations of OpenGL are also possible; Microsoft's implementation of OpenGL falls into this category. The forerunner of OpenGL was GL
from SGI.
According to the OpenGL data sheet, OpenGL is an industry standard, stable,
reliable and portable, evolving, scalable, easy to use and well-documented 3D graphics
API. As a software interface for graphics hardware, OpenGL's main purpose is to render
two- and three-dimensional objects into a frame buffer. These objects are described as
sequences of vertices (which define geometric objects) or pixels (which define images).
OpenGL performs several processing steps on this data to convert it to pixels, to form
the final desired image in the frame buffer.
The OpenGL specification was managed by an independent consortium, the
OpenGL Architecture Review Board (ARB), formed in 1992; among its members were SGI (Silicon Graphics) and Microsoft. In the fall of 2006, the ARB and the Khronos Board
of Directors voted to transfer control of the OpenGL API standard to the Khronos
Group.
OpenGL is available in a variety of systems. Additions to the specification
(through extensions) are well controlled by the consortium and proposed updates are
announced in time for developers to adopt changes. Backwards compatibility is also
ensured.
OpenGL is reliable as all applications based on OpenGL produce consistent
visual display results on any OpenGL API compliant hardware. Portability is also a fact


as OpenGL is available in a variety of systems, such as PCs, Macintoshes, Silicon Graphics and UNIX based machines and so on. OpenGL is also available in different language bindings, some of them being C and C++, Java and FORTRAN.
OpenGL is evolving through its extensions mechanism that allows new hardware
innovations to be accessible to the API, as soon as the developers have the hardware
(and the extension) ready. OpenGL is also scalable as it can run on a variety of
computers, from ‘simple’ home systems to workstations and supercomputers. This is
achieved through OpenGL’s hardware capabilities inquiry mechanism. OpenGL is well
structured with logical commands (a few hundred). OpenGL also encapsulates
information about the underlying hardware, freeing the application developer from
having to design hardware specific code.
To the programmer OpenGL is a set of commands. Firstly he opens a window in
the frame buffer into which the program will draw. After this some calls are made to
establish a GL context. When this is done, the programmer is free to use the OpenGL
commands to describe 2D and 3D objects and alter their appearance or change their
attributes (or state). The programmer is also free to manipulate directly the frame buffer
with calls like read and write pixels.
OpenGL is more precise than e.g. DirectX. GKS (and XGKS) or PHIGS (and
PEX) have been on the market for years but are not as platform independent as OpenGL. OpenGL
is also free. And for scientific visualization, virtual environments, CAD/CAM/CAE,
medical imaging and so on precision and platform independence are the key features.
OpenGL provides the user with fairly direct control over the fundamental
operations of two- and three-dimensional graphics. This includes specification of such
parameters as transformation matrices, lighting equation coefficients, antialiasing
methods, and pixel update operators. However, it doesn't provide the user with a means
for describing or modeling complex geometric objects. Thus, the OpenGL commands
specify how a certain result should be produced (what procedure should be followed)
rather than what exactly that result should look like. That is, OpenGL is fundamentally
procedural rather than descriptive. Because of this procedural nature, it helps to know
how OpenGL works - the order in which it carries out its operations, for example - in
order to fully understand how to use it.
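
For instance, the following fragment (a sketch for illustration, not code from this text) is purely procedural: it specifies a transformation matrix and a lighting equation coefficient as state, and OpenGL then applies that state to whatever is drawn afterwards.

glMatrixMode(GL_MODELVIEW);                    /* select the matrix stack to modify */
glLoadIdentity();
glRotatef(30.0f, 0.0f, 0.0f, 1.0f);            /* specify a transformation (rotation about z) */

GLfloat ambient[4] = { 0.2f, 0.2f, 0.2f, 1.0f };
glLightfv(GL_LIGHT0, GL_AMBIENT, ambient);     /* specify a lighting equation coefficient */
glEnable(GL_LIGHTING);
glEnable(GL_LIGHT0);
/* subsequent drawing commands are transformed and lit using this state */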

9.6.1 OpenGL as a state machine


OpenGL is a state machine. We put it into various states (or modes) that then
remain in effect until we change them. The current color is a state variable. We can set
the current color to white, red, or any other color, and thereafter every object is drawn
with that color until we set the current color to something else. The current color is only
one of many state variables that OpenGL maintains. Others control such things as the
current viewing and projection transformations, line and polygon stipple patterns,
polygon drawing modes, pixel-packing conventions, positions and characteristics of
lights, and material properties of the objects being drawn. Many state variables refer to
modes that are enabled or disabled with the command glEnable() or glDisable(). Each
state variable or mode has a default value, and at any point we can query the system for
each variable's current value.
Although we can draw complex and interesting pictures using OpenGL, they're
all constructed from a small number of primitive graphical items. At the highest level of
abstraction, there are three basic drawing operations: clearing the window, drawing a
geometric object, and drawing a raster object. With OpenGL, every time we issue a
drawing command, the specified object is drawn. This might seem obvious, but in some
systems, we first make a list of things to draw. When the list is complete, we tell the
graphics hardware to draw the items in the list. The first style is called immediate-mode
graphics and is the default OpenGL style. In addition to using immediate mode, we can
choose to save some commands in a list (called a display list) for later drawing.
Immediate-mode graphics are typically easier to program, but display lists are often
more efficient.
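
As a rough sketch of these ideas (assuming a window and a GL context already exist), the fragment below sets a state variable, toggles a mode with glEnable(), compiles a display list once, and then mixes the list with immediate-mode drawing.

GLuint listId;

void buildList(void)
{
    listId = glGenLists(1);               /* reserve one display list name */
    glNewList(listId, GL_COMPILE);        /* record the commands instead of drawing them */
        glBegin(GL_TRIANGLES);
            glVertex2f(0.0f, 0.0f);
            glVertex2f(1.0f, 0.0f);
            glVertex2f(0.5f, 1.0f);
        glEnd();
    glEndList();
}

void drawScene(void)
{
    glClear(GL_COLOR_BUFFER_BIT);         /* basic drawing operation: clear the window */
    glColor3f(1.0f, 0.0f, 0.0f);          /* state variable: current color is now red */
    glEnable(GL_LINE_SMOOTH);             /* a mode switched with glEnable()/glDisable() */
    glCallList(listId);                   /* replay the stored commands (display list) */
    glBegin(GL_POINTS);                   /* immediate mode: drawn as soon as it is issued */
        glVertex2f(0.5f, 0.5f);
    glEnd();
}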

9.6.2 Execution model and processing pipeline


The model for interpretation of OpenGL commands is client-server. An
application (the client) issues commands, which are interpreted and processed by
OpenGL (the server). The server may or may not operate on the same computer as the
client. In this sense, OpenGL is network-transparent. A server can maintain several GL
contexts, each of which is an encapsulated GL state. A client can connect to any one of
these contexts. The required network protocol can be implemented by augmenting an
already existing protocol (such as that of the X Window System) or by using an
independent protocol. No OpenGL commands are provided for obtaining user input,
window management or user interaction.

Figure 51 OpenGL block diagram

Figure 51 gives an abstract, high-level block diagram of how OpenGL
processes data. In the diagram, commands enter from the left and proceed through what
can be thought of as a processing pipeline. Some commands specify geometric objects
to be drawn, and others control how the objects are handled during the various
processing stages. Rather than having all commands proceed immediately through the
pipeline, the user can choose to accumulate some of them in a display list for processing
at a later time. The evaluator stage of processing provides an efficient means for
approximating curve and surface geometry by evaluating polynomial functions of
input values. Rasterization produces a series of frame buffer addresses and associated
values using a two-dimensional description of a point, line segment, or polygon. Each
fragment so produced is fed into the last stage, per-fragment operations, which perform
the final operations on the data before it's stored as pixels in the frame buffer. These
operations include conditional updates to the frame buffer based on incoming and
previously stored z-values (for z-buffering) and blending of incoming pixel colors with
stored colors, as well as masking and other logical operations on pixel values. Input data
can be in the form of pixels rather than vertices. Pixel data skips the first stage of
processing and instead is processed as pixels, in the pixel operations stage. The result of
this stage is either stored in texture memory, for use in the rasterization stage, or
rasterized, with the resulting fragments merged into the frame buffer just as if they were
generated from geometric data.
The diagram in Figure 52 details the OpenGL processing pipeline. For most of
the pipeline, we can see three vertical arrows between the major stages. These arrows
represent vertices and the two primary types of data that can be associated with vertices:
color values and texture coordinates. Also note that vertices are assembled into
primitives, then into fragments, and finally into pixels in the frame buffer.
Many OpenGL functions are variations of each other, differing mostly in the data
types of their arguments. Some functions differ in the number of related arguments and
whether those arguments can be specified as a vector or must be specified separately in
a list. For example, if we use the glVertex2f function, we need to supply x- and y-
coordinates as 32-bit floating-point numbers; with glVertex3sv, we must supply an
array of three short (16-bit) integer values for x, y, and z.
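
As a small illustrative sketch (not taken from the OpenGL specification), the calls below all specify a vertex at or near (1, 2, 3); only the number of coordinates, their data type, and the vector versus separate-argument form differ.

GLfloat vf[3] = { 1.0f, 2.0f, 3.0f };
GLshort vs[3] = { 1, 2, 3 };

glBegin(GL_POINTS);
    glVertex3f(1.0f, 2.0f, 3.0f);   /* three 32-bit floats, given separately */
    glVertex3fv(vf);                /* the same values passed as a vector (array) */
    glVertex3sv(vs);                /* three 16-bit short integers, as a vector */
    glVertex2f(1.0f, 2.0f);         /* two floats; the z coordinate defaults to 0 */
glEnd();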

Figure 52 OpenGL processing pipeline


9.6.3 Related libraries and utilities


OpenGL provides a powerful but primitive set of rendering commands, and all
higher-level drawing must be done in terms of these commands. Also, OpenGL
programs have to use the underlying mechanisms of the windowing system. A number
of libraries exist to simplify these programming tasks, including the following:
• The OpenGL Utility Library (GLU) contains several routines that use lower-
level OpenGL commands to perform such tasks as setting up matrices for
specific viewing orientations and projections, performing polygon tessellation,
and rendering surfaces. This is a set of functions to create texture mipmaps
from a base image, map coordinates between screen and object space, and
draw quadric surfaces and NURBS. This library is provided as part of every
OpenGL implementation. GLU routines use the prefix glu.

• For every window system, there is a library that extends the functionality of
that window system to support OpenGL rendering. For machines that use the
X Window System, the OpenGL Extension to the X Window System (GLX) is
provided as an adjunct to OpenGL. GLX routines use the prefix glX. For
Microsoft Windows, the WGL routines provide the Windows to OpenGL
interface. All WGL routines use the prefix wgl.

• The OpenGL Utility Toolkit (GLUT) is a window system-independent toolkit,
written by Mark Kilgard, to hide the complexities of differing window system
APIs. It implements a simple windowing application programming interface
for OpenGL. GLUT makes it considerably easier to learn about and explore
OpenGL programming and provides a portable API, so we can write a single
OpenGL program that works on Win32 PCs, Mac OS, and Linux/UNIX
workstations. GLUT routines use the prefix glut. (A short example combining
GLUT and GLU is sketched after this list.)

• The OpenGL Character Renderer (GLC) is a platform-independent character
renderer that is convenient to use for simple applications, can scale and rotate
text and draw text using lines, filled triangles, or bitmaps, and supports
international characters. It offers many advantages over GLX and WGL.

• QuesoGLC is a free implementation of the OpenGL Character Renderer
(GLC). QuesoGLC is based on the FreeType library, provides Unicode
support and is designed to be easily ported to any platform that supports both
FreeType and the OpenGL API.

• The OpenGL Stream Codec (GLS) is a facility for encoding and decoding
streams of 8-bit bytes that represent sequences of OpenGL commands.

• Open Inventor is an object-oriented toolkit based on OpenGL which provides
objects and methods for creating interactive 3D graphics applications. Open
Inventor, which is written in C++, provides prebuilt objects and a built-in
event model for user interaction, high-level application components for
creating and editing 3D scenes, and the ability to print objects and exchange
data in other graphics formats. Open Inventor is separate from OpenGL.

• Mesa is a 3-D graphics library with an API which is very similar to that of
OpenGL. Mesa is an open-source implementation of the OpenGL specification
- a system for rendering interactive 3D graphics.
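
As an illustration of how GLUT and GLU fit together, the sketch below opens a window, sets a perspective projection with GLU, and draws a GLU quadric sphere. It is a minimal example written against the classic GLUT API under stated assumptions (header location, default lighting left off), not code from this textbook.

#include <GL/glut.h>   /* on most platforms glut.h also pulls in gl.h and glu.h */

static GLUquadric *quad;

static void display(void)
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    gluLookAt(0.0, 0.0, 5.0,  0.0, 0.0, 0.0,  0.0, 1.0, 0.0);  /* GLU viewing helper */
    gluSphere(quad, 1.0, 32, 32);                               /* GLU quadric surface */
    glutSwapBuffers();
}

static void reshape(int w, int h)
{
    glViewport(0, 0, w, h);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective(45.0, (double)w / (double)h, 0.1, 100.0);    /* GLU projection helper */
}

int main(int argc, char **argv)
{
    glutInit(&argc, argv);                                      /* GLUT hides the window system */
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH);
    glutInitWindowSize(400, 400);
    glutCreateWindow("GLU sphere via GLUT");
    glEnable(GL_DEPTH_TEST);
    quad = gluNewQuadric();
    glutDisplayFunc(display);
    glutReshapeFunc(reshape);
    glutMainLoop();                                             /* GLUT runs the event loop */
    return 0;
}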

9.6.4 OpenGL for Windows


Microsoft's implementation of OpenGL in Windows XP/NT/2000 and Windows
95/98 includes the following components:
• The full set of current OpenGL commands.
OpenGL contains a library of core functions for 3-D graphics operations. These
basic functions are used to manage object shape description, matrix transformation,
lighting, coloring, texture, clipping, bitmaps, fog, and antialiasing. The names for these
core functions have a "gl" prefix. Many of the OpenGL commands have several
variants, which differ in the number and type of their parameters. Counting all the
variants, there are more than 300 OpenGL commands.
• The OpenGL Utility (GLU) library.
This library of auxiliary functions complements the core OpenGL functions. The
commands manage texture support, coordinate transformation, polygon tessellation,
rendering spheres, cylinders and disks, NURBS (Non-Uniform Rational B-Spline)
curves and surfaces, and error handling.
• The OpenGL Programming Guide Auxiliary library.
This is a simple, platform-independent library of functions for managing
windows, handling input events, drawing classic 3-D objects, managing a background
process, and running a program. The window management and input routines provide a
base level of functionality with which we can quickly get started programming in
OpenGL. Do not use them, however, in a production application. Here are some reasons
for this warning:
− The message loop is in the library code.
− There is no way to add handlers for additional WM_* messages.
− There is very little support for logical palettes.
• The WGL functions.
This set of functions connects OpenGL to the Windows XP/NT/2000 and
Windows 95/98 windowing system. The functions manage rendering contexts, display
lists, extension functions, and font bitmaps. The WGL functions are analogous to the
GLX extensions that connect OpenGL to the X Window System. The names for these
functions have a "wgl" prefix.
• New Win32 functions for pixel formats and double buffering.
These functions support per-window pixel formats and double buffering (for
smooth image changes) of windows. These new functions apply only to OpenGL
graphics windows.
The functions and routines of the Win32 library are necessary to initialize the
pixel format and control rendering for OpenGL. Some routines, which are prefixed by
wgl, extend Win32 so that OpenGL can be fully supported. For Win32/WGL, the
PIXELFORMATDESCRIPTOR is the key data structure to maintain pixel format
information about the OpenGL window. A variable of data type
PIXELFORMATDESCRIPTOR keeps track of pixel information, including pixel type
(RGBA or color index), single or double buffering, color resolution, and the presence
of depth, stencil, and accumulation buffers. More information about WGL is available
through the Microsoft Developer Network Library.
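
The typical setup sequence is sketched below; this is an illustrative fragment only (error handling omitted), in which hWnd is assumed to be the handle of an already created window.

HDC   hDC = GetDC(hWnd);                  /* hWnd: handle of an existing window (assumed) */
HGLRC hRC;
PIXELFORMATDESCRIPTOR pfd;
int   pixelFormat;

ZeroMemory(&pfd, sizeof(pfd));
pfd.nSize      = sizeof(PIXELFORMATDESCRIPTOR);
pfd.nVersion   = 1;
pfd.dwFlags    = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER;
pfd.iPixelType = PFD_TYPE_RGBA;           /* RGBA rather than color-index mode */
pfd.cColorBits = 24;
pfd.cDepthBits = 16;                      /* request a depth (z) buffer */

pixelFormat = ChoosePixelFormat(hDC, &pfd);   /* Win32 per-window pixel format functions */
SetPixelFormat(hDC, pixelFormat, &pfd);

hRC = wglCreateContext(hDC);              /* WGL: create and activate a rendering context */
wglMakeCurrent(hDC, hRC);

/* ... OpenGL rendering calls, then SwapBuffers(hDC) for double buffering ... */

wglMakeCurrent(NULL, NULL);               /* release and delete the context when done */
wglDeleteContext(hRC);
ReleaseDC(hWnd, hDC);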
The generic implementation from Microsoft has the following limitations:
• Printing.
We can print an OpenGL image directly to a printer using metafiles only.
• OpenGL and GDI graphics cannot be mixed in a double-buffered window.

An application can directly draw both OpenGL graphics and GDI graphics into a
single-buffered window, but not into a double-buffered window.
• There are no per-window hardware color palettes.
Windows has a single system hardware color palette, which applies to the whole
screen. An OpenGL window cannot have its own hardware palette, but can have its own
logical palette. To do so, it must become a palette-aware application.
• There is no direct support for the Clipboard, dynamic data exchange (DDE), or
OLE.
A window with OpenGL graphics does not directly support these Windows
capabilities.
• The Inventor 2.0 C++ class library is not included.
The Inventor class library, built on top of OpenGL, provides higher-level
constructs for programming 3-D graphics. It is not included in the current version of
Microsoft's implementation of OpenGL for Windows.
• There is no support for the following pixel format features: stereoscopic
images, alpha bit planes, and auxiliary buffers.
There is, however, support for several ancillary buffers including: stencil buffer,
accumulation buffer, back buffer (double buffering), overlay and underlay plane buffer,
and depth (z-axis) buffer.

9.6.5 Describing points, lines, polygons and other geometric objects


In any OpenGL implementation, floating-point calculations are of finite
precision, and they have round-off errors. Consequently, the coordinates of OpenGL
points, lines, and polygons suffer from the same problems. Another limitation arises
from a raster graphics display. On such a display, the smallest displayable unit is a
pixel, and although pixels might be less than 1/100 of an inch wide, they are still much
larger than the mathematician's concepts of infinitely small (for points) or infinitely thin
(for lines). When OpenGL performs calculations, it assumes points are represented as
vectors of floating-point numbers. However, a point is typically (but not always) drawn
as a single pixel, and many different points with slightly different coordinates could be
drawn by OpenGL on the same pixel. OpenGL works in the homogeneous coordinates
of 3D projective geometry, so for internal calculations, all vertices are represented with
four floating-point coordinates (x, y, z, w). If w is different from zero, these coordinates
correspond to the Euclidean three-dimensional point (x/w, y/w, z/w). We can specify the
w coordinate in OpenGL commands, but that's rarely done. If the w coordinate isn't
specified, it's understood to be 1.0.
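
As a small worked example (for illustration only, issued inside a glBegin()/glEnd() pair):

glVertex4f(2.0f, 4.0f, 6.0f, 2.0f);   /* homogeneous (x, y, z, w) corresponds to (x/w, y/w, z/w) = (1, 2, 3) */
glVertex3f(1.0f, 2.0f, 3.0f);         /* the same Euclidean point; w is understood to be 1.0 */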
A point is represented by a set of floating-point numbers called a vertex. All
internal calculations are done as if vertices are 3D. Vertices specified by the user as 2D
are assigned a z coordinate equal to zero by OpenGL.
In OpenGL, the term line refers to a line segment, not the mathematician's
version that extends to infinity in both directions. There are easy ways to specify a
connected series of line segments, or even a closed, connected series of segments. In all
cases, though, the lines constituting the connected series are specified in terms of the
vertices at their endpoints. Polygons are the areas enclosed by single closed loops of
line segments, where the line segments are specified by the vertices at their endpoints.
Polygons are typically drawn with the pixels in the interior filled in, but we can also
draw them as outlines or a set of points. In general, polygons can be complicated, so
OpenGL makes some strong restrictions on what constitutes a primitive polygon. First,
the edges of OpenGL polygons can't intersect. Second, OpenGL polygons must be
convex, meaning that they cannot have indentations. Stated precisely, a region is convex
if, given any two points in the interior, the line segment joining them is also in the
interior. The reason for the OpenGL restrictions on valid polygon types is that it's
simpler to provide fast polygon-rendering hardware for that restricted class of polygons.
Simple polygons can be rendered quickly. The difficult cases are hard to detect quickly.
So for maximum performance, OpenGL crosses its fingers and assumes the polygons
are simple. Many real-world surfaces consist of nonsimple polygons, nonconvex
polygons, or polygons with holes. Since all such polygons can be formed from unions
of simple convex polygons, some routines to build more complex objects are provided
in the GLU library. These routines take complex descriptions and tessellate them, or
break them down into groups of the simpler OpenGL polygons that can then be
rendered. Since OpenGL vertices are always three-dimensional, the points forming the
boundary of a particular polygon don't necessarily lie on the same plane in space. If a
polygon's vertices don't lie in the same plane, then after various rotations in space,
changes in the viewpoint, and projection onto the display screen, the points might no
longer form a simple convex polygon.
Since rectangles are so common in graphics applications, OpenGL provides a
filled-rectangle drawing primitive, glRect*(). We can draw a rectangle as a polygon too,
but an OpenGL implementation might have optimized glRect*() specifically for rectangles.
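
For instance, the two fragments below draw the same axis-aligned filled rectangle; the second uses the glBegin()/glEnd() mechanism described in the next subsection (the coordinate values are arbitrary examples):

glRectf(1.0f, 1.0f, 3.0f, 2.0f);      /* opposite corners (x1, y1) and (x2, y2) */

glBegin(GL_POLYGON);                  /* the equivalent polygon specification */
    glVertex2f(1.0f, 1.0f);
    glVertex2f(3.0f, 1.0f);
    glVertex2f(3.0f, 2.0f);
    glVertex2f(1.0f, 2.0f);
glEnd();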
Any smoothly curved line or surface can be approximated - to any arbitrary
degree of accuracy - by short line segments or small polygonal regions. Thus,
subdividing curved lines and surfaces sufficiently and then approximating them with
straight line segments or flat polygons makes them appear curved (see Figure 53).
Even though curves aren't geometric primitives, OpenGL does provide some
direct support for subdividing and drawing them.

Figure 53 Approximating Curves

With OpenGL, all geometric objects are described as an ordered set of vertices.
We can use the glVertex*() command to specify a vertex.

9.6.6 Geometric drawing primitives


To tell OpenGL to create a set of points, a line, or a polygon from vertices, we
bracket each set of vertices between a call to glBegin() and a call to glEnd(). The
argument passed to glBegin() determines what sort of geometric primitive is constructed
from the vertices.
For example, the following code fragment draws a polygon:
glBegin(GL_POLYGON);
glVertex2f(0.0, 0.0);
glVertex2f(0.0, 3.0);
glVertex2f(4.0, 3.0);
glVertex2f(6.0, 1.5);
glVertex2f(4.0, 0.0);
glEnd();

If we had used GL_POINTS instead of GL_POLYGON, the primitive would
have been simply the five points.

Figure 54 Drawing a polygon or a set of points

Figure 55 shows examples of all the geometric primitives in OpenGL listed in Table 5.

Figure 55 Geometric primitive types

9.6.7 Displaying points, lines, and polygons


By default, a point is drawn as a single pixel on the screen, a line is drawn solid
and one pixel wide, and polygons are drawn solidly filled in.

Table 5 Geometric primitive names and meanings

Value              Meaning
GL_POINTS          individual points
GL_LINES           pairs of vertices interpreted as individual line segments
GL_LINE_STRIP      series of connected line segments
GL_LINE_LOOP       same as above, with a segment added between the last and first vertices
GL_TRIANGLES       triples of vertices interpreted as triangles
GL_TRIANGLE_STRIP  linked strip of triangles
GL_TRIANGLE_FAN    linked fan of triangles
GL_QUADS           quadruples of vertices interpreted as four-sided polygons
GL_QUAD_STRIP      linked strip of quadrilaterals
GL_POLYGON         boundary of a simple, convex polygon

The actual collection of pixels on the screen which are drawn for various point
widths depends on whether antialiasing is enabled. Antialiasing is a technique for
smoothing points and lines as they're rendered. If antialiasing is disabled (the default),
fractional widths are rounded to integer widths, and a screen-aligned square region of
pixels is drawn. Thus, if the width is 1.0, the square is 1 pixel by 1 pixel; if the width is
2.0, the square is 2 pixels by 2 pixels, and so on. With antialiasing enabled, a circular
group of pixels is drawn, and the pixels on the boundaries are typically drawn at less
than full intensity to give the edge a smoother appearance. In this mode, non-integer
widths aren't rounded. Most OpenGL implementations support very large point sizes.
With OpenGL, we can specify lines with different widths and lines that are
stippled in various ways - dotted, dashed, drawn with alternating dots and dashes, and
so on. The actual rendering of lines is affected by the antialiasing mode, in the same
way as for points. Without antialiasing, widths of 1, 2, and 3 draw lines 1, 2, and 3
pixels wide.
Polygons are typically drawn by filling in all the pixels enclosed within the
boundary, but we can also draw them as outlined polygons or simply as points at the
vertices. A filled polygon might be solidly filled or stippled with a certain pattern.
Antialiasing polygons is more complicated than for points and lines. A polygon has two
sides - front and back - and might be rendered differently depending on which side is
facing the viewer.
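
A short sketch of the state calls involved (the values are arbitrary examples, not defaults):

glPointSize(3.0f);                    /* points drawn as 3x3-pixel squares when antialiasing is off */
glEnable(GL_POINT_SMOOTH);            /* antialiased, roughly circular points instead */

glLineWidth(2.0f);                    /* lines two pixels wide */
glEnable(GL_LINE_STIPPLE);
glLineStipple(1, 0x00FF);             /* dashed lines: 8 pixels on, 8 pixels off */

glPolygonMode(GL_FRONT, GL_FILL);     /* front-facing polygons filled ... */
glPolygonMode(GL_BACK, GL_LINE);      /* ... back-facing polygons drawn as outlines */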

9.7 The Visualization Toolkit (VTK)


The Visualization Toolkit (VTK) is an open source, freely available software
system for 3D computer graphics, image processing, and visualization. Although
tailored for the scientific visualization field, it can also be successfully used to illustrate
the basics of 3D graphics and rendering. VTK includes a textbook published by Kitware
(The Visualization Toolkit, An Object-Oriented Approach To 3D Graphics, soft edition,
ISBN 1-930934-12-2), a C++ class library, and several interpreted interface layers
including Tcl/Tk, Java, and Python. VTK has been implemented on nearly every Unix-based
platform, on PCs (Windows 9x/NT/ME/2000/XP) and on Mac OS X. The design and
implementation of the library has been strongly influenced by object-oriented
principles.


Figure 56 VTK system architecture

The graphics model in VTK is at a higher level of abstraction than rendering
libraries like OpenGL or PEX. This means it is much easier to create useful graphics
and visualization applications. VTK applications can be written directly in C++, Tcl,
Java, or Python. In fact, using the interpreted languages Tcl or Python with Tk, and
even Java with its GUI class libraries, it is possible to build useful applications very
quickly. The software is a true visualization system; it does not just render geometry.
VTK supports a wide variety of visualization algorithms including scalar, vector, tensor,
texture, and volumetric methods, as well as advanced modeling techniques like implicit
modeling, polygon reduction, mesh smoothing, cutting, contouring, and Delaunay
triangulation. Because VTK is object-oriented, we need only create objects of the
appropriate types and call the available methods.
Figure 56 illustrates the basic idea of VTK. Toolkits enable complex applications
to be built from small pieces. The key here is that the pieces must be well defined with
simple interfaces. In this way they can be readily assembled into larger systems. The VTK
system combines the best of both the compiled and interpreted approaches: the core
computational objects are implemented in a compiled language, while the higher-level
applications are written in an interpreted language. The clear boundary between the
compiled core and the interpreted layer ensures that the core can easily be separated from
the interpreted language and embedded into applications. The core of the system is also
independent of the windowing system (graphical user interface) used.

The graphics model captures the essential features of a 3D graphics system in a
form that is easy to understand and use. The abstraction is based on the movie-making
industry, with some influence from current graphical user interface (GUI) windowing
systems. There are nine basic objects in the graphics model:
1. Render Master - coordinates device-independent methods and creates rendering
windows.
2. Render Window - manages a window on the display device. One or more
renderers draw into a render window to generate a scene (i.e., final image).
3. Renderer - coordinates the rendering of lights, cameras, and actors.
4. Light - illuminates the actors in a scene.
5. Camera - defines the view position, focal point, and other camera characteristics.
6. Actor - an object drawn by a renderer in the scene. Actors are defined in terms of
mapper, property, and transform objects.
7. Property - represents the rendered attributes of an actor including object color,
lighting (e.g., specular, ambient, diffuse), texture map, drawing style (e.g.,
wireframe or shaded), and shading style.
8. Mapper - represents the geometric definition of an actor and maps the object
through a lookup table. More than one actor may refer to the same mapper.
9. Transform - an object that consists of a 4x4 transformation matrix and methods to
modify the matrix. It specifies the position and orientation of actors, cameras, and
lights.

Figure 57 The graphics model in VTK

VTK components (classes) can be grouped according to their functionality. The
most important groups are Sources, Filters, Mappers and Graphics; further groups are
Actors, Renderers & Windows, and the UI using Tcl/Tk. Sources are information provider
objects. Some sources are readers, that is, they get the information from files in
appropriate formats; examples are vtkPLOT3DReader, vtkBYUReader and
vtkPolyDataReader (one of VTK's own formats). Others generate data algorithmically;
typical examples are the sources that create basic geometric shapes, such as
vtkSphereSource, vtkCylinderSource and vtkConeSource. Mappers, as the name suggests,
take the information provided by source objects, directly or through appropriate filters,
and generate graphics primitives. There are two types of mappers: vtkDataSetMapper and
vtkPolyDataMapper. A special kind of mapper is the writer, which writes information
out to files in different formats.

Process object A -> Dataset A (data object) -> Process object B -> Dataset B (data object) -> Process object C
Figure 58 The visualization model in VTK

Examples: vtkBYUWriter, vtkTIFFWriter, vtkPolyDataWriter. Components in
the graphics group are responsible for the rendering portion of the visualization
pipeline. Objects in this group include: vtkRenderer, vtkRenderWindow, vtkActor,
vtkProperty, vtkTransform, vtkCamera, vtkLight. Renderers and Windows represent the
end of the VTK pipeline, which users actually see on the screen. The component
vtkRenderer is an abstract class; vtkOpenGLRenderer, which inherits from
vtkRenderer, is a concrete class instantiated when the graphics system is OpenGL.
Objects of type vtkRenderWindow represent the application window. The window will
be a Windows window or an X window, according to the graphics system being used.
Each instance of vtkActor represents a scene “object”, combining the geometry
(provided by the mapper) with the optical properties (colour, texture, etc) and concrete
values for the location, orientation and size. Instances of vtkProperty are associated with
actors to control their appearance. Similarly, instances of vtkTransform are associated
with actors to determine their location, orientation and size. Instances of vtkCamera and
vtkLight are associated with Renderer objects to specify how the scene is seen and
illuminated. If the user does not explicitly create objects of these two types, default ones
are provided. Filters are “transformation” objects and there are many types of filters
available. For example, the marching cubes and marching squares algorithms are
implemented as filters. The same is true for decimation, sampling, geometry extraction,
thresholding, particle and many other algorithms. The simplest pipeline includes a
source, a mapper and an actor. Actors must be associated with a renderer, and the
renderer with a render window.
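
A sketch of such a minimal pipeline, written in the same classic API style as the example application later in this section (the cone source and the shrink filter are chosen purely for illustration; the required #include lines are omitted):

vtkConeSource *cone = vtkConeSource::New();             // source: generates polygonal data
cone->SetResolution(12);

vtkShrinkPolyData *shrink = vtkShrinkPolyData::New();   // filter: transforms the data
shrink->SetInput( cone->GetOutput() );
shrink->SetShrinkFactor(0.8);

vtkPolyDataMapper *mapper = vtkPolyDataMapper::New();   // mapper: turns data into graphics primitives
mapper->SetInput( shrink->GetOutput() );

vtkActor *actor = vtkActor::New();                      // actor: geometry + properties + transform
actor->SetMapper( mapper );

vtkRenderer *ren = vtkRenderer::New();                  // renderer draws into a render window
ren->AddActor( actor );
vtkRenderWindow *win = vtkRenderWindow::New();
win->AddRenderer( ren );
win->Render();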
The VTK model is based on the data-flow paradigm adopted by many
commercial systems. In this paradigm, modules are connected together into a network.
The modules perform algorithmic operations on data as it flows through the network.
The execution of this visualization network is controlled in response to demands for
data (demand-driven) or in response to user input (event driven). The appeal of this
model is that it is flexible, and can be quickly adapted to different data types or new
algorithmic implementations. The visualization model consists of two basic types of
objects: process objects and data objects (see Figure 58). Process objects are the
modules, or algorithmic portions of the visualization network. Data objects, also
referred to as datasets, represent and enable operations on the data that flows through
the network.

Figure 59 VTK visualization pipeline


Simple VTK application

#include "vtkSphereSource.h"
#include "vtkPolyDataMapper.h"
#include "vtkActor.h"
#include "vtkProperty.h"
#include "vtkRenderWindow.h"
#include "vtkRenderer.h"
#include "vtkRenderWindowInteractor.h"

int main()
{
  // create sphere geometry
  vtkSphereSource *sphere = vtkSphereSource::New();
  sphere->SetRadius( 1.0 );
  sphere->SetThetaResolution( 64 );
  sphere->SetPhiResolution( 64 );

  // map to graphics library
  vtkPolyDataMapper *map = vtkPolyDataMapper::New();
  map->SetInput( sphere->GetOutput() );

  // actor coordinates geometry, properties, transformation
  vtkActor *aSphere = vtkActor::New();
  aSphere->SetMapper( map );
  aSphere->GetProperty()->SetColor( 1, 0, 0 ); // sphere color red

  // a renderer and render window
  vtkRenderer *ren1 = vtkRenderer::New();
  vtkRenderWindow *renWin = vtkRenderWindow::New();
  renWin->AddRenderer( ren1 );

  // an interactor
  vtkRenderWindowInteractor *iren = vtkRenderWindowInteractor::New();
  iren->SetRenderWindow( renWin );

  // add the actor to the scene
  ren1->AddActor( aSphere );
  ren1->SetBackground( 1, 1, 1 ); // background color white

  // render an image (lights and cameras are created automatically)
  renWin->Render();

  // begin mouse interaction
  iren->Start();

  return 0;
}


9.7.1 VTK vs. OpenGL


VTK is built upon OpenGL. Because it models most closely what is going on in the
graphics card, OpenGL provides more flexibility and precision, and it is also quicker.
If very specific, low-level control over the rendering is required, OpenGL is the better
choice; however, development will be slower because the powerful VTK libraries cannot
be relied upon. VTK is great when it comes to displaying large amounts of data, for
example data from mathematical finite element analyses, but it is slower and more
memory-intensive because it adds further library layers on top of OpenGL.

9.8 The Prefuse visualization toolkit


Prefuse is a set of software tools for creating rich interactive data visualizations.
The original prefuse toolkit provides a visualization framework for the Java
programming language. The prefuse flare toolkit provides visualization and animation
tools for ActionScript and the Adobe Flash Player. Prefuse is an interactive graphical
open-source toolkit written in Java. It is intended to support the development of
sophisticated, highly interactive, and flexible information visualizations. The
architecture of prefuse follows the visualization pipeline, a recommended way of
structuring a visualization. To provide flexibility, a polylithic design was chosen,
which enables developers to implement only the functionality they need and to
customize it to their own requirements. In contrast to a monolithic design, however,
this approach takes more time to grasp and strongly influences the architecture of the
application built on top of it.
Prefuse is primarily designed to visualize interrelated information that can be
stored in a graph or tree structure, but unrelated data stored in a data table can also
be used. Even if the structure is hierarchical, the resulting tree or graph structure need
not be reflected in the graphical representation, as layout algorithms are not restricted
in any way. The painting of visual items is done by a renderer which has access to the
respective item itself and to the Graphics2D context of the view. This approach makes
it possible to use the whole range of painting methods available in Java; all painting
issues are therefore completely independent of the toolkit itself. Besides providing a
large set of predefined elements to visualize data, much attention was paid to the
usability of the visualization. This is achieved by applying several interaction
techniques such as tooltips or dragging of visual elements, while more sophisticated
techniques like zooming, panning, or semantic zooming are also supported or partially
even provided by the toolkit itself. Further, prefuse clearly distinguishes between
absolute and view coordinates. This separation helps users to place all visual elements
in a logical way without having to consider later-applied visualization techniques that
change the entire view.

Figure 60 The visualization pipeline of prefuse

Prefuse uses a centralized object called Visualization (formerly ItemRegistry)
which maintains and manages the whole visualization. This object has to fulfill several
tasks. It stores the abstract data as well as their visual analogues in two different
tables. All transformation routines, such as filtering or rendering, are also managed and
executed by the Visualization. Further, it refers to at least one Display, a graphical
component which can present visual elements of the Visualization. Transformation
routines are defined as Actions, which are combined in ActionLists. The Display is the
view of prefuse; it can be embedded in any Java Swing application. The Display also
provides navigation techniques like zooming or panning which change the view.
Furthermore, a Display maintains a set of ControlListener objects which are used to
process user interactions (mouse or keyboard events).
Raw data is the base of the application. Typically, the source of raw data is a file,
but other sources such as a database or web content are also conceivable. Prefuse
provides several file readers for different formats (CSV, tab-delimited, etc.) whose
content can be transformed into a data table. Further, some readers also support reading
from SQL databases. If another source is needed, a reader can be written manually.
The data table, which is used to store the abstract data, works with data types and a
definition schema which maps the relations between data type and table. That means
each row contains a data record, and each column contains values for a named data field
with a specific data type. Each record of the table is referred to as a Tuple.
Visual structures are created when filtering the data container which holds the
abstract data. In the case of a visualization that displays all items from the beginning,
the filtering routine needs to be executed only once, at initialization time.
Typically, the rendered visual structures are not stored; instead, they are re-rendered
whenever needed. However, in the case of very complex but constant visual
representations it makes sense to hold the items in an off-screen image.

9.9 Ferret
Ferret is an analysis tool for gridded and non-gridded data and an interactive
computer visualization and analysis environment designed to meet the needs of
oceanographers and meteorologists analyzing large and complex gridded data sets. It
runs on most UNIX systems, and on Windows XP/NT/9x using X windows for display.
It can transparently access extensive remote Internet data bases using OPeNDAP
(formerly known as DODS). Together, OPeNDAP and Ferret form a powerful tool for the
retrieval, sampling, analysis and display of datasets, regardless of size or data
format (though there are some data format limitations).
Ferret was developed by the Thermal Modeling and Analysis Project (TMAP) at
NOAA's Pacific Marine Environmental Laboratory PMEL in Seattle to analyze the
outputs of its numerical ocean models and compare them with gridded, observational
data. The model data sets are generally multi-gigabyte in size with mixed 3 and 4-
dimensional variables defined on staggered grids. Ferret offers a Mathematica-like
approach to analysis; new variables may be defined interactively as mathematical
expressions involving data set variables. Calculations may be applied over arbitrarily
shaped regions. Fully documented graphics are produced with a single command.
Many software packages have been developed recently for scientific
visualization. The features that make Ferret distinctive among these packages are
Mathematica-like flexibility, geophysical formatting, "intelligent" connection to its data
base, memory management for very large calculations, and symmetrical processing in 4
dimensions.
PMEL has developed a WWW-based visualization and data extraction system.
The PMEL server uses HTML forms to provide a point and click front end to the
scientific analysis and visualization program FERRET. The server provides access to a
large (over 20 gigabytes) research-oriented data base of multi-dimensional, gridded,
environmental data collected within NOAA and elsewhere. The data base is maintained
by the Thermal Modeling and Analysis Project (TMAP) at PMEL.
The Ferret LAS software can be obtained and configured to meet the needs of other
gridded data set providers. It is especially well suited to groups of data producers with
related data sets at distributed locations (for example, a community modeling effort).
Ferret can also be installed to run from a Web browser ("WebFerret") for use while away
from the desk or on a system lacking X windows software, and it can transparently
access extensive remote Internet databases using OPeNDAP (see
http://ferret.wrc.noaa.gov/Ferret/ and http://www.opendap.org/).

9.10 The Persistence of Vision Ray-tracer (POV-Ray)


The Persistence of Vision Ray-tracer is a high-quality, totally free tool for
creating 3D graphics. It is available for Windows, Mac OS/Mac OS X and i86 Linux.
POV-Ray creates 3D, photo-realistic images using a rendering technique called ray-
tracing. It reads in a text file containing information describing the objects and lighting
in a scene and generates an image of that scene from the view point of a camera also
described in the text file. Ray-tracing is not a fast process, but it produces very high
quality images with realistic reflections, shading, perspective and other effects. The user
specifies the location of the camera, light sources, and objects as well as the surface
texture properties of objects, their interiors (if transparent) and any atmospheric media
such as fog, haze, or fire. The Persistence of Vision Ray-Tracer was developed from
DKBTrace 2.12 (written by David K. Buck and Aaron A. Collins) by a team of people
(called the POV-Team). Many scenes are included with POV-Ray so we can start
creating images immediately. These scenes can be modified so we do not have to start
from scratch. In addition to the pre-defined scenes, a large library of pre-defined shapes
and materials is provided. We can include these shapes and materials in our own scenes
by simply including the library file name at the top of the scene file and using the shape
or material name in the scene.
The main features of POV-Ray are:
• Easy to use scene description language.

• Large library of example scene files.

• Standard include files that pre-define many shapes, colors and textures.

• High quality output image files (up to 48-bit color).

• 16 and 24 bit color display on many computer platforms using appropriate hardware.

• Many camera types, including perspective, orthographic, fisheye, etc.

• Photons for realistic reflected and refracted caustics. Photons also interact
with media.

• Phong and specular highlighting for more realistic-looking surfaces.

• Inter-diffuse reflection (radiosity) for more realistic lighting.

• Atmospheric effects like atmosphere, ground-fog and rainbow.

• Particle media to model effects like clouds, dust, fire and steam.

• Several image file output formats including Targa, BMP (Windows only),
PNG and PPM.

• Basic shape primitives such as spheres, boxes, quadrics, cylinders, cones,
triangles and planes.

• Advanced shape primitives such as: Tori (donuts), bezier patches, height fields
(mountains), blobs, quartics, smooth triangles, text, superquadrics, surfaces of
revolution, prisms, polygons, lathes, fractals, isosurfaces and the parametric
object.

• Shapes can easily be combined to create new complex shapes using
Constructive Solid Geometry (CSG). POV-Ray supports unions, merges,
intersections and differences.

• Objects are assigned materials called textures (a texture describes the coloring
and surface properties of a shape) and interior properties such as index of
refraction and particle media.

• Built-in color and normal patterns etc.

10 Bibliography

Visual-Literacy.org: Visual Literacy: An E-Learning Tutorial on Visualization for Communication, Engineering and Business [on-line], [cit. 2008-07-24], URL: <http://www.visual-literacy.org/>

Brodlie K.W. et al.: Scientific Visualization - Techniques and Applications. Springer-Verlag, 1992, ISBN 978-0387545653

Bruckner S.: Efficient Volume Visualization of Large Medical Datasets. Master Thesis, supervised by E. Groeller and Dipl.-Inform. S. Grimm, Institute of Computer Graphics and Algorithms, Vienna University of Technology

Card S.K., Mackinlay J., Shneiderman B.: Readings in Information Visualization: Using Vision to Think (Interactive Technologies). Academic Press, 1999, ISBN 1-55860-533-9

Caumon G. et al.: Visualization of grids conforming to geological structures: a topological approach. Computers & Geosciences, 31, pages 671-680, 2005

Chi E.H.: A Taxonomy of Visualization Techniques using the Data State Reference Model (information visualization reference model) [on-line], 1999, [cit. 2008-07-24], URL: <http://www-users.cs.umn.edu/~echi/papers/infovis00/Chi-TaxonomyVisualization.pdf>

Hearn D.D., Baker P.: "Scientific Visualization", Tutorial Notes for Eurographics '91, Sept. 1991, Eurographics Technical Report Series, EG 91, TN 6 (ISSN 1017-4656)

Lorensen W.E., Cline H.E.: Marching cubes: A high resolution 3D surface construction algorithm [on-line, ACM Portal], [cit. 2008-07-24], URL: <http://portal.acm.org/citation.cfm?id=37422>

McCormick B.H. et al.: Visualization in Scientific Computing. Computer Graphics 21(6), November 1987

Nielson G.M. et al.: Visualization in Scientific Computing. IEEE Computer Society Press Tutorial Series, 1990, ISBN 0818659793

Owen G.S.: Visualization Education in the USA [on-line]. Georgia State University, Department of Computer Science, 1993, [cit. 2008-07-24], URL: <http://www.cs.gsu.edu/mathcsc/research/papers/vised.html>

Owen G.S.: HyperVis - Teaching Scientific Visualization Using Hypermedia [on-line]. ACM SIGGRAPH Education Committee, [cit. 2008-07-24], URL: <http://www.siggraph.org/education/materials/HyperVis/hypervis.htm>

Post F.H., Nielson G.M., Bonneau G.-P.: Data Visualization: The State of the Art. The Springer International Series in Engineering and Computer Science, Kluwer Academic Publishers, 2003, ISBN 1-4020-7259-7

Speray D., Kennon S.: Volume probes: interactive data exploration on arbitrary grids. ACM SIGGRAPH Computer Graphics, Volume 24, Issue 5 (November 1990), pages 5-12, ISSN 0097-8930, [cit. 2008-07-24], URL: <http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=99310>

Ward M.O.: A Taxonomy of Glyph Placement Strategies for Multidimensional Data Visualization [on-line]. XmdvTool Home Page, 1999, [cit. 2008-07-24], URL: <http://davis.wpi.edu/~xmdv/docs/jinfovis02_glyphpos.pdf>

Ware C.: Information Visualization, Second Edition: Perception for Design. Elsevier Inc., 2004, ISBN 1-55860-819-2

Wong P.Ch., Bergeron R.D.: 30 Years of Multidimensional Multivariate Visualization. In: Scientific Visualization - Overviews, Methodologies and Techniques, pages 3-33, IEEE Computer Society Press, Los Alamitos, CA, 1997
