DATA VISUALIZATION
Course Textbook
Jozef Vaský
TRNAVA 2007
Table of Contents
List of Figures
List of Tables
Note:
The screenshots are copyright of the respective authors and are taken from the cited publications. This textbook was produced as course documentation only. Therefore, it is provided exclusively to students attending the course for studying the course material.
1 Introduction
Data visualization is currently a very active area of research and teaching. The term primarily unites the established fields of scientific visualization and information visualization. Many further fields of visualization have emerged recently, such as process visualization, product visualization, software visualization, illustrative visualization, uncertainty visualization, visual analytics, etc.
The success of data visualization is due to the basic idea behind it: the use of computer-generated images to gain insight from data and its relationships. A second premise is the utilization of the broad bandwidth of the human sensory system in interpreting complex processes and simulations involving data sets from diverse disciplines and large collections of abstract data from many sources.
There are many situations in the real world where we try to understand phenomena, data, and events using graphics. Some tasks, such as finding a route in a city, following stock market trends over a certain period, or reading the weather forecast, may be understood better using graphics rather than text. Graphical representation of data, compared to a textual or tabular one (in the case of numbers), takes advantage of human visual perception, which is very powerful: images instantly convey large amounts of information to our mind and allow us to recognize essential features and to make important inferences. This is possible because there is a series of identification and recognition operations that our brain performs in an "automatic" way, without the need to focus our attention or even be conscious of them. Perceptual tasks that can be performed in a very short time span (typically between 200 and 250 milliseconds or less) are called pre-attentive, since they occur without the intervention of consciousness (Ware C., 2004).
Graphics use visual representations that help to amplify cognition. They convey information to our minds that allows us to search for patterns, recognize relationships between data, and perform some inferences more easily.
Visualization is a link between data and information on the one hand and the most powerful information processing system, the human mind, on the other. It is a process of transforming data, information, and knowledge into a visual form exploiting people's natural strengths in
and results in ways that are easily understood. In visualization, we seek to understand
the data. However, often the two terms are intertwined. Often one would like the ability
to do real-time visualization of data from any source. Thus our purview is information,
scientific, or engineering visualization and closely related problems such as
computational steering or multivariate analysis.
Visualization involves research in computer graphics, image processing, high
performance computing, and other areas. The same tools that are used for visualization
may be applied to animation, or multimedia presentation, for example.
The main reasons for visualization are the following: it compresses a lot of data into one picture (data browsing); it can reveal correlations between different quantities both in space and time; it can furnish new space-like structures beside the ones already known from previous calculations; and it opens up the possibility of viewing the data selectively and interactively in real time.
The application fields for visualization are in: Engineering, Computational Fluid
Dynamics, Finite Element Analysis, Electronic Design Automation, Simulation,
Medical Imaging, Geospatial RF Propagation, Meteorology, Hydrology, Data Fusion,
Ground Water Modeling, Oil and Gas Exploration and Production, Finance, Data
Mining/OLAP and so on.
Among the hottest topics in the field of visualization are Illustrative Visualization and Uncertainty Visualization.
Illustrative visualization is concerned with computer-supported interactive and expressive visualization through abstractions, as in traditional illustrations. Illustrative visualization uses several non-photorealistic rendering techniques: smart visibility, silhouettes, hatching, tone shading, and focus-and-context techniques (context-preserving volume rendering).
Visualized data often have dubious origins and quality. Different forms of
uncertainty and errors are also introduced as the data are derived, transformed,
interpolated, and finally rendered. In the absence of integrated presentation of data and
uncertainty, the analysis of the visualization is incomplete at best and often leads to
inaccurate or incorrect conclusions. For example, environmental data have inherent uncertainty that is often ignored in visualization: meteorological stations measure wind with good accuracy, but the winds are often averaged over minutes or hours.
operations that enable users to detect the expected and discover the unexpected in
complex information space. Technologies resulting from visual analytics find their
application in almost all fields.
Knowledge visualization is the use of visual representations to transfer
knowledge between at least two persons (Burkhard and Meier, 2004). It aims to improve the transfer of knowledge by using computer-based and non-computer-based visualization methods complementarily. Examples of such visual formats are sketches, diagrams,
images, objects, interactive visualizations, information visualization applications and
imaginary visualizations as in stories. While information visualization concentrates on
the use of computer-supported tools to derive new insights, knowledge visualization
focuses on transferring insights and creating new knowledge in groups. Beyond the
mere transfer of facts, knowledge visualization aims to further transfer insights,
experiences, attitudes, values, expectations, perspectives, opinions, and predictions by
using various complementary visualizations.
2 Historical Milestones
Examples are the map of China from 1137 AD and the famous map of Napoleon's invasion of Russia in 1812 by Charles Joseph Minard. Most of the concepts learned in devising these images carry over in a straightforward manner to computer visualization.
Visualization has previously been defined as the "formation of visual images; the act or process of interpreting in visual terms or of putting into visual form" (McCormick B. H., 1987).
Here are some milestones in the development of the visualization discipline.
Mathematical graphic
The French engineer, Charles Minard (1781-1870), illustrated the disastrous
result of Napoleon's failed Russian campaign of 1812. The graph shows the size of the
army by the width of the band across the map of the campaign on its outward and return
legs, with temperature on the retreat shown on the line graph at the bottom. The images
were drawn by a Mathematical function.
Playfair's charts
William Playfair (1759-1823) is generally viewed as the inventor of most of the
common graphical forms used to display data: the line plot, bar chart, and pie chart. His Commercial and Political Atlas, published in 1786, contained a number of
interesting time-series charts.
pump. He had the handle of the contaminated pump removed, ending the neighborhood
epidemic which had taken more than 500 lives.
constructed the first contour plot, showing the mean temperature, by hour of the day and
by month at Halle. Lalanne's data formed a regularly-spaced grid, and it was fairly easy
to determine the isolines of constant temperature. Vauthier generalized the idea to three-
way data with arbitrary (x,y) values in his map of the population density of Paris.
Galton later cited this as one of the inspirations for his normal correlation surface.
Dynamic graphics
Among the pioneers of dynamic graphics and the graphical representation of
movement and dynamic phenomena was Etienne-Jules Marey (1830-1906). Marey used
and developed many devices to record and visualize motion and dynamic phenomena:
walking, running, jumping, falling of humans, horses, cats...; heart rate, pulse rate,
breathing, etc.
3 Data to Visualization
One of the major problems in science and engineering is the massive amount of
data that is collected or generated. Visualization systems must be able to enter, store,
and retrieve this data. The data also must be converted into one or more internal formats
for the visualization system. Visualization on the other hand, uses the human perceptual
system to extract meaning from the data, focus attention, and reveal structure and
patterns.
The term data itself can cover a wide variety of forms. We can distinguish between data that has a physical correspondence and is closely related to mathematical structures and models (e.g., the airflow around the wing of an airplane), and data that is more abstract in nature (e.g., stock market fluctuations).
In general, the term data is interpreted as:
1. Factual information, especially information organized for analysis or used to reason or make decisions.
2. In computer science, numerical or other information represented in a form suitable for processing by computer.
3. Values derived from scientific experiments.
4. The plural of datum.
Data are stored in computer memory in some form of an internal representation.
The criteria for an internal representation are:
• Ordinal
Ordered in a particular sequence. A preference or some type of ranking (such
as height). Used to state "this item is bigger than that item" or "the difference
between these two items is the same as the difference between those other two
items". For example: intervals with a constant step size.
This can also include:
− Simple integer
Some constant step size for the values. For example: the temperature in degrees Fahrenheit or wind speed.
− Meaningful zero
Zero is meaningful rather than being ignored or counted as missing data. For example: temperature in degrees Kelvin or a student's income.
The numerical data, which consists just of numerical values, can be:
• Continuous - a representation with some arbitrary precision, such as real
numbers or complex numbers.
• Aggregation
• Smoothing
• Simplification
For physical systems in which there is an underlying continuous phenomenon, the question of data quality is important in most cases. This question can be expressed as follows:
• Are the measurements or the data collected from a simulation sufficient, or are there missing data? In general, is the grid (computational or experimental) giving rise to sampling artifacts?
• Geometric data;
Geometric data is used to represent the shape of objects. This may be in the
form of polygons, surface patches, or coordinates.
• Property Data;
Property Data is non-geometric data that represents specific properties of the
objects or properties measured at certain coordinates, such as temperature,
pressure, electron density, etc.
• Metadata;
Metadata is data about the data. It can include the following:
• Control Data;
Control Data is used to specify the parameters needed for proper execution of
the modules in the system. This can be stored and read into the system each
time, as for example a color map.
• Data Relationships;
There may be certain relations in the data. This can be internal to a dataset or
between different data sets. An example of internally linked data is a
molecular structure where certain atoms are bonded together.
Volume data are 3D geometric entities that may have information inside them,
might not consist of surfaces and edges, or might be too voluminous to be represented
geometrically.
Multidimensional data (also called multivariate or n-dimensional) consists of
some number of points, each of which is defined by an n-vector of values.
Mathematicians consider dimension as the number of independent variables in an
algebraic equation. Engineers take dimension as measurements of any sort (breadth,
length, height, and thickness).
Special types of data represent fields. Data fields occur in many typical physics
or engineering problems; in these contexts they usually appear as the solution to some
partial differential equation. A typical example would be the velocity field of a liquid.
Depending on the type of object described by the field we distinguish:
• Scalar - a field which associates a single number with each point in space; examples are temperature, pressure in a gas or liquid, concentration, a wave function, etc.
• Vector - a field which associates a vector with each point in space; examples are the velocity field in a gas or liquid, a magnetic field, an electric field, etc.
• Tensor - a field which associates a tensor (matrix) with each point in space; the typical example is the stress (and strain) tensor in a solid.
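As a minimal illustration of these field types, the following sketch (in Python with NumPy, an assumed tool choice; the field names and grid resolution are hypothetical) stores a scalar, a vector, and a tensor field sampled on a regular 3D grid as arrays:

    import numpy as np

    # A regular 3D sampling grid (hypothetical resolution 32 x 32 x 32).
    nx, ny, nz = 32, 32, 32
    x, y, z = np.meshgrid(np.linspace(0, 1, nx),
                          np.linspace(0, 1, ny),
                          np.linspace(0, 1, nz), indexing="ij")

    # Scalar field: one number per grid point (e.g. a temperature-like quantity).
    temperature = np.sin(np.pi * x) * np.cos(np.pi * y) * z

    # Vector field: three components per grid point (e.g. a velocity-like quantity).
    velocity = np.stack([-y, x, np.zeros_like(z)], axis=-1)    # shape (nx, ny, nz, 3)

    # Tensor field: a 3 x 3 matrix per grid point (e.g. a stress-like quantity).
    stress = np.zeros((nx, ny, nz, 3, 3))
    stress[..., 0, 0] = temperature        # fill one component for illustration

    print(temperature.shape, velocity.shape, stress.shape)

The only difference between the three cases is the number of values attached to each grid point: one for a scalar field, n for a vector field, and n x n for a tensor field.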
Data sources can be external or internal to the visualization system. External
sources would be data collected from experimental measurement or generated by a
simulation external to the system. Internal sources would be data that was passed
between modules, generated by simulations internal to the system, or stored and then
retrieved from the system. Visualization systems should be able to support the real time
collection, storage, and analysis of data.
Data can be acquired by measurement of real processes or by simulation of a model of the process. The following levels of visualization have been identified according to the possibility of controlling the simulation process (Nielson G.M., 1990):
• Post-processing - the data are first computed or acquired completely and only visualized afterwards, without interaction with the running computation.
• Interactive steering - allows the user to interactively control both the actual
computation of the data, e.g., by changing parameters as the computation
progresses, and the visualization of the data.
• Filtering and smoothing of data is frequently necessary since raw data that
comes from experimental measurement usually has noise. The noise may be
introduced by the measurement process or it may be integral to the data. The
field of Digital Signal Processing (DSP) is the study of these types of
techniques.
• Grid rezoning is the mapping of data from one type of grid to another, for
example, to a rectangular grid of pixels for display.
Another taxonomy of data grids was created by Speray and Kennon. It is presented in order of increasing generality (and complexity). With each type, the indexing required to find a point's world coordinates is given. Neighboring points delineate subvolumes known as cells or elements. They classified the following types of data grids:
Cartesian (i, j, k)
This is typically a 3D matrix with no intended world coordinates, so subscripts map identically to space. If the cells are small and numerous (so as to be almost atomic in practice, like 2D pixels), then it is known as a voxel grid; however, the term is often loosely applied. The geometric representation of this grid is depicted in Figure 4a.
Regular (i * dx, j * dy, k * dz)
Cells are identical rectangular prisms (bricks) aligned with the axes. See Figure 4b.
Rectilinear (x[i], y[j], z[k])
Distances between points along an axis are arbitrary. Cells are still rectangular prisms and axis-aligned. See Figure 4c.
Structured (x[i, j, k], y[i, j, k], z[i, j, k])
This type, also known as curvilinear, allows non-boxy volumes to be gridded. Logically, it is a Cartesian grid subjected to non-linear transformations so as to fill a volume or wrap around an object. Cells are hexahedra (warped bricks). These grids are commonly used in computational fluid dynamics (CFD). See Figure 4d.
Block structured (xb[i, j, k], yb[i, j, k], zb[i, j, k])
Recognizing the convenience of structured grids, but the limited range of topologies that they handle, researchers may choose to use several structured grids (blocks) and sew them together to fill the volume of interest. The grid is depicted in Figure 4e.
Unstructured (x[i], y[i], z[i])
Unlike the previous types, where connectivity is implicit, there is no geometric information implied by this list of points, and edge/face/cell connectivity must be supplied in some form. Cells may be tetrahedra, hexahedra, prisms, pyramids, etc., and they may be linear (straight edges, planar faces) or higher-order (e.g., cubic edges, with two interior points on each edge). Tetrahedral grids are particularly useful because they
allow better boundary fitting, can be built automatically, and are often simpler to work with graphically. Unstructured grids are standard in finite-element analysis (FEA) and finite-volume analysis (FVA) and are becoming common in CFD.
A Cartesian grid is a special case where the elements are unit squares or unit
cubes, and the vertices are integer points.
A rectilinear grid is a tessellation by rectangles or parallelepipeds that are not all
congruent to each other. The cells may still be indexed by integers as above, but the
mapping from indexes to vertex coordinates is less uniform than in a regular grid. An
example of a rectilinear grid that is not regular appears on logarithmic scale graph
paper.
A curvilinear grid or structured grid is a grid with the same combinatorial
structure as a regular grid, in which the cells are quadrilaterals or cuboids rather than
rectangles or rectangular parallelepipeds.
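To make the difference between these grid types concrete, the sketch below (Python with NumPy, an assumed tool choice; the spacings and the warping function are hypothetical) builds the point coordinates of a regular, a rectilinear, and a curvilinear (structured) 2D grid:

    import numpy as np

    ni, nj = 4, 3                                   # hypothetical grid resolution (2D for brevity)

    # Regular grid: constant spacings dx, dy; point (i, j) lies at (i*dx, j*dy).
    dx, dy = 1.0, 2.0
    xr, yr = np.meshgrid(np.arange(ni) * dx, np.arange(nj) * dy, indexing="ij")

    # Rectilinear grid: arbitrary spacing along each axis, stored as 1D arrays x[i], y[j].
    x_axis = np.array([0.0, 1.0, 10.0, 100.0])      # e.g. a logarithmic axis
    y_axis = np.array([0.0, 0.5, 3.0])
    xl, yl = np.meshgrid(x_axis, y_axis, indexing="ij")

    # Structured (curvilinear) grid: every point stores its own coordinates x[i, j], y[i, j],
    # here obtained by warping the regular grid with a non-linear transformation.
    xs = xr + 0.3 * np.sin(yr)
    ys = yr + 0.3 * np.sin(xr)

    # In all three cases the connectivity is implicit: cell (i, j) is bounded by the points
    # (i, j), (i+1, j), (i, j+1) and (i+1, j+1). An unstructured grid would instead need an
    # explicit list of cells referring to point indices.
    print(xr.shape, xl.shape, xs.shape)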
• A completely portable file format with no limit on the number or size of data
objects in the collection.
• A rich set of integrated performance features that allow for access time and
storage space optimizations.
4 Visualization Process
Models are important for visualization, helping developers and users to understand the visualization process, to follow the connections and the data paths through the system, and to reference and compare the functionality and the limitations of different systems or techniques. Display models specifically classify the data by what type of output can be created. Bertin, J. (1962), for example, described a symbolic reference model that he used to describe images and displays.
There are four logical operations in the visualization process:
• Data Selection - choosing a portion of the data to analyze, that is, extracting a part of the data.
The visualization process may occur in several different modes (Hearn D.D., 1991). The "movie mode" consists of acquiring the data, producing an animation tape of
the data and then analyzing the data. The "tracking" mode consists of acquiring the data
and visualizing it and observing it directly on the computer. Neither of these two modes
includes any user interaction. The third mode, "interactive post-processing", introduces
user interaction in that the user is able to interactively control the visualization
parameters. The final mode, "interactive steering", allows the user to interactively
control both the actual computation of the data, e.g., by changing parameters as the
computation progresses, and the visualization of the data. These four modes provide
increasing support for analysis but also require increasing technology support.
In 1999 Stuart Card, Jock D. Mackinlay, and Ben Shneiderman presented their
own interpretation of this pattern, dubbing it the information visualization reference
model.
5 Scientific Visualization
which no meaningful distance metric exists. The seven rainbow colors (red, orange, etc.) belong to this category. The highest level of the hierarchy is metric data, which has a meaningful distance metric between any two values. Times, distances, and temperatures are examples. If we bin metric data into ranges, it becomes ordinal data. If we further remove the ordering constraints, the data becomes nominal.
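A small sketch of this reduction, assuming Python/NumPy and hypothetical temperature values: binning the metric values yields ordered categories (ordinal data), and discarding the order leaves only unordered labels (nominal data).

    import numpy as np

    # Metric data: temperatures with a meaningful distance between values (hypothetical).
    temps = np.array([12.3, 18.7, 25.1, 31.4, 8.9])

    # Binning into ranges turns the metric values into ordinal categories.
    bins = [0, 10, 20, 30, 40]                       # degree ranges
    labels = ["cold", "mild", "warm", "hot"]         # ordered categories
    ordinal = [labels[i - 1] for i in np.digitize(temps, bins)]

    # Dropping the ordering leaves a plain set of names: nominal data.
    nominal = set(ordinal)

    print(ordinal)    # ['mild', 'mild', 'warm', 'hot', 'cold']
    print(nominal)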
Within the applied sciences there are numerous sources of huge amounts of data,
for example:
• simulations running on supercomputers;
• satellites;
• experiments such as wind tunnel tests, test firings of rocket engines, etc.
These sources may produce data in several formats, such as 2D images, sets of
2D images, and scalars and vectors as a function of several variables.
In order to obtain more insight in a specific problem it is important to be able to
present and analyze the data. The interpretation of the data can be very difficult,
depending on its complexity.
The data visualization process, from original data to final image, is depicted in
Figure 1. For certain applications the original data are fed into the computer as an image
or series of images and can be manipulated and displayed directly, without leaving the
imaging domain. This is called ‘imaging’ and typical imaging operations are (higher
dimensional) FFT, gradient determination, etc. For other applications a transformation
of the numeric entity into a geometric entity is required. A geometric representation of
the data allows rendering on a computer screen (with standard geometric primitives),
shape analysis, and, depending on the particular application, extraction of other features.
The conversion of data into a geometric representation is called ‘geometric modeling’.
Subsequently, the rendering process transforms a geometric model into an image on a
computer screen.
Scientific visualization is mostly concerned with 2-, 3-, or 4-dimensional, spatial or spatio-temporal data.
In scientific visualization, different representations and visual techniques are applied, such as color, texture, multiple windows, time-sequenced data, or any combination of techniques. This can greatly aid in interpreting the numbers and finding
the relationships among the data. A scientific visualization can show true spatial fidelity, that is, the relative sizes and the relative positions of the objects in the display. This
helps in interpreting the data. For example, seeing that one object is closer (to the
camera viewpoint) than another can be crucial as the viewpoint or the objects move
around. Several techniques which are commonly employed are:
• Perspective display in which objects farther away in the display appear
smaller. This aids in understanding distances represented in the display.
• Grid lines are displayed to help identify the positions of the objects.
6 Information Visualization
• give different perspectives on the data - from broad overview to the fine
structure.
Graphics facilitate understanding of information, but a number of issues must be considered (Shneiderman 2002; Tufte 1983; Spence 2001):
1. Data is nearly always multidimensional, while graphics rendered on a computer screen or on paper are presented on a 2D surface.
2. Sometimes we need to represent a huge dataset, while the number of data items representable on a computer screen or on paper is limited.
3. Data may vary over time, while graphics are static.
4. Humans have remarkable abilities to select, manipulate and rearrange data, so graphical representations should provide users with these features.
A number of methods and techniques have been proposed to meet these requirements. Card et al. (1999) give a comprehensive list of eight types of data, eleven visual structures, four views, three types of human interaction, eleven tasks and eleven levels that a user might want to accomplish with a visualization tool (see Table 1).
• Engineering / Physics
Application of information visualization on the computer involves providing
means to transform and represent data in a form that allows and encourages human
interaction. Data can therefore be analyzed by exploration rather than pure reasoning;
users can develop understanding for structures and connections in the data by observing
the immediate effects their interaction has upon the visualization.
As a subject in computer science, information visualization is the use of
interactive, sensory representations, typically visual, of abstract data to reinforce
cognition. Information visualization is a complex research area. It builds on theory in
information design, computer graphics, human-computer interaction and cognitive
science. Practical application of information visualization in computer programs
involves selecting, transforming and representing abstract data in a form that facilitates
human interaction for exploration and understanding. Important aspects of information
visualization are the interactivity and dynamics of the visual representation. Strong
techniques enable the user to modify the visualization in real-time, thus affording
unparalleled perception of patterns and structural relations in the abstract data in
question. Although much work in information visualization concerns visual forms, auditory and other sensory representations are also of concern.
7 Process Visualization
• Facility processes occur both in public facilities and private ones, including
buildings, airports, ships, and space stations. They monitor and control HVAC,
access, and energy consumption.
The increasing level of automation of production systems has led to reduced numbers of operators and, at the same time, to an increase in the amount of process information each operator has to supervise and control. Supervisory systems will have to be able to integrate large volumes of information and knowledge coming both from local and remote points of large processes. These systems will therefore require new tools for management and integration of information and knowledge.
Process visualization provides users with graphical views of processes which can be navigationally traversed, interactively edited, or animated. Process visualizations enable intuitive analysis and discovery of processes.
SCADA systems were first used in the 1960s. The use of the term SCADA
varies, depending on location. In North America, SCADA refers to a distributed
measurement and management system that operates on a large-scale basis. For the rest
of the world, SCADA refers to a system that performs the same basic functions, but
operates in a number of different environments as well as a multiplicity of scales. While
the use of the term SCADA may not be uniform, many components are the same,
regardless of the scale of the process. SCADA generally refers to an industrial control
system which is meant to function across a wide area with an autonomous Remote
Terminal Unit (RTU).
A SCADA system is expected to have open-loop controls (meaning that a human operator watches near-real-time data and issues commands). By comparison, a
distributed control system (DCS) is expected to have closed loop controls (meaning that
real-time loop data is applied directly to an industrial controller without human
intervention). From its inception in the 1960s, SCADA was understood as a system that
was primarily concerned with I/O from Remote Terminal Units. In the early 1970s,
DCS was developed. The ISA S5.1 standard defines a distributed control system as a
system which while being functionally integrated, consists of subsystems which may be
physically separate and remotely located from one another. DCS were originally
developed to meet the requirements of large manufacturing and process facilities that
required significant amounts of analogue control. These differences are primarily design
philosophies, not mandates of definition.
There are three main elements to a SCADA system: various RTUs, communications, and an HMI (Human Machine Interface). Each RTU effectively
collects information at a site, while communications bring that information from the
various plant or regional RTU sites to a central location, and occasionally returns
instructions to the RTU. Data acquisition begins at the RTU or PLC level and includes
meter readings and equipment status reports that are communicated to SCADA as
required. Data is then compiled and formatted in such a way that a control room
operator using the HMI can make supervisory decisions to adjust or override normal
RTU (PLC) controls. Data may also be fed to a historian, often built on a commodity
database management system, to allow trending and other analytical auditing.
SCADA systems used to run on DOS, VMS and UNIX; in recent years all SCADA vendors have moved to Windows, and some also to the Linux platform.
SCADA systems are used not only in industrial processes, e.g. steel making, power generation (conventional and nuclear) and distribution, and chemistry, but also in some experimental facilities such as nuclear fusion. The size of such plants ranges from a few thousand to several tens of thousands of input/output (I/O) channels. However, SCADA systems evolve rapidly and are now penetrating the market of plants with several hundred thousand I/O channels.
Originally, SCADA systems were designed for Supervisory Control and Data
Acquisition, providing a reliable means of aggregating the analysis being performed by
multiple RTUs. But with today's high speed production demands, SCADAs are required
to perform Calculation and Analysis in real time on the plant floor, effectively
combining the once disparate worlds of HMI and SCADA.
The design of HMI has become quite complicated and can no longer be handled
in an intuitive fashion. The designer needs to possess a huge amount of multidisciplinary knowledge and experience with respect to the application domain of
the respective technical process, the available automation and information technologies,
the capabilities and limitations of the human operators and maintenance personnel,
work psychological and organizational matters as well as ergonomic and cognitive
engineering principles of good HMI design.
of the process data control activities. The data servers communicate with devices in the
field through process controllers. Process controllers, e.g. PLCs, or RTUs are connected
to the data servers either directly or via networks or field buses that are proprietary (e.g.
Siemens H1), or non-proprietary (e.g. Profibus).
Data servers are connected to each other and to client stations via an Ethernet
LAN. The data servers and client stations are mostly Windows platforms, but for many products the client stations may also be on another platform.
A Remote Terminal Unit (RTU) is a standalone unit used to monitor and control
sensors and actuators at a remote location, and to transmit data and control signals to a
central master monitoring station. Depending on the sophistication of the
microcontroller in the RTU, it can be configured to act as a relay station for other RTUs
which cannot communicate directly with a master station, or the microcontroller can
communicate on a peer-to-peer basis with other RTUs. RTUs are generally remotely
programmable, although many can also be programmed directly from a panel on the
RTU. Remote terminal units gather information from their remote site from various
input devices, like valves, pumps, alarms, meters, etc. Essentially, data is either analog
(real numbers), digital (on/off), or pulse data (e.g., counting revolutions of meters).
Many remote terminal units hold the information gathered in their memory and wait for
a request to transmit the data. Other more sophisticated remote terminal units have
microcomputers and programmable logic controllers (PLC) that can perform direct
control over a remote site without the direction of the master terminal unit. Small RTUs generally have fewer than 20 analog or digital inputs, medium-sized RTUs typically have 100 digital and up to 40 analog inputs, while an RTU with more than 100 digital or 40 analog inputs is considered large. Many RTUs are modular and thus
expandable, and several RTUs can be logically combined as one, depending on the
model and manufacturer. Figure 12 shows a typical RTU. A RTU consists of a power
supply, a central processing unit (CPU), memory (both volatile and non-volatile), and a
series of inputs and outputs. The CPU controls communications with the sensors and
actuators through the inputs and outputs, and with the master station through a serial
port, an Ethernet port, or some other interface. A programming interface can also be
connected to any of these interfaces. The Central Bus serves as the conduit for
communications between the components of the RTU.
Advances in CPUs and the programming capabilities of RTUs have allowed for
more sophisticated monitoring and control. Applications that had previously been
programmed at the central master station can now be programmed at the RTU. These
modern RTUs typically use a ladder-logic approach to programming due to its
similarity to standard electrical circuits - the majority of RTU programmers are
engineers, not computer programmers. A RTU that employs this ladder logic
programming is called a Programmable Logic Controller (PLC).
Modern RTUs and PLCs offer a wide variety of communications means, either
built in directly or through a module. The following list represents a variety of
transmission methods supported:
− RS-232/RS-442/RS-485
− Dialup telephone lines
− Dedicated telephone lines
− Microwave
− Satellite
− X.25
− Ethernet
− 802.11a/b/g
− Radio (VHF, UHF, etc)
Master stations have two main functions:
1. Periodically obtain data from RTUs/PLCs (and other master or sub-master stations).
2. Control remote devices through the operator station.
Master stations consist of one or more personal computers (PC), which, although
they can function in a multi-purpose mode (email, word processing, etc), are configured
to be dedicated to master station duties. These duties include trending, alarm handling,
logging and archiving, report generation, and facilitation of automation. These duties
may be distributed across multiple PCs, either standalone or networked.
Figure 13 The three-layer model with the addition of business systems and process
regulation
A typical organization will generate policies and procedures that define the
process that must be monitored and controlled, allocate resources to it, and dictate how
collected data will be distributed and audited. A management information system (MIS)
may facilitate access to the data supplied by the process, and can be used for
forecasting, trending and optimization. Figure 14 illustrates some components of a
Business System that will affect a SCADA implementation.
The SCADA software products are multi-tasking and are based upon a real-time
database (RTDB) located in one or more servers. Servers are responsible for data
acquisition and handling (e.g. polling controllers, alarm checking, calculations, logging
and archiving) on a set of parameters, typically those they are connected to. However, it
is possible to have dedicated servers for particular tasks, e.g. historian, datalogger,
alarm handler. Figure 15 shows a SCADA architecture that is generic for the products
that were evaluated.
In SCADA systems, the three major categories of protocols involve the
specifications for design and manufacture of sensors and actuators, specifications for
RTUs, and the specifications for communications between components of a control
system.
The prevalent standard for industrial control RTU design and programming is the
IEC 61131 series, developed by the two IEC working groups, the Industrial Process
Measurement And Control group and the IT Applications In Industry group. It is a
series of seven publications that serve to standardize the programming languages,
instruction sets, and concepts used in industrial control devices such as RTUs and PLCs.
trending, and reporting from remote equipment. However, there are three significant
problems to overcome when implementing an Internet-based SCADA system.
The first is that most devices used to control remote equipment and processes do
not have Internet-communications capability already incorporated in their operating
systems. The second is that the device still has to be physically connected to the
Internet, even when equipped through retrofit or in the factory with the necessary
communications protocols and the third is assurance of data protection and access
control.
One solution to these problems is to connect the device to a PC and have the PC
make the connection to the Internet via an Internet service provider using Secure Socket
Layer.
An alternative to using a PC is an embedded solution: a small, rugged, low-cost
device that provides connectivity capabilities of a PC at a lower cost and higher
reliability. This device (sometimes referred to as an Internet gateway) is connected to
the equipment via a serial port, communicates with the equipment in the required native
protocol, and converts data to HTML or XML format. The gateway has an IP address
and supports all or at least parts of the TCP/IP stack—typically at least HTTP, TCP/IP,
UDP, and PPP. Once connected to the Internet, the gateway responds to an HTTP
request with an HTML or XML file, just as if it were any PC server on the World Wide
Web. In cases where the equipment incorporates an electronic controller, it may be
possible to simply add Web-enabled functionality into the existing microcontroller.
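As a rough illustration of the behaviour of such a gateway, the sketch below (Python standard library only; the tag names and readings are hypothetical placeholders for values read from the equipment in its native protocol) answers an HTTP request with a small XML document describing the equipment status:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Hypothetical readings that the gateway obtained from the equipment
    # over its serial port using the device's native protocol.
    READINGS = {"pump_status": "on", "flow_rate": "12.7", "pressure": "3.4"}

    class GatewayHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Convert the native-protocol data to a simple XML document.
            body = "<equipment>" + "".join(
                "<{0}>{1}</{0}>".format(tag, value) for tag, value in READINGS.items()
            ) + "</equipment>"
            data = body.encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/xml")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    if __name__ == "__main__":
        # Serve on port 8080; a SCADA client or browser can now poll the gateway.
        HTTPServer(("", 8080), GatewayHandler).serve_forever()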
Phase 4: installation of the communication equipment and the PC system.
Phase 5: commissioning of the system, during which communication and HMI programming problems are solved, the system is proven to the client, and operator training and system documentation are provided.
Many of the general principles of computer graphics and design carry over into
the area of data visualization. The essential idea is to provide as much information as
possible without confusing or distracting the viewer with inappropriate color schemes.
For the types of data discussed in Section 3.1 there exist many visualization techniques; in general, however, each is applicable to just a single data type.
A visualization technique is used to create and manipulate a graphic
representation from a set of data. Some techniques will be appropriate only for specific
applications while others are more generic and can be used in many applications. It
should always be kept in mind that the goal of visualization is not to understand the data
but to understand the underlying phenomenon. Visualization must be useful to the users
• Some things which are seen in the visualization can generate new questions.
• Sometimes it is not easy to analyze the data, and doing so may require a variety of visualization techniques. No single technique suffices for all analyses; it is sometimes impossible to avoid using multiple techniques.
• Some details cannot be seen at all (or all together). To address this, add
multiple displays and techniques for multiple questions.
• Some false things may be seen in the image. Be careful to verify all your
conclusions.
Special techniques must be used to facilitate the understanding of 3D data, since
it is mapped to a 2D output device. Attribute mapping, such as color, is one method.
Another method is brightness (closer points are brighter). But the best method appears
to be animation, allowing the user to rotate the 3D point cloud about the axes.
The visualization technique selected depends on many circumstances, some of
them are:
• The dimensionality of the dataset.
• The number of variables in the dataset, the nature of the “data objects”.
• Rubbersheet - rendering of scalar data other than by colors (or the special case of glyphs), namely as the height of a deformed surface. In that way one can add
other scalar data to this sheet by using color. An important aspect of the rubber
sheet is that the clarity of the data representation is dependent on the light that
falls on it. Shadows indicate height (shading).
Figure 17 Example of glyphs for scalar (L) and vector (R) data
within a volume of space; in other words, it is a level set of a continuous function whose
domain is 3D-space.
• The entity is defined over an enumerated set, e.g., the number of cars sold in
each country in a given year. Then we can use the notation E{1} to indicate a
1D domain consisting of the set of enumerated countries.
Another special case is when we want to show a set of values over some domain, e.g., temperature and pressure over a 3D domain. This notation is E_3^2S, or for the general case E_d^nS.
In Table 2, entities defined on a d-dimensional domain with n scalar data values are given the symbol E_d^nS, which thus denotes a (multiple) scalar entity on a d-dimensional domain. When the entity's state changes with time (as in an animation), an extra subscript t is added. With (other) colors or color maps and glyph size, extra scalar degrees of freedom can be gained. By the shape of glyphs one can distinguish different (local) fields. With each entity some possible applications are given; the list of applications is far from complete. In the last column of Table 2 some techniques are given which are used in the visualization of the entity.
Table 3 is organized in the same way as Table 2, but now the entities E have n-vector data on the d-dimensional domain, E_d^Vn, or an n x n tensor, E_d^Tn,n (mostly with n = 3). As for the scalar data, a subscript t indicates time dependency. Of course, visualization is not limited to scalar or vector data alone. Combinations are possible and are not at all exceptional.
Other researchers have developed different taxonomies. E.H. Chi and J.T. Riedl in 1998 extended and proposed a new way to taxonomize information visualization techniques by using the Data State Model. Many of the techniques share similar operating steps that can easily be reused. The Data State Model not only helps researchers understand the design space, but also helps implementers understand how information visualization techniques can be applied more broadly. Bergeron and Grinstein developed a data-oriented classification using the concept of a lattice.
Another approach was presented by Ralph Lengler and Dr. Martin J. Eppler from visual-literacy.org. They have developed a chart, the so-called "Periodic Table of Visualization Methods", that organizes a large number of ways to present information visually. The web page uses a JavaScript library to display an example of a diagram type when we mouse over its box. Not only can we hover over each of the methods and see
examples, but the chart itself helps to see connections between different approaches.
The table itself is an example of how the right visual can not only present information
but actually create knowledge by organizing material. It displays around 100 diagram types, with examples and a multi-faceted classification. There are visualization methods for data, information, concept, strategy, metaphor, and compound visualization. Chris Wallace has
implemented an XML page from this table on which we can see and print the
mouseover pictures individually.
The most elementary techniques and algorithms deal with scalar data or data that can be made into scalars. For this type of data we can use the following visualization techniques:
• Colour mapping - maps the scalar to a colour, and then displays that colour. The scalar mapping is implemented using a colour lookup table that is indexed with the scalar; a small sketch of such a lookup table is given after this list. Good choice of the "transfer function" is important for the final result of the visualization.
• Warping.
• Displacement plots.
• Time animations - let "massless" particles trace the vector field.
• Streamlines - these are lines parallel to the vector field at all points.
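The sketch below (Python with NumPy; the colour table and data range are hypothetical) shows the basic mechanics of colour mapping: the scalar is normalized to the range of the lookup table and used as an index into a small table of RGB colours.

    import numpy as np

    # A small colour lookup table (LUT): five RGB entries from blue to red (hypothetical).
    lut = np.array([[0.0, 0.0, 1.0],
                    [0.0, 1.0, 1.0],
                    [0.0, 1.0, 0.0],
                    [1.0, 1.0, 0.0],
                    [1.0, 0.0, 0.0]])

    def colour_map(scalars, vmin, vmax, lut):
        """Map scalar values in [vmin, vmax] to colours via the lookup table."""
        t = np.clip((np.asarray(scalars) - vmin) / (vmax - vmin), 0.0, 1.0)
        indices = np.round(t * (len(lut) - 1)).astype(int)    # the "transfer function"
        return lut[indices]

    # Hypothetical scalar data, e.g. temperatures between 0 and 100.
    temperatures = [5.0, 25.0, 50.0, 99.0]
    print(colour_map(temperatures, 0.0, 100.0, lut))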
Glyphs are objects that are affected by input data, for example:
• The size could vary with a scalar value,
• Table, matrix
• Charts (pie chart, bar chart, histogram, function graph, scatter plot, etc.)
• Maps
• Venn diagram
• Euler diagram
• Chernoff faces
• Hyperbolic trees
The main assumption and procedures for surface based techniques are:
• Assume that volume contains thin boundary surfaces.
• Render surfaces.
Figure 27 2D grid
Each point of this grid has a weight, and here the reference value is taken to be 5. To draw the curve whose value is constant and equal to the reference one, different kinds of interpolation can be used. The most commonly used is linear interpolation. In order to display this curve, different methods can be used. One of them consists in considering each square of the grid individually. This is the marching squares method. For this method 16 (2^4) configurations have been enumerated, which allows the representation of all kinds of contour lines in 2D space (a small sketch of the per-cell classification is given below).
Some cases may be ambiguous due to symmetry. That is the situation for cases 5 and 10. As we can see in Figure 29, we are not able to decide on the interpretation of this kind of situation. However, these exceptions do not cause any real error because the contour edges remain closed.
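A minimal sketch of the per-cell classification step, assuming Python and a hypothetical 2D scalar grid: each of the four corners contributes one bit depending on whether its value exceeds the reference value, yielding one of the 16 cases (the corner-to-bit assignment shown here is only one possible convention).

    import numpy as np

    # Hypothetical 2D scalar grid and reference (iso) value.
    grid = np.array([[1.0, 4.0, 6.0],
                     [2.0, 7.0, 3.0],
                     [8.0, 5.0, 1.0]])
    iso = 5.0

    def cell_case(grid, i, j, iso):
        """4-bit marching-squares case index for the cell with lower-left corner (i, j)."""
        case = 0
        if grid[i, j] > iso:          case |= 1    # lower-left corner
        if grid[i, j + 1] > iso:      case |= 2    # lower-right corner
        if grid[i + 1, j + 1] > iso:  case |= 4    # upper-right corner
        if grid[i + 1, j] > iso:      case |= 8    # upper-left corner
        return case                   # 0..15, used to look up the contour segments of the cell

    for i in range(grid.shape[0] - 1):
        for j in range(grid.shape[1] - 1):
            print(i, j, cell_case(grid, i, j, iso))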
through the grid cell, it may cut off any one of the vertices, or it may pass through in
any one of a number of more complicated ways. Each possibility will be characterised
by the number of vertices that have values above or below the isosurface. If one vertex
is above the isosurface and an adjacent vertex is below the isosurface then we know the
isosurface cuts the edge between these two vertices. The position at which it cuts the edge is found by linear interpolation: the fraction of the edge length between the two vertices corresponds to the position of the isosurface value between the values at the two vertices of the grid cell.
In order to be able to determine each real case, a notation has been adopted. It refers to each case by an index (the cube index), based on the state of the vertices and created from a binary interpretation of the corner weights (Figure 31). Using the vertex numbering in Figure 31, the eight-bit index contains one bit for each vertex. This index serves as a pointer into an edge table that gives all edge intersections for a given cube configuration. If, for example, the value at vertex v1 is below the isosurface value and the values at all the other vertices are above it, then we would create a triangular facet which cuts through edges e1, e4, and e9 (case 1 in Figure 30). The exact position of the vertices of the triangular facet depends on the relationship of the isosurface value to the values at the vertex pairs v1-v2, v1-v5, and v1-v4, respectively. What makes the algorithm difficult is the large number (256) of possible combinations and the need to derive a consistent facet combination for each solution so that facets from adjacent grid cells connect together correctly. The first part of the algorithm uses a table (the edge table) which maps the vertices under the isosurface to the intersected edges. An 8-bit index is formed where each bit corresponds to a vertex. The edge table returns a 12-bit number, with one bit per edge: 0 if the edge is not cut by the isosurface, 1 if it is. If none of the edges are cut, the table returns 0; this occurs when the cube index is 0 (all vertices below the isosurface) or 0xff (all vertices above the isosurface). Using the earlier example where only vertex v1 was below the isosurface, the cube index would equal 0000 1000, or 8. The edge table entry for index 8 returns the number 0001 0000 1001, meaning that edges 1, 4, and 9 are intersected by the isosurface.
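A minimal sketch of this indexing step, assuming Python; the vertex values, the isovalue, and the truncated edge table entries are hypothetical placeholders for the full 256-entry table described above, and the bit assigned to each vertex must follow the numbering convention of the table being used.

    # Hypothetical scalar values at the eight cell vertices; exactly one vertex lies
    # below the isovalue, reproducing the 0000 1000 (= 8) example from the text.
    vertex_values = [7.0, 6.5, 8.1, 2.0, 9.0, 7.7, 6.2, 5.5]
    isovalue = 5.0

    # Build the 8-bit cube index: one bit per vertex that lies below the isosurface.
    cube_index = 0
    for bit, value in enumerate(vertex_values):
        if value < isovalue:
            cube_index |= 1 << bit

    # The real edge table has 256 entries, one 12-bit mask per cube configuration;
    # only a few entries are sketched here as placeholders.
    edge_table = {0x00: 0x000, 0xFF: 0x000, 0x08: 0x109}    # 0x109 = 0001 0000 1001

    mask = edge_table.get(cube_index, 0)
    intersected_edges = [e + 1 for e in range(12) if mask & (1 << e)]
    print(cube_index, intersected_edges)    # 8, [1, 4, 9]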
The intersection points are now calculated by linear interpolation. If P1 and P2
are the vertices of a cut edge and V1 and V2 are the scalar values at each vertex, the
intersection point P is given by
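In its usual form, this linear interpolation reads

    P = P1 + (isovalue - V1) / (V2 - V1) * (P2 - P1)

so the intersection point slides along the edge from P1 towards P2 as the isosurface value moves from V1 towards V2.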
Optical models for direct volume rendering view the volume as a cloud of
particles. Light from a source can either be scattered or absorbed by particles. In
practice, models that take into account all the phenomena tend to be very complicated.
Therefore, practical models use several simplifications.
In general, a volumetric dataset consists of samples arranged on a regular grid.
These samples are also referred to as voxels. While most volume rendering techniques
are based on the theoretical optical model for volume rendering, several different
techniques implementing this optical model have emerged.
In the following, we use a taxonomy based on the processing order of the data. We distinguish between image-based (image-order), object-based (object-order), and hybrid methods. Image-order methods start from the pixels on the image plane and compute the contribution of the appropriate voxels to these pixels. Object-order techniques traverse the voxels and compute their contribution to the image. Hybrid methods try to combine both approaches. Techniques based on the texture mapping capabilities of graphics hardware, as well as dedicated volume rendering hardware solutions, are possible too.
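As an illustration of the image-order idea, the sketch below (Python with NumPy; the volume, the transfer function, and the axis-aligned viewing direction are hypothetical simplifications) casts one ray per pixel along the z axis and composites the samples front to back using the usual emission-absorption approximation:

    import numpy as np

    # Hypothetical volume: scalar samples on a regular grid.
    volume = np.random.rand(32, 32, 32)

    def transfer_function(s):
        """Map a scalar sample to (r, g, b, alpha) -- a hypothetical choice."""
        return s, 0.2, 1.0 - s, 0.05 * s

    def render_axis_aligned(volume):
        """Image-order rendering: one ray per (x, y) pixel, marching along z."""
        nx, ny, nz = volume.shape
        image = np.zeros((nx, ny, 3))
        for x in range(nx):
            for y in range(ny):
                color = np.zeros(3)
                alpha = 0.0
                for z in range(nz):                     # front-to-back along the ray
                    r, g, b, a = transfer_function(volume[x, y, z])
                    color += (1.0 - alpha) * a * np.array([r, g, b])
                    alpha += (1.0 - alpha) * a
                    if alpha > 0.99:                    # early ray termination
                        break
                image[x, y] = color
        return image

    print(render_axis_aligned(volume).shape)

An object-order method would instead loop over the voxels and scatter their contributions onto the image, as the splatting technique described below does.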
undesirable. In medical imaging, for example, it would be impossible to see into areas
surrounded by bone if the bone were considered dense enough to shadow light. On the
other hand, in applications where internal shadows are desired, this integral has to be
computed.
8.3.2.2 Splatting
Splatting is an object-order technique that traverses the voxels and projects their footprints (known as splats) onto the image plane. In contrast to image-order techniques, object-order methods determine, for each data sample, how it affects the pixels on the image plane. In its simplest form, an object-order algorithm loops through the data samples, projecting each sample onto the image plane. Voxels that have zero opacity, and thus do not contribute to the image, can be skipped. This is one of the greatest advantages of splatting, as it can tremendously reduce the amount of data that has to be processed. But there are also disadvantages. Using pre-integrated kernels introduces inaccuracies into the compositing process, since the 3D reconstruction kernel is composited as a whole. This can cause color bleeding artifacts (i.e., the colors of hidden background objects may "bleed" into the final image). To remedy these artifacts, an approach has been developed which sums voxel kernels within volume slices most parallel to the image plane. However, this leads to severe brightness variations in interactive viewing.
8.3.2.3 Shear-Warp
Image order and object order algorithms have very distinct advantages and
disadvantages. Therefore, some effort has been spent on combining the advantages of
both approaches. Shear-warp is such an algorithm. It is considered to be the fastest
software based volume rendering algorithm. It is based on a factorization of the viewing
transformation into a shear and a warp transformation. The shear transformation has the
property that all viewing rays are parallel to the principal viewing axis in sheared-
object-space. This allows volume and image to be traversed simultaneously.
Compositing is performed into an intermediate image. A 2D warp transformation is then applied to the intermediate image, producing the final image.
The problem of shear-warp is the low image quality caused by using only
bilinear interpolation for reconstruction, a varying sample rate which is dependent on
the viewing direction, and the use of pre-classification. Some of these problems have
been solved; however, the image quality is still inferior when compared to other
methods, such as ray casting.
sorted according to various criteria. First, it is the domain, over which the data are
defined, and which is usually two or three dimensional. Second, it is the dimension of
the data values themselves, which is theoretically unlimited and depends on the
application. Two or three dimensional vector fields can be encountered most frequently,
but fields of quadratic tensors are also quite common. It is, however, necessary to
realize that the character of the data must be taken into account as well. Three
dimensional vectors need to be treated in a different way than a set of three scalar
values. The third important criterion is, whether the data vary in time. If so, they are
usually called time dependent. Otherwise, we speak of time independent data. Such a
variety of kinds of data implies even larger variety of visualization techniques.
Multidimensional data visualization was studied separately by statisticians and
psychologists long before computer science was deemed a discipline. The appearance of
personal computers and workstations during the 1980’s breathed new life into graphical
analysis of multidimensional data. Scientists have studied multidimensional and
multivariate visualization since 1782 when Crome used point symbols to show the
geographical distribution in Europe of 56 commodities. In 1950, Gibson started the
research on visual texture perception. Later, Pickett and White proposed mapping data
sets onto artificial graphical objects composed of lines. This texture mapping work was
further investigated by Pickett, and was eventually computerized. Chernoff presented
his arrays of cartoon faces for multivariate data in 1973. In this well-known technique,
variables are mapped to the shape of the cartoon faces and their facial features including
nose, mouth, and eyes (see Figure 39). These faces are then displayed in a two
dimensional graph (Wong P. Ch., 1997).
Before we can start to visualize general multidimensional or multivariate data we need to assign a coordinate to each data point. We need to define x, y and z for each data point. Typically we do this through some functions:
x_j = x(d_1j, d_2j, ..., d_mj)
y_j = y(d_1j, d_2j, ..., d_mj)
z_j = z(d_1j, d_2j, ..., d_mj)
After this has been done we can apply the same techniques to this kind of data.
Multidimensional data consists of some number of points, each of which is
defined by an n-vector of values. Such data can be viewed as an m x n matrix, where
each row represents a data point and each column represents an observation (also called a
variable or dimension). An observation may be scalar or vector, nominal or ordinal, and
may or may not have a distance metric, ordering relation, or absolute zero. Each
variable/dimension may be independent or dependent. The problem with
multidimensional data is that we only have three spatial dimensions available onto
which to map the attributes. In practice we are even limited to two dimensions, as three
dimensional visualization is tricky and usually only works well for the data that has an
intrinsic spatial structure. One common strategy to deal with that problem is
parallelization.
The general idea of parallelization is to subdivide the (two dimensional) space
into an appropriate number of sub-spaces. Each of the sub-spaces is then used to show a
two dimensional representation of a selected aspect of the data. By showing all these
sub-spaces in parallel and at the same time, correlations and patterns become evident
that go beyond two dimensions. One example of the parallelization strategy is the so-called scatterplot matrix shown in Figure 37.
This technique constructs a matrix of small scatter plots with all possible
combinations of pairs of attributes. The scatterplot matrix can also be regarded as an
instance of the rule of small multiples that we know from information design.
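A minimal sketch of such a scatterplot matrix, assuming Python with NumPy and Matplotlib and a hypothetical four-attribute dataset:

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical multivariate data: 200 points with 4 attributes each (an m x n matrix).
    rng = np.random.default_rng(0)
    data = rng.normal(size=(200, 4))
    names = ["attr 1", "attr 2", "attr 3", "attr 4"]

    n = data.shape[1]
    fig, axes = plt.subplots(n, n, figsize=(8, 8))
    for i in range(n):
        for j in range(n):
            ax = axes[i, j]
            if i == j:
                ax.hist(data[:, i], bins=20)              # diagonal: distribution of one attribute
            else:
                ax.scatter(data[:, j], data[:, i], s=4)   # off-diagonal: pairwise scatter plot
            if i == n - 1:
                ax.set_xlabel(names[j])
            if j == 0:
                ax.set_ylabel(names[i])
    plt.tight_layout()
    plt.show()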
Figure: examples of profile glyphs; stars and Anderson/metro glyphs; sticks and trees
M.O., 1999) in terms of the graphical entities and attributes controlled by the
multivariate data point.
Flow visualization solutions of this kind allow immediate investigation of the vector
data, without a lot of mental translation effort.
Another approach for visualizing flow data is the feature based approach, in
which an abstraction step is performed first. From the original data set, interesting
objects are extracted, such as important phenomena or topological information of the
flow. These flow features represent an abstraction of the data, and can be visualized
efficiently and without the presence of the original data, thus achieving a huge data
reduction, which makes this approach very suitable for large (time-dependent) data sets,
acquired from computational fluid dynamics simulations. These data sets are simply too large to visualize directly; therefore, a lot of time is spent in preprocessing on computing the features (feature extraction). But once this preprocessing has been
performed, visualization can be done very quickly.
The general idea behind the fourth group of methods is to derive scalar quantities from the vector data first and then visualize them via approaches like isosurface extraction or direct volume visualization.
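A minimal sketch of this idea, assuming the vector data is simply a list of 3D vectors, is given below; the derived scalar is the vector magnitude, after which standard scalar techniques such as isosurface extraction or volume rendering can be applied.

#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Derive a scalar field (here the vector magnitude) from a vector field.
std::vector<double> magnitudeField(const std::vector<Vec3>& v)
{
    std::vector<double> mag(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        mag[i] = std::sqrt(v[i].x * v[i].x + v[i].y * v[i].y + v[i].z * v[i].z);
    return mag;
}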
A visualization system is not just a system to create an image of the data; it can also be used to manipulate the data to create different types of images. A visualization system should be linked with the model of scientific investigation. Visualization can help form the
link between hypothesis and experiment and between insight and revised hypothesis.
The human visual perception system is very complex and must be taken into account in
the design and use of data visualization systems. Visualization systems include software applications such as LYMB, Iris Explorer, Data Explorer and AVS.
Systems are self-contained, often turn-key, offer great reuse, have an integrated user
interface, but they are an all-or-nothing approach. They do not readily embed within
another application framework.
Tools and toolkits are more adaptable, allowing us to use only what we need, and are generally independent of the user interface. Toolkits offer the possibility to develop
visualization applications using a variety of visualization techniques. Toolkits include
packages such as Iris Inventor, ISG’s IAP, and the Visualization Toolkit (VTK).
There are specialized applications for a certain type of data visualization. Two
examples are:
• RasMol – a tool for viewing large molecules,
• Vis5D - a system for interactive visualization of large 5-D gridded data sets
such as those produced by numerical weather models.
The success of visualization not only depends on the results which it produces,
but also depends on the environment in which it has to be done. This environment is
determined by the available hardware, like graphical workstations, disk space, color
printers, video editing hardware, and network bandwidth, and by the visualization
software. For example, the graphical hardware imposes constraints on interactive speed
of visualization and on the size of the data sets which can be handled. Many different
problems encountered with visualization software must be taken into account. The user
interface, programming model, data input, and data output, data manipulation facilities,
and other related items are all important. The way in which these items are implemented
determines the convenience and effectiveness of the use of the software package.
Furthermore, whether the software supports distributed processing and computational steering must be taken into account.
The lack of a clear reference visualization model makes it difficult to design
improved approaches in a systematic manner. The task of generating robust default
visualizations of data under exploratory or directed investigation, or automating the
production of such defaults and assisting the user to refine them, relies on having
standard representations that satisfy established or specified criteria for interpretation.
This task also points to the need for a reference model that is formalized.
Users perceive visual data representations and use them to guide the exploration
process. Exbase was a system for exploring databases developed on the basis of this
integration. It provided support for the visual exploration of databases by integrating
visualization techniques and a real database management system. The search for
structure inside the database was focused on manual mechanisms but it did not exclude
support for automatic discovery tools.
Goldstein, Roth and Mattis proposed a framework for interactive data
exploration. There are several key points in that framework. First, they pointed out that,
given the growth of data sets in both complexity and amount of information, the design
of graphic displays is requiring more expertise from users. Second, visualization
systems must provide explicit support for the data archaeology process. The data
archaeology process refers to the use of users' discovered relationships to guide the data
exploration process. This process, iterative and interactive in nature, was initiated and
controlled by people. Third, tools in the visualization system must be oriented to provide support for the kind of subtasks usually involved in exploring large databases, including explicit support for a dynamic specification of the user's information-seeking
goals. Visage, a user interface environment for exploring information, was based on
these principles and on a new paradigm for user interfaces that they called an information-centric approach.
Robertson and De Ferrari proposed a reference model for data visualization
within which the scope and limitations of existing tools and systems can be identified.
In this model, the user is again part of the exploration data analysis loop. The end goal
is the automatic generation of visual representations given a description of all the
important data characteristics and the user's interpretation aims. User interpretation aims refer to the specification of characteristics of the data, or relations between data variables, that the user is interested in analyzing by means of visual representations. There were four basic components in the Robertson and De Ferrari model (see Figure 43):
1. The data model.
2. Visualization specification.
3. Visualization representation.
4. The matching procedure.
The model also specified desired requirements for all of these components. The
data model should be precise and functionally-rich with the data structure and the data
fields explicitly specified. The visualization specification should support both user
directives (requirements explicitly defined by the user) and user interpretation aims
(requirements implicitly defined by user specified criteria). In reference to the visual
representation, Robertson and De Ferrari suggested that many possible visual
representations should be described based on some criteria. At the time, there were two approaches for describing visual representations: the bottom-up or expressiveness/effectiveness approach defined by Mackinlay and extended by other researchers, and the top-down approach defined by Robertson, which is based on scene properties and criteria for matching scene properties to data variables. The matching
procedure refers to mechanisms for encoding and decoding information. Robertson and
De Ferrari suggested that for each visualization technique those mechanisms must be
explicitly stated by giving a formal model which describes the encoding mechanism and
explicitly states what decoding capabilities it assumes.
One of the advantages of open L-systems is that grammars are able to invoke external
modules. Pinkney uses this facility to alter either the derived string or the course of
derivation itself in order to implement different behaviors of icons. Figure 45 shows an
overview of the L-systems-based visualization system developed by Pinkney. As seen
in this figure, interactions can be implemented in one of two places: at the level of the grammar, by including productions defining the desired behavior, or at the level of the string, by changing values of the derived string.
system includes over 500 classes ranging from visualization and graphics to Xlib and
Motif user interface. Objects are created using compiled C and interact through an
interpreted scripting language.
(Zuse Institute Berlin), SimVis (VRVis Research Vienna), EnSight (CEI, originally by
Cray), TecPlot (TecPlot Inc.) and open source - VisIt (Lawrence Livermore National
Laboratory).
Components of an MVE are:
• Visual programming editor;
• Modules:
   − typical categories:
      − Input (reading, generating data);
      − Filters (mapping to the same data type);
      − Mappers (mapping to a different data type);
      − Output (3D graphics, image, or file);
   − module libraries:
      − ordered by category, author, etc.;
      − modules contributed by the user community;
• UI widgets (parameters, status, viewers, etc.).
Grid computing presents a challenge for visualization system designers. The grid's purpose is to tackle increasingly complex problems. Grid-enabled visualization will make it possible to create a transparent interconnect fabric linking data sources, computing (visualization) resources and users into widely distributed Virtual Organizations for the purpose of tackling such problems.
The main product editions are AVS5, AVS/Express and AVS/Powerviz. AVS5
consists of a comprehensive suite of data visualization and analysis techniques that
incorporate both traditional visualization tools (such as 2D plots and graphs and image
processing) as well as advanced tools (such as 3D interactive rendering and volume
visualization). It is available for the Unix/Linux and Mac OS X platforms. AVS/Express is a
visualization toolkit which supports a visual programming interface. The AVS/Express
interface provides tools to import data, build a visualization, interact with the display
and generate images and animations. AVS/PowerViz is a comprehensive solution that
enables real-time businesses to better manage their critical networks through a
customizable Web portal that integrates data and applications from across the entire
corporate enterprise into a single, graphically enhanced real-time management tool.
• GVS on-line help, which allows the user to access control elements and get
information about each control simultaneously, and
• a limited set of basic GVS data conversion filters, which allows for the display
of data requiring simpler data formats.
Specialized controls for handling PMARC data include animation and wakes,
and visualization of off-body scan volumes.
9.5.3 COVISE
COVISE stands for COllaborative VIsualization and Simulation Environment.
The product is developed at The High Performance Computing Center Stuttgart (HLRS)
of the University of Stuttgart. The company Visenso GmbH sells the commercial
version of COVISE. It is an extendable distributed software environment to integrate
simulations, postprocessing and visualization functionalities. From the beginning, COVISE was designed for collaborative working, allowing engineers and scientists spread across a network infrastructure to work together. In COVISE an application is divided into several
processing steps, which are represented by COVISE modules. These modules, being
implemented as separate processes, can be arbitrarily spread across different
heterogeneous machine platforms. COVISE rendering modules support virtual environments ranging from workbenches and powerwalls over curved screens up to full domes or CAVEs. Users can analyze datasets intuitively in a fully immersive
environment through state of the art visualization techniques including volume
rendering and fast sphere rendering. Physical prototypes or experiments can be included
into the analysis process through augmented reality techniques.
9.5.4 OpenDX
OpenDX is a powerful, full-featured software package for the visualization of scientific, engineering and analytical data. Its open system design is built on a standard interface environment: the GUI is built on OSF/Motif and the X Window System, and its sophisticated data model provides users with great flexibility in creating visualizations. The current version
supports software-rendered images on 8-, 12-, 16-, 24-, and 32-bit windows. OpenDX is
based on IBM’s Visualization Data Explorer. One of the most distinctive characteristics
of OpenDX is its object-oriented, self-describing Data Model. The DX data model and the many available filters for third-party data formats make it possible to quickly import disparate data sets, in most cases without changing the way the original data is organized. The
currently supported platforms in version 4.4 of OpenDX (binaries) are: Irix 6.5, HP-UX
11.22 (Itanium), HP-UX 11.11 (PA RISC), Redhat Linux-FC 4.0 and FC 5.0 ix86, Solaris 10 Sparc and ix86, Windows 2000, XP, 2003 (this version of OpenDX still requires an X server to be running on the local machine). Some commercial versions for
Mac OS X are also available.
9.5.6 SCIRun
SCIRun is more properly considered a problem solving environment (PSE) framework upon which application-specific PSEs are built. Each specific PSE is a package within SCIRun. PSEs use and build upon data types, algorithms, and modules provided by the SCIRun framework, and provide application-specific data types, algorithms, and modules of their own. SCIRun is a completely open-source product.
SCIRun is a modular dataflow programming PSE. SCIRun has a set of modules
that perform specific functions on a data stream. Each module reads data from its input ports, processes it, and sends new data out from its output ports. In SCIRun, a module is represented by a rectangular box on the Network Editor canvas. Data flowing between modules is represented by pipes connecting the modules. A group of connected modules is called a Dataflow Network, or Net.
The most frequently used SCIRun module is the ViewScene module, which
displays interactive graphical output to the computer screen. ViewScene is used any
time the user wants to see a geometry or spatial data. The ViewScene module also
provides access to many simulation parameters and controls, and indirectly initiates new
iterations of simulation steps, which is important for computational steering (see
Computational Steering). Multiple ViewScene windows can be created. Each window is
independent of the others.
Another module is SCIRun/BioPSE, a Problem Solving Environment for biomedical applications.
SCIRun consists of packages, which are collections of modules organized by category. Because SCIRun core is required, it is not technically a package. Like a package, SCIRun core provides a set of datatypes, algorithms, and modules. Unlike
packages, SCIRun core is required and SCIRun would not function without it. Its
modules are divided into the following categories:
• Bundle
• ChangeFieldData
• ChangeMesh
• Converters
• DataArrayMath
• DataIO
• Examples
• Math
• MiscField
• NewField
• Render
• String
• Visualization
9.5.7 WebWinds
WebWinds is a visualization program developed originally by the Jet Propulsion
Lab. WebWinds is freely available software that allows atmospheric scientists,
educators, students and the general public to quickly and easily visualize and analyze
data on many of the computer platforms in use today
(http://www.openchannelsoftware.com/projects/WebWinds). WebWinds is written in
Java and able to ingest files from a local disk or the WWW. It is designed eventually to be distributed over the Internet and to operate entirely outside of WWW browsers, imposing fewer restrictions on where data and applications must be stored.
WebWinds is an interactive science data visualization system available for all
major computer platforms. Its use does not require any user programming experience
since sessions are created by assembling components on the screen via 'point and click'.
WebWinds is modular, allowing flexibility in tool construction and application. It allows
internet-based distributed processing for the first time so that it can fulfill the needs of
data providers as well as data consumers.
Because it is written in Java, WebWinds is modular, allowing flexibility in tool
construction and application. It is also largely platform and operating system
independent so that it functions efficiently in today's heterogeneous environment.
WebWinds is also object-oriented, but the objects are provided with a complete interface, and a visual programming approach was adopted.
9.6 OpenGL
OpenGL is strictly defined as a software interface to graphics hardware. In
essence, it is a 3D graphics and modeling library that is extremely portable and very
fast. It uses algorithms developed and optimized by Silicon Graphics, Inc. (SGI).
Generic (software-only) implementations of OpenGL are also possible; Microsoft's implementation of OpenGL falls into this category. The forerunner of OpenGL was GL
from SGI.
According to the OpenGL data sheet, OpenGL is an industry standard, stable,
reliable and portable, evolving, scalable, easy to use and well-documented 3D graphics
API. As a software interface for graphics hardware, OpenGL's main purpose is to render
two- and three-dimensional objects into a frame buffer. These objects are described as
sequences of vertices (which define geometric objects) or pixels (which define images).
OpenGL performs several processing steps on this data to convert it to pixels, to form
the final desired image in the frame buffer.
The OpenGL specification was managed by an independent consortium, the OpenGL Architecture Review Board (ARB), formed in 1992; its members included SGI (Silicon Graphics) and Microsoft. In the fall of 2006, the ARB and the Khronos Board
of Directors voted to transfer control of the OpenGL API standard to the Khronos
Group.
OpenGL is available on a variety of systems. Additions to the specification (through extensions) are well controlled by the consortium, and proposed updates are announced in time for developers to adopt the changes. Backwards compatibility is also
ensured.
OpenGL is reliable as all applications based on OpenGL produce consistent
visual display results on any OpenGL API compliant hardware. Portability is also a fact
independent protocol. No OpenGL commands are provided for obtaining user input, window management, or user interaction.
rasterized and the resulting fragments merged into the frame buffer just as if they were
generated from geometric data.
The diagram in Figure 52 details the OpenGL processing pipeline. For most of
the pipeline, we can see three vertical arrows between the major stages. These arrows
represent vertices and the two primary types of data that can be associated with vertices:
color values and texture coordinates. Also note that vertices are assembled into
primitives, then into fragments, and finally into pixels in the frame buffer.
Many OpenGL functions are variations of each other, differing mostly in the data
types of their arguments. Some functions differ in the number of related arguments and
whether those arguments can be specified as a vector or must be specified separately in
a list. For example, if we use the glVertex2f function, we need to supply x- and y-
coordinates as 32-bit floating-point numbers; with glVertex3sv, we must supply an
array of three short (16-bit) integer values for x, y, and z.
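As a short sketch (assuming an active OpenGL context and the usual <GL/gl.h> header; the coordinate values are arbitrary), the calls below all specify vertices but differ in component count and argument type:

GLfloat v2[2] = { 1.0f, 2.0f };
GLshort v3[3] = { 1, 2, 3 };

glBegin(GL_POINTS);
    glVertex2f(1.0f, 2.0f);        // two 32-bit floats, z defaults to 0
    glVertex2fv(v2);               // the same vertex passed as a vector
    glVertex3sv(v3);               // three 16-bit integers passed as a vector
    glVertex3f(1.0f, 2.0f, 3.0f);  // three floats given separately
glEnd();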
• For every window system, there is a library that extends the functionality of
that window system to support OpenGL rendering. For machines that use the
X Window System, the OpenGL Extension to the X Window System (GLX) is
provided as an adjunct to OpenGL. GLX routines use the prefix glX. For
Microsoft Windows, the WGL routines provide the Windows to OpenGL
interface. All WGL routines use the prefix wgl.
support and is designed to be easily ported to any platform that supports both
FreeType and the OpenGL API.
• The OpenGL Stream Codec (GLS) is a facility for encoding and decoding
streams of 8-bit bytes that represent sequences of OpenGL commands.
• Mesa is a 3-D graphics library with an API which is very similar to that of
OpenGL. Mesa is an open-source implementation of the OpenGL specification
- a system for rendering interactive 3D graphics.
process, and running a program. The window management and input routines provide a
base level of functionality with which we can quickly get started programming in
OpenGL. Do not use them, however, in a production application. Here are some reasons
for this warning:
− The message loop is in the library code.
− There is no way to add handlers for additional WM_* messages.
− There is very little support for logical palettes.
• The WGL functions.
This set of functions connects OpenGL to the Windows XP/NT/2000 and
Windows 95/98 windowing system. The functions manage rendering contexts, display
lists, extension functions, and font bitmaps. The WGL functions are analogous to the
GLX extensions that connect OpenGL to the X Window System. The names for these
functions have a "wgl" prefix.
• New Win32 functions for pixel formats and double buffering.
These functions support per-window pixel formats and double buffering (for
smooth image changes) of windows. These new functions apply only to OpenGL
graphics windows.
The functions and routines of the Win32 library are necessary to initialize the
pixel format and control rendering for OpenGL. Some routines, which are prefixed by
wgl, extend Win32 so that OpenGL can be fully supported. For Win32/WGL, the
PIXELFORMATDESCRIPTOR is the key data structure to maintain pixel format
information about the OpenGL window. A variable of data type
PIXELFORMATDESCRIPTOR keeps track of pixel information, including pixel type
(RGBA or color index), single- or double- buffering, resolution of colors, and presence
of depth, stencil, and accumulation buffers. More information about WGL is available through the Microsoft Developer Network Library.
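As a sketch only (hwnd stands for an already created window handle, and error handling is omitted), the structure is typically filled in and used with the Win32 and WGL calls as follows:

// assumes <windows.h> and <GL/gl.h>; hwnd is an existing window handle
PIXELFORMATDESCRIPTOR pfd = { 0 };
pfd.nSize      = sizeof(PIXELFORMATDESCRIPTOR);
pfd.nVersion   = 1;
pfd.dwFlags    = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER;
pfd.iPixelType = PFD_TYPE_RGBA;       // RGBA rather than color index
pfd.cColorBits = 24;                  // color resolution
pfd.cDepthBits = 16;                  // request a depth (z) buffer

HDC hdc = GetDC(hwnd);
int format = ChoosePixelFormat(hdc, &pfd);   // closest supported pixel format
SetPixelFormat(hdc, format, &pfd);
HGLRC hglrc = wglCreateContext(hdc);         // create a rendering context
wglMakeCurrent(hdc, hglrc);                  // make it current for OpenGL calls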
The generic implementation from Microsoft has the following limitations:
• Printing.
We can print an OpenGL image directly to a printer using metafiles only.
• OpenGL and GDI graphics cannot be mixed in a double-buffered window.
An application can directly draw both OpenGL graphics and GDI graphics into a
single-buffered window, but not into a double-buffered window.
• There are no per-window hardware color palettes.
Windows has a single system hardware color palette, which applies to the whole
screen. An OpenGL window cannot have its own hardware palette, but can have its own
logical palette. To do so, it must become a palette-aware application.
• There is no direct support for the Clipboard, dynamic data exchange (DDE), or
OLE.
A window with OpenGL graphics does not directly support these Windows
capabilities.
• The Inventor 2.0 C++ class library is not included.
The Inventor class library, built on top of OpenGL, provides higher-level
constructs for programming 3-D graphics. It is not included in the current version of
Microsoft's implementation of OpenGL for Windows.
• There is no support for the following pixel format features: stereoscopic
images, alpha bit planes, and auxiliary buffers.
There is, however, support for several ancillary buffers including: stencil buffer,
accumulation buffer, back buffer (double buffering), overlay and underlay plane buffer,
and depth (z-axis) buffer.
four floating-point coordinates (x, y, z, w). If w is different from zero, these coordinates
correspond to the Euclidean three-dimensional point (x/w, y/w, z/w). We can specify the
w coordinate in OpenGL commands, but that's rarely done. If the w coordinate isn't
specified, it's understood to be 1.0.
A point is represented by a set of floating-point numbers called a vertex. All
internal calculations are done as if vertices are 3D. Vertices specified by the user as 2D
are assigned a z coordinate equal to zero by OpenGL.
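A small sketch makes the role of w concrete (values chosen only for illustration); the last two calls below specify the same Euclidean point (1, 2, 3), while the first shows the default z and w values:

glBegin(GL_POINTS);
    glVertex2f(1.0f, 2.0f);               // stored internally as (1, 2, 0, 1)
    glVertex3f(1.0f, 2.0f, 3.0f);         // stored internally as (1, 2, 3, 1)
    glVertex4f(2.0f, 4.0f, 6.0f, 2.0f);   // homogeneous: (2/2, 4/2, 6/2) = (1, 2, 3)
glEnd();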
In OpenGL, the term line refers to a line segment, not the mathematician's
version that extends to infinity in both directions. There are easy ways to specify a
connected series of line segments, or even a closed, connected series of segments. In all
cases, though, the lines constituting the connected series are specified in terms of the
vertices at their endpoints. Polygons are the areas enclosed by single closed loops of
line segments, where the line segments are specified by the vertices at their endpoints.
Polygons are typically drawn with the pixels in the interior filled in, but we can also
draw them as outlines or a set of points. In general, polygons can be complicated, so
OpenGL makes some strong restrictions on what constitutes a primitive polygon. First,
the edges of OpenGL polygons can't intersect. Second, OpenGL polygons must be
convex, meaning that they cannot have indentations. Stated precisely, a region is convex
if, given any two points in the interior, the line segment joining them is also in the
interior. The reason for the OpenGL restrictions on valid polygon types is that it's
simpler to provide fast polygon-rendering hardware for that restricted class of polygons.
Simple polygons can be rendered quickly. The difficult cases are hard to detect quickly.
So for maximum performance, OpenGL crosses its fingers and assumes the polygons
are simple. Many real-world surfaces consist of nonsimple polygons, nonconvex
polygons, or polygons with holes. Since all such polygons can be formed from unions
of simple convex polygons, some routines to build more complex objects are provided
in the GLU library. These routines take complex descriptions and tessellate them, or
break them down into groups of the simpler OpenGL polygons that can then be
rendered. Since OpenGL vertices are always three-dimensional, the points forming the
boundary of a particular polygon don't necessarily lie on the same plane in space. If a
polygon's vertices don't lie in the same plane, then after various rotations in space,
changes in the viewpoint, and projection onto the display screen, the points might no
longer form a simple convex polygon.
Since rectangles are so common in graphics applications, OpenGL provides a
filled-rectangle drawing primitive, glRect*(). We can draw a rectangle as a polygon too,
but an implementation of OpenGL might have optimized glRect*() for rectangles.
Any smoothly curved line or surface can be approximated - to any arbitrary
degree of accuracy - by short line segments or small polygonal regions. Thus,
subdividing curved lines and surfaces sufficiently and then approximating them with straight line segments or flat polygons makes them appear curved (see Figure 53).
Even though curves aren't geometric primitives, OpenGL does provide some
direct support for subdividing and drawing them.
With OpenGL, all geometric objects are described as an ordered set of vertices.
We can use the glVertex*() command to specify a vertex.
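As a sketch of both points (the segment count and radius are arbitrary illustration values), a circle can be approximated by a closed loop of straight segments specified with glVertex*():

#include <cmath>
// approximate a circle of radius r by n straight line segments
const int   n = 32;
const float r = 1.0f;
glBegin(GL_LINE_LOOP);                  // a closed, connected series of segments
for (int i = 0; i < n; ++i) {
    float a = 2.0f * 3.14159265f * i / n;
    glVertex2f(r * std::cos(a), r * std::sin(a));
}
glEnd();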
The actual collection of pixels on the screen which are drawn for various point
widths depends on whether antialiasing is enabled. Antialiasing is a technique for
smoothing points and lines as they're rendered. If antialiasing is disabled (the default),
fractional widths are rounded to integer widths, and a screen-aligned square region of
pixels is drawn. Thus, if the width is 1.0, the square is 1 pixel by 1 pixel; if the width is
2.0, the square is 2 pixels by 2 pixels, and so on. With antialiasing enabled, a circular
group of pixels is drawn, and the pixels on the boundaries are typically drawn at less
than full intensity to give the edge a smoother appearance. In this mode, non-integer
widths aren't rounded. Most OpenGL implementations support very large point sizes.
With OpenGL, we can specify lines with different widths and lines that are
stippled in various ways - dotted, dashed, drawn with alternating dots and dashes, and
so on. The actual rendering of lines is affected by the antialiasing mode, in the same
way as for points. Without antialiasing, widths of 1, 2, and 3 draw lines 1, 2, and 3
pixels wide.
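A short sketch of setting these point and line attributes (the sizes and stipple pattern are chosen arbitrarily):

glPointSize(4.0f);              // request 4-pixel points
glEnable(GL_POINT_SMOOTH);      // antialiased (round) points instead of squares

glLineWidth(2.5f);              // non-integer width, rounded unless antialiased
glEnable(GL_LINE_SMOOTH);       // antialiased lines
glEnable(GL_LINE_STIPPLE);
glLineStipple(1, 0x00FF);       // dashed line: 8 pixels on, 8 pixels off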
Polygons are typically drawn by filling in all the pixels enclosed within the
boundary, but we can also draw them as outlined polygons or simply as points at the
vertices. A filled polygon might be solidly filled or stippled with a certain pattern.
Antialiasing polygons is more complicated than for points and lines. A polygon has two
sides - front and back - and might be rendered differently depending on which side is
facing the viewer.
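A similar sketch for polygon attributes follows; the stipple mask below is only a placeholder pattern:

GLubyte mask[128];                          // a 32 x 32 bit stipple mask
for (int i = 0; i < 128; ++i) mask[i] = 0xAA;

glPolygonMode(GL_FRONT, GL_FILL);           // fill front-facing polygons
glPolygonMode(GL_BACK,  GL_LINE);           // draw back-facing polygons as outlines
glEnable(GL_POLYGON_STIPPLE);
glPolygonStipple(mask);                     // fill with the stipple pattern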
based platform, PC's (Windows 9x/NT/ME/2000/XP) and Mac OS X. The design and
implementation of the library has been strongly influenced by object-oriented
principles.
[Figure: Renderer, render master instances (create rendering windows), and actor instances (property, transform, mapper)]
Actors, Renderers & Windows and UI using Tcl/Tk. Sources are information provider
objects. Some sources are readers, that is, they get the information from files in
appropriate formats, for example vtkPLOT3DReader, vtkBYUReader, and vtkPolyDataReader (one of VTK's own formats). Others generate data algorithmically; typical examples are the sources that create basic geometric shapes, such as vtkSphereSource, vtkCylinderSource, and vtkConeSource. Mappers, as suggested by the
name itself, get the information provided by source objects, directly or through
appropriate filters, and generate graphic primitives. There are two types of mappers:
vtkDataSetMapper and vtkPolyDataMapper. A special type of mapper is the writer, which writes out information to files in different formats.
[Figure: a dataflow network of modules A, B and C, with the outputs of A and B (Dataset A and Dataset B) connected to the inputs of C]
vtkLight are associated with Renderer objects to specify how the scene is seen and
illuminated. If the user does not explicitly create objects of these two types, default ones
are provided. Filters are “transformation” objects and there are many types of filters
available. For example, the marching cubes and marching squares algorithms are
implemented as filters. The same is true for decimation, sampling, geometry extraction,
thresholding, particle and many other algorithms. The simplest pipeline includes a
source, a mapper and an actor. Actors must be associated with a renderer and the
renderer associated with a window.
The VTK model is based on the data-flow paradigm adopted by many
commercial systems. In this paradigm, modules are connected together into a network.
The modules perform algorithmic operations on data as it flows through the network.
The execution of this visualization network is controlled in response to demands for
data (demand-driven) or in response to user input (event driven). The appeal of this
model is that it is flexible, and can be quickly adapted to different data types or new
algorithmic implementations. The visualization model consists of two basic types of
objects: process objects and data objects (see Figure 58). Process objects are the
modules, or algorithmic portions of the visualization network. Data objects, also
referred to as datasets, represent and enable operations on the data that flows through
the network.
#include "vtkSphereSource.h"
#include "vtkPolyDataMapper.h"
#include "vtkActor.h"
#include "vtkProperty.h"
#include "vtkRenderWindow.h"
#include "vtkRenderer.h"
#include "vtkRenderWindowInteractor.h"
void main ()
{
// an interactor
vtkRenderWindowInteractor *iren =
vtkRenderWindowInteractor::New();
iren->SetRenderWindow( renWin );
partially even provided by the toolkit itself. Further, prefuse clearly distinguishes
between absolute and view coordinates. This separation helps users to place all visual
elements in a logical way without having to consider visualization techniques applied later that change the entire view.
row contains a data record, and each column contains values for a named data field with a specific data type. Each record of the table is referred to as a Tuple.
Visual structures are created by filtering the data container which holds the abstract data. In the case of a visualization that displays all items from the start, the filtering routine needs to be executed only once at initialization time.
Typically, rendered visual structures are not stored; instead, they are re-rendered whenever needed. However, in the case of very complex but constant visual representations it makes sense to hold items in an off-screen image.
9.9 Ferret
Ferret is an analysis tool for gridded and non-gridded data and an interactive
computer visualization and analysis environment designed to meet the needs of
oceanographers and meteorologists analyzing large and complex gridded data sets. It
runs on most UNIX systems, and on Windows XP/NT/9x using X windows for display.
It can transparently access extensive remote Internet data bases using OPeNDAP
(formerly known as DODS). Together, OPeNDAP and Ferret create a powerful tool for the retrieval, sampling, analysis and display of datasets, regardless of size or data format (though there are data format limitations).
Ferret was developed by the Thermal Modeling and Analysis Project (TMAP) at
NOAA's Pacific Marine Environmental Laboratory PMEL in Seattle to analyze the
outputs of its numerical ocean models and compare them with gridded, observational
data. The model data sets are generally multi-gigabyte in size with mixed 3 and 4-
dimensional variables defined on staggered grids. Ferret offers a Mathematica-like
approach to analysis; new variables may be defined interactively as mathematical
expressions involving data set variables. Calculations may be applied over arbitrarily
shaped regions. Fully documented graphics are produced with a single command.
Many software packages have been developed recently for scientific
visualization. The features that make Ferret distinctive among these packages are
Mathematica-like flexibility, geophysical formatting, "intelligent" connection to its data
base, memory management for very large calculations, and symmetrical processing in 4
dimensions.
PMEL has developed a WWW-based visualization and data extraction system.
The PMEL server uses HTML forms to provide a point and click front end to the
scientific analysis and visualization program FERRET. The server provides access to a
large (over 20 gigabytes) research-oriented data base of multi-dimensional, gridded,
environmental data collected within NOAA and elsewhere. The data base is maintained
by the Thermal Modeling and Analysis Project (TMAP) at PMEL.
Ferret LAS software can be obtained and configured to meet the needs of other
gridded data set providers. It is especially well-suited to groups of data producers with
related data sets at distributed locations (for example, a community modeling effort). It
runs on most UNIX systems, and on Windows XP/NT/9x using X windows for display.
It can be installed to run from a Web browser ("WebFerret") for use while away from the desk or from a system lacking X windows software. It can transparently access extensive remote Internet data bases using OPeNDAP (see
http://ferret.wrc.noaa.gov/Ferret/ and http://www.opendap.org/).
• Standard include files that pre-define many shapes, colors and textures.
• Photons for realistic reflected and refracted caustics. Photons also interact with media.
• Particle media to model effects like clouds, dust, fire and steam.
• Several image file output formats including Targa, BMP (Windows only),
PNG and PPM.
• Basic shape primitives such as: spheres, boxes, quadrics, cylinders, cones, triangles and planes.
• Advanced shape primitives such as: Tori (donuts), bezier patches, height fields
(mountains), blobs, quartics, smooth triangles, text, superquadrics, surfaces of
revolution, prisms, polygons, lathes, fractals, isosurfaces and the parametric
object.
• Objects are assigned materials called textures (a texture describes the coloring
and surface properties of a shape) and interior properties such as index of
refraction and particle media.
10 Bibliography
Post F. H., Nielson G. M., Bonneau G.-P.: Data Visualization: The State of the Art. The Springer International Series in Engineering and Computer Science, Kluwer Academic Publishers, 2003, ISBN 1-4020-7259-7.
Speray D., Kennon S.: Volume probes: interactive data exploration on arbitrary grids. ACM SIGGRAPH Computer Graphics archive [on-line], Volume 24, Issue 5 (November 1990), pages 5-12, ISSN 0097-8930, [cit. 2008-07-24], URL: <http://portal.acm.org/citation.cfm?coll=GUIDE&dl=GUIDE&id=99310>.
Ward M. O.: A Taxonomy of Glyph Placement Strategies for Multidimensional Data Visualization. Xmdv Tool Home Page [on-line], 1999, [cit. 2008-07-24], URL: <http://davis.wpi.edu/~xmdv/docs/jinfovis02_glyphpos.pdf>.
Ware C.: Information Visualization, Second Edition: Perception for Design. Elsevier Inc., 2004, ISBN 1-55860-819-2.
Wong P. Ch., Bergeron R. D.: 30 Years of Multidimensional Multivariate Visualization. In: Scientific Visualization - Overviews, Methodologies and Techniques, pages 3-33, Los Alamitos, CA, 1997, IEEE Computer Society Press.