Workloads 02 Tutorial

Workload Modeling and its Effect on Performance Evaluation
Dror Feitelson
Hebrew University
Performance Evaluation
In system design
Selection of algorithms
Setting parameter values
In procurement decisions
Value for money
Meet usage goals
For capacity planning
The Good Old Days

The skies were blue
The simulation results were conclusive
Our scheme was better than theirs
Feitelson & Jette, JSSPP 1997
But in their papers, their scheme was better than ours!
How could they be so wrong?
Performance evaluation depends on:

The system's design
(What we teach in algorithms and data structures)
Its implementation
(What we teach in programming courses)
The workload to which it is subjected

The metric used in the evaluation
Interactions between these factors
Outline for Today

Three examples of how workloads affect
performance evaluation
Workload modeling
Getting data
Fitting, correlations, stationarity
Heavy tails, self similarity
Research agenda
In the context of parallel job scheduling
Example #1
Gang Scheduling and Job Size Distribution
Gang What?!?
Time slicing parallel jobs with coordinated
context switching
Ousterhout
matrix
Ousterhout, ICDCS 1982
Optimization: alternative scheduling
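A minimal Python sketch of an Ousterhout matrix, for illustration only. It assumes first-fit placement of each job into contiguous processors of some row (time slot), and a simple greedy rule for alternative scheduling: during a time slice, a job from another row also runs if all of its processors are idle in the current row. The class and method names are hypothetical.

```python
from typing import List, Optional, Set


class OusterhoutMatrix:
    """Hypothetical sketch: rows are time slots, columns are processors."""

    def __init__(self, processors: int) -> None:
        self.p = processors
        self.rows: List[List[Optional[str]]] = []

    def place(self, job: str, size: int) -> None:
        """First fit: put the job in the first row with `size` contiguous idle processors."""
        if not 1 <= size <= self.p:
            raise ValueError("job size must be between 1 and the machine size")
        for row in self.rows:
            start = self._find_gap(row, size)
            if start is not None:
                row[start:start + size] = [job] * size
                return
        new_row: List[Optional[str]] = [None] * self.p
        new_row[:size] = [job] * size
        self.rows.append(new_row)

    @staticmethod
    def _find_gap(row: List[Optional[str]], size: int) -> Optional[int]:
        run = 0
        for i, cell in enumerate(row):
            run = run + 1 if cell is None else 0
            if run == size:
                return i - size + 1
        return None

    def slot(self, t: int) -> Set[str]:
        """Jobs that run in time slice t, including alternatively scheduled ones."""
        if not self.rows:
            return set()
        primary = self.rows[t % len(self.rows)]
        running = {j for j in primary if j is not None}
        busy = {i for i, j in enumerate(primary) if j is not None}
        for row in self.rows:
            if row is primary:
                continue
            for job in sorted({j for j in row if j is not None} - running):
                cols = {i for i, j in enumerate(row) if j == job}
                if not cols & busy:  # all of this job's processors are idle now
                    running.add(job)
                    busy |= cols
        return running


m = OusterhoutMatrix(8)
for name, size in [("A", 4), ("B", 4), ("C", 4)]:
    m.place(name, size)
print(sorted(m.slot(0)), sorted(m.slot(1)))
```

Here A and B share the first row and C gets a second row; B also runs in C's time slice, because B's processors are idle in that row.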
Packing Jobs
Use a buddy system for allocating processors
Feitelson & Rudolph, Computer 1990
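A sketch of a classic binary buddy allocator applied to processors, under the assumption that the machine size is a power of two. Requests are rounded up to a power of two (the source of internal fragmentation), larger blocks are split in halves, and a freed block is merged with its buddy when the buddy is free too. The names are hypothetical and this is not claimed to be the allocator used in the cited work.

```python
from typing import Dict, Optional, Set, Tuple


class BuddyAllocator:
    """Hypothetical buddy-system processor allocator (sketch)."""

    def __init__(self, processors: int) -> None:
        if processors <= 0 or processors & (processors - 1):
            raise ValueError("machine size must be a power of two")
        self.size = processors
        self.free: Dict[int, Set[int]] = {processors: {0}}  # block size -> free block starts

    @staticmethod
    def round_up(n: int) -> int:
        s = 1
        while s < n:
            s *= 2
        return s

    def alloc(self, n: int) -> Optional[Tuple[int, int]]:
        """Allocate the smallest power-of-two block >= n; return (start, block size)."""
        want = self.round_up(n)
        s = want
        while s <= self.size and not self.free.get(s):
            s *= 2
        if s > self.size:
            return None  # no free block is large enough
        start = min(self.free[s])
        self.free[s].remove(start)
        while s > want:  # split: keep the lower half, free its upper buddy
            s //= 2
            self.free.setdefault(s, set()).add(start + s)
        return start, want

    def release(self, start: int, s: int) -> None:
        """Free a block, merging it with its buddy while the buddy is also free."""
        while s < self.size:
            buddy = start ^ s
            if buddy not in self.free.get(s, set()):
                break
            self.free[s].remove(buddy)
            start, s = min(start, buddy), s * 2
        self.free.setdefault(s, set()).add(start)


machine = BuddyAllocator(16)
job1 = machine.alloc(3)  # a 3-processor job gets a 4-processor block
```

A 3-processor job occupies a 4-processor block, so one processor is lost to internal fragmentation; in exchange, every job sits in a predefined, aligned group of processors.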
Packing Jobs
Packing Jobs
Packing Jobs
Packing Jobs
The Question:
The buddy system leads to internal
fragmentation
But it also improves the chances of
alternative scheduling, because processors
are allocated in predefined groups
Which effect dominates the other?
The Answer (part 1):
Feitelson & Rudolph, JPDC 1996
Many small jobs

Many sequential jobs
Many power-of-two jobs
Practically no jobs use full machine
Conclusion: buddy system should work well
Verification
Feitelson, JSSPP 1996
Example #2
Parallel Job Scheduling and Job Scaling
Variable Partitioning
Each job gets a dedicated partition for the
duration of its execution
Resembles 2D bin packing
Packing large jobs first should lead to better
performance
But what about correlation of size and
runtime?
Scaling Models
Constant work
Parallelism for speedup: Amdahl's Law
Large first → SJF
Constant time
Size and runtime are uncorrelated
Memory bound
Large first → LJF
Full-size jobs lead to blockout
Worley, SIAM JSSC 1990
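The three scaling models can be contrasted with a small Python sketch of job runtime as a function of the number of processors p. The constant-work case uses Amdahl's law; the memory-bound case assumes, purely for illustration, that problem size grows linearly with p, that total work grows as (problem size)^1.5, and that this work is perfectly parallelized. The serial fraction and work exponent are hypothetical parameters, not values from the slides.

```python
def runtime(model: str, p: int, t1: float = 100.0,
            serial_fraction: float = 0.05, work_exponent: float = 1.5) -> float:
    """Runtime on p processors of a job whose 1-processor version takes t1.

    serial_fraction and work_exponent are illustrative assumptions.
    """
    if model == "constant_work":   # Amdahl: fixed problem, parallelism for speedup
        return t1 * (serial_fraction + (1 - serial_fraction) / p)
    if model == "constant_time":   # problem grows so that runtime stays fixed
        return t1
    if model == "memory_bound":    # problem size grows with memory, i.e. with p;
        # assumed work ~ size**work_exponent, spread evenly over p processors
        return t1 * p ** (work_exponent - 1)
    raise ValueError(f"unknown model: {model}")


for model in ("constant_work", "constant_time", "memory_bound"):
    print(model, [round(runtime(model, p), 1) for p in (1, 16, 256)])
```

Under constant work, bigger jobs finish sooner (so scheduling large jobs first approximates SJF); under the memory-bound assumption they run longer (approximating LJF).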
Scan Algorithm
Keep jobs in separate queues according to
size (sizes are powers of 2)
Serve the queues Round Robin, scheduling
all jobs from each queue (they pack
perfectly)
Assuming constant work model, large jobs
only block the machine for a short time
But the memory bound model would lead to
excessive queueing of small jobs
Krueger et al., IEEE TPDS 1994
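A simplified Python sketch of one pass of a Scan-like scheduler, assuming all jobs are already queued at time 0, sizes are powers of two that do not exceed the machine, and the pass visits the size queues in increasing size order. While a queue is served, jobs of that size pack perfectly into processors/size partitions. The function name and the visiting order are assumptions, not details from Krueger et al.

```python
import heapq
from typing import Dict, List, Tuple


def scan_pass(jobs: List[Tuple[str, int, float]], processors: int) -> Dict[str, float]:
    """One pass of a Scan-like scheduler; returns the start time of each job.

    jobs: (job_id, size, runtime), with power-of-two sizes <= processors.
    """
    queues: Dict[int, List[Tuple[str, float]]] = {}
    for job_id, size, runtime in jobs:
        queues.setdefault(size, []).append((job_id, runtime))
    now = 0.0
    starts: Dict[str, float] = {}
    for size in sorted(queues):
        # equal-size jobs pack perfectly: processors // size of them run side by side
        partitions = [now] * (processors // size)  # time at which each partition frees up
        heapq.heapify(partitions)
        end = now
        for job_id, runtime in queues[size]:
            start = heapq.heappop(partitions)
            starts[job_id] = start
            heapq.heappush(partitions, start + runtime)
            end = max(end, start + runtime)
        now = end  # the next queue is served only after this one drains
    return starts


schedule = scan_pass([("a", 1, 10.0), ("b", 1, 10.0), ("c", 4, 5.0), ("d", 2, 3.0)], 4)
```

If the full-machine job c ran for a long time, as the memory-bound model suggests, every small job arriving meanwhile would have to wait for it.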
The Data
Data: SDSC Paragon, 1995/6
Conclusion
Parallelism used for better results, not for
faster results
Constant work model is unrealistic
Memory bound model is reasonable
Scan algorithm will probably not perform
well in practice
Example #3
Backfilling and
User Runtime Estimation
Backfilling
Variable partitioning can suffer from
external fragmentation
Backfilling optimization: move jobs
forward to fill in holes in the schedule
Requires knowledge of expected job
runtimes
Variants
EASY backfilling
Make reservation for first queued job
Conservative backfilling
Make reservation for all queued jobs
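A minimal Python sketch of the EASY backfilling decision at a single scheduling point. It assumes the scheduler knows the estimated end time of each running job. The first queued job gets a reservation at the "shadow time" when enough processors will be free; a later job may start now only if it will finish (by its estimate) before the shadow time, or if it uses only processors that the reserved job will not need. Conservative backfilling would instead keep a reservation for every queued job. The names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str
    size: int        # processors
    estimate: float  # user runtime estimate


def easy_backfill(now: float, total: int, running: List[Tuple[float, int]],
                  queue: List[Job]) -> List[str]:
    """Return the names of queued jobs to start now under EASY backfilling.

    running: (estimated end time, size) of jobs already executing.
    """
    running = list(running)
    queue = list(queue)
    free = total - sum(size for _, size in running)
    started: List[str] = []
    while queue and queue[0].size <= free:      # plain FCFS while jobs fit
        job = queue.pop(0)
        started.append(job.name)
        free -= job.size
        running.append((now + job.estimate, job.size))
    if not queue:
        return started
    head = queue[0]
    if head.size > total:
        raise ValueError("job larger than the machine")
    avail, shadow = free, now
    for end, size in sorted(running):           # reservation for the head job
        if avail >= head.size:
            break
        avail += size
        shadow = end
    extra = avail - head.size                   # processors the head job will not need
    for job in queue[1:]:
        if job.size > free:
            continue
        if now + job.estimate <= shadow:        # ends before the reservation
            started.append(job.name)
            free -= job.size
        elif job.size <= extra:                 # uses only processors the head won't need
            started.append(job.name)
            free -= job.size
            extra -= job.size
    return started


waiting = [Job("H", 8, 10.0), Job("A", 2, 3.0), Job("B", 2, 20.0), Job("C", 2, 4.0)]
print(easy_backfill(0.0, 10, [(5.0, 6)], waiting))
```

With 4 of 10 processors idle, H must wait until time 5; A fits before that, and B is allowed because it only uses the 2 processors H will not need. Note how a lower runtime estimate makes the first test easier to pass.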
User Runtime Estimates

Lower estimates improve chance of
backfilling and better response time
Too low estimates run the risk of having the
job killed
So estimates should be accurate, right?
They Aren't
Mualem & Feitelson, IEEE TPDS 2001
Surprising Consequences
Inaccurate estimates actually lead to
improved performance
Performance evaluation results may depend
on the accuracy of runtime estimates
Example: EASY vs. conservative
Using different workloads
And different metrics
EASY vs. Conservative

Using CTC SP2 workload

Using Jann workload model

Using Feitelson workload model
Conflicting Results Explained
Jann uses accurate runtime estimates

This leads to a tighter schedule
EASY is not affected too much
Conservative manages less backfilling of long
jobs, because it respects more reservations
Conservative is bad for the long jobs
Good for short ones, whose reservations are respected
(Figure panels: Conservative, EASY)
Conflicting Results Explained

Response time is sensitive to long jobs, which
favor EASY
Slowdown is sensitive to short jobs, which
favor conservative
None of this happens at CTC, because
estimates are so loose that backfilling can
occur even under conservative
Verification
Run CTC workload with accurate estimates
But What About My Model?

It simply does not have such small long jobs
Workload Data Sources
No Data
Innovative unprecedented systems
Wireless
Hand-held
Use an educated guess

Self similarity
Heavy tails
Zipf distribution
Serendipitous Data
Data may be collected for various reasons
Accounting logs
Audit logs
Debugging logs
Just-so logs
Can lead to wealth of information
NASA Ames iPSC/860 log

42050 jobs from Oct-Dec 1993
user      job     nodes  runtime  date      time
user4     cmd8    32     70       11/10/93  10:13:17
user4     cmd8    32     70       11/10/93  10:19:30
user42    nqs450  32     3300     11/10/93  10:22:07
user41    cmd342  4      54       11/10/93  10:22:37
sysadmin  pwd     1      6        11/10/93  10:22:42
user4     cmd8    32     60       11/10/93  10:25:42
sysadmin  pwd     1      3        11/10/93  10:30:43
user41    cmd342  4      126      11/10/93  10:31:32
Feitelson & Nitzberg, JSSPP 1995
Distribution of Job Sizes
Distribution of Resource Use
Degree of Multiprogramming
System Utilization
Job Arrivals
Arriving Job Sizes
Distribution of Interarrival Times
Distribution of Runtimes
User Activity
Repeated Execution
Application Moldability
Distribution of Run Lengths
Predictability in Repeated Runs
Recurring Findings
Many small and serial jobs

Many power-of-two jobs
Weak correlation of job size and duration
Job runtimes are bounded but have CV>1
Inaccurate user runtime estimates
Non-stationary arrivals (daily/weekly cycle)
Power-law user activity, run lengths
Instrumentation
Passive: snoop without interfering
Active: modify the system
Collecting the data interferes with system
behavior
Saving or downloading the data causes
additional interference
Partial solution: model the interference
Data Sanitation
Strange things happen
Leaving them in is safe and faithful to
the real data
But it risks letting a non-representative situation dominate the
evaluation results
Arrivals to SDSC SP2
Arrivals to LANL CM-5
Arrivals to CTC SP2
Arrivals to SDSC Paragon

What are they doing at 3:30 AM?
3:30 AM
Nearly every day, a set of 16 jobs is run by
the same user
Most probably the same set, as they
typically have a similar pattern of runtimes
Most probably these are administrative jobs
that are executed automatically
Arrivals to CTC SP2
Arrivals to SDSC SP2
Arrivals to LANL CM-5
Arrivals to SDSC Paragon
Are These Outliers?

These large activity outbreaks are easily
distinguished from normal activity
They last for several days to a few weeks
They appear at intervals of several months
to more than a year
They are each caused by a single user!
Therefore easy to remove
Two Aspects
In workload modeling, should you include
this in the model?
In a general model, probably not
Conduct separate evaluation for special
conditions (e.g. DoS attack)
In evaluations using raw workload data,
there is a danger of bias due to unknown
special circumstances
Automation
The idea:
Cluster daily data in n based on various
workload attributes
Remove days that appear alone in a cluster
Repeat
The problem:
Strange behavior often spans multiple days
Cirne & Berman, Wkshp Workload Charact. 2001
Workload Modeling
Statistical Modeling
Identify attributes of the workload
Create empirical distribution of each
attribute
Fit empirical distribution to create model
Synthetic workload is created by sampling
from the model distributions
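The last two steps (sampling either from the empirical distribution or from a fitted model) can be sketched in Python as follows. The exponential model is only an assumed functional form chosen for brevity, fitted by its maximum-likelihood rate 1/mean; the input values are made up. Function names are hypothetical.

```python
import random
import statistics
from typing import Callable, List


def empirical_sampler(observations: List[float], rng: random.Random) -> Callable[[], float]:
    """Sample from the empirical distribution (inverse of the empirical CDF)."""
    data = sorted(observations)
    return lambda: data[int(rng.random() * len(data))]  # each observation has probability 1/n


def exponential_model_sampler(observations: List[float],
                              rng: random.Random) -> Callable[[], float]:
    """Fit an exponential model (an assumed form) and sample from the fitted model."""
    rate = 1.0 / statistics.fmean(observations)  # maximum likelihood estimate of the rate
    return lambda: rng.expovariate(rate)


rng = random.Random(42)
interarrivals = [12.0, 3.5, 40.0, 7.2, 1.1, 25.0]  # made-up values, for illustration only
model = exponential_model_sampler(interarrivals, rng)
synthetic = [model() for _ in range(1000)]
```

Sampling the empirical distribution can only reproduce values already seen; a fitted model can generalize, but only as well as its assumed shape matches the data.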
Fitting by Moments
Calculate model parameters to fit moments
of empirical data
Problem: does not fit the shape of the
distribution
Jann et al, JSSPP 1997
Problem: very sensitive to extreme data
values
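Fitting by moments can be sketched in a few lines. Jann et al. fitted hyper-Erlang distributions; the sketch below uses a gamma distribution instead, only because its moment equations are short: mean = kθ and CV² = 1/k, so k = 1/CV² and θ = mean·CV². Adding a single extreme value shows how strongly the fitted parameters react.

```python
import statistics
from typing import List, Tuple


def fit_gamma_by_moments(xs: List[float]) -> Tuple[float, float]:
    """Match the mean and CV of a gamma distribution; return (shape k, scale theta)."""
    mean = statistics.fmean(xs)
    cv2 = statistics.pvariance(xs) / mean ** 2
    return 1.0 / cv2, mean * cv2


runtimes = [1.0, 2.0, 3.0, 4.0, 5.0]                     # made-up values
k, theta = fit_gamma_by_moments(runtimes)
k2, theta2 = fit_gamma_by_moments(runtimes + [100.0])   # one extreme value added
print(k, theta, k2, theta2)
```

The single added value changes the fitted shape from 4.5 to under 0.3, i.e. from a bell-like curve to a highly skewed one, even though the other five observations are unchanged.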
Effect of Extreme Runtime Values

Change when top records omitted:
omitted   mean    CV
0.01%     -2.1%   -29%
0.02%     -3.0%   -35%
0.04%     -3.7%   -39%
0.08%     -4.6%   -39%
0.16%     -5.7%   -42%
0.31%     -7.1%   -42%
Downey & Feitelson, PER 1999
Alternative: Fit to Shape

Maximum likelihood: what distribution
parameters were most likely to lead to the
given observations
Needs initial guess of functional form
Phase-type distributions
Construct the desired shape
Goodness of fit
Kolmogorov-Smirnov: difference in CDFs
Anderson-Darling: added emphasis on tail
May need to sample observations
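A Python sketch of fitting to shape: a maximum-likelihood lognormal fit (the functional form is an assumed initial guess) followed by the Kolmogorov-Smirnov distance between the empirical and fitted CDFs. The data are synthetic, and the Anderson-Darling statistic is not implemented here.

```python
import math
import random
import statistics
from typing import Callable, List, Tuple


def fit_lognormal_mle(xs: List[float]) -> Tuple[float, float]:
    """Maximum-likelihood lognormal fit: mean and (1/n) std. dev. of log(x)."""
    logs = [math.log(x) for x in xs]
    return statistics.fmean(logs), statistics.pstdev(logs)


def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(xs: List[float], cdf: Callable[[float], float]) -> float:
    """Kolmogorov-Smirnov distance: largest gap between empirical and model CDFs."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)  # empirical CDF jumps from i/n to (i+1)/n at x
    return d


rng = random.Random(7)
sample = [rng.lognormvariate(1.0, 0.5) for _ in range(2000)]  # synthetic data, not a real log
mu, sigma = fit_lognormal_mle(sample)
d = ks_statistic(sample, lambda x: lognormal_cdf(x, mu, sigma))
```

Because the KS distance is a maximum over the whole CDF, a poor fit in the tail barely affects it; that is the motivation for Anderson-Darling's extra weight on the tails.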
Correlations
Correlation can be measured by the
correlation coefficient
It can be modeled by a joint distribution
function
Both may not be very useful
Correlation Coefficient
$CC = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$
Gives low results for correlation of runtime
and size in parallel systems
system         CC
CTC SP2       -0.029
KTH SP2        0.011
SDSC SP2       0.145
LANL CM-5      0.211
SDSC Paragon   0.305
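The correlation coefficient formula above translates directly into Python. The job data below are made up; the point is that the coefficient reduces a whole joint distribution to one number, which is why it may not be very useful on its own.

```python
import math
from typing import List


def correlation_coefficient(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


sizes = [1, 1, 2, 4, 8, 16, 32, 64]                # made-up jobs
runtimes = [30, 5000, 60, 20, 4000, 10, 300, 90]
print(round(correlation_coefficient(sizes, runtimes), 3))
```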
Distributions
A restricted version
of a joint distribution
Modeling Correlation
Divide range of one attribute into subranges
Create a separate model of other attribute
for each sub-range
Models can be independent, or model
parameters can depend on the sub-range
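A Python sketch of this approach, assuming job size is the attribute whose range is divided (into power-of-two sub-ranges) and that each sub-range gets its own independent empirical runtime distribution. The sub-range boundaries and the names are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def size_class(size: int) -> int:
    """Sub-range index floor(log2(size)): 1 | 2-3 | 4-7 | 8-15 | ..."""
    return size.bit_length() - 1


def build_conditional_model(jobs: List[Tuple[int, float]]) -> Dict[int, List[float]]:
    """Separate empirical runtime distribution for each job-size sub-range."""
    by_class: Dict[int, List[float]] = defaultdict(list)
    for size, runtime in jobs:
        by_class[size_class(size)].append(runtime)
    return dict(by_class)


def sample_runtime(model: Dict[int, List[float]], size: int, rng: random.Random) -> float:
    # raises KeyError if the log had no jobs in this sub-range
    return rng.choice(model[size_class(size)])


log = [(1, 10.0), (1, 12.0), (4, 100.0), (5, 120.0), (32, 1000.0)]  # made-up (size, runtime)
model = build_conditional_model(log)
```

Making the parameters depend on the sub-range instead would mean, for example, fitting one parametric distribution per sub-range and interpolating its parameters between sub-ranges.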
Stationarity
Problem of daily/weekly activity cycle
Not important if unit of activity is very small
(network packet)
Very meaningful if unit of work is long
(parallel job)
How to Modify the Load

Multiply interarrivals or runtimes by a
factor
Changes the effective length of the day
Multiply machine size by a factor

Modifies packing properties
Add users
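The first option can be sketched in Python. Offered load is taken here as requested processor-time divided by available processor-time, approximating the trace length by the sum of interarrival times (a simplifying assumption). Multiplying all interarrivals by a factor then divides the load by that factor, but it also compresses or stretches the daily cycle.

```python
from typing import List, Tuple

TraceJob = Tuple[float, int, float]  # (interarrival time, size in processors, runtime)


def offered_load(jobs: List[TraceJob], processors: int) -> float:
    """Requested processor-time over available processor-time (approximate span)."""
    span = sum(ia for ia, _, _ in jobs)
    work = sum(size * runtime for _, size, runtime in jobs)
    return work / (processors * span)


def scale_interarrivals(jobs: List[TraceJob], factor: float) -> List[TraceJob]:
    """Multiplying interarrivals by `factor` divides the offered load by `factor`."""
    return [(ia * factor, size, runtime) for ia, size, runtime in jobs]


trace = [(10.0, 2, 5.0), (10.0, 4, 5.0)]  # made-up trace
print(offered_load(trace, 4), offered_load(scale_interarrivals(trace, 0.5), 4))
```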
Stationarity
Problem of new/old system

Immature workload
Leftover workload
Heavy Tails
Tail Types
When a distribution has mean m, what is the
expected value of samples that are larger than x?
Light: expected to be smaller than x+m
Memoryless: expected to be x+m
Heavy: expected to be larger than x+m
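The three tail types can be illustrated with closed-form conditional means E[X | X > x] for three example distributions, all chosen here (as an illustration) to have mean m = 10: a uniform on [0, 20] (light), an exponential (memoryless), and a Pareto with shape a = 2 and scale 5 (heavy). For the Pareto, E[X | X > x] = a·x/(a-1), which exceeds x + m once x > a·xm.

```python
def cond_mean_uniform(x: float, b: float) -> float:
    """E[X | X > x] for X ~ Uniform(0, b), 0 <= x < b (light tail); mean m = b/2."""
    return (x + b) / 2


def cond_mean_exponential(x: float, m: float) -> float:
    """E[X | X > x] for an exponential with mean m: always x + m (memoryless)."""
    return x + m


def cond_mean_pareto(x: float, a: float, xm: float) -> float:
    """E[X | X > x] for a Pareto with shape a > 1 and scale xm, for x >= xm (heavy tail).

    The mean is m = a*xm/(a-1); the conditional mean exceeds x + m once x > a*xm.
    """
    return a * x / (a - 1)


x, m = 15.0, 10.0
print(cond_mean_uniform(x, 20.0), cond_mean_exponential(x, m), cond_mean_pareto(x, 2.0, 5.0))
```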
Formal Definition
Tail decays according to a power law
$\bar{F}(x) = \Pr[X > x] \sim x^{-a}, \quad 0 < a < 2$
Test: log-log complementary distribution
$\log \bar{F}(x) \approx -a \log x$
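The LLCD test can be sketched in Python: compute the points (log x, log of the empirical Pr[X > x]) and estimate a as minus the least-squares slope of the top tail. Using the top 10% of points is an arbitrary assumption; on real data the choice of where the tail starts matters a lot. The sample here is synthetic.

```python
import math
import random
from typing import List, Tuple


def llcd(xs: List[float]) -> List[Tuple[float, float]]:
    """Points (log10 x, log10 Pr[X > x]) of the empirical complementary CDF."""
    xs = sorted(xs)
    n = len(xs)
    # n - i - 1 samples are strictly larger than the i-th smallest (no ties assumed);
    # the largest sample is skipped because its empirical tail probability is 0
    return [(math.log10(x), math.log10((n - i - 1) / n)) for i, x in enumerate(xs[:-1])]


def tail_index(points: List[Tuple[float, float]], tail_fraction: float = 0.1) -> float:
    """Estimate a as minus the least-squares slope of the top tail of the LLCD."""
    tail = points[-max(2, int(len(points) * tail_fraction)):]
    k = len(tail)
    mx = sum(p[0] for p in tail) / k
    my = sum(p[1] for p in tail) / k
    slope = (sum((x - mx) * (y - my) for x, y in tail)
             / sum((x - mx) ** 2 for x, _ in tail))
    return -slope


rng = random.Random(3)
sample = [rng.paretovariate(1.5) for _ in range(20000)]  # synthetic Pareto, a = 1.5
a_hat = tail_index(llcd(sample))
```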
Consequences
Large deviations from the mean are realistic
Mass disparity
small fraction of samples responsible for large
part of total mass
Most samples together account for negligible
part of mass
Crovella, JSSPP 2001
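Mass disparity can be quantified as the share of the total mass contributed by the largest 1% of samples. This sketch compares synthetic Pareto samples (shape 1.2, an arbitrary heavy-tailed choice) with exponential samples; both data sets are generated, not measured.

```python
import random
from typing import List


def top_share(xs: List[float], fraction: float) -> float:
    """Share of the total mass contributed by the largest `fraction` of the samples."""
    xs = sorted(xs, reverse=True)
    k = max(1, int(len(xs) * fraction))
    return sum(xs[:k]) / sum(xs)


rng = random.Random(11)
heavy = [rng.paretovariate(1.2) for _ in range(10000)]  # synthetic, heavy-tailed
light = [rng.expovariate(1.0) for _ in range(10000)]    # synthetic, exponential
print(top_share(heavy, 0.01), top_share(light, 0.01))
```

For the exponential, the top 1% holds only a few percent of the mass; for the heavy-tailed sample it holds a large fraction, so the remaining 99% together matter much less.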
Unix File Sizes Survey, 1993
Unix File Sizes LLCD
Consequences
Infinite moments
For $a \le 1$ the mean is undefined (infinite)
For $a \le 2$ the variance is undefined (infinite)
Crovella, JSSPP 2001
Pareto Distribution
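With parameter $a = 1$ the density is proportional to $x^{-2}$
The expectation is then
$E[x] = \int_1^{\infty} x \cdot c\,x^{-2}\,dx = c \ln x \,\Big|_1^{\infty}$
i.e. the integral diverges, so the sample mean keeps growing with the number of samples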
Pareto Samples
Effect of Samples from Tail

In simulation:
A single sample may dominate results
Example: response times of processes
In analysis:
Average long-term behavior may never happen
in practice
Real Life
Data samples are necessarily bounded
The question is how to generalize to the
model distribution
Arbitrary truncation
Lognormal or phase-type distributions
Something in between
Solution 1: Truncation
Postulate an upper bound on the distribution

Question: where to put the upper bound
Probably OK for qualitative analysis
May be problematic for quantitative
simulations
Solution 2: Model the Sample

Approximate the empirical distribution
using a mixture of exponentials (e.g. phase-type distributions)
In particular, exponential decay beyond
highest sample
In some cases, a lognormal distribution
provides a good fit
Good for mathematical analysis
Solution 3: Dynamic
Place an upper bound on the distribution
Location of bound depends on total number
of samples required
Example: $B = F^{-1}\left(1 - \frac{1}{2N}\right)$, where $N$ is the total number of samples
Note: does not change during simulation
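For a Pareto distribution with $F(x) = 1 - (x_m/x)^a$, the example bound has the closed form $B = x_m (2N)^{1/a}$. The Python sketch below computes it once, before the simulation, and then samples by inverse transform from the Pareto truncated at $B$; the same sampler works for any fixed bound, as in the truncation approach. Function names and parameter values are illustrative assumptions.

```python
import random


def dynamic_bound(n_samples: int, a: float, xm: float = 1.0) -> float:
    """B = F^{-1}(1 - 1/(2N)) for a Pareto with F(x) = 1 - (xm/x)**a."""
    return xm * (2 * n_samples) ** (1.0 / a)


def bounded_pareto(rng: random.Random, a: float, bound: float, xm: float = 1.0) -> float:
    """Inverse-transform sample from the Pareto distribution truncated at `bound`."""
    mass_below = 1.0 - (xm / bound) ** a  # F(bound)
    u = rng.random()                      # in [0, 1)
    return xm / (1.0 - u * mass_below) ** (1.0 / a)


N = 500
B = dynamic_bound(N, a=1.0)  # fixed before the simulation starts
rng = random.Random(5)
samples = [bounded_pareto(rng, 1.0, B) for _ in range(N)]
```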
Self Similarity
The Phenomenon
The whole has the same structure as certain
parts
Example: fractals
In workloads: burstiness at many different
time scales
Note: relates to a time series
Job Arrivals to SDSC Paragon
Process Arrivals to SDSC Paragon
Long-Range Correlation
A burst of activity implies that values in the
time series are correlated
A burst covering a large time frame implies
correlation over a long range
This is contrary to assumptions about the
independence of samples
Aggregation
Replace each subsequence of m consecutive
values by their mean
If self-similar, the new series will have
statistical properties that are similar to the
original (i.e. bursty)
If independent, will tend to average out
Poisson Arrivals
Tests
Essentially based on the burstiness-retaining
nature of aggregation
Rescaled range (R/s) metric: the range of the cumulative
sum of deviations from the mean of n samples, divided by their
standard deviation, as a function of n
R/s Metric
Tests
Variance-time metric: the variance of an
aggregated time series as a function of the
aggregation level
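A Python sketch of the variance-time test: aggregate the series at levels m = 1, 2, 4, ..., 64, fit a line to log Var versus log m, and convert the slope to a Hurst parameter. For a self-similar series Var(X^(m)) ~ m^(2H-2), so H = 1 + slope/2; independent samples average out (slope -1, H = 0.5). The chosen levels are an assumption, and the test series here is synthetic.

```python
import math
import random
import statistics
from typing import List, Sequence


def aggregate(series: Sequence[float], m: int) -> List[float]:
    """Replace each block of m consecutive values by their mean (drops an incomplete tail)."""
    return [statistics.fmean(series[i:i + m]) for i in range(0, len(series) - m + 1, m)]


def hurst_variance_time(series: Sequence[float],
                        levels: Sequence[int] = (1, 2, 4, 8, 16, 32, 64)) -> float:
    """Estimate H from the least-squares slope of log Var(X^(m)) versus log m."""
    xs = [math.log10(m) for m in levels]
    ys = [math.log10(statistics.pvariance(aggregate(series, m))) for m in levels]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return 1.0 + slope / 2.0


rng = random.Random(9)
independent = [rng.random() for _ in range(2 ** 16)]
print(round(hurst_variance_time(independent), 2))
```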
Variance Time Metric
Modeling Self Similarity

Generate workload by an on-off process
During on period, generate work at steady pace
During off period do nothing
On and off period lengths are heavy tailed

Multiplex many such sources
Leads to long-range correlation
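The on-off construction can be sketched in Python: each source alternates between on and off periods whose lengths are drawn from a Pareto distribution (shape 1.5 is an arbitrary heavy-tailed choice), emitting one unit of work per time step while on, and the load is the sum over many independent sources. The discrete time steps and all names are assumptions of this sketch.

```python
import random
from typing import List


def on_off_source(rng: random.Random, length: int, a: float = 1.5) -> List[int]:
    """One source: alternating on/off periods with Pareto (heavy-tailed) lengths.

    Emits one unit of work per time step while on, nothing while off.
    """
    out: List[int] = []
    on = rng.random() < 0.5
    while len(out) < length:
        period = min(int(rng.paretovariate(a)), length - len(out))  # at least 1 step
        out.extend([1 if on else 0] * period)
        on = not on
    return out


def multiplexed_load(rng: random.Random, sources: int, length: int) -> List[int]:
    """Total work per time step from many independent on/off sources."""
    total = [0] * length
    for _ in range(sources):
        for t, w in enumerate(on_off_source(rng, length)):
            total[t] += w
    return total


rng = random.Random(13)
load = multiplexed_load(rng, sources=50, length=10000)
```

The resulting series can then be checked with an aggregation-based test such as the variance-time metric.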
Research Areas
Effect of Users
Workload is generated by users
Human users do not behave like a random
sampling process
Feedback based on system performance
Repetitive working patterns
Feedback
User population is finite
Users back off when performance is
inadequate
Negative feedback
Better system stability
Need to explicitly model this behavior
Locality of Sampling
Users display different levels of activity at
different times
At any given time, only a small subset of
users is active
Active Users
These users repeatedly do the same thing
Workload observed by system is not a
random sample from long-term distribution
SDSC Paragon Data
Growing Variability
The questions:
How does this affect the results of
performance evaluation?
Can this be exploited by the system, e.g. by
a scheduler?
Hierarchical Workload Models

Model of user population
Modify load by adding/deleting users
Model of a single user's activity

Built-in self similarity using heavy-tailed on/off
times
Model of application behavior and internal structure
Capture interaction with system attributes
A Small Problem
We don't have data for these models
Especially for user behavior such as
feedback
Need interaction with cognitive scientists
And for distribution of application types and their parameters
Need detailed instrumentation
Final Words
We like to think that we design systems based on solid foundations
But beware: the foundations might be baseless assumptions!
Computer Systems are Complex

We should have more science in computer
science:
Collect data rather than make assumptions
Run experiments under different conditions
Make measurements and observations
Make predictions and verify them
Share data and programs to promote good
practices and ensure comparability
Advice from the Experts

Science is built of facts as a house is built of
stones. But a collection of facts is no more a
science than a heap of stones is a house
-- Henri Poincaré
Everything should be made as simple as
possible, but not simpler
-- Albert Einstein
Acknowledgements
Students: Ahuva Mualem, David Talby,
Uri Lublin
Larry Rudolph / MIT
Data in Parallel Workloads Archive
Joefon Jann / IBM

Allen Downey / Wellesley
CTC SP2 log / Steven Hotovy
SDSC Paragon log / Reagan Moore
SDSC SP2 log / Victor Hazelwood
LANL CM-5 log / Curt Canada
NASA iPSC/860 log / Bill Nitzberg
