
Workload Modeling and its Effect on Performance Evaluation
Dror Feitelson
Hebrew University

Performance Evaluation
In system design
Selection of algorithms
Setting parameter values

In procurement decisions
Value for money
Meet usage goals

For capacity planning

The Good Old Days


The skies were blue
The simulation results were conclusive
Our scheme was better than theirs
Feitelson & Jette, JSSPP 1997

But in their papers,


Their scheme was better than ours!

How could they be so wrong?

Performance evaluation depends on:


The system's design
(What we teach in algorithms and data structures)

Its implementation
(What we teach in programming courses)

The workload to which it is subjected


The metric used in the evaluation
Interactions between these factors

Outline for Today


Three examples of how workloads affect
performance evaluation
Workload modeling
Getting data
Fitting, correlations, stationarity
Heavy tails, self similarity

Research agenda
In the context of parallel job scheduling

Example #1

Gang Scheduling and Job Size Distribution

Gang What?!?
Time slicing parallel jobs with coordinated
context switching
Ousterhout
matrix
Optimization:
Alternative
scheduling
Ousterhout, ICDCS 1982

Packing Jobs
Use a buddy system for allocating processors

Feitelson & Rudolph, Computer 1990

The Question:
The buddy system leads to internal
fragmentation
But it also improves the chances of
alternative scheduling, because processors
are allocated in predefined groups
Which effect dominates the other?
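The internal fragmentation side of this trade-off can be made concrete with a minimal sketch. It assumes the simplest form of a buddy system, where each job receives the smallest power-of-two block that fits its request; the function names are hypothetical and the job sizes are made up for illustration.

```python
def buddy_block_size(requested: int) -> int:
    """Smallest power of two that is >= requested (the block a buddy system hands out)."""
    if requested < 1:
        raise ValueError("a job needs at least one processor")
    return 1 << (requested - 1).bit_length()


def internal_fragmentation(job_sizes):
    """Fraction of the allocated processors that the jobs did not ask for."""
    allocated = sum(buddy_block_size(size) for size in job_sizes)
    used = sum(job_sizes)
    return (allocated - used) / allocated


print(buddy_block_size(5))                    # 8
print(internal_fragmentation([1, 2, 4, 32]))  # 0.0, power-of-two jobs waste nothing
print(internal_fragmentation([3, 5, 17]))     # non-power-of-two jobs leave processors idle
```

Under this assumption the waste depends entirely on the job size distribution, which is why the answer has to come from workload data.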

The Answer (part 1):

Feitelson & Rudolph, JPDC 1996

The Answer (part 2):

Many small jobs


Many sequential jobs
Many power of two jobs
Practically no jobs use full machine

Conclusion: buddy system should work well

Verification

Feitelson, JSSPP 1996

Example #2

Parallel Job Scheduling and Job Scaling

Variable Partitioning
Each job gets a dedicated partition for the
duration of its execution
Resembles 2D bin packing
Packing large jobs first should lead to better
performance
But what about correlation of size and
runtime?

Scaling Models
Constant work
Parallelism for speedup: Amdahl's Law
Large first is equivalent to SJF (larger jobs finish sooner)

Constant time
Size and runtime are uncorrelated

Memory bound
Large first is equivalent to LJF (larger jobs run longer)
Full-size jobs lead to blockout
Worley, SIAM JSSC 1990

Scan Algorithm
Keep jobs in separate queues according to
size (sizes are powers of 2)
Serve the queues Round Robin, scheduling
all jobs from each queue (they pack
perfectly)
Assuming constant work model, large jobs
only block the machine for a short time
But the memory bound model would lead to
excessive queueing of small jobs
Krueger et al., IEEE TPDS 1994

The Data

Data: SDSC Paragon, 1995/6

Conclusion
Parallelism used for better results, not for
faster results
Constant work model is unrealistic
Memory bound model is reasonable
Scan algorithm will probably not perform
well in practice

Example #3

Backfilling and
User Runtime Estimation

Backfilling
Variable partitioning can suffer from
external fragmentation
Backfilling optimization: move jobs
forward to fill in holes in the schedule
Requires knowledge of expected job
runtimes

Variants
EASY backfilling
Make reservation for first queued job
Conservative backfilling
Make reservation for all queued jobs
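The EASY rule can be sketched as a single admission test. This is a minimal illustration under stated assumptions, not the code of any production scheduler: the `Job` class, the function name, and the representation of running jobs as (estimated end time, size) pairs are all hypothetical. A candidate may jump ahead if it does not delay the reservation of the first queued job.

```python
from dataclasses import dataclass


@dataclass
class Job:
    size: int        # processors requested
    estimate: float  # user runtime estimate


def easy_can_backfill(candidate, head, free, running, now=0.0):
    """Return True if `candidate` may start now without delaying `head`.

    `head` is the first queued job (assumed not to fit in `free` right now);
    `running` is a list of (estimated_end_time, size) pairs.
    """
    if candidate.size > free:
        return False
    # Shadow time: the earliest time at which enough processors
    # will have been released for the head job to start.
    available = free
    shadow = None
    for end, size in sorted(running):
        available += size
        if available >= head.size:
            shadow = end
            break
    if shadow is None:
        raise ValueError("head job is larger than the machine")
    extra = available - head.size  # processors the head job will leave unused
    # Safe if the candidate ends before the reservation,
    # or if it only occupies processors the head job does not need.
    return now + candidate.estimate <= shadow or candidate.size <= extra


running = [(100.0, 6), (50.0, 2)]   # 8 busy processors on a 10-processor machine
head = Job(size=10, estimate=200.0)
print(easy_can_backfill(Job(2, 100.0), head, free=2, running=running))  # True
print(easy_can_backfill(Job(2, 500.0), head, free=2, running=running))  # False
```

Conservative backfilling would repeat the same kind of check against the reservations of all queued jobs rather than only the first.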

User Runtime Estimates


Lower estimates improve chance of
backfilling and better response time
Too low estimates run the risk of having the
job killed
So estimates should be accurate, right?

They Aren't

Mualem & Feitelson, IEEE TPDS 2001

Surprising Consequences
Inaccurate estimates actually lead to
improved performance
Performance evaluation results may depend
on the accuracy of runtime estimates
Example: EASY vs. conservative
Using different workloads
And different metrics

EASY vs. Conservative


Using CTC SP2 workload

EASY vs. Conservative


Using Jann workload model

EASY vs. Conservative


Using Feitelson workload model

Conflicting Results Explained

Jann uses accurate runtime estimates


This leads to a tighter schedule
EASY is not affected too much
Conservative manages less backfilling of long
jobs, because it respects more reservations

Conservative is bad for the long jobs


Good for short ones that are respected
Conservative

EASY

Conflicting Results Explained


Response time sensitive to long jobs, which
favor EASY
Slowdown sensitive to short jobs, which
favor conservative
All this does not happen at CTC, because
estimates are so loose that backfill can
occur even under conservative

Verification
Run CTC workload with accurate estimates

But What About My Model?


Simply does not
have such small
long jobs

Workload Data Sources

No Data
Innovative unprecedented systems
Wireless
Hand-held

Use an educated guess


Self similarity
Heavy tails
Zipf distribution

Serendipitous Data
Data may be collected for various reasons

Accounting logs
Audit logs
Debugging logs
Just-so logs

Can lead to wealth of information

NASA Ames iPSC/860 log


42050 jobs from Oct-Dec 1993

user      job     nodes  runtime  date      time
user4     cmd8       32       70  11/10/93  10:13:17
user4     cmd8       32       70  11/10/93  10:19:30
user42    nqs450     32     3300  11/10/93  10:22:07
user41    cmd342      4       54  11/10/93  10:22:37
sysadmin  pwd         1        6  11/10/93  10:22:42
user4     cmd8       32       60  11/10/93  10:25:42
sysadmin  pwd         1        3  11/10/93  10:30:43
user41    cmd342      4      126  11/10/93  10:31:32

Feitelson & Nitzberg, JSSPP 1995

Distribution of Job Sizes

Distribution of Resource Use

Degree of Multiprogramming

System Utilization

Job Arrivals

Arriving Job Sizes

Distribution of Interarrival Times

Distribution of Runtimes

User Activity

Repeated Execution

Application Moldability

Distribution of Run Lengths

Predictability in Repeated Runs

Recurring Findings

Many small and serial jobs


Many power-of-two jobs
Weak correlation of job size and duration
Job runtimes are bounded but have CV>1
Inaccurate user runtime estimates
Non-stationary arrivals (daily/weekly cycle)
Power-law user activity, run lengths

Instrumentation
Passive: snoop without interfering
Active: modify the system
Collecting the data interferes with system
behavior
Saving or downloading the data causes
additional interference
Partial solution: model the interference

Data Sanitation
Strange things happen
Leaving them in is safe and faithful to
the real data
But it risks letting a non-representative event dominate the
evaluation results

Arrivals to SDSC SP2

Arrivals to LANL CM-5

Arrivals to CTC SP2

Arrivals to SDSC Paragon


What are they
doing at 3:30
AM?

3:30 AM
Nearly every day, a set of 16 jobs is run by
the same user
Most probably the same set, as they
typically have a similar pattern of runtimes
Most probably these are administrative jobs
that are executed automatically

Arrivals to CTC SP2

Arrivals to SDSC SP2

Arrivals to LANL CM-5

Arrivals to SDSC Paragon

Are These Outliers?


These large activity outbreaks are easily
distinguished from normal activity
They last for several days to a few weeks
They appear at intervals of several months
to more than a year
They are each caused by a single user!
Therefore easy to remove

Two Aspects
In workload modeling, should you include
this in the model?
In a general model, probably not
Conduct separate evaluation for special
conditions (e.g. DoS attack)

In evaluations using raw workload data,


there is a danger of bias due to unknown
special circumstances

Automation
The idea:
Cluster daily data into n clusters based on various
workload attributes
Remove days that appear alone in a cluster
Repeat

The problem:
Strange behavior often spans multiple days
Cirne & Berman, Wkshp Workload Charact. 2001

Workload Modeling

Statistical Modeling
Identify attributes of the workload
Create empirical distribution of each
attribute
Fit empirical distribution to create model
Synthetic workload is created by sampling
from the model distributions

Fitting by Moments
Calculate model parameters to fit moments
of empirical data
Problem: does not fit the shape of the
distribution
Problem: very sensitive to extreme data
values

Jann et al., JSSPP 1997

Effect of Extreme Runtime Values


Change when top records omitted

omit    mean    CV
0.01%   -2.1%   -29%
0.02%   -3.0%   -35%
0.04%   -3.7%   -39%
0.08%   -4.6%   -39%
0.16%   -5.7%   -42%
0.31%   -7.1%   -42%

Downey & Feitelson, PER 1999
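The kind of measurement behind this table can be sketched as follows. This is a minimal illustration, assuming the population standard deviation is used for the CV; the function names are hypothetical and the runtimes are synthetic, not taken from any log.

```python
from statistics import mean, pstdev


def cv(values):
    """Coefficient of variation: standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def change_when_top_omitted(values, fraction):
    """Relative change in mean and CV after dropping the largest `fraction` of records.

    At least one record is always dropped (an assumption of this sketch).
    """
    ordered = sorted(values)
    drop = max(1, round(len(ordered) * fraction))
    kept = ordered[:-drop]
    return mean(kept) / mean(ordered) - 1, cv(kept) / cv(ordered) - 1


# Synthetic runtimes in seconds: many short jobs and a single very long one.
runtimes = [10] * 500 + [100] * 499 + [100000]
d_mean, d_cv = change_when_top_omitted(runtimes, 0.001)
print(f"mean changes by {d_mean:.1%}, CV changes by {d_cv:.1%}")
```

Even one omitted record moves both statistics, and the CV moves more than the mean, which is the sensitivity that makes moment fitting fragile.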

Alternative: Fit to Shape


Maximum likelihood: what distribution
parameters were most likely to lead to the
given observations
Needs initial guess of functional form

Phase type distributions


Construct the desired shape

Goodness of fit
Kolmogorov-Smirnov: difference in CDFs
Anderson-Darling: added emphasis on tail
May need to sample observations
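A minimal sketch of fitting to shape, assuming the initial guess of functional form is an exponential distribution (chosen here only because its maximum likelihood estimate has a closed form). The function names and the sample values are hypothetical.

```python
import math


def fit_exponential_mle(samples):
    """Maximum likelihood estimate of the exponential rate: 1 / sample mean."""
    return len(samples) / sum(samples)


def ks_distance(samples, cdf):
    """Kolmogorov-Smirnov statistic: largest gap between empirical and model CDFs."""
    xs = sorted(samples)
    n = len(xs)
    distance = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        distance = max(distance, (i + 1) / n - f, f - i / n)
    return distance


samples = [1.0, 2.0, 3.0, 4.0, 10.0]  # made-up observations
rate = fit_exponential_mle(samples)
d = ks_distance(samples, lambda x: 1.0 - math.exp(-rate * x))
print(f"rate = {rate}, KS distance = {d:.3f}")
```

A large distance suggests the guessed functional form is wrong, not only its parameters.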

Correlations
Correlation can be measured by the
correlation coefficient
It can be modeled by a joint distribution
function
Both may not be very useful

Correlation Coefficient
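$$\mathrm{CC} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2} \, \sqrt{\sum_i (y_i - \bar{y})^2}}$$

Gives low results for correlation of runtime
and size in parallel systems

system        CC
CTC SP2       -0.029
KTH SP2        0.011
SDSC SP2       0.145
LANL CM-5      0.211
SDSC Paragon   0.305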
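The coefficient can be computed directly from paired observations. The sketch below assumes the standard (Pearson) definition; the function name and the data are made up. It also shows why a low value is not very informative: the second data set has a clear dependence between the two attributes, yet the coefficient is zero because the relation is not linear.

```python
import math


def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Runtime grows linearly with size: coefficient close to 1.
print(correlation_coefficient([1, 2, 4, 8, 16], [10, 20, 40, 80, 160]))

# Runtime depends on size, but not linearly.
print(correlation_coefficient([1, 2, 3, 4, 5], [4, 1, 0, 1, 4]))  # 0.0
```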

Distributions

A restricted version
of a joint distribution

Modeling Correlation
Divide range of one attribute into subranges
Create a separate model of other attribute
for each sub-range
Models can be independent, or model
parameter can depend on sub-range
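The sub-range approach can be sketched as follows, assuming job size is the attribute that is divided and runtime is the attribute that is modeled. For brevity the per-range "model" here is just the empirical set of observed runtimes, resampled at random; a fitted distribution per range would slot into the same structure. All names and numbers are hypothetical.

```python
import random
from bisect import bisect_left


def build_conditional_model(sizes, runtimes, boundaries):
    """Group observed runtimes by size sub-range.

    `boundaries` are inclusive upper limits of all sub-ranges but the last.
    """
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for size, runtime in zip(sizes, runtimes):
        buckets[bisect_left(boundaries, size)].append(runtime)
    return buckets


def sample_runtime(model, boundaries, size, rng=random):
    """Draw a runtime for a job of the given size from its sub-range's model."""
    bucket = model[bisect_left(boundaries, size)]
    return rng.choice(bucket)  # raises IndexError if the sub-range had no data


sizes = [1, 1, 4, 8, 64, 128]              # made-up observations
runtimes = [5, 7, 100, 120, 3000, 4000]
boundaries = [1, 8]                        # sub-ranges: 1, 2..8, more than 8
model = build_conditional_model(sizes, runtimes, boundaries)
print(model)                               # [[5, 7], [100, 120], [3000, 4000]]
print(sample_runtime(model, boundaries, size=4))
```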

Stationarity
Problem of daily/weekly activity cycle
Not important if unit of activity is very small
(network packet)
Very meaningful if unit of work is long
(parallel job)

How to Modify the Load


Multiply interarrivals or runtimes by a
factor
Changes the effective length of the day

Multiply machine size by a factor


Modifies packing properties

Add users
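The side effect of the first option can be seen in a minimal sketch. It assumes arrivals are given as a sorted list of timestamps in seconds; the function name is hypothetical. Halving all interarrival times doubles the load, but jobs that arrived once a day now arrive every 12 hours, so the daily cycle is compressed as well.

```python
def scale_interarrivals(arrival_times, factor):
    """Multiply the gaps between successive arrivals by `factor` (< 1 raises the load)."""
    if not arrival_times:
        return []
    scaled = [arrival_times[0]]
    for previous, current in zip(arrival_times, arrival_times[1:]):
        scaled.append(scaled[-1] + (current - previous) * factor)
    return scaled


DAY = 24 * 60 * 60
daily_arrivals = [0, DAY, 2 * DAY]               # one job at the same hour each day
print(scale_interarrivals(daily_arrivals, 0.5))  # [0, 43200.0, 86400.0]
```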

Stationarity
Problem of new/old system


Immature workload
Leftover workload

Heavy Tails

Tail Types
When a distribution has mean m, what is the
distribution of samples that are larger than x?
Light: expected to be smaller than x+m
Memoryless: expected to be x+m
Heavy: expected to be larger than x+m

Formal Definition
Tail decays according to a power law

$$\bar{F}(x) = \Pr(X > x) \propto x^{-a}, \qquad 0 < a \le 2$$

Test: log-log complementary distribution

$$\log \bar{F}(x) = -a \log x + \text{const}$$
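The log-log complementary distribution test can be sketched as follows: compute the empirical survival function, take logs of both axes, and estimate the slope of the tail with a least-squares line. The function names and the choice to fit only the top 10% of the points are assumptions of this sketch, and the samples are synthetic Pareto variates.

```python
import math
import random


def llcd_points(samples):
    """(log10 x, log10 Pr(X > x)) for every sample except the largest."""
    xs = sorted(samples)
    n = len(xs)
    return [(math.log10(x), math.log10((n - i - 1) / n))
            for i, x in enumerate(xs[:-1])]


def tail_index(samples, tail_fraction=0.1):
    """Estimate a as minus the least-squares slope of the LLCD tail."""
    points = llcd_points(samples)
    k = max(2, int(len(points) * tail_fraction))
    tail = points[-k:]
    mean_x = sum(x for x, _ in tail) / k
    mean_y = sum(y for _, y in tail) / k
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in tail)
    denominator = sum((x - mean_x) ** 2 for x, _ in tail)
    return -numerator / denominator


rng = random.Random(42)
samples = [rng.paretovariate(1.5) for _ in range(10000)]
print(f"estimated tail index: {tail_index(samples):.2f}")
```

A straight line over several orders of magnitude on this plot is the visual signature of a power-law tail; the estimate from random samples is noisy, especially at the extreme end.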

Consequences
Large deviations from the mean are realistic
Mass disparity
small fraction of samples responsible for large
part of total mass
Most samples together account for negligible
part of mass

Crovella, JSSPP 2001

Unix File Sizes Survey, 1993

Unix File Sizes LLCD

Consequences
Infinite moments
For a ≤ 1 the mean is undefined
For a ≤ 2 the variance is undefined
Crovella, JSSPP 2001

Pareto Distribution
With parameter a = 1 the density is
proportional to $x^{-2}$
The expectation is then

$$E[x] = \int c \, x \, \frac{1}{x^2} \, dx = c \ln x$$

i.e. it grows with the number of samples
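The growth can be made explicit with a short worked derivation. It assumes the minimal value is 1, so that the density is exactly $x^{-2}$ on $x \ge 1$ (that is, c = 1), and it introduces a cutoff B that is not part of the slide. The mean of the distribution truncated at B is

```latex
\int_1^B x \cdot x^{-2} \, dx = \ln B \;\longrightarrow\; \infty \quad \text{as } B \to \infty
```

Since $\Pr(X > N) = 1/N$ for this distribution, the largest of N samples is typically of order N, so the average of N samples grows roughly like ln N instead of converging.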

Pareto Samples

Effect of Samples from Tail


In simulation:
A single sample may dominate results
Example: response times of processes

In analysis:
Average long-term behavior may never happen
in practice

Real Life
Data samples are necessarily bounded
The question is how to generalize to the
model distribution
Arbitrary truncation
Lognormal or phase-type distributions
Something in between

Solution 1: Truncation

Postulate an upper bound on the distribution


Question: where to put the upper bound
Probably OK for qualitative analysis
May be problematic for quantitative
simulations

Solution 2: Model the Sample


Approximate the empirical distribution
using a mixture of exponentials (e.g. phase-type distributions)
In particular, exponential decay beyond
highest sample
In some cases, a lognormal distribution
provides a good fit
Good for mathematical analysis

Solution 3: Dynamic
Place an upper bound on the distribution
Location of bound depends on total number
of samples required
Example:

$$B = F^{-1}\left(1 - \frac{1}{2N}\right)$$

Note: does not change during simulation
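A minimal sketch of a dynamic bound, assuming a Pareto distribution with minimal value k and assuming the bound is placed at the point that is exceeded with probability 1/(2N), where N is the number of samples required. The function names are hypothetical. The bound is computed once, before sampling starts.

```python
import random


def pareto_inverse_cdf(p, a, k=1.0):
    """Inverse of the Pareto CDF F(x) = 1 - (k/x)**a, defined for x >= k."""
    return k * (1.0 - p) ** (-1.0 / a)


def dynamic_bound(n_samples, a, k=1.0):
    """Upper bound B with Pr(X > B) = 1 / (2 * n_samples)."""
    return pareto_inverse_cdf(1.0 - 1.0 / (2 * n_samples), a, k)


def sample_truncated(rng, n_samples, a, k=1.0):
    """Draw n_samples values from the Pareto distribution truncated at the bound."""
    bound = dynamic_bound(n_samples, a, k)
    p_max = 1.0 - 1.0 / (2 * n_samples)          # F(bound)
    return bound, [pareto_inverse_cdf(rng.random() * p_max, a, k)
                   for _ in range(n_samples)]


bound, values = sample_truncated(random.Random(7), n_samples=1000, a=1.0)
print(f"bound = {bound:.1f}, largest sample = {max(values):.1f}")
```

With more samples the bound moves further out, so longer simulations are allowed to see rarer, larger values.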

Self Similarity

The Phenomenon
The whole has the same structure as certain
parts
Example: fractals
In workloads: burstiness at many different
time scales
Note: relates to a time series

Job Arrivals to SDSC Paragon

Process Arrivals to SDSC Paragon

Long-Range Correlation
A burst of activity implies that values in the
time series are correlated
A burst covering a large time frame implies
correlation over a long range
This is contrary to assumptions about the
independence of samples

Aggregation
Replace each subsequence of m consecutive
values by their mean
If self-similar, the new series will have
statistical properties that are similar to the
original (i.e. bursty)
If independent, will tend to average out

Poisson Arrivals

Tests
Essentially based on the burstiness-retaining
nature of aggregation
Rescaled range (R/S) metric: the range of the
cumulative sum of n samples, rescaled by their
standard deviation, as a function of n

R/s Metric

Tests
Variance-time metric: the variance of an
aggregated time series as a function of the
aggregation level
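A minimal sketch of the variance-time test. It relies on the standard relation that for a self-similar series the variance of the aggregated series decays like m^(-beta) with beta < 1, giving a Hurst parameter H = 1 - beta/2, while independent samples give beta = 1 and H = 0.5. The function names and the aggregation levels are assumptions of this sketch, and the series must not become constant at any level.

```python
import math
import random
from statistics import pvariance


def aggregate(series, m):
    """Means of consecutive blocks of m values (incomplete last block dropped)."""
    return [sum(series[i:i + m]) / m
            for i in range(0, len(series) - m + 1, m)]


def variance_time_points(series, levels):
    """(log10 m, log10 variance of the series aggregated at level m)."""
    return [(math.log10(m), math.log10(pvariance(aggregate(series, m))))
            for m in levels]


def hurst_estimate(series, levels=(1, 2, 4, 8, 16, 32, 64)):
    """H = 1 + slope / 2, where slope is the least-squares slope of the plot."""
    points = variance_time_points(series, levels)
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return 1.0 + slope / 2.0


rng = random.Random(1)
independent = [rng.gauss(0.0, 1.0) for _ in range(20000)]
print(f"independent samples: H is about {hurst_estimate(independent):.2f}")
```

Values of H well above 0.5 indicate that bursts survive aggregation, that is, long-range correlation.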

Variance Time Metric

Modeling Self Similarity


Generate workload by an on-off process
During on period, generate work at steady pace
During off period do nothing

On and off period lengths are heavy tailed


Multiplex many such sources
Leads to long-range correlation
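The construction can be sketched as follows. This is a minimal illustration, assuming Pareto-distributed on and off periods with the same tail index, one unit of work per time slot while a source is on, and a coarse discretization into integer time slots. All names and parameter values are hypothetical.

```python
import random


def pareto_duration(rng, a, k=1.0):
    """Heavy-tailed period length via inverse transform sampling (always >= k)."""
    return k * (1.0 - rng.random()) ** (-1.0 / a)


def on_off_source(rng, horizon, a=1.5):
    """Work generated per time slot by one source: 1 while on, 0 while off."""
    work = [0] * horizon
    t = 0.0
    on = rng.random() < 0.5  # random initial state
    while t < horizon:
        length = pareto_duration(rng, a)
        if on:
            for slot in range(int(t), min(horizon, int(t + length))):
                work[slot] = 1
        t += length
        on = not on
    return work


def multiplex(n_sources, horizon, seed=0, a=1.5):
    """Total work per time slot from n_sources independent on-off sources."""
    rng = random.Random(seed)
    total = [0] * horizon
    for _ in range(n_sources):
        for slot, w in enumerate(on_off_source(rng, horizon, a)):
            total[slot] += w
    return total


load = multiplex(n_sources=50, horizon=1000)
print(load[:20])
```

Because some on periods are extremely long, bursts in the combined series do not average out when many sources are added together.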

Research Areas

Effect of Users
Workload is generated by users
Human users do not behave like a random
sampling process
Feedback based on system performance
Repetitive working patterns

Feedback
User population is finite
Users back off when performance is
inadequate
Negative feedback
Better system stability
Need to explicitly model this behavior

Locality of Sampling
Users display different levels of activity at
different times
At any given time, only a small subset of
users is active

Active Users

Locality of Sampling
These users repeatedly do the same thing
Workload observed by system is not a
random sample from long-term distribution

SDSC Paragon Data

Growing Variability

SDSC Paragon Data

Locality of Sampling
The questions:
How does this affect the results of
performance evaluation?
Can this be exploited by the system, e.g. by
a scheduler?

Hierarchical Workload Models


Model of user population
Modify load by adding/deleting users

Model of a single user's activity


Built-in self similarity using heavy-tailed on/off
times

Model of application behavior and internal structure
Capture interaction with system attributes

A Small Problem
We don't have data for these models
Especially for user behavior such as
feedback
Need interaction with cognitive scientists

And for distribution of application types and their parameters
Need detailed instrumentation

Final Words

We like to think
that we design
systems based
on solid
foundations

But beware:
the foundations
might be
unbased
assumptions!

Computer Systems are Complex


We should have more science in computer
science:
Collect data rather than make assumptions
Run experiments under different conditions
Make measurements and observations
Make predictions and verify them
Share data and programs to promote good
practices and ensure comparability

Advice from the Experts

Science is built of facts as a house is built of
stones. But a collection of facts is no more a
science than a heap of stones is a house
-- Henri Poincaré
Everything should be made as simple as
possible, but not simpler
-- Albert Einstein

Acknowledgements
Students: Ahuva Mualem, David Talby,
Uri Lublin
Larry Rudolph / MIT
Data in Parallel Workloads Archive

Joefon Jann / IBM


Allen Downey / Wellesley
CTC SP2 log / Steven Hotovy
SDSC Paragon log / Reagan Moore
SDSC SP2 log / Victor Hazelwood
LANL CM-5 log / Curt Canada
NASA iPSC/860 log / Bill Nitzberg
