White Paper:

Parallel processing in PowerMILL 10

Delcam plc


Mark Jacobs - Principal Engineer

Abstract
This paper aims to remove the marketing hype
surrounding parallel processing and its performance
impact on CAM systems. Delcam's research to date
helps to separate the fact from the fiction and gives
you a true understanding of parallel processing in
the CAM environment.
In particular, this paper addresses the following
questions:
What is parallel computing?
What influence does hardware configuration have on
toolpath calculation times?
How does parallel computing really benefit end users?
How will Delcam continue to harness the power of the
latest multi-core processors to benefit all aspects of
CAM programming?

Contents
1. Introduction
1.1 What savings can be expected?
1.2 How realistic are these potential savings?
2. Increased productivity in PowerMILL 10
2.1 Background processing
2.2 Parallel processing
3. Performance improvements in PowerMILL 10
4. Hardware effects
4.1 Adding more cores
4.2 Adding more processors
4.3 Parallel processing and background processing
4.4 Which computer?
5. Future developments
5.1 Faster toolpath calculations
5.2 Larger models
6. Terminology and further reading

1. Introduction
The new buzz words in computing at the moment
seem to be multi-core and parallel processing.
Increasing the clock speed of the processor has been
replaced by increasing the number of processor cores
in your computer. But what advantage does increasing
the number of cores in a processor give you and does
the reality actually live up to the hype?

Figure 1: Intel Core i7 processor - one of many multi-core
processors tested in PowerMILL 10 benchmarks.

In PowerMILL 10, parallel computing techniques have
been applied in two distinct ways:
Firstly, you can prepare, calculate or edit
toolpaths in the foreground while calculating
other toolpaths in the background, with minimal
degradation in processing speed. This effectively
doubles your potential productivity. This is what
Delcam terms Background Processing. It works
on any hardware but the benefits are greater on
multi-core machines.
Secondly, parallel processing performs different
parts of a complex calculation at the same time.
Essentially, this takes a single function and
processes it on all the cores in the CPU chip to
reduce overall calculation time. To benefit from
parallel processing you need a computer with
more than one processor.

A third and, we believe, unique benefit offered by
PowerMILL 10 is that parallel processing is used in
both foreground and background calculations
simultaneously. This will undoubtedly provide a
greater performance gain for PowerMILL 10 than with
any previous PowerMILL release.

Figure 2: PowerMILL 10 diagram showing parallel processing
of both foreground and background calculations on a quad-core
PC.

Tests conducted on a range of strategies at Delcam
show that toolpath calculation improvements of three
or four times compared with earlier versions on a
single-core PC are possible. The actual improvement
depends heavily on your hardware configuration and
on the toolpath strategy you are calculating. This is
discussed later in the paper.

PowerMILL 10 benefits

Four times faster raster toolpath calculations

Reduces programming time by up to 2.5 times*

Less waiting time whilst toolpaths are calculating

Increases capacity for additional work

Significantly improves manufacturing productivity

Reduced lead times

Ability to handle even larger memory-intensive files

*On tests conducted at Delcam over a range of toolpaths.
(See Section 3, Performance improvements in PowerMILL 10.)

1.1. What savings can be expected?

If we look at a simplistic example, where in an average
working week toolpath calculations consume 50% of
an 8-hour shift, then annually this would equate to
approximately 120 days of continuous toolpath calculation.

Tests conducted on a range of toolpaths show that
using PowerMILL 10 on a quad-core machine results
in a 60% saving in toolpath calculation times,
taking you from the 120 days to less than 50 days.

Figure 3: Potential savings (in days) using PowerMILL 10 on
a multi-core PC.

In dollar terms, if the internal cost for generating
toolpaths is $50 per hour (taking into consideration
operator costs, down time, and machining delays due to
data starvation), this would equate to 70 days' worth
of toolpath calculation savings, or cost reductions in
excess of $28,000.

1.2. So how realistic are these potential savings?

While it is unrealistic to assume across-the-board
savings of three or four times, as some claim,
it is realistic to expect major speed increases in calculation
time. You will also see significant productivity gains as
you can plan, create, and edit toolpaths in the foreground
while calculating toolpaths in the background, giving you
a new competitive edge at a time when you need it most.

2. Increased productivity in PowerMILL 10

Many of the enhancements in PowerMILL 10 either
shorten calculation times or enable calculations to
process during PowerMILL's idle time. Both of these
aspects greatly improve productivity.
Background processing - allows you to calculate
toolpaths, boundaries, or individual stock model
states in the background while continuing to
interact with PowerMILL.

Parallel processing - the overall calculation
is divided into subtasks that can be processed
simultaneously. This is only possible when more
than one processor is available. Parallel processing
can greatly reduce calculation times.
Specific toolpath speed-ups - less memory is
used when area clearance toolpaths are calculated,
and calculation time has been reduced. This is
particularly beneficial when working on large
models as it reduces the likelihood of running out
of memory.

2.1. Background processing


PowerMILL 10 allows you to perform background
operations, such as toolpath or boundary creation,
while at the same time you can continue preparing,
editing or even calculating toolpaths in the foreground
of PowerMILL.

Figure 4: Potential savings (in $) using PowerMILL 10 on a multi-core
PC.
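
The savings charted in Figures 3 and 4 (the example from Section 1.1) can be reproduced with a few lines of arithmetic. The sketch below is illustrative only. It assumes 240 working days per year and counts a "day" as one 8-hour shift; neither assumption is stated in the paper, but together they reproduce its 120-day baseline. All function and constant names are made up for this example.

```python
# Figures taken from Section 1.1 of the paper.
SHIFT_HOURS = 8            # length of one working day, in hours
CALC_SHARE_PERCENT = 50    # share of each shift spent calculating toolpaths
SAVING_PERCENT = 60        # quad-core saving reported for PowerMILL 10
COST_PER_HOUR = 50         # internal cost of toolpath generation, in dollars

# Assumption (not stated in the paper): 240 working days per year.
WORKING_DAYS_PER_YEAR = 240


def annual_calculation_days(working_days, calc_share_percent):
    """Shift-length days per year spent purely on toolpath calculation."""
    return working_days * calc_share_percent / 100


def days_after_saving(baseline_days, saving_percent):
    """Calculation days remaining once the percentage saving is applied."""
    return baseline_days * (100 - saving_percent) / 100


baseline = annual_calculation_days(WORKING_DAYS_PER_YEAR, CALC_SHARE_PERCENT)
remaining = days_after_saving(baseline, SAVING_PERCENT)
saved_days = baseline - remaining
saved_dollars = saved_days * SHIFT_HOURS * COST_PER_HOUR

print(f"baseline: {baseline:.0f} days, remaining: {remaining:.0f} days")
print(f"saved: {saved_days:.0f} days, worth ${saved_dollars:,.0f}")
```

Under these assumptions the script gives a 120-day baseline, 48 days remaining, and 72 days saved worth $28,800, which is in line with the paper's rounded figures of "less than 50 days", "70 days" and "in excess of $28,000".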


Figure 5: Time compression of background processing in PowerMILL 10.
While preparing, editing or calculating toolpaths in the foreground you
can also be calculating toolpaths in the background.

To use background processing all you have to do is
click the new Queue button rather than Calculate on
a toolpath dialog. PowerMILL checks that everything is
set up correctly (such as block, tool...), and adds the
toolpath to the background queue. While you continue
working, PowerMILL calculates the toolpaths in the
queue in a background process.
Note: background processing works for boundaries and stock models as
well as toolpaths.
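
The queue-then-continue workflow described above can be illustrated with a small Python sketch. This is not PowerMILL's implementation or API: the class `BackgroundQueue`, the function `calculate_toolpath` and the `block`/`tool` settings are hypothetical names chosen to mirror the description (validate first, add to a queue, calculate off the interactive thread).

```python
from concurrent.futures import ThreadPoolExecutor


class BackgroundQueue:
    """Hypothetical stand-in for a background calculation queue."""

    def __init__(self):
        # A single worker runs queued jobs one after another, in the order
        # they were queued, without blocking the caller.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def queue(self, name, calculate, **settings):
        # Check the setup before queuing, so errors surface immediately
        # rather than later in the background.
        missing = [key for key in ("block", "tool") if key not in settings]
        if missing:
            raise ValueError(f"{name}: missing {', '.join(missing)}")
        return self._pool.submit(calculate, **settings)

    def shutdown(self):
        # Wait for all queued jobs to finish.
        self._pool.shutdown(wait=True)


def calculate_toolpath(block, tool):
    """Placeholder for a long-running toolpath calculation."""
    return f"toolpath for {tool} in {block}"


jobs = BackgroundQueue()
future = jobs.queue("finishing", calculate_toolpath,
                    block="stock", tool="ball-nose 10mm")
# The foreground is free to carry on with other work here.
result = future.result()  # collect the finished toolpath when it is needed
jobs.shutdown()
print(result)
```

A single worker is used so that queued jobs complete in a predictable order; this is a design choice for the sketch, not a statement about how PowerMILL schedules its queue.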

[Figure 5 diagram labels: Preparation, Edit, Calculation; Foreground, Background]

2.2. Parallel processing

Possibly the most important, but least visible,
improvement in PowerMILL 10 is the use of parallel
processing in toolpath calculations.

In PowerMILL 9, Point Distribution performed many
calculations in parallel to improve its performance. In
PowerMILL 10, the code that calculates how a tool
runs over the model also uses parallel processing.
As a result, raster machining calculations run
almost entirely in parallel.

Other strategies utilizing this code include:
Constant Z
3D Offset
Area clearance
Interleaved constant Z
Optimised constant Z
Boundary calculations

In addition, the calculation to apply a toolpath to the
stock model runs entirely in parallel.
Parallel processing happens automatically if your
computer is suitable; you do not need to do anything to
activate it.
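
To see why raster machining lends itself to parallel processing, note that each raster pass can be computed independently of the others. The sketch below is a toy illustration, not PowerMILL's algorithm: the model is a hypothetical grid of surface heights, the tool is a crude square flat-bottomed footprint, and the function names are invented. A thread pool is used to show the decomposition into subtasks; in standard CPython a real speed-up for CPU-bound work would need processes or native code instead of threads.

```python
from concurrent.futures import ThreadPoolExecutor


def raster_pass(heights, row, radius):
    """Tool height along one raster pass over a grid of surface heights.

    The tool is modelled as a square, flat-bottomed footprint, so it must
    sit on the highest surface point beneath it (a crude stand-in for a
    gouge-free calculation).
    """
    n_rows, n_cols = len(heights), len(heights[0])
    rows = range(max(0, row - radius), min(n_rows, row + radius + 1))
    path = []
    for col in range(n_cols):
        cols = range(max(0, col - radius), min(n_cols, col + radius + 1))
        path.append(max(heights[r][c] for r in rows for c in cols))
    return path


def raster_toolpath(heights, radius, workers=4):
    """Compute every raster pass as an independent subtask."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the passes in row order, whatever order they finish in.
        return list(pool.map(lambda row: raster_pass(heights, row, radius),
                             range(len(heights))))


surface = [
    [0, 1, 0],
    [0, 5, 0],
    [0, 0, 2],
]
serial = [raster_pass(surface, row, 1) for row in range(len(surface))]
parallel = raster_toolpath(surface, 1)
print(parallel == serial)
```

Because no pass depends on another, the parallel version returns exactly the same passes as the serial loop; only the scheduling differs.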

Figure 6: Raster toolpath multi-threading on all 4 cores.

3. Performance improvements in PowerMILL 10
Delcam has tested PowerMILL 10 on a range of
typical 3-axis parts. These tests show major speed
improvements for raster machining when using
multi-processor machines. Figure 7 shows the
raster toolpath calculation time for PowerMILL 10 as a
percentage of the time taken by PowerMILL 9 on the
same computer, for a number of different processor
configurations.

Figure 7: PowerMILL 10 raster calculation compared with
PowerMILL 9 on different processor configurations.

There are slight improvements on the single
processor machine, caused by other optimizations.

Raster machining benefits the most from parallel
processing at the moment. Other strategies benefit
too, but not to the same extent. What matters for
most users is the overall performance for a typical
machining project. Figure 8 compares the
performance of PowerMILL 10 and PowerMILL 9 for
a benchmark using a range of strategies.

On the quad-core processors the benchmark
runs about 2.5 times faster in PowerMILL 10.
These benchmark tests can be requested by emailing
PowerMILL10@delcam.com and will also be included
in the PowerMILL examples folder on the installation
DVD.

4. Hardware effects
It is tempting to think that once parallel processing is
supported then the way to improve performance is to
add more and more processors. However, the test
results show that things are not quite that simple. It is
apparent from both performance graphs that the
processor configuration has a significant effect on the
calculation time. It is not immediately obvious why two
dual-core processors are significantly slower than a
single quad-core, and it is surprising that two quad-core
processors (eight cores in total) perform worse
than a single quad-core.
The trends we can see in both graphs are:
Adding more cores improves performance, but...
More than four cores makes little difference, and...
Adding more processors reduces performance.

4.1. Adding more cores


Why then does the benefit of adding processors tail
off?
The problem of parallel processing in a computer
program is very much like organising the production
of a single product in a company. You need to decide
who does what, how their efforts are coordinated and
how different people's output gets combined into the
final product.
The interaction between people means some form of
management is necessary, which is generally an
overhead. The first problem is to make a management
system that works at all. The second problem is to
minimise this overhead.
Consider the production of a magazine. The basic process is:
1. Write articles.
2. Edit and collate, to produce the magazine.

Figure 8: PowerMILL 10 benchmark calculations compared with
PowerMILL 9 on different processor configurations.

Let us assume that the effort to produce a typical magazine
breaks down as shown in the table:

Task               Man hours
Write articles     20
Edit magazine      2
Production time    22

In this case the obvious target for parallel processing is the
writing of the articles. If there are four articles, then three
authors and the editor could write one each, taking 5 hours
and reducing the total production time to 7 hours (about three
times faster than one person).
What happens if more authors are available? It might be
possible to get two authors to work on each article, but this
would require much closer cooperation, and it is unlikely that
they would complete the work in half the time it would take
for a single author to do it. If they could complete an article
in 3 hours between them, then a team of seven authors plus
the editor could produce the whole magazine in 5 hours.
Doubling the team from four to eight has only increased the
speed by 40%.
If there were forty articles and forty authors, there
would be a different problem. The authors could write
an article each, taking half an hour. However, there is still
only one editor, who will still take two hours to edit the
magazine. Putting more and more people into writing articles
is never going to speed this up; now you have to think about
using more than one editor.
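
The magazine arithmetic above can be checked with a short sketch. The function name and parameters are invented for illustration; the hours come from the example (20 hours of writing in total, 2 hours of editing), and the model simply assumes that writing teams work in parallel while editing stays serial.

```python
from math import ceil


def production_hours(articles, hours_per_article, writing_teams, edit_hours=2):
    """Total time to produce the magazine.

    Articles are written in parallel by the available teams (in as many
    rounds as needed), then a single editor edits the whole magazine.
    """
    rounds = ceil(articles / writing_teams)
    return rounds * hours_per_article + edit_hours


one_person = production_hours(articles=4, hours_per_article=5, writing_teams=1)
four_people = production_hours(articles=4, hours_per_article=5, writing_teams=4)
# Eight people working as four pairs, each pair needing 3 hours per article.
eight_people = production_hours(articles=4, hours_per_article=3, writing_teams=4)
# Forty shorter articles (half an hour each) shared among forty authors.
forty_authors = production_hours(articles=40, hours_per_article=0.5,
                                 writing_teams=40)

print(one_person, four_people, eight_people, forty_authors)
print(f"four to eight people: {four_people / eight_people:.1f}x faster")
```

The totals are 22, 7, 5 and 2.5 hours. In the last case the editor's two serial hours account for most of the total, which is the point of the example.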
Deciding how to apply parallel processing in CAM
software is very similar. There is often one step that
takes the majority of the time. When you parallelise
this task it exposes other tasks that are now taking the
most time. To get the calculation to go significantly
faster you need to parallelise other tasks.
Performance gains are limited by the fraction of the
program that can be parallelised to run on multiple
cores simultaneously; this effect of diminishing returns
is known as Amdahl's law. For example, if only 50%
of a toolpath calculation can be parallelised, the theoretical
speedup can never exceed 2x, as shown in Figure 9, no matter
how many cores are available.

Figure 9: Amdahl's law illustrating the maximum theoretical speed
up achievable using up to 32 processors, for tasks where different
proportions of the work can be done in parallel.
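
Amdahl's law is commonly written as speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of cores. The sketch below evaluates it for the 50% case discussed above; the function names are chosen for this illustration and the model ignores the coordination overheads covered in the rest of this section.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Theoretical speedup when only part of the work runs in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def amdahl_limit(parallel_fraction):
    """Upper bound on the speedup as the number of cores grows without limit."""
    if parallel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


for cores in (1, 2, 4, 8, 32):
    print(f"{cores:>2} cores: {amdahl_speedup(0.5, cores):.2f}x")
print(f"limit: {amdahl_limit(0.5):.1f}x")
```

For p = 0.5 this gives roughly 1.33x on two cores, 1.60x on four, 1.78x on eight and 1.94x on 32, approaching but never reaching the 2x limit.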

In PowerMILL the most performance-critical process
is the production of gouge-free tool passes over the
model. This now runs in parallel, but the nature of the
algorithms means that most of the improvements are
achieved with four processes running in parallel. To
achieve further speed increases we need to make other
routines work in parallel.

4.2. Adding more processors


Figures 7 and 8 show that two dual-core processors are slower
than a single quad-core. They also show that two quad-core
processors are slower than the single quad-core. We have
already seen that PowerMILL gains limited benefits from more
than four cores, but why is performance reduced when the
cores are in separate processor packages?
The clock speed of modern processors is so high that a major
limit on their performance is the time it takes to access main
memory. Processor manufacturers reduce this bottleneck by
including fast cache memory on the processor chip. Frequently
used data is kept in the cache where it can be accessed quickly.
When a processor has multiple cores they share the same on-chip cache.
When processor cores are working in parallel, the
communication between cores to coordinate their tasks can
take advantage of the shared cache provided they are all
on the same chip. However, the benefit of the
shared cache is lost if some of the cores are on a
separate chip and communication has to use an
external bus or main memory.

A further overhead arises because cache coherency
must be maintained - the contents of the caches must
be kept in step with the contents of main memory, and
vice-versa. It is quite complex to keep a single cache
up to date, but the problem becomes much more
complex and time consuming when coherency has to
be maintained between two or more caches and main
memory.

4.3. Parallel processing and background processing
Background processing allows you to organise your
activities so that you don't have to wait for
PowerMILL to calculate toolpaths; parallel processing
reduces toolpath calculation times.
Toolpath calculations in PowerMILL 10 benefit from
parallel processing whether they are running in the
foreground or in the background. Therefore, by
running foreground and background calculations
simultaneously, it is possible (on a suitably equipped
computer) to make full use of up to eight cores and
8 GB of memory.

4.4. Which computer should I use?
PowerMILL 10 will work on the same hardware as
PowerMILL 9. Background processing works on
single-processor machines, and dual-processor
machines will show noticeable performance benefits
from parallel processing as well.
To obtain the most benefit from PowerMILL 10 we
recommend:
Intel Core2 Quad Q9550 (2.83GHz, 1333MHz FSB, 12MB L2
Cache, Quad Core) 375W
8GB (4 x 2.0GB DIMM) 800MHz ECC Dual Channel Memory
(requires 64-bit O/S)
512MB PCIe x16 nVidia Quadro FX 3700 (MRGA15), Dual
Monitor DVI or VGA Graphics Card (configured in a hardware
mirror)
2 x 320GB (7,200 rpm) SATA 3.0Gb/s Hard Drive with NCQ
and 16MB DataBurst Cache
Genuine Windows Vista Business x64 SP1

If calculation time is a major issue then dual
quad-core processors will help foreground and
background calculations to run at maximum
speed:
2 x Intel Xeon X5450 (3.00GHz, 1333MHz, 2x6MB
Cache, Quad Core)
16GB, 667MHz, ECC Memory (8x2GB)
512MB PCIe x16 nVidia Quadro FX 3700 (MRGA15),
Dual Monitor DVI or VGA Graphics Card
2 x 320GB (7,200 rpm) SATA 3.0Gb/s Hard Drive
with NCQ and 16MB DataBurst Cache
Genuine Windows Vista Business x64 SP1 WITH
Media

5. Future developments
5.1. Faster toolpath calculations
We expect that future versions of PowerMILL will
give faster overall calculation times in two ways:
1. Increasing the amount of multi-threading in the
program. This will improve the overall benchmark
time on dual-core and quad-core machines.
2. Optimising data structures to make better use of
processor caches. This will allow multi-chip
computers to work more efficiently and will
improve the dual quad-core times significantly.

5.2. Larger Models


Future versions of PowerMILL will include full 64-bit
support. On 64-bit machines, the amount of RAM that
can be used will only be limited by what can be
installed. This will allow extremely large or complex
parts to be processed successfully.


6. Terminology and Further Reading


There are a lot of very similar terms used to describe
parallel computing. We have tried to use terminology
consistently as follows:

Processor - the part of the computer that does
the real work, sometimes known as a Central
Processing Unit or CPU. In the past, processors
were packaged singly, but increasingly these days
multi-core processors include two, four or more
processors on a single chip.
Background processing - the ability to prepare
(or calculate) toolpaths in the foreground whilst
calculating another toolpath in the background. In
this case two separate calculations are performed
at the same time. This is sometimes referred to as
multi-tasking.
Parallel processing - the ability to perform different
parts of a single calculation simultaneously,
essentially taking a single function and dividing
it into parts that can be processed at the same
time on different processors. This is sometimes
referred to as multi-threading.
Parallel computing - the ability to perform
many calculations simultaneously. This can
be either parallel processing or background
processing (or both). This is sometimes
referred to as multi-processing or multi-core
processing.
There is a lot of material about parallel computing
available on-line. Below is a sample of useful links;
most include references to much more detailed
information.
http://en.wikipedia.org/wiki/Parallel_computing
is a good overview of the whole subject of parallel
computing.
http://en.wikipedia.org/wiki/Multicore_(computing)
discusses the evolution and different
types of multi-core processor.
http://en.wikipedia.org/wiki/Multiprocessing
goes into detail about different types of
multiprocessing.
http://en.wikipedia.org/wiki/Multithreading_(computer_hardware)
talks about the different types of multi-threading.