1. INTRODUCTION
3. THE SYSTEM
4. HARDWARE IMPLEMENTATION
6. WATTCH ANALYSIS
7. PROBLEMS FACED
8. CONCLUSION
9. REFERENCES
Introduction
Motivation
Requirements of Watermarks
To be effective in the protection of the ownership of intellectual
property, the invisibly watermarked image should satisfy several
criteria:
Techniques of watermarking
An owner embeds the mark into an original image. The marking key is
used to generate the watermark and is typically an identifier assigned
to the owner or image. The marked image is perceptually identical to
the original image under normal observation.
Algorithm implemented for image watermarking and detection:
Secure, Fragile Authentication Watermark with Localization (J. Fridrich et al.)
Background
Algorithm Description
Watermark Embedding
1. Divide the 8-bit grayscale image into 8x16 blocks (each block
comprising 128 8-bit pixels).
2. For each block, calculate the hash H of the MSBs of all 128
pixels.
3. XOR the hash with the watermark bitstream (binary logo).
4. Encrypt the XORed output.
5. Insert the encrypted bitstream into the corresponding LSB
positions of the image block being watermarked.
Watermark Detection/Verification
1. Divide the 8-bit grayscale image into 8x16 blocks (each block
comprising 128 8-bit pixels).
2. For each block, calculate the hash H of the MSBs of all 128
pixels.
3. Decrypt the bitstream corresponding to the LSB positions of the
image block under consideration.
4. XOR the decrypted bitstream with the hash function output.
5. Compare the output of the above step with the watermark
(binary logo) bitstream.
6. If the first 64 bits of the generated watermark coincide with the
next 64 bits, the block content is authentic.
Swapping blocks and cropping can thus be readily detected.
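The embedding and verification steps above can be sketched for a single block as follows. This is a minimal Python sketch under stated assumptions, not the Verilog design described later: "MSBs" is taken to mean the seven most significant bits of each pixel, MD5 from Python's hashlib serves as the hash, and toy_cipher is a hypothetical keyed-XOR stand-in for the encryptor and decryptor (it is not RSA and not secure). All function names are illustrative.

```python
import hashlib

BLOCK_PIXELS = 128  # one 8x16 block of 8-bit grayscale pixels


def block_hash(pixels):
    """MD5 over the 7 most significant bits of each pixel, as a 128-bit int."""
    msbs = bytes(p >> 1 for p in pixels)
    return int.from_bytes(hashlib.md5(msbs).digest(), "big")


def toy_cipher(value, key):
    """ASSUMPTION: keyed XOR standing in for the real encryptor/decryptor."""
    return value ^ key


def embed_block(pixels, watermark, key):
    """Embedding steps 2-5: hash, XOR with watermark, encrypt, write LSBs."""
    if len(pixels) != BLOCK_PIXELS:
        raise ValueError("expected a block of 128 pixels")
    payload = toy_cipher(block_hash(pixels) ^ watermark, key)
    marked = []
    for i, p in enumerate(pixels):
        bit = (payload >> (BLOCK_PIXELS - 1 - i)) & 1  # most significant bit first
        marked.append((p & 0xFE) | bit)
    return marked


def extract_watermark(pixels, key):
    """Detection steps 2-4: read LSBs, decrypt, XOR with the block hash."""
    payload = 0
    for p in pixels:
        payload = (payload << 1) | (p & 1)
    return toy_cipher(payload, key) ^ block_hash(pixels)


def block_is_authentic(pixels, watermark, key):
    """Detection step 5: compare the recovered bits with the known watermark."""
    return extract_watermark(pixels, key) == watermark


if __name__ == "__main__":
    block = [(37 * i) % 256 for i in range(BLOCK_PIXELS)]
    logo_bits = int.from_bytes(b"LOGO" * 4, "big")  # 128-bit example watermark
    key = 0x0123456789ABCDEF0123456789ABCDEF
    marked = embed_block(block, logo_bits, key)
    print(block_is_authentic(marked, logo_bits, key))
```

Because the hash covers only the upper seven bits, writing the LSBs does not change the hash, so an untouched block verifies. Changing any pixel alters either the hash input or the stored payload, and the comparison fails for that block only, which is what gives the scheme its localization.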
The main MD5 algorithm operates on a 128-bit state, divided into four
32-bit words, denoted A, B, C, and D. These are initialized to certain
fixed constants. The main algorithm then operates on each 512-bit
message block in turn, each block modifying the state. The processing
of a message block consists of four similar stages, termed rounds;
each round is composed of 16 similar operations based on a non-linear
function F, modular addition, and left rotation. Figure 1 illustrates one
operation within a round. There are four possible functions F; a
different one is used in each round.
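// All variables are unsigned 32-bit and wrap modulo 2^32
// r holds the per-operation left-rotation amounts
r[ 0..15] := {7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22}
r[16..31] := {5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20}
r[32..47] := {4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23}
r[48..63] := {6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21}
// Constants from the integer part of the sines of integers
for i from 0 to 63
    k[i] := floor(abs(sin(i + 1)) * 2^32)
// Initialize variables
var int h0 := 0x67452301
var int h1 := 0xEFCDAB89
var int h2 := 0x98BADCFE
var int h3 := 0x10325476
// Pre-processing
append "1" bit to message
append "0" bits until message length in bits = 448 (mod 512)
append bit length of message as 64-bit little-endian integer to message
// Process the message in successive 512-bit chunks
for each 512-bit chunk of message
    break chunk into sixteen 32-bit little-endian words w[0..15]
    var int a := h0
    var int b := h1
    var int c := h2
    var int d := h3
    // Main loop
    for i from 0 to 63
        if 0 <= i <= 15 then
            f := (b and c) or ((not b) and d)
            g := i
        else if 16 <= i <= 31
            f := (d and b) or ((not d) and c)
            g := (5*i + 1) mod 16
        else if 32 <= i <= 47
            f := b xor c xor d
            g := (3*i + 5) mod 16
        else if 48 <= i <= 63
            f := c xor (b or (not d))
            g := (7*i) mod 16
        temp := d
        d := c
        c := b
        b := ((a + f + k[i] + w[g]) leftrotate r[i]) + b
        a := temp
    // Add this chunk's result to the running hash
    h0 := h0 + a
    h1 := h1 + b
    h2 := h2 + c
    h3 := h3 + d
// Output (little-endian)
digest := h0 append h1 append h2 append h3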
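The pseudocode above maps almost line for line onto Python. The following is an illustrative sketch that follows the standard MD5 definition (RFC 1321) and checks itself against Python's hashlib; it is not the hasher module of this design, and the names R, K, left_rotate and md5 are chosen here for illustration.

```python
import hashlib
import math
import struct

# Per-operation left-rotation amounts: four rounds of 16 operations each.
R = ([7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 +
     [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4)
# Constants from the integer part of abs(sin(i + 1)) * 2^32.
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]
MASK = 0xFFFFFFFF


def left_rotate(x, n):
    """Rotate a 32-bit value left by n bits."""
    x &= MASK
    return ((x << n) | (x >> (32 - n))) & MASK


def md5(message: bytes) -> str:
    """Return the MD5 digest of message as a hex string."""
    h0, h1, h2, h3 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476

    # Pre-processing: a single "1" bit, zero padding, then the 64-bit length.
    bit_len = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack("<Q", bit_len)

    for offset in range(0, len(padded), 64):
        w = struct.unpack("<16I", padded[offset:offset + 64])
        a, b, c, d = h0, h1, h2, h3
        for i in range(64):
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            f &= MASK
            new_b = (b + left_rotate(a + f + K[i] + w[g], R[i])) & MASK
            a, d, c, b = d, c, b, new_b
        h0 = (h0 + a) & MASK
        h1 = (h1 + b) & MASK
        h2 = (h2 + c) & MASK
        h3 = (h3 + d) & MASK

    return struct.pack("<4I", h0, h1, h2, h3).hex()


if __name__ == "__main__":
    sample = bytes(range(128))  # e.g. one 128-pixel block
    print(md5(sample) == hashlib.md5(sample).hexdigest())
```

A 128-byte input, such as one pixel block, pads out to three 512-bit chunks, so the 64-operation main loop runs three times per block.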
The following steps are used to generate a public key and a private
key:
1. Choose two large prime numbers p and q such that p is not equal
to q , randomly and independently of each other.
2. Compute n = pq
3. Compute the totient φ(n) = (p-1)(q-1).
4. Choose an integer e such that 1 < e < φ(n) and e is coprime to
φ(n).
5. Compute d such that de ≡ 1 (mod φ(n)).
For example, with n = 3233, e = 17 and d = 2753, the encryption
function is:
Encrypt(m) = m^e mod n = m^17 mod 3233, where m is the message.
The decryption function is:
Decrypt(c) = c^d mod n = c^2753 mod 3233, where c is the ciphertext.
To encrypt the plaintext 123 we calculate
Encrypt(123) = 123^17 mod 3233 = 855
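The key-generation steps and the worked example can be checked with a few lines of Python. This is a sketch for illustration only: the primes p = 61 and q = 53 are an assumption (they are not given in the text, but their product is 3233), the function names are hypothetical, and the three-argument pow with a negative exponent needs Python 3.8 or later.

```python
import math


def make_keys(p, q, e):
    """Derive (public, private) RSA keys from primes p, q and exponent e."""
    if p == q:
        raise ValueError("p and q must differ")
    n = p * q
    phi = (p - 1) * (q - 1)
    if not 1 < e < phi or math.gcd(e, phi) != 1:
        raise ValueError("e must satisfy 1 < e < phi and be coprime to phi")
    d = pow(e, -1, phi)  # modular inverse, so d * e = 1 (mod phi)
    return (n, e), (n, d)


def encrypt(m, public_key):
    n, e = public_key
    return pow(m, e, n)


def decrypt(c, private_key):
    n, d = private_key
    return pow(c, d, n)


if __name__ == "__main__":
    public, private = make_keys(61, 53, 17)  # assumed primes, see lead-in
    cipher = encrypt(123, public)
    print(public, private, cipher, decrypt(cipher, private))
```

With these inputs the derived private exponent is 2753 and the ciphertext for 123 is 855, matching the example; decrypting 855 returns 123.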
THE SYSTEM
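[Figure: encoder block diagram. Labels recoverable from the figure:
8-bit grayscale image, Data[7:0], 128x8-bit register bank,
Reg_lsb[127:0] (128 pixels), clock, ready, Watermark_Key, XOR,
lsb_enable, E[4:0], Encryptor, Dataready_in, Dataready_out]
Encoder (Watermark Embedder) operation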
[Figure: decoder block diagram. Labels recoverable from the figure:
clock, reset, Hashing_function, Msg_out_valid, ready, Watermark_Key,
XOR, D[7:0], Decryptor, Dataready_in, Dataready_out]
Decoder (Watermark Extractor) operation
HDL IMPLEMENTATION
The modules for the Encoder and the Decoder were written in Verilog
and their functional verification was done using MODELSIM 6.0a.
FPGA
The HDL designs for both the watermarking encoder and decoder
were placed and routed using the Quartus II software suite for the
Altera DE2 board. The following results were obtained after running the
placement, power and timing analyzers.
Modules\Analysis         Area                                                      T_setup
                         Logic elements   DSP      I/O pins     Interconnect      (ns)
                         (Max: 10345)     blocks   (Max: 426)   usage (%)
Encoder                  6356             48       399          45                2.17
Decoder                  6343             48       372          51                2.789
Hasher                   3121             0        269          33                11.219
Encryptor                3209             48       264          35                48.204
Decryptor                3210             48       267          33                46.852
Register File Encoder    0                0        256          17                1.03
Register File Decoder    0                0        256          14                1.1
(A further timing column, T_c..., is cut off in the source.)
Analysis Summary
For all the modules, even though the device I/O utilization was very
high, the logic utilization was relatively low. Also, we see that the
hasher module consumes the most power, as it repeatedly operates on
128-bit data. The encryptor and the decryptor modules were found to be
the most sensitive to setup time violations.
Modules\Analysis         Area                                            Power
                         #Ports   #Nets   #Cells   Total Cell Area      Cell Internal   Net Switching   Total Dynamic
                                                   (um2)                Power           Power           Power
Encoder                  527      10661   14562    1820151.87           38.9 W          14.3 W          63.
Decoder                  526      10403   13932    1760883.4            34.8 W          12.5 W          57.
Hasher                   269      9790    9491     857681.93            17.5 W          8.7 W           27.
Encryptor                264      1265    381      989069.875           14.6 W          5.32 W          19.
Decryptor                267      1269    369      929877.43            11.2 W          4.88 W          16.1
Register File Encoder    260      261     131      34418.132            6.4 W           3.10 W          9.5
Register File Decoder    261      273     127      38808.273            6.9 W           4.85 W          10.3
Fig. 7b. Hasher
Fig. 7c. Encoder
Fig. 7d. Encryptor
Fig. 7d. Decryptor
Fig. 7e. Decoder
All the images above are the GDSII layouts of the individual modules
and of the encoder/decoder as a whole. The layouts were imported
into CoolViewPlus v7.0 for clearer display here.
Analysis Summary
VPR ANALYSIS
The Verilog netlist obtained above was converted into a gate-level
netlist in BLIF format. The conversion was done using ABC (A System
for Sequential Synthesis and Verification, developed by the Berkeley
Logic Synthesis and Verification Group), a logic minimization tool.
These BLIF files were then used for VPR-based placement and routing
analysis.
Encoder (cluster size 1, 4 inputs)
Routing area (in minimum-width transistor areas):
  Assuming no buffer sharing (pessimistic): Total: 6.36335e+06, Per CLB: 3286.85
  Assuming buffer sharing (slightly optimistic): Total: 4.31675e+06, Per CLB: 2229.73
Decoder (cluster size 4, 10 inputs)
Routing area (in minimum-width transistor areas):
  Assuming no buffer sharing (pessimistic): Total: 2.71946e+06, Per CLB: 3730.40
  Assuming buffer sharing (slightly optimistic): Total: 2.06824e+06, Per CLB: 2837.09
Decoder (cluster size 1, 4 inputs)
Routing area (in minimum-width transistor areas):
  Assuming no buffer sharing (pessimistic): Total: 7.88719e+06, Per CLB: 2704.80
  Assuming buffer sharing (slightly optimistic): Total: 5.37527e+06, Per CLB: 1843.37
For both encoder and decoder circuits the cluster size of 4 with 10
distinct inputs was found to be more area efficient as compared to the
cluster size of 1 with 4 distinct inputs. This is because in a clustered
FPGA many nets are completely absorbed within the clusters and their
routing is taken care of by the intra-cluster multiplexers.
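The comparison can be made concrete with a little arithmetic on the reported figures. The sketch below is illustrative only: the dictionary keys and function names are made up here, the numbers are copied from the VPR results listed above, and the "implied CLB count" is simply the total divided by the per-CLB figure, which is an interpretation rather than a value VPR is quoted as reporting.

```python
# (total routing area, per-CLB routing area), in minimum-width transistor
# areas, copied from the VPR results listed above.
VPR_AREA = {
    "encoder_1_4": {"no_sharing": (6.36335e6, 3286.85),
                    "sharing": (4.31675e6, 2229.73)},
    "decoder_4_10": {"no_sharing": (2.71946e6, 3730.40),
                     "sharing": (2.06824e6, 2837.09)},
    "decoder_1_4": {"no_sharing": (7.88719e6, 2704.80),
                    "sharing": (5.37527e6, 1843.37)},
}


def implied_clb_count(total, per_clb):
    """Number of CLBs implied by dividing the total by the per-CLB area."""
    return round(total / per_clb)


def area_ratio(config_a, config_b, mode):
    """Total routing area of config_a relative to config_b."""
    return VPR_AREA[config_a][mode][0] / VPR_AREA[config_b][mode][0]


if __name__ == "__main__":
    for name, modes in VPR_AREA.items():
        print(name, implied_clb_count(*modes["no_sharing"]), "CLBs")
    for mode in ("no_sharing", "sharing"):
        ratio = area_ratio("decoder_4_10", "decoder_1_4", mode)
        print(f"decoder 4:10 vs 1:4 ({mode}): {ratio:.3f}")
```

For the decoder, each cluster-of-4 CLB carries more routing area than a single-LUT CLB, but far fewer CLBs are needed, so the total routing area drops to roughly 35% (no buffer sharing) or 38% (buffer sharing) of the 1:4 figure.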
“WATTCH” ANALYSIS
Background
Experimental Setup
Simulation Results
                          Hartley      Fast Cosine   Sub-band   WM_algo
                          Transform    Transform     DCT
Branch Prediction Unit    0.4988       0.5051        0.5671     0.301
ResultBus Unit            0.4601       0.4666        0.6034     0.613
Instruction Cache Unit    0.6547       0.6674        0.7015     0.661
ALU Unit                  2.299        2.315         2.61       1.95
Power dissipated per simulation cycle in milliwatts by Watermark Encoder (on WATTCH)
Performance Penalty
                          Hartley      Fast Cosine   Sub-band   WM_algo
                          Transform    Transform     DCT
# of simulation cycles    15158        17250         20898      15230
Total simulation time of the Encoders (# of SimpleScalar simulation cycles)
[Chart: # of simulation cycles (axis 0 to 25000) for each algorithm]
                          Hartley      Fast Cosine   Sub-band   WM_algo
                          Transform    Transform     DCT
Branch Prediction Unit    0.5034       0.6152        0.6672     0.415
ResultBus Unit            0.4004       0.412         0.4887     0.412
Instruction Cache Unit    0.7124       0.7328        0.79       0.654
ALU Unit                  3.65         3.67          3.92       2.98
Power dissipated per simulation cycle in milliwatts by Watermark Decoder (on WATTCH)
[Chart: power per unit (ALU Unit, Instruction Cache Unit, ResultBus Unit,
Branch Prediction Unit) for each algorithm]
Performance Penalty
                          Hartley      Fast Cosine   Sub-band   WM_algo
                          Transform    Transform     DCT
# of simulation cycles    16240        19310         21650      16348
Total simulation time of Decoders (# of simulation cycles on SimpleScalar)
[Chart: # of simulation cycles (axis 0 to 25000) for each algorithm]
Analysis Summary
PROBLEMS FACED
CONCLUSION
A fragile watermarking system in the spatial domain was successfully
implemented on an FPGA and in standard cells. The analysis was done using
Altera Quartus 2.0 Pro and Synopsys Design Compiler. The
watermarking process did not introduce visual artifacts and retained
the quality of the images. The design was ported to a suitable format
for analysis using the academic software tools VPR and Wattch. On all
platforms the basic area, power and timing results were determined.
REFERENCES