Enhancement of Security and Embedding Capacity Through Huffman Coding in Steganography

International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 2, Issue 4, July-August 2013, ISSN 2278-6856

Sanjay Bajpai¹, Dr. Kanak Saxena²
¹ Department of Computer Applications, Lakshmi Narain College of Technology, RGPV, Bhopal, Kalchuri Nagar, Raisen Road, Bhopal (M.P.), INDIA.
² Department of Computer Applications, Samrat Ashok Technological Institute, Civil Lines, Vidisha (M.P.), INDIA.
Abstract: In this paper, we incorporate the Huffman coding algorithm in digital image steganography to enhance both data security and embedding capacity. Data is compressed by a variable-length compression technique and then embedded in digital colored images using a modified LSB technique. Pixels are broken into their RGB components, and some of the least significant bits of these components, chosen by certain criteria, are used to embed the secret message. Security is further extended by using multiple keys for embedding, and the data compression ratio varies from 45% to 80% depending on the type of data embedded. Here, we have categorized data into three types: small length, larger length and data from a particular domain. Our experimental results show that the distortion in the stego-image is negligible, that the message is very difficult to extract, and that the embedding capacity is increased to a large extent.
Keywords: Distortion, Compression ratio, Embedding capacity
1. INTRODUCTION
The present work shows the embedding of data compressed through Huffman coding in steganography. The word steganography literally means hidden writing and originates from the Greek language. It is an art and science in which secret data is hidden in another medium known as the cover, and the thing which is hidden is known as the host [1]. The basic advantage of steganography is that it keeps unwanted persons or intruders away from the actual content; the technique succeeds because they cannot see the hidden message and only the cover is visible. The most common method employed in steganography is LSB substitution, in which two or three bits are replaced by the bits of the secret message so that the distortion is not visible to the human eye [2], [3]. Such methods, however, cannot give high embedding capacity and, because they follow the same pattern, steganalysis techniques can detect them. El Safy et al. [4] used an adaptive data embedding technique involving the Optimal Pixel Adjustment Process to hide data in the integer wavelet coefficients of the cover image. Recent trends show that the genetic algorithm (GA) is commonly used to embed the secret message; Chin-Chen Chang et al. [5] also used GA in their proposed block truncation coding (BTC) method, which can embed about 3 bits in each BTC-encoded block on average. Elham Ghasemi et al. [6] embedded the secret message in Discrete Wavelet Transform coefficients by dividing the image into 4×4 blocks.

In our work, we take digital colored images as the cover and a text message as the host. The factors that play a vital role in digital image steganography are image distortion, embedding capacity and security of data [7], and care is taken in the proposed algorithm to optimize these factors. We have increased the security of data by using a multi-key LSB substitution method, and the embedding capacity by compartmentalizing the pixels into their components. Security and embedding capacity are further enhanced by incorporating the Huffman coding algorithm, and the results show that the embedding capacity increases by 45% to 55%, depending on the type of data chosen for hiding.

The remainder of the paper is organized as follows. In section 2, we briefly discuss the related work and the Huffman coding algorithm. Section 3 describes the proposed method with analysis. Section 4 includes case studies, experimental results and comparisons with previous works. Finally, the conclusion is presented in section 5.
2. RELATED WORK
The most popular and core concept of steganography is to hide the secret message in digital images by changing the least significant bits of the pixels. Xin Liao et al. [8] focused on finding the edge pixels of the cover image to hide the secret message, because these are the locations where changes are least visible. Rosziati Ibrahim et al. [9] proposed a SIS (Steganography Imaging System) in which they converted the secret message into binary codes and then embedded two bits in each pixel. Mahmud Hasan et al. [10] proposed a block processing mechanism in which they divided the image into a number of 4×4 non-overlapping blocks. The Most Frequent Pixels (MFPs) and Second Most Frequent Pixels (SMFPs) of each block were identified and, after deleting their occurrences, the remaining pixels were overwritten by the encoded bits of the secret message.

2.1 Huffman Coding
Huffman coding is one of the popular entropy encoding algorithms used for lossless data compression [11]. It is classified into Static Huffman coding and Adaptive Huffman coding, and the latter is more beneficial than the former in terms of the number of internal nodes, path length and height of the tree in memory representation [12]. To do the encoding, the characters of the file are first stored with their frequencies in a list and sorted in increasing order of frequency. To achieve optimality, Huffman coding joins the two symbols with the lowest probability/frequency and replaces them with a new fictive node whose probability is the sum of the probabilities of the two removed nodes. The two removed symbols are now two nodes in the binary tree. The fictive node is their parent; it is not inserted in the binary tree yet but kept in the input list. We repeat this process until the input list is exhausted, at which point the binary tree is complete and the frequency at its root equals the total number of symbols in the file. The code for each symbol is constructed by traversing the tree from the root to the leaf, assigning 0 for a left visit and 1 for a right visit. The encoded tree for sample data is shown in Fig. 1.
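The construction described in Section 2.1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it uses a priority queue (`heapq`) in place of the sorted input list, and the function names `huffman_codes`, `encode` and `decode`, as well as the sample message, are assumptions made for the example.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(freqs):
    """Return {symbol: bitstring} for a {symbol: frequency} mapping.

    Symbols are assumed to be single characters (str); internal tree
    nodes are represented as (left, right) tuples.
    """
    if len(freqs) == 1:
        # A lone symbol still needs one bit so the message has a length.
        return {next(iter(freqs)): "0"}
    tie = count()  # unique tie-breaker, so the heap never compares nodes
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Join the two lowest-frequency nodes under a new parent node.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")  # 0 for a left visit
            walk(node[1], prefix + "1")  # 1 for a right visit
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    # Prefix-free codes can be decoded greedily, one bit at a time.
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    message = "aaaabbcd"  # hypothetical sample message
    codes = huffman_codes(Counter(message))
    bits = encode(message, codes)
    print(codes, len(bits), "bits instead of", 8 * len(message))
    print(decode(bits, codes) == message)
```

For the sample message the frequencies are 4, 2, 1 and 1, so the code lengths come out as 1, 2, 3 and 3 bits and the eight characters occupy 14 bits rather than 64.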
Figure 1 Encoded tree for ETASNO

For example, here TEA will be encoded as 10 00 010 and SEA will be encoded as 011 00 010 [12]. These codes are sequences of bits, so the space occupied by one symbol, which in general takes 8 bits, is reduced. An ensemble X is a triple (x, A_x, P_x), where
x: value of a random variable
A_x: set of possible values for x, A_x = {a_1, a_2, ..., a_I}
P_x: probability for each value, P_x = {p_1, p_2, ..., p_I}
with P(x) = P(x = a_i) = p_i, p_i > 0, and

$$\sum_{a_i \in A_x} P(x = a_i) = 1$$

The Shannon information content of x is shown in equation (1)

$$h(x) = \log_2 \frac{1}{P(x)} \tag{1}$$

and the entropy of x is stated in equation (2)

$$H(x) = \sum_{x \in A_x} P(x) \log_2 \frac{1}{P(x)} \tag{2}$$

The expected length L(C, X) of a symbol code C for X, which must be as small as possible to achieve as much compression as possible, is stated in equation (3)

$$L(C, X) = \sum_{x \in A_x} P(x)\, l(x) = \sum_{i=1}^{I} p_i l_i \tag{3}$$

Fig. 2 shows the result of coding and compression on the ensemble X stated below:
A_x = {a, b, c, d}
P_x = {1/2, 1/4, 1/8, 1/8}

Figure 2 Code and code length of the symbols given probabilities

Figure 3 Contiguous interval representation of symbols

2.2 Huffman Decoding
Decoding can be performed by the binary search algorithm if the Huffman tree is stored in an array. This can be demonstrated with the sample data of Fig. 3, where the codes of 8 symbols are shown in increasing order [13]. Let the input code be 01010; then the fifth entry [(1+8)/2, rounded up, = 5], 01101, is tested. Since 01010 is smaller than 01101, the third entry [(1+4)/2, rounded up, = 3] is tested next. This process continues, halving the list in each iteration, until the given code is found or the list is reduced to a single entry.

3. PROPOSED METHOD
3.1 Analysis
In the proposed work, we use a multi-key LSB substitution method in which the pixels are compartmentalized into their RGB components to make the embedding more secure. Every pixel is capable of hiding one character, so the embedding capacity varies linearly with the number of pixels, that is

$$N(c) \propto N(p) \tag{4}$$

where N(c) is the number of characters that can be embedded and N(p) is the number of pixels of the image. We start by accumulating all the pixels of the cover image in an array P. In our proposed algorithm, we deploy four keys for embedding and extraction. KEY1 (k11, k12, k13) is a composite key used for selecting the bits of the pixel components, and KEY2 (k21, k22, k23) is used for deciding the pattern of the bits of the characters comprising the secret message. Two further keys, KEY3 and KEY4, are formulated based on the size and texture of the cover image to fix the region for embedding and to decide the gap between the pixels if needed. To increase the embedding capacity and to make the scheme more secure, we incorporated the Huffman compression technique. We assumed that the secret message to be embedded may vary from a few lines to thousands of lines. If the message is very long, the Huffman compression technique plays a vital role in increasing the embedding capacity and security. We used a variable-length code to make it more efficient. If T is the tree corresponding to a prefix code, then the number of bits required to encode a file is

$$B(T) = \sum_{c \in C} f(c)\, d_T(c) \tag{5}$$

where the sum runs over the set C of characters in the file, f(c) denotes the frequency of a character c, and d_T(c) denotes the depth of c's leaf in tree T, which is also the length of the codeword for character c [14]. After calculating the probabilities of the different symbols occurring in the secret message, Huffman codes are obtained by generating prefix-free codes and placing the symbols at the leaves of a code tree. This yields an instantaneously decodable code of minimal expected length L satisfying

$$H(S) \le L < H(S) + 1 \tag{6}$$

where H(S) is the entropy of a source S, or the expected information of a symbol [15], [16]. If the source S emits one of the symbols s_1, s_2, ..., s_n at each time with probabilities p_1, p_2, ..., p_n respectively, independent of the symbols emitted at other times, then in a very long string of K emissions we expect to get Kp_1, Kp_2, ..., Kp_n instances of the symbols s_1, s_2, ..., s_n respectively. These emitted symbols are independent and identically distributed. This is supported by the law of large numbers. If the expected or mean number of occurrences of symbol s_1 in K independent repetitions is Kp_1, where p_1 is the probability of getting s_1 in a single trial, then the standard deviation around this mean is $\sqrt{K p_1 (1 - p_1)}$. Therefore, the fractional one-standard-deviation spread around the mean is $\sqrt{(1 - p_1)/(K p_1)}$; that is, for large K, the number of occurrences of s_1 is relatively tightly concentrated around the mean value of Kp_1 [15].

In our proposed method, every pixel of the cover image can be used for embedding and, to keep the distortion low, embedding is done in all the components of the pixel. Since the data is compressed before embedding, a very long message can be embedded without causing noticeable distortion to the cover image, and the use of multiple keys makes it more secure.

3.2 Algorithm (Embedding Process)
1. Store all the different characters of the file and their frequencies in a two-dimensional array T1 in increasing order of frequency, and accumulate all the pixels of the image in a one-dimensional array P of size M×N, where M×N is the size of the image.
2. Construct the Huffman binary tree T by merging the elements from left to right, using the Huffman coding algorithm, until all the elements of the array T1 are exhausted.
3. Generate the Huffman codes for every character in the array T1 using the Huffman binary tree T and store them in another array T2 along with the code length.
4. Calculate the keys KEY3 and KEY4 by incorporating the size and texture of the cover image and the length of the secret message. KEY3 fixes the starting position in the cover image and KEY4 fixes the gap between two pixels.
5. Set the variable MaxBits to 9.
6. Read each code from the array T2 one by one, find its length Len and repeat steps 7 to 10 for it.
7. Repeat steps 8 and 9 until (Len ≤ MaxBits).
8. Select a pixel using the keys KEY3 and KEY4, bifurcate it into its components and embed the first nine bits of the code, 3 bits in each of the RGB components.
9. Reduce the length of the code by the MaxBits bits that have been embedded.
10. Embed the remaining bits of the code in the next pixel using the keys KEY1 and KEY2.
11. Rejoin all the pixels of the array P to form the stego-image.

3.3 Algorithm (Extraction Process)
1. Accumulate the pixels of the stego-image in the byte array T and extract the keys KEY1, KEY2, KEY3 and KEY4.
2. Read the length of the code from the input array T2.
3. Pick up pixels from the array T using KEY3 and KEY4.
4. Extract the bits from the pixels using the keys KEY1 and KEY2 to generate the code.
5. Decode the code to get the symbol using the decoding process discussed in 2.2 and append the symbol to the string S.
6. Repeat steps 2 to 5 until all the elements of the array T2 are exhausted.
7. The string S is the extracted message.
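The per-pixel embedding of Section 3.2 (nine bits per pixel, three in each of the R, G and B components) and the matching extraction of Section 3.3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper does not specify how KEY1 and KEY2 select and order bits, so the sketch simply uses the three lowest bits of each component in order; `start` and `gap` are hypothetical stand-ins for KEY3 and KEY4; and the bits are packed as one continuous stream rather than code by code.

```python
def embed_bits(pixels, bits, start=0, gap=1, per_channel=3):
    """Return a copy of `pixels` with `bits` written into the low bits.

    pixels: list of (R, G, B) tuples with 8-bit components.
    bits: string of '0'/'1' characters, e.g. concatenated Huffman codes.
    start, gap: assumed stand-ins for KEY3 (start position) and KEY4 (gap).
    """
    per_pixel = 3 * per_channel
    needed = -(-len(bits) // per_pixel)  # ceiling division
    positions = range(start, start + needed * gap, gap)
    if needed and positions[-1] >= len(pixels):
        raise ValueError("cover image too small for this message")
    padded = bits.ljust(needed * per_pixel, "0")  # pad the last pixel
    mask = (1 << per_channel) - 1
    out = list(pixels)
    for n, pos in enumerate(positions):
        chunk = padded[n * per_pixel:(n + 1) * per_pixel]
        channels = []
        for c, value in enumerate(out[pos]):
            payload = int(chunk[c * per_channel:(c + 1) * per_channel], 2)
            # Clear the low bits of the component, then write the payload.
            channels.append((value & ~mask) | payload)
        out[pos] = tuple(channels)
    return out


def extract_bits(pixels, n_bits, start=0, gap=1, per_channel=3):
    """Read back `n_bits` bits written by embed_bits with the same keys."""
    per_pixel = 3 * per_channel
    needed = -(-n_bits // per_pixel)
    mask = (1 << per_channel) - 1
    bits = []
    for pos in range(start, start + needed * gap, gap):
        for value in pixels[pos]:
            bits.append(format(value & mask, f"0{per_channel}b"))
    return "".join(bits)[:n_bits]


if __name__ == "__main__":
    cover = [(200, 100, 50)] * 10       # hypothetical 10-pixel cover
    secret = "0101100111010"            # hypothetical 13-bit payload
    stego = embed_bits(cover, secret, start=2, gap=3)
    print(stego[2])                     # (202, 102, 51)
    print(extract_bits(stego, len(secret), start=2, gap=3) == secret)
```

Because only the three lowest bits of a component are overwritten, each component changes by at most 7 out of 255, which is consistent with the small distortion the paper reports. The receiver needs the number of embedded bits and the same `start` and `gap` values to recover the stream.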
4. EXPERIMENTAL RESULTS
We selected and categorized the data into three types: micro (message length less than 100 characters), macro (message length of more than 1000 characters) and texture (message from a particular domain). We tested the method on 50 different digital colored images, of which 5 results are shown. Different types of images were chosen as the cover, and robustness was tested by selecting contiguous pixels of any region of the image to embed the message. Images img1c, img2c, img3c, img4c, img5c are the cover images and img1s, img2s, img3s, img4s, img5s are the stego-images. A micro message of length 92 is embedded in image img1c in Fig. 4(a), and the corresponding stego-image img1s is shown in Fig. 4(b). Macro messages of length 1537 and 1687 are embedded in img2c in Fig. 5(a) and img3c in Fig. 6(a) respectively, and the corresponding stego-images img2s and img3s are shown in Fig. 5(b) and Fig. 6(b). Similarly, data of the same texture of length 1503 and 1714 are embedded in img4c and img5c, whose stego-images are img4s and img5s; these are shown in Fig. 7(a), 7(b), 8(a), 8(b) respectively.

The compression ratio achieved by Huffman coding depends on the length and nature of the data, which is why we took different samples of the three types of data; the results are shown in Table 1. The maximum performance given by our algorithm is compared with the algorithm proposed by Mahmud Hasan et al. [10] and with our previously designed algorithm, and Table 2 shows that our results have improved. Mahmud Hasan et al. [10] settled for only 32 bits out of the 128 bits of a block after performing the calculations of their block processing approach, and embedded characters of 7 bits. On average, 4.57 characters can be embedded in 32 bits, so considering the complete 4×4 block, 3.50 pixels are required to embed one character. In our previous algorithm, we stored the 8 bits of a character per pixel by compartmentalizing the pixel into its components, and in no case did a failure occur. In the proposed method, a maximum of 9 bits can be stored per pixel, but we observed that very few symbols have a code length greater than 9. In general, the code length varies from 4 to 7 for macro and texture type messages, and from 1 to 4 for micro messages.

Table 1: Results of Compression on Different Types of Data
Message Type   Message Length (in chars)   Number of Bits per Symbol   Compression Ratio (%)   No. of Bits Embedded
Micro          12                          1.67                        82.46                   20
Micro          92                          1.83                        77.17                   168
Macro          1537                        4.22                        47.16                   6497
Macro          1687                        4.24                        47.00                   7152
Texture        1503                        3.97                        50.43                   5960
Texture        1714                        3.80                        52.52                   6510
For normal messages, the embedding capacity is almost doubled, as can be seen in Table 1, where the compression ratio varies from 47% to 52%. The results are even more significant for small messages, where the compression ratio varies from 75% to 85%. These compression ratios depend entirely on the frequencies of the symbols occurring in the message. Compression through Huffman coding not only increases the embedding capacity; it also increases security, since the symbols are embedded in the form of codes. The best-case comparative results for macro and texture type messages are shown in Table 2.
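The figures in Table 1 can be cross-checked from the message length and the number of bits embedded. The sketch below assumes that an uncompressed character occupies 8 bits, as the paper states for its previous algorithm, and that the compression ratio is the percentage of bits saved; the function names are illustrative. Under these assumptions the five rows from 92 to 1714 characters are reproduced to within rounding, while the 12-character row gives 79.17% rather than the reported 82.46%, so that row may rest on a different baseline.

```python
def bits_per_symbol(n_chars, n_bits):
    """Average Huffman code length actually embedded per character."""
    return n_bits / n_chars


def compression_ratio(n_chars, n_bits, bits_per_char=8):
    """Percentage of bits saved relative to fixed-length characters."""
    return 100.0 * (1 - n_bits / (n_chars * bits_per_char))


# (message length in chars, bits embedded, ratio reported in Table 1)
rows = [
    (92, 168, 77.17),
    (1537, 6497, 47.16),
    (1687, 7152, 47.00),
    (1503, 5960, 50.43),
    (1714, 6510, 52.52),
]

if __name__ == "__main__":
    for n_chars, n_bits, reported in rows:
        print(
            n_chars,
            round(bits_per_symbol(n_chars, n_bits), 2),
            round(compression_ratio(n_chars, n_bits), 2),
            reported,
        )
```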
Table 2: Comparative Performance Measurements

Result Images
Figure 4(a) img1c (Cover image)
Figure 4(b) img1s (Stego image)
Figure 5(a) img2c (Cover image)
Figure 5(b) img2s (Stego image)

5. CONCLUSION
In this paper, we improved on the algorithm proposed by Mahmud Hasan et al. [10] and on our previous algorithm, and devised an enhanced algorithm for embedding the message. We embedded 8 bits of a character per pixel in the previous algorithm, but in the proposed algorithm the 8 bits are reduced to almost 2 bits for micro messages and to 4 to 5 bits for macro and texture type messages. The bits per symbol in Table 1 are averages; in an actual implementation each code has a whole number of bits. The number of characters of the secret message that can be embedded may be 45% to 55% more than the number of pixels in the image. If the cover image is of dimension 450×338 pixels, then it can hide about 228150 characters if 50% compression is assumed on average. Incorporating Huffman coding for compression in the multi-key embedding algorithm increases security as well as embedding capacity, which is clearly visible from the analysis of Table 1 and Table 2.
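The capacity figures quoted in the paper can be retraced with short arithmetic. The derivation below is a sketch that reads "50% compression" as 50% more characters than pixels, in line with the stated range of 45% to 55%; it also retraces the per-block comparison with Mahmud Hasan et al. [10] from Section 4.

```latex
% Capacity of a 450 x 338 cover image at one character per pixel
N(p) = 450 \times 338 = 152100
% With 50% more characters than pixels
N(c) \approx 1.5 \times N(p) = 1.5 \times 152100 = 228150

% Block method of [10]: 32 usable bits per 4 x 4 block, 7-bit characters
\frac{32}{7} \approx 4.57 \ \text{characters per block}
\frac{16 \ \text{pixels}}{32/7 \ \text{characters}} = 3.50 \ \text{pixels per character}
```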
Figure 7(a) img4c (Cover image)
Graph Showing the Performance Measurement
References
[1] Min-Wen Chao, Chao-hung Lin, Cheng-Wei Yu, and Tong-Yee Lee, A High Capacity 3D Steganography Algorithm, IEEE Transactions on Visualization and Computer Graphics, Vol 20 No. 3, pp. 1-11, June, 2009.
[2] D. Neeta, K. Snehal, D. Jacobs, Implementation of LSB Steganography and Its Evaluation for Various Bits, IEEE International Conference on Digital Information Management, pp. 173-178, June 2007.
[3] V. Vijayalakshmi, G. Zayaraz, V. Nagaraj, A Modulo Based LSB Steganography Method, IEEE Conference on Control, Automation, Communication and Energy Conservation, pp. 1-4, August 2009.
[4] El Safy, R.O., Zayed, H.H., El Dessouki, A., An Adaptive Steganography Technique Based on Integer Wavelet Transform, ICNM International Conference on Networking and Media Convergence, pp. 111-117, 2009.
[5] Chin-Chen Chang, Chih-Yang Lin, Yi-Hsuan Fan, Lossless data hiding for color images based on block truncation coding, The Journal Pattern Recognition Society 41 (2008), Science Direct Elsevier, pp. 2347-2375, 16 December, 2007.
[6] Elham Ghasemi, Jamshid Shanbehzadeh, Nima Fassihi, High Capacity Image Steganography using Wavelet Transform and Genetic Algorithm, Proceedings of the International MultiConference of Engineers and Computer Scientists 2011 vol I (IMECS), ISBN: 978-988-18210-3-4, Hong Kong, 16-18 March, 2011.
[7] Sanjay Bajpai, Kanak Saxena, Techniques of Steganography for Securing Information: A Survey, International Journal on Emerging Technologies 3(1), ISSN No. (Print): 0975-8364, pp. 48-54, 15 April, 2012.
[8] Xin Liao, Qiao-yan Wen, Jie Zhang, A steganographic method for digital images with four-pixel differencing and modified LSB substitution, Journal of Vis. Commun. Image R 22 (2011), Elsevier, pp. 1-8, 22 August, 2010.
[9] Rosziati Ibrahim and Teoh Suk Kuan, Steganography algorithm to hide secret message inside an image, Journal of Computer Technology and Application 2 (2011), pp. 102-108, 25 February, 2011.
[10] Mahmud Hasan, Kamruddin Md. Nur, Tanzeem Bin Noor, A Novel Compressed Domain Technique of Reversible Steganography, International Journal of Advanced Research in Computer Science and Software Engineering, ISSN: 2277 128X, pp. 1-6, 03 March, 2012.
[11] Asadollah Shahbahrami, Ramin Bahrampour, Mobin Sabbaghi Rostami, Mostafa Ayoubi Mobarhan, Evaluation of Huffman and Arithmetic Algorithms for Multimedia Compression Standards, http://arxiv.org/ftp/arxiv/papers/1109/1109.0216.pdf
[12] Pushpa R. Suri and Madhu Goel, Ternary Tree and Memory-Efficient Huffman Decoding Algorithm, IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 1, ISSN (Online): 1694-0814, pp. 483-489, January 2011.
[13] Pi-Chung Wang, Yuan-Rung Yang, Chun-Liang Lee, Hung-Yi Chang, A Memory-efficient Huffman Decoding Algorithm, Proceedings of the 19th International Conference on Advanced Information Networking and Applications (AINA'05), 1550-44, IEEE, 2005.
[14] http://www.columbia.edu/~cs2035/courses/csor4231.F11/huff.pdf
[15] http://web.mit.edu/6.02/www/currentsemester/handouts/L02_slides.pdf
[16] Luenberger, a book titled Information Science, 2006.