
Detection of Fake Images

using Metadata Analysis and


Error Level Analysis

A Report Submitted
in Partial Fulfillment of the Requirements
for the Degree of
Bachelor of Technology
in
Information Technology

by
Utkarsh Mani Tripathi, Vrinda Agarwal, Ankit Garg,
Vivek Kumar Jha and Kushagra Shukla

to the
COMPUTER SCIENCE AND ENGINEERING DEPARTMENT
MOTILAL NEHRU NATIONAL INSTITUTE OF TECHNOLOGY
ALLAHABAD
November, 2018
UNDERTAKING

I declare that the work presented in this


report titled “Detection of Fake Images
using Metadata Analysis and
Error Level Analysis”, submitted to the Computer Science
and Engineering Department (CSED), Motilal Nehru
National Institute of Technology, Allahabad, for the
award of the Bachelor of Technology degree in
Information Technology, is my original work. I have
not plagiarized or submitted the same work for the award of
any other degree. In case this undertaking is found incorrect, I
accept that my degree may be unconditionally withdrawn.

November, 2018
Allahabad
(Utkarsh Mani Tripathi,
Vrinda Agarwal,
Ankit Garg,
Vivek Kumar Jha and
Kushagra Shukla)

ii
CERTIFICATE

Certified that the work contained in the


report titled “Detection of Fake Images
using Metadata Analysis and
Error Level Analysis”, by Utkarsh Mani Tripathi, Vrinda
Agarwal, Ankit Garg, Vivek Kumar Jha and Kushagra Shukla
has been carried out under my supervision and that this work
has not been submitted elsewhere for a degree.

(Dr. Rajitha B)
Computer Science and Engineering Dept.
M.N.N.I.T. Allahabad

November, 2018

iii
Preface

The main objective of this project is the detection of digitally altered (fake) images
using Metadata Analysis and Error Level Analysis. The problem with existing fake
image detection systems is that they can detect only specific tampering methods,
such as image splicing, coloring, etc. We approached the problem using a combined
approach of Metadata Analysis, followed by Error Level Analysis fed into a machine
learning model, wherein we trained a neural network to detect almost all kinds of
tampering on images.

iv
Acknowledgements

We would like to take this opportunity to express our deep sense of gratitude to
all who helped us directly or indirectly for our project work. Firstly, we would
like to thank our supervisor, Dr. Rajitha B, for being such a supportive mentor.
Her advice, encouragement and constructive criticism were the primary sources of
inspiration and innovation for us, and were the real reason behind the successful completion of
this project. In addition, she was always accessible and willing to help us at each
step of our project development. It has truly been a privilege working under her
during this semester.
Also, we wish to express our sincere gratitude to Prof. Rajeev Tripathi,
Director, MNNIT Allahabad and Prof. A.K Singh, Head of the Department,
Computer Science and Engineering Department (CSED), for providing us with all
the needed resources and facilities during the process of completion of this project
work. We would also like to thank our friends for their constant motivation, advice
and kind support. Finally, we express our indebtedness to all who have directly or
indirectly contributed to the successful completion of our report.

v
Contents

Preface iv

Acknowledgements v

1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Related Work 4

3 Proposed Work 7
3.1 Metadata Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Error Level Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.3 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.4 Model Learning - User Feedback . . . . . . . . . . . . . . . . . . . . . 10

4 Software and System Requirements 11


4.1 Software Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.2 System Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

5 Experimental Setup and Results Analysis 13

6 Conclusion and Future Work 20

References 21

vi
Chapter 1

Introduction

The rapid proliferation of image editing technologies has increased both the ease with
which images can be manipulated and the difficulty in distinguishing
between altered and natural images. Although image editing techniques can provide
significant aesthetic or entertainment value, they may also be used with
malicious intent. Image manipulation includes transforming or altering an image
using several methods and image editing tools to achieve a desired result. There are
many sophisticated tools available to alter an image without leaving any traceable
proof. Most of the time it is hard to discern the forgery with the naked eye.
In the present time, an image is a valuable source of information and plays a vital
role in courtroom evidence, insurance claims, scientific scams, the medical field, etc.
Hence, to ensure its authenticity, researchers have developed several methods to
authenticate digital images [1]. There are two classes of methods to authenticate an
image: intrusive (active) and non-intrusive (passive). In intrusive methods, prior
information such as a watermark or a digital signature is embedded into the image.
At the authentication stage, this embedded information is extracted and checked; if
the image has been tampered with, the extracted watermark or digital signature no
longer matches, and in this way the authenticity of the image can be verified. In
non-intrusive methods, no such information is embedded into the image, and we have
to rely on its intrinsic properties. These are also called blind techniques for detecting
image forgery [2].

1
Non-intrusive methods are of two kinds:
1. Copy-move Forgery Detection and
2. Splicing Detection.
In copy-move forgery, one region is copied and pasted over another region in the
same image to conceal some important information, whereas in image splicing, a
region is cut from one image and pasted into a different image to create a new
composite. Based on this classification, the basic principle of forgery detection is to
find regions with similar features in a copy-move forgery, or regions with completely
different characteristics in a spliced image.
All forgery detection techniques follow the same pipeline: feature extraction,
matching and post-processing. Over the last decades, image feature detectors have
become popular tools in the computer vision field and have been applied in areas
such as image representation, object recognition and matching, image classification
and retrieval, and 3D reconstruction. In general, the various image editing approaches
employ different mechanisms: splicing and copy-move techniques usually manipulate
part of the image and perform object-level changes.

1.1 Motivation
In this technological era, a huge number of people have become victims of image
forgery. Many people use technology to manipulate images and present them as
evidence to mislead the court. To put an end to this, all images that are shared
through social media should be accurately categorized as real or fake. Social media
is a great platform to socialize, share and spread knowledge, but if caution is not
exercised, it can mislead people and even cause havoc through unintentional false
propaganda. While the manipulation in most photoshopped images is clearly evident
due to pixelization and shoddy jobs by novices, some of them do appear genuine.
Especially in the political arena, manipulated images can make or break a politician's
credibility. Current forensic techniques require an expert to analyze the credibility
of an image. We implemented a system that can determine whether an image is
fake or not.

2
1.2 Methodology
The objective of this project is to identify digitally altered images. The problem with
existing fake image detection systems is that they can detect only specific tampering
methods such as splicing, coloring, etc. We approached the problem using machine
learning and a neural network to detect almost all kinds of tampering on images.
Using the latest image editing software, it is possible to make alterations to an image
that are too subtle for the human eye to detect. Even with a complex neural network,
it is not possible to determine whether an image is fake without identifying a common
factor across almost all fake images. So, instead of feeding raw pixels to the neural
network, we feed it the error level analyzed image.

This project provides two level analysis for the image.

• At the first level, the system checks the image metadata. Image metadata is
not very reliable on its own, since it can be altered using simple programs, but
most of the images we come across have unaltered metadata, which helps to
identify editing. For example, if an image is edited with Adobe Photoshop, the
metadata will even contain the version of Adobe Photoshop used.

• At the second level, the image is converted into its error level analyzed form
and resized to 100 px x 100 px. These 10,000 pixels with their RGB values
(30,000 inputs) are fed into the input layer of a multilayer perceptron network
(a sketch of this serialization step is given after this list). The output layer
contains two neurons, one for a fake image and one for a real image. Depending
on the values of these output neurons, together with the metadata analyzer
output, we determine whether the image is fake and how likely it is that the
given image has been tampered with.
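As an illustration of this second level, the following minimal Java sketch flattens a
100 x 100 error level analyzed image into the 30,000-value input vector described
above. The class and method names are our own and not taken from the project
source; the values are scaled to the 0..1 range commonly used with sigmoid networks.

import java.awt.image.BufferedImage;

// Illustrative sketch: flatten a 100 x 100 ELA image into the 30,000-value
// input vector described above. Class and method names are hypothetical.
public class ElaSerializer {

    public static double[] toInputVector(BufferedImage ela) {
        int w = ela.getWidth();   // expected to be 100 after resizing
        int h = ela.getHeight();  // expected to be 100 after resizing
        double[] input = new double[w * h * 3];
        int i = 0;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = ela.getRGB(x, y);
                input[i++] = ((rgb >> 16) & 0xFF) / 255.0; // red component
                input[i++] = ((rgb >> 8) & 0xFF) / 255.0;  // green component
                input[i++] = (rgb & 0xFF) / 255.0;         // blue component
            }
        }
        return input; // 100 * 100 * 3 = 30,000 values
    }
}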

3
Chapter 2

Related Work

Pun et al. [3] proposed a method based on noise discrepancies between the original
image and the spliced image. Initially, the noise level function is calculated and
analyzed at the pixel level on various scales. Regions that do not conform to the
estimated noise level are marked as suspicious, and an inconsistent noise level
indicates the presence of tampering in spliced segments. This technique performs
better for detecting multiple spliced objects.

F. Hakimi [4] proposed a method based on an improved local binary pattern (LBP)
and the discrete cosine transform (DCT). The chrominance component of the image
is divided into non-overlapping blocks. The improved LBP is calculated for every
block and transformed into the frequency domain using the 2D-DCT. The standard
deviation of the frequency coefficients of each block is then used as a feature for
classification with a k-nearest neighbour classifier.

Shi et al. [5] proposed a natural image model which extracts statistical moments of
characteristic functions by treating the differences of neighbouring BDCT coefficients
of an image as 1-D signals, and models the dependencies between neighbouring
coefficients along certain directions as a Markov process. An SVM classifier uses
these as discriminative features for classification.

4
Wang et al. [6] proposed a method in which the gray level co-occurrence matrix
(GLCM) of edge images is computed along certain directions (horizontal, vertical,
main and minor diagonal), and the resulting features serve as discriminative features
for classification.

Xuefang Li et al. [7] proposed a method in which the Hilbert-Huang transform and
moments of the characteristic function of the wavelet transform are used for forgery
detection. An SVM is used as the classifier for spliced image classification in their
method, achieving an accuracy of 85.86%.

Zhao et al. [8] proposed a method in which chroma channel and gray level run-length
features are used. To extract the features, gray level run-length statistics are computed
along four different directions from the de-correlated chroma channel. An SVM is
then used for classification based on the extracted features.

Wu and Fang [9] proposed a method to detect image splicing using illuminant
color inconsistency. Initially, the image is divided into overlapping blocks and a
classifier selects the estimated illuminant for further processing. The difference
between the estimated and reference illuminant colors is then calculated; if the
difference is greater than a defined threshold, the block is considered a spliced block.

He et al. [10] proposed a method in which a Markov model is introduced in both the
DCT and DWT domains. The difference coefficient arrays and transition probability
matrices are modeled as a feature vector, and the cross-domain Markov features are
used as discriminative features for an SVM classifier. However, the scheme requires
up to 7290 features.

B. Su et al. [11] proposed a method that uses an enhanced Markov state selection
scheme. In this method, the authors consider previously proposed function
5
models and map the large number of coefficients extracted from the transform domain
to specific states. This method sacrifices some detection performance when the
number of features is reduced.

Overall, no technique developed to date can detect fake images with absolute
accuracy; hence, researchers are now focusing on combining multiple techniques to
tackle the problem. In this report, the problem has been approached in a similar
manner, wherein a combined approach of Metadata Analysis and Error Level Analysis
has been taken to detect whether an image under test is likely to be fake.

6
Chapter 3

Proposed Work

In this report, a combined approach of Metadata Analysis and Error Level Analysis
has been taken to detect whether an image under test is likely to be fake.
The fake image detection system discussed here performs two different levels of
analysis on the image. At the first level, it checks the image metadata. Image
metadata is not very reliable on its own, since it can be altered using simple programs,
but most of the images we come across have unaltered metadata, which helps to
identify editing. For example, if an image is edited with Adobe Photoshop, the
metadata will even contain the version of Adobe Photoshop used.

3.1 Metadata Analysis


The entire system is developed using the Java programming language. For extracting
metadata from images, the metadata-extractor library is used. Once an image is
selected for processing, it is passed through two separate stages, the first of which is
metadata analysis. After extraction, the metadata text is fed into the metadata
analysis module. The metadata analyzer is essentially a tag searching algorithm: if
keywords like “Photoshop”, “Gimp”, “Adobe”, “Corel”, etc. are found in the text,
the possibility that the image has been tampered with is increased. Two separate
variables, fakeness and realness, are maintained; each represents the weight of
evidence that the image is fake or real. As each tag is processed, the corresponding
variable is incremented by a certain predefined weight.
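The following minimal Java sketch illustrates this tag-searching idea using the
metadata-extractor library. The keyword list and the fixed weight added per matching
tag are illustrative assumptions, not the project's actual values.

import com.drew.imaging.ImageMetadataReader;
import com.drew.metadata.Directory;
import com.drew.metadata.Metadata;
import com.drew.metadata.Tag;

import java.io.File;

// Illustrative sketch of the metadata analyzer: scan every metadata tag for
// editing-software keywords and accumulate a "fakeness" weight. The keyword
// list and the weight of 1.0 per hit are assumptions, not the project's values.
public class MetadataAnalyzer {

    private static final String[] EDITOR_KEYWORDS = {"photoshop", "gimp", "adobe", "corel"};

    public static double fakenessScore(File imageFile) throws Exception {
        Metadata metadata = ImageMetadataReader.readMetadata(imageFile);
        double fakeness = 0.0;
        for (Directory directory : metadata.getDirectories()) {
            for (Tag tag : directory.getTags()) {
                String description = tag.getDescription();
                if (description == null) {
                    continue;
                }
                String text = description.toLowerCase();
                for (String keyword : EDITOR_KEYWORDS) {
                    if (text.contains(keyword)) {
                        fakeness += 1.0; // predefined weight per matching tag
                    }
                }
            }
        }
        return fakeness;
    }
}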

7
The following flowcharts for the proposed system explain the overall algorithm used
to tackle the problem at hand: detection of fake images.

Figure 1: Flow Chart : Fake Image Detection Using Machine Learning

Figure 2: Flow Chart : Neural Network Training Procedure

8
3.2 Error Level Analysis
Error Level Analysis (ELA) permits identifying areas within an image that are at
different compression levels. With JPEG images, the entire picture should be at
roughly the same level; if a section of the image is at a significantly different error
level, this likely indicates a digital modification. JPEG is a lossy format, but the
amount of error introduced by each re-save is not linear. Any modification to the
picture will alter the image such that stable areas (areas with no additional error)
become unstable.
For example, if a modified picture is based on a first re-save at 75% quality, an ELA
performed at 95% identifies the changes, since the edited areas are no longer at their
minimal error level. Additional areas of the picture may show slightly more volatility
because an editor such as Photoshop merges information from multiple layers,
effectively modifying many of the pixels.
In our system, error level analysis is done with the help of the ImageJ library, which
provides an option to save an image in JPEG format at a chosen compression quality.
The system first saves the image at 100% quality; the same image is then re-saved
at 90% quality using ImageJ. The difference between these two images is computed
using a pixel-wise difference operation, and the resulting image is the required ELA
image of the input. This image is stored as a buffered image and sent to the neural
network for further processing.
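The following minimal Java sketch illustrates the same re-save-and-difference
computation, using the standard ImageIO API rather than ImageJ (which the project
itself uses). Class names and the way the difference image is built are illustrative
assumptions, and the sketch assumes an RGB (JPEG) input image.

import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.MemoryCacheImageOutputStream;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;

// Illustrative ELA sketch: re-save the image as JPEG at a given quality and
// subtract it channel-wise from the original. Tampered regions tend to stand
// out in the resulting difference image. Assumes an RGB (JPEG) source image.
public class ErrorLevelAnalysis {

    public static BufferedImage ela(File imageFile, float quality) throws Exception {
        BufferedImage original = ImageIO.read(imageFile);
        BufferedImage resaved = resaveAsJpeg(original, quality);

        BufferedImage diff = new BufferedImage(
                original.getWidth(), original.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < original.getHeight(); y++) {
            for (int x = 0; x < original.getWidth(); x++) {
                int a = original.getRGB(x, y);
                int b = resaved.getRGB(x, y);
                int r = Math.abs(((a >> 16) & 0xFF) - ((b >> 16) & 0xFF));
                int g = Math.abs(((a >> 8) & 0xFF) - ((b >> 8) & 0xFF));
                int bl = Math.abs((a & 0xFF) - (b & 0xFF));
                diff.setRGB(x, y, (r << 16) | (g << 8) | bl);
            }
        }
        return diff;
    }

    private static BufferedImage resaveAsJpeg(BufferedImage img, float quality) throws Exception {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality); // e.g. 0.9f for 90% quality
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writer.setOutput(new MemoryCacheImageOutputStream(out));
        writer.write(null, new IIOImage(img, null, null), param);
        writer.dispose();
        return ImageIO.read(new ByteArrayInputStream(out.toByteArray()));
    }
}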

3.3 Machine Learning


Machine learning is implemented using the Neuroph library for Java. Neuroph was
selected because of the simplicity and ease with which neural networks can be
implemented. We have implemented a multilayer perceptron network with the
momentum backpropagation learning rule.

• A multilayer perceptron neural network is used, having one input layer, three
hidden layers and one output layer.

• Once the image is selected for evaluation, it is converted to its ELA representation
in the compression and error level analysis stage. The 100% and 90% quality
images are used

9
for the construction of the ELA image.

• Once the ELA image is calculated, it is preprocessed and resized to 100 x 100 px
(width and height).

• After preprocessing, the image is serialized into an array. The array contains
30,000 integer values representing 10,000 pixels: since each pixel has red, green
and blue components, 10,000 pixels yield 30,000 values.

• During training, the array is given as input to the multilayer perceptron
network and the output neurons are also set. The MLP is a fully connected
neural network with two output neurons: the first represents a fake image and
the second a real image. If the given image is fake, the fake neuron is set to
one and the real neuron to zero; otherwise the fake neuron is set to zero and
the real neuron to one. We have used the momentum backpropagation learning
rule to adjust the neuron connection weights; it is a supervised learning rule
that tries to minimize the error function (a sketch of this setup using the
Neuroph API is given after this list).

• During testing, the image array is fed into the input neurons and the values of
the output neurons are read out. We have used the sigmoid activation function.
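The following minimal sketch shows how such a network could be set up and trained
with the Neuroph API (assuming a recent Neuroph release). The hidden-layer sizes,
learning rate, momentum, stopping error and file name are illustrative assumptions,
since the report does not specify them.

import org.neuroph.core.data.DataSet;
import org.neuroph.core.data.DataSetRow;
import org.neuroph.nnet.MultiLayerPerceptron;
import org.neuroph.nnet.learning.MomentumBackpropagation;
import org.neuroph.util.TransferFunctionType;

// Illustrative Neuroph sketch of the network described above: 30,000 inputs,
// three hidden layers and 2 outputs (fake, real), trained with momentum
// backpropagation and sigmoid activations. Hidden-layer sizes and learning
// parameters are assumed values; the report does not state them.
public class FakeImageTrainer {

    public static void main(String[] args) {
        MultiLayerPerceptron mlp = new MultiLayerPerceptron(
                TransferFunctionType.SIGMOID, 30000, 100, 50, 25, 2);

        MomentumBackpropagation rule = new MomentumBackpropagation();
        rule.setLearningRate(0.2); // assumed value
        rule.setMomentum(0.7);     // assumed value
        rule.setMaxError(0.01);    // assumed stopping criterion
        mlp.setLearningRule(rule);

        DataSet trainingSet = new DataSet(30000, 2);
        // For each training image: the 30,000-value ELA vector plus the label
        // {1, 0} for a fake image or {0, 1} for a real image (see above).
        double[] elaVector = new double[30000]; // placeholder input vector
        trainingSet.add(new DataSetRow(elaVector, new double[]{1, 0}));

        mlp.learn(trainingSet);
        mlp.save("fake_detector.nnet"); // persisted for later evaluation
    }
}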

3.4 Model Learning - User Feedback


Based on the combined results of Metadata Analysis and Error Level Analysis, the
system classifies the tested images as real or fake. Once this is done, the user is
given the option to indicate whether the detection was correct, which helps to train
the neural network further.
Based on this user feedback, the neural network runs its training iterations again
and re-learns on the given images. On exit, the user is asked whether the newly
trained weights should be saved to the network's .nnet file for future use.
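A minimal sketch of this feedback step with the Neuroph API is shown below; the
file name, the label encoding and the way the corrected example is re-learned are
illustrative assumptions.

import org.neuroph.core.NeuralNetwork;
import org.neuroph.core.data.DataSet;
import org.neuroph.core.data.DataSetRow;

// Illustrative sketch of the feedback step: load the saved network, add the
// user-corrected example to a small data set, re-learn, and save the updated
// weights back to the .nnet file. File name and label values are assumptions.
public class FeedbackLearner {

    public static void relearn(double[] elaVector, boolean actuallyFake) {
        NeuralNetwork<?> net = NeuralNetwork.createFromFile("fake_detector.nnet");

        DataSet feedback = new DataSet(30000, 2);
        double[] label = actuallyFake ? new double[]{1, 0} : new double[]{0, 1};
        feedback.add(new DataSetRow(elaVector, label));

        net.learn(feedback);            // run additional training iterations
        net.save("fake_detector.nnet"); // persist the updated weights on exit
    }
}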

10
Chapter 4

Software and System


Requirements

4.1 Software Requirements


Java was used as the primary platform to develop this Fake Image Detector
application, because Java offers ready-made libraries for neural network development
and the metadata-extractor jar, along with its built-in abstract classes.
Another advantage of using Java over Python or MATLAB is that Java offers roughly
8-9 times faster image processing solutions because of the use of Java bytecode,
making the application both OS and platform independent.
The following tools were used in the development of this application.

Neuroph Studio
Neuroph Studio is an open source Java neural network development environment that
helps to easily build and use neural networks. It also provides a direct interface for
loading images.

Metadata-extractor
Metadata-extractor is an open source Java library used to extract metadata from
images.

11
JavaFX
JavaFX is used to implement a modern user interface for the application.

4.2 System Requirements


To run the developed Java application, all that is needed is a system with 256 MB
of RAM or more, 64 MB of free storage or more, and a Windows or Linux-based
operating system with the above-mentioned tools installed.

12
Chapter 5

Experimental Setup and Results


Analysis

For this project, the CASIA v1.0 [12] image set has been used to train the neural
network to distinguish between real and fake images.
For training, a total of 800 real (unaltered) images and 921 fake images from the
CASIA v1.0 set were used. (The fake images were a mixed set of copy-move forged
images, spliced images, and 20 fake images with altered metadata.)

Upon training the neural network with the CASIA v1.0 images, we tested the
system manually.

In the case of the 57 real (untampered) images taken from the Internet, we found
that all 57 images were correctly identified as real by both the Metadata Analysis
and Error Level Analysis algorithms.

In the case of the 50 tampered (fake) images with untampered metadata, all 50
images were correctly identified as fake by the Metadata Analysis. However, Error
Level Analysis failed to identify 8 of these fake images as fake.

Upon testing the system with fake images with altered metadata, none of the

13
10 test images could be correctly identified as fake by the Metadata Analysis, because
their metadata had been externally altered. However, Error Level Analysis was able
to identify 7 out of these 10 fake images.

The following is the accuracy analysis of our application, based on the above tests.

Table 1: Accuracy Analysis of the proposed Fake Image Detector system

As is clearly evident from the above accuracy analysis:

1. Metadata Analysis is unreliable, because the metadata of an image can be
externally altered.

14
2. The accuracy of Error Level Analysis is satisfactorily high on both fake and
real images, and also on metadata-altered fake images.

3. A combination of Metadata Analysis and Error Level Analysis can be a powerful
and efficient tool for determining, almost perfectly, whether a given image is fake
or not.

The figures below show how Error Level Analysis was able to locate the tampered
region in a fake image we tested.

Figure 3: A Fake Image, being tested against Error Level Analysis

15

Figure 4: Metadata Analysis running on an image under test

16

Figure 5: Real Image - as found by Error Level Analysis

17

Figure 6: Fake Image - as found in Metadata Analysis

18

Figure 7: Fake Image - as found in Error Level Analysis

19
Chapter 6

Conclusion and Future Work

The trained neural network was able to recognize an image as fake or real with a
maximum success rate of 84.67% using Error Level Analysis. We also achieved better
results in conclusively determining fakeness by using Metadata Analysis as an added
component.

The use of this application on mobile platforms would greatly reduce the spread of
fake images through social media. This project can also be used as a forgery detection
technique in digital authentication, court evidence evaluation, etc.

By combining the results of Metadata Analysis (66.67%) and the neural network's
ELA output (84.67%), a reliable fake image detection program has been developed
and tested.

For future work, the system can be trained further to classify fake images into two
categories:
1. copy-move forged images, and
2. images spliced together from different source images.

20
References

[1] Muhammed Afsal Villan, Kuncheria Kuruvilla, Johns Paul, Prof. Eldo P Elias,
“Fake Image Detection Using Machine Learning”, IRACST - International
Journal of Computer Science and Information Technology Security (IJCSITS),
ISSN: 2249-9555, Vol. 7, No. 2, March-April 2017.

[2] Prakash, C.S., Kumar, A., Maheshkar, S. and Maheshkar, V., 2018. An inte-
grated method of copy-move and splicing for image forgery detection. Multi-
media Tools and Applications, pp.1-25.

[3] Pun, C.M., Liu, B. and Yuan, X.C., 2016. Multi-scale noise estimation for
image splicing forgery detection. Journal of visual communication and image
representation, 38, pp.195-206.

[4] Hakimi, F. and Zanjan, I.M.H., 2015. Image-splicing forgery detection based
on improved LBP and k-nearest neighbors algorithm. Electronics Information
Planning, 3(0304-9876), p.7.

[5] Shi Y, Chen C, Chen W (2007) A natural image model approach to splicing
detection. In: Proceedings of the 9th Workshop on Multimedia Security, pp.
51-62. ACM

[6] Wang W, Dong J, Tan T (2009) Effective image splicing detection based on
image chroma. In: Image Processing (ICIP), 2009 16th IEEE International
Conference on, pp. 1257-1260. IEEE

[7] Li X, Jing T, Li X (2010) Image splicing detection based on moment features
and Hilbert-Huang transform. In: 2010 IEEE International Conference on
Information Theory and Information Security (ICITIS), pp. 1127-1130. IEEE

21
[8] Zhao, X., Li, J., Li, S. and Wang, S., 2010, October. Detecting digital image
splicing in chroma spaces. In International Workshop on Digital Watermarking
(pp. 12-22). Springer, Berlin, Heidelberg.

[9] Wu X, Fang Z (2011) Image splicing detection using illuminant color
inconsistency. In: 2011 3rd International Conference on Multimedia Information
Networking and Security (MINES), pp. 600-603. IEEE

[10] He, Z., Lu, W., Sun, W. and Huang, J., 2012. Digital image splicing detection
based on Markov features in DCT and DWT domain. Pattern Recognition,
45(12), pp.4292-4299.

[11] Su, B., Yuan, Q., Wang, S., Zhao, C. and Li, S., 2014. Enhanced state selection
Markov model for image splicing detection. EURASIP Journal on wireless
communications and networking, 2014(1), p.7.

[12] CASIA v1.0 Dataset: http://forensics.idealtest.org/casiav1/

22
