
Intro The issues in general Motivation Solution Experiments Tools eof()

Machine Learning-based Malicious Adversaries Detection in an Enterprise Environment by Using Open Source Tools
Muhammad Najmi Ahmad Zabidi
International Islamic University Malaysia

MOSC 2012 Berjaya Times Square, Kuala Lumpur

9th July 2012

Muhammad Najmi Ahmad Zabidi

MOSC 2012

1/34


About
- I am a research graduate student at Universiti Teknologi Malaysia, Skudai, Johor Bahru, Malaysia
- My current employer is International Islamic University Malaysia, Kuala Lumpur
- Research area: malware detection, narrowing down to Windows executables
- For the past few years (since 2003), I have been a Subversion (SVN) committer for the KDE localization project for the Malay language (but I rarely commit now... need a new intern to replace me :) )


Computing world as we knew it

- Interconnected machines
- Previously less connected, now "socialized" machines
- This has brought real problems to the cyberworld


Risks

- Financial loss
- Company/government-level espionage
- Privacy breaches


Types of adversaries

- Spam
- Scam
- Phishing
- Malware, botnets, rootkits, etc.
- Anything else?


Spam

- Annoying
- Productivity is wasted on unnecessary file deletion
- In extreme cases, important email becomes difficult to find


Scam

- Preys on naive victims
- Sounds too good to be true, but some people still believe it
- Organized crime/syndicates... with mules cooperating


Phishing
- Similar to a scam, but uses a different tactic
- More sophisticated, but does not need a mule or a physical meetup
- Main purpose is to obtain important details, such as an online banking login name and password, and hence access to the victim's account
- Safer for the criminal


Malware
- Safe to say, it covers trojans, viruses, dialers, rabbits, worms and rootkits (often bundled together nowadays)
- Has been infecting computers since the 1980s; the threat became more obvious with the arrival of the Internet
- Attacks any operating system: Linux, Windows, Mac... even Android phones


Problems with adversary detection

- Some are manually crafted, some are automated
- They react relatively fast and are difficult to trace
- There are too many of them (for example, spam), so manual work is too time-consuming


In house analysis

- Given enough expertise, in-house analysis could be useful
- Maintains reputation by having the organization's own group of analysts to handle incidents
- Try to minimize costs; use open source tools whenever possible


Categories

Machine Learning
- Associated with Artificial Intelligence
- Mimics human (brain) learning
- Learns through experience
- Deals with known and unknown patterns
- Overlaps with (or somehow originated from) Data Mining and Pattern Recognition


Table 1: Differences between clustering and classification


Classification                      | Clustering
Deals with known data               | Deals with unknown data
Supervised learning                 | Unsupervised learning
Popular algorithms include:         | Popular algorithms include:
  Random Forest                     |   K-means
  Neural Networks                   |   Fuzzy C-means
  k-Nearest Neighbor                |   Gaussian
  Decision Trees                    |
Predictive [Tan et al., 2005]       | Descriptive [Tan et al., 2005]
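
To make the contrast in Table 1 concrete, here is a minimal pure-Python sketch of one algorithm from each column: k-Nearest Neighbor (classification, needs labels) and K-means (clustering, no labels). The toy data, the two feature dimensions and the function names are assumptions made for illustration only; they do not come from the talk.

```python
import math
from collections import Counter

def knn_classify(train, labels, point, k=3):
    """Classification (supervised): predict a label from labelled examples."""
    # Keep the indices of the k training points closest to the query point.
    nearest = sorted(range(len(train)), key=lambda i: math.dist(train[i], point))[:k]
    # Majority vote among the labels of those neighbours.
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

def kmeans(points, k=2, iterations=10):
    """Clustering (unsupervised): group unlabelled points around k centroids."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[idx].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    # Return the cluster index assigned to each input point.
    return [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]

# Hypothetical 2-D features per email, e.g. (number of links, number of "!").
train = [(0, 1), (1, 0), (1, 1), (8, 9), (9, 8), (9, 9)]
labels = ["ham", "ham", "ham", "spam", "spam", "spam"]

predicted = knn_classify(train, labels, (8, 8))  # uses the known labels
assignments = kmeans(train, k=2)                 # ignores the labels entirely
```

Note that K-means only returns anonymous group indices; deciding which group is "spam" still requires a human or a labelled sample, which is the predictive versus descriptive distinction in the table.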


What to look for?

- We look for patterns
- In some cases, have a corpus of spam and phishing mails ready
- We call these patterns features


Spam/scam
- The language being used
- Perhaps phrases like "You have won GBP100,000,000" in email notifications
- Spam bombards mailboxes; some of it may come from genuine businesses, but the volume is overwhelming to handle
- Scams ask people to bank in money for dishonest reasons
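
As a rough sketch of how such language cues could be turned into features, the snippet below maps an email body to a small dictionary of numbers. The keyword list, the currency pattern and the feature names are hypothetical choices for illustration, not a vetted spam vocabulary.

```python
import re

# Hypothetical keyword list, chosen only for illustration.
SUSPICIOUS_WORDS = ("won", "winner", "lottery", "claim", "urgent", "transfer")

def text_features(message):
    """Turn an email body into simple keyword-based features."""
    words = re.findall(r"[a-z]+", message.lower())
    # One binary feature per suspicious word.
    features = {f"has_{w}": int(w in words) for w in SUSPICIOUS_WORDS}
    # Currency amounts such as "GBP100,000,000".
    features["mentions_money"] = int(bool(re.search(r"\b(GBP|USD|EUR)\s?\d[\d,]*", message)))
    # Number of exclamation marks in the message.
    features["exclamations"] = message.count("!")
    return features

sample = text_features("You have won GBP100,000,000! Claim now")
```

Each dictionary can then be flattened into one row of a dataset for any of the tools discussed later.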


Phishing mails

- Look for URLs
- Current efforts, for example by PhishTank, rely on public submissions and (I believe) manual verification
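
A minimal sketch of URL-based features follows, assuming the mail body is available as plain text. The specific cues (raw IP address as host, "@" inside the URL, many dots in the hostname) are common heuristics picked here as assumptions; the talk itself does not list them, and the example hosts are made up.

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")
IPV4_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")

def url_features(message):
    """Extract simple URL-based features from an email body."""
    urls = URL_PATTERN.findall(message)
    hosts = [urlparse(u).hostname or "" for u in urls]
    return {
        "url_count": len(urls),
        # Links that point at a raw IP address instead of a domain name.
        "ip_as_host": sum(bool(IPV4_PATTERN.fullmatch(h)) for h in hosts),
        # "@" in a URL can hide the real destination host.
        "has_at_sign": sum("@" in u for u in urls),
        # Many dots may indicate a trusted name embedded as a subdomain.
        "max_dots_in_host": max((h.count(".") for h in hosts), default=0),
    }

mail = ("Please verify at http://192.168.10.5/login and "
        "http://secure.bank.example.com.evil.example/update")
features = url_features(mail)
```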


Malware
- Researchers tend to look at Application Programming Interface (API) calls, and some at opcodes
- Analysis is done using either static or dynamic analysis
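
One common way to turn a sequence of API calls or opcodes into features is to count n-grams over it. The sketch below assumes the trace has already been extracted by some static or dynamic analysis tool; the trace shown is a made-up example, not real malware output.

```python
from collections import Counter

def ngram_counts(sequence, n=2):
    """Count sliding-window n-grams over a sequence of API calls or opcodes."""
    return Counter(tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1))

# Hypothetical API call trace, e.g. as logged during dynamic analysis.
trace = ["CreateFileW", "WriteFile", "CloseHandle",
         "CreateFileW", "WriteFile", "RegSetValueExW"]
features = ngram_counts(trace, n=2)
```

The same function works unchanged on opcode mnemonics such as `["push", "mov", "call"]`, since it only relies on the order of the items.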


An example

Figure 1: Automated classification proposed by [Rieck et al., 2009]


The datasets
- Spam email research has been going on for quite some time compared with the others (phishing)
  Sample datasets:
  http://csmining.org/index.php/spam-email-datasets-.html
  http://archive.ics.uci.edu/ml/datasets/Spambase
- Scam email is closely associated with spam, since it is unwanted email; it might as well be categorized as a sub-type of spam
- Phishing email samples:
  Sample dataset: http://phishtank.com


Feature Selection/Extraction

- When analyzing, we're interested in features
- What kind of features? Important keywords, strong features
- Unimportant features are phased out as unnecessary
- Some features might be redundant


There are algorithms meant for this:
- Information Gain
- Support Vector Machine (SVM)
- Others... some may be hybrid algorithms (combining several algorithms), also known as ensembles
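
Information Gain can be illustrated in a few lines: it measures how much knowing a feature reduces the entropy of the class labels. This is a minimal sketch for discrete features; the tiny dataset and the feature names `has_won` and `has_greeting` are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

labels = ["spam", "spam", "ham", "ham"]
has_won = [1, 1, 0, 0]        # separates the classes perfectly
has_greeting = [1, 0, 1, 0]   # tells us nothing about the class

gain_won = information_gain(has_won, labels)
gain_greeting = information_gain(has_greeting, labels)
```

Features with a gain close to zero, like `has_greeting` here, are the candidates to be phased out.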


Weka R language Octave Python Scipy

List of tools

- Weka
- R language
- Octave (as a replacement for Matlab)
- Python SciPy with Matplotlib


Figure 2: Weka


Weka

- The data obtained come as numbers and visualizations
- Some reading is needed to learn how to interpret them
- Test with different algorithms to get the best results


R language

- Not merely a tool, but a language in itself
- Usually used by data analysts


Figure 3: These books use the R language for their analyses


Octave

- Octave is an open source alternative to Matlab (MATrix LABoratory)
- Works almost the same way as Matlab does


Figure 4: Octave also has a GUI, QtOctave (discontinued)



Python Scipy
#!/usr/bin/env python
"""
Example: simple line plot.
Show how to make and save a simple line plot with labels, title and grid
"""
import numpy
import pylab

# Sample a cosine wave over one second in 10 ms steps
t = numpy.arange(0.0, 1.0 + 0.01, 0.01)
s = numpy.cos(2 * 2 * numpy.pi * t)
pylab.plot(t, s)

# Label the axes, add a title and grid, then save the figure to a file
pylab.xlabel('time (s)')
pylab.ylabel('voltage (mV)')
pylab.title('About as simple as it gets, folks')
pylab.grid(True)
pylab.savefig('simple_plot')

pylab.show()


Flowchart Conclusion

The flow
1. Feature Selection, Feature Categorization (Weka, Octave, R)
2. Clustering (Weka, Octave, R)
3. Classification (SciPy, Octave, R)
4. Visualization (SciPy, Octave, R)


Conclusion
- Dealing with malicious/unwanted threats from spam, scams, phishing and malware is not easy
- Perhaps one sample could be analyzed by hand, but handling thousands per day is tedious
- Machine learning assists in automation
- Open source provides an alternative (free as in minimal cost) for the analysis
- In-house analysis helps safeguard an organization's/enterprise's reputation

Get in touch!

http://mypacketstream.blogspot.com
These slides were created with LaTeX Beamer

najmi.zabidi @ gmail.com


Bibliography

Rieck, K., Trinius, P., Willems, C., and Holz, T. (2009). Automatic analysis of malware behavior using machine learning. TU, Professoren der Fak. IV.

Tan, P.-N., Steinbach, M., and Kumar, V. (2005). Introduction to Data Mining (First Edition). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.
