
Big Data Analytics for Security Intelligence

D. RamBabu, Asst. Professor; S. Aruna, Associate Professor; Dept. of Information Technology.

Vasavi College of Engineering, Hyderabad-500031.

What is big data?


Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. This data is big data.

Huge amount of data


There are huge volumes of data in the world:
+ From the beginning of recorded time until 2003, we created 5 billion gigabytes (5 exabytes) of data.
+ In 2011, the same amount was created every two days.
+ In 2013, the same amount of data was created every 10 minutes.

Big data spans three dimensions: Volume, Velocity and Variety


Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes, even petabytes, of information. Examples:
+ Turn 12 terabytes of Tweets created each day into improved product sentiment analysis.
+ Convert 350 billion annual meter readings to better predict power consumption.

Velocity: Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value. The latest I have heard is that even a 10-nanosecond delay is too much. Examples:
+ Scrutinize 5 million trade events created each day to identify potential fraud.
+ Analyze 500 million daily call detail records in real time to predict customer churn.

Variety: Big data is any type of data: structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together. Examples:
+ Monitor hundreds of live video feeds from surveillance cameras to target points of interest.
+ Exploit the 80% data growth in images, video and documents to improve customer satisfaction.

Finally, big data is similar to small data, but bigger. Having bigger data, however, requires different approaches (techniques, tools, architecture) with an aim to solve new problems, or to solve old problems in a better way, and to provide security.

Whom Does It Matter To?

+ Research community
+ Business community: new tools, new capabilities, new infrastructure, new business models, etc.
+ Sectors such as financial services

Where Is This Big Data Coming From?


+ 12+ TBs of tweet data every day
+ 30 billion RFID tags today (1.3B in 2005)
+ 4.6 billion camera phones worldwide
+ ? TBs of data every day
+ 100s of millions of GPS-enabled devices sold annually
+ 25+ TBs of log data every day
+ 76 million smart meters in 2009, 200M by 2014
+ 2+ billion people on the Web by end 2011

With Big Data, We've Moved into a New Era of Analytics

+ Volume: 12+ terabytes of Tweets created daily.
+ Velocity: 5+ million trade events per second.
+ Variety: 100s of different types of data.
+ Veracity: only 1 in 3 decision makers trust their information.

Hype, Reality or ?

Basics
Big data refers to the vast quantities of data that businesses and governments gather. This data is believed to contain useful, actionable intelligence that could lead to:
+ Process efficiencies
+ Lower costs and higher profits
+ Identification of terrorism threats and plans

What is needed is the will and expertise to perform the relevant analysis.

How Big is Big?


It depends on how quickly you can access and process the data with normal database management tools. For a small company, hundreds of gigabytes could be big; for a larger company, hundreds of terabytes.
+ 1 terabyte = 1,000 gigabytes
+ 1 petabyte = 1,000 terabytes
+ 1 exabyte = 1,000 petabytes
+ 1 zettabyte = 1,000 exabytes; 1 yottabyte = 1,000 zettabytes
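The multiples above are decimal (factors of 1,000), but storage figures are also quoted in binary units (factors of 1,024), which is why the same quantity can appear as two different numbers. A minimal converter, sketched in Python with unit labels chosen for illustration, makes the factor explicit.

```python
# Units in ascending order; each step up multiplies by `base`.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def convert(value: float, from_unit: str, to_unit: str, base: int = 1000) -> float:
    """Convert between byte units.

    base=1000 gives decimal (SI) units as in the list above;
    base=1024 gives binary units (strictly KiB, MiB, ..., PiB).
    """
    shift = UNITS.index(from_unit) - UNITS.index(to_unit)
    factor = base ** abs(shift)
    # Multiply when converting to a smaller unit, divide when converting to a larger one.
    return value * factor if shift >= 0 else value / factor

print(convert(2.5, "PB", "TB"))             # 2500.0 (decimal)
print(convert(2.5, "PB", "TB", base=1024))  # 2560.0 (binary)
print(convert(5e9, "GB", "EB"))             # 5.0
```

Passing `base` explicitly keeps the convention visible at every call site instead of hiding it in a constant.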


Size Contexts
Some areas of science generate huge amounts of data:
+ Meteorology (weather forecasting) and remote sensing
+ Genomics (genome sequencing)
+ Physics, e.g. CERN: 150 million sensors each deliver data 40 million times per second. Working with only 0.001% of the data, 25 petabytes a year are still collected. If all the data were used, it would amount to 500 exabytes a day, 200 times more than all other global data sources combined.
+ Social data, RFID data, surveillance (NSA and GCHQ)


The History
Big data is not a new topic. Data has been getting bigger continually ever since the first byte was created. It is related to storage capacity and processing power, which also keep growing continually.

Over the last 25 years, many governments have attempted to consolidate data holdings into single databases controlled by single parties, for example:
+ National ID schemes
+ National health records management

Corporate Examples
+ Amazon handles millions of back-end operations every day, as well as queries from more than half a million third-party sellers.
+ Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (2,560 terabytes, counting 1,024 terabytes per petabyte) of data.
+ Facebook handles 50 billion photos.
+ TaoBao and Alibaba: again, billions of transactions.
+ Consumer profile databases, loyalty cards, Octopus cards.

Ford
http://www.datanami.com/datanami/2013-03-16/how_ford_is_putting_hadoop_pedal_to_the_metal.html

Ford's modern hybrid Fusion model generates up to 25 gigabytes of data per hour, data that is a potential goldmine for Ford, as long as it can find the right analytical tools for the job. The data can be used to understand driving behaviors and reduce accidents, understand wear and tear, identify issues that lower maintenance costs, and avoid collisions. But who should own the data: Ford, or the car owner?

Needles & Haystacks


The volume of data is huge, beyond imagination, and the consultants and software firms want us to believe that somewhere in it, if you can find them, there are some needles: pieces of actionable intelligence.


Who is Pushing Big Data?


+ IBM! Because they want to sell you their software that (they claim) will help you to analyse the data and find the needles.
+ Consultants, who stand to make millions by panicking their clients into spending on software solutions.

Globally, this is a US$100 billion industry, growing 10% a year.


Is Everyone Happy?
The consultants suggest not. According to Accenture:
+ 22% of companies are very satisfied
+ 35% are quite satisfied
+ 34% are dissatisfied
+ 39% say that they have data that is relevant to their business strategy

Big data can be useful if you know what to look for and how to get that intelligence to the people who can use it.


Consultant Perspectives
Companies have lots of data, but most organisations measure too many things that don't matter and don't put sufficient focus onto the things that do (Accenture). Companies are buried in information and are struggling to use it (McKinsey). The more data they have, the less they seem to know!


Then What Should the Companies Do?


Spend more money, say the consultants:
+ Make a large investment in new data capabilities (McKinsey)
+ Embed analytics into business processes (Accenture)

Alternatively, go and ask people what they think is happening! Ask your lost customers why they left. A survey or big data analytics won't tell you why.


Big Data and Intelligence


One of the highest-impact news stories since June 2013 has concerned the secret surveillance activities of the NSA and GCHQ agencies, as revealed by Edward Snowden. These surveillance activities are fundamentally about big data and analytics, just as they are also about privacy and security, espionage and politics.


Selected Events
+ Publication of a top-secret court order against Verizon mandating it to hand over the call records of all its customers
  http://www.theguardian.com/world/2013/jul/19/nsa-extendedverizon-trawl-through-court-order
+ Orders for all other telecoms firms also existed
+ Large-scale collection of data without individual warrants
+ Prism: http://en.wikipedia.org/wiki/PRISM_(surveillance_program)


Prism
A system that gives the NSA access to the personal information of non-US people from US Internet companies:
+ Apple, Facebook, Google, Microsoft, Skype, Yahoo and others

These companies always claimed that they protected individual privacy, but it seems that this was not the case. However, they were legally required to say nothing: the court orders prohibited them from saying anything about their data sharing with the NSA.

Data obtained by cable tapping:
+ Metadata and content from 4 US telecoms providers' cables

Facebook
During January to June 2013, governments requested information on 38,000 Facebook users:
+ 11,000+ from the US (79% compliance)
+ 4,000+ from India (50% compliance)
+ 170 from Turkey (47% compliance)
+ 11 from Egypt (0% compliance)

http://www.theguardian.com/technology/2013/aug/27/facebook-government-user-requests


XKeyscore
This is the data retrieval system used to collect, process and search the data.
http://en.wikipedia.org/wiki/XKeyscore

It allows an NSA analyst to query nearly everything a typical user does on the Internet in near-real time, including:
+ Email content
+ Websites visited and searches
+ Metadata

In theory these systems were designed to analyse data about foreigners, but many Americans were also included in the databases.


GCHQ (Government Communications Headquarters) is a British intelligence agency.

In 2009, GCHQ spied on foreign politicians visiting the UK for a G20 summit:
+ Eavesdropping on phone calls and emails
+ Monitoring computers
+ Installing keyloggers and then tracking activities post-summit
+ Targets included the Turkish Finance Minister (Simsek) and the Russian leader (Medvedev)

Purpose: economic and political intelligence.


Tempora
Much of the data is harvested from Internet cables that enter the UK (GBs to TBs per second):
+ 300 GCHQ and 250 NSA analysts are involved
+ Telephone calls, email messages, Facebook entries, personal Internet history, IM chats, passwords
+ Cooperation with private telecoms companies
+ Data held for 3 days, metadata for 30 days

http://en.wikipedia.org/wiki/Tempora
http://www.theguardian.com/uk/2013/jun/21/gchq-cablessecret-world-communications-nsa


Bullrun
NSA and GCHQ spend millions developing programmes that can break Internet security (cryptography) protocols like HTTPS, SSL, etc. They also work directly with the telecom providers to ensure that they have backdoors that help them to access data that clients think is private or secret. There are no secrets!
http://www.theguardian.com/world/2013/sep/05/nsa-gchq-encryption-codes-security


Collusion or Legal Obligation?


One defence offered by the private companies that hold the data (whether in databases or as ISPs) is that they are required to obey the law of the countries in which they operate.

They have no choice: they must hand over the data or cooperate with the security agencies. Also, they cannot reveal that they are cooperating; they are gagged from revealing the existence of the Prism/Tempora/Bullrun systems.


Payouts
GCHQ and NSA are working with each other, sharing each other's data. NSA subsidizes GCHQ's costs by millions of pounds (GBP) annually.
http://www.theguardian.com/uk-news/2013/aug/01/nsa-paidgchq-spying-edward-snowden

NSA benefits from GCHQ operating under less strict operating and oversight rules. NSA expects returns: reports and intelligence.


Problems
Big data is HUGE: there is simply too much data to collect and analyse.
+ GCHQ may collect up to 20% of the actual data flow.

Big data is getting bigger. Cables that carry hundreds of GB/second make that task harder still. As always, 99.999% of the data is not useful. Can you find the 0.001% that might be?


Reactions
There have been attempts to stop media organizations from reporting on the surveillance programmes. Computers owned by the Guardian newspaper were physically destroyed in an attempt to remove the data and prevent further publication.
Additional copies are held in Brazil and the US.
http://www.wired.com/threatlevel/2013/08/guardian-snowden-files-destroyed/


Implications for Individuals


Is your data being harvested? It seems likely.

Are your private communications, including online purchases, secure? Not very.

Are you protected by data privacy laws? Not against governments. Perhaps against private companies.
http://www.pcpd.org.hk/


Questions
+ What kind of data is being collected? Where, by whom, and for what purposes?
+ Can we see or find (some of) the data anywhere?
+ Are you personally at risk? That depends on who you are, what you do, who you talk to and what about.
+ Is there anything we can do as individuals, as decision makers, as companies?
+ Should we be concerned? http://www.theguardian.com/world/2013/sep/05/nsa-how-to-remain-secure-surveillance
+ Or is it more sensible just to get on with our lives?

Do some Internet research now and try to answer some of these questions.



Welcome to a Not-So-Friendly Cyber World
+ "Biggest Bank Heist in History Nets $45 Million, All without Setting Foot in a Bank"
+ "Cyber Espionage via Social Networking Sites. Target: US DoD Officials"
+ "Hidden Malware Steals 3,000 Confidential Documents" (Japanese Ministry)

Traditional Approach to Security Predicated on a Defensive Mindset

Playing Defense
+ Assumes an explicit organizational perimeter
+ Optimized for combating external threats
+ Presumes standardization mitigates risk
+ Dependent on general awareness of attack methodologies
+ Requires monitoring and control of traffic flows

Origins of Security Intelligence

Layered defenses are essential for good security hygiene and for addressing traditional security threats, but attackers are adapting too.

Business Change Is Coming, If Not Already Here

+ Enterprises are undergoing dynamic transformations
+ The organization's cyber perimeter is being blurred; it can no longer be assumed
+ Evolving attack tactics focus on breaching defenses

A Look at the Emerging Threat Landscape


+ APTs: targeted, persistent, clandestine
+ Fraud: concealed, motivated, opportunistic
+ Insider threat: situational, subversive, unsanctioned
+ Hacktivism: topical, disruptive, public
+ Cyber attack: focused, well-funded, scalable

Questions CISOs Want to Be Able to Answer

Incorporating a More Proactive Mindset to Enterprise Security


Audit, Patch & Block (broad): think like a defender, with a defense-in-depth mindset
+ Protect all assets
+ Emphasize the perimeter
+ Patch systems
+ Use signature-based detection
+ Scan endpoints for malware
+ Read the latest news
+ Collect logs
+ Conduct manual interviews
+ Shut down systems

Detect, Analyze & Remediate (targeted): think like an attacker, with a counterintelligence mindset
+ Protect high-value assets
+ Emphasize the data
+ Harden targets and weakest links
+ Use anomaly-based detection
+ Baseline system behavior
+ Consume threat feeds
+ Collect everything
+ Automate correlation and analytics
+ Gather and preserve evidence
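The targeted column swaps signature-based detection for anomaly-based detection against a baseline of system behavior. The sketch below illustrates only the baselining idea, assuming per-host daily counts of a single hypothetical metric (outbound connections) and a simple z-score threshold; real security analytics use far richer models.

```python
from statistics import mean, stdev

def build_baseline(history):
    """Return (mean, sample standard deviation) of a host's historical daily counts."""
    return mean(history), stdev(history)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag an observation whose z-score against the baseline exceeds the threshold."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu  # any deviation from a perfectly flat baseline is unusual
    return abs(value - mu) / sigma > threshold

# Hypothetical history: outbound connections per day for one workstation.
history = [120, 135, 128, 140, 118, 131, 126]
baseline = build_baseline(history)
print(is_anomalous(133, baseline))  # False: within normal variation
print(is_anomalous(900, baseline))  # True: far above the baseline
```

The threshold of 3 standard deviations is an arbitrary assumption; it trades missed anomalies against false alarms.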

Greater Need for Security Intelligence

Visibility across organizational security systems to improve response times and incorporate adaptability/flexibility required for early detection of threats or risky behaviors

Evolution of Security Intelligence


+ Initial visibility facilitates compliance. Attackers adapt so as not to leave a trace.
+ The network does not lie: greater coverage across the organization. Attackers adapt to hide in the noise.
+ Security intelligence (SIEM plus other relevant data): filters out the noise and improves incident and offense identification; proactive, to detect targeted and zero-day attacks; needs scalability to add more data sources and extensibility to support additional security analytics.

Amplifying Security Intelligence with Big Data Analytics


The Triggers That Motivate Big Data Analytics for Security Intelligence:

Extending the IQ of a Security Intelligence Solution to Big Data
+ The need to derive security-relevant semantics from syntactic elements contained in raw data
+ Distilling: analytical functions, tools and workflows that can be employed to deliver insights
+ The availability of codified human know-how and understanding to enable machine processing and progressively automate manual processes
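The first trigger, deriving security-relevant semantics from syntactic elements in raw data, can be illustrated with a small sketch. A raw log line in an OpenSSH-like format is parsed with a regular expression into a structured event; the pattern, field names and the `failed_login` label are assumptions chosen for illustration, not a standard schema.

```python
import re
from typing import Optional

# Hypothetical pattern for an OpenSSH-style authentication failure line.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<src_ip>\d{1,3}(?:\.\d{1,3}){3}) port (?P<port>\d+)"
)

def to_event(raw_line: str) -> Optional[dict]:
    """Turn a raw log line into a structured security event, or None if it does not match."""
    match = FAILED_LOGIN.search(raw_line)
    if match is None:
        return None
    return {
        "type": "failed_login",  # the semantic label derived from the raw syntax
        "user": match.group("user"),
        "src_ip": match.group("src_ip"),
        "port": int(match.group("port")),
    }

line = ("Jan 12 03:14:07 web01 sshd[4242]: Failed password for invalid user admin "
        "from 203.0.113.9 port 52811 ssh2")
print(to_event(line))
```

Once lines become events like this, downstream analytics can count failed logins per source address instead of matching strings.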

Security Intelligence From Real-time Processing of Big Data


+ Behavior monitoring and flow analytics. Network traffic doesn't lie: attackers can stop logging and erase their tracks, but can't cut off the network (flow data).
+ Activity and data access monitoring. Improved breach detection: 360-degree visibility helps distinguish true breaches from benign activity, in real time.
+ Stealthy malware detection. Irrefutable botnet communication: Layer 7 flow data shows botnet command and control instructions.
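One common way to spot command-and-control traffic in flow data, offered here as an illustrative technique rather than the method these slides describe, is to look for beaconing: a host contacting the same destination at near-constant intervals. The sketch assumes flow records reduced to hypothetical (source, destination, timestamp in seconds) tuples and flags pairs whose inter-arrival times have a low coefficient of variation; both thresholds are arbitrary assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

def find_beacons(flows, min_connections=5, max_cv=0.1):
    """Return (src, dst) pairs whose connection intervals are suspiciously regular.

    flows: iterable of (src, dst, timestamp_seconds) tuples (hypothetical schema).
    max_cv: upper bound on the coefficient of variation (stdev / mean) of intervals.
    """
    times = defaultdict(list)
    for src, dst, ts in flows:
        times[(src, dst)].append(ts)

    beacons = []
    for pair, stamps in times.items():
        if len(stamps) < min_connections:
            continue
        stamps.sort()
        intervals = [b - a for a, b in zip(stamps, stamps[1:])]
        avg = mean(intervals)
        if avg > 0 and pstdev(intervals) / avg <= max_cv:
            beacons.append(pair)
    return beacons

# Hypothetical flows: one host calls out every ~60 s, another browses irregularly.
flows = [("10.0.0.5", "198.51.100.7", t) for t in (0, 60, 121, 180, 240, 301)]
flows += [("10.0.0.8", "203.0.113.20", t) for t in (0, 5, 300, 320, 900, 1500)]
print(find_beacons(flows))  # [('10.0.0.5', '198.51.100.7')]
```

Real malware often adds random jitter to its timing, so a production detector would need a more tolerant regularity measure than this one.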

Security Intelligence Amplified by Advanced Analytics


Hunting for External Command & Control (C&C) Domains of an Attacker
+ Historical analysis of DNS activity within the organization
+ Automated correlation against external DNS registries
+ Advanced analytics identify suspicious domains

Why are there only a few hits to these domains across the entire organization? Correlating them with public DNS registry information increases suspicions.
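A minimal sketch of this rare-domain hunt, assuming DNS query logs reduced to (internal host, domain) pairs and a hypothetical dictionary of domain registration ages that stands in for the public registry correlation. Domains queried by very few internal hosts and registered recently are flagged; both thresholds are illustrative assumptions.

```python
from collections import defaultdict

def suspicious_domains(dns_log, registration_age_days, max_hosts=2, max_age_days=30):
    """Flag domains that few internal hosts query and that were registered recently.

    dns_log: iterable of (internal_host, queried_domain) pairs.
    registration_age_days: dict mapping domain -> age in days (hypothetical registry data).
    """
    hosts_per_domain = defaultdict(set)
    for host, domain in dns_log:
        hosts_per_domain[domain].add(host)

    flagged = []
    for domain, hosts in sorted(hosts_per_domain.items()):
        age = registration_age_days.get(domain)
        if len(hosts) <= max_hosts and age is not None and age <= max_age_days:
            flagged.append((domain, len(hosts), age))
    return flagged

# Hypothetical log: 50 hosts use the intranet; one host repeatedly queries a new domain.
dns_log = [(f"pc{i}", "intranet.example.com") for i in range(50)]
dns_log += [("pc7", "update-check.example.net"), ("pc7", "update-check.example.net")]
ages = {"intranet.example.com": 4000, "update-check.example.net": 6}
print(suspicious_domains(dns_log, ages))  # [('update-check.example.net', 1, 6)]
```

Counting distinct hosts rather than raw queries keeps one noisy machine from making a domain look popular.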

Pursue Active Spear-Phishing Campaigns Targeting the Organization
+ Employ big data analytics on email to identify the patterns that reveal targets and redirects
+ Build visualizations, such as heat maps, to view the top targets of a spear-phishing attack
+ Load spear-phishing targets and redirect URLs into real-time security intelligence analysis to thwart the attack
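A sketch of the first two steps, assuming emails already reduced to hypothetical records with a recipient, the recipient's department and the links found in the body. Links pointing at a known-suspicious host are grouped, and the message counts per (department, host) form the grid a heat map would display.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

def campaign_targets(emails, suspicious_hosts):
    """Group recipients of links to suspicious hosts and build heat-map counts.

    Returns (recipients, grid): recipients maps host -> set of addresses;
    grid maps (department, host) -> number of messages.
    """
    recipients = defaultdict(set)
    grid = Counter()
    for msg in emails:
        for url in msg["links"]:
            host = urlparse(url).hostname
            if host in suspicious_hosts:
                recipients[host].add(msg["to"])
                grid[(msg["dept"], host)] += 1
    return recipients, grid

# Hypothetical messages; field names are assumptions.
emails = [
    {"to": "cfo@corp.example", "dept": "Finance", "links": ["http://invoice-portal.example.org/login"]},
    {"to": "ap@corp.example", "dept": "Finance", "links": ["http://invoice-portal.example.org/pay"]},
    {"to": "dev@corp.example", "dept": "IT", "links": ["https://wiki.corp.example/page"]},
]
recipients, grid = campaign_targets(emails, {"invoice-portal.example.org"})
print(grid.most_common(1))  # [(('Finance', 'invoice-portal.example.org'), 2)]
```

The recipient sets are what the third step would load into real-time analysis as the campaign's target list.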

Security Intelligence Amplified by Advanced Analytics


Tracking Multiple Unrelated Identities
Who am I? Who are you? Who do we communicate with? What devices do we own? For example:
+ Name: John Smith
+ Corporate ID: John.Smith@us.ibm.com
+ Google Analytics: jsmith22@gmail.com
+ Mobile: 613-334-6572, MAC, IP
+ Public community: BigPipes11
+ Laptop: several IPs, MAC addresses, hostnames
+ Tablet: IP address, MAC address
+ Other linking attributes: fonts installed, language, user agent, installed software, websites commonly visited, people communicated with, etc.

Employ big data analytics on structured attributes and unstructured communications to link identities.

Attributes have a tendency to cross identities; device profiles pose similar problems.
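Linking identities through shared attributes can be sketched as a connected-components problem: any two identity records that share an attribute value (an email address, a MAC address, a forum handle) end up in the same cluster. The union-find sketch below assumes hypothetical records shaped as dicts of attribute sets. Because attributes shared by many people (a common font, a popular user agent) would wrongly merge unrelated identities, it links only on the attributes the caller lists.

```python
def link_identities(records, linking_attrs):
    """Cluster identity records that share any value of the given attributes (union-find)."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    first_seen = {}  # (attribute, value) -> index of the first record carrying it
    for idx, rec in enumerate(records):
        for attr in linking_attrs:
            for value in rec.get(attr, ()):
                key = (attr, value)
                if key in first_seen:
                    union(idx, first_seen[key])
                else:
                    first_seen[key] = idx

    clusters = {}
    for idx in range(len(records)):
        clusters.setdefault(find(idx), []).append(records[idx]["id"])
    return sorted(clusters.values())

# Hypothetical records.
records = [
    {"id": "corporate", "email": {"john.smith@corp.example"}, "mac": {"00:1a:2b:3c:4d:5e"}},
    {"id": "forum", "handle": {"BigPipes11"}, "mac": {"00:1a:2b:3c:4d:5e"}},
    {"id": "webmail", "email": {"jsmith22@mail.example"}, "handle": {"BigPipes11"}},
    {"id": "unrelated", "email": {"someone@else.example"}},
]
print(link_identities(records, ["email", "mac", "handle"]))
# [['corporate', 'forum', 'webmail'], ['unrelated']]
```

Here the corporate and webmail identities never share a value directly; they are linked transitively through the forum record.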

Security Intelligence Amplified by Advanced Analytics


Today's Knowledge Applied to Yesterday's Problems

Today, breached organizations go weeks or months unaware of someone who has already infiltrated their network. Why not use today's knowledge to analyze yesterday's data? Capture all traffic for a period of time. As security detection techniques are updated (AV and IPS signatures, blacklists, MD5s, etc.), run them against yesterday's data.

Big data not only allows us to store everything; we can also extract the attributes used for detection up front to speed up analysis of old data:
+ PCAP data -> list of all IPs and domains, all file MD5s, all links in email and social communications
+ Host inventory data -> registry values, patches applied, file system audit

Quickly check for new indicators in yesterday's values.
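The extract-up-front idea can be sketched as an inverted index. While old data is being stored, each indicator (IP, domain, MD5) is indexed to the capture it came from, so a newly published indicator can be checked against yesterday's data with a lookup instead of a re-scan of raw PCAPs. The class, record fields and capture identifiers below are hypothetical.

```python
from collections import defaultdict

class IndicatorIndex:
    """Inverted index from (indicator type, value) to the stored captures that contained it."""

    def __init__(self):
        self._index = defaultdict(set)

    def add_capture(self, capture_id, ips=(), domains=(), md5s=()):
        # Extract attributes once, at ingest time, so later searches are lookups.
        for kind, values in (("ip", ips), ("domain", domains), ("md5", md5s)):
            for value in values:
                self._index[(kind, value.lower())].add(capture_id)

    def retro_hunt(self, new_indicators):
        """Return {(kind, value): sorted capture ids} for new indicators seen in old data."""
        hits = {}
        for kind, value in new_indicators:
            captures = self._index.get((kind, value.lower()))  # .get does not insert keys
            if captures:
                hits[(kind, value)] = sorted(captures)
        return hits

index = IndicatorIndex()
index.add_capture("2013-09-01.pcap", ips=["198.51.100.7"], domains=["update-check.example.net"])
index.add_capture("2013-09-02.pcap", md5s=["D41D8CD98F00B204E9800998ECF8427E"])
# Today's threat feed adds two indicators; one appeared in earlier traffic.
print(index.retro_hunt([("domain", "update-check.example.net"), ("ip", "192.0.2.1")]))
```

Values are lower-cased on both sides so that hex digests and domain names match regardless of case.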

Design Pattern: Security Intelligence Employing Big Data


+ Visualizations & Reporting
+ Security IQ
+ Real-Time Security Analytics
+ Asymmetric Security-Focused Big Data Analytics
+ Operational Management
+ Data Exploration

Customizing & Extending Security Intelligence with Big Data Solution


Triggers for Specific Capabilities to Augment Core Security Intelligence with a Big Data Solution:
+ Ingesting and pre-processing domain- or industry-specific, very high velocity data streams for correlation with cyber security data. Example data sources: telecom customer data records; energy and utilities grid sensor data; surveillance video and audio content.
+ Performing advanced statistical, predictive and/or identity analytics on all captured data to yield security insights. Example analysis: visualize linkages of users to privileged identities. Which user group has the highest propensity for insider fraud?
+ Executing frequently repeated queries and other analytical workloads best suited for massively parallel processing on warehoused, security-enriched data. Example query: quarterly reporting on historical warehoused security data.
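Quarterly reporting over warehoused security data is the kind of repeated aggregate query that a warehouse or massively parallel engine runs at scale. As a small stand-in for such a warehouse, the sketch below uses Python's built-in `sqlite3` module; the `security_events` table, its columns and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE security_events (event_date TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO security_events VALUES (?, ?)",
    [
        ("2013-01-15", "malware"), ("2013-02-03", "phishing"),
        ("2013-02-20", "malware"), ("2013-05-11", "insider"),
        ("2013-08-30", "phishing"), ("2013-11-02", "malware"),
    ],
)

# The quarter comes from the month via integer division: months 1-3 -> Q1, 4-6 -> Q2, etc.
QUARTERLY_REPORT = """
SELECT strftime('%Y', event_date) || '-Q' ||
       ((CAST(strftime('%m', event_date) AS INTEGER) + 2) / 3) AS quarter,
       category,
       COUNT(*) AS events
FROM security_events
GROUP BY quarter, category
ORDER BY quarter, category
"""

def quarterly_report(connection):
    """Return (quarter, category, event count) rows for all warehoused events."""
    return connection.execute(QUARTERLY_REPORT).fetchall()

for row in quarterly_report(conn):
    print(row)
```

In a real deployment the same query text would run unchanged on each refresh, which is what makes it a good candidate for a precomputed or parallelized workload.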

Find out more about Security Intelligence with Big Data


+ Watch the demonstration, visit the website, watch the video and read the white paper.
+ Read the thought pieces: "What Is Your Organization's Security IQ?" and "What You Need to Know About Security Intelligence with Big Data".
+ Develop a richer understanding of big data: the "Understanding Big Data" eBook and the "Harness the Power of Big Data" eBook.

Thank you
