day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. This data is big data.
minutes.
Velocity: Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value.
Scrutinize 5 million trade events created each day to identify potential fraud
Analyze 500 million daily call detail records in real time to predict customer churn faster
The latest I have heard is that a delay of 10 nanoseconds is too much.
Variety: Big data is any type of data, structured and unstructured, such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together.
Monitor 100s of live video feeds from surveillance cameras to target points of interest
Exploit the 80% data growth in images, video and documents to improve customer satisfaction
Finally: "Big data" is similar to "small data", but bigger.
But because the data is bigger, it requires different approaches:
On sectors
Financial Services..
? TBs of
2+ billion people on the Web by end of 2011
76 million smart meters in 2009, 200M by 2014
Volume: 12+ terabytes of Tweets created daily
Velocity: 5+ million trade events per second
Variety: 100s of different types of data
Veracity: Only 1 in 3
Hype, Reality or ?
Basics
Big Data refers to the vast quantities of data that businesses and governments gather.
This data is believed to contain useful, actionable intelligence that could lead to:
Process efficiencies
Lower costs
Higher profits
Identification of terrorism threats/plans
What is needed is the will and expertise to perform the relevant analysis.
Size Contexts
Some areas of science generate huge amounts of data:
Meteorology (weather forecasting) & Remote Sensing
Genomics (genome sequencing)
Physics, e.g. CERN
150 million sensors each deliver data 40 million times per second
Working with only 0.001% of the data collected, still 25 petabytes a year is collected
If all data was used, it would be 500 exabytes a day, 200 times more than all other global data sources combined
Social data, RFID data, Surveillance
NSA & GCHQ
The History
Big Data is not a new topic
Data has been getting bigger continually ever since the first byte was created
It is related to storage capacity and processing power, which also keep growing continually
Over the last 25 years, many governments have attempted to consolidate data holdings into single
Corporate Examples
Amazon handles millions of back-end operations every day, as well as queries from more than half a million third-party sellers.
Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (2560 terabytes) of data.
Facebook handles 50 billion photos.
TaoBao & Alibaba: again, billions of transactions
Consumer profile databases, Loyalty Cards, Octopus
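As a rough check on the Walmart figures, the sketch below converts 2.5 petabytes to terabytes and the hourly transaction count to a per-second rate. It assumes binary units (1 PB = 1024 TB), which is the convention that yields the 2560 TB quoted on the slide; the function and variable names are illustrative only.

```python
# Unit arithmetic for the Walmart figures quoted on the slide.
# Assumption: binary units, 1 PB = 1024 TB (this is what gives 2560 TB).
TB_PER_PB = 1024
SECONDS_PER_HOUR = 3600


def petabytes_to_terabytes(pb: float) -> float:
    """Convert petabytes to terabytes using binary units."""
    return pb * TB_PER_PB


def per_second(per_hour: float) -> float:
    """Convert an hourly rate to a per-second rate."""
    return per_hour / SECONDS_PER_HOUR


walmart_tb = petabytes_to_terabytes(2.5)
walmart_tps = per_second(1_000_000)

print(walmart_tb)             # 2560.0
print(round(walmart_tps, 1))  # 277.8
```

In other words, "1 million transactions every hour" is a sustained load of a little under 300 transactions per second.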
Ford
http://www.datanami.com/datanami/2013-03-16/how_ford_is_putting_hadoop_pedal_to_the_metal.html
Ford's modern hybrid Fusion model generates up to 25 gigabytes of data per hour
This data is a potential goldmine for Ford, as long as it can find the right analytical tools for the job.
The data can be used to understand driving behaviors and reduce accidents, understand wear and tear, identify issues that lower maintenance costs, and avoid collisions
But who should own the data? Ford? The car owner?
beyond imagination, and the consultants and software firms want us to believe that somewhere, if you can find them, there may be some needles: pieces of actionable intelligence
clients into spending on software solutions
Globally, this is a US$100 billion industry, growing 10% a year
Is Everyone Happy?
The consultants suggest not.
Accenture:
22% of companies are very satisfied
35% are quite satisfied
34% are dissatisfied
39% say that they have data that is relevant to their business strategy
Big data can be useful if you know what to look for and how to get that intelligence to the people who can use it
Consultant Perspectives
Companies have lots of data, but most organisations measure too many things that don't matter and don't put sufficient focus onto the things that do (Accenture).
Companies are buried in information and are struggling to use it (McKinsey).
The more data they have, the less they seem to know!
McKinsey
Alternatively: go and ask people what they think is happening!
Ask your lost customers why they got lost!
has concerned the secret surveillance activities of the NSA and GCHQ agencies as revealed by Edward Snowden
These surveillance activities are fundamentally about big data and analytics, just as they are also about privacy and security, espionage and politics
Selected Events
Publication of a top-secret court order against Verizon mandating it to hand over the call records
http://www.theguardian.com/world/2013/jul/19/nsa-extended-verizon-trawl-through-court-order
Prism
A system that gives the NSA access to the personal
individual privacy, but it seems that this was not the case
However, they were legally required to say nothing: the court orders prohibited them saying anything about their data sharing with the NSA
Data obtained by cable tapping
Metadata & content from 4 US telecoms providers' cables
Facebook
During Jan-June 2013, governments requested info on
acebook-government-user-requests
XKeyscore
This is the data retrieval system used to collect,
http://en.wikipedia.org/wiki/XKeyscore
about foreigners, but many Americans were also included in the databases
Tempora
Much of the data is harvested from Internet cables
Telephone calls, Email messages, Facebook entries, Personal Internet history, IM chats, passwords
Cooperation with private telecoms companies
Data held for 3 days, metadata for 30 days
http://en.wikipedia.org/wiki/Tempora
http://www.theguardian.com/uk/2013/jun/21/gchq-cables-secret-world-communications-nsa
Bullrun
NSA and GCHQ spend millions developing programmes that can break Internet security (cryptography) protocols like HTTPS, SSL, etc.
They also work directly with the telecom providers to ensure that they have backdoors that help them to access data that clients think is private/secret
There are no Secrets!
http://www.theguardian.com/world/2013/sep/05/nsa-gchq-encryption-codes-security
hold the data (whether in databases or as ISPs) is that they are required to obey the law of the countries in which they operate
They have no choice: they must hand over the data, or cooperate with the security agencies
Also, they cannot reveal that they are cooperating: they are gagged from revealing the existence of the Prism/Tempora/Bullrun systems
Payouts
GCHQ and NSA are working with each other, sharing each other's data
NSA subsidizes GCHQ's costs @ GBP millions annually
http://www.theguardian.com/uk-news/2013/aug/01/nsa-paid-gchq-spying-edward-snowden
Problems
Big data is HUGE: there is simply too much data to
Big data is getting bigger
Cables that carry hundreds of GB/second make that task harder still
As always, 99.999% of the data is not useful. Can you find the 0.001% that might be?
Reactions
There have been attempts to stop media organizations from reporting on the surveillance programmes
Computers owned by the Guardian newspaper were physically destroyed in an attempt to remove the data & prevent further publication
Additional copies are held in Brazil and the US
http://www.wired.com/threatlevel/2013/08/guardian-snowden-files-destroyed/
purchases, secure?
Not very.
Are you protected by data privacy laws? Not against governments. Perhaps against private companies.
http://www.pcpd.org.hk/
Questions
What kind of data is being collected? Where, By Whom, For What Purposes?
Can we see/find (some of) the data anywhere?
Are you personally at risk?
That depends on who you are, what you do, who you talk to and what about.
Is there anything we can do as individuals, as decision makers, as companies?
Should we be concerned?
http://www.theguardian.com/world/2013/sep/05/nsa-how-to-remain-secure-surveillance
Welcome to a Not So Friendly Cyber World
Biggest Bank Heist in History Nets $45 Million
All without setting foot in a Bank
CYBER ESPIONAGE VIA SOCIAL NETWORKING SITES TARGET: US DOD OFFICIALS
Playing Defense
Assumes explicit organizational perimeter
Optimized for combating external threats
Presumes standardization mitigates risk
Dependent on general awareness of attack methodologies
Requires monitoring and control of traffic flows
Origins of Security Intelligence
Layered Defenses Essential for Good Security Hygiene and Addressing Traditional Security Threats, but attackers are adapting too
Fraud
Insider Threat
Situational, Subversive, Unsanctioned
Hacktivism
Topical, Disruptive, Public
Cyber Attack
Focused, Well-Funded, Scalable
Broad:
Protect all assets
Emphasize the perimeter
Patch systems
Use signature-based detection
Scan endpoints for malware
Read the latest news
Collect logs
Conduct manual interviews
Shut down systems
Targeted:
Protect high value assets
Emphasize the data
Harden targets and weakest links
Use anomaly-based detection
Baseline system behavior
Consume threat feeds
Collect everything
Automate correlation and analytics
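To make "baseline system behavior" and "anomaly-based detection" concrete, here is a minimal sketch that summarizes past observations as a mean and standard deviation and flags new values that deviate strongly from it. The z-score rule, the threshold of 3, and the sample login counts are assumptions chosen for illustration; the slides do not prescribe a particular method.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Summarize normal behavior as (mean, sample standard deviation)."""
    return mean(history), stdev(history)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value whose z-score against the baseline exceeds the threshold."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat baseline: any different value is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical hourly login counts for one account (illustrative data only).
history = [12, 15, 11, 14, 13, 12, 16, 14]
baseline = build_baseline(history)

print(is_anomalous(13, baseline))  # False: within normal variation
print(is_anomalous(90, baseline))  # True: far outside the baseline
```

Unlike signature-based detection, nothing here needs prior knowledge of a specific attack; the trade-off is that the baseline must be built from data that is itself representative of normal behavior.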
Visibility across organizational security systems to improve response times and incorporate adaptability/flexibility required for early detection of threats or risky behaviors
SIEM
other relevant data
Filters out the noise, improves incident and offense identification
Proactive to detect targeted and zero-day attacks
Needs scalability to add more data sources and extensibility to support additional security analytics
Security Intelligence
Distilling
Analytical functions, tools and workflows that can be employed to deliver insights
Availability of codified human know-how and understanding to enable machine processing and progressively automate manual processes
Employ Big Data Analytics on structured attributes and unstructured communications to link identities
Attributes have a tendency to cross identities, similar problems with device profiles
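One simple way to link identities through shared attributes is to group any accounts that have at least one attribute value in common. The sketch below does this with a union-find structure; the record layout, account names, and attribute values are hypothetical. Because attributes tend to cross identities (a shared IP address, for instance), groups produced this way should be read as candidates for review rather than confirmed matches.

```python
from collections import defaultdict


def link_identities(records):
    """Group identity names that share at least one attribute value.

    records: dict mapping identity name -> set of attribute values
    returns: sorted list of groups, each a sorted list of identity names
    """
    parent = {name: name for name in records}

    def find(x):
        # Follow parent pointers to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Index which identities carry each attribute value.
    owners = defaultdict(list)
    for name, attrs in records.items():
        for attr in attrs:
            owners[attr].append(name)

    # Any two identities sharing a value end up in the same group.
    for names in owners.values():
        for other in names[1:]:
            union(names[0], other)

    groups = defaultdict(list)
    for name in records:
        groups[find(name)].append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical accounts and attributes (illustrative data only).
records = {
    "jsmith": {"10.0.0.5", "j.smith@example.com"},
    "admin_js": {"10.0.0.5"},
    "mlee": {"m.lee@example.com"},
}
linked = link_identities(records)
print(linked)  # [['admin_js', 'jsmith'], ['mlee']]
```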
Big Data not only allows us to store everything, we can extract the attributes used for detection up front to speed up analysis of old data:
PCAP Data -> List of all IPs and Domains, All File MD5s, All Links in email and social communications
Host Inventory Data -> Registry Values, Patches Applied, File System Audit
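As a small illustration of extracting detection attributes up front, the sketch below pulls IPv4 addresses and links out of message text and computes an MD5 digest for file content. It works on plain strings and bytes rather than real PCAP files, and the regular expressions are deliberately simple assumptions (for example, the IPv4 pattern does not validate octet ranges); a production pipeline would use a proper packet parser.

```python
import hashlib
import re

# Simplified patterns, assumed for illustration only.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_attributes(text):
    """Return the unique IPv4 addresses and links found in a text message."""
    return {
        "ips": sorted(set(IPV4_RE.findall(text))),
        "links": sorted(set(URL_RE.findall(text))),
    }


def file_md5(data: bytes) -> str:
    """Return the hexadecimal MD5 digest of file content."""
    return hashlib.md5(data).hexdigest()


# Hypothetical message text (illustrative data only).
message = "Login from 192.168.1.10, see http://example.com/reset and 192.168.1.10 again"
attrs = extract_attributes(message)

print(attrs["ips"])    # ['192.168.1.10']
print(attrs["links"])  # ['http://example.com/reset']
print(file_md5(b""))   # d41d8cd98f00b204e9800998ecf8427e
```

Storing these extracted values next to the raw data means a later search for a newly reported address or file hash is a lookup, not a re-parse of the whole archive.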
Security IQ
Operational Management
Data Exploration
Ingesting and Pre-processing Domain or Industry Specific Very High Velocity Data Streams for correlation with cyber security data
Performing Advanced Statistical, Predictive and/or Identity Analytics on all data captured to yield security insights
Example Analysis:
Visualize linkages of users to privileged identities
Which user group has the highest propensity for insider fraud?
Executing Frequently Repeated Queries and other Analytical workloads best suited for massive parallel processing on Warehoused Security-enriched data
Thank you