
Introduction - Big Data

According to Gartner, "By 2015, companies that have adopted big data and extreme
information management will begin to outperform unprepared competitors by 20% in
every available financial metric".
Analyzing Big Data is becoming important for identifying trends, and many companies
have started doing it to gain an edge over competitors. Scenarios that fit such
analysis include:
Political campaigns wanting real-time feedback on various actions, based on tweets
and Facebook comments
Phone companies wanting to trend their billing data
Credit card companies trying to stop fraud before a sale happens
Web log analysis to identify trends
So what is Big Data?
Some people think of Big Data simply as data on the order of terabytes or
petabytes. With various views around, it makes sense to establish common ground
and demystify Big Data a bit.
The data that businesses deal with today no longer comes only from business
applications as structured data. Every activity in the ecosystem, from partners,
competitors, vendors, suppliers, investors, regulatory agencies, and customers, is
generating data. In today's business environment, social channels such as Twitter,
Facebook, and LinkedIn have become key influencers.
The advent of these previously unimagined data sources is leading to previously
unimagined data volumes. This large data set is termed "Big Data".
To define it in simple terms, Big Data is typically a large volume of unstructured (or
semi-structured) and structured data created by various organized and
unorganized applications, activities, and channels such as email, Twitter, web logs,
Facebook, etc.
This ultra-fast expansion, and the influence of unstructured data sources beyond
the traditional lines of business inside the enterprise boundary, also demands more
inclusive and rapid analysis (analysis and response in near real time).
Traditional data warehouse and BI approaches have been found inadequate to meet the
latency needed for business decisions within budgeted costs while
dealing with such data volumes.
As more and more companies devise unique strategies for dealing with Big Data,
the Big Data market is warming up. Open source solutions based on Apache Hadoop
MapReduce have been at the forefront of the Big Data space. Microsoft is also
positioning itself strongly as a platform of choice for Big Data solutions. See the
MS vs. Hadoop poll on this.
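To make the MapReduce idea concrete, here is a minimal single-process sketch in Python of the map, shuffle, and reduce steps, applied to the web log scenario mentioned earlier. It is only an illustration of the programming model, not Hadoop's API: the function names, the simplified "METHOD PATH STATUS" log format, and the sample lines are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical sample data: a simplified "METHOD PATH STATUS" log format.
SAMPLE_LOG = [
    "GET /home 200",
    "GET /products 200",
    "GET /home 200",
    "malformed",
    "POST /checkout 500",
]


def map_phase(line):
    """Map step: emit a (path, 1) pair for one log line; skip malformed lines."""
    parts = line.split()
    if len(parts) >= 2:
        yield parts[1], 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in one process and return hits per path."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(SAMPLE_LOG))
```

In a real cluster the map and reduce steps would run in parallel on many machines, with the framework handling the shuffle between them; the sketch keeps everything in memory to show only the shape of the computation.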
In the next blog post, we will look mainly at the technologies that Microsoft is
offering enterprises to solve the Big Data problem.

Cloud Poll: Can Microsoft's Distributed Analytics Tools Compete with Hadoop?
Klint Finley July 22nd, 2011


This week Microsoft Research released Project Daytona MapReduce Runtime, a
developer preview of a new product designed for working with large distributed data
sets. Microsoft also has a big data analytics platform that uses LINQ instead of
MapReduce called LINQ to HPC. Notably, LINQ to HPC is used in production at
Microsoft Bing.
But Microsoft is entering an increasingly crowded market. There's the open source
Apache Hadoop, which is now being sold in different flavors by companies such as
Cloudera, DataStax, EMC, IBM and soon a spin-off of Yahoo. Not to mention HPCC,
which will be open-sourced by LexisNexis.
Microsoft's products are currently in early, experimental stages and the company may
never step up the development and marketing of these to be serious Hadoop and
HPCC competitors. But could Microsoft be competitive here if it wants to?

Microsoft Big Data stack


Microsoft is working aggressively to bridge some of the gaps in its stack for
Big Data processing and has made a few announcements toward this in the recent past.
Here is a view covering various MS technologies for structured and unstructured data
analysis.

Apache Hadoop based platform support


Microsoft has also announced an Apache Hadoop based distribution for Windows
Server and Windows Azure, which is expected to be available in CY12 (most probably
late in the second half). This is for customers who have already adopted open source
Apache Hadoop based solutions but would like to run them on the Microsoft platform
and integrate the best of the two.
LINQ to HPC
LINQ to HPC, formerly Dryad from Microsoft Research, is a distributed runtime for
processing unstructured data. It runs on top of a Windows Server 2008 R2 cluster
with HPC Pack installed.
LINQ to HPC now ships with HPC Pack 2008 R2 SP3. It is
currently in RC (release candidate) and is expected to be released in early 2011. The
RC bits are available for download.
LINQ to HPC based unstructured data analysis is recommended for customers who
would like to use their existing .NET skills and environment in the enterprise.
Daytona
Daytona is a runtime that needs to be installed on Windows Azure and provides the
execution environment for running large computing jobs. Daytona supports HPC
scenarios for data analytics, financial analysis, and machine learning on Windows Azure.
It is currently in beta and is expected to be available in early 2012.
Other existing products
SQL Server Parallel Data Warehouse (PDW) is an appliance meant for large-scale data
warehousing and BI needs.
SQL Server 2008 or SQL Server 2012 (beta) provides the BI stack for standard BI and
DW needs.
SQL Server Analysis Services (SSAS) is part of SQL Server as well as PDW and supports
cubes for Online Analytical Processing (OLAP).
SQL Server Integration Services (SSIS) is part of SQL Server as well as PDW and
provides data integration, cleansing, and related capabilities.
SQL Server Reporting Services (SSRS) is part of SQL Server as well as PDW and
provides data reporting, slicing, dicing, etc.
SQL Server also provides data mining algorithms.
Excel DataScope is a reporting layer from Microsoft Research and is yet to be
released.
These are some of the key technologies already released or slated for release in
the evolving MS Big Data stack.
