Você está na página 1de 7

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/304019247

Unravelling unstructured data: A wealth of information in big data

Conference Paper · September 2015


DOI: 10.1109/ICRITO.2015.7359270

CITATIONS READS

4 91

3 authors:

Mona Tanwar Reena Duggal


Amity University Amity University
2 PUBLICATIONS   4 CITATIONS    6 PUBLICATIONS   43 CITATIONS   

SEE PROFILE SEE PROFILE

Sunil Kumar Khatri


Amity University
104 PUBLICATIONS   193 CITATIONS   

SEE PROFILE

Some of the authors of this publication are also working on these related projects:

data analytics, software reliability engineering View project

Regression testing and SBST View project

All content following this page was uploaded by Reena Duggal on 17 June 2016.

The user has requested enhancement of the downloaded file.


Unravelling Unstructured Data: A Wealth of
Information in Big Data
Mona Tanwar1, Reena Duggal2, Sunil Kumar Khatri3
1-3
Amity Institute of Information Technology
Amity University Uttar Pradesh, Noida, India
1
mona.tanwar@student.amity.edu
2
reena.duggal@student.amity.edu
3
skkhatri@amity.edu,sunilkkhatri@gmail.com

Abstract— Big Data is data of high volume and high variety suitable and adequate to process big data as they are based on
being produced or generated at high velocity which cannot be structured data which is a very small fraction of big data and
stored, managed, processed or analyzed using the existing secondly because they are not scalable to the extremely high
traditional software tools, techniques and architectures. With big rate of generation of big data.
data many challenges such as scale, heterogeneity, speed and
The description of big data is incomplete without
privacy are associated but there are opportunities as well.
Potential information is locked in big data which if properly mentioning the three V’s of big data which are volume,
leveraged will make a huge difference to business. With the help variety and velocity. These are the fundamental characteristics
of big data analytics, meaningful insights can be extracted from of big data as given in Fig. 1 [3].
big data which is heterogeneous in nature comprising of 1. Volume – It refers to the high magnitude of big data
structured, unstructured and semi-structured content. One which is in the order of terabytes to petabytes and more. For
prime challenge in big data analytics is that nearly 95% data is instance, some earlier estimates suggested that 20 petabytes of
unstructured. This paper describes what big data and big data storage space was used to store 260 billion Facebook photos.
analytics is. A review of different techniques and approaches to In 2010, it was reported that up to one million photographs
analyze unstructured data is given. This paper emphasizes the
were processed by Facebook per second [4]. Twitter generates
importance of analysis of unstructured data along with
structured data in business to extract holistic insights. The need 12 terabytes of data daily [5]. In 2012, Facebook stated that
for appropriate and efficient analytical methods for knowledge 2.7 billion “likes” and “comments” were registered daily by
discovery from huge volumes of heterogeneous data in the users [6].
unstructured formats has been highlighted. 2. Variety – Massive volumes of data of heterogeneous
nature is generated by different sources. It consists of
Keywords— Big Data, Unstructured data, Text Analytics, Audio structured, semi-structured and unstructured data. Structured
Analytics, Video Analytics, Social Media Analytics data has a fixed format whereas unstructured data has no fixed
schema or format.
I. INTRODUCTION 3. Velocity – Big data is being generated continuously at an
Big data has caught attention of professionals, exponential rate. 90% of current data is generated in last two
academicians and researchers since it came into light. This years [7, 8]. Social media is one major contributor which is
paper explores various aspects of big data and big data generating data explosively. Sensors, smart phones and
analytics. Various definitions have come up during this phase internet are leading to huge data feeds.
when professionals throughout the world were putting in The explosive rate of growth of big data presents
endeavors to understand and state what big data is. The first tremendous opportunity and will yield big economic gains if
thing that comes to mind is its size but it is much more than correctly exploited. The resources needed for exploiting big
that. data are falling short because of the high rate of growth of
IDC defines big data as "Big data technologies describe a data. Eric Schmidt, the CEO of Google in the Lake Tahoe
new generation of technologies and architecture designed to Technology Conference held in 2010 [9] quoted that
economically extract value from very large volumes of a wide “Between the dawn of civilization through 2003, just 5
variety of data, enabling high velocity capture, discovery exabytes of information was created. That much information
and/or analysis " in 2011 [1]. TechAmerica Foundation is now created every two days and the pace is increasing.
defines big data as “Big data is a term that describes large People aren’t ready for the technology revolution that’s going
volumes of high velocity, complex and variable data that to happen to them.” According to a recent study, such amount
require advanced techniques and technologies to enable the of data is being generated in every 10 minutes now [10]. It has
capture, storage, distribution, management, and analysis of the also been estimated that more than 85% of Fortune 500
information.” [2]. organizations will fail to exploit big data for competitive
The traditional data management and analysis systems like advantage [11]. They will lag behind the 15% organizations
Relational Database Management Systems (RDBMS) are not that will leverage big data. Organizations that exploit big data

978-1-4673-7231-2/15/$31.00 ©2015 IEEE


will have a competitive advantage over other o organizations generating personalized offers for the customers. Big data
[12, 13]. analytics enables organizations to create real-time intelligence
Nearly 95% of big data is unstructuredd [2]. To improve from big data. Devices such as smart phones and sensors have
business performance, insights need to be gained from accelerated the rate of grow wth of data generation and
unstructured data and integrated with othher data to get the magnified the need of real-tim me analytics. The scope and
holistic picture so that better business decissions can be made. benefits are ubiquitous and never ending. Therefore, to
Organizations that would leverage unstrucctured data as well process and analyze big data, there
t is always a need of cost
along with structured data would have a competitive edge effective, efficient and appropriate tools and techniques.
over others. This has been stated time and again. The process of extracting infformation from big data can be
divided into five phases [15]. The first phase is Acquisition
and Recording of data, secondd is Extraction, Cleaning and
Annotation, third is Inteegration, Aggregation and
Representation, fourth is Modeelling and Analysis and finally
the fifth is Interpretation. Thee first three phases are Data
Management tasks and the lateer two are Analytics tasks. The
data management process invollves acquisition and storage of
data, and retrieving and preparinng it for analysis. The analytics
process refers to analyzing and acquiring knowledge from big
data.
Analytics can further be categorized into Descriptive
Analytics, Predictive Analyticcs and Prescriptive Analytics.
Descriptive analytics is conneccted with business intelligence
and is based on historical daata. Predictive analytics uses
statistical data and is meant forr making predictions for future
scope based on the data. Prescrriptive analytics is used to find
out the optimized solution for thhe concerned problem having a
set of constraints [16].
Handling heterogeneity in big data analytics becomes
Fig.1 Characteristics of Big Datta tedious and difficult. Apart froom heterogeneity, complexities
arise in big data analytics becauuse of other factors as well such
II. WHAT IS “BIG DATA ANALY YTICS"? as noise accumulation, spurrious correlations, scale etc.
Big data analytics is the process of collecting, storing and Moreover, the conventional tecchniques are statistical and do
analyzing huge volumes of high velocitty diverse data to not scale up to the need of bigg data. A small size of data set
extract hidden patterns and meaningful innsights. The results known as the sample is selecteed and relationships are found
can be used for decision making by prooviding meaningful out of the sample. The conclussion is then generalized to the
insights which help business grow, help in discovering certain entire population but in case off big data, the sample would be
patterns in fields like weather forecast, social networks, online massive. So it doesn’t hold significance
s in this case. The
browsing, customer purchases, telecom andd location data. Big computational efficiency of these t techniques is also not
data analytics can help in making discoverry in various areas scalable on big data.
such as space science, research in varioous fields such as
medicines, in analyzing various trends inn stock market to III. ROLE OF UNSTTRUCTURED DATA
make predictions, in predicting customer demand,
d in making Big data consists of struuctured, semi-structured and
future predictions for sales and market direections etc. During unstructured data as shown in Fig.
F 2[17]. This is often referred
the last elections that were held in India, the
t BJP party used as structural heterogeneity in big
b data. Only 5% of existing
big data analytics to prepare their strategiees for campaigning data is structured [18]. Structuured data is tabular data which
[14]. The information obtained by anallytics is used by exists in the form of relationaal databases and spreadsheets.
organizations to get deeper knowledge so thhat evidence based Data having no particular formmat, schema or structure is said
decisions can be made faster to get a com mpetitive advantage to be unstructured, it can be inn any form such as text, audio,
over other organizations. For instance, Clickstream data images and video for example, word files, PDF files, content
enables online retailers to understand custoomer behavior and of blogs, forums, tweets, emailss, web pages, audio files, video
also the browsing patterns. It provides innformation on how files, images etc. The internett is flooded with unstructured
much time the customers spent on which pages and the content. Nearly 95% of existinng data is unstructured, a vast
sequence of pages as well. This helpss in taking better portion of which is in the form of videos [2]. Major
decisions with respect to what kind of prodducts, services and contributors to unstructured data
d are social networks and
deals to offer them. Such data could also help in improving sensors. In between structuredd and unstructured data, there
the website designs. Data being generateed through various exists semi-structured data whhich has no strict standard for
mobile apps contain huge information whhich could help in example Extensible Markup Lannguage (XML) files, web logs,
sensor logs etc. XML enables exchange of data on web and is Information Extraction or IE techniques convert
machine readable since it consists of user-defined data tags. unstructured text into structured content. For instance, from
Unstructured data adds to the complicacy of analytics medical prescriptions, structured data like drug dosage,
process since mostly the machines require a structural composition and other information can be obtained. There are
organization of data for processing and analysis. But in order two sub tasks in information extraction process namely Entity
to have a competitive advantage over other organizations, the Recognition and Relation Extraction [21]. ER or Entity
potential information locked in unstructured data needs to be Recognition finds name information in text data and classifies
extracted by organizations. The real value of unstructured data the information into categories like person, firm and year. RE
can be leveraged by deconstructing it which results in data or Relation Extraction finds and retrieves the semantic
enriched with metadata. This process converts it into semi- relationships between entities such as persons, company, etc
structured content which can be used to gain insights. from the text data.
It has also been stated that for decision making, the One another technique used in text analytics is Text
decision makers cannot rely completely on structured data as Summarization [22]. These techniques are used to produce a
they would miss the vast amount of information available on summary from a single or multiple text sources. These
the open source unstructured web content [18]. To keep pace techniques can be applied to various text data sources such as
with the fast evolving information age, organizations will news, articles, emails, blogs and forums. There are two
need to exploit unstructured content to extract patterns across different approaches to the summarization techniques. One is
complex data types for evidence based decision making. Extractive and the second is Abstractive. The extractive
For predictive analytics, both structured and unstructured approach creates summary by extracting original text units or
data are required. Without their integration and analysis, a sentences from the source document and therefore the
complete picture cannot be obtained. For instance, while an summary is a subset of it. This approach requires no
organization may correlate its sales data with variables such as understanding of the document. On the other hand, the
quarter of year or demographics of customer to track the sales abstractive approach extracts semantic information from the
record of some product or service, the organization may document. The summaries generated by this approach do not
understand what is happening but the reason can be contain the original text units necessarily. This approach uses
understood after analyzing social media data and call center advanced Natural Language Processing Techniques or NLP
logs i.e. unstructured data [19]. A wealth of information is techniques to parse the text and produce the summary. For big
stored in unstructured data which if leveraged properly can data, extractive systems are more suitable as they are easier to
make a big difference to business. adopt.
One major challenge in unstructured data analytics is that
unstructured data is noisy [19]. It needs to be cleaned first and
then it is analyzed after which it is integrated with structured
data. Integration of unstructured data with structured data is a
challenging task. Many real time challenges exist with
unstructured data. There is a need of efficient cost-effective
techniques and tools to exploit unstructured data in real time.

IV. VARIOUS TECHNIQUES OF UNRAVELLING UNSTRUCTURED


DATA
In this section, the analytical approaches and techniques
used for extracting information from unstructured data (text,
audio and video) have been discussed. Social media analytics
has also been discussed though it consists both of structured
and unstructured data but unstructured data has a very large
fraction in comparison to structured data.
Fig. 2 Types of Big Data
A. Text Analytics
It refers to the techniques used for extracting or retrieving There are Question Answering techniques or QA
information or insights from text based data such as emails, techniques also which provide answers to questions framed in
documents, advertisements, forums, blogs, news content, natural language. Such systems have their applications in
social network content, website content, call center logs, various fields such as finance, academics, healthcare and
customer comments and reviews, tweets etc. It involves marketing. They are based on complex natural language
Statistical Analysis, Computational Linguistics and Machine processing techniques. Question answering techniques have
Learning. Meaningful insights are gained which support three different approaches. The first is Information Retrieval
decision making in businesses. For example, by retrieving approach, the second is Knowledge-based approach and the
information from financial news, stock market can be third is Hybrid approach. In the information retrieval or IR
predicted [20]. approach, firstly question processing is done to create an
appropriate query from the question, then document
processing is done to extract relevant pre-written content from dictionary. If the system fails to do so, the most similar word
existing documents after which answer processing is done to is returned. The output of this system is a file which contains
retrieve the candidate answers which are ranked and the top the sequence of the spoken words in the speech. In the
ranked answer is returned as the solution or output. The searching phase, to find the search term, standard text-based
knowledge based approach produces the semantic information methods are implemented. The phonetics based approach
of the question and then it is used for querying. This approach works with sounds or phonemes which helps in distinguishing
is suitable for restricted domains such as tourism and one word from the other. This approach can also be further
medicine as there are no large volumes of pre-written content divided into two phases which are phonetic indexing and
in these domains. In the hybrid approach, the questions are searching. In the indexing phase, the input speech is translated
semantically analyzed using the knowledge based approach into a sequence of phonemes. In the searching phase, the
whereas the candidate solutions are generated using the search terms are phonetically represented by searching the
information retrieval approach. output obtained from the previous phase.
Sentiment Analysis or Opinion Mining techniques are the
techniques which analyze opinionated text which consists of C. Video Analytics
opinion or views of people towards entities such as It is the process of monitoring, analyzing and gaining
individuals, products, brands, firms or events [23]. It is done meaningful insights from video streams. It is also referred as
in areas such as marketing, political and social science and Video Content Analysis (VCA) and is still in its initial stage
finance. Sentiment analysis is done at document level, [26]. There are techniques to process real-time and pre-
sentence level and aspect level. The document level recorded videos. The major contributors to video data are
techniques infer whether there is a positive or negative Closed-Circuit Television (CCTV) cameras and the video-
sentiment about a single entity in the whole document. In the sharing websites such as YouTube. One prime challenge in
sentence level techniques, the single sentiment is determined video analytics is the huge size of video data. Over two
about the entity in the sentence. This is more complicated in thousand pages of text data is equivalent to just one second of
comparison to document level techniques. The aspect based a High Definition (HD) video [27]. Every minute hundred
techniques determine all sentiments specific to the entity with hours of video content is uploaded to YouTube [28]. High
respect to the different aspects of the entity in the document. volume of video content poses a challenge but opportunity as
This is useful where customer opinions towards different well with the help of big data technologies. Intelligence can be
features of a product are reviewed. Text based content is drawn from thousands of hours of video content. Video
increasing at an accelerated rate and the ability of business analytics has been applied in areas such as automated security,
decision makers to extract useful insights from text data monitoring and surveillance systems for detecting breaches,
remains challenging. identifying thefts, detecting littering of areas, keeping a check
on suspicious activities or objects left unattended. Upon
B. Audio Analytics detection of any such activity the security officers may be
It is the process of analyzing and extracting or retrieving notified in real time or a necessary automatic action may be
information from unstructured audio content or data. It is taken such as turning on the sound alarms, locking the doors
known as Speech Analytics when applied to human spoken or turning on the lights. Labor-based surveillance systems are
language [24, 25]. It is implemented in areas such as customer costly and are less effective in comparison to automatic
call centers and healthcare. In call centers, it is used to analyze systems. It has been observed that security personnel cannot
recorded customer call content to evaluate the performance of maintain their focus for more than twenty minutes on the
the agents, to understand customer behavior, to identify issues video content [29]. The content of CCTV cameras placed in
of product and services, to improve the customer experience retail outlets can be analyzed for extracting information. Such
and to increase the sales rate. Live calls can also be analyzed information can help in better marketing and operations
and feedbacks can also be provided to the agents in real time. management. The retailers can extract information such as the
Products and services recommendations can be formulated number of customers, their demographic information like age
based on previous and current interactions with the customers. and gender, the time duration for which they were in the store,
In healthcare, audio analytics is useful in diagnosing and their movement patterns and the time duration for which they
treating diseases which affect the communication patterns of were in different sections. Queues can also be monitored in
the patients like Depression, Cancer and Schizophrenia [24]. real time. Meaningful insights can be extracted by correlating
Infant’s cries can also be analyzed to gain information about such information with customer demographics to make better
the infant’s health and emotional status [25]. decisions for product price and placement, deals and offers,
There are two approaches for speech analytics: the making combos, staffing, product promotion strategies and
Transcript-based approach and the Phonetic-based approach. layout optimization. The customer group’s buying behavior,
The transcript-based approach which is also known as the group size, group’s demographics and the buying behavior of
Large Vocabulary Continuous Speech Recognition (LVCSR) individual members of the group can also be analyzed.
approach is further divided into two phases: Indexing and One another application of video analytics is Automatic
Searching. In the indexing phase, Automatic Speech Video Indexing and Retrieval for easy search and retrieval of
Recognition (ASR) algorithms are used to match sounds with videos. For video searching and retrieval, there is a metadata
words which are identified with the help of a predefined based approach in which relational database management
systems are used. In the soundtracks and transcripts approach, Activity graphs are more preferable in comparison to social
video indexing can be done by applying audio analytics and graphs from analytics point of view as active relationships are
text analytics [30]. more informative.
There are two different system architecture approaches in Community detection or discovery is a technique which
video analytics: Server-based architecture and Edge-based extracts implicit communities of a network. Communities are
architecture [31]. In server-based architecture, a centralized sub-networks where the entities interact more with each other
server performs video analytics on the videos captured by with respect to the entire network. Behavioral patterns and the
each camera. The limitation to this approach is that the properties of the network can be extracted from such data.
bandwidth is limited and hence the videos are compressed by Community detection has similarity with clustering in this
either reducing the image resolution and/or the frame rates. respect [36]. It has various application areas such as marketing
And therefore, the accuracy of the analysis gets affected and the World Wide Web (WWW) [37]. It is helpful in
because of loss of information. But maintenance is easier in developing better product recommendation systems.
this approach and it also provides economies of scale. In edge- Social Influence Analysis analyzes the influence of entities
based architecture, video analytics is applied on the video and connections in a social network. It is based on the
content captured by the camera locally. There is no loss of assumption that the behavior of an entity or participant is
information in this approach and the content analysis is more influenced or affected by others. Such data gives an insight
effective. The maintenance of such systems is costly and the into the actors influence, strength of connections and the
processing power is lower with respect to the server-based patterns of influence in the network. This technique is helpful
systems. in marketing to enhance awareness of brands and products and
their adoption.
D. Social Media Analytics Link Prediction is a technique which predicts possible
It is the analysis of structured and unstructured content of future linkages between the existing entities in the social
social media channels such as social networks, social news, network [34]. A social network keeps growing over the time
blogs, micro blogs, media sharing, wikis, social bookmarking, and new edges and nodes keep adding up. The goal is to
question-and-answer sites and review sites [32]. Many mobile understand and predict the possible interactions, collaboration
apps also facilitate social interactions and therefore are social or influence among the participants of a social network in a
media channels. Research on social media is done in various given time period. It has been observed that these techniques
disciplines such as psychology, sociology, computer science, outperform pure chance by factors of 40-50 indicating that the
mathematics, economics, physics and anthropology. In recent present structure of the social network has potential
years, the results of social media analytics have been helpful information about future links [38]. By link predictions,
for marketing because of widespread adoption of social media recommendation systems are developed such as Facebook’s
by users throughout the world [33]. “People You May Know”, YouTube’s “Recommended for
In social media, the two sources of information are the You” and Netflix’s and Amazon’s recommender engines.
content (images, audios, customer feedbacks, product reviews, Besides the discussed techniques, other analytical
videos, bookmarks, sentiments etc.) generated by users and techniques are evolving with time. This area is the focus of
the relationships between the entities of network (people, many researchers and professionals currently. The future will
organizations, products etc.). The social media analytics can have cost effective efficient tools and techniques for analytics.
be categorized into two parts: Content-based analytics and
Structure-based analytics [34]. In content-based analytics, V. CONCLUSIONS
analytics is performed on the content posted by the users on Big data has invaluable information which if extracted
the social media platforms. Such content is of high volume, completely can make a big difference to business gains. In this
unstructured, noisy and dynamic nature. To extract insights paper, various aspects of big data analytics along with the role
from such data, the text, audio and video analytics techniques of unstructured data have been discussed. Various application
can be applied. The data processing challenges are addressed areas have been described. The analytical techniques and
by the big data technologies. In structure-based analytics, the approaches of unstructured data such as text, audio and video
focus is on the structural attributes of the social network. have been reviewed. These techniques include information
Insights are extracted from the relationships of the entities. extraction, summarization, question-answering and sentiment
A social network is represented by a graph consisting of a analysis for text analytics, the transcript-based and phonetic-
set of nodes representing the participants and a set of edges based approach for audio analytics, automatic video indexing
representing the relationships between the entities or and retrieval for video analytics and content based and
participants. The network graphs can further be categorized structure based approaches, community detection, social
into two kinds of graphs: Social graphs and Activity graphs influence analysis and link prediction for social media
[35]. In social graphs, the edges represent the existence of analytics. It has also been emphasized that there is a need of
links like friendship between the respective entities. knowledge discovery from unstructured data along with
Communities or hubs could be determined by such data. In structured data in businesses to gain competitive edge over
activity networks, actual interactions between entities are others. This fact has been supported by giving various real-life
represented by the edges. There is exchange of information examples. Unstructured data has a wealth of information and
between the linked entities such as likes and comments.
is in a very large fraction in comparison to structured data. In http://www.theorsociety.com/Pages/SpecialInterest/AnalyticsNet
work_anal%ytics.aspx
predictive analytics, both structured and unstructured data
[17] Akash, Bryan, Nishi, Prashant, Subhadeep and Vaibhav,
hold significance for extracting knowledge and making “Emerging Marketing Analytics- Big Data.” [Online]. Available:
predictions in business. Cost effective and efficient tools and http://www.slideshare.net/AkashTyagi8/big-data-marketing-
techniques are needed to analyze structured and unstructured analytics
[18] K. Cukier, “Data, data everywhere: A special report on managing
data in real time. With data increasing over time, opportunities
information,” The Economist, February 25, 2010.
are arising in every field whether it is businesses, medicines, [19] Steve Andriole, “Unstructured Data: The Other Side
research, weather forecasting, etc. Data is a natural resource of Analytics.” Forbes. [Online]. Available:
these days and if utilized effectively, the future will yield cost- http://www.forbes.com/sites/steveandriole/2015/03/05/the-other-
side-of-analytics/
effective and critical insights to all mankind problems. We
[20] W. Chung, “BizPro: Extracting and categorizing business
intend to work in this area in future and come up with better intelligence factors from textual news articles,” International
approaches to big data analytics using unstructured data. Journal of Information Management, vol. 34(2), pp. 272–284,
2014.
ACKNOWLEDGMENT [21] J. Jiang, “Information extraction from text,” in C. C. Aggarwal,
& C. Zhai (Eds.), Mining text data, Springer, pp. 11–41, 2012.
Authors express their deep sense of gratitude to the [22] U. Hahn and I. Mani, “The challenges of automatic
Founder President of Amity University, Dr. Ashok K. summarization,” Computer, vol. 33(11), pp. 29–36, 2000.
Chauhan for his keen interest in promoting research in the [23] B. Liu, “Sentiment analysis and opinion mining,” Synthesis
Lectures on Human Language Technologies, vol. 5(1), pp. 1–
Amity University and have always been an inspiration for 167, 2012.
achieving great heights. [24] J. Hirschberg, A. Hjalmarsson and N. Elhadad, “You’re as sick
as you sound: Using computational approaches for modeling
REFERENCES speaker state to gauge illness and recovery,” A. Neustein (Ed.),
[1] J. Gantz and D. Reinsel, “Extracting value from chaos,'' in Proc. Advances in speech recognition, Springer, pp. 305–322, 2010.
IDC iView, pp. 1-12, 2011. [25] H. A. Patil, “Cry baby: Using spectrographic analysis to assess
[2] Amir Gandomi and Murtaza Haider, “Beyond the hype: Big data neonatal health status from an infant’s cry,” in A. Neustein (Ed.),
concepts, methods, and analytics,” International Journal of Advances in speech recognition, Springer, pp. 323–348, 2010.
Information Management, vol. 35, Issue 2, pp 137-144, [26] B. K. Panigrahi, A. Abraham and S. Das, “Computational
April 2015. intelligence in power engineering,” Studies in Computational
[3] (2015) The Data Science Central website. [Online]. Available: Intelligence, Springer, vol. 302, 2010.
www.datasciencecentral.com [27] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C.
[4] D. Beaver, S. Kumar, H. C. Li, J. Sobel and P. Vajgel, “Finding a Roxburgh, et al., “Big data: The next frontier for innovation,
needle in haystack: Facebook’s photo storage,” in Proc. The competition, and productivity,” McKinsey Global Institute
nineth USENIX Conference on Operating Systems Design and Article, May 2011.
Implementation, Berkeley, CA, USA: USENIX Association, pp. 1– [28] YouTube Statistics (n.d.). [Online]. Available:
8, 2010. http://www.youtube.com/yt/press/statistics.html
[5] K. Rupanagunta, D. Zakkam, and H. Rao, “How to Mine [29] A. Hakeem, H. Gupta, A. Kanaujia, T. E. Choe, K. Gunda, A.
Unstructured Data,” Article in Information Management, June Scanlon, et al., “Video analytics for business intelligence,” Video
29, 2012. analytics for business intelligence , Springer, pp. 309–354, 2012.
[6] S. Marche, “Is Facebook making us lonely,'' Atlantic, vol. 309, [30] W. Hu, N. Xie, L. Li, X. Zeng and S. Maybank, “A survey on
no. 4, pp. 60-69, 2012. visual content-based video indexing and retrieval,” Systems,
[7] “IBM What Is Big Data: Bring Big Data to the Enterprise,” IBM, Man, and Cybernetics, Part C: Applications and Reviews, IEEE
2012. [Online]. Available: http://www- Transaction, vol. 41(6), pp. 797–819, 2011.
01.ibm.com/software/data/bigdata/. [31] Agent Comparative Analysis. [Online]. Available:
[8] Reena Duggal, Balvinder Shukla and Sunil Kumar Khatri, “Big http://www.agentvi.com/images/Video_Analytics_Architectures_
Data Analytics in Indian Healthcare System – Opportunities and Comparative_Analysis.pdf
Challenges” research paper accepted at National Conference on [32] G. Barbier, and H. Liu, “Data mining in social media,” C. C.
Computing, Communication and Information Processing Aggarwal (Ed.), Social network data analytics, Springer, pp.
(NCCCIP-2015) – May 2015: 92-104 327–352, 2011.
[9] Steve Pederson, “Exploiting Big Data from the Deep Web,” [33] W. He, S. Zha, and L. Li, “Social media competitive analysis and
BrightPlanet Corporation, Tech. Report, Dec 17, 2012. text mining: A case study in the pizza industry,” International
[10] Randy Rieland, "Big Data or Too Much Information," Journal of Information Management, vol. 33(3), pp. 464–472,
Smithsonian Magazine, May 7, 2012. 2013.
[11] Stephen Prentice, “From Data to Decision: Delivering Value [34] Charu C. Aggarwal, “An Introduction to Social Network Data
from ‘Big Data,’” Gartner Inc., March 28, 2012. Analysis,” Social Network Data Analytics, Springer, 2011.
[12] M. A. Beyer and D. W. Cearley, ‘“Big Data’ and Content Will [35] J. Heidemann, M. Klier, and F. Probst, “Online social networks:
Challenge IT across the Board,” Gartner Inc., February 15, 2012. A survey of a global phenomenon,” Computer Networks, vol.
[13] M. Blechar, M. Adrian, T. Friedman, W. R. Schulte and D. 56(18), pp. 3866–3878, 2012.
Laney, “Predicts 2012: Information Infrastructure and Big Data,” [36] C. C. Aggarwal, “An introduction to social network data
Gartner Inc., November 29, 2011. analytics,” C. C. Aggarwal (Ed.), Social network data analytics,
[14] Neerja Pawha Jetley, “How big data has changed India Springer, pp. 1–15, 2011.
elections.” CNBC News. [Online]. available: [37] S. Parthasarathy, Y. Ruan, and V. Satuluri, “Community
http://www.cnbc.com/2014/04/10/how-big-data-have-changed- discovery in social networks: Applications, methods and
india-elections.html emerging trends,” C. C. Aggarwal (Ed.), Social network data
[15] A. Labrinidis and H. V. Jagadish., “Challenges and opportunities analytics, Springer, pp. 79–113, 2011.
with big data,” in Proc. VLDB Endowment, vol. 5(12), pp. 2032– [38] D. Liben-Nowell, and J. Kleinberg, “The link prediction problem
2033, 2012. for social networks,” in Proc. The twelfth International
[16] (2013) G. Blackett.. Analytics Network- O. R. Analytics. Conference on Information and Knowledge Management, ACM,
[Online]. Available: pp. 556–559, 2003.

View publication stats

Você também pode gostar