
Coursework – Question 2

Introduction
According to the Journal of Business Research, “Big data refers to datasets that are both big and high in variety and velocity, which makes them difficult to handle using traditional tools and techniques” (Janssen, 2017). In my opinion, the simplest definition, and my own personal one, is that big data describes vast amounts of both structured and unstructured data from internal and external data points, which can then be used in business analytics to give companies advanced knowledge and, ultimately, wisdom.

The company SAS states that the importance of big data lies in how the data you have is utilised, rather than in how much you have, and that it can be used to determine the root causes of failures, issues and defects in near-real time, to generate coupons at the point of sale based on customers’ buying habits, and to recalculate entire risk portfolios in minutes (SAS Institute, 2017).

Big Data Explained

Distinction of data forms


First of all, it is important to distinguish between data, ‘big data’, information and knowledge. Checkland and Holwell (Checkland & Holwell, 2006) describe a hierarchy in which data is the lowest form in an information system, followed by capta, information and finally knowledge. Data are simply facts waiting to be analysed or referenced, and they form the most basic foundation of the hierarchy. Capta is a coined term that enriches data by naming the facts selected for specific attention, or by creating a new category, such as sales data being separated into regions. Information refers to facts that have meaning and refer to something specific; data or capta has to be processed and contextualised before meaning can be derived from it. Finally, there is knowledge, the last stage of the hierarchy (wisdom is a theoretically higher state of intelligence and understanding, but it is not applied in this particular theory). Knowledge has greater longevity and is larger in scale than information, and as such it gives companies a broad overview, context, insight and understanding of the information that has been provided (Frost, 2017).
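
To make the hierarchy concrete, the sketch below (Python, with invented sales figures used purely for illustration) shows raw data being selected into capta, echoing the regional sales example above, and then summarised into information a manager could act on.

```python
# Illustrative only: invented sales records standing in for raw data.
raw_data = [
    {"region": "North", "amount": 120.0},
    {"region": "South", "amount": 85.5},
    {"region": "North", "amount": 64.0},
    {"region": "South", "amount": 210.25},
]

# Capta: facts selected for specific attention -- here, only the northern region.
capta = [record for record in raw_data if record["region"] == "North"]

# Information: the capta processed and given context so meaning can be derived.
north_total = sum(record["amount"] for record in capta)
print(f"North region sales total: {north_total}")  # 184.0
```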

Information systems serving data analytics


Big data is characterised by four attributes, the four ‘V’s’, according to the Journal of the Association for Information Systems (Abbasi, et al., 2016): volume, variety, velocity and veracity. Sometimes only three ‘V’s’ are used, as in Forbes’ explanation of big data (What Exactly Is Big Data?, 2017). Volume is the amount of data that is obtained, variety is the different kinds of data available, velocity is the speed at which the data is generated, and veracity is the credibility and reliability of the data generated.

Referring again to the Journal of the Association for Information Systems (Abbasi, et al., 2016), it is stated that “its four “V” characteristics have had a profound impact on the people, processes, and technologies related to the information value chain.” In other words, once the data has been assessed against the four ‘V’s’, it can be derived from the big data and put to use by companies in support of data analytics.
Standard information systems that do not use big data generally record only the specific data they need, which almost always serves a single purpose: collecting receipts, sales data, customer information, work rota data, primary research data, and so on. That data is then used for that specific function and to investigate a pre-existing hypothesis. Big data inverts this rule: the data collected is used to form hypotheses, from which further information is gathered and, in turn, knowledge is gained. In direct contrast, the hypothesis in older information systems is already set, and the data collected is mostly used to answer that one specific hypothesis (Khan, 2015). The advantage of the standard method is that businesses are not overwhelmed by the four V’s and, with correct data, the answer is easily accessible. No excess data means lower costs for storing it and lower costs for obtaining such a vast volume of it. Big data, however, will only expand the financial strength of a company if it is properly utilised; having all the data in the world is meaningless if no information or knowledge is gained from it. Used effectively and efficiently, though, big data can push back the financial limits of any company in the world.

Application of big data


As mentioned previously, the three fundamental pillars of big data are volume, velocity and variety, and, following the references already cited, I have also chosen to include veracity. The BBC wrote an article (BBC News, 2011) detailing how the application of big data from SAS Institute, according to their chief technology officer at the time, can be used to improve business models and answer business queries. The article states that they worked with a bank that required customer satisfaction information to “understand how to minimise the times a machine runs out of cash, and project when the device fails”; after interpreting the data, they brought about a £2 million reduction in maintenance costs. This shows the capability that big data can provide, and it is a justification for being analytically minded as a business in order to reduce costs across various aspects of the company.

SAS Institute, a reference I have already used, explains how big data works, who uses it, its importance and its history; here I refer to how it works (SAS Institute, 2017). SAS Institute says that the data sources for big data are generally streaming data (data that reaches company IT systems from a web of connected devices), social media data (unstructured or semi-structured data drawn from all social media sources around specific parameters, e.g. a certain location or the name of the company) and publicly available sources (data accessible through open data portals such as data.gov and the European Union Open Data Portal). All of these lead the company to consider how to store and manage the newly acquired data; this storage is the volume aspect of big data. A company must also decide how much of the data should be analysed; with high enough performance technology, such as grid computing or in-memory analytics, all of the data collected could be analysed. It is said that only 0.5% of all aggregated data is ever analysed (What Exactly Is Big Data?, 2017). Finally, the company must know how to use the newly found information, and a strategy should be put in place to optimise that information and convert it into business knowledge.
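
As a minimal sketch of that pipeline, the snippet below uses the pandas library to pull a hypothetical file from an open data portal and turn it into a simple business summary; the URL and the column names are my own assumptions for illustration, not a real dataset.

```python
import pandas as pd

# Hypothetical open-data file; the URL and column names are placeholders only.
SOURCE = "https://example.org/open-data/transactions.csv"

# Acquire and store the data (the volume aspect): here it is simply loaded into memory.
df = pd.read_csv(SOURCE)

# Decide how much of the data to analyse by selecting a relevant subset.
recent = df[df["year"] == 2017]

# Convert the selection into information a business could act on.
summary = recent.groupby("category")["value"].sum().sort_values(ascending=False)
print(summary.head())
```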

Also in the BBC article (BBC News, 2011), a statement reads: “It’s about bringing analytics to specific business problems. We had very good success with this in the retail space, and also helping banks fighting credit card fraud.” In addition it says: “There is an explosion in the understanding in the value of analytics. One problem is actually acquiring enough talent to deal with the demand.” This shows that, even with the new insights available, the data must be analysed by a trained team in order to maximise the potential for cost reductions; otherwise the data is essentially meaningless.

Difficulties surrounding reliable data and business analytics


Big data is the present and future of data analysis, but it comes at a hefty price for new companies getting to grips with the approach. The main issue is seen in the fourth ‘V’, veracity, whereby it is hard to gauge how credible a certain piece of data is. Unreliable data is more dangerous than having no data at all, because it can lead to a result or application that appears to reduce costs within the data set but actually increases costs for the company, since the analysis has been misdirected and the focus is too broad. Unreliable data is also not something that can simply be filtered out with a single parameter; with such a large volume of data, it is inevitable that some of it, potentially even most of it, will be misleading and unreliable.

As an additional resource, the company insideBIGDATA gives its own definition of veracity (Normandeau, 2013): “Big Data Veracity refers to the biases, noise and abnormality in data. Is the data that is being stored, and mined meaningful to the problem being analysed. Inderpal feel veracity in data analysis is the biggest challenge when compared to things like volume and velocity. In scoping out your big data strategy you need to have your team and partners work to help keep your data clean and processes to keep ‘dirty data’ from accumulating in your systems”. The claim that veracity “is the biggest challenge” is evidence that, if it is monitored incorrectly, the value of the analysis will be greatly diminished.
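
A minimal sketch of the kind of cleaning process the quotation describes is shown below; the field names and validity rules are my own assumptions, purely to illustrate keeping ‘dirty data’ out of the system.

```python
# Assumed record structure and validity rules, purely illustrative.
def is_credible(record):
    """Reject records that are obviously noisy or abnormal before they are stored."""
    if record.get("amount") is None or record["amount"] < 0:
        return False          # impossible value: noise
    if not record.get("source"):
        return False          # unknown provenance: veracity cannot be judged
    return True

incoming = [
    {"amount": 49.99, "source": "till_07"},
    {"amount": -12.00, "source": "till_07"},   # dirty: negative sale
    {"amount": 15.50, "source": ""},           # dirty: no provenance
]

clean = [r for r in incoming if is_credible(r)]
print(len(clean), "of", len(incoming), "records kept")  # 1 of 3 records kept
```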

Correlation does not equal causation. For example, if sales rise when the temperature is over 20 degrees, it does not automatically mean that people are more likely to buy the company’s products in warm weather; another causal link may underlie the data, such as a new advertising campaign that positively affects sales over that period. If data is analysed incorrectly and money is then spent on the basis of that unreliable data, the company is effectively going in blind under a false perception of knowledge.
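
The point can be shown numerically. In the invented figures below (not real sales data), temperature and sales move together, but so does advertising spend, so the correlations alone cannot say which factor actually drives sales.

```python
import numpy as np

# Invented weekly figures, purely for illustration.
temperature = np.array([15, 18, 21, 23, 25, 27])
advert_spend = np.array([1.0, 1.2, 2.5, 2.8, 3.0, 3.4])  # campaign ramps up in warm weeks
sales = np.array([100, 110, 160, 170, 185, 200])

# Both correlations are strongly positive, so the data alone cannot
# separate the two candidate explanations for rising sales.
print(np.corrcoef(temperature, sales)[0, 1])
print(np.corrcoef(advert_spend, sales)[0, 1])
```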

This new data requires companies either to outsource data analysts or to expand their own staff roster with in-house data analysts, who must be supported with high-quality, high-performance technology. If the data analysts are outsourced, the costs will be higher in the long run, but if a company cannot afford the start-up costs of building up its own analysts, outsourcing may be its only option. SAS Institute (SAS Institute, 2017) runs down a list of considerations to help smooth big data analysis, suggesting: ‘Cheap, abundant storage, faster processors, affordable open source, distributed big data platforms such as Hadoop (a big data software framework), parallel processing, clustering, MPP (massive parallel processing), virtualisation, large grid environments, high connectivity and throughputs, and finally cloud computing and other flexible resource allocation arrangements’. These technological systems will cost a large amount of money for the majority of companies that are not already heavily invested in the IT sector.
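
To give a flavour of the parallel-processing idea behind platforms such as Hadoop, the toy sketch below uses only Python’s standard library (it is not Hadoop itself): a large workload is split across several worker processes and the partial results are then combined.

```python
from multiprocessing import Pool

def partial_sum(chunk):
    """Each worker processes one slice of the data independently (the map step)."""
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))                      # stand-in for a large dataset
    chunks = [data[i::4] for i in range(4)]            # split the work four ways
    with Pool(processes=4) as pool:
        partials = pool.map(partial_sum, chunks)       # map: process chunks in parallel
    total = sum(partials)                              # reduce: combine partial results
    print(total)
```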

The costs can be recouped through the smart implementation of ideas and processes derived from the big data analysed, but it is vital that the analysts understand what the company needs, and that the veracity of the data is thoroughly checked before it is used in future developments.
Conclusion
The advantages available through big data are potentially huge, as long as all four V’s are monitored and managed regularly, quickly and attentively. The information provided must be fully utilised and thoroughly examined by hard-working and knowledgeable data analysts, who refine it into knowledge for the business in question; only then will the benefits be fully realised, and only then is the company likely to see financial gains from the practice.

The costs associated with implementing a big data system are large, especially when all of the technological tools and systems needed to fully optimise the analysis of an ever-growing volume of data are considered, but knowledge on such a scale is worth the money required. The knowledge acquired will improve the functioning of nearly every aspect of the business. Most importantly, beyond the financial aspect of big data, applying that knowledge will lead to smarter and better-informed decision making, which is the bedrock of any safe, stable and trustworthy company.
