Class Notes
BI technologies provide historical, current and predictive views of business operations. Common
functions of business intelligence technologies are reporting, online analytical processing,
analytics, data mining, process mining, complex event processing, business performance
management, benchmarking, text mining, predictive analytics and prescriptive analytics.
Though the term business intelligence is sometimes a synonym for competitive intelligence
(because they both support decision making), BI uses technologies, processes, and
applications to analyze mostly internal, structured data and business processes while
competitive intelligence gathers, analyzes and disseminates information with a topical focus on
company competitors. If understood broadly, business intelligence can include the subset of
competitive intelligence.
Business intelligence can be applied to the following business purposes, in order to drive
business value.
1. Measurement program that creates a hierarchy of performance metrics (see
also Metrics Reference Model) and benchmarking that informs business leaders
about progress towards business goals (business process management).
4. Collaboration program that gets different areas (both inside and outside the
business) to work together through data sharing and electronic data interchange.
5. Knowledge management program to make the company data driven through
strategies and practices to identify, create, represent, distribute, and enable
adoption of insights and experiences that are true business knowledge.
Knowledge management leads to learning management and regulatory
compliance.
In addition to the above, business intelligence can also provide a proactive approach, such as
an alarm function that alerts the end user immediately. There are many types of alerts; for
example, if some business value exceeds a threshold, the figure is shown in red in the report
and the business analyst is alerted. Sometimes an alert e-mail is sent to the user as well. This
end-to-end process requires data governance, which should be handled by experts.
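As a sketch of how such a threshold alert might work (the metric names and threshold values below are invented for illustration):

```python
# Sketch of a threshold-based BI alert. Metric names and thresholds are
# hypothetical; a governed system would read them from a rules table.

THRESHOLDS = {"daily_refunds": 10_000.0, "error_rate": 0.05}

def check_alerts(metrics: dict) -> list:
    """Return alert messages for every metric that breaches its threshold."""
    alerts = []
    for name, value in metrics.items():
        limit = THRESHOLDS.get(name)
        if limit is not None and value > limit:
            # In a report UI this figure would be rendered in red; an alerting
            # service could also e-mail the message to the analyst.
            alerts.append(f"ALERT: {name}={value} exceeds threshold {limit}")
    return alerts

print(check_alerts({"daily_refunds": 12_500.0, "error_rate": 0.01}))
```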
In computing, extract, transform, and load (ETL) refers to a process in database usage, and
especially in data warehousing, that extracts data from outside source systems, transforms it to
fit operational needs, and loads it into the end target, typically a data warehouse.
ETL systems are commonly used to integrate data from multiple applications, typically
developed and supported by different vendors or hosted on separate computer hardware.
Extract
The first part of an ETL process involves extracting the data from the source systems. In many
cases this is the most challenging aspect of ETL, since extracting the data correctly sets the
stage for the success of the subsequent processes.
Most data warehousing projects consolidate data from different source systems. Each separate
system may also use a different data organization and/or format. Common data source formats
are relational databases and flat files, but may include non-relational database structures such
as Information Management System (IMS) or other data structures such as Virtual Storage
Access Method (VSAM) or Indexed Sequential Access Method (ISAM), or even fetching from outside sources such as through web
spidering or screen-scraping. The streaming of the extracted data source and load on-the-fly to
the destination database is another way of performing ETL when no intermediate data storage
is required. In general, the goal of the extraction phase is to convert the data into a single format
appropriate for transformation processing.
An intrinsic part of the extraction involves parsing the extracted data to check whether it meets
an expected pattern or structure. If not, the data may be rejected entirely or in part.
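A minimal sketch of this parse-and-validate step, assuming a flat CSV source and the illustrative roll_no/age/salary columns used later in these notes:

```python
import csv
import io

# Sketch of the parse-and-validate step of extraction: each record pulled from
# a flat-file source is checked against the expected structure, and rows that
# do not match are rejected. Column names are illustrative.

def extract(raw: str):
    """Split source rows into (accepted, rejected) by simple structural checks."""
    accepted, rejected = [], []
    for row in csv.DictReader(io.StringIO(raw)):
        salary = row.get("salary") or ""
        # Reject rows with missing fields or a non-numeric salary.
        if None in row.values() or not salary.isdigit():
            rejected.append(row)
        else:
            accepted.append(row)
    return accepted, rejected

ok, bad = extract("roll_no,age,salary\n1,30,5000\n2,41,\n")
print(len(ok), len(bad))  # 1 1
```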
Transform
The transform stage applies a series of rules or functions to the extracted data from the source
to derive the data for loading into the end target. Some data sources require very little or even
no manipulation of data. In other cases, one or more of the following transformation types may
be required to meet the business and technical needs of the target database:
Selecting only certain columns to load (or selecting null columns not to load). For
example, if the source data has three columns (also called attributes), roll_no, age,
and salary, then the selection may take only roll_no and salary. Similarly, the
selection mechanism may ignore all those records where salary is not present
(salary = null).
Translating coded values (e.g., if the source system stores 1 for male and 2 for
female, but the warehouse stores M for male and F for female)
Encoding free-form values (e.g., mapping "Male" to "M")
Sorting
Joining data from multiple sources (e.g., lookup, merge) and deduplicating the data
Aggregation (for example, rollup: summarizing multiple rows of data, such as total sales for each store or region)
Transposing or pivoting (turning multiple columns into multiple rows or vice versa)
Splitting a column into multiple columns or records (e.g., converting a comma-separated
series of addresses in one record into single-address records in a linked
address table)
Looking up and validating the relevant data from tables or referential files for slowly
changing dimensions.
Applying any form of simple or complex data validation. If validation fails, it may
result in a full, partial or no rejection of the data, and thus none, some or all the data
are handed over to the next step, depending on the rule design and exception
handling. Many of the above transformations may result in exceptions, for example,
when a code translation parses an unknown code in the extracted data.
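Several of the transformations above (code translation, column selection, null filtering, aggregation) can be sketched as follows; the records and column names are invented for illustration:

```python
# Sketch of several transformations from the list above: code translation,
# column selection, null filtering, and aggregation. Records and column
# names are invented for illustration.

GENDER_CODES = {"1": "M", "2": "F"}  # translating coded values (1/2 -> M/F)

rows = [
    {"roll_no": 1, "age": 30, "salary": 5000, "gender": "1", "store": "A"},
    {"roll_no": 2, "age": 41, "salary": None, "gender": "2", "store": "A"},
    {"roll_no": 3, "age": 25, "salary": 7000, "gender": "2", "store": "B"},
]

# Selection: keep only some columns, and drop records where salary is null.
selected = [
    {"roll_no": r["roll_no"], "salary": r["salary"], "store": r["store"],
     "gender": GENDER_CODES[r["gender"]]}
    for r in rows
    if r["salary"] is not None
]

# Aggregation (rollup): total per store, summarizing multiple rows into one.
totals = {}
for r in selected:
    totals[r["store"]] = totals.get(r["store"], 0) + r["salary"]

print(totals)  # {'A': 5000, 'B': 7000}
```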
Load
The load phase loads the data into the end target, usually the data warehouse (DW). Depending
on the requirements of the organization, this process varies widely. Some data warehouses may
overwrite existing information with cumulative information; frequently, updating extracted data is
done on a daily, weekly, or monthly basis. Other data warehouses (or even other parts of the
same data warehouse) may add new data in a historical form at regular intervals, for example,
hourly. To understand this, consider a data warehouse that is required to maintain sales records
of the last year. This data warehouse overwrites any data older than a year with newer data.
However, the entry of data for any one-year window is made in a historical manner. The timing
and scope to replace or append are strategic design choices dependent on the time available
and the business needs. More complex systems can maintain a history and audit trail of all
changes to the data loaded in the data warehouse.
As the load phase interacts with a database, the constraints defined in the database schema
as well as in triggers activated upon data load apply (for example, uniqueness, referential
integrity, mandatory fields), which also contribute to the overall data quality performance of the
ETL process.
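A small sketch of constraint enforcement during load, using an in-memory SQLite table (table and column names are illustrative):

```python
import sqlite3

# Sketch of the load phase: constraints defined in the target schema (here a
# primary key) are enforced as rows are loaded, and violations feed the ETL
# exception handling. Table and column names are illustrative.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (roll_no INTEGER PRIMARY KEY, salary INTEGER NOT NULL)")

rows = [(1, 5000), (3, 7000), (1, 9999)]  # the third row violates uniqueness
loaded, rejected = 0, 0
for row in rows:
    try:
        conn.execute("INSERT INTO sales VALUES (?, ?)", row)
        loaded += 1
    except sqlite3.IntegrityError:
        rejected += 1  # routed to exception handling in a real pipeline
conn.commit()
print(loaded, rejected)  # 2 1
```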
Another way that companies use ETL is to move information to another application
permanently. For instance, the new application might use another database vendor
and most likely a very different database schema. ETL can be used to transform the
data into a format suitable for the new application to use.
An example of this would be an Expense and Cost Recovery System (ECRS) such
as those used by accountancies, consultancies and law firms. The data usually ends up in
the time and billing system, although some businesses may also utilize the raw data
for employee productivity reports to Human Resources (personnel dept.) or
equipment usage reports to Facilities Management.
Data mining (the analysis step of the "Knowledge Discovery and Data Mining" process, or
KDD), an interdisciplinary subfield of computer science, is the computational process of
discovering patterns in large data sets involving methods at the intersection of artificial
intelligence, machine learning, statistics, and database systems. The overall goal of the data
mining process is to extract information from a data set and transform it into an understandable structure for further use.
[Pages 19-23: charts and commentary on the F-Score back test (the bell-shaped score distribution, strategy performance versus the index, and EJFQ-based variants); the text did not survive extraction, and its key findings are restated in the section below.]
Profitability
Taking the US stock market in 1976-1996 as an example, by adopting the F-Score screening
method to buy high-scoring stocks and short-sell the low-scoring ones, the average annual rate
of return is up to 23%, compared with roughly 14.5% for the S&P 500 Index over the same
period.
1. Since companies with smaller capitalization tend to present more valuation traps, the
back-testing universe was restricted to the roughly 300 stocks in the Hang Seng
Composite Index with a relatively large market capitalization.
2. Calculation is only applied to companies whose financial year ends in
November/December (with the corresponding annual results announced before May in
the ensuing year), so that a complete F-Score can be calculated.
Efficacy is measured with respect to the annual change in the stock price starting from May
in the ensuing year. A 30% stop-loss risk-management rule is also applied.
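The measurement rule above (annual return measured from May, with a 30% stop-loss) can be sketched as follows, using made-up prices:

```python
# Sketch of the measurement rule: the annual return is taken over the holding
# window starting in May, but a 30% stop-loss closes the position early if the
# price falls 30% below entry. The price series below is invented.

STOP_LOSS = 0.30

def holding_return(prices: list) -> float:
    """Return on a position entered at prices[0], with a 30% stop-loss."""
    entry = prices[0]
    for p in prices[1:]:
        if p <= entry * (1 - STOP_LOSS):
            return p / entry - 1  # stopped out before the window ends
    return prices[-1] / entry - 1  # held for the full window

print(holding_return([100, 90, 65, 120]))  # stopped out at 65 -> -0.35
```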
A bell-shaped distribution is observed: most companies have an F-Score in the range of 3-7,
and only a small number (generally less than one percent) score 8-9 or 0-2 (so far no company
has scored 0). Focusing on the small number of stocks bearing the extreme scores of 8-9 and
0-2 may therefore achieve higher efficiency in portfolio management. The following two
investment strategies have been tested for their efficacy against the performance of the Hang
Seng Index:
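For reference, the F-Score here is Piotroski's nine-signal score: each of nine fundamental tests contributes one point when it passes, giving a total of 0-9. A minimal sketch, with invented field names standing in for financial-statement items:

```python
# Illustrative computation of the Piotroski F-Score: nine binary fundamental
# signals, each adding one point when the test passes (total 0-9). Field
# names are invented stand-ins for financial-statement items.

def f_score(cur: dict, prev: dict) -> int:
    s = 0
    # Profitability (4 signals)
    s += cur["roa"] > 0                                # positive return on assets
    s += cur["cfo"] > 0                                # positive operating cash flow
    s += cur["roa"] > prev["roa"]                      # improving ROA
    s += cur["cfo"] / cur["assets"] > cur["roa"]       # cash flow exceeds net income
    # Leverage, liquidity and source of funds (3 signals)
    s += cur["leverage"] < prev["leverage"]            # falling long-term debt ratio
    s += cur["current_ratio"] > prev["current_ratio"]  # improving liquidity
    s += cur["shares"] <= prev["shares"]               # no new shares issued
    # Operating efficiency (2 signals)
    s += cur["gross_margin"] > prev["gross_margin"]    # improving margin
    s += cur["asset_turnover"] > prev["asset_turnover"]  # improving turnover
    return int(s)

cur = {"roa": 0.10, "cfo": 15.0, "assets": 100.0, "leverage": 0.20,
       "current_ratio": 2.0, "shares": 100, "gross_margin": 0.40,
       "asset_turnover": 1.2}
prev = {"roa": 0.05, "leverage": 0.30, "current_ratio": 1.5, "shares": 100,
        "gross_margin": 0.30, "asset_turnover": 1.0}
print(f_score(cur, prev))  # 9
```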
In the past 10 years, the cumulative performance of Strategies 1 and 2 has been superior to the
HSI [Figure 4]. It is worth noting that, since the F-Score strategy can often pick stocks with
explosive rising trends, the cumulative performance over the years can far outstrip that of the
HSI. For Strategy 1, underperformance versus the HSI was recorded only in 2006 (i.e., early
May 2006 to end of April 2007) in the decade. For Strategy 2, underperformance was recorded
only in the three years 2007, 2008 and 2011. In other words, the effectiveness of the F-Score
applied to Hong Kong stocks is very significant.
Two more strategies have also been attempted, based on including the EJFQ signals in the
stock-selection process:
[Figure: cumulative performance of the Hang Seng Index and Strategies 1-4]
Similar to Strategies 1 and 2, Strategies 3 and 4 also outperform the HSI to a significant extent.
For Strategy 3, underperformance versus the HSI was recorded only in 2003; for Strategy 4,
only in 2008 and 2011.
Excluding factors such as transaction costs and share dividends, Strategy 3 is the best
according to the above back test. Certainly, including transaction costs and dividends would
provide a more comprehensive assessment.