S.NO  TITLE
1     Abstract
2     Introduction
3     Existing system
4     Disadvantages
5     Proposed system
6     Advantages
7     System Architecture
8     Flow Diagram
9     Use case Diagram
10    Class Diagram
11    Sequence Diagram
12    ER diagram
13    Testing Of Product
14    Modules
15    Modules Description
16    Algorithm Description
17    Software Requirements
18    Hardware Requirements
19    H/W & S/W Description
20    Literature Survey
21    Screen Shots
22    Future Enhancement
23    Conclusion
24    References
INTRODUCTION
Web spam refers to all forms of malicious manipulation of user-generated
data so as to influence usage patterns of the data. The number of mobile Apps has
grown at a breathtaking rate over the past few years. To stimulate the
development of mobile Apps, many App stores have launched daily App leaderboards,
which show the chart rankings of the most popular Apps. Indeed, the App
leaderboard is one of the most important ways of promoting mobile Apps. A higher
rank on the leaderboard usually leads to a huge number of downloads and millions
of dollars in revenue. Therefore, App developers tend to explore various ways,
such as advertising campaigns, to promote their Apps in order to have them
ranked as high as possible in such App leaderboards. Indeed, our careful observation
reveals that mobile Apps are not always ranked high in the leaderboard, but only
in some leading events, which form different leading sessions. Note that we will
introduce both leading events and leading sessions in detail later. In
other words, ranking fraud usually happens in these leading sessions. Therefore,
detecting ranking fraud of mobile Apps is actually to detect ranking fraud within
the leading sessions of mobile Apps.
Several recent studies have pointed out that advertising
in mobile (smartphone and tablet) apps is plagued by various types of frauds.
Mobile app advertisers are estimated to lose nearly 1 billion dollars (12% of the
mobile ad budget) in 2013 due to these frauds. The frauds fall under two main
categories: (1) Bot-driven frauds employ bot networks or paid users to initiate fake
ad impressions and clicks (more than 18% impressions/clicks come from bots), and
(2) Placement frauds manipulate visual layouts of ads to trigger ad impressions and
unintentional clicks from real users (47% of user clicks are reportedly accidental).
Mobile app publishers are incentivized to commit such frauds since ad networks
pay them based on impression count, click count, or more commonly, combinations
of both. Bot-driven ad frauds have been studied recently, but placement frauds in
mobile apps have not received much attention from the academic community. In
this paper, we make two contributions. First, we present the design and
implementation of a scalable system for automatically detecting ad placement
fraud in mobile apps. Second, using a large collection of apps, we characterize the
prevalence of ad placement fraud.
Detecting ad fraud. In Web advertising, most fraud
detection is centered on analyzing server-side logs or network traffic, which are
mostly effective for detecting bot-driven ads. These can also reveal placement
frauds to some degree (e.g., an ad not shown to users will never receive any
clicks), but such detection is possible only after fraudulent impressions and clicks
have been created. While this may be feasible for mobile apps, we explore a
qualitatively different approach: to detect fraudulent behavior by analyzing the
structure of the app, an approach that can detect placement frauds more effectively
and before an app is used (e.g., before it is released to the app store). Our approach
leverages the highly specific, and legally enforceable, terms and conditions that ad
networks place on app developers. For example, Microsoft Advertising says
developers must not edit, resize, modify, filter, obscure, hide, make transparent, or
reorder any advertising. Despite these prohibitions, app developers continue to
engage in fraud: Figure shows (on the left) an app in which 3 ads are shown at the
bottom of a page while ad networks restrict developers to 1 per page, and (on the
right) an app in which an ad is hidden behind UI buttons. The key insight in our
work is that manipulation of the visual layout of ads in a mobile app can be
programmatically detected by combining two key ideas: (a) a UI automation tool
that permits automated traversal of all the pages of a mobile app, and (b)
extensible fraud checkers that test the visual layout of each page for compliance
with an ad network's terms and conditions. While we use the term ad fraud, we
emphasize that our work deems as fraud any violation of published terms and
conditions, and does not attempt to infer whether the violations are intentional or
not. This survey paper categorizes, compares, and summarizes from almost all
published technical and review articles in automated fraud detection within the last
10 years. It defines the professional fraudster, formalizes the main types and
subtypes of known fraud, and presents the nature of data evidence collected within
affected industries. Within the business context of mining the data to achieve
higher cost savings, this research presents methods and techniques together with
their problems. Compared to all related reviews on fraud detection, this survey
covers much more technical articles and is the only one, to the best of our
knowledge, which proposes alternative data and solutions from related domains.
Data mining is about finding insights which are statistically reliable, unknown
previously, and actionable from data. This data must be available, relevant,
adequate, and clean. Also, the data mining problem must be well-defined, cannot
be solved by query and reporting tools, and guided by a data mining process
model. The term fraud here refers to the abuse of a profit organization's system
without necessarily leading to direct legal consequences. In a competitive
environment, fraud can become a business critical problem if it is very prevalent
and if the prevention procedures are not fail-safe. Fraud detection, being part of the
overall fraud control, automates and helps reduce the manual parts of a
screening/checking process. This area has become one of the most established
industry/government data mining applications. It is impossible to be absolutely
certain about the legitimacy of and intention behind an application or transaction.
Given the reality, the best cost effective option is to tease out possible evidences of
fraud from the available data using mathematical algorithms. Evolved from
numerous research communities, especially those from developed countries, the
analytical engine within these solutions and software are driven by artificial
immune systems, artificial intelligence, auditing, database, distributed and parallel
computing, econometrics, expert systems, fuzzy logic, genetic algorithms, machine
learning, neural networks, pattern recognition, statistics, visualization and others.
There are plenty of specialized fraud detection solutions and software which
protect businesses such as credit card, e-commerce, insurance, retail,
telecommunications industries. There are often two main criticisms of data
mining-based fraud detection research: the dearth of publicly available real data to perform
experiments on; and the lack of published well-researched methods and techniques.
To counter both of them, this paper garners all related literature for categorization
and comparison, selects some innovative methods and techniques for discussion;
and points toward other data sources as possible alternatives. Many Android
applications are distributed for free but are supported by advertisements. Ad
libraries embedded in the app fetch content from the ad provider and display it on
the app's user interface. The ad provider pays the developer for the ads displayed to
the user and ads clicked by the user. A major threat to this ecosystem is ad fraud,
where a miscreant's code fetches ads without displaying them to the user
or "clicks" on ads automatically. Ad fraud has been extensively studied in the
context of web advertising but has gone largely unstudied in the context of mobile
advertising. Online advertising is a financial pillar that supports both free Web
content and services, and free mobile apps. Both web and mobile advertising use a
similar infrastructure: the ad library embedded in the web page or mobile app
fetches content from ad providers and displays it on the web page or the mobile
app's user interface. The ad provider pays the developer for the ads displayed
(impressions) and the ads clicked (clicks) by the user. Because web and mobile
advertising use a similar infrastructure, they are subject to the same security
concerns, such as tracking and privacy infringements. Perhaps the biggest threat to
the sustainability of this ecosystem is ad fraud, where a miscreant's code fetches
ads without displaying them to the user or "clicks" on ads programmatically. Ad
fraud has been extensively studied in the context of web advertising but has gone
largely unstudied in the context of mobile advertising. On the web, ad fraud is
often perpetrated by botnets, which are collections of compromised user machines
called bots. Fraudsters issue fabricated impressions and clicks using bots so that
the traffic they generate is varied (i.e., by IP address), making the fraud harder to
detect. We take the first step to study fraud and other undesirable behavior in
mobile advertising. First, we identify unique characteristics of mobile ad fraud. On
Android, at any time at most one app is running in the foreground, where the app
has a UI. Our first observation is that when an app fetches ads while it is in the
background, this is most likely fraudulent, because the app developer gets credit
for this ad impression without displaying it to the user. Our second observation is
that when an app clicks an ad without user interaction, it is definitely fraudulent.
Based on our observations, we set out to measure the prevalence of ad fraud in the
wild. We use two sets of apps: 1) 130,339 apps crawled from 19 Android markets
including Play and many third-party markets, and 2) 35,087 apps that likely
contain malware provided by a security company. We build a testing infrastructure,
where we launch multiple instances of the Android emulator concurrently. In each
emulator, we install an app from our datasets, run it for a fixed time, push it to the
background, and continue running for a fixed time, while capturing all the network
traffic from the emulator. Finally, we extract impressions, clicks, and other ad
related activities from the network traffic.
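The two observations above (ad fetches while the app is in the background, and clicks without user interaction) can be written as a small rule set. The following is a minimal sketch, assuming a hypothetical `AdEvent` record has already been extracted from the captured traffic; the field names and the returned labels are illustrative assumptions and are not taken from the measurement system described here.

```python
from dataclasses import dataclass


@dataclass
class AdEvent:
    """Hypothetical record for one ad-related event found in the traffic."""
    kind: str                 # "impression" or "click"
    app_in_foreground: bool   # was the app visible when the event occurred?
    user_interaction: bool    # did a user input precede the event?


def classify(event: AdEvent) -> str:
    """Apply the two observations as rules; labels are illustrative only."""
    # Second observation: a click without user interaction is fraudulent.
    if event.kind == "click" and not event.user_interaction:
        return "fraudulent"
    # First observation: an ad fetched in the background is most likely fraud.
    if event.kind == "impression" and not event.app_in_foreground:
        return "likely fraudulent"
    return "no evidence"
```

For example, `classify(AdEvent("impression", app_in_foreground=False, user_interaction=False))` returns `"likely fraudulent"`, mirroring the wording of the first observation.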
OVERVIEW
Several recent studies have pointed out that advertising in mobile apps is plagued
by various types of frauds. In Web advertising, most fraud detection is centered
around analyzing server-side logs or network traffic, which are mostly effective for
detecting bot-driven ads. These can also reveal placement frauds to some degree,
but such detection is possible only after fraudulent impressions and clicks have
been created.
While this may be feasible for mobile apps, we explore a qualitatively different
approach: to detect fraudulent behavior by analyzing the structure of the app, an
approach that can detect placement frauds more effectively and before an app is
used. Our approach leverages the highly specific, and legally enforceable, terms
and conditions that ad networks place on app developers.
OBJECTIVE
Detect ranking fraud in the daily App leaderboard and prevent ranking manipulation. The
proposed framework is scalable and can be extended with other domain-generated
evidences for ranking fraud detection. A further objective is to show the scalability
of the detection algorithm as well as some regularity of ranking fraud activities.
Application:
This work focuses on research in data and knowledge engineering, developing
effective and efficient data analysis techniques for emerging data-intensive
applications. The Google baseline is used for
evaluating the effectiveness of our ranking aggregation method.
EXISTING SYSTEM
This project proposes techniques to accurately locate ranking fraud by
mining the active periods, namely leading sessions, of mobile Apps. The proposed
approach investigates three types of evidences, i.e., ranking based evidences, rating
based evidences and review based evidences, by modeling Apps' ranking, rating
and review behaviors through statistical hypothesis tests. It also proposes an
optimization based aggregation method. The proposed framework is scalable and
can be extended with other domain-generated evidences for ranking fraud
detection. A critical challenge along this line is that the context log of each
individual user may not contain sufficient data for mining his/her context-aware
preferences. Therefore, we propose to first learn common context-aware
preferences from the context logs of many users. Then, the preference of each user
can be represented as a distribution of these common context-aware preferences.
Specifically, we develop two approaches for mining common context-aware
approach and indicate some inspiring findings. In contrast, the robustness of CIAP
is not good with small numbers of common context-aware preferences but
becomes stable when the setting of the number increases. This may be because
CDAP leverages associations between contexts and user content categories for
extracting common context-aware preferences and such associations have been
filtered from noisy data. Thus, the quality of mined common context-aware
preferences is always relatively good with different parameters since the mining
are on the basis of pruned training data. In contrast, CIAP leverages ACP-features
for extracting common context-aware preferences, where ACP-features usually
contain more noisy information and thus make the mining results more sensitive to
parameters. Note that, raw locations in context data, such as GPS coordinates or
cell IDs, have been transformed into semantic locations such as Home and
Work Place by some location mining approaches. The basic idea of these
approaches is to find the clusters of user locations and recognize their semantic
meaning by a time pattern analysis. Moreover, we also map the raw usage records
to the usage records of particular categories of contents. In this way, the context
data and usage records in context logs are normalized and the data sparseness
problem is somewhat alleviated. In this paper, we proposed to exploit user context
logs for mining the personal context-aware preferences of mobile users. First, we
identified common context-aware preferences from the context logs of many users.
Then, the personal context-aware preference of an individual user can be
represented as a distribution of common context-aware preferences. Finally, the
experimental results on a real-world data set clearly showed that the proposed
approach could achieve better performances than benchmark methods for mining
personal context-aware preferences, and the one implementation based on the
independent assumption of context data slightly outperforms another one but has
relatively higher computational cost. In this paper, we illustrate how to extract
personal context-aware preferences from the context-rich device logs for building
novel personalized context-aware recommender systems.
Disadvantages:
1. It is not easy to identify and confirm the evidences linked to ranking fraud.
2. Ranking fraud does not always happen in the whole life cycle of an App.
3. When an App is promoted with the help of ranking manipulation, it can reach
the top of the leaderboard and attract more new users to purchase it.
PROPOSED SYSTEM
In the proposed system, we develop a ranking fraud detection
system for mobile Apps. Ranking fraud does not always happen in the whole life
cycle of an App, so the system needs to detect the time when fraud happens. Indeed,
our careful observation reveals that mobile Apps are not always ranked high in the
leaderboard, but only in some leading events, which form different leading
sessions. Specifically, the system first uses a simple but effective algorithm to
identify the leading sessions of each App based on its historical ranking records.
Ranking fraud in the mobile App market refers to fraudulent or deceptive activities
which have a purpose of bumping up the Apps in the popularity list. Indeed, it
becomes more and more frequent for App developers to use shady means, such as
inflating their Apps' sales or posting phony App ratings, to commit ranking fraud.
While the importance of preventing ranking fraud has been widely recognized,
there is limited understanding and research in this area. To this end, in this paper,
we provide a holistic view of ranking fraud and propose a ranking fraud detection
system for mobile Apps. Specifically, we first propose to accurately locate the
ranking fraud by mining the active periods, namely leading sessions, of mobile
Apps. Such leading sessions can be leveraged for detecting the local anomaly
instead of global anomaly of App rankings. Furthermore, we investigate three types
of evidences, i.e., ranking based evidences, rating based evidences and review
based evidences, by modeling Apps' ranking, rating and review behaviors through
statistical hypotheses tests. In addition, we propose an optimization based
aggregation method to integrate all the evidences for fraud detection. Finally, we
evaluate the proposed system with real-world App data collected from the iOS App
Store for a long time period. In the experiments, we validate the effectiveness of
the proposed system, and show the scalability of the detection algorithm as well as
some regularity of ranking fraud activities. A key step for the mobile app usage
analysis is to classify apps into some predefined categories. However, it is a nontrivial task to effectively classify mobile apps due to the limited contextual
information available for the analysis. To this end, in this paper, we propose an
approach to first enrich the contextual information of mobile apps by exploiting
the additional Web knowledge from the Web search engine. Then, inspired by the
observation that different types of mobile apps may be relevant to different
real-world contexts, we also extract some contextual features for mobile apps from the
context-rich device logs of mobile users. Finally, we combine all the enriched
contextual information into a Maximum Entropy model for training a mobile app
classifier. The experimental results based on 443 mobile users' device logs clearly
show that our approach outperforms two state-of-the-art benchmark methods with
a significant margin. To this end, in this paper, we propose to leverage both Web
knowledge and real-world contexts for enriching the contextual information of
apps, thus can improve the performance of mobile app classification. According to
Advantages:
1
SYSTEM ARCHITECTURE
[Figure: system architecture diagram; surviving labels: User Rating, User Review, End]
SEQUENCE DIAGRAM
[Figure: sequence diagram]
E-R DIAGRAM
[Figure: E-R diagram]
TESTING OF PRODUCT
System testing is the stage of implementation that is aimed at
ensuring that the system works accurately and efficiently before live operation
commences. Testing is the process of executing a program with the intent of finding
an error. A good test case is one that has a high probability of finding an error. A
successful test is one that uncovers an as-yet-undiscovered error.
Testing is vital to the success of the system. System testing makes a
logical assumption that if all parts of the system are correct, the goal will be
successfully achieved. The candidate system is subject to a variety of tests: on-line
response, volume, stress, recovery, security and usability tests. A series of tests
is performed before the system is ready for user acceptance testing. Any
engineered product can be tested in one of the following ways. Knowing the
specified function that a product has been designed to perform, tests can be conducted
to demonstrate that each function is fully operational. Knowing the internal workings of
a product, tests can be conducted to ensure that all gears mesh, that is, that the internal
operation of the product performs according to the specification and all internal
components have been adequately exercised.
UNIT TESTING:
Unit testing is the testing of each module, carried out before the integration
of the overall system. Unit testing focuses verification efforts on the smallest unit of
software design, the module. This is also known as module testing. The
modules of the system are tested separately. This testing is carried out during
programming itself. In this testing step, each module is found to be working
satisfactorily with regard to the expected output from the module. There are some
validation checks for the fields. For example, a validation check verifies the
data given by the user, covering both the format and the validity of the data
entered. This makes it easy to find errors and debug the system.
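A field validation check of the kind described above can be unit tested in isolation. The sketch below is illustrative only: the function name `validate_rating_input`, the user ID pattern, and the 1 to 5 rating range are assumptions made for the example and are not taken from the project's actual code.

```python
import re

# Assumed format: 3 to 20 letters, digits, or underscores (hypothetical rule).
USER_ID_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")


def validate_rating_input(user_id: str, rating: str) -> list:
    """Return a list of error messages; an empty list means the input is valid."""
    errors = []
    # Format check on the user ID.
    if not USER_ID_PATTERN.fullmatch(user_id):
        errors.append("user_id: bad format")
    # Format check on the rating, then a validity (range) check.
    if not re.fullmatch(r"[0-9]+", rating):
        errors.append("rating: not a whole number")
    elif not 1 <= int(rating) <= 5:
        errors.append("rating: out of range")
    return errors


def test_validate_rating_input():
    """Module-level test exercising both the format and the validity checks."""
    assert validate_rating_input("alice_01", "4") == []
    assert validate_rating_input("a!", "4") == ["user_id: bad format"]
    assert validate_rating_input("alice_01", "four") == ["rating: not a whole number"]
    assert validate_rating_input("alice_01", "9") == ["rating: out of range"]


test_validate_rating_input()
```

Keeping the check in a pure function, separate from the screen code, is what lets the module be tested on its own during programming.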
INTEGRATION TESTING:
Data can be lost across an interface; one module can have an adverse
effect on another; and sub functions, when combined, may not produce the desired
major function. Integration testing is systematic testing that can be done with
sample data. The need for the integration test is to find the overall system
performance. There are two types of integration testing. They are white box
testing and black box testing.
WHITE BOX TESTING:
White box testing is a test case design method that uses the control
structure of the procedural design to derive test cases. Using white box testing
methods, we derived test cases that guarantee that all independent paths within a
module have been exercised at least once.
BLACK BOX TESTING:
1. Interface errors
2. Performance errors
User acceptance of the system is the key factor for the success of the
system. The system under consideration is tested for user acceptance by constantly
keeping in touch with prospective system users during development and making
changes whenever required.
OUTPUT TESTING:
After performing the validation testing, the next step is output testing of
the proposed system, which involves asking the user about the required format, since
no system could be useful if it does not produce the required output in the specified format.
The output displayed or generated by the system under consideration is examined. Here the
output format is considered in two ways: one is the screen format and the other is the printed
format. The output format on the screen is found to be correct, as the format was
designed in the system design phase according to the user needs. For the hard copy also,
the output meets the requirements specified by the user. Hence the output
testing does not result in any correction in the system.
System Implementation:
Implementation of software refers to the final installation of the
package in its real environment, to the satisfaction of the intended users and the
operation of the system. The people are not sure that the software is meant to make
their job easier.
1
The active user must be aware of the benefits of using the system
Before going ahead and viewing the system, the user must know that for
viewing the result, the server program should be running in the server. If the server
object is not running on the server, the actual processes will not take place.
User Training:
To achieve the objectives and benefits expected from the proposed system
it is essential for the people who will be involved to be confident of their role in the
new system. As system becomes more complex, the need for education and
training is more and more important.
Education is complementary to training. It brings life to formal training
by explaining the background to the resources for them. Education involves
creating the right atmosphere and motivating user staff. Education information can
make training more interesting and more understandable.
Training on the Application Software:
After providing the necessary basic training on the computer
awareness, the users will have to be trained on the new application software. This
will give the underlying philosophy of the use of the new system such as the screen
flow, screen design, type of help on the screen, type of errors while entering the
data, the corresponding validation check at each entry and the ways to correct the
data entered. This training may be different across different user groups and across
different levels of hierarchy.
Operational Documentation:
Once the implementation plan is decided, it is essential that the user of the
system is made familiar and comfortable with the environment. Documentation
covering the whole operation of the system is being developed. Useful tips and
guidance are given to the user inside the application itself. The system is developed
to be user friendly so that the user can work with the system from the tips given in the
application itself.
System Maintenance:
The maintenance phase of the software cycle is the time in which
software performs useful work. After a system is successfully implemented, it
should be maintained in a proper manner. System maintenance is an important
aspect in the software development life cycle. The need for system maintenance is
to make the system adaptable to changes in the system environment. There may be social,
technical and other environmental changes which affect a system that is being
implemented. Software product enhancements may involve providing new
functional capabilities, improving user displays and modes of interaction, and upgrading
the performance characteristics of the system. Only through proper system
maintenance procedures can the system be adapted to cope with these changes.
Software maintenance is, of course, far more than finding mistakes.
Corrective Maintenance:
The first maintenance activity occurs because it is unreasonable to
assume that software testing will uncover all latent errors in a large software
system. During the use of any large program, errors will occur and be reported
to the developer. The process that includes the diagnosis and correction of one or
more errors is called Corrective Maintenance.
Adaptive Maintenance:
The second activity that contributes to a definition of maintenance
occurs because of the rapid change that is encountered in every aspect of
computing. Therefore adaptive maintenance, termed as an activity that modifies
software to properly interface with a changing environment, is both necessary and
commonplace.
Perfective Maintenance:
The third activity that may be applied to a definition of maintenance
occurs when a software package is successful. As the software is used,
recommendations for new capabilities, modifications to existing functions, and
general enhancements are received from users. To satisfy requests in this category,
perfective maintenance is performed. This activity accounts for the majority of all
efforts expended on software maintenance.
Preventive Maintenance:
The fourth maintenance activity occurs when software is changed to
improve future maintainability or reliability, or to provide a better basis for future
MODULES:
1
MODULE DESCRIPTION
The rating based evidence model is useful for ranking fraud
detection. We also study how to extract fraud evidences from Apps' historical
rating records. Specifically, after an App has been published, it can be
rated by any user who downloaded it. Indeed, user rating is one of the most
important features of App advertisement. An App which has a higher rating may
attract more users to download it and can also be ranked higher in the leaderboard.
Thus, rating manipulation is also an important perspective of ranking fraud.
Intuitively, if an App has ranking fraud in a leading session s, the ratings during
the time period may have anomaly patterns compared with its historical ratings,
which can be used for constructing rating based evidences. In our project, we verify the
user ID of each user who comes to rate a particular App.
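The idea of comparing ratings inside a leading session with the App's historical ratings can be sketched as a simple one-sided test. The following is only an illustration under stated assumptions: it uses a normal (z) approximation, and the function name `rating_evidence` and the choice of `1 - p` as the evidence score are hypothetical, not the exact statistic used by the system described here.

```python
from math import erf, sqrt
from statistics import mean, pstdev


def rating_evidence(session_ratings, historical_ratings):
    """Score in [0, 1]; higher means session ratings look anomalously high.

    Assumption: a one-sided z-test of the session mean against the
    historical mean, using the historical standard deviation.
    """
    n = len(session_ratings)
    sigma = pstdev(historical_ratings)
    if n == 0 or sigma == 0:
        return 0.0  # not enough information to form evidence
    z = (mean(session_ratings) - mean(historical_ratings)) / (sigma / sqrt(n))
    # Upper-tail p-value of the standard normal distribution.
    p_value = 0.5 * (1.0 - erf(z / sqrt(2.0)))
    return 1.0 - p_value


# Usage: a burst of 5-star ratings against a mixed history scores close to 1.
history = [3, 4, 2, 3, 4, 3, 2, 4]
print(rating_evidence([5] * 20, history))
```

A session whose mean rating equals the historical mean scores 0.5 under this sketch, so a threshold well above 0.5 would be needed before treating the score as evidence.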
of reviews in the leading sessions and capturing
them as evidences for ranking fraud detection are still under-explored. To this end,
here we propose two fraud evidences based on Apps' review behaviors in leading
sessions for detecting ranking fraud.
Admin Analyze Rank Evidence Aggregation:
Apps' ranking behaviors in a leading event always satisfy a specific ranking
pattern, which consists of three different ranking phases, namely, rising phase,
maintaining phase and recession phase. Specifically, in each leading event, an
App's ranking first increases to a peak position in the leaderboard, then keeps such
peak position for a period, and finally decreases till the end of the event. In this
module, the admin takes up the suspected fraudulent App and, based on its leading
sessions, assigns a rank to each
App. There are many ranking and evidence aggregation methods in the literature,
such as permutation based models, score based models and Dempster-Shafer rules.
However, some of these methods focus on learning a global ranking for all
candidates. This is not proper for detecting ranking fraud for new Apps. Other
methods are based on supervised learning techniques, which depend on labeled
training data and are hard to exploit. Instead, we propose an unsupervised
approach based on fraud similarity to combine these evidences.
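The three-phase ranking pattern and the grouping of leading events into leading sessions can be sketched from an App's historical ranking records. This is a minimal illustration, not the report's exact algorithm: the ranking threshold `k`, the merge gap `max_gap`, and the rule that the best rank in an event marks the maintaining phase are all assumptions made for the example.

```python
def leading_events(ranks, k):
    """Return (start, end) index pairs of maximal runs where rank <= k.

    `ranks` is a list of daily ranks; None means the App was not charted.
    """
    events, start = [], None
    for i, r in enumerate(ranks):
        in_top = r is not None and r <= k
        if in_top and start is None:
            start = i
        elif not in_top and start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:
        events.append((start, len(ranks) - 1))
    return events


def leading_sessions(events, max_gap):
    """Merge neighbouring events whose gap is at most max_gap days."""
    sessions = []
    for s, e in events:
        if sessions and s - sessions[-1][1] <= max_gap:
            sessions[-1] = (sessions[-1][0], e)
        else:
            sessions.append((s, e))
    return sessions


def phases(ranks, event):
    """Split one event into (rising, maintaining, recession) index ranges."""
    s, e = event
    window = ranks[s:e + 1]
    peak = min(window)                       # best (lowest) rank in the event
    first = s + window.index(peak)           # first day at the peak
    last = e - window[::-1].index(peak)      # last day at the peak
    return (s, first), (first, last), (last, e)


ranks = [50, 8, 3, 3, 9, 40, 7, 2, 30, 60, 60, 60, 5, 4, 70]
events = leading_events(ranks, k=10)
print(events)                                # [(1, 4), (6, 7), (12, 13)]
print(leading_sessions(events, max_gap=3))   # [(1, 7), (12, 13)]
print(phases(ranks, events[0]))              # ((1, 2), (2, 3), (3, 4))
```

Detection then works on each session separately, which matches the idea of looking for local rather than global anomalies in an App's rankings.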
Tool Description
In part of speech tagging algorithms, two main sources of information are used to
compute the probability that a specific tag is correct: the probability of a specific
tag for a specific word and the relative probability of the current sequence of tags
in English. To combine these two probabilities and determine the best overall tag
for a given word, many statisticians use Hidden Markov Models (HMMs).
In understanding how an HMM works, first we must examine how it would work if
it only took into account the probability that a specific tag occurs with a specific
word. This process works in ways very similar to n-grams (which are actually
instances of Markov Models) in that it makes the bigram assumption that a word's tag
can be determined simply based on the previous word's tag. Given that the model
has been trained on a tagged corpus, which already has part of speech information
added, it can calculate the probabilities that specific words serve as nouns, verbs,
or other parts of speech. Additionally, it can calculate the probability that one part
of speech occurs after another part of speech. The model's task can best be
understood by using an example, such as "to flower". We assume that we know the
proper tag for "to" and that we know "flower" can be a noun or a verb. Then we
seek to maximize the product of the probability that the tag for "flower" follows the
tag for "to" and the probability that, given we are expecting a certain tag, "flower" is
the word matching this tag. Expressed mathematically, we compare the values of
P(VERB | TO) P(flower | VERB) and P(NOUN | TO) P(flower | NOUN), as TO
is the correct tag for "to". This is the task that the model would face if it were
seeking to tag an individual word when given the preceding word's tag.
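The comparison of P(VERB | TO) P(flower | VERB) with P(NOUN | TO) P(flower | NOUN) can be written out directly. In the sketch below the probability values are made-up placeholders chosen only to show the mechanics; a real tagger would estimate them from a tagged corpus, and the names `TRANSITION`, `EMISSION` and `best_tag` are hypothetical.

```python
# Hypothetical tag-transition probabilities P(tag | previous tag).
TRANSITION = {("TO", "VERB"): 0.83, ("TO", "NOUN"): 0.00047}

# Hypothetical emission probabilities P(word | tag).
EMISSION = {("flower", "VERB"): 0.00002, ("flower", "NOUN"): 0.0012}


def tag_score(prev_tag, word, tag):
    """P(tag | prev_tag) * P(word | tag); unseen pairs score zero."""
    return TRANSITION.get((prev_tag, tag), 0.0) * EMISSION.get((word, tag), 0.0)


def best_tag(prev_tag, word, candidates):
    """Pick the candidate tag that maximizes the bigram product."""
    return max(candidates, key=lambda tag: tag_score(prev_tag, word, tag))


print(best_tag("TO", "flower", ["NOUN", "VERB"]))  # VERB
```

With these placeholder values the strong TO-to-VERB transition outweighs the fact that "flower" is more often a noun, which is exactly the trade-off the product is meant to capture.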
In actuality, HMMs usually try to tag whole sentences at one time, and they are not
given any tags that are certain. Thus, there are many more probabilities and
comparisons involved. To limit these computations and create a process that is
manageable given time and computing constraints, HMMs are usually
insert_line() finds the proper place to insert the contents of new_line , having
searchkey key in the sorted file pointed to by fp . It returns NULL if a line with this
searchkey is already in the file.
Term frequency
Variants of TF weight

Weighting scheme            TF weight
binary                      {0, 1}
raw frequency               f_{t,d}
log normalization           1 + \log(f_{t,d})
double normalization 0.5    0.5 + 0.5 \frac{f_{t,d}}{\max_{t' \in d} f_{t',d}}
double normalization K      K + (1 - K) \frac{f_{t,d}}{\max_{t' \in d} f_{t',d}}
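The weighting schemes listed above can be sketched in Java as follows. This is a minimal illustration, not the report's implementation: the class name, method names, and the sample tokens are assumptions, and returning 0 for the log-normalized weight of an absent term (where the logarithm is undefined) is a convention chosen for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

final class TermFrequency {

    // Counts how often each term occurs in one document: f_{t,d}.
    static Map<String, Integer> rawCounts(String[] tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            Integer old = counts.get(token);
            counts.put(token, old == null ? 1 : old + 1);
        }
        return counts;
    }

    // Largest raw count of any term in the document (0 for an empty document).
    static int maxCount(Map<String, Integer> counts) {
        int max = 0;
        for (int c : counts.values()) {
            max = Math.max(max, c);
        }
        return max;
    }

    // Binary weight: 1 if the term occurs, 0 otherwise.
    static double binary(int f) {
        return f > 0 ? 1.0 : 0.0;
    }

    // Log normalization: 1 + log(f); assumed to be 0 when the term is absent.
    static double logNormalized(int f) {
        return f > 0 ? 1.0 + Math.log(f) : 0.0;
    }

    // Double normalization K: K + (1 - K) * f / maxF. Assumes maxF > 0.
    static double doubleNormalized(int f, int maxF, double k) {
        return k + (1.0 - k) * f / maxF;
    }

    public static void main(String[] args) {
        // Hypothetical document, used only to exercise the methods.
        String[] tokens = {"app", "rank", "app", "fraud"};
        Map<String, Integer> counts = rawCounts(tokens);
        int maxF = maxCount(counts);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int f = e.getValue();
            System.out.println(e.getKey() + " raw=" + f
                    + " binary=" + binary(f)
                    + " log=" + logNormalized(f)
                    + " double0.5=" + doubleNormalized(f, maxF, 0.5));
        }
    }
}
```

Double normalization 0.5 is simply the K variant with K = 0.5, so a single method covers both rows.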
from 312 vehicles, we demonstrate that this algorithm effectively limits tracking
risks, in particular, by eliminating tracking outliers. It also achieves significant data
accuracy improvements compared to known algorithms. We then present two
enhancements to the algorithm. First, it also addresses the home identification risk
by reducing location information revealed at the start and end of trips. Second, it
also considers heading information reported by users in the tracking model. This
version can thus protect users who are moving in dense areas but in a different
direction from the majority.
Advantage:
Data accuracy
Disadvantage:
It is not sufficient to protect privacy
Disadvantage :
It may not follow broken links effectively enough to detect the fault when
there are many broken links.
and experimental results show that iExpand can lead to better ranking performance
than state-of-the-art methods with a significant margin.
Advantage:
The interest expansion is more proper to capture the diversified interests and
find potential interests for the users.
Disadvantage:
It does not perform well.
5. Mobile App Classification with Enriched Contextual Information
Paper Description:
The study of the use of mobile Apps plays an important role in
understanding the user preferences, and thus provides the opportunities for
intelligent personalized context-based services. A key step for the mobile App
usage analysis is to classify Apps into some predefined categories. However, it is a
nontrivial task to effectively classify mobile Apps due to the limited contextual
information available for the analysis. For instance, there is limited contextual
information about mobile Apps in their names. However, this contextual
information is usually incomplete and ambiguous. To this end, in this paper, we
propose an approach for first enriching the contextual information of mobile Apps
by exploiting the additional Web knowledge from the Web search engine. Then,
inspired by the observation that different types of mobile Apps may be relevant to
different real-world contexts, we also extract some contextual features for mobile
Apps from the context-rich device logs of mobile users. Finally, we combine all the
enriched contextual information into the Maximum Entropy model for training a
mobile App classifier. To validate the proposed method, we conduct extensive
experiments on 443 mobile users' device logs to show both the effectiveness and
efficiency of the proposed approach. The experimental results clearly show that our
Disadvantage:
It cannot be computed tractably.
SYSTEM REQUIREMENTS
Software Requirements
O/S : Windows XP
Language : Java
IDE :
Data Base : MySQL
Hardware Requirements
System :
Hard Disk :
Monitor : 15" VGA color
Mouse : Logitech
Keyboard :
RAM : 2 GB
SOFTWARE DESCRIPTION
Java
Java is a programming language originally developed by James Gosling at
Sun Microsystems (now a subsidiary of Oracle Corporation) and released in 1995
as a core component of Sun Microsystems' Java platform. The language derives
much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities. Java applications are typically compiled to bytecode (class files)
that can run on any Java Virtual Machine (JVM) regardless of computer
architecture. Java is a general-purpose, concurrent, class-based, object-oriented
language that is specifically designed to have as few implementation dependencies
as possible. It is intended to let application developers "write once, run anywhere."
Java is currently one of the most popular programming languages in use,
particularly for client-server web applications.
Java Platform:
written in the Java language must run similarly on any
Beans Platform is
The NetBeans IDE bundle for Java SE contains what is needed to
start developing NetBeans plug-ins and NetBeans Platform based
applications; no additional SDK is required.
Applications can install modules dynamically. Any application
can include the Update Center module to allow users of the
application to and new
Wamp Server
WAMPs are packages of independently-created programs
installed on computers that use a Microsoft Windows operating
system.
Apache is a web server. MySQL is an open-source database.
PHP is a scripting language that can manipulate information held
in a database and generate web pages dynamically each time
content is requested by a browser. Other programs may also be
included in a package, such as phpMyAdmin which provides a
graphical user interface for the MySQL database manager, or the
alternative scripting languages Python or Perl.
MySQL
database management system often use MySQL.
FEASIBILITY STUDY
The feasibility study is carried out to test whether the proposed system is
worth implementing. The proposed system will be selected if it is good enough
at meeting the performance requirements.
The feasibility study is carried out mainly in three sections, namely:
Economic Feasibility
Technical Feasibility
Behavioral Feasibility
Economic Feasibility
Economic analysis, more commonly known as cost-benefit analysis, is the most
frequently used method for evaluating the effectiveness of the proposed system.
This procedure determines the benefits and savings that are expected from
the proposed system. The hardware in the systems department is
sufficient for system development.
Technical Feasibility
This study centers on the systems department's hardware and software and the
extent to which they can support the proposed system. Since the department already
has the required hardware and software, there is no question of increasing the cost
of implementing the proposed system, and the proposed system can be developed
with the existing facilities.
Behavioral Feasibility
People are inherently resistant to change and need a sufficient amount of
training, which would result in a lot of expenditure for the organization. The
proposed system can generate reports with day-to-day information immediately at
the user's request, instead of a report that does not contain much detail.
System Implementation
Implementation of software refers to the final installation of the
package in its real environment, to the satisfaction of the intended users and the
operation of the system. The people are not sure that the software is meant to make
their job easier.
The active user must be aware of the benefits of using the system. To
view the result, the server program should be running on the server. If the server
object is not running on the server, the actual processes will not take place.
User Training
To achieve the objectives and benefits expected from the proposed system, it
is essential for the people who will be involved to be confident of their role in the
new system. As systems become more complex, the need for education and
training becomes more and more important. Education is complementary to training. It
brings life to formal training by explaining the background to the resources for
them. Education involves creating the right atmosphere and motivating user staff.
Education information can make training more interesting and more
understandable.
Operational Documentation
Once the implementation plan is decided, it is essential that the user of the
system is made familiar and comfortable with the environment. Documentation
covering all operations of the system is being developed. Useful tips and
guidance are given to the user inside the application itself. The system is designed
to be user-friendly so that the user can operate it from the tips given in the
application itself.
System Maintenance
The maintenance phase of the software cycle is the time in which software
performs useful work. After a system is successfully implemented, it should be
maintained in a proper manner. System maintenance is an important aspect of the
software development life cycle. The need for system maintenance is to make the
system adaptable to changes in its environment. There may be social, technical
and other environmental changes that affect a system which is being
implemented. Software product enhancements may involve providing new
functional capabilities, improving user displays and mode of interaction, and upgrading
the performance characteristics of the system. Only through proper system
maintenance procedures can the system be adapted to cope with these changes.
Software maintenance is, of course, far more than finding mistakes.
Corrective Maintenance
The first maintenance activity occurs because it is unreasonable to assume
that software testing will uncover all latent errors in a large software system.
During the use of any large program, errors will occur and be reported to the
developer. The process that includes the diagnosis and correction of one or more
errors is called Corrective Maintenance.
Adaptive Maintenance
Perfective Maintenance
The third activity that may be applied to a definition of maintenance occurs
when a software package is successful. As the software is used, recommendations
for new capabilities, modifications to existing functions, and general enhancements
are received from users. To satisfy requests in this category, perfective
maintenance is performed. This activity accounts for the majority of all effort
expended on software maintenance.
Preventive Maintenance
The fourth maintenance activity occurs when software is changed to improve
future maintainability or reliability, or to provide a better basis for future
enhancements. Often called preventive maintenance, this activity is characterized
by reverse engineering and re-engineering techniques.
CONCLUSION
We developed a ranking fraud detection system for mobile Apps.
Specifically, we first showed that ranking fraud happens in leading sessions and
provided a method for mining leading sessions for each App from its historical
ranking records. This survey has explored almost all published fraud detection
studies. It defines the adversary, the types and subtypes of fraud, the technical
nature of data, performance metrics, and the methods and techniques. After
identifying the limitations in methods and techniques of fraud detection, this paper
shows that this field can benefit from other related fields.
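The idea of mining leading sessions from historical ranking records can be sketched in Java as follows. This is only an illustration under stated assumptions: a leading event is taken to be a maximal run of days on which the App's rank is at or below a threshold k, and a new event is merged into the previous session when it starts fewer than phi days after that session ended. The class name, the parameters k and phi, and the sample ranks are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class LeadingSessions {

    // Returns sessions as {startDay, endDay} pairs of inclusive day indices.
    static List<int[]> mine(int[] ranks, int k, int phi) {
        List<int[]> sessions = new ArrayList<>();
        int eventStart = -1; // -1 means we are not inside a leading event
        // Iterate one step past the end so a trailing event is closed too.
        for (int day = 0; day <= ranks.length; day++) {
            boolean leading = day < ranks.length && ranks[day] <= k;
            if (leading && eventStart < 0) {
                eventStart = day; // a new leading event begins
            } else if (!leading && eventStart >= 0) {
                int eventEnd = day - 1; // the event ended on the previous day
                int[] last = sessions.isEmpty() ? null : sessions.get(sessions.size() - 1);
                if (last != null && eventStart - last[1] < phi) {
                    last[1] = eventEnd; // close enough: extend the previous session
                } else {
                    sessions.add(new int[] {eventStart, eventEnd});
                }
                eventStart = -1;
            }
        }
        return sessions;
    }

    public static void main(String[] args) {
        // Hypothetical daily ranks of one App (lower is better).
        int[] ranks = {500, 20, 10, 400, 15, 30, 600, 700, 800, 900, 5, 8, 450};
        for (int[] s : mine(ranks, 300, 3)) {
            System.out.println("session from day " + s[0] + " to day " + s[1]);
        }
    }
}
```

A single pass over the ranking records is enough here because events are discovered in chronological order, so each new event only needs to be compared with the most recent session.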
Future Enhancement
Specifically, unsupervised approaches from counterterrorism
work, actual monitoring systems and text mining from law enforcement, and
semi-supervised and game-theoretic approaches from intrusion and spam detection
communities can contribute to future fraud detection research. However, prior
work shows that there are no guarantees: a fraud detection method was applied
successfully to news story monitoring but unsuccessfully to intrusion
detection. Future work will be in the form of credit application fraud detection.