
MALWARE IDENTIFICATION USING DEEP LEARNING

Abstract

Malware, short for malicious software, is growing continuously in number and sophistication as our digital world continues to grow. It is a serious problem, and much of today's cybersecurity effort is devoted to malware detection. In recent years many machine learning algorithms have been used for the automatic detection of malware, and most recently deep learning has been applied with better performance. Deep learning models have been shown to work much better in the analysis of long sequences of system calls. In this paper a shallow deep-learning-based feature extraction method (word2vec) is used to represent any given malware based on its opcodes. The Gradient Boosting algorithm is used for the classification task, and k-fold cross-validation is used to validate the model performance without sacrificing a validation split. Evaluation results show up to 96% accuracy with limited sample data.

Chapter I

Introduction

Malware is malicious software that is harmful when executed on a computing device or system. The consequences of running malware range from minor to catastrophic damage, such as losing critical data or a failure in a nuclear power plant. Malware therefore affects our daily lives continuously, and our computing systems need protection from malware attacks around the clock. To provide this protection, many researchers and cybersecurity firms are working on new and robust methods and tools for the automatic identification and detection of malware. Automatic malware detection is an active research area, and many machine learning algorithms have been used and adapted for the problem. Recently, deep learning algorithms, which are multilayer neural networks, have become successful in many learning tasks such as classification. However, deep learning requires more computation time to train and retrain models, which matters in malware detection since new malware types appear frequently and need to be added to the training sets. The trade-off is therefore challenging: classical machine learning algorithms are fast but less accurate, while emerging deep learning methods are time-consuming but more accurate in malware detection.

The signature method is an older technique used by antivirus software for malware detection. A signature is a short sequence of bytes that uniquely identifies a malware type. However, this method is not very effective against ever-changing malware types. Malware analysis has two phases. In the first phase, the malware discovery phase, malware is first caught and identified. In the second phase, the malware classification phase, security systems try to identify or classify each threat sample into one of the appropriate malware families. For malware classification, feature vector selection methods are used in the classification process, and these can be divided into two types: static and dynamic analysis. Dynamic analysis consists of executing the malware, monitoring its behavior, and recording changes in the execution environment. The complexity of the environment setup is very high, and the time needed to observe the outcome of executing all malware samples is long. However, this approach gives the safest and most reliable results.

Static analysis, on the other hand, involves examining the malware by analyzing the metadata of malware executables, assembly code instructions, and the binary data in its content. This method requires only a simple environment setup, and the results can be obtained much faster.

Chapter II

System Analysis

System analysis is the overall analysis of the system before implementation, carried out in order to arrive at a precise solution. Careful analysis of a system before implementation prevents post-implementation problems that might arise from a poor analysis of the problem statement; this justifies the necessity of systems analysis. Analysis is the first crucial step: a detailed study of the various operations performed by a system and their relationships within and outside of the system. Analysis defines the boundaries of the system and is followed by design and implementation.

Existing System

In the supervised learning approach, models are built from malware input data and its labeled class. The created models are then used for the classification of malware. The feature vectors used for supervised learning consist of exactly two columns. The first column of a feature vector is a categorical value describing the malware class, and the second holds the malware opcodes, which come from the code sections describing each sample and play an important role in classification accuracy. The malware classification accuracy heavily depends on the selection of appropriate feature vectors.
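
As a concrete illustration, a minimal sketch of this two-column representation is given below. The class names and opcode strings are hypothetical placeholders, not the dataset used in this work.

```python
# Hypothetical two-column training records: a malware class label and the
# opcode sequence extracted from that sample's code section.
training_data = [
    ("FamilyA", "push mov call xor ret"),
    ("FamilyB", "mov lea cmp jne call"),
    ("FamilyC", "xor mov push call ret"),
]

labels = [label for label, _ in training_data]        # first column: class
opcodes = [ops.split() for _, ops in training_data]   # second column: opcode tokens

print(labels[0], opcodes[0])  # FamilyA ['push', 'mov', 'call', 'xor', 'ret']
```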

A great deal of research uses neural networks for malware detection and analysis. For instance, a large-scale approach combining random projections and neural networks has been applied to malware classification. However, the authors showed that increasing the number of hidden layers in the neural network did not provide significant accuracy gains. Feedforward deep neural networks have also been used for the classification of malware based on static features, but dynamic analysis results are missing from that research, and disassembling obfuscated binaries may not give satisfactory inputs for classification.

Proposed System

Word2Vec is a popular tool in the literature, used especially for natural languages such as English. In this model, each word is associated with a mathematical vector representation under which some structural or semantic relation holds. Despite being designed for natural language processing, it is also well suited to representing malware assembly code. Word2Vec is used for generating embeddings for words from natural-language text. At the end of this learning process, word embeddings (vector representations of words) are generated for each input token. These vectors capture the syntactic and semantic relationships between words; words that share a common context are located in close proximity in the vector space.
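
A minimal sketch of this idea, assuming the gensim library, is shown below. The opcode sequences are invented placeholders; in practice they would come from the disassembled samples, and the vector size and window are illustrative values rather than the settings used in this work.

```python
import numpy as np
from gensim.models import Word2Vec

# Each training sample is a list of opcode tokens obtained by static analysis.
opcode_sequences = [
    ["push", "mov", "call", "xor", "ret"],
    ["mov", "lea", "cmp", "jne", "call"],
    ["xor", "mov", "push", "call", "ret"],
]

# Train a shallow skip-gram model over the opcode "vocabulary".
model = Word2Vec(
    sentences=opcode_sequences,
    vector_size=32,   # low-dimensional opcode embeddings
    window=5,
    min_count=1,
    sg=1,             # skip-gram
)

def sample_vector(opcodes, model):
    """Represent a whole sample as the mean of its opcode vectors."""
    return np.mean([model.wv[op] for op in opcodes if op in model.wv], axis=0)

print(sample_vector(opcode_sequences[0], model).shape)  # (32,)
```

Opcodes that tend to appear in similar contexts then receive nearby vectors, which is the property the downstream classifier exploits.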

Chapter III

Feasibility Study

The preliminary investigation examines project feasibility: the likelihood that the system will be useful to the organization. The main objective of the feasibility study is to test the technical, operational, and economic feasibility of adding new modules and debugging the existing system. All systems are feasible if they are given unlimited resources and infinite time. The following aspects are considered in the feasibility study portion of the preliminary investigation:

Operational Feasibility

The application does not require additional manual involvement or labor for maintenance of the system. The cost of training is minimized due to the user-friendliness of the developed application, and recurring expenditure on consumables and materials is minimized.

Technical Feasibility

Keeping in mind the existing network, software, and hardware already available, the application provides an executable that runs on the existing environment. No additional hardware or software is required, which makes the system technically feasible.

Economic Feasibility

The system is economically feasible keeping in mind:

• Lower investment towards training.
• A one-time investment towards development.
• Minimal recurring expenditure towards training, facilities offered, and consumables.
• The system as a whole is economically feasible over a period of time.

Chapter IV
System Design

System design concentrates on moving from the problem domain to the solution domain. This important phase is composed of several steps. It provides the understanding and procedural details necessary for implementing the system recommended in the feasibility study. The emphasis is on translating the performance requirements into a design specification. The design of any software involves mapping the software requirements into functional modules. Developing a real-time application or any system utility involves two processes: the first is to design the system, and the second is to construct the executable code that implements it.

Software design has evolved from an intuitive art dependent on experience into a science that provides systematic techniques for software definition. Software design is the first step in the development phase of the software life cycle. Before design, the user requirements were identified, and information was gathered to verify the problem and evaluate the existing system. A feasibility study was conducted to review alternative solutions and provide cost and benefit justification. On this basis the proposed system is recommended, and at this point the design phase begins.

The process of design involves conceiving and planning out in the mind and producing a drawing. In software design there are three distinct activities: external design, architectural design, and detailed design. Architectural design and detailed design are collectively referred to as internal design. External design involves conceiving, planning out, and specifying the externally observable characteristics of a software product.

INPUT DESIGN:

Systems design is the process of defining the architecture, components, modules, interfaces, and data for a system to satisfy specified requirements. Systems design could be seen as the application of systems theory to product development. There is some overlap with the disciplines of systems analysis, systems architecture, and systems engineering.

Input Design is the process of converting a user oriented description of the inputs to a computer-
based business system into a programmer-oriented specification.

• Input data were found to be available for establishing and maintaining master and
transaction files and for creating output records

• The most suitable types of input media, for either off-line or on-line devices, were selected after a study of alternative data capture techniques.

INPUT DESIGN CONSIDERATIONS

• The field length must be documented.

• The sequence of fields should match the sequence of the fields on the source document.

• The data format must be identified to the data entry operator.

Design input requirements must be comprehensive; product complexity and the risk associated with its use dictate the amount of detail.

• Functional requirements specify what the product does, focusing on its operational capabilities and the processing of inputs and resultant outputs.

• Performance requirements specify how much or how well the product must perform, addressing issues such as speed, strength, response times, accuracy, and limits of operation.

OUTPUT DESIGN:

A quality output is one that meets the requirements of the end user and presents the information clearly. In any system, the results of processing are communicated to the users and to other systems through outputs.

In output design it is determined how the information is to be displayed for immediate need, as well as the hard-copy output. Output is the most important and direct source of information for the user. Efficient and intelligent output design improves the system's relationship with the user and supports decision-making.

1. Designing computer output should proceed in an organized, well-thought-out manner; the right output must be developed while ensuring that each output element is designed so that people find the system easy and effective to use. When analysts design computer output, they should identify the specific output that is needed to meet the requirements.

2. Select methods for presenting information.

3. Create document, report, or other formats that contain information produced by the
system.

The output form of an information system should accomplish one or more of the following
objectives.

• Convey information about past activities, current status, or projections of the future.

• Signal important events, opportunities, problems, or warnings.

• Trigger an action.

• Confirm an action.

Architecture Diagram

Dataflow Diagram

The Data Flow Diagram (DFD) is a graphical model showing the inputs, processes, storage, and outputs of a system procedure in structured analysis. A DFD is also known as a bubble chart. The data flow diagram provides additional information that is used during the analysis of the information domain and serves as a basis for the modeling of functions. The description of each function presented in the DFD is contained in a process specification, called a PSPEC.

DFD Symbols

Data Flow

Arrows marking the movement of data through the system indicate data flows. It is the pipeline
carrying packets of data from an identified point of origin to specific destination.

Process

Bubbles or circles are used to indicate where incoming data flows are processed and then
transformed into outgoing data flows. The processes are numbered and named to indicate the
occurrence in the system flow.

External Entity

A rectangle indicates any source or destination of data. The entity can be a class of people, an organization, or even another system. The function of the external entity is to supply data to, or receive data from, the system. External entities have no interest in how the data are transformed.

Data Store
A data store is denoted by an open rectangle. Programs and subsystems have complex interdependencies, including flow of data, flow of control, and interaction with data stores. Data stores are used to identify holding points.

Dataflow diagram

Chapter V

Literature Survey

1. Large-scale malware classification using random projections and neural networks (2013)
Methodology: A novel malware classification architecture that first projects the high-dimensional feature vector to a lower-dimensional subspace.
Disadvantage: Not able to obtain a significant accuracy gain by adding additional hidden layers.

2. Deep Learning for Classification of Malware System Call Sequences (2016)
Methodology: Two types of neural network layers, convolutional and recurrent, are constructed and analyzed for modeling system call sequences.
Disadvantage: High training time.

3. DroidDetector: Android malware characterization and detection using deep learning (2016)
Methodology: An online deep-learning-based Android malware detection engine that can automatically detect whether an app is malware or not.
Disadvantage: Needs better optimization of the deep learning model.

4. Adversarial Perturbations Against Deep Neural Networks for Malware Classification (2016)
Methodology: Construction of highly effective adversarial sample crafting attacks for neural networks used as malware classifiers.
Disadvantage: More complexity.

5. Deep Neural Network Based Malware Detection Using Two Dimensional Binary Program Features (2015)
Methodology: A layered approach of deep neural networks and two-dimensional histogram features.
Disadvantage: High difficulty of detecting genuinely novel malware programs versus detecting new malware samples.
Chapter VI

System Requirements

The hardware and software specification describes the minimum hardware and software required to run the project. The hardware configuration specified below is by no means the optimal hardware requirement. The software specification given below lists just the minimum requirements, and the performance of the system may be slow on such a system.

Hardware Requirements

• System: Pentium IV, 2.4 GHz
• Hard Disk: 40 GB
• Floppy Drive: 1.44 MB
• Monitor: 15" VGA color
• Mouse: Logitech
• Keyboard: 110-key enhanced
• RAM: 256 MB

Software Requirements

• Operating System: Windows
• Front End: .NET
• Back End: MySQL

Chapter VII

System Implementation
Implementation is the stage in the project where the theoretical design is turned into a working system. The implementation phase constructs, installs, and operates the new system. The most crucial part of achieving a successful new system is ensuring that it works efficiently and effectively. Several activities are involved while implementing a new project.

• End user Training


• End user Education
• Training on the application software

Modules

Feature Extractor: This module is composed of a Decompressor and a Portable Executable (PE) parser. The Decompressor is employed first when a PE file has previously been compressed by a binary compression tool or embedded with a custom packer. The PE parser is used to extract the Windows Application Programming Interface (API) calls from each PE file. Through the API query database, the Windows API calls are converted to a set of 32-bit global IDs representing the corresponding API functions. The API calls are then stored as the signatures of the PE files in the signature database.
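
A minimal sketch of the PE-parsing step, assuming the pefile library, is shown below. The ID lookup table is a hypothetical stand-in for the API query database described above.

```python
import pefile

def extract_api_calls(path):
    """Extract imported Windows API names from a PE file."""
    pe = pefile.PE(path)
    apis = []
    for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
        for imp in entry.imports:
            if imp.name:                      # some imports are by ordinal only
                apis.append(imp.name.decode())
    return apis

def to_global_ids(api_names, api_db):
    """Map API names to 32-bit global IDs via a lookup table (hypothetical)."""
    return [api_db[name] for name in api_names if name in api_db]
```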

Deep Learning based Classifier: Based on the Windows API calls, a deep learning architecture using a stacked autoencoder (SAE) model is designed to perform unsupervised feature learning followed by supervised fine-tuning, and thus malware detection.
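
A minimal sketch of this pretrain-then-fine-tune idea, assuming TensorFlow/Keras, is given below. It uses a single autoencoder layer and random placeholder data for brevity; layer sizes and hyperparameters are illustrative, not the values used in this project.

```python
import numpy as np
from tensorflow.keras import layers, models

input_dim = 1000                                              # e.g., one column per API ID (assumption)
x_train = np.random.rand(256, input_dim).astype("float32")   # placeholder feature vectors
y_train = np.random.randint(0, 2, size=(256,))               # 0 = benign, 1 = malware (placeholder)

# 1. Unsupervised pretraining: train an autoencoder to reconstruct its input.
inputs = layers.Input(shape=(input_dim,))
encoded = layers.Dense(128, activation="relu")(inputs)
decoded = layers.Dense(input_dim, activation="sigmoid")(encoded)
autoencoder = models.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x_train, x_train, epochs=5, batch_size=32, verbose=0)

# 2. Supervised fine-tuning: reuse the pretrained encoder layer and add a classifier head.
clf_out = layers.Dense(1, activation="sigmoid")(encoded)
classifier = models.Model(inputs, clf_out)
classifier.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
classifier.fit(x_train, y_train, epochs=5, batch_size=32, verbose=0)
```

Because the classifier is built on the same encoder tensor, its hidden-layer weights start from the unsupervised pretraining and are then fine-tuned with the labels.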

Chapter VIII
Software Description
Introduction

A programming infrastructure created by Microsoft for building, deploying, and running applications and services that use .NET technologies, such as desktop applications and Web services. The .NET Framework contains three major parts:

• The Common Language Runtime
• The Framework Class Library
• ASP.NET

Microsoft started development of the .NET Framework in the late 1990s, originally under the name Next Generation Windows Services (NGWS). By late 2000 the first beta versions of .NET 1.0 were released. The .NET Framework (pronounced "dot net") is a software framework developed by Microsoft that runs primarily on Microsoft Windows. It includes a large class library and provides language interoperability (each language can use code written in other languages) across several programming languages. Programs written for the .NET Framework execute in a software environment (as contrasted with the hardware environment) known as the Common Language Runtime (CLR), an application virtual machine that provides services such as security, memory management, and exception handling. The class library and the CLR together constitute the .NET Framework. Introduced in 2002, the .NET platform is similar in purpose to the Java EE platform, and like Java's JVM runtime engine, the .NET runtime engine must be installed on a computer in order to run .NET applications.

.NET Programming Languages

.NET is similar to Java in that it uses an intermediate bytecode language that can be executed on any hardware platform that has a runtime engine. Unlike Java, however, it provides support for multiple programming languages. Microsoft's languages include C# (C Sharp), J# (J Sharp), Managed C++, JScript.NET, and Visual Basic.NET. Other languages have been reengineered for the standardized version of .NET, called the Common Language Infrastructure.

.NET Versions

.NET Framework 1.0 introduced the Common Language Runtime (CLR) and .NET
Framework 2.0 added enhancements. .NET Framework 3.0 included the Windows programming
interface (API) originally known as "WinFX," which is backward compatible with the Win32
API. .NET Framework 3.0 added the following four subsystems and was installed with
Windows, starting with Vista. .NET Framework 3.5 added enhancements and introduced a
client-only version (see .NET Framework Client Profile). .NET Framework 4.0 added parallel
processing and language enhancements.

The User Interface (WPF)

Windows Presentation Foundation (WPF) provides the user interface. It takes advantage of
advanced 3D graphics found in many computers to display a transparent, glass-like appearance.
Messaging (WCF)

Windows Communication Foundation (WCF) enables applications to communicate with each other locally and remotely, integrating local messaging with Web services.

Workflow (WWF)

Windows Workflow Foundation (WWF) is used to integrate applications and automate tasks. Workflow structures can be defined in the XML Application Markup Language.

User Identity (WCS)

Windows CardSpace (WCS) provides an authentication system for logging into a Web
site and transferring personal information.

DESIGN FEATURES

Interoperability

Because computer systems commonly require interaction between newer and older applications,
the .NET Framework provides means to access functionality implemented in newer and older
programs that execute outside the .NET environment. Access to COM components is provided in
the System.Runtime.InteropServices and System.EnterpriseServices namespaces of the
framework; access to other functionality is achieved using the P/Invoke feature.

Common Language Runtime engine

The Common Language Runtime (CLR) serves as the execution engine of the .NET Framework.
All .NET programs execute under the supervision of the CLR, guaranteeing certain properties
and behaviors in the areas of memory management, security, and exception handling.

Language independence

The .NET Framework introduces a Common Type System, or CTS. The CTS specification defines all possible data types and programming constructs supported by the CLR and how they may or may not interact with each other, conforming to the Common Language Infrastructure (CLI) specification. Because of this feature, the .NET Framework supports the exchange of types and object instances between libraries and applications written using any conforming .NET language.

Base Class Library

The Base Class Library (BCL), part of the Framework Class Library (FCL), is a library of functionality available to all languages using the .NET Framework. The BCL provides classes that encapsulate a number of common functions, including file reading and writing, graphics rendering, database interaction, and XML document manipulation. It consists of classes and interfaces of reusable types that integrate with the CLR (Common Language Runtime).

Simplified deployment

The .NET Framework includes design features and tools which help manage the installation of
computer software to ensure it does not interfere with previously installed software, and it
conforms to security requirements.

Security

The design addresses some of the vulnerabilities, such as buffer overflows, which have been
exploited by malicious software. Additionally, .NET provides a common security model for all
applications.

Portability

While Microsoft has never implemented the full framework on any system except Microsoft Windows, it has engineered the framework to be platform-agnostic, and cross-platform implementations are available for other operating systems. Microsoft submitted the specifications for the Common Language Infrastructure, the C# language, and the C++/CLI language to both ECMA and the ISO, making them available as official standards. This makes it possible for third parties to create compatible implementations of the framework and its languages on other platforms.

Chapter IX

System Testing

Software Testing

Software testing is an investigation conducted to provide stakeholders with information about the quality of the product or service under test. Software testing can also provide an objective, independent view of the software to allow the business to appreciate and understand the risks of software implementation. Test techniques include, but are not limited to, the process of executing a program or application with the intent of finding software bugs (errors or other defects). The purpose of testing is to discover errors: testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, sub-assemblies, assemblies, and/or the finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of tests, and each test type addresses a specific testing requirement.

Software testing is the process of evaluating a software item to detect differences between given input and expected output, and of assessing the features of the software item. Testing assesses the quality of the product. Software testing should be done during the development process; in other words, software testing is a verification and validation process.

Types of testing

There are different levels in the process of testing. Levels of testing include the different methodologies that can be used while conducting software testing. The following are the main levels of software testing:

• Functional Testing
• Non-Functional Testing

Step I: The determination of the functionality that the intended application is meant to perform.

Step II: The creation of test data based on the specifications of the application.

Step III: The determination of the expected output based on the test data and the specifications of the application.

Step IV: The writing of test scenarios and the execution of test cases.

Step V: The comparison of actual and expected results based on the executed test cases.

Functional Testing

Functional Testing of the software is conducted on a complete, integrated system to evaluate the
system's compliance with its specified requirements. There are five steps that are involved when
testing an application for functionality.

An effective testing practice will see the above steps applied to the testing policies of every
organization and hence it will make sure that the organization maintains the strictest of standards
when it comes to software quality.

Unit Testing

This type of testing is performed by the developers before the setup is handed over to the testing team to formally execute the test cases. Unit testing is performed by the respective developers on the individual units of source code in their assigned areas. The developers use test data that is separate from the test data of the quality assurance team. The goal of unit testing is to isolate each part of the program and show that the individual parts are correct in terms of requirements and functionality.
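
As an illustration of unit testing in this context, the sketch below uses Python's built-in unittest module to test a hypothetical opcode-tokenizing helper; the helper itself is invented for this example rather than taken from the project code.

```python
import unittest

def tokenize_opcodes(listing):
    """Return the first token (the opcode) of each non-empty line of a disassembly listing."""
    return [line.split()[0] for line in listing.splitlines() if line.strip()]

class TokenizeOpcodesTest(unittest.TestCase):
    def test_extracts_first_token_per_line(self):
        listing = "push ebp\nmov ebp, esp\ncall sub_401000"
        self.assertEqual(tokenize_opcodes(listing), ["push", "mov", "call"])

    def test_ignores_blank_lines(self):
        self.assertEqual(tokenize_opcodes("\n\nret\n"), ["ret"])

if __name__ == "__main__":
    unittest.main()
```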

Limitations of Unit Testing

Testing cannot catch each and every bug in an application. It is impossible to evaluate every
execution path in every software application. The same is the case with unit testing.

There is a limit to the number of scenarios and the amount of test data that a developer can use to verify the source code. After all options have been exhausted, there is no choice but to stop unit testing and merge the code segment with other units.

Integration Testing

Integration testing is the testing of combined parts of an application to determine whether they function correctly together. There are two methods of doing integration testing: bottom-up integration testing and top-down integration testing.

1. Bottom-up integration: This testing begins with unit testing, followed by tests of progressively higher-level combinations of units called modules or builds.

2. Top-down integration: In this testing, the highest-level modules are tested first, and progressively lower-level modules are tested after that.

In a comprehensive software development environment, bottom-up testing is usually done first,


followed by top-down testing. The process concludes with multiple tests of the complete
application, preferably in scenarios designed to mimic those it will encounter in customers'
computers, systems and network.

System Testing

This is the next level of testing; it tests the system as a whole. Once all the components are integrated, the application as a whole is tested rigorously to see that it meets the quality standards. This type of testing is performed by a specialized testing team. System testing is important for the following reasons:

• System Testing is the first level of testing in which the application is tested as a whole.

• The application is tested thoroughly to verify that it meets the functional and technical specifications.

• The application is tested in an environment that is very close to the production environment in which it will be deployed.

• System Testing enables us to test, verify, and validate both the business requirements and the application's architecture.

Regression Testing

Whenever a change is made in a software application, it is quite possible that other areas of the application have been affected by the change. Regression testing verifies that a fixed bug has not resulted in a violation of other functionality or business rules. The intent of regression testing is to ensure that a change, such as a bug fix, did not introduce another fault into the application. Regression testing is important for the following reasons:

• It minimizes the gaps in testing when an application with changes has to be tested.

• It tests the new changes to verify that the changes made did not affect any other area of the application.

• It mitigates risks when performed on the application.

• It increases test coverage without compromising timelines.

• It increases speed to market for the product.

Acceptance Testing

This is arguably the most important type of testing, as it is conducted by the Quality Assurance team, which gauges whether the application meets the intended specifications and satisfies the client's requirements. The QA team has a set of pre-written scenarios and test cases that are used to test the application.

More ideas are shared about the application, and more tests can be performed on it to gauge its accuracy and the reasons why the project was initiated. Acceptance tests are intended not only to point out simple spelling mistakes, cosmetic errors, or interface gaps, but also to point out any bugs in the application that would result in system crashes or major errors.

By performing acceptance tests on an application the testing team will deduce how the
application will perform in production. There are also legal and contractual requirements for
acceptance of the system.

Alpha Testing

This test is the first stage of testing and will be performed amongst the teams (developer and QA
teams). Unit testing, integration testing and system testing when combined are known as alpha
testing. During this phase, the following will be tested in the application:

 Spelling Mistakes

 Broken Links

 Cloudy Directions

 The Application will be tested on machines with the lowest specification to test loading
times and any latency problems.

Beta Testing

This test is performed after Alpha testing has been successfully performed. In beta testing a
sample of the intended audience tests the application. Beta testing is also known as pre-release
testing. Beta test versions of software are ideally distributed to a wide audience on the Web,
partly to give the program a "real-world" test and partly to provide a preview of the next release.
In this phase the audience will be testing the following:

 Users will install, run the application and send their feedback to the project team.

 Typographical errors, confusing application flow, and even crashes.

 Getting the feedback, the project team can fix the problems before releasing the software
to the actual users.

 The more issues you fix that solve real user problems, the higher the quality of your
application will be.

 Having a higher-quality application when you release to the general public will increase
customer satisfaction.

Chapter X

Conclusion

In this work, a new malware representation method based on static analysis is presented, in which sequences of opcodes are used without their arguments. Low-dimensional Word2Vec feature vectors are combined with a boosting classifier such as GBM, and this yields high malware classification accuracy with little effort. If the k-fold cross-validation value is chosen larger than k=5, the accuracy might be higher. It is also noted that the GBM model trees can be tuned with the help of grid search (tree pruning); this enables a wider range of model lookups in the GBM search and can lead to better accuracy.
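
A minimal sketch of this evaluation strategy, assuming scikit-learn, is given below. The feature matrix and labels are random placeholders standing in for the averaged word2vec opcode vectors, and the parameter grid is illustrative rather than the one used here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X = np.random.rand(200, 32)             # placeholder opcode-embedding features
y = np.random.randint(0, 2, size=200)   # placeholder class labels

# k-fold cross-validation of a gradient boosting classifier (k=5 here).
gbm = GradientBoostingClassifier(random_state=0)
scores = cross_val_score(gbm, X, y, cv=5)
print("mean CV accuracy:", scores.mean())

# Grid search over tree-related hyperparameters, itself cross-validated.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"max_depth": [2, 3, 4], "learning_rate": [0.05, 0.1]},
    cv=5,
)
grid.fit(X, y)
print("best parameters:", grid.best_params_)
```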

Future Work

For future work, hyperparameter optimization stages will be added to the model search, and the dataset will be extended to include all classes of malware found in the wild. Finding semantic relationships between malware opcodes is another research direction.

References

[1] Mihai Christodorescu and Somesh Jha. 2003. Static Analysis of Executables to Detect
Malicious Patterns. In Proceedings of the 12th Conference on USENIX Security Symposium -
Volume 12 (SSYM’03). USENIX Association, Berkeley, CA, USA, 12–12.

[2] George E Dahl, Jack W Stokes, Li Deng, and Dong Yu. 2013. Large-scale malware
classification using random projections and neural networks. In Acoustics, Speech and Signal
Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 3422–3426.

[3] Jake Drew, Tyler Moore, and Michael Hahsler. 2016. Polymorphic malware detection using
sequence classification methods. In Security and Privacy Workshops (SPW), 2016 IEEE. IEEE,
81–87.

[4] Jerome H. Friedman. 2000. Greedy Function Approximation: A Gradient Boosting Machine,
In Annals of Statistics. Annals of Statistics 29, 1189–1232.

[5] Wenyi Huang and Jack W Stokes. 2016. MtNet: a multi-task neural network for dynamic
malware classification. In Detection of Intrusions and Malware, and Vulnerability Assessment.
Springer, 399–418.

[6] "IDA". 2013. "Ida : Disassembler and debugger. https://www.hexrays.com/products/ida/".


(2013).

[7] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems 26, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 3111–3119.

[8] Razvan Pascanu, Jack W Stokes, Hermineh Sanossian, Mady Marinescu, and Anil Thomas.
2015. Malware classification with recurrent networks. In Acoustics, Speech and Signal
Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 1916–1920.

[9] Igor Popov. 2017. Malware detection using machine learning based on word2vec embeddings
of machine code instructions. In Data Science and Engineering (SSDSE), 2017 Siberian
Symposium on. IEEE, 1–4.

[10] Igor Santos, Felix Brezo, Xabier Ugarte-Pedrero, and Pablo G. Bringas. 2013. Opcode
sequences as representation of executables for data-mining-based unknown malware detection.
Information Sciences 231 (2013), 64 – 82.

[11] Joshua Saxe and Konstantin Berlin. 2015. Deep neural network based malware detection
using two dimensional binary program features. In Malicious and Unwanted Software
(MALWARE), 2015 10th International Conference on. IEEE, 11–20.

[12] Alexander Statnikov and Constantin F Aliferis. 2007. Are random forests better than
support vector machines for microarray-based cancer classification?. In AMIA annual
symposium proceedings, Vol. 2007. American Medical Informatics Association, 686.

[13] A. H. Sung, J. Xu, P. Chavez, and S. Mukkamala. 2004. Static Analyzer of Vicious
Executables (SAVE). In Proceedings of the 20th Annual Computer Security Applications
Conference (ACSAC ’04). IEEE Computer Society, Washington, DC, USA, 326–334.

[14] S. Momina Tabish, M. Zubair Shafiq, and Muddassar Farooq. 2009. Malware Detection
Using Statistical Analysis of Byte-level File Content. In Proceedings of the ACM SIGKDD
Workshop on CyberSecurity and Intelligence Informatics (CSIKDD ’09). ACM, New York, NY,
USA, 23–31.

[15] B. Yadegari, B. Johannesmeyer, B. Whitely, and S. Debray. 2015. A Generic Approach to Automatic Deobfuscation of Executable Code. In 2015 IEEE Symposium on Security and Privacy. 674–691.

[16] Yanfang Ye, Tao Li, Donald Adjeroh, and S Sitharama Iyengar. 2017. A survey on malware
detection using data mining techniques. ACM Computing Surveys (CSUR) 50, 3 (2017), 41.

[17] Mahmood Yousefi-Azar, Vijay Varadharajan, Len Hamey, and Uday Tupakula. 2017.
Autoencoder-based feature learning for cyber security applications. In Neural Networks
(IJCNN), 2017 International Joint Conference on. IEEE, 3854–3861.

[18] Mikhail Zolotukhin and Timo Hamalainen. 2014. Detection of zero-day malware based on
the analysis of opcode sequences. (01 2014), 386–391.

