Big Data Ensemble Classification And Analytics Of Malicious Traffic









Abstract

In the era of modern information technology, access to information is not only easy but seamless. Information is accessible not only from personal computing systems such as desktops and laptops but also from handheld devices such as smartphones, tablets, notepads, and other mobile computing devices. In addition to the ease of Internet access, the emergence of social media platforms that allow sharing of audio, video, and images has made the Internet a key source of information. While the Internet is critical for information dissemination and for access to e-commerce and government portals, its users also risk becoming victims of malicious attackers whose intent is to gain access to private and confidential information. Cyber threats and cyber security form one of the key areas of research in both academia and industry, where the goal is to detect and mitigate any attack on the system or the user. While researchers devise methods to safeguard against attacks, attackers are equally motivated to create new methods to infiltrate systems and access critical information.

One prominent way of invading a user's system is malware. Malware is malicious code that enters the user's system by masquerading as a legitimate part of content accessed or downloaded from the Internet. Detecting and removing malware is one of the key features of any anti-virus or anti-malware application. These applications typically run on the user's system and scan all traffic that the user downloads. The basic methodology of a malware detection system is to take a known set of malware patterns, called the malware signature set, and look for patterns in the downloaded content that match it. When the downloaded content reasonably matches a malware signature, the anti-malware application flags the content as malware and enforces a mitigation action, such as deleting the file or terminating the network connection, to protect the user from the malware's malicious activities.

However, malware writers have also become more sophisticated, and the patterns of the malware keep changing with every new attack. This makes detection difficult because there is no specific pattern to follow. Malware attacks whose patterns are new are called zero-day (Day Zero) malware. The approach to malware detection has therefore shifted from recognizing known patterns to learning the characteristics or traits of malware. This requires applying technologies such as cognitive learning and machine learning. In this approach, the system is "trained" to learn the attack methods of malware by going through previously known instances, and it then applies what it has learned to any new content to determine whether that content is malware. In this work, an approach to detect and analyze malware is presented using the concepts of machine learning and big data classifiers. The resulting application takes a set of training data as input, uses it to learn the traits of malware, and then determines whether any given new content is malware or cleanware.
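
The train-then-classify workflow described above can be sketched as follows. This is a minimal illustration in plain Python, not the project's actual implementation: the three features (byte entropy, number of suspicious API calls, packed flag), the sample values, and the choice of base classifiers are all assumptions made for brevity. It shows the ensemble idea only: several simple classifiers are trained on labelled samples, and their majority vote decides whether new content is malware or cleanware.

```python
from collections import Counter

# Hypothetical training set: (feature vector, label).
# Assumed features: [byte entropy, suspicious API calls, packed flag].
TRAIN = [
    ([7.8, 12, 1], "malware"),
    ([7.5, 9, 1], "malware"),
    ([7.9, 15, 0], "malware"),
    ([4.2, 1, 0], "cleanware"),
    ([5.0, 0, 0], "cleanware"),
    ([4.8, 2, 0], "cleanware"),
]


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class NearestCentroid:
    """Predict the label whose mean feature vector is closest."""

    def fit(self, rows):
        grouped = {}
        for features, label in rows:
            grouped.setdefault(label, []).append(features)
        self.centroids = {
            label: [sum(col) / len(col) for col in zip(*vectors)]
            for label, vectors in grouped.items()
        }
        return self

    def predict(self, features):
        return min(self.centroids,
                   key=lambda lbl: distance(features, self.centroids[lbl]))


class KNearest:
    """Predict the majority label among the k closest training samples."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, rows):
        self.rows = list(rows)
        return self

    def predict(self, features):
        nearest = sorted(self.rows,
                         key=lambda row: distance(features, row[0]))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]


class Stump:
    """Threshold one feature midway between the two class means.

    Assumes exactly two labels in the training data.
    """

    def __init__(self, index):
        self.index = index

    def fit(self, rows):
        means = {}
        for label in {lbl for _, lbl in rows}:
            values = [f[self.index] for f, lbl in rows if lbl == label]
            means[label] = sum(values) / len(values)
        self.low, self.high = sorted(means, key=means.get)
        self.threshold = (means[self.low] + means[self.high]) / 2
        return self

    def predict(self, features):
        return self.high if features[self.index] > self.threshold else self.low


class MajorityVote:
    """Ensemble that trains every member and returns the most common vote."""

    def __init__(self, members):
        self.members = members

    def fit(self, rows):
        for member in self.members:
            member.fit(rows)
        return self

    def predict(self, features):
        votes = Counter(member.predict(features) for member in self.members)
        return votes.most_common(1)[0][0]


# Train the ensemble once, then classify previously unseen content.
ensemble = MajorityVote([NearestCentroid(), KNearest(k=3), Stump(index=0)])
ensemble.fit(TRAIN)
print(ensemble.predict([7.7, 11, 1]))
print(ensemble.predict([4.5, 1, 0]))
```

An odd number of members avoids tied votes in the two-class case. In practice, the feature vectors would be extracted from real traffic or files, and a library ensemble (for example, a voting classifier from scikit-learn, which is included in the Anaconda distribution) would likely replace these hand-written classifiers.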


Modules


Algorithms


Software And Hardware

• Hardware:
  Processor: i3, i5
  RAM: 4 GB
  Hard disk: 16 GB
• Software:
  Operating system: Windows 2000/XP/7/8/10
  Tools: Anaconda, Jupyter, Spyder, Flask
  Front end: Python
  Back end: MySQL