An Effective and Scalable Data Modeling for Enterprise Big Data Platform

Abstract

The enormous growth of the internet, enterprise applications, social media, and IoT devices has caused a sharp rise in enterprise data volumes. Big data platforms provide scalable storage to manage this growth and give decision-makers, stakeholders, and business users easier access to data. Classifying, organizing, and storing all of this data, and processing it to produce business insights, is a well-known challenge: the variety, velocity, volume, and value of the data make it difficult to process effectively. Enterprises struggle to apply complex business rules, generate insights, and support data-driven decisions in a timely fashion. Because a big data lake integrates streams of data from many business units, stakeholders typically analyze enterprise-wide data through a variety of data models. Data models are therefore a vital component of a big data platform: depending on the models available, users may perform complex processing, run queries, and join large tables to produce the required metrics. Extracting value from the data is usually a time-consuming and resource-intensive process, so an enterprise big data platform needs high-quality data modeling methods to reach an optimal balance of cost, performance, and quality. This paper addresses these challenges by proposing an effective and scalable way to organize and store data in a big data lake. It presents basic principles and a methodology for building scalable data models in a distributed environment, describes how the approach overcomes common challenges, and presents the findings.
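As a minimal illustration of the kind of data model and "big table join" workload the abstract describes (not the paper's actual method), the sketch below uses PySpark, consistent with the Hadoop/Python stack listed under Software and Hardware. It writes a date-partitioned fact table to a lake path, joins it with a small dimension table, and aggregates a business metric. All table names, columns, and paths are hypothetical.

# Hypothetical sketch: a star-schema style data model on a data lake,
# expressed with PySpark. Names, columns, and paths are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lake-data-model-sketch").getOrCreate()

# Fact table: one row per sales event, partitioned by event_date so
# queries that filter on date only scan the relevant partitions.
sales = spark.createDataFrame(
    [("2024-01-01", "p1", 3, 9.99),
     ("2024-01-01", "p2", 1, 4.50),
     ("2024-01-02", "p1", 2, 9.99)],
    ["event_date", "product_id", "quantity", "unit_price"],
)
sales.write.mode("overwrite").partitionBy("event_date").parquet("/tmp/lake/sales")

# Dimension table: small descriptive attributes keyed by product_id.
products = spark.createDataFrame(
    [("p1", "Widget"), ("p2", "Gadget")],
    ["product_id", "product_name"],
)

# A typical fact-to-dimension join followed by an aggregation that
# produces a business metric (revenue per product).
facts = spark.read.parquet("/tmp/lake/sales")
revenue = (
    facts.join(products, "product_id")
         .groupBy("product_name")
         .agg(F.sum(F.col("quantity") * F.col("unit_price")).alias("revenue"))
)
revenue.show()

Partitioning the fact table by date is one simple instance of the cost/performance trade-off the abstract mentions: it keeps writes cheap and lets date-filtered queries prune partitions instead of scanning the whole table.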


Modules


Algorithms


Software and Hardware

• Hardware: Processor: Intel i3/i5; RAM: 4 GB; Hard disk: 16 GB
• Software: Operating system: Windows 2000/XP/7/8/10; Tools: Anaconda, Jupyter, Spyder, Flask, Hadoop; Frontend: Python; Backend: MySQL
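The listed stack pairs a Python/Flask frontend with a MySQL backend. A minimal sketch of how these two pieces might be wired together is shown below: a Flask endpoint that serves a pre-computed metric from MySQL. The connection settings, database, and table names (revenue_summary) are placeholders, not details taken from the paper.

# Hypothetical wiring of the listed stack: Flask frontend, MySQL backend.
# Requires `pip install flask pymysql`. Credentials and names are placeholders.
from flask import Flask, jsonify
import pymysql

app = Flask(__name__)

def get_connection():
    # Placeholder connection settings for a local MySQL instance.
    return pymysql.connect(
        host="localhost", user="root", password="secret", database="bigdata"
    )

@app.route("/metrics/revenue")
def revenue():
    conn = get_connection()
    try:
        with conn.cursor() as cur:
            # Assumes a summary table populated by the batch pipeline.
            cur.execute("SELECT product_name, revenue FROM revenue_summary")
            rows = cur.fetchall()
    finally:
        conn.close()
    return jsonify([{"product": r[0], "revenue": float(r[1])} for r in rows])

if __name__ == "__main__":
    app.run(debug=True)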