A database is a collection of interrelated data that can be managed and retrieved efficiently. A large-scale database stores very large volumes of data, and duplicate records inflate its storage requirements; deduplication is a technique for minimizing storage needs by eliminating redundant data. Duplicate detection, the problem of identifying multiple representations of the same real-world entity, arises in many kinds of applications, including customer relationship management, information management, and data mining. In the existing system, duplicates are detected by string comparison, character by character, which is time-consuming and occupies more memory. The proposed system is implemented on Hadoop, which can handle large databases, and detects duplicate data on the basis of multiple attributes. Our system first applies data pre-processing, a data mining technique that transforms raw data into an understandable format, and then applies the Parallel Progressive Sorted Neighbourhood Method and the MapReduce algorithm to this data to obtain a clean database. MapReduce programming allows such large data to be processed in a fault-tolerant and cost-effective manner, providing more manageable storage and more efficient handling of data.
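To make the multi-attribute approach concrete, the following is a minimal single-machine sketch of the Sorted Neighbourhood Method described above: records are sorted by a blocking key built from several attributes, and only records falling within a small sliding window are compared. The record fields (`id`, `name`, `city`), the key construction, and the exact-match comparison are illustrative assumptions, not the paper's implementation, which runs in parallel on Hadoop with MapReduce and uses richer similarity measures.

```python
def sorting_key(record):
    # Hypothetical multi-attribute blocking key: prefixes of two attributes.
    return (record["name"][:3].lower(), record["city"][:3].lower())

def is_duplicate(a, b):
    # Simplistic multi-attribute match; a real system would apply a
    # similarity measure per attribute rather than exact comparison.
    return (a["name"].lower() == b["name"].lower()
            and a["city"].lower() == b["city"].lower())

def sorted_neighbourhood(records, window=3):
    # 1. Sort records by the blocking key so likely duplicates are adjacent.
    ordered = sorted(records, key=sorting_key)
    duplicates = []
    # 2. Slide a fixed-size window over the sorted list and compare only
    #    records inside the same window, avoiding all-pairs comparison.
    for i, rec in enumerate(ordered):
        for j in range(i + 1, min(i + window, len(ordered))):
            if is_duplicate(rec, ordered[j]):
                duplicates.append((rec["id"], ordered[j]["id"]))
    return duplicates

records = [
    {"id": 1, "name": "Alice", "city": "Pune"},
    {"id": 2, "name": "Bob", "city": "Mumbai"},
    {"id": 3, "name": "alice", "city": "pune"},
]
print(sorted_neighbourhood(records))  # -> [(1, 3)]
```

The window size trades recall for speed: a larger window catches duplicates whose keys sort further apart, at the cost of more comparisons. In the parallel variant, the sorted list is partitioned across workers so each window is processed independently.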
IEEE 2017