Log Layering Based on Natural Language Processing


With the increasing number and variety of logs, the requirement of storage space is growing rapidly. Meantime, the speed and accuracy of querying in massive logs are becoming increasingly important. Although the well-built distributed storage technique solves the problem of mass storage and fast query, the cost is too high. As logs are created as the method to trace the historical operation, the requirement for query rate is not high. To balance the storage cost and query rate, this paper proposes a real-time log layering storage technique based on natural language processing. According to the characteristics of the log data, this technique is combined with the text language processing technique. It compresses the real-time log data effectively while considering the query efficiency. Firstly, the method extracts the feature of each log that flows in, which will be the type name of the log. Then, the method performs word segmentation on the log and encodes each word to store the key value pairs. Finally, the key value pairs of the log are stored in the memory, and the code of each log is stored in the database. Experiments show that this method can ensure the integrity of the data effectively, decompression time dropped to 40%, compression rate down to 35%.



Software And Hardware

• Hardware: Processor: i3 ,i5 RAM: 4GB Hard disk: 16 GB • Software: operating System : Windws2000/XP/7/8/10 Anaconda,jupyter,spyder,flask Frontend :-python Backend:- MYSQL