Information Processing and Retrieval from CSV File by Natural Language








Abstract

Comma-separated values (CSV) files are a widely used, fundamental data format. Because the format is simple and easy to create, many data files that are published openly or used by organizations are stored as CSV. However, searching for or retrieving data from CSV files is limited by traditional keyword matching, which cannot express search conditions or apply any processing to the retrieved data. This paper presents a new model that lets users retrieve information from CSV files using natural language, the language they already know and use in everyday life. Users can specify conditions for data retrieval and processing to produce the information they need, so non-technical users can retrieve information without learning any additional computer language or program. The research data consist of natural language messages collected from various online and offline sources, covering both formal and semi-formal language. Using natural language processing together with semantic patterns, an ontology, and an interactive conversation system, the model analyzes the completeness and meaning of natural language statements, lets users edit incomplete or faulty statements, and can be improved by adding new words, sentence syntaxes, and semantic patterns for more accurate results. The model was evaluated by 98 testers, who entered 1,137 natural language statements. The results show that the model retrieves and processes data accurately, with precision, recall, and F-score all above 0.9.
Only 18 statements, or 3.2% of all statements, produced erroneous output. These errors were caused by three kinds of typing mistake: missing letters that change a word's meaning, use of ambiguous words, and words placed in the wrong position in the statement.
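
As a rough illustration of the idea of mapping a natural language statement onto a conditional CSV query, the sketch below matches a statement against a single hand-written pattern and filters rows accordingly. It is a minimal sketch under stated assumptions, not the model described in this paper: the pattern, the comparator phrases, the function names (parse_statement, retrieve), and the sample data are all hypothetical, and a plain regular expression stands in for the semantic patterns and ontology.

```python
import csv
import io
import operator
import re

# Hypothetical comparator phrases (illustrative only, not the paper's lexicon).
COMPARATORS = {
    "greater than": operator.gt,
    "less than": operator.lt,
    "equal to": operator.eq,
}

# One hypothetical pattern:
#   "show <column> where <column> is <comparator> <number>"
PATTERN = re.compile(
    r"^show (?P<target>\w+) where (?P<field>\w+) is "
    r"(?P<cmp>greater than|less than|equal to) (?P<value>-?\d+(?:\.\d+)?)$",
    re.IGNORECASE,
)


def parse_statement(statement):
    """Return a query dict, or None if the statement fits no known pattern."""
    match = PATTERN.match(statement.strip())
    if match is None:
        return None
    return {
        "target": match.group("target").lower(),
        "field": match.group("field").lower(),
        "compare": COMPARATORS[match.group("cmp").lower()],
        "value": float(match.group("value")),
    }


def retrieve(csv_text, statement):
    """Return the target column for every row that satisfies the condition."""
    query = parse_statement(statement)
    if query is None:
        # An interactive system could ask the user to edit the statement here.
        raise ValueError("statement does not match any known pattern")
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row[query["target"]]
        for row in reader
        if query["compare"](float(row[query["field"]]), query["value"])
    ]


# Made-up sample data for the demonstration.
SAMPLE_CSV = "name,price\napple,30\nbanana,12\ncherry,45\n"

result = retrieve(SAMPLE_CSV, "Show name where price is greater than 20")
print(result)  # ['apple', 'cherry']
```

A statement that fits no pattern raises an error in this sketch; that is the point at which an interactive system could instead prompt the user to correct the statement.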


Modules


Algorithms


Software And Hardware

• Hardware:
  Processor: i3, i5
  RAM: 4 GB
  Hard disk: 16 GB
• Software:
  Operating system: Windows 2000/XP/7/8/10
  Anaconda, Jupyter, Spyder, Flask
  Frontend: Python
  Backend: MySQL