An-Najah Staff

Publication Type

Original research

Authors

Bilal Shafei

ABSTRACT Obviously, it is not useful to accumulate large amounts of information if we cannot find a particular piece of it. Moreover, extracting relevant, targeted information from textual data on large digital media, especially when these data are heterogeneous and multilingual, is certainly not a new problem. However, current methods prove to be expensive, and their results are too often irrelevant, too numerous and poorly presented to the user. As a complement to current methods, we propose an original method, Contextual Exploration, implemented in the EC3 software. EC3 requires neither syntactic analysis, nor statistical analysis, nor a "general" ontology. It uses only small ontologies, called "linguistic ontologies", that express the language of knowledge. This is why EC3 runs very quickly on large corpora whose components may be texts of any length, from SMS messages to whole books. As output, EC3 offers a dynamic visual representation of the results. EC3 has been tested on very large digitized corpora provided by the French Labex OBVIL ("Observatory of the Literary Life"), in partnership with the National Library of France.

Journal

Title: In Proceedings of the 10th International Conference on Management of Digital EcoSystems (MEDES ’18), September 25–28, 2018, Tokyo
Publisher: ACM, New York, NY, USA
Publisher Country: France
Publication Type: Both (Printed and Online)
Volume: 1
Year: 2018
Pages: 6

Big textual data: how to find relevant information (with low cost)?