YAKE! Collection-independent Automatic Keyword Extractor

Extracting keywords from texts has become a challenge for individuals and organizations as information grows in complexity and size. The need to automate this task so that texts can be processed in a timely and adequate manner has led to the emergence of automatic keyword extraction tools. Despite these advances, there is a clear lack of multilingual online tools that automatically extract keywords from single documents. In this paper, we present Yake!, a novel feature-based system for multilingual keyword extraction, which supports texts of different sizes, domains, or languages. Unlike most systems, Yake! relies on neither dictionaries nor thesauri, nor is it trained against any corpora. Instead, we follow an unsupervised approach that builds upon features extracted from the text itself, thus making it applicable to documents written in different languages without the need for further knowledge. This can be beneficial for a large number of tasks and for the many situations where access to training corpora is either limited or restricted. In this demo, we offer an easy-to-use, interactive session, where users from both academia and industry can try our system, either by using a sample document or by entering their own text. As an add-on, we compare our extracted keywords against the output produced by the IBM Natural Language Understanding and Rake systems. This will enable users to understand the distinctions between the three approaches.
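
To make the idea of an unsupervised, corpus-free approach concrete, here is a minimal sketch in Python. It is not the Yake! algorithm: the function name `extract_keywords`, the length filter, and the scoring formula are assumptions chosen for illustration only. What it shares with the description above is that every feature (term frequency and first position) is computed from the single input document, with no dictionary, thesaurus, or training corpus.

```python
import re
from collections import Counter


def extract_keywords(text, top=5):
    """Rank single words using only statistics of `text` itself.

    Hypothetical illustration: the two features and the way they are
    combined are assumptions, not the features used by Yake!.
    """
    # Language-agnostic tokenisation: runs of Unicode word characters.
    tokens = re.findall(r"\w+", text.lower())

    # Crude candidate filter (an assumption): skip very short tokens and numbers.
    candidates = [(i, t) for i, t in enumerate(tokens)
                  if len(t) > 3 and not t.isdigit()]

    # Feature 1: how often the term occurs in this document.
    freq = Counter(t for _, t in candidates)

    # Feature 2: token index of the term's first occurrence.
    first_pos = {}
    for i, t in candidates:
        first_pos.setdefault(t, i)

    # Lower score means more relevant: early, frequent terms rank first.
    scores = {t: (1 + first_pos[t]) / (freq[t] ** 2) for t in freq}

    # Sort by score, breaking ties alphabetically for deterministic output.
    return sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))[:top]
```

Calling `extract_keywords(document_text, top=10)` returns a list of `(word, score)` pairs sorted from most to least relevant. Because nothing in the sketch depends on a particular language or on external resources, the same call can be applied to any of the sample documents on this page.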

Try our samples

English document samples

English sample 1, English sample 2, English sample 3

Document samples from different languages

Italian, German, Dutch, Spanish, Finnish, French, Polish, Turkish, Portuguese, Arabic

Document samples from official datasets

110-PT-BN-KP, 500N-KPCrowd-v1.1, Inspec (sample 1), Inspec (sample 2), Nguyen2007, PubMed, SemEval2010


Please cite the following works when using YAKE (Best Short Paper Award at ECIR’18).

Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C., & Jatowt, A. (2018).
A Text Feature Based Automatic Keyword Extraction Method for Single Documents.
In Gabriella Pasi et al. (Eds.), Lecture Notes in Computer Science - Advances in Information Retrieval - 40th European Conference on Information Retrieval (ECIR'18).
Grenoble, France, March 26–29 (Vol. 10772, pp. 684–691).

Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C., & Jatowt, A. (2018).
YAKE! Collection-independent Automatic Keyword Extractor.
In Gabriella Pasi et al. (Eds.), Lecture Notes in Computer Science - Advances in Information Retrieval - 40th European Conference on Information Retrieval (ECIR'18).
Grenoble, France, March 26–29 (Vol. 10772, pp. 806–810).