Slovenščina.eu

Summarization

The goal of text summarization is to convert a longer text into a shorter one while preserving the essential information of the source. There are two general approaches. The extractive approach only selects (i.e., extracts) the most important sentences or passages of the text, so it adds no new wording of its own. The abstractive approach is closer to how humans summarize: it can combine different parts of the text, shorten long sentences, replace phrases with shorter ones, and so on. Abstractive summarizers therefore need both a good understanding of the text and the ability to express its key content in new words. Because they generate new text, their mistakes can be more misleading than those of extractive summarizers. Below we present the following models:

  • Metamodel - neural model based on the Doc2Vec document representation that suggests the best summarizer for the input text.
  • Graph-based model - unsupervised extractive graph-based approach that returns the N most important sentences.
  • Headline model - supervised abstractive approach (T5 architecture) that returns headline-like abstracts.
  • Article model - supervised abstractive approach (T5 architecture) that returns short summaries.
  • Basic model - unsupervised simple summarizer that uses word frequencies and returns the N most important sentences.
  • Hybrid-long model - unsupervised hybrid (graph-based and transformer-based) approach that returns short summaries of long texts.
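The two unsupervised extractive ideas in the list, scoring sentences by word frequencies and ranking them in a sentence graph, can be illustrated with a short Python sketch. It is not the code of the models above: the naive sentence splitter, the tokenizer, the cosine-similarity edge weights, the damping factor `d = 0.85`, and all function names are illustrative assumptions. Both scorers share the same final step, which keeps the N best sentences in their original order.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    # Lowercased word tokens; \w also matches Slovenian letters such as č, š, ž.
    return re.findall(r"\w+", sentence.lower())


def frequency_scores(sentences: list[str]) -> list[float]:
    # Word-frequency scoring: mean frequency of a sentence's words,
    # normalized by the frequency of the most common word in the text.
    token_lists = [tokenize(s) for s in sentences]
    freq = Counter(w for toks in token_lists for w in toks)
    top = max(freq.values(), default=1)
    return [sum(freq[w] for w in toks) / (top * len(toks)) if toks else 0.0
            for toks in token_lists]


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def graph_scores(sentences: list[str], d: float = 0.85, iters: int = 50) -> list[float]:
    # TextRank-style scoring: sentences are nodes, edge weights are cosine
    # similarities, and scores come from weighted PageRank (power iteration).
    n = len(sentences)
    vecs = [Counter(tokenize(s)) for s in sentences]
    w = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]  # total outgoing edge weight per node
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - d) / n
                  + d * sum(w[j][i] / out[j] * scores[j] for j in range(n) if out[j])
                  for i in range(n)]
    return scores


def summarize(text: str, n: int = 3, method: str = "frequency") -> list[str]:
    sentences = split_sentences(text)
    if not sentences:
        return []
    if method == "frequency":
        scores = frequency_scores(sentences)
    elif method == "graph":
        scores = graph_scores(sentences)
    else:
        raise ValueError(f"unknown method: {method}")
    # Pick the n best-scoring sentences, then restore their original order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]
```

For example, `summarize("Cats purr. Cats and dogs play. Dogs bark.", n=1, method="graph")` selects the middle sentence, because it is the only one that shares words with both of the others. A practical summarizer for Slovenian would also need stop-word removal and a sentence splitter that handles abbreviations, which the regex above does not.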

The web service is intended for demonstration purposes only and limits both the number of requests per unit of time and the input length. To use the summarizers in your own applications, please download the project results, which are available in the Clarin.si repository.
