Daniel@FinTOC’2 Shared Task: Title Detection and Structure Extraction - GREYC hultech
Conference paper, Year: 2020

Daniel@FinTOC’2 Shared Task: Title Detection and Structure Extraction

Abstract

We present our contributions to the two tracks of the 2020 FinTOC Shared Tasks: Table of Contents (ToC) extraction in English documents and in French documents. We describe our work on Title Detection and on ToC Extraction separately. For ToC Extraction, we propose an approach that combines information from multiple sources: the table of contents, the wording of the document, and lexical domain knowledge. For Title Detection, we compare surface features with character-based features across various training configurations. We show that title detection results are very sensitive to the kind of training dataset used.
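To make the contrast between surface features and character-based features concrete, here is a minimal sketch in Python. It is only an illustration: the specific cues (`upper_ratio`, `starts_with_numbering`, and so on), the trigram size, and the function names are assumptions chosen for this example, since this record does not list the feature sets actually used in the paper.

```python
import re
from collections import Counter


def surface_features(line: str) -> dict:
    """Hand-crafted cues computed from one text line (hypothetical feature set)."""
    stripped = line.strip()
    letters = [c for c in stripped if c.isalpha()]
    return {
        "n_chars": len(stripped),
        "n_tokens": len(stripped.split()),
        # Share of alphabetic characters that are uppercase.
        "upper_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
        # True for lines such as "1.2 Investment Objective" or "3. Risks".
        "starts_with_numbering": bool(re.match(r"^\d+(\.\d+)*\.?\s", stripped)),
        "ends_with_period": stripped.endswith("."),
    }


def char_ngrams(line: str, n: int = 3) -> Counter:
    """Character n-gram counts of the lowercased line (n=3 is an assumed default)."""
    text = line.strip().lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    for candidate in ["1.2 Investment Objective", "The fund invests in equities."]:
        print(candidate)
        print("  surface:", surface_features(candidate))
        print("  top trigrams:", char_ngrams(candidate).most_common(3))
```

Either representation could then be passed to any standard classifier. Surface features encode explicit assumptions about what a title looks like, whereas character n-grams leave it to the training data to determine which patterns matter.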
Main file
Daniel_FinTOC2020_TOC_detection.pdf (284.43 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03024867, version 1 (26-11-2020)

Identifiers

  • HAL Id: hal-03024867, version 1

Cite

Emmanuel Giguet, Gaël Lejeune, Jean-Baptiste Tanguy. Daniel@FinTOC’2 Shared Task: Title Detection and Structure Extraction. 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation @COLING’2020, Dec 2020, Barcelona, Spain. ⟨hal-03024867⟩