A manually annotated corpus in French for the study of urbanization and the natural risk prevention - GREYC codag
Journal article, Scientific Data, Year: 2023

A manually annotated corpus in French for the study of urbanization and the natural risk prevention

Abstract

Land artificialization is a serious problem of civilization, and urban planning and natural risk management aim to mitigate it. In France, these practices rely on Local Land Plans (PLU – Plan Local d’Urbanisme) and Natural Risk Prevention Plans (PPRn – Plan de Prévention des Risques naturels), which contain land use rules. To facilitate automatic extraction of these rules, we manually annotated a number of these documents concerning Montpellier, a rapidly evolving agglomeration exposed to natural risks. We defined a format for labeled examples in which each entry includes a title and a subtitle. In addition, we proposed a hierarchical representation of class labels to generalize the use of our corpus. Our corpus consists of 1934 textual segments, each labeled with one of four classes (Verifiable, Non-verifiable, Informative and Not pertinent), and is the first corpus in the French language in the fields of urban planning and natural risk management. Along with presenting the corpus, we tested a state-of-the-art approach for text classification to demonstrate its usability for automatic rule extraction.
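The entry format described in the abstract (a text segment carrying a title, a subtitle and one of four class labels) could be modelled roughly as follows. This is only a sketch: the field names, the `Segment` type and the two-level grouping in `PARENT` are assumptions made for illustration and are not taken from the paper or its released files, and the example entries are invented placeholders rather than corpus data.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    """The four classes named in the abstract."""
    VERIFIABLE = "Verifiable"
    NON_VERIFIABLE = "Non-verifiable"
    INFORMATIVE = "Informative"
    NOT_PERTINENT = "Not pertinent"


# Hypothetical two-level hierarchy. The abstract says the labels are
# organised hierarchically but does not give the tree, so this grouping
# is an assumption for illustration only.
PARENT = {
    Label.VERIFIABLE: "Pertinent",
    Label.NON_VERIFIABLE: "Pertinent",
    Label.INFORMATIVE: "Pertinent",
    Label.NOT_PERTINENT: "Not pertinent",
}


@dataclass(frozen=True)
class Segment:
    """One labeled entry; field names are assumed, not the corpus schema."""
    title: str
    subtitle: str
    text: str
    label: Label


def coarse_label(segment: Segment) -> str:
    """Map a fine-grained label to its (assumed) parent class."""
    return PARENT[segment.label]


def label_counts(segments: list[Segment]) -> Counter:
    """Count how many segments carry each fine-grained label."""
    return Counter(s.label for s in segments)


# Invented placeholder entries, not real corpus content.
examples = [
    Segment("Example title", "Example subtitle", "Placeholder rule text.", Label.VERIFIABLE),
    Segment("Example title", "Example subtitle", "Placeholder context text.", Label.INFORMATIVE),
    Segment("Example title", "Other subtitle", "Placeholder unrelated text.", Label.NOT_PERTINENT),
]

for seg in examples:
    print(seg.label.value, "->", coarse_label(seg))
print(label_counts(examples))
```

Keeping the fine-grained label on each entry and deriving the coarser class through a lookup table means the same annotated segments can serve both a four-class task and a coarser one without re-annotation, which is one plausible reading of how a label hierarchy "generalizes the use" of a corpus.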
Main file: ScientificData2023.pdf (2.92 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-04520001, version 1 (25-03-2024)

Cite

Maksim Koptelov, Margaux Holveck, Bruno Cremilleux, Justine Reynaud, Mathieu Roche, et al. A manually annotated corpus in French for the study of urbanization and the natural risk prevention. Scientific Data, 2023, 10 (1), pp.818. ⟨10.1038/s41597-023-02705-y⟩. ⟨hal-04520001⟩
