Partitionnement de données pour l'informatique climatique : Contributions à l'amélioration des méthodes d'identification automatique des régimes de temps en climat tropical insulaire.

Abstract : This manuscript reports new work in climate informatics that has produced a set of contributions to methods for the automatic identification of weather regimes in the Caribbean region. Starting from the computational methods widely covered in the literature consulted, we first focused on unsupervised learning, and more particularly on K-Means clustering (KMS) and Hierarchical Agglomerative Clustering (HAC). These methods were applied directly to the problem of the currents that carry Sargassum algae banks, and then to the partitioning of geopotential data. They made it possible to identify groups of days (clusters) with similar characteristics, and the barycentres (centroids) of these groups were analysed by climate experts. However, this approach does not systematically produce consistent results, since the barycentres do not always represent the physical reality of the structures selected. We then concentrated on finding and identifying the weather regimes characteristic of the Caribbean zone. These regimes are generally described as recurrent, large-scale spatio-temporal configurations that influence local weather. Research on this topic for the Caribbean region is still in its infancy. In the work already published, several points seem problematic, and three of them caught our attention. First, because the quality of the clusters is not quantified, extensive physical justification is needed to validate the relevance of the proposed regimes, which also complicates comparison between existing studies. Second, some of the arguments presented show that the proposed regimes are not fully satisfactory. Finally, according to the experts, the temporal coherence of the clusters in certain studies does not seem to match the seasonality of the region.
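As an illustration of the partitioning step described above, here is a minimal K-Means sketch in plain Python. It is not the thesis code: each day is assumed to be a field already flattened into a vector, the function names (`k_means`, `farthest_first_init`, `barycentre`) are hypothetical, and the deterministic farthest-first initialisation is a choice made for this sketch only.

```python
def squared_l2(a, b):
    """Squared Euclidean (L2) distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def barycentre(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def farthest_first_init(days, k):
    """Deterministic initialisation (an assumption of this sketch):
    start from day 0, then repeatedly add the day that is farthest
    from the centroids chosen so far."""
    centroids = [list(days[0])]
    while len(centroids) < k:
        far = max(days, key=lambda d: min(squared_l2(d, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def k_means(days, k, max_iter=100):
    """Group days into k clusters; return (labels, centroids)."""
    centroids = farthest_first_init(days, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each day joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_l2(day, centroids[c]))
            for day in days
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the barycentre of its members.
        for c in range(k):
            members = [day for day, lab in zip(days, labels) if lab == c]
            if members:
                centroids[c] = barycentre(members)
    return labels, centroids


# Toy data: four "days", each reduced to a 2-value field.
toy_days = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
toy_labels, toy_centroids = k_means(toy_days, k=2)
```

Each centroid here is a plain average of its member days, which is why it may not correspond to any physically realistic situation when the members differ in structure.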
To overcome these difficulties, we first propose using the Silhouette index, which we applied both to evaluate the relevance of the selected clusters and to compare the methods used. On verification, the analysis produced by the index agrees with that of the climate experts. In some cases, however, the index also indicates that the clusters can be improved. A closer look at the partitioning algorithms, and in particular at the notion of distance they use, shows that these difficulties are mainly related to the complexity of the data, but also to the similarity measures used to compare them. After a critique of the properties of the L2 distance, which is used by default, we propose a new dissimilarity measure named Expert Deviation (ED). It is based on a spatial decomposition, a quantisation into histograms, and a zone-by-zone treatment with the Kullback-Leibler (KL) divergence. We show that ED leads to much better results, both in numerical evaluations of cluster quality by the Silhouette index and in interpretations by domain experts. This new measure is adaptive in its design and use. We present its principle and then apply it in atmospheric physics, using data such as satellite-measured precipitation. Rainfall in the Lesser Antilles is known to be highly variable in space and time and directly influences the climate at these latitudes. Using ED, we identified more coherent and physically interpretable recurrent patterns for this parameter and for wind. These results have improved climate experts' knowledge of the atmospheric structures related to inter-seasonal weather regimes and their dynamics.
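The abstract names only the three ingredients of ED (spatial decomposition, histograms, zonal KL divergence) and does not give its definition. The sketch below is therefore not ED itself but a hypothetical dissimilarity built from the same ingredients, followed by a silhouette score computed from a precomputed dissimilarity matrix. The rectangular zones, the fixed bin edges, the additive smoothing `eps`, the symmetrised KL term and the summation over zones are all assumptions of this sketch, as are the function names.

```python
import math


def split_into_zones(field, zone_rows, zone_cols):
    """Cut a 2-D field (list of rows) into rectangular zones, each a flat list."""
    zones = []
    for r0 in range(0, len(field), zone_rows):
        for c0 in range(0, len(field[0]), zone_cols):
            zones.append([v for row in field[r0:r0 + zone_rows]
                          for v in row[c0:c0 + zone_cols]])
    return zones


def histogram(values, bin_edges, eps=1e-6):
    """Normalised histogram; eps keeps every bin strictly positive.
    Values outside [bin_edges[0], bin_edges[-1]] are ignored."""
    counts = [eps] * (len(bin_edges) - 1)
    for v in values:
        for b in range(len(counts)):
            is_last = b == len(counts) - 1
            if bin_edges[b] <= v < bin_edges[b + 1] or (is_last and v == bin_edges[-1]):
                counts[b] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for strictly positive p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def zonal_kl_dissimilarity(field_a, field_b, zone_rows, zone_cols, bin_edges):
    """Sum over zones of the symmetrised KL divergence between zone histograms."""
    total = 0.0
    zones_a = split_into_zones(field_a, zone_rows, zone_cols)
    zones_b = split_into_zones(field_b, zone_rows, zone_cols)
    for za, zb in zip(zones_a, zones_b):
        p, q = histogram(za, bin_edges), histogram(zb, bin_edges)
        total += kl_divergence(p, q) + kl_divergence(q, p)
    return total


def silhouette_index(dissimilarity, labels):
    """Mean silhouette over all items, from a square dissimilarity matrix.
    Requires at least two clusters; singletons score 0 by convention."""
    n = len(labels)
    scores = []
    for i in range(n):
        own = [dissimilarity[i][j] for j in range(n)
               if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean dissimilarity to the item's own cluster
        b = min(                 # mean dissimilarity to the nearest other cluster
            sum(dissimilarity[i][j] for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


# Toy fields: the same values arranged in opposite halves of the grid.
field_a = [[1, 1, 9, 9], [1, 1, 9, 9]]
field_b = [[9, 9, 1, 1], [9, 9, 1, 1]]
edges = [0, 5, 10]
d_ab = zonal_kl_dissimilarity(field_a, field_b, 2, 2, edges)
d_aa = zonal_kl_dissimilarity(field_a, field_a, 2, 2, edges)
```

Because the silhouette only needs pairwise dissimilarities, the same `silhouette_index` function can score a partition under the L2 distance or under a zonal measure such as this one, which is what makes a comparison between measures possible.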
All this work, and the use of the ED measure, opens up many perspectives for the search for recurrent spatio-temporal configurations, and more broadly for any application field that works with images.
https://hal.archives-ouvertes.fr/tel-03098202
Contributor : Emmanuel Biabiany
Submitted on : Tuesday, January 5, 2021 - 4:33:52 PM
Last modification on : Thursday, January 7, 2021 - 3:14:43 AM

File

manuscrit.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-03098202, version 1

Citation

Biabiany Emmanuel. Partitionnement de données pour l'informatique climatique : Contributions à l'amélioration des méthodes d'identification automatique des régimes de temps en climat tropical insulaire. Informatique [cs]. Université des Antilles, 2020. French. ⟨tel-03098202⟩
