Graphlet correlation distance to compare small graphs

Type Article
Date 2023-02
Language English
Author(s) Roux JeromeORCID1, Bez Nicolas2, Rochet Paul3, Joo RocíoORCID4, Mahévas StephanieORCID1
Affiliation(s) 1 : UMR DECOD, IFREMER, BP 21105, Nantes Cedex, France
2 : MARBEC, IRD, Univ Montpellier, Ifremer, CNRS, INRAE, Sète, France
3 : ENAC, Toulouse, France
4 : Global Fishing Watch, Washington, DC, United States of America
Source Plos One (1932-6203) (Public Library of Science (PLoS)), 2023-02 , Vol. 18 , N. 2 , P. e0281646 (20p.)
DOI 10.1371/journal.pone.0281646
Abstract

Graph models are standard for representing mutual relationships between sets of entities. Often, graphs deal with a large number of entities with a small number of connections (e.g. social media relationships, infectious disease spread). The distances or similarities between such large graphs are known to be well established by the Graphlet Correlation Distance (GCD). This paper deals with small graphs (with potentially high densities of connections) that have been somewhat neglected in the literature but that concern important fora like sociology, ecology and fisheries, to mention some examples. First, based on numerical experiments, we study the conditions under which Erdős-Rényi, Fitness Scale-Free, Watts-Strogatz small-world and geometric graphs can be distinguished by a specific GCD measure based on 11 orbits, the GCD11. This is done with respect to the density and the order (i.e. the number of nodes) of the graphs when comparing graphs with the same and different orders. Second, we develop a randomization statistical test based on the GCD11 to compare empirical graphs to the four possible null models used in this analysis and apply it to a fishing case study where graphs represent pairwise proximity between fishing vessels. The statistical test rules out independent pairing within the fleet studied which is a standard assumption in fisheries. It also illustrates the difficulty to identify similarities between real-world small graphs and graph models.

Full Text
File Pages Size Access
Publisher's official version 20 4 MB Open access
S1 Fig. Quality of clustering (AUPR) for pairs of models. 296 KB Open access
S2 Fig. Probability of correctly distinguishing two random graph models. 458 KB Open access
S1 Eq. To account for the difference in variability between the correlation coefficients of each pair of orbits, we also computed the following standardised distance between GCM(Mk) and where σ(i, j) 1 130 KB Open access
S1 Table. Estimated p-values (std). 1 107 KB Open access
Top of the page