ICPR 2016  Graph Distance Contest
Measuring the dissimilarity between graphs is a key step in data analysis based on graph representations. This contest focusses on the definition and the computation of general dissimilarity measures through two challenges:
 Challenge 1: computation of the exact or an approximate graph edit distance
 Challenge 2: computation of a dissimilarity measure for graph classification
Different datasets are considered with symbolic or numerical attributes on both nodes and edges. Methods submitted to Challenge 1 can also be submitted to Challenge 2.
This contest is supported by the Technical Committee on GraphBased Representations in Pattern Recognition (TC 15) of the International Association of Pattern Recognition (IAPR)
April 2017  Results of Challenge 1 submitted to a journal 

Dec. 2016  ICPR / results 
Sept. 2016 
Submissions
 Challenge 1: 8
 Challenge 2: 2
Challenge 1 GED computation
Goal
Compute the exact or an approximate GED, for several pairs of graphs of several datasets, under the following constraints: running time limited to 30 seconds maximum for each pair of graphs
 edit costs are imposed for each dataset
 tiny graphs (less than 10 nodes): selected such that the exact GED is known for all pairs of graphs
 small graphs (less than 30 nodes): selected such that the exact GED is known for some pairs of graphs
 larger graphs (less that 70 nodes): the exact GED is unknown
Evaluation
Methods are compared according to running time
 closeness to reference distances
 the exact GED computed by the A*based method [1], if available (depends on the size of the graphs), or
 the smallest distance computed by methods tested in this challenge:
Participation
This challenge is open to any method based on the computation of edit paths: exact methods
 methods based on heuristics
 linear and quadratic programming approaches
 algorithms improving computational time, e.g. multithreaded methods (using at most 4 threads)
Datasets
Three groups of datasets are considered.
Dataset  #graphs  mean #nodes  mean degree  min #nodes  max #nodes  node attributes  edge attributes  exact GED 

Alkane  150  8.9  1.8  1  10  chemical symbol  valence (bound type)  99.5467% 
Acyclic  185  8.2  1.8  3  11  85.7595%  
PAH  94  20.7  20.4  10  28  no  
MAO  68  18.4  2.1  11  27  no  
CMU  111  30  30  30    distance  no 
For a finer analysis, each dataset of the second group is decomposed into several subsets of graphs having the same number of nodes, and one subset of graphs having different number of nodes. Each subset is considered as a dataset.
Dataset  #subsets  #graphs / subset  mean #nodes  mean degree  min #nodes  max #nodes  node attributes  edge attributes  exact GED 

MUTA  7  10  10  10  10  chemical symbol  valence (bound type)  no  
20  20  20  
30  30  30  
40  40  40  
50  50  50  
60  60  60  
70  70  70  
10, 20, ..., 70  10  70  
GREC  4  10  5  5  5  x,y coordinate and type  type  90% (100% soon)  
10  10  10  
15  15  15  
20  20  20  
5, 10, 15, ..., 20  5  20 
Other attributes can be found in the datasets. Attributes listed here are those used in the challenge. See Prepare your method for the expressions of the costs using these attributes.
About these datasets
All datasets are taken from existing benchmarks: PAH, MAO, Alkane, Acyclic are part of the GREYC's Chemistry dataset.
 GREC, MUTA and CMU are part of GDR4GED [3]. GREC and MUTA are subsets of the IAM Graph Database Repository, and CMU is a set of Delaunay graphs computed from the House image sequence of the CMU/VASC Image Database.
Graph format: Graph eXchange Language (GXL)
Evaluation
For each dataset D (or subset), each set of cost parameters C and each method M, the distance is computed for all pairs of graphs. Note that for CMU, each graph is compared with graphs representing the same image rotated at 10, 20, 30, 40, 50, 60, 70, 80, and 90 degrees only (see README file in datasets).
Then, methods are compared according to the following metrics [3] computed for each triplet (D,C,M):
 mean deviation of the computed distances from the reference distances
 number of times the reference distance is found
 mean running time
 time/deviation score
Prepare your method
Given two graphs g1.gxl and g2.gxl, each encoding a graph, your program must:
 load the graphs and their attributes from the GXL files
 compute the distance based on edit operations
 print the following information on the standard output:
opt_sol;time;node_match
opt_sol
is an integer: 0 (not found), 1 (found), 2 (don't know)time
(in seconds) is the total running time of steps 1 and 2, printed with 8 digits after the commanode_match
is a vector encoding substitutions and removals of nodes of g1 it is indexed by nodes of g1 in the order defined in the GXL file (attribute
id
)  for each node
i
of g1node_match_{i} = j
ifi
is substituted by the node of g2 with indexj
, ornode_match_{i} = 1
ifi
is removed
 it is indexed by nodes of g1 in the order defined in the GXL file (attribute
Step 1 or 2 must include the computation of the costs. Moreover, it cannot run more than 30 seconds, and no more than 4 threads can be used.
Since all graphs are simple, the full edit path can be easily reconstructed from this vector without any ambiguity. Its total cost defines the value of the distance computed by your method. It cannot be less than the exact GED.
We are in charge of the computation of the full edit path and the associated distance from node_match
.
Example
Your program computes a distance in 0.5 seconds between a graph g1 with 4 nodes and a graph g2 with 5 nodes, with operations on nodes given by:
 substitutions
1>2
,2>1
,4>4
 removal: node
3
of g1  insertions: nodes
4
and5
of g2
and induced operations on edges. Your method does not know if the distance is exact.
Your program must print: 2;0.50000000;2 1 1 4
on the standard output.
Other operations are computed from this last information and from the graphs.
Programs
We accept commandline programs for Linux 64 bits (16GB RAM, 4 QuadCore AMD Opteron F 8350 2GHz):
 binaries, executable JAR files
 source code in C, C++, Java
 MATLAB scripts (with/out MEX files, compiled or not)
 other languages may be possible on demand
There are 3 types of pairs of attributes on nodes and edges, so you have to prepare 3 programs, as described below.
1. for symbolic graphs (Alkane, Acyclic, PAH, MAO, MUTA)
6 inputs  node costs  c_{node,sub}, c_{node,del/ins} 

edge costs  c_{edge,sub}, c_{edge,del/ins}  
graphs as GXL files  g1.gxl, g2.gxl 
myexec_symbolic c_{node,sub} c_{node,del/ins} c_{edge,sub} c_{edge,del/ins} g1.gxl g2.gxl
with costs computed by myexec_symbolic
as follows:
dataset  node attribute  node costs  edge attribute  edge costs 

Alkane  chemical symbol (label) l 
c_{sub}(u,v)=0 if l(u)=l(v) , c_{node,sub} elsec_{del/ins}(u)=c_{node,del/ins} for all node u

valence (bound type) t 
c_{sub}(a,b)=0 if t(a)=t(b) , c_{edge,sub} elsec_{del/ins}(a)=c_{edge,del/ins} for all edge a 
PAH  
MAO  
Acyclic  
MUTA 
several values of parameters c_{node,sub}
, c_{node,del/ins}
, c_{edge,sub}
, c_{edge,del/ins}
are tested.
2. for GREC dataset
2 inputs  graphs as GXL files  g1.gxl, g2.gxl 

myexec_grec g1.gxl g2.gxl
with costs computed by myexec_grec
as follows:
GREC dataset  used attributes  costs  

nodes  (x,y) coordinates pos , andtype (label) l 
c_{sub}(u,v) 
0.5*d_{euc}(pos(u),pos(v)) if l(u)=l(v) (same labels) 
90 else (different labels) 

c_{ins/del} 
45 

edges  type0 (string: line or arc)t frequency (label) f 
c_{sub}(a,b) 
0 if f(a)=f(b)=1 and t(a)=t(b) 15 if f(a)=f(b)=1 and t(a)≠t(b) 
0 if f(a)=f(b)=2 7.5 if f(a)≠f(b) 

c_{ins/del}(a) 
7.5*f(a) 
where d_{euc}
is the Euclidean distance.
3. for CMU dataset
2 inputs  graphs as GXL files  g1.gxl, g2.gxl 

myexec_cmu g1.gxl g2.gxl
with costs computed by myexec_cmu
as follows:
Dataset  node attributes  fixed node costs  edge attributes  edge costs 

CMU  constant integer 1  c_{sub}=0 c_{ins/del}=inf 
Euclidean length L 
c_{sub}(a_{i},a_{j})=0.5L(a_{i})L(a_{j}) c_{del/ins}(a)=0.5*dist(a)

Participants and methods
8 methods were submitted:
 Bipartite GED with random walks and Hungarian algorithm adapted to errorcorrecting bipartite matching (LSAPE)
 Approximate Graph Edit Distance Guided by Bipartite Matching of Bags of Walks
B. Gaüzère, S. Bougleux, K. Riesen, L. Brun
S+SSPR 2014 [paper]  Linear Sum Assignment with Edition
S. Bougleux, L. Brun
Research Report, Normandie Univ, 2016 [paper]
 Approximate Graph Edit Distance Guided by Bipartite Matching of Bags of Walks
 GED as a quadratic assignment problem approximated by IPFP (QAPE)
 Graph Edit Distance as a Quadratic Program
S. Bougleux, B. Gaüzère, L. Brun
ICPR 2016 [to appear]  Graph Edit Distance as a Quadratic Assignment Problem
S. Bougleux, L. Brun, V. Carletti, P. Foggia, B. Gaüzère, M. Vento
Pattern Recognition Letters 87, 2017 [paper]
 Graph Edit Distance as a Quadratic Program
 GED as a binary linear programming problem solved by CPLEX (F2)
Parallel version of F2 via a CPLEX parameter dedicated to number of threads (F24threads)
Relaxed version of F2 (F2LP) Graph edit distance: a new binary linear programming formulation
J. Lerouge, Z. AbuAisheh, R. Raveaux, P. Héroux, S. Adam
Research Report [paper]  Exact Graph Edit Distance Computation Using a Binary Linear Program
J. Lerouge, Z. AbuAisheh, R. Raveaux, P. Héroux, S. Adam
S+SSPR 2016 [paper]
 Graph edit distance: a new binary linear programming formulation
 GED solved by depthfirstbased branchandbound strategy (DF)
Exploration of one branch in the search tree of DF (DFUB) An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems
Z. AbuAisheh, R. Raveaux, J.Y. Ramel and P. Martineau
ICPRAM 2015 [paper]
 An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems
 Parallel version of DF via a load balancing strategy (PDFS)
 A Parallel Graph Edit Distance Algorithm
Z. AbuAisheh, R. Raveaux, J.Y. Ramel and P. Martineau
Research Report, 2017 [paper]
 A Parallel Graph Edit Distance Algorithm
Results
Following quantities are reported here:
 mean deviation from the reference distances (in %)
 % of reached reference distances
 mean computational time (in seconds)
More details are given in Evaluation and Prepare your method sections, and in the presentation given at ICPR. Also, a deeper analysis has been recently submitted to a journal.
For symbolic datasets, 3 cost settings have been used:
c_{node,sub}=2, c_{node,del/ins}=4, c_{edge,sub}=c_{edge,ins/del}=1
c_{node,sub}=2, c_{node,del/ins}=4, c_{edge,sub}=1, c_{edge,ins/del}=2
c_{node,sub}=6, c_{node,del/ins}=2, c_{edge,sub}=3, c_{edge,ins/del}=1
Alkane  

cost parameter 1  cost parameter 2  cost parameter 3  
Dev  %  Time  Dev  %  Time  Dev  %  Time  
BS  16.623  59.448  0.133  20  59.835  0.133  18.258  59.835  0.201 
LSAPE  65.654  5.431  0.013  75.340  5.817  0.014  73.766  5.817  0.014 
QAPE  18.939  54.715  0.029  23.264  54.546  0.028  21.482  54.546  0.028 
F2  0  100  0.833  0  100  0.835  0  100  0.835 
F24threads  0  100  0.918  0  100  0.931  0  100  0.922 
F2LP  38.507  29.746  0.809  46.632  29.933  0.808  44.728  29.928  0.808 
DF  0  99.613  0.705  0  100  0.706  0  100  0.703 
DFUB  19.776  52.546  0.035  23.928  52.933  0.035  21.979  52.933  0.035 
PDFS  0  100  0.517  0  100  0.493  0  100  0.526 
Acyclic  

cost 1  cost 2  cost 3  
Deviation  %  Time  Deviation  %  Time  Deviation  %  Time  
BS  21.682  19.394  0.127  34.016  13.464  0.127  22.304  20.042  0.128 
LSAPE  38.437  9.677  0.010  55.823  7.865  0.010  40.136  10.283  0.010 
QAPE  9.390  64.340  0.024  13.732  57.165  0.023  10.141  57.759  0.024 
F2  0  100  0.792  0  100  0.815  0  100  0.790 
F24threads  0  100  0.839  0  100  0.865  0  100  0.837 
F2LP  14.813  56.484  0.781  25.266  42.575  0.788  11.825  58.156  0.780 
DF  0.045  99.782  2.949  0.080  99.647  3.351  0.056  99.674  4.320 
DFUB  33.796  8.764  0.036  45.938  7.402  0.036  36.059  9.280  0.036 
PDFS  0.007  99.967  1.976  0.019  99.922  2.359  0.019  99.895  2.359 
PAH  

cost 1  cost 2  cost 3  
Deviation  %  Time  Deviation  %  Time  Deviation  %  Time  
BS  49.484  4.741  0.786  62.790  4.538  0.779  59.830  4.730  0.777 
LSAPE  93.919  1.063  0.141  98.665  1.063  0.144  97.237  1.063  0.142 
QAPE  41.361  9.042  0.196  54.656  9.325  0.191  51.627  9.574  0.189 
F2  13.541  61.736  18.782  17.882  61.407  18.721  15.315  62.426  18.594 
F24threads  1.294  84.019  11.428  3.278  83.080  11.511  1.426  83.997  11.408 
F2LP  92.706  0.882  2.709  65.105  2.206  2.709  96.530  1.063  2.709 
DF  33.827  16.772  27.446  45.019  16.455  27.435  42.137  16.862  27.441 
DFUB  56.308  3.225  0.100  70.246  3.066  0.100  67.414  3.225  0.100 
PDFS  31.023  20.348  26.498  41.550  20.133  26.484  38.656  20.484  26.484 
MUTA  

MUTA10  MUTA20  MUTA30  MUTA40  MUTA50  MUTA60  MUTA70  MUTAMIX  
Dev  Time  Dev  Dev  Time  Dev  Dev  Time  Dev  Dev  Time  Dev  Dev  Time  Dev  Dev  time  Dev  Dev  Time  Dev  Dev  Time  Dev  
BS  13.769  0.15  49.054  0.56  48.628  1.54  59.997  3.42  55.078  7.21  45.563  14.94  69.847  28.42  26.347  11.69  
LSAPE  54.456  0.02  86.870  0.11  91.132  0.46  89.868  1.47  84.990  4.25  88.332  11.69  86.487  27.76  45.623  4.98  
QAPE  11.270  0.04  25.971  0.17  27.498  0.49  22.670  1.40  5.214  4.34  3.335  11.55  2.490  27.09  3.719  4.92  
F2  0  0.61  0.547  6.24  11.179  25.72  18.591  27.50  26.893  30  43.395  34.14  56.530  42.20  9.375  22.89  
F24threads  0  0.61  0  3.57  1.369  24.70  1.842  27.40  6.539  29.91  11.144  34.13  26.961  42.10  1.802  21.51  
F2LP  15.186  0.57  47.308  1.38  72.450  3.09  65.052  7.70  46.954  16.51  47.900  30.36  57.691  42.08  18.559  14.15  
DF  0  1.70  42.278  27.05  47.624  27.08  58.678  27.10  57.334  27.17  48.201  27.23  42.855  27.37  25.721  27.17  
DFUB  28.554  0.04  64.687  0.09  66.615  0.14  71.280  0.22  70.072  0.29  65.286  0.37  57  0.50  30.948  0.27  
PDFS  0  1.24  25.716  26.65  37.903  27.08  50.340  27.10  47.937  27.17  38.746  27.23  36.300  27.37  22.442  26.96 
MAO  

cost 1  cost 2  cost 3  
Deviation  %  Time  Deviation  %  Time  Deviation  %  Time  
BS  5.980  72.577  0.559  9.427  67.236  0.562  7.880  74.372  0.537 
LSAPE  30.593  19.355  0.097  45.677  16.976  0.094  76.581  1.384  0.086 
QAPE  2.017  77.573  0.128  2.779  79.044  0.123  5.921  72.534  0.126 
F2  0  100  1.065  0  100  1.268  0  100  0.979 
F24threads  0  100  1.047  0  100  1.513  0  100  0.990 
F2LP  1.242  89.100  1.050  0  100  1.268  2.330  86.029  0.977 
DF  0.349  98.421  6.996  0.763  97.945  8.858  0.877  96.799  5.513 
DFUB  11.254  56.444  0.083  16.157  56.293  0.083  13.902  59.580  0.083 
PDFS  1.589  95.436  7.368  2.784  94.442  8.806  2.001  94.809  5.793 
CMU  

Deviation  %  Time  
BS  54.636  28.484  29.235  
F2  36.469  50.909  26.128  
F24threads  0  50.909  24.708  
F2LP  36.625  50.757  24.516  
DF  39.313  40.454  25.773  
DFUB  37.504  41.212  0.657  
PDFS  38.641  40.606  24.788 
GREC  

GREC5  GREC10  GREC15  GREC20  GRECMIX  
Deviation  %  Time  Deviation  %  Time  Deviation  %  Time  Deviation  %  Time  Deviation  %  Time  
BS  2.907  71  0.085  1.291  61  0.413  1.955  42  0.833  3.265  32  1.601  1.900  39  0.622 
F2  0  100  0.411  0  100  0.449  0  100  1.752  0  100  1.401  0  100  0.493 
F24threads  0  100  0.420  0  100  0.488  0  100  1.240  0  100  1.335  0  100  0.519 
F2LP  0  100  0.409  0.678  77  0.445  1.364  52  0.662  1.072  47  0.872  0.751  69  0.468 
DF  0  100  0.099  0  100  5.308  1.310  47  16.503  2.071  39  21.848  0.486  83  7.685 
DFUB  11.152  41  0.051  4.832  44  0.088  9.941  18  0.141  12.444  27  0.191  8.172  26  0.107 
PDFS  0  100  0.155  0  100  3.677  1.210  48  16.691  1.556  43  22.149  0.467  85  7.150 
Challenge 2 Graph classification
Goal
Given a dataset decomposed into a train set, a validation set and a test set, determine the class of the graphs in the test set with a usual kNN classifier (with k=3, decision at majority, reject if 3 different classes)
To this, you have to train your measure according to the train set and the validation set. The final output corresponds to a matrix of dissimilarities including all pairwise dissimilarities of graphs contained in the three sets (train, validation and test). In order to avoid any ambiguity in the interpretation of the matrix, please respect carrefully the order of the graphs provided in a separate file given with each datasets.
Several datasets are considered: four existing datasets (already used in Challenge 1)
 a new dataset of graphs encoding monoterpenes (main components of essential oils)
Evaluation
Methods are analyzed and compared, for each dataset, according to: recognition rate
 confusion matrix
Participation
This challenge is open to any method which computes a dissimilarity between attributed graphs.
Contrary to Challenge 1, methods do not need to compute a distance based on edit paths. For such a distance, costs are not constrained in this challenge, and more generally, parameters can be optimized to obtain the best classification score. Also, running time and number of threads are not limited.
See How to participate for more informations
Datasets
Dataset  #graphs  #graphs train, validation, test  mean #nodes  mean degree  min #nodes  max #nodes  node attributes  edge attributes  #class 

PAH  94  23, 23, 48  20.7  20.4  10  28  any attribute defined in the graph files can be used  2  
MAO  68  17, 17, 34  18.4  2.1  11  27  2  
MUTA  4337  1500, 500, 2337  30.3  417  2  
GREC  1100  286, 286, 528  11.5  25  22  
MONOTERP  to be announced  10 
Since datasets are already known with classes (except for MONOTERP), 10% of graphs will be removed in each dataset, and decomposition of each dataset into train/validation/test does not correspond to original ones
About these datasets
 PAH and MAO are part of GREYC's Chemistry dataset.
 GREC and MUTA are part of IAM Graph Database Repository
 MONOTERP is a new dataset (monoterpenes) available for the participants of the contest (no class is provided for graphs in the test set)
it will be included in GREYC's Chemistry dataset after the contest
Graph format: Graph eXchange Language (GXL)
How to participate
This challenge is open to any method that computes a dissimilarity measure between attributed graphs.
 inscriptions before July 12
 download datasets and compute dissimilarity matricies
 submit your results before August 12, as a compressed archive (zip, tgz, ...) containing:
 a
README
file containing: names of the participants
 name of the method
 an abstract (maximum 200 words) for contest report and website
 a folder for each dataset (having the name of the dataset), each containing:
 a text file
diss.dat
containing the dissimilarity matrixM
(each line of the matrix being on one line of the file), each linei
and each columnj
correspond to a graph of the dataset, they are indexed according to their order in filestrain.cxl
, thenvalid.cxl
and thentest.cxl
, values are separated by space(s)  a file
time.txt
containing the time taken to compute this matrix
 a text file
 a
 submit your extended abstract
Dates and inscriptions
registration open  13 June 

datasets available  15 June 
registration closed  12 July 
submission deadlines  
Challenge1 Challenge2 abstract  15 July 12 August 15 August 
results  December 
Inscriptions / Submissions
 Register by sending an email to containing:
 the names of the participants
 the title of your method
 selected challenge(s), and if Challenge 1 is selected, the type of programs
 Download datasets
 Challenge 1: from this website (datasets) for preparing your programs
 Challenge 2: from the link given after inscription (datasets)
 Submit your results (see Challenge 1 or Challenge 2)
 Challenge 1: your three programs
 Challenge 2: your classification results
 Submit your extended abstract (1 A4 page)
You can register/submit to only one challenge, to both challenges at the same time, or to one challenge and then to the other one. Register and submit your programs for Challenge 1 as soon as possible.
Publications
Any method must be described in an extended abstract (1 A4 page), containing details for Challenge 1, Challenge 2, or both.
New methods can be submitted (46 pages in ICPR style), reviewed, and published in ICPR proceedings (more details soon).
Organizers
Scientific committee
Luc Brun  Normandie Univ, ENSICAEN, France 
Xiaoyi Jiang  Univ Münster, Germany 
JeanYves Ramel  Univ Tours, LI Tours, France 
Organizing committee
Zeina AbuAisheh  Univ Tours, LI Tours, France 
Sébastien Bougleux  Normandie Univ, UNICAEN, France 
Benoit Gaüzère  Normandie Univ, INSA Rouen, LITIS, France 
References
 M. Neuhaus, K. Riesen and H. Bunke. Fast Suboptimal Algorithms for the Computation of Graph Edit Distance. Structural, Syntactic, and Statistical Pattern Recognition. LNCS Vol. 4109, pp 163172, 2006
 K. Riesen, M. Neuhaus, H. Bunke. Bipartite graph matching for computing the edit distance of graphs. GraphBased Representations in Pattern Recognition. LNCS Vol. 4538, p. 112, 2007
 Z. AbuAisheh, R. Raveaux, J.Y. Ramel. A Graph Database Repository and Performance Evaluation Metrics for Graph Edit Distance. GraphBased Representations in Pattern Recognition. LNCS Vol. 9069, p. 138147, 2015