ICPR 2016 - Graph Distance Contest

Measuring the dissimilarity between graphs is a key step in data analysis based on graph representations. This contest focusses on the definition and the computation of general dissimilarity measures through two challenges:

  • Challenge 1: computation of the exact or an approximate graph edit distance
  • Challenge 2: computation of a dissimilarity measure for graph classification

Different datasets are considered with symbolic or numerical attributes on both nodes and edges. Methods submitted to Challenge 1 can also be submitted to Challenge 2.

This contest is supported by the Technical Committee on Graph-Based Representations in Pattern Recognition (TC 15) of the International Association of Pattern Recognition (IAPR)

April 2017Results of Challenge 1 submitted to a journal
Dec. 2016ICPR / results
Sept. 2016Submission deadline

Submissions

Number of methods
- Challenge 1: 8
- Challenge 2: 2

Goal

Compute the exact or an approximate GED, for several pairs of graphs of several datasets, under the following constraints: Datasets are divided into two groups:

Evaluation

Methods are compared according to Given two graphs, the reference distance can be: More information about evaluation.

Participation

This challenge is open to any method based on the computation of edit paths:

Datasets

Three groups of datasets are considered.

Dataset #graphs mean #nodes mean degree min #nodes max #nodes node attributes edge attributes exact GED
Alkane 150 8.9 1.8 1 10 chemical symbol valence (bound type) 99.5467%
Acyclic 185 8.2 1.8 3 11 85.7595%
PAH 94 20.7 20.4 10 28 no
MAO 68 18.4 2.1 11 27 no
CMU 111 30 30 30 - distance no

For a finer analysis, each dataset of the second group is decomposed into several subsets of graphs having the same number of nodes, and one subset of graphs having different number of nodes. Each subset is considered as a dataset.

Dataset #subsets #graphs / subset mean #nodes mean degree min #nodes max #nodes node attributes edge attributes exact GED
MUTA 7 10 10 10 10 chemical symbol valence (bound type) no
20 20 20
30 30 30
40 40 40
50 50 50
60 60 60
70 70 70
10, 20, ..., 70 10 70
GREC 4 10 5 5 5 x,y coordinate and type type 90% (100% soon)
10 10 10
15 15 15
20 20 20
5, 10, 15, ..., 20 5 20

Other attributes can be found in the datasets. Attributes listed here are those used in the challenge. See Prepare your method for the expressions of the costs using these attributes.

About these datasets

All datasets are taken from existing benchmarks: Visit links for more information.

Evaluation

For each dataset D (or subset), each set of cost parameters C and each method M, the distance is computed for all pairs of graphs. Note that for CMU, each graph is compared with graphs representing the same image rotated at 10, 20, 30, 40, 50, 60, 70, 80, and 90 degrees only (see README file in datasets).

Then, methods are compared according to the following metrics [3] computed for each triplet (D,C,M):

Prepare your method

Given two graphs g1.gxl and g2.gxl, each encoding a graph, your program must:

  1. load the graphs and their attributes from the GXL files
  2. compute the distance based on edit operations
  3. print the following information on the standard output: opt_sol;time;node_match
    • opt_sol is an integer: 0 (not found), 1 (found), 2 (don't know)
    • time (in seconds) is the total running time of steps 1 and 2, printed with 8 digits after the comma
    • node_match is a vector encoding substitutions and removals of nodes of g1
      • it is indexed by nodes of g1 in the order defined in the GXL file (attribute id)
      • for each node i of g1
        • node_matchi = j if i is substituted by the node of g2 with index j, or
        • node_matchi = -1 if i is removed

Step 1 or 2 must include the computation of the costs. Moreover, it cannot run more than 30 seconds, and no more than 4 threads can be used.

Since all graphs are simple, the full edit path can be easily reconstructed from this vector without any ambiguity. Its total cost defines the value of the distance computed by your method. It cannot be less than the exact GED.

We are in charge of the computation of the full edit path and the associated distance from node_match.

Example

Your program computes a distance in 0.5 seconds between a graph g1 with 4 nodes and a graph g2 with 5 nodes, with operations on nodes given by:

and induced operations on edges. Your method does not know if the distance is exact.

Your program must print: 2;0.50000000;2 1 -1 4 on the standard output.

Other operations are computed from this last information and from the graphs.

Programs

We accept command-line programs for Linux 64 bits (16GB RAM, 4 Quad-Core AMD Opteron F 8350 2GHz):

There are 3 types of pairs of attributes on nodes and edges, so you have to prepare 3 programs, as described below.

1. for symbolic graphs (Alkane, Acyclic, PAH, MAO, MUTA)

6 inputsnode costscnode,sub, cnode,del/ins
edge costscedge,sub, cedge,del/ins
graphs as GXL filesg1.gxl, g2.gxl
myexec_symbolic cnode,sub cnode,del/ins cedge,sub cedge,del/ins g1.gxl g2.gxl

with costs computed by myexec_symbolic as follows:

dataset node attribute node costs edge attribute edge costs
Alkane chemical symbol (label) l csub(u,v)=0 if l(u)=l(v), cnode,sub else

cdel/ins(u)=cnode,del/ins for all node u
valence (bound type) t csub(a,b)=0 if t(a)=t(b), cedge,sub else

cdel/ins(a)=cedge,del/ins for all edge a
PAH
MAO
Acyclic
MUTA

several values of parameters cnode,sub, cnode,del/ins, cedge,sub, cedge,del/ins are tested.

2. for GREC dataset

2 inputsgraphs as GXL filesg1.gxl, g2.gxl
myexec_grec g1.gxl g2.gxl

with costs computed by myexec_grec as follows:

GREC dataset used attributes costs
nodes (x,y) coordinates pos, and
type (label) l
csub(u,v) 0.5*deuc(pos(u),pos(v)) if l(u)=l(v) (same labels)
90 else (different labels)
cins/del 45
edges type0 (string: line or arc)t
frequency (label) f
csub(a,b) 0 if f(a)=f(b)=1 and t(a)=t(b)
15 if f(a)=f(b)=1 and t(a)≠t(b)
0 if f(a)=f(b)=2
7.5 if f(a)≠f(b)
cins/del(a) 7.5*f(a)

where deuc is the Euclidean distance.

3. for CMU dataset

2 inputsgraphs as GXL filesg1.gxl, g2.gxl
myexec_cmu g1.gxl g2.gxl

with costs computed by myexec_cmu as follows:

Dataset node attributes fixed node costs edge attributes edge costs
CMU constant integer 1 csub=0
cins/del=inf
Euclidean length L csub(ai,aj)=0.5|L(ai)-L(aj)|
cdel/ins(a)=0.5*dist(a)

Participants and methods

8 methods were submitted:

Beam search (BS), proposed by M. Neuhaus, K. Riesen and H. Bunke [1], was also tested

Results

Following quantities are reported here:

More details are given in Evaluation and Prepare your method sections, and in the presentation given at ICPR. Also, a deeper analysis has been recently submitted to a journal.

For symbolic datasets, 3 cost settings have been used:

  1. cnode,sub=2, cnode,del/ins=4, cedge,sub=cedge,ins/del=1
  2. cnode,sub=2, cnode,del/ins=4, cedge,sub=1, cedge,ins/del=2
  3. cnode,sub=6, cnode,del/ins=2, cedge,sub=3, cedge,ins/del=1
Alkane
cost parameter 1cost parameter 2cost parameter 3
Dev%TimeDev%TimeDev%Time
BS 16.623 59.448 0.133 20 59.835 0.133 18.258 59.835 0.201
LSAPE65.654 5.431 0.013 75.340 5.817 0.014 73.766 5.817 0.014
QAPE 18.939 54.715 0.029 23.264 54.546 0.028 21.482 54.546 0.028
F2 0 100 0.833 0 100 0.835 0 100 0.835
F24threads 0 100 0.918 0 100 0.931 0 100 0.922
F2LP 38.507 29.746 0.809 46.632 29.933 0.808 44.728 29.928 0.808
DF 0 99.613 0.705 0 100 0.706 0 100 0.703
DFUB 19.776 52.546 0.035 23.928 52.933 0.035 21.979 52.933 0.035
PDFS 0 100 0.517 0 100 0.493 0 100 0.526
Acyclic
cost 1cost 2cost 3
Deviation%TimeDeviation%TimeDeviation%Time
BS 21.682 19.394 0.127 34.016 13.464 0.127 22.304 20.042 0.128
LSAPE 38.437 9.677 0.010 55.823 7.865 0.010 40.136 10.283 0.010
QAPE 9.390 64.340 0.024 13.732 57.165 0.023 10.141 57.759 0.024
F2 0 100 0.792 0 100 0.815 0 100 0.790
F24threads 0 100 0.839 0 100 0.865 0 100 0.837
F2LP 14.813 56.484 0.781 25.266 42.575 0.788 11.825 58.156 0.780
DF 0.045 99.782 2.949 0.080 99.647 3.351 0.056 99.674 4.320
DFUB 33.796 8.764 0.036 45.938 7.402 0.036 36.059 9.280 0.036
PDFS 0.007 99.967 1.976 0.019 99.922 2.359 0.019 99.895 2.359
PAH
cost 1cost 2cost 3
Deviation%TimeDeviation%TimeDeviation%Time
BS 49.484 4.741 0.786 62.790 4.538 0.779 59.830 4.730 0.777
LSAPE 93.919 1.063 0.141 98.665 1.063 0.144 97.237 1.063 0.142
QAPE 41.361 9.042 0.196 54.656 9.325 0.191 51.627 9.574 0.189
F2 13.541 61.736 18.782 17.882 61.407 18.721 15.315 62.426 18.594
F24threads 1.294 84.019 11.428 3.278 83.080 11.511 1.426 83.997 11.408
F2LP 92.706 0.882 2.709 65.105 2.206 2.709 96.530 1.063 2.709
DF 33.827 16.772 27.446 45.019 16.455 27.435 42.137 16.862 27.441
DFUB 56.308 3.225 0.100 70.246 3.066 0.100 67.414 3.225 0.100
PDFS 31.023 20.348 26.498 41.550 20.133 26.484 38.656 20.484 26.484
MUTA
MUTA-10MUTA-20MUTA-30 MUTA-40MUTA-50MUTA-60MUTA-70MUTA-MIX
DevTimeDevDevTimeDevDevTimeDev DevTimeDev DevTimeDevDevtimeDevDevTimeDevDevTimeDev
BS 13.769 0.15 49.054 0.56 48.628 1.54 59.997 3.42 55.078 7.21 45.563 14.94 69.847 28.42 26.347 11.69
LSAPE 54.456 0.02 86.870 0.11 91.132 0.46 89.868 1.47 84.990 4.25 88.332 11.69 86.487 27.76 45.623 4.98
QAPE 11.270 0.04 25.971 0.17 27.498 0.49 22.670 1.40 5.214 4.34 3.335 11.55 2.490 27.09 3.719 4.92
F2 0 0.61 0.547 6.24 11.179 25.72 18.591 27.50 26.893 30 43.395 34.14 56.530 42.20 9.375 22.89
F24threads 0 0.61 0 3.57 1.369 24.70 1.842 27.40 6.539 29.91 11.144 34.13 26.961 42.10 1.802 21.51
F2LP 15.186 0.57 47.308 1.38 72.450 3.09 65.052 7.70 46.954 16.51 47.900 30.36 57.691 42.08 18.559 14.15
DF 0 1.70 42.278 27.05 47.624 27.08 58.678 27.10 57.334 27.17 48.201 27.23 42.855 27.37 25.721 27.17
DFUB 28.554 0.04 64.687 0.09 66.615 0.14 71.280 0.22 70.072 0.29 65.286 0.37 57 0.50 30.948 0.27
PDFS 0 1.24 25.716 26.65 37.903 27.08 50.340 27.10 47.937 27.17 38.746 27.23 36.300 27.37 22.442 26.96
MAO
cost 1cost 2cost 3
Deviation%TimeDeviation%TimeDeviation%Time
BS5.98072.5770.5599.42767.2360.5627.88074.3720.537
LSAPE30.59319.3550.09745.67716.9760.09476.5811.3840.086
QAPE2.01777.5730.1282.77979.0440.1235.92172.5340.126
F201001.06501001.26801000.979
F24threads01001.04701001.51301000.990
F2LP1.24289.1001.05001001.2682.33086.0290.977
DF0.34998.4216.9960.76397.9458.8580.87796.7995.513
DFUB11.25456.4440.08316.15756.2930.08313.90259.5800.083
PDFS1.58995.4367.3682.78494.4428.8062.00194.8095.793
CMU
Deviation%Time
BS 54.636 28.484 29.235
F2 36.469 50.909 26.128
F24threads 0 50.909 24.708
F2LP 36.625 50.757 24.516
DF 39.313 40.454 25.773
DFUB 37.504 41.212 0.657
PDFS 38.641 40.606 24.788
GREC
GREC-5GREC-10GREC-15 GREC-20GREC-MIX
Deviation%TimeDeviation%TimeDeviation%Time Deviation%Time Deviation%Time
BS 2.907 71 0.085 1.291 61 0.413 1.955 42 0.833 3.265 32 1.601 1.900 39 0.622
F2 0 100 0.411 0 100 0.449 0 100 1.752 0 100 1.401 0 100 0.493
F24threads 0 100 0.420 0 100 0.488 0 100 1.240 0 100 1.335 0 100 0.519
F2LP 0 100 0.409 0.678 77 0.445 1.364 52 0.662 1.072 47 0.872 0.751 69 0.468
DF 0 100 0.099 0 100 5.308 1.310 47 16.503 2.071 39 21.848 0.486 83 7.685
DFUB 11.152 41 0.051 4.832 44 0.088 9.941 18 0.141 12.444 27 0.191 8.172 26 0.107
PDFS 0 100 0.155 0 100 3.677 1.210 48 16.691 1.556 43 22.149 0.467 85 7.150

Goal

Given a dataset decomposed into a train set, a validation set and a test set, determine the class of the graphs in the test set with a usual k-NN classifier (with k=3, decision at majority, reject if 3 different classes)

To this, you have to train your measure according to the train set and the validation set. The final output corresponds to a matrix of dissimilarities including all pairwise dissimilarities of graphs contained in the three sets (train, validation and test). In order to avoid any ambiguity in the interpretation of the matrix, please respect carrefully the order of the graphs provided in a separate file given with each datasets.

Several datasets are considered:

Evaluation

Methods are analyzed and compared, for each dataset, according to: You submit your results (dissimilarity matrix and computational time) and we perform the classification.

Participation

This challenge is open to any method which computes a dissimilarity between attributed graphs.

Contrary to Challenge 1, methods do not need to compute a distance based on edit paths. For such a distance, costs are not constrained in this challenge, and more generally, parameters can be optimized to obtain the best classification score. Also, running time and number of threads are not limited.

See How to participate for more informations

Datasets

Dataset #graphs #graphs train, validation, test mean #nodes mean degree min #nodes max #nodes node attributes edge attributes #class
PAH 94 23, 23, 48 20.7 20.4 10 28 any attribute defined in the graph files can be used 2
MAO 68 17, 17, 34 18.4 2.1 11 27 2
MUTA 4337 1500, 500, 2337 30.3 417 2
GREC 1100 286, 286, 528 11.5 25 22
MONOTERP to be announced 10

Since datasets are already known with classes (except for MONOTERP), 10% of graphs will be removed in each dataset, and decomposition of each dataset into train/validation/test does not correspond to original ones

About these datasets

Visit links for more information.

How to participate

This challenge is open to any method that computes a dissimilarity measure between attributed graphs.

  1. inscriptions before July 12
  2. download datasets and compute dissimilarity matricies
  3. submit your results before August 12, as a compressed archive (zip, tgz, ...) containing:
    • a README file containing:
      • names of the participants
      • name of the method
      • an abstract (maximum 200 words) for contest report and website
    • a folder for each dataset (having the name of the dataset), each containing:
      • a text file diss.dat containing the dissimilarity matrix M (each line of the matrix being on one line of the file), each line i and each column j correspond to a graph of the dataset, they are indexed according to their order in files train.cxl, then valid.cxl and then test.cxl, values are separated by space(s)
      • a file time.txt containing the time taken to compute this matrix
  4. submit your extended abstract
registration open13 June
datasets available15 June
registration closed12 July
submission deadlines
Challenge-1
Challenge-2
abstract
15 July
12 August
15 August
resultsDecember

Inscriptions / Submissions

  1. Register by sending an email to containing:
    • the names of the participants
    • the title of your method
    • selected challenge(s), and if Challenge 1 is selected, the type of programs
  2. Download datasets
  3. Submit your results (see Challenge 1 or Challenge 2)
  4. Submit your extended abstract (1 A4 page)

You can register/submit to only one challenge, to both challenges at the same time, or to one challenge and then to the other one. Register and submit your programs for Challenge 1 as soon as possible.

Any method must be described in an extended abstract (1 A4 page), containing details for Challenge 1, Challenge 2, or both.

New methods can be submitted (4-6 pages in ICPR style), reviewed, and published in ICPR proceedings (more details soon).

Scientific committee

Luc BrunNormandie Univ, ENSICAEN, France
Xiaoyi JiangUniv Münster, Germany
Jean-Yves RamelUniv Tours, LI Tours, France

Organizing committee

Zeina Abu-AishehUniv Tours, LI Tours, France
Sébastien BougleuxNormandie Univ, UNICAEN, France
Benoit GaüzèreNormandie Univ, INSA Rouen, LITIS, France
  1. M. Neuhaus, K. Riesen and H. Bunke. Fast Suboptimal Algorithms for the Computation of Graph Edit Distance. Structural, Syntactic, and Statistical Pattern Recognition. LNCS Vol. 4109, pp 163-172, 2006
  2. K. Riesen, M. Neuhaus, H. Bunke. Bipartite graph matching for computing the edit distance of graphs. Graph-Based Representations in Pattern Recognition. LNCS Vol. 4538, p. 1-12, 2007
  3. Z. Abu-Aisheh, R. Raveaux, J.-Y. Ramel. A Graph Database Repository and Performance Evaluation Metrics for Graph Edit Distance. Graph-Based Representations in Pattern Recognition. LNCS Vol. 9069, p. 138-147, 2015