1
# -*- coding: utf-8 -*-
2 3
"""
3
    Implementation of the modified Gromov–Hausdorff (mGH) distance
4
    between compact metric spaces induced by unweighted graphs. This
5
    code complements the results from "Efficient estimation of a
6
    Gromov–Hausdorff distance between unweighted graphs" by V. Oles et
7
    al. (https://arxiv.org/pdf/1909.09772). The mGH distance was first
8
    defined in "Some properties of Gromov–Hausdorff distances" by F.
9
    Mémoli (Discrete & Computational Geometry, 2012).
10

11
    Author: Vladyslav Oles
12

13
    ===================================================================
14

15
    Usage examples:
16

17
    1) Estimating the mGH distance between 4-clique and single-vertex
18
    graph from their adjacency matrices. Note that it suffices to fill
19
    only the upper triangle of an adjacency matrix.
20

21
    >>> AG = [[0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 0, 0]]
22
    >>> AH = [[0]]
23
    >>> lb, ub = gromov_hausdorff(AG, AH)
24
    >>> lb, ub
25
    (0.5, 0.5)
26

27
    2) Estimating the mGH distance between cycle graphs of length 2 and
28
    5 from their adjacency matrices. Note that the adjacency matrices
29
    can be given in both dense and sparse SciPy formats.
30

31
    >>> AI = np.array([[0, 1], [0, 0]])
32
    >>> AJ = sps.csr_matrix(([1] * 5, ([0, 0, 1, 2, 3], [1, 4, 2, 3, 4])), shape=(5, 5))
33
    >>> lb, ub = gromov_hausdorff(AI, AJ)
34
    >>> lb, ub
35
    (0.5, 1.0)
36

37
    3) Estimating all pairwise mGH distances between multiple graphs
38
    from their adjacency matrices as an iterable.
39

40
    >>> As = [AG, AH, AI, AJ]
41
    >>> lbs, ubs = gromov_hausdorff(As)
42
    >>> lbs
43
    array([[0. , 0.5, 0.5, 0.5],
44
           [0.5, 0. , 0.5, 1. ],
45
           [0.5, 0.5, 0. , 0.5],
46
           [0.5, 1. , 0.5, 0. ]])
47
    >>> ubs
48
    array([[0. , 0.5, 0.5, 0.5],
49
           [0.5, 0. , 0.5, 1. ],
50
           [0.5, 0.5, 0. , 1. ],
51
           [0.5, 1. , 1. , 0. ]])
52

53
    ===================================================================
54

55
    Notations:
56

57
    |X| denotes the number of elements in set X.
58

59
    X → Y denotes the set of all mappings of set X into set Y.
60

61
    V(G) denotes vertex set of graph G.
62

63
    mGH(X, Y) denotes the modified Gromov–Hausdorff distance between
64
    compact metric spaces X and Y.
65

66
    row_i(A) denotes the i-th row of matrix A.
67

68
    PSPS^n(A) denotes the set of all permutation similarities of
69
    all n×n principal submatrices of square matrix A.
70

71
    PSPS^n_{i←j}(A) denotes the set of all permutation similarities of
72
    all n×n principal submatrices of square matrix A whose i-th row is
73
    comprised of the entries in row_j(A).
74

75
    ===================================================================
76

77
    Glossary:
78

79
    Distance matrix of metric space X is a |X|×|X| matrix whose
80
    (i, j)-th entry holds the distance between i-th and j-th points of
81
    X. By the properties of a metric, distance matrices are symmetric
82
    and non-negative, their diagonal entries are 0 and off-diagonal
83
    entries are positive.
84

85
    Curvature is a generalization of distance matrix that allows
86
    repetitions in the underlying points of a metric space. Curvature
87
    of an n-tuple of points from metric space X is an n×n matrix whose
88
    (i, j)-th entry holds the distance between the points from i-th and
89
    j-th positions of the tuple. Since these points need not be
90
    distinct, the off-diagonal entries of a curvature can equal 0.
91

92
    n-th curvature set of metric space X is the set of all curvatures
93
    of X that are of size n×n.
94

95
    d-bounded curvature for some d > 0 is a curvature whose
96
    off-diagonal entries are all ≥ d.
97

98
    Positive-bounded curvature is a curvature whose off-diagonal
99
    entries are all positive, i.e. the points in the underlying tuple
100
    are distinct. Equivalently, positive-bounded curvatures are
101
    distance matrices on the subsets of a metric space.
102
"""
103 3
import numpy as np
104 3
import warnings
105 3
import scipy.sparse as sps
106 3
from scipy.sparse.csgraph import shortest_path, connected_components
107

108

109 3
__all__ = ["gromov_hausdorff"]
110

111

112
# To sample √|X| * log (|X| + 1) mappings from X → Y by default.
113 3
DEFAULT_MAPPING_SAMPLE_SIZE_ORDER = np.array([.5, 1])
114

115

116 3
def gromov_hausdorff(
117
        AG, AH=None, mapping_sample_size_order=DEFAULT_MAPPING_SAMPLE_SIZE_ORDER):
118
    """
119
    Estimate the mGH distance between simple unweighted graphs,
120
    represented as compact metric spaces based on their shortest
121
    path lengths.
122

123
    Parameters
124
    -----------
125
    AG: np.array (|V(G)|×|V(G)|)
126
        (Sparse) adjacency matrix of graph G, or an iterable of
127
        adjacency matrices if AH=None.
128
    AH: np.array (|V(H)|×|V(H)|)
129
        (Sparse) adjacency matrix of graph H, or None.
130
    mapping_sample_size_order: np.array (2)
131
        Parameter that regulates the number of mappings to sample when
132
        tightening upper bound of the mGH distance.
133

134
    Returns
135
    --------
136
    lb: float
137
        Lower bound of the mGH distance, or a square matrix holding
138
        lower bounds of pairwise mGH distances if AH=None.
139
    ub: float
140
        Upper bound of the mGH distance, or a square matrix holding
141
        upper bounds of pairwise mGH distances if AH=None.
142
    """
143
    # Form iterable with adjacency matrices.
144 3
    if AH is None:
145 3
        if len(AG) < 2:
146 0
            raise ValueError("'estimate_between_unweighted_graphs' needs at least"
147
                             "2 graphs to discriminate")
148 3
        As = AG
149
    else:
150 3
        As = (AG, AH)
151

152 3
    N = len(As)
153
    # Find lower and upper bounds of each pairwise mGH distance between
154
    # the graphs.
155 3
    lbs = np.zeros((N, N))
156 3
    ubs = np.zeros((N, N))
157 3
    for i in range(N):
158 3
        for j in range(i + 1, N):
159
            # Transform adjacency matrices of a pair of graphs to
160
            # distance matrices.
161 3
            DX = make_distance_matrix_from_adjacency_matrix(As[i])
162 3
            DY = make_distance_matrix_from_adjacency_matrix(As[j])
163
            # Find lower and upper bounds of the mGH distance between
164
            # the pair of graphs.
165 3
            lbs[i, j], ubs[i, j] = estimate(
166
                DX, DY, mapping_sample_size_order=mapping_sample_size_order)
167

168 3
    if AH is None:
169
        # Symmetrize matrices with lower and upper bounds of pairwise
170
        # mGH distances between the graphs.
171 3
        lower_triangle_indices = np.tril_indices(N, -1)
172 3
        lbs[lower_triangle_indices] = lbs.T[lower_triangle_indices]
173 3
        ubs[lower_triangle_indices] = ubs.T[lower_triangle_indices]
174

175 3
        return lbs, ubs
176
    else:
177 3
        return lbs[0, 1], ubs[0, 1]
178

179

180 3
def make_distance_matrix_from_adjacency_matrix(AG):
181
    """
182
    Represent simple unweighted graph as compact metric space (with
183
    integer distances) based on its shortest path lengths.
184

185
    Parameters
186
    -----------
187
    AG: np.array (|V(G)|×|V(G)|)
188
        (Sparse) adjacency matrix of simple unweighted graph G.
189

190
    Returns
191
    --------
192
    DG: np.array (|V(G)|×|V(G)|)
193
        (Dense) distance matrix of the compact metric space
194
        representation of G based on its shortest path lengths.
195
    """
196
    # Convert adjacency matrix to SciPy format if needed.
197 3
    if not sps.issparse(AG) and not isinstance(AG, np.ndarray):
198 0
        AG = np.asarray(AG)
199

200
    # Compile distance matrix of the graph based on its shortest path
201
    # lengths.
202 3
    DG = shortest_path(AG, directed=False, unweighted=True)
203
    # Ensure compactness of metric space, represented by distance
204
    # matrix.
205 3
    if np.any(np.isinf(DG)):
206 0
        warnings.warn("disconnected graph is approximated by its largest connected component")
207
        # Extract largest connected component of the graph.
208 0
        _, components_by_vertex = connected_components(AG, directed=False)
209 0
        components, component_sizes = np.unique(components_by_vertex, return_counts=True)
210 0
        largest_component = components[np.argmax(component_sizes)]
211 0
        DG = DG[components_by_vertex == largest_component]
212

213
    # Cast distance matrix to optimal integer type.
214 3
    DG = cast_distance_matrix_to_optimal_int_type(DG)
215

216 3
    return DG
217

218

219 3
def cast_distance_matrix_to_optimal_int_type(DX):
220
    """
221
    Given a metric space X induced by simple unweighted graph,
222
    cast its distance matrix to the smallest signed integer type,
223
    sufficient to hold all its entries.
224

225
    Parameters
226
    -----------
227
    DX: np.array (|X|×|X|)
228
        Distance matrix of X.
229

230
    Returns
231
    --------
232
    DX: np.array (|X|×|X|)
233
        Distance matrix of X, cast to optimal type.
234
    """
235 3
    max_distance = np.max(DX)
236
    # Type is signed integer to allow subtractions.
237 3
    optimal_int_type = determine_optimal_int_type(max_distance)
238 3
    DX = DX.astype(optimal_int_type)
239

240 3
    return DX
241

242

243 3
def determine_optimal_int_type(value):
244
    """
245
    Determine smallest signed integer type sufficient to hold a value.
246

247
    Parameters
248
    -----------
249
    value: non-negative integer
250

251
    Returns
252
    --------
253
    optimal_int_type: np.dtype
254
        Optimal signed integer type to hold the value.
255
    """
256 3
    feasible_int_types = (int_type for int_type in [np.int8, np.int16, np.int32, np.int64]
257
                          if value <= np.iinfo(int_type).max)
258 3
    try:
259 3
        optimal_int_type = next(feasible_int_types)
260 0
    except StopIteration:
261 0
        raise ValueError("value {} too large to be stored as unsigned integer")
262

263 3
    return optimal_int_type
264

265

266 3
def estimate(DX, DY, mapping_sample_size_order=DEFAULT_MAPPING_SAMPLE_SIZE_ORDER):
267
    """
268
    For X, Y metric spaces induced by simple unweighted graphs, find
269
    lower and upper bounds of mGH(X, Y).
270

271
    Parameters
272
    ----------
273
    DX: np.array (|X|×|X|)
274
        (Integer) distance matrix of X.
275
    DY: np.array (|Y|×|Y|)
276
        (Integer) distance matrix of Y.
277
    mapping_sample_size_order: np.array (2)
278
        Parameter that regulates the number of mappings to sample when
279
        tightening upper bound of mGH(X, Y).
280

281
    Returns
282
    --------
283
    lb: float
284
        Lower bound of mGH(X, Y).
285
    ub: float
286
        Upper bound of mGH(X, Y).
287
    """
288
    # Ensure distance matrices are of integer type.
289 3
    if not np.issubdtype(DX.dtype, np.integer) or not np.issubdtype(DY.dtype, np.integer):
290 0
        raise ValueError("non-integer metrics are not yet supported")
291
    # Cast distance matrices to signed integer type to allow
292
    # subtractions.
293 3
    if np.issubdtype(DX.dtype, np.uint):
294 0
        DX = cast_distance_matrix_to_optimal_int_type(DX)
295 3
    if np.issubdtype(DY.dtype, np.uint):
296 0
        DY = cast_distance_matrix_to_optimal_int_type(DY)
297
    # Estimate mGH(X, Y).
298 3
    double_lb = find_lb(DX, DY)
299 3
    double_ub = find_ub(
300
        DX, DY, mapping_sample_size_order=mapping_sample_size_order, double_lb=double_lb)
301

302 3
    return 0.5 * double_lb, 0.5 * double_ub
303

304

305 3
def find_lb(DX, DY):
306
    """
307
    For X, Y metric spaces induced by simple unweighted graphs, find
308
    lower bound of mGH(X, Y).
309

310
    Parameters
311
    ----------
312
    DX: np.array (|X|×|X|)
313
        (Integer) distance matrix of X.
314
    DY: np.array (|Y|×|Y|)
315
        (Integer) distance matrix of Y.
316

317
    Returns
318
    --------
319
    double_lb: float
320
        Lower bound of 2*mGH(X, Y).
321
    """
322 3
    diam_X = np.max(DX)
323 3
    diam_Y = np.max(DY)
324 3
    max_diam = max(diam_X, diam_Y)
325
    # Obtain trivial lower bound of 2*mGH(X, Y) from
326
    # 1) mGH(X, Y) ≥ 0.5*|diam X - diam Y|;
327
    # 2) if X and Y are not isometric, mGH(X, Y) ≥ 0.5.
328 3
    trivial_double_lb = max(abs(diam_X - diam_Y), int(len(DX) != len(DY)))
329
    # Initialize lower bound of 2*mGH(X, Y).
330 3
    double_lb = trivial_double_lb
331
    # Try tightening the lower bound.
332 3
    d = max_diam
333 3
    while d > double_lb:
334
        # Try proving 2*mGH(X, Y) ≥ d using d-bounded curvatures of
335
        # X and Y of size 3×3 or larger. 2×2 curvatures are already
336
        # accounted for in trivial lower bound.
337

338 3
        if d <= diam_X:
339 3
            K = find_largest_size_bounded_curvature(DX, diam_X, d)
340 3
            if len(K) > 2 and confirm_lb_using_bounded_curvature(d, K, DY, max_diam):
341 0
                double_lb = d
342 3
        if d > double_lb and d <= diam_Y:
343 3
            L = find_largest_size_bounded_curvature(DY, diam_Y, d)
344 3
            if len(L) > 2 and confirm_lb_using_bounded_curvature(d, L, DX, max_diam):
345 0
                double_lb = d
346

347 3
        d -= 1
348

349 3
    return double_lb
350

351

352 3
def find_largest_size_bounded_curvature(DX, diam_X, d):
353
    """
354
    Find a largest-size d-bounded curvature of metric space X induced
355
    by simple unweighted graph.
356

357
    Parameters
358
    ----------
359
    DX: np.array (|X|×|X|)
360
        (Integer) distance matrix of X.
361
    diam_X: int
362
        Largest distance in X.
363

364
    Returns
365
    --------
366
    K: np.array (n×n)
367
        d-bounded curvature of X of largest size; n ≤ |X|.
368
    """
369
    # Initialize curvature K with the distance matrix of X.
370 3
    K = DX
371 3
    while np.any(K[np.triu_indices_from(K, 1)] < d):
372
        # Pick a row (and column) with highest number of off-diagonal
373
        # distances < d, then with smallest sum of off-diagonal
374
        # distances ≥ d.
375 3
        K_rows_sortkeys = -np.sum(K < d, axis=0) * (len(K) * diam_X) + \
376
                      np.sum(np.ma.masked_less(K, d), axis=0).data
377 3
        row_to_remove = np.argmin(K_rows_sortkeys)
378
        # Remove the row and column from K.
379 3
        K = np.delete(K, row_to_remove, axis=0)
380 3
        K = np.delete(K, row_to_remove, axis=1)
381

382 3
    return K
383

384

385 3
def confirm_lb_using_bounded_curvature(d, K, DY, max_diam):
386
    """
387
    For X, Y metric spaces induced by simple unweighted graph, try to
388
    confirm 2*mGH(X, Y) ≥ d using K, a d-bounded curvature of X.
389

390
    Parameters
391
    ----------
392
    d: int
393
        Lower bound candidate for 2*mGH(X, Y).
394
    K: np.array (n×n)
395
        d-bounded curvature of X; n ≥ 3.
396
    DY: np.array (|Y|×|Y|)
397
        Integer distance matrix of Y.
398
    max_diam: int
399
        Largest distance in X and Y.
400

401
    Returns
402
    --------
403
    lb_is_confirmed: bool
404
        Whether confirmed that 2*mGH(X, Y) ≥ d.
405
    """
406
    # If K exceeds DY in size, the Hausdorff distance between the n-th
407
    # curvature sets of X and Y is ≥ d, entailing 2*mGH(X, Y) ≥ d (from
408
    # Theorem A).
409 3
    lb_is_confirmed = len(K) > len(DY) or \
410
                      confirm_lb_using_bounded_curvature_row(d, K, DY, max_diam)
411

412 3
    return lb_is_confirmed
413

414

415 3
def confirm_lb_using_bounded_curvature_row(d, K, DY, max_diam):
416
    """
417
    For X, Y metric spaces induced by simple unweighted graph, and K,
418
    a d-bounded curvature of X, try to confirm 2*mGH(X, Y) ≥ d using
419
    some row of K.
420

421
    Parameters
422
    ----------
423
    d: int
424
        Lower bound candidate for 2*mGH(X, Y).
425
    K: np.array (n×n)
426
        d-bounded curvature of X; n ≥ 3.
427
    DY: np.array (|Y|×|Y|)
428
        Integer distance matrix of Y; n ≤ |Y|.
429
    max_diam: int
430
        Largest distance in X and Y.
431

432
    Returns
433
    --------
434
    lb_is_confirmed: bool
435
        Whether confirmed that 2*mGH(X, Y) ≥ d.
436
    """
437 3
    lb_is_confirmed = False
438
    # Represent row of K as distance distributions, and retain those
439
    # that are maximal by the entry-wise partial order.
440 3
    K_max_rows_distance_distributions = find_unique_max_distributions(
441
        represent_distance_matrix_rows_as_distributions(K, max_diam))
442
    # Represent rows of DY as distance distributions.
443 3
    DY_rows_distance_distributions = represent_distance_matrix_rows_as_distributions(DY, max_diam)
444
    # For each i ∈ 1,...,n, check if ||row_i(K) - row_i(L)||_∞ ≥ d
445
    # ∀L ∈ PSPS^n(DY), which entails that the Hausdorff distance
446
    # between the n-th curvature sets of X and Y is ≥ d, and therefore
447
    # 2*mGH(X, Y) ≥ d (from Theorem B).
448 3
    i = 0
449 3
    while not lb_is_confirmed and i < len(K_max_rows_distance_distributions):
450 3
        lb_is_confirmed = True
451
        # For fixed i, check if ||row_i(K) - row_i(L)||_∞ ≥ d
452
        # ∀L ∈ PSPS^n_{i←j}(DY)  ∀j ∈ 1,...,|Y|, which is equivalent to
453
        # ||row_i(K) - row_i(L)||_∞ ≥ d  ∀L ∈ PSPS^n(DY).
454 3
        j = 0
455 3
        while lb_is_confirmed and j < len(DY_rows_distance_distributions):
456
            # For fixed i and j, checking ||row_i(K) - row_i(L)||_∞ ≥ d
457
            # ∀L ∈ PSPS^n_{i←j}(DY) is equivalent to solving a linear
458
            # (bottleneck) assignment feasibility problem between the
459
            # entries of row_i(K) and row_j(DY).
460 3
            lb_is_confirmed = not check_assignment_feasibility(
461
                K_max_rows_distance_distributions[i], DY_rows_distance_distributions[j], d)
462 3
            j += 1
463

464 3
        i += 1
465

466 3
    return lb_is_confirmed
467

468 3
def represent_distance_matrix_rows_as_distributions(DX, max_d):
469
    """
470
    Given a metric space X induced by simple unweighted graph,
471
    represent each row of its distance matrix as the frequency
472
    distribution of its entries. Entry 0 in each row is omitted.
473

474
    Parameters
475
    ----------
476
    DX: np.array (n×n)
477
        (Integer) distance matrix of X.
478
    max_d: int
479
        Upper bound of the entries in DX.
480

481
    Returns
482
    --------
483
    DX_rows_distributons: np.array (n×max_d)
484
        Each row holds frequencies of each distance from 1 to
485
        max_d in the corresponding row of DX. Namely, the (i, j)-th
486
        entry holds the frequency of distance (max_d - j) in row_i(DX).
487
    """
488
    # Add imaginary part to distinguish identical distances from
489
    # different rows of D.
490 3
    unique_distances, distance_frequencies = np.unique(
491
        DX + 1j * np.arange(len(DX))[:, None], return_counts=True)
492
    # Type is signed integer to allow subtractions.
493 3
    optimal_int_type = determine_optimal_int_type(len(DX))
494 3
    DX_rows_distributons = np.zeros((len(DX), max_d + 1), dtype=optimal_int_type)
495
    # Construct index pairs for distance frequencies, so that the
496
    # frequencies of larger distances appear on the left.
497 3
    distance_frequencies_index_pairs = \
498
        (np.imag(unique_distances).astype(optimal_int_type),
499
         max_d - np.real(unique_distances).astype(max_d.dtype))
500
    # Fill frequency distributions of the rows of DX.
501 3
    DX_rows_distributons[distance_frequencies_index_pairs] = distance_frequencies
502
    # Remove (unit) frequency of distance 0 from each row.
503 3
    DX_rows_distributons = DX_rows_distributons[:, :-1]
504

505 3
    return DX_rows_distributons
506

507

508 3
def find_unique_max_distributions(distributions):
509
    """
510
    Given frequency distributions of entries in M positive integer
511
    vectors of size p, find unique maximal vectors under the following
512
    (entry- wise) partial order: for v, u vectors, v < u if and only if
513
    there exists a bijection f: {1,...,p} → {1,...,p} such that
514
    v_k < u_{f(k)}  ∀k ∈ {1,...,p}.
515

516
    Parameters
517
    ----------
518
    distributions: np.array (M×max_d)
519
        Frequency distributions of entries in the m vectors; the
520
        entries are bounded from above by max_d.
521

522
    Returns
523
    --------
524
    unique_max_distributions: np.array (m×max_d)
525
        Unique frequency distributions of the maximal vectors; m ≤ M.
526
    """
527 3
    pairwise_distribution_differences = \
528
        np.cumsum(distributions - distributions[:, None, :], axis=2)
529 3
    pairwise_distribution_less_thans = np.logical_and(
530
        np.all(pairwise_distribution_differences >= 0, axis=2),
531
        np.any(pairwise_distribution_differences > 0, axis=2))
532 3
    distributions_are_max = ~np.any(pairwise_distribution_less_thans, axis=1)
533 3
    try:
534 3
        unique_max_distributions = np.unique(distributions[distributions_are_max], axis=0)
535 0
    except AttributeError:
536
        # `np.unique` is not implemented in NumPy 1.12 (Python 3.4).
537 3
        unique_max_distributions = np.vstack(
538
            {tuple(distribution) for distribution in distributions[distributions_are_max]})
539

540 3
    return unique_max_distributions
541

542

543 3
def check_assignment_feasibility(v_distribution, u_distribution, d):
544
    """
545
    For positie integer vectors v of size p and u of size q ≥ p, check
546
    if there exists injective f: {1,...,p} → {1,...,q}, such that
547
    |v_k - u_{f(k)}| < d  ∀k ∈ {1,...,p}
548

549
    Parameters
550
    ----------
551
    v_distribution: np.array (max_d)
552
        Frequency distribution of entries in v; the entries are bounded
553
        from above by max_d.
554
    u_distribution: np.array (max_d)
555
        Frequency distribution of entries in u; the entries are bounded
556
        from above by max_d.
557
    d: int
558
        d > 0.
559

560
    Returns
561
    --------
562
    is_assignment_feasible: bool
563
        Whether such injective f: {1,...,p} → {1,...,q} exists.
564
    """
565 3
    def next_i_and_j(min_i, min_j):
566
        # Find reversed v distribution index of smallest v entries yet
567
        # to be assigned. Then find index in reversed u distribution of
568
        # smallest u entries to which the b entries can be assigned to.
569 3
        try:
570 3
            i = next(i for i in range(min_i, len(reversed_v_distribution))
571
                     if reversed_v_distribution[i] > 0)
572 3
        except StopIteration:
573
            # All v entries are assigned.
574 3
            i = None
575 3
            j = min_j
576
        else:
577 3
            j = next_j(i, max(i - (d - 1), min_j))
578

579 3
        return i, j
580

581 3
    def next_j(i, min_j):
582
        # Find reversed u distribution index of smallest u entries to
583
        # which v entries, corresponding to a given reversed v
584
        # distribution index, can be assigned to.
585 3
        try:
586 3
            j = next(j for j in range(min_j, min(i + (d - 1),
587
                                                 len(reversed_u_distribution) - 1) + 1)
588
                     if reversed_u_distribution[j] > 0)
589 0
        except StopIteration:
590
            # No u entries left to assign the particular v entries to.
591 0
            j = None
592

593 3
        return j
594

595
    # Copy to allow modifications and stay pure; reverse to be
596
    # compatible with distributions of different size.
597 3
    reversed_v_distribution = list(v_distribution[::-1])
598 3
    reversed_u_distribution = list(u_distribution[::-1])
599
    # Injectively assign v entries to u entries if their difference
600
    # is < d, going from smallest to largest entries in both v and u,
601
    # until all v entries are assigned or such assignment proves
602
    # infeasible.
603 3
    i, j = next_i_and_j(0, 0)
604 3
    while i is not None and j is not None:
605 3
        if reversed_v_distribution[i] <= reversed_u_distribution[j]:
606 3
            reversed_u_distribution[j] -= reversed_v_distribution[i]
607 3
            reversed_v_distribution[i] = 0
608 3
            i, j = next_i_and_j(i, j)
609
        else:
610 0
            reversed_v_distribution[i] -= reversed_u_distribution[j]
611 0
            reversed_u_distribution[j] = 0
612 0
            j = next_j(i, j)
613

614
    # The assignment is feasible if and only if for some injective f,
615
    # |v_k - u_{f(k)}| < d  ∀k ∈ {1,...,p}.
616 3
    is_assignment_feasible = j is not None
617

618 3
    return is_assignment_feasible
619

620 3
def find_ub(DX, DY, mapping_sample_size_order=DEFAULT_MAPPING_SAMPLE_SIZE_ORDER, double_lb=0):
621
    """
622
    For X, Y metric spaces, find an upper bound of mGH(X, Y).
623

624
    Parameters
625
    ----------
626
    DX: np.array (|X|×|X|)
627
        Distance matrix of X.
628
    DY: np.array (|Y|×|Y|)
629
        Distance matrix of Y.
630
    mapping_sample_size_order: np.array (2)
631
        Parameter that regulates the number of mappings to sample when
632
        tightening the upper bound.
633
    double_lb: float
634
        Lower bound of 2*mGH(X, Y).
635

636
    Returns
637
    --------
638
    double_ub: float
639
        Upper bound of 2*mGH(X, Y).
640
    """
641
    # Find upper bound of smallest distortion of a mapping in X → Y.
642 3
    ub_of_X_to_Y_min_distortion = find_ub_of_min_distortion(
643
        DX, DY, mapping_sample_size_order=mapping_sample_size_order, goal_distortion=double_lb)
644
    # Find upper bound of smallest distortion of a mapping in Y → X.
645 3
    ub_of_Y_to_X_min_distortion = find_ub_of_min_distortion(
646
        DY, DX, mapping_sample_size_order=mapping_sample_size_order,
647
        goal_distortion=ub_of_X_to_Y_min_distortion)
648

649 3
    return max(ub_of_X_to_Y_min_distortion, ub_of_Y_to_X_min_distortion)
650

651

652 3
def find_ub_of_min_distortion(DX, DY,
653
                              mapping_sample_size_order=DEFAULT_MAPPING_SAMPLE_SIZE_ORDER,
654
                              goal_distortion=0):
655
    """
656
    For X, Y metric spaces, find an upper bound of smallest distortion
657
    of a mapping in X → Y by heuristically constructing some mappings
658
    and choosing the smallest distortion in the sample.
659

660
    Parameters
661
    ----------
662
    DX: np.array (|X|×|X|)
663
        Distance matrix of X.
664
    DY: np.array (|Y|×|Y|)
665
        Distance matrix of Y.
666
    mapping_sample_size_order: np.array (2)
667
        Exponents of |X| and log (|X|+1) in their product that defines
668
        how many mappings from X → Y to sample.
669
    goal_distortion: float
670
        No need to look for distortion smaller than this.
671

672
    Returns
673
    --------
674
    ub_of_min_distortion: float
675
        Upper bound of smallest distortion of a mapping in X → Y.
676
    """
677
    # Compute the numper of mappings to sample.
678 3
    n_mappings_to_sample = int(np.ceil(np.prod(
679
        np.array([len(DX), np.log(len(DX) + 1)]) ** mapping_sample_size_order)))
680
    # Construct each mapping in X → Y in |X| steps by choosing the image
681
    # of π(i)-th point in X at i-th step, where π is randomly sampled
682
    # |X|-permutation. Image of each point is chosen to minimize the
683
    # intermediate distortion at each step.
684 3
    permutations_generator = (np.random.permutation(len(DX)) for _ in range(n_mappings_to_sample))
685 3
    ub_of_min_distortion = np.inf
686 3
    goal_distortion_is_matched = False
687 3
    all_sampled_permutations_are_tried = False
688 3
    pi = next(permutations_generator)
689 3
    while not goal_distortion_is_matched and not all_sampled_permutations_are_tried:
690 3
        mapped_xs_images, distortion = construct_mapping(DX, DY, pi)
691 3
        ub_of_min_distortion = min(distortion, ub_of_min_distortion)
692 3
        if ub_of_min_distortion <= goal_distortion:
693 3
            goal_distortion_is_matched = True
694

695 3
        try:
696 3
            pi = next(permutations_generator)
697 3
        except StopIteration:
698 3
            all_sampled_permutations_are_tried = True
699

700 3
    return ub_of_min_distortion
701

702

703 3
def construct_mapping(DX, DY, pi):
704
    """
705
    # For X, Y metric spaces and |X|-permutation π, construct a mapping
706
    from X → Y in |X| steps by choosing the image of π(i)-th point in X
707
    at i-th step. The image of each point is chosen to minimize the
708
    intermediate distortion at the corresponding step.
709

710
    DX: np.array (|X|×|X|)
711
        Distance matrix of X.
712
    DY: np.array (|Y|×|Y|)
713
        Distance matrix of Y.
714
    pi: np.array (|X|)
715
        |X|-permutation specifying the order in which the points in X
716
        are mapped.
717
    Returns
718
    --------
719
    mapped_xs_images: list
720
        image of the constructed mapping.
721
    distortion: float
722
        distortion of the constructed mapping.
723
    """
724
    # Map π(1)-th point in X to a random point in Y, due to the
725
    # lack of better criterion.
726 3
    mapped_xs = [pi[0]]
727 3
    mapped_xs_images = [np.random.choice(len(DY))]
728 3
    distortion = 0
729
    # Map π(i)-th point in X for each i = 2,...,|X|.
730 3
    for x in pi[1:]:
731
        # Choose point in Y that minimizes the distortion after
732
        # mapping π(i)-th point in X to it.
733 3
        bottlenecks_from_mapping_x = np.max(
734
            np.abs(DX[x, mapped_xs] - DY[:, mapped_xs_images]), axis=1)
735 3
        y = np.argmin(bottlenecks_from_mapping_x)
736
        # Map π(i)-th point in X to the chosen point in Y.
737 3
        mapped_xs.append(x)
738 3
        mapped_xs_images.append(y)
739 3
        distortion = max(bottlenecks_from_mapping_x[y], distortion)
740
        
741 3
    return mapped_xs_images, distortion

Read our documentation on viewing source code .

Loading