1. Introduction
Many well-known optimization problems on graphs fall into the category of graph layout problems. A layout of a graph on n vertices consists of a bijection between the vertices of the graph and the set
of natural numbers, which can be interpreted as arranging the vertices of the graph in some order on a line. A graph layout problem then consists of optimizing some objective function over the set of possible layouts of a graph. There is an analogous notion of layout and layout problems for directed acyclic graphs, i.e. directed graphs with no directed cycles
where
is a directed edge from the vertex
to
with the indices taken modulo n. A layout of a directed acyclic graph is simply a topological sort of it, so that the layout respects edge directions. The particular layout problems we consider in this paper are Minimum Cut Linear Arrangement (also known as Cutwidth), Vertex Separation, Edge Bisection, and Vertex Bisection, along with the analogous problems on directed acyclic graphs. These problems are well known to find applications in VLSI design, job scheduling, parallel computing, graph drawing, etc. We direct the interested reader to a survey [1] on the topic.
Graph layout problems are often computationally difficult to solve exactly. The decision versions of both the undirected and directed vertex separation problems are known to be NP-complete [2]. The same is true for the undirected [3] and directed [4] minimum cut linear arrangement problems, and the vertex bisection [5] and edge bisection (shown in [6] as a special case of minimum cut into bounded sets) problems on undirected graphs. We do not know of a reference which proves the NP-hardness of the vertex and edge bisection problems on directed graphs, though we have no reason to believe that they are not. Due to the practical applications of the problems considered, many researchers have sought to find approximation algorithms for these problems. It is common to analyze the performance of algorithms on random instances as a proxy for their “real” performance, so that one might seek to analyze the approximability of layout problems on random graphs. Diaz et al. [7] showed that for any of the undirected layout problems considered above, if
, then for large enough random graphs with appropriate sparsity conditions, any solution of the problem has cost within a factor C of the optimal with high probability. Hence, these problems can be trivially approximated to within any factor of
for large enough random graphs with high probability.
In this paper, in addition to showing that the constant of approximation can be improved to any
with slightly weaker sparsity and convergence results, we show that the same result holds for the directed versions of the problems which were not considered in [7]. Moreover, we only use the Hoeffding inequality for tail bounds of sums of independent and identically distributed (i.i.d) random variables and some well-known asymptotic estimates to obtain these results, thus avoiding the more technical “mixing graph” framework used in [7]. In summary, for large enough random graphs with appropriate sparsity conditions, any solution of these layout problems will have cost arbitrarily close to optimal with high probability.
2. Definitions
We first recall some terminology in [7]. Given an undirected graph
with
, a linear arrangement (or a layout) of G is an bijective function
. The problems we consider all take the form of optimizing some objective function over the set of linear arrangements of a graph. For a linear arrangement
of G and each
, the two sets
and the two objective functions
are defined. We may interpret
as the number of edges lying over the i-th “gap” in the arrangement, i.e. edges whose left vertex is in at most the i-th position in the arrangement and whose right vertex is in at least the
-th position. Additionally, we may interpret
as the number of vertex to the left of the i-th “gap” which are connected by an edge to some vertex to the right of the gap. (In this paper, we sometimes casually refer to
as the set on the left and
as the set on the right.)
Let
denote the set of linear arrangements of G. The problems we consider for undirected graphs are given in Table 1. These problems are all known to be NP-hard.
We also consider analogous problems on directed graphs. Given a directed acyclic graph
with
, a linear arrangement (or layout) of G is an bijective function
such that if
is a directed edge from
to
then
. Note that this is simply a topological sort of G, which exists as G is directed acyclic. Again, we let
denote the set of linear arrangements of G. The definitions of
,
,
, and
remain unchanged for directed acyclic graphs. The following problems are directed versions of the problems listed in Table 1.
• Directed cutwidth (DCUTWIDTH): Given a directed acyclic graph
, compute
where
.
Table 1. Undirected layout problems.
• Directed vertex separation (DVERTSEP): Given a directed acyclic graph
, compute
where
.
• Directed edge bisection (DEDGEBIS): Given a directed acyclic graph
, compute
where
.
• Directed vertex bisection (DVERTBIS): Given a directed acyclic graph
, compute
where
.
For each arrangement problem considered above, we also define the maximum-cost solution of the problem on a graph. For example, for CUTWIDTH, in contrast to
, we define
, and similarly for every other problem considered above. Moreover, we define the gap of a problem on a given graph G to be the ratio of the maximum-cost solution to the minimum-cost solution. For example, for CUTWIDTH, the gap is
and this quantity is defined in the same way for every other arrangement problem considered above.
Any discussion on random graphs requires a probability distribution on graphs. In this paper, we adopt a variant of the Erdös-Renyi probability distribution [8] for undirected graphs defined as follows:
For a positive integer n and probability
, the Erdös-Renyi distribution
on the set of n-vertex graphs assigns an n-vertex graph
probability
, where
. That is, we sample n-vertex graphs by including each possible edge with probability p.
For a probability distribution on directed acyclic graphs, we use a variant of the Erdös-Renyi probability distribution [9] which produces directed acyclic graphs, defined as follows:
For a positive integer n and probability
, the distribution
on the set of n-vertex directed acyclic graphs first samples a random graph from
on the vertex set
and orients each edge
from i to j if
.
As the edges in the sampled directed graph always point from a lower numbered vertex to a higher numbered vertex, it is clear that the sampled graph is acyclic.
3. Preliminary Lemmas
We first list some technical lemmas necessary for carrying out the probabilistic analysis in our main theorems.
Lemma 1 (Hoeffding’s inequality). Suppose that
are independent identically distributed Bernoulli random variables with mean p, and let
, where
is a sample of
. Then for
,
Proof. This is a special case of Theorem 1 in Hoeffding’s original paper [10] for Bernoulli random variables.
Lemma 2. If
, then
Proof. Recall the well-known inequalities
which can be obtained via Stirling’s approximation. It follows that
If
, then
. By the above chain of inequalities, we have that
as desired.
Lemma 3. Suppose that
and
where
. If
, then
Proof. Taking logarithms, we find that
for appropriate constants
. But then by L’Hopital’s rule,
Since
, we have that
as
. It follows that
Hence,
as desired.
4. Main Results
4.1. Undirected Graph Problems
For the theorems that follow, let
be a sequence of random graphs such that for each
,
is given by
with edge probability
. The following theorems show that for each of the undirected graph arrangement problems GAPCW, GAPEP, GAPVS, and GAPVB that the ratio of the maximum value to the minimum value of the corresponding objective function over all arrangements of
is asymptotically close to 1 with high probability, subject to appropriate sparsity conditions.
Theorem 1. Let
satisfy
. Then for all
there exists an N such that for all
,
Theorem 2. Let
satisfy
. Then for all
there exists an N such that for all
,
Theorem 3. Let
satisfy
. Then for all
there exists an N such that for all
,
Theorem 4. Let
satisfy
. Then for all
there exists an N such that for all
,
To prove Theorem 1, we first establish the following lemmas:
Lemma 4. Let
satisfy
. Then for all
there exists an N such that for all
,
Lemma 5. Suppose that
. Then for all
there exists an N such that for all
,
We will make use of the following definition in our proofs for the above lemmas: For a graph
and
,
is defined to be the number of edges joining a vertex in A and a vertex in
. That is,
Proof of Lemma 4. For a linear arrangement
, define
Clearly,
. It follows that
. Suppose that
. Observe that
where
. Hence, for all
,
To prove the lemma, it suffices to show that
for n sufficiently large.
Let S be an arbitrary subset of V with
. As
is an Erdös-Renyi random graph with vertex set V and edge probability
,
is a binomial random variable with mean
.
Applying the first inequality in Lemma 1 with
where
, we obtain that
As
with
, we can choose
to get the desired convergence. Indeed, setting
where l satisfies
, we have
and
for some
. Hence,
Note that
. Hence, by the union bound,
for n sufficiently large. Thus,
for n sufficiently large.
Since
, for a sufficiently large n,
Hence, for n sufficiently large,
as desired.
Proof of 5. Let
. Observe that for all
,
and that
for all
such that
and
. To see the second inequality, note that
is the sum of
i.i.d. Bernoulli random variables with probability of success
and
is maximized at
. Hence, to prove the lemma, it suffices to show that
for n sufficiently large.
Let
denote the set
. Let
. Then, by the second inequality in Lemma 1 with
where
, we obtain that
where
as in the proof of Lemma 4.
As
with
, we again set
where l satisfies
, so that
and
for some
. Then,
Hence,
for a sufficiently large n.
As
, we have that
for a sufficiently large n. Thus, for n sufficiently large,
as desired.
With these two lemmas, the main theorem for the cut width gap can be readily established.
Proof of Theorem 1. As in the statement of the theorem, let
satisfy
for some
, and let
be given. Since
there exists
satisfying
. By Lemma 4, there exists
such that for
,
By Lemma 5, there exists
such that for
,
Hence, if
, then for
we have that
As
, it follows that
Thus,
as desired.
Since edge bisection is essentially a restricted version of the cutwidth problem, it is straightforward to carry over the proofs above to prove Theorem 2.
Proof of Theorem 2. Note that for a graph G,
is simply the minimization of
over subsets
with
, and
the corresponding maximization. Hence, the proof of Lemma 4 carries through to give that
for any given
when n is sufficiently large.
Similarly, the proof of Lemma 5 carries through to yield that
for any given
when n is sufficiently large. Combining these two results in the same manner as in the proof of Theorem 1 gives the desired result.
The following lemma essentially proves Theorem 3.
Lemma 6. Let
satisfy
. Then for all
there exists an N such that for all
,
We will make use of the following definition in our proof for the above lemma. For a graph
and
,
is defined to be the number of vertices in A which are connected to a vertex in
. That is,
Proof of Lemma 6. So as to obtain the desired convergence, for each positive integer n, set
where
. Consider a linear arrangement
. Note that for any k,
. In particular, if we define
then
. It follows then that
. Observe that
where
. Hence, for all
,
To prove the lemma, it suffices to show that for any
,
for n sufficiently large.
Let S be an arbitrary element of
. As there are
vertices in
, the probability of a given vertex
not being connected to any vertex in
is
. Hence,
is a binomial random variable on
trials with event probability
. By Lemma 1, we have that
As
,
The first and third lines of the above computation follow respectively from the union bound and the preceding inequality. The second line follows from the fact that if
are any two elements of
, then the distributions of the random variables
with
sampled from
are isomorphic via any permutation of the vertex set
which carries S to
. In particular
so that the second line follows from the fact that
.
By Lemma 2, we find that
since
, so that
Substituting in our definitions
, we find that since
,
This expression tends to negative infinity as
since
by the condition that
, implying that
tends to 0 from above. Thus, for n sufficiently large, we have that
and hence
Moreover, if we substitute in our definitions
,
,
, we find that
Since by definition
, where
, we have that
, and hence by Lemma 3, we have that
as
. Thus,
so that for any
,
for n sufficiently large. Hence, for any
, we find that for n sufficiently large,
as desired.
Proof of Theorem 3. Observe that
. Let
be given. Since
from above as
, set
so that for large n,
. By Lemma 6, there exists a natural number N such that for
,
Thus, with probability greater than
,
so that
as desired.
Even though our proof of Theorem 4 does not follow quite as readily from the proof of Theorem 3 as for the edge bisection/cut width case above, it does not require any new techniques.
Proof of Theorem 4. Let
be given. Since
from above as
, choose
so that
. As
trivially,
we focus on the lower bound for
. Set
such that
, where
. Since
, we have that
for all n sufficiently large. We will show that
which will prove the theorem.
For all
, we denote the set of neighbors of v by
; that is,
. If
is a linear arrangement such that
, then there must exist a subset
with
such that
where
is the complement of
in V. Indeed, if
then by definition there exist
vertices in the left half of the arrangement
which are not connected to any of the
vertices in the right half.
We estimate the probability that such a set S can exist. Note that it suffices to bound the probability that such an S with
can exist since for any such S with
, we have that there exists a subset
with
satisfying
. Let
of size
be fixed. Each vertex
lies in
with probability
, so that for each
,
Let
be the random variable over the set of random graphs on n vertices with edge probability
whose value is the cardinality of
. (Recall that S has been fixed.) Applying the above reasoning to each vertex
, we
see that
is a Bernoulli random variable on
trials with event probability
. Since
by construction, we have that
as
by Lemma 3. Letting
be as before, set
so that
, and
. Using Hoeffding’s inequality in Lemma 1 with
, we obtain that
By the union bound, the probability that there exists any such S among the subsets of V of size
is at most
Since
, by Lemma 2, we have the estimate
so that
. Thus, the union bound probability that there exists any such S is at most
as
, since the
term dominates the
term.
We conclude that
for all
with n sufficiently large. As
, since we chose
to satisfy
, we conclude that
for large enough n, as desired.
4.2. Directed Graph Problems
We now show that the analogous versions of the above problems for undirected graphs hold for directed graphs. Fortunately, no new techniques are required and the desired results for directed graphs follow immediately from our work for undirected graphs.
For the theorems that follow, let
be a sequence of random directed acyclic graphs such that for each
,
is independently sampled from a
Erdös-Renyi distribution with edge probability
. Similar to before, the following theorems show that for each of the directed arrangement problems GAPDCW, GAPDEB, GAPDVS, and GAPDVB that the ratio of the maximum value to the minimum value of the corresponding objective function over all arrangements of
is asymptotically close to 1 with high probability, subject to appropriate sparsity conditions.
Theorem 5. Let
satisfy
. Then for all
there exists an N such that for all
,
Theorem 6. Let
satisfy
. Then for all
there exists an N such that for all
,
Theorem 7. Let
satisfy
. Then for all
there exists an N such that for all
,
Theorem 8. Let
satisfy
. Then for all
there exists an N such that for all
,
We illustrate the proof for Theorem 5; the proofs of Theorems 6, 7, 8 will follow verbatim.
Proof of 5. For
a directed acyclic graph sampled from
, let
denote the quantities MAXCW, MINCW, GAPCW for the undirected graph corresponding to
; that is,
with every directed edge replaced with an undirected edge. Recall that
is the maximum of
over all permutations
of the vertex set V of
, whereas
is the maximum of
over all topological sorts
of
. Since the set of topological sorts is a subset of the set of permutations, we conclude that
Similarly, we deduce that
so that
Note that this holds for all directed acyclic graphs
drawn from
.
In order to derive a gap convergence statement for GAPDCW from the convergence result for GAPCW, we need to relate the distributions
and
. Let
be the map from the underlying set of
to the underlying set of
which takes an undirected graph
on the vertex set
to the corresponding directed graph with the oriented edge
if
and
is an edge of
. By the definition of
we have that
is a bijection between the underlying sets of
and
, with inverse just given by replacing each directed edge with the corresponding undirected edge. Moreover, by the definition of
we have that
preserves probabilities between
and
, so that
is an isomorphism between the probability distributions
and
.
Thus, let
be as in the statement of Theorem 5. By Theorem 1, there exists an N such that for all
,
where
is sampled from
. By the isomorphism between
and
, we have that the same statement holds if
is instead sampled from
and
is defined as previously by taking the underlying undirected graph. Since
for all
sampled from
, we conclude that
for all
, where
is sampled from
. This proves Theorem 5.
The same observation that the set of topological sorts of a directed graph is a subset of the set of permutations of the vertices implies that the quantities GAPDEB, GAPDVS, GAPDVB are bounded above by the corresponding undirected quantities as in the above proof. Thus, Theorems 6, 7, 8 follow precisely from the corresponding undirected versions, Theorems 2, 3, 4, as desired.
5. Concluding Remarks
In this paper, we have shown that many graph layout problems of interest can be approximated arbitrarily close to the optimal with high probability for large random graphs with appropriate sparsity conditions. We note that there is still room for improvement with our results. The previous factor of 2 approximations in [7] held for edge probabilities
, whereas our results for the layout problems on edges (that is, minimum cut linear arrangement, edge bisection, and directed versions) only hold for
for
. Thus, we pose the following:
Question: Can the factor of 1 approximation results for MINCW, MINEB, be proven for random graphs with
, or more generally
for
? If not, can one determine the barrier between
and
where the factor of 1 approximation no longer holds and must be replaced by a factor of 2?
Both outcomes to the above question would be interesting, but we would find it more surprising if there were such a “sparsity barrier” between factor of 1 and factor of 2 approximations. Moreover, the results in [7] do not experience the same tradeoff between sparsity and speed of convergence that our results do, a seeming consequence of the strength of their “mixing graph” framework. Here by “speed of convergence”, we refer to how quickly
shrinks in our statements of the form “
with probability
”, where
is a stand-in for GAPCW, GAPVS, ..., etc. In [9], they are able to take
independent of
, whereas we are only typically able to take
where
and
. Thus, we ask:
Question: Can the factor of 1 approximation results for MINCW, MINVS, MINEB, MINVB be proven with
with
as above, or at least for a
which does not depend on the asymptotics for
?
Our asymptotics for
as above do technically depend on
, but regardless of
our asymptotics for
are still competitive with the
bound in [7], in contrast to the situation for
. For a possible solution of the above questions, we note that some of the key results about mixing graphs used in [7] call upon the Hoeffding inequality, which was our primary probabilistic tool in this paper. Hence, it would be interesting to see if the techniques of this paper and those of mixing graphs could be unified somehow to give our improved constant of approximation but retain the better sparsity and convergence conditions of [7].
Acknowledgements
The authors would like to thank Emmanuel Ruiz and Ashkan Moatamed for conversations on research involving graph layout problems. Additionally, the research in this paper was made possible by the support of the Fields Institute through its 2017 Fields Undergraduate Summer Research Program.