Approximating the unweighted k-set cover problem: greedy meets local search

Asaf Levin

August 21, 2008

Abstract
In the unweighted set-cover problem we are given a set of elements $E = \{e_1, e_2, \ldots, e_n\}$ and a collection $\mathcal{F}$ of subsets of $E$. The problem is to compute a sub-collection $SOL \subseteq \mathcal{F}$ such that $\bigcup_{S_j \in SOL} S_j = E$ and its size $|SOL|$ is minimized. When $|S| \leq k$ for all $S \in \mathcal{F}$ we obtain the unweighted $k$-set cover problem. It is well known that the greedy algorithm is an $H_k$-approximation algorithm for the unweighted $k$-set cover, where $H_k = \sum_{i=1}^{k} \frac{1}{i}$ is the $k$-th harmonic number, and that this bound on the approximation ratio of the greedy algorithm is tight for all constant values of $k$. Since the set cover problem is a fundamental problem, there is an ongoing research effort to improve this approximation ratio using modifications of the greedy algorithm. The previous best improvement of the greedy algorithm is an $\left(H_k - \frac{1}{2}\right)$-approximation algorithm. In this paper we present a new $\left(H_k - \frac{196}{390}\right)$-approximation algorithm for $k \geq 4$ that improves the previous best approximation ratio for all values of $k \geq 4$. Our algorithm is based on combining local search during various stages of the greedy algorithm.
1 Introduction
In the weighted set-cover problem we are given a set of elements $E = \{e_1, e_2, \ldots, e_n\}$ and a collection $\mathcal{F}$ of subsets of $E$, where $\bigcup_{S \in \mathcal{F}} S = E$ and each $S \in \mathcal{F}$ has a positive cost $c_S$. The goal is to compute a sub-collection $SOL \subseteq \mathcal{F}$ such that $\bigcup_{S \in SOL} S = E$ and its cost $\sum_{S \in SOL} c_S$ is minimized. Such a sub-collection of subsets is called a cover. When we consider instances of the weighted set-cover such that each $S_j$ has at most $k$ elements ($|S| \leq k$ for all $S \in \mathcal{F}$), we obtain the weighted $k$-set cover problem. The unweighted set cover problem and the unweighted $k$-set cover problem are the special cases of the weighted set cover and of the weighted $k$-set cover, respectively, where $c_S = 1$ for all $S \in \mathcal{F}$.
It is well known (see [3]) that the greedy algorithm is an $H_k$-approximation algorithm for the weighted $k$-set cover, where $H_k = \sum_{i=1}^{k} \frac{1}{i}$ is the $k$-th harmonic number, and that this bound is tight even for the unweighted $k$-set cover problem (see [13, 17]). For unbounded values of $k$, Slavík [21] showed that the approximation ratio of the greedy algorithm for the unweighted set cover problem is $\ln n - \ln\ln n + \Theta(1)$. Feige [6] proved that unless $NP \subseteq DTIME(n^{\mathrm{polylog}\, n})$ the unweighted set cover problem cannot be approximated within a factor $(1 - \varepsilon)\ln n$, for any $\varepsilon > 0$. Raz and Safra [20] proved that if $P \neq NP$ then for some constant $c$, the unweighted set cover problem cannot be approximated within a factor $c \log n$. This result shows that the greedy algorithm is an asymptotically best possible approximation algorithm for the weighted and unweighted set cover problems (unless $NP \subseteq DTIME(n^{\mathrm{polylog}\, n})$). The unweighted $k$-set cover problem is known to be NP-complete [14] and MAX SNP-hard for all $k \geq 3$ [4, 15, 18]. Another algorithm for the weighted set cover problem, due to Hochbaum [11], has an approximation ratio that depends on the maximum number of subsets that contain any given element (the local-ratio algorithm of Bar-Yehuda and Even [2] has the same performance guarantee). See Paschos [19] for a survey of these results.

(Author's affiliation: Department of Statistics, The Hebrew University, Jerusalem, Israel. Email: levinas@mscc.huji.ac.il)
In spite of the above bad news, Goldschmidt, Hochbaum and Yu [8] modified the greedy algorithm for the unweighted $k$-set cover and showed that the resulting algorithm has a performance guarantee of $H_k - \frac{1}{6}$. Halldórsson [9] presented an algorithm based on local search that has an approximation ratio of $H_k - \frac{1}{3}$ for the unweighted $k$-set cover, and a $(1.4 + \varepsilon)$-approximation algorithm for the unweighted 3-set cover. Duh and Fürer [5] further improved this result and presented an $\left(H_k - \frac{1}{2}\right)$-approximation algorithm for the unweighted $k$-set cover. We will base our algorithm on the algorithm of Duh and Fürer [5], and therefore we will review their algorithm and results in Section 2.2. All of these improvements [8, 9, 5] are based on running the greedy algorithm until each new subset covers at most $t$ new elements (where $t = 2$ in [8] and larger values of $t$ in [9, 5]) and then switching to another algorithm.
Regarding approximation algorithms for the weighted $k$-set cover problem within a factor better than $H_k$, a first improvement step was given by Fujito and Okumura [7], who presented an $\left(H_k - \frac{1}{12}\right)$-approximation algorithm for the $k$-set cover problem where the cost of each subset is either 1 or 2. More recently, Hassin and Levin [10] provided an $\left(H_k - \frac{k-1}{8k}\right)$-approximation algorithm for the general weighted $k$-set cover problem.
The maximum set packing problem is the following related problem: we are given a set of elements $E = \{e_1, e_2, \ldots, e_n\}$ and a collection $\mathcal{F}$ of subsets of $E$, where $\bigcup_{S \in \mathcal{F}} S = E$, and the goal is to compute a maximum size set packing, i.e., a sub-collection $\mathcal{F}' \subseteq \mathcal{F}$ of disjoint subsets. The relation between the maximum set packing problem and the unweighted set cover problem is that the fractional version of the maximum set packing problem is the dual linear program of the fractional version of the unweighted set cover problem. Hurkens and Schrijver [12] proved that a local-search algorithm for the maximum set packing problem, where each subset in $\mathcal{F}$ has at most $k$ elements, is a $\left(\frac{2}{k} - \varepsilon\right)$-approximation algorithm. Therefore, this local-search algorithm has a better performance guarantee than the greedy selection rule that returns any maximal sub-collection; the greedy selection rule has an approximation ratio of $\frac{1}{k}$.
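As a small illustration of this gap, the following sketch (ours, not code from [12]) implements the greedy rule and only the simplest local-search improvement step, which replaces one chosen set by two disjoint candidates; the full algorithm of Hurkens and Schrijver allows larger exchanges.

```python
from itertools import combinations

def greedy_packing(sets):
    """Greedy rule: scan the sets once, keeping any set disjoint
    from everything chosen so far (a maximal sub-collection)."""
    chosen, covered = [], set()
    for s in sets:
        if covered.isdisjoint(s):
            chosen.append(s)
            covered |= s
    return chosen

def local_search_packing(sets):
    """Simplest local-search step: while one chosen set can be
    replaced by two disjoint candidate sets, swap them in."""
    chosen = greedy_packing(sets)
    improved = True
    while improved:
        improved = False
        for out in chosen:
            rest = [s for s in chosen if s is not out]
            covered = set().union(*rest) if rest else set()
            # candidate sets that are disjoint from the remaining packing
            fits = [s for s in sets if s not in rest and covered.isdisjoint(s)]
            for a, b in combinations(fits, 2):
                if a.isdisjoint(b):
                    chosen = rest + [a, b]
                    improved = True
                    break
            if improved:
                break
    return chosen
```

For instance, on the sets $\{2,3\}, \{1,2\}, \{3,4\}$ (scanned in that order) the greedy rule picks the middle set $\{2,3\}$ and stops with a packing of size one, while a single swap reaches the optimal packing of size two.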
Paper overview: In Section 2 we review the greedy algorithm for the unweighted minimum $k$-set cover problem and its analysis, as well as the semi-local optimization algorithm of [5], and then we present our improved algorithm. We analyze its performance in Section 3, i.e., we show in Theorem 1 that our improved algorithm is an $\left(H_k - \frac{196}{390}\right)$-approximation algorithm for the unweighted $k$-set cover problem where $k \geq 4$, improving the earlier $\left(H_k - \frac{1}{2}\right)$-approximation algorithm of [5]. We conclude in Section 4 by discussing open questions.
2 Algorithms for the unweighted k-set cover problem
In Subsection 2.1 we review the greedy algorithm for the unweighted minimum $k$-set cover problem and its analysis. In Subsection 2.2 we review the semi-local optimization algorithm of [5]. In Subsection 2.3 we present our improved algorithm.
Given an input to the unweighted $k$-set cover problem, we let the extended input be defined over the same set of elements, where the collection of subsets of the extended input is obtained from the input by including every subset of a subset in the input (i.e., the extended input is the closure of the input under taking subsets). We note that the extended input can be represented compactly by representing the maximal (under inclusion) subsets. A solution to the extended input is easily transformed into a solution for the original input by replacing each subset in the solution with a superset of it that belongs to the input. This mapping can be maintained while creating the solution. To simplify the presentation of the algorithms, we assume that they solve the extended input. We also assume that the optimal solution is with respect to the extended input.
We start our study by stating a simplification lemma on the structure of the optimal solution.
Lemma 1 Without loss of generality, we may assume that the optimal solution to the (extended input of a) set cover instance satisfies that each element is covered by exactly one subset of the optimum.
Proof: Let an optimal solution to the problem consist of a collection of sets $S'_j$, $j \in J'$, with $\bigcup_{j \in J'} S'_j = E$. We now construct another optimal solution formed of element-disjoint sets $S''_j$, where $S''_j \subseteq S'_j$ for all $j \in J'$. To do so, we assign each element $e \in E$ to the smallest-index set $S'_j$, $j \in J'$, that contains $e$, and for all values of $j$ we let $S''_j$ be the set of elements assigned to $S'_j$. In the extended input the sets $S''_j$ for all $j$ belong to the collection $\mathcal{F}$, and the claim follows.
We define a $j$-set to be a set with $j$ elements. We fix an optimal solution OPT, and we say that a $k$-set is an optimal $k$-set if it is contained in OPT.
Given a partial cover $\mathcal{C}$ and an algorithm ALG, let $cost_{ALG}(\mathcal{C})$ be the number of sets used by Algorithm ALG applied on the elements left uncovered by $\mathcal{C}$, and let $cost_{ALG,1}(\mathcal{C})$ be the number of 1-sets among those.
2.1 The greedy algorithm
In this subsection we review the greedy algorithm for the unweighted $k$-set cover problem and the proof of its performance guarantee.
The greedy algorithm starts with an empty collection of subsets in the solution and no element being covered. Then, it iterates the following procedure until all elements are covered: let $w_S$ be the number of currently uncovered elements in a set $S \in \mathcal{F}$, and let the current ratio of $S$ be $r_S = \frac{1}{w_S}$. Let $S^*$ be a set such that $r_{S^*}$ is minimized. The algorithm adds $S^*$ to the collection of subsets of the solution, defines the elements of $S^*$ as covered, and assigns a price of $r_{S^*}$ to all the elements that are now covered but were uncovered prior to this iteration (i.e., the elements that were first covered by $S^*$).
Johnson [13], Lovász [17] and Chvátal [3] showed that the greedy algorithm is an $H_k$-approximation algorithm for the unweighted $k$-set cover.
Chvátal's proof is the following: first note that the cost of the greedy solution equals the sum of the prices assigned to the elements. Second, consider a set $S$ that belongs to an optimal solution OPT. Then, OPT pays 1 for $S$. Consider the elements of $S$ in the order in which they are covered by the greedy algorithm, breaking ties arbitrarily. When the algorithm covers the $i$-th element of $S$, the algorithm could, instead, choose $S$ as a feasible set with a current ratio of $\frac{1}{|S|-i+1}$. Therefore, the price assigned to this element is at most $\frac{1}{|S|-i+1}$. It follows that the total price assigned to the elements of $S$ is at most $\sum_{i=1}^{|S|} \frac{1}{|S|-i+1} = \sum_{i'=1}^{|S|} \frac{1}{i'} \leq H_k$, and therefore the approximation ratio of the greedy algorithm is at most $H_k$.
2.2 The semi-local optimization algorithm
Duh and Fürer [5] suggested the following procedure to approximate the unweighted 3-set cover problem. In a pure local improvement step, we replace a number of sets with fewer sets to form a new cover with a reduced cost. To define a semi-local step, they observed (see also [8]) that once the 3-sets are selected, the remaining instance can be solved optimally in polynomial time by a reduction to maximum matching. Hence, to solve the unweighted 2-set cover instance that results after selecting the 3-sets, they invoke the following Algorithm A.
Algorithm A for solving optimally an unweighted 2-set cover instance
1. Find a maximum matching in the following graph: there is a vertex for each element, and an edge between two vertices if there is a 2-set consisting of this pair of elements.
2. Return the set of 2-sets corresponding to the edges of the maximum matching and the 1-sets of the elements left uncovered by the collection of 2-sets which we found.
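A sketch of Algorithm A follows; it is ours and is only meant for tiny instances, since it finds the maximum matching by brute force. In practice one would use a polynomial-time maximum-matching algorithm (e.g., a blossom-based implementation), which is what gives Algorithm A its polynomial running time.

```python
from itertools import combinations

def algorithm_a(elements, two_sets):
    """Optimally cover `elements` using the available 2-sets and 1-sets:
    take a maximum matching among the 2-sets (brute force here), then
    cover the leftover elements by singletons."""
    edges = [tuple(sorted(s)) for s in two_sets]
    matching = []
    # try matchings from largest possible size downwards
    for r in range(len(edges), 0, -1):
        for sub in combinations(edges, r):
            endpoints = [v for e in sub for v in e]
            if len(endpoints) == len(set(endpoints)):  # pairwise disjoint edges
                matching = list(sub)
                break
        if matching:
            break
    covered = {v for e in matching for v in e}
    return [set(e) for e in matching] + [{v} for v in elements if v not in covered]
```

On elements $\{1, \ldots, 5\}$ with 2-sets $\{1,2\}, \{2,3\}, \{4,5\}$, the maximum matching has two edges and one element is left as a 1-set, for a cover of three sets in total.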
Thus a local change in the 3-sets allows arbitrary global changes in the 2-sets and 1-sets, and such a change is called a semi-local change. They allowed the algorithm to remove one 3-set and insert at most a pair of 3-sets if one of the following happens: either the total cost is reduced, or the total cost remains the same and the number of 1-sets in the resulting solution is reduced (thus the total cost is the primary objective, whereas the number of 1-sets is a secondary objective). This results in the approximation algorithm (Algorithm B below) for the unweighted $k$-set cover of [5], which is useful mainly for $k = 3$.
Algorithm B for approximating an unweighted k-set cover instance
1. Greedily build a maximal collection $\mathcal{C}$ of disjoint sets where each set in the collection contains at least three elements.
2. While there are sets $C \in \mathcal{C}$ and $C_1, C_2 \notin \mathcal{C}$ such that $\mathcal{C}' = (\mathcal{C} \setminus \{C\}) \cup \{C_1, C_2\}$ is a collection of disjoint sets where each set in the collection contains at least three elements, and such that the following condition holds: $cost_A(\mathcal{C}') + |\mathcal{C}'| < cost_A(\mathcal{C}) + |\mathcal{C}|$, or ($cost_A(\mathcal{C}') + |\mathcal{C}'| = cost_A(\mathcal{C}) + |\mathcal{C}|$ and $cost_{A,1}(\mathcal{C}') < cost_{A,1}(\mathcal{C})$): replace $\mathcal{C}$ by $\mathcal{C}'$.
3. Apply Algorithm A on the remaining uncovered elements.
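The improvement condition in step 2 is a lexicographic comparison: the primary objective is the total cost (the sets chosen so far plus the cost of Algorithm A's completion), and the secondary objective is the number of 1-sets in the completion. A minimal sketch of this test (ours), with `cost_a` and `cost_a1` standing for $cost_A$ and $cost_{A,1}$ supplied by the caller:

```python
def is_semi_local_improvement(cost_a, cost_a1, old_c, new_c):
    """Return True if replacing collection old_c by new_c improves
    Algorithm B's objective: strictly smaller total cost, or equal
    total cost with strictly fewer 1-sets in the completion."""
    old_key = (cost_a(old_c) + len(old_c), cost_a1(old_c))
    new_key = (cost_a(new_c) + len(new_c), cost_a1(new_c))
    return new_key < old_key  # lexicographic comparison of tuples
```

With stub cost functions one can check that a swap keeping the total cost equal counts as an improvement exactly when it reduces the number of 1-sets.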
They showed that Algorithm B is a $\frac{4}{3}$-approximation algorithm for the unweighted 3-set cover problem. More precisely, the following proposition was proved in [5].
Proposition 2 Assume that an optimal solution for the unweighted 3-set cover instance has $b_1$ 1-sets, $b_2$ 2-sets and $b_3$ 3-sets. Then the solution that Algorithm B returns costs at most $b_1 + b_2 + \frac{4}{3} b_3$ (i.e., $cost_B(\emptyset) \leq b_1 + b_2 + \frac{4}{3} b_3$). Moreover, the number of 1-sets in the solution that the algorithm returns is at most $b_1$ (i.e., $cost_{B,1}(\emptyset) \leq b_1$).
In order to extend their result to a better algorithm for larger values of $k$, they suggested the following Algorithm C:
Algorithm C for approximating an unweighted k-set cover instance
1. Greedy Phase For $j = k$ down to 6 do: greedily choose a maximal collection of disjoint $j$-sets (each covering exactly $j$ new elements).
2. Restricted Phase For $j = 5$ down to 4 do: choose a maximal collection of disjoint $j$-sets (each covering exactly $j$ new elements) with the restriction that the choice of these $j$-sets does not increase the number of 1-sets. That is, we add a $j$-set to the current collection of disjoint $j$-sets $\mathcal{C}$ and create a new collection of disjoint $j$-sets $\mathcal{C}'$ only if $cost_{B,1}(\mathcal{C}) \geq cost_{B,1}(\mathcal{C}')$.
3. Semi-Local Optimization Phase Run the semi-local optimization algorithm (i.e., Algorithm B) on the remaining instance of the uncovered elements.
Duh and Fürer proved that this algorithm is an $\left(H_k - \frac{1}{2}\right)$-approximation, and they also showed that this bound is tight for the semi-local optimization algorithm.
2.3 The improved algorithm
In this section we present our modification of the semi-local optimization algorithm, where we use a local-search algorithm during the phase in which each new set covers exactly four previously uncovered elements.
Algorithm D for approximating an unweighted k-set cover instance (the improved algorithm)
1. Greedy Phase For $j = k$ down to 6 do: greedily choose a maximal collection of disjoint $j$-sets (each covering exactly $j$ new elements).
2. Restricted Phase Choose a maximal collection of disjoint 5-sets (each covering exactly five new elements) with the restriction that the choice of these 5-sets does not increase the number of 1-sets. That is, we add a 5-set to the current collection of disjoint 5-sets $\mathcal{C}$ and create a new collection of disjoint 5-sets $\mathcal{C}'$ only if $cost_{B,1}(\mathcal{C}) \geq cost_{B,1}(\mathcal{C}')$.
3. Restricted Local-Search Phase
(a) Choose a maximal collection of disjoint 4-sets (each covering exactly four new elements) with the restriction that the choice of these 4-sets does not increase the number of 1-sets. That is, we add a 4-set to the current collection of disjoint 4-sets $\mathcal{C}$ and create a new collection of disjoint 4-sets $\mathcal{C}'$ only if $cost_{B,1}(\mathcal{C}) \geq cost_{B,1}(\mathcal{C}')$.
(b) While there are 4-sets $C \in \mathcal{C}$ and $C_1, C_2 \notin \mathcal{C}$ such that $\mathcal{C}' = (\mathcal{C} \setminus \{C\}) \cup \{C_1, C_2\}$ is a collection of disjoint 4-sets and such that $cost_{B,1}(\mathcal{C}') \leq cost_{B,1}(\mathcal{C})$, replace $\mathcal{C}$ by $\mathcal{C}'$.
4. Semi-Local Optimization Phase Run the semi-local optimization algorithm (i.e., Algorithm B) on the remaining instance of the uncovered elements.
In Phase 3 we use a local search whose neighborhood is defined by removing one 4-set and inserting a pair of 4-sets, as long as the number of 1-sets in the returned solution does not increase. The use of a local-search procedure is motivated by the approximation algorithm of [12] for the maximum set packing problem. That is, throughout the restricted phase (of Algorithm C) we try to maximize the number of sets in the collection of disjoint subsets that we add for a fixed value of the index $j$. Since local search has proved to be a superior heuristic for this task (with respect to its approximation ratio for this set packing problem), we suggest replacing the greedy construction for $j = 4$ in Algorithm C by the local-search approach. This improved phase is the cornerstone on which our improved approximation ratio is based.
To establish the time complexity of Algorithm D, we first note that Algorithm A is polynomial, as it applies a maximum (cardinality) matching algorithm with time complexity $O(n^3)$. Hence Algorithm B is also a polynomial-time algorithm, as each iteration can be executed by trying all $O(m^3)$ triplets of sets and trying to increase the collection $\mathcal{C}$ with these sets. Such a test (for a given triplet of sets) is done in $O(n^3)$ time by an application of Algorithm A. Since the number of iterations of this loop of finding an increased collection of sets is bounded by $n/3$, the total time complexity of Algorithm B is $O(m^3 n^4)$, that is, polynomial in the input length.
Now consider Algorithm D. The time complexity of the Greedy Phase is $O(mn)$ per value of $j$, and there are at most $k - 5 < n$ such values; hence the Greedy Phase takes $O(mn^2)$ time. Regarding the Restricted Phase, there are $O(m)$ sets to be considered, and each of them is tested by an application of Algorithm B; hence this phase takes $O(m^4 n^4)$ time. The Restricted Local-Search Phase is also polynomial, as the number of 4-sets is $O(m)$, and each time we try to increase the number of 4-sets in $\mathcal{C}$ we try $O(m^3)$ triplets of 4-sets, where each such check is carried out by running Algorithm B. Since the number of such iterations is bounded by $n/4$, we get a time complexity of $O(m^6 n^5)$ for this phase. The remainder of the algorithm is a single execution of Algorithm B. Hence the total time complexity of Algorithm D is $O(m^6 n^5)$, which is polynomial, and it returns a feasible solution. Therefore, we establish the following lemma:
Lemma 3 For every value of $k$, Algorithm D returns a feasible solution in polynomial time.
In the next section we analyze the performance guarantee of Algorithm D.
3 The analysis of Algorithm D
In this section we analyze the performance guarantee of Algorithm D. We say that an element is an $i$-covered element if Algorithm D covers it by an $i$-set. We consider an optimal solution OPT, and bound the performance guarantee of D. Recall that we assume that OPT is a partition of the element set $E$. We now further characterize the structure of OPT.
Lemma 4 If $k \geq 5$ then without loss of generality we can assume that each set of OPT is a $k$-set. If $k = 4$, then without loss of generality we can assume that each set of OPT is either a 3-set or a 4-set.
Proof: Assume that the claim does not hold on an instance $I$. We create a new instance $I'$ such that the optimal solution $OPT'$ for $I'$ costs $k$ times the cost of OPT, and the solution returned by D on $I'$ costs more than $k$ times the cost of the solution returned by Algorithm D on $I$. We will conclude that if there is a bad example for the algorithm, then there is a bad example for the algorithm that exhibits the same approximation ratio and satisfies the property of the lemma.
To construct $I'$ for $k \geq 5$, we first take $k$ disjoint copies of the instance $I$. Then, we add new elements to the copies of the sets of OPT, so that each set in this sub-collection is a $k$-set. Note that the number of the new elements is divisible by $k$. Last, we add new disjoint $k$-sets covering these new elements. This is the new instance $I'$ (see Figure 1 for an illustration).
Clearly, the optimal solution $OPT'$ for $I'$ is a union of $k$ copies of OPT, where we add the new elements to their corresponding sets to make each of them a $k$-set. Hence, $OPT'$ costs exactly $k$ times the cost of OPT.
Now consider the execution of Algorithm D on the input $I'$. We can assume that the algorithm picks the new $k$-sets of the new elements in its first steps, and then continues as it acts on $I$ on each of the $k$ copies of $I$. Therefore, the cost of the solution returned by D on $I'$ is strictly larger than $k$ times the cost of the solution returned by D on $I$.
Thus the ratio $\frac{D(I')}{OPT'}$ is larger than the ratio $\frac{D(I)}{OPT}$, where $D(I')$ and $D(I)$ denote the cost of the solution returned by Algorithm D on instances $I'$ and $I$, respectively. So the approximation ratio of Algorithm D can be computed by looking only at instances of the form of $I'$, which satisfy the assumption of the lemma.
Figure 1: A demonstration of the instance $I'$ for $k = 5$ in the proof of Lemma 4. Each circle represents a new element, and each dashed oval represents a new $k$-set which is included only in $I'$ and not in the copies of $I$.
Now assume that $k = 4$. We apply a similar construction to the case of $k \geq 5$, with one difference: we no longer add new elements to the copies of the 3-sets of OPT, and we make sure that each 4-set of new elements that we add has at most one new element from each set of $OPT'$. Once again, the cost of $OPT'$ is exactly $k$ times the cost of OPT. Now, the set of 4-sets of the new elements, together with the copies of the original 4-sets returned by the Restricted Local-Search Phase on each copy of $I$, gives a feasible collection of 4-sets that cannot be extended. To see this last claim, note that by deleting one 4-set of the new elements, none of the 4-sets which intersect it becomes disjoint from all other selected 4-sets. Hence, we can apply the same argument as in the case of $k \geq 5$: the cost of the solution returned by D on $I'$ is strictly larger than $k$ times the cost of the solution returned by D on $I$, so the ratio $\frac{D(I')}{OPT'}$ is larger than the ratio $\frac{D(I)}{OPT}$, and the approximation ratio of Algorithm D can again be computed by looking only at instances of the form of $I'$, which satisfy the assumption of the lemma.
3.1 Sibling sets
We consider special 2-sets and 3-sets named sibling sets, defined as follows (see [5] for the introduction of this term): a sibling set is a 2-set or a 3-set $S$ chosen by Algorithm D during the Semi-Local Optimization Phase which intersects exactly two $k$-sets $O_1, O_2$ of OPT, such that $|S \cap O_1| = 1$ and the element of $S \cap O_1$ is the last element of $O_1$ which is covered by Algorithm D. If this condition holds for both $O_1$ and $O_2$, this sibling set is called a special sibling set.
A sibling set is the result of the fact that the Semi-Local Optimization Phase does not create a new singleton, and therefore, if an optimal $k$-set has $k - 1$ covered elements at the end of the Restricted Local-Search Phase, out of which at least one is either a 5-covered element or a 4-covered element, then the last element belongs to at least a 2-set (and is not a singleton).
The element of a sibling set $S$ which is the last uncovered element of an optimal $k$-set, that is, the element of $S \cap O_1$, is called a primary element, and the other elements of $S$ are called secondary elements. An element of a sibling set is called a sibling element. An element which is covered during Phase 4 and is not a sibling element is called a non-sibling element.
Lemma 5 If a $k$-set $S$ of OPT has a primary element, then all its elements which are covered during the Semi-Local Optimization Phase are sibling elements.
Proof: Assume that $e$ is a primary element in $S$ which belongs to a sibling set $S'$, and that there is a non-sibling element in $S$ which is covered during the Semi-Local Optimization Phase. We note that Algorithm A could match $e$ with its mates in $S$ which are not sibling elements, without creating new singletons. Hence the secondary elements of $S'$ could be used during the Restricted Phase or the Restricted Local-Search Phase. Hence $S'$ is not a sibling set.
3.2 Good and bad sets
We next partition the $k$-sets of OPT into bad sets and good sets. Let $S$ be an optimal $k$-set. If $k \geq 5$ we say that $S$ is a bad set if one of the following holds: either at the end of the Greedy Phase $S$ has exactly five uncovered elements, of which exactly one element is 5-covered, exactly one is 4-covered, and none of the three remaining elements are sibling elements; or at the end of the Greedy Phase $S$ has exactly five uncovered elements, of which none of the elements of $S$ are 5-covered, exactly one is a 4-covered element, and exactly one element of $S$ is a sibling element. We refer to Figure 2 for an illustration of this definition of a bad set.
Figure 2: A demonstration of bad sets in the case $k = 5$.
If $k = 4$, then $S$ is a bad set if exactly one of its elements is a 4-covered element and the other three elements are non-sibling elements. An optimal $k$-set that is not bad is a good set.
We next show that the proportion of good sets in OPT is not negligible. Denote by $n_b$ the number of bad sets in OPT and by $n_g$ the number of good sets in OPT.
Lemma 6 $n_b \leq 12 n_g$.
Proof: Consider a bad set $S$ in OPT. At the beginning of Phase 3, $S$ has four uncovered elements such that none of these belongs to a sibling set. Since $S$ is a bad set, there is exactly one 4-covered element in $S$. Let $S'$ be the set intersecting $S$ which is chosen by the algorithm in Phase 3. If $S'$ intersects only bad sets of OPT, then during Phase 3 we could replace $S'$ by the bad sets it intersects, and such a change is feasible because each such bad set has four elements that form a 4-set that we could add to the solution after the removal of $S'$ without increasing the number of singletons. Hence, there is a good set $S'' \in OPT$ such that $S' \cap S'' \neq \emptyset$.
A good set $S \in OPT$ can intersect at most four sets that we choose during Phase 3. These four sets can intersect at most 12 other sets of OPT. These 12 sets might be bad sets. Therefore, the claim follows.
3.3 The pricing mechanism
Consider an element $e$; the price assigned to $e$, which we denote by $price(e)$, is defined as follows.
- If $e$ is an $i$-covered element where $i \geq 4$, then $price(e) = \frac{1}{i}$.
- If $e$ is a member of a special sibling set, then $price(e) = \frac{1}{2}$.
- If $e$ is a primary element of a sibling set, then $price(e) = \frac{4}{5}$, and if $e$ is a secondary element, then $price(e) = \frac{1}{5}$.
- If $e$ is a non-sibling element which is covered during Phase 4, we assign its price according to the value of $n(e)$, which denotes the number of non-sibling elements in the $k$-set of OPT which covers $e$:
  - If $n(e) = 3$, then $price(e) = \frac{4}{9}$.
  - If $n(e) = 2$, then $price(e) = \frac{1}{2}$.
  - If $n(e) = 1$ and at the end of the Greedy Phase there are at least two uncovered elements in the optimal $k$-set which covers $e$, then $price(e) = \frac{1}{2}$.
  - Otherwise, that is, if $price(e)$ is not already set by the previous cases, then $price(e) = 1$.
Note that if $n(e) = 1$ and at the end of the Greedy Phase there are at least two uncovered elements in the optimal $k$-set which covers $e$, then another uncovered element at the end of the Greedy Phase is not a primary element of a non-special sibling set.
Lemma 7 The cost of the solution returned by Algorithm D is at most the total price of all the elements.
Proof: We clearly assigned a total of a unit price for each set selected in Phases 1, 2 and 3, and for each sibling set that the algorithm selects.
As for the other sets, we denote by $b_3(OPT)$ the number of $k$-sets of OPT with exactly three non-sibling elements, and we denote by $b_2(OPT)$ the number of $k$-sets of OPT with exactly two non-sibling elements. By Proposition 2, the number of the non-sibling sets that the algorithm selects during Phase 4 is at most $\frac{4}{3} b_3(OPT) + b_2(OPT)$, which is the total price of the non-sibling elements.
3.4 Bounding the total price assigned to the elements of an optimal k-set
For a set of items $S$, we denote by $price(S)$ the total price assigned to the elements of $S$.
Lemma 8 Assume that $k \geq 4$. Let $S$ be an optimal bad $k$-set. Then, $price(S) \leq \rho_b = H_k - \frac{1}{2}$.
Proof: If $k \geq 5$, then the $j$-th covered element from $S$ during the Greedy Phase is assigned a price of at most $\frac{1}{k-j+1}$, the 5-covered element is assigned a price of $\frac{1}{5}$ (if it exists), the sibling element is assigned a price of $\frac{1}{5}$ (if it exists), the 4-covered element is assigned a price of $\frac{1}{4}$, and each of the remaining three elements is assigned a price of $\frac{4}{9}$. Hence, $price(S) \leq \sum_{i=6}^{k} \frac{1}{i} + \frac{1}{5} + \frac{1}{4} + 3 \cdot \frac{4}{9} = H_k - \frac{1}{2} = \rho_b$. If $k = 4$, then $S$ has a single 4-covered element that pays a price of $\frac{1}{4}$, and each of the three remaining elements is assigned a price of $\frac{4}{9}$. So again $price(S) = H_k - \frac{1}{2} = \rho_b$.
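The arithmetic of Lemma 8 can be checked with exact fractions; this verification sketch is ours, not part of the paper.

```python
from fractions import Fraction

def harmonic(k):
    """k-th harmonic number H_k as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, k + 1))

def bad_set_price_bound(k):
    """Upper bound on price(S) for a bad optimal k-set with k >= 5:
    greedy-phase prices 1/k + ... + 1/6, plus 1/5 + 1/5-or-1/4 terms,
    i.e., 1/5 + 1/4 + 3 * (4/9) for the five remaining elements."""
    return (sum(Fraction(1, i) for i in range(6, k + 1))
            + Fraction(1, 5) + Fraction(1, 4) + 3 * Fraction(4, 9))
```

For every $k \geq 5$ the bound equals $H_k - \frac{1}{2}$ exactly, and the $k = 4$ expression $\frac{1}{4} + 3 \cdot \frac{4}{9}$ equals $H_4 - \frac{1}{2}$ as well.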
Before bounding the total price assigned to an optimal good $k$-set, we bound the total price of the items of an optimal $k$-set covered during the Semi-Local Optimization Phase. These bounds will be used later in the proof of the upper bound on the total price assigned to an optimal good $k$-set. We denote by $N_g$ the number of the elements of $S$ that remain uncovered at the end of the Greedy Phase. Note that $N_g \leq 5$. We denote by $N_r$ ($N_l$) the number of the elements of $S$ that are covered during the Restricted Phase (the Restricted Local-Search Phase). We denote by $N_s$ the number of sibling elements of $S$ that are covered during the Semi-Local Optimization Phase. We denote by $N_n$ the number of non-sibling elements of $S$ that are covered during the Semi-Local Optimization Phase. Then, $N_s + N_n = N_g - (N_r + N_l)$ is the number of elements of $S$ which are covered during the Semi-Local Optimization Phase. Let $S'$ be the subset of $S$ consisting of the elements covered during the Semi-Local Optimization Phase. The following lemma bounds $price(S')$ as a function of $N_s + N_n = |S'|$.
Lemma 9
1. If $N_s + N_n = 5$, then $price(S') \leq \frac{26}{15}$.
2. If $N_s + N_n = 4$, then $price(S') \leq \frac{23}{15}$.
3. If $N_s + N_n = 3$, then $price(S') \leq \frac{4}{3}$.
4. If $N_s + N_n = 2$, then $price(S') \leq 1$.
5. If $N_s + N_n = 1$ and $N_g \geq 2$, then $price(S') \leq \frac{4}{5}$.
6. If $N_s + N_n = 1$ and $N_g = 1$, then $price(S') \leq 1$.
Proof: Assume that $N_s + N_n = 5$. Then, since $S'$ and each of its 4-subsets are candidates to be added to the collection of disjoint 4-sets during the Restricted Local-Search Phase and we choose not to add them, we conclude that at least two elements of $S'$ are sibling elements, i.e., $N_s \geq 2$. If one of the elements of $S'$ is a primary element, then all other elements in $S'$ are secondary elements and $price(S') = \frac{4}{5} + \frac{1}{5} + \frac{1}{5} + \frac{1}{5} + \frac{1}{5} < \frac{26}{15}$. Otherwise, all sibling elements of $S'$ are secondary elements, and each of these pays $\frac{1}{5}$.
If $N_s = 2$ then each of the non-sibling elements of $S'$ pays $\frac{4}{9}$, and hence $price(S') = 3 \cdot \frac{4}{9} + 2 \cdot \frac{1}{5} = \frac{26}{15}$.
If $N_s = 3$ then each of the non-sibling elements of $S'$ pays $\frac{1}{2}$, and hence $price(S') = 2 \cdot \frac{1}{2} + 3 \cdot \frac{1}{5} < \frac{26}{15}$.
If $N_s = 4$, then the unique non-sibling element of $S'$ pays $\frac{1}{2}$, and hence $price(S') = 1 \cdot \frac{1}{2} + 4 \cdot \frac{1}{5} < \frac{26}{15}$.
If $N_s = 5$, then $price(S') = 5 \cdot \frac{1}{5} < \frac{26}{15}$.
This completes the proof of part 1.
Next assume that $N_s + N_n = 4$. Then, since $S'$ is a candidate to be added to the collection of disjoint 4-sets during the Restricted Local-Search Phase and we choose not to add it, we conclude that at least one element of $S'$ is a sibling element, i.e., $N_s \geq 1$. If one of the elements of $S'$ is a primary element, then all other elements in $S'$ are secondary elements and $price(S') = \frac{4}{5} + 3 \cdot \frac{1}{5} < \frac{23}{15}$. Otherwise, all sibling elements of $S'$ are secondary elements, and each of these pays $\frac{1}{5}$.
If $N_s = 1$ then each of the non-sibling elements of $S'$ pays $\frac{4}{9}$, and hence $price(S') = 3 \cdot \frac{4}{9} + 1 \cdot \frac{1}{5} = \frac{23}{15}$.
If $N_s = 2$ then each of the non-sibling elements of $S'$ pays $\frac{1}{2}$, and hence $price(S') = 2 \cdot \frac{1}{2} + 2 \cdot \frac{1}{5} < \frac{23}{15}$.
If $N_s = 3$, then the unique non-sibling element of $S'$ pays $\frac{1}{2}$, and hence $price(S') = 1 \cdot \frac{1}{2} + 3 \cdot \frac{1}{5} < \frac{23}{15}$.
If $N_s = 4$, then $price(S') = 4 \cdot \frac{1}{5} < \frac{23}{15}$.
This completes the proof of part 2.
Next assume that $N_s + N_n = 3$. If one of the elements of $S'$ is a primary element, then all other elements in $S'$ are secondary elements and $price(S') = \frac{4}{5} + 2 \cdot \frac{1}{5} < \frac{4}{3}$. Otherwise, all sibling elements of $S'$ are secondary elements, and each of these pays $\frac{1}{5}$.
If $N_s = 0$ then each element of $S'$ pays $\frac{4}{9}$, and hence $price(S') = 3 \cdot \frac{4}{9} = \frac{4}{3}$.
If $N_s = 1$ then each of the non-sibling elements of $S'$ pays $\frac{1}{2}$, and hence $price(S') = 2 \cdot \frac{1}{2} + \frac{1}{5} < \frac{4}{3}$.
If $N_s = 2$, then the unique non-sibling element of $S'$ pays $\frac{1}{2}$, and hence $price(S') = 1 \cdot \frac{1}{2} + 2 \cdot \frac{1}{5} < \frac{4}{3}$.
If $N_s = 3$, then $price(S') = 3 \cdot \frac{1}{5} < \frac{4}{3}$.
This completes the proof of part 3.
Next assume that N_s + N_n = 2. If one of the elements of S′ is a primary element, then the other element in S′ is a secondary element and price(S′) = 4/5 + 1/5 = 1. Otherwise, all sibling elements of S′ are secondary elements, and each of these pays 1/5.
If N_s = 0 then each element of S′ pays 1/2, and hence price(S′) = 2 · (1/2) = 1.
If N_s = 1, then the unique non-sibling element of S′ pays 1/2, and hence price(S′) = 1 · (1/2) + 1 · (1/5) < 1.
If N_s = 2, then price(S′) = 2 · (1/5) < 1.
This completes the proof of part 4.
Finally, we assume that N_s + N_n = 1. If N_g ≥ 2 then we did not assign this element a unit price, and hence it is assigned at most 4/5, which is the maximum price of an element other than a unit price. If N_g = 1 the claim is trivial, as every element is assigned a price of at most one unit. This completes the proof of parts 5 and 6.
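The arithmetic in parts 1-4 above is mechanical, and as a sanity check (not part of the original proof) it can be re-verified with exact rational arithmetic. The tuples below simply restate the combinations used in each subcase: the pay per non-sibling element (4/9 or 1/2), the number of non-sibling elements, the number of sibling elements (each paying 1/5), and the claimed bound:

```python
from fractions import Fraction as F

FIFTH, HALF, NINE = F(1, 5), F(1, 2), F(4, 9)

# (pay per non-sibling element, #non-sibling, #sibling, claimed bound)
cases = [
    (NINE, 3, 2, F(26, 15)), (HALF, 2, 3, F(26, 15)),            # part 1: N_s + N_n = 5
    (HALF, 1, 4, F(26, 15)), (HALF, 0, 5, F(26, 15)),
    (NINE, 3, 1, F(23, 15)), (HALF, 2, 2, F(23, 15)),            # part 2: N_s + N_n = 4
    (HALF, 1, 3, F(23, 15)), (HALF, 0, 4, F(23, 15)),
    (NINE, 3, 0, F(4, 3)), (HALF, 2, 1, F(4, 3)),                # part 3: N_s + N_n = 3
    (HALF, 1, 2, F(4, 3)), (HALF, 0, 3, F(4, 3)),
    (HALF, 2, 0, F(1)), (HALF, 1, 1, F(1)), (HALF, 0, 2, F(1)),  # part 4: N_s + N_n = 2
]
for pay, non_sib, sib, bound in cases:
    assert non_sib * pay + sib * FIFTH <= bound
```

The two tight subcases are 3 · (4/9) + 2 · (1/5) = 26/15 in part 1 and 3 · (4/9) + 1 · (1/5) = 23/15 in part 2.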
Lemma 10 If N_g = 5 and N_r ≤ 1, then S has an element that pays exactly 1/5.
Proof: If N_r = 1, then the element of S that is covered during the Restricted Phase pays 1/5. By the maximality of the sets that we choose during the Restricted Phase, we conclude that if N_r = 0, then S has a secondary element, which also pays 1/5. In both cases, S has an element that pays 1/5.
Lemma 11 Assume that k ≥ 4. Let S be an optimal good k-set. Then, price(S) ≤ α_g = H_k − 16/30.
Proof: Our proof is based on a detailed case analysis. These cases are according to the values of k (either four or at least five), N_g, N_r, N_l, and N_s + N_n.
First assume that k = 4. Then, the Greedy Phase and the Restricted Phase do not select sets, and therefore N_g = 4 and N_r = 0.
Assume that N_l = 4. Then, each element of S is covered during the Restricted Local-Search Phase, and pays a price of 1/4. Therefore, price(S) = 1 < H_4 − 16/30 = α_g.
Assume that N_l = 3. Then, each element of S which is covered during the Restricted Local-Search Phase pays a price of 1/4, and by Lemma 9, the remaining element pays a price of at most 4/5. Therefore, price(S) ≤ 3 · (1/4) + 4/5 = 93/60 = 125/60 − 32/60 = H_4 − 16/30 = α_g.
Assume that N_l = 2. Then, each element of S which is covered during the Restricted Local-Search Phase pays a price of 1/4. By Lemma 9, the two remaining elements pay a total price of at most 1. Thus, price(S) ≤ 3/2 < α_g.
Assume that N_l = 1. Then, the element of S which is covered during the Restricted Local-Search Phase pays a price of 1/4. Since S is a good set, it contains at least one element that belongs to a sibling set and pays 1/5 (since N_l = 1, it is not the primary element). The other two elements of S have a total price of at most max{2 · (1/2), 1 · (1/2) + 1 · (1/5), 4/5 + 1/5, 2 · (1/5)} = 1 (the arguments of the maximum are according to the number of sibling elements). Therefore, price(S) ≤ 1/4 + 1/5 + 1 = 87/60 < 93/60 = α_g.
Assume that N_l = 0. By Lemma 9, price(S) ≤ 23/15 < α_g.
It remains to consider the case where k ≥ 5. First note that by the greedy selection rule during the Greedy Phase, we conclude that N_g ≤ 5. Moreover, the j-th covered element from S during the Greedy Phase (for 1 ≤ j ≤ k − 5) is assigned a price of at most 1/(k − j + 1). So the first k − 5 elements which are covered by the algorithm pay a total price of at most H_k − H_5.
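The greedy-phase charges telescope: summing 1/(k − j + 1) for j = 1, …, k − 5 gives exactly H_k − H_5 = H_k − 137/60, the constant used repeatedly below. An illustrative exact check (not part of the proof):

```python
from fractions import Fraction

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact rational."""
    return sum(Fraction(1, i) for i in range(1, n + 1))

# The j-th greedy-covered element pays at most 1/(k - j + 1); the sum
# over j = 1, ..., k - 5 is 1/6 + ... + 1/k = H_k - H_5.
for k in range(6, 20):
    greedy_total = sum(Fraction(1, k - j + 1) for j in range(1, k - 4))
    assert greedy_total == harmonic(k) - harmonic(5)
assert harmonic(5) == Fraction(137, 60)
```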
Assume that N_g ≤ 2. Then, the (k−4)-th, the (k−3)-rd, and the (k−2)-nd covered elements from S are covered during the Greedy Phase, and are therefore assigned a price of at most 1/6 each. The last two elements of S are assigned a total price of at most max{2 · (1/4), 4/5 + 1/4, 1} = 21/20 (the arguments of the maximum are according to the value of N_l). Therefore, price(S) ≤ H_k − H_5 + 3/6 + 21/20 = H_k − H_5 + 31/20 = H_k − 137/60 + 93/60 = H_k − 44/60 < α_g.
Assume that N_g = 3. Then, the (k−4)-th and the (k−3)-rd covered elements from S are covered during the Greedy Phase, and are therefore assigned a price of at most 1/6 each.
If N_r + N_l = 0, then the last three elements of S are covered during the Semi-Local Optimization Phase, and by Lemma 9, pay a total price of at most 4/3. Therefore, price(S) ≤ H_k − H_5 + 2/6 + 4/3 = H_k − 137/60 + 5/3 = H_k − 37/60 < α_g. Note that in the remaining cases (of N_r + N_l) it suffices to show that the last three elements of S pay a total price of at most 4/3.
If N_r + N_l = 1, then the last two elements of S are covered during the Semi-Local Optimization Phase, and by Lemma 9, pay a total price of at most 1. The (k−2)-nd element of S is covered during either the Restricted Phase or the Restricted Local-Search Phase, and so it pays a price of at most 1/4. Therefore, the last three elements of S pay a total price of at most 5/4 < 4/3, and again price(S) < α_g.
If N_r + N_l = 2, then by Lemma 9, the last uncovered element pays at most 4/5. The (k−2)-nd and the (k−1)-st covered elements from S are covered during either the Restricted Phase or the Restricted Local-Search Phase, and therefore each of these is assigned a price of at most 1/4. Again the last three elements of S pay at most 4/5 + 2 · (1/4) < 4/3, and therefore price(S) < α_g.
If N_r + N_l = 3, then each of the last three elements of S pays a price of at most 1/4, and in total they pay less than 4/3. Therefore, price(S) < α_g.
Assume that N_g = 4. Then, the (k−4)-th covered element from S is covered during the Greedy Phase, and therefore pays a price of at most 1/6, and the set of elements from S that are covered during the Greedy Phase pays a total price of at most H_k − H_5 + 1/6.
By Lemma 9, if N_l = N_r = 0, then price(S) ≤ H_k − H_5 + 1/6 + 23/15 = H_k − 137/60 + 102/60 < α_g.
Otherwise, there is at least one element which is covered during the Restricted Phase or the Restricted Local-Search Phase, and hence it pays at most 1/4. The other three elements pay a total price of at most max{4/3, 1 + 1/4, 4/5 + 2 · (1/4), 3/4} = 4/3 (the arguments of the maximum are according to the value of N_l). Therefore, price(S) ≤ H_k − H_5 + 1/6 + 1/4 + 4/3 = H_k − 137/60 + 105/60 = H_k − 32/60 = α_g.
Assume that N_g = 5. Then, the set of elements from S that are covered during the Greedy Phase pays a total price of at most H_k − H_5. Each of the elements of S that is covered during Phase 3 pays a price of 1/4.
Assume that N_r = N_l = 0. By Lemma 9, price(S) ≤ H_k − H_5 + 26/15 = H_k − 137/60 + 104/60 = H_k − 33/60 < α_g.
Assume that N_r = 1 and N_l = 0. The element of S that is covered during the Restricted Phase pays a price of 1/5. By Lemma 9, price(S) ≤ H_k − H_5 + 1/5 + 23/15 = H_k − 33/60 < α_g.
Assume that N_r ≥ 2. The elements of S that are covered during the Restricted Phase pay a price of 1/5 each. The last three elements pay a total price of at most max{4/3, 1/4 + 1, 2 · (1/4) + 4/5, 3 · (1/4)} = 4/3 (the arguments of the maximum are according to the value of N_s + N_n). Therefore, price(S) ≤ H_k − H_5 + 2/5 + 4/3 = H_k − 33/60 < α_g.
Assume that N_r ≤ 1 and N_l = 1. Since S is a good set, we conclude that either N_r = 1 and S has an element that belongs to a sibling set, or S has at least two elements that belong to sibling sets. The element of S that is covered during the Restricted Phase (if it exists) pays a price of 1/5, the element of S that is covered during the Restricted Local-Search Phase pays a price of 1/4, and each secondary element of S pays 1/5. The two remaining elements have a total price of at most 1. Therefore, price(S) ≤ H_k − H_5 + 1/5 + 1/4 + 1/5 + 1 = H_k − 137/60 + 99/60 = H_k − 38/60 < α_g.
Assume that N_r ≤ 1 and N_l = 2. By Lemma 10, S has an element that pays 1/5. The two remaining elements which are covered during the Semi-Local Optimization Phase pay a total price of at most 1. Therefore, price(S) ≤ H_k − H_5 + 1/5 + 2 · (1/4) + 1 = H_k − 137/60 + 102/60 = H_k − 35/60 < α_g.
Assume that N_r ≤ 1 and N_l = 3. By Lemma 10, S has an element that pays 1/5. By Lemma 9, the element which is covered during the Semi-Local Optimization Phase pays at most 4/5. Therefore, price(S) ≤ H_k − H_5 + 1/5 + 3 · (1/4) + 4/5 = H_k − 137/60 + 105/60 = H_k − 32/60 = α_g.
Assume that N_r ≤ 1 and N_l = 4. By Lemma 10, S has an element that pays 1/5. Therefore, price(S) ≤ H_k − H_5 + 1/5 + 4 · (1/4) = H_k − 137/60 + 72/60 = H_k − 65/60 < α_g.
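Every bound derived in the k ≥ 5 subcases above has the form H_k − c/60 for an explicit constant c, and the analysis is tight exactly when c = 32, matching α_g = H_k − 16/30. As an illustrative check (not part of the proof), the constants computed above can be compared mechanically:

```python
from fractions import Fraction as F

# Constants c in the H_k - c/60 bounds derived in the k >= 5 subcases above,
# in the order they appear in the text.
constants = [44, 37, 35, 32, 33, 33, 33, 38, 35, 32, 65]
worst = min(F(c, 60) for c in constants)
assert worst == F(32, 60) == F(16, 30)  # the tight cases determine alpha_g
```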
3.5 Proving the approximation ratio of Algorithm D
Theorem 1 Algorithm D is an (H_k − 196/390)-approximation algorithm for the unweighted k-set cover problem.
Proof: By Lemma 3, the algorithm returns a feasible solution in polynomial time. It remains to establish its approximation ratio.

D ≤ n_g · α_g + n_b · α_b
  = n_g · (H_k − 16/30) + n_b · (H_k − 1/2)
  ≤ (n_g + n_b) · [ (1/13) · (H_k − 16/30) + (12/13) · (H_k − 1/2) ]
  = OPT · [ (1/13) · (H_k − 16/30) + (12/13) · (H_k − 1/2) ]
  = OPT · (H_k − 196/390),

where the first inequality follows by Lemma 7, the first equation follows by Lemma 8 and Lemma 11, the second inequality follows by Lemma 6, the second equation follows because the cost of OPT is exactly n_b + n_g, and the last equation follows by simple algebra.
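As a sanity check on the last algebraic step (not part of the original proof), the convex combination of the two offsets with weights 1/13 and 12/13 can be verified with exact rationals:

```python
from fractions import Fraction as F

alpha_g_gap = F(16, 30)  # H_k - alpha_g, the price bound offset for good sets
alpha_b_gap = F(1, 2)    # H_k - alpha_b, the price bound offset for bad sets

# Convex combination with weights 1/13 and 12/13, as in the chain above.
combined_gap = F(1, 13) * alpha_g_gap + F(12, 13) * alpha_b_gap
assert combined_gap == F(196, 390)
```

Since 196/390 > 195/390 = 1/2, the resulting ratio H_k − 196/390 indeed improves on the previous H_k − 1/2 bound.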
4 Concluding remarks
In this paper we addressed the fundamental unweighted k-set cover problem, and introduced an improvement over the previously best known algorithm for all values of k such that k ≥ 4. Although we obtain only a small improvement over the algorithm of Duh and Fürer [5], we believe that our analysis is not tight and that the approximation ratio of our algorithm can be improved. Improving the analysis of our Algorithm D is left for future research.
In this paper we showed that incorporating a local-search procedure at various stages of the greedy algorithm, instead of only when each set has at most three uncovered elements, provides a better approximation ratio. We conjecture that incorporating local-search procedures in each greedy phase decreases the approximation ratio further. Such an algorithm replaces the Greedy phase by the following phase:
Improved phase: For j = k, k − 1, k − 2, . . . , 6 do: apply local search to choose an approximate maximum-size collection of j-sets (each covering exactly j new elements).
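As an illustration only, the control flow of the Improved phase might be sketched as follows, where `pack_j_sets` is a hypothetical placeholder for the local-search packing subroutine (the paper leaves the analysis of this phase open):

```python
def improved_phase(k, family, uncovered, pack_j_sets):
    """Sketch of the Improved phase: for j = k down to 6, add an
    (approximately) maximum-size disjoint collection of sets, each
    covering exactly j still-uncovered elements."""
    chosen = []
    for j in range(k, 5, -1):
        # Sets currently covering exactly j new elements are the candidates.
        candidates = [S for S in family if len(S & uncovered) == j]
        for S in pack_j_sets(candidates, j):
            chosen.append(S)
            uncovered -= S
    return chosen, uncovered
```

Note that the candidate lists are recomputed for each j, since covering elements at one value of j changes which sets cover exactly j − 1 new elements.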
It is easily noted that using the Improved phase instead of the Greedy phase in Algorithm D
does not harm the approximation ratio of the resulting algorithm. We leave the analysis of
this improved algorithm for future research. Following an extended abstract version of this
paper [16], Athanassopoulos et al. [1] showed that this improved step indeed improves the
approximation ratio of the resulting algorithm.
References
[1] S. Athanassopoulos, I. Caragiannis and C. Kaklamanis, Analysis of approximation
algorithms for k-set cover using factor-revealing linear programs, Proc. FCT 2007,
52-63.
[2] R. Bar-Yehuda and S. Even, A linear time approximation algorithm for the
weighted vertex cover problem, Journal of Algorithms, 2, 198-203, 1981.
[3] V. Chvátal, A greedy heuristic for the set-covering problem, Mathematics of Operations Research, 4, 233-235, 1979.
[4] P. Crescenzi and V. Kann, A compendium of NP optimization problems,
http://www.nada.kth.se/theory/problemlist.html, 1995.
[5] R. Duh and M. Fürer, Approximation of k-set cover by semi-local optimization, Proc. STOC 1997, 256-264, 1997.
[6] U. Feige, A threshold of ln n for approximating set cover, Journal of the ACM, 45, 634-652, 1998.
[7] T. Fujito and T. Okumura, A modified greedy algorithm for the set cover problem with weights 1 and 2, Proc. ISAAC 2001, 670-681, 2001.
[8] O. Goldschmidt, D. S. Hochbaum and G. Yu, A modified greedy heuristic for the set covering problem with improved worst case bound, Information Processing Letters, 48, 305-310, 1993.
[9] M. M. Halldórsson, Approximating k set cover and complementary graph coloring, Proc. IPCO 1996, 118-131, 1996.
[10] R. Hassin and A. Levin, A better-than-greedy approximation algorithm for the
minimum set cover problem, SIAM J. Computing, 35, 189-200, 2006.
[11] D. S. Hochbaum, Approximation algorithms for the weighted set covering and node
cover problems, SIAM Journal on Computing, 11, 555-556, 1982.
[12] C. A. J. Hurkens and A. Schrijver, On the size of systems of sets every t of which
have an SDR, with an application to the worst-case ratio of heuristics for packing
problems, SIAM Journal on Discrete Mathematics, 2, 68-72, 1989.
[13] D. S. Johnson, Approximation algorithms for combinatorial problems, Journal of
Computer and System Sciences, 9, 256-278, 1974.
[14] R. M. Karp, Reducibility among combinatorial problems, Complexity of computer
computations (R.E. Miller and J.W. Thatcher, eds.), Plenum Press, New-York, 1972,
85-103.
[15] S. Khanna, R. Motwani, M. Sudan and U. V. Vazirani, On syntactic versus com-
putational views of approximability, SIAM Journal on Computing, 28, 164-191,
1998.
[16] A. Levin, Approximating the unweighted k-set cover problem: Greedy meets local
search, Proc. WAOA 2006, 290-301.
[17] L. Lovász, On the ratio of optimal integral and fractional covers, Discrete Mathematics, 13, 383-390, 1975.
[18] C. H. Papadimitriou and M. Yannakakis, Optimization, approximation and com-
plexity classes, Journal of Computer System Sciences, 43, 425-440, 1991.
[19] V. T. Paschos, A survey of approximately optimal solutions to some covering and
packing problems, ACM Computing Surveys, 29, 171-209, 1997.
[20] R. Raz and S. Safra, A sub-constant error-probability low-degree test, and sub-
constant error-probability PCP characterization of NP, Proc. STOC 1997, 475-484,
1997.
[21] P. Slavík, A tight analysis of the greedy algorithm for set cover, Journal of Algorithms, 25, 237-254, 1997.