Mathematical Models for a Social Partitioning Problem

In this paper we develop modeling techniques for a social partitioning problem. Different social interaction regulations are imposed during pandemics to prevent the spread of diseases. We suggest partitioning a set of company employees as an effective way to curb the spread, and use integer programming techniques to model it. The goal of the model is to maximize the number of direct interactions between employees who are essential for company’s work subject to the constraint that all employees should be partitioned into components of no more than a certain size implied by the regulations. Then we further develop the basic model to take into account different restrictions and provisions. We also give heuristics for solving the problem. Our computational results include sensitivity analysis on some of the models and analysis of the heuristic performance.


Introduction
Background and Problem Statement To fight the Covid-19 pandemic, measures restricting social gatherings were taken by many countries and states. Many restrictions set limits on the number of people allowed in gatherings. For example, Germany and UK at some point limited social gatherings of more than 2 people [1] while in many US states that number was 10 [2]. While that type of restrictions might help to prevent the spread of the disease they have the following shortcoming. A sick but asymptomatic person could attend several meetings of no more than 10 people (different people in each meeting). People from those meetings could get the virus and at-tend other meetings of no more than 10 people. It could create a chain reaction which will result in many people getting infected.
In this paper we suggest an alternative approach of limiting interactions to prevent the spread of the disease. It is explained on the example of regulating the interactions of a small or medium size company since our approach is most suitable for that type of organizations.
Consider a company that has N employees who are supposed to have direct interactions with each other in everyday operations of the company. Each direct interaction can potentially spread a disease from one person to others. To curb the spread of the disease we suggest partitioning the set of N employees into smaller groups of at most M people each. (Number M can be determined based on state or company regulations.) Employees can have direct interactions only with people of the same smaller group. In that case, a person who has the disease can potentially infect at most M − 1 people within the same group. Thus, the effect of chain reaction is limited.
The set of N employees can be partitioned into smaller groups of at most M people in many different ways. What is the best way of partitioning? While limiting the spread of the disease during a pandemic is a primary goal, another important goal is to keep as much economic activity as possible. Economic activity can be promoted by not interrupting essential employee interactions within the company. In the context of our problem, the goal is the following: find a partitioning into groups of at most M people that maximizes the number of uninterrupted employee interactions.
The corresponding discrete optimization problem is the following. We have a graph G with N nodes (employees). The goal is to remove as few arcs (employee interactions) as possible such that the graph is partitioned into connected components with at most M nodes each. In other words, the number of arcs that remain in the graph should be maximized.
Some of the employee interactions might be more important for the company than others. Thus, it makes sense to assign weights to the arcs, with higher weights assigned to more important arcs. Then the goal is to maximize the total weight of the arcs included in the solution.
Solution methods We give an integer linear programming model for solving the problem. The main feature of the model is a set of constraints providing that two nodes (employees) are in the same component. We give several techniques for making the model computationally more efficient. The techniques include reducing the number of variables and relaxing the integrality requirements of the largest set of variables. We also give a heuristic method for solving the problem. The heuristic output can also serve as an initial solution for the integer programming model to accelerate the solution process. American Journal of Computational Mathematics some specialists doing the same important task for the company might be required to be in different components; sometimes the maximum component size might be violated if it is very important for company's work but in that case a penalty is paid for the violation. We give variations of our main integer programming model to take into account each of the above-mentioned restrictions and provisions. For each of those cases we also give an extension to the heuristic.
Connection to other graph partitioning problems Graph partitioning problems were studied extensively before. [3] gives a survey for algorithms and applications of balanced graph partitioning problems. There are a variety of solution methods for solving the problems, ranging from simple heuristics to sophisticated combinatorial optimization methods.
Some well-known applications are parallel processing, road networks, image processing. [4] gives an approximation algorithm for a balanced partitioning problem which is the closest to our problem. In most of the related problems including [4], the number of components is fixed. In our problem, the number of components is not fixed which makes it harder to solve. The main characteristic of our problem is the maximum size of components. This characteristic comes from the nature of the main social partitioning application of our problem. It is also the main distinction from related problems that use different criteria for partitioning. Unlike many of the related problems, we consider the weighted version of the problem because assigning different weights to the arcs is more realistic in the context of the Social Partitioning problem. Our main solution method is integer programming while approximation algorithms or heuristics are used for many related problems. Integer programming is a suitable method for small companies (up to 50 people) when solving the Social Partitioning problem. We also give directions how to make the model more efficient so that it could be applied to bigger problems. Integer programming is also a suitable technique for adding different extra requirements to the original problem to make it more realistic in the context of the social partitioning ap-plication.

Computational results
We implemented our models on Optimization Modeling language AMPL (A Mathematical Programming Language) [5]. The models were tested for inputs of different sizes and nature. Sensitivity analysis was done for the models that allow penalty for exceeding the maximum component size. The heuristic was also implemented and tested on AMPL, and its performance was evaluated with respect to the optimal solution.
The structure of the paper The paper is structured as follows. Section 2 gives the main integer programming model, directions for improving its efficiency and its AMPL implementation. Section 3 discusses different variations and extensions of the model. Section 4 gives heuristics for solving the main problem and its variations. Section 5 discusses computational results. Section 6 gives several future directions.

The Main Model
We develop the basic model in Section 2.1. In Section 2.2, we discuss how the model can be refined to make it computationally more efficient. Section 2.3 gives the AMPL implementation of the model.

The Basic Model
Let Nodes be a set of N employees within a company. Let Arcs be a set of all possible direct interactions (links) between pairs of employees in Nodes. The regulations require that Nodes should be partitioned into components such that each component has at most M employees. No interactions are allowed between employees in different components. Each direct interaction has some value for the company. Thus, the company wants to maximize the number of possible interactions while complying with the regulations. Variables.
We have the following set of binary variables. In the unweighted version of the problem, the company wants to maximize the total number of links included in the solution. But in reality some links are more essential and important for the company than others. Thus, we define a parameter weight in the range [0, 1] for each link of the set Arcs. A higher weight implies that the link has higher importance for the company. For the rest of the paper we consider the weighted version of the problem. Note that the unweighted version is the special case when all the weights are set to 1.

Objective function
The objective function maximizes the total weight of the arcs included in the solution.

Constraints
We have the following constraints in the model.

C1)
The size of any component should be no more than M. To provide it we require that for any node i, the number of nodes that are in the same component with i is no more than M − 1. Symbolically, if y ij = 1 and y jk = 1 then y ik = 1. Equivalently, if y ij + y jk = 2 then y ik = 1.
The linear constraint that realizes the conditional constraint above is The constraint works the following way. If y ij + y jk = 2 then the right-hand side is 1, and y ik is required to be at least 1; since y ik is a binary variable it will be forced to be exactly 1. On the other hand, if y ij + y jk ≤ 1 then the right-hand side is ≤0; thus, it is not forcing anything on y ik .

A Computationally Refined Version of the Basic Model
In this section, we give several directions on how to refine the model to make it computationally more efficient. The main ideas are reducing the number of y ij variables and relaxing the integrality of y ij variables.
Reducing number of variables by defining symmetric variables only once In our model, the nodes are given in lexicographic order: Nodes = 1, …, N. For every pair of nodes i and j such that i < j , we define only one link (i, j) and corresponding variables x ij and y ij . Since the relationship between i and j is symmetric the opposite link is implicitly in the solution too. This approach significantly reduces the number of variables.
Reducing number of variables by not defining the variables that cannot be 1 Note that in the input of the problem, the nodes might already be partitioned into several connected components. Our model intends to divide those components into even smaller parts not exceeding size M. We will distinguish those components, by calling them input components versus output components. For example, consider the unweighted network in Figure  If nodes i and j are not in the same input component then i and j also cannot be in the same output component, and variable y ij cannot take value 1 in the optimal solution. Thus, there is no point of defining the corresponding variable y ij . For example, there is no need to define variable y 3,8 in the instance of Figure 2 since nodes 3 and 8 are in different input components. This especially makes sense for relatively sparse networks.
Another consideration for excluding variable y ij is the following. Suppose nodes i and j are in the same input component but the shortest distance between i and j is greater than M − 1. If i and j were also in the same output component then that component would include a path with M + 1 nodes connecting i and j (including i and j). But an output component cannot have more than M nodes. Thus, i and j cannot be in the same output component, and there is no point of defining the corresponding variable y ij . For example, there is no need to define variable y 1,6 in the instance of Figure 2 since the distance between nodes 1 and 6 is 3 which is greater than 2 = M − 1; having 1 and 6 in the same output component would imply that all the nodes on the connecting path 1-2-7-6 should be in that output component, thus violating M = 3.
To realize the reduction of variables discussed above, we find the input component of each node using breadth-first search (BFS). The information from BFS computations is used to compute the shortest distance between any two nodes from the same input component. Then variable y ij is defined only if 1) i and j are in the same input component; 2) the shortest distance between i and j is no more than M − 1. The AMPL implementation of these computations is given in Sec- Relaxing the integrality of y ij variables Theorem 1. Suppose the binary requirements of y ij variables are relaxed to 0 ≤ y ij ≤ 1. Then there will be an optimal solution where y ij will take only values 0 or 1.

V. Melkonian
Proof Let 1 2 , , , m C C C  be the output components formed by including arcs (i, j) with x ij = 1 in the solution. Suppose nodes u and v are in the same component. Then there is a path P between u and v such that for all the arcs (i, j) on P, x ij = 1. Constraint (C2) y ij ≥ x ij implies that for all those arcs on P, y ij is also 1. Now recall the Transitivity constraint (C3): 1 ik ij jk y y y + ≥ + . If y ij and y jk both are 1 then the constraint will force y ik also to be 1. Applying (C3) repeatedly for the arcs on path P, we will get that y uv = 1.
Summarizing, if nodes u and v are in the same output component then y uv = 1. The only remaining question is the following. Suppose nodes u and v are in different output components C u and C v . Is it possible that y uv > 0 in the optimal solution?
Note that y uv > 0 cannot improve the objective function value since the objective function has only x ij variables. Having y uv > 0 will make all y ij variables between C u and C v positive (based on transitivity constraint (C3)), thus making the constraint (C1) for any node from C u and C v less likely to be satisfied. Even if some of those y uv variables are positive they will not affect optimal x ij variables and the optimal value. In other words, there might be multiple optimal solutions, and even if some of them could have fractional y ij variables there is always an optimal solution with no fractional y ij variables. Thus, by postprocessing the solution based on x ij variables one can get rid of the fractional values by setting them to 0 and get the actual output components.
To illustrate the result of Theorem 1, consider the example in . Note that these fractional values make the (C1) constraints for nodes 1, 2, and 3 satisfied at equality: 2 × 1 + 2 × 0.5 = 3.
Thus, the corresponding optimal solution is likely to be a basic solution returned by the solver. But we can obtain another optimal solution by setting all fractional  0 y y y y y y = = = = = = . As argued above, all the constraints still will be satisfied, and the objective function value will stay at the optimal value 3.6 since the values of x ij variables are not changed. Thus, this new solution without any fractional values is also optimal.

The AMPL Implementation
The AMPL implementation of the basic model is given below.   2) The company might want to have several employees (specialized in the same type of important task) in different components.

3) Sometimes the maximum component size might be exceeded if it is very
important for company's work. But in that case some penalty must be paid.

Model with Required Arcs
A natural requirement is that some links are so essential to company's work that they cannot be removed. This restriction is not hard to model. But the problem is that the required arcs might not allow to have the maximum component size restriction to be satisfied. Even if the number of required connections for each node is no more than M, transitivity might force to have more than M nodes in some compo-nents, thus making the problem infeasible. For example, consider the unweighted network of Figure 1 with N = 11, M = 3. Suppose arcs (3,5), (4,5), and (4, 6) are required to be in the solution. Then based on transitivity, nodes 3, 4, 5, 6 will be in the same component, thus violating the maximum component size 3.
We suggest several ways of dealing with the problem. 1) Instead of requiring the arcs to be in the solution give those arcs the highest possible weight in the weighted version of the problem.
2) Pay penalty for exceeding M. This solution idea is discussed in Section 3.3.
3) Have required interaction between subsets instead of nodes. This method is discussed in Section 3.1.2 below.

Model with Required Interactions between Subsets
The strict requirement of having a direct interaction between two employees might be too restrictive and has a good chance of making the problem infeasible.
A relaxed version of the requirement could be the following. Given two disjoint subsets E1 and E2 of employees it is required to have at least one direct interaction between an employee from E1 and an employee from E2. Note that when each of the two subsets has only one element then it becomes the original restriction of Section 3.1.1. But the subset requirement is less restrictive, especially if the subset sizes are relatively big, and it could be a realistic restriction in many situations.
To illustrate the idea, we reconsider the example of Figure 1  The requirement is modelled the following way. Let R be the number of all required subset interactions. For each r in 1, …, R, there is a pair of disjoint subsets E1 and E2 such that there should be at least one arc between the two subsets in the solution.

Model with Restricted Connections
If there is a group of employees specialized to do the same kind of important task for the company, then the company might want to make sure that they do not get sick at the same time. Thus, those employees should be in different components. More specifically, we require that there should be at least two different output components having employees from the group.
To illustrate the idea, we reconsider the example of Figure 1 (N = 11, M = 4, unweighted arcs) with the following modification: employees 3 and 4 should be in different components. The original optimal solution with 10 arcs given in Figure  1 does not satisfy this new requirement. A new optimal solution which includes 9 arcs is given in Figure 5

Models with Penalties for Exceeding the Maximum Component Size
In Summarizing, by slightly exceeding the maximum component size and paying corresponding penalty the company was able to increase the number of personal interactions significantly, from 11 to 18. Thus, in some situations it might make sense to pay penalty if it significantly improves the effectiveness of company's operations.
In the rest of this subsection we discuss how to incorporate penalty in the We have two conflicting objectives in this case: maximizing the total weight of included arcs, and minimizing the total penalty. Both parts can be included in the objective function by giving them weights that add up to 1. Let w p be the weight of total penalty in the objective function which takes a value between 0 and 1. Then we have the following modified objective function.
The penalty weight w p is an input parameter that can be controlled by the user of the model. The lower the weight of the total penalty the more likely that M will be exceeded for some components.
Modifications for Method (ii): adding a constraint for the total penalty.
In this case we define a new input parameter P for the total allowed penalty level. Then we need the following constraint which says that the total penalty cannot exceed P.

Comparison of Methods (i) and (ii).
Each method has its advantages and disadvantages. Method (i) allows the user to find a right balance between two important factors: the total weight of included arcs and the total penalty. But it does not limit the total penalty, and a lower value for w p might result in unacceptably large penalty. To illustrate it, we reconsider the example of Figure 1 (N = 11, M = 4, unweighted arcs) when a penalty component is added to the objective function with weight w p = 0.05. The model returned a solution that included all 16 arcs of the network; but the total penalty was 11 × 7 = 77 (for each of the 11 nodes its component size is exceeded by 7). Having a reasonable limit on the total penalty as in Method (ii) would not allow to have such a large penalty for a small example with 11 nodes.
Method (ii) puts a limit on the total penalty. But not having any penalty component in the objective function might result in the following situation. The company might end up paying a significant penalty (still below the penalty limit) but gain very little in the total weight of included arcs. To illustrate it, we reconsider the example of Figure 1 (N = 11, M = 4, unweighted arcs) with the following modification: the total allowed penalty level is P = 5. A new optimal solution is given in Figure 7  Based on the discussion above, a better way to handle the penalty situation is to combine methods (i) and (ii). We suggest including both the constraint from method (ii) and the penalty component of the objective function from method (i) in the model.
An alternative way could be the following: run the model of method (i) for different values of weight w p , consider those solutions that keep the penalty under the maximum allowed level P, and pick the solution that has the maximum weight of included arcs. We give more details about this approach in the computational results of Section 5.

Heuristics
The integer programming models in Sections 2 and 3 guarantee to return optimal solutions for the problem. But for a large problem size, the IP models might be slow in practice (see Section 5 for more discussion on computational efficiency). In this section, we give heuristic algorithms for solving the basic problem and its variations. Note that the heuristics do not guarantee to return optimal solutions. But they might return pretty good solutions that might serve as initial solutions for the integer programming solvers and thus accelerate the solution process. The section is organized as follows. In Section 4.1 we give a heuristic for the basic problem given in Section 2. In Sections 4.2-4.4, the heuristic is extended to the three variations of Section 3.

A Heuristic for the Main Problem
Highest Weight First (HWF) 1) Sort the arcs in decreasing order of their weights (break ties arbitrarily); 2) Consider the arcs (one at a time) in the order given in For the example of Figure 8, the output of the heuristic happens to be the optimal solution. But the heuristic might also return a solution far from the optimal as shown in the next bad-case example. Consider the network in Figure 9. The arc weights are given on the arcs, and the maximum component size is M = 4.
The total weight of the arcs included by the heuristic is 3 × 1 + 2 × 0.9 = 4.8. But the heuristic output is not optimal. The optimal solution is given in Figure 10.   The total weight of the arcs in the optimal solution is 2 × 1 + 10 × 0.9 = 11. Thus, for this example the heuristic returns a solution far from the optimal. More discussion about the heuristic performance is given in Section 5.
Here is the AMPL implementation of the heuristic.

The Heuristic Applied to the Variation with Subset Interaction Requirements
The original heuristic is modified by including an extra stage as follows.

The Heuristic Applied to the Model with Restricted Connections
A new provision is added to the original heuristic to make sure that not all specialists of a group end up in the same component. Highest

The Heuristic Applied to the Models with Penalties
We

Computational Results
The AMPL models for the integer programs were run on the NEOS server using the solver CPLEX [6]. The running time for randomly created instances was 22 seconds for N = 30, 300 seconds for N = 40, and 7094 seconds for N = 50. Thus, the integer programming model is efficient enough for small and medium size companies with number of employees up to 50. For larger companies, more efficient solution methods should be used.
Next we discuss some computational results for the penalty models and the heuristic.
Computational results with sensitivity analysis for penalty models In Section 3.3, we discussed two methods for handling penalty, the first one including a penalty component with weight w p in the objective function, and the second one putting an upper bound P on the total penalty allowed. But as was discussed at the end of that section, a better way to handle penalty would be to combine the two methods. Below we illustrate it based on our computational results.
Computations were run for a randomly created instance with N = 30, M = 4, unweighted arcs (which allows to better understand the results). The basic model when no penalty is allowed returned the following output: 37 arcs were included in the solution (out of total possible 139); the solution had two output components of size 3 and six output components of size 4.
Then we tried the model with different weights w p of penalty in objective function. For each w p , the model was run for two different scenarios: 1) when there was no restriction on total penalty, 2) with a limit P = 30 on total penalty. Note that the first scenario is method (i) of Section 3.3 while the second scenario represents the combined version of methods (i) and (ii). Pure method (ii) is also represented by the second scenario when w p = 0. The results are summarized in Table 1. The values of w p are in increments of 0.05 in the range from 0 to 0.95.
The results in the range from 0.3 to 0.9 are identical and thus are not included in the table for a more compact presentation.
Below is an analysis of the results given in Table 1 with some conclusions.  For w p ≥ 0.3, as in Scenario 1 the solution is the same as the output of the basic model: M = 4 is not violated and thus no penalty is occurred. For w p = 0.25, the solution again is the same as in Scenario 1: 41 arcs can be included by paying penalty of 10. But for w p = 0.2, a new solution is found with 44 arcs and a penalty of 21. When w p is between 0.05 to 0.15, the solution includes 45 arcs by paying penalty of 26. If not exceeding the penalty limit P = 30 is very important then the last two solutions with 44 and 45 arcs are the best ones. Note that those solutions could not be obtained in Scenario 1 because for the same values of w p the solutions violated the penalty limit P = 30.
Note also that for w p = 0, Scenario 2 is simply the Method (ii) and it returns a solution with 45 arcs but with penalty 29. The reason for higher penalty with the same number of arcs is that Method (ii) does not have any penalty component in the objective function, and thus any solution with a penalty under P = 30 can be optimal.
Thus, our conclusion is that Scenario 2 with w p > 0 (combining Methods (i) and (ii)) is a better way to proceed if the penalty limit P cannot be violated.

Heuristic performance
We tested the main heuristic of Section 4.1 for different combinations of M and graph densities. Five instances were randomly created for each combination. N = 30 for all instances. We compared the heuristic output with the optimal solution returned by the integer program in terms of number of included arcs. The numerical results that represent the ratios of the number of arcs returned by the heuristic and the integer program are summarized in Table 2.
The heuristic performance gets slightly worse for larger M. It also gets slightly worse for graphs with higher density. On average, the heuristic output is within 0.8142 of the optimal value.

Future Directions
Below we discuss several future directions.
In Section 2 and 3 we developed the main model and its variations to take into account some reasonable restrictions that can be common for most companies. The models can be further developed to incorporate other government regulations and company-specific restrictions. For example, one can do more careful analysis with the penalty models by defining the penalties in monetary terms.
The integer program is a reasonably efficient technique for solving the problem for companies with up to 50 employees. For bigger problems, more time-efficient techniques are required. One direction is further refining the integer programming model to make it computationally more efficient. The other direction is developing other heuristics and approximation algorithms that are time-efficient and return close to optimal solutions in practice.
In this paper we consider how direct interactions between employees can be affected by social distancing regulations. An extension would be to model how the scheduling of more complex operations in companies can be affected by those regulations.
In our models, we assigned weights to employee links to characterize their relative importance to the company. But the weights might not be the only relevant link characteristics. Recall that the main intent of partitioning is to limit the spread of the disease. Some links might be more infectious than others. There might be different reasons for that: the closeness of the employees during the interaction, the duration of the interaction, some employees being more likely to get infected than others (different age groups), etc. Thus, one could introduce a probability indicating how infectious the link is. The resulting model will be stochastic, and more complex techniques would be needed to solve it.

Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this paper.