Algebraic multigrid methods for saddle point systems arising from mortar contact formulations

In this paper, a fully aggregation-based algebraic multigrid strategy is developed for nonlinear contact problems of saddle point type using a mortar finite element approach. While the idea of extending multigrid methods to saddle point systems can already be found, e.g., in the context of Stokes and Oseen equations in literature, the main contributions of this work are (i) the development and open-source implementation of an interface aggregation strategy specifically suited for generating Lagrange multiplier aggregates that are required for coupling structural equilibrium equations with contact constraints and (ii) a review of saddle point smoothers in the context of constrained interface problems. The new interface aggregation strategy perfectly fits into an aggregation-based multigrid framework and can easily be combined with segregated transfer operators, which allow to preserve the saddle point structure on the coarse levels. Further analysis provides insight into saddle point smoothers applied to contact problems, while numerical experiments illustrate the robustness of the new method. We have implemented the proposed algorithm within the MueLu package of the open-source Trilinos project. Numerical examples demonstrate the robustness of the proposed method in complex dynamic contact problems as well as its scalability up to 23.9 million unknowns on 480 MPI ranks.


INTRODUCTION
Many engineering applications require the simulation of large-scale contact problems. Therefore, it is not surprising that recent years have seen significant progress in modelling and simulation of contact interaction and its associated phenomena, such as friction [28,52,66], wear [16,24,44,48], adhesion [45,65], or multi-scale contact phenomena [10,73]. This is particularly true with regard to robust finite element based discretization techniques for finite deformations and efficient nonlinear solution algorithms. Above all, mortar finite element methods -originally introduced in the context of domain decomposition [6,9] -are meanwhile well-established as a basis for state-of-the-art contact formulations and widely accepted among researchers as being superior to more classical discretization techniques, such as the node-to-segment (NTS) method, the Gauss-point-to-segment (GPTS) method and other collocation based approaches [21,59,60,89,93].
Nowadays, constraint enforcement in the context of mortar methods is often based on a regularized Lagrange multiplier scheme or an augmented Lagrange method instead of a simple, yet often insufficient penalty approach. Independent from the actual details of the constraint enforcement implementation, the discrete Lagrange multipliers constitute an additional set of degrees of freedom in the mortar finite element contact formulation. When using a dual mortar approach [33,36,37,54,57,87,88], the discrete Lagrange multiplier basis is chosen based on a biorthogonality condition with the underlying finite element basis. This allows for the localization of the contact constraints and, thus, from a more algebraic point of view, for the trivial condensation of the additional Lagrange multiplier degrees of freedom from the final linearized systems of equations. If such a static condensation is not desired or not feasible (e.g. when choosing a standard basis rather than dual basis functions for the Lagrange multipliers, see e.g. [56,88]), the linear system remains in its generalized saddle point format arising from the contact constraint equations. Both the standard and the dual mortar approach have become increasingly popular in recent years, with new contributions focusing for example on higher-order finite element interpolation [55,90], isogeometric mortar methods [20,67,69,70,94], or improved robustness of the solution algorithms [25,34,58], to name only a few particularly active research directions.
It is striking, however, that almost all current research endeavors concerned with mortar finite element methods for contact mechanics focus exclusively on the modelling of various contact phenomena. Yet, for large-scale and industrial applications the appropriate modelling of contact problems is not sufficient. In fact, the demand for efficient solution strategies tailored to the specifics of contact simulations is eminent in order to achieve optimal over-all performance. Whereas one could use parallel direct solvers to solve the linear systems, they are not an option for very large problems. Iterative solvers for sparse systems (e.g. [31,63]) combined with good preconditioning strategies are a far better choice with respect to computational resources. In particular, multigrid methods [32,72] are known to be among the most efficient solution and preconditioning strategies, at least for certain classes of problems.
From the perspective of the linear solvers and multigrid-based preconditioners, the condensation of the Lagrange multipliers seems to be very attractive, since it allows to circumvent the moresophisticated saddle point formulation. For contact problems though, we have experienced that the resulting linear systems after condensation suffer from some challenging matrix properties which cause severe convergence problems for standard preconditioning techniques. In particular, the matrices tend to be non-diagonally dominant due to different (local) coordinate systems that are typically used for the formulation of the structural equilibrium equations and the contact constraints. In our previous work [85], we have developed multilevel preconditioners that address such issues and are specifically tailored to contact problems using the dual mortar method in a condensed formulation.
On the other hand, multigrid methods already have been successfully applied to saddle point problems as they arise from different applications (e.g. Stokes flow [38] or incompressible Navier-Stokes problems [30,46]) and even in the context of mortar finite element methods [83,86]. The multigrid theory for this particular class of saddle point problems has evolved starting from special multigrid methods for mortar finite element methods (e.g. [12,29,41,92]) to mortar finite element methods in saddle point formulation (e.g., [13,14]). Based on these ideas, specific multigrid methods for contact problems in saddle point formulation have been developed in [82]. However, most of the literature available on multigrid for mortar finite element methods and contact problems in saddle point formulations is primarily on geometric multigrid methods with abundant work on saddle point smoothers (cf. [97]). A first algebraic multigrid preconditioner for mortar-based contact problems has been proposed by Adams [5], performing standard aggregation on the graph of an auxiliary matrix imitating the Lagrange multipliers. Alternatively, multigrid methods for contact problems not requiring an outer iteration loop or active set strategy have been developed in [40,91].
In this paper, we address the case of mortar-based contact problems in saddle point formulation and show how to tailor iterative solvers with algebraic multigrid preconditioners to such problems. In contrast to geometric multigrid methods, algebraic multigrid methods (e.g. [68]) do not rely on geometric user-provided mesh information, but use only purely algebraic information from the fine level matrix. Since static condensation of Lagrange multiplier unknowns is not required, our approach is applicable to mortar methods using both standard or dual shape functions. The proposed multigrid method is based on the (smoothed) aggregation algebraic multigrid algorithms (cf. [64,75,76,78,79]) with special extensions for block matrices and some minor contact-specific adaptions. We propose a novel aggregation strategy for the discrete Lagrange multiplier unknowns along the contact interface, which we consider simpler to implement, computationally less expensive, and more intuitive for contact problems compared to the ideas from [5]. Inspired by our prior work on fluid-structure interaction [27,47], where we have investigated the beneficial effect of satisfying interface constraints within the preconditioner, we will then use segregated transfer operators suitable for block matrices to transfer and incorporate the contact constraints in all coarse levels. We analyze various Schur complement block smoothers and assess their suitability for satisfying the contact constraints. Finally, we demonstrate and assess the performance of the proposed preconditioner in several three-dimensional examples.
The remainder of this paper is organized as follows: Section 2 provides a brief introduction to mortar methods for finite deformation contact problems in saddle point formulation. After the basic notation is introduced, we specifically present the resulting linear system that is arising if the discrete Lagrange multipliers are explicitly included into the set of unknowns to be solved for. After a brief introduction to the general idea of multigrid methods, Section 3 describes our strategy to tailor a multigrid preconditioner to contact problems in saddle point formulation. It comprises the coarsening of the mortar contact constraints as detailed in Section 4 as well as suitable block smoothers as discussed in Section 5. Finally, Section 6 presents numerical examples that showcase the robustness, scalability, and performance of the proposed multigrid preconditioners, before we close with some final remarks.

MORTAR METHODS FOR FINITE DEFORMATION CONTACT
As this paper is concerned with preconditioning of the system of linear equations arising from contact problems, just a brief summary to the contact formulation and discretization is given here. For a detailed presentation, the reader is referred to our previous work [54].

Problem formulation and governing equations
We consider two solid bodies, which are represented by Ω c denoting the Dirichlet boundary, the Neumann boundary, and the potential contact interface with unknown contact tractions t (i) c , respectively. The solid bodies themselves are governed by nonlinear elasticity. Since we are only interested in the algebraic block structure of the final system of equations after discretization and linearization, it is sufficient to discuss a quasi-static contact problem with only two deformable bodies.
In order to describe the contact phenomenon, we state the Hertz-Signorini-Moreau conditions Therein, gn defines a so-called gap function, which measures the distance of a point on the slave interface γ (S) c to the projected corresponding point on the master side γ (M) c of the contact interface in the current configuration. Furthermore, pn denotes the normal contact traction. In the mathematical formulation, one introduces the negative slave side contact traction t (1) c as Lagrange multiplier, i.e., λ = −t (1) c . Using n to denote the outward unit normal vector, the normal part of the contact stress can be denoted by λn := λ T n and the tangential part by λτ := λ − λnn.
We employ the usual function spaces U (i) and V (i) for the displacement field u of the the solid body and its weighting function v, respectively. Furthermore, a suitable function space M + for the Lagrange multiplier field λ and its weighting function µ is assumed. The weak form of the governing equations then reads: Find u, λ ∈ U × M + such that Herein, the internal and external virtual work contributions δW int,ext are defined as usual in nonlinear solid mechanics (cf. [96] for example) and, thus, further details are omitted. The second term in (2a) can be identified as contact virtual work δWc and the expression in (2b) as variational inequality formulation of the contact constraints. An extension to frictional contact based on Coulomb's law is straightforward and can be found in our previous work [28] for example.
Inner nodes with displacement degrees of freedom u N1 ∈ Ω (1) with displacement degrees of freedom u A and u I and Lagrange multipliers λ Nodes at master contact interface Γ Figure 1. Schematic mesh illustrating interior, master and slave interface nodes.

Finite element discretization
For spatial discretization of the displacement field, either isoparametric finite elements with first-order and second-order Lagrange interpolation or isogeometric analysis (IGA) with NURBS-based shape functions are employed. After discretization, the discrete representation of displacement unkowns is given by the nodal degrees of freedom (DOFs) Therein, the u Ni , i ∈ {1, 2}, contain all degrees of freedom associated with the mesh nodes of the corresponding solid body without the nodes at the contact interface, where we use the convention that indices 1 and 2 denote the slave and master "body", respectively. The degrees of freedom associated with the contact interface on the slave and master side are represented by u S and u M , respectively. Since (2) represents a mixed variational form, we also have to discretize the Lagrange multiplier field λ. We choose to follow a mortar approach for its mathematical properties and its superiority to other schemes [21,59,60,89,93]. As usual for mortar methods, the Lagrange multiplier field is discretized on the slave side contact interface γ (S) c in the current configuration. We either use standard ansatz functions, i.e. Lagrange polynomials with a trace space relation with the underlying volume element, or dual shape functions. The latter satisfy a biorthogonality property and, thus, allow for a computationally cheap condensation of the additional unknown Lagrange multipliers from the final system of equations. For details on dual basis functions in the context of mortar-based contact discretizations, we refer to [42,55,88]. Their interplay with preconditioners for iterative linear solvers has been discussed in our previous work [85]. The vector of discrete Lagrange multipliers is now referred to as λ. A schematic mesh illustrating interior, slave interface, and master interface nodes is sketched in Figure 1.
The final spatially discretized formulation of the quasi-static frictionless problem (2) using the nodal vector representation now emerges as The internal forces f int (u) and external forces f ext are common in nonlinear finite element methods and need no further explanation. The discrete vector of contact forces fco is computed based on two mortar matrices D and M, arising from the integral over the slave interface γ (S) c in (2a), and the discrete Lagrange multiplier vector λ. For details regarding the computation of D and M, see [54,59] for example. Using g n,h j to denote the discrete weighted gap function at node j, a closer look at the discrete contact constraints reveals that (3b) basically represents a discrete version of the Karush-Kuhn-Tucker (KKT) type conditions in (1) with an additional weighting based on the Lagrange multiplier shape functions ψ j , while the nodal enforcement of frictionless sliding in (3c) is straightforward anyway.
Since the discrete contact constraints summarized in (3b) are still formulated as inequalities, an active set strategy usually referred to as primal-dual active set strategy (PDASS) is needed in addition to the usual nonlinear solution procedure to identify the currently active and inactive contact regions A and I = S \ A, respectively. It has been demonstrated in [17,35,61] that the PDASS can equivalently be interpreted as a semi-smooth Newton method, thus allowing for an integrated treatment of all nonlinearities (including the search for the active set) within one single Newton-Raphson type iteration loop. Meanwhile, many successful applications to small and large deformation contact problems can be found in the literature [28,36,37,54].

Algebraic formulation of linear systems
For efficient iterative solution strategies based on multigrid methods for nonlinear contact problems, one is primarily interested in the structure of the linear systems arising in each nonlinear iteration step of the underlying Newton-Raphson scheme. For the sake of brevity, details on the linearization process and on the Newton-Raphson procedure are omitted here and the interested reader is instead referred to [54,57].
Consistent linearization of (3) and a subsequent update of the active set A and inactive set I yields the system to be solved in every nonlinear iteration. The 2 × 2 block matrix indicated by the dashed lines in (4) describes a linear system with a typical generalized saddle point structure. The upper left block consists of the entries of the tangential stiffness matrix (i.e. linearized internal forces) as well as linearizations of contact forces w.r.t. displacement degrees of freedom u. The upper right block mirrors the discrete contact operator C(u), i.e. basically the two mortar matrices D and M, representing the linearizations of the contact forces w.r.t. the Lagrange multiplier unknowns λ. The kinematic constraints are incorporated in the bottom left block. The very simple sixth block row emerges from (3b) and (3c) for inactive nodes, while the last block row imposes frictionless sliding in the directions tangential to the contact interface. The distinct pattern of zero entries in the upper left block reveals that the two solid bodies (indices N 1 and N 2 ) are indeed only coupled through the slave and master sides of the contact interface (indices S and M). Even though formulated for two solid bodies, the generalization to n solid bodies is straightforward and only a matter of notation.
In case of dual shape functions, the matrix D reduces to a diagonal matrix and, thus, allows for a cheap condensation of the Lagrange multiplier unknowns. Algebraic multigrid preconditioners for this type of condensed system have been proposed in our earlier paper [85]. Furthermore, matrices N M , N I , and N A denote the linearizations of the weighted gap function of (3b) at all active contact nodes. Finally, linearizations of the frictionless sliding condition (3c) are referred to by matrices F I , F A , and T A , respectively.
Note that the given matrix has 8 block rows but only 7 block columns in our notation in order to emphasize that the normal and tangential parts of the contact constraints for active nodes are considered separately, i.e. these two rows contain consistent linearizations of the active branch of (3b) and of (3c). Again, we point out that this separate notation is possible due to the fact that a local convective coordinate system is employed for evaluating the contact constraints / Lagrange multiplier weights µ, while the standard Cartesian frame is still applied for the discrete Lagrange multiplier values λ as well as the displacement unknowns. Yet, of course, the system matrix remains a square matrix with the total numbers of rows and columns being identical. The discrete vector g A contains all weighted gap values g n,h j associated with the active nodes at the contact interface.  For ease of notation, the following short block notation is used in the remainder of the manuscript:

MULTIGRID SCHEME FOR CONTACT PROBLEMS IN SADDLE POINT FORMULATION
Although multigrid methods can be used as standalone solvers for linear systems, they are usually incorporated into an iterative linear solver as a preconditioning method. Throughout this paper, we use a preconditioned generalized minimal residual (GMRES) solver [62] with one multigrid V-cycle sweep for preconditioning. A general introduction into the idea of preconditioning is beyond the scope of this paper. The reader is referred to the literature, e.g. [7].

Algebraic multigrid methods in a nutshell
Multigrid methods are based on the finding that many well-known and computationally cheap iterative methods (e.g. relaxation based iterative methods such as Jacobi or Gauss-Seidel methods) to solve linear systems Ax = b effectively damp the high frequency part of an error vector but are less effective in damping out the low frequency error modes. Multigrid methods heavily make use of this smoothing property by applying such cheap smoothing methods on different coarsened representations of the original fine level problem.
3.1.1. Basic multigrid cycle and algorithm The multigrid algorithm given in Figure 2a is briefly described as follows: on each multigrid level , a level smoothing algorithm S performs ν 1 presmoothing sweeps before the residual vector r is transferred to the next coarser level + 1 using the restriction operator R. After the coarse level problem has been solved on the coarsest level, the correction c is then prolongated using the prolongation operator P and the solution vector is smoothed using ν 2 post-smoothing sweeps. Figure 2b illustrates this basic multigrid V-cycle, exemplifying a three-level setting. As one can see from Figure 2b, applying a multigrid method basically means applying level smoothers on coarse representations A of the fine level problem A 0 .

Algebraic multigrid methods
There are different strategies for defining the transfer operators P and R which are necessary to generate coarse level matrices A ( > 0). For algebraic multigrid (AMG), the fine level operator A 0 is sufficient to generate coarse level matrices A . An important class of AMG is given with the smoothed aggregation AMG which is based on so-called aggregates (see e.g. [75,77]). The fine-level nodes are agglomerated and put into aggregates, which then represent "supernodes" on the next coarser level. The aggregation information together with the null space information of A 0 is used to construct the corresponding tentative transfer operators R (for restriction) and P (for prolongation). Transfer operators are used to restrict the fine level residual to the next coarser level and to interpolate the coarse level correction to the next finer level via prolongation. For an efficient multigrid method, the interaction of fine and coarse levels tackles those error modes, which appear as low-frequency modes on the fine level and cannot effectively be reduced by iterative smoothing methods on the fine level, but resemble high-frequency modes on a coarser level, such that iterative smoothing methods are effective again. On the coarsest level, a direct solver can take care of the remaining error modes. For smoothed aggregation multigrid (cf. [26,75]), the prolongation operator is found by applying one smoothing sweep with a damped Jacobi iteration using with D being the diagonal part of A and a damping parameter ω > 0. Depending on the symmetry of the system matrix A, the restriction operator is either chosen as R = P T or smoothed independently using, e.g. a Petrov-Galerkin approach (cf. [64]).

Algebraic multigrid methods for block matrices
Block matrices usually arise if multiple types of equations are coupled together. In the present context of contact problems in saddle point formulation, two types of equations, namely the balances of linear momentum of the solid bodies and the contact constraints, are connected via the off-diagonal blocks in (5). Similarly, multiphysics problems also yield block matrices where the coupling between different physical fields manifests itself in the off-diagonal blocks of the monolithic system matrix. From a multigrid perspective, the most important question is where to consider the coupling between the different equations within the overall solver layout. In general, there are only two possible strategies to apply multigrid ideas to coupled block systems: Nested multigrid approach: Multigrid methods can serve as local single field smoothers or solvers within well-known block preconditioners such as the SIMPLE method (cf. [50]) and variants for Schur complement based preconditioners or the block Gauss-Seidel (BGS) method. The coupling of the different fields or variables is only considered on the finest level in the outer (SIMPLE or BGS) iteration. This approach is well known in literature, e.g. for the Navier-Stokes equations [30,68], fluid-structure interaction [27,71] or general n-field problems [80]. The implementation is very easy and allows to use existing multigrid components in a standalone fashion within the solver. A graphic representation of this approach is shown in Figure 3a.
Fully coupled multigrid approach: Truly monolithic algebraic multigrid methods aim at coarsening the fully coupled fine level problem such that the block structure of the fine level matrix is preserved and the coupling information is present on all coarser levels, cf. Figure 3b. This is often achieved by using segregated transfer operators to preserve the characteristics of the sparsity pattern across all levels. Then, each level utilizes block smoothers to address the coupling. In [81], a coupled AMG method is developed and analyzed for a stabilized mixed finite element discretization of the Oseen equations. Fully coupled multigrid methods for multiphysics problems have been described in [27,43,80].
(a) Outer coupling iteration with nested multigrid methods. Example with four and two multigrid levels for A 00 and A 11 .
Multiphysics multigrid approach with nested coupling iteration on all multigrid levels. While the nested multigrid approach is easier to implement, it also allows for a high degree of modularity, since the multigrid hierarchies used to approximate the block inverses of the block smoother on the fine level can easily by swapped by any other method, either another type of multigrid algorithm, or another type of multi-level scheme (e.g. based on domain decomposition), or even any single-level approach if the block is of moderate size and scalability is not deteriorated. This flexibility is particularly useful if the coupled blocks differ a lot in size or if the user wants to apply a highly optimized solver for an individual block. The fully coupled multigrid method does not offer such a degree of flexibility, yet it propagates the coupling conditions throughout the entire preconditioner. Thus, one expects a stronger and more robust preconditioning effect, since the coarse level corrections are aware of the coupling conditions. This expectation will later be confirmed in the numerical experiments, where the number of iterations for the fully coupled scheme is lower and more independent of the active contact nodes than for the nested multigrid approach.

Designing algebraic multigrid methods for contact problems
In the present context, we can interpret the mortar contact problem in saddle point formulation as the coupling of two types of equations: the structural equations and the contact equations which serve as constraint equations. Since the contact constraint equations are only defined along the contact interface, we can further classify the mortar contact problem as an interface-coupled problem (in contrast to volume-coupled problems). This information is important for the choice of coarsening strategy. The contact constraint equations are also responsible for the characteristic saddle point structure, which needs special attention when choosing an appropriate coupling algorithm between the structural equations and the contact constraints. Considering the class of fully coupled AMG schemes, the generalized saddle point problem (5) has to be preserved on all multigrid levels such that the contact constraints are considered on all levels. Due to the constraints, this will require Schur complement based level smoothers on all levels.
Alltogether, the key ingredients for designing an algebraic multigrid method for contact problems in saddle point formulation are the coarsening strategy as proposed in Section 4 and the level smoother and the coupling iteration as detailed in Section 5.

Segregated transfer operators
To keep the characteristic saddle point block structure (5) on all multigrid levels, the common approach is to use segregated transfer operators as, e.g. introduced in [5,11]. The segregated block transfer operators (7) are put together from the transfer operator blocks for the different physical and mathematical fields. Here, P u and R u describe the transfer operator blocks corresponding to the stiffness matrix block K in (5). The transfer operators P λ and R λ define the level transfer for the Lagrange multipliers. The block diagonal structure in (7) guarantees that the primary displacement variables and the secondary Lagrange multipliers are not "mixed up" on the coarser levels. That is, the coarse level matrix still has the same block structure with a clear distinction of momentum and constraint equations as for the fine level problem since the columns and rows of the transfer operators P +1 and R +1 can be interpreted as some kind of basis functions.
For many volume-coupled problems, for example in thermo-structure-interaction problems [19], it is straightforward to generate P λ and R λ to be consistent with P u and R u . That is, in context of smoothed aggregation algebraic multigrid we just use the same aggregates for building P u and P λ (and the same for the restrictors, respectively). For interface-coupled problems with interface constraints it is more difficult. Due to the saddle point structure of (5), the nonzero pattern of the Z block is insufficient to generate valid aggregates for the Lagrange multipliers. Consequently, we need a special routine for finding aggregates for the Lagrange multipliers λ to be able to build the (non-smoothed) transfer operators P λ and R λ . Nevertheless it seems natural to reflect the aggregation information of the structural equations along the contact interface in the choice of the aggregates for the corresponding Lagrange multipliers λ.

Aggregation strategy for displacement variables
In order to preserve the physics of the fine level contact problem, it is important to keep the two solid bodies separated in the matrix representation on all coarse levels. Therefore, we apply the standard aggregation strategy to a modified K block from (5), where all off-diagonal entries representing connections between the two solid bodies are dropped -in paricular the matrix blocks K IM , K AM , K MI and K MA -in order to make sure that the resulting displacement aggregates A u do not cross the contact interface (see Figure 4). Neglecting these blocks during aggregation guarantees that the two solid bodies are not melted together in the coarse matrix representation. We stress that the modified K is never formed explicitly, but rather the off-diagonal entries are dropped on the fly during the aggregation process.

Aggregation strategy for Lagrange multipliers
In contrast to geometric multigrid methods, there is not so much literature on aggregation-based AMG methods for contact problems in saddle point formulation. The only publication, the authors are aware of covering all aspects of smoothed aggregation methods for structural contact problems in saddle point formulation, is [5], which also discusses a special aggregation strategy for the Lagrange multipliers. To find aggregates A λ for the Lagrange multipliers, Adams [5] proposes to apply the standard aggregation algorithm to the graph of a suitable matrix representing the Lagrange multipliers. However, this approach has some drawbacks: First, the graph used for the aggregation of the Lagrange multipliers λ has to be built explicitly to serve as input for the standard aggregation algorithm. Secondly, one has to run the aggregation algorithm sequentially both for the displacement degrees of freedom and for the Lagrange multipliers. For the second run of the aggregation method, one might have to use a different set of aggregation parameters to obtain optimal results, which further increases the complexity for the user. Algorithmically, the resulting aggregates A λ for the Lagrange multipliers are built independently from the displacement aggregates A u .
In this work, we propose a different approach to build aggregates A λ for the Lagrange multipliers, which does not suffer from above drawbacks. Instead of explicitly building some helper matrix for the aggregation routine, interface aggregates A λ for the Lagrange multipliers are directly generated using the aggregation information of the displacement variables (see Figure 4). The resulting interface aggregates for the Lagrange multipliers are by construction aligned with the corresponding displacement aggregates.
The exact aggregation procedure is described in Algorithm 1. Assuming that the standard aggregates A u for the displacement degrees of freedom are available, new aggregates A λ are built by collecting the corresponding Lagrange multiplier degrees of freedom. Beside the displacement aggregates A u , only the mortar matrix D is needed to algebraically reconstruct the contact interface and find the associated Lagrange multipliers. The new aggregates A λ for the Lagrange multipliers can be interpreted as the natural extension of the displacement aggregates A u at the interface. This facilitates to keep the ratio of coarse level nodes at the slave contact interface and the coarse Lagrange multipliers constant, which also balances the ratio of contact constraints and inner structural displacement degrees of freedom over all multigrid levels. Initialize empty set and counter for aggregates A λ A λ ← ∅, l ← 0 Initialize empty mapping of displacement aggregates to Lagrange multiplier aggregates Loop over all Lagrange multipliers j for j ∈ D λ do Check whether Lagrange multiplier j is coupled with row i if D i,j = 0 then Find pseudo node n λ for Lagrange multiplier j n λ ← n(j)

Return aggregates for Lagrange multipliers return A λ
The coarsening strategy outlined in Algorithm 1 as well as all other AMG components of the presented saddle-point preconditioner for contact problems have been implemented in MueLu [8], the next-generation multigrid package within the Trilinos project [3]. For further details on the implementation, we refer to the MueLu User's Guide [8] and the MueLu website [2].

BLOCK SMOOTHING METHODS FOR MORTAR CONTACT PROBLEMS
Using saddle point preserving aggregation and segregated transfer operators as outlined in Section 4 to generate a fully coupled AMG hierarchy (see Section 3.3), the coupling of structural equilibrium equations and contact constraints on all levels of a fully coupled AMG hierarchy is now addressed by block smoothing methods on each level. Schur complement based coupling iterations present themselves as ideal candidates to deal with the saddle point structure resulting from the constraint-like contact equations.
Nevertheless, not all classical Schur complement based block smoothers for saddle point systems behave the same way when applied to contact problems. Specifically, it is important that the block smoothers account for the contact constraints in (5). While each iteration's intermediate solution itself is not of great interest, its satisfaction of the contact constraints impacts the overall number of required iterations to reach convergence, since only an intermediate solution that satisfies the contact constraints can then also be accepted as the final solution of the iterative process. While an exact satisfaction of the contact constraints by the block smoothers is desirable, practical computations require a compromise between the accuracy of the block smoothers and their computational effort in order to obtain computationally competitive preconditioners.
In Section 5.1, we first revisit some classical smoothing methods for saddle point systems. Therefore, we assume exact inverses for the predictor and corrector step of each smoother to focus on the systematic error resulting from the specific block structure of the smoother. The assumption of exact inverses allows to assess the behavior of each block smoother via an error matrix, such that the impact of the block smoother on the contact constraints can be characterized. Afterwards, we introduce computationally cheaper variants with inexact block inverses in the predictor and corrector step as a compromise between accuracy and performance in Section 5.2, however preventing the discourse on error matrices. While the approximations due to the block structure of a specific smoother originate from the definition of the smoothing method, the quality of the approximate block inverses can fully be controlled by the user. Finally, Section 5.3 discusses their application in the context of saddle point systems for contact formulations.

Block smoothers for saddle point problems
Now, we study the systematic errors introduced by specific block smoothers and their impact on the contact constraints. Therefore, we first assume exact mathematical block operations. To carefully distinguish between systematic errors and errors stemming from practical Schur complement approximations, we postpone the discussion of such approximations to Section 5.2.
The general block smoothing scheme can be written as where Q describes the 2 × 2 block preconditioning matrix approximating the 2 × 2 block operator in (5). Typical block smoothers consist of an outer coupling iteration with nested subsolvers to build the inverses of the diagonal blocks of Q in an algorithmic predictor-corrector scheme. In the following, a few classical block smoothers from literature (e.g. [49]) are introduced, stating that this list is by far not complete. All of them can be interpreted as block extensions of classical iterative smoothing methods following the general block scheme (8), but depending on the definition of Q with a different effect on the contact constraints by introducing certain systematic errors. In general, the better Q approximates the block operator from (5), the lower the number of linear iterations will be when using the block smoother within a multigrid preconditioner.

Uzawa smoother For the (inexact) Uzawa smoother, one chooses
The parameter α > 0 is a damping parameter and S describes a cheap approximation of the Schur complement S = Z + C 2 K −1 C T 1 . For a theoretical review of Uzawa like smoothers, the reader is referred to [15,23,98].
With the off-diagonal coupling block C 2 in (9), the smoother performs a one-way coupling in the sense that the Lagrange multiplier increments now depend on the current increment of the displacement degrees of freedom. Algorithm 2 represents the practical implementation as a predictorcorrector method. In each smoothing iteration, one calculates a prediction for the displacement increments δu k+1 , which are taken into account when solving for the corresponding Lagrange multiplier increments δλ k+1 .
Assuming the smoothing iteration has converged, the error matrix for the Uzawa smoother is given as Algorithm 2: Uzawa smoother.
Procedure Uzawa(α, kmax) Apply kmax smoothing sweeps with the Uzawa algorithm for k ← 0 to kmax − 1 do Return smooth solution vector return ∆u kmax , ∆λ kmax With α = 1, S = Z + C 2 K −1 C T 1 being an approximation to the Schur complement, and K denoting an easy-to-invert approximation of K, e.g. the diagonal of K as a cheap variant for the approximation K, i.e. K = diag(K), it is easy to verify that the error matrix of the Uzawa smoother reduces to That is, since the second block row in the error matices (10) or (11) does not vanish, the Uzawa smoother by definition cannot exactly fulfill the contact constraints, but adds a systematic error for the contact constraints. This might have a negative impact on the overall performance of the iterative linear solver.

Braess-Sarazin smoother
Originally introduced for the Stokes problem in [11], the Braess-Sarazin smoother belongs to the class of block approximate smoothers and is based on the choice for the block preconditioning matrix Q in (8). Again, the parameter α > 0 denotes a scaling parameter and K refers to an easy-to-invert approximation of K.
Procedure BraessSarazin(α, kmax) Apply kmax smoothing sweeps with Braess-Sarazin algorithm for k ← 0 to kmax − 1 do Prediction step: determine prediction ∆u k+ 1 2 by calculating Return smooth solution vector return ∆u kmax , ∆λ kmax Assuming convergence of the smoother, the error matrix for the Braess-Sarazin smoother is given as With the second block row in the blocked operator (5) being retained in (12), the Braess-Sarazin smoother seems to be a reasonable choice for dealing with contact constraints. The error matrix in (13) reveals that the quality of the block smoother only depends on the choice of K.
The implementation as a predictor-corrector method is based on the splitting of (12) into As one can easily see from Algorithm 3, the prediction step can be understood as one hard-coded sweep with a (damped) Jacobi iteration. In other words, the quality of the prediction for the displacement degrees of freedom must be considered rather poor. Exactly fulfilling contact constraints with respect to a rather poor prediction of the displacement variables might not be optimal for the overall performance of the preconditioner.

SIMPLE variants
Originally introduced in [50,51], the SIMPLE method is based on the approximate block factorization for the iterative method in (8). In (15), S denotes an approximation of the Schur complement S := Z + C 2 K −1 C T 1 with a cheap and easy-to-invert approximation K of the block K. Algorithm 4 shows the implementation of the SIMPLE method using the predictor-corrector scheme (cf. [22]).

Algorithm 4: SIMPLE smoother.
Procedure SIMPLE(α, kmax) Apply kmax smoothing sweeps with SIMPLE algorithm for k ← 0 to kmax − 1 do Prediction step: solve for ∆u k+ 1 2 K∆u k+ 1 2 = r k u − C T 1 ∆λ k Correction step: solve for δλ k+ 1 2 − S δλ k+ 1 2 = r k λ + Z∆λ k − C 2 ∆u k+ 1 2 Update step: update solution variables ∆λ k+1 ← ∆λ k + α δλ k+ 1 Return smooth solution vector return ∆u kmax , ∆λ kmax For our applications, we found the diagonal matrix containing the row sums of K = |a ij |) i,j=1,...,n K to be a good approximation for block K. This corresponds to the SIMPLEC method as introduced in [74]. That is, K is defined as the diagonal lumping of K with The default choice for S is consequently S = αZ + αC 2 K −1 C T 1 with K as defined in (16). A more theoretical discussion on the mathematical consequences of approximations for the Schur complement S can be found in [97].
For the SIMPLE preconditioner, the error matrix is calculated by As one can see from (17), SIMPLE perturbs the Lagrange multipliers, but it does not affect the terms that operate on the primary displacement variables since the first block column in (17) is zero. Choosing S = αZ + αC 2 K −1 C T 1 , the error matrix reduces to That is, an appropriate approximation S of the Schur complement S allows to exactly satisfy the contact constraints within one smoothing sweep. Depending on the choice for the approximation K, the SIMPLE method admits an error in the coupling between displacements and contact constraints. However, compared to the Braess-Sarazin method from Section 5.1.2, we put more focus on a good prediction for the displacements with a consistent update for fulfilling the contact constraints.

Cheap variants of block smoothers
As one can easily see from the Algorithms 2 to 4, all block smoothing methods internally require inverses of the matrix blocks on the matrix diagonal and of the Schur complement operator S. Specifically, there is one linear system to be solved in the prediction step and one during the correction step. To keep the computational costs low in practical computations, one does not solve for the block inverses exactly, but apply a cheap approximation, e.g. by using a fixed number of smoothing sweeps with a relaxation-based smoothing method such as symmetric Gauss-Seidel or an ILU sweep. This allows for more flexibility for finding a good compromise between quality and performance, as the user can decide how much effort should be put on finding a good prediction and fulfilling the Schur complement equation in the corrector step. While the systematic errors introduced by the choice of the block smoother from Section 5.1 are fixed, the practitioner has full control over the quality of the block inverses.
The numerical examples in Section 6 show that such an approximation leads to efficient and computationally reasonable block smoothing methods. As a naming convention, the prefix "Cheap" is added to the name of the block smoothing method to indicate the usage of a cheap approximation for finding the inverse of the diagonal blocks in addition to the systematic approximations of building the Schur complement operator as discussed in Section 5.1.
The block smoothing methods and their cheap variants are available in the Xpetra package of the Trilinos project [3]. For further details on the implementation, we refer to the MueLu User's Guide [8] and the MueLu and Xpetra websites [2, 4], respectively.

Comparison of saddle point smoothing methods for contact problems
Considering the block scheme of (5), there are two main challenges for contact problems. First, we have the two distinct sets of equations as described in Section 2.3: the structural equations formulated in cartesian coordinates and the set of contact constraints formulated in normal-tangential coordinates relative to the contact surface. Second, the coupling of those two distinct sets of structural equations and contact constraint equations.
Algorithmically, the coupling of structural equations at the contact interface via the off-diagonal blocks in (5) is only considered within the block smoothers from Section 5.1. Therefore, constraint smoothers (cf. [39]) are the natural choice for contact problems, since the contact problem is implicitly governed by the contact constraint equations. Only solutions that are in alignment with the contact constraints are of interest.
Under certain preconditions as discussed in Section 5.1 and assuming sufficient smoother iterations to reach convergence of the smoother, both the Braess-Sarazin method and the SIMPLE-based methods would exactly fulfill the contact constraints as one can see from equations (13) or (18). Therefore, with a systematic error in the lower-right block of (11), the Uzawa method seems to be less promising for contact problems. In the Braess-Sarazin method, the approximation K = diag(K) is hard-coded with some scaling parameter α > 0 and consistently used within the approximate Schur complement operator, which is defined by S = Z + 1 α C 2 K −1 C T 1 . In contrary to the Braess-Sarazin method, the SIMPLE based methods keep the full K block whenever possible in the block factorization and use K only where its inverse is required. Consequently, 10 × 10 × 10 in "cheap" variants of the SIMPLE method, more elaborate smoothing strategies can be used for the K block instead of a hard-coded Jacobi sweep. Therefore, one can think of the SIMPLE methods to allow for a more balanced quality of approximations for the displacement degrees of freedom and Lagrange multipliers for the contact constraints, whereas the Braess-Sarazin method exhibits an imbalance in the sense that the computational effort spent for approximating the constraints is much higher than for dealing with the displacement variables.

For the numerical examples, we use our in-house code BACI [1] that internally uses various capabilities from the Trilinos project [3]. The implementation of the multigrid algorithms is based on Trilinos'
MueLu package [8,84]. In particular, all block smoothers from Section 5 as well as the contact specific aggregation strategy for the Lagrange multiplier unknowns as described in Section 4 are readily available in MueLu.

Two solid bodies example
With the first example, we want to study the effect of the block smoothers from Section 5 for contact problems. Here, we not only compare different block smoothers, but also highlight the effect of varying the quality of the sub-smoothing steps within the block smoother versus increasing the number of outer coupling iterations. Motivated by findings in our previous work [85], this example briefly revisits a detail that has been problematic in the context of contact problems in condensed contact formulations. While the discrete global unknowns (u, λ) are -as usual -formulated with respect to the global Cartesian frame, the discrete Lagrange multiplier weights µ and therefore the contact constraint equations in (3b) and (3c) are formulated with respect to a local convective coordinate system. This local system is defined at each slave node j by a surface normal vector and two tangent vectors, i.e. by a triad of orthonormal basis vectors (n) j , (τ ξ ) j and (τ η ) j . Although it represents a quite intuitive and natural choice in contact mechanics, this local constraint formulation may lead to non-diagonally dominant system matrices and therefore poses a serious challenge to the development of iterative linear solvers as has been elaborated in [85]. In the following, we will also investigate the susceptibility of the proposed saddle point preconditioners to this phenomenon.
6.1.1. Geometrical setup Since we are interested in the solver behavior, by intention we choose a simple 3D contact example as shown in Figure 5. There are two solid bodies with the same material parameters using a Neo-Hookean material (density ρ 0 = 0.1 kg m 3 , Young's modulus E = 10 GPa, Poisson's ratio ν = 0.3). The initial gap between the two solid bodies is 0.02 meters. The upper solid body (size: 0.8m × 0.8m × 0.5m) is moving down with constant velocity along the normal to the contact interface towards the lower fixed solid body (size: 1.0m × 1.0m × 1.0m).

Experimental setup
To investigate a potential impact of the contact formulation in different coordinate systems (cartesian coordinates for the structural degrees of freedom and normal-tangential coordinates for the Lagrange multipliers) on the linear solver, we perform a similar experiment as introduced in [85] and rotate the example setup around αy and αz as shown in Figure 6. We expect the number of linear iterations to be independent of the rotation angles αy and αz, since the underlying physics do not change. Any dependency of the linear solver on αy and αz would be a result of purely numerical effects and would turn out highly problematic for the iterative solution of large and complex contact problems. For reasons of symmetry, it is sufficient to vary αy and αz within 0 ≤ αy, αz ≤ π 2 .

Discretization
The spatial discretization is based on a 10 × 10 × 10 mesh for each solid block with altogether 6000 displacement degrees of freedom and 300 Lagrange multipliers modeling the contact coupling constraints for the 10 × 10 slave nodes at the contact interface (see Figure 5). The simulation runs for 40 time steps with a time step size of 0.01s on 4 processors. After 6 time steps (t = 0.06s) both bodies come into contact and are deformed. We assume frictionless contact here. With this example we reduce the contact-specific effects (such as the contact search based on an active set strategy) to a minimum, such that one can focus on the linear solvers. That is, the contact zone is not changing once the two solid bodies are in contact.

Stopping criteria
The nonlinear iteration inside each time step stops if either ∆u e < 10 −8 holds for the Newton increment of the displacement degrees of freedom, or alternatively, if the conditions hold for the nonlinear residuals r u i and r λ i in (4) after applying i Newton iterations. Thereby, • e denotes the Euclidian vector norm. Those stopping criteria for the nonlinear solver are chosen to result in the same number of nonlinear iterations in each time step for an easier comparison of the linear solver behavior.
Within each Newton iteration, the saddle point system (4) is solved iteratively using a preconditioned GMRES method with a 3-level AMG preconditioner as described in Section 3. The iterative process for the linear system is considered to be converged, if it is r k r 0 e < 10 −8 (20) for the full residual vector r k = r u r λ in the linear iteration step k. Here, the subscript i for the nonlinear Newton iteration is dropped.  In this work, we focus on the behavior of the linear solver. Therefore, a fixed stopping criterion for all tested variants is chosen in (20). This allows the comparison of different preconditioning techniques including their effect on the linear solution strategy. For real world problems, and especially for coupled multiphysics problems, the task of choosing appropriate stopping criteria for both the nonlinear and linear solver turns out to be quite challenging. Usually, one would choose a combination of different (length-scaled) norms for the partial vectors r u and r λ . In order to reduce the solver time in the inner linear solver, it is recommended to adapt the linear (relative) solver tolerance according to the residual norms of the outer nonlinear solver.    Table Ia with the results for the CheapBraessSarazin smoother in Table  Ib, the CheapBraessSarazin smoother heavily suffers from the worse approximation of the displacement degrees of freedom using one internal hard-coded Jacobi sweep (cf. Section 5.1.2). The resulting iteration numbers show an obvious dependency on the rotation angles. With a CheapSIMPLEC block smoother, the number of iterations is lower than for the CheapUzawa smoother and independent from αy and αz when compared with the CheapBraessSarazin smoother (see Table Ic). So, the linear solver has some benefit from the two-way coupling of displacements and Lagrange multipliers within the AMG preconditioner. Compared to the Uzawa smoother, the additional computational costs for the CheapSIMPLEC method are very low with only one additional matrix-vector product by K −1 C T 1 per iteration. Therefore, CheapSIMPLEC is the preferred level smoother for our further experiments with some cheap approximations for the internal single fields using some sweeps with a (symmetric) Gauss-Seidel (SGS) method for the structural block or incomplete LU factorization (ILU) for the Lagrange multipliers. Table II illustrates how the number of CheapSIMPLEC coupling iterations and the quality of the single field smoothing methods within the CheapSIMPLEC smoother affect the number of linear iterations. Improving the quality of the Schur complement approximations within CheapSIMPLEC (see Tables IIa vs. IIb) as well as increasing the number of CheapSIMPLEC coupling iterations (see Table IIc) unsurprisingly reduces the number of linear solver iterations. Aside from the concrete parameter choices for the level smoother, one can even further reduce the number of linear iterations with a reasonable transfer operator smoothing strategy for the displacement block, e.g. as indicated in (6).
By intention, we do not report solver timings, since this example is too small to perform reasonable time measurements, especially when using 4 processors for altogether only 6300 degrees of freedom.
The intention of this example is to compare typical saddle point smoothers within a fully coupled AMG preconditioner. One can observe the expected behavior that increasing the number of smoothing sweeps reduces the number of linear GMRES iterations. However, in practice, the variant with a smaller number of GMRES iterations may not always be the fastest method. This example shows that the proper choice of block level smoothing is essential for the overall performance of a saddle point multigrid method. The particular choice of the block smoothing method gives the user full control over the quality of the coupling with field-specific parameters and allows for fine-grained adaptions and problem-specific optimizations.
With the experience from this example one can choose efficient level smoothers which provide results independent from the exact geometric configuration.

Weak scaling behavior
To assess the behavior of the proposed multilevel preconditioner applied to large-scale examples, we now report a weak scaling study. To exclude side effects (such as changes in the contact active set) and to fully focus on the behavior of the preconditioner and iterative linear solver, we study a simplified and linear contact problem.

Setup
We consider a small block (dimensions 0.8m × 0.8m × 0.4m) and a slightly bigger block (dimensions 1.0m × 1.0m × 0.5m), where contact will occur between the large faces of both blocks. To reduce the complexity of the contact problem and to exclude nonlinearities due to changes in the contact active set, the faces opposite to the contact interface are fixed with Dirichlet boundary conditions, while the blocks initially penetrate each other at the contact interface by 0.001. The smaller block acts as the slave side and its entire contact area is initialized as "active". Application of the contact algorithms will then result in a slight compression of both blocks, such that the initial penetration vanishes. This problem setup allows to distill the performance of the AMG preconditioner under uniform mesh refinement and weak scaling conditions. Both blocks use a Neo-Hooke material with Young's modulus E = 10 MPa and Poisson's ratio ν = 0.3. Denoting the mesh refinement factor with κ, both blocks are discretized with 2κ linear hexahedral elements along their longer edges and κ elements along the shorter edges. The Lagrange multiplier field is discretized with standard shape functions, i.e., linear Lagrange polynomials.
6.2.2. Solver and preconditioner settings As linear solver, a preconditioned GMRES method is used with a fully coupled multigrid approach as described in Section 3.2. The convergence criterion for the linear GMRES solver is set to r k r 0 e < 10 −8 (21) for the full residual vector r k = r u r λ in the linear iteration step k. For the segregated block transfer operator as introduced in (7), we combine smoothed aggregation (SA-AMG) with a prolongator smoothing factor ω = 4/3 for the displacement aggregates and plain non-smoothed (PA-AMG) transfer operators for the Lagrange multiplier aggregates. The restriction operators are built as the transposed of the prolongation operators. Coarsening stops when the total number of rows in the saddle-point system drops below 5000.
Following the guidance from Section 5.3 and the findings from Section 6.1.5, we apply 3 sweeps of CheapSIMPLE(0.8) as a level smoother, where both the predictor and corrector step are approximated by 1 sweep of SGS each. We use the same level smoother layout on all multigrid levels except of the coarsest level. For the coarsest level, we compare two variants: a direct solver (marked as "LU", requiring an expensive merging of the block matrix into a regular sparse matrix format and a subsequent LU factorization) vs. the CheapSIMPLE level smoother as on all other multigrid levels (marked as "CheapSIMPLE"). We will study the impact of the coarse solver on scaling behavior, iteration counts, as well as AMG setup and V-cycle timings.

Results
For the weak scaling study, we target the load per rank to be 50k displacement DOFs. A uniform mesh refinement as outlined in Table III is performed with the finest mesh consisting of roughly 23.9 million unknowns in the saddle-point system. Therein, n proc denotes the number of MPI ranks, κ the mesh refinement factor introduced in Section 6.2.1, n u DOF , n λ DOF , and n total DOF the overall number of displacement, Lagrange multiplier, and total number of unknowns, n u DOF /proc the average load per MPI rank. The mesh refinement factor κ is chosen such that the targeted average number of displacement unknowns per MPI rank is met. The number of levels n , the actual size of the coarse level system n total(n ) DOF as well as the operator complexity C A (defined as the ratio of non-zero entries of all multigrid level matrices A with = 0, . . . , n − 1 and the number of non-zeros on the finest level) are reported in the last three columns of Table III. The operator complexity C A slightly increases with larger problem sizes, but does not grow beyond C A ≈ 1.30.
The scaling study is run on our in-house cluster (20 nodes with 2x Intel Xeon Gold 5118 (Skylake-SP) 12 core CPUs, 480 cores in total, Mellanox Infiniband Interconnect). Solver performance in terms of the number of GMRES iterations as well as wall clock time spent in the AMG preconditioner construction and its application (i.e. sweeping through the V-cycle once per GMRES iteration) are summarized in Figure 7. With respect to iteration counts, both the direct solver as well as the block smoother yield the GMRES iteration count to be fairly independent of the overall mesh size, especially when taking the slight deviations of the load per core as detailed in Table III into account. Using a direct solver on the coarsest multigrid level perfectly deals with all left-over error modes that are not properly handled by the block smoothers on the finer multigrid levels. Therefore, the "LU" variant results in a lower iteration count and is less sensitive to the actual size of the coarse level system. In contrast, when using the CheapSIMPLE block smoother as coarse level solver, we observe a somewhat higher iteration count together with an obvious impact of the number of multigrid levels on the iteration count (e.g., when moving from a 3-level hierarchy on 24 MPI ranks to a 4-level hierarchy on 48 MPI ranks). Increasing the number of multigrid levels helps to compensate the lack of coupling of the CheapSIMPLE block smoother compared to a direct solve on the coarsest level.
We show the CheapSIMPLE variants for the coarse solve since this variant in practice might be superior with respect to the overall performance: Regarding AMG setup time, the cost for the direct coarse solver becomes more expensive with a growing coarse system size, since (i) more matrix entries need to be moved from a block sparse matrix layout to a regular sparse matrix layout and (ii) a larger matrix needs to be factorized. For the CheapSIMPLE block smoother as coarse level solver, no additional communication is required on the coarsest level. Hence, the mild slope in the graph for preconditioner setup time for the CheapSIMPLE coarse level solver originates only from the Galerkin product during hierarchy construction. In terms of preconditioner application, i.e. time spent in the AMG V-cycle, both methods are almost identical and scale very well. While proper weak scalability up to 23.9 million unknowns on 480 MPI ranks has been demonstrated, the actual choice of the coarse solver in practical applications also depends on other factors such as the frequency of rebuilding the AMG hierarchy or the balance of setup time, V-cycle time, and cost of the additional GMRES iterations, which all together impact the overall time to solution. In the next example, we will put attention on effects for the linear solver caused by changes in the active set of contact nodes for larger problems.

1000 deformable rings
Even though only a 2D example of moderate size, the 1000 rings example comes with frequent changes in the contact active set and, thus, tests the robustness of the proposed multigrid methods. Particularly, we are interested in the comparison of the fully coupled multigrid approach and the nested multigrid apparoch as described in Section 3.2.
6.3.1. Setup This example consists of 1000 deformable rings (Neo-Hookean material with E = 210 GPa, ν = 0.3 and ρ 0 = 7.83 · 10 −6 kg m 3 ) arranged in a rectangle (see Figure 8). A gravitational force is inducing an acceleration towards a rigid wall. The simulated time extends to 2.0s with a time step size of ∆t = 0.0005s, yielding 4000 time steps in total. The full mesh consists 110000 nodes with 110 nodes for each deformable ring.

Stopping criteria
In each time step, the nonlinear system is handled by a semi-smooth Newton method. As convergence criteria, we have chosen Here, r u i and r λ i denote the nonlinear residual for the displacement and Lagrange multiplier variables after i Newton iterations, respectively. Similarly, ∆u denotes the solution increment for the displacement variables in the i-th Newton iteration.
For solving the linear systems arising during the simulation a GMRES solver is applied with different variants of AMG preconditioners listed in Table IV. The relative tolerance of convergence for the GMRES solver is set to r k r 0 e < 10 −8 with r k = r u r λ being the full residual vector in the linear iteration step k. Here, the subscript i for the nonlinear Newton step is dropped.
The stopping criteria (22) for the nonlinear solver are carefully chosen in such a way that the simulations with all the tested preconditioner variants shown in Table IV always result in the same number of nonlinear iterations. This way we can directly compare the number of linear iterations of all tested solver variants which allow to draw some conclusions on the multigrid preconditioners. Table IV provides an overview of the chosen preconditioner parameters for the level smoothers. For both the nested multigrid schemes and the fully coupled multigrid schemes we apply (a) Initial configuration with close-up view of the meshed deformable rings at t = 0.0s    The multigrid parameters are chosen to be the same for all preconditioner variants: the minimum size of the aggregates is set to 6 nodes for the two-dimensional problem and the maximum coarse level size is set to 1000 degrees of freedom, yielding a 3 level multigrid method.

Results
For the fully coupled AMG variants, different transfer operator strategies are compared, namely the non-smoothed (PA-AMG) transfer operators and the energy minimization approach with local damping parameters for transfer operator smoothing denoted by Emin, cf. [64]. For the nested AMG variants, we compare transfer operators based on PA-AMG and SA-AMG. All the simulations have been run on 16 cores (spread over 2 Intel Xeon E5-2670 Octocore CPUs). Figure 9a shows for each time step the overall number of linear iterations for all nonlinear iterations stacked onto each other. Compared to the CheapSIMPLE based methods, the fully coupled AMG variants need a significantly lower number of linear iterations to solve the problem in each time step. This can be explained by the better approximation of the displacement degrees of freedom using 3 instead of 1 damped Gauss-Seidel sweeps. One can also see how the number of linear iterations correlates with the number of active contact nodes. Obviously, the fully coupled AMG variants are less sensitive to the changes in the number of active contact nodes than the CheapSIMPLE based methods. In Figure 9b, the overall linear solver time is given for each time step. Again, the fully coupled AMG variants perform better, although the difference is less pronounced due to the higher computational effort of fully coupled AMG implementations. The number of nonlinear iterations per time step is 2 for the initial phase without contact and then varies between 4 and 6, when contact is active. It is the same for all preconditioner variants due to the particular choice of stopping criteria of the nonlinear solver (see Section 6.3.2). Moreover, the number of nonlinear iterations is independent of the size of the active set. Finally, overall timings for the linear solver including preconditioner setup are reported in Table V. The overall solver time only varies slightly for all given variants. The variants Emin (CheapSIMPLE) and CheapSIMPLE (SA-AMG) need more setup time due to the additional effort of prolongator smoothing. Emin (CheapSIMPLE) can fully amortize the additional setup cost during the solve phase due to coarse grid corrections, that respect the contact constraints, while CheapSIMPLE (SA-AMG) does not have this benefit and, thus, requires the largest overall solver time. To show a fair comparison, the preconditioner has been rebuilt in every nonlinear iteration step. Of course, re-use strategies might increase amortization of setup costs and reduce overall solver timings when solving actual problems.

Two tori impact example
Inspired by a similar analysis in [95], with the two tori impact example we test the proposed multigrid variants from Section 3.2 on a complex 3D contact example with more than 1 million unknowns. Please refer to [53] for the detailed problem setup of the two tori impact example with geometry and load conditions. 6.4.1. Setup The example consists of two thin-walled tori with a Neo-Hookean material model (E = 2250 GPa, ν = 0.3, ρ 0 = 0.1 kg m 3 ) with a major and minor radius of 76m and 24m and a wall thickness of 4.5m. The left torus in Figure 10 lies in the xy-plane with resting initial conditions. The right torus has been rotated around the y-axis by 45 degrees and has an initial velocity of 1.0 m s directed towards the left torus. The simulated time is 10s divided into 200 time steps with a time step size of 0.05s using a generalized-α time integration scheme [18]. The mesh consists of 284672 first-order hexahedral elements with 350208 nodes.
With the rather complex geometry and contact configuration, that heavily and frequently changes over time, this example can be considered as a representative test for the robustness and efficiency of the tested numerical methods.
Here, r u i and r λ i denote the (nonlinear) residual for the displacement variables and Lagrange multipliers in the i-th Newton iteration, respectively. The quantity ∆u describes the solution increment for the displacement variables only. Again, the stopping criteria in (23) for the nonlinear solver are specifically chosen to produce the same number of nonlinear iterations for all tested solver variants. This allows for an easy comparison of the number of linear iterations during the simulation.
(a) t = 0.0s (b) t = 2.5s (c) t = 5.0s (d) t = 7.5s Figure 10. Two tori impact example -Characteristic stages of deformation.  for the full residual vector r k = r u r λ in the linear iteration step k.

Results
An overview of the different tested preconditioner variants is given in Table VI. We study variants with the fully coupled multigrid approach, the nested multigrid approach, and a SIMPLE based variant without multigrid at all. For the fully coupled AMG variants, the transfer operators for the displacement blocks are varied. Particularly, non-smoothed transfer operators (PA-AMG) are compared to smoothed aggregation transfer operators (SA-AMG). For the multigrid schemes there is no special treatment of the coarsest level. For the fully coupled multigrid schemes as well as for the nested multigrid method we apply the same level smoother on all multigrid levels including the coarsest level. All the simulations have been run on 16 cores (spread over 2 Intel Xeon E5-2670 Octocore CPUs). Figure 11a shows the overall number of linear iterations performed to solve all nonlinear iterations for each time step. All preconditioner variants require exactly the same number of nonlinear iterations per time step due to the particular choice of stopping criteria in (23), ranging between 6 and 10 nonlinear iterations per time step, while also the number of nonlinear iterations is independent of the size of the active set. Obviously, the SIMPLE based methods need more linear iterations than the AMG based methods. In this example, there is nearly no difference between the non-smoothed transfer operator variant PA-AMG (CheapSIMPLE) and the smoothed transfer operator variant SA-AMG (CheapSIMPLE). Furthermore, there is no clear and obvious correlation between the number of linear iterations and the number of active nodes. This indicates that the fully coupled multigrid method is robust and efficient with regard to the increasing complexity of the contact configuration over time, which is not the case for cheaper methods such as the SIMPLE based methods. Evidently, one can see a significant drop in the linear iterations for the last 20 time steps of the simulation, which may correspond to the small number of nodes in contact.
When looking at the corresponding solver timings over the time steps in Figure 11b, one finds the CheapSIMPLE (SA-AMG) method to be very close to the AMG based methods PA-AMG (CheapSIMPLE) and SA-AMG (CheapSIMPLE). For the AMG based methods, one sweep with a CheapSIMPLEC method is applied on each level, which internally uses one sweep with a symmetric Gauss-Seidel iteration for the primary variable and one ILU sweep for the constraint equation. That is, quite a lot of time is invested in the coupling on all levels with the comparably expensive ILU method.  In contrast to the AMG based method, the CheapSIMPLE (SA-AMG) method uses 2 sweeps with a CheapSIMPLE preconditioner for the coupling (on the finest level only). Internally, a 3 level AMG multigrid is used with 2 symmetric Gauss-Seidel sweeps for the level smoother and an ILU sweep for the constraint correction equation. These parameters have been found to result in a reasonably low number of linear iterations. For this example the experiment shows that the CheapSIMPLE (SA-AMG) method needs twice as many iterations as the SA-AMG (CheapSIMPLE) method, but the costs per iteration are only half of the costs of the SA-AMG (CheapSIMPLE). Nevertheless, the AMG based methods seem to have a small advantage, when the number of nodes in contact increases.
Last but not least, the overall timings for the linear solver are reported in Table VII. Except for the CheapSIMPLE (SGS) variant, which is far away from the others, there is no clear winner. The setup costs are quite close, since the same transfer operators have to be built for all methods with only a small difference for smoothed versus non-smoothed transfer operators. To show a fair comparison, the preconditioner has been rebuilt in every nonlinear iteration step. Of course, re-use strategies might increase amortization of setup costs and reduce overall solver timings when solving actual problems.

CONCLUSION
We have presented algebraic multigrid schemes designed for saddle point problems arising from contact problems using mortar finite element methods. The new fully coupled multigrid scheme has the advantage that the contact constraints are considered on all multigrid levels, which significantly reduces the number of iterations. It gives the user full control over the coupling process by appropriately choosing the solver parameters. Additionally, we have proposed a novel aggregation method for the Lagrange multipliers, which reuses existing aggregation information at the contact interface. We have demonstrated the robustness and efficiency of the overall multigrid method for large examples with increasingly complex contact configurations over time as well as its weak scalability up to 23.9 million unknowns on 480 MPI ranks.