2. Background theory

2.1. Introduction

This manual presents theoretical background, numerical examples, and explanation for implementing the program library DCIP3D. This suite of algorithms, developed at UBC-GIF, is needed to efficiently invert large sets of DC potential data and IP responses over a 3D earth structure. The manual is designed so that a geophysicist who has understanding of DC resistivity and induced polarization field experiments, but who is not necessarily versed in the details of inverse theory, can use the codes to invert his or her data.

../_images/potentials.png

Fig. 2.1 Definition of three potentials associated with DC/IP experiments.

A typical DC/IP experiment involves inputting a current \(I\) to the ground and measuring the potential away from the source. In a time-domain system, the current alternates in direction and has off-times between the current pulses at which the IP voltages are measured. A typical time-domain signature is shown in Fig. 2.1. In that figure, \(\phi_\sigma\) is the potential that is measured in the absence of chargeability effects. This is the instantaneous value of the potential measured when the current is turned on. In mathematical terms, this potential is related to the electrical conductivity \(\sigma\) by:

\[\phi_\sigma = \mathcal{F}_{dc}[\sigma]\]

where forward mapping operator \(\mathcal{F}_{dc}\) is defined by the equation:

(2.1)\[\nabla\cdot(\sigma\nabla \phi_\sigma)=-I\delta(\mathbf{r}-\mathbf{r}_s)\]

and appropriate boundary conditions. In Equation (2.1), \(\sigma\) is the electrical conductivity in Siemens/meter (S/m), \(\nabla\) is the gradient operator, math:I is the strength of the input current in Amperes (A), and \(\mathbf{r}_s\) is the location of the current source. For typical earth structures, \(\sigma\), while positive, can vary over many orders of magnitude. The potential in Equation (2.1) is the potential due to a single current. This is the value that would be measured in a pole-pole experiment. If potentials from pole-dipole or dipole-dipole surveys are to be generated then they can be obtained by using Equation (2.1) and the principle of superposition.

When the earth material is chargeable, the measured voltage will change with time and reach a limit value which is denoted by :math:phi_eta: in Fig. 2.1. There are a multitude of microscopic polarization phenomena which when combined produce this response but all of these effects can be consolidated into a single macroscopic parameter called chargeability. We denote chargeability by the symbol \(\eta\). Chargeability is dimensionless, positive, and confined to the region [0,1).

To carry out forward modelling to compute \(\phi_{\eta}\), we adopt the formulation of [Sei59] which states that the effect of a chargeable ground is modelled by using the DC resistivity forward mapping \(\mathcal{F}_{dc}\) but with the conductivity replaced by \(\sigma=\sigma(1-\eta)\). Thus:

(2.2)\[\phi_\eta=\mathcal{F}_{dc}[\sigma(1-\eta)]\]

or

(2.3)\[\nabla\cdot(\sigma(1-\eta)\nabla\phi_\eta)=-I\delta(\mathbf{r}-\mathbf{r}_s)\]

The IP datum can be either the secondary potential (\(\phi_s\)) or the apparent chargeability (\(\eta_a\)). The former is the difference of the forward modelled potentials with, and without, the IP effect:

(2.4)\[\phi_s=\phi_\eta-\phi_\sigma=\mathcal{F}_{dc}[\sigma(1-\eta)]-\mathcal{F}_{dc}[\sigma]\]

The apparent chargeability is then given by the ratio:

(2.5)\[\eta_a=\frac{\phi_s}{\phi_\eta}=\frac{\mathcal{F}_{dc}[\sigma(1-\eta)]-\mathcal{F}_{dc}[\sigma]}{\mathcal{F}_{dc}[\sigma(1-\eta)]}\]

In this definition, the apparent chargeability is dimensionless and, in the case of data acquired over an earth having constant chargeability \(\eta_0\), we have \(\eta_a=\eta_0\). Equations (2.4) and (2.5) show that the IP data can be computed by carrying out two DC resistivity forward modellings with conductivities \(\sigma\) and \(\sigma(1-\eta)\). The secondary potential is the more general form of IP data and the apparent chargeability is only defined when the linear (or polar) arrays are used along a line on the surface or in the same borehole. When the current and potential dipole-electrodes are arranged in 3D space and so they are not aligned, the total potential can take on positive, zero, or negative values. The cross-line experiments on the surface and cross-hole experiment on boreholes are examples of such situations. Because of the zero-crossing in the total potentials, the commonly used apparent chargeability is undefined. In these cases, the appropriate data to measure the IP effect is the secondary potential. Therefore, we will use secondary potential as the basic IP datum except in the case of linear arrays.

The field data from a DC/IP survey are a set of N potentials (ideally \(\phi_\sigma\), but usually \(\phi_\eta\)) and a set of N secondary potentials \(\phi_s\) or a quantity that is related to \(\phi_s\). The goal of the inversionist is to use these data to acquire quantitative information about the distribution of the two physical parameters of interest: conductivity \(\sigma(x,y,z)\) and chargeability \(\eta(x,y,z)\).

The distribution of conductivity and chargeability in the earth can be extremely complicated. Both quantities vary as functions of position in 3D space. In addition, there is often large topographic relief. In this program library, the 3D nature of the physical properties and surface topography are fully incorporated. The Earth model is divided into prismatic cells each having a constant value of conductivity and chargeability. The surface topography is approximated by a piecewise constant surface.

2.2. Forward modelling

2.2.1. Discretized System

The forward modelling for the DC potentials and IP apparent chargeabilities is accomplished using a finite volume method [] and a pre-conditioned conjugate gradient technique.

For the DC problem, (2.1) is discretized and the electric potential on the nodes (\(\boldsymbol{\phi_\sigma}\)) are computed by solving the following linear system:

(2.6)\[\boldsymbol{[G^T M_{e\sigma} G ] \, \phi_\sigma} = \boldsymbol{q}\]

where

  • \(\boldsymbol{G}\) is a discretize gradient operator whose transpose acts as a modified divergence operator

  • \(\boldsymbol{M_{e\sigma}} = diag \big ( \boldsymbol{A_{ec}^T (v \odot \sigma)} \big )\) where \(\boldsymbol{A_{ec}}\) projects from edges to cell centers

  • \(\boldsymbol{q}\) is an integrated source term that lives on the nodes.

Once the system is solved, a sparse projection matrix \(\mathbf{P}\) maps the potentials on the nodes to the electrode positions and computes the data, i.e.:

\[\boldsymbol{d_{dc}} = \boldsymbol{P \phi_\sigma}\]

For the IP problem, the secondary potential due to the IP is computed according to (2.4). \(\phi_\eta\) is obtained by replacing \(\sigma\) with \(\sigma (1 - \eta)\) in (2.6). Therefore:

(2.7)\[\boldsymbol{[G^T M_{e\eta} G ] \, \phi_\eta} = \boldsymbol{q}\]

where \(\boldsymbol{M_{e\eta}} = diag \big ( \boldsymbol{A_{ec}^T (v \odot \Delta\sigma )} \big )\) such that \(\boldsymbol{\Delta \sigma} = \boldsymbol{\sigma \odot (1 - \eta)}\).

Thus using (2.6) and (2.7), the secondary potential at cell centers due to IP is:

\[\boldsymbol{\phi_s} = \boldsymbol{\phi_\eta - \phi_\sigma}\]

The secondary potential data is given by:

\[\boldsymbol{d_{ip}} = \boldsymbol{P \phi_s}\]

And the apparent chargeabilities are given by:

\[\boldsymbol{\eta_a} = \boldsymbol{P} \dfrac{\boldsymbol{\phi_\eta - \phi_\sigma}}{\boldsymbol{\phi_\eta}}\]

2.2.2. Linearized IP Problem

In Version 5.0 we included the option to calculate IP data by multiplying the sensitivity matrix \(\mathbf{J}\) by the chargeability provide by user. That is, we forward model with the linear equations that will be used for the inve sion. The chargeability in this case can have arbitrary units. The forward modelled data are calculated as:

(2.8)\[\mathbf{d_{ip}} = \mathbf{J}_{ip} \eta\]

where \(\mathbf{d_{ip}}\) is the IP data and \(\mathbf{J}_{ip}\) is the sensitivity matrix for the IP problem, given DC data, \(\mathbf{d}_{dc}\). Forward modelling using equation (2.9) is further explained in the section 2.5.

(2.9)\[\mathbf{J}_{ip} = -\frac{\partial \ln \phi_{\eta}}{\partial \ln \sigma} = -\frac{1}{\sigma_{\eta}} \frac{\partial \phi_{\eta}}{\partial \ln \sigma} = -\frac{1}{\mathbf{d}_{dc}}\mathbf{J}_{dc}\]

2.2.3. Mesh Design and Source Discretization

To handle a current electrode that is at an arbitrary position \((x_s; y_s; z_s)\) in the cell we made a modification to distribute any current amongst the 8 nodes of the cell. This approach is shown in Figure 3, where a current I is distributed onto nodes P1 through P8. Effectively, we write

(2.10)\[\mathbf{I}\delta(r_s) \approx {\sum_{i=1}}^8 \mathbf{I} w_i(r_i, r_s) \delta(r_i)\]

where \(r_s = (x_s; y_s; z_s)\) is the position of the current electrode, \(r_i = (x_i; y_i; z_i)\) is the position of the ith node, and \(w_i\) is the linear-interpolation weighting for the ith node:

(2.11)\[{\sum_{i=1}}^8 w_i = 1\]

so that the total current distributed among the 8 nodes is equal to \(\mathbf{I}\). With the linear interpolation we note that if the source electrode is on one of the faces of the cell, then only 4 nodes will be activated. If the source electrode is along an edge then the current is distributed between two nodes, and if the source electrode is at a corner of the prism, then only one node is activated. If potential electrodes are not on the nodes then fields are linearly interpolated.

../_images/finite_volume.png

Fig. 2.2 Current electrode can be placed at an arbitrary position \((x_s; y_s; z_s)\) within a cell, or on a node. The currents are distributed to each node of the cell through linear interpolation.

../_images/forward_model.png

Fig. 2.3 Forward modeled apparent resistivities of a halfspace showing differences between placing the current electrode at (a) cell centers versus (b) cell nodal points.

As an example of the importance of the interpolation, a halfspace is modeled in which the cell size is 50 m and a current is injected at the surface in the center of the cell. Potentials are obtained on a 25-m grid. Apparent resistivities should equal 100 Ohm-m, which is the true halfspace value. The results are shown in Figure Fig. 2.3 a. Errors up to 25% are observed at locations that are 25 m (1/2 cell) from each of the four corners where the distributed current is input. At distances 50 m (one cell width) the error has dropped to about 7%. These are expected results and are in accordance with testing using the first version of DCIP3D. In Figure Fig. 2.3 b the apparent resistivities for a current on a nodal location are plotted. At a distance of 50 m from the current, the error is less than 5% if with on a nodal discretization. The errors increase somewhat between distances of one and two cells. We conclude that for numerical accuracies of about 5% or less, the observation should be at least one cell width away from the location of a current electrode.

2.3. General inversion methodology

The computing programs outlined in this manual solve two inverse problems. In the first we invert the DC potentials \(\phi_{\sigma}\) to recover the electrical conductivity \(\sigma(x; z)\). This is a non-linear inverse problem that requires linearization of the data equations and subsequent iteration steps. After DC data have been inverted, the conductivity model is used to invert the IP data to recover the chargeability \(\eta(x; z)\). Because chargeabilities are usually small quantities (\(\eta<0:3\)) it is possible to linearize equation (2.5) and derive a linear system of equations to be solved. Irrespective of which data set is being inverted however, we basically use the same methodology to carry out the inversions. We outline this methodology here.

The inverse problem is formulated as an optimization problem where an objective function of the model is minimized subject to the constraints in Equation (2.1) for DC resistivity data or Equation (2.3) for IP data. To outline our methodology, it is convenient to introduce a single notation for the data and for the model. We let \(\mathbf{d} = (d_1,d_2,...,d_N)^T\) denote the data, where \(N\) is the number of data. Using this notation, \(d_i\) is either the \(i^{th}\) potential in a DC resistivity data set, or the \(i^{th}\) secondary potential/apparent chargeability in an IP survey. Let the physical property of interest be denoted by the generic symbol \(m\) for the model element. The quantity \(m_i\) denotes the conductivity or chargeability of the \(i^{th}\) model cell. For the inversion, we choose \(m_i=\ln(\sigma_i)\) when inverting for conductivities, and \(m_i=\eta_i\) when reconstructing the chargeability distribution.

The goal of the inversion is to recover a model vector \(\mathbf{m} = (m_1,m_2,...,m_m)^T\) , which acceptably reproduces the n observations of \(\mathbf{d}\). Importantly, the data are noise contaminated, therefore we don’t want to fit them precisely. A perfect fit in our case would be indicative, that incorrect earth model is recovered, as some features observed in the constructed model would assuredly be artifacts of the noise. Therefore, the inverse problem is formulated as an optimization problem where a global objective function, \(\Phi\), is minimized. The global objective functions consists of two components: a model objective function, \(\Phi_m\), and a data misfit function, \(\Phi_d\), such that:

(2.12)\[\begin{split}\min \Phi = \Phi_d+\beta\Phi_m \\ \mbox{s. t. } \Phi_{d}=\Phi_{d}^* \text{and optionally} ~ m^l\leq m\leq m^u, \nonumber\end{split}\]

where \(\beta\) is a trade-off parameter that controls the relative importance of the model smoothness through the model objective function and data misfit function. When the standard deviations of data errors are known, the acceptable misfit is given by the expected value d and we will search for the value of \(\beta\) via an L-curve criterion [Han00] that produces the expected misfit at each linearized step (see section 2.4). Otherwise, a user-defined value is used. Bound are imposed through the projected gradient method so that the recovered model lies between imposed lower \(\mathbf{m}^l\) and upper \(\mathbf{m}^u\) bounds.

The details of the objective function are problem dependent but generally we need the flexibility to be close to a reference model \(\mathbf{m}_o\) and also require that the recovered model be relatively smooth in all three spatial directions. Here we adopt a right-handed Cartesian coordinate system with \(y\) positive north and and \(z\) positive up. In defining the model objective function, the reference model will generally be included in the first component of the objective function but it can be removed, if desired, from the remaining derivative terms since we are often more confident in specifying the value of the model at a particular point than in supplying an estimate of the gradient. This leads to the following two distinct formulations of the model objective function.

(2.13)\[\begin{split}\Phi_m = &&\alpha_s\int\int\ w_s(\mathbf{m}-\mathbf{m}_0)^2dv + \alpha_x\int\int w_x\left(\frac{\partial{(\mathbf{m}-\mathbf{m}_0)}}{\partial x}\right)^2dv+ \nonumber \\ &&\alpha_y\int\int w_y\left(\frac{\partial{(\mathbf{m}-\mathbf{m}_0)}}{\partial y}\right)^2 dv + \alpha_z\int\int\ w_z\left(\frac{\partial{(\mathbf{m}-\mathbf{m}_0)}}{\partial z}\right)^2dv,\end{split}\]
(2.14)\[\begin{split}\Phi_\mathbf{m} = &&\alpha_s\int\int\ w_s(\mathbf{m}-\mathbf{m}_0)^2dv + \alpha_x\int\int w_x\left(\frac{\partial{\mathbf{m}}}{\partial x}\right)^2dv+ \nonumber \\ &&\alpha_y\int\int w_y\left(\frac{\partial{\mathbf{m}}}{\partial y}\right)^2 dv + \alpha_z\int\int\ w_z\left(\frac{\partial{\mathbf{m}}}{\partial z}\right)^2dv,\end{split}\]

where the weighting functions \(w_s\), \(w_x\), \(w_y\) and \(w_z\) are spatially dependent, and \(\alpha_s\), \(\alpha_x\), \(\alpha_y\) and \(\alpha_z\) are coefficients which affect the relative importance of different components in the model objective function. The reference model \(m_o\) may be a general background model that is estimated from previous investigations or it could be a zero model. The purpose of the generalized weighting functions are to place emphasis throughout the model to utilize prior information.

The objective function in equations (2.13) and (2.14) has the flexibility to incorporate many types of prior knowledge into the inversion. The reference model may be a general background model (e.g., background conductivity) that is estimated from previous investigations or it will be a zero model (in terms of chargeability). The reference model would generally be included in the first component of the objective function but it can be removed if desired from the remaining terms; often we are more confident in specifying the value of the model at a particular point than in supplying an estimate of the gradient.

The relative closeness of the final model to the reference model at any location is controlled by the function \(w_s\). For example, if the interpreter has high confidence in the reference model at a particular region, he can specify \(w_s\) to have increased amplitude there compared to other regions of the model. The interface weighting functions \(w_x\), \(w_y\), and \(w_z\) can be designed to enhance or attenuate structures in various regions in the model domain. If geology suggests a rapid transition zone in the model, then a decreased weighting for flatness can be put there and the constructed model will exhibit higher gradients provided that this feature does not contradict the data.

To perform a numerical solution, we discretize the model objective functions in Equations (2.13) and (2.14) using a finite difference approximation on the mesh defining the conductivity/chargeability model. This yields:

(2.15)\[\begin{split}\Phi_m(\mathbf{m})&=&(\mathbf{m}-\mathbf{m}_o)^T(\alpha_s \mathbf{W}_s^T\mathbf{W}_s+\alpha_x \mathbf{W}_x^T\mathbf{W}_x+\alpha_y \mathbf{W}_y^T\mathbf{W}_y+\alpha_z \mathbf{W}_z^T\mathbf{W}_z)(\mathbf{m}-\mathbf{m}_o), \nonumber\\ &\equiv&(\mathbf{m}-\mathbf{m}_o)^T(\mathbf{W}_m^T\mathbf{W}_m)(\mathbf{m}-\mathbf{m}_o), \nonumber\\ &= &\left \| \mathbf{W}_m(\mathbf{m}-\mathbf{m}_o) \right \|^2,\end{split}\]

for Equation (2.13) and the following for Equation (2.14).

(2.16)\[\begin{split}\Phi_m(\mathbf{m}) & = &(\mathbf{m}-\mathbf{m}_o)^T(\alpha_s \mathbf{W}_s^T\mathbf{W}_s)(\mathbf{m}-\mathbf{m}_o)+\mathbf{m}^T(\alpha_x \mathbf{W}_x^T\mathbf{W}_x+\alpha_y \mathbf{W}_y^T\mathbf{W}_y+\alpha_z \mathbf{W}_z^T\mathbf{W}_z)\mathbf{m}, \nonumber\\ &\equiv&(\mathbf{m}-\mathbf{m}_o)^T(\mathbf{W}_s^T\mathbf{W}_s)(\mathbf{m}-\mathbf{m}_o)+\mathbf{m}^T\mathbf{W}^T\mathbf{W}\mathbf{m},\end{split}\]

where \(\mathbf{m}\) and \(\mathbf{m}_o\) are \(M\)-length discretized model vectors which characterize the conductivity/chargeability distributions within the current model and reference model, respectively. The individual matrices \(\mathbf{W}_s\) , \(\mathbf{W}_x\), \(\mathbf{W}_y\), and \(\mathbf{W}_z\) are straight-forwardly calculated once the model mesh and the weighting functions \(w_s\) , \(w_x\), \(w_y\), \(w_z\) are defined. The cumulative matrix \(\mathbf{W}_m^T\mathbf{W}_m\) is then formed.

Having chosen an appropriate model objective function the next step in setting up the inversion is to define a data misfit measure. Here we use the \(l_2\)-norm measure:

(2.17)\[\Phi_d = \left\| \textbf{W}_d(\textbf{d}-\textbf{d}^{obs})\right\|^2_2\]

and assume that the contaminating noise in the data is independent and Gaussian with zero mean. Specifying \(\mathbf{W}_d\) to be a diagonal datum weighting matrix whose \(i^{th}\) element is \(1/\epsilon_i\), where \(\epsilon_i\) is the standard deviation of the \(i^{th}\) datum, makes \(\Phi_d\) a chi-squared variable distributed with \(N\) degrees of freedom. Accordingly \(E[\chi^2]=N\) provides a target misfit for the inversion. We now have the components to solve the inversion as defined in equation (2.12).

To solve the optimization problem when constraints are imposed we use the projected gradients method ([CMore87] ; [Vog02] ). This technique forces the gradient in the Krylov sub-space minimization (in other words a step during the conjugate gradient process) to zero if the proposed step would make a model parameter exceed the bound constraints. The result is a model that reaches the bounds, but does not exceed them.

2.4. Inversion of DC resistivity data

The program library DCIP3D provides a DC resistivity inversion program, DCInv3D. The inversion of DC resistivity data, formulated as the minimization of the global objective function (see Equation (2.12)), is nonlinear since the data do not depend linearly upon the conductivity model. A Gauss-Newton approach is used in which the objective function is linearized about a current model, \(\mathbf{m}^{(n)}\), a model perturbation is computed, and then used to update the current model. Substituting \(\mathbf{m}^{(n+1)}=\mathbf{m}^{(n)}+\delta\mathbf{m}\) into the global objective function (Equation (2.12)) gives:

(2.18)\[\phi(\mathbf{m}+\delta\mathbf{m})=\left \| \mathbf{W}_d(\mathbf{d}^{(n)}+\mathbf{J}\delta\mathbf{m}-\mathbf{d}) \right \|^2+\beta\left \| \mathbf{W}(\mathbf{m}+\delta\mathbf{m}- \mathbf{m}_0) \right \|^2+H.O.T\]

where \(\mathbf{J}\) is the sensitivity matrix and the element \(J_{ij}\) quantifies the influence of the model change in j-th cell on the i-th datum,

(2.19)\[J_{ij}=\frac{\partial d_i}{\partial m_j}=\frac{\partial \phi_i}{\partial ln(\sigma_i)}\]

Neglecting the higher order terms (H.O.T.) and setting to zero the derivative with respect to \(\delta\mathbf{m}\) yields the following system to solve for the model objective function (Equation (2.13)) used when the SMOOTH_MOD_DIF parameter is specified in the inversion input control file:

(2.20)\[(\mathbf{J}^T\mathbf{J}+\beta \mathbf{W}_m^{T}\mathbf{W}_m)\delta\mathbf{m} = -\mathbf{J}^T(\mathbf{d}^{(n)}-\mathbf{d})-\beta \mathbf{W}_m^T\mathbf{W}_m(\mathbf{m}^{(n)}-\mathbf{m}_0)\]

where \(\mathbf{W}_m^T\mathbf{W}_m\) is defined by Equation (2.15).

Similarly, the following system arises when the model objective function (Equation (2.14)) is used (i.e. the SMOOTH_MOD parameter is specified in the inversion input control file):

(2.21)\[(\mathbf{J}^T\mathbf{J}+\beta(\mathbf{W}_{s}^{T}\mathbf{W}_{s}+\mathbf{W}^{T}\mathbf{W}))\delta\mathbf{m} = -\mathbf{J}^T(\mathbf{d}^{(n)}-\mathbf{d})-\beta(\mathbf{W}_{s}^T\mathbf{W}_{s}(\mathbf{m}^{(n)}-\mathbf{m}_0)+\mathbf{W}^{T}\mathbf{W}\mathbf{m})\]

In these formulations we assume that the matrix \(\mathbf{W}_d\) has been absorbed into the sensitivity matrix and data vectors. By solving either of these inverse problems you obtain the model perturbation, which then allows you to generate a new model according to the following relation:

(2.22)\[\mathbf{m}^{(n+1)}=\mathbf{m}^{(n)} + \alpha \delta \mathbf{m},\]

where \(\alpha\) in (0,1] limits the step size and is chosen to ensure that the total objective function is reduced.

The major computational effort in this approach includes the calculation of the sensitivity matrix, solution of the basic linearized Equation (2.20), and the choice of regularization parameter \(\beta\). The sensitivity is computed using the standard adjoint equation approach, and Equation (2.20) or (2.21) is solved using a pre-conditioned conjugate gradient (CG) technique, in which the sensitivity matrix \(\mathbf{J}\) is applied to vectors by sparse multiplications in the wavelet domain after it is compressed using fast wavelet transform.

2.5. Inversion of IP data

To invert IP data it is necessary to linearize Equation (2.4). Let \(\eta_i\) and \(\sigma_i\) denote the chargeability and electrical conductivity of the \(i^{th}\) model cell. Linearizing the potential \(\phi_\eta\) about the conductivity model \(\sigma\) yields:

(2.23)\[\phi_\eta=\phi(\sigma-\eta \sigma)=\phi(\sigma)-\sum_{j=1}^{M}\frac{\partial \phi}{\partial \sigma_j}\eta_j\sigma_i+H.O.T\]

Substituting into Equation (2.4) yields:

(2.24)\[\phi_s=\phi_\eta-\phi_\sigma=-\sum_{j=1}^{M}\frac{\partial \phi}{\partial \sigma_j}\eta_j\sigma_i+H.O.T\]

When apparent chargeability is used as the IP data, substituting the above equation into Equation (2.5), yields:

(2.25)\[\eta_a=-\sum_{j}\frac{\sigma_j}{\phi_i}\frac{\partial \phi_i}{\partial \sigma_j}\eta_j =-\sum_{j}\sigma_j\frac{\partial ln(\phi)}{\partial ln(\sigma_j)}\eta_j\]

Thus the \(i^{th}\) datum (either secondary potential or apparent chargeability) is exposed as:

(2.26)\[d_i=\sum_{j=1}^{M}J_{ij}\eta_{ij}\]

where

(2.27)\[\begin{split}J_{ij} = \left\{ \begin{array}{cl} \frac{\partial \phi_i \left[ \sigma \right]}{\partial ln\sigma_j}, \mathbf{d}=\phi_s\\ \\ \frac{\partial ln\phi_i\left [ \sigma \right ]}{\partial ln\sigma_j}, \mathbf{d}=\eta_a \end{array} \right\}\end{split}\]

is the sensitivity matrix.

The general problem takes the form of d = Jm and can be solved as described in section 2.3. Bound constraints (e.g., positivity) for IP are imposed through projected gradients ([CMore87] ; [Vog02] ). The sensitivity matrix is dense and thus wavelets are used for compression.

2.6. Wavelet Compression of Sensitivity Matrix

When storing the sensitivity matrix during the linearized step of the DC problem, the two major obstacles to the solution of the Gauss-Newton problem are the large amount of memory required for storing the sensitivity matrix and the CPU time required for the application of the sensitivity matrix to model vectors. These are also points of concern for the general inversion of IP data. The DCIP3D v5.0 program library overcomes these difficulties by forming a sparse representation of th se sitivity matrix using a wavelet transform based on compactly supported, orthonormal wavelets For more details, the users are referred to Li and Oldenburg (2003, 2010). In the following, we give a brief description of the method necessary for the use of the DCIP3D v5.0 library.

Each row of the sensitivity matrix in a 3D DC resistivity or IP inversion can be treated as a 3D image and a 3D wavelet transform can be applied to it. By the properties of the wavelet transform, most transform coefficients are nearly or identically zero. When coefficients of small magnitudes are discarded (the process of thresholding), the remaining coefficients still contain much of the necessary information to reconstruct the sensitivity accurately. These retained coefficients form a sparse representation of the sensitivity in the wavelet domain. The need to store only these large coefficients means that the memory requirement is reduced. Further, the multiplication of the sensitivity with a vector can be carried out by a sparse multiplication in the wavelet domain. This greatly reduces the CPU time. Since the matrix-vector multiplication constitutes the core computation of the inversion, the CPU time for the inverse solution is reduced accordingly. The use of this approach increases the size of solvable problems by nearly two orders of magnitude.

We first denote \(\mathcal{W}\) as the symbolic matrix-representation of the 3D wavelet transform. Then applying the transform to each row of \(\mathbf{J}\) and forming a new matrix consisting of rows of transformed sensitivity is equivalent to the following operation:

\[\tilde{\mathbf{J}} = \mathbf{J} \mathcal{W}^T\]

where \(\tilde{\mathbf{J}}\) is the transformed matrix. The thresholding is applied to individual rows of \(\mathbf{J}\) by the following rule to form the sparse representation \(\tilde{\mathbf{J}}^S\):

(2.28)\[\begin{split}{\tilde{{J}}_{ij}}^s = \left\{ \begin{array}{cl} {\tilde{{J}}_{ij}} \text{ if } |{\tilde{{J}}_{ij}}| \geq \delta_i \\ 0 \text{ if } |{\tilde{{J}}_{ij}}| \geq \delta_i \end{array}\right\}, i=1..N\end{split}\]

where \(\delta_i\) is the threshold level, and \(\tilde{{J}}_{ij}\): and \({\tilde{{J}}_{ij}}^s\): are the elements of \(\tilde{\mathbf{J}}_{ij}\): and \({\tilde{\mathbf{J}}_{ij}}^s\):, respectively. The threshold level \(\delta_i\) are determined according to the allowable error of the reconstructed sensitivity, which is measured by the ratio of norm of the error in each row to the norm of that row, \(r_i (\delta_i)\). It can be evaluated directly in the wavelet domain by the following expression:

(2.29)\[r_i (\delta_i) = \sqrt{\frac{\sum_{|{\tilde{{J}}_{ij}}|\leq \delta_i} \tilde{{J}}_{ij}^2}{\sum_j \tilde{{J}}_{ij}^2}}, i=1..N\]

Here the numerator is the norm of the discarded coefficients and the denominator is the norm of all coefficients. The threshold level \(\delta_{i_0}\) is calculated on a representative row, \(i_0\). This threshold is then used to define a relative threshold \(\epsilon = \delta_{i_0} / \text{max}_j | {\tilde{{J}}_{ij}}|\). The absolute threshold level for each row is obtained by

\[\delta_{i} = \epsilon \text{max}_j | {\tilde{{J}}_{ij}}|, i=1..N\]

The program that implements this compression procedure is DCINV3D. The user is asked to specify the relative error \(r^*\) and the program will determine the relative threshold level \(\delta_i\). Usually a value of a few percent is appropriate for \(r^*\). When both surface and borehole data are present two different relative threshold levels are calculated by choosing a representative row or surfac data and another for borehole data. For experienced users, the program also allows the direct input of the relative threshold level.