Augmenting designs with controlled efficiency loss

library(optedr)

Motivation

In practice an experiment is rarely designed from scratch. A researcher may already have data collected at certain conditions and want to add new observations to improve estimation — without discarding what has already been measured. The key question is: where can new points be placed so that the efficiency of the augmented design stays above an acceptable threshold?

optedr answers this question with two functions used in sequence:

get_augment_region(): computes the candidate region, i.e. the set of design points whose addition keeps the efficiency of the augmented design (for the chosen criterion) above a user-specified threshold delta_val.
augment_design(): adds a chosen point to the initial design and rescales the weights.

Both functions support the same optimality criteria as opt_des() and work for any number of factors.

Key parameters

Parameter	Role
`init_design`	Current design (data frame with `Point`/`Weight` in 1D, or factor columns + `Weight` in multi-D)
`alpha`	Fraction of total weight assigned to the new point after augmentation
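`delta_val`	Minimum acceptable efficiency of the augmented design (for the chosen criterion), relative to the reference design
`calc_optimal_design`	If `TRUE`, also computes the optimal design and uses it as the efficiency reference; if `FALSE`, the initial design is the reference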
`new_points`	Data frame of points to add (non-interactive mode); omit for interactive mode
`par_int`	Indices of parameters of interest (Ds-Optimality only)
`n_lhs`	Number of Latin-Hypercube candidates for the region search (multi-D)
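For reference, the quantities involved can be written in the usual notation for locally optimal designs. This is an assumption about optedr's internals, and the package help pages are the authority. Here $\xi_0$ is init_design, $\xi_x$ is the one-point design at a new point $x$, $f$ is the gradient of the model with respect to the parameters at the nominal values par_values, $k$ is the number of parameters and $\delta$ is delta_val. The reference design $\xi_{\text{ref}}$ is init_design when calc_optimal_design = FALSE and the optimal design when it is TRUE.

```latex
\xi_\alpha(x) = (1-\alpha)\,\xi_0 + \alpha\,\xi_x,
\qquad
M(\xi) = \sum_{i} w_i\, f(x_i)\, f(x_i)^{\top},
\qquad
f(x) = \left.\frac{\partial \eta(x,\theta)}{\partial \theta}\right|_{\theta=\theta_0}

\operatorname{eff}_D\bigl(\xi_\alpha(x)\bigr)
  = \left(\frac{\det M\bigl(\xi_\alpha(x)\bigr)}{\det M(\xi_{\text{ref}})}\right)^{1/k},
\qquad
\mathcal{R} = \bigl\{\, x \in \mathcal{X} : \operatorname{eff}_D\bigl(\xi_\alpha(x)\bigr) \ge \delta \,\bigr\}
```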

One-factor augmentation

Step 1: compute the candidate region

We start with a uniform three-point design for Antoine’s equation and look for points that keep the D-efficiency of the augmented design above 85 % of that of the initial design.

init_des <- data.frame(
  Point  = c(30, 60, 90),
  Weight = c(1/3, 1/3, 1/3)
)

region <- get_augment_region(
  criterion           = "D-Optimality",
  init_design         = init_des,
  alpha               = 0.25,
  model               = y ~ 10^(a - b / (c + x)),
  parameters          = c("a", "b", "c"),
  par_values          = c(8.07131, 1730.63, 233.426),
  design_space        = c(1, 100),
  calc_optimal_design = FALSE,
  delta_val           = 0.85
)

print(region)
#> Augment candidate region  (delta = 0.8500)
#>   Intervals: [5.361, 100]

region$region is a data frame of candidate intervals. Each row gives a lower and upper bound on the design space where the new point can be placed.
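The region can be approximated with a brute-force scan. The sketch below is not optedr's algorithm. It assumes the usual locally D-optimal setting: Fisher information built from the model gradient at par_values, unit error variance, and the initial design as reference. It evaluates the efficiency of the augmented design on a grid of candidate points and collapses the points that pass delta_val into intervals. The helper names (info, d_eff, aug_eff) are hypothetical. Under these assumptions the scan should give a single interval starting near 5.36 and ending at 100. At the midpoint used below (52.68026), the efficiency should come out at about 0.89.

```r
# Brute-force sketch of the 1D candidate region (not optedr's own algorithm).
# Assumes Fisher information from the model gradient at the nominal values.
eta <- deriv(~ 10^(a - b / (c + x)), c("a", "b", "c"),
             function.arg = c("x", "a", "b", "c"))
theta <- c(a = 8.07131, b = 1730.63, c = 233.426)

# Information matrix sum_i w_i f(x_i) f(x_i)^T for points x and weights w
info <- function(x, w) {
  Fm <- attr(eta(x, theta[["a"]], theta[["b"]], theta[["c"]]), "gradient")
  crossprod(Fm * sqrt(w))
}
d_eff <- function(M, M_ref) (det(M) / det(M_ref))^(1 / ncol(M))

init_x <- c(30, 60, 90)
init_w <- rep(1/3, 3)
alpha <- 0.25
delta_val <- 0.85
M0 <- info(init_x, init_w)  # reference: the initial design

# D-efficiency, relative to the initial design, after adding x_new with weight alpha
aug_eff <- function(x_new) {
  d_eff(info(c(init_x, x_new), c((1 - alpha) * init_w, alpha)), M0)
}

grid <- seq(1, 100, by = 0.01)
ok <- vapply(grid, aug_eff, numeric(1)) >= delta_val

# Collapse runs of consecutive passing grid points into [lower, upper] intervals
idx <- which(ok)
starts <- idx[c(TRUE, diff(idx) > 1)]
ends <- idx[c(diff(idx) > 1, TRUE)]
intervals <- data.frame(lower = grid[starts], upper = grid[ends])
intervals
aug_eff(52.68026)
```

The grid resolution (0.01) limits how precisely the lower bound is located. A root finder such as uniroot() on aug_eff(x) - delta_val would refine it.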

Step 2: choose a point and augment

new_pt <- mean(region$region[1:2])  # midpoint of the candidate interval [5.361, 100]

augmented <- augment_design(
  criterion           = "D-Optimality",
  init_design         = init_des,
  alpha               = 0.25,
  model               = y ~ 10^(a - b / (c + x)),
  parameters          = c("a", "b", "c"),
  par_values          = c(8.07131, 1730.63, 233.426),
  design_space        = c(1, 100),
  calc_optimal_design = FALSE,
  delta_val           = 0.85,
  new_points          = data.frame(Point = new_pt, Weight = 1)
)

print(augmented)
#>      Point Weight
#> 1 30.00000   0.25
#> 2 60.00000   0.25
#> 3 90.00000   0.25
#> 4 52.68026   0.25
cat("Sum of weights:", sum(augmented$Weight), "\n")
#> Sum of weights: 1
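The printed weights are consistent with a simple rescaling. Every weight of init_des is multiplied by 1 - alpha and the new point receives alpha, so 1/3 × 0.75 = 0.25. The helper below is a hypothetical sketch of that step, not optedr code. How several new points would share alpha (here, in proportion to their Weight column) is an assumption.

```r
# Hypothetical helper mirroring the weight rescaling seen in the output above.
augment_weights <- function(init_design, new_points, alpha) {
  init_design$Weight <- (1 - alpha) * init_design$Weight
  # Assumption: several new points split alpha in proportion to their weights
  new_points$Weight <- alpha * new_points$Weight / sum(new_points$Weight)
  rbind(init_design, new_points)
}

init_des <- data.frame(Point = c(30, 60, 90), Weight = c(1/3, 1/3, 1/3))
aug <- augment_weights(init_des, data.frame(Point = 52.68026, Weight = 1),
                       alpha = 0.25)
aug
sum(aug$Weight)  # 1
```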

Comparing efficiency before and after

result_opt <- opt_des(
  "D-Optimality",
  y ~ 10^(a - b / (c + x)), c("a", "b", "c"),
  c(8.07131, 1730.63, 233.426), c(1, 100)
)
#> 
#> ℹ Stop condition not reached, max iterations performed
#> ℹ The lower bound for efficiency is 99.9986187505804%

eff_before <- design_efficiency(init_des, result_opt)
#> ℹ The efficiency of the design is 38.5312233926718%
eff_after  <- design_efficiency(augmented, result_opt)
#> ℹ The efficiency of the design is 34.2933573563283%

cat("Efficiency before augmenting:", round(eff_before * 100, 2), "%\n")
#> Efficiency before augmenting: 38.53 %
cat("Efficiency after augmenting: ", round(eff_after  * 100, 2), "%\n")
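#> Efficiency after augmenting:  34.29 %
cat("Change:                      ", round((eff_after - eff_before) * 100, 2),
    "percentage points\n")
#> Change:                       -4.24 percentage points

The drop relative to result_opt is expected. With calc_optimal_design = FALSE, delta_val is measured against init_des, not against the optimal design. The ratio 34.29 / 38.53 ≈ 0.89 is the D-efficiency of the augmented design relative to init_des, and it stays above 0.85. Adding the point costs about 11 % of the initial design's D-efficiency, within the 15 % allowed by the threshold.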

Using the optimal design as reference (`calc_optimal_design = TRUE`)

When calc_optimal_design = TRUE, the function internally computes the optimal design and uses it, instead of init_design, as the reference against which delta_val is measured. This is the recommended mode when no optimal design has been computed yet:

region_opt <- get_augment_region(
  criterion           = "D-Optimality",
  init_design         = init_des,
  alpha               = 0.25,
  model               = y ~ 10^(a - b / (c + x)),
  parameters          = c("a", "b", "c"),
  par_values          = c(8.07131, 1730.63, 233.426),
  design_space        = c(1, 100),
  calc_optimal_design = TRUE,
  delta_val           = 0.85
)
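Assume the only change is the reference design in the denominator of the D-efficiency. Then switching references rescales every efficiency by the same factor: eff(augmented vs optimal) = eff(augmented vs init) × eff(init vs optimal). The sketch below checks that identity on arbitrary positive-definite matrices. These are stand-ins, not information matrices from this model. It then converts the threshold using the 38.53 % efficiency of init_des printed earlier. Both the assumption and the helper names are illustrative.

```r
# D-efficiency of M relative to M_ref (k = number of parameters)
d_eff <- function(M, M_ref) (det(M) / det(M_ref))^(1 / ncol(M))

# Arbitrary positive-definite stand-ins for the three information matrices
set.seed(1)
rand_pd <- function(k) crossprod(matrix(rnorm((k + 2) * k), k + 2, k))
M_init <- rand_pd(3)
M_aug  <- rand_pd(3)
M_opt  <- rand_pd(3)

lhs <- d_eff(M_aug, M_opt)
rhs <- d_eff(M_aug, M_init) * d_eff(M_init, M_opt)
all.equal(lhs, rhs)  # TRUE: the reference only contributes a constant factor

# delta_val = 0.85 against the optimal design, expressed against init_des
eff_init_vs_opt <- 0.385312233926718
0.85 / eff_init_vs_opt  # about 2.21
```

Under that assumption, a new point would have to more than double the D-efficiency of init_des to pass the same delta_val. The region computed against the optimal design can therefore be much smaller than the one computed against the initial design.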

Two-factor augmentation

In multi-dimensional spaces get_augment_region() samples candidate points with a Latin Hypercube (controlled by n_lhs). It returns a data frame of the sampled candidates that meet delta_val, each with the efficiency the augmented design would have. A heatmap of the efficiency function is displayed automatically.
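Latin Hypercube sampling splits each factor's range into n equal strata and places exactly one point in every stratum, pairing strata at random across factors. A minimal base-R sketch follows. The function name lhs_sample is hypothetical, and optedr's own sampler may differ in details such as where points fall within a stratum.

```r
# Minimal Latin Hypercube sampler (sketch; not optedr's implementation).
# bounds: named list with one c(lower, upper) pair per factor.
lhs_sample <- function(n, bounds) {
  cols <- lapply(bounds, function(b) {
    # A random permutation picks the stratum; runif() jitters within it
    u <- (sample.int(n) - runif(n)) / n
    b[1] + u * (b[2] - b[1])
  })
  as.data.frame(cols)
}

set.seed(123)
cand <- lhs_sample(2000, list(x1 = c(0.1, 10), x2 = c(0.1, 10)))
head(cand)
```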

Initial design and candidate region

init_2d <- data.frame(
  x1     = c(0.8, 10, 5),
  x2     = c(10, 0.8, 5),
  Weight = c(1/3, 1/3, 1/3)
)

result_2D <- opt_des(
  criterion    = "D-Optimality",
  model        = y ~ Vmax * x1 * x2 / ((K1 + x1) * (K2 + x2)),
  parameters   = c("Vmax", "K1", "K2"),
  par_values   = c(1, 1, 1),
  design_space = list(x1 = c(0.1, 10), x2 = c(0.1, 10))
)
#> 
#> ℹ Stop condition reached: difference between sensitivity and criterion < 1e-05
#> ℹ The lower bound for efficiency is 99.9990417429941%

region_2d <- get_augment_region(
  criterion           = "D-Optimality",
  init_design         = init_2d,
  alpha               = 0.25,
  model               = y ~ Vmax * x1 * x2 / ((K1 + x1) * (K2 + x2)),
  parameters          = c("Vmax", "K1", "K2"),
  par_values          = c(1, 1, 1),
  design_space        = list(x1 = c(0.1, 10), x2 = c(0.1, 10)),
  calc_optimal_design = FALSE,
  delta_val           = 0.85
)

#> ℹ 1908 candidate points with efficiency >= 0.85 (from LHS sample of 2000)

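region_2d$region is a data frame of sampled candidates, each with an efficiency column. Because the reference is init_2d (calc_optimal_design = FALSE), values above 1 mean that the new point improves on the initial design. Pick the candidate that maximises efficiency: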

best_2d <- region_2d$region[which.max(region_2d$region$efficiency), ]

eff_antes <- suppressMessages(design_efficiency(init_2d, result_2D))

aug_2d <- augment_design(
  criterion           = "D-Optimality",
  init_design         = init_2d,
  alpha               = 0.25,
  model               = y ~ Vmax * x1 * x2 / ((K1 + x1) * (K2 + x2)),
  parameters          = c("Vmax", "K1", "K2"),
  par_values          = c(1, 1, 1),
  design_space        = list(x1 = c(0.1, 10), x2 = c(0.1, 10)),
  calc_optimal_design = FALSE,
  delta_val           = 0.85,
  new_points          = data.frame(x1 = best_2d$x1, x2 = best_2d$x2, Weight = 1)
)

#> ℹ 1897 candidate points with efficiency >= 0.85 (from LHS sample of 2000)
#> Sample of candidate points:
#>          x1        x2 efficiency
#> 1  1.159021 5.7198266  0.9130710
#> 2  8.704474 6.6973783  1.1278073
#> 3  1.943044 8.6008467  0.8973246
#> 4  2.268775 9.2575649  0.9110199
#> 5  2.512734 5.9707006  0.8753815
#> 6  2.161179 4.6834668  0.8583050
#> 7  5.572574 9.1782547  1.0922690
#> 8  5.567190 2.8125450  0.8758437
#> 9  7.403983 5.4546094  1.0441308
#> 10 1.603604 0.3647445  0.8938971
#> 11 7.310526 2.8829943  0.9108408
#> 12 0.979907 7.0303918  0.9291814
#> 13 4.627258 6.0955799  0.9675580
#> 14 5.254325 9.8052106  1.0893833
#> 15 8.693012 7.9399654  1.1678525

eff_despues <- suppressMessages(design_efficiency(aug_2d, result_2D))

cat("Efficiency before:", round(eff_antes  * 100, 2), "%\n")
#> Efficiency before: 68.4 %
cat("Efficiency after: ", round(eff_despues * 100, 2), "%\n")
#> Efficiency after:  84.8 %
print(aug_2d)
#>          x1        x2 Weight
#> 1  0.800000 10.000000   0.25
#> 2 10.000000  0.800000   0.25
#> 3  5.000000  5.000000   0.25
#> 4  9.795318  9.807371   0.25

Three-factor augmentation

For three or more factors the candidate region is displayed as a scatter-plot matrix coloured by candidate/non-candidate status, with the current design shown as triangles.

init_3d <- data.frame(
  x1     = c(0.8, 10,  10,  0.8, 10),
  x2     = c(10,  0.8, 10,  10,  0.8),
  x3     = c(10,  10,  0.8, 0.8, 10),
  Weight = rep(0.2, 5)
)

region_3d <- get_augment_region(
  criterion           = "D-Optimality",
  init_design         = init_3d,
  alpha               = 0.45,
  model               = y ~ Vmax * x1 * x2 * x3 / ((K1+x1) * (K2+x2) * (K3+x3)),
  parameters          = c("Vmax", "K1", "K2", "K3"),
  par_values          = c(1, 1, 1, 1),
  design_space        = list(x1 = c(0.1, 10), x2 = c(0.1, 10), x3 = c(0.1, 10)),
  calc_optimal_design = FALSE,
  delta_val           = 0.93
)

#> ℹ 955 candidate points with efficiency >= 0.93 (from LHS sample of 2000)
cat("Number of candidate points:", nrow(region_3d$region), "\n")
#> Number of candidate points: 955
plot(region_3d$plot)
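The same recipe extends to three factors. The sketch below draws its own Latin Hypercube sample and scores each point against init_3d with alpha = 0.45 and delta_val = 0.93. It then draws a base-R pairs() plot in the spirit of the one above: candidates in blue, the rest in grey, design points as red triangles. It assumes the gradient is evaluated at Vmax = K1 = K2 = K3 = 1, as in par_values. The helper names are hypothetical, and because the sample differs, the candidate count need not match the 955 reported above.

```r
# Gradient of eta = Vmax*x1*x2*x3 / prod(Kj + xj) w.r.t. (Vmax, K1, K2, K3),
# evaluated at Vmax = 1 and K1 = K2 = K3 = 1 (the par_values above)
grad_mm3 <- function(X) {
  eta <- X[, 1] * X[, 2] * X[, 3] / ((1 + X[, 1]) * (1 + X[, 2]) * (1 + X[, 3]))
  cbind(Vmax = eta, K1 = -eta / (1 + X[, 1]),
        K2 = -eta / (1 + X[, 2]), K3 = -eta / (1 + X[, 3]))
}
info <- function(X, w) crossprod(grad_mm3(X) * sqrt(w))

X0 <- cbind(x1 = c(0.8, 10, 10, 0.8, 10),
            x2 = c(10, 0.8, 10, 10, 0.8),
            x3 = c(10, 10, 0.8, 0.8, 10))
w0 <- rep(0.2, 5)
M0 <- info(X0, w0)

# D-efficiency, relative to init_3d, of adding point x with weight a
aug_eff <- function(x, a = 0.45) {
  M <- info(rbind(X0, x), c((1 - a) * w0, a))
  (det(M) / det(M0))^(1 / ncol(M))
}

# Latin Hypercube sample of 2000 points in [0.1, 10]^3
set.seed(42)
n <- 2000
cand <- sapply(1:3, function(j) 0.1 + 9.9 * (sample.int(n) - runif(n)) / n)
colnames(cand) <- colnames(X0)

eff <- apply(cand, 1, aug_eff)
is_cand <- eff >= 0.93
sum(is_cand)

pairs(rbind(cand, X0),
      col = c(ifelse(is_cand, "steelblue", "grey80"), rep("red", nrow(X0))),
      pch = c(rep(20, n), rep(17, nrow(X0))))
```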

Augmenting with Ds-Optimality

When the goal is to augment while preserving estimation quality for a subset of parameters, use criterion = "Ds-Optimality" and pass par_int (here c(1), which selects Vmax):

region_ds <- get_augment_region(
  criterion           = "Ds-Optimality",
  init_design         = init_2d,
  alpha               = 0.25,
  model               = y ~ Vmax * x1 * x2 / ((K1 + x1) * (K2 + x2)),
  parameters          = c("Vmax", "K1", "K2"),
  par_values          = c(1, 1, 1),
  design_space        = list(x1 = c(0.1, 10), x2 = c(0.1, 10)),
  calc_optimal_design = FALSE,
  par_int             = c(1),
  delta_val           = 0.85,
  n_lhs               = 5000
)

#> ℹ 3429 candidate points with efficiency >= 0.85 (from LHS sample of 5000)

best_ds <- region_ds$region[which.max(region_ds$region$efficiency), ]

aug_ds <- augment_design(
  criterion           = "Ds-Optimality",
  init_design         = init_2d,
  alpha               = 0.25,
  model               = y ~ Vmax * x1 * x2 / ((K1 + x1) * (K2 + x2)),
  parameters          = c("Vmax", "K1", "K2"),
  par_values          = c(1, 1, 1),
  design_space        = list(x1 = c(0.1, 10), x2 = c(0.1, 10)),
  calc_optimal_design = FALSE,
  par_int             = c(1),
  delta_val           = 0.85,
  new_points          = data.frame(x1 = best_ds$x1, x2 = best_ds$x2, Weight = 1),
  n_lhs               = 5000
)

#> ℹ 3519 candidate points with efficiency >= 0.85 (from LHS sample of 5000)
#> Sample of candidate points:
#>           x1        x2 efficiency
#> 1  3.3964325 5.4299693  0.9315541
#> 2  6.0410709 8.9488797  2.0161594
#> 3  6.8519952 4.0525941  1.2072523
#> 4  9.2119714 3.2177457  1.1180525
#> 5  3.4640591 8.2549850  1.1462742
#> 6  0.6055815 5.3386685  0.9086082
#> 7  9.7117979 9.1212394  2.8738231
#> 8  5.3726760 0.8898514  0.8502996
#> 9  5.5504378 4.9880671  1.2583024
#> 10 9.4440087 5.9074944  2.0368302
#> 11 7.7283380 4.8989349  1.5359707
#> 12 4.5766800 3.3969859  0.8634664
#> 13 7.8255510 8.5522039  2.4019050
#> 14 8.0094956 8.9320509  2.5084203
#> 15 6.5955117 7.1806920  1.8838472
print(aug_ds)
#>          x1        x2 Weight
#> 1  0.800000 10.000000   0.25
#> 2 10.000000  0.800000   0.25
#> 3  5.000000  5.000000   0.25
#> 4  9.992458  9.957832   0.25
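The Ds criterion for the parameters in par_int is usually written as det(M) / det(M22), where M22 is the block of the information matrix for the remaining (nuisance) parameters. Equivalently, it is 1 / det of the par_int block of M^-1. The sketch below assumes optedr uses this standard definition, with the efficiency raised to 1/s for s = length(par_int). It reuses the closed-form gradient of the two-factor model at par_values = c(1, 1, 1). The helper names are hypothetical.

```r
# Ds criterion: det(M) / det(M22), with M22 the nuisance-parameter block
ds_crit <- function(M, par_int) {
  if (length(par_int) == ncol(M)) return(det(M))  # no nuisance parameters
  det(M) / det(M[-par_int, -par_int, drop = FALSE])
}
ds_eff <- function(M, M_ref, par_int) {
  (ds_crit(M, par_int) / ds_crit(M_ref, par_int))^(1 / length(par_int))
}

# Information matrix for the two-factor model at Vmax = K1 = K2 = 1
info <- function(d) {
  eta <- d$x1 * d$x2 / ((1 + d$x1) * (1 + d$x2))
  Fm <- cbind(Vmax = eta, K1 = -eta / (1 + d$x1), K2 = -eta / (1 + d$x2))
  crossprod(Fm * sqrt(d$Weight))
}

init_2d <- data.frame(x1 = c(0.8, 10, 5), x2 = c(10, 0.8, 5),
                      Weight = rep(1/3, 3))
scaled <- init_2d
scaled$Weight <- 0.75 * scaled$Weight  # 1 - alpha, with alpha = 0.25
aug <- rbind(scaled, data.frame(x1 = 9.992458, x2 = 9.957832, Weight = 0.25))

M0 <- info(init_2d)
M1 <- info(aug)
ds_eff(M1, M0, par_int = 1)  # Ds-efficiency for Vmax relative to init_2d
```

Computing the criterion through the determinant ratio avoids inverting M. The equivalent form via solve(M) is convenient as a cross-check.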

Interactive mode

Omitting new_points (and delta_val) from both functions triggers an interactive session where the package plots the candidate region and asks the user to type a point. This mode is documented in ?augment_design.