# Space Compression

In high-dimensional hyperparameter optimization, the search space often contains many parameters,
but only a subset of them significantly influences the objective function.
**Space compression** reduces the effective dimensionality of the search space through a
pipeline of compression steps, making Bayesian optimization more efficient.

OpenBox provides a built-in compression framework that supports three categories of
compression: **dimension selection**, **range compression**, and **projection**.

## Quick Example

The simplest way to enable space compression is to set `compressor_type` when creating an
`Advisor` or `Optimizer`.

```python
from openbox import Advisor, space as sp

cs = sp.Space()
for i in range(20):
    cs.add_variable(sp.Real(f'x{i}', -10, 10))

advisor = Advisor(
    config_space=cs,
    num_objectives=1,
    compressor_type='llamatune',
    compressor_kwargs={
        'adapter_alias': 'rembo',
        'le_low_dim': 5,
        'max_num_values': 50,
    },
)
```

## Compression Pipeline

Compression is organized as a **pipeline** of ordered steps. Each step belongs to one of three
categories and is identified by a short string.

### Dimension Selection Steps

Select a subset of parameters by importance.

| String | Step Class | Description |
|--------|-----------|-------------|
| `d_shap` | `SHAPDimensionStep` | SHAP-based feature importance selection. Supports transfer learning. |
| `d_corr` | `CorrelationDimensionStep` | Spearman/Pearson correlation-based selection. Supports transfer learning. |
| `d_expert` | `ExpertDimensionStep` | Expert-specified parameter selection. |
| `d_adaptive` | `AdaptiveDimensionStep` | Adaptively adjusts the number of selected parameters during optimization. |
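
As a rough illustration of importance-based selection, the sketch below ranks parameters by the absolute Spearman correlation between their values and the objective, then keeps the top `k`. It is a plain-Python stand-in and not the code behind `CorrelationDimensionStep`; the helper names (`ranks`, `pearson`, `spearman`, `select_topk`) and the toy objective are hypothetical.

```python
import random

def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5

def spearman(a, b):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

def select_topk(configs, objectives, names, topk):
    """Keep the `topk` parameter names with the largest |Spearman| score."""
    scores = {
        name: abs(spearman([c[name] for c in configs], objectives))
        for name in names
    }
    return sorted(names, key=lambda name: scores[name], reverse=True)[:topk]

rng = random.Random(0)
names = [f'x{i}' for i in range(6)]
configs = [{n: rng.uniform(-10, 10) for n in names} for _ in range(300)]
# Toy objective: only x0 and x3 influence the result.
objectives = [3 * c['x0'] - 2 * c['x3'] for c in configs]

selected = select_topk(configs, objectives, names, topk=2)
print(sorted(selected))
```

Taking the absolute value matters here: `x3` is negatively correlated with the objective but is just as informative as a positively correlated parameter.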

### Range Compression Steps

Narrow parameter value ranges to high-value regions.

| String | Step Class | Description |
|--------|-----------|-------------|
| `r_boundary` | `BoundaryRangeStep` | Mean ± σ boundary-based compression. |
| `r_shap` | `SHAPBoundaryRangeStep` | SHAP-weighted boundary compression. Supports transfer learning. |
| `r_kde` | `KDEBoundaryRangeStep` | KDE-based range compression. Supports transfer learning. |
| `r_expert` | `ExpertRangeStep` | Expert-specified parameter ranges. |

### Projection Steps

Transform the parameter space into a lower-dimensional representation.

| String | Step Class | Description |
|--------|-----------|-------------|
| `p_quant` | `QuantizationProjectionStep` | Integer quantization for large-range integer parameters. |
| `p_rembo` | `REMBOProjectionStep` | Random Embedding Bayesian Optimization. |
| `p_hesbo` | `HesBOProjectionStep` | Hashing-Enhanced Subspace Bayesian Optimization. |
| `p_kpca` | `KPCAProjectionStep` | Kernel PCA projection. |
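
To make the two random projections more concrete, here is a minimal sketch of how a low-dimensional point can be mapped up to the full space. It follows the usual descriptions of REMBO (a random Gaussian matrix followed by clipping) and HesBO (each original dimension copies one low-dimensional coordinate with a random sign). The factory functions `make_rembo` and `make_hesbo` and the rescaling helper are hypothetical names, and the code is not taken from `REMBOProjectionStep` or `HesBOProjectionStep`.

```python
import random

def make_rembo(high_dim, low_dim, seed=0):
    """Return a function mapping y in [-1, 1]^low_dim to x in [-1, 1]^high_dim."""
    rng = random.Random(seed)
    # Random embedding matrix with standard normal entries.
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(low_dim)]
              for _ in range(high_dim)]

    def project_up(y):
        x = [sum(a * v for a, v in zip(row, y)) for row in matrix]
        # Clip to the unit box so every point stays inside the bounds.
        return [max(-1.0, min(1.0, v)) for v in x]

    return project_up

def make_hesbo(high_dim, low_dim, seed=0):
    """Return a function that copies signed low-dim coordinates into high-dim."""
    rng = random.Random(seed)
    bucket = [rng.randrange(low_dim) for _ in range(high_dim)]
    sign = [rng.choice((-1.0, 1.0)) for _ in range(high_dim)]

    def project_up(y):
        return [sign[i] * y[bucket[i]] for i in range(high_dim)]

    return project_up

def to_original_range(x_unit, low=-10.0, high=10.0):
    """Rescale from [-1, 1] to the parameter bounds."""
    return [low + (v + 1.0) / 2.0 * (high - low) for v in x_unit]

y = [0.5, -0.25, 0.0, 1.0, -1.0]  # a point in the 5-dimensional search space
x_rembo = to_original_range(make_rembo(20, 5, seed=42)(y))
x_hesbo = to_original_range(make_hesbo(20, 5, seed=42)(y))
print(len(x_rembo), len(x_hesbo))
```

The optimizer only ever searches over `y`; the 20 original parameters are derived from it before evaluation. HesBO needs no clipping because every output coordinate is a signed copy of an input coordinate.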

## Using `compressor_type` Shortcuts

OpenBox provides several shortcut `compressor_type` values that automatically construct a
pipeline.

### `'llamatune'` — Quantization + Projection

Inspired by the [LlamaTune](https://www.vldb.org/pvldb/vol15/p2953-kanellis.pdf) method.
Combines quantization with optional REMBO or HesBO projection.

```python
advisor = Advisor(
    config_space=cs,
    num_objectives=1,
    compressor_type='llamatune',
    compressor_kwargs={
        'adapter_alias': 'rembo',  # 'rembo', 'hesbo', or 'none'
        'le_low_dim': 5,           # low-dimensional target dimension
        'max_num_values': 50,      # max discrete values for quantization
    },
)
```

+ `adapter_alias`: Projection method. `'rembo'` for REMBO, `'hesbo'` for HesBO,
  or `'none'` for quantization only.
+ `le_low_dim`: Target dimensionality of the projected space.
+ `max_num_values`: Maximum number of discrete values for integer parameter quantization.
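
The effect of `max_num_values` can be pictured with a small sketch: an integer parameter with more admissible values than the limit is searched on an evenly spaced grid instead. The function `quantize_range` is a hypothetical helper written for this page, and the even spacing is an assumption; the exact rule used by `QuantizationProjectionStep` may differ.

```python
def quantize_range(lower, upper, max_num_values):
    """Return the integer values that remain searchable after quantization."""
    if max_num_values < 2:
        raise ValueError('max_num_values must be at least 2')
    num_values = upper - lower + 1
    if num_values <= max_num_values:
        # Small ranges are left untouched.
        return list(range(lower, upper + 1))
    # Spread max_num_values points evenly over [lower, upper].
    step = (upper - lower) / (max_num_values - 1)
    return [lower + round(i * step) for i in range(max_num_values)]

grid = quantize_range(0, 1_000_000, max_num_values=50)
small = quantize_range(1, 8, max_num_values=50)
print(len(grid), grid[:3], grid[-1])
print(small)
```

A parameter with a million possible values is reduced to 50 candidates, while a parameter that already has fewer values than the limit keeps its full range.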

### `'shap'` — SHAP Dimension Selection + Boundary Range

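Uses SHAP-based importance to select parameters and optionally compresses their ranges.
Requires `transfer_learning_history` to compute SHAP importances from historical data.
In the example below, `source_histories` stands for the histories collected from earlier
source tasks.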

```python
advisor = Advisor(
    config_space=cs,
    num_objectives=1,
    transfer_learning_history=source_histories,
    compressor_type='shap',
    compressor_kwargs={
        'topk': 10,
        'top_ratio': 0.8,
        'sigma': 2.0,
    },
)
```

+ `topk`: Number of top important parameters to keep.
+ `top_ratio`: Fraction of best configurations used for boundary estimation.
+ `sigma`: Width of the boundary (mean ± sigma × std).
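
The interaction of `top_ratio` and `sigma` can be sketched for a single parameter as follows. The sketch assumes that lower objective values are better, that the population standard deviation is used, and that the result is clipped to the original bounds; `boundary_range` is a hypothetical helper, not the implementation of `BoundaryRangeStep`.

```python
import statistics

def boundary_range(values, objectives, lower, upper, top_ratio=0.8, sigma=2.0):
    """Return (new_lower, new_upper) as mean ± sigma * std of the best values."""
    # Sort observation indices from best (lowest objective) to worst.
    order = sorted(range(len(values)), key=lambda i: objectives[i])
    n_top = max(2, int(len(values) * top_ratio))
    top = [values[i] for i in order[:n_top]]
    mean = statistics.fmean(top)
    std = statistics.pstdev(top)
    # Never extend beyond the original bounds.
    return max(lower, mean - sigma * std), min(upper, mean + sigma * std)

# Toy history: one parameter sampled on a grid, with its optimum at x = 1.
values = [float(x) for x in range(-10, 11)]
objectives = [(x - 1.0) ** 2 for x in values]

new_lower, new_upper = boundary_range(
    values, objectives, lower=-10.0, upper=10.0, top_ratio=0.5, sigma=1.0
)
print(new_lower, new_upper)
```

A smaller `top_ratio` focuses the estimate on the best configurations, and a smaller `sigma` tightens the interval around their mean.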

### `'expert'` — Expert Knowledge Dimension Selection

Selects parameters specified by domain experts.

```python
advisor = Advisor(
    config_space=cs,
    num_objectives=1,
    compressor_type='expert',
    compressor_kwargs={
        'expert_params': ['x0', 'x3', 'x7'],
        'top_ratio': 0.9,
        'sigma': 2.0,
    },
)
```

+ `expert_params`: Names of the parameters to keep.

## Building a Custom Pipeline

For full control, use `compressor_type='pipeline'`: list the steps in order under
`step_strings` and pass per-step arguments through `step_params`.

```python
advisor = Advisor(
    config_space=cs,
    num_objectives=1,
    transfer_learning_history=source_histories,
    compressor_type='pipeline',
    compressor_kwargs={
        'step_strings': ['d_shap', 'r_boundary', 'p_rembo'],
        'step_params': {
            'd_shap': {'topk': 10},
            'r_boundary': {'top_ratio': 0.8, 'sigma': 2.0},
            'p_rembo': {'low_dim': 5, 'seed': 42},
        },
    },
)
```

You can also pass pre-built step objects via the `steps` key:

```python
from openbox.compressor import SHAPDimensionStep, BoundaryRangeStep

steps = [
    SHAPDimensionStep(strategy='shap', topk=10),
    BoundaryRangeStep(method='boundary', top_ratio=0.8, sigma=2.0),
]

advisor = Advisor(
    config_space=cs,
    num_objectives=1,
    transfer_learning_history=source_histories,
    compressor_type='pipeline',
    compressor_kwargs={'steps': steps},
)
```

## Using a Pre-built Compressor

You can also construct a `Compressor` instance directly and pass it to the `Advisor` via the
`compressor` argument. Steps such as `SHAPDimensionStep` and `KDEBoundaryRangeStep` need source
histories for importance and range estimation, so pass `transfer_learning_history` as you would
for the `'shap'` shortcut.

```python
from openbox.compressor import Compressor, SHAPDimensionStep, KDEBoundaryRangeStep

steps = [
    SHAPDimensionStep(strategy='shap', topk=10),
    KDEBoundaryRangeStep(top_ratio=0.8, kde_coverage=0.6),
]
compressor = Compressor(config_space=cs, steps=steps)

advisor = Advisor(
    config_space=cs,
    num_objectives=1,
    transfer_learning_history=source_histories,
    compressor=compressor,
)
```

## Space Concepts

When space compression is active, OpenBox works with multiple spaces:

+ **Original space** — the full configuration space as defined by the user.
+ **Sample space** — the space used by the acquisition optimizer to propose new configurations.
+ **Surrogate space** — the space used for surrogate model training and prediction; the final
  output of the pipeline.
+ **Unprojected space** — the space before the first projection step, used to map projected
  configurations back to the original space for evaluation.

```text
Original space
    ↓ [Dimension Selection]
Dimension-reduced space
    ↓ [Range Compression]
Range-compressed space
    ↓ [Projection]
Compressed space
    ├── Sample space: for generating new configurations
    └── Surrogate space: for model training
```
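
The round trip between the original space and a dimension-reduced space can be sketched with plain dictionaries. This sketch assumes that parameters dropped by dimension selection are filled with their default values before evaluation; the helpers `to_compressed` and `to_original` are hypothetical and do not reproduce the `Compressor` internals.

```python
def to_compressed(config, selected):
    """Keep only the parameters chosen by dimension selection."""
    return {name: config[name] for name in selected}

def to_original(compressed, defaults):
    """Rebuild a full configuration; dropped parameters take their defaults."""
    full = dict(defaults)
    full.update(compressed)
    return full

# Original space: 20 parameters, each with default 0.0 (illustrative values).
defaults = {f'x{i}': 0.0 for i in range(20)}
selected = ['x0', 'x3', 'x7']

# A suggestion proposed in the compressed space.
suggestion = {'x0': 1.5, 'x3': -4.2, 'x7': 9.0}

full_config = to_original(suggestion, defaults)       # passed to the objective
round_trip = to_compressed(full_config, selected)     # seen by the surrogate
print(len(full_config), round_trip)
```

The objective function always receives a configuration from the original space, while the surrogate model only sees the compressed representation.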

## Adaptive Compression

The `AdaptiveDimensionStep` can dynamically adjust the number of selected parameters during
optimization based on an update strategy.

```python
advisor = Advisor(
    config_space=cs,
    num_objectives=1,
    compressor_type='pipeline',
    compressor_kwargs={
        'step_strings': ['d_adaptive'],
        'step_params': {
            'd_adaptive': {
                'importance_calculator': 'shap',
                'update_strategy': 'periodic',
                'update_strategy_kwargs': {'period': 10},
                'initial_topk': 15,
                'reduction_ratio': 0.2,
                'min_dimensions': 5,
            },
        },
    },
)
```

To trigger adaptive updates, call `advisor.update_compression` after each observation in the
optimization loop:

```python
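from openbox import Observation

max_iter = 50  # total number of optimization iterations (example value)

for i in range(max_iter):
    config = advisor.get_suggestion()
    result = objective(config)  # user-defined; returns a dict with an 'objectives' entry
    observation = Observation(config=config, objectives=result['objectives'])
    advisor.update_observation(observation)

    # Let the compressor re-evaluate its policy using the history so far.
    updated = advisor.update_compression(advisor.history)
    if updated:
        print(f'Compression policy updated at iteration {i}')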
```

### Update Strategies

| Strategy | Behavior |
|----------|----------|
| `'periodic'` | Reduce dimensions every N iterations. |
| `'stagnation'` | Increase dimensions when optimization stagnates. |
| `'improvement'` | Reduce dimensions after consecutive improvements. |
| `'hybrid'` | Combines periodic, stagnation, and improvement strategies. |
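
As an illustration of the `'periodic'` row, the sketch below shrinks the number of selected parameters every `period` iterations until a floor is reached. It reuses the parameter names from the `d_adaptive` example, but the reading of `reduction_ratio` as "fraction of the current dimensions to drop" is an assumption, and `periodic_topk` is a hypothetical helper rather than the logic inside `AdaptiveDimensionStep`.

```python
def periodic_topk(iteration, current_topk, period=10,
                  reduction_ratio=0.2, min_dimensions=5):
    """Return the number of dimensions to keep after this iteration."""
    if iteration == 0 or iteration % period != 0:
        # Not an update iteration: keep the current setting.
        return current_topk
    reduced = int(current_topk * (1 - reduction_ratio))
    # Never go below the configured floor.
    return max(min_dimensions, reduced)

topk = 15  # corresponds to initial_topk in the example above
schedule = []
for iteration in range(1, 51):
    topk = periodic_topk(iteration, topk)
    if iteration % 10 == 0:
        schedule.append(topk)

print(schedule)
```

The other strategies would replace the fixed period with a trigger derived from the optimization history, such as a run of iterations without improvement.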

## Integration with Transfer Learning

Space compression works naturally with transfer learning. When `transfer_learning_history`
is provided, the compressor uses the source task data to:

1. Compute parameter importances (for SHAP/Correlation-based dimension selection).
2. Estimate value ranges (for boundary/KDE range compression).
3. Transform source task histories to the compressed space for the surrogate model.

```python
advisor = Advisor(
    config_space=cs,
    num_objectives=1,
    transfer_learning_history=source_histories,
    surrogate_type='tlbo_rgpe_gp',
    compressor_type='shap',
    compressor_kwargs={'topk': 10, 'top_ratio': 0.8},
)
```