# OpenBox: A Generalized Black-box Optimization Service (1)
## Introduction
Recent years have witnessed the rapid development of artificial intelligence and machine learning. Machine learning models have been widely applied to solve practical problems, such as data prediction and analysis, face recognition, product recommendation, etc. However, the application of machine learning is not easy, as the choice of hyperparameter configurations has a great impact on the model performance. Hyperparameter optimization, which is a typical example of black-box optimization, has become one of the important challenges of machine learning. There is no analytical form or gradient information of the optimization objective, and the evaluation is expensive. Its goal is to find the global optimum within a limited budget of evaluation times.
OpenBox is an open-source system designed for black-box optimization. Based on Bayesian optimization, OpenBox can solve black-box optimization problems efficiently. It supports not only traditional single objective black-box optimization problems (e.g., hyperparameter optimization), but also multi-objective optimization, optimization with constraints, multiple parameter types, transfer learning, distributed parallel evaluation, multi-fidelity optimization, etc. Moreover, OpenBox provides both standalone local usage and online optimization service. Users can monitor and manage the optimization tasks on web pages, and also deploy their private optimization service.
In this article, we will first introduce black-box optimization, and then OpenBox -- our generalized open source black-box optimization service.
## What is Black-box Optimization?
In white-box optimization, the specific form of the problem is known. For example, we can solve linear regression by analytic formula, or we can optimize deep neural networks by their gradient information. However, in black-box optimization, the objective function has no analytical form so that information such as the derivative of the objective function is unavailable. We cannot use the characteristics of the optimization objective to obtain its global optimum.
The figure above is a schematic diagram of a black-box function. The black-box function can be viewed in terms of its inputs and outputs, without any knowledge of its internal workings. We can only continuously input data into the black-box function, and then use the outputs to guess the structure information.
Taking hyperparameter optimization of machine learning as an example, our goal is to find a hyperparameter configuration that makes the machine learning model perform best. Therefore, the input of the black-box function is a hyperparameter configuration of the model, and the output is the performance evaluation of the model by training the model using this hyperparameter configuration and making predictions. The relationship between the model performance and the hyperparameters can not be described by specific expressions.
In addition to automatic hyperparameter tuning, black-box optimization has a wide range of applications in many fields, such as automatic A/B testing, experimental design, knobs tuning in database, processor architecture and circuit design, resource allocation, automatic chemical design, etc. (as shown in the figure below).
### Grid Search & Random Search
Grid search and random search are two naive methods to solve the black-box optimization problem. Grid search is also known as full factorial design. The user specifies a finite set of values for each hyperparameter, and grid search evaluates the Cartesian product of these sets. Obviously, grid search suffers from the curse of dimensionality, that is, with the increase of the number of hyperparameters, the number of evaluation times required increases exponentially.
Compared with grid search, random search is a more effective method. It will continuously sample hyperparameter randomly and evaluate it within a given resource (e.g., time) constraint. If there are unimportant hyperparameters, when performing grid search, we fix the value of important hyperparameters and try different values of unimportant hyperparameters, resulting in a small difference of evaluation results and low efficiency. However, random search avoids this problem and can search more different objective values corresponding to different important hyperparameters. The figure below shows the difference between grid search and random search.
The above figure is an example of Bayesian Optimization in one-dimensional input space. The three diagrams from top to bottom display the sequential optimization process. The black dotted line in the figure represents the real black-box function. In the initial case, two historical observations are used to train the probabilistic surrogate model. The predicted mean is represented by the black solid line, and the predicted variance (i.e. uncertainty) is represented by the blue area. The commonly used surrogate models are Gaussian Process, Random Forest, Tree-structured Parzen Estimator (TPE), etc.
According to the surrogate model, the acquisition function (orange curve in the figure) suggests the most promising candidate input parameters. The acquisition function needs to balance exploration and exploitation, that is, to choose the candidate with higher uncertainty or better predictive performance. Compared with the real black-box function, the evaluation cost of the acquisition function is much lower, so it can be fully optimized. Common acquisition functions include Expected Improvement, Lower Confidence Bound, Probability of Improvement, etc.
After optimizing the acquisition function, we get the candidate input point with the highest acquisition value (X on the orange curve). Then we evaluate this input point on the black-box function and get a new observation. We retrain the surrogate model for the next round of optimization.
## What is OpenBox?
OpenBox is an efficient generalized open-source system for black-box optimization. Its design satisfies the following desiderata:
+ Ease of use: Minimal user effort, and user-friendly visualization for tracking and managing BBO tasks.
+ Consistent performance: Host state-of-the-art optimization algorithms; choose the proper algorithm automatically.
+ Resource-aware management: Give cost-model-based advice to users, e.g., minimal workers or time-budget.
+ Scalability: Scale to dimensions on the number of input variables, objectives, tasks, trials, and parallel evaluations.
+ High efficiency: Effective use of parallel resources, system optimization with transfer-learning and multi-fidelities, etc.
+ Fault tolerance, extensibility, and data privacy protection.
OpenBox is coded with Python. You can visit our project here:
+ Multiple parameter types (FIOC, i.e Float, Integer, Ordinal, and Categorical): input parameters are not limited to float type (real number). For example, the kernel function of Supported Vector Machine is represented by a categorical hyperparameter. Using integer instead of ordinal or categorical type will attach an additional order relationship to parameters, which is not conducive to model optimization.
+ Multi-objective Optimization: Simultaneously optimize multiple different or even conflicting objectives, such as simultaneously optimize the accuracy of the machine learning model and model training/prediction time.
+ Optimization with Constraints: The (black-box) condition should be satisfied while optimizing the objective.
Few existing systems can support the above features at the same time. OpenBox supports not only all the above features but also:
+ Using historical task information to guide the current optimization task, namely transfer learning.
+ Provide parallel optimization algorithm. Support distributed evaluation. Make full use of parallel resources.
+ Multi-fidelity optimization algorithm is provided to further accelerate the search in high evaluation cost scenarios (such as training machine learning models on large datasets).
We will introduce how to use OpenBox in different scenarios in future articles.
### Various Optimization Strategies
OpenBox uses the Bayesian optimization algorithm based on the Random Forest surrogate model by default, which shows outstanding performance on hyperparameter optimization tasks. Compared with the SMAC3 library that uses the same algorithm, OpenBox adopts more optimization strategies, which leads to faster convergence and further improves the performance.
For Bayesian optimization, OpenBox also supports:
+ Gaussian Process surrogate model
+ Tree-structured Parzen Estimator (TPE) model
+ Various acquisition functions, such as EI, PI, LCB, EIC, EHVI, MESMO, USeMO, etc.
Users can choose the optimization strategy based on their needs.
### Usages
Among the existing systems, Google Vizier provides a service for hyperparameter optimization. Unlike the other algorithm libraries, users do not need to deploy the system and run optimization algorithms. They only need to interact with the service to get configuration suggestions, evaluate and update observations. However, Google Vizier is an internal service of Google and is not open-sourced.
Instead, OpenBox provides an online optimization service. Users may deploy the service on their servers using open source code to meet their privacy requirements.
So far, OpenBox supports all the platforms (Linux, macOS, Win10), and there are two ways to use OpenBox: the standalone Python package and distributed BBO service.
+ Standalone Python package: Users can use the black-box optimization algorithm by installing the Python package locally.
+ Distributed BBO service: Users can access the OpenBox service through API, get configuration suggestions from the server, and update observations to the server after evaluating the configuration performance (e.g., training machine learning model and making predictions). Users can monitor and manage the optimization process by visiting the service webpage.
OpenBox is open-source on GitHub: 
