Getting started¶
In this walkthrough, we show how to tune an image classifier (e.g., a ResNet-18 model) on the CIFAR-10 dataset using Hydro.
Tip
Please refer to Ray Docs for more information if you are not familiar with Ray Tune.
Installation¶
To run this example, first install the Hydro package. Further installation instructions can be found here.
Import Libraries¶
Let's begin by importing the necessary modules:
Setup the Search Space¶
We define the search space with the Ray Tune API. Here is an example:
The tune.qloguniform(lower, upper, q) function samples across different orders of magnitude (uniformly in log space) and rounds the value to a multiple of q. For more search space functions, please refer to the Ray Tune Search Space API.
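To make the sampling behaviour concrete, here is a minimal standard-library sketch of quantized log-uniform sampling. It is not Ray Tune's implementation: the helper name sample_qloguniform, the clamping step, and the bounds used in the example are assumptions made for illustration only.

```python
import math
import random


def sample_qloguniform(lower, upper, q, rng=random):
    """Illustrative stand-in for quantized log-uniform sampling.

    Hypothetical helper, not part of Ray Tune: draw uniformly in log
    space, round to the nearest multiple of q, then clamp to the range.
    """
    if not (0 < lower <= upper) or q <= 0:
        raise ValueError("require 0 < lower <= upper and q > 0")
    # Uniform in log space gives each order of magnitude equal weight.
    log_value = rng.uniform(math.log(lower), math.log(upper))
    value = math.exp(log_value)
    # Quantize to a multiple of q.
    quantized = round(value / q) * q
    # Clamp so rounding cannot push the value outside [lower, upper].
    return min(max(quantized, lower), upper)


# Example: learning-rate-like values between 1e-4 and 1 in steps of 1e-4.
rng = random.Random(0)
samples = [sample_qloguniform(1e-4, 1.0, 1e-4, rng) for _ in range(5)]
print(samples)
```

Compared with plain uniform sampling, small values such as 0.001 are drawn about as often as values near 0.1, which is usually what we want for parameters like the learning rate.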
Load Dataset¶
We first load the CIFAR-10 dataset, using a FileLock to prevent multiple processes from downloading the same data at the same time.
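The idea behind guarding the download with a lock can be sketched with the standard library alone. The snippet below is a simplified stand-in, not the FileLock used in the example: exclusive_lock, load_once, the marker file names, and the download callable are all hypothetical names introduced for illustration.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path, poll_interval=0.05):
    """Hold an exclusive lock by atomically creating lock_path.

    Simplified stand-in for a real file lock, for illustration only.
    """
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            time.sleep(poll_interval)  # another process holds the lock
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def load_once(data_dir, download):
    """Run download(data_dir) only if the data is not already present."""
    marker = os.path.join(data_dir, ".complete")
    with exclusive_lock(os.path.join(data_dir, ".lock")):
        if not os.path.exists(marker):
            download(data_dir)  # hypothetical callable that fetches the data
            open(marker, "w").close()
    return data_dir


# Example: the second call finds the marker and skips the download.
calls = []
data_dir = tempfile.mkdtemp()
load_once(data_dir, calls.append)
load_once(data_dir, calls.append)
print(len(calls))
```

This sketch has no handling for stale locks left behind by a crashed process, which is one reason to prefer a dedicated lock implementation in practice.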
Train & Validation Function¶
To support the inter-trial fusion feature, we add fusion_num as an argument to the train and validation functions. We also need to add some code to resize specific tensors, as highlighted below.
Wrap Model, Optimizer and DataLoader¶
We need to wrap the model, optimizer, and DataLoader with the hydro.train API.
Configure Tuner¶
HydroTuner is the key interface for configuring a hyperparameter tuning job. Users can specify the maximum number of trials (num_samples), the maximum number of epochs (stop), the model scaling ratio (scaling_num), and the inter-trial fusion limit (fusion_limit).
Example Output¶
After tuning, we find the best-performing model and load the trained network from its checkpoint file. We then compute the test set accuracy and print the results.
If you run the code, an example output could look like this:
== Status ==
Current time: 2023-04-25 08:20:42 (running for 00:23:42.54)
Memory usage on this node: 25.5/251.5 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/64 CPUs, 0/4 GPUs, 0.0/157.96 GiB heap, 0.0/71.69 GiB objects (0.0/1.0 accelerator_type:G)
Current best trial: b38f6_T0001(target trial) with val_acc=0.9162 and parameters={'lr': 0.1102, 'momentum': 0.584, 'batch_size': 128, 'gamma': 0.14, 'dataset': 'cifar10', 'seed': 10, 'FUSION_N': 0, 'SCALING_N': 0}
Result logdir: ~/ray_results
Number of trials: 8/50 (8 TERMINATED)
+--------------------------+------------+----------------------+----------+------+----------------------+----------------------+----------------------+--------+------------------+--------------+---------------------+-----------------------+
| Trial name | status | loc | hydro | bs | gamma | lr | momentum | iter | total time (s) | _timestamp | _time_this_iter_s | _training_iteration |
|--------------------------+------------+----------------------+----------+------+----------------------+----------------------+----------------------+--------+------------------+--------------+---------------------+-----------------------|
| HydroTrainer_b38f6_T0001 | TERMINATED | 10.100.79.96:3657182 | Target | 128 | 0.14 | 0.1102 | 0.584 | 50 | 384.496 | 1682382041 | 7.71278 | 50 |
| HydroTrainer_b38f6_T0000 | TERMINATED | 10.100.79.96:3479472 | Target | 512 | 0.05 | 0.1827 | 0.846 | 50 | 279.453 | 1682381326 | 5.47435 | 50 |
| HydroTrainer_b38f6_F0000 | TERMINATED | 10.100.79.96:3223763 | F=9, S=8 | 256 | [0.74, 0.38, 0._df80 | [0.0204, 0.4689_6d40 | [0.584, 0.857, _dec0 | 50 | 427.166 | 1682381050 | 8.64703 | 50 |
| HydroTrainer_b38f6_F0001 | TERMINATED | 10.100.79.96:3223967 | F=8, S=8 | 256 | [0.32, 0.31, 0._fb80 | [0.013600000000_23c0 | [0.507, 0.69400_6500 | 50 | 415.149 | 1682381041 | 9.09435 | 50 |
| HydroTrainer_b38f6_F0002 | TERMINATED | 10.100.79.96:3223968 | F=9, S=8 | 512 | [0.46, 0.28, 0._2b00 | [0.004500000000_f080 | [0.615, 0.659, _2180 | 50 | 382.011 | 1682381008 | 7.61978 | 50 |
| HydroTrainer_b38f6_F0003 | TERMINATED | 10.100.79.96:3223969 | F=8, S=8 | 512 | [0.04, 0.4, 0.0_eb40 | [0.0011, 0.1303_eec0 | [0.65, 0.791, 0_fe00 | 50 | 357.47 | 1682380984 | 6.90358 | 50 |
| HydroTrainer_b38f6_F0004 | TERMINATED | 10.100.79.96:3451196 | F=8, S=8 | 128 | [0.14, 0.54, 0._8280 | [0.1102, 0.2675_a200 | [0.584, 0.675, _8400 | 50 | 773.026 | 1682381761 | 15.4453 | 50 |
| HydroTrainer_b38f6_F0005 | TERMINATED | 10.100.79.96:3464377 | F=8, S=8 | 128 | [0.42, 0.54, 0._a140 | [0.307100000000_e880 | [0.981, 0.81800_8f40 | 50 | 737.031 | 1682381749 | 14.6641 | 50 |
+--------------------------+------------+----------------------+----------+------+----------------------+----------------------+----------------------+--------+------------------+--------------+---------------------+-----------------------+
See More PyTorch Examples¶
vision: Image Classification Example¶
- Tuning ResNet-18 on CIFAR-10 dataset using Hydro.
- The original Ray Tune script for reference.
language: Language Modeling Example¶
- Tuning HuggingFace GPT-2 on WikiText dataset using Hydro. To be compatible with most machines, we set n_layer=2 by default.
- The original Ray Tune script for reference.