TuneConfig

Hydro args:
- scaling_num: The model width scaling number. Hydro supports switching between different tuning modes (Hydro (Scaling+Fusion) | Hydro (Fusion Only) | Ray (Classic HPO)) via this value. Specifically:
  - 0 = use Ray Tune (disable both scaling and fusion).
  - 1 = use Hydro with fusion only (disable scaling).
  - Any integer > 1 (preferably a power of 2) enables both scaling and fusion.
  Default value is 8.
- fusion_limit: User-defined maximum model fusion number. Only takes effect when scaling_num > 0.
  - 0 = disable fusion (scaling only).
  - 1 = similar to disabling fusion, but still replaces the original model with Hydro modules.
  - If set to None, Hydro automatically profiles and determines the actual fusion number according to GPU memory capacity.
  - If set to a positive integer, Hydro uses this value as the fusion limit.
  - If set to a dict, Hydro uses it as the fusion limit for different batch sizes (see the example below).
  Default is None.
  Note: if you want to tune the batch size as a hyperparameter, name it batch_size; other names (like bs, bsz, etc.) will not be recognized by Hydro.
- eager_transfer: The ratio of the maximum number of trials (num_samples) at which to start a target model trial. Must be in (0, 1]. 1 = disable eager transfer. Default value is 0.5.
- trial_compile: Whether to enable torch.compile() to further accelerate model training throughput. If enabled, Hydro does not support model checkpointing or multi-fidelity tuning algorithms. Default is False.
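
Taken together, the sketch below shows how these arguments combine in a TuneConfig. It is a minimal example, not the authoritative API: the import path is an assumption (adjust it to however Hydro is installed), and the dict form of fusion_limit is assumed to map batch size to fusion limit.

    from hydro.tune import TuneConfig  # assumed import path; adjust to your setup

    # Enable both scaling and fusion, let Hydro profile GPU memory to pick the
    # fusion number, and start the target model trial after half of num_samples.
    config = TuneConfig(
        metric="val_acc",
        mode="max",
        num_samples=64,
        scaling_num=8,        # > 1 enables scaling + fusion
        fusion_limit=None,    # None: auto-profile the fusion number
        eager_transfer=0.5,
        trial_compile=False,
    )

    # fusion_limit can also be a dict (assumed: batch size -> fusion limit),
    # e.g. fuse at most 16 models at batch_size 32 and 8 models at batch_size 64.
    per_bs_config = TuneConfig(
        metric="val_acc",
        mode="max",
        scaling_num=8,
        fusion_limit={32: 16, 64: 8},
    )
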
Source Code:
from dataclasses import dataclass
import datetime
from typing import Callable, Dict, Optional, Union

# Searcher, SearchAlgorithm, TrialScheduler, and Trial are provided by the
# surrounding Ray Tune / Hydro package; their imports are omitted in this excerpt.


@dataclass
class TuneConfig:
    """Tune specific configs.

    Args:
        ======================================================================
        Hydro args:
        scaling_num: The number of model width scaling. Hydro supports switching
            between different tuning modes ``Hydro (Scaling+Fusion) | Hydro
            (Fusion Only) | Ray (Classic HPO)`` via setting this value. Specifically,
            0 = Using Ray Tune (disable both scaling and fusion),
            1 = Using Hydro (fusion only, disable scaling),
            Any integer > 1 (preferably powers of 2) enables both scaling and fusion.
            Default value is 8.
        fusion_limit: User-defined maximum model fusion number. Only works when
            `scaling_num` > 0. Default is None.
            0 = Disabling fusion (scaling only).
            1 = Similar to disabling fusion, but will still replace the original
            model with Hydro modules.
            If set to None, Hydro will automatically profile and determine the actual
            fusion number according to GPU memory capacity.
            If set to a positive integer, Hydro will use this value as the fusion limit.
            If set to a dict, Hydro will use this dict as the fusion limit for
            different batch sizes.
        eager_transfer: The ratio of maximum trials (`num_samples`) to start a
            target model trial. Must be in (0, 1].
            1 = Disabling eager transfer. Default value is 0.5.
        trial_compile: Whether to enable torch.compile() to further accelerate model
            training throughput. If enabled, Hydro does not support model checkpointing
            or multi-fidelity tuning algorithms. Default is False.
        ======================================================================
        Ray args:
        metric: Metric to optimize. This metric should be reported
            with `tune.report()`. If set, will be passed to the search
            algorithm and scheduler.
        mode: Must be one of [min, max]. Determines whether objective is
            minimizing or maximizing the metric attribute. If set, will be
            passed to the search algorithm and scheduler.
        search_alg: Search algorithm for optimization. Defaults to
            random search.
        scheduler: Scheduler for executing the experiment.
            Choose among FIFO (default), MedianStopping,
            AsyncHyperBand, HyperBand and PopulationBasedTraining. Refer to
            ray.tune.schedulers for more options.
        num_samples: Number of times to sample from the
            hyperparameter space. Defaults to 1. If `grid_search` is
            provided as an argument, the grid will be repeated
            `num_samples` of times. If this is -1, (virtually) infinite
            samples are generated until a stopping condition is met.
        max_concurrent_trials: Maximum number of trials to run
            concurrently. Must be non-negative. If None or 0, no limit will
            be applied. This is achieved by wrapping the ``search_alg`` in
            a :class:`ConcurrencyLimiter`, and thus setting this argument
            will raise an exception if the ``search_alg`` is already a
            :class:`ConcurrencyLimiter`. Defaults to None.
        time_budget_s: Global time budget in
            seconds after which all trials are stopped. Can also be a
            ``datetime.timedelta`` object.
        reuse_actors: Whether to reuse actors between different trials
            when possible. This can drastically speed up experiments that start
            and stop actors often (e.g., PBT in time-multiplexing mode). This
            requires trials to have the same resource requirements.
            Defaults to ``True`` for function trainables (including most
            Ray AIR trainers) and ``False`` for class and registered trainables
            (e.g. RLlib).
        trial_name_creator: Optional function that takes in a Trial and returns
            its name (i.e. its string representation). Be sure to include some unique
            identifier (such as `Trial.trial_id`) in each trial's name.
            NOTE: This API is in alpha and subject to change.
        trial_dirname_creator: Optional function that takes in a trial and
            generates its trial directory name as a string. Be sure that some
            unique identifier (such as `Trial.trial_id`) is used in each trial's
            directory name. Otherwise, trials could overwrite artifacts and checkpoints
            of other trials. The return value cannot be a path.
            NOTE: This API is in alpha and subject to change.
        chdir_to_trial_dir: Whether to change the working directory of each worker
            to its corresponding trial directory. Defaults to `True` to prevent
            contention between workers saving trial-level outputs.
            If set to `False`, files are accessible with paths relative to the
            original working directory. However, all workers on the same node now
            share the same working directory, so be sure to use
            `session.get_trial_dir()` as the path to save any outputs.
    """

    # Currently this is not at feature parity with `tune.run`, nor should it be.
    # The goal is to reach a fine balance between API flexibility and conciseness.
    # We should carefully introduce arguments here instead of just dumping everything.
    mode: Optional[str] = None
    metric: Optional[str] = None
    search_alg: Optional[Union[Searcher, SearchAlgorithm]] = None
    scheduler: Optional[TrialScheduler] = None
    num_samples: int = 1
    max_concurrent_trials: Optional[int] = None
    time_budget_s: Optional[Union[int, float, datetime.timedelta]] = None
    reuse_actors: Optional[bool] = None
    trial_name_creator: Optional[Callable[[Trial], str]] = None
    trial_dirname_creator: Optional[Callable[[Trial], str]] = None
    chdir_to_trial_dir: bool = True
    scaling_num: int = 8
    fusion_limit: Optional[Union[int, Dict]] = None
    eager_transfer: float = 0.5
    trial_compile: bool = False
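
A usage sketch follows. Hydro's tuner entry point mirrors Ray Tune's Tuner API, but the specific names used here (hydro.Tuner, hydro.tune.loguniform, hydro.tune.choice) are assumptions made for illustration; check the actual package for the exact import paths.

    import hydro                      # assumed top-level package name
    from hydro import tune            # assumed module layout, modeled on ray.tune

    def train_fn(config):
        # Build the model and dataloaders from config["lr"] and config["batch_size"],
        # train, and report the metric named in TuneConfig ("val_acc" here).
        ...

    tuner = hydro.Tuner(              # assumed entry point, mirroring ray.tune.Tuner
        train_fn,
        param_space={
            "lr": tune.loguniform(1e-4, 1e-1),
            # The key must be exactly "batch_size" for Hydro to recognize it.
            "batch_size": tune.choice([32, 64, 128]),
        },
        tune_config=TuneConfig(       # the dataclass defined above
            metric="val_acc",
            mode="max",
            num_samples=32,
            scaling_num=8,
            eager_transfer=0.5,
        ),
    )
    results = tuner.fit()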