quarticgym.envs package
Subpackages
Submodules
quarticgym.envs.atropineenv module
AtropineEnv simulates an atropine production environment.
- class quarticgym.envs.atropineenv.AtropineEnvGym(dense_reward=True, normalize=True, debug_mode=False, action_dim=4, reward_function=None, done_calculator=None, observation_name=None, action_name=None, np_dtype=<class 'numpy.float32'>, max_steps=60, error_reward=-100000.0, x0_loc='https://raw.githubusercontent.com/Quarticai/QuarticGym/master/quarticgym/datasets/atropineenv/x0.txt', z0_loc='https://raw.githubusercontent.com/Quarticai/QuarticGym/master/quarticgym/datasets/atropineenv/z0.txt', model_loc='https://github.com/Quarticai/QuarticGym/blob/master/quarticgym/datasets/atropineenv/model.npy?raw=true', uss_subtracted=True, reward_on_ess_subtracted=False, reward_on_steady=True, reward_on_absolute_efactor=False, reward_on_actions_penalty=0.0, reward_on_reject_actions=True, relaxed_max_min_actions=False, observation_include_t=True, observation_include_action=False, observation_include_uss=True, observation_include_ess=True, observation_include_e=True, observation_include_kf=True, observation_include_z=True, observation_include_x=False)[source]
Bases: quarticgym.envs.utils.QuarticGymEnvBase
- action_space: spaces.Space[ActType]
- done_calculator_standard(current_observation, step_count, reward, done=None, done_info=None)[source]
Checks whether the current episode is considered finished.
Returns a boolean indicating whether the episode is done, together with a dictionary of details. In done_calculator_standard, done_info looks like {"terminal": boolean, "timeout": boolean}, where "timeout" is true when the episode ends because the maximum episode length was reached, and "terminal" is true when "timeout" is true or when the episode ends because of a termination condition such as an environment error (in effect, "terminal" is the same as done).
- Parameters
- Returns
done and done_info.
- Return type
- evaluate_rewards_mean_std_over_episodes(algorithms, num_episodes=1, error_reward=None, initial_states=None, to_plt=True, plot_dir='./plt_results', computer_on_episodes=False)[source]
Returns the mean and std of rewards over all episodes. Because rewards_list is not aligned (for example, some trajectories are shorter than others), it cannot be converted directly to a numpy array; the nested list has to be converted and unwrapped first. If computer_on_episodes is true, rewards_list is first averaged over episodes and the mean and std are then computed. Otherwise, the mean and std are computed directly on the per-step rewards.
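The two aggregation modes can be illustrated with a minimal sketch. It is not the library's implementation: it assumes that averaging over episodes means one mean per episode, that the other mode pools every step reward, and that the population standard deviation is wanted. The function name and the plain-list input are hypothetical.

```python
from statistics import mean, pstdev


def rewards_mean_std(rewards_list, computer_on_episodes=False):
    """rewards_list[j][t] is the reward of episode j at step t (ragged)."""
    if computer_on_episodes:
        # Assumed reading: reduce each episode to its average reward first.
        values = [mean(episode) for episode in rewards_list if episode]
    else:
        # Unwrap the nested list so every step reward counts once.
        values = [r for episode in rewards_list for r in episode]
    return mean(values), pstdev(values)


# Two episodes of different length, so the list is not rectangular.
episodes = [[1.0, 1.0, 1.0, 1.0], [3.0, 3.0]]
per_episode = rewards_mean_std(episodes, computer_on_episodes=True)
per_step = rewards_mean_std(episodes)
```

The two modes generally disagree when episodes differ in length, because the pooled mode weights long episodes more heavily.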
- evalute_algorithms(algorithms, num_episodes=1, error_reward=None, initial_states=None, to_plt=True, plot_dir='./plt_results')[source]
When executing evalute_algorithms, self.normalize should be False.
- Parameters
algorithms – list of (algorithm, algorithm_name, normalize). Each algorithm must have a method predict(observation) -> action: np.ndarray.
num_episodes – number of episodes to run.
error_reward – overrides self.error_reward.
initial_states – None, the location of a numpy file of initial states, or a (numpy) list of initial states.
to_plt – whether to generate plots.
plot_dir – None or the directory in which to save plots.
- Returns
list of average rewards over each episode, and the number of episodes.
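The algorithms argument only requires objects that expose predict(observation). A minimal sketch of such an object and of the expected tuple list follows. The ConstantController class, its action values, and the names are hypothetical, and plain lists stand in for the np.ndarray that a real algorithm would return.

```python
class ConstantController:
    """Hypothetical baseline that always returns the same action."""

    def __init__(self, action):
        self.action = list(action)

    def predict(self, observation):
        # The observation is ignored; a real algorithm would use it.
        return list(self.action)


# Each entry is (algorithm, algorithm_name, normalize), as documented.
# Four action components match the default action_dim=4 of AtropineEnvGym.
algorithms = [
    (ConstantController([0.0, 0.0, 0.0, 0.0]), "zeros", False),
    (ConstantController([0.1, 0.1, 0.1, 0.1]), "small_constant", False),
]

action = algorithms[0][0].predict([1.0, 2.0, 3.0])
```

A list built this way would then be passed as the first argument of evalute_algorithms.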
- observation_space: spaces.Space[ObsType]
- reset(initial_state=None, normalize=None)[source]
Required by gym, this function resets the environment and returns an initial observation.
- reward_function_standard(previous_observation, action, current_observation, reward=None)[source]
Calculates the reward for one step of the s, a, r, s, a sequence (state, action, reward, next state, next action).
- Parameters
previous_observation ([np.ndarray]) – The denormalized previous observation.
action ([np.ndarray]) – The denormalized action.
current_observation ([np.ndarray]) – The denormalized current observation.
reward ([float]) – If provided, this value is returned directly as the reward.
- Returns
reward.
- Return type
[float]
- class quarticgym.envs.atropineenv.AtropineMPC(model_loc='https://github.com/Quarticai/QuarticGym/blob/master/quarticgym/datasets/atropineenv/model.npy?raw=true', N=30, Nx=2, Nu=4, uss_subtracted=True, reward_on_ess_subtracted=False, reward_on_steady=True, reward_on_absolute_efactor=False, reward_on_actions_penalty=0.0, reward_on_reject_actions=True, relaxed_max_min_actions=False, observation_include_t=True, observation_include_action=False, observation_include_uss=True, observation_include_ess=True, observation_include_e=True, observation_include_kf=True, observation_include_z=True, observation_include_x=False)[source]
Bases: object
quarticgym.envs.beerfmtenv module
BeerFMT simulates the beer fermentation process.
- class quarticgym.envs.beerfmtenv.BeerFMTEnvGym(dense_reward=True, normalize=True, debug_mode=False, action_dim=1, observation_dim=8, reward_function=None, done_calculator=None, max_observations=[15, 15, 15, 150, 150, 10, 10, 200], min_observations=[0, 0, 0, 0, 0, 0, 0, 0], max_actions=[16.0], min_actions=[9.0], observation_name=['X_A', 'X_L', 'X_D', 'S', 'EtOH', 'DY', 'EA', 'time'], action_name=['temperature'], np_dtype=<class 'numpy.float32'>, max_steps=200, error_reward=-200.0)[source]
Bases: quarticgym.envs.utils.QuarticGymEnvBase
- action_space: spaces.Space[ActType]
- done_calculator_standard(current_observation, step_count, reward, update_prev_biomass=False, done=None, done_info=None)[source]
Checks whether the current episode is considered finished.
Returns a boolean indicating whether the episode is done, together with a dictionary of details. In done_calculator_standard, done_info looks like {"terminal": boolean, "timeout": boolean}, where "timeout" is true when the episode ends because the maximum episode length was reached, and "terminal" is true when "timeout" is true or when the episode ends because of a termination condition such as an environment error (in effect, "terminal" is the same as done).
- Parameters
- Returns
done and done_info.
- Return type
- observation_space: spaces.Space[ObsType]
- reset(initial_state=None, normalize=None)[source]
Required by gym, this function resets the environment and returns an initial observation.
- reward_function_standard(previous_observation, action, current_observation, reward=None)[source]
Calculates the reward for one step of the s, a, r, s, a sequence (state, action, reward, next state, next action).
- Parameters
previous_observation ([np.ndarray]) – The denormalized previous observation.
action ([np.ndarray]) – The denormalized action.
current_observation ([np.ndarray]) – The denormalized current observation.
reward ([float]) – If provided, this value is returned directly as the reward.
- Returns
reward.
- Return type
[float]
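The constructor accepts a reward_function argument, so a custom reward can presumably be supplied in place of reward_function_standard. The sketch below assumes that a custom function takes the same four arguments. The ethanol-gain reward itself is a made-up illustration, not the reward used by the library. Only the observation names come from the default observation_name list.

```python
# Default observation_name of BeerFMTEnvGym, copied from the signature.
OBSERVATION_NAME = ["X_A", "X_L", "X_D", "S", "EtOH", "DY", "EA", "time"]
ETOH = OBSERVATION_NAME.index("EtOH")


def ethanol_gain_reward(previous_observation, action, current_observation, reward=None):
    """Hypothetical reward: the increase in EtOH over one step."""
    if reward is not None:
        # Documented behaviour: a provided reward is returned directly.
        return reward
    return float(current_observation[ETOH] - previous_observation[ETOH])


previous = [0.0] * 8
current = [0.0, 0.0, 0.0, 0.0, 2.5, 0.0, 0.0, 1.0]
step_reward = ethanol_gain_reward(previous, [12.0], current)
forced_reward = ethanol_gain_reward(previous, [12.0], current, reward=-200.0)
```

Passing reward=-200.0 mirrors the case where an error reward has already been decided and should not be recomputed.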
quarticgym.envs.pensimenv module
quarticgym.envs.reactorenv module
quarticgym.envs.utils module
- class quarticgym.envs.utils.QuarticGymEnvBase(dense_reward=True, normalize=True, debug_mode=False, action_dim=2, observation_dim=3, max_observations=[1.0, 1.0], min_observations=[-1.0, -1.0], max_actions=[1.0, 1.0], min_actions=[-1.0, -1.0], observation_name=None, action_name=None, initial_state_deviation_ratio=None, np_dtype=<class 'numpy.float32'>, max_steps=None, error_reward=-100.0)[source]
Bases: gym.core.Env
- action_beyond_box(action)[source]
Checks whether the action is beyond the box, which is undesirable.
- Parameters
action ([np.ndarray]) – The denormalized action.
- Returns
Whether the action is beyond the box.
- Return type
[bool]
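A box check of this kind can be sketched as follows. The sketch assumes that the box is the element-wise range given by min_actions and max_actions and that values exactly on a bound are still inside. The helper name is hypothetical and plain lists stand in for np.ndarray.

```python
def beyond_box(values, min_values, max_values):
    """Return True if any component lies outside [min, max] (bounds inclusive)."""
    return any(
        value < low or value > high
        for value, low, high in zip(values, min_values, max_values)
    )


# Default action bounds of QuarticGymEnvBase, copied from the signature.
min_actions = [-1.0, -1.0]
max_actions = [1.0, 1.0]

inside = beyond_box([0.5, -1.0], min_actions, max_actions)
outside = beyond_box([1.5, 0.0], min_actions, max_actions)
```

The same comparison would apply to observations, using min_observations and max_observations.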
- action_space: spaces.Space[ActType]
- algorithms_to_algo_names(algorithms)[source]
- Parameters
algorithms – list of (algorithm, algorithm_name, normalize).
- Returns
list of algorithm_name.
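Given the documented tuple layout, extracting the names amounts to taking the second element of each entry. A minimal sketch, with a hypothetical helper name and placeholder algorithms:

```python
def names_of(algorithms):
    """Collect algorithm_name from (algorithm, algorithm_name, normalize) tuples."""
    return [name for _algorithm, name, _normalize in algorithms]


# None stands in for real algorithm objects, which are not needed here.
algorithms = [(None, "mpc", False), (None, "random", True)]
names = names_of(algorithms)
```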
- dataset_to_observations_actions_rewards_list(dataset)[source]
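Converts a dataset into lists of observations, actions, and rewards.
- Parameters
dataset – d4rl or torch format dataset obtained from generate_dataset_with_algorithm.
- Returns
the same as evalute_algorithms (observations_list, actions_list, rewards_list).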
- done_calculator_standard(current_observation, step_count, reward, done=None, done_info=None)[source]
Checks whether the current episode is considered finished.
Returns a boolean indicating whether the episode is done, together with a dictionary of details. In done_calculator_standard, done_info looks like {"terminal": boolean, "timeout": boolean}, where "timeout" is true when the episode ends because the maximum episode length was reached, and "terminal" is true when "timeout" is true or when the episode ends because of a termination condition such as an environment error (in effect, "terminal" is the same as done).
- Parameters
- Returns
done and done_info.
- Return type
- evalute_algorithms(algorithms, num_episodes=1, error_reward=None, initial_states=None, to_plt=True, plot_dir='./plt_results')[source]
When executing evalute_algorithms, self.normalize should be False.
- Parameters
algorithms – list of (algorithm, algorithm_name, normalize). Each algorithm must have a method predict(observation) -> action: np.ndarray.
num_episodes – number of episodes to run.
error_reward – overrides self.error_reward.
initial_states – None, the location of a numpy file of initial states, or a (numpy) list of initial states.
to_plt – whether to generate plots.
plot_dir – None or the directory in which to save plots.
- Returns
observations_list, actions_list, rewards_list.
- evenly_spread_initial_states(val_per_state, dump_location=None)[source]
Evenly spreads initial states. This function is needed only if the environment has steady_observations.
- Parameters
val_per_state (int) – how many values to sample per state.
- Returns
initial_states: evenly spread initial states.
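One plausible reading of "evenly spread" is a regular grid with val_per_state values along each state dimension. The sketch below builds such a grid. It is an assumption, not the library's code: the bounds passed in, the use of a full Cartesian product, and the helper name are all hypothetical.

```python
from itertools import product


def evenly_spread(min_values, max_values, val_per_state):
    """Build a grid with val_per_state evenly spaced values per dimension."""
    axes = []
    for low, high in zip(min_values, max_values):
        if val_per_state == 1:
            # A single value per state: use the midpoint of the range.
            axes.append([(low + high) / 2.0])
        else:
            step = (high - low) / (val_per_state - 1)
            axes.append([low + i * step for i in range(val_per_state)])
    # Every combination of per-dimension values is one initial state.
    return [list(point) for point in product(*axes)]


grid = evenly_spread([-1.0, -1.0], [1.0, 1.0], 3)
```

The number of states grows as val_per_state raised to the number of dimensions, so small values are advisable for high-dimensional environments.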
- generate_dataset_with_algorithm(algorithm, normalize=None, num_episodes=1, error_reward=-1000.0, initial_states=None, format='d4rl')[source]
Creates a dataset for offline reinforcement learning, in either d4rl or pytorch format. The trajectories are generated by the algorithm, which interacts with this env initialized by initial_states.
- Parameters
algorithm – an instance that has a method predict(observation) -> action: np.ndarray.
- Returns
if format == 'd4rl', a dictionary in d4rl format; if format == 'torch', an object of type torch.utils.data.Dataset.
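For orientation, a d4rl-style dataset is a dictionary of parallel arrays with one row per transition. The sketch below flattens a few hand-written episodes into that shape, using the key names common in D4RL datasets. Whether generate_dataset_with_algorithm emits exactly these keys is an assumption. The helper name and the sample values are hypothetical, and lists stand in for numpy arrays.

```python
def transitions_to_d4rl(episodes):
    """Flatten episodes of (observation, action, reward, terminal, timeout)."""
    dataset = {
        "observations": [],
        "actions": [],
        "rewards": [],
        "terminals": [],
        "timeouts": [],
    }
    for episode in episodes:
        for observation, action, reward, terminal, timeout in episode:
            dataset["observations"].append(list(observation))
            dataset["actions"].append(list(action))
            dataset["rewards"].append(float(reward))
            dataset["terminals"].append(bool(terminal))
            dataset["timeouts"].append(bool(timeout))
    return dataset


# Made-up transitions: the first episode times out, the second ends on an error.
episodes = [
    [([0.0, 0.0], [12.0], 1.0, False, False),
     ([0.5, 0.1], [13.0], 2.0, True, True)],
    [([0.0, 0.0], [10.0], -200.0, True, False)],
]
dataset = transitions_to_d4rl(episodes)
```

The sample data follows the convention described for done_info, in which terminal is set whenever timeout is set.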
- observation_beyond_box(observation)[source]
Checks whether the observation is beyond the box, which is undesirable.
- Parameters
observation ([np.ndarray]) – The denormalized observation.
- Returns
Whether the observation is beyond the box.
- Return type
[bool]
- observation_done_and_reward_calculator(current_observation, action, normalize=None, step_reward=None, done_info=None)[source]
Performs the s, a, r, s, a rollout, with error checks.
- Parameters
current_observation (list or np.ndarray) – The denormalized current observation.
previous_observation (np.ndarray) – The denormalized previous observation.
action (np.ndarray) – The denormalized action.
normalize (bool) – Defaults to None.
step_reward (float, optional) – The reward of the current step. Defaults to None.
done_info (dict, optional) – Defaults to None.
- Returns
observation: the observation returned for the step function; whether it is normalized is controlled by the normalize argument.
reward, done and done_info: done_info looks like {"timeout": boolean, "error_occurred": boolean, "terminal": boolean}, where "timeout" is true when the episode ends because the maximum episode length was reached, "error_occurred" is true when the episode ends because of an environment error, and "terminal" is true when "timeout" is true or when the episode ends because of a termination condition such as product collection being finished (in effect, the same as done). "terminal" should be True whenever "timeout" or "error_occurred" is true.
- Return type
observation (np.ndarray), followed by (float, bool, dict)
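The relationship between the three done_info flags can be sketched as below. This is an illustration under assumptions rather than the library's logic: the helper name, the finished argument, and the use of step_count >= max_steps as the timeout test are hypothetical.

```python
def build_done_info(step_count, max_steps, error_occurred, finished=False):
    """Combine the documented flags into (done, done_info)."""
    # Assumed timeout rule: the step counter has reached the episode limit.
    timeout = max_steps is not None and step_count >= max_steps
    # "terminal" must be true whenever timeout or error_occurred is true.
    terminal = bool(timeout or error_occurred or finished)
    done_info = {
        "timeout": timeout,
        "error_occurred": bool(error_occurred),
        "terminal": terminal,
    }
    return terminal, done_info


running = build_done_info(step_count=3, max_steps=60, error_occurred=False)
timed_out = build_done_info(step_count=60, max_steps=60, error_occurred=False)
failed = build_done_info(step_count=3, max_steps=60, error_occurred=True)
```

Keeping "timeout" and "error_occurred" separate from "terminal" lets downstream code distinguish truncated episodes from failed ones.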
- observation_space: spaces.Space[ObsType]
- report_rewards(rewards_list, algo_names=None, save_dir=None)[source]
Returns the mean and std of rewards over all episodes. Because rewards_list is not aligned (for example, some trajectories are shorter than others), it cannot be converted directly to a numpy array; the nested list has to be converted and unwrapped first. on_episodes first averages rewards_list over episodes, then computes the mean and std. all_rewards computes the mean and std directly for each step. Here rewards_list[i][j][t] is the reward of algorithm i in game j at step t.
- reset(initial_state=None, normalize=None)[source]
Required by gym, this function resets the environment and returns an initial observation.
- reward_function_standard(previous_observation, action, current_observation, reward=None)[source]
Calculates the reward for one step of the s, a, r, s, a sequence (state, action, reward, next state, next action).
- Parameters
previous_observation ([np.ndarray]) – The denormalized previous observation.
action ([np.ndarray]) – The denormalized action.
current_observation ([np.ndarray]) – The denormalized current observation.
reward ([float]) – If provided, this value is returned directly as the reward.
- Returns
reward.
- Return type
[float]