
Flooding and Overflow Mitigation through a Model-free Deep Reinforcement Learning based on Koopman Emulators of Urban Drainage System
  • Chong Tian,
  • Zhenliang Liao,
  • Zhiyu Zhang,
  • Hao Wu,
  • Kunlun Xin
Chong Tian
Unknown
Zhenliang Liao
Tongji University

Corresponding Author: [email protected]

Zhiyu Zhang
Tongji University
Hao Wu
Tongji University
Kunlun Xin
College of Environmental Science and Engineering, Tongji University, Shanghai, China

Abstract

Recent studies have used deep reinforcement learning (RL) to establish real-time control of urban drainage systems (UDSs) for flooding mitigation. However, these studies considered only model-based RL, in which a mathematical model of the UDS is required during the RL training process. Although this is a natural way to build an RL system, it causes several problems: (i) excessive training time, (ii) overly “rich” cached data, and (iii) an overly “perfect” training environment. To address these problems, a model-free RL training framework based on two Koopman emulators is proposed and validated through simulation of a UDS in a city in eastern China. The framework shortens training and uses data more efficiently by exploiting the fast nonlinear emulation capability of Koopman emulators and by matching the dimension of the emulator’s observables to that of the RL state. Emulation also introduces a degree of randomness into the RL training process. According to the results, compared with model-based RL, the framework achieves a similar control effect with a training process 20 to 23 times faster and data-usage efficiency 79.67 times higher. The uncertainty analysis shows that slight perturbations that do not statistically change the control system during training and testing do not affect the control effect of either model-based or model-free RL. Meanwhile, the performance of the Koopman emulators of the UDS depends strongly on their hyperparameters and on the similarity between the training data and the test data.
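
To illustrate the idea behind a Koopman emulator, the sketch below fits a linear operator in a lifted observable space and then uses it as a fast stand-in for a simulator. It is a minimal illustration only, and everything in it is an assumption: the abstract does not say how the emulators are constructed, so a plain least-squares fit (extended dynamic mode decomposition with control) is swapped in here, and the two-state toy dynamics, the observable dictionary `lift`, and all constants are hypothetical and are not taken from the paper or from any UDS model.

```python
import numpy as np

# Toy constants chosen for illustration only (not from the paper).
A_DECAY, B_DECAY, COUPLING, GAIN = 0.9, 0.8, 0.3, -0.5


def toy_step(x, u):
    """Hypothetical nonlinear system standing in for one simulator step."""
    x1, x2 = x
    return np.array([A_DECAY * x1, B_DECAY * x2 + COUPLING * x1**2 + GAIN * u])


def lift(x):
    """Hypothetical observables: the state plus x1**2, which makes the toy dynamics linear."""
    x1, x2 = x
    return np.array([x1, x2, x1**2])


def fit_koopman(X, U, X_next):
    """Least-squares fit of z_next ~ A z + B u in the lifted space."""
    Z = np.array([lift(x) for x in X])            # (N, n_obs)
    Z_next = np.array([lift(x) for x in X_next])  # (N, n_obs)
    G = np.hstack([Z, U.reshape(len(U), -1)])     # (N, n_obs + n_ctrl)
    W, *_ = np.linalg.lstsq(G, Z_next, rcond=None)
    K = W.T                                       # (n_obs, n_obs + n_ctrl)
    n_obs = Z.shape[1]
    return K[:, :n_obs], K[:, n_obs:]


def emulate(A, B, x0, controls):
    """Roll the linear emulator forward and return the first two observables (the state)."""
    z = lift(x0)
    trajectory = []
    for u in controls:
        z = A @ z + B @ np.atleast_1d(u)
        trajectory.append(z[:2])
    return np.array(trajectory)


# Collect one-step transitions from the toy system and fit the emulator.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0, size=(200, 2))
U = rng.uniform(-1.0, 1.0, size=200)
X_next = np.array([toy_step(x, u) for x, u in zip(X, U)])
A, B = fit_koopman(X, U, X_next)

# Compare a 20-step emulated rollout against the toy system itself.
x0 = np.array([1.5, 0.5])
controls = rng.uniform(-1.0, 1.0, size=20)
emulated = emulate(A, B, x0, controls)
x, truth = x0, []
for u in controls:
    x = toy_step(x, u)
    truth.append(x)
max_error = np.max(np.abs(emulated - np.array(truth)))
```

Because the toy system was chosen so that its dynamics are exactly linear in these observables, the emulated rollout should track the true one to numerical precision. A real drainage network would not admit such a closed dictionary, so an emulator of this kind would only approximate it, and its accuracy would depend on the choice of observables and on how closely the training data resemble the test conditions.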