Article Abstract
A PID-Lagrange-SAC based deep reinforcement learning strategy for building energy management
Submitted: 2025-03-21    Revised: 2025-05-19
DOI:
Keywords: buildings; energy management; deep reinforcement learning; CMDP; PID-Lagrange-SAC algorithm
Funding: National Natural Science Foundation of China (52107079)
Author    Affiliation    Postal code
凌䶮飞    School of Software, Southeast University    215000
陈涛*    School of Electrical Engineering, Southeast University    210096
高赐威    School of Electrical Engineering, Southeast University
Abstract:
      The energy consumption behavior of buildings holds substantial regulation potential. To exploit it, a regulation method based on the PID-Lagrange-SAC algorithm is proposed. First, the building energy management problem is modeled as a Markov decision process (MDP): the states of the controllable devices together with the uncertainty-introducing external variables form the state space, the operating power of the controllable devices serves as the decision variable forming the action space, and appropriate reward functions are designed to guide the agent toward better regulation policies. To suppress constraint-violating behavior of the agent, the problem is further formulated as a constrained Markov decision process (CMDP), and the Soft Actor-Critic algorithm is employed to train the agent, with PID control combined with the Lagrange method applied to enforce the constraints. The case study shows that the resulting regulation policy reduces the operating cost and carbon emissions of the building while satisfying user comfort requirements, demonstrating the effectiveness and superiority of the proposed method.
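The core of the PID-Lagrange approach named in the abstract is to treat the Lagrange multiplier of the CMDP constraint as the output of a PID controller driven by the constraint violation, rather than updating it by plain gradient ascent. The sketch below illustrates that multiplier update in the general PID-Lagrangian style; all class/parameter names, gains, and the constraint limit are hypothetical and not taken from the paper.

```python
# Illustrative PID-controlled Lagrange multiplier update for a CMDP
# (PID-Lagrangian style). Gains kp/ki/kd and the cost limit are
# hypothetical placeholders, not values from the paper.
class PIDLagrangeMultiplier:
    def __init__(self, kp=0.05, ki=0.01, kd=0.01, limit=25.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.limit = limit        # constraint threshold d (e.g. an episodic discomfort budget)
        self.integral = 0.0       # accumulated constraint violation (clipped at 0)
        self.prev_cost = 0.0      # previous episodic cost, for the derivative term

    def update(self, episodic_cost):
        """Return the new multiplier lambda >= 0 given the latest episodic constraint cost."""
        error = episodic_cost - self.limit                    # > 0 when the constraint is violated
        self.integral = max(0.0, self.integral + error)       # integral term, kept non-negative
        derivative = max(0.0, episodic_cost - self.prev_cost) # react only to rising cost
        self.prev_cost = episodic_cost
        lam = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(0.0, lam)                                  # multiplier must stay non-negative
```

During training, the returned multiplier would weight the constraint cost in the SAC actor objective (roughly, maximize reward minus lambda times cost), so the penalty grows while the constraint is violated and decays back toward zero once it is satisfied.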