QIAO Dongping, DUAN Lvqi, LI Honglei, XIAO Yanqiu. Optimization of job shop scheduling problem based on deep reinforcement learning[J]. Manufacturing Technology & Machine Tool, 2023, (4): 148-155. DOI: 10.19287/j.mtmt.1005-2402.2023.04.023

Optimization of job shop scheduling problem based on deep reinforcement learning

Abstract: To address the complexity of solving the job shop scheduling problem, a deep reinforcement learning optimization algorithm is proposed with the objective of minimizing the maximum completion time. First, the scheduling environment is built on the disjunctive graph model, three-channel state features are constructed, and 20 composite heuristic dispatching rules are designed as the action space; the reward function is defined by the proportional relationship between the total work of the scheduled operations and the current maximum completion time, which makes it equivalent to machine utilization. A deep convolutional neural network is used to build the action network and the target network, taking the state features as input and outputting the Q value of each action; actions are then selected with an action-validity exploration and exploitation strategy. Finally, the immediate reward is calculated and the scheduling environment is updated. Experiments on benchmark instances show that the algorithm effectively balances solution quality and computation time, and that the trained agent generalizes well to scheduling problems with non-zero initial states.
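The abstract outlines a DQN-style loop specialized to job shop scheduling. As a reading aid only (not the authors' code), the sketch below shows how the named pieces could fit together: a CNN over the three-channel state, one output per dispatching rule, a utilization-style reward, and validity-masked epsilon-greedy action selection. The network sizes, the GRID feature-map shape, and the exact reward formula are our assumptions.

```python
# Illustrative sketch (not from the paper): DQN components for JSSP as
# described in the abstract. Sizes and reward details are assumptions.
import random
import numpy as np
import torch
import torch.nn as nn

N_ACTIONS = 20          # 20 composite dispatching rules (abstract)
STATE_CHANNELS = 3      # three-channel state features (abstract)
GRID = 10               # assumed jobs-by-machines feature-map size

class QNet(nn.Module):
    """CNN mapping the 3-channel state to one Q value per dispatching rule."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(STATE_CHANNELS, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * GRID * GRID, 128), nn.ReLU(),
            nn.Linear(128, N_ACTIONS),
        )

    def forward(self, x):
        return self.net(x)

def utilization(total_scheduled_work, n_machines, current_makespan):
    """Reward signal: scheduled work over machine capacity used so far,
    i.e. machine utilization (the abstract equates the reward to this)."""
    if current_makespan == 0:
        return 0.0
    return total_scheduled_work / (n_machines * current_makespan)

def select_action(q_net, state, valid_mask, epsilon):
    """Action-validity epsilon-greedy: explore and exploit only among
    rules that yield a feasible operation (valid_mask is boolean)."""
    valid_ids = np.flatnonzero(valid_mask)
    if random.random() < epsilon:
        return int(random.choice(valid_ids))
    with torch.no_grad():
        q = q_net(state.unsqueeze(0)).squeeze(0)
    q[~torch.as_tensor(valid_mask)] = -float("inf")  # mask invalid rules
    return int(torch.argmax(q).item())

# Action network plus target network, as in standard DQN.
policy_net, target_net = QNet(), QNet()
target_net.load_state_dict(policy_net.state_dict())

state = torch.zeros(STATE_CHANNELS, GRID, GRID)   # placeholder state
mask = np.ones(N_ACTIONS, dtype=bool)             # all rules valid here
action = select_action(policy_net, state, mask, epsilon=0.1)
```

Masking invalid rules before the argmax keeps the greedy step from choosing a rule that selects no feasible operation, which is one plausible reading of the "action validity" strategy the abstract mentions.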

     
