PROBLEM TO BE SOLVED: To provide a reinforcement learning method that updates a model parameter each time one data item is observed.

SOLUTION: The method includes an action selection/execution step, a learning step, and a time updating step. In the action selection/execution step, action information a_t is selected with the state information s_t of a control target at time t as input, and a_t is output to the control target. State information s_{t+1} and reward information r_{t+1} are acquired in response to that output, and action information a_{t+1} is selected from s_{t+1}. The state information s_t and s_{t+1}, the action information a_t and a_{t+1}, and the reward information r_{t+1} are then output to a learning part. In the learning step, s_t, s_{t+1}, a_t, a_{t+1}, and r_{t+1} are input, an update width for the model parameter of a measure function approximated by a linear model is calculated, and the model parameter is updated by that width at every time step t and recorded in a model parameter recording part. In the time updating step, the time t is updated.
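
The three steps can be illustrated with a minimal Python sketch. The abstract does not state the update rule, so the sketch assumes a SARSA-style temporal-difference update on a linear model theta . phi(s, a); this is a swapped-in, well-known technique and not necessarily the claimed method. The one-hot features, the toy chain environment, the epsilon-greedy selection, and every name and hyperparameter (features, env_step, alpha, gamma, epsilon) are hypothetical and chosen only for illustration.

```python
import random


def features(state, action, n_states, n_actions):
    """One-hot feature vector phi(s, a). Hypothetical encoding for this sketch."""
    phi = [0.0] * (n_states * n_actions)
    phi[state * n_actions + action] = 1.0
    return phi


def linear_value(theta, phi):
    """Linear model: inner product of the parameter vector and the features."""
    return sum(w * x for w, x in zip(theta, phi))


def select_action(theta, state, n_states, n_actions, epsilon, rng):
    """Epsilon-greedy selection with random tie-breaking (an assumption)."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [linear_value(theta, features(state, a, n_states, n_actions))
              for a in range(n_actions)]
    best = max(values)
    return rng.choice([a for a, v in enumerate(values) if v == best])


def env_step(state, action, n_states):
    """Toy control target: action 1 moves right, action 0 moves left.
    Reward 1.0 whenever the next state is the right end of the chain."""
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward


def learning_step(theta, phi_t, phi_next, r_next, alpha, gamma):
    """Compute the update width from one observed transition and apply it.
    Assumed rule: SARSA(0) with linear function approximation."""
    td_error = (r_next + gamma * linear_value(theta, phi_next)
                - linear_value(theta, phi_t))
    update_width = [alpha * td_error * x for x in phi_t]
    return [w + d for w, d in zip(theta, update_width)]


def run(n_steps=5000, n_states=5, n_actions=2,
        alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * (n_states * n_actions)  # stands in for the recording part
    t = 0
    s_t = 0
    a_t = select_action(theta, s_t, n_states, n_actions, epsilon, rng)
    while t < n_steps:
        # Action selection/execution step: output a_t, observe s_{t+1} and
        # r_{t+1}, then select a_{t+1} from s_{t+1}.
        s_next, r_next = env_step(s_t, a_t, n_states)
        a_next = select_action(theta, s_next, n_states, n_actions, epsilon, rng)
        # Learning step: one parameter update per observed data item.
        theta = learning_step(theta,
                              features(s_t, a_t, n_states, n_actions),
                              features(s_next, a_next, n_states, n_actions),
                              r_next, alpha, gamma)
        # Time updating step.
        s_t, a_t = s_next, a_next
        t += 1
    return theta


if __name__ == "__main__":
    print(run())
```

Because the parameter vector is updated inside the loop after every single transition, no batch of past data needs to be stored, which matches the stated goal of updating each time one data item is observed.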