Hao Zhou
Jan 11, 2021


Thanks for the great breakdown post. The select_child part is unclear to me. Note that it runs inside the while node.expanded() loop. I think it should: i) select the child node with the highest UCB score, and ii) expand the selected node so the loop can continue. But when does the loop stop? It should stop when it reaches a leaf node that cannot be expanded. However, MuZero later does expand the leaf node, so I'm confused here. iii) To prepare data for backpropagation, I think the select_child function also needs to save the immediate reward between nodes, yet it does not appear to call the neural network model. I checked the paper, and it says the reward is looked up from a table, but we haven't discussed any kind of table before. Those are my questions. Can you share some insights?
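To make the question concrete, here is a minimal Python sketch of one possible reading of the search loop. It is not the code from the post or the paper: the UCB formula is simplified, and expand, run_simulation, and fake_recurrent_inference are hypothetical stand-ins (the last one replaces the neural network). Under this reading, select_child only picks among children that already exist, the loop stops at the first node that has not been expanded yet, and the reward is cached on the node at expansion time, which may be what the paper means by a table.

```python
import math


class Node:
    def __init__(self, prior):
        self.prior = prior
        self.visit_count = 0
        self.value_sum = 0.0
        self.reward = 0.0         # cached when the node is expanded
        self.hidden_state = None  # cached when the node is expanded
        self.children = {}

    def expanded(self):
        return len(self.children) > 0

    def value(self):
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def ucb_score(parent, child, c=1.25):
    # Simplified score (assumption): not the exact formula from the paper.
    explore = c * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.value() + explore


def select_child(node):
    # Pure lookup: returns the best existing (action, child) pair, creates nothing.
    return max(node.children.items(), key=lambda kv: ucb_score(node, kv[1]))


def expand(node, hidden_state, reward, priors):
    # Store state and reward on the node and create unexpanded children.
    node.hidden_state = hidden_state
    node.reward = reward
    for action, p in priors.items():
        node.children[action] = Node(p)


calls = []  # records every call to the stand-in model


def fake_recurrent_inference(hidden_state, action):
    # Hypothetical stand-in for the network: returns state, reward, priors, value.
    calls.append(action)
    return hidden_state + (action,), 1.0, {0: 0.5, 1: 0.5}, 0.0


def run_simulation(root, discount=0.997):
    node, path, action = root, [root], None
    while node.expanded():          # stops at the first unexpanded node
        action, node = select_child(node)
        path.append(node)
    parent = path[-2]
    state, reward, priors, value = fake_recurrent_inference(parent.hidden_state, action)
    expand(node, state, reward, priors)  # the only model call per simulation
    for n in reversed(path):        # backpropagate using the cached rewards
        n.value_sum += value
        n.visit_count += 1
        value = n.reward + discount * value
    return path


root = Node(prior=1.0)
expand(root, hidden_state=(), reward=0.0, priors={0: 0.6, 1: 0.4})
```

If this reading is right, a "leaf" means a node that exists in the tree but has no children yet, rather than a node that cannot be expanded, so each simulation ends with exactly one expansion and one model call.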


Written by Hao Zhou

Asst. Prof. in Transportation Eng.
