Previous talks at the SCCS Colloquium

Theodora-Augustina Dragan: Characteristics of Quantum Architectures in Reinforcement Learning Applications


Reinforcement learning (RL) has contributed to many industrial advancements, from robot navigation to drug development. It is a powerful learning tool because its algorithms learn as they explore the problem they have to solve, receiving only one sample of input data from the environment at each interaction. As a consequence, this learning method requires a lot of computational effort: for some RL tasks, even when the algorithm runs on powerful state-of-the-art processors, training still takes several days and millions of interactions with the environment. After a review of the current literature, this thesis investigates whether using quantum computing to perform some of the tasks inside a reinforcement learning algorithm can reduce the computational needs and increase the performance achieved. Moreover, it investigates which kind of quantum architecture proves to be more useful.

In this thesis, the RL problem to be solved is for an agent to navigate through a maze-like environment and reach a desired position. To this end, a quantum RL algorithm is designed. The algorithm uses two quantum circuits, each with four qubits and at most 28 quantum parameters. On top of each quantum architecture, 25 classical parameters are used for post-processing. The performance of each solution is assessed through the maximal positive feedback the agent receives while it trains on the environment, and the time it takes to reach and stabilize around this maximal positive feedback. To compare and distinguish different possible quantum architectures, their characteristics are investigated using quantum circuit metrics: expressibility, entanglement and the effective dimension. All quantum solutions are compared to a classical variant of the same algorithm, where neural networks are used instead of quantum circuits. Each neural network has two hidden layers and between 125 and 1245 parameters.
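As a rough illustration of the setup described above, the following NumPy sketch simulates a four-qubit parameterized circuit with 28 rotation angles, followed by a 25-parameter classical linear layer. The gate layout (RX feature encoding, seven layers of RY rotations with a ring of CNOTs), the 4-to-5 linear post-processing, and all function names are assumptions chosen only to match the parameter counts quoted above. The abstract does not specify the actual circuits used in the thesis.

```python
import numpy as np

N_QUBITS = 4
DIM = 2 ** N_QUBITS


def rx(angle):
    """Single-qubit rotation about the X axis."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])


def ry(angle):
    """Single-qubit rotation about the Y axis."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])


def apply_single_qubit_gate(state, gate, qubit):
    """Apply a 2x2 gate to one qubit of the state vector."""
    psi = state.reshape([2] * N_QUBITS)
    psi = np.tensordot(gate, psi, axes=([1], [qubit]))
    psi = np.moveaxis(psi, 0, qubit)  # put the acted-on axis back in place
    return psi.reshape(DIM)


def apply_cnot(state, control, target):
    """Flip the target qubit on the subspace where the control qubit is 1."""
    psi = state.reshape([2] * N_QUBITS).copy()
    idx = [slice(None)] * N_QUBITS
    idx[control] = 1
    sub = psi[tuple(idx)]  # control axis is dropped here
    axis = target if target < control else target - 1
    psi[tuple(idx)] = np.flip(sub, axis=axis).copy()
    return psi.reshape(DIM)


def expect_z(state, qubit):
    """Expectation value of Pauli-Z on one qubit."""
    probs = np.abs(state.reshape([2] * N_QUBITS)) ** 2
    other_axes = tuple(a for a in range(N_QUBITS) if a != qubit)
    p = probs.sum(axis=other_axes)
    return p[0] - p[1]


def circuit_outputs(features, thetas):
    """Hypothetical circuit: RX encoding, then layers of RY plus a CNOT ring."""
    state = np.zeros(DIM, dtype=complex)
    state[0] = 1.0
    for q in range(N_QUBITS):
        state = apply_single_qubit_gate(state, rx(features[q]), q)
    for layer in thetas:  # thetas has shape (n_layers, N_QUBITS)
        for q in range(N_QUBITS):
            state = apply_single_qubit_gate(state, ry(layer[q]), q)
        for q in range(N_QUBITS):
            state = apply_cnot(state, q, (q + 1) % N_QUBITS)
    return np.array([expect_z(state, q) for q in range(N_QUBITS)])


def post_process(z, W, b):
    """Classical linear layer applied to the measured expectation values."""
    return W @ z + b


rng = np.random.default_rng(0)
thetas = rng.uniform(-np.pi, np.pi, size=(7, N_QUBITS))  # 28 quantum parameters (assumed layout)
W = rng.normal(size=(5, N_QUBITS))                        # 20 classical weights (assumed shape)
b = np.zeros(5)                                           # 5 classical biases (assumed shape)

features = np.array([0.1, -0.4, 0.7, 0.2])  # made-up observation, for illustration only
action_values = post_process(circuit_outputs(features, thetas), W, b)
print(thetas.size, W.size + b.size, action_values.shape)
```

In a deep RL algorithm, a function like `circuit_outputs` followed by `post_process` would take the place of the neural network's forward pass, with `thetas`, `W` and `b` as the trainable weights.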

A first objective of this thesis is to see whether adding quantum methods to classical RL solutions brings any measurable advantage. As a second goal, possible correlations between the RL metrics and the quantum ones are examined, in order to identify which quantum architecture performs best. Such correlations would indicate a more thorough advantage and a clear path for developing quantum RL algorithms. The procedure proposed and evaluated in this thesis uses quantum circuits as substitutes for neural networks in deep RL algorithms. Results show that when the quantum approach is used, the algorithm needs fewer interactions with the environment to learn, and fewer trainable parameters to be modified during learning, while achieving a performance equivalent to that of its classical counterpart. For example, one of the quantum solutions reaches a reward equal to 82% of the maximal possible reward while using 47 trainable weights, with the training process stabilizing at 19000 environment interactions. The classical variant with 125 parameters reaches 80% of the maximal reward and needs 25600 environment interactions to stabilize. All results are averaged across multiple experiments for each architecture used. No significant correlation was found between the performance and the metrics characterising the architectures, in either the classical or the quantum case. Possible reasons for this are presented in the thesis, together with proposals for future work to improve the results presented and to better assess correlations.
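To put the example figures above in relative terms, the short calculation below computes how many fewer environment interactions and trainable weights the quoted quantum solution uses compared with the 125-parameter classical variant. The numbers are taken from the text. The dictionary layout and the helper name `relative_reduction` are illustrative only.

```python
# Figures quoted in the abstract for one quantum solution and the smallest classical variant.
quantum = {"weights": 47, "interactions": 19_000, "reward_fraction": 0.82}
classical = {"weights": 125, "interactions": 25_600, "reward_fraction": 0.80}


def relative_reduction(new, baseline):
    """Fraction by which `new` is smaller than `baseline`."""
    return 1.0 - new / baseline


interaction_saving = relative_reduction(quantum["interactions"], classical["interactions"])
weight_saving = relative_reduction(quantum["weights"], classical["weights"])

print(f"Fewer interactions: {interaction_saving:.1%}")  # about 25.8%
print(f"Fewer trainable weights: {weight_saving:.1%}")  # about 62.4%
```

This only compares the single pair of results quoted here and says nothing about the other architectures evaluated in the thesis.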

Master's thesis presentation. Theodora-Augustina is advised by Prof. Christian Mendl.