# Story of Maryam’s mathematical magic

Gravity is a difficult concept to understand as a quantum theory. Therefore physicists (and their mathematician friends) study quantum gravity in 2 dimensions where equations are more tractable. Just like any good quantum theory, the path integral for quantum gravity is defined by $Z(\beta) = \frac{1}{\text{vol}}\int \mathcal{D}\phi\mathcal{D}g e^{-I}$ where “vol” is the volume of the diffeomorphism group implying that the integration should be done on $\phi$ and $g$ upto diffeomorphism and $I$ is the action of a model knowns as Jackiw-Titelboim (JT) theory. Once the partition function is known, a quantum theory can be solved.

The partition function for the 2 dimensional quantum gravity (after imposing appropriate boundary conditions on two manifold $U$ with the topology of a disc) turns out to be $Z_{U}(\beta) = \int_{0}^{\infty}dE\rho_0(E)e^{-\beta E}$ where $\rho_0=\frac{e^S}{4\pi^2}\sinh^2(2\pi\sqrt{E})$. $S$ is the renormalized coefficient for Einstein-Hilbert term (ignore it if you don’t know what it means).

## Holographic duality

From AdS/CFT correspondence we know that the 2 dimensional quantum gravity is dual to 1 dimensional simple quantum system on the boundary of $U$. This implies that the partition function that we computed earlier form JT model should be $Z_{U}(\beta) = \text{Tr}_{\text{H}}e^{-\beta \hat{H}}$ where $\hat{H}$ is the Hamiltonian operator of the associated 1 dimensional quantum system. In other words, $Z_{U}(\beta)=\int_0^\infty dE\sum \delta(E - E_i)e^{-\beta E}$. This clearly is not equal to the partition function computed in the previous paragraph and that is problematic. The dual theories should have same partition function

## Random Matrix theory

The idea here is to consider an ensemble of random matrices (with certain properties) and define the ensemble average of the partition function $\langle Z(\beta)\rangle = \langle \text{Tr}_{\text{H}}e^{-\beta \hat{H}} \rangle$. In the matrix literature it is common to work with the inverse Laplace transform of the partition function given by $R(E) = -\int_0^\infty d\beta Z(\beta)e^{\beta E}$ known as the resolvent. So the resolvent correlation function has the “genus” expansion as follows $\langle R(E)\rangle = \sum_{g=0}^{g=\infty}\frac{R_{g,n}(E)}{L^{2g+n-2}}$ (it is asymptotic series expansion) where $L$ is the dimension of the matrices of the ensemble. The terms $R_{g, n}$ can be computed systematically from what are known as “loop equations”, in terms of $\rho_0(E)$ which is the “average density of the eigenvalues of random matrix of the ensemble”. The loop equations are streamlined into simple recursion relations which yield the correlation function upto all orders (in $1/L^2$)  once $\rho_0(E)$ is specified.

## 2 dimensional gravity again

In the JT model, there is a prescription to compute the quantity $\langle Z(\beta) \rangle$ which involves integrating over connected 2d geometries with specified boundary length and value of dilation field at the boundary. The partition function turns out to be $\langle Z(\beta)\rangle = e^{S_0}Z^{\text{disc}}_{\text{sch}} + \sum_{g=1}^{\infty} e^{(1-2g)S_0}bdbV_{g, 1}(b)Z^{\text{trumpet}}_{\text{sch}}(\beta, b)$ where $b$ is the length of a geodesic and $V_{g, 1}(b)$ is the Weil-Petersson volume of moduli space of Riemann surface with genus $g$.

In the phenomenal paper Maryam Mirzakhani showed that the Weil-Petersson volume follow certain recursion  relation which was shown to be equivalent to the recursion relation of matrix ensemble, we saw earlier, if we use the leading (genus 0) contribution of the JT path integral as $\rho_0(E)$ thus resolving the problem described earlier.

There is analogous story for supersymmetric quantum gravity but I am in no mood to write about it. If you are interested read this paper by Witten and Stanford.

# Strings in AdS_3 and Spectral Flow

Studying string theory in $\text{AdS}_3$ is useful in understanding $\text{AdS/CFT}$ correspondence as it sheds the light on one half of the “million dollar” conjecture beyond gravity approximation. The recent progress with the exact $\text{AdS}_3\text{/CFT}_2$ duality makes it important to study the string theory and understand the isomorphism of the physical states on both the sides. In this blog-post, I will explain the bosonic version of the string theory with emphasis on something known as “Spectral Flow”.

The spectral flow was an answer to some earlier issues raised by the study of string theory on $\text{AdS}_3$ which involved existence of an upper bound in the internal energy of string (independent of the string coupling) and absence of the “long string” states in the spectrum. In 2000, Maldacena and Ooguri wrote seminal paper (actually it was a series of three papers 1, 2 and 3) which established the consistency of the theory and opened gates for further study in the field and the million dollar conjecture.

The worldsheet description of the theory takes the form of $SL(2,\mathbb{R})$ WZW model. Just like any story in physics, we first start with the classical version of the model. The action is $S = \frac{k}{8\pi\alpha^\prime}\int d^2\sigma\text{Tr}\left(g^{-1}\partial gg^{-1}\partial g\right)+k\Gamma_{\text{WZ}}$. Here $g$ is the $SL(2,\mathbb{R})$ group element. Next we define left and right moving coordinates on the world sheet as $x^{\pm}= \tau\pm\sigma$. Then the equation of motion is given by $\partial_{-}(\partial_{+}gg^{-1})=0$ such that the general solution is given by $g = g_{+}(x^{+})g_{-}(x^{-})$. Furthermore the string is closed under $\sigma\to \sigma + 2\pi$ which leads to constraint $g_{+}(x^{+}+2\pi) = g_{+}(x^{+})M$ and $g_{-}(x^{-}-2\pi) = M^{-1}g_{-}(x^-)$ where the monodromy $M$ is defined only upto the conjugation by $SL(2,\mathbb{R})$ and classical solutions of WZW model are defined according to conjugacy class of $M$. Of course we still need to impose the Virasoro constraint that the stress energy tensor vanishes on-shell (just like the bosonic strings theory on flat-space).

The classical stage is set. Now we study some examples. Consider the solution $g_{+}=Ue^{iv_{+}(x^{+})\sigma_{2}}$ and $g_{-}=e^{iu_{-}(x^{-})\sigma_{2}}V$ (it is easy to check that it satisfies equation of motion for $U = \mathbb{I}$) where $U$ and $V$ are constant $SL(2,\mathbb{R})$ matrices. The energy-momentum tensor of the solution is $T^{\text{AdS}}_{\pm\pm}=-k(\partial_{\pm}v_{+}/u_{-})^2$. Now let there be excitations in the compact $\mathcal{M}$ of $\text{AdS}_3\times\mathcal{M}$ and let the $T^{\mathcal{M}}_{\pm\pm}=h$ then the Virasoro constraints imply $(\partial_{+}v_{+})^2=(\partial_{-}u_{-})^2=\frac{h}{k}$. Solving for $v_{+}/u_{-}$ and substituting in the solution ansatz yields the result $g = U\begin{pmatrix}\cos\alpha\tau & \sin\alpha\tau\\-\sin\alpha\tau & \cos\alpha\tau\end{pmatrix}V$ where $\alpha = \pm\sqrt{\frac{4h}{k}}$. $\sigma$ drops out and solution is only in terms of $\tau$. This can be interpreted as “string collapsing to a point”. For $U=V=\mathbb{I}$, it is essentially a particle sitting at the center of $\text{AdS}_3$ as shown in the figure.

Similarly, one can write spacelike solution in the form $g = U\begin{pmatrix}e^{\alpha\tau} & 0\\0 & e^{\alpha\tau}\end{pmatrix}V$.

Now given a classical solution $g=\tilde{g}_{+}\tilde{g}_{-}$, one can generate new solutions $g_{+}=e^{i\frac{w_R}{2}x^{+}\sigma_2}\tilde{g}_{+}$ and $g_{-}=\tilde{g}_{-}e^{i\frac{w_L}{2}x^{-}\sigma_2}$ which may be regarded as an action of the loop group $\hat{SL}(2,\mathbb{R})\times \hat{SL}(2,\mathbb{R})$. This is known as the spectral flow in CFT literature.

# B for Behavior, Blackboard and Brain (3)

This is the  final blog-post of the series (for previous parts see here and here). As promised, here we will see the implementation of Reinforcement Learning through Behavior Tree (in natural Unreal Engine environment). The method written is fairly general and can  be used to generate BT corresponding to any RL based algorithm.

The k-armed bandit (which for our case turns into shooting colored boxes) algorithm can be categorized into 5 tasks.

1. Select a box (based on probabilities and estimates)
2. Rotate towards box (for more complex, locomotion may be involved)
3. Shoot the box
4. Wait (for reward assesement)
5. Update the estimates

The tree built  with this categorization of the tasks is shown in figure

The Engine also provides an powerful interface for Blueprints to interact with C++. This is a good way to unify the artistic and programming development of a game. We will show that in real time here.

Consider the task BTTask_FindAppBox node. The Blueprint implementation is shown in the figure

In the Engine, every BT task starts with the node Event Receive Execute AI. So we start with that and  make the connection to Cast node yielding the object corresponding to class MAIPlayerController. Once that is done, we invoke the C++ method through the node Look for App Box and by  setting the target pin as the casted object. The C++ method is posted here.

Similarly rest of the three tasks (except the Engine default task Wait) are implemented through this C++-Blueprint interface. Another example is BTTask_UpdateEstimates

with the corresponding C++ code posted here.

# B for Behavior, Blackboard and Brain (2)

This post is continuation of this blog-post. Here  we will understand the algorithm behind the emergent behavior, called Reinforcement Learning (RL). The situation of shooting colored boxes is equivalent to k-armed  bandit problem which can be stated as

You are faced repeatedly with a choice among k different options, or actions. After each choice you receive a numerical reward chosen from a stationary probability distribution that depends on the action you selected. Your objective is to maximize the expected total reward over some time period, for example, over 1000 action selections, or time steps.

The reward for this game is: 100 for shooting red box and -10 for shooting blue box. The AI agent can access the scoreboard and take the actions accordingly. Based on the rewards, the agent generates the expected value associated to an action. This value is, in a way, the memory of all the rewards obtained when that particular action was performed. In more complicated systems, the rewards my be even probability based.

For RL, it is not only important to take those actions which have high expected value associated with them (exploiting the knowledge-base), but, once in a while, it is necessary to pick a random action irrespective of the associated expected value (exploration). This makes sure that the system doesn’t get stuck in the local optimum. For instance in case of correlated rewards, some actions when performed in certain sequence might yield better rewards. Later in this post it is shown why exploring leads to global optimum configuration knowledge!

For the problem at hand, the rewards are deterministic and static. So the RL algorithm is as follows (this algorithm can be found in Sutton’s book, section 2.4 (working link July 23, 2019))

Initialize, for $a = 1$ to $k$:
$Q(a) \leftarrow 0$
$N (a) \leftarrow 0$
Repeat forever:

$\text{arg } \text{max}_a Q(a)$  with probability $1-\epsilon$ or a random action with probability $\epsilon$

$R \leftarrow \text{bandit}(A)$
$N (A) \leftarrow N (A) + 1$
$Q(A)\leftarrow Q(A) + \frac{1}{N(A)}\left[R - Q(A)\right]$

In Unreal Engine, the above algorithm can be easily implemented and is posted here (specifically look the member function UMAIBrainComponent::TickComponent). The game is played in the Unreal Editor where the rendered arena image is

The first two red boxes on left are levers 0 and 1 and last two boxes on right are 3 and 2 levers.

When the game begins, the control is passed over to AI agent who, as per the logic flowing through Tick function, starts shooting the boxes implementing RL technique. The log of first 22 iterations of the shootouts performed on July 23, 2019 can be found here.

### Log Analysis:

We start with 0th iteration. An action (lever) is chosen at random because all the estimates are set to 0. The reward obtained is 100 and the estimate of action 0 is updated. For the next 3 iterations, the action with the highest expected value is chosen by the agent which results in pawn firing at first red box. But during 5th (marked with iteration number 4) iteration, the agent explores and randomly* chooses action 2 (last blue box) in spite of its expected value as 0. As result reward is -10. But then again it starts exploiting the knowledge-base and obtains higher rewards. Then again during iteration number 11 it chooses action 3 (second last blue box) and gets negative reward. Again, it continues with highest reward action selection (based on knowledge-base).

Finally during iteration number 19, it chooses action 1 (second red box) and gets reward 100. The estimates are updated (as can be see from next iteration’s maximum estimates ) and now, agent has global high estimates in the knowledge-base which couldn’t have been registered if agent weren’t exploring.

In the next blog-post of this series, we will see the visual implementation of this logic (natural to UE codebase).

* Ignore the 1 – epsilon text typo in the log text.

# B for Behavior, Blackboard and Brain (1)

Blackboard is a very efficient tool utilized by AI agent for generating appropriate behavior. In this blog-post I shall demonstrate the Unreal Engine code, in action, using the BlackBoard class.

First we initialize the BlackBoard object in the following fashion with line 8


void AMAIController::SetGameRep(AMAIGameState *GR)
{
MAIGameRep = GR;

// setup blackboard component
if(!MAIGameRep->GetBlackBoard())
{
UBlackboardComponent* BB = NewObject(this, TEXT("BlackboardComponent"));
if(BB)
{
BB->RegisterComponent();
MAIGameRep->SetBlackBoard(BB);

UBlackboardData* BlackboardAsset = NewObject(BB);
BlackboardAsset->UpdatePersistentKey(FName("PlayerScore"));
BB->InitializeBlackboard(*BlackboardAsset);
}
}

}



Then, we register the component with line 11 and declare the UBlackBoardData with line 14.  Next we add the key PlayerScore in the UBlackBoardData and finally initialize the BlackBoard with the asset. This is the code equivalent to the editor state shown below

The BlackBoard keys can be accessed by the code


GetGameRep()->GetBlackBoard()->GetValueAsFloat(FName("PlayerScore"));



The idea behind the BlackBoard is to provide a scratch work space with relevant data, required for the decision making purposes. Furthermore it is equipped with the ability to send notifications once data has been updated and cache the calculations saving redundant CPU processing time and power.

BlackBoard can be shared among several AI instances. It basically provides a common ground for all the AI instances to operate in collaborative manner. This gives rise to new collective behavior leading to more realistic gameplay and recreation.

Next, we  want to focus our attention towards Reinforcement Learning algorithms which are more or less dormant in the game AI  world (I really don’t have the clue why). But good news is that people are now gearing towards its implementation in game Engines (Unity seems first one to include that). For more information watch this video. There have been some white papers based on RL in game AI for instance Capture The Flag: the emergence of complex cooperative agents and Implementing Reinforcement Learning in Unreal Engine 4 with Blueprints.

Using the BlackBoard class, I applied Reinforcement Learning on Unreal Engine AI controller which is shown in the video below

The aim was to train the AI to shoot red boxes without actually coding it. This is called emergent behavior which was not included in the compiled code, but based on the rewards, AI learnt to shoot red boxes! More information on how I implemented RL will follow in the sequel blog-post. Stay tuned.