This article introduces a procedure that allows us to derive topological objects from each dimension of the embedding manifold separately. The technique involves two steps. First, we interpret each component of a hidden state vector in a language model as a time series over the word tokens in a sentence consumed by the model. We then view this time series as a topological manifold and compute its algebraic invariants.
In the previous approaches, we looked at each sentence as a point cloud within the representation space of our neural language model. Although the order of words within each sentence is implicitly captured by the structure of the point cloud (because of the way word vectors are induced by the LM), we did not explicitly take it into consideration when inducing topological features. In this approach we take the ordering of the embeddings directly into account by performing a re-representation step designed to model time series data. This allows us to study homological properties of each dimension of the representation manifold of our language model. When words from a corpus are fed into the neural network implementation of the language model, its hidden state vector traces out a path in the embedding space. We can interpret the topological properties of these paths, and their relationship to the corpus data, by analyzing each dimension of the hidden state vector as a time series.
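To make this concrete, here is a minimal sketch of how such a per-dimension time series could be extracted. It assumes a Hugging Face `transformers` model, with `gpt2` used purely as a stand-in; the article does not specify which language model is actually used.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Stand-in model; the article's actual language model is not specified here.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

sentence = "The hidden state traces out a path in the embedding space."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One hidden state vector per token: shape (num_tokens, embedding_dim).
hidden_states = outputs.last_hidden_state.squeeze(0)

# Reading off coordinate i token by token gives a time series f_i(t).
i = 0
f_i = hidden_states[:, i].numpy()
```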
Every sentence of the corpus generates multiple sequences of floating point numbers, one in each dimension of the representation manifold. We can transform those sequences into topological objects and study a notion of shape for each factor of the word embedding. To do this, we slide a window over the time series of hidden states associated with the LM and compute topological invariants of the resulting point clouds (see Figure 1 for an illustration of the idea).
[Figure 1: Sliding a window over the per-dimension time series of hidden states to obtain point clouds.]
The first step is the construction of the sliding window embedding. Given the time series $f_i(t)$ traced out by the $i$-th coordinate of the hidden state as the model consumes tokens, this step depends on two parameters, the dimension $d$ and the delay $\tau$:
$$ SW_{d,\tau}f_i(t) = \begin{bmatrix} f_i(t) \\ f_i(t+\tau ) \\ \vdots \\ f_i(t+(d-1)\tau) \end{bmatrix} \in \mathbb{R}^{d} $$
The dimension $d$ determines the ambient space of the resulting point cloud, while the delay $\tau$ controls the spacing between the samples collected into each window; together they fix the window length $(d-1)\tau$. Sliding $t$ across the token positions of a sentence produces a point cloud in $\mathbb{R}^d$, whose topological invariants we then compute.
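A direct implementation of $SW_{d,\tau}$ might look like the sketch below, paired with a persistence computation on the resulting point cloud. This is a sketch under stated assumptions: the `ripser` package is only one of several TDA libraries that could fill this role, and the parameter values are illustrative rather than the ones used in the actual experiments.

```python
import numpy as np
from ripser import ripser  # assumed TDA dependency; any persistence library works

def sliding_window(f: np.ndarray, d: int, tau: int) -> np.ndarray:
    """Sliding window embedding SW_{d,tau} of a one-dimensional time series f.

    Each row of the result is a window [f(t), f(t + tau), ..., f(t + (d-1)tau)],
    so the output has shape (len(f) - (d - 1) * tau, d).
    """
    n = len(f) - (d - 1) * tau
    if n <= 0:
        raise ValueError("time series is too short for this choice of d and tau")
    return np.stack([f[t : t + (d - 1) * tau + 1 : tau] for t in range(n)])

# Stand-in for one hidden-state coordinate; a periodic signal makes the
# topology easy to see, since its sliding window point cloud forms a loop.
f_i = np.sin(np.linspace(0, 4 * np.pi, 60))

d, tau = 3, 2                        # illustrative parameter choices
cloud = sliding_window(f_i, d, tau)  # point cloud in R^d

# Persistent homology of the point cloud (H_0 and H_1 by default).
diagrams = ripser(cloud)["dgms"]
```

For a periodic signal like the sine wave above, a long-lived class appears in the degree-one diagram, reflecting the loop traced out by the windows; this behaviour on periodic signals is the classical motivation for sliding window persistence.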
This article introduces these mathematical techniques in more detail than is possible in my journal publications and conference talks. It is useful as an introduction to some of the mathematical constructions I use, especially for a Computational Linguistics audience unfamiliar with these ideas.