I will look at the representation manifolds induced by neural language models from raw corpus data in several complementary ways. The technique explained in this article forms the basis for one of these perspectives. It can be thought of as a topological dimensionality reduction method.
The goal is to summarize the shape of our representation space with a rough sketch in the form of a low-dimensional topological manifold.
This reduced representation can be thought of as a map approximating the shape of our embedding space.
Such a description can be visually inspected by a human while remaining more topologically informative than a naive projection.
Instead of growing
Figure 1 shows a visualization of this process for a point cloud sampled from the circle (
Given data points
This article introduces the mathematical techniques in more detail than is possible in my journal publications and conference talks. It is intended as an introduction to some of the mathematical constructions I use, especially for a Computational Linguistics audience unfamiliar with these ideas.