Reading the Graph
What You Are Looking At
The explorer shows a Wikipedia knowledge graph in three dimensions. Every visible point is one article. The point cloud is not decorative: each point's position, color, and links come from pipeline outputs that are computed offline and then rendered by the frontend.
Nodes
A node is a Wikipedia article. When you click one, the right panel shows its summary and immediate graph neighbors.
Colors
Color indicates cluster membership. Articles sharing a color were assigned to the same Louvain community in the filtered link graph.
Edges
Edges represent Wikipedia hyperlinks. They are factual graph connections, not inferred semantic similarity.
Golden rings
Golden rings mark bridge nodes: articles with high betweenness centrality that connect otherwise separate parts of the graph.
What Each Algorithm Does
1. Sentence-transformer embeddings
The pipeline encodes article title and summary text into a 384-dimensional semantic vector. This gives the system a content representation that captures topical similarity beyond exact keywords.
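To make "topical similarity" concrete, here is a minimal sketch of how two embedding vectors can be compared. It assumes cosine similarity as the comparison measure, which this guide does not confirm, and it uses made-up 4-dimensional vectors in place of real 384-dimensional embeddings. The article titles and the `most_similar` helper are illustrative only.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors standing in for 384-dimensional embeddings.
embeddings = {
    "Neural network": [0.9, 0.1, 0.0, 0.2],
    "Deep learning": [0.8, 0.2, 0.1, 0.3],
    "Baroque music": [0.0, 0.9, 0.4, 0.0],
}

def most_similar(title, embeddings):
    """Return the other article whose vector is closest in direction."""
    query = embeddings[title]
    scored = [
        (cosine_similarity(query, vec), name)
        for name, vec in embeddings.items()
        if name != title
    ]
    return max(scored)[1]

closest = most_similar("Neural network", embeddings)
```

With these toy values, "Deep learning" scores far higher against "Neural network" than "Baroque music" does, even though none of the titles share a keyword.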
2. UMAP or force-directed layout
The 3D coordinates come from one of two layout strategies. UMAP compresses semantic vectors into 3D while preserving local neighborhood structure. Force-directed layout instead uses graph connectivity, pulling linked articles together and pushing all nodes apart.
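The force-directed idea can be sketched in a few lines. The version below is a simplified spring model in the style of Fruchterman-Reingold, written as an illustration and not taken from the explorer's pipeline. The function name, the constants (`k`, the starting step, the cooling factor), and the six-node toy graph are all assumptions.

```python
import math
import random
from itertools import combinations

def force_layout_3d(nodes, edges, iterations=300, k=1.0, seed=0):
    """Place nodes in 3D: every pair repels, linked pairs also attract."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1) for _ in range(3)] for n in nodes}
    step = 0.2  # maximum distance a node may move per iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0, 0.0] for n in nodes}
        # Repulsion: push every pair of nodes apart.
        for a, b in combinations(nodes, 2):
            delta = [pa - pb for pa, pb in zip(pos[a], pos[b])]
            dist = max(math.sqrt(sum(d * d for d in delta)), 1e-9)
            force = k * k / dist
            for axis in range(3):
                push = delta[axis] / dist * force
                disp[a][axis] += push
                disp[b][axis] -= push
        # Attraction: pull the two endpoints of each edge together.
        for a, b in edges:
            delta = [pa - pb for pa, pb in zip(pos[a], pos[b])]
            dist = max(math.sqrt(sum(d * d for d in delta)), 1e-9)
            force = dist * dist / k
            for axis in range(3):
                pull = delta[axis] / dist * force
                disp[a][axis] -= pull
                disp[b][axis] += pull
        # Move each node along its net force, capped by the current step.
        for n in nodes:
            length = max(math.sqrt(sum(d * d for d in disp[n])), 1e-9)
            scale = min(length, step) / length
            for axis in range(3):
                pos[n][axis] += disp[n][axis] * scale
        step *= 0.98  # cool down so the layout settles
    return pos

# Toy graph: two triangles joined by a single link (2, 3).
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
pos = force_layout_3d(nodes, edges)

linked = {frozenset(e) for e in edges}
pairs = list(combinations(nodes, 2))
linked_d = [math.dist(pos[a], pos[b]) for a, b in pairs if frozenset((a, b)) in linked]
unlinked_d = [math.dist(pos[a], pos[b]) for a, b in pairs if frozenset((a, b)) not in linked]
mean_linked = sum(linked_d) / len(linked_d)
mean_unlinked = sum(unlinked_d) / len(unlinked_d)
```

In the settled layout, linked pairs end up closer on average than unlinked pairs, which is the property the force-directed view relies on. UMAP works differently (it starts from the semantic vectors, not the links) and is not shown here.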
3. Louvain clustering
Louvain finds densely connected communities in the hyperlink graph. It does not position nodes. It assigns cluster labels used for colors, cluster statistics, and cluster descriptions.
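Louvain searches for a partition of the graph that maximizes a score called modularity. The sketch below computes only that score for a given partition; it is not the full Louvain search. The function name and the toy graph (two triangles joined by one link) are illustrative assumptions.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph.

    Sums, over communities, the fraction of edges inside the community
    minus the fraction expected if edges were placed at random while
    keeping every node's degree.
    """
    m = len(edges)
    intra = defaultdict(int)       # edges with both ends in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for a, b in edges:
        degree_sum[community_of[a]] += 1
        degree_sum[community_of[b]] += 1
        if community_of[a] == community_of[b]:
            intra[community_of[a]] += 1
    return sum(
        intra[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

# One community per triangle versus everything lumped together.
by_triangle = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
all_in_one = {n: "A" for n in range(6)}

q_triangles = modularity(edges, by_triangle)  # 5/14, about 0.357
q_single = modularity(edges, all_in_one)      # 0.0
```

The per-triangle partition scores higher, so an optimizer such as Louvain would prefer it. Those community labels are what the explorer turns into colors.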
4. Betweenness centrality
Betweenness centrality identifies nodes that sit on many shortest paths between other pairs of nodes. These become bridge nodes in the UI because they often connect different subject regions.
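As an illustration, the following sketch computes unnormalized betweenness for a small unweighted, undirected graph using Brandes' algorithm. Whether the pipeline uses this algorithm, normalizes the scores, or samples paths is not stated here, so treat those choices as assumptions. The adjacency list is a toy graph.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}

# Two triangles joined by a single link between nodes 2 and 3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
scores = betweenness(adj)
bridges = sorted(v for v, c in scores.items() if c == max(scores.values()))
```

Nodes 2 and 3 each score 6.0 because every path between the two triangles runs through them, while the other four nodes score 0.0. In the explorer, nodes like these would carry the golden ring.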
How to Interpret the View Correctly
Close together means similar, but only approximately
In UMAP mode, nearby nodes usually have semantically similar article content. In force-directed mode, nearby nodes usually have strong graph connectivity. In both cases, the 3D view is a lossy projection, so on-screen distance is only a rough proxy for similarity or connectivity.
Edges mean explicit links, not hidden similarity
Two nodes can sit close together without an edge because they are topically related but not explicitly linked in Wikipedia. The reverse can also happen: two articles can link while remaining far apart if the relationship is contextual rather than topically central.
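Both mismatches can be listed mechanically once coordinates and links are available. The sketch below assumes a dictionary of 3D positions and a list of link pairs. The `near` and `far` thresholds, the function name, and the sample data are hypothetical and would need tuning against a real layout.

```python
import math
from itertools import combinations

def classify_pairs(positions, edges, near=1.0, far=3.0):
    """Find pairs that are close but unlinked, and pairs linked but far apart."""
    linked = {frozenset(e) for e in edges}
    near_unlinked, far_linked = [], []
    for a, b in combinations(sorted(positions), 2):
        d = math.dist(positions[a], positions[b])
        is_linked = frozenset((a, b)) in linked
        if d <= near and not is_linked:
            near_unlinked.append((a, b))
        elif d >= far and is_linked:
            far_linked.append((a, b))
    return near_unlinked, far_linked

# Made-up coordinates and a single link for illustration.
positions = {
    "A": (0.0, 0.0, 0.0),
    "B": (0.5, 0.0, 0.0),
    "C": (4.0, 0.0, 0.0),
}
edges = [("A", "C")]

near_unlinked, far_linked = classify_pairs(positions, edges)
```

Here A and B are reported as close but unlinked, and A and C as linked but far apart.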
Cluster color is graph-community membership
A cluster is not an official Wikipedia category. It is a community the algorithm inferred from graph structure. That means clusters are useful analytical groupings, not editorial truth.
Bridge nodes are more important than generic hubs for cross-domain analysis
High degree says a node has many links. High betweenness says the node helps connect different regions. For interdisciplinary analysis, betweenness is usually the more informative signal.
What Insights You Can Draw
- Dense same-color regions indicate well-connected topical communities with strong internal reference structure.
- Long sparse bridges between colored regions often reveal interdisciplinary concepts, historical pivots, or broadly reused methodologies.
- Clusters that are spatially near each other but weakly linked suggest conceptual overlap with relatively poor editorial interlinking.
- Scattered semantic-search hits indicate a cross-cutting concept; tightly grouped hits indicate a domain-specific concept.
- Large clusters with many internal links usually reflect mature, heavily developed parts of Wikipedia.
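The search-hit heuristic in the list above can be turned into a rough number. One option, shown here as an assumption and not as something the explorer computes, is the normalized entropy of the cluster labels of the hits: 0.0 when all hits fall in one cluster, 1.0 when they are spread evenly over every cluster. The function name and sample labels are illustrative.

```python
import math
from collections import Counter

def cluster_spread(hit_clusters, total_clusters):
    """Normalized Shannon entropy of the cluster labels of search hits."""
    counts = Counter(hit_clusters)
    if total_clusters < 2 or len(counts) < 2:
        return 0.0  # no hits, or all hits in a single cluster
    n = len(hit_clusters)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(total_clusters)

# Hypothetical cluster labels for the top hits of two searches.
domain_specific = cluster_spread([7, 7, 7, 7], total_clusters=4)
cross_cutting = cluster_spread([0, 1, 2, 3], total_clusters=4)
```

A value near 0.0 suggests a domain-specific concept and a value near 1.0 suggests a cross-cutting one. Like on-screen distance, this is a rough indicator, not a precise measurement.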
What You Should Not Infer
- Do not treat exact geometric distance as a scientifically precise similarity score.
- Do not assume a large cluster is better or more correct. It usually means denser coverage or heavier interlinking.
- Do not confuse bridge nodes with article quality. They measure graph position, not factual quality or editorial reliability.
- Do not interpret a missing edge as the absence of a relationship. It only means the dataset contains no explicit link between the two articles.