Semantic Coupling
Library for analyzing semantic relationships between source code files.
What is it?
semantic-coupling is a library I built to analyze relationships between source code files by examining their semantic
similarity. Rather than relying solely on structural dependencies (imports, inheritance), it uses natural language
processing to identify how closely related different files are based on their content and concepts.
This reveals implicit connections that traditional static analysis might miss – like two classes that deal with related concepts but don’t directly import each other.
GitHub: https://github.com/loehnertz/semantic-coupling
Why I built it
Traditional coupling metrics (static, dynamic) only capture explicit relationships: dependencies that show up in the code structure or in runtime behavior. But codebases also have semantic relationships – classes that should probably be in the same module because they deal with related concepts, even if they don’t directly depend on each other.
I built this library to explore whether NLP techniques could identify these hidden relationships. It turns out they can, and the results are useful for tasks such as microservice decomposition and validating module boundaries.
How it works
SemanticCouplingCalculator is the library’s main entry point. You provide:
- A map of file names to their raw source code contents
- The programming language (currently only Java)
- The natural language context (currently only English)
The calculator returns a list of SemanticCoupling objects representing detected relationships with similarity scores.
Behind the scenes, it analyzes the textual content of your code (variable names, method names, comments, string literals) and computes semantic similarity using NLP techniques. Files that use similar terminology and concepts score higher, even if they’re not structurally coupled.
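To make the idea concrete, here is a minimal Java sketch of one way to score two files by shared vocabulary. It is not the library's implementation: the class name `SemanticSketch`, its methods, and the choice of plain term-frequency vectors with cosine similarity are all assumptions made for illustration, and the actual NLP processing in semantic-coupling may differ considerably.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names come from semantic-coupling.
public class SemanticSketch {

    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    /** Extracts lowercase terms from raw source text, splitting camelCase identifiers. */
    public static List<String> terms(String source) {
        List<String> result = new ArrayList<>();
        Matcher matcher = WORD.matcher(source);
        while (matcher.find()) {
            // Split at lower-to-upper boundaries: "OrderRepository" -> "Order", "Repository"
            for (String part : matcher.group().split("(?<=[a-z])(?=[A-Z])")) {
                result.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return result;
    }

    /** Counts how often each term occurs in the source text. */
    public static Map<String, Integer> termFrequencies(String source) {
        Map<String, Integer> counts = new HashMap<>();
        for (String term : terms(source)) {
            counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    /** Cosine similarity of two term-frequency vectors; 0.0 if either is empty. */
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        double normA = 0;
        double normB = 0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            normA += entry.getValue() * entry.getValue();
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += entry.getValue() * other;
            }
        }
        for (int value : b.values()) {
            normB += value * value;
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static double similarity(String sourceA, String sourceB) {
        return cosine(termFrequencies(sourceA), termFrequencies(sourceB));
    }

    public static void main(String[] args) {
        String invoiceService =
            "class InvoiceService { void sendInvoice(Customer customer) { /* email the invoice */ } }";
        String invoiceArchive =
            "class InvoiceArchive { Invoice findInvoice(Customer customer) { return null; } }";
        String imageScaler =
            "class ImageScaler { byte[] scale(byte[] pixels, int width) { return pixels; } }";

        // The two invoice classes never reference each other, yet share vocabulary.
        System.out.println("invoice vs invoice: " + similarity(invoiceService, invoiceArchive));
        System.out.println("invoice vs image:   " + similarity(invoiceService, imageScaler));
    }
}
```

Note that this sketch also counts language keywords such as `class` and `return` as terms. A more realistic pipeline would likely filter those out and normalize word forms before scoring.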
flowchart LR
A[Source Files] --> B[Token Extraction]
B --> C[NLP Processing]
C --> D[Similarity Scoring]
D --> E[Coupling Results]
Use cases
Semantic coupling analysis is particularly useful for:
- Microservice Decomposition: Identifying which classes belong together conceptually, even if they don’t have direct dependencies
- Code Review: Spotting files that should be reviewed together because they deal with related concepts
- Architecture Validation: Checking if module boundaries align with semantic groupings
- Legacy Code Understanding: Discovering implicit relationships in unfamiliar codebases
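As a rough illustration of the architecture validation use case, the Java sketch below flags file pairs that score highly but live in different modules. Everything in it is hypothetical: `ScoredPair` is a stand-in for a scored relationship and is not the library's `SemanticCoupling` type, and the module map and threshold are made-up inputs you would have to supply yourself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from semantic-coupling.
public class BoundaryCheck {

    /** Stand-in for a scored pair of files (assumed shape, not the library's type). */
    public static final class ScoredPair {
        public final String first;
        public final String second;
        public final double score;

        public ScoredPair(String first, String second, double score) {
            this.first = first;
            this.second = second;
            this.score = score;
        }
    }

    /** Returns pairs at or above the threshold whose files sit in different modules. */
    public static List<ScoredPair> crossModulePairs(
            List<ScoredPair> pairs, Map<String, String> moduleOf, double threshold) {
        List<ScoredPair> flagged = new ArrayList<>();
        for (ScoredPair pair : pairs) {
            String moduleA = moduleOf.get(pair.first);
            String moduleB = moduleOf.get(pair.second);
            if (moduleA == null || moduleB == null) {
                continue; // skip files with no known module
            }
            if (pair.score >= threshold && !moduleA.equals(moduleB)) {
                flagged.add(pair);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        Map<String, String> moduleOf = new HashMap<>();
        moduleOf.put("InvoiceService.java", "billing");
        moduleOf.put("InvoiceArchive.java", "storage");
        moduleOf.put("ImageScaler.java", "media");

        List<ScoredPair> pairs = Arrays.asList(
            new ScoredPair("InvoiceService.java", "InvoiceArchive.java", 0.8),
            new ScoredPair("InvoiceService.java", "ImageScaler.java", 0.1));

        for (ScoredPair pair : crossModulePairs(pairs, moduleOf, 0.5)) {
            System.out.println(pair.first + " <-> " + pair.second + " crosses a module boundary");
        }
    }
}
```

A flagged pair is only a hint that a boundary deserves a second look; whether the files actually belong together is still a design decision.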
Tech stack
- Language: Kotlin
The API provides both Kotlin and Java interfaces, making it easy to integrate into existing toolchains regardless of which JVM language you’re using.