Jakob Löhnertz

Senior Software Engineer & Leader

Semantic Coupling

Library for analyzing semantic relationships between source code files.

What is it?

semantic-coupling is a library I built to analyze relationships between source code files by examining their semantic similarity. Rather than relying solely on structural dependencies (imports, inheritance), it uses natural language processing to identify how closely related different files are based on their content and concepts.

This reveals implicit connections that traditional static analysis might miss – like two classes that deal with related concepts but don’t directly import each other.

GitHub: https://github.com/loehnertz/semantic-coupling

Why I built it

Traditional coupling metrics only capture explicit relationships: static coupling comes from the structure of the source code, dynamic coupling from interactions observed at runtime. But codebases also have semantic relationships – classes that should probably be in the same module because they deal with related concepts, even if they don’t directly depend on each other.

I built this library to explore whether NLP techniques could identify these hidden relationships. Turns out they can, and it’s quite useful for things like microservice decomposition or validating module boundaries.

How it works

The library uses the SemanticCouplingCalculator as its main entry point. You provide:

  • A map of file names to their raw source code contents
  • The programming language (currently only Java)
  • The natural language used in identifiers and comments (currently only English)

The calculator returns a list of SemanticCoupling objects representing detected relationships with similarity scores.

Behind the scenes, it analyzes the textual content of your code (variable names, method names, comments, string literals) and computes semantic similarity using NLP techniques. Files that use similar terminology and concepts score higher, even if they’re not structurally coupled.

flowchart LR
    A[Source Files] --> B[Token Extraction]
    B --> C[NLP Processing]
    C --> D[Similarity Scoring]
    D --> E[Coupling Results]
The semantic coupling analysis pipeline
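
To make the pipeline above more concrete, here is a minimal sketch of the same stages in plain Java. It is not the library's implementation: the class SemanticSketch, the ScoredPair type, and the choice of term counts with cosine similarity are all assumptions made for illustration, standing in for whatever NLP techniques the library actually applies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative stand-in for the pipeline: tokens -> term counts -> cosine similarity. */
final class SemanticSketch {

    /** Hypothetical result type; the library's SemanticCoupling class may look different. */
    static final class ScoredPair {
        final String first;
        final String second;
        final double score;

        ScoredPair(String first, String second, double score) {
            this.first = first;
            this.second = second;
            this.score = score;
        }
    }

    /** Token extraction: split camelCase and snake_case identifiers into lowercase words. */
    static Map<String, Integer> termCounts(String source) {
        // Put a space at every lower-to-upper boundary, then split on non-letters.
        String spaced = source.replaceAll("([a-z])([A-Z])", "$1 $2");
        Map<String, Integer> counts = new HashMap<>();
        for (String token : spaced.split("[^A-Za-z]+")) {
            if (token.length() > 2) { // skip empty and very short tokens
                counts.merge(token.toLowerCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Similarity scoring: cosine of the angle between two term-count vectors. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            int count = entry.getValue();
            normA += count * count;
            dot += count * b.getOrDefault(entry.getKey(), 0);
        }
        for (int count : b.values()) {
            normB += count * count;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // a file without usable tokens is similar to nothing
        }
        return dot / Math.sqrt(normA * normB);
    }

    /** Coupling results: score every unordered pair of files, highest score first. */
    static List<ScoredPair> scoreAll(Map<String, String> files) {
        List<String> names = new ArrayList<>(files.keySet());
        Collections.sort(names); // deterministic pair order
        List<ScoredPair> result = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            Map<String, Integer> left = termCounts(files.get(names.get(i)));
            for (int j = i + 1; j < names.size(); j++) {
                Map<String, Integer> right = termCounts(files.get(names.get(j)));
                result.add(new ScoredPair(names.get(i), names.get(j), cosine(left, right)));
            }
        }
        result.sort((x, y) -> Double.compare(y.score, x.score));
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> files = new HashMap<>();
        files.put("OrderService.java", "class OrderService { void placeOrder(Customer customer) {} }");
        files.put("CustomerInvoice.java", "class CustomerInvoice { Order order; Customer customer; }");
        files.put("MathUtil.java", "class MathUtil { int gcd(int left, int right) { return left; } }");
        for (ScoredPair pair : scoreAll(files)) {
            System.out.printf("%s <-> %s: %.2f%n", pair.first, pair.second, pair.score);
        }
    }
}
```

In this toy input, OrderService and CustomerInvoice never reference each other's types by import, yet they rank highest because they share the terms "order" and "customer". A realistic version would at least drop language keywords and stop words and normalize word forms before scoring.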

Use cases

Semantic coupling analysis is particularly useful for:

  • Microservice Decomposition: Identifying which classes belong together conceptually, even if they don’t have direct dependencies
  • Code Review: Spotting files that should be reviewed together because they deal with related concepts
  • Architecture Validation: Checking if module boundaries align with semantic groupings
  • Legacy Code Understanding: Discovering implicit relationships in unfamiliar codebases
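
As one way to picture the architecture validation case, the sketch below takes already computed similarity scores and a file-to-module mapping, and flags pairs that score high but live in different modules. Everything here is hypothetical: BoundaryCheck, ScoredPair, crossModuleCandidates, and the threshold are illustrative names and parameters, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative check: which semantically similar files sit in different modules? */
final class BoundaryCheck {

    /** Hypothetical container for one scored file pair. */
    static final class ScoredPair {
        final String first;
        final String second;
        final double score;

        ScoredPair(String first, String second, double score) {
            this.first = first;
            this.second = second;
            this.score = score;
        }
    }

    /**
     * Returns the pairs whose score reaches the threshold although the two
     * files are assigned to different modules. Files missing from the
     * module mapping are skipped rather than guessed at.
     */
    static List<ScoredPair> crossModuleCandidates(
            List<ScoredPair> pairs, Map<String, String> moduleOf, double threshold) {
        List<ScoredPair> flagged = new ArrayList<>();
        for (ScoredPair pair : pairs) {
            String firstModule = moduleOf.get(pair.first);
            String secondModule = moduleOf.get(pair.second);
            if (firstModule == null || secondModule == null) {
                continue;
            }
            if (!firstModule.equals(secondModule) && pair.score >= threshold) {
                flagged.add(pair);
            }
        }
        return flagged;
    }
}
```

A flagged pair is only a prompt for a closer look, not proof of a misplaced file; the threshold has to be tuned per codebase.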

Tech stack

  • Language: Kotlin

The API provides both Kotlin and Java interfaces, making it easy to integrate into existing toolchains regardless of which JVM language you’re using.
