Doppelgänger

Interactive code clone detection tool for identifying and eliminating duplicated Java code.

What is it?

Doppelgänger is an interactive application I built to identify and help eliminate duplicated code in Java applications. It combines backend AST (Abstract Syntax Tree) analysis with visual exploration tools to make code clone detection both thorough and actionable.

The tool analyzes both local projects and projects hosted in public Git repositories, providing an interactive interface to explore detected clones and work toward refactoring them.

GitHub: https://github.com/loehnertz/Doppelgaenger

Why I built it

Code duplication is one of those things that everyone knows is bad but somehow keeps happening anyway. You copy a method to tweak it slightly, you duplicate logic because it’s faster than abstracting it properly, and before you know it, you’re fixing the same bug in five different places.

I wanted a tool that would make it easy to spot these duplicates, understand their relationships, and systematically refactor them. Most code analysis tools either give you a list of line numbers (not helpful) or cost a fortune (also not helpful for side projects).

Clone detection types

Doppelgänger supports three levels of code clone detection:

Type One: Exact Copies – Identical code ignoring only whitespace and comments. The “I literally copy-pasted this” clones.

Type Two: Syntactical Similarities – Structurally similar code allowing identifier variations (like different variable names). The “I copy-pasted this and renamed some things” clones.

Type Three: Modified Copies – Code copies with modified, added, or removed statements. The “I copy-pasted this and changed a few lines” clones.
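To make the three levels concrete, here is a rough Python sketch that classifies a pair of Java snippets. It is not how Doppelgänger itself works (the tool compares ASTs, not token lists); the regex tokenizer, the tiny KEYWORDS set, the function names, and the 0.7 threshold are all assumptions chosen for illustration.

```python
import difflib
import re

# Hypothetical, deliberately tiny keyword set; a real lexer covers all of Java.
KEYWORDS = {"int", "void", "return", "if", "else", "for", "while",
            "public", "static", "class", "new"}

# Identifiers/keywords, integer literals, or any single punctuation character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def tokenize(source: str) -> list[str]:
    """Drop comments, then split into tokens; whitespace never becomes a token.

    Limitation of this sketch: comment markers inside string literals are
    not handled.
    """
    source = re.sub(r"//[^\n]*|/\*.*?\*/", "", source, flags=re.S)
    return TOKEN.findall(source)


def normalize(tokens: list[str]) -> list[str]:
    """Replace identifiers with ID and integer literals with LIT."""
    out = []
    for tok in tokens:
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok.isdigit():
            out.append("LIT")
        else:
            out.append(tok)
    return out


def classify(a: str, b: str, threshold: float = 0.7) -> str:
    """Return 'type 1', 'type 2', 'type 3', or 'no clone' for two snippets."""
    ta, tb = tokenize(a), tokenize(b)
    if ta == tb:                      # same tokens once layout/comments are gone
        return "type 1"
    na, nb = normalize(ta), normalize(tb)
    if na == nb:                      # same shape, different names or literals
        return "type 2"
    ratio = difflib.SequenceMatcher(None, na, nb).ratio()
    return "type 3" if ratio >= threshold else "no clone"
```

With `int sum(int a, int b) { return a + b; }` as the reference snippet, a reformatted and commented copy comes back as type 1, a copy with renamed identifiers as type 2, and a copy with one extra statement inserted as type 3.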

You can configure similarity and mass thresholds (how alike two fragments must be, and how large a fragment must be to count as a clone) to balance detection accuracy against analysis speed, which matters for large codebases where you don’t want to wait forever for results.
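As a rough illustration of how two such thresholds can interact, the Python sketch below filters candidate pairs of toy syntax trees. Everything in it is an assumption for illustration rather than Doppelgänger's actual code: the `(label, children)` tree shape, the helper names, the default values, and the similarity formula, which is the common `2S / (2S + L + R)` ratio applied to node labels only (so it ignores where in the tree a node sits).

```python
from collections import Counter


def node(label, *children):
    """Build a toy AST node as a (label, children) pair."""
    return (label, list(children))


def mass(tree) -> int:
    """Size of a subtree, counted as its number of nodes."""
    _, children = tree
    return 1 + sum(mass(child) for child in children)


def label_counts(tree) -> Counter:
    """Multiset of node labels in the subtree."""
    label, children = tree
    counts = Counter([label])
    for child in children:
        counts += label_counts(child)
    return counts


def similarity(left, right) -> float:
    """2S / (2S + L + R): S shared labels, L and R labels unique to each side."""
    l, r = label_counts(left), label_counts(right)
    shared = sum((l & r).values())
    left_only = sum((l - r).values())
    right_only = sum((r - l).values())
    return 2 * shared / (2 * shared + left_only + right_only)


def is_clone_candidate(left, right, min_mass=5, min_similarity=0.8) -> bool:
    """Skip tiny fragments first (cheap), then apply the similarity cut-off."""
    if mass(left) < min_mass or mass(right) < min_mass:
        return False
    return similarity(left, right) >= min_similarity
```

Raising `min_mass` discards small fragments before any comparison happens, which is where most of the speed-up would come from in a scheme like this; raising `min_similarity` trades missed Type Three clones for fewer false positives.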

How it works

The backend uses AST parsing and hashing to identify code clones. This is more reliable than simple text comparison because it compares code structure rather than raw strings, so differences in formatting or comments don’t hide a duplicate.
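The general idea of hashing syntax trees can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the project's Kotlin implementation: the `(label, children)` tree shape, the function names, the use of SHA-256, and the `min_mass` default are all hypothetical. Identifiers are represented by a generic `Name` label, so renamed copies hash identically.

```python
import hashlib
from collections import defaultdict


def node(label, *children):
    """Build a toy AST node as a (label, children) pair."""
    return (label, list(children))


def fingerprint(tree, buckets, min_mass):
    """Hash a subtree bottom-up and file it into buckets keyed by that hash.

    Returns (digest, mass) so the parent can fold the child hashes into its own.
    """
    label, children = tree
    results = [fingerprint(child, buckets, min_mass) for child in children]
    size = 1 + sum(child_mass for _, child_mass in results)
    payload = label + "(" + ",".join(digest for digest, _ in results) + ")"
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    if size >= min_mass:              # ignore fragments too small to matter
        buckets[digest].append(tree)
    return digest, size


def find_clone_groups(roots, min_mass=3):
    """Return groups of structurally identical subtrees found across all roots."""
    buckets = defaultdict(list)
    for root in roots:
        fingerprint(root, buckets, min_mass)
    return [group for group in buckets.values() if len(group) > 1]
```

Subtrees that land in the same bucket are structurally identical, so only bucket members need to be compared with each other instead of every pair in the project. This sketch also reports a clone nested inside a larger clone as a separate group; a real tool would typically prune those.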

The frontend then lets you visually explore the detected clones – see where they are, how similar they are, and which ones would have the biggest impact if refactored.

Key features:

Detect duplicated source code across Java projects
Analyze local projects or public Git repositories
Advanced visualization interface for exploring clones
Interactive refactoring workflow
Configurable detection sensitivity

Tech stack

Backend:

Kotlin using the Ktor web framework
AST parsing and hashing for clone identification
Configurable similarity algorithms

Frontend:

Vue.js for the interactive visualization layer
Real-time exploration of detected clones

Deployment:

Docker containers orchestrated via docker-compose
Self-contained environment for easy setup

What I learned

Building Doppelgänger reinforced that finding code clones is the easy part – the hard part is making the results actionable. A list of 500 duplicates is overwhelming. An interactive visualization that lets you filter, sort, and prioritize based on impact is actually useful.

I also learned that different clone types matter for different reasons. Type One clones are the easiest to fix but often the least harmful (they’re usually intentional). Type Three clones are the most insidious because they’re similar enough that bugs propagate but different enough that fixes don’t automatically apply.

Licensed under MIT, the project provides a practical, open-source alternative to commercial code analysis tools. It’s been satisfying to see people actually use it to clean up their codebases.