StegoToolkit

Text Steganography Extractor

Auto-detect and extract hidden messages from plain text using homoglyph, whitespace, and spacing methods. Instant anomaly detection. Watermark attribution for leak detection. SNOW-compatible. 100% client-side.

Paste Suspicious Text

Paste any text to scan for hidden steganographic data. Anomaly detection runs instantly as you type.

100% client-side. Your text never leaves your browser. Detects trailing whitespace (SNOW), inter-word spacing, Unicode space substitution, and homoglyphs.

How to Extract a Hidden Message from Text (5 steps)

  1. Paste the suspicious text — anomaly detection runs instantly as you type
  2. Review the Suspicion Score and anomaly breakdown to see which method is most likely
  3. Select the method to try (or use Auto-Detect to try all four automatically)
  4. Enter the password if the payload was encrypted, then click Extract
  5. Download the extracted payload or use Watermark Attribution to identify the recipient

Extraction Methods — What This Tool Detects

| Method | Technique | Speed | Best For |
| --- | --- | --- | --- |
| Auto-Detect | Tries all 4 methods in priority order: Homoglyph → Trailing WS → Unicode WS → Inter-Word | Takes ~1 second | Best starting point; use when the method is unknown |
| Unicode Homoglyph | Scans for Cyrillic/special characters in Latin text. Each glyph position encodes one bit | Instant scan | Enterprise watermarks, leak detection, CTF challenges |
| Trailing Whitespace (SNOW) | Reads trailing space/tab per line. Compatible with the standard stegsnow tool | Instant scan | CTF challenges, SNOW-encoded files |
| Unicode Whitespace | Reads thin/hair/en/em space variants (U+2009/200A/2002/2003) for 2-bit encoding | Instant scan | High-capacity email/docs watermarks |
| Inter-Word Spacing | Counts single vs double spaces between words | Instant scan | Plain text files and basic watermarks |
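
To make the Unicode Whitespace row concrete, here is a minimal Python sketch of 2-bit decoding. The four code points come from the table, but the mapping from each space to a bit pair, the MSB-first byte packing, and the function names are assumptions made for illustration; the tool's actual symbol table may differ.

```python
# Hypothetical symbol table: the four code points are listed in the table,
# but which bit pair each one carries is an assumption of this sketch.
SPACE_BITS = {
    "\u2009": "00",  # THIN SPACE
    "\u200a": "01",  # HAIR SPACE
    "\u2002": "10",  # EN SPACE
    "\u2003": "11",  # EM SPACE
}


def decode_unicode_spaces(text: str) -> bytes:
    """Collect 2 bits per special space and pack them into bytes, MSB first."""
    bits = "".join(SPACE_BITS[ch] for ch in text if ch in SPACE_BITS)
    usable = len(bits) - len(bits) % 8  # ignore an incomplete trailing byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


def encode_unicode_spaces(words: list[str], payload: bytes) -> str:
    """Demo-only inverse: join words with spaces that carry the payload bits."""
    bit_to_space = {bits: ch for ch, bits in SPACE_BITS.items()}
    bits = "".join(f"{byte:08b}" for byte in payload)
    pairs = [bits[i:i + 2] for i in range(0, len(bits), 2)]
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        # Fall back to a normal space once the payload is exhausted.
        separator = bit_to_space[pairs[i]] if i < len(pairs) else " "
        out.append(separator + word)
    return "".join(out)


cover = "the quick brown fox jumps over the lazy dog".split()
stego = encode_unicode_spaces(cover, b"Hi")
recovered = decode_unicode_spaces(stego)
```

Nine words give eight gaps, so this cover text carries 16 bits, exactly the two bytes of the demo payload.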

Frequently Asked Questions

What does the Suspicion Score mean?

It's a 0–100 composite score estimating how likely the text is to contain hidden data. It sums four components: trailing whitespace density (0–25 pts), double-space density (0–20 pts), Unicode non-ASCII space count (0–25 pts), and homoglyph candidate count (0–30 pts). A score above 50 means the text very likely contains steganographic data. The score appears as soon as you paste text.
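
A rough Python sketch of how such a composite could be assembled follows. Only the four components and their point caps come from the description above. The scaling inside each component (linear in density, 5 points per Unicode space or homoglyph candidate) and the use of Cyrillic letters as the only homoglyph candidates are assumptions, so the numbers it produces will not match the tool's.

```python
import re

UNICODE_SPACES = set("\u2009\u200a\u2002\u2003")  # thin, hair, en, em space


def suspicion_score(text: str) -> int:
    """Illustrative 0-100 score; the per-component scaling is assumed."""
    lines = text.split("\n")

    # Trailing whitespace density: share of lines ending in a space or tab.
    trailing = sum(1 for line in lines if line != line.rstrip(" \t"))
    trailing_density = trailing / len(lines)

    # Double-space density: share of inter-word gaps with 2+ ASCII spaces.
    gaps = re.findall(r"(?<=\S) +(?=\S)", text)
    double_density = (
        sum(1 for gap in gaps if len(gap) >= 2) / len(gaps) if gaps else 0.0
    )

    # Raw counts for the two count-based components.
    unicode_spaces = sum(1 for ch in text if ch in UNICODE_SPACES)
    homoglyphs = sum(1 for ch in text if "\u0400" <= ch <= "\u04ff")

    return (
        min(25, round(25 * trailing_density))
        + min(20, round(20 * double_density))
        + min(25, 5 * unicode_spaces)   # assumed: 5 pts per special space
        + min(30, 5 * homoglyphs)       # assumed: 5 pts per Cyrillic letter
    )
```

Because every component is capped, a single noisy signal (for example an editor that leaves trailing spaces on every line) cannot push the score past 50 on its own under these caps.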

Can I decode stegsnow-encoded files?

Yes. Select the Trailing Whitespace (SNOW) method. The decoder is compatible with stegsnow: files encoded by `stegsnow` on Linux decode correctly here, with one exception. stegsnow's ICE encryption is not supported (it uses a 1997 cipher). Payloads that this tool encrypted with AES-256-GCM are decrypted correctly.
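
Before attempting a decode, it can help to see where the trailing whitespace actually sits. The Python sketch below only locates and renders the trailing space/tab run on each line. It does not implement stegsnow's bit layout, compression, or encryption, and the function name is made up for this example.

```python
def trailing_whitespace_runs(text: str) -> list[tuple[int, str]]:
    """Return (line_number, run) for each line that ends in spaces or tabs.

    The run is rendered with 'S' for a space and 'T' for a tab so that it
    is visible when printed.
    """
    runs = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.rstrip(" \t")
        tail = line[len(stripped):]
        if tail:
            runs.append((number, tail.replace(" ", "S").replace("\t", "T")))
    return runs


sample = "hello \t \nworld\nbye\t"
report = trailing_whitespace_runs(sample)
```

Lines with long, irregular mixes of `S` and `T` are the ones worth passing to a real SNOW decoder; a lone trailing space is usually just editor noise.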

What is Watermark Attribution?

If the text was encoded using batch watermarking (each recipient got a unique copy), the decoded payload contains `WM:{id}:{recipient}`. Paste your watermark-key.json file and the decoder will instantly identify which recipient's copy this text came from. Essential for enterprise leak detection.
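
The lookup can be sketched in Python as follows. The `WM:{id}:{recipient}` payload shape comes from the answer above, but the layout of watermark-key.json is not documented here, so the flat `{"<id>": "<recipient>"}` mapping, the function names, and the sample ID and address are all hypothetical.

```python
import json


def parse_watermark(payload: str):
    """Split 'WM:{id}:{recipient}' into (id, recipient), or return None."""
    parts = payload.split(":", 2)  # maxsplit keeps colons inside recipient
    if len(parts) != 3 or parts[0] != "WM":
        return None
    return parts[1], parts[2]


def attribute(payload: str, key_json: str):
    """Check a decoded payload against a key file.

    Assumes a hypothetical key-file layout: {"<id>": "<recipient>"}.
    """
    parsed = parse_watermark(payload)
    if parsed is None:
        return None
    wm_id, recipient = parsed
    keys = json.loads(key_json)
    return {
        "id": wm_id,
        "recipient": recipient,
        "verified": keys.get(wm_id) == recipient,
    }


key_file = '{"7f3a": "alice@example.com"}'
result = attribute("WM:7f3a:alice@example.com", key_file)
```

Cross-checking the recipient against the key file, rather than trusting the payload alone, guards against a payload that was forged or partially corrupted.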

What does 'Encrypted' kind mean in the results?

The 'Encrypted' kind is assigned when the extracted payload's Shannon entropy exceeds 7.2 bits/byte, which indicates encrypted or compressed data. Such a payload is most likely encrypted with AES-256-GCM. Enter the password to decrypt it. Without the password, the payload is unreadable.
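
For reference, byte-level Shannon entropy is H = -Σ p(b) · log2 p(b) over the byte values b that occur in the payload. A minimal Python sketch of the check, using the 7.2 bits/byte threshold quoted above (the function names are illustrative, not the tool's):

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte, from 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_encrypted(data: bytes, threshold: float = 7.2) -> bool:
    """Apply the 7.2 bits/byte cutoff to an extracted payload."""
    return shannon_entropy(data) > threshold


uniform = bytes(range(256))      # every byte value exactly once
english = b"the quick brown fox jumps over the lazy dog"
```

One consequence of this measure: a payload of n bytes can reach at most log2(n) bits/byte, so anything shorter than 148 bytes cannot exceed 7.2 however random it is. Short encrypted payloads may therefore not be classified as 'Encrypted' by an entropy test alone.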

Can I detect homoglyphs without decoding?

Yes. The homoglyph visualizer highlights every substituted character and shows its Unicode code point in a tooltip. This is useful for security auditing, for example checking whether a document you received has invisible identifiers embedded.
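
A comparable scan can be sketched in Python with the standard `unicodedata` module. The heuristic below (flag Cyrillic letters only inside words that also contain Latin letters) is an assumption chosen to avoid flagging genuine Cyrillic text; the tool's own candidate list is not documented here and may cover more scripts.

```python
import re
import unicodedata


def find_homoglyph_candidates(text: str):
    """Return (index, char, code_point, name) for suspicious Cyrillic letters.

    A Cyrillic letter is flagged only when its word also contains at least
    one Latin letter (a mixed-script word).
    """
    hits = []
    for match in re.finditer(r"\S+", text):
        word = match.group()
        has_latin = any(
            unicodedata.name(ch, "").startswith("LATIN") for ch in word
        )
        if not has_latin:
            continue
        for offset, ch in enumerate(word):
            name = unicodedata.name(ch, "")
            if name.startswith("CYRILLIC"):
                hits.append(
                    (match.start() + offset, ch, f"U+{ord(ch):04X}", name)
                )
    return hits


# "p\u0430ypal" looks like "paypal" but its second letter is Cyrillic.
found = find_homoglyph_candidates("login at p\u0430ypal today")
```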

Does the Document Cleaner remove all hidden data?

The Document Cleaner replaces homoglyphs with their standard Latin equivalents, strips trailing whitespace, and normalizes double spaces and Unicode space characters. The result is a clean version of the text with all known steganographic markers removed, safe to redistribute. Methods outside these four categories are not covered.
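
The same three normalizations can be sketched in a few lines of Python. The homoglyph table below lists only six common Cyrillic lookalikes and is an illustrative subset, not the cleaner's actual mapping; `clean_text` is a made-up name.

```python
import re

# Illustrative subset of Cyrillic lookalikes mapped to Latin letters.
HOMOGLYPHS = {
    "\u0430": "a", "\u0435": "e", "\u043e": "o",
    "\u0440": "p", "\u0441": "c", "\u0445": "x",
}
SPECIAL_SPACES = "\u2009\u200a\u2002\u2003"  # thin, hair, en, em space


def clean_text(text: str) -> str:
    """Strip the four marker types: homoglyphs, Unicode spaces,
    trailing whitespace, and doubled inter-word spaces."""
    table = {ord(src): dst for src, dst in HOMOGLYPHS.items()}
    table.update({ord(space): " " for space in SPECIAL_SPACES})
    text = text.translate(table)  # first, so Unicode spaces become ASCII

    cleaned = []
    for line in text.split("\n"):
        line = line.rstrip(" \t")                       # trailing whitespace
        line = re.sub(r"(?<=\S) {2,}(?=\S)", " ", line)  # doubled spaces
        cleaned.append(line)
    return "\n".join(cleaned)


dirty = "p\u0430y  the\u2009bill \t\nok"
clean = clean_text(dirty)
```

Note that a blanket character map like this also rewrites legitimate Cyrillic text, so a production cleaner would want the mixed-script check before substituting.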