I Built a Vision AI That Blocks Blockchain Attacks Invisible to Every Text-Based Security System — From Ouagadougou, Burkina Faso
How a fine-tuned Vision-Language Model on AMD MI300X detected a Drain Star attack that scored 0.42 on every existing system — and blocked it in 48ms.
The Attack That Slipped Through Everything
It was 14:32:01 UTC. Nine wallets simultaneously initiated transfers toward a single hub address. Each transfer: $0.48 USDC. Each one, individually, looked completely normal.
The receiving hub had no prior negative reputation. The sending wallets had clean histories. The transaction amounts were below average for each sender. Frequency was moderate. No single transaction triggered any rule.
Every existing text-based security system would have scored this as R = 0.42 — ambiguous, borderline. In the best case: escalated to human review. In the worst case: allowed through.
In that worst case: $4.32 USDC drained. Nine victims. Zero alerts.
This is a Drain Star attack — and it's completely invisible to text-based analysis.
Until now.
The Problem Nobody Is Solving
I've been building blockchain security infrastructure for the past 6 months. What I kept running into was a fundamental gap that nobody in the ecosystem had properly addressed.
Traditional security systems — both rule-based and ML-based — analyze transactions one at a time. They look at:
Amount (normal? abnormal?)
Frequency (too fast?)
Destination (known bad actor?)
Smart contract (audited?)
This works fine for simple attacks. But coordinated attacks don't look dangerous at the individual transaction level. The attack signature is topological — it exists in the relationship between transactions, not in any single one.
A Drain Star attack with 9 wallets sending $0.48 each is:
9 normal-looking amounts ✓
9 normal frequencies ✓
9 transactions to the same destination (suspicious? maybe)
1 hub address with no prior bad history ✓
Text-based score: R = 0.42. ESCALATE. Maybe a human catches it. Maybe not.
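To make the gap concrete, here is a minimal sketch of per-transaction rule checks. The `Tx` fields, the amount limit, and the blocklist are hypothetical stand-ins for the amount and destination checks listed above. Each of the nine transfers passes every per-transaction rule. The pattern only appears once transfers are grouped by destination, which is a relational feature rather than a per-transaction one.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    sender: str
    receiver: str
    amount_usdc: float
    block: int


# Hypothetical per-transaction rules, standing in for amount/destination checks.
MAX_AMOUNT_USDC = 100.0
KNOWN_BAD = {"0xbad"}


def per_tx_flags(tx: Tx) -> list[str]:
    """Rules a single transaction violates when inspected in isolation."""
    flags = []
    if tx.amount_usdc > MAX_AMOUNT_USDC:
        flags.append("large_amount")
    if tx.receiver in KNOWN_BAD:
        flags.append("known_bad_destination")
    return flags


def fan_in(txs: list[Tx]) -> Counter:
    """Distinct senders per receiver: a relational feature no single tx carries."""
    pairs = {(tx.sender, tx.receiver) for tx in txs}
    return Counter(receiver for _, receiver in pairs)


attack = [Tx(f"0xw{i}", "0xhub", 0.48, 100) for i in range(9)]
print(sum(bool(per_tx_flags(tx)) for tx in attack))  # 0 transactions flagged
print(fan_in(attack).most_common(1))                 # [('0xhub', 9)]
```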
The Insight That Changed Everything
Here's what I realized: these attacks are visually obvious.
When you render the transaction subgraph as an image — wallets as nodes, transfers as directed edges — a Drain Star looks exactly like a star. Nine spokes pointing to one hub. Red hub node (high degree centrality). Purple edges (USDC transfers).
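The star is just as easy to see numerically. Here is a minimal sketch in plain Python (no NetworkX, so it runs without dependencies). It computes in-degree centrality, meaning in-degree divided by n - 1, which is the same normalization NetworkX's `in_degree_centrality` uses. It then marks hubs with the "degree > 3" rule used for coloring in Step 2. The node names are illustrative.

```python
from collections import defaultdict


def degree_stats(edges: list[tuple[str, str]]) -> tuple[dict[str, float], set[str]]:
    """Return in-degree centrality per node and the hub set (total degree > 3)."""
    nodes = {n for edge in edges for n in edge}
    in_deg: dict[str, int] = defaultdict(int)
    total_deg: dict[str, int] = defaultdict(int)
    for src, dst in edges:
        in_deg[dst] += 1
        total_deg[src] += 1
        total_deg[dst] += 1
    denom = max(len(nodes) - 1, 1)  # normalize by n - 1, as NetworkX does
    centrality = {n: in_deg[n] / denom for n in nodes}
    hubs = {n for n in nodes if total_deg[n] > 3}
    return centrality, hubs


edges = [(f"w{i}", "hub") for i in range(9)]  # nine spokes, one hub
centrality, hubs = degree_stats(edges)
print(centrality["hub"], hubs)  # 1.0 {'hub'}
```

A hub that receives from every other node in the subgraph scores the maximum centrality of 1.0. This is the "red hub node" in the rendered image.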
A human security analyst looking at this graph for 3 seconds would immediately say: "That's a drain star. Block it."
So the question became: can a Vision-Language Model do the same thing in 48 milliseconds?
The answer is yes. But you have to train it first.
Building the Dataset: 1 Million Transaction Graphs
I couldn't fine-tune a VLM on generic data. The model needed to see real blockchain attack topologies, thousands of times, until it learned to recognize them instantly.
So I built Sigui-DePIN-1M: 1,000,000 annotated blockchain transaction graph images derived from 1.87 million real on-chain transactions from Ethereum, Arbitrum, and Polygon.
The construction pipeline:
Step 1 — Data collection
Raw transaction subgraphs extracted via archive node queries. For each seed transaction: depth-2 ego-network spanning a 10-block window.
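Here is a minimal sketch of the extraction step. It assumes transactions are already available as `(sender, receiver, block)` tuples; the archive-node query itself is out of scope. Several details are assumptions, not the pipeline's documented behavior:

- The ego network is centered on the seed's sender.
- The "10-block window" is interpreted as 10 blocks centered on the seed's block.
- Hops ignore edge direction, as `networkx.ego_graph(G, n, radius=2, undirected=True)` would.

```python
from collections import defaultdict, deque

Tx = tuple[str, str, int]  # (sender, receiver, block)


def ego_subgraph(txs: list[Tx], seed: Tx, depth: int = 2, window: int = 10) -> list[Tx]:
    """Depth-limited ego network around the seed's sender, restricted to a block window.

    Assumption: the window is [seed_block - window // 2, seed_block + window // 2),
    i.e. `window` blocks centered on the seed transaction's block.
    """
    lo = seed[2] - window // 2
    hi = seed[2] + window // 2
    in_window = [tx for tx in txs if lo <= tx[2] < hi]

    # Undirected adjacency: a hop may follow a transfer in either direction.
    adj: dict[str, set[str]] = defaultdict(set)
    for src, dst, _ in in_window:
        adj[src].add(dst)
        adj[dst].add(src)

    # Breadth-first search from the seed's sender, stopping at `depth` hops.
    dist = {seed[0]: 0}
    queue = deque([seed[0]])
    while queue:
        node = queue.popleft()
        if dist[node] == depth:
            continue
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)

    # Induced subgraph: keep transfers whose endpoints are both in the ego network.
    return [tx for tx in in_window if tx[0] in dist and tx[1] in dist]
```

For example, with `txs = [("a", "b", 100), ("b", "c", 101), ("c", "d", 102), ("x", "b", 200)]` and seed `("a", "b", 100)`, the result keeps `a→b` and `b→c`. It drops `c→d` because `d` is three hops away, and it drops `x→b` because it falls outside the block window.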
Step 2 — Visualization
Each subgraph rendered as a 672×672 PNG using NetworkX and Matplotlib:
Background: #0d0f2b (deep indigo)
Hub nodes (degree > 3): Red
Leaf nodes: Green
USDC transfer edges: Purple (width proportional to amount)
4 layout algorithms in rotation to prevent overfitting
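The styling rules above can be expressed as a small pure function that computes the drawing parameters before any plotting happens. This is a minimal sketch under two assumptions: the four layout names are real NetworkX layout functions, but which four the pipeline actually rotates through is a guess, and the maximum edge width of 4.0 is also hypothetical.

```python
from collections import Counter

BACKGROUND = "#0d0f2b"
HUB_COLOR, LEAF_COLOR, EDGE_COLOR = "red", "green", "purple"
# Assumed rotation; these are NetworkX layout functions, the actual four are not stated.
LAYOUTS = ["spring_layout", "kamada_kawai_layout", "circular_layout", "shell_layout"]
IMAGE_PX, DPI = 672, 100


def style_spec(edges: list[tuple[str, str, float]], index: int, max_width: float = 4.0) -> dict:
    """Drawing parameters for one subgraph given (sender, receiver, amount) edges."""
    degree = Counter()
    for src, dst, _ in edges:
        degree[src] += 1
        degree[dst] += 1
    top = max((amt for _, _, amt in edges), default=1.0) or 1.0  # avoid dividing by 0
    return {
        "figsize": (IMAGE_PX / DPI, IMAGE_PX / DPI),  # 6.72 in at 100 dpi -> 672 px
        "dpi": DPI,
        "facecolor": BACKGROUND,
        "layout": LAYOUTS[index % len(LAYOUTS)],  # rotate layouts across samples
        "node_color": {n: HUB_COLOR if d > 3 else LEAF_COLOR for n, d in degree.items()},
        "edge_color": EDGE_COLOR,
        "edge_width": [max_width * amt / top for _, _, amt in edges],  # proportional to amount
    }
```

To render, the values would be passed to `networkx.draw_networkx` (`node_color` converted to a list in node order, plus `edge_color` and `width`) and then to `matplotlib.pyplot.savefig(..., dpi=100, facecolor=BACKGROUND)`. Keeping the styling separate from drawing makes the color and width rules easy to unit-test.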
Step 3 — Labeling
Each graph is assigned to one of four classes using graph-theoretic criteria.
Labels were validated by manual review of 10,000 samples, with an inter-rater agreement of Cohen's κ = 0.91.
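For reference, Cohen's κ corrects raw agreement for the agreement expected by chance: κ = (p_o - p_e) / (1 - p_e). Below is a minimal stdlib implementation; scikit-learn's `cohen_kappa_score` computes the same statistic. The example labels reuse class names mentioned later in the comments and are illustrative only.

```python
from collections import Counter


def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same, non-empty set of items")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: both raters pick class c independently with their own frequencies.
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # kappa is undefined here (one shared class); 1.0 by convention
    return (p_o - p_e) / (1 - p_e)


a = ["DRAIN_STAR", "DRAIN_STAR", "MIXING_CHAIN", "MIXING_CHAIN"]
b = ["DRAIN_STAR", "MIXING_CHAIN", "MIXING_CHAIN", "MIXING_CHAIN"]
print(cohen_kappa(a, b))  # 0.5: 75% raw agreement, 50% expected by chance
```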
Result: an 11.1 GB dataset, the most specialized blockchain security dataset I know of.
📊 Dataset: huggingface.co/datasets/Ibonon/sigui-depin-1m
Fine-Tuning Imina-Na on AMD MI300X
I named the model Imina Na — the Grand Mask of the Awa society in Dogon cosmology. The most sacred layer.
Base model: Qwen2-VL-2B-Instruct. Fine-tuning method: LoRA (r=16, α=32) targeting the attention projections. Hardware: AMD Instinct MI300X (192GB HBM3) via AMD Developer Cloud.
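For readers new to LoRA: the base weight W stays frozen, and the learned update is a low-rank product B·A scaled by α/r. With r=16 and α=32, that scale is 2.0. Below is a minimal pure-Python sketch of one adapted linear layer. It shows the textbook LoRA forward pass, not the author's training code, and the matrix sizes are toy values rather than Qwen2-VL's.

```python
import random


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B would be trained."""

    def __init__(self, weight: list[list[float]], r: int = 16, alpha: int = 32, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight                                   # frozen, d_out x d_in
        self.scale = alpha / r                                 # 32 / 16 = 2.0
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in
        self.B = [[0.0] * r for _ in range(d_out)]             # d_out x r, zero-init

    def __call__(self, x: list[float]) -> list[float]:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self) -> int:
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])  # r * (d_in + d_out)


layer = LoRALinear([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(layer([2.0, 3.0]))         # [2.0, 3.0, 5.0]: B starts at zero, so output equals W x
print(layer.trainable_params())  # 80 = 16 * (2 + 3)
```

Because B is zero-initialized, training starts exactly at the base model. Only r·(d_in + d_out) parameters per adapted projection are trained. In practice this is usually configured through a library such as Hugging Face PEFT (`LoraConfig(r=16, lora_alpha=32, target_modules=[...])`). The exact module names the author targeted beyond "the attention projections" are not stated.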
The results:
| Metric | Base Qwen2-VL | Imina-Na V2 (fine-tuned) |
|---|---|---|
| Overall F1 | 88.1% | 92.9% |
| Drain Star recall | 82.3% | 95.2% |
| False positive rate | 12.0% | 3.1% |
| End-to-end latency | 48ms | 48ms (unchanged) |
The fine-tuning added 4.8 F1 points (88.1% → 92.9%) and cut the false positive rate by roughly 74% (12.0% → 3.1%). Same latency. No tradeoffs.
🤖 Model: huggingface.co/Ibonon/imina_na_v2_lora
The Full Pipeline: Sigui Protocol
Imina Na is Layer 5 of a 5-layer security oracle I call Sigui Protocol — named after the Dogon ceremony of cosmic renewal that happens every 60 years.
The risk formula:
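Here is a minimal sketch of how an additive risk formula with ALLOW / ESCALATE / BLOCK bands could be wired. It assumes, as the author describes in the comments, that Layers 1-4 contribute additive risk deltas and that the vision layer is only consulted once the cumulative score reaches a gate. The band thresholds, the gate, and the per-layer values are all hypothetical. They were chosen only so that the text-only score of 0.42 lands in ESCALATE, matching the story.

```python
from typing import Callable

# Hypothetical values: the article only establishes that R = 0.42 maps to ESCALATE.
ALLOW_BELOW = 0.30
BLOCK_AT = 0.70
VISION_GATE = 0.30  # Layer 5 is consulted only when Layers 1-4 sum to at least this


def verdict(score: float) -> str:
    """Map a risk score in [0, 1] to a decision band."""
    if score >= BLOCK_AT:
        return "BLOCK"
    if score < ALLOW_BELOW:
        return "ALLOW"
    return "ESCALATE"


def risk(text_deltas: list[float], vision_delta: Callable[[], float]) -> tuple[float, str]:
    """Sum Layer 1-4 deltas, add Layer 5 only past the gate, clamp to [0, 1]."""
    score = sum(text_deltas)
    if score >= VISION_GATE:
        score += vision_delta()  # the render + VLM call happens only on this path
    score = min(max(score, 0.0), 1.0)
    return score, verdict(score)


text_deltas = [0.12, 0.10, 0.10, 0.10]     # hypothetical split of R = 0.42
print(risk(text_deltas, lambda: 0.0)[1])   # ESCALATE (text only)
print(risk(text_deltas, lambda: 0.45)[1])  # BLOCK (hypothetical vision delta)
```

Passing the vision layer in as a callable keeps the expensive render-and-infer step lazy. Low-risk transactions never pay for it.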
Back to the Attack: What Actually Happened
Let me finish the story from the beginning.
The attack:
9 coordinated wallets → 1 hub
Each transfer: $0.48 USDC
All wallets: clean history
Hub: no prior bad reputation
Text-only pipeline (Layers 1-4): R = 0.42, verdict ESCALATE.
The attack would have been escalated to human review. Maybe blocked. Maybe not. Depends on whether a human was watching at 14:32:01 UTC.
With Imina Na (Layer 5):
The transaction subgraph is rendered in 3ms. Nine inbound edges converging on a single red hub node. The topology is immediately recognizable.
Imina Na classified the subgraph as a Drain Star.
With Imina Na's contribution added, the final risk score crossed the BLOCK threshold.
Decision time: 48ms total.
The attack was blocked. Nine wallets protected.
The attack was invisible to text. Imina Na saw it.
ERC-8259: Standardizing Agent Trust
Building Sigui made me realize there's a deeper problem: there's no standard for AI agent identity and reputation on Ethereum.
When an agent makes a transaction, there's no way for a smart contract to:
Verify which AI model is operating the wallet
Query a behavioral trust score
Receive standardized threat signals from security oracles
I submitted ERC-8259 to Ethereum Magicians to address this. It proposes three interfaces:
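To illustrate the shape of the three capabilities listed above, here is a Python sketch using `typing.Protocol`: model identity, trust-score queries, and threat signals. These are not the ERC's actual Solidity interfaces, whose names and signatures are defined in the proposal itself. Every class, method, and field name below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class ThreatSignal:
    agent: str        # agent wallet address
    attack_type: str  # e.g. "DRAIN_STAR"
    severity: float   # 0.0 (informational) .. 1.0 (critical)


@runtime_checkable
class AgentIdentity(Protocol):
    def model_of(self, agent: str) -> str:
        """Which AI model operates this wallet."""
        ...


@runtime_checkable
class AgentReputation(Protocol):
    def trust_score(self, agent: str) -> float:
        """Behavioral trust score for this wallet."""
        ...


@runtime_checkable
class ThreatFeed(Protocol):
    def report(self, signal: ThreatSignal) -> None:
        """Publish a standardized threat signal from a security oracle."""
        ...


@dataclass
class InMemoryRegistry:
    """Toy implementation of all three roles, for local testing only."""
    models: dict[str, str] = field(default_factory=dict)
    scores: dict[str, float] = field(default_factory=dict)
    signals: list[ThreatSignal] = field(default_factory=list)

    def model_of(self, agent: str) -> str:
        return self.models.get(agent, "unknown")

    def trust_score(self, agent: str) -> float:
        return self.scores.get(agent, 0.0)

    def report(self, signal: ThreatSignal) -> None:
        self.signals.append(signal)
```

Splitting identity, reputation, and threat reporting into separate interfaces lets one contract consume only the role it needs.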
The standard received substantive feedback from Pablo at AHM, who operates behavioral reputation for ~20,000 agent wallets on Base mainnet. Their operational insight about confidence-aware reputation (observation density ≠ score) is now incorporated into the spec.
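Separating the score from observation density can be as simple as carrying both fields and deriving a confidence from the count alone. Here is a minimal sketch. The saturating n / (n + k) curve, k = 50, and the 0.8 cutoff are assumptions for illustration, not the ERC-8259 definition.

```python
from dataclasses import dataclass

# Hypothetical constants.
HALF_CONFIDENCE_AT = 50  # observation count at which confidence reaches 0.5
HIGH_CONFIDENCE = 0.8    # cutoff for treating a score as well-supported


@dataclass(frozen=True)
class Reputation:
    score: float       # behavioral trust score in [0, 1]
    observations: int  # number of transactions the score is based on

    @property
    def confidence(self) -> float:
        """Saturating confidence in [0, 1) that depends only on observation density."""
        return self.observations / (self.observations + HALF_CONFIDENCE_AT)

    @property
    def is_high_confidence(self) -> bool:
        return self.confidence >= HIGH_CONFIDENCE


print(Reputation(0.9, 3).is_high_confidence)       # False: same score, sparse data
print(Reputation(0.9, 10_000).is_high_confidence)  # True
```

Two addresses with the same 0.9 score are then treated differently. The one backed by 3 transactions stays low-confidence, and the one backed by 10,000 does not.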
💬 Discussion: ethereum-magicians.org/t/erc-8259
Using Sigui in Your Agent (3 Lines of Code)
The whole system is available as an open-source Python SDK, installable from PyPI with pip install sigui-sdk.
For LangChain agents:
For OpenClaw agents:
The skill returns deterministic verdicts (ALLOW / BLOCK / ESCALATE) with exit codes for programmatic handling. Starknet, Aptos, and Ethereum are supported natively.
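Here is a minimal sketch of consuming such verdicts programmatically. The exit-code mapping (0 = ALLOW, 1 = BLOCK, 2 = ESCALATE) and the `handle` function are hypothetical, so check the skill's documentation for the actual codes. In a real integration, the code would come from running the skill's CLI, for example via `subprocess.run(...).returncode`.

```python
from enum import IntEnum


class Verdict(IntEnum):
    # Hypothetical mapping; the real skill may use different codes.
    ALLOW = 0
    BLOCK = 1
    ESCALATE = 2


def handle(exit_code: int) -> str:
    """Translate a checker's exit code into the agent's next action."""
    try:
        verdict = Verdict(exit_code)
    except ValueError:
        return "abort"  # unknown code: fail closed rather than sending the transaction
    return {
        Verdict.ALLOW: "send",
        Verdict.BLOCK: "abort",
        Verdict.ESCALATE: "queue_for_review",
    }[verdict]


print(handle(2))  # queue_for_review
```

Failing closed on unrecognized codes is the safer default for an agent that signs transactions autonomously.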
🔧 OpenClaw Skill: clawhub.ai/ibonon/sigui-security
The Numbers So Far
Built solo, from Ouagadougou, Burkina Faso:
380+ attacks blocked and permanently recorded on Arc L1 testnet
1,000,000 transaction graphs in the open-source dataset
273 PyPI downloads in the first month (sigui-sdk)
115 OpenClaw skill installs in 4 days
133 dataset downloads on HuggingFace
Starknet Foundation Grant: passed pre-screening, in deep technical screening
ERC-8259: under active community review on Ethereum Magicians
What's Next: Trustformer
The 48ms latency of Imina Na comes largely from the graph-rendering pipeline (~28ms of overhead). The model inference itself is only ~20ms.
I'm proposing Trustformer (T-GAT) — a native Spatio-Temporal Graph Attention Transformer that eliminates the rendering step entirely. Instead of converting transactions to images, it processes them as raw token sequences with a novel Dual Flow-Reputation Attention mechanism:
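As a rough illustration only, here is a generic sketch of one plausible ingredient; it is not the Trustformer mechanism itself. The sketch computes scaled dot-product attention over transaction tokens, with an additive bias from per-token reputation. It then thresholds the attention matrix to pick out the token pairs that drew the most attention, the subgraph-extraction idea the author mentions in the comments. The bias form, the `beta` weight, and the threshold are all assumptions.

```python
import math


def softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def reputation_biased_attention(
    q: list[list[float]],
    k: list[list[float]],
    reputation: list[float],
    beta: float = 1.0,
) -> list[list[float]]:
    """Row-wise softmax(q_i . k_j / sqrt(d) - beta * reputation_j).

    Hypothetical bias: low-reputation keys are penalized less, so they draw more attention.
    """
    d = len(q[0])
    weights = []
    for qi in q:
        logits = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) - beta * rep
            for kj, rep in zip(k, reputation)
        ]
        weights.append(softmax(logits))
    return weights


def salient_pairs(weights: list[list[float]], threshold: float) -> list[tuple[int, int]]:
    """(query, key) token pairs whose attention weight exceeds the threshold."""
    return [(i, j) for i, row in enumerate(weights) for j, w in enumerate(row) if w > threshold]


w = reputation_biased_attention([[1.0, 0.0]], [[1.0, 0.0]] * 3, reputation=[0.9, 0.9, 0.0])
print(salient_pairs(w, 0.5))  # [(0, 2)]: the zero-reputation token dominates
```

Because no image is rendered, the whole computation stays on token sequences, which is where the projected latency savings would come from.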
Projected latency: < 5ms, with F1 targeted to match Imina Na and no rendering overhead.
Preprint: drive.google.com/file/d/1t9CklAUMLQ9wZgcqLrrvMtTYSGOYYos_
(I'm currently seeking an arXiv endorser for cs.CR or cs.AI — endorsement code P4EHO3 at arxiv.org/auth/endorse. If you hold endorser status and find this work valuable, I'd be honored by your support.)
Open Source, All of It
Everything is MIT-licensed and available today:
🐙 GitHub: github.com/ibonon/Sigui
📦 SDK (PyPI): pypi.org/project/sigui-sdk
🧠 Model V2: huggingface.co/Ibonon/imina_na_v2_lora
📊 Dataset: huggingface.co/datasets/Ibonon/sigui-depin-1m
🔧 OpenClaw Skill: clawhub.ai/ibonon/sigui-security
💬 ERC-8259: ethereum-magicians.org/t/erc-8259
📄 Preprint: Trustformer Architecture
The Bigger Picture
As AI agents become autonomous economic actors — managing wallets, executing DeFi strategies, interacting with smart contracts without human oversight — the security infrastructure protecting them was not designed for this world.
Traditional systems analyze transactions one at a time. Attackers coordinate across transactions. The gap is structural, not incremental.
Visual topology analysis is not the only answer. But it's a layer that was missing, and Sigui is the first system to operationalize it at production latency on real blockchain data.
If you're building AI agents that interact with blockchain infrastructure, I'd love to hear about your security challenges. Drop a comment or reach me on GitHub.
— Eric Warma · Independent Researcher · Ouagadougou, Burkina Faso
Tags: #blockchain #ai #security #machinelearning #python #web3 #opensource #defi
Top comments (2)
Using vision to catch attacks that are invisible to text-based systems is a clever framing - a lot of malicious payloads (homoglyph addresses, UI-spoofing, look-alike contract names, deceptive approve dialogs) are designed to pass text parsing while fooling a human's eye, so analyzing what the user actually sees rather than just the underlying string can catch a class of attacks pure-text validation misses. The hard part is keeping the false-positive rate low enough that you're not blocking legit transactions, since in security a noisy detector gets turned off.
The bit I'd dig into: explainability of the block. A vision model flagging "this looks like an attack" is only actionable if it can point at what it saw (the swapped character, the spoofed element), otherwise users learn to click through the warning. Grounding the verdict in evidence rather than a black-box score is the same discipline I lean on in Moonshift - a multi-agent pipeline that takes a prompt to a deployed SaaS, where a verify layer has to justify a gate, not just assert it. Multi-model routing keeps a build ~$3 flat, first run's free no card. Genuinely interesting angle, and respect for shipping it from Ouagadougou. How are you handling false positives - confidence threshold, or a human-confirm step on a flagged tx before it's blocked?
@harjjotsinghh Thank you for the thoughtful comment — you've zeroed in on exactly the two hardest problems in production security AI: false positives and explainability. Let me address both directly.
On false positives:
The fine-tuned Imina-Na V2 operates at a 3.1% false positive rate on the Sigui-DePIN-1M test set, down from 12.0% before fine-tuning. In practice, the 5-layer pipeline provides defense-in-depth: Layers 1-4 (behavioral, flow, service, contract) contribute additive risk deltas, and Imina Na (Layer 5, vision) only fires if the cumulative score reaches a threshold. So a visually "suspicious" pattern that's economically legitimate (e.g., a known payroll distributor) would have its risk delta neutralized by the service reputation layer before reaching the decision engine.
The system also has a confidence-aware reputation model (built into ERC-8259) — observation density is separated from the score. An address with 3 transactions and a score of 0.9 gets a low-confidence flag. An address with 10,000 transactions and a score of 0.9 is treated as high-confidence. This prevents over-blocking based on sparse data.
On explainability:
You're absolutely right — a black-box "this looks suspicious" is useless in production. Imina Na returns a structured verdict that includes:
· The detected attack type (DRAIN_STAR, MIXING_CHAIN, etc.)
· A natural language justification generated by the VLM
· The topological features that triggered the detection (hub degree centrality, edge density, clustering coefficient)
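Here is a minimal sketch of such a structured verdict. The three topological features are computed on the undirected transaction graph, using the same conventions as NetworkX's `degree_centrality`, `density`, and `average_clustering`. The class and field names are hypothetical, not Imina Na's actual output schema, and the graph is assumed to have no self-loops.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class DetectionVerdict:
    attack_type: str    # e.g. "DRAIN_STAR"
    justification: str  # natural-language text produced by the VLM
    features: dict[str, float] = field(default_factory=dict)


def topology_features(edges: list[tuple[str, str]]) -> dict[str, float]:
    """Max degree centrality, edge density, and average clustering (undirected)."""
    adj: dict[str, set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    clustering = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            clustering.append(0.0)  # NetworkX convention for degree < 2
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        clustering.append(2 * links / (k * (k - 1)))
    return {
        "hub_degree_centrality": max(len(nbrs) for nbrs in adj.values()) / (n - 1),
        "edge_density": 2 * m / (n * (n - 1)),
        "avg_clustering": sum(clustering) / n,
    }


star = [(f"w{i}", "hub") for i in range(9)]
print(topology_features(star))
# {'hub_degree_centrality': 1.0, 'edge_density': 0.2, 'avg_clustering': 0.0}
```

A drain star has a distinctive signature: a hub connected to every other node, sparse edges overall, and zero clustering, because the spokes never transact with each other.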
For the next architecture iteration (Trustformer / T-GAT), I'm taking explainability even further. The Dual Flow-Reputation Attention mechanism produces an attention matrix that can be thresholded to extract the exact subgraph that triggered the BLOCK verdict. That subgraph becomes the symbolic witness for a ZK-STARK proof, so a smart contract can verify "this specific topology was detected" without re-running the model. The proof itself is the explanation.
On multi-agent verification:
Your Moonshift approach sounds aligned with what I'm building on the agent side. Sigui already has a self-critique loop via LangGraph + CrewAI — when the verdict is ESCALATE, a second model (Qwen2.5 or Claude) performs a deep audit. The Trustformer paper also proposes a reputation-weighted PBFT consensus across DePIN oracle nodes, so no single model instance is the final authority.
Really appreciate the kind words about shipping from Ouagadougou. Building globally from the Global South has its challenges, but the open-source community has been incredibly supportive. Would love to hear more about Moonshift's verify layer — how do you handle the justification-to-gate pipeline without introducing latency?