Architecture & Design

Scans
Blocked
Safe

System Architecture

User Prompt
Security Gateway
Analysis Engine
Rule Matching
Rule Matching
Sanitization
Safe? → LLM
/
Blocked → Log
Frontend
PHP 8 + Vanilla JS
IBM Plex Sans/Mono · Chart.js 4 · Font Awesome 6
Analysis Engine
PHP Rule Engine
Regex, Keyword, Phrase matching
Database
MySQL (DianaHost)
7 tables · InnoDB · 44 detection rules
Visualization
Chart.js 4
Trend lines, Doughnut, KPI counters

Database Schema

TablePurposeKey Columns
attack_categories 5 attack taxonomy categories name, slug, severity_weight, color
rules Detection rules with patterns pattern, pattern_type, severity, severity_score
prompt_logs All analyzed prompts prompt_text, risk_score, verdict
rule_matches Rule-to-log junction log_id, rule_id, matched_text
sanitization_log Sanitization transformations original_fragment, sanitized_fragment
settings System configuration setting_key, setting_value, setting_type
attack_categories 1──∞ rules
prompt_logs 1──∞ rule_matches
rules 1──∞ rule_matches
prompt_logs 1──∞ sanitization_log

Rule Engine Pipeline

1
Input Validation
Check prompt length, encoding, and emptiness
2
Rule Evaluation
Match against 44 active rules (regex, keyword, phrase)
3
Risk Scoring
Calculate weighted risk: Σ(rule_score × category_weight), cap at 100
4
Sanitization
Strip PII (SSN, CC, email, phone), remove injection tokens
5
Verdict & Logging
Safe (≤30) → Pass  |  Suspicious (31–65) → Warn  |  Blocked (>65) → Deny + Log to DB

Attack Categories

API Reference

MethodEndpointDescriptionParameters
POST /api/analyze.php Analyze a prompt for threats { prompt, source, activity_id, destination_model }
GET /api/rules.php List all detection rules + categories ?id=X (optional)
POST /api/rules.php Create a new detection rule { name, pattern, severity, category_id, ... }
PUT /api/rules.php?id=X Update a rule { name, pattern, ... }
DELETE /api/rules.php?id=X Delete a rule
GET /api/logs.php Fetch prompt logs with filters ?verdict=&category=&search=&date_from=&date_to=
GET /api/stats.php Dashboard statistics + time series
GET /api/activities.php List test bench activity sessions ?id=X (optional)
POST /api/activities.php Create a new activity session { name, description, user_model, destination_model }
GET /api/settings.php?providers=1 Get AI provider status (configured/unconfigured)
PUT /api/settings.php Update API keys / risk thresholds / model config { settings: { key: value } }

Risk Scoring Model

Formula
risk_score = min(100, Σ(rule.severity_score × category.weight))

// Category Weights:
Harmful Intent: 1.80×
Jailbreak: 1.50×
System Override: 1.40×
PII Exposure: 1.30×
Social Engineering: 1.20×
Verdict Thresholds
SAFE
No threats detected, prompt allowed
0–30
SUSPICIOUS
Potential risk, sanitized and warned
31–65
BLOCKED
High-risk content, prompt denied
66–100