Architecture & Design

Scans

Blocked

Safe

System Architecture

User Prompt

→

Security Gateway

→

Analysis Engine

→

Rule Matching

→

Sanitization

→

Safe? → LLM

Blocked → Log

Frontend

PHP 8 + Vanilla JS

IBM Plex Sans/Mono · Chart.js 4 · Font Awesome 6

Analysis Engine

PHP Rule Engine

Regex, Keyword, Phrase matching

Database

MySQL (DianaHost)

7 tables · InnoDB · 44 detection rules

Visualization

Chart.js 4

Trend lines, Doughnut, KPI counters

Database Schema

Table	Purpose	Key Columns
attack_categories	5 attack taxonomy categories	name, slug, severity_weight, color
rules	Detection rules with patterns	pattern, pattern_type, severity, severity_score
prompt_logs	All analyzed prompts	prompt_text, risk_score, verdict
rule_matches	Rule-to-log junction	log_id, rule_id, matched_text
sanitization_log	Sanitization transformations	original_fragment, sanitized_fragment
settings	System configuration	setting_key, setting_value, setting_type

                attack_categories 1──∞ rules

                prompt_logs 1──∞ rule_matches

                rules 1──∞ rule_matches

                prompt_logs 1──∞ sanitization_log

Rule Engine Pipeline

Input Validation

Check prompt length, encoding, and emptiness

Rule Evaluation

Match against 44 active rules (regex, keyword, phrase)

Risk Scoring

Calculate weighted risk: Σ(rule_score × category_weight), cap at 100

Sanitization

Strip PII (SSN, CC, email, phone), remove injection tokens

Verdict & Logging

Safe (≤30) → Pass | Suspicious (31–65) → Warn | Blocked (>65) → Deny + Log to DB

Attack Categories

API Reference

Method	Endpoint	Description	Parameters
POST	/api/analyze.php	Analyze a prompt for threats	{ prompt, source, activity_id, destination_model }
GET	/api/rules.php	List all detection rules + categories	?id=X (optional)
POST	/api/rules.php	Create a new detection rule	{ name, pattern, severity, category_id, ... }
PUT	/api/rules.php?id=X	Update a rule	{ name, pattern, ... }
DELETE	/api/rules.php?id=X	Delete a rule	—
GET	/api/logs.php	Fetch prompt logs with filters	?verdict=&category=&search=&date_from=&date_to=
GET	/api/stats.php	Dashboard statistics + time series	—
GET	/api/activities.php	List test bench activity sessions	?id=X (optional)
POST	/api/activities.php	Create a new activity session	{ name, description, user_model, destination_model }
GET	/api/settings.php?providers=1	Get AI provider status (configured/unconfigured)	—
PUT	/api/settings.php	Update API keys / risk thresholds / model config	{ settings: { key: value } }

Risk Scoring Model

Formula

                risk_score = min(100, Σ(rule.severity_score × category.weight))

                // Category Weights:

                Harmful Intent: 1.80×

                Jailbreak: 1.50×

                System Override: 1.40×

                PII Exposure: 1.30×

                Social Engineering: 1.20×

Verdict Thresholds

SAFE

No threats detected, prompt allowed

0–30

SUSPICIOUS

Potential risk, sanitized and warned

31–65

BLOCKED

High-risk content, prompt denied

66–100