Dual-Stage Toxic Moderation App

Production-ready multi-modal content moderation with LlamaGuard and DistilBERT+LoRA

Week 2
Duration: 1 week
Dual-Stage, LlamaGuard, DistilBERT+LoRA, BLIP

Project Overview

Understanding the challenge and solution approach

Problem Statement

Build a production-ready dual-stage moderation system that can analyze both text and images for harmful content, providing comprehensive safety assessments with real-time processing capabilities for digital platforms.

Solution Approach

Implemented a dual-stage pipeline that combines LlamaGuard for hard filtering with DistilBERT+LoRA for fine-grained 9-class classification, uses BLIP captioning to bring images into the same text pipeline, and is delivered as a modular Streamlit application.

Expected Outcome

A deployed dual-stage moderation system that detects harmful content in both text and images with high accuracy and real-time performance, pairing the overall safe/unsafe verdict with fine-grained category labels.

Quick Links & How to Run

Live demo, repo and local setup for reproducing the Week 2 app

How to run locally

Reproduce the Streamlit app locally using the project requirements and a local copy of the fine-tuned model.

pip install -r requirements.txt
# copy .env.example to .env and add your OpenRouter API key
# place model weights under the path used by the app (see README)
streamlit run app_streamlit.py

Model files used in Week 2 are expected at: C:/Users/NightPrince/OneDrive/Desktop/Cellula-Internship/Week1/peft-distilbert-toxic-classifier/last-checkpoint/ — update paths in the app if needed.
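
For a quick sanity check outside the Streamlit app, the checkpoint can be loaded directly with transformers and peft. The following is a minimal sketch, assuming a distilbert-base-uncased base with a 9-label head; the relative path is a placeholder for your local copy of the Week 1 weights.

# sketch: load the fine-tuned checkpoint outside the app for a quick sanity check
# (the DistilBERT base model name and the relative path below are assumptions; adjust to your setup)
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch

CHECKPOINT = "peft-distilbert-toxic-classifier/last-checkpoint"  # local copy of the Week 1 weights

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
base = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=9)
model = PeftModel.from_pretrained(base, CHECKPOINT)
model.eval()

inputs = tokenizer("example comment to score", return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1).squeeze()
print(probs)  # nine class probabilities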

Week 2 Highlights

What was achieved during the second week of the internship

  • Modular dual-stage moderation pipeline: LlamaGuard (hard filter) followed by DistilBERT+LoRA (9-class classifier).
  • Multi-modal support via BLIP image captioning to handle images through the same text pipeline.
  • Production-ready Streamlit UI with clear feedback, class probabilities, and error handling.
  • Addressed class imbalance using SMOTE, oversampling, and class weights (see the sketch after this list); documented experiments in reports.
  • Deployed a public demo on Hugging Face Spaces for evaluation and sharing.
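
On the class-imbalance point, the weighted-loss side can be sketched with sklearn's utilities. This is an illustrative sketch with a toy label array, not the project's actual training code.

# sketch: derive per-class weights from label frequencies for a weighted loss
# (the toy label array and the 0-8 class id range are placeholders, not the project's data)
import numpy as np
import torch
from sklearn.utils.class_weight import compute_class_weight

train_labels = np.array([0, 0, 0, 0, 1, 2, 2, 3, 4, 5, 6, 7, 8])  # hypothetical, heavily skewed toward class 0
weights = compute_class_weight(class_weight="balanced", classes=np.arange(9), y=train_labels)

loss_fn = torch.nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float))
# SMOTE-style oversampling (imblearn's SMOTE) was explored separately on the training features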

Week 2 File Structure (excerpt)

Week2/
├── app_streamlit.py         # Main Streamlit app
├── pipeline/                # Modular pipeline
│   ├── blip_caption.py
│   ├── llama_guard.py
│   └── toxic_classifier.py
├── requirements.txt
├── .env.example
├── README.md
└── internship_week2_report.html

System Architecture

How the multi-modal safety system works

Dual-Stage Architecture

Pre-processing Stage

  • Accepts raw text and image inputs
  • Applies tokenization and image preprocessing
  • Handles different input formats (text, image, URL)

Analysis Stage

  • LlamaGuard for text safety analysis
  • BLIP for image captioning and content analysis
  • Text enters the pipeline directly; images are first captioned by BLIP, then scored by the same text pipeline

Safety Assessment

  • Combines results from both stages
  • Generates a final safety score
  • Outputs safety labels (e.g., "Safe", "Unsafe")
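
A minimal sketch of that combination step follows, with illustrative function, class, and field names rather than the app's actual code.

# sketch: merge the Stage 1 verdict with Stage 2 class probabilities (names and threshold are assumptions)
def combine(stage1_verdict, stage2_probs, threshold=0.5):
    # stage1_verdict: "safe" / "unsafe" string from LlamaGuard
    # stage2_probs: dict mapping the 9 toxicity class names to probabilities from DistilBERT+LoRA
    if stage1_verdict == "unsafe":
        return {"label": "Unsafe", "reason": "blocked by LlamaGuard hard filter"}
    flagged = {cls: p for cls, p in stage2_probs.items() if p >= threshold}
    return {"label": "Unsafe" if flagged else "Safe", "flagged": flagged, "probabilities": stage2_probs}

print(combine("safe", {"insult": 0.82, "threat": 0.05, "obscene": 0.07}))  # truncated example probabilities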

Modular Pipeline

Input Processing

Accepts text and image inputs through Streamlit interface

Text Analysis

LlamaGuard analyzes text for safety violations

Image Analysis

BLIP generates captions and analyzes image content (see the captioning sketch after this walkthrough)

Safety Assessment

Combines results for comprehensive safety evaluation
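
The Image Analysis step hinges on BLIP captioning. The following is a minimal sketch with the Hugging Face transformers API, assuming the Salesforce/blip-image-captioning-base checkpoint and a local image file; the app may use a different variant.

# sketch: caption an image with BLIP so it can be moderated as text
# (the blip-image-captioning-base checkpoint and file name are assumptions)
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(out[0], skip_special_tokens=True)
print(caption)  # this caption is then scored by the same LlamaGuard -> DistilBERT+LoRA text pipeline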

Methodology & Implementation

Step-by-step approach to building the multi-modal safety system

  1. Dual-Stage Architecture Design: Designed a modular dual-stage pipeline with Stage 1 (LlamaGuard hard filter) and Stage 2 (DistilBERT+LoRA fine-grained classification) for comprehensive content moderation.
  2. Stage 1 - LlamaGuard Integration: Integrated LlamaGuard via the OpenRouter API for instant hard filtering of legally or ethically unsafe content, constraining responses to 'safe' or 'unsafe' for reliability (a request sketch follows this list).
  3. Stage 2 - DistilBERT+LoRA Implementation: Deployed the fine-tuned DistilBERT model with PEFT-LoRA for nuanced 9-class toxic content classification, addressing class imbalance through SMOTE and weighted loss functions.
  4. Multi-Modal BLIP Integration: Implemented BLIP model for image captioning, enabling visual content moderation by converting images to text and processing through the same dual-stage pipeline.
  5. Modular Pipeline Development: Built a clean, modular architecture with separate components for BLIP captioning, LlamaGuard filtering, and toxic classification, ensuring maintainability and extensibility.
  6. Production-Ready Streamlit App: Created a robust Streamlit application with comprehensive error handling, real-time processing, professional UI, and support for both text and image inputs.
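
For the Stage 1 call in step 2, OpenRouter exposes an OpenAI-compatible chat endpoint. Below is a minimal request sketch; the model slug, environment variable name, and prompt framing are assumptions rather than the app's exact code (see pipeline/llama_guard.py for the real implementation).

# sketch: Stage 1 hard filter via OpenRouter's OpenAI-compatible endpoint
# (model slug, env var name, and prompt framing are assumptions)
import os
import requests

def llama_guard_verdict(text):
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "meta-llama/llama-guard-3-8b",  # assumed slug; use the Llama Guard model available on OpenRouter
            "messages": [{"role": "user", "content": text}],
        },
        timeout=30,
    )
    resp.raise_for_status()
    answer = resp.json()["choices"][0]["message"]["content"].lower()
    return "unsafe" if "unsafe" in answer else "safe"  # normalize to the two values Stage 2 expects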

Results & Performance

Key metrics and achievements from the multi-modal safety system

  • Dual-Stage: moderation pipeline
  • 9 Categories: toxic classification
  • Multi-Modal: text & image support
  • Production-Ready: live demo available

Key Achievements

  • Successfully implemented a production-ready dual-stage moderation system with real-time processing
  • Created a modular, extensible pipeline architecture with clean separation of concerns
  • Integrated multiple state-of-the-art models (LlamaGuard, DistilBERT+LoRA, BLIP) into a unified system
  • Built a comprehensive Streamlit application with professional UI and robust error handling
  • Deployed live demo on Hugging Face Spaces for public access and testing

Technical Stack

Technologies and frameworks used in the project

LlamaGuard, DistilBERT+LoRA, BLIP, Streamlit, OpenRouter API, PEFT, Transformers, Python, Modular Pipeline, Multi-Modal


Challenges & Solutions

Key obstacles encountered and how they were overcome

  • Dual-Stage Pipeline Integration: Successfully integrated LlamaGuard (Stage 1) and DistilBERT+LoRA (Stage 2) by creating a unified interface and handling different API formats and model outputs.
  • Multi-Modal Processing: Implemented BLIP image captioning and developed algorithms to effectively combine text and image analysis results for comprehensive safety evaluation.
  • Modular Architecture Design: Designed a clean, modular pipeline with separate components for BLIP captioning, LlamaGuard filtering, and toxic classification, ensuring maintainability and extensibility.
  • Production Deployment: Optimized model loading, inference pipelines, and error handling to achieve real-time performance while maintaining accuracy for production deployment on Hugging Face Spaces.
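
On the deployment point, a common pattern is Streamlit's resource caching plus UI-level error reporting so reruns don't reload the models. This is a minimal sketch under those assumptions; the stand-in classifier and function names are illustrative, not the app's actual code.

# sketch: cache heavyweight models across Streamlit reruns and surface errors in the UI
# (function names and the stand-in classifier are illustrative, not the app's actual code)
import streamlit as st

@st.cache_resource  # load once per process; reused on every rerun instead of reloading per interaction
def load_classifier():
    from transformers import pipeline
    return pipeline("text-classification", model="distilbert-base-uncased")

clf = load_classifier()
text = st.text_area("Text to moderate")
if st.button("Check") and text:
    try:
        st.json(clf(text))
    except Exception as exc:  # report failures in the UI instead of crashing the app
        st.error(f"Moderation failed: {exc}")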