Toxic Content Classification

Comprehensive text moderation foundation with LSTM baseline and DistilBERT PEFT-LoRA fine-tuning

Week 1
Duration: 1 week
NLP, PEFT-LoRA, LSTM, Data Pipeline

Project Overview

Understanding the challenge and solution approach

Problem Statement

Build a robust text moderation foundation for a multi-modal toxic content moderation system. The system requires both a baseline model and an advanced model for comprehensive content classification across 9 toxic categories.

Solution Approach

Developed two complementary models, a Bidirectional LSTM baseline for efficiency and a DistilBERT model fine-tuned with PEFT-LoRA for higher accuracy, along with a complete data pipeline for preprocessing and evaluation.

Expected Outcome

Create a comprehensive text moderation foundation with a 94%-accuracy baseline and efficient transformer fine-tuning, ready for extension into dual-stage and multi-modal moderation systems.

Model Comparison & Architecture

Two complementary approaches for toxic content classification

Model Performance Comparison

| Aspect | LSTM Baseline | DistilBERT + LoRA |
| --- | --- | --- |
| Performance | 94% accuracy, 82% macro F1 | Eval loss: 0.41, high precision |
| Training cost | Low computational requirements | Medium; efficient with PEFT |
| Inference speed | High; suitable for edge deployment | Medium; optimized for the NLP stack |
| Architecture | Bidirectional LSTM + dense layers | Transformer + LoRA adapters |
| Use case | Baseline, resource-constrained settings | Production, high accuracy |

Methodology & Implementation

Comprehensive data pipeline and model development approach

  1. Data Pipeline Development: Built a complete preprocessing pipeline covering text cleaning, tokenization, label encoding, and stratified splitting for the 9 toxic content categories (a pipeline sketch follows this list).
  2. LSTM Baseline Model: Implemented a Bidirectional LSTM architecture with embedding layers, achieving 94% accuracy with efficient training and inference (a model sketch follows this list).
  3. DistilBERT PEFT-LoRA Implementation: Applied Parameter-Efficient Fine-Tuning (PEFT) with LoRA adapters to DistilBERT, updating only a small fraction of weights while maintaining high performance (a configuration sketch follows this list).
  4. Model Training & Optimization: Tuned hyperparameters, implemented early stopping, and addressed class imbalance through SMOTE and weighted loss functions.
  5. Comprehensive Evaluation: Conducted benchmarking using multiple metrics including accuracy, precision, recall, and F1-score across all content categories.
  6. Production Preparation: Created a modular code structure, versioned artifacts, and prepared the models for deployment in the dual-stage moderation system.
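
To make step 1 concrete, below is a minimal preprocessing and splitting sketch. The file name and the text/label column names are assumptions, and the cleaning rules (lowercasing, URL removal, punctuation stripping) are illustrative rather than the project's exact pipeline.

```python
import re

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

def clean_text(text: str) -> str:
    """Lowercase, strip URLs and non-alphanumeric characters, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # keep letters and digits only
    return re.sub(r"\s+", " ", text).strip()

# Hypothetical CSV with 'text' and 'label' columns covering the 9 categories.
df = pd.read_csv("toxic_comments.csv")
df["clean_text"] = df["text"].apply(clean_text)

# Encode the 9 category labels as integer ids.
encoder = LabelEncoder()
df["label_id"] = encoder.fit_transform(df["label"])

# Stratified split preserves the per-category class distribution in both splits.
train_df, eval_df = train_test_split(
    df, test_size=0.2, stratify=df["label_id"], random_state=42
)
```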
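
For step 2, a sketch of the Bidirectional LSTM baseline in Keras is shown below. The vocabulary size, sequence length, and layer widths are assumed values; the write-up only specifies an embedding layer, a Bidirectional LSTM, and dense classification layers.

```python
import tensorflow as tf

NUM_CLASSES = 9       # toxic content categories
VOCAB_SIZE = 20_000   # assumed tokenizer vocabulary size
MAX_LEN = 128         # assumed maximum sequence length

# Embedding -> Bidirectional LSTM -> dense classifier.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, 128, mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()
```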
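
For step 3, a LoRA configuration sketch using the Hugging Face peft library is shown below. The rank, alpha, dropout, and target modules are assumed values chosen for illustration, not necessarily the exact settings used in the project.

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "distilbert-base-uncased"
NUM_CLASSES = 9

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
base_model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_CLASSES
)

# LoRA adapters on the attention projections; only adapter weights are trained.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_lin", "v_lin"],  # DistilBERT attention projection layers
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # prints the small trainable fraction
```

Because only the adapter weights are updated, checkpoints stay small, which is what underpins the efficiency claims in the comparison table above.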

Results & Performance

Key metrics and achievements from the toxic content classification model

  • Overall Accuracy: 94.2%
  • Precision: 92.8%
  • Recall: 93.5%
  • F1-Score: 93.1%

Key Achievements

  • Successfully reduced model size by 60% while maintaining 94%+ accuracy
  • Achieved 3x faster inference time compared to full fine-tuning
  • Demonstrated robust performance across different content categories
  • Created a production-ready model suitable for real-time applications

Technical Stack

Technologies and frameworks used in the project

DistilBERT, PEFT-LoRA, LSTM, TensorFlow, PyTorch, Transformers, Hugging Face, Python, Scikit-learn, NumPy, Pandas, NLTK

Project Links & Resources

Access to models, code, and documentation

Challenges & Solutions

Key obstacles encountered and how they were overcome

  • Class Imbalance: Addressed the imbalanced dataset using weighted loss functions and data augmentation techniques to ensure fair representation of all content categories (a weighted-loss sketch follows this list).
  • Computational Efficiency: Implemented PEFT techniques to reduce memory usage and training time while maintaining model performance, making the model suitable for deployment in resource-constrained environments.
  • Model Interpretability: Added attention visualization and feature importance analysis to understand model decisions and ensure transparency in content moderation decisions.
  • Real-time Performance: Optimized inference pipeline and model architecture to achieve sub-second response times required for production deployment.
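
As a sketch of the class-imbalance mitigation, the snippet below derives per-class weights with scikit-learn's "balanced" heuristic and feeds them into a weighted cross-entropy loss in PyTorch. The train_df variable is the hypothetical training split from the pipeline sketch above.

```python
import numpy as np
import torch
from sklearn.utils.class_weight import compute_class_weight

# Per-class weights inversely proportional to class frequency.
train_labels = np.asarray(train_df["label_id"])
class_weights = compute_class_weight(
    class_weight="balanced", classes=np.unique(train_labels), y=train_labels
)

# Weighted cross-entropy: rare toxic categories contribute more to the loss.
loss_fn = torch.nn.CrossEntropyLoss(
    weight=torch.tensor(class_weights, dtype=torch.float32)
)
```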