Cellula Technologies

NLP Engineer Intern

June 27 – August 27, 2025

As an NLP Engineer Intern at Cellula Technologies, I developed and optimized natural language processing models for toxic content classification and safety applications. I focused on implementing transformer models and building production-ready AI applications for digital safety and content moderation.

Featured Projects

Three comprehensive projects showcasing advanced NLP techniques and multi-modal AI systems

Toxic Content Classification

Week 1

Developed two comprehensive toxic content classification models: a deep learning baseline using Bidirectional LSTM (94% accuracy) and a transformer-based model using DistilBERT with PEFT-LoRA fine-tuning. Built a complete data pipeline with preprocessing, tokenization, and evaluation for 9 toxic content categories.
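To give a sense of why LoRA suits a small fine-tuning budget, the arithmetic below counts the trainable weights when low-rank adapters stand in for full updates of DistilBERT's attention projections. It is a rough sketch, not the project's actual configuration: the rank of 8 and the choice of query and value projections as targets are assumptions, while the layer count (6) and hidden size (768) are those of the standard DistilBERT base model.

```python
# Illustrative arithmetic only: RANK and TARGETS_PER_LAYER are assumed,
# not taken from the project.
HIDDEN_SIZE = 768      # DistilBERT base hidden dimension
NUM_LAYERS = 6         # DistilBERT base transformer layers
TARGETS_PER_LAYER = 2  # assumption: adapt the query and value projections
RANK = 8               # assumption: LoRA rank


def full_params(d_in: int, d_out: int) -> int:
    """Weights updated when a d_out x d_in matrix is fine-tuned directly."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


num_matrices = NUM_LAYERS * TARGETS_PER_LAYER
full_total = num_matrices * full_params(HIDDEN_SIZE, HIDDEN_SIZE)
lora_total = num_matrices * lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)

print(f"full fine-tuning of targets: {full_total:,} weights")
print(f"LoRA adapters (r={RANK}): {lora_total:,} weights")
print(f"ratio: {lora_total / full_total:.2%}")
```

Under these assumptions the adapters hold roughly 2% of the weights in the targeted matrices. Bias terms and the classification head are left out of the count.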

DistilBERT, PEFT-LoRA, LSTM, NLP, Data Pipeline


Dual-Stage Toxic Moderation App

Week 2

Built a production-ready Streamlit application implementing a dual-stage moderation pipeline combining LlamaGuard for hard filtering and DistilBERT+LoRA for fine-grained classification. Integrated BLIP for image captioning to enable multi-modal content moderation with real-time safety assessments.
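The control flow of a dual-stage pipeline like this can be sketched as follows. This is a minimal illustration rather than the app's code: `moderate`, `ModerationResult`, and the three toy functions are hypothetical names, the models are replaced by plain callables so the sketch runs without any downloads, and joining the image caption to the text before filtering is an assumption about how the caption is used.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ModerationResult:
    allowed: bool
    stage: str             # which stage produced the decision
    label: Optional[str]   # fine-grained category, if stage two ran
    text: str              # the text that was actually assessed


def moderate(
    text: str,
    hard_filter: Callable[[str], bool],
    classifier: Callable[[str], str],
    image: Optional[object] = None,
    captioner: Optional[Callable[[object], str]] = None,
    safe_label: str = "safe",
) -> ModerationResult:
    """Run a hard filter first, then a fine-grained classifier on what passes."""
    # Images are turned into captions so both stages work on a single string.
    if image is not None and captioner is not None:
        text = f"{text} {captioner(image)}".strip()

    # Stage 1: coarse safe/unsafe gate (stand-in for LlamaGuard).
    if not hard_filter(text):
        return ModerationResult(False, "hard_filter", None, text)

    # Stage 2: category prediction (stand-in for DistilBERT+LoRA).
    label = classifier(text)
    return ModerationResult(label == safe_label, "classifier", label, text)


# Toy stand-ins for the three models (hypothetical, for illustration only).
def toy_filter(text: str) -> bool:
    return "attack" not in text.lower()  # True means the text passes


def toy_classifier(text: str) -> str:
    return "insult" if "idiot" in text.lower() else "safe"


def toy_captioner(image: object) -> str:
    return "a dog on a beach"


print(moderate("Plan the attack", toy_filter, toy_classifier))
print(moderate("You idiot", toy_filter, toy_classifier))
print(moderate("", toy_filter, toy_classifier, image=object(), captioner=toy_captioner))
```

Running the cheap gate first means the fine-grained classifier only sees content that has already passed the hard filter, and every decision records which stage made it.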

LlamaGuard, BLIP, Streamlit, Multi-Modal, Production App


CodeGenBot - RAG Code Assistant

Week 3

Developed CodeGenBot, a retrieval-augmented code generation assistant using the HumanEval dataset. Implemented semantic search with Sentence Transformers and integrated DeepSeek-R1-Distill-Qwen-1.5B for context-aware Python code generation through a conversational Streamlit interface.
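The retrieve-then-generate step can be illustrated with a small, dependency-free sketch. It is not CodeGenBot's implementation: `embed` is a toy bag-of-words stand-in for a Sentence Transformers embedding, the corpus entries are hypothetical snippets rather than real HumanEval problems, and the prompt layout in `build_prompt` is an assumption.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy bag-of-words vector; a stand-in for a sentence embedding model."""
    return dict(Counter(re.findall(r"[a-z]+", text.lower())))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(token, 0.0) for token, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: List[str], k: int = 1) -> List[Tuple[float, str]]:
    """Return the k corpus entries most similar to the query, best first."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(doc)), doc) for doc in corpus]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt(query: str, examples: List[str]) -> str:
    """Place the retrieved examples ahead of the request for the code model."""
    context = "\n\n".join(examples)
    return f"Reference examples:\n{context}\n\nTask: {query}\nAnswer in Python."


# Hypothetical snippets standing in for indexed HumanEval problems.
corpus = [
    "def is_palindrome(s): return True if the string reads the same backwards",
    "def add(a, b): return the sum of two numbers",
    "def max_element(xs): return the largest element in the list",
]

query = "write a function that returns the sum of two numbers"
top = retrieve(query, corpus, k=1)
prompt = build_prompt(query, [doc for _, doc in top])
print(prompt)
```

In a real system the resulting prompt would be passed to the code model; swapping `embed` for a learned embedding changes the vectors but leaves the ranking and prompt-building steps unchanged.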

RAG, CodeGen, HumanEval, Vector Search, Chatbot


Key Responsibilities

Core contributions and achievements during the internship

Developed two toxic content classification models, a Bidirectional LSTM baseline (94% accuracy) and a DistilBERT model fine-tuned with PEFT-LoRA, along with a complete data pipeline covering preprocessing and evaluation
Built a production-ready dual-stage moderation system combining LlamaGuard for hard filtering and DistilBERT+LoRA for fine-grained classification, with BLIP integration for multi-modal content analysis
Implemented a retrieval-augmented code generation assistant using the HumanEval dataset, semantic search with Sentence Transformers, and DeepSeek-R1-Distill-Qwen-1.5B for context-aware Python code generation
Created modular, production-ready Streamlit applications with comprehensive error handling, real-time processing, and professional user interfaces
Conducted extensive benchmarking, model selection, and performance optimization, addressing class imbalance and ensuring real-world reliability
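As one example of how class imbalance is commonly handled, the sketch below computes inverse-frequency class weights that can be passed to a weighted loss. This is an illustration of a standard technique, not necessarily the method used in these projects; the function name and the label distribution are hypothetical. The formula matches the "balanced" heuristic in scikit-learn: n_samples / (n_classes * class_count).

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def balanced_class_weights(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


# Hypothetical label distribution, for illustration only.
labels = ["safe"] * 8 + ["toxic"] * 2
weights = balanced_class_weights(labels)
print(weights)  # {'safe': 0.625, 'toxic': 2.5}
```

Rare classes receive larger weights, so their errors count for more during training.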

Skills & Technologies

Technologies, frameworks, and methodologies mastered during the internship

Machine Learning

PyTorch, TensorFlow, PEFT-LoRA, LSTM, Transformers

NLP & Vision Models

DistilBERT, LlamaGuard, BLIP, DeepSeek-R1, Sentence Transformers

Development & Tools

Python, Streamlit, Git, Hugging Face, Vector Search

AI/ML Applications

Content Moderation, RAG Systems, Multi-Modal AI, Code Generation, Production Deployment

Certification

Official recognition of successful internship completion

NLP Engineer Internship Certificate

Successfully completed the NLP Engineer Internship at Cellula Technologies, demonstrating proficiency in transformer models, fine-tuning techniques, and building production-ready AI applications for content safety and moderation.


This internship was a transformative experience that deepened my understanding of natural language processing, multi-modal AI systems, and production-ready AI applications. Working on three distinct but interconnected projects allowed me to develop expertise across the entire AI development pipeline, from data preprocessing to production deployment.

I am particularly proud of the Dual-Stage Toxic Moderation App I developed in Week 2, which successfully integrated LlamaGuard, DistilBERT+LoRA, and BLIP to create a comprehensive multi-modal content moderation system. This project demonstrated my ability to combine multiple state-of-the-art models into a production-ready application with real-time processing capabilities.

The experience at Cellula Technologies has equipped me with the confidence and expertise to tackle complex AI challenges, from building robust data pipelines to implementing cutting-edge transformer models and deploying scalable AI applications that make a meaningful impact on digital safety and developer productivity.