As an NLP Engineer Intern at Cellula Technologies, I developed and optimized natural language processing models for toxic content classification and safety applications. I focused on implementing state-of-the-art transformer models and building production-ready AI solutions that enhance digital safety and content moderation.
Three comprehensive projects showcasing advanced NLP techniques and multi-modal AI systems
Developed two comprehensive toxic content classification models: a deep learning baseline using Bidirectional LSTM (94% accuracy) and a transformer-based model using DistilBERT with PEFT-LoRA fine-tuning. Built a complete data pipeline with preprocessing, tokenization, and evaluation for 9 toxic content categories.
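To illustrate why LoRA keeps fine-tuning lightweight, here is a rough parameter count. It is a sketch under stated assumptions, not the project's actual configuration: the rank of 8 and the choice to adapt only the query and value projections are assumed for illustration, while the hidden size (768) and layer count (6) are those of the standard DistilBERT base model.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one frozen d_out x d_in weight matrix.

    LoRA trains two small matrices, A (rank x d_in) and B (d_out x rank),
    and leaves the original weight untouched.
    """
    return rank * (d_in + d_out)


HIDDEN = 768           # DistilBERT base hidden size
LAYERS = 6             # DistilBERT base transformer layers
TARGETS_PER_LAYER = 2  # assumption: adapt query and value projections only
RANK = 8               # assumption: illustrative LoRA rank

# Weights in the targeted projection matrices (biases ignored).
full_params = LAYERS * TARGETS_PER_LAYER * HIDDEN * HIDDEN
# Weights LoRA would train instead.
lora_params = LAYERS * TARGETS_PER_LAYER * lora_param_count(HIDDEN, HIDDEN, RANK)

print(f"targeted weights: {full_params:,}")
print(f"LoRA weights:     {lora_params:,}")
print(f"ratio:            {lora_params / full_params:.2%}")
```

Under these assumptions the adapter trains roughly 2% as many weights as the projection matrices it modifies, which is what makes this style of fine-tuning practical on modest hardware.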
Built a production-ready Streamlit application implementing a dual-stage moderation pipeline combining LlamaGuard for hard filtering and DistilBERT+LoRA for fine-grained classification. Integrated BLIP for image captioning to enable multi-modal content moderation with real-time safety assessments.
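The control flow of a dual-stage pipeline like this can be sketched as follows. This is a minimal illustration, not the application's code: `moderate`, `ModerationResult`, and the three toy functions are hypothetical names, and the toy functions merely stand in for LlamaGuard, the DistilBERT+LoRA classifier, and BLIP. Folding the image caption into the text before moderation is also an assumption about how the modalities are combined.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ModerationResult:
    allowed: bool
    stage: str             # which stage made the final decision
    label: Optional[str]   # fine-grained category, set only if stage 2 ran


def moderate(
    text: str,
    hard_filter: Callable[[str], bool],
    classifier: Callable[[str], str],
    image_captioner: Optional[Callable[[bytes], str]] = None,
    image: Optional[bytes] = None,
    safe_label: str = "safe",
) -> ModerationResult:
    # Multi-modal input (assumed design): caption the image and moderate
    # the caption together with the user's text.
    if image is not None and image_captioner is not None:
        text = f"{text} {image_captioner(image)}".strip()

    # Stage 1: hard filter. Clearly unsafe content is rejected immediately.
    if hard_filter(text):
        return ModerationResult(allowed=False, stage="hard_filter", label=None)

    # Stage 2: fine-grained classification of whatever passed stage 1.
    label = classifier(text)
    return ModerationResult(
        allowed=(label == safe_label), stage="classifier", label=label
    )


# Toy stand-ins so the sketch runs without any model downloads.
def toy_hard_filter(text: str) -> bool:
    return "attack" in text.lower()


def toy_classifier(text: str) -> str:
    return "insult" if "stupid" in text.lower() else "safe"


def toy_captioner(image: bytes) -> str:
    return "a photo of a dog"


print(moderate("you are stupid", toy_hard_filter, toy_classifier))
print(moderate("", toy_hard_filter, toy_classifier, toy_captioner, b"raw-bytes"))
```

Passing the models in as callables keeps the two stages independent, so either one could be swapped or tested on its own.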
Developed CodeGenBot, a retrieval-augmented code generation assistant using the HumanEval dataset. Implemented semantic search with Sentence Transformers and integrated DeepSeek-R1-Distill-Qwen-1.5B for context-aware Python code generation through a conversational Streamlit interface.
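The retrieval step of a retrieval-augmented assistant like this can be sketched in a few lines. This is an illustration under assumptions, not CodeGenBot's implementation: the hand-written three-dimensional vectors stand in for real Sentence Transformers embeddings, and `cosine`, `retrieve`, `build_prompt`, and the prompt wording are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(
    query_vec: Sequence[float],
    corpus: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Return the k corpus snippets whose embeddings are closest to the query."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [snippet for snippet, _ in ranked[:k]]


def build_prompt(question: str, examples: List[str]) -> str:
    """Place retrieved snippets ahead of the task so the generator can use them."""
    context = "\n\n".join(examples)
    return f"Reference examples:\n{context}\n\nTask: {question}\nPython solution:"


# Toy corpus: (snippet, made-up embedding). Real embeddings would come from
# an embedding model rather than being written by hand.
corpus = [
    ("def add(a, b): return a + b", [1.0, 0.0, 0.1]),
    ("def reverse(s): return s[::-1]", [0.0, 1.0, 0.1]),
    ("def mul(a, b): return a * b", [0.8, 0.3, 0.0]),
]
query_vec = [1.0, 0.0, 0.0]  # made-up embedding of an arithmetic question

examples = retrieve(query_vec, corpus, k=2)
print(build_prompt("Write a function that subtracts two numbers.", examples))
```

The resulting prompt would then be passed to the generation model; that call is omitted here because it depends on how the model is hosted.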
Core contributions and achievements during the internship
Technologies, frameworks, and methodologies mastered during the internship
Official recognition of successful internship completion
Successfully completed the NLP Engineer Internship at Cellula Technologies, demonstrating proficiency in transformer models, fine-tuning techniques, and building production-ready AI applications for content safety and moderation.
This internship was a transformative experience that deepened my understanding of natural language processing, multi-modal AI systems, and production-ready AI applications. Working on three distinct but interconnected projects allowed me to develop expertise across the entire AI development pipeline, from data preprocessing to production deployment.
I am particularly proud of the Dual-Stage Toxic Moderation App I developed in Week 2, which successfully integrated LlamaGuard, DistilBERT+LoRA, and BLIP to create a comprehensive multi-modal content moderation system. This project demonstrated my ability to combine multiple state-of-the-art models into a production-ready application with real-time processing capabilities.
The experience at Cellula Technologies has equipped me with the confidence and expertise to tackle complex AI challenges, from building robust data pipelines to implementing cutting-edge transformer models and deploying scalable AI applications that make a meaningful impact on digital safety and developer productivity.