Building Sentiment Analysis Models with TensorFlow and Flask: My Internship Experience

December 18, 2024

During my internship at Norsys Afrique, I had the incredible opportunity to work on a sentiment analysis project that would become one of my most valuable learning experiences. This project not only solidified my passion for practical AI applications but also taught me the importance of bridging the gap between theoretical knowledge and real-world implementation.

The Challenge: Understanding Customer Sentiment at Scale

Norsys Afrique needed a system to analyze customer feedback arriving from multiple channels. The goal was to categorize sentiment automatically and provide actionable insights to improve customer experience and inform business decisions.

Technical Approach

1. Data Collection and Preprocessing

The first challenge was gathering and cleaning diverse text data:

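import re

import nltk
import pandas as pd
from nltk.corpus import stopwords

# The stopword lists must be downloaded once before first use
nltk.download('stopwords', quiet=True)
 
class DataPreprocessor:
    def __init__(self):
        # stopwords.words() takes a list of languages, not separate positional arguments
        self.stop_words = set(stopwords.words(['french', 'arabic', 'english']))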
        
    def clean_text(self, text):
        # Remove special characters and normalize
        text = re.sub(r'[^\w\s]', '', text)
        text = text.lower()
        
        # Remove stopwords
        words = text.split()
        words = [word for word in words if word not in self.stop_words]
        
        return ' '.join(words)
    
    def preprocess_dataset(self, df):
        df['cleaned_text'] = df['text'].apply(self.clean_text)
        return df

2. Model Architecture with TensorFlow

I designed a hybrid approach combining traditional NLP techniques with deep learning:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
 
class SentimentAnalysisModel:
    def __init__(self, vocab_size=10000, max_length=100):
        self.vocab_size = vocab_size
        self.max_length = max_length
        self.tokenizer = Tokenizer(num_words=vocab_size)
        
    def build_model(self):
        model = Sequential([
            Embedding(self.vocab_size, 128, input_length=self.max_length),
            LSTM(64, return_sequences=True),
            Dropout(0.3),
            LSTM(32),
            Dropout(0.3),
            Dense(16, activation='relu'),
            Dense(3, activation='softmax')  # Positive, Negative, Neutral
        ])
        
        model.compile(
            optimizer='adam',
            loss='categorical_crossentropy',  # expects one-hot encoded labels
            metrics=['accuracy']
        )
        
        return model
    
    def train_model(self, X_train, y_train, X_val, y_val):
        model = self.build_model()
        
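        # Build the vocabulary from the training texts only; without this
        # step texts_to_sequences() returns empty sequences
        self.tokenizer.fit_on_texts(X_train)

        # Tokenize and pad sequences
        X_train_seq = self.tokenizer.texts_to_sequences(X_train)
        X_train_padded = pad_sequences(X_train_seq, maxlen=self.max_length)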
        
        X_val_seq = self.tokenizer.texts_to_sequences(X_val)
        X_val_padded = pad_sequences(X_val_seq, maxlen=self.max_length)
        
        # Train the model
        history = model.fit(
            X_train_padded, y_train,
            validation_data=(X_val_padded, y_val),
            epochs=50,
            batch_size=32,
            verbose=1
        )
        
        return model, history
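
To make the tokenize-and-pad step more concrete, here is a small dependency-free sketch of what I understand the Keras Tokenizer and pad_sequences to do with their default settings: words are indexed by frequency starting at 1, index 0 is reserved for padding, unknown words are dropped, and sequences are padded and truncated at the front. The helper names (build_vocab, to_sequences, pad_pre) and the sample texts are my own illustration, not part of the project code, and the sketch skips the punctuation filtering the real Tokenizer performs.

```python
from collections import Counter

def build_vocab(texts, num_words):
    """Map words to integer ids by descending frequency (illustrative helper)."""
    counts = Counter(word for text in texts for word in text.split())
    vocab = {}
    for index, (word, _) in enumerate(counts.most_common(), start=1):
        if index >= num_words:  # only ids below num_words are kept
            break
        vocab[word] = index     # id 0 stays free for padding
    return vocab

def to_sequences(texts, vocab):
    """Replace each known word by its id; unknown words are dropped."""
    return [[vocab[w] for w in text.split() if w in vocab] for text in texts]

def pad_pre(sequences, maxlen):
    """Left-pad with zeros and keep only the last maxlen ids of long sequences."""
    padded = []
    for seq in sequences:
        seq = seq[-maxlen:]
        padded.append([0] * (maxlen - len(seq)) + seq)
    return padded

train_texts = ["service rapide", "service lent", "livraison rapide service"]
vocab = build_vocab(train_texts, num_words=10)

new_texts = ["service rapide inconnu", "livraison rapide service lent"]
padded = pad_pre(to_sequences(new_texts, vocab), maxlen=3)
print(padded)  # [[0, 1, 2], [2, 1, 3]]
```

The word "inconnu" never appeared in the training texts, so it disappears from the first sequence. This is why the vocabulary has to be fitted on the training data before any text is converted.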

3. Flask API Development

To make the model accessible, I created a RESTful API:

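from flask import Flask, request, jsonify
import pickle
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences

from preprocessing import DataPreprocessor  # src/preprocessing.py
 
app = Flask(__name__)
 
# Load the trained model and the tokenizer fitted during training
model = tf.keras.models.load_model('sentiment_model.h5')
with open('tokenizer.pkl', 'rb') as f:
    tokenizer = pickle.load(f)

# Reuse the training-time cleaning so inference sees the same text format
preprocess_text = DataPreprocessor().clean_text
 
@app.route('/predict', methods=['POST'])
def predict_sentiment():
    data = request.get_json(silent=True)
    if not isinstance(data, dict) or not isinstance(data.get('text'), str):
        # Malformed input is a client error, not a server error
        return jsonify({'error': "Expected a JSON body with a string 'text' field"}), 400

    try:
        text = data['text']
        
        # Preprocess the text
        cleaned_text = preprocess_text(text)
        
        # Tokenize and pad (maxlen must match the value used in training)
        sequence = tokenizer.texts_to_sequences([cleaned_text])
        padded_sequence = pad_sequences(sequence, maxlen=100)
        
        # Make prediction; cast the NumPy integer to a plain int
        prediction = model.predict(padded_sequence)
        sentiment = int(np.argmax(prediction[0]))
        
        # Map to sentiment labels
        sentiment_labels = {0: 'Negative', 1: 'Neutral', 2: 'Positive'}
        confidence = float(np.max(prediction[0]))
        
        return jsonify({
            'sentiment': sentiment_labels[sentiment],
            'confidence': confidence,
            'text': text
        })
        
    except Exception as e:
        return jsonify({'error': str(e)}), 500
 
if __name__ == '__main__':
    # debug=True is for local development only
    app.run(debug=True)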
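
For completeness, here is a minimal sketch of how a client could call the /predict endpoint using only the standard library. The base URL assumes the Flask development server on its default port 5000, and the function names build_predict_request and predict are hypothetical helpers of mine, not part of the project.

```python
import json
from urllib import request as urlrequest

def build_predict_request(text, base_url="http://localhost:5000"):
    """Build the POST request the /predict endpoint expects (assumed local URL)."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urlrequest.Request(
        f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(text, base_url="http://localhost:5000"):
    """Send the request and decode the JSON reply; requires the API to be running."""
    req = build_predict_request(text, base_url)
    with urlrequest.urlopen(req, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))

req = build_predict_request("Très bon service")
print(req.get_method(), req.full_url)
```

With the API running, predict("Très bon service") would return a dictionary with the sentiment, confidence, and text keys produced by the endpoint.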

Model Performance and Results

Training Results:

Business Impact:

Challenges and Solutions

1. Multilingual Support

Challenge: Handling French, Arabic, and English text.

Solution: Implemented language detection and separate preprocessing pipelines.
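
The routing idea can be sketched roughly as follows. This is a deliberately simplified stand-in, not the detector used in the project: it spots Arabic by Unicode range and separates French from English with a handful of hint words, where a real system would rely on a proper language-identification library. All names here (detect_language, clean_latin, clean_arabic, PIPELINES) and the hint-word sets are assumptions made for illustration.

```python
import re

ARABIC_CHARS = re.compile(r"[\u0600-\u06FF]")
# Tiny hint-word sets, purely illustrative
FRENCH_HINTS = {"le", "la", "les", "des", "est", "et", "une", "très", "pas", "je"}
ENGLISH_HINTS = {"the", "is", "and", "was", "very", "not", "this", "it", "of"}

def detect_language(text):
    """Return 'ar', 'fr', or 'en' using a crude heuristic."""
    if ARABIC_CHARS.search(text):
        return "ar"
    words = set(re.findall(r"\w+", text.lower()))
    french = len(words & FRENCH_HINTS)
    english = len(words & ENGLISH_HINTS)
    # Ties (including no hints at all) fall back to English
    return "fr" if french > english else "en"

def clean_latin(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())

def clean_arabic(text):
    """Strip diacritics (U+064B to U+0652) and tatweel (U+0640), then punctuation."""
    text = re.sub(r"[\u064B-\u0652\u0640]", "", text)
    return " ".join(re.sub(r"[^\w\s]", "", text).split())

# One cleaning function per language; French and English share one here
PIPELINES = {"fr": clean_latin, "en": clean_latin, "ar": clean_arabic}

def preprocess(text):
    lang = detect_language(text)
    return lang, PIPELINES[lang](text)

print(preprocess("Le service est très bon !"))
print(preprocess("The delivery was very slow."))
```

Keeping the pipelines in a dictionary keyed by language code means a new language only needs a new entry, without touching the routing logic.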

2. Imbalanced Dataset

Challenge: More neutral samples than positive/negative.

Solution: Used SMOTE for oversampling and class weights in training.
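
As an illustration of the class-weight half of that solution, the sketch below computes "balanced" weights with the common formula n_samples / (n_classes * class_count), so rarer classes count more in the loss. The function name balanced_class_weights and the toy label distribution are made up for the example; the label ids follow the mapping used in the API (0 Negative, 1 Neutral, 2 Positive).

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Toy distribution: neutral (1) dominates, as in the challenge described above
labels = [1] * 6 + [0] * 2 + [2] * 2
weights = balanced_class_weights(labels)

for cls in sorted(weights):
    print(cls, round(weights[cls], 3))
```

A dictionary of this shape can be passed to Keras through the class_weight argument of model.fit, which scales each sample's contribution to the loss by the weight of its class.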

3. Real-time Processing

Challenge: API response time requirements.

Solution: Model optimization and caching strategies.

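# Caching implementation (lives next to the model and tokenizer in api.py)
from functools import lru_cache
 
@lru_cache(maxsize=1000)
def cached_prediction(text):
    # lru_cache keys on the text itself, so no separate hash argument is needed
    sequence = tokenizer.texts_to_sequences([preprocess_text(text)])
    padded_sequence = pad_sequences(sequence, maxlen=100)
    # Return an immutable tuple so cached results cannot be modified by callers
    return tuple(float(p) for p in model.predict(padded_sequence)[0])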
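
To see what this kind of caching buys without loading the model, here is a small self-contained sketch in which a stub function stands in for the expensive prediction call. The stub slow_predict and its fixed probabilities are invented for the demonstration and do not reflect real model output.

```python
from functools import lru_cache

call_count = 0

def slow_predict(text):
    """Stand-in for model inference; counts how often it really runs."""
    global call_count
    call_count += 1
    return (0.1, 0.2, 0.7)  # made-up class probabilities

@lru_cache(maxsize=1000)
def cached_predict(text):
    return slow_predict(text)

cached_predict("great service")
cached_predict("great service")   # served from the cache
cached_predict("slow delivery")

info = cached_predict.cache_info()
print(call_count, info.hits, info.misses)  # 2 1 2
```

Three requests trigger only two real predictions, because the repeated text is answered from the cache. The benefit therefore depends on how often identical texts recur in the incoming feedback.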

Key Learnings

Technical Skills Gained:

Soft Skills Developed:

Future Improvements

  1. Advanced Architectures: Experimenting with BERT and transformer models
  2. Real-time Streaming: Implementing Kafka for real-time sentiment analysis
  3. Multi-modal Analysis: Incorporating images and videos
  4. A/B Testing: Framework for model performance comparison

Code Repository Structure

sentiment-analysis/
├── data/
│   ├── raw/
│   └── processed/
├── models/
│   ├── sentiment_model.h5
│   └── tokenizer.pkl
├── src/
│   ├── preprocessing.py
│   ├── model_training.py
│   └── api.py
├── tests/
└── requirements.txt

Conclusion

This project was a turning point in my understanding of how AI can solve real business problems. The combination of theoretical knowledge with practical implementation taught me that successful AI projects require:

  1. Clear Problem Definition: Understanding the business need
  2. Quality Data: The foundation of any ML project
  3. Iterative Development: Continuous improvement and testing
  4. User-Centric Design: Building solutions that people actually want to use

The experience at Norsys Afrique reinforced my passion for building practical AI applications that make a real difference in business operations.


Want to discuss sentiment analysis or AI implementation? Connect with me on LinkedIn or reach out via email.