Building Sentiment Analysis Models with TensorFlow and Flask: My Internship Experience
During my internship at Norsys Afrique, I had the incredible opportunity to work on a sentiment analysis project that would become one of my most valuable learning experiences. This project not only solidified my passion for practical AI applications but also taught me the importance of bridging the gap between theoretical knowledge and real-world implementation.
The Challenge: Understanding Customer Sentiment at Scale
Norsys Afrique needed a system to analyze customer feedback from multiple channels:
- Social media mentions
- Customer support tickets
- Product reviews
- Survey responses
The goal was to automatically categorize sentiment and provide actionable insights to improve customer experience and business decisions.
Technical Approach
1. Data Collection and Preprocessing
The first challenge was gathering and cleaning diverse text data:
    import re

    import nltk
    import pandas as pd
    from nltk.corpus import stopwords
    from textblob import TextBlob

    # Requires a one-time download: nltk.download('stopwords')

    class DataPreprocessor:
        def __init__(self):
            # stopwords.words() does not accept several languages as separate
            # arguments, so merge the lists one language at a time
            self.stop_words = set()
            for language in ('french', 'arabic', 'english'):
                self.stop_words.update(stopwords.words(language))

        def clean_text(self, text):
            # Remove special characters and normalize case
            text = re.sub(r'[^\w\s]', '', text)
            text = text.lower()
            # Remove stopwords
            words = text.split()
            words = [word for word in words if word not in self.stop_words]
            return ' '.join(words)

        def preprocess_dataset(self, df):
            # Expects a pandas DataFrame with a 'text' column
            df['cleaned_text'] = df['text'].apply(self.clean_text)
            return df
2. Model Architecture with TensorFlow
I designed a hybrid approach combining traditional NLP techniques with deep learning:
    import tensorflow as tf
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout
    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    class SentimentAnalysisModel:
        def __init__(self, vocab_size=10000, max_length=100):
            self.vocab_size = vocab_size
            self.max_length = max_length
            self.tokenizer = Tokenizer(num_words=vocab_size)

        def build_model(self):
            model = Sequential([
                Embedding(self.vocab_size, 128, input_length=self.max_length),
                LSTM(64, return_sequences=True),
                Dropout(0.3),
                LSTM(32),
                Dropout(0.3),
                Dense(16, activation='relu'),
                Dense(3, activation='softmax')  # Positive, Negative, Neutral
            ])
            model.compile(
                optimizer='adam',
                loss='categorical_crossentropy',
                metrics=['accuracy']
            )
            return model

        def train_model(self, X_train, y_train, X_val, y_val):
            # y_train and y_val must be one-hot encoded (3 columns) to match
            # the categorical_crossentropy loss
            model = self.build_model()
            # Build the vocabulary from the training texts only
            self.tokenizer.fit_on_texts(X_train)
            # Tokenize and pad sequences
            X_train_seq = self.tokenizer.texts_to_sequences(X_train)
            X_train_padded = pad_sequences(X_train_seq, maxlen=self.max_length)
            X_val_seq = self.tokenizer.texts_to_sequences(X_val)
            X_val_padded = pad_sequences(X_val_seq, maxlen=self.max_length)
            # Train the model
            history = model.fit(
                X_train_padded, y_train,
                validation_data=(X_val_padded, y_val),
                epochs=50,
                batch_size=32,
                verbose=1
            )
            return model, history
3. Flask API Development
To make the model accessible, I created a RESTful API:
    import pickle

    import numpy as np
    import tensorflow as tf
    from flask import Flask, request, jsonify
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    # Assumes the DataPreprocessor class shown earlier lives in src/preprocessing.py
    from preprocessing import DataPreprocessor

    app = Flask(__name__)

    # Load the trained model, tokenizer, and text preprocessor once at startup
    model = tf.keras.models.load_model('sentiment_model.h5')
    with open('tokenizer.pkl', 'rb') as f:
        tokenizer = pickle.load(f)
    preprocessor = DataPreprocessor()

    @app.route('/predict', methods=['POST'])
    def predict_sentiment():
        try:
            data = request.get_json()
            text = data['text']
            # Preprocess the text
            cleaned_text = preprocessor.clean_text(text)
            # Tokenize and pad (maxlen must match the max_length used in training)
            sequence = tokenizer.texts_to_sequences([cleaned_text])
            padded_sequence = pad_sequences(sequence, maxlen=100)
            # Make prediction
            prediction = model.predict(padded_sequence)
            sentiment = int(np.argmax(prediction[0]))
            # Map to sentiment labels (order must match the label encoding used in training)
            sentiment_labels = {0: 'Negative', 1: 'Neutral', 2: 'Positive'}
            confidence = float(np.max(prediction[0]))
            return jsonify({
                'sentiment': sentiment_labels[sentiment],
                'confidence': confidence,
                'text': text
            })
        except Exception as e:
            return jsonify({'error': str(e)}), 500

    if __name__ == '__main__':
        # debug=True is for local development only
        app.run(debug=True)
Model Performance and Results
Training Results:
- Accuracy: 87.3% on validation set
- Precision: 0.89 (Positive), 0.85 (Negative), 0.88 (Neutral)
- Recall: 0.87 (Positive), 0.89 (Negative), 0.86 (Neutral)
- F1-Score: 0.88 (Positive), 0.87 (Negative), 0.87 (Neutral)
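As a quick sanity check on the figures above, F1 is the harmonic mean of precision and recall, so each reported F1 can be recomputed from the other two rows. The sketch below does that with the numbers from this post; the helper name `f1_score` is only for illustration and is not taken from the project code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-class (precision, recall) as reported above
reported = {
    "Positive": (0.89, 0.87),
    "Negative": (0.85, 0.89),
    "Neutral": (0.88, 0.86),
}

recomputed = {label: round(f1_score(p, r), 2) for label, (p, r) in reported.items()}
print(recomputed)  # {'Positive': 0.88, 'Negative': 0.87, 'Neutral': 0.87}
```

The recomputed values match the reported F1 scores, which suggests the three rows are internally consistent.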
Business Impact:
- Processing Speed: 1000x faster than manual analysis
- Consistency: 95% agreement with human annotators
- Scalability: Can process 10,000+ texts per hour
- Cost Reduction: 80% reduction in manual sentiment analysis costs
Challenges and Solutions
1. Multilingual Support
Challenge: Handling French, Arabic, and English text
Solution: Implemented language detection and separate preprocessing pipelines
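To make the idea of "detect the language, then route to a matching pipeline" concrete, here is a minimal sketch. It is not the project's implementation: the detector is a toy heuristic standing in for a real language-detection library, and every name in it (`detect_language`, `clean_latin`, `clean_arabic`, `PIPELINES`, the hint word sets) is hypothetical.

```python
import re

# Tiny illustrative word samples; a real pipeline would rely on a proper detector
FRENCH_HINTS = {"le", "la", "les", "et", "est", "très", "je", "un", "une", "pas"}
ENGLISH_HINTS = {"the", "and", "is", "very", "i", "a", "an", "not", "this", "was"}
ARABIC_CHARS = re.compile(r"[\u0600-\u06FF]")


def detect_language(text: str) -> str:
    """Toy detector: Arabic by script, French vs. English by hint-word hits."""
    if ARABIC_CHARS.search(text):
        return "arabic"
    words = set(re.findall(r"\w+", text.lower()))
    french_hits = len(words & FRENCH_HINTS)
    english_hits = len(words & ENGLISH_HINTS)
    return "french" if french_hits > english_hits else "english"


def clean_latin(text: str) -> str:
    """Strip punctuation, lowercase, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", text).lower().split())


def clean_arabic(text: str) -> str:
    """Strip diacritics and punctuation; Arabic has no letter case."""
    text = re.sub(r"[\u064B-\u0652]", "", text)
    return " ".join(re.sub(r"[^\w\s]", "", text).split())


# One cleaning function per language; French and English share the Latin one here
PIPELINES = {"french": clean_latin, "english": clean_latin, "arabic": clean_arabic}


def preprocess(text: str) -> tuple:
    """Return (language, cleaned_text) using the pipeline for that language."""
    language = detect_language(text)
    return language, PIPELINES[language](text)


print(preprocess("Le service est très bon !"))
print(preprocess("The delivery was late."))
```

Keeping the pipelines in a dictionary means a new language only needs a detector label and one cleaning function, without touching the routing code.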
2. Imbalanced Dataset
Challenge: More neutral samples than positive/negative
Solution: Used SMOTE for oversampling and class weights in training
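The class-weight half of that solution can be sketched in a few lines. This example uses the common "balanced" heuristic, n_samples / (n_classes * class_count), which may differ from the exact weights used in the project. The label distribution and the 0/1/2 encoding are made up for illustration, and SMOTE itself is not shown.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution: 0 = negative, 1 = neutral, 2 = positive
labels = [0] * 20 + [1] * 60 + [2] * 20
weights = balanced_class_weights(labels)
print(weights)
```

With this distribution the over-represented neutral class gets a weight below 1 and the two minority classes get equal weights above 1. A dictionary like this can be passed to Keras through the `class_weight` argument of `model.fit`.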
3. Real-time Processing
Challenge: API response time requirements
Solution: Model optimization and caching strategies
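    # Caching implementation
    from functools import lru_cache

    @lru_cache(maxsize=1000)
    def cached_prediction(cleaned_text):
        # lru_cache keys on the argument itself, so no separate hash is needed
        sequence = tokenizer.texts_to_sequences([cleaned_text])
        padded_sequence = pad_sequences(sequence, maxlen=100)
        # Store an immutable tuple of class probabilities for this text
        return tuple(model.predict(padded_sequence)[0].tolist())
Key Learnings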
Technical Skills Gained:
- TensorFlow/Keras: Deep learning model development
- NLP Techniques: Text preprocessing, tokenization, embedding
- API Development: Flask, RESTful services, error handling
- Model Deployment: Production considerations, performance optimization
Soft Skills Developed:
- Problem Solving: Breaking down complex requirements
- Communication: Explaining technical concepts to non-technical stakeholders
- Project Management: Managing timelines and deliverables
- Documentation: Creating comprehensive technical documentation
Future Improvements
- Advanced Architectures: Experimenting with BERT and transformer models
- Real-time Streaming: Implementing Kafka for real-time sentiment analysis
- Multi-modal Analysis: Incorporating images and videos
- A/B Testing: Framework for model performance comparison
Code Repository Structure
sentiment-analysis/
├── data/
│ ├── raw/
│ └── processed/
├── models/
│ ├── sentiment_model.h5
│ └── tokenizer.pkl
├── src/
│ ├── preprocessing.py
│ ├── model_training.py
│ └── api.py
├── tests/
└── requirements.txt
Conclusion
This project was a turning point in my understanding of how AI can solve real business problems. The combination of theoretical knowledge with practical implementation taught me that successful AI projects require:
- Clear Problem Definition: Understanding the business need
- Quality Data: The foundation of any ML project
- Iterative Development: Continuous improvement and testing
- User-Centric Design: Building solutions that people actually want to use
The experience at Norsys Afrique reinforced my passion for building practical AI applications that make a real difference in business operations.
Want to discuss sentiment analysis or AI implementation? Connect with me on LinkedIn or reach out via email.