When we started working with a Dakar-based fintech in early 2025, we knew that language would be a critical barrier. Their customer service interactions were overwhelmingly in Wolof – a language spoken by over 10 million people but with almost no digital language resources. This article shares how we built a sentiment analysis model from scratch, the lessons learned, and the impact on financial inclusion.
The challenge: low-resource languages
Francophone Africa is home to hundreds of languages, but most NLP models are trained on English, French, or other high-resource languages. Wolof, like many African languages, lacks large annotated corpora, pre-trained embeddings, or even basic tokenizers. Our client, a mobile money operator, wanted to automatically analyse customer feedback to detect frustration and improve service. They were receiving thousands of messages daily in Wolof, French, and a mix of both.
"We needed to understand our customers – many of whom prefer Wolof – to reduce churn and build trust. Off-the-shelf solutions failed completely." – Moussa Diop, Head of Product
Our approach: build from the ground up
We adopted a three-stage strategy:
- Corpus construction: We worked with local linguists to annotate 50,000 WhatsApp messages in Wolof, covering sentiment (positive, negative, neutral) and intents.
- Transfer learning with multilingual models: We fine-tuned a multilingual transformer (AfriBERTa) on our Wolof corpus, achieving 82% accuracy.
- Edge deployment: The model was compressed and deployed on the operator's servers, running inference in real time without any cloud dependency.
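To make the corpus-construction stage concrete, here is a minimal sketch of how annotated messages might be stored and sanity-checked before training. The field names, the JSON-lines layout, and the example messages are all illustrative assumptions, not the project's actual schema or data:

```python
import json
from collections import Counter

# Hypothetical JSON-lines records in the spirit of the annotated corpus
# (made-up examples, not real customer messages)
raw = """
{"text": "Jërëjëf, service bi baax na!", "sentiment": "positive", "intent": "praise"}
{"text": "Sama transfert dafa yàgg lool", "sentiment": "negative", "intent": "complaint"}
{"text": "Naka laa mëna seet sama balance?", "sentiment": "neutral", "intent": "question"}
""".strip()

records = [json.loads(line) for line in raw.splitlines()]

# Inspect the sentiment label distribution before training,
# to catch annotation skew early
counts = Counter(r["sentiment"] for r in records)
print(dict(counts))
```

A check like this is cheap insurance: class imbalance in the annotations propagates directly into the fine-tuned model's biases.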
Technical deep dive
We used a combination of HuggingFace's tokenizers and custom pre-processing for Wolof-specific phenomena (e.g., vowel length, contractions). The final model is a distilled version of AfriBERTa with 4 layers, running on CPUs. Below is a simplified training snippet:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the multilingual base model and its tokenizer;
# three labels: positive, negative, neutral
tokenizer = AutoTokenizer.from_pretrained("castorini/afriberta_large")
model = AutoModelForSequenceClassification.from_pretrained(
    "castorini/afriberta_large", num_labels=3
)

# custom Wolof tokenization adjustments
...
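To illustrate the kind of Wolof-specific pre-processing mentioned above, here is a hedged sketch of one normalisation rule. The function and its rules are simplified illustrations, not the production pipeline: informal messages often elongate letters for emphasis, while Wolof orthography uses doubled vowels for phonemic length, so a naive "collapse repeats" rule must trim only runs of three or more and preserve legitimate doubles:

```python
import re

def normalize_wolof(text: str) -> str:
    """Illustrative normalisation for noisy Wolof messages.

    Collapses runs of 3+ identical letters down to 2, so an emphatic
    spelling like "baaaax" becomes "baax" while phonemic long vowels
    (written as doubled letters in Wolof) are left intact.
    """
    text = text.strip().lower()
    # 3 or more repeats -> exactly 2 (keeps doubled vowels intact)
    text = re.sub(r"(\w)\1{2,}", r"\1\1", text)
    # collapse runs of whitespace
    text = re.sub(r"\s+", " ", text)
    return text

print(normalize_wolof("Baaaax  na LOOOOL"))  # -> "baax na lool"
```

Rules like this run before tokenization, so the model sees a more consistent vocabulary without losing linguistically meaningful spelling.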
Results and impact
Six months after deployment, the fintech reported:
- 34% increase in customer engagement (more users interacting via messages).
- 28% increase in the share of complaints resolved within 24 hours.
- The model now also handles code-switching (Wolof-French).
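Code-switched traffic can also be monitored upstream of the classifier with a simple heuristic. The sketch below is an illustrative stand-in, not the deployed approach; the tiny word lists are hypothetical, and a real system would use proper lexicons or a language-identification model:

```python
# Tiny illustrative function-word lists (hypothetical stand-ins
# for real French and Wolof lexicons)
FRENCH_HINTS = {"le", "la", "est", "pas", "mon", "avec", "pour"}
WOLOF_HINTS = {"dafa", "nga", "sama", "laa", "bi", "na", "lool"}

def mix_ratio(text: str) -> float:
    """Fraction of hint words that are French, among all hint words found.

    Returns 0.0 for purely Wolof hints, 1.0 for purely French hints,
    and something in between for code-switched messages.
    """
    tokens = text.lower().split()
    fr = sum(t in FRENCH_HINTS for t in tokens)
    wo = sum(t in WOLOF_HINTS for t in tokens)
    total = fr + wo
    return fr / total if total else 0.0

print(mix_ratio("sama transfert est pas bon"))  # mixed Wolof-French message
```

A ratio like this is useful for dashboards and for routing: heavily code-switched messages can be flagged for extra evaluation rather than silently degrading accuracy.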
The project opened our eyes to the immense opportunity in low-resource African languages. We've since started similar work in Hausa, Yoruba, and Swahili, and we're open-sourcing our Wolof corpus to encourage further research.
Lessons for the field
For data scientists working on African languages, here are three takeaways:
- Involve native speakers from day one – language is cultural, not just linguistic.
- Don't assume cloud – edge deployment is often necessary for latency and sovereignty.
- Start small, iterate – even a few thousand annotated messages can yield a useful model.
We believe that language should never be a barrier to accessing financial services, healthcare, or information. If you're working on similar challenges, we'd love to collaborate.