NLP Research Engineer · Speech

Abdou Aziz Diop

PhD Researcher · Efficient NLP & Speech · Dakar, Senegal

I build speech and language technology for low-resource African languages — with a focus on automatic speech recognition, neural machine translation, and self-supervised learning for speech.

I am currently a PhD researcher at Université Iba Der Thiam de Thiès (UIDT), working on end-to-end Speech-to-Speech Translation for Wolof, Pulaar, and other West African languages. I am also Lead Data Scientist at LAfricaMobile, where I lead a team building NLU/NLG and end-to-end TTS systems for local languages.

Open to research collaborations and roles building production speech systems for under-served languages.

01 — News

Recent updates

2026
Working on a unified end-to-end S2ST architecture for Wolof–French and Pulaar–French language pairs.
2025
Released an open-source fine-tuned Whisper checkpoint for Wolof, among the first publicly available Wolof ASR models.
2024
Co-author on AfriQA, a cross-lingual open-retrieval QA dataset covering 10 African languages, published in Findings of EMNLP 2023.
2023
Started a PhD in Speech-to-Speech Translation at UCAD, Dakar.
2022
Released the open-source Wolof library on PyTorch + Transformers — text classification, NMT, and ASR for Wolof.
2021
YACINE, our Wolof voice assistant, won 1st prize at the Cheikh Anta Diop Day organized by IFAN.
02 — Experience

Where I've worked

2023 — present

PhD Researcher · Speech-to-Speech Translation

Université Cheikh Anta Diop · Dakar, Senegal

  • End-to-end and cascaded S2ST systems for Wolof, Pulaar, and other West African languages.
  • Self-supervised learning for ASR — pre-training and fine-tuning Wav2Vec2 / Whisper for low-resource settings.
  • Teaching Assistant for Machine Learning, Deep Learning, and NLP courses.
2022 — present

Lead Data Scientist

LAfricaMobile · Dakar, Senegal

  • Lead the data science team on NLU/NLG and end-to-end speech systems for local languages.
  • Designed and shipped an end-to-end TTS system for a Senegalese language — dataset, model, deployment.
  • Production ML infrastructure: model serving, monitoring, and continuous evaluation.
2021 — 2022

NLP Research Engineer · Wolof Project Co-Lead

Omdena · Remote

  • Co-led an Omdena chapter project on text classification and NLP for Wolof.
  • Built and open-sourced the Wolof library for ASR, NMT, and classification.
2020 — 2022

Data Scientist · Software Engineer

Various roles · Senegal

  • Compliance modelling for banking clients (project Ganeyi Compliance).
  • Worked as a software engineer on the mail / "Affranchissement" (postage) module.
03 — Research

Selected publications

Full publication list on Google Scholar
04 — Projects

Open-source work

Wolof ★ 31

Python · PyTorch · 🤗 Transformers

An NLP toolkit for the Wolof language: text classification, neural machine translation, and automatic speech recognition. Designed to be simple, easy to use, and a starting point for low-resource African NLP.

NLP · ASR · NMT

Translatotron

Jupyter · TensorFlow / PyTorch

An implementation study of direct speech-to-speech translation with a sequence-to-sequence model — re-creating the Translatotron architecture as a foundation for African-language S2ST experiments.

S2ST · seq2seq

NAC-ASR

Python · Research prototype

A study of Neural Audio Codecs as discrete representations for ASR — investigating whether quantized codec tokens (à la EnCodec / SoundStream) can serve as efficient inputs for downstream speech recognition.

ASR · audio codecs · tokenization
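
As a rough illustration of what "quantized codec tokens" means here, the sketch below runs a toy residual vector quantizer, the general scheme used by codecs such as EnCodec and SoundStream, over one feature frame. It is not the NAC-ASR code: the codebooks are random rather than learned, and every name in it (`make_codebooks`, `rvq_encode`, `rvq_decode`) is hypothetical.

```python
import random


def make_codebooks(num_stages, size, dim, seed=0):
    """Build random stand-in codebooks (a real codec learns these).

    Later stages use smaller entries, since they only code what is left over.
    """
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0 / (2 ** stage)) for _ in range(dim)] for _ in range(size)]
        for stage in range(num_stages)
    ]


def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(frame, codebooks):
    """Turn one frame into one token per stage; each stage quantizes the residual."""
    residual = list(frame)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def rvq_decode(tokens, codebooks):
    """Approximate the frame by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


codebooks = make_codebooks(num_stages=4, size=16, dim=8)
frame = [0.1 * i for i in range(8)]
tokens = rvq_encode(frame, codebooks)   # four integers, each in [0, 16)
approx = rvq_decode(tokens, codebooks)  # 8-dimensional reconstruction
```

In an ASR setting, token sequences like `tokens` (one small tuple of integers per audio frame) would be embedded and fed to the recognizer in place of spectrogram or waveform features.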
More on GitHub
05 — Stack

Technical strengths

Speech

Whisper · Wav2Vec2 · HuBERT · ASR · TTS · S2ST · Self-supervised pre-training · Neural audio codecs

NLP

Transformers · NMT · Tokenization · Language modelling · Multilingual evaluation · Low-resource methods

Engineering & MLOps

Python · PyTorch · 🤗 Transformers / Datasets · FastAPI · Docker · MLflow · W&B · AWS

Foundations

Mathematics · Coding theory · Statistical learning · Information systems · Software engineering

Languages spoken

Wolof · French · English · Arabic (intermediate)

Teaching

Machine Learning · Deep Learning · NLP (at DIT)