Daffodil Software

Building an Automatic Speech Recognition engine for AI-enabled legal tech software

Description

The client is a Nigerian legal technology solutions company that spearheads the development of innovative software products for seamless judgment delivery. The firm also provides a diverse catalog of digital solutions that help thousands of law students, legal practitioners, lecturers, and judges with their research.

Situation

Transcripts of court proceedings are essential for helping legal practitioners and judges deliver accurate trial outcomes. Manual transcription is painstaking work that demands close attention to every spoken detail of a deposition, and this difficulty often leads to human errors in the final transcript.

Additionally, accents vary considerably from region to region within Nigeria, and human transcribers are unlikely to be familiar with all of them. As a result, they sometimes miss entire sentences while listening and transcribing.

The legal technology company had the innovative idea of enabling error-free transcription in Nigerian courts through audio transcription software. It chose to leverage Daffodil's expertise in AI, particularly in training machines to process large volumes of natural language data accurately.

Challenge

The Daffodil team had to ensure that the audio transcription software it developed would meet the following requirements:

  • Transcribe streaming audio and video
  • Automate data preparation and augmentation so that data reaches the software in a format and size it can consume
  • Train the software to understand variations in accents and intonations in spoken English across regions in Nigeria
  • Eliminate noise-related issues that could introduce errors into the transcripts
  • Regularly maintain and retrain the software on new datasets collected from proceedings and judgments
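Transcribing streaming audio usually means feeding the recognizer fixed-size, overlapping windows of samples as they arrive. The sketch below is a minimal, hypothetical illustration of that buffering step, not Daffodil's implementation; the function name `stream_windows` and the window and hop sizes are assumptions.

```python
from typing import Iterable, Iterator, List


def stream_windows(samples: Iterable[float], window: int, hop: int) -> Iterator[List[float]]:
    """Yield windows of `window` samples from a stream, advancing by `hop` samples.

    Hypothetical helper: consecutive windows overlap by (window - hop) samples,
    and any trailing audio that never filled a window is flushed at the end.
    """
    if not 0 < hop <= window:
        raise ValueError("hop must satisfy 0 < hop <= window")
    buffer: List[float] = []
    new_since_emit = 0  # samples received since the last window was yielded
    for sample in samples:
        buffer.append(sample)
        new_since_emit += 1
        if len(buffer) == window:
            yield list(buffer)
            del buffer[:hop]  # keep the last (window - hop) samples as overlap
            new_since_emit = 0
    if new_since_emit:
        yield list(buffer)  # flush the final partial window
```

With 16 kHz audio, for example, `window=16_000 * 10` and `hop=16_000 * 8` would give 10-second windows overlapping by 2 seconds. The overlap gives the recognizer context at window boundaries at the cost of re-processing some audio; these values are illustrative only.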

Solutions

The Daffodil AI team developed an Automatic Speech Recognition (ASR) engine using a transfer-learning-based Machine Learning (ML) model to transcribe court depositions. The process comprised the following stages:

Automating Data Preparation: The ASR model required thousands of hours of audio from depositions, paired with previously transcribed texts. All audio was standardized to a mono-channel format with a 16 kHz sampling rate, giving the model consistent input and improving accuracy. Our team automated this data preparation so that large volumes of recordings covering various local Nigerian accents and vocabulary nuances could be fed into training.
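The standardization step can be sketched in pure Python as below. This is an illustrative assumption, not the team's actual pipeline: it averages interleaved channels into mono and resamples to 16 kHz by linear interpolation. Linear interpolation applies no anti-aliasing filter, so a production pipeline would more likely use a band-limited resampler such as the one in SoX; the function names are hypothetical.

```python
from typing import List, Sequence

TARGET_RATE = 16_000  # 16 kHz, the sampling rate stated in the case study


def to_mono(interleaved: Sequence[float], channels: int) -> List[float]:
    """Average interleaved multi-channel samples (L, R, L, R, ...) into one channel."""
    if channels < 1 or len(interleaved) % channels:
        raise ValueError("sample count must be a multiple of the channel count")
    return [
        sum(interleaved[i:i + channels]) / channels
        for i in range(0, len(interleaved), channels)
    ]


def resample_linear(samples: Sequence[float], src_rate: int,
                    dst_rate: int = TARGET_RATE) -> List[float]:
    """Resample by linear interpolation between neighbouring samples.

    Simplified sketch: no anti-aliasing filter is applied before downsampling.
    """
    if not samples:
        return []
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # input samples advanced per output sample
    out: List[float] = []
    for k in range(n_out):
        pos = k * step
        i = int(pos)
        frac = pos - i
        if i + 1 < len(samples):
            out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        else:
            out.append(samples[-1])  # hold the last sample past the end
    return out
```

For example, `resample_linear(to_mono(stereo, channels=2), src_rate=44_100)` would turn a 44.1 kHz stereo recording into 16 kHz mono samples.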

Pre-Training and Data Augmentation: After data integration, optional modules for data augmentation and pre-training were added. The model was built with transfer learning: a pre-trained model was fine-tuned for accuracy on WAV audio files featuring diverse speakers pronouncing the same words. The training data comprised over 100,000 words and linguistic samples; audio was converted to the required format, and model performance was monitored continuously.
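The case study does not say which augmentation modules were used. One common option, shown here purely as an assumption, is mixing noise into clean recordings at a chosen signal-to-noise ratio (SNR), so the model learns to cope with noisy courtrooms. The sketch adds Gaussian white noise scaled so that 10 * log10(P_signal / P_noise) equals the target SNR in dB, where P_signal and P_noise are mean sample powers; `add_noise_at_snr` is a hypothetical helper.

```python
import math
import random
from typing import List, Sequence


def add_noise_at_snr(clean: Sequence[float], snr_db: float, seed: int = 0) -> List[float]:
    """Return `clean` plus Gaussian white noise at exactly `snr_db` decibels SNR.

    Hypothetical augmentation helper; real pipelines often mix in recorded
    background noise instead of synthetic white noise.
    """
    if not clean:
        raise ValueError("clean signal must not be empty")
    rng = random.Random(seed)  # seeded so augmented data is reproducible
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    # Noise power that satisfies snr_db = 10 * log10(signal_power / target_noise_power).
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(clean, noise)]
```

Generating several copies of each recording at different `snr_db` values (say, 5 to 20 dB) and seeds is one way such a module could enlarge a training set without new transcription work.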

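Improved Readability: The raw text produced by the ASR model can have readability issues because it is in spoken form (for example, "twenty five" rather than "25"). To address this, an Inverse Text Normalization (ITN) mechanism was implemented in post-processing to convert spoken-form output into written form and enhance the readability of transcribed depositions.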
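A toy version of ITN for English cardinal numbers is sketched below, assuming lowercase, unpunctuated ASR output. It is hypothetical and far simpler than production ITN, which often uses weighted finite-state grammars covering dates, currency, and case citations; it also misreads digit-by-digit sequences such as "one two three" as a single number.

```python
from typing import List

# Hypothetical word tables for a toy cardinal-number ITN.
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def _is_number_word(word: str) -> bool:
    return word in UNITS or word in TENS or word == "hundred" or word in SCALES


def _words_to_int(words: List[str]) -> int:
    """Fold a run of number words into an integer, e.g. ["two", "hundred", "five"] -> 205."""
    total, current = 0, 0
    for w in words:
        if w in UNITS:
            current += UNITS[w]
        elif w in TENS:
            current += TENS[w]
        elif w == "hundred":
            current = max(current, 1) * 100  # a bare "hundred" means 100
        else:  # a scale word closes the current group
            total += max(current, 1) * SCALES[w]
            current = 0
    return total + current


def inverse_normalize(text: str) -> str:
    """Replace each run of spoken number words with its digit form."""
    tokens = text.split()
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if not _is_number_word(tokens[i].lower()):
            out.append(tokens[i])
            i += 1
            continue
        run: List[str] = []
        while i < len(tokens):
            word = tokens[i].lower()
            if _is_number_word(word):
                run.append(word)
                i += 1
            elif (word == "and" and run[-1] in ("hundred", *SCALES)
                  and i + 1 < len(tokens) and _is_number_word(tokens[i + 1].lower())):
                i += 1  # drop "and" inside a number, e.g. "two hundred and five"
            else:
                break
        out.append(str(_words_to_int(run)))
    return " ".join(out)
```

For instance, `inverse_normalize("adjourned for two hundred and five days")` yields `"adjourned for 205 days"`, while an "and" between ordinary words is left untouched.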

Accurate Text Records: The ASR engine keeps both the Word Error Rate (WER) and the Character Error Rate (CER) below 20%. It transcribes depositions and hearings without spurious text caused by background noise, ensuring precise documentation.
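WER and CER are both edit-distance ratios: the minimum number of substitutions (S), deletions (D), and insertions (I) needed to turn the reference transcript into the ASR hypothesis, divided by the number of reference units N, i.e. (S + D + I) / N, counted over words for WER and over characters for CER. The sketch below computes both with a standard Levenshtein dynamic program; the function names are illustrative, and it applies no text normalization (case, punctuation), which real evaluations usually perform first.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: r is missing from the hypothesis
                curr[j - 1] + 1,         # insertion: h is extra in the hypothesis
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("the court is adjourned", "the court adjourned")` returns 0.25: one deletion out of four reference words. A figure "below 20%" means fewer than one such edit per five reference words (or characters, for CER).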

Impact

Using the final product, the legal technology solutions company helped hundreds of courts across Nigeria automate the transcription of case proceedings, hearings, and depositions, achieving 80% speech-to-text accuracy. This improved the accuracy of the resulting transcripts, made it easier to deliver judgments, and reduced the turnaround time (TAT) of the documentation process by 35%. The company particularly appreciated Daffodil's innovative AI approach to developing the solution and its swift turnaround.

Key stats:

  • 100,000+ words and linguistic samples in the training dataset
  • 80% speech-to-text accuracy on audio depositions
  • 35% decrease in documentation TAT

Expertise: Custom Software Development, Artificial Intelligence, Application Testing

You might also like