December 1, 2021
By Manik Gupta, Director of Engineering, Interra Systems
Blog

How to Effectively Deploy Auto-Captioning Solutions for Streaming VOD

Artificial intelligence (AI) is transforming the world of video streaming. Although AI has existed as a technology for some time, the digitization of data, coupled with the need for such solutions, has pushed the industry to adopt AI more quickly than expected. AI-based systems now exist for speech recognition, data analytics, and other deep learning applications. They offer accuracy and scalability that not only complement human input but can also exceed human efficiency.

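One area where AI offers multiple benefits is automatic speech recognition (ASR). Speech recognition is a field of AI that enables the recognition of spoken language and its conversion into text. ASR is a core component of many systems, including the auto-captioning systems used in video-on-demand (VOD) streaming environments.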

Why Auto Captioning Is Important for Streaming

Captions are an essential part of VOD streaming services. Using captions, OTT providers offering VOD services can extend their reach and make streaming content accessible to millions of viewers across the globe.

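For years, captions were created manually. However, OTT service providers are now dealing with a massive volume of streaming content for an increasingly global audience. Captioning all of that content manually is neither feasible nor cost-effective. Captioning is a specialized job that needs to be carried out by experts who understand the intricacies of language. As a way to reduce costs and improve efficiency, auto captioning has become a very important AI task.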

Key Components of Auto-Captioning Solutions

Several components of an auto-captioning solution are essential to delivering captions for VOD streaming with a high degree of accuracy and quality (Figure 1).

Figure 1. Components used to automatically generate captions

The ASR engine is the core component, responsible for transcribing speech to text. If OTT service providers want to ensure effective global reach and accuracy for their content, they need an ASR engine that supports most languages and the important dialects of each language.

From a technology standpoint, newer ASR technologies offer better accuracy: above 95% for clean speech content.
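The article does not say how the 95% figure is measured. Accuracy for ASR is commonly reported as one minus the word error rate (WER), so the following is a minimal sketch under that assumption. The function names are illustrative and not taken from any particular ASR product.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed definition: accuracy = 1 - WER."""
    return 1.0 - word_error_rate(reference, hypothesis)
```

Under this definition, one wrong word in a 20-word reference gives 95% accuracy. Note that WER can exceed 1.0 when the hypothesis contains many extra words, in which case this accuracy value becomes negative.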

Choosing an ASR solution that can identify speaker changes in transcripts is also important. Speaker identification helps position captions properly so that each caption appears close to its speaker. It also provides clarity when there are multiple speakers.

In addition, the ASR solution should transcribe non-speech sounds such as “hmm” and “oh” so that the transcript stays faithful to what is actually spoken.

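Natural language processing (NLP) is a critical part of the overall auto-captioning solution, ensuring accurate punctuation and intelligent sentence segmentation. With NLP, OTT service providers can punctuate sentences to improve readability. NLP can also help place line breaks at natural points in captions to further optimize readability.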
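As a rough illustration of breaking lines at natural points, the sketch below wraps caption text to a maximum line width and, when a line fills up, prefers to break just after a punctuation mark. This is a simple heuristic standing in for real NLP (production systems typically rely on syntactic analysis), and `break_caption`, `BREAK_AFTER`, and the 32-character default are assumptions made for the example.

```python
BREAK_AFTER = ",.;:?!"  # assumed "natural" break points for this sketch


def break_caption(text: str, max_chars: int = 32) -> list[str]:
    """Greedy line wrap that prefers to break right after punctuation."""
    lines = []
    current = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and len(candidate) > max_chars:
            cut = len(current)  # default: break exactly where the line is full
            # Look back for an earlier break point that follows punctuation,
            # as long as the words carried over still fit on the next line.
            for k in range(len(current) - 1, 0, -1):
                carried = " ".join(current[k:] + [word])
                if len(carried) > max_chars:
                    break
                if current[k - 1][-1] in BREAK_AFTER:
                    cut = k
                    break
            lines.append(" ".join(current[:cut]))
            current = current[cut:] + [word]
        else:
            current.append(word)
    if current:
        lines.append(" ".join(current))
    return lines
```

For example, `break_caption("When the storm finally passed, we walked down to the harbor.")` breaks after the comma rather than in the middle of a clause. A single word longer than `max_chars` still overflows its line in this sketch.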

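Additionally, it is imperative for streaming service providers to comply with regional requirements. Auto-captioning systems can help service providers manage caption quality parameters such as words per minute, the maximum number of lines used to display captions, and the sensitive handling of profanity.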
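The parameters above could be checked per caption along the following lines. This is a sketch only: the `Caption` structure, the `check_caption` function, and the default limits (180 words per minute, two lines) are placeholders chosen for the example, not values drawn from any regional regulation.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    start: float       # seconds
    end: float         # seconds
    lines: list[str]   # one entry per displayed line


def check_caption(caption, max_wpm=180.0, max_lines=2, blocked_words=frozenset()):
    """Return a list of human-readable issues; an empty list means no issue found."""
    issues = []
    words = [w for line in caption.lines for w in line.split()]
    duration = caption.end - caption.start
    if duration <= 0:
        issues.append("caption has a non-positive duration")
    else:
        wpm = len(words) / duration * 60.0
        if wpm > max_wpm:
            issues.append(f"reading rate {wpm:.0f} wpm exceeds {max_wpm:.0f}")
    if len(caption.lines) > max_lines:
        issues.append(f"{len(caption.lines)} lines exceeds the limit of {max_lines}")
    # Compare lowercased words, stripped of surrounding punctuation.
    normalized = {w.strip(".,!?;:\"'").lower() for w in words}
    flagged = sorted(normalized & set(blocked_words))
    if flagged:
        issues.append("blocked words: " + ", ".join(flagged))
    return issues
```

A caption that passes returns an empty list, so the same function can feed either an automatic fix-up step or a report for a human reviewer.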

Having a solution with a custom dictionary increases the accuracy of ASR systems by providing context before ASR is invoked. Let’s say a service provider is trying to auto caption a television series for its streaming offering. The names of all the characters are known, and some of them are difficult. ASR engines can prioritize these names during the recognition phase to ensure that the transcription maintains good accuracy.
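How an engine prioritizes dictionary terms is product-specific and not described here. One simple way to picture it is rescoring: given several candidate transcriptions with scores, add a bonus for each dictionary term a candidate contains. The sketch below assumes that model; `rescore_with_dictionary`, the score scale, the `boost` value, and the character name are all hypothetical.

```python
def rescore_with_dictionary(hypotheses, dictionary, boost=0.5):
    """Pick the best hypothesis after rewarding custom-dictionary terms.

    hypotheses: list of (text, score) pairs, where a higher score means
    the engine considered the text more likely (assumed convention).
    """
    terms = {term.lower() for term in dictionary}

    def boosted(item):
        text, score = item
        hits = sum(1 for w in text.lower().split() if w.strip(".,!?") in terms)
        return score + boost * hits

    return max(hypotheses, key=boosted)[0]


# Hypothetical character name that a generic model tends to mishear.
candidates = [("call vera arrived late", -4.1), ("khalvira arrived late", -4.3)]
best = rescore_with_dictionary(candidates, ["Khalvira"])
```

Without the dictionary, the first candidate wins on raw score; with it, the candidate containing the known name is preferred.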

Best Practices for Deploying ASR Systems

Adopting an ASR engine that offers a flexible deployment strategy is ideal for VOD streaming applications. OTT service providers should look for an ASR system that can be deployed on-premises as well as on different cloud services like AWS and Google Cloud. Cloud-based solutions in particular can be brought to market more quickly.

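Auto-captioning solutions have advanced considerably compared with 20 years ago. They are now widely used in real-world video streaming applications. But there are limits to their accuracy. Because of the sheer number of accents and languages, it is impossible to maintain high accuracy at all times.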

To overcome the accuracy limitations of auto-captioning solutions, a growing number of service providers are embracing a hybrid model in which the auto-captioning results are manually inspected before video is streamed to global audiences. Manual inspection is needed only where stricter compliance is required and clean dialog is not available (Figure 2).

Figure 2. Hybrid model for auto captioning

Performing a complete manual inspection of the generated captions can be a very tedious task. Review tools were created to help service providers review and correct generated captions as efficiently as possible. Review tools should be able to sort utterances by confidence score so that those with low scores, which are most likely to contain errors, can be reviewed first. Review tools also need to be able to play all utterances along with their audio in a loop for fast inspection. Once an error is detected, the tool must provide an easy way to correct the attributes of the caption (e.g., text, font style, timecode, and color). This ensures faster review of auto-captioning tasks and speeds up delivery times.
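The confidence-based ordering described above can be sketched in a few lines. The `Utterance` fields, the 0-to-1 confidence scale, and the optional `threshold` cutoff are assumptions for illustration; an actual review tool would define its own data model.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    start: float        # seconds
    end: float          # seconds
    text: str
    confidence: float   # assumed to range from 0.0 (unsure) to 1.0 (certain)


def review_queue(utterances, threshold=None):
    """Order utterances for review, least confident first.

    If a threshold is given, only utterances scoring below it are queued,
    which matches a hybrid workflow where confident output skips review.
    """
    if threshold is not None:
        utterances = [u for u in utterances if u.confidence < threshold]
    # Ties on confidence fall back to timeline order.
    return sorted(utterances, key=lambda u: (u.confidence, u.start))
```

Sorting by confidence puts reviewer attention where errors are most likely, while the threshold keeps the queue short when only part of the output needs a manual pass.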

Conclusion

ASR systems solve a critical problem for the VOD streaming industry today, enabling service providers to improve the accuracy of captions created through speech-to-text processing. However, ASR systems are not without limitations.

By taking a hybrid approach that combines auto captioning with quick manual inspection before delivery, OTT service providers can improve accuracy and introduce significantly higher efficiencies into their VOD streaming workflow.

[Editor’s note: This article comes from Interra Systems. Streaming Media accepts vendor bylines based solely on their value to our readers.]
