Everyone wants to be heard. And with more people than ever video calling or live streaming from their home office, rich audio without echo hiccups and background noises like dogs barking is key to better online experiences.
NVIDIA Maxine offers GPU-accelerated, AI-enabled SDKs to help developers create scalable, low-latency audio and video effects pipelines that improve call quality and user experience.
Today, NVIDIA announced at GTC that Maxine is adding acoustic echo cancellation and AI-based upsampling for better sound quality.
Acoustic echo cancellation removes acoustic echo from the audio stream in real time, preserving speech quality even during double talk. Using AI-based technology, Maxine achieves more effective echo cancellation than that achieved through traditional digital signal processing algorithms.
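For readers curious what a "traditional digital signal processing" baseline looks like, here is a minimal sketch of a textbook normalized least-mean-squares (NLMS) adaptive filter, a classic approach to echo cancellation. It is not Maxine's method and uses no Maxine API. The function name, the 3-tap room response and the white-noise test signal are all assumptions made up for illustration.

```python
import random

def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of far_end from mic (textbook NLMS)."""
    weights = [0.0] * taps
    history = [0.0] * taps          # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate       # what is left after removing the estimate
        norm = eps + sum(h * h for h in history)
        # Nudge the weights toward the echo path, scaled by the input energy.
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights

# Hypothetical test signal: white noise played by the loudspeaker and echoed
# back through a made-up 3-tap room response, with nobody talking nearby.
random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(2000)]
room = [0.5, 0.3, -0.2]
mic = [sum(room[k] * far_end[n - k] for k in range(len(room)) if n - k >= 0)
       for n in range(len(far_end))]

residual, weights = nlms_echo_cancel(far_end, mic)
echo_power = sum(v * v for v in mic[-200:])
residual_power = sum(v * v for v in residual[-200:])
print(f"echo power {echo_power:.3g}, residual power {residual_power:.3g}")
```

In this idealized, echo-only setup the filter learns the room response and the residual becomes negligible. When both parties speak at once, the near-end voice disturbs the adaptation, so classical cancellers typically need extra double-talk handling, which is the situation the article highlights.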
Audio Super Resolution improves the quality of a low-bandwidth audio signal by restoring lost energy in higher frequency bands using AI-based techniques. Maxine Audio Super Resolution supports upsampling of audio from 8 kHz (narrowband) to 16 kHz (wideband), 16 kHz to 48 kHz (ultra wideband), and 8 kHz to 48 kHz. Lower sample rates such as 8 kHz often result in muffled voices, accentuate artifacts such as sibilance, and make speech difficult to understand.
Modern film and television studios often use a sample rate of 48 kHz (or higher) for audio recording, in order to maintain the fidelity of the original signal and preserve clarity. Audio Super Resolution can help restore the fidelity of older audio recordings derived from magnetic tapes or other low-bandwidth media.
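To put those sample rates in perspective, a sampled signal can only represent frequencies up to half its sample rate (the Nyquist limit). The short sketch below works through that arithmetic for the three upsampling paths mentioned above. It is illustrative only and not part of the Maxine SDK; the function names are made up for this example.

```python
# Illustrative arithmetic only; not part of the Maxine SDK.
PATHS_KHZ = [(8, 16), (16, 48), (8, 48)]  # upsampling paths named in the article

def nyquist_khz(sample_rate_khz):
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_khz / 2

def describe(src_khz, dst_khz):
    """Return the upsampling factor and the audio bandwidth before and after."""
    return {
        "factor": dst_khz // src_khz,
        "bandwidth_in_khz": nyquist_khz(src_khz),
        "bandwidth_out_khz": nyquist_khz(dst_khz),
    }

for src, dst in PATHS_KHZ:
    info = describe(src, dst)
    print(f"{src} kHz -> {dst} kHz: x{info['factor']}, "
          f"{info['bandwidth_in_khz']:g} kHz -> "
          f"{info['bandwidth_out_khz']:g} kHz of audio bandwidth")
```

An 8 kHz recording therefore holds nothing above 4 kHz. Simply resampling it to 48 kHz makes room for content up to 24 kHz but leaves that range empty, which is the gap the AI-based approach is described as filling.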
Bridging the Sound Gap
Most modern telecommunications use wideband or ultra-wideband audio. Because NVIDIA Audio Super Resolution can upsample and restore narrowband audio in real time, the technology can be used effectively to bridge the quality gap between traditional copper wire telephone lines and modern VoIP-based wideband communication systems.
Real-time communication – whether for conference calls, call centers or live streaming of all kinds – takes a big leap forward with Maxine.
Since its initial release, Maxine has been adopted by many of the world’s leading providers of video communications, content creation and live streaming.
The global video conferencing market is expected to reach nearly $13 billion in 2028, from approximately $6.3 billion in 2021, according to Fortune Business Insights.
WFH: a way of life
The shift to working from home, or telecommuting, has become an accepted norm across all businesses, and organizations are adapting to the new expectations.
Analyst firm Gartner estimates that only a quarter of meetings for companies will be in person in 2024, down from 60% before the pandemic.
Virtual collaboration has played a significant role in the United States over the past two years, as people moved into hybrid and remote roles amid the pandemic.
But as organizations seek to maintain company culture and work experience, the stakes have increased for better media interactivity.
Solving the cocktail party problem
But sometimes work and family life collide. As a result, meetings are often filled with background noise from children, construction work outside, or emergency vehicle sirens, causing brief interruptions in the flow of conference calls.
Maxine helps solve an age-old audio problem known as the cocktail party problem. With AI, it can filter out unwanted background noise, allowing users to be heard better whether they’re in a home office or on the road.
The GPU-accelerated Maxine platform provides an end-to-end deep learning pipeline that integrates with customizable edge models, enabling high-quality functionality with a standard microphone and camera.
Sound like your best self
In addition to being affected by background noise, audio quality in virtual activities can sometimes sound thin, miss low and mid frequencies, or even be barely audible.
Maxine enables real-time upsampling of audio so vocals are fuller, deeper and more audible.
Logitech: better sound for headsets and Blue Yeti microphones
Logitech, a leading peripheral manufacturer, implements Maxine for better interactions with its popular headsets and microphones.
Drawing from AI libraries, Logitech has integrated Maxine directly into G Hub audio drivers to improve communications with its devices without the need for additional software. Maxine leverages the powerful Tensor Cores of NVIDIA RTX GPUs so consumers can enjoy real-time processing of their mic signal.
Logitech now uses Maxine’s state-of-the-art denoising in its G Hub software. This allows it to remove echoes and background noises, such as fans and keyboard and mouse clicks, that can distract from video conferences or live streaming sessions.
“NVIDIA Maxine allows Logitech G gamers to quickly and easily clean up their mic signal and eliminate unwanted background noise with a single click,” said Ujesh Desai, General Manager of Logitech G. “You can even use G HUB to test your mic signal to make sure your Maxine settings are set correctly.”
Tencent Cloud boosts content creators
Tencent Cloud helps content creators in their productions by offering NVIDIA Maxine technology that makes it quick and easy to add creative backgrounds.
NVIDIA Maxine’s AI Green Screen feature allows users to create a more immersive presence with high-quality foreground and background separation, without the need for a traditional green screen. Once the real background is separated, it can easily be replaced with a virtual background, or blurred to create a depth of field effect. Tencent Cloud offers this new feature as a software-as-a-service package for content creators.
“NVIDIA Maxine’s AI Green Screen technology helps content creators in their productions by enabling more immersive, high-quality experiences without the need for specialized equipment and lighting,” said Vulture Li, director of the product center at the Tencent Cloud audio and video platform.
Improve virtual experiences
NVIDIA Maxine provides state-of-the-art real-time AI audio, video, and augmented reality capabilities that can be integrated into end-to-end customizable deep learning pipelines.
Maxine’s AI-powered SDKs help developers build applications that include audio and image denoising, super resolution, gaze correction, 3D body pose estimation, and translation features.
Maxine also enables real-time voice-to-text translation for a growing number of languages. At GTC, NVIDIA demonstrated Maxine translating between English, French, German, and Spanish.
These effects will enable millions of people to enjoy high quality and engaging live video on any device.
Join us at GTC this week to learn more about Maxine in the following session: