GTC — NVIDIA today announced the NVIDIA Maxine platform, which provides developers with a cloud-based suite of GPU-accelerated AI video conferencing software to enhance streaming video — the internet’s No. 1 source of traffic.
NVIDIA Maxine is a cloud-native streaming video AI platform that makes it possible for service providers to bring new AI-powered capabilities to the more than 30 million web meetings estimated to take place every day. Video conference service providers running the platform on NVIDIA GPUs in the cloud can offer users new AI effects, including gaze correction, super-resolution, noise cancellation, face relighting and more.
Because the data is processed in the cloud rather than on local devices, end users can enjoy the new features without any specialized hardware.
“Video conferencing is now a part of everyday life, helping millions of people work, learn and play, and even see the doctor,” said Ian Buck, vice president and general manager of Accelerated Computing at NVIDIA. “NVIDIA Maxine integrates our most advanced video, audio and conversational AI capabilities to bring breakthrough efficiency and new capabilities to the platforms that are keeping us all connected.”
Breakthrough AI Efficiency Slashes Bandwidth to Boost Call Quality
The NVIDIA Maxine platform dramatically reduces the bandwidth required for video calls. Instead of streaming every pixel of every frame, the AI software analyzes key points on each caller's face and then intelligently re-animates the face in the video on the receiving side. This makes it possible to stream video with far less data flowing back and forth across the internet.
Using this new AI-based video compression technology running on NVIDIA GPUs, developers can reduce video bandwidth to as little as one-tenth of what the H.264 video compression standard requires. This cuts costs for providers and delivers a smoother video conferencing experience for end users, who can enjoy more AI-powered services while streaming less data on their computers, tablets and phones.
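To give a rough sense of why sending facial key points instead of pixels saves so much data, the back-of-the-envelope sketch below compares an uncompressed 720p frame with a frame sent only as keypoint coordinates, and computes one-tenth of an H.264 bitrate. Every number here is a hypothetical assumption for illustration, not a Maxine specification: the 68-point face layout (a common landmark convention), 2-byte coordinates, 30 fps and the 1,500 kbps H.264 bitrate. The sketch also ignores the reference image and any other side information a real keypoint-based codec must transmit.

```python
"""Back-of-the-envelope bandwidth comparison (hypothetical numbers, not Maxine's format)."""


def raw_frame_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size of one uncompressed RGB frame, in bytes."""
    return width * height * bytes_per_pixel


def keypoint_frame_bytes(num_keypoints: int, bytes_per_coord: int = 2) -> int:
    """Size of one frame sent only as (x, y) keypoint coordinates, in bytes.

    Assumption: a real codec also sends a reference image of the speaker
    and other side information, which this sketch leaves out.
    """
    return num_keypoints * 2 * bytes_per_coord


def kbps(bytes_per_frame: int, fps: int) -> float:
    """Convert a per-frame payload into kilobits per second."""
    return bytes_per_frame * 8 * fps / 1000


FPS = 30  # assumed frame rate
raw = kbps(raw_frame_bytes(1280, 720), FPS)  # uncompressed 720p video
kp = kbps(keypoint_frame_bytes(68), FPS)     # 68 hypothetical face keypoints
h264_assumed = 1500.0                        # hypothetical H.264 bitrate, kbps
target = h264_assumed / 10                   # "one-tenth of H.264" from the text

print(f"raw 720p video:        {raw:,.0f} kbps")
print(f"68 keypoints/frame:    {kp:,.1f} kbps")
print(f"1/10 of assumed H.264: {target:,.0f} kbps")
```

Under these assumptions, the raw 720p stream needs 663,552 kbps and the keypoint stream about 65 kbps, comfortably below the 150 kbps that one-tenth of the assumed H.264 bitrate would allow.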
AI Features Improve Video Conferencing Experiences
New breakthroughs by NVIDIA researchers that will be included in Maxine make video conferencing feel more like face-to-face conversation. Video conference service providers will be able to take advantage of NVIDIA research in GANs, or generative adversarial networks, to offer a variety of new features.
For example, face alignment automatically adjusts faces so that people appear to be facing each other during a call, while gaze correction helps simulate eye contact even if the camera isn’t aligned with the user’s screen. With video conferencing use up tenfold since the beginning of the year, these features help people stay engaged in the conversation without having to look directly into their camera.
Developers can also add features that let call participants choose their own avatars, animated realistically and in real time by their voice and emotional tone. An auto-frame option allows the video feed to follow the speaker even if they move away from the screen.
Using conversational AI features powered by the NVIDIA Jarvis SDK, developers can integrate virtual assistants that use state-of-the-art AI language models for speech recognition, language understanding and speech generation. The virtual assistants can take notes, set action items and answer questions in human-like voices. Additional conversational AI services such as translation, closed captioning and transcription help ensure participants can understand what is being discussed on the call.
Cloud-Native Architecture Delivers Savings and AI at Scale
Demand for video conferencing at any given time can be hard to predict, with hundreds or even thousands of users potentially trying to join the same call. NVIDIA Maxine takes advantage of AI microservices running in Kubernetes container clusters on NVIDIA GPUs to help developers scale their services according to real-time demands. Users can run multiple AI features simultaneously while remaining well within application latency requirements.
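As an illustration of demand-driven scaling in a Kubernetes cluster, the sketch below implements the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The announcement does not say which autoscaling mechanism Maxine uses. Using the average number of concurrent video streams per pod as the metric, and the replica bounds, are assumptions, and the sketch omits HPA details such as stabilization windows and pod-readiness handling.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 100,
                     tolerance: float = 0.1) -> int:
    """Replica count following the Kubernetes HPA scaling formula.

    current_metric is the average per-pod value of the scaling metric
    (hypothetically, concurrent video streams per pod); target_metric is
    the per-pod value the operator wants to maintain.
    """
    ratio = current_metric / target_metric
    # Close enough to the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Keep the result within the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# A burst of users joins: 4 pods averaging 90 streams each, target 30 per pod.
print(desired_replicas(4, 90, 30))   # 12
# Demand falls: 10 pods averaging 15 streams each.
print(desired_replicas(10, 15, 30))  # 5
```

Scaling on a per-pod load metric rather than a fixed schedule is what lets a service absorb the unpredictable bursts described above, such as thousands of users joining the same call.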
Video conference service providers can use Maxine to deliver leading AI capabilities to hundreds of thousands of users by running AI inference workloads on NVIDIA GPUs in the cloud. The modular design of the Maxine platform enables developers to easily select AI capabilities to integrate into their video conferencing solutions.
All-Star Suite of NVIDIA AI Developer Tools
The Maxine platform integrates technology from several NVIDIA AI SDKs and APIs. In addition to NVIDIA Jarvis, the Maxine platform leverages the NVIDIA DeepStream high-throughput audio and video streaming SDK and the NVIDIA TensorRT™ SDK for high-performance deep learning inference.
The AI audio, video and natural language capabilities provided in the NVIDIA SDKs used in the Maxine platform were developed through hundreds of thousands of training hours on NVIDIA DGX™ systems, the world’s leading platform for training, inference and data science workloads.
Computer vision AI developers, software partners, startups and computer manufacturers creating audio and video apps and services can apply for early access to the NVIDIA Maxine platform.