Almost all of us have watched a video on a mobile device only to find that the frame does not fit the screen properly. Irritation and annoyance are probably the first reactions, yet we usually sit through the whole video anyway because there is nothing else we can do.
To address this seemingly minor but significant user-experience problem, Google's AI research team has developed an open-source framework called AutoFlip. It automatically reframes videos to suit a given device or screen size in different formats, such as landscape, square, or portrait.
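Fitting a video to a new format starts with a simple geometric question: what is the largest window with the target aspect ratio that fits inside the original frame? The sketch below answers it for the three formats mentioned above. The preset names and ratios (16:9, 1:1, 9:16) are illustrative assumptions, not values taken from AutoFlip itself.

```python
import math
from fractions import Fraction
from typing import Tuple

# Hypothetical presets for the target formats; AutoFlip accepts any ratio.
TARGET_RATIOS = {
    "landscape": Fraction(16, 9),
    "square": Fraction(1, 1),
    "portrait": Fraction(9, 16),
}


def crop_size(src_w: int, src_h: int, target: Fraction) -> Tuple[int, int]:
    """Largest (width, height) with the target aspect ratio that fits in the source frame."""
    if Fraction(src_w, src_h) > target:
        # Source is wider than the target: keep the full height, crop the width.
        return math.floor(src_h * target), src_h
    # Source is narrower than (or equal to) the target: keep the full width, crop the height.
    return src_w, math.floor(src_w / target)


for name, ratio in TARGET_RATIOS.items():
    print(name, crop_size(1920, 1080, ratio))
```

For a 1920x1080 source, the portrait window is only 607 pixels wide, which is why choosing *where* to place it, the job of the stages described below, matters so much.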
AutoFlip works in three main stages: scene detection, video content analysis, and finally reframing (cropping).
In the scene detection stage, AutoFlip looks for the points where the video cuts or jumps from one scene to another. To find them, it compares each frame with the frames before it and flags a scene change when the colors and other visual elements shift abruptly.
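The frame-comparison idea can be illustrated with color histograms. The sketch below is a simplified assumption of how such a detector might work, not AutoFlip's actual code: each frame is a list of RGB pixels, its colors are counted into coarse bins, and a cut is reported when the histogram distance to the previous frame exceeds a hypothetical `threshold`. A production detector would compare against a longer history of frames rather than just the previous one.

```python
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0..255


def color_histogram(frame: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Normalized 3-D color histogram with bins_per_channel bins per channel."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in frame:
        # Clamp so that bin counts that do not divide 256 evenly stay in range.
        ri = min(r // step, bins_per_channel - 1)
        gi = min(g // step, bins_per_channel - 1)
        bi = min(b // step, bins_per_channel - 1)
        hist[ri * bins_per_channel ** 2 + gi * bins_per_channel + bi] += 1
    return [h / len(frame) for h in hist]


def histogram_distance(a: List[float], b: List[float]) -> float:
    """Half the L1 distance: 0 for identical color distributions, 1 for disjoint ones."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def detect_shot_boundaries(frames: Sequence[Sequence[Pixel]], threshold: float = 0.5) -> List[int]:
    """Indices of frames that start a new shot."""
    boundaries: List[int] = []
    prev: Optional[List[float]] = None
    for i, frame in enumerate(frames):
        hist = color_histogram(frame)
        if prev is not None and histogram_distance(prev, hist) > threshold:
            boundaries.append(i)
        prev = hist
    return boundaries


red = [(255, 0, 0)] * 16
blue = [(0, 0, 255)] * 16
print(detect_shot_boundaries([red, red, blue, blue]))
```

Here the jump from all-red to all-blue frames is reported as a cut at frame index 2.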
Once the scene boundaries are identified, AutoFlip moves on to video content analysis to find the important objects in each scene. To do so, it uses deep learning neural networks to detect not only the objects that appear in the scene, such as people, animals, vehicles, or trees, but also how those objects move.
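The output of this stage can be pictured as a per-frame "focus point" plus its motion over time. The sketch below assumes detections already come from some object detector (the `Detection` type and the `CLASS_WEIGHTS` values are hypothetical, not AutoFlip's API). It averages box centers weighted by confidence and class importance, then measures how that point moves between consecutive frames.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Detection:
    label: str
    score: float  # detector confidence in [0, 1]
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


# Hypothetical importance weights per object class; unknown classes get 0.
CLASS_WEIGHTS: Dict[str, float] = {"person": 1.0, "animal": 0.8, "vehicle": 0.5, "tree": 0.1}


def focus_point(detections: List[Detection],
                weights: Dict[str, float] = CLASS_WEIGHTS) -> Optional[Point]:
    """Weighted average of box centers, or None if nothing important is in the frame."""
    total = cx = cy = 0.0
    for d in detections:
        w = d.score * weights.get(d.label, 0.0)
        if w <= 0:
            continue
        x0, y0, x1, y1 = d.box
        cx += w * (x0 + x1) / 2
        cy += w * (y0 + y1) / 2
        total += w
    if total == 0:
        return None
    return cx / total, cy / total


def motion(points: List[Point]) -> List[Point]:
    """Displacement of the focus point between consecutive frames."""
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(points, points[1:])]


frame = [Detection("person", 1.0, (0, 0, 100, 100)), Detection("vehicle", 1.0, (200, 0, 300, 100))]
print(focus_point(frame))
```

The person pulls the focus point more strongly than the vehicle because of its higher class weight.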
In the final stage, AutoFlip decides whether to use a stationary mode, for scenes that take place in a single fixed space, or a tracking mode, for scenes where the objects of interest keep moving. Based on this choice and the target size at which the video will be displayed, AutoFlip crops each frame so that the result plays smoothly while retaining as much of the important content as possible.
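The choice between the two modes, and the resulting crop positions, can be sketched for the common landscape-to-portrait case where only the horizontal offset matters. This is an illustrative assumption rather than AutoFlip's algorithm: the `tolerance` rule for picking a mode and the exponential smoothing factor `alpha` are hypothetical parameters chosen to show the idea of a fixed crop versus a smoothly moving one.

```python
from typing import List, Tuple


def choose_mode(centers_x: List[float], crop_w: int, tolerance: float = 0.25) -> str:
    """'stationary' if the subject stays within a fraction of the crop width, else 'tracking'."""
    spread = max(centers_x) - min(centers_x)
    return "stationary" if spread <= tolerance * crop_w else "tracking"


def crop_offsets(centers_x: List[float], src_w: int, crop_w: int,
                 alpha: float = 0.3) -> Tuple[str, List[float]]:
    """Left edge of the crop window for every frame in one scene."""

    def clamp_left(center: float) -> float:
        # Keep the window fully inside the source frame.
        return min(max(center - crop_w / 2, 0.0), float(src_w - crop_w))

    mode = choose_mode(centers_x, crop_w)
    if mode == "stationary":
        # One fixed window for the whole scene, centered on the average subject position.
        left = clamp_left(sum(centers_x) / len(centers_x))
        return mode, [left] * len(centers_x)

    # Tracking: follow the subject, smoothing its position to avoid jittery camera moves.
    offsets: List[float] = []
    smoothed = centers_x[0]
    for c in centers_x:
        smoothed = alpha * c + (1 - alpha) * smoothed
        offsets.append(clamp_left(smoothed))
    return mode, offsets


print(crop_offsets([960, 970, 955], src_w=1920, crop_w=607))
print(crop_offsets([300, 900, 1600], src_w=1920, crop_w=607))
```

A subject that barely moves gets one fixed window for the whole scene, while a subject that crosses the frame produces a window that follows it gradually instead of jumping.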
According to Google's AI researchers, AutoFlip can convert videos to a variety of screen sizes and formats without human intervention. Next, the team wants to improve AutoFlip's detection of relevant subjects in content such as interviews and cartoons. It also hopes to add text/logo detection and image inpainting techniques so that future versions can reposition foreground objects to better fit the new frame.
The AutoFlip source code, released as part of Google's MediaPipe framework, is available HERE.