In today’s digital age, video content has become a powerful tool for marketing and product promotion. Creating compelling videos can be time-consuming and resource-intensive. However, with advancements in artificial intelligence (AI), it is now possible to automate the process of video generation from existing text and picture-based product pitches. In this blog post, I will explore the applications, challenges, and potential solutions for utilizing AI in video creation.
The AI-driven video generation process involves two main aspects: asset scraping and voice-over. As an example, I focused on creating a product video featuring one of Icecat’s clients, using Icecat product data sheets along with database pictures and sales text. Asset scraping involves extracting high-resolution product images from existing databases and product data sheets. Voice-over involves converting product descriptions and sales texts into speech using AI-generated voices.
While asset scraping and voice-over can be automated effectively, editing the video remains a challenge. Currently, there are no online editors capable of precise and autonomous video creation according to the specific requirements of e-retail. To address this, a custom program that drives an existing non-linear video editor is needed. This program should accept input files in formats such as .wav, .mp3, .png, and .jpeg, and output video encoded as H.264 and/or H.265.
To demonstrate the AI video generation process, I utilized existing Icecat product data sheets, database pictures, and sales text. I scraped high-resolution images and used them directly in the video editor. It’s essential to ensure the accuracy and appropriateness of the images, as generative AI techniques may inadvertently distort or omit important details. Therefore, I did not use AI for image manipulation.
I converted the text content, excluding headlines, into speech using a text-to-speech program. The resulting audio may still require human proof-listening, particularly when technical jargon or acronyms are present. I separated the text under each headline into its own file to create distinct video chapters. The headlines themselves serve as chapter indications, appearing at the bottom left or top left of the screen.
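As a rough illustration of the per-headline split, the sketch below turns (headline, body) pairs into chapter records, one voice-over file per chapter. The `Chapter` class, the `plan_chapters` and `slugify` helpers, and the file naming scheme are all hypothetical; this post does not prescribe how the split should be implemented, and the input is assumed to be already separated into headline and body text.

```python
import re
from dataclasses import dataclass


@dataclass
class Chapter:
    index: int        # 1-based chapter number
    headline: str     # shown on screen as the chapter indication, not spoken
    narration: str    # body text that goes to text-to-speech
    audio_file: str   # hypothetical file name for this chapter's voice-over


def slugify(text: str) -> str:
    """Lowercase the text and replace runs of other characters with '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "chapter"


def plan_chapters(sections):
    """Build one Chapter per (headline, body) pair, skipping empty bodies."""
    chapters = []
    for headline, body in sections:
        narration = " ".join(body.split())  # collapse line breaks and extra spaces
        if not narration:
            continue  # nothing to narrate under this headline
        index = len(chapters) + 1
        audio_file = f"{index:02d}-{slugify(headline)}.wav"
        chapters.append(Chapter(index, headline.strip(), narration, audio_file))
    return chapters
```

Each record keeps the headline apart from the narration, so the headline can be placed on screen while only the body text is sent to the speech tool.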
To establish branding, I placed the company logo on the bottom left of the screen. Custom logos should be created or sourced for each company seeking to utilize AI video generation. Logos are preferably black on a transparent (alpha) background in .png format. I made a distinction between product pictures (with a white background) and location pictures. Treating them differently avoids a monotonous feel within the video.
The editing process involves specific techniques to enhance visual appeal. Product pictures with white backgrounds can smoothly transition into one another through fading effects. On the other hand, location pictures are best animated from right to left to create movement. Proper scaling and positioning of images within the frame are crucial to ensure seamless horizontal movement without revealing image edges.
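The scaling requirement can be checked with a little arithmetic. The sketch below is one possible way to do it, not the method used in the editor: it picks a scale so the image is wider than the frame by an `overscan` factor and still covers the full frame height, then returns the horizontal positions between which the image can slide without exposing an edge. The function names and the default `overscan` value are assumptions for illustration.

```python
def fit_for_pan(img_w, img_h, frame_w, frame_h, overscan=1.10):
    """Return (scale, scaled_w, scaled_h) so the image covers the frame
    and is at least `overscan` times the frame width."""
    if min(img_w, img_h, frame_w, frame_h) <= 0:
        raise ValueError("dimensions must be positive")
    if overscan < 1.0:
        raise ValueError("overscan below 1.0 would reveal image edges")
    # Take the larger of the two scales so both width and height are covered.
    scale = max(frame_w * overscan / img_w, frame_h / img_h)
    return scale, img_w * scale, img_h * scale


def pan_x_range(scaled_w, frame_w):
    """Return (start_x, end_x) for the image's left edge, relative to the
    frame's left edge, for a right-to-left move that never shows an edge."""
    travel = scaled_w - frame_w
    if travel < 0:
        raise ValueError("image is narrower than the frame")
    return 0.0, -travel
```

For a 4000x2250 picture in a 1920x1080 frame with the default overscan, the scaled image is about 2112 pixels wide, which leaves roughly 192 pixels of horizontal travel.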
White-background product images typically stay on screen for 2 to 4 seconds, while the intro and ending images have a duration of 4 seconds. The in-between images (location pictures) need to be zoomed in to approximately 110% of the total frame width, with logarithmic movement over several seconds. Fading effects accompany image transitions, and the movement speed gradually increases during the initial 20% and decreases during the final 20% of the movement duration.
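One way to read the 20% speed-up and 20% slow-down is as a trapezoidal speed profile: linear acceleration, a constant-speed middle section, then linear deceleration. The sketch below implements that reading; it is an assumption, and the curve a given video editor applies may differ. The function names, the `ramp` parameter, and the 25 fps default are hypothetical.

```python
def eased_progress(t, ramp=0.2):
    """Map normalized time t in [0, 1] to normalized distance in [0, 1].

    Speed rises linearly during the first `ramp` fraction of the move,
    stays constant in the middle, and falls linearly during the last
    `ramp` fraction.
    """
    if not 0.0 < ramp <= 0.5:
        raise ValueError("ramp must be in (0, 0.5]")
    t = min(max(t, 0.0), 1.0)
    peak = 1.0 / (1.0 - ramp)  # constant speed that makes total distance 1
    if t < ramp:
        return peak * t * t / (2.0 * ramp)
    if t > 1.0 - ramp:
        return 1.0 - peak * (1.0 - t) ** 2 / (2.0 * ramp)
    return peak * (t - ramp / 2.0)


def pan_keyframes(start_x, end_x, duration_s, fps=25, ramp=0.2):
    """Return one x position per frame for an eased horizontal move."""
    frames = max(int(round(duration_s * fps)), 1)
    return [
        start_x + (end_x - start_x) * eased_progress(i / frames, ramp)
        for i in range(frames + 1)
    ]
```

Calling `pan_keyframes(0.0, -192.0, 4.0)` gives a four-second right-to-left move whose per-frame steps are small at the start and end and largest in the middle.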
To achieve the desired visual composition, the minimum required video layers, from top to bottom, include:
For title elements, multiple styles can be utilized. One approach is to fade in the title after its underlying darker layer appears. Alternatively, the text can slowly appear alongside the underlying darker layer from left to right. The darker layer is essential to ensure text visibility, especially when using white or black text on varying image backgrounds.
During the AI video generation process, I employed various tools. Initially, I used Speechify for text-to-speech conversion. However, it had limitations when dealing with acronyms and specific technical language, as can be heard in words such as, but not limited to:
“High-res”, pronounced “High reis” by the AI
“i9 processor”, pronounced “eynin processor” by the AI
“Matte”, pronounced “mattie” by the AI
Happily, I found that Google’s Cloud Text-to-Speech (TTS) tool provided more convincing results. While Google Cloud TTS may still have minor pronunciation issues, it outperforms competitors and addresses the challenges faced with acronyms and technical terms.
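For the pronunciation issues that remain, one option is to send SSML instead of plain text. Google Cloud Text-to-Speech accepts SSML input, and the SSML `<sub alias="...">` element tells the voice to speak a replacement for the written term. The sketch below only builds the SSML string; it does not call the service. The `ALIASES` table and its phonetic spellings are illustrative guesses, not tested values, so the result would still need proof-listening.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical spoken replacements for terms that were mispronounced.
ALIASES = {
    "High-res": "high rez",
    "i9": "eye nine",
    "Matte": "mat",
}


def to_ssml(text, aliases=None):
    """Wrap known terms in <sub alias="..."> and return an SSML document."""
    aliases = ALIASES if aliases is None else aliases
    if not aliases:
        return f"<speak>{escape(text)}</speak>"
    lookup = {term.lower(): spoken for term, spoken in aliases.items()}
    # Longest terms first, matched as whole words, ignoring case.
    terms = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(t) for t in terms) + r")(?!\w)",
        re.IGNORECASE,
    )
    parts, pos = [], 0
    for match in pattern.finditer(text):
        parts.append(escape(text[pos:match.start()]))  # plain text, XML-escaped
        spoken = lookup[match.group(0).lower()]
        parts.append(f"<sub alias={quoteattr(spoken)}>{escape(match.group(0))}</sub>")
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "<speak>" + "".join(parts) + "</speak>"
```

Keeping the alias table separate from the sales text means the original copy stays untouched, and the table can grow as proof-listening uncovers more problem words.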
Once the necessary tools and workflows are established, the process can be scaled effectively. By inputting the link to a product, each aspect of the AI video generation pipeline can be automatically processed and prepared for integration into the video editor.
AI video generation from existing text and picture-based product pitches offers a practical solution for creating numerous videos across various products. While the current state of pre-trained AI models may not support the reliable generation of 3D imagery, this approach allows for efficient automation of the video creation process. As technology continues to advance, we can anticipate further improvements and opportunities in AI-driven video generation for marketing and promotional purposes.
Cinematographer and Photographer. Bachelor of Arts in Cinematography and Film/tv production. Founder of Kip Media, focused on business video productions.