
Creating Engaging Videos with AI-Driven Text and Image Conversion

In today’s digital age, video content has become a powerful tool for marketing and product promotion. Creating compelling videos can be time-consuming and resource-intensive. However, with advancements in artificial intelligence (AI), it is now possible to automate the process of video generation from existing text and picture-based product pitches. In this blog post, I will explore the applications, challenges, and potential solutions for utilizing AI in video creation.

AI Aspects

The AI-driven video generation process involves two main aspects: asset scraping and voice-over. As an example, I focused on creating a product video for one of Icecat’s clients, using Icecat product data sheets along with database pictures and sales text. Asset scraping involves extracting high-resolution product images from existing databases and product data sheets. Voice-over involves converting product descriptions and sales texts into speech using AI-generated voices.

Potential Automation (Editing)

While asset scraping and voice-over can be automated effectively, the editing process remains a challenge. Currently, there are no online editors capable of precise and autonomous video creation according to the specific requirements of e-retail. To address this, a custom script or plug-in for an existing non-linear video editing program is needed. It should accept input file formats such as .wav, .mp3, .png, and .jpeg, and output video encoded as H.264 and/or H.265.
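As a rough illustration of that input/output contract, the sketch below checks the input formats and builds an encoding command for one chapter (a still image plus its voice-over). It is not the editor plug-in described above: it assumes the ffmpeg command-line tool as a stand-in, and the function name `chapter_command` and the accepted `.jpg` spelling are my own additions.

```python
from pathlib import Path

# Input formats named in the text; ".jpg" is added as an assumption.
AUDIO_EXTS = {".wav", ".mp3"}
IMAGE_EXTS = {".png", ".jpeg", ".jpg"}
# ffmpeg encoder names for the two output codecs.
ENCODERS = {"h264": "libx264", "h265": "libx265"}


def chapter_command(image, audio, output, codec="h264"):
    """Build (but do not run) an ffmpeg command for one still-image chapter."""
    image, audio = Path(image), Path(audio)
    if image.suffix.lower() not in IMAGE_EXTS:
        raise ValueError(f"unsupported image format: {image.suffix}")
    if audio.suffix.lower() not in AUDIO_EXTS:
        raise ValueError(f"unsupported audio format: {audio.suffix}")
    if codec not in ENCODERS:
        raise ValueError(f"unsupported codec: {codec}")
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", str(image),  # repeat the still image as video
        "-i", str(audio),                # voice-over track
        "-c:v", ENCODERS[codec],
        "-pix_fmt", "yuv420p",           # widely compatible pixel format
        "-c:a", "aac",
        "-shortest",                     # stop when the audio ends
        str(output),
    ]
```

The returned list can be passed to `subprocess.run` once ffmpeg is installed; a real editor integration would replace this step with the editor's own scripting interface.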

The Process

To demonstrate the AI video generation process, I utilized existing Icecat product data sheets, database pictures, and sales text. I scraped high-resolution images and used them directly in the video editor. It’s essential to ensure the accuracy and appropriateness of the images, as generative AI techniques may inadvertently distort or omit important details. Therefore, I did not use AI for image manipulation.

I converted the text content, excluding headlines, into speech using a text-to-speech program. However, the resulting audio may require human proof-listening, particularly when technical jargon or acronyms are present. I separated the text under each headline into its own file to create distinct video chapters. The headlines themselves serve as chapter indicators, appearing at the bottom left or top left of the screen.
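The per-headline split can be sketched as follows. This is a minimal example that assumes the sales text is already available as (headline, body) pairs; the `Chapter` class, the `split_chapters` function, and the file-naming scheme are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Chapter:
    headline: str    # shown on screen as the chapter indicator
    body: str        # text sent to text-to-speech
    text_file: str   # one file per chapter


def split_chapters(sections):
    """Turn (headline, body) pairs into numbered chapters."""
    chapters = []
    for index, (headline, body) in enumerate(sections, start=1):
        # Reduce the headline to a file-system friendly slug.
        slug = re.sub(r"[^a-z0-9]+", "-", headline.lower()).strip("-")
        chapters.append(
            Chapter(headline.strip(), body.strip(), f"{index:02d}-{slug}.txt")
        )
    return chapters
```

Numbering the files keeps the chapters in their original order when they are imported into the editor.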

Logo Placement and Image Treatment

To establish branding, I placed the company logo on the bottom left of the screen. Custom logos should be created or sourced for each company seeking to utilize AI video generation. Logos are preferably black on a transparent (alpha) background in .png format. I made a distinction between product pictures (with a white background) and location pictures. Treating them differently avoids a monotonous feel within the video.

Editing Techniques

The editing process involves specific techniques to enhance visual appeal. Product pictures with white backgrounds can smoothly transition into one another through fading effects. On the other hand, location pictures are best animated from right to left to create movement. Proper scaling and positioning of images within the frame are crucial to ensure seamless horizontal movement without revealing image edges.

Image Duration and Movement

Product images with white backgrounds typically last 2-4 seconds, while the intro and ending images have a duration of 4 seconds. In-between images (location pictures) need to be scaled to approximately 110% of the total frame width, with logarithmic movement over several seconds. Fading effects accompany image transitions, and the movement speed gradually increases during the initial 20% and decreases during the final 20% of the movement duration.
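The movement profile can be expressed as a small easing function. The sketch below is one possible reading: it assumes the speed ramps up linearly over the first 20%, stays constant, then ramps down linearly over the last 20%, which is simpler than a true logarithmic curve. The names `eased_progress` and `pan_offset_px` are hypothetical.

```python
def eased_progress(t, ramp=0.2):
    """Map time t in [0, 1] to travelled fraction in [0, 1].

    Speed rises linearly during the first `ramp` share of the move,
    is constant in the middle, and falls linearly during the last share.
    """
    if not 0.0 < ramp <= 0.5:
        raise ValueError("ramp must be in (0, 0.5]")
    t = min(max(t, 0.0), 1.0)
    peak = 1.0 / (1.0 - ramp)  # cruise speed so the total distance is 1
    if t < ramp:
        return peak * t * t / (2.0 * ramp)
    if t > 1.0 - ramp:
        return 1.0 - peak * (1.0 - t) ** 2 / (2.0 * ramp)
    return peak * (t - ramp / 2.0)


def pan_offset_px(t, frame_width, zoom=1.10, ramp=0.2):
    """Horizontal offset of the image's left edge for a right-to-left pan.

    At 110% scale the image is wider than the frame by 10% of the frame
    width, so it can travel that far without revealing an edge.
    """
    spare = frame_width * (zoom - 1.0)
    return -spare * eased_progress(t, ramp)
```

For a 1920-pixel-wide frame this leaves roughly 192 pixels of travel, so the offset runs from 0 at the start to about -192 at the end of the move.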

Required Video Layers

To achieve the desired visual composition, the minimum required video layers, from top to bottom, include:

  1. Company logo (with 30% opacity)
  2. Headline titles
  3. Background for headline title
  4. Images
  5. White background
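The layer stack above can be captured as a small configuration, which is handy when an automated tool has to build the timeline. This is only a sketch; the `Layer` class and the layer names are hypothetical, and only the order and the logo's 30% opacity come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    opacity: float = 1.0  # 1.0 means fully opaque


# Same order as the list above: topmost layer first.
LAYERS_TOP_TO_BOTTOM = [
    Layer("company_logo", opacity=0.30),
    Layer("headline_title"),
    Layer("headline_background"),
    Layer("images"),
    Layer("white_background"),
]


def render_order(layers):
    """Return layers bottom-first, the order in which they are composited."""
    return list(reversed(layers))
```

Compositing starts with the white background and ends with the logo, so each layer is drawn over everything beneath it.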

Titles and Styling

For title elements, multiple styles can be utilized. One approach is to fade in the title after its underlying darker layer appears. Alternatively, the text can slowly appear alongside the underlying darker layer from left to right. The darker layer is essential to ensure text visibility, especially when using white or black text on varying image backgrounds.

Tools Used

During the AI video generation process, I employed various tools. Initially, I used Speechify for text-to-speech conversion. However, it had limitations when dealing with acronyms and specific technical language, as can be heard in words such as:
“High-res”, pronounced “High reis” by the AI
“i9 processor”, pronounced “eynin processor” by the AI
“Matte”, pronounced “mattie” by the AI

Happily, I found that Google’s Cloud Text-to-Speech (TTS) tool provided more convincing results. While Google Cloud TTS may still have minor pronunciation issues, in my experience it outperforms competitors and copes better with acronyms and technical terms.
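One way to handle the remaining pronunciation issues is to send SSML instead of plain text, since Google Cloud TTS accepts SSML input and SSML's `<sub alias="...">` element replaces how a term is spoken. The sketch below only builds the SSML string and makes no API call; the respellings in `ALIASES` and the function name `to_ssml` are assumptions that would need tuning by ear.

```python
import re
from html import escape

# Hypothetical respellings for terms that were mispronounced.
ALIASES = {
    "High-res": "high rez",
    "i9": "i nine",
    "Matte": "mat",
}


def to_ssml(text, aliases=ALIASES):
    """Wrap known problem terms in <sub alias="..."> and return SSML."""
    lookup = {term.lower(): spoken for term, spoken in aliases.items()}
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in aliases) + r")\b",
        re.IGNORECASE,
    )

    def wrap(match):
        spoken = lookup[match.group(0).lower()]
        return f'<sub alias="{escape(spoken)}">{match.group(0)}</sub>'

    # Escape &, < and > first so the text is valid inside SSML.
    body = pattern.sub(wrap, escape(text, quote=False))
    return f"<speak>{body}</speak>"
```

Keeping the alias table in one place means each correction found during proof-listening only has to be recorded once.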

Scalability

Once the necessary tools and workflows are established, the process can be scaled effectively. By inputting the link to a product, each aspect of the AI video generation pipeline can be automatically processed and prepared for integration into the video editor.

Conclusion

AI video generation from existing text and picture-based product pitches offers a practical solution for creating numerous videos across various products. While the current state of pre-trained AI models may not support the reliable generation of 3D imagery, this approach allows for efficient automation of the video creation process. As technology continues to advance, we can anticipate further improvements and opportunities in AI-driven video generation for marketing and promotional purposes.

Cinematographer and Photographer. Bachelor of Arts in Cinematography and Film/tv production. Founder of Kip Media, focused on business video productions.

Elmo Hoogeveen

