Pioneering the Future of AI-Driven Image Filters & Video LUTs

This article explores my creative workflow with Generative AI image and video tools and my thoughts on the evolution of post-processing filters and LUTs.

Training models with Midjourney Image2Text + /describe + Runway Custom Elements

Oftentimes my creative workflow starts with using a found or photographed inspiration image as the reference input to Midjourney’s /describe command. This command enables users to generate four concise text prompts that accurately represent a given image. This feature essentially reverses the typical image generation process, allowing you to obtain prompts from existing images rather than creating images from text prompts. This functionality is particularly useful for replicating or emulating specific visual styles in your own generated images, while also providing insight into what the models see and interpret.

Seeking inspiration from one of my favourite bands, Thee Oh Sees, I used the album cover for “The Master’s Bedroom is Worth Spending a Night In” to generate the base prompt:

A psychedelic album cover for the band "The master's bedroom is valulth to be", featuring an illustration of a monster with yellow hair and black eyes, the text reads "oh see! The master's room" in bold letters, with colorful patterns around it, designed by Mati Klarwein, style of Giger, style of Egon Schiele, style of Mof Righ. orange tones, purple hues, high contrast, vibrant colors, dark background, simple design, simple lines, simple shapes, simple details, low detail, low resolution, low color saturation, 2d.

Noting Midjourney’s inability to accurately extract or reproduce stylized typography, I left the prompt text as-is for the image generations. While it’s unclear in the Midjourney documentation, it’s likely that under the hood new generations use the text prompt in combination with other “hidden” parameters, such as the original image URL for style reference, --seed, and --chaos.
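For readers who want to make that combination explicit rather than relying on hidden defaults, here is a minimal sketch of recombining the /describe output with the original image using Midjourney’s documented --sref, --seed, and --chaos parameters. The image URL and parameter values are placeholders, and this only composes the prompt string; it does not reproduce whatever Midjourney actually does internally.

```python
# Hypothetical sketch: explicitly recombine a /describe text prompt with the
# source image as a style reference. URL and values below are placeholders.
described_prompt = (
    "A psychedelic album cover for the band, featuring an illustration of a "
    "monster with yellow hair and black eyes ... 2d."  # truncated /describe output
)

source_image_url = "https://example.com/masters-bedroom-cover.jpg"  # placeholder

# --sref anchors style to the reference image, --seed makes generations
# repeatable, --chaos (0-100) controls how varied the four results are.
full_prompt = f"{described_prompt} --sref {source_image_url} --seed 1234 --chaos 20"
print(full_prompt)
```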

Reverse engineering creativity - what does the model see?

Midjourney's /describe feature takes creativity to a new level by enabling users to reverse-engineer visuals through descriptive prompts. With this tool, users can upload an image, and Midjourney generates a detailed text description that captures its key elements, such as style, color, composition, and mood. This feature is especially useful for artists and creators seeking inspiration or wanting to replicate or evolve certain aesthetics. By translating images into words, Midjourney bridges the gap between visual and verbal creativity, offering new ways to explore and expand artistic possibilities.

Midjourney-Generated Training Set

After export from Midjourney, these images formed the initial training data set for the Custom Styles feature of Runway Custom Elements. Runway recommends 12-15 images for training.
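Before uploading, it can help to sanity-check the export. The following is a minimal sketch, assuming hypothetical folder names and using Pillow locally rather than any Runway API, that verifies the image count and normalizes the files into a consistent format:

```python
# Sanity-check a Midjourney export before uploading it as a training set.
# Folder names are placeholders; nothing here talks to Runway itself.
from pathlib import Path
from PIL import Image

EXPORT_DIR = Path("midjourney_exports")      # placeholder folder name
TRAINING_DIR = Path("runway_training_set")   # placeholder folder name
MIN_IMAGES = 12                              # Runway's recommended minimum

TRAINING_DIR.mkdir(exist_ok=True)
images = sorted(EXPORT_DIR.glob("*.png")) + sorted(EXPORT_DIR.glob("*.jpg"))

if len(images) < MIN_IMAGES:
    raise SystemExit(f"Only {len(images)} images found; Runway recommends 12-15.")

for i, path in enumerate(images):
    # Normalize everything to RGB JPEGs so the set uploads consistently.
    with Image.open(path) as img:
        img.convert("RGB").save(TRAINING_DIR / f"style_{i:02d}.jpg", quality=95)

print(f"Prepared {len(images)} images in {TRAINING_DIR}/")
```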

Runway Custom Styles

Runway Custom Styles empower creators to design and apply personalized visual aesthetics to their projects using AI-driven tools. By training models on unique artistic styles, users can easily integrate their custom looks into generated media, making creative expression more dynamic and tailored than ever before. I use Runway’s Text to Image and Image to Image tools to generate creative concepts using input reference images and additional text prompting in combination with my custom style models.

Conclusion / Prediction

We are at the dawn of a new era in which AI and machine learning (ML) are revolutionizing consumer and professional image and video editing tools. Leading innovators like Twelve Labs, Runway, Midjourney, Black Forest Labs, and Luma have developed powerful Generative AI features with diverse user experiences, but most remain confined to browser-based applications. These tools have yet to fully integrate into professional workflows within industry-standard software like Adobe Lightroom, CaptureOne Pro, Premiere, and Avid. While filters and LUTs are staples in both consumer and professional editing, tools like Runway currently limit these features to Text-to-Image and Image-to-Image prompts. Imagine the possibilities when we can train models on custom creative aesthetics and dynamically apply them to video and film using Gen-3 or OpenAI’s Sora. I believe custom style models will soon become a new standard for image filtering and video LUTs, paving the way for marketplaces where creators can license and share their unique visual styles and storytelling techniques.
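To ground that prediction, it’s worth recalling what a LUT actually is: a 3D table that maps each input color to a graded output color, which is exactly the kind of fixed transform a trained style model could be distilled into. The sketch below uses a toy warm-tint table rather than a trained model, purely to illustrate the per-pixel lookup step that filters and LUTs perform.

```python
# Illustrative sketch of what a video LUT does under the hood: a 3D lookup
# table maps each input RGB value to a stylized output RGB value.
# This builds a toy identity-with-warm-tint LUT, not a trained style model.
import numpy as np

LUT_SIZE = 33  # a common .cube resolution

# Toy 33x33x33 LUT: identity mapping with a slight warm shift.
grid = np.linspace(0.0, 1.0, LUT_SIZE)
r, g, b = np.meshgrid(grid, grid, grid, indexing="ij")
lut = np.stack([np.clip(r * 1.05, 0, 1), g, b * 0.95], axis=-1)  # (33, 33, 33, 3)

def apply_lut(frame: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Apply a 3D LUT to a float RGB frame in [0, 1] via nearest-neighbor lookup."""
    n = lut.shape[0]
    idx = np.clip(np.rint(frame * (n - 1)).astype(int), 0, n - 1)
    return lut[idx[..., 0], idx[..., 1], idx[..., 2]]

# Example: grade a random 4x4 "frame".
frame = np.random.rand(4, 4, 3)
graded = apply_lut(frame, lut)
print(graded.shape)  # (4, 4, 3)
```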

Thanks to Thee Oh Sees for the inspiration. Check them out on Spotify.
