Human-AI collaboration in the age of interactive multimedia

August 4, 2025 10:00 AM (GMT+7) → 12:20 PM

<aside> ℹ️

Workshop notes: Teacher Triết

0> Multimedia system

New Generation of Multimedia

1> Image Generation and Editing

Text-to-image:

A text prompt is used to generate an image

In an n-dimensional space, each data point is a vector; an encoder maps it into another (latent) space and a decoder maps it back (encode-decode)
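A minimal sketch of the encode-decode idea, assuming a toy linear encoder: a fixed, hypothetical orthonormal basis (`BASIS`) projects 3-D points into a 2-D latent space, and the decoder maps them back with the transpose. Real latent diffusion models use a learned, nonlinear autoencoder instead; only the round trip between the two spaces is illustrated here.

```python
import math

# Hypothetical orthonormal basis for a 2-D latent subspace of a 3-D data space.
# Each row is one latent axis (unit length, mutually orthogonal).
BASIS = [
    [1.0, 0.0, 0.0],
    [0.0, 1 / math.sqrt(2), 1 / math.sqrt(2)],
]


def encode(x):
    """Project a 3-D point onto the latent axes (3-D -> 2-D)."""
    return [sum(b_i * x_i for b_i, x_i in zip(row, x)) for row in BASIS]


def decode(z):
    """Map latent coordinates back to 3-D space (transpose of the encoder)."""
    dim = len(BASIS[0])
    return [sum(z_k * BASIS[k][d] for k, z_k in enumerate(z)) for d in range(dim)]


x = [2.0, 3.0, 3.0]  # lies in the latent subspace, so it is reconstructed exactly
y = [2.0, 3.0, 1.0]  # has a component outside the subspace, which is lost
print(encode(x), decode(encode(x)))
print(decode(encode(y)))
```

The point `x` comes back unchanged, while `y` loses the component the latent space cannot represent; that loss is the reconstruction error a learned autoencoder tries to keep small.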

Latent Diffusion Model
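A Latent Diffusion Model runs the diffusion process in the autoencoder's latent space rather than in pixel space. The sketch below shows only the standard forward (noising) step, z_t = sqrt(ᾱ_t)·z_0 + sqrt(1 - ᾱ_t)·ε; the linear β schedule from 1e-4 to 0.02 over 1000 steps is an assumption borrowed from DDPM, and `z0` is a hypothetical latent. The learned reverse (denoising) network is not shown.

```python
import math
import random

T = 1000
# Assumed linear beta schedule (the DDPM default), one beta per timestep.
BETAS = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]

# alpha_bar_t = prod_{s<=t} (1 - beta_s): how much of the clean latent survives at step t.
ALPHA_BARS = []
_prod = 1.0
for beta in BETAS:
    _prod *= 1.0 - beta
    ALPHA_BARS.append(_prod)


def noisy_latent(z0, t, eps):
    """Closed-form forward process q(z_t | z_0) applied to a latent vector z0."""
    a = ALPHA_BARS[t]
    return [math.sqrt(a) * z + math.sqrt(1.0 - a) * e for z, e in zip(z0, eps)]


rng = random.Random(0)
z0 = [0.5, -1.0, 2.0]                    # hypothetical latent from the encoder
eps = [rng.gauss(0.0, 1.0) for _ in z0]  # standard Gaussian noise
for t in (0, 499, 999):
    print(t, round(ALPHA_BARS[t], 4), noisy_latent(z0, t, eps))
```

As t grows, ᾱ_t shrinks toward zero and z_t approaches pure noise; generation runs this process in reverse and then decodes the final latent into an image.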

Zero-shot image editing

Photoshop: cut, paste, color adjustment, manual editing
Compared to AI: the model generates a new image
and uses cross-attention to keep selected features of the original
Ex: (qualitative results) thematic, intelligent concept design system, creative image editing… human + AI collaborating to edit images (pose transition, object replacement: fixed, dynamic, …)
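To illustrate how cross-attention can keep features, here is a minimal pure-Python sketch of scaled dot-product cross-attention where image features act as queries and prompt tokens as keys and values. Reusing the source prompt's attention maps while swapping in an edited token's values follows the Prompt-to-Prompt style of editing; that this is the kind of method meant here is an assumption, and all vectors are toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values, weights=None):
    """Scaled dot-product cross-attention.

    queries: image-feature vectors (one per spatial location)
    keys, values: prompt-token vectors (one per token)
    weights: optional attention maps to reuse; if given, keys are ignored
    Returns (outputs, weights).
    """
    d = len(queries[0])
    if weights is None:
        weights = [
            softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys])
            for q in queries
        ]
    outputs = [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(len(values[0]))]
        for row in weights
    ]
    return outputs, weights


# Toy example: 2 image locations, 2 prompt tokens ("background", "cat").
pixels = [[1.0, 0.0], [0.0, 1.0]]
token_keys = [[1.0, 0.0], [0.0, 1.0]]
src_values = [[0.1, 0.1], [1.0, 0.0]]   # values for "background", "cat"
edit_values = [[0.1, 0.1], [0.0, 1.0]]  # "cat" replaced by "dog"

_, src_maps = cross_attention(pixels, token_keys, src_values)
# Edited pass: reuse the source attention maps so the spatial layout is kept.
edited, reused = cross_attention(pixels, token_keys, edit_values, weights=src_maps)
print(edited)
```

Because the edited pass reuses `src_maps`, each location attends to the same token positions as before (layout preserved), while the content carried by the replaced token changes.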

2> More

Personalized text-to-image: keep the image's features, change…

Trade-off between reconstruction and editability:

Concept flows: learn new concepts without affecting, or adding noise to, previously learned ones

KronA-WED Adapter:
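KronA-style adapters represent a weight update as the Kronecker product of two small trainable matrices, ΔW = A ⊗ B, added to a frozen pretrained weight W. The notes do not describe the WED variant, so this sketch shows only the generic Kronecker update; `kron`, `add_adapter`, `adapter_params`, and the toy matrices are hypothetical names and values.

```python
def kron(a, b):
    """Kronecker product A ⊗ B for matrices stored as lists of rows."""
    return [
        [a_val * b_val for a_val in a_row for b_val in b_row]
        for a_row in a
        for b_row in b
    ]


def add_adapter(w, a, b, scale=1.0):
    """Return W + scale * (A ⊗ B); W stays frozen, only A and B would be trained."""
    delta = kron(a, b)
    if len(delta) != len(w) or len(delta[0]) != len(w[0]):
        raise ValueError("A ⊗ B must have the same shape as W")
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def adapter_params(a_shape, b_shape):
    """Number of trainable parameters in the two Kronecker factors."""
    return a_shape[0] * a_shape[1] + b_shape[0] * b_shape[1]


W = [[0.0] * 4 for _ in range(4)]  # frozen 4x4 weight (toy zeros)
A = [[1.0, 2.0], [3.0, 4.0]]       # trainable 2x2 factor
B = [[0.0, 1.0], [1.0, 0.0]]       # trainable 2x2 factor
for row in add_adapter(W, A, B):
    print(row)
print(adapter_params((32, 32), (24, 24)), 768 * 768)
```

For a 768 × 768 layer, factors of shape 32 × 32 and 24 × 24 need 1,600 trainable parameters instead of 589,824 for a full update, which is why such adapters suit per-concept fine-tuning.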

Learning strategy: from the original image, generate images with different scenes…

Potential applications: advertising…

3> LLM-Empowered Generation

Automated Image Recognition (AIR) framework:

Prompt crafting in AIR-Gen + textual prompt extraction in AIR-Aug ⇒ image generation ⇒ duplicate and outlier removal ⇒ output
Ex: Forest, Early forest fire…
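The duplicate and outlier removal step of the pipeline can be sketched on image embeddings. This assumes each generated image already has an embedding vector (for example from CLIP) and compares them by cosine similarity; `filter_generated` and both thresholds are hypothetical choices, not the framework's actual settings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def filter_generated(embeddings, dup_threshold=0.98, outlier_threshold=0.6):
    """Return indices of generated images to keep.

    1. Duplicate removal: drop an image whose similarity to an already kept
       image is >= dup_threshold.
    2. Outlier removal: drop an image whose similarity to the centroid of the
       kept embeddings is < outlier_threshold.
    """
    kept = []
    for i, e in enumerate(embeddings):
        if all(cosine(e, embeddings[j]) < dup_threshold for j in kept):
            kept.append(i)
    dim = len(embeddings[0])
    centroid = [sum(embeddings[i][d] for i in kept) / len(kept) for d in range(dim)]
    return [i for i in kept if cosine(embeddings[i], centroid) >= outlier_threshold]


# Hypothetical embeddings for four generated "forest" images.
emb = [
    [1.0, 0.1, 0.0],  # forest image
    [1.0, 0.1, 0.0],  # exact duplicate of image 0
    [0.9, 0.3, 0.1],  # another forest image
    [0.0, 0.0, 1.0],  # off-topic outlier
]
print(filter_generated(emb))
```

Image 1 is dropped as a duplicate of image 0, and image 3 is dropped because it points away from the centroid of the remaining set.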
Concept interpretation
CLIP is commonly used to encode the data (images and text)
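A minimal sketch of how CLIP-style embeddings are typically used for zero-shot labeling: the image embedding and each text-prompt embedding are L2-normalized, compared by cosine similarity, scaled, and passed through a softmax. The vectors are hypothetical stand-ins for encoder outputs and `logit_scale=100.0` is an assumed value; a real setup would obtain the embeddings from a CLIP model.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between one image and several prompts."""
    img = normalize(image_emb)
    logits = [logit_scale * sum(a * b for a, b in zip(img, normalize(t))) for t in text_embs]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["a photo of a forest", "a photo of an early forest fire"]
text_embs = [[0.9, 0.1, 0.0], [0.6, 0.1, 0.8]]  # hypothetical text-encoder outputs
image_emb = [0.5, 0.0, 0.9]                     # hypothetical image-encoder output
probs = zero_shot_probs(image_emb, text_embs)
best = labels[probs.index(max(probs))]
print(best, probs)
```

Because images and prompts share one embedding space, the same similarity scores can also support the concept interpretation and filtering steps in the pipeline.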

Questions: safety, fraud, validity…
When interacting with the system and allowing it to generate images,… we must be careful
Limitation: generated images can look abnormal, unlike real images… To improve qualitative results: powerful hardware, math,… and prompts crafted to generate the best image
A generic model works if the data are large enough; if there is not enough data, use specific data that serve your purpose
Results are difficult to evaluate. Solutions: assign the same work to multiple agents and compare their results, and randomly check whether all agents give the same results…
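The evaluation idea of giving the same task to several agents, comparing, and spot-checking at random can be sketched as a majority vote with an agreement ratio. `aggregate`, `review_queue`, and the thresholds are hypothetical; tasks where agents disagree are always flagged, and a random fraction of the agreeing ones is added for a human check.

```python
import random
from collections import Counter


def aggregate(agent_answers):
    """Majority answer for one task and the fraction of agents that gave it."""
    counts = Counter(agent_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(agent_answers)


def review_queue(tasks, min_agreement=1.0, sample_rate=0.1, seed=0):
    """Flag tasks with disagreement, plus a random sample of agreeing tasks."""
    rng = random.Random(seed)
    flagged = []
    for task_id, answers in tasks.items():
        _, ratio = aggregate(answers)
        # Disagreement is always flagged; otherwise flag with probability sample_rate.
        if ratio < min_agreement or rng.random() < sample_rate:
            flagged.append(task_id)
    return flagged


tasks = {
    "img1": ["fire", "fire", "fire"],
    "img2": ["fire", "no fire", "fire"],
    "img3": ["no fire", "no fire", "no fire"],
}
print(aggregate(tasks["img2"]))
print(review_queue(tasks))
```

With `sample_rate=0.0` only the disagreement (`img2`) is flagged; raising it adds random spot checks of results the agents agreed on.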

</aside>