August 4, 2025 10:00 AM (GMT+7) → 12:20 PM
<aside>
ℹ️
WS note: Mr. Triết (lecturer)
0> Multimedia Systems
A new generation of multimedia
1> Image Generation and Editing
Text-to-image:
A text prompt is used to generate an image.
- A point (vector) in an n-dimensional space is mapped to a corresponding point in another space through encoding and decoding, e.g. from a text embedding to an image.
Latent Diffusion Models (LDM)
- Paper: “High-Resolution Image Synthesis with Latent Diffusion Models”, CVPR 2022
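The LDM paper runs the diffusion process in the compressed latent space of a pretrained autoencoder instead of in pixel space. Below is a minimal sketch of the forward (noising) step in that latent space, assuming the standard DDPM formulation z_t = sqrt(ᾱ_t)·z_0 + sqrt(1 − ᾱ_t)·ε with a linear β schedule. The 4×64×64 latent shape and the schedule values are illustrative assumptions; the encoder, decoder, text conditioning, and denoising U-Net are omitted.

```python
import numpy as np

def forward_diffuse(z0, t, betas, rng):
    """Sample z_t from q(z_t | z_0) = N(sqrt(abar_t) * z_0, (1 - abar_t) * I)."""
    alpha_bar = np.cumprod(1.0 - betas)[t]   # abar_t = prod over s <= t of (1 - beta_s)
    eps = rng.standard_normal(z0.shape)       # Gaussian noise the denoiser learns to predict
    zt = np.sqrt(alpha_bar) * z0 + np.sqrt(1.0 - alpha_bar) * eps
    return zt, eps

rng = np.random.default_rng(0)
z0 = rng.standard_normal((4, 64, 64))         # hypothetical latent from the autoencoder's encoder
betas = np.linspace(1e-4, 0.02, 1000)         # linear noise schedule, T = 1000 steps (assumption)

z_early, _ = forward_diffuse(z0, t=0, betas=betas, rng=rng)     # barely noised
z_late, eps = forward_diffuse(z0, t=999, betas=betas, rng=rng)  # almost pure noise
```

At t = 0 the latent is almost unchanged; near t = T it is almost pure noise, and the denoiser is trained to reverse this process step by step before the decoder maps the clean latent back to an image.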
Zero-shot image editing
- Photoshop: cut, paste, color adjustment, manual editing
- Compared with AI: the model generates a new image
- Cross-attention is used to preserve selected features of the original image
- Examples (qualitative results): thematic…, intelligent concept design systems, creative image editing… human + AI collaborative image editing (pose transition, object replacement: fixed, dynamic)
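Cross-attention is where the prompt enters the image: each spatial feature (query) attends over the embedded prompt tokens (keys and values). Below is a minimal NumPy sketch in which all sizes and weights are hypothetical placeholders. The last lines illustrate one way attention can be used to keep features during editing, in the spirit of Prompt-to-Prompt-style methods (an assumption, since the notes do not name a method): reuse the source prompt's attention map so the spatial layout stays fixed while the values come from the edited prompt.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(image_feats, text_embs, Wq, Wk, Wv):
    """Image tokens (queries) attend to prompt tokens (keys/values)."""
    Q = image_feats @ Wq                        # (n_pixels, d)
    K = text_embs @ Wk                          # (n_tokens, d)
    V = text_embs @ Wv                          # (n_tokens, d)
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]), axis=-1)  # (n_pixels, n_tokens)
    return attn @ V, attn

rng = np.random.default_rng(0)
img = rng.standard_normal((16, 8))        # 4x4 feature map flattened to 16 tokens (hypothetical)
txt_src = rng.standard_normal((5, 12))    # 5 embedded tokens of the source prompt
txt_edit = rng.standard_normal((5, 12))   # 5 embedded tokens of the edited prompt
Wq = rng.standard_normal((8, 4))
Wk = rng.standard_normal((12, 4))
Wv = rng.standard_normal((12, 4))

out_src, attn_src = cross_attention(img, txt_src, Wq, Wk, Wv)

# Editing idea: keep the source attention map (which pixel looks at which
# token) but read the values from the edited prompt, so the layout is preserved.
out_edit = attn_src @ (txt_edit @ Wv)
```

Each row of the attention map sums to 1 and says how strongly one image location depends on each prompt token, which is why fixing that map is a natural lever for keeping structure while changing content.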
2> More
Personalized text-to-image: keep the image's features while changing…
Trade-off between reconstruction and editability:
Concept flows: learn new concepts without affecting, or adding noise to, previously learned ones
KronA-WED Adapter:
Learning strategy: from an original image, generate images with different scenes…
Potential applications: advertising…
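The notes do not spell out the KronA-WED internals. As a hedged sketch of the underlying idea only, assuming it builds on KronA-style adapters: the frozen weight W is adapted by a Kronecker-product update ΔW = A ⊗ B, which trains far fewer parameters than W itself yet can still be full rank, since rank(A ⊗ B) = rank(A)·rank(B). The factor shapes, the scale, and the zero initialization of B are illustrative choices, not the lecture's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 64, 64
a1, a2 = 8, 8                              # hypothetical factor shapes; must divide d_out and d_in
b1, b2 = d_out // a1, d_in // a2

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((a1, a2)) * 0.01   # trainable factor
B = np.zeros((b1, b2))                     # trainable factor, zero-init so delta W = 0 at start

def adapted_forward(x, W, A, B, scale=1.0):
    """y = x (W + scale * (A kron B))^T; only A and B would be trained."""
    delta_W = np.kron(A, B)                # shape (a1*b1, a2*b2) == W.shape
    return x @ (W + scale * delta_W).T

full_params = W.size                       # 64 * 64 = 4096
adapter_params = A.size + B.size           # 64 + 64 = 128

x = rng.standard_normal((2, d_in))
y = adapted_forward(x, W, A, B)            # equals x @ W.T before any training

# Rank check: two random 8x8 factors give a full-rank 64x64 update.
update = np.kron(rng.standard_normal((a1, a2)), rng.standard_normal((b1, b2)))
update_rank = np.linalg.matrix_rank(update)
```

Because B starts at zero, the adapted layer initially reproduces the pretrained one exactly, so personalization starts from the original model's behavior; training then moves only A and B.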
3> LLM-Empowered Generation
Automated image recognition framework:
- Prompt crafting in AIR-Gen + textual prompt extraction in AIR-Aug ⇒ image generation ⇒ duplicate and outlier removal ⇒ output
- Example: forest, early forest fire…
- Concept interpretation
- CLIP is commonly used to encode the data (images and text) into a shared embedding space
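The duplicate- and outlier-removal stage of this pipeline can be sketched on image embeddings. Here random vectors stand in for CLIP image embeddings (no CLIP model is called), and the cosine-similarity threshold and z-score cutoff are hypothetical; AIR-Gen/AIR-Aug may use different criteria. Near-duplicates are dropped greedily by cosine similarity, then images whose similarity to the class centroid falls far below the mean are treated as outliers.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def remove_duplicates(embs, threshold=0.95):
    """Greedy near-duplicate removal: keep an item only if its cosine
    similarity to every previously kept item is below `threshold`."""
    embs = l2_normalize(embs)
    kept = []
    for i, e in enumerate(embs):
        if all(float(e @ embs[j]) < threshold for j in kept):
            kept.append(i)
    return kept

def remove_outliers(embs, idx, z=2.0):
    """Drop items whose cosine similarity to the centroid of the kept items
    is more than `z` standard deviations below the mean similarity."""
    sub = l2_normalize(embs[idx])
    centroid = sub.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    sims = sub @ centroid
    cutoff = sims.mean() - z * sims.std()
    return [i for i, s in zip(idx, sims) if s >= cutoff]

# Stand-in embeddings: 10 images of one class, 1 exact duplicate, 1 off-class image.
rng = np.random.default_rng(0)
base = rng.standard_normal(64)
good = base + 0.5 * rng.standard_normal((10, 64))
embs = np.vstack([good, good[0:1], -base[None, :]])   # index 10 = duplicate, 11 = outlier

after_dedup = remove_duplicates(embs)
final = remove_outliers(embs, after_dedup)
```

Normalizing first makes every comparison a cosine similarity, which is the usual way CLIP embeddings are compared.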
4>
- Open questions: safety, fraud, validity…
- Systems that interact with users and are allowed to generate images, … must be designed carefully
- Limitation: generated images can look abnormal, unlike real images… Ways to improve qualitative results: more powerful hardware, better math, … and prompting to generate the best image
- A model generalizes well only if the data is large enough; without enough data, use domain-specific data that serves your purpose
- Results are difficult to evaluate. Possible solutions: assign the same work to multiple agents and compare their results, and randomly check whether all agents produce the same result…
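The multi-agent evaluation idea can be sketched as follows: several agents (models or annotators) label the same generated images, a random sample of items is drawn, and any sampled item without unanimous agreement is flagged for human review. Item names, labels (reusing the forest-fire example), and the sample size are hypothetical.

```python
import random
from collections import Counter

def spot_check(agent_outputs, sample_size, seed=0):
    """agent_outputs maps item_id -> list of labels, one per agent.
    Randomly sample items and summarize how well the agents agree."""
    rng = random.Random(seed)
    items = sorted(agent_outputs)
    sampled = rng.sample(items, min(sample_size, len(items)))
    report = {}
    for item in sampled:
        labels = agent_outputs[item]
        label, count = Counter(labels).most_common(1)[0]
        report[item] = {
            "majority": label,                       # most frequent label
            "agreement": count / len(labels),        # fraction of agents agreeing with it
            "unanimous": count == len(labels),
        }
    return report

outputs = {
    "img_01": ["fire", "fire", "fire"],
    "img_02": ["fire", "no_fire", "fire"],
    "img_03": ["no_fire", "no_fire", "no_fire"],
}
report = spot_check(outputs, sample_size=3)
needs_review = sorted(k for k, v in report.items() if not v["unanimous"])
```

Agreement between agents is only a proxy: unanimous answers can still be wrong together, which is why the random spot check is paired with human review rather than replacing it.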
</aside>