August 5, 2025 8:00 AM (GMT+7) → 10:00 AM

@[email protected]

https://ctsv.uit.edu.vn/bai-viet/talkshow-4-gioi-thieu-dien-gia-ts-pham-hy-hieu-thanh-vien-phat-trien-grok-3

https://hyhieu.github.io/

From AI research:


📌 1. What Are Multi-Agent Systems (MAS)?

Multi-Agent Systems are systems composed of multiple autonomous entities (agents) that interact with each other and their environment to achieve individual or collective goals.

MAS require real-time processing, decision-making, coordination, and scalability, which is where hardware and system-level components like kernels and AI chips become essential.

<aside> ℹ️

Agents are systems.

The heart of an agent is an LLM.

The LLM generates output, the environment returns feedback as text, and the LLM continues from that text.

Language models appeared back in the 1990s.

The system can receive responses from its environment.

Every agent is built around an LLM.

Multi-agent systems: self-reflection (agents critique and revise each other's output).

Why use them? They can be more accurate and faster than single-agent systems.
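The self-reflection idea above can be sketched as a minimal two-agent loop. This is an illustrative sketch only: `call_llm` is a hypothetical placeholder, not a real API.

```python
# Minimal sketch of a self-reflection loop between two "agents".
# `call_llm` is a hypothetical placeholder; a real system would call an
# actual LLM API here.

def call_llm(prompt: str) -> str:
    # Placeholder: echo a canned response instead of querying a model.
    return f"response to: {prompt[:40]}"

def self_reflect(task: str, rounds: int = 2) -> str:
    draft = call_llm(f"Solve: {task}")
    for _ in range(rounds):
        # A second agent critiques the draft...
        critique = call_llm(f"Critique this answer to '{task}': {draft}")
        # ...and the first agent revises using that feedback.
        draft = call_llm(f"Revise using this critique: {critique}")
    return draft

print(self_reflect("summarize multi-agent systems"))
```

Note that the feedback lives entirely in the prompts: no model parameters are updated.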

What computations do multi-agent systems perform?

Agents differ from RL: feedback does not update the model's parameters.

LLM: next-word prediction based on statistics.

Auto-regressive generation: from a sentence, compute a probability distribution over the vocabulary, generate one word, add it to the context, and repeat.
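The loop above can be sketched in a few lines. The toy bigram table here stands in for the statistics a real LLM would compute:

```python
# Sketch of auto-regressive generation: compute a distribution over the
# next word, sample one word, append it to the context, and repeat.
import random

# Toy bigram "statistics" standing in for a real LLM's predictions.
TABLE = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
}

def next_word_distribution(context):
    return TABLE.get(context[-1], {"<eos>": 1.0})

def generate(context, max_words=10):
    context = list(context)
    for _ in range(max_words):
        dist = next_word_distribution(context)
        words = list(dist)
        word = random.choices(words, weights=list(dist.values()))[0]
        if word == "<eos>":
            break
        context.append(word)  # add the generated word to the context
    return context

print(generate(["the"]))  # e.g. ['the', 'cat', 'sat']
```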

How do we feed feedback into a next-word predictor?

Transformers: words ⇒ (word embedding) d-dim vectors (numbers) ⇒ (attention) d-dim vectors ⇒ (FFN) d-dim vectors ⇒ (projection) a probability distribution over words (repeat many times).
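The pipeline above, sketched with random stand-in weight matrices (single attention head, toy sizes, no trained model):

```python
# Toy data flow of one Transformer layer: embeddings -> attention -> FFN
# -> projection to a distribution over the vocabulary.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))  # subtract max for stability
    return e / e.sum(-1, keepdims=True)

rng = np.random.default_rng(0)
N, d, V = 4, 8, 10                  # tokens, model dim, vocab size

x = rng.normal(size=(N, d))         # word embeddings: N tokens, d-dim each

# Attention: every token scores every other token -> an N x N matrix.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V_ = x @ Wq, x @ Wk, x @ Wv
scores = Q @ K.T / np.sqrt(d)       # O(N^2) work in the sequence length
x = softmax(scores) @ V_            # back to d-dim vectors

# FFN: a few matmuls.
W1, W2 = rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d))
x = np.maximum(x @ W1, 0.0) @ W2    # still d-dim vectors

# Projection: d-dim vectors -> a distribution over the V words.
probs = softmax(x @ rng.normal(size=(d, V)))
print(probs.shape)  # (4, 10): one next-word distribution per position
```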

Attention is slow, O(N^2) in the sequence length, and GPUs are very bad at computing exp(x).

FFN: like attention, it is also slow; it typically consists of a few matmuls. GPUs also have RAM (global memory); a tensor must be placed in specific memory for the tensor cores to do the matmul, and the bandwidth for moving data is much slower than the tensor cores.

~5 petaFLOPS: floating-point operations per second (tensor-core compute throughput).

Matrix A (mA) and matrix B (mB) are copied to the tensor cores to compute the matmul; the result mC is then copied back.

Compute bound: the bottleneck is compute (F = FLOPs performed).

Memory bound: the bottleneck is memory traffic (M = bytes moved).

Calculate F/M (the arithmetic intensity).

Auto-regressive generation adds one word at a time ⇒ F/M << 625, which drags down tensor-core utilization by at least 625×; it is memory bound, and the FFN is very slow.
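The ratio can be checked with a quick calculation: F = 2·m·n·k FLOPs for an m×k by k×n matmul, and M counts the fp16 bytes moved for the two inputs and the output. A break-even of 625 FLOPs/byte would correspond to roughly 5 PFLOPS of compute over 8 TB/s of bandwidth; the bandwidth figure here is an assumption chosen to reproduce the quoted ratio, not a number from the talk.

```python
# Arithmetic intensity F/M of a matmul: FLOPs per byte of memory traffic.

def matmul_intensity(m, n, k, bytes_per_el=2):  # 2 bytes per fp16 element
    flops = 2 * m * n * k                       # multiply-adds
    bytes_moved = bytes_per_el * (m * k + k * n + m * n)  # A, B, C
    return flops / bytes_moved

# Large square matmul (processing a whole prompt): compute bound.
print(matmul_intensity(4096, 4096, 4096))  # ~1365 FLOPs/byte

# Decoding adds one token at a time, so m = 1 (a matrix-vector product):
print(matmul_intensity(1, 4096, 4096))     # ~1 FLOP/byte, far below 625

# Break-even ratio: compute rate / bandwidth, e.g. 5 PFLOPS / 8 TB/s.
print(5e15 / 8e12)                         # 625.0
```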

AI Agents (AA) are like RAG, but more general.

**What can we do to make those computations faster?**

(The Roles of Kernels and AI Chips in Multi-Agent Systems)

1> Kernels: low-level, highly optimized functions that run on accelerators. Techniques: remove redundancies, improve locality (arrange data so it copies faster), and specialize computation (tensor cores are fast precisely because they only compute matmuls).

Ex: Stream-K matmul (NVIDIA): specialized computation.

Ex: FlashAttention (Tri Dao): removes redundancies (his papers are worth reading).
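A sketch of the idea FlashAttention builds on: an online (streaming) softmax that processes scores chunk by chunk with a running max and running sum, so the full N×N score matrix is never materialized. This is a plain-Python illustration of the trick, not the actual kernel:

```python
# Online softmax: one pass over the scores in chunks, tracking a running
# max (for numerical stability) and a running sum of exponentials.
import math

def online_softmax_stats(scores, chunk=4):
    m = float("-inf")  # running max
    s = 0.0            # running sum of exp(x - m)
    for i in range(0, len(scores), chunk):
        block = scores[i:i + chunk]
        m_new = max(m, max(block))
        # Rescale the old sum to the new max, then add the new chunk.
        s = s * math.exp(m - m_new) + sum(math.exp(x - m_new) for x in block)
        m = m_new
    return m, s  # log-sum-exp of all scores is m + log(s)

scores = [0.5, 2.0, -1.0, 3.0, 0.0, 1.5]
m, s = online_softmax_stats(scores)
reference = math.log(sum(math.exp(x) for x in scores))
print(abs(m + math.log(s) - reference) < 1e-9)  # True
```

Because only one chunk of scores is live at a time, the N×N matrix never has to travel through slow global memory, which is the redundancy being removed.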

Kernels are hard to write.

2> Chip

NVIDIA designs the chip; a company in Taiwan receives the design and fabricates it.

Technology grows very fast, but designing a chip takes more than 3 years.

Google created the TPU chip; it works well for Google internally, but outside Google it is not as good as a GPU.

AMD…

NVIDIA has a monopoly in the chip market.

AI compilers: a very promising area, working at the intersection of kernels and chips for AI.

Using AI research: what value do you want to create? Cursor calls the Anthropic API; its research is about using an existing model to help clients work efficiently, and users build habits around Cursor. The key question is how to use available AI models efficiently.

Elon Musk: an enthusiastic person, hardworking, and a good listener.

QA:

New models and neural architectures

[Q&A - Google Docs](https://docs.google.com/document/d/11Gfk2TsCtkpLWcvm7wmVBJyfL0qKtBkCQCF9hCea-aY/preview?tab=t.0)


🔧 2. What Are AI Chips?

AI chips are specialized hardware designed to accelerate AI tasks like: