August 5, 2025 8:00 AM (GMT+7) → 10:00 AM
Multi-Agent Systems are systems composed of multiple autonomous entities (agents) that interact with each other and their environment to achieve individual or collective goals.
MAS require real-time processing, decision-making, coordination, and scalability — this is where hardware and system-level components like kernels and AI chips become essential.
<aside> ℹ️
Agents are systems.
The heart of an agent is an LLM.
The LLM generates output, the environment returns feedback, and the environment keeps feeding text back to the LLM.
(Statistical) language models appeared in the 1990s.
The system can receive responses (feedback).
Every agent has an LLM at its core.
Multi-agent systems: self-reflection.
Why use them? More accurate and faster than single-agent systems.
What computations do multi-agent systems perform?
Agents are different from fine-tuning: the model's parameters do not change.
LLM: next-word prediction based on statistics.
Auto-regressive generation: from the sentence, compute a statistic (probability) for each word in the vocabulary, generate one word, add it to the context, and repeat.
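The loop above can be sketched in a few lines. The "model" here is a toy bigram table standing in for a real LLM's next-word statistics; the words and probabilities are made up for illustration.

```python
# Toy next-word statistics: last word -> probability for each candidate word.
BIGRAMS = {
    "the": {"cat": 0.6, "mat": 0.4},
    "cat": {"sat": 0.9, "ran": 0.1},
    "sat": {"<eos>": 1.0},
    "ran": {"<eos>": 1.0},
}

def generate(context, max_new_tokens=10):
    tokens = context.split()
    for _ in range(max_new_tokens):
        # 1) compute a statistic (probability) for each candidate word
        dist = BIGRAMS.get(tokens[-1], {"<eos>": 1.0})
        # 2) generate one word (greedy: take the most likely)
        word = max(dist, key=dist.get)
        if word == "<eos>":
            break
        # 3) add it to the context and repeat
        tokens.append(word)
    return " ".join(tokens)

print(generate("the"))  # → "the cat sat"
```

A real LLM replaces the table lookup with a full Transformer forward pass, and usually samples from the distribution instead of taking the argmax.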
How do we feed feedback into a next-word predictor? Append it to the context as text.
Transformers: words ⇒ (word embedding) d-dim vectors (numbers) ⇒ (attention) d-dim vectors ⇒ (FFN) d-dim vectors ⇒ (projection) a statistic for each word in the vocabulary. (The attention + FFN layers repeat many times.)
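A minimal numeric sketch of that pipeline, with random matrices standing in for trained weights (the sizes V, d, N and the layer count are toy values, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, N = 50, 16, 8  # vocab size, model dim, sequence length (toy numbers)

# Random parameters stand in for trained weights.
E = rng.normal(size=(V, d))            # word embedding: word id -> d-dim vector
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
W1 = rng.normal(size=(d, 4 * d))       # FFN: a few matmuls
W2 = rng.normal(size=(4 * d, d))
Wout = rng.normal(size=(d, V))         # projection back to the vocabulary

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def block(X):
    """One (attention + FFN) layer; real models repeat this many times."""
    Q, K, Vv = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(d))   # N x N scores: O(N^2), uses exp(x)
    X = X + A @ Vv
    X = X + np.maximum(X @ W1, 0) @ W2  # FFN
    return X

ids = rng.integers(0, V, size=N)        # words -> ids
X = E[ids]                              # (word embedding) d-dim vectors
for _ in range(2):                      # repeat
    X = block(X)
probs = softmax(X @ Wout)               # (projection) statistic for each word
print(probs.shape)                      # one distribution per position: (8, 50)
```

Residual connections are included; layer norm and multi-head splitting are omitted to keep the sketch short.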
Attention: slow, O(N^2) in sequence length, and GPUs are comparatively bad at computing exp(x).
FFN: like attention, also slow; typically consists of a few matmuls. GPUs also have RAM (global memory); a tensor must sit in specific (on-chip) memory for the tensor cores to matmul it, and the bandwidth for moving data is slower than the tensor cores.
~5 petaFLOPS: floating-point operations per second.
Matrices A and B are copied to the tensor cores to compute the matmul; the result C is then copied back.
Compute bound: slowness comes from compute (F = FLOPs).
Memory bound: slowness comes from memory traffic (M = bytes moved).
Calculate the ratio F/M (arithmetic intensity).
Auto-regressive generation: each step adds only one word ⇒ F/M ≪ 625 (the chip's compute/bandwidth ratio), dragging tensor-core utilization down by at least 625×; it is memory bound, and the FFN is very slow.
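The F/M calculation can be done by hand. For a matmul C[m,n] = A[m,k] · B[k,n], F = 2·m·k·n FLOPs and M is the bytes moved for A, B, and C (assuming fp16, 2 bytes per element, and that each matrix crosses the memory bus once, a simplification):

```python
def intensity(m, k, n, bytes_per_el=2):
    """Arithmetic intensity F/M of C[m,n] = A[m,k] @ B[k,n] in fp16."""
    flops = 2 * m * k * n                            # F
    mem = bytes_per_el * (m * k + k * n + m * n)     # M
    return flops / mem

# Prompt processing: many tokens at once (m = 4096) -> compute bound.
print(intensity(4096, 4096, 4096))  # ≈ 1365 FLOPs/byte

# Auto-regressive decoding: one token at a time (m = 1) -> F/M ≈ 1,
# far below the ~625 compute/bandwidth ratio from the notes,
# so each step is memory bound and the tensor cores sit idle.
print(intensity(1, 4096, 4096))     # ≈ 1.0 FLOPs/byte
```

The weight dimensions 4096 are illustrative; the conclusion (batch-1 decoding has F/M near 1) holds for any realistic layer size.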
AI Agents (AA) are similar to RAG, but more general.
**What can we do to make these computations faster?**
(The Roles of Kernels and AI Chips in Multi-Agent Systems)
1> Kernels: low-level, highly optimized functions that run on accelerators. Techniques: remove redundancies; improve locality (arrange data so copies are faster); specialize computation (tensor cores are fast precisely because they only compute matmuls).
Example: Stream-K matmul (NVIDIA), which specializes computation.
Example: FlashAttention (Tri Dao), which removes redundancies; his papers are worth reading.
Kernels are hard to write.
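FlashAttention's full trick is tiling the whole attention computation so the N×N score matrix is never written to global memory; the core piece of that redundancy removal is the "online softmax" recurrence, shown standalone below (a sketch of the idea, not the real kernel):

```python
import math

def softmax_two_pass(xs):
    """Textbook softmax: two full passes over the data."""
    m = max(xs)                                # pass 1: max, for stability
    s = sum(math.exp(x - m) for x in xs)       # pass 2: normalizer
    return [math.exp(x - m) / s for x in xs]

def softmax_online(xs):
    """Online softmax: running max m and running sum s in ONE pass.

    Whenever a new max appears, the accumulated sum is rescaled by
    exp(old_max - new_max). This lets a kernel process attention scores
    tile by tile without a redundant extra sweep through memory.
    """
    m, s = float("-inf"), 0.0
    for x in xs:
        new_m = max(m, x)
        s = s * math.exp(m - new_m) + math.exp(x - new_m)
        m = new_m
    return [math.exp(x - m) / s for x in xs]
```

Both functions produce identical results; the win is not FLOPs but memory traffic, which is exactly what matters in the memory-bound regime described above.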
2> Chips
NVIDIA designs the chip; a company in Taiwan takes the design and fabricates it.
Technology grows very fast, but designing a chip takes more than 3 years.
Google created the TPU chip, which works well for Google itself, but outside Google it is not as good as GPUs.
AMD…
NVIDIA has a monopoly in the chip market.
AI compilers: very promising; they work at the intersection of kernels and chips for AI.
Applied AI research: where do you want to create value? Cursor calls the Anthropic API and researches how to use the model to help clients work efficiently; users build habits around Cursor. The question is how to use the AI models already available efficiently.
Elon Musk: an enthusiastic, hardworking person and a good listener.
Q&A:
What's new in models and neural architectures
[Q&A - Google Docs](https://docs.google.com/document/d/11Gfk2TsCtkpLWcvm7wmVBJyfL0qKtBkCQCF9hCea-aY/preview?tab=t.0)
AI chips are specialized hardware designed to accelerate AI tasks like: