Google’s AI Agents & DeepSeek Explained in 5 Mins
Skip the long research papers—here’s a concise, no-fluff breakdown of Google’s AI Agents, how they work, and why DeepSeek-R1 is making waves in AI reasoning.
AI is no longer just about language models answering questions—it’s about AI acting. But why now? And what makes AI agents different from the large language models (LLMs) we’ve been using?
This week, I break down two of the most talked-about advancements in AI right now:
AI Agents – Why they’re creating a buzz in the AI and product community.
DeepSeek-R1 – A reinforcement learning-based approach that claims to push AI reasoning to new heights.
Let’s dive in.
Hi, I'm Snigdha! I'm on a journey to become a 1% better Product Manager every day, and I’m excited to share my learnings with you. My goal is to provide bite-sized insights and practical tips for those exploring the world of product management, helping you grow your PM skills one small step at a time.
AI Agents - General Concept
AI agents operate using three fundamental layers:
Model: The AI brain (e.g., GPT-4, Gemini, Claude) that generates outputs.
Tools & Functions: External integrations that allow the agent to interact with databases, APIs, and the real world.
Orchestration Layer: A control system that governs the agent's reasoning, memory, and decision-making, using frameworks like ReAct, Chain-of-Thought (CoT), and Tree-of-Thoughts (ToT).
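To make these layers concrete, here is a minimal Python sketch of how they fit together. Everything in it (call_model, TOOLS, run_agent) is an illustrative placeholder, not any real framework's API:

```python
# Minimal sketch of the three layers. All names are illustrative.

def call_model(prompt: str) -> dict:
    """Model layer: the LLM 'brain'. Stubbed out; plug in GPT-4, Gemini,
    Claude, etc. Returns either a tool request or a final answer."""
    raise NotImplementedError

# Tools & Functions layer: external integrations the agent can invoke.
TOOLS = {
    "search_flights": lambda query: {"results": f"flights matching {query}"},
    "lookup_crm": lambda user_id: {"last_purchase": "..."},
}

def run_agent(task: str, max_steps: int = 5) -> str:
    """Orchestration layer: governs the reason -> act -> observe loop."""
    context = [task]
    for _ in range(max_steps):
        decision = call_model("\n".join(context))
        if "final_answer" in decision:            # the model decided it's done
            return decision["final_answer"]
        observation = TOOLS[decision["tool"]](decision["input"])  # act
        context.append(f"Observation: {observation}")             # observe
    return "Stopped: step budget exhausted."
```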
But how does this actually work?
What Makes AI Agents Work?
The Logic of ReAct (Reasoning + Acting)
The ReAct framework is at the core of how modern AI agents function. Instead of just responding, the agent iterates through a three-step cycle:
Observation → Think: The agent processes the current state, recalling past memory if needed.
Plan → Decide: The agent generates Chain-of-Thought (CoT) reasoning and picks the next best action.
Act → Reflect: The agent executes an action (API call, function invocation, tool usage, or next reasoning step), then reevaluates the outcome for the next iteration.
Example of ReAct in action: Let’s say an AI agent is tasked with booking a flight.
Step 1: Observes Input → User asks for a cheap flight to Tokyo.
Step 2: Plans Next Steps → The agent breaks it down: "I need to check Skyscanner first, then compare it to Google Flights."
Step 3: Executes the API Calls → The agent queries Skyscanner and fetches results.
Step 4: Reflects on Results → If the price is higher than expected, the agent tries different dates to optimize for a better deal. It also evaluates alternatives: “I see cheaper flight tickets on Google Flights; I will recommend this option.”
Step 5: Continues Until Completion → The agent either books the flight or asks the user for more information.
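Condensed into code, the flight example is just a compare-decide-act loop. A hedged sketch, with made-up price-check functions standing in for real flight APIs:

```python
# Hypothetical price-check functions standing in for real API clients.
def check_skyscanner(destination: str) -> int:
    return 620  # stub price in USD

def check_google_flights(destination: str) -> int:
    return 540  # stub price in USD

def book_cheap_flight(destination: str, budget: int) -> str:
    # Observe -> Think: gather the options the agent knows how to query.
    prices = {
        "Skyscanner": check_skyscanner(destination),
        "Google Flights": check_google_flights(destination),
    }
    # Plan -> Decide: pick the cheapest source.
    source, price = min(prices.items(), key=lambda kv: kv[1])
    # Act -> Reflect: book if within budget, otherwise loop back to the user.
    if price <= budget:
        return f"Booked via {source} at ${price}."
    return f"Cheapest is ${price} via {source}. Shift dates or raise the budget?"

print(book_cheap_flight("Tokyo", budget=600))  # -> Booked via Google Flights at $540.
```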
This multi-step, adaptive execution model is what makes AI agents different from traditional LLMs, which just return a single response and stop.
The Role of Tools in AI Agents
AI Agents integrate web APIs, retrieval-augmented generation (RAG), and function calling to expand their capabilities.
Example: An AI agent in customer support could fetch live data on a user’s last purchase from a CRM instead of relying on pre-trained responses.
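Here is a sketch of what that function calling looks like. The schema below follows the general JSON style that function-calling APIs expect, but the CRM function itself is hypothetical:

```python
# Hypothetical CRM lookup exposed to the model as a callable tool.
def get_last_purchase(customer_id: str) -> dict:
    # In production this would hit your CRM's API; stubbed here.
    return {"customer_id": customer_id, "item": "Pixel 9", "date": "2025-01-12"}

# Tool schema in the JSON style most function-calling APIs expect.
tool_schema = {
    "name": "get_last_purchase",
    "description": "Fetch a customer's most recent purchase from the CRM.",
    "parameters": {
        "type": "object",
        "properties": {"customer_id": {"type": "string"}},
        "required": ["customer_id"],
    },
}

# When the model emits {"name": "get_last_purchase", "arguments": {...}},
# the orchestration layer dispatches the call and feeds the result back:
print(get_last_purchase("cust_42"))
```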
The Role of Vector Search & RAG for Real-Time Knowledge Retrieval
Instead of relying solely on pre-trained data, AI agents leverage retrieval-augmented generation (RAG) to fetch relevant real-time data:
Convert a user query into an embedding (numerical representation of text, images, or other data).
Match it against stored data in a vector database (which stores and manages high-dimensional embeddings), for example using ScaNN for nearest-neighbor search.
Retrieve the most relevant knowledge.
Provide a response informed by up-to-date data.
Example RAG in action: Imagine an AI-powered legal assistant. When a lawyer asks, "What are the key clauses in a standard NDA?", the system:
Embeds the query into a numerical representation.
Uses ScaNN to search the vector database of legal documents.
Finds the most relevant NDA clauses based on similarity scores.
Retrieves & summarizes them in a natural-language response (an AI agent can also chain successive actions from here).
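Here is a minimal sketch of that retrieval step. A real system would use a trained embedding model and a nearest-neighbor library like ScaNN; this toy version fakes the embeddings and brute-forces the search with NumPy:

```python
import numpy as np

# Toy embedding function standing in for a real embedding model.
def embed(text: str, dim: int = 64) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vector = rng.normal(size=dim)
    return vector / np.linalg.norm(vector)

# "Vector database": document embeddings stacked into one matrix.
docs = [
    "Confidentiality obligations of the receiving party ...",
    "Term and termination of the agreement ...",
    "Governing law and dispute resolution ...",
]
index = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    scores = index @ q                        # cosine similarity (unit vectors)
    top = np.argsort(scores)[::-1][:k]        # k nearest neighbors
    return [docs[i] for i in top]

context = retrieve("key clauses in a standard NDA")
# The agent then prepends `context` to the prompt so the model answers
# from retrieved, up-to-date knowledge instead of training data alone.
```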
Comparison Between Agents and Standard LLMs
I built a quick comparison guide for you to see the stark differences between AI Agents and standalone LLMs:
Scope: An LLM returns a single response and stops; an agent iterates until the task is done.
Tools: An LLM is limited to what it learned in training; an agent calls APIs, databases, and functions.
Knowledge: An LLM answers from pre-trained data; an agent pulls live data via RAG and vector search.
Memory & control: An agent’s orchestration layer tracks state across steps and decides what to do next.
Now let’s shift gears and quickly look at what makes DeepSeek-R1 different from existing LLMs.
DeepSeek-R1: The Next Leap in AI Reasoning
What’s Special About DeepSeek-R1?
DeepSeek-R1 is not your typical large language model (LLM). Unlike traditional models that rely heavily on supervised fine-tuning (SFT), DeepSeek-R1 pioneers an RL-first approach: its reasoning capabilities emerge through Reinforcement Learning (RL) rather than from supervised fine-tuning on large human-labeled datasets.
This is a big deal because:
It shows that LLMs can evolve reasoning skills autonomously without massive human-labeled data.
It achieves performance comparable to OpenAI-o1-1217, setting new benchmarks in math, logic, and coding tasks.
It introduces a multi-stage training pipeline that blends RL and rejection sampling to fine-tune reasoning ability iteratively.
How Does It Work?
DeepSeek-R1’s secret sauce lies in its multi-stage reinforcement learning process:
DeepSeek-R1-Zero:
A model trained entirely via RL, without any human-curated datasets.
It self-learns Chain-of-Thought (CoT) reasoning but initially struggles with readability and consistency.
Shows "Aha moments", where it figures out better problem-solving methods mid-training.
Cold Start Training:
To fix the readability issues, a small batch of human-curated Chain-of-Thought examples is introduced.
Training then continues with Group Relative Policy Optimization (GRPO), which samples a group of outputs per prompt and optimizes the policy based on how each output’s reward compares to the group average, with no separate critic model needed (a minimal sketch follows below).
This improves coherence, making the model’s reasoning more understandable and user-friendly.
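The group-relative idea is easy to show. Below is a minimal sketch of GRPO’s advantage computation; the full objective also includes a clipped probability ratio and a KL penalty, omitted here:

```python
import numpy as np

def grpo_advantages(rewards: list[float]) -> np.ndarray:
    """Core GRPO idea: score a *group* of sampled outputs for the same
    prompt relative to each other, instead of training a critic model."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)  # group-normalized advantages

# Example: 4 sampled answers to one math problem, rewarded 1 if correct.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # correct answers get positive advantage
# The policy gradient then pushes probability mass toward the
# higher-scoring outputs in each group.
```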
Refined RL Training:
The model undergoes a second round of reinforcement learning, now with a mix of structured CoT and new data.
Introduces "self-reflection loops", where the model re-evaluates and improves its own reasoning.
Outputs are scored with two types of rewards (a quick sketch follows the list):
Accuracy Rewards: Evaluates correctness (e.g., verifying math answers using predefined test cases).
Format Rewards: Ensures outputs follow structured reasoning by enforcing format constraints like
<think> reason </think> <answer> final response </answer>.
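Both reward types are rule-based, which makes them cheap to compute at scale. DeepSeek’s exact checks aren’t published as code, so treat this sketch as an approximation:

```python
import re

# Illustrative rule-based rewards, approximating the two reward types above.
FORMAT = re.compile(r"<think>.*?</think>\s*<answer>.*?</answer>", re.DOTALL)

def format_reward(output: str) -> float:
    """Format reward: 1 if reasoning and answer are wrapped in the tags."""
    return 1.0 if FORMAT.search(output) else 0.0

def accuracy_reward(output: str, expected: str) -> float:
    """Toy accuracy check: extract the <answer> block and string-compare."""
    match = re.search(r"<answer>(.*?)</answer>", output, re.DOTALL)
    return 1.0 if match and match.group(1).strip() == expected else 0.0

sample = "<think> 2 + 2 = 4 </think> <answer> 4 </answer>"
print(format_reward(sample), accuracy_reward(sample, "4"))  # 1.0 1.0
```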
Distillation for Smaller Models:
DeepSeek-R1 knowledge is distilled into smaller, dense models (1.5B to 70B parameters).
Smaller models like DeepSeek-R1-Distill-Qwen-7B outperform larger traditional LLMs in reasoning tasks.
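Distillation here is simpler than it sounds: it is supervised fine-tuning of the student on the teacher’s own reasoning traces. A hedged sketch, assuming a Hugging Face-style causal LM (every name below is illustrative):

```python
# Hedged sketch of the distillation recipe: the teacher (DeepSeek-R1)
# generates reasoning traces, and the student is fine-tuned on them with
# ordinary next-token cross-entropy.

def generate_trace(teacher, problem: str) -> str:
    """Teacher samples a full <think>...</think><answer>...</answer> trace."""
    ...

def distill_step(student, teacher, tokenizer, optimizer, problem: str):
    trace = generate_trace(teacher, problem)
    batch = tokenizer(problem + trace, return_tensors="pt")
    loss = student(**batch, labels=batch["input_ids"]).loss  # standard SFT loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```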
Why Is DeepSeek-R1 Popular?
Breaks the dependency on human-labeled data → Uses RL to self-learn reasoning.
Excels in STEM fields → Dominates coding (Codeforces, SWE-bench), math (AIME), and logic tasks.
Bridges the gap with OpenAI’s o1-1217 model → Matches its performance on math benchmarks.
Optimized training pipeline → Balances RL, fine-tuning, and self-reflection loops for better reasoning.
Pioneers a new AI training paradigm → Moving away from pure supervised learning to self-improving AI.
Why Should You Care?
DeepSeek-R1 isn’t just another LLM—it’s a paradigm shift towards self-learning AI. This approach reduces reliance on expensive, labor-intensive human fine-tuning while achieving near-state-of-the-art results.
What This Means for AI Agents: DeepSeek-R1's RL-driven approach makes it an ideal foundation for AI Agents that need iterative problem-solving.
Why This Matters for Product Builders & Product Managers
If you’re working in tech or product management, AI Agents and DeepSeek-R1 signal a shift in how we design and build intelligent products.
Some key implications:
AI will move beyond Q&A to execution: Expect to see AI assistants that can take real actions instead of just providing insights.
The future of search is AI-driven: AI agents that actively search, analyze, and compile results instead of just retrieving links will be the next wave of intelligent browsing.
Example: an AI shopper that understands your budget, buying needs, and past preferences, then searches for the ideal item after reviewing and reasoning over reviews, product specifications, and alternative options.
Decision-making automation will become a reality: With models like DeepSeek-R1, AI could soon analyze, reason, and optimize business decisions at scale.
The key takeaway?
AI is shifting from being reactive to being proactive. As a product builder, now is the time to start experimenting with AI-powered workflows and automation.
References
I watched this YouTube video where Maya Murad explains more about AI Agents. Highly recommend it!
Next Week
Ever wondered how systems detect fraud, cybersecurity threats, or unexpected failures before they happen?
Next week, I’ll be diving deep into Anomaly Detection Algorithms—how they work, the techniques behind them, and why they’re critical for AI-driven decision-making.
Stay tuned—because next week, we’re turning anomalies into insights!
If you found this insight valuable, don’t forget to like, share, and subscribe! Let’s keep learning together and Become 1% Better PM. Your support helps me reach more aspiring product managers on their journey.