Mike’s Agent Insights #7
Welcome to the 7th edition of Mike’s Agent Insights, coming to you from the Sonoran Desert in Arizona. I have appreciated all of the conversations that the last couple of posts have sparked.
In a world where AI increasingly shapes our lives, and AI agents promise to transform even more aspects of our day-to-day lives, it's important to stay informed and understand the implications of this technology. That's why I write Mike's Agent Insights, a newsletter dedicated to exploring the latest research, investments, advancements, and applications of AI agents.
This week I want to explore (there was a lot):
Actionable insights on agents
Some new agents research
New agent investments
How agents could affect our lives
Honorable mention, the story of Meta and Llama
Actionable insights on agents
The rise of AI agents, or autonomous digital clones, is transforming industries by handling complex tasks, but they require critical infrastructure and thoughtful integration to succeed - (Endless Agents, Infinite Homers - Meenal Nalwaya @ Meta Gen AI w/Hemant Mohapatra)
Meenal is a PM building great things with multimodal LLMs.
AI agents promise to revolutionize workflows, from customer service to enterprise solutions, by acting as strategic executors rather than passive copilots. However, scaling their use demands innovation in planning, collaboration, memory management, energy efficiency, and safety. Inspired by tools like OpenAI’s o1 and Anthropic’s computer-use agent, AI agents are moving from reactive systems to proactive planners capable of orchestrating multi-step workflows and integrating with diverse software ecosystems.
The future of AI agents hinges on building robust, energy-efficient platforms that address memory, orchestration, and compliance challenges. These agents could expand into robotics and physical environments, reshaping industries from healthcare to logistics.
Key infrastructure components (a minimal planning-loop sketch follows this list):
Planning: Breaking tasks into steps and leveraging APIs or specialized models to execute them.
Orchestration: Ensuring multi-agent collaboration with tools like LangChain and OpenAI’s Swarm.
Memory: Creating adaptive, human-like interactions and handling edge cases with tools like LangMem.
Energy efficiency: Innovating compute systems for cost-effective scaling, e.g., Modular and Ray.
Evaluation and safety: Developing diagnostics and real-time safety measures to manage risks.
Use case: Customer service agents streamline workflows by handling account verification, eligibility checks, and escalations, freeing humans for higher-value tasks.
Challenges: Balancing autonomy and safety, ensuring compliance in regulated industries, and securing sensitive data.
Vision: AI agents will integrate across digital and physical domains, enabling intelligent, scalable solutions that are context-aware and mission-driven.
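To make the planning and orchestration components above concrete, here is a minimal sketch of a plan-execute-observe loop. It is illustrative only: call_llm, the TOOLS registry, and the JSON step format are hypothetical stand-ins for whatever model provider and tools you actually wire up, not any framework's real API.

```python
import json

# Hypothetical tool registry; real deployments would use vetted, permissioned tools.
TOOLS = {
    "search": lambda query: f"results for {query!r}",  # stub tool for illustration
}

def call_llm(prompt: str) -> str:
    """Stand-in for a chat-completion call to any model provider."""
    raise NotImplementedError("wire up your model provider here")

def run_agent(goal: str, max_steps: int = 5) -> str:
    history = []
    for _ in range(max_steps):
        # Ask the model to plan the next step as JSON:
        # {"tool": ..., "input": ...} to act, or {"final": ...} when done.
        prompt = f"Goal: {goal}\nHistory: {history}\nReply with a JSON step."
        step = json.loads(call_llm(prompt))
        if "final" in step:
            return step["final"]
        observation = TOOLS[step["tool"]](step["input"])
        history.append({"step": step, "observation": observation})
    return "stopped: step budget exhausted"
```

This loop is where most of the listed infrastructure attaches: orchestration frameworks coordinate several of these loops, memory systems decide what goes into the history, and evaluation and safety checks sit around the tool call.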
My thoughts: First, Meenal is really awesome, and I am glad to get these insights from her. The diagram above shows the diversity of startup investments out there, with more every week. If you are trying to adopt agents today, there is a lot of bubble gum and duct tape required to make it all work together. There are no clear standards, few end-to-end use cases are fully functional yet, and the talent and expertise needed to stitch it all together go beyond what a lot of enterprises have today. We have seen this exact pattern in other tech moments in the past: web services, relational databases, compute infrastructure, networking, security, and most recently authentication. Today a lot of this is noise until there are reference deployments at scale. We will continue to see significant investment here; the space is too large, and there is a lot of room to build competing solutions.
Some new agents research
MIT researchers develop MBTL as an efficient training algorithm to improve AI agents for complex tasks - (MIT News - Adam Zewe) (Original Paper - Model-Based Transfer Learning for Contextual Reinforcement Learning)
The new approach enhances the reliability of AI systems in dynamic environments like traffic control, reducing training costs while maintaining high performance across variable conditions. Reinforcement learning models often fail when faced with task variability. MIT's Model-Based Transfer Learning (MBTL) algorithm selects key training tasks, improving efficiency by 5–50x compared to standard methods.
MBTL advances AI training for applications in mobility, robotics, and other fields by balancing performance and resource efficiency, paving the way for broader adoption of intelligent decision-making systems.
Core innovation: MBTL uses "zero-shot transfer learning" to generalize training from a subset of tasks to a broader task set.
Efficiency boost: Achieves comparable performance using far fewer data points (e.g., 2 tasks vs. 100).
Applications tested: Traffic signal control, real-time speed advisories, and classic control tasks.
Training approach: Strategically selects training tasks to maximize overall performance across the full task set (see the toy sketch after this list).
Limitations: MBTL is designed for single-dimensional context variation.
Future goals: Extend MBTL to high-dimensional tasks and real-world problems like next-gen mobility systems.
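To give a feel for the selection idea, here is a toy version over a one-dimensional context space. The fixed linear decay of zero-shot performance with context distance is my simplifying assumption for illustration; the actual MBTL method estimates these quantities rather than hard-coding them.

```python
import numpy as np

contexts = np.linspace(0.0, 1.0, 100)  # e.g., a range of traffic-signal settings
DECAY = 2.0  # assumed slope of the generalization gap (illustrative only)

def coverage(selected):
    """Estimated zero-shot performance at every context, given trained source tasks."""
    if not selected:
        return np.zeros_like(contexts)
    perf = np.array([1.0 - DECAY * np.abs(contexts - c) for c in selected])
    return np.clip(perf, 0.0, 1.0).max(axis=0)  # best available source task per target

selected = []
for _ in range(5):  # budget: train on only 5 of the 100 task variants
    best = max(contexts, key=lambda c: coverage(selected + [c]).mean())
    selected.append(best)

print("train on contexts:", [round(float(c), 2) for c in selected])
print("mean estimated performance:", round(float(coverage(selected).mean()), 3))
```

Even this greedy toy shows the main effect: a handful of well-spaced training tasks covers the whole context range, which is the intuition behind the 2-tasks-versus-100 data efficiency reported above.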
My thoughts: Training data for agent models is significantly different from the data needs of foundational models. MBTL, as proposed above, takes a single-dimensional approach to task variation, which is a great start from an efficiency and annotation perspective but will create some inherent limits in models trained this way. This is an early proposal, and as they mention in their paper, a lot of further research is already ongoing. Excited to see more from these researchers in their next findings.
Microsoft Research develops "Droidspeak," a machine language to enhance communication between AI agents - (SingularityHub - Edd Gent)
(Original Research Paper: DroidSpeak: Enhancing cross-LLM communication)
Droidspeak significantly improves the speed and efficiency of AI collaboration, unlocking potential for multi-agent systems to tackle complex, multi-step tasks more effectively than current natural language methods allow. AI agents, powered by large language models (LLMs), traditionally communicate using natural language like English. While expressive, these human languages create computational bottlenecks due to the need for processing extensive conversational histories. Droidspeak bypasses this limitation by leveraging the mathematical representations underlying LLMs.
Droidspeak hints at a future where machine languages evolve alongside human ones, enabling more scalable, responsive, and powerful AI systems capable of solving intricate challenges collaboratively.
Efficiency boost: Droidspeak enables AI agents to communicate 2.78 times faster with minimal loss in accuracy.
How it works: Instead of exchanging natural language, agents directly share the high-dimensional data from intermediate computation steps, cutting processing time (a rough sketch follows this list).
Current limitations: The system works only between versions of the same LLM; interoperability between different models is a future goal.
Scalability potential: Faster communication could allow multi-agent systems to address larger, more complex problems.
Next steps: Researchers aim to enable communication between models of varying sizes and compress shared data for even greater speed.
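To show the shape of the idea in code, here is a rough sketch using Hugging Face transformers: the sending agent runs the shared context once and exports its key/value cache, and the receiving agent resumes from that cache instead of re-reading the text. The model names are placeholders, and as noted above this only works when both checkpoints share the same base architecture.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model ids; the two checkpoints must share a base architecture.
tok = AutoTokenizer.from_pretrained("base-model")
sender = AutoModelForCausalLM.from_pretrained("base-model")
receiver = AutoModelForCausalLM.from_pretrained("base-model-finetuned")

history = tok("long shared context both agents are working from ...", return_tensors="pt")
with torch.no_grad():
    # Sender pays the prefill cost once and exports its KV cache.
    kv_cache = sender(**history, use_cache=True).past_key_values

query = tok(" the receiver's next question", return_tensors="pt")
mask = torch.ones(1, history.input_ids.shape[1] + query.input_ids.shape[1], dtype=torch.long)
with torch.no_grad():
    # Receiver starts from the shared cache rather than re-processing the history.
    out = receiver(input_ids=query.input_ids, attention_mask=mask,
                   past_key_values=kv_cache, use_cache=True)
```

The saving comes from skipping the receiver's prefill over the shared history, which is where the reported speedup lives; deciding exactly what to share, and at what accuracy cost, is the paper's actual contribution.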
My thoughts: Agents are so hot for 2025, but by the end of 2025 I anticipate that agent-to-agent (A2A) communication will be critical to broader agent adoption. The compute efficiency boost from the Droidspeak proposal is also quite significant. Where the same foundational model is deployed across a pipeline of agents, cutting inference time improves latency, performance, and the overall power needs of these types of applications. This is a very early proposal in a space which is going to see more and more alternatives soon. I expect efficiency will be important, but so will security, privacy, and authentication capabilities in A2A deployments.
DynaSaur🦖, a groundbreaking LLM agent, creates its own Python functions to adapt and evolve in real time - (MarkTechPost - Aswin Ak)
(Original Research Paper: DynaSaur🦖: Large Language Agents Beyond Predefined Actions)
DynaSaur overcomes the rigidity of predefined actions in traditional LLM agents, enabling AI systems to handle complex, dynamic tasks and unforeseen challenges with greater flexibility and efficiency. Traditional LLM agents rely on static, predefined actions, limiting their adaptability in real-world, evolving environments. Researchers from the University of Maryland and Adobe developed DynaSaur to address these limitations by allowing agents to dynamically generate and refine their own tools during operation.
DynaSaur sets a new standard for AI agent adaptability by blending dynamic action generation with reusable libraries, paving the way for more robust and autonomous AI systems across industries.
How it works: DynaSaur generates Python functions in real time, stores them in a growing library, and retrieves relevant actions via similarity search (see the sketch after this list).
Experimental success: Achieved 38.21% average accuracy on the GAIA benchmark, outperforming all baselines, and showed 81.59% improvement when combining generated and human-designed tools.
Integration with Python ecosystem: Enables seamless interaction with external tools, web data, and computational tasks, enhancing real-world applicability.
Top performer: Secured the highest position on the GAIA public leaderboard for adaptability and problem-solving.
Implications: Marks a shift from static to self-evolving AI agents, unlocking potential for dynamic and versatile applications across diverse domains.
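Here is a stripped-down sketch of that generate-store-retrieve loop as I read it from the summary above. embed and generate_function_code are hypothetical stand-ins for an embedding model and the LLM call, and the single-argument function signature is a simplification.

```python
import numpy as np

library = []  # entries: {"code", "fn", "vec"}

def embed(text: str) -> np.ndarray:
    """Stand-in for any sentence-embedding model."""
    raise NotImplementedError

def generate_function_code(task: str) -> str:
    """Stand-in for the LLM writing a new Python function for this task."""
    raise NotImplementedError

def retrieve(task: str, threshold: float = 0.8):
    """Return the most similar previously generated action, if close enough."""
    if not library:
        return None
    q = embed(task)
    def cos(entry):
        return float(q @ entry["vec"]) / (np.linalg.norm(q) * np.linalg.norm(entry["vec"]))
    best = max(library, key=cos)
    return best if cos(best) >= threshold else None

def act(task: str):
    hit = retrieve(task)
    if hit is not None:      # reuse an existing action when one fits
        return hit["fn"](task)
    code = generate_function_code(task)
    namespace: dict = {}
    exec(code, namespace)    # model-written code: sandbox this in practice
    fn = next(v for v in namespace.values() if callable(v))
    library.append({"code": code, "fn": fn, "vec": embed(task)})
    return fn(task)
```

The exec line is exactly where my observability and governance worry below comes in: every generated action needs sandboxing, logging, and review before this pattern is safe outside a benchmark.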
My thoughts: 👀 This is an exciting collaboration model for tool use from a long-term perspective. The progress on GAIA is very impressive, and the dynamic nature of tool/code generation could have significant architectural impact on agent systems and the problems they are trying to solve. The challenge with an architecture like this becomes observability and governance. I will be interested to see how others evaluate or use this proposed architecture.
New agent investments
/dev/agents is building the operating system to enable developers to fully unlock the potential of AI agents - (CapitalG Blog - Jill Chase)
AI agents represent a transformative computing paradigm, automating tasks with autonomy. /dev/agents’ platform aims to standardize and simplify development, enabling scalable, agent-driven innovation across industries. As AI agents evolve from copilots to autonomous operators, today’s software ecosystems—designed for humans—are inadequate for AI use. /dev/agents is addressing this gap by creating foundational tools and infrastructure, much like mobile platforms revolutionized app ecosystems.
By creating a central platform for agentic AI, /dev/agents seeks to redefine how software operates, enabling developers to build smarter, more intuitive, and transformative applications.
Vision: A shared system for understanding user needs, providing UI tools and APIs for building agentic apps.
Leadership: Founded by pioneers from Stripe, Google, Meta, and Figma with deep expertise in mobile, VR, and AI platform development.
Current focus: Solving UX and system challenges created by AI agents, with a small, skilled team in San Francisco.
Investment: $56M Seed round led by CapitalG and Index Ventures, with participation from top investors and AI luminaries.
Impact: Positioning AI agents as a cornerstone of the next computing wave, much like mobile and internet platforms before them.
My thoughts: This is a solid team taking on the problems I outlined in the first section above, and a team which has built nascent platforms in the past. Both of the founders built consumer platforms, though, and enterprise lifecycles and requirements are very different. The framing as an OS for agents makes a bunch of sense to me. One of the core components of an agentic system is the way tools get used: the agent model needs to be able to generate code or instructions, but it also needs an environment in which to run those tasks (a minimal sketch of that idea follows below). Where those tasks run is the true agent race, especially in the enterprise; it is the rail that everything else will need in order to function. Will be keeping a close eye on their progress.
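To illustrate what I mean by the rail, here is a minimal sketch of a tool runtime: one registry that is the only place agent-proposed actions execute, giving a single choke point for permissions, logging, and audit. This is purely my illustration of the concept, not anything /dev/agents has published.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-runtime")

REGISTRY: dict = {}

def tool(name: str, requires: str = "user"):
    """Register a function as an agent-callable tool with a permission tag."""
    def wrap(fn):
        REGISTRY[name] = {"fn": fn, "requires": requires}
        return fn
    return wrap

@tool("read_calendar", requires="user")
def read_calendar(day: str) -> str:
    return f"events for {day}"  # stub

def execute(request: dict, granted: set) -> str:
    """The one place agent actions run: check consent, log, then call the tool."""
    entry = REGISTRY.get(request["tool"])
    if entry is None:
        raise KeyError(f"unknown tool {request['tool']!r}")
    if entry["requires"] not in granted:
        raise PermissionError(f"{request['tool']} needs {entry['requires']!r} consent")
    log.info("agent ran %s with %s", request["tool"], request.get("args", {}))
    return entry["fn"](**request.get("args", {}))

# Usage:
print(execute({"tool": "read_calendar", "args": {"day": "2024-12-06"}}, granted={"user"}))
```

Whoever owns that execute() layer owns the enterprise controls: consent, compliance, and audit all hang off it.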
How agents could affect our lives
As AI agents evolve to mimic human behavior and act autonomously, society must address the risks of misuse, deepfake potential, and the right to know whether we’re interacting with an AI or a person - (MIT Technology Review - James O'Donnell)
AI agents come in two types: tool-based agents that perform tasks using external tools (e.g., Salesforce and Anthropic’s new agents) and simulation agents designed to replicate human personalities and behavior. The lines between these types are blurring, with new research showing agents can convincingly mimic human values and preferences after brief interactions.
AI agents could revolutionize daily life and work, acting as personal assistants or avatars, but unchecked development risks eroding trust, amplifying harm, and deepening privacy concerns.
Capabilities: Agents are transitioning from task execution to mimicking human behavior, combining personality simulation with autonomous action.
Ethical risks: Increased potential for harmful deepfakes and exploitation of personal data. For example, AI-generated replicas could be used maliciously without consent.
Transparency concerns: Users may not know whether they're interacting with a person or an AI agent, raising questions about disclosure and accountability.
Case study: Recent Stanford research showed simulation agents could replicate participants' values and behaviors with stunning accuracy after two-hour interviews.
Future implications: As these technologies advance, robust ethical frameworks and regulations are urgently needed to balance innovation with societal trust.
AI agents offer transformative potential but must be used responsibly to enhance autonomy, relationships, and decision-making without compromising ethics or privacy - (Forbes - Cornelia Walther)
AI agents can emulate human behavior and perform tasks autonomously, providing tools for self-growth, efficiency, and empathy. However, risks like social isolation, data misuse, and overreliance require cautious integration. AI agents fall into two categories: tool agents for task automation (e.g., scheduling or email management) and simulation agents that mimic human behavior using data from interviews or histories. While promising, their ability to replicate personalities raises significant ethical questions.
As AI agents redefine interactions, they present opportunities for self-reflection, bias awareness, and skill-building but require frameworks that prioritize ethical development, privacy, and human-centric usage.
Opportunities: Simulation agents enable granular self-reflection, practice environments for difficult scenarios, and insights into cognitive biases.
Risks: Overreliance may erode autonomy, harm genuine social connections, and lead to ethical issues like misuse of personal data or deepfake exploitation.
Practical steps:
Analysis: Differentiate agent types and understand their impact.
Assessment: Regularly review their role in your life—aid or crutch?
Adaptation: Use agents to align with values, enhancing productivity without replacing human interaction.
Advocacy: Promote transparency, consent, and responsible AI use.
Vision: AI agents should complement human thought and connection, empowering us rather than replacing what makes us uniquely human.
AI is boosting productivity across industries but risks alienating workers by automating the most meaningful parts of their jobs - (Strange Loop Canon Substack - Rohit Krishnan)
While AI enhances efficiency and democratizes certain skills, it can reduce job satisfaction by shifting roles from creative problem-solving to overseeing automated systems, raising concerns about fulfillment and workplace alienation. Studies show AI-assisted researchers discover 44% more materials, but report a 44% drop in job satisfaction as idea generation—deemed the most rewarding part of their work—is largely automated. Similarly, tools like GitHub Copilot upskill lower-skilled developers but diminish autonomy and creativity in higher-skilled roles.
AI's promise of increased productivity could alienate workers by turning creative and skilled professionals into monitors of automated processes, echoing the repetitive, dehumanized tasks of past industrial automation waves.
Efficiency gains: AI automates idea generation and project management, reallocating human effort to execution and oversight.
Job satisfaction drop: Workers value mastery and ownership of tasks; automation risks eroding these elements by centralizing creativity in machines.
Historical parallels: Similar patterns appeared with assembly lines, algorithmic warehouse management, and banking automation.
Democratization of skills: AI narrows the gap between high- and low-skilled workers but may leave specialists feeling undervalued.
Future challenge: Avoiding alienation requires redefining fulfillment, emphasizing mission-driven work, and reintegrating creativity into AI-enhanced roles.
My thoughts on all 3 of these: Now that we are starting to see agents in the market, concerns about AI in general are shifting toward concerns specific to agents. This is because the tools which agents have access to open up more opportunities to operate directly on the world around them. The way agentic systems work, the data the models are trained on, and even the ability to take multiple steps are all very different from LLMs. Asking the questions posed in these articles is very important, but it is also very evident that most of the agent space is already doing this. An example would be the developer release by Anthropic and the transparency of their own safety and privacy analysis. Microsoft has also done this with Recall, itself pulling the product from its launch plans multiple times for not adequately covering some of these concerns. Seeing proactive work being done here is so much better than waiting for these concerns to be addressed in regulation in the future.
I also think the jobs discussion is important. How do we design new roles to work efficiently, and in a rewarding way, with agents? I am hoping to see some organizational research in this area soon, which can help influence user interface design as well as human oversight plans for these products. Academic papers refer to the model interacting with a human as collaboration or human-in-the-loop. Today collaboration is used to supplement the information needed to complete a task. Perhaps collaboration in the future will take on an important role not just in getting information but in motivating the human in their role.
Honorable mention, the story of Meta and Llama
Mark Zuckerberg rebuilds Meta around Llama, transforming the company’s AI strategy and industry standing - (Fortune - Sharon Goldman) (Free reprint on Yahoo News)
Llama’s open-source approach has positioned Meta as a leader in generative AI, enabling rapid innovation while reigniting debates on openness versus proprietary models in AI development. Initially released in 2023, Llama AI models marked a pivot for Meta after struggles with the metaverse. Open-sourcing Llama accelerated adoption, catching up with competitors like OpenAI and Google by leveraging external contributions and Meta’s vast user data.
Llama represents a broader shift in the AI industry toward open ecosystems, potentially democratizing AI while challenging Big Tech monopolies. However, its openness raises ethical, security, and geopolitical concerns.
Llama’s reach: Now powers Meta’s AI assistant, Ray-Ban glasses, Quest headsets, and advertising tools, with over 600M downloads globally.
Why open-source: Accelerates innovation through community contributions while establishing Llama as a potential industry standard akin to Linux.
Criticism and risks: Concerns include misuse by malicious actors and geopolitical tensions, with critics pointing to China’s military use of Llama.
Economic puzzle: Despite $40B+ in annual AI investments, Llama is free to most users, leaving monetization strategies reliant on future ad and subscription services.
Zuckerberg’s vision: The move secures Meta’s position in the next tech wave, avoiding dependency on closed ecosystems like Apple and Google while pursuing advanced AI goals, including AGI.
My thoughts: This was such a pivotal moment for Meta and the industry. I loved reading this look back article about Llama. Really proud of everything we did.