We Don’t Need More Apps. We Need an Agent.
Today I listened to a podcast episode that touched on a topic I’ve been thinking about for months. As soon as it ended, I went straight to building a demo.
To be honest, I don’t really like smartphones.
Not in the sense that I don’t use one — I carry mine every day, like everyone else. But I’ve realized that so many of the things I use my phone for — sending messages, taking notes, setting reminders, starting timers, managing my calendar, tracking expenses — could be done much more naturally by simply speaking.
Instead, I unlock my phone, tap through a few screens, and somewhere along the way I end up opening Instagram “for a second,” which somehow turns into two hours. It’s frustrating, and honestly kind of absurd.
That’s when it hit me:
We don’t actually need a huge number of apps. We need an agent.
Sending a message, logging an expense, setting a reminder, saving a gym locker code — none of these tasks really require a beautifully designed app or a dashboard. They just need to get done.
An AI agent should be able to handle all of it in the background. Whether it’s calling an Uber, saving a note, or storing a password, the interface shouldn’t matter. The task should simply be completed.
I genuinely believe that in the near future, every household will have a self-hosted AI assistant.
The phone will still exist, but mostly as a thin client — just a device for input and output. The actual agent will live on a server at home. Your memories, ideas, passwords, records, and personal data will be stored locally, not scattered across someone else’s cloud. At the same time, this agent can still connect to external services like Google Calendar, WhatsApp, and whatever else you need.
At that point, the phone becomes little more than a microphone and speaker. Or maybe there’s a home device, something like Alexa. The hardware won’t matter much. The agent is the brain; the device is just the interface.
Since I’d been thinking about this for so long, I decided to build a demo to test the idea.
The architecture is simple: three layers.
- Voice activation: Mycroft wake word + OpenAI Realtime API
- Agent reasoning: OpenCode
- Tool execution
So far, I’ve gotten basic memory, messaging, and logging features working. It’s rough, but the full three-layer loop works end to end: I speak, the agent thinks, and the tools act.
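To make the loop concrete, here is a minimal Python sketch of the three layers. It is not the demo's code. Every name in it (`Agent`, `ToolCall`, `save_note`, `log_expense`) is hypothetical, the voice layer is a stub that takes already-transcribed text instead of calling a wake word engine or the OpenAI Realtime API, and the reasoning layer is a naive keyword router standing in for an LLM-backed agent such as OpenCode.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToolCall:
    """What the reasoning layer hands to the tool layer (hypothetical shape)."""
    name: str
    argument: str


@dataclass
class Agent:
    # Maps a tool name to a function that takes one string and returns a reply.
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def listen(self, utterance: str) -> str:
        # Layer 1 (voice): stub. A real system would wait for a wake word
        # and transcribe audio; here the text is passed in directly.
        return utterance.strip()

    def think(self, text: str) -> ToolCall:
        # Layer 2 (reasoning): stub. A real agent would ask an LLM which tool
        # to call; here the first word picks the tool, defaulting to "note".
        lowered = text.lower()
        for name in self.tools:
            if lowered.startswith(name):
                return ToolCall(name, text[len(name):].strip())
        return ToolCall("note", text)

    def act(self, call: ToolCall) -> str:
        # Layer 3 (tools): run the chosen tool and return its reply.
        return self.tools[call.name](call.argument)

    def handle(self, utterance: str) -> str:
        # The full loop: I speak, the agent thinks, the tools act.
        return self.act(self.think(self.listen(utterance)))


# Two toy tools that keep their data in local, in-memory lists.
notes: List[str] = []
expenses: List[str] = []


def save_note(text: str) -> str:
    notes.append(text)
    return f"Saved note: {text}"


def log_expense(text: str) -> str:
    expenses.append(text)
    return f"Logged expense: {text}"


agent = Agent(tools={"note": save_note, "expense": log_expense})

if __name__ == "__main__":
    print(agent.handle("expense 12 euros lunch"))
    print(agent.handle("Gym locker code is 4821"))
```

The point of the split is that each layer can be swapped independently: a different wake word engine, a different model, or a new tool should not force changes in the other two.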
I’m not trying to promote a project here. I just want to share a direction that feels more and more correct the longer I think about it.
Mycroft is gone. Humane failed. Rabbit burned through a lot of investor money. But I think all of those attempts proved one thing: the demand is real. The problem was the approach.
Now we have LLMs. We're no longer limited to matching speech against fixed intent templates; the model can interpret free-form requests. We don't need to build new hardware. This can run in Docker. Tools can be modular: drop in a file and the agent gains a new capability.
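One way the "drop in a file" idea could work, sketched in Python using only the standard library. The conventions here are my assumptions for illustration, not how OpenCode or any existing project does it: each tool is a `.py` file in a folder, it exposes a `run(arg)` function, and it may set an optional `TOOL_NAME` (otherwise the file name is used).

```python
import importlib.util
from pathlib import Path
from typing import Callable, Dict


def load_tools(tools_dir: str) -> Dict[str, Callable[[str], str]]:
    """Import every *.py file in tools_dir and register its run() function.

    Assumed (hypothetical) convention: a tool file defines run(arg) and,
    optionally, TOOL_NAME. Files without a callable run() are skipped.
    """
    registry: Dict[str, Callable[[str], str]] = {}
    for path in sorted(Path(tools_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the file's top-level code
        run = getattr(module, "run", None)
        if callable(run):
            name = getattr(module, "TOOL_NAME", path.stem)
            registry[name] = run
    return registry
```

With this in place, adding a capability means saving, say, a `timer.py` that defines `run`, then calling `load_tools` again. Since loading a file executes it, a real setup would only want to load tools from a folder it trusts.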
I know there are people already experimenting in this space — some building on Brainwave, others heavily modifying OpenCode — so it’s very possible someone has already built a better demo than mine.
But I still believe in the broader trend:
Phones will become thinner. Agents will become heavier.
I’d love to hear what others think.
And one more question I keep coming back to: could this become something commercially viable? Or are the big companies already too far ahead?