Adapted from our recent webinar with Andy Siemer (co-founder and CEO) and Eric Hexter (CTO), moderated by Mimi Fernandez. We pulled the questions from the room and the chat, and edited the answers for readability. The goal was a plain conversation about what actually works when you put AI agents on real software work, not what looks good in a demo.
Everyone says AI makes software 10x faster. Is that real?
It is real often enough that you should take it seriously, but the headline number hides the part that matters. You can generate an enormous amount of software in a day or two. Then you spend the next half day cleaning up, debugging, and rubbing off the rough edges. That second part is where people get frustrated and say they cannot trust "that AI thing."
Step back to the 30,000-foot view before you judge it. The thing you just built in two days would have cost hundreds of thousands of dollars in the non-AI world, with a full team of people. So the speed is real. The cleanup is also real. Both things are true at once, and the second one is not a reason to dismiss the first. It is the same cycle human teams have always had. You build a bunch of code, you integrate it, you get it into test, and then there is a round of fixing.
Why is AI so bad at estimating how long work will take?
Two reasons.
First, it was trained on the internet, which means it was trained on humans, and software developers have always been terrible at estimating. We went from story points to hours to man days to man months, and we were off by a factor of ten the whole time. The model learned from that. When it tells you something will take three weeks, it is pattern-matching to how a human would have answered, not measuring how long an agent will actually take.
Second, it is giving you a human time-scale estimate for work an agent might finish in a fraction of that time. It says three weeks. You already know it is probably three hours. Greenfield work, a clean slate with few constraints, is relatively easy to estimate. Brownfield work, inside an existing system, is murkier.
The practical move is to make the model show its work. The same way you frame the technology stack and the business context for it, you have to ask how it is framing the estimate. What went into it? What risks did it weigh? If you do not ask, it just grabs a number out of the air the way a developer would.
If the estimates are unreliable, how do I plan and report timelines?
You still have a boss to report to, so you cannot just shrug. A few things help.
Treat the model's number as an input, not an answer. The real question is how its estimate translates to your team and your process. Two months on its scale might be two hours of generation. But those two hours can be followed by two days of debugging, where you find everything it skipped and everything it assumed wrong. Or, if you spent an hour up front letting it grill you on the plan, defined the architecture, set your guardrails, and forced tests to be written along the way, those two hours can land you on pretty much working software. The plan is what decides which version you get.
Longer term, start tracking your own data from the first project. As the AI builds smaller components, have it record what it did, how long it actually took, how long it waited, how many tokens it used, and the requirements that went in. That becomes training data for a more traditional estimation model you can back-test against later. A story point never meant anything until a team had burn-in time on a codebase, because the estimate is unique to that team, that product, that design. The same is true here. An estimate is still an estimate. There are too many dynamic inputs for it to be anything else, including whether the customer actually understands what they need.
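As a rough illustration of that tracking, here is a minimal Python sketch. The WorkRecord fields mirror the list above, but the field names, the median-ratio calibration, and the walk-forward back-test are assumptions chosen for simplicity, not the method described in the talk. Wait time and tokens are recorded but left unused, so a richer estimation model can pick them up later.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class WorkRecord:
    """One completed piece of agent work (hypothetical schema)."""
    task: str
    requirements: str
    model_estimate_hours: float  # what the model said up front
    actual_hours: float          # wall-clock time the agent actually took
    wait_hours: float            # time spent blocked, e.g. waiting on a human
    tokens_used: int             # kept for later, richer models


def calibration_factor(history: list[WorkRecord]) -> float:
    """Median ratio of actual time to the model's own estimate."""
    return median(r.actual_hours / r.model_estimate_hours for r in history)


def calibrated_estimate(model_estimate_hours: float, history: list[WorkRecord]) -> float:
    """Scale a fresh model estimate by what comparable past work actually took."""
    return model_estimate_hours * calibration_factor(history)


def backtest(history: list[WorkRecord]) -> float:
    """Walk forward through history, predicting each record from the ones before it.

    Returns the mean absolute error in hours. Needs at least two records.
    """
    errors = []
    for i in range(1, len(history)):
        predicted = calibrated_estimate(history[i].model_estimate_hours, history[:i])
        errors.append(abs(predicted - history[i].actual_hours))
    return sum(errors) / len(errors)
```

With three hypothetical past records that actually took 3, 2, and 6 hours against model estimates of 120, 80, and 160 hours, a new 200-hour model estimate calibrates to about 5 hours. The point is the feedback loop, not the formula; swap in a better model once the data supports it.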
If AI can build this much, what do experienced people still add?
Judgment, and it is the whole game.
Running a team of agents is like running a team of humans. It takes years of experience to course-correct and to spot the thing that is going to bite you later. The developers do not go away. You still need the experience to know which architectural choices to make and which requirements to double-click into and refine. Without that, the AI makes a pile of decisions on its own. Some are good. Some are quietly catastrophic months later.
Experienced people also think past the feature in front of them. Models will happily decide a bug "isn't in scope" or "was created by somebody else." But the AI is the only one who has touched the codebase. It created the bug. It should fix it. Knowing to push on that, and knowing what to fix now versus later, is experience, not prompting.
I keep hearing about vibe coding. Is it good or bad?
It is genuinely useful, and it is not the same thing as a finished product.
There are tens of thousands of vibe-coded apps being built a day. That is a good thing, because people can finally see their idea instead of describing it. A lot of people bring us a written idea or a picture and have not thought through the business requirements or how the flow actually works. These tools take an idea past paper and into something you can touch and put in front of customers to validate.
The other side: most vibe-coded solutions stay as ideas and never go live. Some do go to production, and some of those reportedly make real money. But that is also where you see social security numbers leaking, credit cards leaking, and apps that fall over at scale. Eventually that work comes back to someone with the experience to fix it. We have thought about a "vibe code rescue" service for exactly that, diving into the one-year-old MVP and working through the rough edges.
I tried to scale with AI and kept hitting walls. What approaches did you try, and where did each one break?
This could be the whole conversation, so here is the short version of the journey.
Start with no code. In a tool like Airtable or Make, the rule was: write zero code and see how far you get. Make is amazing for about 80 percent of what a business needs when it is connecting one data source to another, reshaping data, and publishing it. You can go a long way without a terminal or a coding agent.
Then you hit the edges. The moment you need multiple inputs to start a process, or you need to fan a data flow out and then fan it back in, the tool says no. You start building hacks around it.
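For readers who have not met the terms, "fan out" means starting several branches from one input, and "fan in" means joining their results back together. A minimal Python asyncio sketch of that shape follows; the enrich function and the source names are hypothetical placeholders for whatever services a real flow would call.

```python
import asyncio


async def enrich(source: str, record: dict) -> dict:
    """Stand-in for one branch of the flow, e.g. a call to one external service."""
    await asyncio.sleep(0)  # placeholder for real I/O
    return {source: f"{source} data for {record['id']}"}


async def process(record: dict, sources: list[str]) -> dict:
    # Fan out: start one branch per source, all running concurrently.
    branches = [enrich(source, record) for source in sources]
    # Fan in: wait for every branch, then merge the partial results into one record.
    results = await asyncio.gather(*branches)
    merged = dict(record)
    for partial in results:
        merged.update(partial)
    return merged
```

Calling asyncio.run(process({"id": 7}, ["crm", "billing"])) returns the original record with one extra key per source. In code this is a few lines; in a graphical automation tool it is often the point where the workarounds start.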
So you move to a more developer-friendly tool like n8n, same category, more for a coding mind. We built a content engine in it: research, generate content, reshape it, publish it. It worked. It was also the wrong tool for that particular job. These platforms are a heavy graphical abstraction built so you do not have to write code to move data around. But we ended up writing custom code nodes and reaching out to APIs at nearly every step, jumping through hoops, and basically not using the graphical part of the graphical tool anymore.
That is the signal to go write real code. We rebuilt it as an actual AI solution in LangChain and LangGraph, and the pain went away, along with the graphical interface that was the whole appeal in the first place. So the journey spans the whole range: from no code, no terminal, and no IDE, all the way over to running an IDE and generating and shipping real code.
The lesson is not "always write code." It is that most people only know one tool, so they keep trying to shoehorn the job into it. Each tier solves about 80 percent and then stops. Know where the edges are so you stop fighting the wrong tool.
What is an agentic harness, and why does it matter more than the model now?
We have moved from "the model is everything" to "the harness is everything." The harness is what keeps a fleet of agents organized and moving.
The simplest way to feel the problem: you open one tab, then a second, then a third, and now you do not know which agent you were talking to, and you are scanning a wall of text like the Matrix trying to find your place. A harness keeps track of the tabs a human loses track of. Think of it like a project manager with a ticketing system. It makes tickets and assigns them to agents the way a PM would, and it gives you auditability and communication between them.
Two hard-won lessons from building several harnesses:
Simpler is usually better. The first version was built in an almost vibe-coded way, with every feature anyone could want. After a while it got in the way, so it was rebuilt as smaller, simpler harnesses for different purposes. It is the same pattern as the abstraction tools described earlier: the fancy layer eventually becomes the thing slowing you down.
The single most important feature is a heartbeat. It sounds silly, but every fifteen minutes or so the harness pings the agent: what were you doing, did you crash, did you finish? That nudge is what keeps an agent going. You can walk away, come back a couple hours later, and it has made progress. It might have stalled, but it picked itself back up.
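A minimal sketch of the heartbeat decision in Python, assuming a harness that can tell whether each agent process is alive, whether its ticket is finished, and when it last made progress. AgentState, Action, and the stall rule (nudge only when nothing has moved for a full interval) are illustrative assumptions; a simpler harness could nudge every unfinished agent on every beat. Actually delivering the nudge is left out because it depends entirely on the harness.

```python
from dataclasses import dataclass
from enum import Enum

HEARTBEAT_SECONDS = 15 * 60  # "every fifteen minutes or so"


class Action(Enum):
    NONE = "none"
    NUDGE = "nudge"      # ask what it was doing and whether it is done
    RESTART = "restart"  # the agent process is gone


@dataclass
class AgentState:
    name: str
    ticket: str
    alive: bool
    finished: bool
    last_progress_at: float  # unix seconds of the last observed progress


def heartbeat(agent: AgentState, now: float) -> Action:
    """Decide what the harness should do for one agent on this beat."""
    if agent.finished:
        return Action.NONE
    if not agent.alive:
        return Action.RESTART
    if now - agent.last_progress_at >= HEARTBEAT_SECONDS:
        return Action.NUDGE
    return Action.NONE


def nudge_message(agent: AgentState) -> str:
    """The ping itself: the same three questions every time."""
    return (
        f"Heartbeat for {agent.name} on {agent.ticket}: what were you doing, "
        "did anything crash, and is the ticket done? If not, pick up where you left off."
    )


def run_beat(agents: list[AgentState], now: float) -> dict[str, Action]:
    """One pass of the loop a scheduler would run every HEARTBEAT_SECONDS."""
    return {agent.name: heartbeat(agent, now) for agent in agents}
```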
There is a related point about abstractions. We keep building layers so a business user can generate automation without touching code. But the AI is most successful when those layers get out of its way. It can talk straight to the computer, look up reference implementations, and know where to tweak them. The more you let it work at that level, the better it does, as long as you make sure the output is tested, validated, and covers the edges.
If I will need my vibe-coded software rescued later, what should I make sure is built in now?
Things like traceability, documentation, and visibility into what the code is doing do not show up on their own. The agent will not suggest endpoint health reports, call frequency, or response-time monitoring unless those things are in the plan. Whoever is driving the tool needs the experience to put them there.
The way to get them is to make them part of every phase of the work, not an afterthought. For each piece, state three things: here is what I want, here is how you test that it was built correctly, and here is the go-to-production story for how I want to monitor it once it is live. An experience shaped by a designer, one shaped by a developer, and one shaped by AI are genuinely different, which is exactly why you want more than one kind of experienced input shaping the plan.
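One way to enforce those three statements is to make them required fields of every plan item, so an incomplete item never reaches the agent. The PlanItem structure and the rendered wording below are a hypothetical Python sketch, not a prescribed format.

```python
from dataclasses import dataclass


@dataclass
class PlanItem:
    name: str
    want: str         # here is what I want
    how_to_test: str  # here is how you test that it was built correctly
    monitoring: str   # here is how I want to monitor it once it is live


def missing_parts(item: PlanItem) -> list[str]:
    """Names of the three required sections that are still empty."""
    sections = {"want": item.want, "how_to_test": item.how_to_test, "monitoring": item.monitoring}
    return [key for key, text in sections.items() if not text.strip()]


def render_for_agent(item: PlanItem) -> str:
    """Turn one plan item into instructions for an agent; refuse incomplete items."""
    gaps = missing_parts(item)
    if gaps:
        raise ValueError(f"{item.name} is missing: {', '.join(gaps)}")
    return (
        f"## {item.name}\n"
        f"What I want: {item.want}\n"
        f"How to test it was built correctly: {item.how_to_test}\n"
        f"How to monitor it in production: {item.monitoring}\n"
    )
```

The monitoring field is where items like endpoint health reports, call frequency, and response times get written down, so they are in the plan instead of depending on the agent to suggest them.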
One more habit that pays off: when you and the agent work through a painful problem and finally fix it, the next instruction is "save that to your memory." Have it summarize the problem and the fix and store it so you never debug it from scratch again. That memory is what gives you long-term speed and reliability.
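A sketch of what "save that to your memory" can look like when the memory is just files: one markdown note per solved problem, plus a keyword lookup to run before starting a fresh debugging session. The memory directory, the slug scheme, and the function names are assumptions for illustration; agents with a built-in memory feature may store things differently.

```python
import re
from datetime import date
from pathlib import Path


def save_memory(memory_dir: Path, title: str, problem: str, fix: str) -> Path:
    """Store one solved problem as a small markdown note and return its path."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    path = memory_dir / f"{slug}.md"
    path.write_text(
        f"# {title}\n\n"
        f"Saved: {date.today().isoformat()}\n\n"
        f"## Problem\n{problem}\n\n"
        f"## Fix\n{fix}\n",
        encoding="utf-8",
    )
    return path


def recall(memory_dir: Path, keyword: str) -> list[Path]:
    """Find saved notes that mention a keyword (case-insensitive)."""
    if not memory_dir.exists():
        return []
    needle = keyword.lower()
    return sorted(
        p for p in memory_dir.glob("*.md")
        if needle in p.read_text(encoding="utf-8").lower()
    )
```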
What part of a software project does AI not make cheaper?
Look first at anything where human-to-human contact still matters, because that is often exactly where the value is.
Product discovery is the clearest case. A vendor can show you a hundred ways to build something, but you and your customer should be the ones deciding which two or three are right and validating them against the market. Plenty of the surrounding product work does automate well: building estimates, backlogs, and user stories, all the artifacts that define the work for a human or an AI team. But you still want a product manager interviewing the customer. We have AI interview us all the time and it is efficient and thorough, yet most customers do not want to talk to a bot with a voice. It is awkward, even when it is impressive.
Testing is the next one. Does the thing you paid for actually work, not just look like it works? A good exploratory tester finds the strange edge cases, and that role does not disappear.
Go-live support is the third. When you put a real, new capability in front of real users, there will be issues. You made assumptions, even after talking to the customer, and some of them turn out to be awkward or slow in practice. That is when you need the service side of delivery, not just the build.
The honest summary: almost every part of software and product development can be done with a lot more output in a lot less time, which should mean a lot less money. It does not get rid of humans. It empowers them to go faster. The exceptions are industries that do not allow it yet, like parts of gaming and film protecting their IP. The likely outcome is not fewer people, it is more projects.
When is running agents in parallel actually worth it, and when does it just create more cleanup?
It depends on the person and the setup, and there are real tradeoffs at every level.
Capacity is personal. One person might manage five tabs across different branches and codebases. Someone else can handle ten. The moment you add another human, you also add the cost of staying in sync, so the math changes.
Every configuration trades one set of problems for another. Five tabs on one file system creates conflicts. Five tabs on five separate file systems removes those conflicts and creates new ones. Moving to a harness where the agents run themselves removes that set and introduces a different set again. None of this means the old discipline goes away. Pull requests and branch management still matter. They just go faster.
The way to make parallel work pay off is to make each track run long enough that you are not constantly babysitting it, and give each one a plan that includes its own testing, an adversarial tester, and a reviewer, so it surfaces its own issues. A useful pattern: keep one main thread you actively nudge on the hard problem, a second thread where AI is grilling you on the plan for the next big thing, and two or three unrelated tracks that cannot collide, plus maybe one window for small bug fixes and safe merges. That structure reduces stress because you trust the big thing is moving without needing you every ten minutes. Prepare those, start them, and go to bed.
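Part of what makes the "cannot collide" tracks safe is checking that claim before starting them. The Python sketch below assumes each track declares the files or folders it may touch (a hypothetical convention, not a feature of any particular harness) and reports every pair whose areas overlap, including a folder that contains another track's file.

```python
from itertools import combinations
from pathlib import PurePosixPath


def overlaps(a: str, b: str) -> bool:
    """True if one path is the same as, or inside, the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


def collisions(tracks: dict[str, list[str]]) -> list[tuple[str, str, str, str]]:
    """Every pair of tracks whose declared areas overlap, with the offending paths."""
    found = []
    for (name_a, paths_a), (name_b, paths_b) in combinations(tracks.items(), 2):
        for a in paths_a:
            for b in paths_b:
                if overlaps(a, b):
                    found.append((name_a, name_b, a, b))
    return found
```

An empty result means the tracks can run side by side; anything else is a sign to fold those two tracks together or give one of them its own branch and working copy.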
There are two real costs. The first is mental. Running five to ten parallel agents is exhausting, and at some point you cannot deal with anything afterward. That is a genuine limit, not a character flaw. The second is that parallel means you are no longer driving. The agent is driving itself, and you are trusting it to do the right thing, which it does not always do. You can wake up to a pile of emails where it got stuck on a problem, made no progress, and quietly ran up the bill.
Do the economics actually add up? The bills add up too.
They do add up, and the frustration is mostly a trick of scale.
You get genuinely annoyed at a sixteen-dollar bill. Be realistic: it is sixteen dollars, not two million. That work replaced a product manager, a designer, multiple engineers, and an architect. It is already cheaper by an enormous margin. The strange part is that we can now go very fast for very little, and we get angry on the rare occasion it gets slightly expensive.
The market underneath is in motion, which works in your favor. Anthropic is straining to scale its infrastructure and keeps adjusting pricing and policy, so there is a gray zone where certain heavy work quietly gets billed against your API even with subscription tokens left. At the same time, OpenAI's Codex shipped a release performing on par with the top Anthropic models, and several open source models are reaching similar results, though running those moves the cost from a subscription to infrastructure or a beefy machine you own.
So the guidance is: do not get good at one provider. Get good at driving the tools, because the tools change constantly. Do not make the model the cheese. Make your process and your learning the cheese, so you can swap providers in and out without getting stuck.
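A minimal Python sketch of keeping the process as "the cheese": the workflow depends only on a small interface, and each vendor sits behind it. ModelProvider and the two echo providers are hypothetical stand-ins, not any vendor's real SDK; in practice each would wrap that vendor's own client library.

```python
from typing import Protocol


class ModelProvider(Protocol):
    """The only thing the workflow is allowed to know about a model."""
    name: str

    def complete(self, prompt: str) -> str: ...


class EchoProviderA:
    """Hypothetical stand-in for one vendor's client."""
    name = "provider-a"

    def complete(self, prompt: str) -> str:
        return f"[a] {prompt}"


class EchoProviderB:
    """Hypothetical stand-in for a second vendor's client."""
    name = "provider-b"

    def complete(self, prompt: str) -> str:
        return f"[b] {prompt}"


def run_task(provider: ModelProvider, project_notes: str, task: str) -> str:
    # The process and the accumulated project knowledge live here, not in the
    # provider, so switching vendors is a one-argument change.
    prompt = f"Project notes:\n{project_notes}\n\nTask: {task}"
    return provider.complete(prompt)
```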
And start small. Get one piece working and producing the results you want before you try to parallelize anything. These are new tools, you are re-educating yourself on how they behave, and it is not always intuitive. Build up, and always check your work.
How do I stay tool-agnostic when pricing and models keep changing?
Own your knowledge layer, and the rest gets easy to swap.
The instinct to go fully agnostic can backfire. Going all-in on an open, model-agnostic setup sounds right until you notice a subscription gives you something like 30x the usage of paying per API call. Be realistic and chase the best bang for the buck, even if that means using a first-party tool like Claude Code today.
What lets you stay flexible is not avoiding the good tools. It is refusing to let your context live inside them. The magic is memory and knowledge of the system being worked on. When you spin up a fresh agent, does it start with amnesia, or can it find the breadcrumbs to learn everything?
The way to guarantee the latter: have the AI build its own wiki and memory system inside the code repository. Plain markdown, organized hierarchically, far more thoroughly than a person would bother with, because the model does not get lazy about it. Every time something breaks, save the fix there. Now if you open two different tools side by side, both read the same reference information, and none of it is sitting on one vendor's servers. You own the synthesized knowledge, so you can take full advantage of each tool while staying free to switch based on the current deal of the month. It does take intention to set up, but it is where everyone ends up, because the tools are not going away.
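The hierarchical wiki pays off when any tool can find its way around it. Here is a minimal Python sketch, assuming the wiki is a folder of markdown files inside the repository, that regenerates a nested index.md mirroring the folder structure. The layout, the index format, and the rule that a page's title is its first markdown heading are all assumptions.

```python
from pathlib import Path


def page_title(page: Path) -> str:
    """First markdown heading in the page, or the file name if there is none."""
    for line in page.read_text(encoding="utf-8").splitlines():
        if line.startswith("#"):
            return line.lstrip("#").strip()
    return page.stem


def build_index(wiki_root: Path) -> str:
    """Render a nested markdown list of every page, grouped by folder."""
    lines = ["# Project wiki index", ""]
    seen_dirs: set[tuple[str, ...]] = set()
    pages = sorted(wiki_root.rglob("*.md"), key=lambda p: p.relative_to(wiki_root).parts)
    for page in pages:
        rel = page.relative_to(wiki_root)
        if rel.parts == ("index.md",):
            continue  # do not list the index inside itself
        # Emit each folder once, as a parent list item, before its first page.
        for depth in range(1, len(rel.parts)):
            folder = rel.parts[:depth]
            if folder not in seen_dirs:
                seen_dirs.add(folder)
                lines.append(f"{'  ' * (depth - 1)}- {folder[-1]}/")
        indent = "  " * (len(rel.parts) - 1)
        lines.append(f"{indent}- [{page_title(page)}]({rel.as_posix()})")
    return "\n".join(lines) + "\n"


def write_index(wiki_root: Path) -> Path:
    """Write the index to index.md at the wiki root and return its path."""
    index = wiki_root / "index.md"
    index.write_text(build_index(wiki_root), encoding="utf-8")
    return index
```

Running write_index after each saved fix keeps the entry point current for whichever agent, from whichever vendor, opens the repository next.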
Should I bring you a half-built product to fix, or bring ideas first?
Both are valid, and a rescue is not the only reason to bring something half-built.
Vibe coding is not bad, and a working MVP does not automatically mean something broken. Long before AI, startups built an MVP to get funding and customer feedback, then came to us a year or two later saying the experience needed expertise and it was time to think about production, because the single server was wobbling and performance was lagging. That was a good thing that got built and rarely needed a full rewrite. The same holds now whether a human or an agent did the building. Sometimes you are not fixing anything. You just need to scale the idea.
The bigger advice, whatever your role: push AI into everything, into the people around you, and into your vendors. The old vendor conversation was "who are you, and what protects me when you go rogue?" Extend it. Ask how a vendor uses AI to satisfy your requirements faster, and how they do it without leaving broken things behind as they build new ones. It takes some experience to ask the right questions, and those questions help you tell which vendors are further down this road and which have not started.
What about platforms like Replit? Are they easier or harder to rescue later?
That whole category gets you a real distance and produces working code, and it shares two limits.
The first is control over detail. These tools are happy to collect data and generate an experience. They are not good at the precise, picky problems: this button is slightly greener than that one, these corners are square instead of round, this form is not capturing data correctly, the schema is not shaped the way you need. It becomes the 90 percent problem. You spend 90 percent of your time getting it working, then spend another "90 percent" on the last 10 percent making it behave exactly the way you want, because you do not have the control you would have driving an agent in a traditional coding environment.
The second is the fixed stack. A set solution means more out-of-the-box value, but at some point what you want will not fit the stack, and bolting on a different technology may simply not be supported. That is the tradeoff.
None of that makes them bad. They get you up and running fast enough to answer the real question early: is this the right thing, and now that I can see it in bits and bytes instead of a wish on a PowerPoint, is it doing what I had in my head? For that, they are great.
Can AI handle something as detail-sensitive as branded presentations?
It can, but only after you do the hard part of teaching it your taste.
Here is a real example of the tension. If you care about brand and want every output exactly on point, that fights against wanting to generate quotes and presentations fast. The off-the-shelf "turn my ideas into a deck" tools never got the granularity right: the story being told, the look and feel, the colors. Even the AI built into the tools we already live in, like Google Drive, produces things you cannot start from, let alone use.
The fix was to stop pointing and clicking and build a tool instead. A full day, and a full day of credits, went into making something that takes the human inputs you would normally use to build a deck, works through phases to shape the story, and then plans that story into the specific slide types in a real Google Slides template. It took creativity to teach it where each element goes. The team will attest the output is great. Now anyone can hand it all the transcripts from every meeting with a customer. It pulls out what that customer needs, speaks to their pain points better than a person could, and pulls in images, their PowerPoints, and their PDFs. It finishes in minutes or hours what used to take days, polished and in the right place. The point is not the deck. It is that the precise, brand-sensitive work people assume AI cannot do becomes possible once you invest the time to teach it.
What is the "second brain," and why does owning your own context matter?
The second brain is the set of artifacts that lets AI act as you, and the recurring lesson is to own it yourself.
It is the same idea as giving every code repo its own wiki that serves as the brain of the project, applied everywhere. We talk about context in terms of the AI's window, but the real thing is the knowledge about each project plus your own overall preferences. Feed it the artifacts it needs to be you, and at minimum you get something to react to instead of a blank screen.
There is a half-joking label for where this leads: turning into a "freedom hippie" who wants to own all of their own information. It is funny coming from people who never expected to think that way, but the freedom and ownership are the point. Once you own that information and bring it in, the outputs match your preferences far more closely. The flip side explains why mental load feels so heavy. When you are not getting the outputs you want, you are constantly tweaking and wrangling, producing more content but still having to tell it what to do, and you end the day spent.
Does AI change who the most valuable engineer is?
It flips an old rule on its head.
When you manage programmers, you occasionally meet someone far smarter than everyone else whose instinct is "I can build it myself, why would I use an open-source tool maintained by thousands of people?" That person used to be a fire starter, the one you did not want to hire, because reinventing everything was a liability.
In this world that same instinct is often right. When you go looking at how Obsidian works, or any other reference for something like a second brain, and they all disagree, it is now genuinely doable to say "hold my beer, I'll make my own." When a harness like Paperclip works for one person but not for you, building your own is a reasonable move rather than a red flag. The thing that makes it safe is experience to guide the output. Without that, building your own is still a trap. With it, it is a fascinating time to be the person who wants to build the thing themselves.
If juniors skip the early grunt work, what happens to experience in ten years?
This is a real problem, and it is one we have flagged internally.
A consultancy that brings experts depends on experienced, confident people who can speak transparently about their experience and say no when they should. That confidence gets built by stepping across a lot of lily pads, the small early tasks that teach you the craft. The catch is that those exact tasks are AI's sweet spot, so they are disappearing right as fewer people sign up for computer science programs because they assume AI will take the job.
If your business thrives on talent, you have to contribute to building it, because that pool is likely to dry up. That is part of why this company started as a coding school: teach people, keep the ones who turn out to be exceptional as long as you can. In practice it means sitting with developers for hours, starting from scratch, letting them type every line, and teaching them how to learn about the AI while they use it. A lot of the work is changing the mindset from a blank screen and "what do I even ask it" to actively asking what the tool can do. Being transparent that the work is changing matters too, framed as caring about their skills for wherever they go next, not just for this company. Investing in people does not go away. It may be twice as important.
How do you cut through the AI hype and the doom online?
Follow a lot of people, take all of it with a grain of salt, and form your own opinion.
There is real worry and fear out there, including documentaries about what to be afraid of, which is exactly why these conversations are worth having: to show in real time what actually happens when you use these tools, instead of the YouTube version telling you a tool is amazing or that the world is going to crap. Some of the loudest negative voices make money from being negative and do not have the hands-on experience to back the opinions they are spreading to a large following.
That does not mean avoid the critics. Sit in some negative channels, because they are thought-provoking, just do not let them be your whole diet. The honest takeaway from two people who have been building software for decades is that this is an incredibly exciting time, if you are wired to enjoy learning. Every week there are thousands of new things to learn. If "who moved my cheese" bothers you, this will be a hard stretch. If you are curious, it is the best kind of problem to have.