I built my own agent, called AIgernon. What sets it apart is the cognitive framework layered on top: depending on the language of the request (or a direct command), the agent operates in one of three main realms, Assess, Decide, or Do.
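To make the realm idea concrete, here is a minimal sketch of how a request might be routed to Assess, Decide, or Do. It is not AIgernon's actual implementation: the `Realm` enum, the `CUES` word lists, the `/do`-style command syntax, and `pick_realm` are all hypothetical names chosen for illustration, and a real framework would likely use something richer than keyword matching.

```python
from enum import Enum


class Realm(Enum):
    ASSESS = "assess"
    DECIDE = "decide"
    DO = "do"


# Hypothetical cue words for each realm (an assumption, not the real framework).
CUES = {
    Realm.ASSESS: {"review", "explore", "compare", "evaluate"},
    Realm.DECIDE: {"choose", "pick", "plan", "prioritize"},
    Realm.DO: {"build", "fix", "ship", "implement"},
}


def pick_realm(message: str, default: Realm = Realm.ASSESS) -> Realm:
    """Pick a realm from a direct command or, failing that, from the wording."""
    text = message.strip()

    # A direct command such as "/do fix the login bug" wins over language cues.
    if text.startswith("/"):
        parts = text[1:].split(maxsplit=1)
        command = parts[0].lower() if parts else ""
        for realm in Realm:
            if realm.value == command:
                return realm

    # Otherwise, look for the first realm whose cue words appear in the message.
    words = {w.strip(".,!?").lower() for w in text.split()}
    for realm, cues in CUES.items():
        if words & cues:
            return realm

    # Nothing matched: fall back to the default realm.
    return default
```

In this sketch an unrecognized message falls back to Assess, on the assumption that looking before acting is the safer default.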
Based on your distinction, I go back and forth constantly between intern and contractor. At the very beginning of an app it’s mostly intern: I check the output and the code, and I give very detailed specs. As the app matures (say, after we have had 2-3 versions in the App Store), I switch to contractor: I set a specific goal and let it work until it achieves it.
I’m wary of letting it run fully autonomously because I’m not convinced agents really reason the way we do. They can be good, or even very good, autocomplete Turing machines, but they cannot handle the unexpected the way we do. And I don’t think it’s even possible for us to formalize our own reaction to the unexpected.
The upfront design problem is the real bottleneck the intern/contractor/peer spectrum exposes. Defining the contract or the reward function forces you to understand the task well enough to specify what good looks like. That's the hard part. The sandbox is easy. Knowing exactly what success means before the agent starts running is where most tasks fall apart. Intern mode is comfortable because it lets you figure out what you actually want as you go.
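One way to picture the point about contracts is a sketch in which the success check has to exist before the agent runs at all. Everything here is an assumption made for illustration: `Contract`, `run_contract`, and `fake_agent` are hypothetical names, and the fake agent simply stands in for a real one.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Contract:
    goal: str
    # What "good" looks like, decided up front. Writing this is the hard part.
    is_success: Callable[[str], bool]
    max_attempts: int = 5


def run_contract(
    contract: Contract, agent_step: Callable[[str, int], str]
) -> Tuple[bool, int]:
    """Let the agent retry until the success check passes or attempts run out."""
    for attempt in range(1, contract.max_attempts + 1):
        output = agent_step(contract.goal, attempt)
        if contract.is_success(output):
            return True, attempt
    return False, contract.max_attempts


def fake_agent(goal: str, attempt: int) -> str:
    # Stand-in for a real agent: it "fixes" the tests on the third attempt.
    return "tests: 3 failed" if attempt < 3 else "tests: 0 failed"


contract = Contract(
    goal="make the test suite pass",
    is_success=lambda out: out == "tests: 0 failed",
)
result = run_contract(contract, fake_agent)  # (True, 3) with this fake agent
```

The loop is trivial, which matches the comment above: the sandbox is the easy part, and all the difficulty sits in writing `is_success` before you know what the agent will produce.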
Fully agree with the assessment. Thank you for reading!