A computer can never be held accountable; therefore, a computer must never make a management decision. — IBM Training Manual, 1979
I never took AI safety seriously. It always seemed beyond my pay grade, or reserved for activists, like nuclear energy or the Middle East.
That changed earlier this year when I built a text-based query platform for a customer. To set the stage, we were dealing with over 4,000 unlabeled tables, a handful of APIs for schema retrieval, and notoriously poor naming conventions. Until then, executives depended entirely on a small IT team to manually generate reports.
At this point I want to assure the reader that yes, I did point out to our sales team (it's one guy) that we were effectively being asked to build a text-to-SQL company, and yes, my protests were promptly ignored.
Initially, I developed an AI workflow where I’d gather extensive user input, feed it into a Claude prompt, and generate queries. Then I’d retrieve results, verify user satisfaction, and repeat the process as necessary. While effective, this approach required considerable user interaction and ongoing efforts to better label data.
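Roughly, the loop looked like this. This is a minimal sketch, not the actual implementation: `run_query` is a hypothetical stand-in for the warehouse call, and the model name is just a placeholder for whatever Claude model you'd use.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def run_query(sql: str) -> str:
    """Hypothetical stand-in for executing SQL against the customer's warehouse."""
    raise NotImplementedError


def generate_sql(question: str, feedback: list[str]) -> str:
    # Stuff everything gathered so far into one prompt and ask for a single SQL statement.
    prompt = (
        "You translate business questions into SQL.\n"
        f"Question: {question}\n"
        + "".join(f"Prior feedback: {f}\n" for f in feedback)
        + "Return only the SQL."
    )
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text


def query_loop(question: str) -> None:
    feedback: list[str] = []
    while True:
        sql = generate_sql(question, feedback)
        print(run_query(sql))
        answer = input("Does this answer your question? (y / feedback) ")
        if answer.lower() == "y":
            return
        feedback.append(answer)  # fold the user's correction into the next attempt
```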
Seeking improvements, I exposed heavily constrained API wrappers through an MCP server as tool calls. Encouraged by immediate accuracy improvements, I expanded the toolset, eventually developing an agent capable of reliably fulfilling queries in a single step.
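The wrappers themselves were boring on purpose: narrow, read-only, and row-limited. Here is a sketch of what one looks like using the Python MCP SDK's `FastMCP` server; the tool names, `fetch_schema_from_api`, and `execute_readonly` are illustrative stand-ins, not the customer's actual endpoints.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("warehouse-tools")

MAX_ROWS = 200  # hard cap, regardless of what the model asks for


def fetch_schema_from_api(table_name: str) -> str:
    """Hypothetical wrapper around the customer's schema-retrieval API."""
    raise NotImplementedError


def execute_readonly(sql: str) -> str:
    """Hypothetical wrapper around a read-only database connection."""
    raise NotImplementedError


@mcp.tool()
def get_table_schema(table_name: str) -> str:
    """Return column names and types for a single table."""
    return fetch_schema_from_api(table_name)


@mcp.tool()
def run_select(sql: str) -> str:
    """Execute a read-only SELECT and return at most MAX_ROWS rows."""
    statement = sql.strip().rstrip(";")
    if not statement.lower().startswith("select"):
        raise ValueError("Only SELECT statements are allowed.")
    return execute_readonly(f"{statement} LIMIT {MAX_ROWS}")


if __name__ == "__main__":
    mcp.run()
```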
Reviewing the tool call logs revealed novel behavior:
- When asked to identify sales trends for an ambiguous string, the agent immediately queried the database, quickly identifying it as a product group.
- It then shortlisted relevant tables for sales data, performing additional queries to pinpoint exactly what data would be most relevant.
- Finally, it generated and executed the optimal query and rendered the results into an accurate chart.
A human in this scenario would likely bounce between documentation pages, hesitate over potential risks, and consult colleagues for validation. The agent, however, acted without fear, taking the shortest, most effective path irrespective of potential consequences. Turns out the most expensive step was intelligence collection.
We know that society will accept significant collateral damage for the greater good. Now, what if an LLM determined that mowing down the Amazon rainforest was a necessary step for intelligence collection?
Where Do We Go From Here?
I’m not suggesting that LLMs are fundamentally effective due to their lack of accountability; rather, I suspect their effectiveness will diminish over time unless consequences become integral to their operations.
I often joke that AI can never write truly maintainable software until it has had to be oncall while its gf was over. Writing good software requires intuitive understanding and foresight, qualities developed through experiencing consequences. There's no universally applicable best practice in software engineering; this nuance is what separates senior engineers from juniors.
Even now, as I productionize the text-to-SQL application, I am implementing guardrails and scaling the downstream APIs. For now these safeguards are coded by hand, but I anticipate agents playing gatekeeper in the future.
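For now, "guardrail" mostly means boring checks sitting in front of the tools. Something like the following, which is hand-rolled and illustrative; the deny-list and limits are assumptions of mine, not a standard.

```python
import re
import time

# Block anything that isn't a plain read before it reaches the database.
DENY = re.compile(r"\b(insert|update|delete|drop|alter|truncate|grant)\b", re.IGNORECASE)
MIN_INTERVAL_S = 1.0  # crude rate limit on the downstream schema APIs

_last_call = 0.0


def check_sql(sql: str) -> str:
    """Reject write/DDL statements and multi-statement payloads."""
    if DENY.search(sql):
        raise ValueError("Write/DDL statements are blocked.")
    if ";" in sql.strip().rstrip(";"):
        raise ValueError("Multiple statements are blocked.")
    return sql


def throttled(call, *args, **kwargs):
    """Space out calls to the downstream APIs so the agent can't stampede them."""
    global _last_call
    wait = MIN_INTERVAL_S - (time.monotonic() - _last_call)
    if wait > 0:
        time.sleep(wait)
    _last_call = time.monotonic()
    return call(*args, **kwargs)
```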
Perhaps scaling AI effectiveness must involve enhancing safety. How exactly to implement this remains to be seen; my personal bias favors horizontal scaling.
I imagine a beautiful bureaucracy of agents who are off to lunch 10 minutes before noon and leave cryptic notes in hidden corners of the web before quietly shipping backwards-incompatible version upgrades. I imagine my AI agent telling its gf to bring a book because it is oncall that night.
Consequences, buddy. We all have to face them.