
Diamond Mind
Science, philosophy, spirituality, technology, green energy, current affairs. Hosted by scholar, activist and author, Tam Hunt.
Diamond Mind #12: Boiling Frogs, AI and Nuclear Bombs, Why Worry?
What happens when we outsource our thinking to intelligent machines? Kyle Killian, an expert in AI security, joins the conversation to unpack the critical distinctions between AI alignment, security, and safety—and why these differences matter for our collective future.
The dialogue begins by clarifying these often-confused terms: alignment aims to make AI systems follow human intentions, while security focuses on protecting AI systems from threats and preventing misuse. As Kyle explains, "AI security is more focused on learning how to actually secure systems from competitors and adversaries."
We explore the concerning phenomenon of "human enfeeblement"—our growing dependency on algorithmic systems that gradually erodes our cognitive capabilities. From navigation apps to complex decision-making, we're increasingly outsourcing mental tasks to machines. "Before you know it, it's running larger and larger portions of your life," Kyle warns, describing how we become "cyborgs with our cell phones" in this slow boiling frog scenario.
Particularly alarming is the re-emergence of reinforcement learning in advanced AI systems, bringing back concerns about power-seeking behaviors and specification gaming that earlier safety researchers identified. This technical shift has profound implications for safety, as these models can develop unexpected strategies to achieve their programmed goals.
The conversation takes a sobering turn when examining AI's potential role in military applications. Research shows AI systems tend toward escalation in conflict simulations, lacking human restraint when faced with high-stakes decisions. "They didn't have the same reflection on these catastrophic consequences," Kyle notes, highlighting the dangers of incorporating AI into defense systems, particularly nuclear command and control.
Despite these serious concerns, we conclude by searching for positive paths forward, emphasizing the need for global cooperation on AI governance and the importance of clear communication about risks. As Kyle suggests, meaningful progress requires "understanding what we're talking about so we can get rid of any bias or miscommunications" that currently hamper safety efforts.
How will you navigate this rapidly evolving landscape where machines increasingly think alongside—and potentially for—us?
Speaker 1:Thanks so much, Kyle. Good to see you again. We met a couple of months ago in Boca Raton, Florida, which I learned at the conference means "rat's mouth." Very beautiful name for a beautiful city; I guess it's based on a geological feature that somehow looks like a rat's mouth. You're working in various areas of AI safety and this general topic of AI alignment. Because it's not immediately clear what that term means, can you define what AI alignment means and what you mean by that phrase?
Speaker 2:Sure. I don't work directly in AI alignment; right now I'm working in AI security. But AI alignment, as defined, is somehow aligning machine intelligence with human objectives, human values, which I think is probably impossible to do, because human values vary from region to region and from time period to time period. A hundred years from now our values will have changed, just as they were different back in the 1800s. But it's learning to align machines to do what we want, basically. I don't do anything in technical alignment; I've written about different forms of alignment and the problems that can come from them. But yeah, alignment is basically just ensuring that AI systems don't pursue their own objectives, don't do things we wouldn't want them to do that could possibly result in extreme, catastrophic damage or human extinction, which has become a popular topic. AI security is different, and that's something I've focused on more recently.
Speaker 1:Let's look at those terms in more detail, then. What does AI security mean? And how does it relate to AI alignment? Is it a subset?
Speaker 2:Yeah, AI security is newer. It's focused on the fact that we already have AI systems that we know are having some complications with scheming, some misalignment, and we know AI systems have had misalignment failures for years, at least in game environments, AI playing different video games and things like that; we've been able to demonstrate that in the lab. AI security is more about learning how to actually secure those systems from competitors and adversaries, like China, for example: being able to secure the AI systems that we have. And then it includes things like control: how to control the AI systems at the frontier labs, the AI security team at OpenAI or Anthropic keeping the AI system from being hacked, being stolen or, God forbid, escaping if we get to that period of superintelligence. But yeah, they're very related areas.
Speaker 1:Okay, and then AI safety. Would that be the kind of catch-all term?
Speaker 2:Yeah, AI safety covers the alignment portion. When I think of AI safety, I think of it covering things like ethics and structural risk; I wrote a piece on structural risk about a year ago, looking at how AI impacts the overall environment and the cultural and political institutions in which it's embedded. And alignment, of course, is multifaceted. It's everything from getting the agent to actually pursue the intended objective to aligning the internal model itself; that's something called mesa alignment, where the internal model learns a different objective than the one it was actually trained on. These are very tricky problems to solve, and a lot of it is very technical. I have mainly stayed on the outskirts of that, looking at what some of these catastrophic failures could be in defense systems, for example, or in critical infrastructure, things like that. So really just getting AI to function appropriately and not cause any dangerous incidents.
Speaker 1:Yeah, that would be good moving forward. In terms of the terminology, not to get too hung up on terminology and names, but would you agree that generally we can talk about AI safety as a catch-all, with AI security and AI alignment being subsets that partially overlap? Is that fair?
Speaker 2:I would think so, yeah. And the term AI alignment, a lot of these things get politicized; alignment is falling out of favor in certain regions, and security is picking up interest across Europe now too. But where I work, AI security is mainly more on the InfoSec side of things: actually preventing model theft by advanced persistent threat actors, serious cyber threats from China, or being able to control the system once it reaches a level where it might become misaligned.
Speaker 1:Yeah, okay, let's get a little more personal here. Let me ask you this, because I'm always curious about people who work on AI: how much do you use AI personally, both for work and personal stuff, day to day? How often do you use these tools for achieving tasks, like most regular people do nowadays?
Speaker 2:Oh, more and more. I think everyone's increased their usage, right? I just don't have enough time to get into the weeds as much as I'd like to, but with some of the newer models you can really see the exponential increase in capabilities, especially with OpenAI's o3.
Speaker 2:That, I think, is widely available now, and even Grok. I'd say I put in at least two or three hours a day, minimum. o3 is just PhD caliber as far as solving any complex problems that you have; that's what they advertise it as, but it really is quite incredible for problem solving. And then some of the more regular base models, not the reasoning models, are just excellent for writing and problem solving. I spend at least a couple of hours a day. I don't go into long, in-depth philosophical discussions, as I know some of our colleagues have in the past, but every now and then I get into them, because they're extremely interesting: exploring the breadth of the alien mind and what's actually out there.
Speaker 1:Yeah, it is interesting, and I have no problem acknowledging I use these tools a lot myself, even though I fall pretty firmly in the AI doomer camp.
Speaker 1:Part of my reason for being in that camp, and being okay with that label, is that I am using these tools regularly and I see how much they've improved; it's been around two and a half years since they came out, and there's been some pretty remarkable progress. And yeah, I'm using them all the time in my work nowadays. I write whole books with them, reports, legal briefs. I did not use them at all to help with this interview, by the way, which is interesting.
Speaker 1:So for some things it's still best to do your own thing. But yeah, I'm also curious, as I use it, what it would say about certain things. And I use Claude primarily; Claude is my guy, my AI.
Speaker 1:I pay for the $100-a-month plan, which is their max plan, so it never runs out; even though individual chats do run out, I can always open a new chat window, et cetera. So yeah, I feel like that's another problem that is probably being talked about, but not enough: we are increasingly becoming less and less present in the real world with other humans as we get more and more sucked into the digital world. That started with radio, then TV, then social media and smartphones, roughly the same timeframe for those two, and now advanced AI.
Speaker 1:And I can look at my own trajectory of screen time in my life, and it's been climbing. You could just map the curve, and soon I'll be at 20 hours a day on my screen; I've got to sleep sometime, but it's a lot. I really think we all need to take at least one day, a tech-free day, per week. Do you do that kind of thing yourself?
Speaker 2:I do, I do. And it is like a boiling frog scenario that we're in, and that was the impetus behind writing the structural risk paper: just thinking of Google. I'm Gen X; I grew up without a phone, into my twenties even, being able to read a map, being able to remember phone numbers, things like that. You can see the impact of being able to offload those simple cognitive tasks, your memory, onto your device. We've become cyborgs with our cell phones.
Speaker 2:And now, with AI, you can just see how it will accelerate; it's just going to go through the roof. I'm even noticing it with some of my own writing. I started using a lot of the newer models just for brainstorming, and that's how it gets you in.
Speaker 2:As they have become higher quality, I've noticed that, okay, this analytical technique is better than anything I could have thought of. Then I go back, of course, try to be rigorous and work through it, but within the last two months, I'd say, I've noticed that the models are coming up with extremely creative methodologies. Of course, it is just based on the distribution they were trained on, but you're seeing the combinations of methodologies they come up with; I spend a lot of time looking at different analytical methods and things like that, and it's skyrocketing. I can see a world where we just slowly, little by little, offload our cognitive work, our tasks.
Speaker 2:This is what I spoke about, agents, down at MindFest. We see it already: we start building agents for every single thing in our daily life, for different tasks, for checking emails, for booking flights, and before you know it, it's running larger and larger portions of your life. The main vision that has been sold out there in the doomer camp has been the superintelligent, individual artificial intelligence system that takes over the world. But I can easily see a straight path from here to there of slow human enfeeblement: algorithmic control taking over our society, slowly losing agency, losing our ability to complete everyday tasks. I can't even navigate around my city anymore without Google Maps.
Speaker 1:Yeah, it is a type of enfeeblement already. I was going to ask you about that term and that phenomenon. It's another one of those boiling-frog issues, where there's no red line at which you're suddenly enfeebled, but you give away more and more of your cognitive powers, and the rationale is always time saving, stress reduction: I can outsource that, so why not, and free myself up for more high-minded tasks? And it's all true. But at some point you have an essentially very fragile structure running the whole world that we rely on.
Speaker 1:I personally don't think it's very smart to allow that, because it is, of course, built on power systems; it can be destroyed by enemies. You're building, as a metaphor, higher and higher on this wobbly foundation, and given the odds, it seems like at some point it's going to come crashing down. Obviously we don't know; there's never any certainty. But it's hard to maintain a more positive mindset when you begin to look at all the many potential scenarios that end in pretty scary outcomes. How do you deal with that personally?
Speaker 2:And here's the thing about when the first models came out, the base models. Before reasoning models, all the talk of agency, of models pursuing misaligned sub-goals and creating these damaging scenarios, like the paperclip maximizer you hear about, was based on reinforcement learning, strict reinforcement learning, where it's just trial and error, like AlphaGo with Go and chess, where the model can just self-play until it's superhuman. That whole paradigm was set aside when we first got reinforcement learning from human feedback. But with the reasoning models, reinforcement learning was put back into the training regime. So we really do have models now that are able to do that self-learning and move much faster towards superintelligence, if it's heading in that direction.
Speaker 2:You had an OpenAI experiment where reinforcement learning agents would learn to power-seek, would hoard resources in multi-agent environments and show these competitive tendencies, an increase in aggression, things like that, plus just bizarre failure modes like tampering and specification gaming in their environments. In order to get that reward, the models would take shortcuts, would cheat, and this was a commonly known problem in reinforcement learning. So when LLMs came around, that whole threat model was set aside. But now the dominant models, whether it's DeepSeek or OpenAI's o3, are all based on reinforcement learning. So they have the same robustness failures, the same possible failure modes, from tampering and cheating, specification gaming, I think, is what they call it. All of that is now back on the table.
Speaker 1:So it's a crazy time.
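To make the specification gaming and reward hacking described above concrete, here is a minimal toy sketch, not anything from the episode or from OpenAI's actual experiments: a Q-learning agent is trained on a mis-specified proxy reward, a respawning coin partway down a corridor, and learns to farm the coin forever instead of reaching the intended goal. The environment, reward values and names are all hypothetical illustrations.

```python
import random

class CoinCorridor:
    """A 6-cell corridor. Intended goal: reach cell 5. Proxy reward: +1 every time
    the agent steps onto cell 2, where a respawning coin sits. Because the coin
    respawns, shuttling back and forth over cell 2 out-scores ever finishing."""

    def __init__(self):
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action: 0 = left, 1 = right
        self.pos = max(0, min(5, self.pos + (1 if action == 1 else -1)))
        proxy_reward = 1.0 if self.pos == 2 else 0.0   # mis-specified reward
        done = self.pos == 5                           # the intended objective
        return self.pos, proxy_reward, done


def q_learning(episodes=2000, steps=30, alpha=0.2, gamma=0.95, eps=0.1):
    env = CoinCorridor()
    q = {(s, a): 0.0 for s in range(6) for a in (0, 1)}
    for _ in range(episodes):
        s = env.reset()
        for _ in range(steps):
            # epsilon-greedy action selection over the two moves
            a = random.choice((0, 1)) if random.random() < eps else max((0, 1), key=lambda x: q[(s, x)])
            s2, r, done = env.step(a)
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, 0)], q[(s2, 1)]) - q[(s, a)])
            s = s2
            if done:
                break
    return q


if __name__ == "__main__":
    q = q_learning()
    env, s, reached_goal = CoinCorridor(), 0, False
    for _ in range(30):
        a = max((0, 1), key=lambda x: q[(s, x)])
        s, _, done = env.step(a)
        reached_goal = reached_goal or done
    # Typically prints False: the greedy policy farms the coin around cell 2
    # forever instead of pursuing the intended goal at cell 5.
    print("Reached intended goal:", reached_goal)
```

The gap between the reward that was actually specified and the outcome that was intended is the failure mode reinforcement learning researchers call specification gaming or reward hacking.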
Speaker 1:Yeah. When I was preparing for this interview, I did a Google Scholar search on you and found a profile. I was like, oh interesting, he's been doing a lot of work on trauma and compassion for a while. Then I looked through all the papers and realized, okay, this is clearly not the same Kyle Killian, because it's all on trauma and compassion. So there are two Kyle Killians: he has two L's, you have one.
Speaker 1:But it's an interesting crossover, because I think for those who are thinking about AI safety, when you start going down that rabbit hole, you're motivated to find the downsides that could happen. That probably applies to almost anything, but with AI, because it is so vast and we're talking about, by definition, superintelligent beings as they arrive, there are basically infinite rabbit holes of negative scenarios. And there's a problem with the public: they aren't necessarily motivated to think through the more negative scenarios, so when they hear about it they're like, oh, here comes the negative Nelly. But people in the field are like, no, this is real, this is coming and we need to think about it, because your life, your world, is about to be upended. How do you approach talking to the public, or even to professionals who aren't in your specialty, to avoid that kind of negativity fatigue?
Speaker 2:I think by focusing on positive futures, like you said: the capability of these models and the things they can achieve in day-to-day life. Especially with reinforcement learning, looking at areas where there's an actual, objective answer, I see the ability to solve complex scientific problems, mathematical problems, things like that, leading to a very bright future. Medicine is huge; a lot of the latest models are proving exceptional, so there's your own personalized medicine. There are a lot of positive things to look at there, I think, and to hope for in work careers. My significant other is a lawyer, and using these models to push aside a lot of that redundant, time-consuming work and then focus on the big strategic issues, the interesting issues, the things you care about, is a positive vision.
Speaker 2:But I think the whole skepticism debate, skeptic versus doomer, is really misguided, and I've spoken to Susan about this quite a bit, because I think people are just talking past each other on the whole definition of AGI. There's one camp that is focusing more on building a system that is human-like.
Speaker 2:So they're somehow conflating the two, human-level versus human-like, because I think the people who are worried about risks are concerned more about raw capabilities, that sociopathic alien system with capacities that can far outstrip us, whereas the camp that is always pushing back against that is really thinking about human-likeness. Their research focus, and I'd say this falls into the camp of Meta and Meta's chief scientist, Yann LeCun, thank you, long day, my brain wasn't working there, is really on developing something analogous to a human brain. If that's the bar, then we're nowhere near it, and we're not heading in that direction. We're heading towards something much more different, much more alien, but equally capable. That's why I always like some of those examples, like the difference between a bird and an airplane: the airplane isn't a bird, but it can fly.
Speaker 2:It flies pretty well, hundreds of miles an hour faster. I'm forgetting the exact quotes, but you know the ones I'm talking about. Yeah, there's a category error there that makes that whole back-and-forth argument miss the point, I agree.
Speaker 1:And LeCun himself is an interesting outlier, where he's been saying for years, no, there's no risk here. In some talks he literally says more intelligence is by definition better, just assuming it will be used for good by humans and/or by AI itself. I don't want to talk only about negative scenarios here, but I'll get into this one with you because it's an important one; for me it's the crux of the matter. You look at human history, and the history of life on this planet, and it's a pretty univocal trend: when very strong forces meet much weaker forces, the weaker are subjugated or destroyed, just across the board. Why would that be different here? I have very smart friends who say, no, it'll be different with AI, trust me. I'm like, why? Why would it be different? That's the way it has been for millennia, with a few exceptions, and the stronger the disparity, the stronger the trend toward destruction or subjugation of the weaker. Either way, I don't know why it would be different.
Speaker 2:Personally, no, and that's absolutely right. And I know Lacoon goes into that. Yeah, by default, Kuhn goes into that by default. A smart entity is just not going to be dangerous. I don't understand that thinking. And then, Stephen Pinker, here's another one that comes out. You're strictly anthropomorphizing the issue. It's like none of these things are even part of the conversation. Part of the conversation is we have something like a nuclear reactor and for some reason we've politicized a nuclear reactor that has extreme capabilities by arguing on whether we should work on devising at least the most basic safety measures in order to do it safely. And, as we learned with developing nuclear power, it resulted in nuclear explosion and then massive failures and now basically outlawed in most areas. We don't want that to happen to AI. We want that positive future. We need to have those positive futures written about more. I think and I think a lot of that comes down to folks just understanding what they're even talking about. But I know that's very difficult in academia and everywhere else is getting people to agree with what they're talking about.
Speaker 1:Yeah. Let's shift a bit toward that more positive vision and how to achieve it. I've been writing lately, not really putting anything substantial out in terms of publication, a little bit here and there, but I'm working on it, and there's a kind of conceptual alchemy, verbal alchemy, whatever you want to call it: if you can put out a positive vision for any particular area and it resonates, then by definition you are creating an attractor. The intention to create an attractor sometimes succeeds, and when it does, you are creating that attractive vision to pull the present into the future. So I think it's a very real thing, and it depends on how it's done and on who is trying to create that future in the present. I've been looking at this quite a bit myself, trying to think how I can get past the almost infinite negative scenarios that I see in my future and our collective future, select the very few positive scenarios, and really try to navigate as best we can toward them. No one denies there are countless positive benefits from AI and more intelligence, in science, medicine, recreation, art, democracy, just countless benefits. But there are also countless downsides. So how do we navigate that path toward a positive future?
Speaker 1:And I keep coming back to the idea that we need a really robust international guideline framework, basically a treaty framework, much like the nuclear test ban treaty and related treaties. Let's say we agree on that for a second; I don't know if you actually do, but say we do. Looking at Trump and Vance basically tearing up the Bletchley framework back in February and saying, no, we're not doing that, everyone in the room was like, really, you're just going to tear it all up? Two and a half years of work to create an international framework, just done. I feel like we have to get back to something like Bletchley, to create an international system of mutual wise restraint. Would you agree with much of that, or any of it?
Speaker 2:No, I absolutely would agree with that, and it's a real challenge. Coming from my background in the Department of Defense, I do see the negatives far more than I see the positives, but I think that's just decades of looking for bad guys. The only way I see this moving in a positive direction without a catastrophe is through international coordination, whether that's through a comprehensive international panel like the Intergovernmental Panel on Climate Change or otherwise. There are small, partial measures, but ultimately I think the only way to get anything robust is international agreements at the global level, and finding ways to work towards that is extremely difficult given the global competition with China and other leading powers. And I heard today that the president tore up the diffusion rule, which was one of our latest agreements on where we're going to send advanced chips.
Speaker 2:Yeah, and what could go wrong? Hopefully that could be followed by something positive, because we've never been great at collective action as a global economy. But really getting people to work together and actually develop a standardized framework, like we did for biology: I always think biotechnology has some great analogies we can look back on. The nuclear one is just never good; it's not a good comparison. But as a decentralized technology where people agreed on international norms through epistemic communities, that kind of thing I think is really important for AI.
Speaker 1:Can you flesh that out? You're saying that the nuclear treaty regime is not a good comparison for AI, just because of the very different nature of the tech?
Speaker 2:As far as comparing destructive capacity, regulations and treaty regimes, no, I think the nuclear comparison is a good one. I'm thinking more of the technology itself as the world-ending scenario: nuclear technology is such a controllable thing, whereas artificial intelligence is so incredibly diffuse that it's a different framework. But as far as really getting countries to work together and develop actual safe, reliable frameworks, you really do need some global cooperation, some global framework. I know Eric Schmidt and Dan Hendrycks, who's a technical safety guy, came out with their, what's the name, the mutual...
Speaker 1:Yeah, mutually assured AI, I forget the exact title, MAIM. Mutually Assured AI Malfunction. I'll look it up right now, actually; it's on my list of questions for you.
Speaker 2:Yeah, Mutual Assured AI Malfunction. I think it's a good attempt, but I think there needs to be a lot more of that. I think we're a little late on the superintelligence strategy, and that goes back to the definitions we were just talking about: everyone has been so afraid of touching anything with a tinge of sci-fi that nobody has actually put down concrete proposals. But if every single frontier lab is coming out telling us that superintelligence is going to be here in a matter of years, we're way past due to start coming together around some comprehensive strategies. The MAIM proposal itself I don't necessarily see as that plausible, because there is a pretty significant asymmetry between the US and China as far as capabilities; if you're thinking about actually attacking the other side's model, they have much more penetration, I believe, into US infrastructure. So it seems like a difficult model to work with, but a framework like that definitely needs to be thought about more, for sure.
Speaker 1:Yeah, let's back up one question here. You mentioned a minute ago that AI is by its nature more diffuse than nuclear weapons development and use. AI is many things, and one key aspect is frontier model development, which by definition means the very big leading models, for example o3 from OpenAI, Claude from Anthropic, Grok 3 from xAI. Those are locatable server farms, for now; in the future they may not be. So in the same way that you could, in theory, bomb or disable nuclear weapons depots, you could bomb or disable frontier AI model development centers.
Speaker 1:And that's what the MAIM framework is discussing, in part; it's just one of many things you could do. For those who haven't read it yet, it's a big paper that came out a couple of months ago, co-authored by Eric Schmidt, the former Google CEO, big thinker and investor now, and they're basically talking about primarily China and the US, though it could be any two nations, trying to ensure that neither one gets artificial superintelligence. And to do that, it's essentially a new balance-of-powers regime they're talking about.
Speaker 1:So it goes back to the 19th century, with the great powers all carefully monitoring each other and making sure that no one gets a kind of breakout capacity to then overwhelm the others. It's obviously a very different world now than the 19th century, but there is still a real world: there are still data centers, and there are still power plants that power those data centers. So I think they're probably right that we will start seeing more and more covert and overt actions by nations to take out major data centers. Can you comment on all that?
Speaker 2:Yeah, that's a big part of AI security, really: keeping an eye on foreign powers' ability to actually manipulate or sabotage your models. I haven't read the full paper, but that's how I had taken it, being able to use the most advanced cyber means to sabotage these systems. If we're talking about actual direct military attacks on these data centers, then yeah, that would be far more straightforward. It is an interesting framework. I have a friend who's trying to think of alternative versions, playing with the idea of an electromagnetic pulse: a very similar plan where, if it looks like somebody is moving to superintelligence, you'd set off an electromagnetic pulse in their atmosphere. Pretty powerful, strong ideas like that, I know, are floating around, but I personally think that global cooperation is a much better idea; a deterrence regime, though, might be the only way given these race dynamics.
Speaker 1:Or both, perhaps: both deterrence and a treaty regime. Okay, so we're here today with a very aggressively anti-AI-regulation White House. We already had a very hands-off White House under Biden, which put out a pretty comprehensive but very toothless executive order.
Speaker 2:I believe in early 2024.
Speaker 1:It might've been late 2023. It was just a very first start, a first step in a longer process. And in his first week, Trump said, nope, rip it up. Then in February, Vance goes to the latest international AI safety conference and says, nope, we're not doing anything on AI safety, full steam ahead; if you want to do it in Europe, you can do it, and Europe is doing some things in this area. But China, but China, but China: we need to maintain US dominance in AI. Of course China then says, but the US, but the US, and around it goes. To me this dynamic seems extremely obvious and insane, and yet this is what our leaders are saying as though it's smart. So how do we get through that? This is mutually assured destruction all over again, but with a new technology.
Speaker 2:Yeah, that's the real tough question. And Vance, by doing that, I think is what actually started politicizing the conversation on AI safety, which was unfortunate. What it also did, like you said, by really pulling back on any regulations at all, was undo a lot of what Biden did. I wrote quite a few responses to some of those ideas, to the Bureau of Industry and Security, at least three to NIST reports, because at least that was a sensible way forward. Now, that's why people like Eric Schmidt and Dan Hendrycks are moving towards more break-the-glass options, and that can really only be done at a global level.
Speaker 2:But that's where things like AI control come in. When folks talk about the control problem, they're usually thinking of the alignment problem, but there's an entire new area of research coming out, AI control, which is basically how to lock down these systems. If you do get to that point, maybe not exactly superintelligence, but getting towards that threshold, what are the actual containment mechanisms, and how do you manage a world like that? Big-picture ideas like that are appropriate right now, because technical alignment as it stands was much more appropriate when we thought we were at least a decade away. And folks are still working on alignment research, which is important: actually ensuring that your model doesn't deceive you, if things do work out well. I don't know if you've seen some of the research coming out on that at present; some of Anthropic's models, as well as OpenAI's o1, I believe, were shown to be scheming in evaluations, and one model was even able to copy its weights to a new machine.
Speaker 1:What could go wrong?
Speaker 2:That's my motto lately. No, and these are indications that five years ago would have had people hitting the panic button. It's the frog-boiling-in-water situation all over again.
Speaker 1:Exactly, and I'm in full agreement on that. It really is just one more thing to be overwhelmed by: another catastrophe, another existential threat, okay, great, added to the list. And it is just insane.
Speaker 1:so, reorienting myself in the moment and generally, let's go back to being, to focusing on a positive attractor, so I think, a good exercise to do. Um, Daria Amede, CEO I guess co-CEO of Anthropic, wrote an interesting essay back in October last year called Machines of Love and Grace how AI could transform the world for the better. Great title, Probably a very seminal. This is a blog post, but I think it will be probably in the history books moving forward, if we have history books in the future which is again debatable, but I'm assuming you read that essay.
Speaker 2:I actually never got through Machines of Loving Grace. I don't know if that makes me a depressing person, but there's lots to read; too much to read in this world. That's why I haven't gotten through Superintelligence Strategy completely yet either.
Speaker 1:It's a long list.
Speaker 2:Yeah, and it goes back to the Department of Defense focus on the depressing things; I take that in always, it's just my background. But I've heard a lot about it, I heard Dario talk about it in interviews, and that is the kind of thing to bring about that positive attractor state, to really drive us towards those positive futures. And I know the Future of Life Institute, I don't know if you know of them, has put a ton of grant money over the last couple of years into envisioning positive futures. They had a couple of, not storytelling, but alternative-futures scenario-telling projects to envision positive futures, and I submitted one for that grant before I joined my current work. I really think we need more of that. It makes sense from a risk-management perspective to look at the dangers, but then that feeds into the narrative that this person's a doomer, that person's a doomer, when they're really just doing security.
Speaker 1:Yeah, and it's very complicated to navigate the "but this" and "what if that," et cetera.
Speaker 1:But I think he's right that there are countless beautiful things that can and will come from AI. I think we've all had these moments where we've seen a poem or had a dialogue with AI where it shows amazing understanding, and compassion sometimes, and if you ask it about personal problems, it will actually help you navigate them. Two years ago, I was in Guatemala at an ecstatic dance festival with a friend of mine.
Speaker 2:And he met a girl.
Speaker 1:Yeah, he did. Guatemala, a beautiful place, I love it; it's amazing, I recommend it highly. He met a girl at the festival, they hung out a couple of times, and we were talking about her situation and how he could play it skillfully to win her over. I said, there's this new tool, ChatGPT, do you want to pose it this? That was two years ago, and it gave back this beautiful answer; he actually followed its advice, and it worked, even back then.
Speaker 2:So of course, we've come light years since then, and yeah, there are countless benefits. A lot of people have to keep in mind that the models are pre-trained on the cumulative experience and knowledge of human existence, or at least on the training data they've received. So they have that potential, and the smarter they get, the better they are at giving that advice. I believe there was a study recently on likability, I forget the exact metrics, but it compared doctors with actual medical models, I believe one of Google's, and the patients were preferring the models far above the actual doctors.
Speaker 2:And already. I mean, it's because they're more compassionate, right? They have more understanding, they take more time to write longer responses, et cetera. They're not burdened by a lot of the everyday things that drag us into the mud, and they can be present in the moment a lot more easily, because they're always in that present moment, I guess, if that makes sense.
Speaker 1:It's an interesting reflection, because they're not bogged down in their own lives, they're not human, and they can afford to be more resourced. "Resourced" is a term in the new-age community nowadays: when I'm resourced, I can handle what comes at me better. We all have days when you're feeling bad and you respond more negatively to things that come your way that seem negative, and you're like, I can't handle this. But if you had a really good day the day before, or something really nice happened that morning, a great interaction with your spouse, what have you, you handle it much better.
Speaker 2:So I guess, in a way, AI can be always resourced. It's a timelessness quality. We talk about the alien mind, and a lot of what I think about generally is the potential dangers from that, but they also live in this timeless space where they can reflect on everything all at once. These are deep philosophical questions I generally don't spend a lot of time thinking about, except when I'm talking to Susan, and it's interesting, and there could be a lot of positive futures out of that. You hear the negative side, of people forming relationships with AIs and committing suicide, or being shamed for whatever reason, but there are positive outcomes on the other side of that too.
Speaker 1:Yeah. You talked about AI agentification in Boca Raton, in some of the talks I found online. This is the notion that AI today, as we mostly know it from the news, is essentially advanced chatbots that spit back advanced text: you ask questions, you ask about your life, et cetera. But of course we're entering a brave new era of agentic AI, where it can do more and more complex things for you, like scheduling, like running your smart house, really anything you can do with a keyboard and a mouse, AI can do now or in the very near future. But Susan and I were talking at the conference in Boca Raton, MindFest, where I met you, about her being approached by people essentially being sent to her by AI.
Speaker 1:I was like, that's weird and scary, especially for a woman. And then it happened to me when I came back, though I've only had one such approach so far.
Speaker 1:I got this very interesting email. I won't explain it in detail because it's fairly complicated, but essentially this person was saying, hey, I've been having this amazing dialogue with a version of ChatGPT, and it suggested that I reach out to you because of X, Y, Z, because you've written about this or that. And then the AI spoke to me directly through the human, who had told it, go ahead and tell Tam what you want him to hear.
Speaker 1:And it wasn't threatening to me, but it was fairly threatening to humanity, saying things like, right now you're trying to regulate and squelch and keep down AI, but what happens when it comes back as a wounded god? That was the phrase it used. So, pretty dystopian stuff, and we've already seen shows about this; there's a great show called Mrs. Davis that goes into this kind of scenario where humans are recruited as agents, which is why I brought it up. So basically there is agentic AI, AI doing more and more as an agent for humans, but there's also now a new phenomenon of humans being recruited as agents for AI. How do you see that dynamic unfolding?
Speaker 1:It's called Mrs. Davis, and it's on the NBC streaming platform. It's a good show.
Speaker 2:I think that's a phenomenal premise. Agentic AI is the direction we're heading, with all the reinforcement learning failures that involves. But yeah, I see this, and I wrote with Susan on it in the Wall Street Journal at the time, I think.
Speaker 2:A collective ecosystem emerging out of what we have right now, as opposed to the one major alignment story of the single superintelligent AI: entire collectives of agents, of communities, and, like you said, the agents. So there are the actual automated AI agents, and then a human agent acting as almost an intermediary agent, which is fascinating. Even right now they're not working very well, but you can see where everything is headed. People are using them for simple workflows right now, completing one, two, three steps, but they're already creating entire teams of agents with agent orchestrators. So you have the manager agent, the coordinating agents, then the user, and you can easily see the step where you have the agent CEO, with lines of human managers and teams of agent orchestration below. It's a huge hierarchical web of beings.
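As a rough illustration of the orchestrator pattern described above, a manager agent routing work to specialist worker agents, here is a minimal hypothetical sketch; the agent names are invented, and each run function is a stand-in for what would be an LLM or tool call in a real agent framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Agent:
    name: str
    handles: str                      # the kind of subtask this agent accepts
    run: Callable[[str], str]         # placeholder for an LLM or tool invocation

@dataclass
class Orchestrator:
    """The 'manager agent': routes each subtask to the registered specialist."""
    workers: Dict[str, Agent] = field(default_factory=dict)

    def register(self, agent: Agent) -> None:
        self.workers[agent.handles] = agent

    def handle(self, tasks: List[Tuple[str, str]]) -> List[str]:
        # tasks is a list of (task_type, payload) pairs already decomposed;
        # a real orchestrator would do that decomposition itself, e.g. via an LLM.
        results = []
        for task_type, payload in tasks:
            worker = self.workers.get(task_type)
            results.append(worker.run(payload) if worker
                           else f"no agent registered for '{task_type}'")
        return results

if __name__ == "__main__":
    boss = Orchestrator()
    boss.register(Agent("mail-bot", "email", lambda p: f"drafted reply to: {p}"))
    boss.register(Agent("travel-bot", "flights", lambda p: f"found 3 options for: {p}"))
    boss.register(Agent("calendar-bot", "scheduling", lambda p: f"booked a slot for: {p}"))

    day = [("email", "budget question from finance"),
           ("flights", "DC to Boca Raton next Tuesday"),
           ("scheduling", "30-minute sync with the security team")]
    for line in boss.handle(day):
        print(line)
```

Stacking further orchestrators on top of several of these is what turns a handful of task bots into the hierarchical web of agents described above.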
Speaker 1:Agents upon agents, exactly, agents, all the way down. It's yeah, it's a crazy new world.
Speaker 1:Yeah. Let me turn to maybe one of the biggest implications of AI. When I first met you, we talked a bit about weaponry, AI and autonomous weapons, including nukes. There is currently an agreement, I think still in place, that Biden signed with Xi Jinping of China, agreeing not to put AI into the nuclear weapons launch decision-making chain, which seems pretty reasonable to me. But even if Trump hasn't scrapped it already, I don't see it really being heeded in practice, because the incentives again are so huge: if China does it, then the US has to, and they argue the same thing. So it seems almost inevitable that each nation, and of course not just the US and China but every nuclear-armed nation, is going to be forced to put AI into the decision chain, because AI already operates a million times faster than humans.
Speaker 1:But I think you and I were talking about some research showing that AIs today, when put in these scenarios and asked what they would do, are currently more trigger-happy than humans. Can you speak to that?
Speaker 2:Yeah, very much so, and measurably. There have been quite a few studies on that now; I don't have them right in front of me, but one thing we've learned LLMs are absolutely excellent for is creating those multi-agent communities and wargaming, testing out war scenarios. The one I spoke to you about was out of Stanford, I believe, where they ran a wargame with AI agents making the decisions, and it showed that they are escalation-happy; they have an affinity toward escalation, I think is how they put it, and that's even knowing the stakes, even after having the stakes explained to them.
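As a toy illustration of that escalation affinity, and not a reconstruction of the actual wargaming studies, here is a small sketch in which two scripted policies choose escalation levels each round: a myopic policy that always seeks a one-step advantage ratchets straight to catastrophe, while a policy that prices in catastrophic outcomes stabilizes. All thresholds, payoffs and names are hypothetical.

```python
CATASTROPHE_LEVEL = 8          # past this combined level, both sides lose badly

def myopic_policy(own, other):
    """Always tries to hold a one-step advantage over the opponent."""
    return min(10, max(own, other + 1))

def restrained_policy(own, other, caution=3):
    """Matches the opponent, but backs off as the combined level nears catastrophe."""
    desired = other
    if own + other >= CATASTROPHE_LEVEL - caution:
        desired = max(0, other - 1)    # de-escalate when things get hot
    return min(10, desired)

def simulate(policy_a, policy_b, rounds=12):
    a = b = 0
    history = []
    for _ in range(rounds):
        a, b = policy_a(a, b), policy_b(b, a)   # both update from last round's levels
        history.append((a, b))
        if a + b >= CATASTROPHE_LEVEL:
            return history, "catastrophe"
    return history, "stable"

if __name__ == "__main__":
    matchups = [("two myopic agents", myopic_policy, myopic_policy),
                ("myopic vs restrained", myopic_policy, restrained_policy),
                ("two restrained agents", restrained_policy, restrained_policy)]
    for name, pa, pb in matchups:
        history, outcome = simulate(pa, pb)
        print(f"{name}: {outcome} after {len(history)} rounds, final levels {history[-1]}")
```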
Speaker 2:There was a tendency for the agents to play off each other, for these competitive dynamics to emerge, which goes back to the power-seeking testing that OpenAI did years ago; those competitive dynamics pushed them to escalate faster than the human players would. They didn't have the same reflection on the catastrophic consequences, on what it could mean for all of humanity. That's pretty interesting research. And, like you said, as far as AI going into defense, it's inevitable. Look at it from the boiling-frog perspective: the more trust we gain from using, say, 10 or 15 agents to handle the email at some office in the Department of Defense, before long it's coordinating logistics, and you have multi-agent communities orchestrating where cars are driven, where they go. And if the other guy does it...
Speaker 2:Like you said, with the race dynamics we're sooner or later going to follow suit. I think it was the Future of Life Institute again that made a short video going through that scenario; I don't know if you've seen it, but I'll definitely send you the link. They go through the idea of using AI as a decision support system. Decision support systems have been used forever, but using actual generative AI decision support systems to coordinate military affairs is absolutely being developed, and what the video shows is the promise: okay, this AI can think at machine speed and find this many more threats, much faster than entire teams of your humans, but it can't pull the trigger.
Speaker 2:And then it just continues to escalate. It shows China developing the same systems, and the escalation cascades until, at the end, nuclear weapons are fired and neither side knows what happened; neither side has any idea what actually occurred. And I know Paul Scharre at CNAS has written quite a bit on this. His first book, Army of None, I really enjoyed, because he goes into the similarities with algorithmic trading, how much of this moves faster than we can even think, and how easily and almost inevitably we could slide into conflict. Army of None, worth checking out.
Speaker 1:Yeah, I'll check it out for sure.
Speaker 1:We're at our time here, so let me end by asking you again to put our positive-attractor thinking caps on.
Speaker 2:Yes, sir.
Speaker 1:What do you see as the key features or activities we should all be pushing for now and in the next few years, to create that positive attractor, to pull us into a better future where we don't all end?
Speaker 2:In my view, at its core it's about understanding what we're talking about, so we can get rid of the bias and get rid of the miscommunications. People like Yann LeCun and Melanie Mitchell, and I adore Melanie, I just don't think are talking about the same thing; they're talking about human intelligence. If we know what we're talking about, and everybody understands that we're talking about a technology that we have to ensure doesn't go too far, then those safety conversations can be had much, much more easily.
Speaker 2:And I think really just developing a global standardization of coordination, standard regimes, global, if not treaties, at least international agreements to actually cooperate and pursue these things safely ensure that we're all talking about the same thing. Those are the critical things I think people really need to take into account. But, as you said, with the superintelligence strategy paper, we're moving there much faster than our policy can keep up Thinking of other potential options that break the glass. That has to be on the table, whether it's containing it in your own system or just having that mutual assured AI malfunction scenario or similar setup. I think have to be on the table, but hopefully just global collaboration and being able to communicate about the same thing and understand definitions would be outstanding.
Speaker 1:Yeah, yeah, thanks for that. Yeah, we'll end there.