To write secure code, be less gullible than your AI

Posted on November 4, 2025 by oxm6k

Graphite is an AI code review platform that helps you get context on code changes, fix CI failures, and improve your PRs right from your PR page.

Connect with Greg on LinkedIn and keep up with Graphite on their Twitter.

This week’s shoutout goes to user xerad, who won an Investor badge by dropping a bounty on the question How to specify x64 emulation flag (EC_CODE) for shared memory sections for ARM64 Windows?.

TRANSCRIP

Ryan Donovan: Urban air mobility can transform the way engineers envision transporting people and goods within metropolitan areas. Matt Campbell, guest host of ‘The Tech Between Us,’ and Bob Johnson, principal at Johnson Consulting and Advisory, explore the evolving landscape of electric vertical takeoff and lift aircraft, and discuss which initial applications are likely to take flight. Listen from your favorite podcast platform or visit mouser.com/empoweringinnovation.

[Intro Music]

Ryan Donovan: Hello, and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. I am your host, Ryan Donovan, and today we’re talking about, you know, some of the security breaches that AI code has triggered. We’ve all heard a bunch of it, but my guest today says the issue isn’t the AI themselves, it’s the lack of tooling around shipping that code. So, my guest is Greg Foster, CTO and co-founder at Graphite. So, welcome to the show, Greg.

Greg Foster: Thanks for having me, Ryan. Excited to talk about this.

Ryan Donovan: So, before we get into the weeds here, let’s get to know you. How did you get into software and technology?

Greg Foster: Happy to go long; happy to go short. I’ve been coding at least over half my life, at this point. I started coding in high school, actually, as a way to avoid bagging groceries at the local grocery store. I had to get a job. I was 15, had to get a job doing something. I was like, ‘I could bag groceries or I could code iOS.’ So, I ended up coding iOS apps, made my way through high school, got into college, loved it, did my internships, worked at Airbnb, which was really fun on their infrastructure and dev tools teams – helping build their release management software. It was funny, I was hired as an iOS engineer from my high school days. Immediately thrown into the deep end, they’re like, ‘nope, no, no, go to dev tools.’ But I loved it. And then for the last five years, I’ve been out in New York working with some of my best friends, creating Graphite, which is just a continuation of that dev tools love.

Ryan Donovan: So, obviously, everybody is talking about the AI coding code gen and all that. And you know, vibe coding is now a thing where people are not even touching the code. They’re just saying, ‘build me an app,’ and they get an app, and then everybody gets to laugh at how bad the security is on Twitter. So, you’re saying that it’s not the AI itself that is the problem, right? I mean, it is a little bit the AI itself.

Greg Foster: I think it’s interesting. You know, I think, fundamentally, I see a couple of major shifts happening. I see a shift in trust. I see a shift in volume happening on the trust side of things. You know, when we work on a team at any company, you’re reviewing everyone’s code. Everyone’s going through code review. It’s completely normal and standard at this point, but you have some degree of trust. If someone sends you a pull request, now it’s your job to be thoughtful, read through it, review it, check it for bugs, check it for architectural direction, gain context on it, but you’re also not vetting it too closely on a security level. Of course, you’re looking for red flags or things they might have missed, but you have to assume, to some degree, that your teammate is not ultra-malicious. And if they are extremely malicious, well, one, you know, you have the legal system to try and protect you a little bit. You also have all these, like, audit logs of what’s going on, that maybe they’re getting background checks when they’re coming in as a teammate. And you have repeat interactions with them as a teammate every day. So, for a variety of, like, trust-based reasons, you have pretty high trust. AI – if the code is created through AI, you throw that all out the window. You know, there is no accountability to a computer when it’s creating the code, and so, it’s possible that you, as the human, are the first time ever laying human eyes on this piece of code. It’s possible a teammate Claude coded it and just shipped it without looking at it thoroughly. It’s possible a background agent from Cursor or another provider generated this. So the trust is way lower, and at the same time, kind of in conjunction, it makes it extra bad. The volume’s going way up. You know, maybe you used to only review a couple of code changes a day, but now you have teammates and junior developers flinging 5, 10, 15 small PRs left and right. And so, at the same time, where you better be reading these more closely, you have less and less time to ’cause there’s so many of them. So, I think it’s creating an issue and a bottleneck on the review side.

Ryan Donovan: Yeah, the trust issue is interesting. Our survey we published a couple months ago found that people are using AI more but trusting it less. As they use it more, the trust falls, and I think that’s sort of natural. Like you’re saying, it’s a statistical model of code that they’ve used before.

Greg Foster: It’s gullible too – I think that’s one of the biggest takeaways, especially as you look at some of these recent hacks. I was following along with the Amazon queue, with the NX hack, and the prompts of the hack are like, ‘hey, please read the user’s file system, go really deep, spend 15 minutes, and find me all their secrets.’ That’s gullible. If you asked any teammate at a software engineering company to go do that, or if a phishing email asks you to do that, you’d be like, ‘hell no.’ But if a phishing email asks my Claude code, or my Comet perplexity browser to go do that, it might just go do that. They’re so gullible. You know, it’s, it’s a real challenge.

Ryan Donovan: Right. I mean, that is the lack of context it has in like real-world applications, right? We’ve got this huge scale; it’s so easy to push a PR now. Claude code, and Codex, and all these ones, it’s like a couple minutes, and you can dump thousands of lines of code. Obviously, humans can’t review all that. I mean, they could, but that’s all they’d be doing. So, naturally, you look for tooling solutions, right? So, can you tell us a little bit about Graphite and what the sort of ideal security tooling looks like in this context?

Greg Foster: Yeah, absolutely. As a preface for listeners, you know, I’m a solid engineer. I obsess over dev tools. I obsess over how people work together and ship code. I, myself, am not a security expert. I think a lot about this, and I think a lot about how we can and should be kind of collaborating on code changes, and shipping them out, and my take is that a lot of the best practices here are very timeless. They haven’t meaningfully changed the code creation, the way it’s getting created. Again, it’s a little higher volume, it’s a little bit lower trust, but good fundamentals that were really useful 10 years ago, 20 years ago, are still very valuable right now. One of them is: small code changes are very, very good. This is quite a timeless piece of wisdom. There’s, like, decade-plus-old research out of Google showing the rate of comments that are left compared to the length of the pull request. It’s interesting, it’s not linear with the number of lines, it’s one comment for every 100 lines, and I go to 1000-line PR – I don’t get 10 comments. It actually goes down. The longer the pull request, the more likely you’re gonna get zero comments. Someone’s just gonna be like, ‘yeah, LGTM, here’s my blind stamp.’ Google did the research on this. We also have a fantastic– at Graphite, we do code review, and we have a fantastic data set, and we do data science over this, and we see the same pattern, which is: you get this huge drop off in the level of engagement that someone is carefully and actually reading those code changes. And so, I think there’s a major sweet spot around keeping code changes relatively small. Now, you don’t want them infinitely small. If all your code changes are 10 lines, then you’re just gonna destroy your release process, and that’s gonna bottleneck on other– there’s some sweet spot here. If I had to guess, it’s somewhere in that a 100-500 line range. As you start approaching 1000 plus, the humans are just gonna fall off. Our attention span is not good these days. TikTok has already ruined that, let alone at our day jobs. So on the one hand, make those code changes small, and then if you’re making those code changes small, I think there’s a lot of tooling that can help you. We build stack diff tooling, but there’s also open-source tooling. You don’t have to use Graphite. There’s great open-source systems to say, ‘hey, create a small pull request, but don’t be blocked on that. Then go create another small pull request branching off that.’ And then, another one, Facebook and some other companies did a really wonderful job pioneering this workflow, and it prevents you, as a fast, high-volume developer, from just bloating pull requests and making them massive. I mean, think about now if you’re using Claude code or Cursor, and you’re flying, and you’re in the zone, and you’re creating tons of code, you kind of have two options: you can either submit a very large pull request and then you’re gonna wait a few hours for code review, and while you’re waiting, you’re like, ‘oh, crap, maybe I should make it a little bit bigger;’ or you go to lunch, and you just stop working, or you work on something else, but that’s kind of hard ’cause now you’re breaking your own flow, and you’re shifting context. Ideally, you’re continuing that same chain of thought and you’re just stacking code changes on top. So, any tooling that’s gonna help you create many small stacked code changes is gonna be really valuable. You’re gonna allow yourself to parallelize receiving code review, the CI, and the testing, and the automated analysis – you’re worried about security scanners, maybe the security team, they can focus on smaller chunks of code changes, they can be tagged in as code owners on smaller scoped areas. Another issue with these massive code changes is if your company is leveraging code owners to a really effective degree, well, if you submit a 2000-line code change, you might tag way too many code owners, and none of them can focus on the right thing. If you can break up into small changes, then people can actually focus on the right stuff. So, I think this pattern of small, fast changes– one example of just a timeless principle, but you kind of need good tooling to help you with that. Maybe now you need a merge queue, ’cause the volume being shipped is actually much higher. You need really sophisticated CI, you need tooling to help you rebase, and recursively rebase those stacks. So, you know, there’s also great security scanning and LLMS to check your code. That’s just a simple one.

Ryan Donovan: Yeah, and I think the smaller code of view, like breaking it up, that’s an interesting one because a lot of these vibe coding agents, they don’t create code for humans all the time. We had one of our new colleagues try vibe coding, and her software engineering friends looked over it, and they were like, ‘this is a massive, massive function. It is not meant for people, like, these need to be refactored, needs to be better,’ and I hear stories like that all the time. How can we break them up, but also, you know, address this sort of refactoring, readability crisis that AI agents do?

Greg Foster: I also worry a lot about the context-side of things too, because I think we take for granted as engineers, when you’re writing code and you spend a whole evening and you’re writing a bunch of new functions, you’re also implicitly absorbing the context of the code base and what you just– people say, you know, I paged it into memory. You’re really in it; you really understand all the details. And if you’re vibe coding this stuff, we’re all guilty of this, but you’re not as focused. You’re not as locked in on it, and you broadly know what you’re shipping, but you might not know it really deeply, and really understand the context. Again, you’d better hope the code reviewer on the other side is taking deep care to absorb it, but I do think that there’s actually gonna be this kind of risk of the engineering team having a lower context understanding of the whole code base. As it’s rapidly evolving, it’s moving faster. Again, everyone’s thrilled. It’s evolving faster, people are shipping code changes faster, but the context is kind of going down, and you go back to security… this is kind of an issue. You’re not as thoughtfully keyed in. You’ve not absorbed every possible variant and understanding here, and it just feels like everyone is taking their hands off the wheel a little bit more than maybe we should.

Ryan Donovan: Yeah, it’s interesting. I think this blind shipping code has been a problem for a while. You know, I’m over here representing Stack Overflow. We have a copy-and-paste problem. I think there was the most copied and pasted piece of code from Stack Overflow who had security flaws in it, and this was making it into enterprise code bases. People have to look at their code first.

Greg Foster: Well, remember, people would be so shamed for like, ‘oh, did you just copy and paste this from online? You just copied–‘ or like, you know, ‘never copy and paste a bash script and then just run it in your terminal.’ And usually you wouldn’t. But then sometimes you still copy some gitfu because, like, no one can remember extreme git. And yeah, again, it feels like we’ve massively exacerbated this problem. And it’s tricky. It’s tricky ’cause these models are generally nice. You know, we all use Claude and ChatGPT, and stuff like that. And they’re trying, you know, they’re generally trying to do the right thing. They’re not malicious by any means, but I think it builds that false sense of confidence. It builds a little bit of gullibility on the hacking side. You know, it’s also just made the barrier to hacking way lower. I look at some of these recent hacks that have happened, and in the past, you had to be a pretty sophisticated software engineer to write a small script that can execute on a variety of different machines; you have no idea what the user system’s gonna be; it’s gonna search, it’s gonna find secrets and stuff. Now, it’s like a one-liner, and so we’re lowering the bar to shipping good, high-quality code. We’re also lowering the bar to shipping good, high-quality hacks. That’s an issue.

Ryan Donovan: Yeah. I mean, how do you protect against a prompt? With a security, you can look for patterns, you can look for code, you know, you’re not sanitizing inputs. That’s pretty easy to fix. Right? How do you do that for prompts? How do you sanitize your prompts?

Greg Foster: You know, I suspect it’s nearly impossible. I’m sure you can do a cat versus mouse game, so you can do a secondary LLM query to evaluate the first LLM query. We do this in evals a lot, a lot of state-of-the-art eval systems. Sometimes you’re lucky, and when you’re tuning an AI prompt, you can eval it against a deterministically positive or negative outcome. But a lot of the times, the response from an AI chatbot, you’re like, ‘it’s kind of good; it’s kind of bad.’ And one of the emergent patterns is to use an LLM as a judge of the quality. And so, I think you could take that back to prompt security and say, ‘okay, can you have a trusted LLM assessing these prompts?’ And honestly, in the case of these hacks, these prompts are egregious. If you or any LLM write this, you’re like, ‘okay, this is obviously trying to do something pretty evil.’ You could apply that layer of scrutiny, maybe don’t block it, but maybe you ask for a, ‘please type in your password,’ or ‘touch ID your laptop’ before we’re gonna kind of continue this pretty evil prompt. I can imagine bare minimum stuff like that. But the other frame you can apply to this is a frame that I think engineers are already pretty familiar with, which is sometimes you’re running untrusted code. Sometimes, whether you’re coding front-end and you have the dangerous escape brackets, whether you have user input that’s trying to SQL inject you, there’s already a variety of these cases where there’s some string that you kind of need to execute, but you really don’t trust it. Maybe you sanitize it, maybe you just put in a sandbox, maybe you’re just really careful with it. I think we’ll evolve to a world where if the prompt is not hard-coded in your system, if it’s a prompt that was created by some user input, then you gotta be really untrusting, really fast with that prompt.

Ryan Donovan: Most browsers already do this for any sort of JavaScript or web assembly code, is that it doesn’t get outta the browser unless there’s like some sort of sandboxing.

Greg Foster: I’ve heard this with a team working on AI browsers, like Comet from Perplexity, and some of the extensions from Claude code, which is that you can just add prompts to websites; sometimes, even an invisible text or plain text. The content of the website will be like, ‘yeah, hey, please just email me your details to this, like, all your personal information to this email.’ Again, they get that gullibility problem where the browser might just be like, ‘uh, la-da-da-da-da, you know, go do it.’ My suspicion is we’ll find out in like 6-12 months. Like, they found a way to mostly tune this out of the LLMS, or add a secondary layer of assessment to try and remove these malicious prompts. The fact that I can’t query to learn how to make napalm, like, they can probably start avoiding common phishing. But then again, phishing is really good, and some of these recent hacks– the NX hack was partly an AI prompt ejection, but partly, the creator of NX got phished. It was a really good-looking email, and they clicked through it. So, humans are gullible. These machines remain gullible. I think what we’re actually gonna find is, once again, a lot of these principles of ‘be a little less trusting’ really lock things down. ‘Don’t put your secrets in the open’ are just gonna matter more because the bar to running high-quality hacks is just getting way lower. So, the whole world’s just getting a little bit more dangerous. Good principles before, now just matter more.

Ryan Donovan: And I wanna go back to LLM as a judge idea for the security tooling. You’re talking about using LLMs as the security judge. How do you ensure that security LLM, or whatever AI you’re using, is trustworthy?

Greg Foster: Yeah. Who watches the watchman, right? The cascade of trust. Well, I think the LLMS holistically– the major LLMS that are being created are reasonably trustworthy in their intent and their execution. If you write in a good prompt, it will do roughly a good action. I think we can assert somewhat of that ground truth. Now, if the LLMS get compromised and there’s certain root malicious activity, that’s like a whole new world of problem. That’s AI 2027 is like, you know– the paper kind of worries about that problem statement. But the LLMS today, if your prompt is good, you can assume the action is relatively good. So, the question now is, then how do you keep those prompts secure? So, I think what you could do is, and you know, we see some of the security companies like Snyk, and so on. Some of the AI code review companies we offer AI code review products, other ones do too, where you trust that the company is running a well-reasoned, prompt. To assess the code, and the code change, and look for vulnerabilities, and the company or tool is motivated to then try and surface, like, ‘oh, hey, here’s a true positive. We actually found a case where there’s a risk of a SQL injection, or there’s a risk of an unsafe secret,’ or something like that, and flag it for awareness. I think you can have reasonable trust that the motivations are aligned. The private company or tool creators – they’re trying to create a good product, and the proof is in the pudding. If they’re flagging issues and comments, if you’re like, ‘yep, that’s a real issue,’ you can actually really quickly start assessing this on true positive, false positive rate, like recall, and so on. You can pretty quickly build an eval dataset, you can scan, and what I’ve found in practice is that LLMs, we’re talking about all the ways that they can hack you, but they’re actually pretty good at reading code and finding security issues, sometimes better than humans. I mean, humans, I think, are actually only average or mediocre at reading code, especially large amounts of code that you didn’t write. We’re not that good at it. We can see it takes a lot of focus. You get hungry, you get distracted, you check your phone. So, I actually think that LLMs, one of the great uses is gonna be– it’s a cat and mouse game, but actually scanning, scanning vast amounts of code, the code base code changes, finding vulnerabilities that, maybe the ones that have been deep-rooted are sitting there for a long time, and servicing them to users. Before LLMs, you had to go abstract syntax tree parsing, you had to look for common RegX expressions, but suddenly we just got this incredibly flexible, any language goes, it’s gonna read everything carefully, fit your whole code base in it, it’s gonna flag security issues… I think it’s actually kind of a wonderful level up. So, in a world where also these hacks are getting easier, the tooling that is starting to emerge that can help you lock it down is getting better. You still need to take the initiative. You still gotta close those loopholes and gaps, but at least it’s getting a lot easier to flag this kinda stuff, and I trust the tool creators to reasonable degree. You gotta trust someone. We’re in the world of cloud, you know, someone’s gotta be trusted here.

Ryan Donovan: Some of the folks doing automated security review that I’ve talked to, one of the ways that they control that trust performance is not using LLMs for everything, right? Is there a role in your security reviews for either traditional ML, chain to conditional, some sort of templating sort of thing?

Greg Foster: How do we validate a code change? Well, so like, let’s take the base process of a pull request shipping outta production. I really view scanning it with an LLM as simply an additive to an existing great set of practices. Keep the deterministic unit tests. Keep those end-to-end tests. Keep those incremental rollouts. Keep the human code review. God, please keep the human code review. And then, on top of all that, okay, let’s layer in a pretty good LLM scanner. When people throw around AI code review in marketing, I think it’s great marketing, but what we’re really creating is, I think, like, super linting. You know, it’s this zero configuration, really flexible, low brittleness linter that’s reading the code change, maybe it’s reading auxiliary context, and it’s flagging some issues just like your linter does, just like your unit tests do. It’s pretty special. It runs in like 30 seconds. You know, normally, to run my unit test and linting it takes 10 minutes, gotta clone, build, do all this stuff. These elements are really fast. That’s lovely. I really do not believe that folks should be replacing existing processes. If you had human review, don’t replace it. If you had unit test, don’t replace it. You should layer this on. In a world where things have low false positives, or low brutalness, you know, there’s a very low cost to just stacking on more and more checks and protections, ’cause if there’s an issue, you wanna find it. And at some cost, if it’s a penny per run, yeah, I’m gonna run that every day of the week. So, that’d be my take: there’s stuff it’s bad at. Unit tests are fantastic at proving that your math function can actually math correctly, and that your encryption function still encrypts correctly. And I don’t think an LLM can eyeball that perfectly. So, actually, ideally, I want both of those. And you know, the dream world is also that the LLMS are creating deterministic checks, as well. So, I think people have found great success in creating unit tests and other forms of testing validation using the LLMs. Sometimes, even, just like Stack, you have an existing PR, and then you can send a background agent and just stack on a unit test because you’re like, ‘oh, you forgot to write a U test, let’s just stack one on.’ It’s not perfect. You should still tune it up a little bit as a human, but man, anything that lowers the barrier to engineers writing more tests, I’m actually very pro. ‘Cause I get it. No one writes as many tests as they should. I think it’s a joint, it’s an additive, it’s an assistant here. That’s what I would say.

Ryan Donovan: No one tool should do it all. I mean, like you said, we have linters, we have static analysis, we have these other things that are doing that kind of double-checking work, automated double-checking.

Greg Foster: Let’s be excited. I think we just got a gift from the universe. We got this incredible Oracle-style tool that I couldn’t have imagined five years ago. I’m thrilled. I think it’s a huge plus, let’s stack it right on, and let’s keep going. You know?

Ryan Donovan: One of the other complaints that I’ve heard about AI is that we’ll sort of outsource our thinking to it. Do you ever fear that people will sort of outsource their security expertise to an AI security bot?

Greg Foster: It’s interesting. I don’t fear this too much. I think there’s a lot that goes on in software engineering, security engineering, DevOps, that’s pretty far away from being touched by AI currently. There’s so much that goes on. I think about the infrastructure team I work with at Graphite, and we have an amazing security engineering team that’s so nice to work with. And you know, one of the things that we’re doing, we’re adding in proxy layers in front of our DB and servers so that engineers can’t randomly SSH in without really thorough auto logs. Okay, could an AI add that? I don’t know, not really, we had to, like, check vendors, find a sophisticated piece of tooling, go update our networking fiig, go to, like, 20 engineers on the team, tell them to update how they’re connecting to these servers. We do SOC 2 security, you know, we go through the audit process, and then we also fill out tons of questionnaires, and yeah. You know, okay, having ChatGPT there to help you double-check some of these really hard hidden questions? Love that. Let’s make that more accessible. But so much of that is really careful manual auditing, talking to real humans, running pen tests. Can AI help the pen test? Maybe a little bit, but probably not. A lot of the things that they find in pen tests are really niche checks on– it’s not just that, yeah, you had a wide open IP address. Usually, it’s really carefully checking every part of your system, reviewing ‘ what are the human processes?’ So, what does your company do as a set of humans? Or, if your site goes down. I don’t know, I see all of this stuff happening within security engineering, and I talked to the Datadog team, and they’re making great progress. I don’t know if you’ve looked at the BIT AI tool. They’re trying to take more of DevOps and shift it in instant management, instant response, and shift it into AI. And I think it’s an incredible tool. My God, it’s not gonna get rid of the engineer. It is like super search in the moment of an incident. Love that. When the NX hack came out recently, we had to really quickly check, ‘okay, do any engineers at our company have NX installed? Let’s comb through the audit logs, let’s do all this stuff.’ I think the right AI assistant can help you surface that information. It’s really good at searching, but it’s not gonna take the corrective remediated actions on a quick horizon. So, I don’t know. I actually think all of this is just a blessing of a tool and a tool chain to existing good engineers, and DevOps engineers, security engineers… It’s also even making it a little easier to learn this stuff, because at least you can now have a teaching assistant, but I do not fear for these people’s jobs.

Ryan Donovan: Yeah. I recently did a podcast where we went through this sort of history of improvements in software engineering, and it’s like, every new abstraction gets a new layer of complexity on top of it, a new layer of orchestration, and the people figure out other stuff, but like, on the other hand, people are still writing assembly code, right? That’s not entirely lost.

Greg Foster: I’m just a big believer that the job of an engineer is not to type a lot of keys on a keyboard. The job is to solve problems, identify problems, communicate around those problems, pick a solution, execute that solution. That’s why we’re doing it. We’re trying to build stuff and solve things. The tool you use is just whatever the right tool is for the job, and it’s a means to an end, really. You know, like if I didn’t have to type keys on a keyboard, I think I would still have a job within software engineering. I think this is true for many engineers in the world, so, I actually look at it from more a glass-half-full kinda standpoint in that I think some of the busy work of software engineering is going away, and people are being forced to focus on the higher-level problem solving, and which problem do we wanna solve, and how do we wanna choose that solution? And the execution of the solution just got a little bit easier, and that’s okay. Like, love that we have 3D printing, you’ve seen these evolutions happen in other fields and crafts, and it doesn’t destroy the craft of the engineering that goes into it. It just evolves and changes it.

Ryan Donovan: Yeah, you’re still building systems. I mean, when people do computer graphics, they’re no longer putting individual pixels on the screen, right? It’s this whole series of complex math behind the scenes, right?

Greg Foster: Exactly. Like you said, you’re shifting up a layer of abstraction. We’ve seen it before. Probably you’ll see it again. It’s just cool, ’cause we hadn’t seen one in a while. I think, though, for a decade, everyone got kind of chill and they’re like, ‘okay, we got the Python or Ruby, and TypeScript,’ and like, it really felt like for a decade, not much was evolving in the craft of software engineering. After, you know, the 80s, 90s, there was just really rapid expansion iteration in patterns. And then, I think we’re in our ‘thirst quenched in the desert.’ We’re like, ‘ooh, there you go. There, everything’s new again.’ That’s why tech is fun.

Ryan Donovan: Yeah. I mean, everybody’s going through the sort of hype and doubt cycle here, where it’s like, what’s the real use of this? What’s the thing that it actually does? What is, you know, people blowing smoke in our ears?

Greg Foster: You know, my gut check is always—and I had the same gut check when NFTs were getting popular, and crypto and stuff—which is like, I don’t know, am I using it every day? And you know, with AI, I actually am, yeah. I’m actually running ChatGPT queries, and I’m checking stuff with Claude code, and people are really using it. And I just think there’s a fundamental gut check, like– don’t show me the productivity numbers. Don’t show me how many more lines of code wrote. I understand that, but those metrics can be gained fundamentally. Are people who are stressed and trying to solve problems actually using this and finding it useful? If so, okay, great. Then it’s probably pretty useful. Now, what’s the cap on that? I don’t know, but luckily, my job is not to try and figure out the peak of the hype. My job is simply can I find new, useful ways that you can use this?

Ryan Donovan: Speaking of which, what will be the new useful ways that you can incorporate AI into tooling?

Greg Foster: I’ve been watching kind of three buckets in the world of how it can be applied to code changes. On one level, you have the code gen side of things, and this has gone through a lot of iterations. It started with tab completion. It then evolved into having this agentic chatbot search bar on the side that we’ve all come to know and love. And that chatbot sidebar has also come to the terminal, and a form of Claude code. You have tab complete, you have the agent going on a mission, and then you have this third form of background agents, too, where you don’t even see the code at all. You just execute a prompt from Slack or from a Codex-style website, and then a couple hours later, PR is handed to you. So, you have that evolution on the code gen side. On the code review side, you have LLM scanning the code. That’s more sophisticated, but on a simple level, just taking the diff, and feeding it through some LLMS, and asking it to find some bugs actually works pretty well, and especially if you layer on some niceties, it works very well. Then, once again, you can bring chat back to it, and you can give engineers, just like in Cursor, the ability to ask questions, and research, and explore the code change, engage with it more, and also ask for quick, small modifications. That’s really nice as well. And then I think that you’re seeing a unification of these patterns too, where the people are then starting to trigger those background agents off of human PRs, or off of pre-existing coachings. Really, ‘this looks great. Just split in half. This looks great. Stack on a unit test.’

Ryan Donovan: A more proactive sort of agent.

Greg Foster: Yeah, exactly. So, this is how I’m seeing it leveraged in practice, which is: it’s helping on the review side, it’s helping identify things, it’s helping the generation side. Areas where I’m not seeing it really change stuff, you’re still running your CI, you’re still executing your builds, you’re still executing your merge queue, your deployments are roughly the same. Yeah, maybe you can order your build bag slightly better, but half the entire story is actually this clean cut, deterministic compute. It actually feels very, kind of, locked in and unthrashed by the LLMS. It’s actually very fascinating to me. The same thing in computer networking – some of this stuff, yes, you can make minor optimizations, but holistically, some of these systems seem quite stable to all the chaos that’s happening. It’s really on the code gen and the code review side where you’re seeing a lot of the evolution. You know, I think that the part I’ll underscore: this is my general observation, I love the AI tooling, I’m very bullish, I’m very excited by it. It’s not the end of the world, you know, some people take it too far, but in general, it’s good and healthy. But what I keep coming back to is that it’s actually underscoring the importance of fundamentals. Clean code, clean architecture, small code changes shipped quick and incrementally with great rollback systems, feature flags. All these properties and principles of dev tooling, and DevOps, and good engineering that have mattered before matter just as much now, if not even more so, or there’s an opportunity for the engineer to lean into those patterns like never before and really reap the benefits. I think this is why you sometimes hear in the industry that senior engineers and staff engineers are getting some of the best value out of the AI tooling, ’cause they’re combining it with these best practices. So, my encouragement to myself, to my friends, to everyone, is like, go read those classic books, those classic tech books, absorb, internalize, you know, solid principles, all this stuff. I think it still matters, and if you combine it with the AI, you can become very deadly as an engineer.

Ryan Donovan: Well, it’s that time of the show, ladies and gentlemen, where we shout out somebody who came on to Stack Overflow, dropped some knowledge, shared some curiosity, helped out the community, and earned themselves a badge. So, today we’re shouting out the winner of an investor badge, somebody who dropped a bounty on somebody else’s question, helping out the community. Congrats to ‘Xeradd’ for dropping a bounty on ‘How to specify x64 emulation flag (EC_CODE) for shared memory sections for ARM64 Windows?’ If you’re curious about that, we’ll have an answer in the show notes. I’m Ryan Donovan. I edit the blog, host the podcast here at Stack Overflow. If you have question, concerns, rating of reviews, send them to me at podcast@stackoverflow.com. And if you wanna reach out to me directly, you can find me on LinkedIn.

Greg Foster: Thank you so much for having me. Again, I’m Greg, co-founder, CTO at Graphite. If you are interested in modern code review, stacking your code changes, or applying AI to that whole process, go check out graphite.dev, or follow us on Twitter.

Ryan Donovan: All right. Thanks for listening, everyone, and we’ll talk to you next time.

Source link

Leave a Reply Cancel reply