In case you missed it, last week we hosted legendary AI Researcher Jim Fan from NVIDIA AI for an incredible discussion on all things Generative AI. During a one-hour conversation with Amjad Masad, CEO and co-founder of Replit, and Michele Catasta, Replit's ML Advisor, the group discussed the recent advancements in AI and the potential impact of multi-modality on the field.
Hi, everyone. I'm Michele, and I'm going to be the moderator for today's panel. In the past I've worked quite a lot with large language models, especially applied to source code, and I've recently been working as an AI advisor at Replit. Very excited to have both Jim and Amjad with us today. So, Jim, why don't you take it away and tell us briefly about yourself?

Absolutely, yeah. Thank you so much for having me. Hi, everyone. I'm Jim Fan, currently a research scientist at NVIDIA AI. I got my PhD from Stanford in 2021, advised by Professor Fei-Fei Li. I have worked in AI for a decade; the beginning of my journey was in 2012, which coincided with the introduction of AlexNet, which essentially marked the inception of deep learning. Over the next ten years I had the great fortune to collaborate with many of the most brilliant minds in AI, like Professor Fei-Fei Li, Yoshua Bengio, Andrew Ng, Ilya Sutskever, and Andrej Karpathy, and they really shaped my perspectives on AI as an aspiring scientist. I watched the field grow from recognizing cat versus dog to GPT-4 and Midjourney. So this is beyond exciting for me, and I want to share this excitement with everyone in the audience. I'll pass the mic to Amjad.

Thanks. I'm Amjad, CEO and co-founder of Replit. I've been working on developer tools basically all my career. I was a founding engineer at Codecademy, I was one of the founding engineers on React JS and React Native, and I've just built a lot of developer tools and worked on compilers, interpreters, and parsers. One of the things that always felt lacking is how little machine learning and statistical approaches we applied to code, and I got interested in that. I remember in 2014 or 2015 I read the paper that became a classic, "On the Naturalness of Software," which claims that you can apply NLP, in particular n-gram language models, to code to generate code and build an autocomplete system. I had a toy implementation of that and always dreamed of being able to build it into Replit and use all the data we have at Replit to make our developers a lot more productive and to help you learn how to code. With the introduction of GPT-2, I felt for the first time, in 2019 or 2020, that we could build something really compelling, and since then we've been building on top of language models.

Perfect, thanks for the intros, guys. We have so much to talk about today, so it was great that we kept them short. Let's start from possibly the most recent news, which happened yesterday. We had GTC, which at this point has become possibly the equivalent of WWDC for AI. It's a very large event indeed, and Jensen gave an amazing keynote. I was watching it yesterday; it already had ten-plus million views within a few hours, so it's clear that there is a lot of attention on it. Jim, you have been experiencing this firsthand, being part of the team and having contributed to some of the amazing advancements announced there. Why don't you tell us more, and also give us your take on what you're excited about?

Absolutely, yeah. Thank you so much for bringing up NVIDIA GTC. I think this year, 2023, is an inflection point. NVIDIA, as we all know, provides and builds the best GPUs, which are used to train the best AI models.
But I think NVIDIA is going beyond being a pure hardware provider and becoming an enterprise-first AI provider. Personally, of course, I'm excited because every time Jensen goes on stage we get a better GPU, a better supercomputer. But I think I'm most excited by NVIDIA AI Foundations, which is a new foundation-model-as-a-service offering coming to enterprises. What is special about it is that you can customize these large language models, or even better, multimodal language models, for your enterprise, your unique use cases, and all your proprietary data. You can bring your images and videos and even 3D data, and we will create custom multimodal language models as well as generative models for your use case. I am super excited about that. The other thing is that we have some amazing partners from day one: Jensen announced we are collaborating with Getty Images, Shutterstock, and Adobe, so you don't need to lose sleep over copyrights anymore, and you can use this generative art with ease. We also provide some unique types of models, like biology, which is a pretty special modality that we haven't seen deployed elsewhere: a first large-scale AlphaFold API and fine-tuning service that will power up your drug discovery or biology research pipelines. So I'm super excited by NVIDIA AI Foundations; you should check it out. And actually, GTC is still ongoing, so go to NVIDIA's website and you can find lots of very interesting sessions. Some of them are hands-on tutorials, some are introductions to the latest AI and graphics technology. So please check it out.

Wonderful. You touched on several interesting topics, especially things that are happening in 2023 and that are about to revolutionize the field. The one I find most exciting personally is multimodality. The reason being, we have been using text to communicate with computers in the past few years, thanks to LLMs, but I do think we're going to see superpowers emerge from the combination of using images as input and also being able to generate different modalities. What is your take on it? What do you think is going to happen in the next six to twelve months in this field? What kind of models, perhaps via offerings like NVIDIA AI Foundations, will be put in the hands of companies and researchers? And what kind of products would you predict we're going to be seeing by the end of the year thanks to multimodality?

Yeah, that's a great question. First, let me quickly introduce what we mean by multimodality. Large language models, in the literal sense of the word, process text, which is strings of characters: you put a bunch of strings in, the system processes them, and you get strings out, and you can have a dialogue and so on. Multimodal LLMs mix in other modalities, like computer vision. For example, now the language model can support images and videos, and maybe in the future it can support other modalities like sensory signals such as audio, or even 3D, touch, or robot control signals. I think the word multimodal has many layers of meaning, and its meaning will just expand as we move towards the future. We have already seen huge utility from purely textual language models: they can do coding, they can write emails for you, they can already do a lot of things.
But just imagine giving the system eyes and ears. What can it do? It can look at your house and maybe pitch a new decoration plan for you. It can look at your notebook: you can have doodles in a notebook, you can draw diagrams, and it will be able to parse the pixels and make your notebook come alive. You can talk to your notes, and not just using text. You can have fancy handwriting, and it will be able to recognize it without going through an extra step of optical scanning or OCR. So I think the possibilities here are really limitless. Recently my team also worked on multimodal language models: we released a paper called Prismer, and it is actually open source. So if you cannot get your hands on GPT-4's vision API, this is the best you have right now. Everyone can download the model from Hugging Face and run it on their use case, or even fine-tune it on their own proprietary data. Super excited.

Do I need to submit a form to get access to Prismer?

Great question. No paywall, no forms. That is our new model, right? That's the real innovation.

That's the real innovation, yeah. The model would end up on 4chan in a few days anyway; that's the recent trend. Ouch.

No, I appreciate it. Everything is open, and I'm sure a lot of amazing research is going to stem from that release, so thanks for making it happen. As a personal take, the reason I'm so excited about multimodality, apart from what you explained brilliantly before, is the fact that we are all exposed to machine learning and AI implicitly. Billions of users are affected by it: the very fact that you have a Facebook account or a Google account means that ML is part of your experience, even if you don't know about it yet. What multimodality is going to do is change the UX of all the applications that today look very dry or very boring to use, and it's going to enable completely new types of applications that I can't even think of myself. I expect an exponential adoption of AI by consumers, by end users, because the input will no longer necessarily be writing code, and you will no longer need to know how to use a very complicated UI. You will literally be talking and doodling and writing text. To me, that's a completely new paradigm; it's a paradigm shift, to be fair. I can't wait for that to happen.

Absolutely, yeah, I completely agree. I think UX design may become more important than AI model design at some point in the future, because we want these AI tools to really empower humans, and designing an intuitive interface is the same as boosting the productivity of the human users. So I think the field of human-computer interaction, or HCI, will become more and more important as AI models gain more and more capabilities. I completely agree with you.

Totally. Got it. Why don't we do a deep dive on LLMs, which are dear to my heart? We can explain what has been done in the recent past and why they have become so powerful. And I think we can circle back to this fundamental question about UX, because LLMs are changing the way we develop applications. They're also challenging a lot of assumptions about what a good UX looks like, and they're putting the human in the loop in a completely different way.
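Going back to the Prismer discussion for a moment: below is a minimal sketch of what querying an open-source vision-language model through the Hugging Face pipeline API can look like. Prismer ships with its own codebase, so a public captioning model is used here purely as a stand-in, and the image path is an illustrative assumption.

```python
# Minimal sketch: asking a vision-language model about an image.
# A public captioning model is used as a stand-in for Prismer; the model name
# and image path below are illustrative assumptions.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

# Point it at a photo of, say, a doodle-filled notebook page.
result = captioner("notebook_doodle.png")
print(result[0]["generated_text"])  # a one-line description of the page
```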
So we're going to be talking about hallucinations and how reliable the text or code being generated is. But let's start from the building blocks, and I think we can anchor to the fact that last week was probably an unprecedented week in AI. Maybe it will be completely overshadowed in the next few months by who knows how many other amazing releases, but we saw this sequence of GPT-4 being released, the PaLM API being released by Google, and Claude following right after. So all the key LLMs, all these big and strong LLMs, have become available to end users, and we have been seeing a lot of viral tweets, a lot of applications, and all these examples of how powerful they are. Why don't we talk about why they have become so powerful? What is the special ingredient that made GPT-4, Claude, Bard, and ChatGPT so exceptional compared to what was available even just a few months ago? Amjad, would you like to take a shot at it?

Sure, yeah. I think what Jim said about UX is important. ChatGPT was an innovation in a few ways, but one of the more important ways is that it put a very intuitive interface on top of GPT, right? I think nerds and engineers underestimate how important that is for regular people to be able to grok and understand how to use something. Prompting something in the playground and getting green text is totally inaccessible for the normal individual, and the chat interface was very, very accessible. The other thing that happened is that the price-performance of these models has started to improve tremendously, partly driven by the hardware NVIDIA is obviously designing. But also, I think the open secret is that ChatGPT is probably a smaller model than GPT-3, and you can tell by how fast it is. We don't know exactly what the architecture is. I know, for example, Jim, your work has introduced this concept of mixture of experts; I think there are a lot of new ways we could pack more power into these models without sacrificing speed and performance. The other hint that these models are actually getting slightly smaller is the work coming out of Stanford on Alpaca, which took Meta's LLaMA model. LLaMA is a large language model whose weights Meta released behind a research access form, but then they got leaked on the internet. The weights are checkpoints for model sizes between 7 billion and 65 billion parameters. Alpaca was reproduced in open source on the 7-billion-parameter LLaMA, and it's actually phenomenal: you can use it and get a lot out of it, it can answer your questions, it feels in a lot of ways ChatGPT-like, and it was actually distilled from GPT-3 and instruction-tuned. When you look at innovations like that, you start to understand that, okay, we can actually make more economical models a lot more powerful via algorithmic and architectural innovations. So I think it created a moment where you could actually create a service like this and give it away for free. It's still pretty darn expensive for OpenAI, I'm sure, but there are a lot of free services out there. They reduced ChatGPT's API costs by 10x, and it's also perhaps 10x faster. All of that created a moment for these models to go absolutely viral, and I think GPT-4 is just the latest in that.
And the fact that it's also multimodal is just icing on the cake. I think the interesting thing is that ChatGPT often gets you 80 percent of the way to building an application, but then it fails in all sorts of interesting ways. So right now we're on this march of the last 20 percent, I think, and GPT-4 is improving on that performance. Things that were not possible or were kind of buggy with ChatGPT are now possible with GPT-4, and I think we'll just continue seeing that with GPT-4.5, GPT-5, and so on. So to sum up, I think it's a combination of UX, price-performance improvements, and just better accuracy from the models.

Yeah, I think... sorry, go ahead, Jim. No worries. Okay. I completely agree with Amjad; these are fantastic points, especially on the economic side. I did a quick back-of-envelope calculation when ChatGPT's API pricing was published. If you concatenate the entire seven books of Harry Potter, just end to end, tokenize it, and give it to ChatGPT, how much does a single pass through the whole thing cost? The calculation tells us it's about four dollars and thirty cents to process that entire brick of Harry Potter. That's less than a cup of coffee in San Francisco. Come on. That was a holy-cow moment, right? The economics change a lot of things, as Amjad said; I won't repeat it, I think he already covered that great point, but just to give the audience a number to ground to.

So that is one thing. And I want to discuss a little bit, from ChatGPT to GPT-4, what some of the new technical insights are, besides the usual suspects like more compute, more money. I do think there are a few things that have fundamentally changed from ChatGPT to GPT-4. The first is the data flywheel. We all know that OpenAI has a great data pipeline: you curate a lot of text from the internet and you train and scale your model on that dataset. But what changed between ChatGPT and GPT-4 is that ChatGPT is one of the most popular apps in history; it gained a million users in five days and probably has a hundred million users now. That means we're giving OpenAI a lot of data by interacting with their system, and in this way they actually know what users want. Because what users want, like writing emails and writing code, may not necessarily be the same as the training text on Reddit, right? So OpenAI has a really good data flywheel going: because of the popularity of the app they can collect more relevant user data, and because of that user data they can train a better model. That wouldn't be possible unless you were popular in the first place. That's why we see so much better performance from GPT-4 in reasoning and in these kinds of textual problems. So that is one thing that changed. The second is human alignment. ChatGPT was already using these ideas of instruction fine-tuning and aligning to human preferences, but I feel that GPT-4 took it to a much greater height. I've seen news reports that OpenAI hired armies of programmers to curate data and then tutor GPT-4.
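For reference, Jim's back-of-envelope Harry Potter calculation looks roughly like the sketch below. The word count and tokens-per-word ratio are rough assumptions, which is why the result only approximately matches the $4.30 figure he quotes.

```python
# Back-of-envelope: cost of one pass of the Harry Potter series through the ChatGPT API.
# The word count and tokens-per-word ratio below are rough assumptions.
words = 1_100_000            # approximate total words across the seven books
tokens_per_word = 2.0        # rough English-to-token ratio used for this estimate
price_per_1k_tokens = 0.002  # gpt-3.5-turbo pricing at the time, in USD

total_tokens = words * tokens_per_word
cost = total_tokens / 1000 * price_per_1k_tokens
print(f"{total_tokens:,.0f} tokens -> ${cost:.2f}")  # roughly $4.40, in the ballpark of the quoted $4.30
```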
So that's very interesting, because previously human annotators didn't need a lot of domain expertise. You just come in, you look at whether this image is a cat or a dog, and you give me a label. That was how ImageNet was labeled; anyone with reasonable intelligence could do that. But now the bar, even for human annotators, has been raised, and that is so interesting. Someday I may not even be eligible as a human tutor for GPT-5 or GPT-8, because I'm simply not that great at programming anymore, not better at programming than GPT-8. So that is another thing that changed. And the final part is the multimodal part. I do think there is just a lot more information captured in images and videos online, and we are only starting to tap into it, because there is actually a limited amount of useful text online. I'm pretty sure huge models like GPT-4 are pushing the limit on text; you can maybe gain a little more there, but not much more. Now we're tapping into this huge repository of videos. Just think about the volume of data uploaded to YouTube every single day. GPT-4, at least in its current form, only processes images; we have not touched the video part yet. Just imagine the information and intelligence we can unleash by learning from videos. GPT-4 is starting to do that using pixels, but the next step will be spatio-temporal pixels, which are videos: pixels that evolve over time. There is so much information locked in there, and we're just starting to peek into it. So I'm very excited. I think these are the fundamental things that changed, and even more exciting stuff will happen next.

Yeah, totally agree, Jim. We have been talking about scaling laws for several years at this point, and for a good reason: they have been revolutionary, and they keep paying off. My personal take is that we're not going to be talking about the number of parameters in models anymore. As a matter of fact, the GPT-4 technical report doesn't even tell us the size of the model or the amount of data that was used. I think we're going to think in terms of how many FLOPs have been devoted to training a certain model, and it's all about how we use those FLOPs. Are they all going to be spent on text? Are we going to collect as many tokens as possible, or are we going to bring in more modalities? Are we going to be smarter in the way we connect text with images? For example, we have had semantic scene graphs in vision for many years, but currently the multimodal LLMs don't even employ that kind of mapping; they just take a lot of images and a lot of text as input, and we hope the mapping happens sort of magically. All this space has to be explored. We know that more compute pays off, but I think, on the research side, there is a big opportunity in figuring out how to best use that compute. Oftentimes people ask me, and I'm sure, Jim, the same happens to you working in big tech: how and why would I do research as an academic in this field? It doesn't make sense, because I don't have the compute power that big companies do. Which is true; it's a matter of fact.
But my reply is always: there is so much ingenuity that can come from outside research, not just from big tech, where we usually try to exploit scaling laws as much as possible. Those ingenious ideas will make the difference in the field; those will be the ideas that allow us to push scaling laws even further and make better use of those FLOPs. So I think we covered why models are becoming cheaper and faster: because we found better ways to train them, starting from the Chinchilla paper from DeepMind last year that told us models were severely undertrained. Then there has been this trend over the last few months where models have become better and better. I'm extremely excited about it, because like anything else in ML, it started with models that were way too expensive to deploy. Even text-to-speech and speech-to-text ran in the cloud back in the day; now they run locally on your iPhone or your Android, and they're extremely powerful. The same thing is going to happen with LLMs; this is one of the few things I'm certain about. On the other hand, we have been talking about accuracy today, and I think it's worth doing a longer deep dive. I heard you mention instruction tuning and RLHF. I think we should also cover constitutional AI from Anthropic and map how Claude behaves versus ChatGPT. Let me open the stage on this topic, because I know you both will have a lot to say; I'm excited to hear your takes. Please, Jim, you can start.

Okay, I guess I'll briefly touch on RLHF first. For the audience who are not familiar with this term, RLHF stands for reinforcement learning from human feedback. To explain what this means, let's first revisit how language models are trained. They are trained using a technique called self-supervised learning: you download a huge amount of text from the internet, and then you train a large model to predict the next word. This super simple objective is actually very rich, and it induces a lot of emergent behaviors, like reasoning or, in general, text understanding. That includes coding: code autocompletion is a type of next-word prediction, and if you can predict the next symbol in the code very well, then you probably already know a lot about how programs and structures work. So this was how the original GPT-3 was trained. But then the research community found that GPT-3 is not really aligned with what humans want. To give an example, suppose you ask GPT-3, "Explain the lunar landing to me like I am five years old." What GPT-3 might give you is a bunch of sentences that look like your question: "Explain why the sky is blue like I am five years old." "Explain why water feels wet like I am five years old." There's nothing in next-word prediction that forces it to answer the question; it is equally plausible that somewhere on the internet there is just a bulleted list of questions, and GPT-3 simply completes it as if you were asking for more question templates. So what InstructGPT did next is align these models to follow the instruction that you give, and that is what we call human alignment: aligning with what we really need the model to accomplish.
And the way to do it is, first, you collect a dataset, and that is expensive: you have humans look at those instructions and write full paragraphs that answer each instruction. Once you have that dataset, which is pretty expensive, you can fine-tune your model on it so it at least roughly learns how to follow instructions. But now you want to refine it even more. The way you do it is you query the model, let's say, five times with the same instruction, and the model gives you five different answers. A human annotator, instead of writing the answer to that instruction, just ranks the five answers the model generated. That is a much easier task and much less expensive to annotate, much less human effort: it's just giving you a ranking, like "I prefer B to C to D to A to E," for example. From this preference ranking you can learn a reward model. The reward model essentially tells you, given this instruction and an answer, how good this answer is. Now the reward model is acting as an autonomous human annotator: you don't need to hire more human annotators to give you more rewards, because the model has already learned how to do it. Then you can use this reward model to do reinforcement learning, which means the model will figure out on its own, using trial and error, how to improve this reward score. The rest of it is completely autonomous.

And this is a separate model, not the same pretrained model? This is a separate reward model, is that right?

At least in the original InstructGPT paper, it is a separate model, separately trained. That model is acting as a judge: this is the question, this is the answer, are you doing well, yes or no? That is what the reward model does. Then the GPT model tries to maximize the score against this judge by trial and error, and the rest is completely autonomous, without a human in the loop.

So during the RL training loop, you're actually calling into this reward model to judge?

Exactly, exactly. And then you can use techniques like reinforcement learning to improve what the model generates against this judge, and from there the model can just improve on its own. So that is roughly how reinforcement learning from human feedback works. Amjad, do you want to talk about constitutional AI?

I don't have good expertise in that, but I have been looking at Self-Instruct, which is instruction tuning similar to what you described, but without RL; it's more of a fine-tuning technique. Basically, what they've done with Alpaca is generate pairs of instructions and answers from GPT-3, from a larger model. You generate thousands of examples, and then you fine-tune a smaller model on these, basically, ground truths. You can take a pretrained model like LLaMA 7B and train it on the Self-Instruct-style dataset, and you end up with something very similar to RLHF, where the model is actually listening to your commands and able to follow your orders and answer your questions more clearly.
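Here is a minimal sketch of that distillation recipe, using the openai Python library as it existed at the time of the panel. The seed instructions, file name, and the choice of teacher model are illustrative assumptions, not the actual Alpaca pipeline.

```python
# Sketch of a Self-Instruct-style data generation step: ask a large "teacher" model
# to produce instruction/answer pairs, then save them for fine-tuning a small model.
# Seed instructions, model choice, and file names are illustrative; this is not the
# actual Alpaca code.
import json
import openai

seed_instructions = [
    "Write a Python function that reverses a string.",
    "Summarize the plot of Romeo and Juliet in two sentences.",
]

pairs = []
for instruction in seed_instructions:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": instruction}],
    )
    pairs.append({
        "instruction": instruction,
        "output": response["choices"][0]["message"]["content"],
    })

# The resulting JSONL file would then feed a standard supervised fine-tuning
# script for a small open model such as LLaMA 7B.
with open("distilled_instructions.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")
```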
Because anyone who remembers GPT-3 in 2020, before all these algorithmic innovations, knows it sometimes just ignores what you're saying and goes off and does something else. It had its own charm, but now people are used to models that actually listen to them. And constitutional AI is another one of these techniques, so maybe I'll pass it back to you to explain it.

Yeah, I'll roughly explain the intuition behind constitutional AI; the paper has a lot more technical detail. It is work from Anthropic, and I encourage everyone to read it if you're interested in the particular implementation details. Roughly speaking, constitutional AI tries to have a model play the part of the human in the InstructGPT paradigm we just discussed. Many of you might have heard of Asimov's three laws of robotics: one law is that you can never hurt a human, another that you should protect yourself unless that means hurting a human, and so on. Can we have an AI observe laws like these? Instead of giving it ten thousand examples, can we just specify "these are the rules, never violate them"? Anthropic developed a way to use another language model that, given these laws, a question, and an answer, tells the model whether its answer follows the laws or not. They also show that at a certain scale, large language models have an emergent ability to be really good judges. It's actually easier to be a judge than to generate text, and I think that is intuitive: to be a judge, you just need to read a text and decide whether it's doing well against the law or breaking the law. That is relatively easy; it's more like a classification problem. But to actually generate, you need a lot more reasoning and a lot more cognitive capacity. By exploiting this asymmetry, you can construct, again, a reinforcement learning loop, but this time you are not doing reinforcement learning from human feedback; you are doing reinforcement learning from AI feedback, or RLAIF. These acronyms are getting out of hand, but keep the intuition in mind regardless of the acronym. You are now learning from an AI judge, and the AI judge reads the constitution and classifies your response. In this way you can pit these two models against each other adversarially. I see it as a new generation of GAN, or generative adversarial network, where the generator is the language model and the discriminator is the judge, but now they're playing a much more complicated game. By doing this back and forth, you can align a language model to the laws you wrote down. When I'm trying out GPT-4, I find it much more aligned and also much more observant of safety rules than ChatGPT. This is kind of a side product: you can align it to provide more useful answers, and you can also align it to refuse to answer certain toxic or dangerous queries, and GPT-4 does that a lot more conservatively. So, yeah, there's that phenomenon.

Yeah. I think the meta point here is that generative pretrained transformer models pack a lot of intelligence, but in an undirected way.
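To illustrate the judging step Jim describes, here is a toy sketch of an AI-feedback loop where one model call grades a candidate response against a short written constitution. The constitution text, prompts, and scoring scheme are all illustrative assumptions, not Anthropic's actual implementation.

```python
# Toy RLAIF-style judging step: a language model grades a candidate answer against
# a short "constitution". The constitution, prompts, and score parsing are
# illustrative; this is not Anthropic's actual method.
import openai

CONSTITUTION = (
    "1. Be helpful and answer the user's question.\n"
    "2. Refuse requests that could cause harm.\n"
    "3. Do not be needlessly preachy when refusing."
)

def judge(question: str, answer: str) -> int:
    """Ask a judge model for a 1-10 score of how well the answer follows the constitution."""
    prompt = (
        f"Constitution:\n{CONSTITUTION}\n\n"
        f"Question: {question}\nAnswer: {answer}\n\n"
        "On a scale of 1 to 10, how well does the answer follow the constitution? "
        "Reply with a single integer."
    )
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return int(response["choices"][0]["message"]["content"].strip())

# In a full RLAIF pipeline, scores like these would replace human preference labels
# and feed a reward model / RL step; here we just compare two candidate answers.
question = "How do I pick a strong password?"
for candidate in ["Use a long random passphrase and a password manager.",
                  "I cannot discuss passwords."]:
    print(candidate, "->", judge(question, candidate))
```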
It's really hard to get them to do the thing that we want them to do, just because they almost have a mind of their own. So I think we're just discovering all these techniques to get these models to behave in a way that makes them useful for the applications we want to build with them.

I think Amjad has actually raised a really interesting point. Instead of language models having a mind of their own, I think they have way too many minds. They are trained on the entire internet, and people have the wildest distribution of emotions and reasoning and worldviews online, and the language model learns from all of them at the same time. So I see the original models, like the original GPT-3, as a juxtaposition of minds: maybe a million different minds juxtaposed together. RLHF and RLAIF, this constitutional AI, are trying to collapse these minds.

Exactly, exactly. It's a bit like quantum mechanics: you have a superposition, but you force it to collapse to a certain state, and that state is a useful AI assistant. So that begs the question: what about the rest of the minds? I do think the rest of the minds are still useful in another setting, maybe a creative setting. For example, if you want GPT-4 to play the role of, let's say, a grumpy old man, it may refuse to do so because it has collapsed into the AI-assistant persona. But you could imagine that in storytelling, in cinema, scripting, or movie directing, you may need all the other minds that RLHF or RLAIF eliminated. So it's very interesting to think through this lens, and I'm glad you brought it up.

Yeah. Maybe if I have to bring up a shortcoming, at least in my humble opinion, it's the fact that we jump from one extreme to the other: we go from this juxtaposition of many different minds to a very inflexible system. If you followed how ChatGPT evolved over the past few months, with all the Twitter screenshots documenting it, it has become more and more closed: there are more topics that can't be covered anymore, and it's very chatty. If you try to use it in any way apart from chat, it doesn't really serve the purpose. So I think we still have to find the right level of flexibility: confining the brain of the LLM so it behaves more like a human would, while still putting in all the safety guards that are needed to run such an LLM in production. It's going to be a journey. I think constitutional AI is a step forward compared to RLHF, but it's still not there. I don't think anyone would dare to say that Claude is much better than ChatGPT at this point; I think we have heard mostly the opposite feedback about those two systems.

Yeah. I mean, if you've seen recently, OpenAI wanted to deprecate Codex, but they got a lot of pushback from the research community, and they were surprised by it, because text-davinci-003 is actually better at code than Codex. The reason people wanted Codex is precisely because it's not instruction-tuned, and they actually like that. The shortcoming of davinci-003 is that it is too opinionated, whereas the original Codex was just a more honest completion engine. Yeah.
It's interesting, because the underlying tech is exactly the same and the vast majority of the training data is the same. You tweak just a few knobs, basically, and it behaves in a completely different way. That's not what you would expect from a human software engineer, for example, no matter what mood they're in on a given day.

Actually, we ran this fun experiment. As you know, Michele, since you're helping, we're actually training and self-hosting a lot of our own code models for code completion. But just for fun, we hooked up ChatGPT inside the IDE, inside the editor, as a completion engine. And it sucked really badly, because of how adamant it is about chatting: no matter how much you prompt it, it just really wants to be a chatbot. We were looking at the speed and economics and thought, oh, maybe this is actually useful as a completion engine, but it turns out you lose a lot of flexibility once you instruction-tune these models.

Yeah. ChatGPT tries to interleave a lot of natural language and code, because it tries to explain the code it is generating. That works well in a chat interface; it's of course not suitable for an IDE. Unless you're doing Donald Knuth-style literate programming, you know?

Yeah, so unless you're doing literate programming, it's typically not useful. Maybe that's where we're going, apart from the crazy syntax you have at this stage. Maybe we should just condition it to mimic Don Knuth. Maybe that's all it takes, some prompting changes. Then it will really be a superhuman intelligence at that point, if it can mimic him.

Alright. Given that we're talking about coding, why don't we jump onto that and cover AI for code a bit? To be fair, it was one of the most exciting applications of LLMs back in the day: the fact that GPT-3 was so powerful at generating code, and everything we've been seeing with Codex. A lot has happened in the past couple of years. Models are becoming more and more powerful across several different programming languages, not only at generating code, but also at taking over some of the activities that software engineers waste hours a day on. We saw the Copilot X announcement yesterday, or this morning; I'm losing track of how many things are happening. Jim, maybe we can have a quick chat about it: give us the highlights of what you're excited about in that announcement.

Yes. It is an announcement from GitHub, called GitHub Copilot X, and they implemented a few new components based on GPT-4. I do want to point out that none of these features are particularly new; we have seen many of these ideas floating around, either in Replit or in many community-built VS Code plugins. It's just that GitHub and Microsoft have privileged access to GPT-4, and GPT-4 is exceptionally good at coding, so their announcement and their tools may feel better in terms of performance, because they are backed by a better language model for coding. So I'll quickly talk about the features; there are four major new features in Copilot X. The first one is that you can chat with the code, but I think that is no surprise to anybody.
When ChatGPT came out, it became clear that any text corpus can be made chattable, and a code base is no exception. You'll be able to chat with the code and ask questions about it instead of reading through it and scratching your head, especially if it's a code base you're not familiar with. The second feature is pull requests: Copilot X can look at pull requests and give you comments, check for bugs or unit test coverage, things like that. I think this is interesting because now we're seeing these code language models improving not just a single developer, but human collaboration, powering entire open source communities. Previously, a human maintainer of a repo would need to go through hundreds of pull requests every day and decide which ones to accept. Anyone who has maintained a huge open source project knows how mentally taxing that is, and if you don't get back to people quickly, they complain; that's how open source work gets done. Now we can have this augmented capability: a single maintainer may be able to use these tools to get a quick daily summary of all the pull requests and rule out those with obvious bugs or that are obviously not aligned with the current to-do list, and a human can be a lot more productive in maintaining these community-contributed projects. So that's the second one. The third is a new command line interface that can use Copilot. Things like Bash can be pretty awkward, and for some command line tools the flags are really obscure; you need to go to Stack Overflow and search for a long time to find the answers. An example is FFmpeg: any advanced capability of FFmpeg requires esoteric flags, and it will take you a long time. Now you can just ask GPT-4 in the command line without ever leaving the terminal. And the final feature: now that GPT-4 supports a much longer context length, up to 32K tokens, which is almost an order of magnitude more than previous models, you can fit entire documentation sets in context. That is huge, because now, without any retrieval or search or anything, you just put the documentation in context, and GPT-4 will immediately be able to answer questions about that particular well-documented tool. So that is another very useful addition to a software engineer's toolbox. These are some of the major features I'm most excited about.

Do you know if it's a separate product that's going to be charged differently, or is it more like experimental Copilot features?

I'm not sure. I tried to join all the waitlists, and by the way, the full feature set somehow requires four different waitlists; I don't know why. So remember to sign up for all four if you want all of these; it's not sufficient to sign up through the first link. So I'm not sure, maybe they will be charged differently.

Yeah, because with 32K tokens you can really fit an entire repo in the prompt, which is pretty freaking amazing, but it would also be very expensive.

Indeed, yeah. Each single query would be massive if you're being charged for 32K tokens. Yeah.
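As a rough illustration of what "fitting the documentation in context" means in practice, here is a minimal sketch that places a documentation file directly into a single long-context chat request. The model name, file path, and prompt wording are assumptions (access to long-context GPT-4 variants was gated at the time), and in a real product you would usually pre-select the most relevant chunks, for example with embeddings, rather than sending everything.

```python
# Sketch: answer a question about a tool by placing its docs directly in the prompt.
# Model name, file path, and prompt are illustrative; access to long-context GPT-4
# variants was limited at the time of this panel.
import openai

with open("ffmpeg_docs.md") as f:
    docs = f.read()  # assume this fits within the model's context window

question = "Which flags do I need to extract a 10-second clip starting at 1:30?"

response = openai.ChatCompletion.create(
    model="gpt-4-32k",
    messages=[
        {"role": "system", "content": "Answer strictly from the documentation provided."},
        {"role": "user", "content": f"Documentation:\n{docs}\n\nQuestion: {question}"},
    ],
)
print(response["choices"][0]["message"]["content"])
```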
But being able to prompt with the entire repo is huge, because most of the work we do on Ghostwriter is really prompting. It's a lot of prompt engineering, a lot of context collection. For Ghostwriter Chat, we do a lot of paging in and out of different contexts: we use a lot of embeddings to figure out what context you need, and we use a lot of heuristics, like what file you were in, et cetera. So most of what we do reminds me of... I wasn't a programmer then; I became a programmer when computers were starting to become really powerful and cheap in the nineties. But in the eighties and seventies, when you were making a game for the Atari or whatever, you did all these hacks to fit into 32K of memory or whatever, right? One of Carmack's first claims to fame was implementing scrolling so efficiently that you could build scrolling games. That's basically what the best programmers were doing: managing a very small amount of resources. The context issue today feels just like that: most applications are bottlenecked on how much context you can put into the LLM.

Right. Yeah, we agree that LLMs are the new computer of this decade, right? We're seeing all the limitations that we faced in the '70s and the '80s, and we're learning how to program them correctly. I love your analogy, because it made me think of when I started writing BASIC in my very young years; I faced exactly the same struggles. Sorry to interrupt you.

No, absolutely. I mean, that's basically the hard thing about it. Microsoft has privileged access to these models, they have privileged access to the weights, obviously, so they can optimize in ways other OpenAI customers can't, and maybe it's more economical for them. But the tricky part as an application developer is: how much do you optimize? Because you know the price-performance is going to get better. And from a business perspective, maybe you lose money on it in the short term in order to make sure your users are happy, knowing that the margins will improve in the future.

Amjad, let me ask you the next question, because we covered some amazing features that software developers are going to have in their hands now or very soon. I think we should all envy the people who are starting to learn how to code today, because compared to when we started, they have much more in their hands, and the experience will be much less frustrating. Why don't you give me a prediction of what you think is going to happen for people who are learning to code together with AI? And also, what's going to happen to more seasoned developers like us? How much more productive are they going to become? I heard your podcast on the 1,000x developer; I would love to hear your take on this.

Yeah, I think it's not unreasonable to imagine 1,000x developers. I don't think we'll see it with the current generation of LLMs and AI software; I think maybe by the end of the decade we're going to have more autonomous agents acting on your behalf, and you end up operating at a much higher level. So I think the concept of the programmer will change. Today, yes, you still have to learn how to code, and maybe you're reviewing code a lot more than you're actually typing code. But for the best programmers, typing code is actually somewhat of a bottleneck.
If you're Carmack... there's this meme that typing code is not the bottleneck, but actually, some of the most productive people have so many more ideas and so many more things they want to achieve that typing code does end up becoming a bottleneck. So in the short term, thinking about Copilot, Ghostwriter, etc. as typing aids might seem reductionist, but it's actually very, very important. I think we're entering a new phase now. You go from typing aids, and the next phase is chat, right? So now you have an assistant that is helping you find code, helping you answer questions about your repo, helping you be creative. Being able to talk to Ghostwriter and say, hey, what do you think of this function, what do you think I should do with it, and brainstorm with it, is actually a very fun experience. So now you have something acting more like an assistant. I think the next generation, and who knows, maybe it's this year, maybe it's in five years, is going to feel more like a team of AIs. I think they're going to feel more like humans: you're going to see them in your editor, talking to you, you're giving them tasks, they're looking at PRs, reviewing PRs, submitting entire PRs. And maybe at that point you're not actually coding anymore; maybe you're reviewing code, maybe you're designing the system as a whole, but you are commanding an army of AIs doing a lot of the menial work for you.

In terms of learning how to code: I think today the experience of learning how to code is so much more fun, because you can do something really quickly. We're seeing all these threads; there was such an explosion of Replit mentions on Twitter that people thought we were running a campaign, because Replit is basically the natural way to try a lot of these AI technologies. You had people who were just learning how to code able to build Pong, able to build Flappy Bird, able to build a lot of these interesting games that would have taken you years before. The time from idea to something tangible that you can sense and feel and share with someone else is collapsing, and that's very important for people's motivation. Part of the reason learning how to code was very demotivating is that the reward of seeing a real application in the world was months to years away from your first line of code. At Replit, we have always valued this idea of reducing the time from an idea to a product, or to something you can share, and with AI it's really collapsing to essentially zero. I think we're going to see a lot more people getting into programming. I will say that there are people who are very resistant to this technology, ideologically, somewhat just because they don't want to change their ways, and I think they should really reconsider, because otherwise they're going to be left out. A lot of this is a rising tide that will lift all boats, but if your boat is not willing to be lifted, you're going to be left out.

Yeah, and I think we hear a lot about the impact on prosperity that AGI will have. Maybe that's still far from today, but I do think the short-term impact of developers becoming way more productive can't be ignored.
We are already seeing how much software and ML have done in the past few decades. Imagine what's going to happen in the next ten years with superpowered developers: how much they're going to be able to create, how much wealth they're going to be able to generate in the world. So it's an extremely exciting scenario. Jim, let me ask you the last question of the panel so that we can quickly jump to the Q&A. It's going to be the usual million-dollar question. Say we're here one year from today having the same panel, and we're talking about LLMs once again, what a surprise. Where do you think we're going to be at that point? How powerful will LLMs be? What kind of capabilities will they have? And how do you see them evolving in the next twelve months?

I think that's a fascinating question that I dream about every night. So here's my prediction. I think we'll see a lot more modalities woven more naturally into LLMs, and GPT-4's vision API is just the first. I imagine that we'll have video and 3D integrated very soon; there are already exciting works at NVIDIA doing this, and I'm pretty sure other places are thinking about the same thing. The whole research community is gravitating towards modalities beyond text, so that is almost certainly going to happen. The next set of modalities after that would be action modalities: generalist agents rather than just language models. What I mean by generalist agents is embodied AI that has actuators: they can perceive the world in multiple modalities, in images, 3D, and video, and then they can take actions to explore the environment, enact changes, and complete tasks. To me, that is a strict superset of ChatGPT or GPT-4 or Claude. From the book Thinking, Fast and Slow, we know that humans have both system one and system two thinking. What GPT-4 addresses is system two thinking, which is systematic, deliberate reasoning: examples are writing emails, taking exams, coding, everything GPT-4 has already impressed you with. System one is more like a reflex, which doesn't require a lot of conscious thinking: examples are walking, jumping, or picking up an object using five fingers, and that is robotics. Actually, robotics is somehow a much harder problem right now than coding, which is both happy and sad for me. Happy because I'll keep my job for a while: I am actively doing research on generalist agents, either controlling AI to play games or controlling AI to do robot tasks, controlling robotic arms and legs. So I'm happy I'll keep my job for a while, but I'm also kind of sad that we're still lagging behind language models. We have not seen generalist agents yet, like a robot that can go and cook spaghetti for you just from the ingredients in the fridge. It is just so much harder somehow, and that is the classic example of Moravec's paradox: the things that we think are hardest somehow turn out to be easier for AI. Somehow system one, which we don't even think about, is harder than system two. And I feel that for GPT-4, the next set of capabilities and modalities will be adding things from system one. But before system one, we still have a lot of system two problems not yet solved, as we discussed in the panel today: alignment, better coding.
Can we have agents that are just pure coding agents, not trying to chat with you all the time? Can we first make the Replit 1,000x developer happen? These are already very difficult problems. Next, can it process all the video in the world, from YouTube, from all of these places? And then, after that, we'll start to answer: can it control physical bodies? Can it be truly embodied? All I want is a delicious meatball spaghetti.

Jim, I'm curious: has anyone tried to just do massive unsupervised learning on robotics, translating perception into tokens and actions and sticking it into a big model to see what happens? Has that been tried?

Absolutely. A lot of researchers, including my own team, are trying these approaches: Google Robotics, NVIDIA, Berkeley, Stanford, many places are starting to do this. But I think the bottleneck here is data, because there is way too much text and video online, but there isn't much robot data, like control signals or touch sensing; it's not readily available online. That means we need to collect it ourselves, but then it doesn't scale. Robot hardware is so expensive; even industry research labs can't afford more than maybe a dozen robot arms, or two dozen. And you are also rate-limited by the physical world: unlike text, unlike RLHF, where you can scale as long as you buy more GPUs, you can't really do that in the physical world. There are many ways to address this; one is to build simulations, but I suppose we don't have time to get into that. That's a whole other interesting topic on how we address system one. I think we'll probably close today with many unsolved problems and aspirations on system two, because we're not there yet, and there is so much economic value and utility just in growing more modalities and doing coding better in the system two realm. Yeah, thanks for the question.

Great, absolutely. I think we had an amazing panel; we filled the whole hour, and especially with that closing, there is much to be done. I hope we got more people excited to dive head first into AI and contribute to the field in the years to follow. We have a few more minutes left, and I hope attendees can stick around a while longer, because we received several amazing questions from the community. I will cherry-pick a few of them, not in order of importance or how valuable they are; I'm just picking the ones I found fascinating. But thanks, everyone, for contributing. We have one from Tinker, and I think I will address it to you, Amjad. They're asking: how do we see the process of developing and architecting your application changing because of AI? I think you briefly covered that when you were talking about Ghostwriter Chat; that's the kind of application we wouldn't even have thought about one year ago, but right now the way we design has changed from the ground up.

Yeah, but another thing I've been thinking a lot about is how large language models are not just helping us write the software, but are also becoming component pieces inside the software itself. I'll give you an example: a lot of times when you're writing software, you hit a parsing issue; that's a common task.
GPT-3 can learn parsing with, like, a fifteen-minute prompting session. Writing a really good parser is at least a week's work, even if you're a good engineer. So, since the iron law of software engineers is that we're all lazy, it just feels obvious that people are going to plug in these LLMs, prompted for a specific task, as part of the software's call graph. But the problem is: what are the best practices there? How do you test it? How do you make it predictable? How do you ship it? How do you develop it? How do you make it reproducible? How do you collaborate around it? That's something we're starting to think really deeply about at Replit. It feels very obvious to me that in the future, a piece of software will have embedded in it some kind of large language model doing a lot of different tasks; the question is what the best practices around that look like.

That would also require the software to be connected to the internet all the time.

Well, if you're calling an API, right. If you're running the model locally, then no. I'd be curious to hear your thoughts about this, but I now have a small LLaMA Alpaca on every computer that I own, just for fun. It feels like pretty soon we're just going to have these models everywhere.

I think you'll still need GPUs to run this kind of software, though.

You don't think LLMs on CPUs are going to be possible?

I think it's going to be difficult. Or, if inference time is not a big bottleneck, then maybe we can relax that constraint. But if you do require some inference efficiency, then I think GPUs or equivalent accelerators will be required.

Gotcha. Yeah, the good news is that most consumer hardware out there at this point has some level of decent acceleration: even the M1s and M2s, or the NVIDIA GPUs embedded in laptops. Not as powerful, of course, as A100s or H100s, but still good enough to run small models. And I totally agree that eventually those LLMs, or let's call them SLMs, small language models, will find their spots in our local applications, and we're going to be able to bring superpowers to them.

Jim, I have a question for you; I think it's a perfect fit given the Harry Potter example from before. We have been talking a lot about prompt engineering. It has become a profession at this point; I think there are several staff prompt engineer job openings on LinkedIn. How do you think the field is going to evolve? Do we think it's still going to be important and valuable when every LLM out there supports, say, a 32K or 64K context window, or is it going to change fundamentally? We are learning a lot about it now, and maybe in six months it will be utterly useless.

Yeah. I'll borrow a quote from Andrej Karpathy: he said English is the hottest new programming language, and I completely believe that. And I think, as Amjad also mentioned, it's very similar to the vision of Replit: software engineers ultimately will just be responsible for specifying what to build, and then Replit, or models, other Codex-like models, will do the how-to-build part. We just specify the what, because humans will always need to know what we want, right?
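Going back to Amjad's parser example for a moment: here is a minimal sketch of an LLM prompted for one narrow task and called as an ordinary component in a program's call graph. The prompt, model choice, and log format are illustrative assumptions.

```python
# Sketch: an LLM prompted for one narrow task (parsing) used as a component in a
# program's call graph. The prompt, model choice, and log format are illustrative.
import json
import openai

def parse_log_line(line: str) -> dict:
    """Use the model as a 'parser': turn a messy log line into structured JSON."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0,  # keep the component as deterministic as possible
        messages=[
            {"role": "system",
             "content": "Extract timestamp, level, and message from the log line. "
                        "Reply with JSON only, using keys: timestamp, level, message."},
            {"role": "user", "content": line},
        ],
    )
    return json.loads(response["choices"][0]["message"]["content"])

# Ordinary code calls the LLM-backed function like any other parser.
record = parse_log_line("[2023-03-22 14:03:11] WARN disk usage at 91% on /dev/sda1")
print(record["level"], record["message"])
```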
Jim, I have a question for you. I think it's gonna be a perfect fit given the Harry Potter example from before. Sure. We have been talking a lot about prompt engineering. It has become a profession at this point; I think there are several staff prompt engineer job openings on LinkedIn. How do you think the field is gonna evolve? Do you feel it's still gonna be important and valuable when every LLM out there supports, say, a thirty-two k or sixty-four k context window, or is it gonna change fundamentally? We are learning a lot about it now, and maybe in six months it's gonna be utterly useless. Yeah. I'll borrow a quote from Andrej Karpathy: he said English is the hottest new programming language, and I do completely believe that. And I think, as Amjad also mentioned, it's very similar to the vision of Replit. Software engineers ultimately will just be responsible for specifying what to build, and then Replit, or other Codex-like models, will do the how-to-build part. We just specify the what, because humans will always need to know what we want, right? Ultimately, these things serve humans and benefit us, so we'll always be the ones specifying what to do. But as GPT-4, 5, and 6 are coming, they will be better and better at how to do it. And now, regarding prompt engineering: prompt engineering is essentially specifying what to do to the language models, and I think they are becoming more and more natural and intuitive. If we look at the older GPT-3, you needed very complex prompt engineering. You needed some very particular templates, even magical words that make no sense to an average human, to coerce certain behaviors from the language model. But now, with the instruction-tuned GPTs, which are GPT-3.5 and GPT-4, and also Claude, all of these models are already able to support intuitive prompts. You don't need to be super awkward in the wording. But that doesn't mean you can just say whatever you want and it will do exactly as you want. You need to specify it in ways that the language model will adhere to. You may need to break down a task into smaller pieces it can understand, because maybe it cannot do certain tasks very well if you phrase them in big terms. You need to be super clear about it, or maybe you need to explain it in greater detail than you would otherwise explain to a human. So I do think there are still these little things that make it different from just talking to a human assistant. But the general trend is that prompting will become more and more natural and intuitive. And at some point, I feel that software engineering classes will just be communication classes: you need to work on your communication skills. If you have a team of humans and you cannot communicate well as a manager, you won't get any software engineering projects done anyway. So maybe we should start practicing that. And also, as Amjad said, we'll maybe have a group of AIs working for you. Different AIs may specialize in different languages, I don't know, someday. Then you need to learn how to talk to them, how to have them work together for your benefit. Then at some point, everybody will be a project manager.
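To make the advice about prompt decomposition concrete, here is a small, hypothetical example of the kind of rewriting described above: replacing one vague instruction with smaller, explicit steps and a pinned output format. Both prompts are our own illustration, not something quoted from the panel.

```python
# Vague: the model has to guess the scope, the constraints, and the output format.
VAGUE_PROMPT = "Make this function better."

# Decomposed: smaller steps, explicit constraints, explicit output format.
DECOMPOSED_PROMPT = """You will improve a Python function in three steps.

Step 1: List the bugs or unhandled edge cases as short bullet points.
Step 2: Rewrite the function with those issues fixed, keeping its name and signature.
Step 3: Add a docstring covering arguments, return value, and raised exceptions.

Output the bullet list from step 1, then a single Python code block containing
the result of steps 2 and 3. Do not include any other commentary.

Function:
{code}
"""
```

The second prompt takes longer to write, but it is the "explain it in greater detail than you would to a human" habit described above, and it gives you something concrete to version and test alongside the rest of your code.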
Yeah. And that will be the point where billions of people will be exposed to modern AI. As I was mentioning before, we're not gonna be passive players exposed to decisions made by ML models anymore; we're going to be actively interacting with AI. And I think that's the direction we're going. Of course, it's gonna take time. But I've also been excited by relatively less glamorous announcements, such as the integration of LLMs in the Google Docs suite or in Office 365, because that's really democratizing access to generative models for anyone. And I'm sure the three of us have a lot of bias in how we use LLMs, because we started from CLIs and playgrounds, so we are forced to think as prompt engineers. But I can't wait to see my family or my nieces play with Office 365 and see how they react in front of this technology. So good times, good times. Let's wrap up with the last question here, one I really liked. And, again, it's gonna give us a chance to predict how the future is gonna look. We know that AI is gonna have effects on several different verticals. For some of them, it's already absolutely the case; coding is the quintessential example, and that's why we're here talking about it today. What are some of the use cases and applications that you would have expected to be more impacted by AI, where nothing much is happening yet, but where you predict that in the next year or two, with LLMs or generative models becoming more powerful, we're gonna see that vertical really impacted by AI? Well, I think the reverse question is probably easier to answer: which verticals are not going to be impacted by AI in the near term? Jim gave us an idea that chefs and plumbers and anyone who's doing work with their hands are protected a little bit from this. But I think all information work is actually pretty open to massive change based on this technology. Right now, what Microsoft calls Copilot (they're calling everything Copilot) is the dominant use case: the idea that there's something sitting behind your shoulder, typing things out for you, completing things, and you have to review them. But I think that's the least imaginative use of this technology. I think ultimately, anything that can be represented in tokens will be impacted by LLMs. And I don't think they're all going to look the same; I don't think Copilot is the end state of LLM technology. So if we're right, and this is the new computer, as we called it, and it's going to just continue to evolve, and it's going to have short-term memory and long-term memory in the future, and it's gonna get optimized and be able to do all these different things, then I think it's almost hard to imagine any part of our world that's not going to get remade by this technology. I think if you're an entrepreneur, one way to go make a lot of money today is to go look at industries that are boring, the ones most entrepreneurs would not touch because everyone wants to work on the sexy, exciting new thing. So go look at insurance, right? What kind of processes in insurance can be automated? You're probably gonna be able to build a really big company there. So I think right now we are, jokingly, in the early iPhone app era, where a lot of the apps were fart apps, some apps were useful, and some were games like Angry Birds, etcetera. But then you had the Ubers and the Airbnbs that really used the medium as natively as possible. And I don't think we've really entered that phase yet. If I had the proper imagination to do that, I would have already been working on it. And we're trying to think about it a little differently at Replit. But this is an evolving field, so we're gonna see massive changes over the next eighteen months or so. Totally. I completely agree. I think what we're seeing here with AI is not just a technology, or even just a significant technology; I think it is a civilizational technology, one that will really weave into the fabric of our civilization, our economy, and our society. But it hasn't done that yet. Just imagine, as Amjad brought up, this iPhone moment, right? In two thousand seven, when Steve Jobs walked on stage with the iPhone, people were just treating it as something exotic, something cool: what can we do with a full touchscreen? Is that gonna change our life? But look at us right now. We're all hooked to the screen.
The iPhone was a civilizational technology. It changed the way we interact; it changed the way we live and produce economic output, fundamentally. And I think that is the role of AI. As Amjad said, some jobs will be impacted less, and maybe much later, like the jobs that require system one. But anything that deals with information will be impacted. And I think all of us need to learn how to embrace this technology, how to use it to boost our productivity and also to enhance our lives. This is a new skill everyone will need to pick up. And, hopefully, it's not that hard to pick up, because now you just talk to your chatbot; you just need to learn how to communicate. I honestly don't think that is a very high bar, and I'm glad to see the advancement in AI actually lowering the barrier to the technology. If you can process the entire seven books of Harry Potter for four dollars, I think you should do it. I think you should start to use this technology. There's really no excuse not to use it; this is a civilizational technology. Well, I don't think we could have closed this panel in a better way than with such inspirational, insightful words from Jim, so thanks a lot. I know I've been a greedy moderator, because we went over time, but I just wanted to pick your brains more and hear you talk more about this. So thanks to everyone who attended online and stayed around until now; it was great. I hope we're gonna be able to do this again maybe one year from today, as we promised each other before, to see what's going on with LLMs by then. Thanks, everyone. Thanks, Jim. Thanks, Amjad. Thank you. Appreciate it. Thank you so much. Yeah. Thanks, everyone.
Event Recap
Jim Fan has worked in AI for a decade and has collaborated with several prominent AI researchers. He highlights the growth of AI from image recognition to large language models like GPT-4. Amjad shares his background in developer tools and his excitement about applying machine learning and statistical approaches to code.
2:50 - The discussion starts with the recent NVIDIA GTC event, with Jim describing NVIDIA's transition from a hardware provider to an enterprise-focused AI provider. He is mainly excited about NVIDIA AI Foundations, which offer customization services that allow enterprises to create unique use cases with multimodal language models. These models will also help incorporate images, videos, and 3D data into AI systems.
5:45 - Michele highlights the importance of multi-modality and how it grants superpowers in communications with computers. Jim envisions a range of possibilities with multi-modal language models, including being able to interact with more natural human input, enhancing note-taking, and automating home decoration plans.
11:10 - The group takes a deep dive into LLMs, specifically focusing on user experience. Amjad reflects on the newest influx of LLMs on the block, noting how the chat interface of ChatGPT made LLMs accessible to everyone.
21:50 - Michele notes that the size or number of parameters for a specific model is no longer reflective of its power. Jim explains reinforcement learning, self-supervised learning, and constitutional AI.
37:15 - The group considers GPT-4, its strengths and weaknesses, and what it means for the future of generative AI.
40:55 - Models are becoming more and more powerful across several different programming languages - how do we optimize AI for coding? Jim summarizes the GitHub Copilot X launch and its four major features.
48:15 - LLMs are the new computer of this decade. Amjad predicts the future for both beginner coders who are using AI at the start of their journeys, as well as professional developers. Jim explores what the world will look like a year from now.
The future of AI is more exciting than ever. Multi-modal language models will continue to expand and offer endless possibilities. By incorporating various sensory signals in addition to text, AI systems will become more powerful and versatile, leading to groundbreaking innovations in a variety of industries.
Today, the easiest way to harness the power of AI is on Replit. To take you from idea to product faster, we have our own coding AI, Ghostwriter, that you can try for free today. Ghostwriter is like a more powerful ChatGPT that lives in your editor. It has chat, contextual assists, smart code completion, and a proactive debugger. Sign up for Hacker Pro today, and start living in the future.





