July 15, 2024
There is no doubt there has been plenty of buzz around LLMs over the last year or more, ever since OpenAI launched GPT-3 and the world went crazy. People started getting creative with what they could do with it, we've seen more and more companies and products take advantage of the generic nature of these models, and we've seen the industry's focus shift from other areas towards building bigger and bigger language models.
All of this is great: the industry advances, more productivity tools are created, and the average day gets a little better. Productivity goes up, valuations of AI companies go up, GPU manufacturers cash in, big advances in research are being made, and, ultimately, more and more people, with or without experience in AI, claim to be experts in the field. However, one thing is still lacking: an understanding of what an LLM actually is.
Our goals in this article are to
1) explain what an LLM is,
2) discourage you from AI-first thinking, and
3) raise awareness of how easy it is to go from 0 to good, and how hard it is to go from good to great, when building data-driven/ML products.
Why would you care about what we tell you? Don’t trust us, trust our background.
We’re Mihai and Tomas, tech founders with extensive expertise in AI, Data Engineering, and Product development, with over 7 years of experience each.
Mihai has worked both in big tech, at Google and Bloomberg, and with startups in various industries, and has built an AI startup of his own. He holds a Data Science MSc and is currently an ML lead engineer while running his AI-focused software agency. Passionate about AI since the early days, he understands the smallest details that go into big ML architectures and can build them from scratch.
Tomas loves the cloud and big data architectures. Since 2018 he has worked on data-intensive engineering projects both in academia, at the Institute of Intelligent Systems and Numeric Applications in Engineering, and in the private sector, ranging from big consulting firms to acquired startups such as Factmata and big telecoms like Telia.
AI is nothing new. It has been around since the 20th century, with the development of algorithms that are the cornerstone of important "AI" processes today. To give you an idea, the paper Attention Is All You Need, published in 2017, lays out the foundation of ChatGPT, Claude, Gemini and the rest: transformers.
Everybody and their mom is talking about AI nowadays. Nothing wrong with that. In fact, the more people experiment with AI, the more the industry advances and the better the tools that get created.
The challenge, though, is understanding who knows what, and who to believe when they tell you about the latest RAG system. It all looks good on a nice slide or a high-level chart. But a nice-looking graphic does not take you very far. Ideally, you want to integrate the model into your business, feed it relevant data, potentially fine-tune it, understand the trade-offs, deploy it at scale, and improve your customer's journey as a result.
There are many moving pieces when it comes to running ML models, and LLMs specifically, in production. It's fairly easy to go from 0 to good. But good is sometimes just mediocre. The real challenge is going from good to great. And not everyone can do it.
We've seen the following many times: "I have this data on my drive, can we throw it into an LLM and use it to classify some data in my business with 95% accuracy?" A lot can go wrong when you believe this is easily achieved.
That is the real question here. Many people see the hype, open ChatGPT, and think they have seen it all, and many let themselves be guided by toxic marketing into thinking it's a tool that solves every problem.
The main risk worth talking about is the perception that LLMs are a magic box, where data can be thrown and exceptional results can be achieved, regardless of the problem a business is trying to solve. We’ve seen this belief with clients many times, and while ML and in particular language models can achieve spectacular performances, it can lead to false expectations.
So what's an LLM? Numbers. A huge chain of mathematical operations that, when exposed to the right data, converges to a state where, as a whole, it can generate meaningful results. LLMs are neural networks with a single task: to predict the next word, or series of words, given some input. And trust us when we tell you: it's all numbers.
When you ask a question, the model has no clue what it means. Before your text even reaches the model, it gets encoded into a series of numbers the model has seen before. We call those "tokens". Given those tokens, the model predicts the most likely next series of tokens, which in turn get converted back into words. That's it; this is how an LLM works at a high level.
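To make the "it's all numbers" point concrete, here is a toy sketch of that loop. This is not a real LLM or a real tokenizer; the five-word vocabulary, the scoring rule, and every name below are our own illustration. The shape of the process is what matters: encode text into token IDs, score every candidate next token, pick the most likely one, decode it back to text.

```python
import math

# Toy vocabulary -- a real LLM has tens of thousands of entries.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
id_to_word = {i: w for w, i in vocab.items()}

def encode(text):
    """Map words to integer token IDs (stand-in for a real tokenizer)."""
    return [vocab[w] for w in text.split()]

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_token(token_ids):
    """Fake 'model': one score (logit) per vocabulary entry.
    A real LLM computes these with billions of learned weights;
    here we just hard-code 'the ID after the last one'."""
    target = (token_ids[-1] + 1) % len(vocab)
    logits = [1.0 if t == target else 0.0 for t in range(len(vocab))]
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)

ids = encode("the cat")        # text becomes numbers: [0, 1]
predicted = next_token(ids)    # model scores all tokens, picks the best
print(id_to_word[predicted])   # numbers become text again: "sat"
```

Everything between your question and the answer is arithmetic on those integer IDs; the "understanding" lives entirely in how the scores are computed.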
The purpose of this introductory article is to stay high level. We’ll go into the depths of how they actually work later in the series. However, it’s worth noting that one of the most important pieces of research behind LLMs is the transformer, a Machine Learning architecture proposed in 2017, in the paper Attention is All You Need. What happened later is more or less a variation of the paper, plus a massive scaling of both architectures and training capabilities, that led to the useful models we know today.
"AI" and "LLMs" are nice buzzwords that certainly look good on a startup pitch deck. But do you actually need an LLM to take your business to the next level?
There's nothing wrong with experimenting with AI wherever it might help, but keeping things simple is key to getting good products off the ground. LLMs are clearly the breakthrough of the last two years. But overusing them just to display "AI" on your landing page is a poor decision.
Let's craft a checklist of when it might make sense to use an LLM. It's by no means complete; everyone has a different view on the subject.
Anyone can get access to a good language model. Just go on ChatGPT and start using it. No tech skills required, no code, not much work, no fine-tuning. You end up with a good language model. But good is far from great, and in business, the highest level of performance is what you should aim for.
In this section, we'll go through what it takes to get far more performance than a standard, one-size-fits-all language model can deliver, and hopefully dispel some misconceptions about the "magic button" that makes an LLM boost your business significantly.
Going from good to great is significantly slower than going from 0 to good. It requires constant R&D cycles, good decisions, and, most importantly, deep expertise in what to do when things don't go your way. Each cycle might improve your product by 1% or less, but staying committed is necessary to reach the point where you can say: "it's not yet a great product, but it's getting closer."
You might have heard the notion of prompt engineering. This is not specific to LLMs; it's really about communication. Imagine two people going to the same restaurant. The first says "I'm hungry"; the other says "I'd like a three-course menu centered around Italian cuisine". Both want the same outcome, but one knows how to communicate it better. With LLMs, we call this prompt engineering: how you frame your question so that the mathematical model behind it understands it better and can give you something relevant in return.
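The restaurant analogy translates directly into how prompts are written in practice. Here is a minimal sketch of a prompt template; the field names (task, audience, format, constraints) are our own convention for illustration, not anything an LLM API requires:

```python
def build_prompt(task, audience, fmt, constraints):
    """Assemble a structured prompt from explicit fields.
    The structure forces you to communicate like the second diner:
    say exactly what you want, for whom, and in what shape."""
    return (
        f"Task: {task}\n"
        f"Audience: {audience}\n"
        f"Format: {fmt}\n"
        f"Constraints: {constraints}"
    )

# The "I'm hungry" version: vague, leaves everything to the model.
vague = "Tell me about our sales."

# The "three-course menu" version: the same intent, communicated well.
specific = build_prompt(
    task="Summarize Q2 sales performance",
    audience="non-technical executives",
    fmt="three bullet points",
    constraints="mention revenue, growth rate, and top region",
)
print(specific)
```

Both prompts chase the same outcome, but the structured one gives the model far more to anchor its answer on, which is most of what "prompt engineering" amounts to day to day.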
Going from good to great requires you to follow some rules when trying to ask a language model to solve a problem or answer a question.
The average general-purpose language model (take GPT-4 as an example) is very generic. It knows a bit of everything and can give good answers in many areas. The more access you have to domain-specific, possibly proprietary information, the higher the chances your model will know more about what you ask. Storing, processing, curating, and feeding that data to the model in an efficient and relevant way is key here. Designing such a data pipeline and using it effectively takes more than just calling an API.
You have your proprietary data, nicely stored and structured. Now what? You need to feed it to the model somehow. RAG (retrieval-augmented generation) is one of the most popular approaches. Instead of asking the model a question without any context, RAG systems augment your question, prepending it with relevant information from your data store.
It's like answering an exam question with instant access to the entire curriculum: you'll give a more informed and likely correct answer. RAG is extremely useful and can easily take a model from good to great if implemented and used properly. However, there are complexities to deal with, from data pipelines to data storage and live inference alongside a language model.
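The retrieve-then-prepend idea can be sketched in a few lines. Note the loud caveat: real RAG systems score relevance with vector embeddings and a dedicated vector store; the word-overlap scoring, the stopword list, and the example documents below are toy stand-ins of our own invention, there only to show the shape of the pipeline:

```python
# Words too common to signal relevance (toy list for this sketch).
STOPWORDS = {"what", "is", "the", "a", "an", "of", "to", "on", "our"}

# Stand-in for a proprietary data store.
documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Shipping to Europe takes 5 to 7 business days.",
    "Premium support is available on the enterprise plan.",
]

def words(text):
    """Normalize text to a set of content words."""
    return {w.strip(".,?!").lower() for w in text.split()} - STOPWORDS

def retrieve(query, docs, k=1):
    """Rank documents by word overlap with the query (toy scoring;
    real systems use embedding similarity)."""
    ranked = sorted(docs, key=lambda d: len(words(query) & words(d)),
                    reverse=True)
    return ranked[:k]

def augment(query, docs):
    """The 'A' in RAG: prepend retrieved context to the question."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

prompt = augment("What is the refund policy?", documents)
print(prompt)
```

The augmented prompt, not the bare question, is what gets sent to the language model, so the model answers with the curriculum open in front of it.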
Much of the recent hype has been about going bigger and bigger with models, with multiple companies competing over who has the largest number of parameters and what they can achieve with them. But there's another side to the story. These models are highly capable, yet they are heavy on memory, need very expensive and capable infrastructure to run on, and might not be affordable for the average startup to run and maintain.
The other side of the story is companies going smaller and smaller with language models. Instead of building a single monolith to rule them all, they build a series of smaller, more specialized models. You might wonder how they decide which one to use at runtime. Alongside those many small, specialized models sits a router with a single task: take a query and send it to the model most likely to have the knowledge to answer it properly (roughly what Mistral does with its mixture-of-experts models).
By doing this, you not only reduce cost and runtime, but also make it feasible to run language models on smaller infrastructure, while opening the door for people to build even more custom LLMs that could, at some point, become part of the bigger system.
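The routing step described above can be sketched as follows. The keyword-based router, the specialist names, and their keyword lists are all our own hypothetical illustration: production mixture-of-experts systems learn the routing from data rather than hard-coding it, and route at a much finer granularity.

```python
# Hypothetical registry of small specialized models, each tagged with
# the topics it handles. In a real system these would be separate
# fine-tuned LLMs, not keyword lists.
SPECIALISTS = {
    "legal": ["contract", "liability", "compliance"],
    "code": ["python", "bug", "function"],
    "general": [],  # fallback when nothing matches
}

def route(query):
    """Pick the specialist whose keywords best match the query.
    This is the router's single task: decide who answers."""
    q = query.lower()
    best, best_hits = "general", 0
    for name, keywords in SPECIALISTS.items():
        hits = sum(kw in q for kw in keywords)
        if hits > best_hits:
            best, best_hits = name, hits
    return best

print(route("Why does my Python function raise this bug?"))  # -> code
print(route("Review this contract for liability issues"))    # -> legal
```

Each query only ever touches one small model, which is where the cost and latency savings of the small-models approach come from.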
With all the hype around LLMs, it's easy to fall into the trap of wanting to use them in every product. What's worse, listening to overnight experts might leave you stuck in a place where the deliverable is good, but not great.
Our goal with this series of articles is to help entrepreneurs find the right tools, and the right people to implement them, so they can go from good to great with language models and actually benefit.
If someone sells you “magic AI tools”, stay away. Don’t believe people when they tell you your product must use AI. It might not need to for the first few iterations. However, laying the foundation of your product with an AI future in mind is crucial, so when the time comes, you can seamlessly make use of LLMs, the right way.
Stay tuned to our series to learn more about ML and data, and how it can actually make an impact on your business. Don’t just ride the wave, focus on going from good to great in your AI journey.