As part of Unlocked, our program connecting business leaders across the ECI portfolio, we recently welcomed Kat James from Faculty, Europe’s leading applied AI firm and OpenAI’s first technology implementation partner. Kat gave a teach-in on how to implement generative AI effectively and securely. Here we capture a Q&A that hopefully answers everything you’ve ever wanted to know about generative AI!
How does a large language model actually work?
- Effectively, an LLM predicts the next word in a sentence. While it appears good at completing specific tasks or answering questions, it is really just identifying the word with the highest likelihood of coming next, based on a model with billions of parameters and half a trillion words of training data. It isn’t looking up answers, and the model doesn’t know whether it is getting the answer right. It is closer to Google Translate than Google Search.
- If it feels like the market for generative AI has suddenly taken off, it’s because it has. It is only in the last few years that AI has been able to outperform humans on a range of tasks, from reading comprehension to language understanding, so there has been a sudden acceleration in potential use cases.
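The next-word mechanism described above can be sketched with a toy model. The probability table here is hand-written purely for illustration; a real LLM learns billions of parameters from training data rather than using a lookup table:

```python
# Toy "language model": for each two-word context, a hand-written
# probability distribution over possible next words. A real LLM learns
# these likelihoods from vast training data instead of a lookup table.
NEXT_WORD_PROBS = {
    ("the", "cat"): {"sat": 0.6, "ran": 0.3, "slept": 0.1},
    ("cat", "sat"): {"on": 0.8, "down": 0.2},
    ("sat", "on"): {"the": 0.9, "a": 0.1},
}

def predict_next(context):
    """Pick the word with the highest likelihood of coming next."""
    probs = NEXT_WORD_PROBS.get(tuple(context[-2:]), {})
    return max(probs, key=probs.get) if probs else None

sentence = ["the", "cat"]
while (word := predict_next(sentence)) is not None:
    sentence.append(word)

print(" ".join(sentence))  # the cat sat on the
```

Note that the model never checks whether the sentence is true; it only extends it with the likeliest continuation, which is exactly why hallucinations occur.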
What are some risks when using generative AI?
- Hallucinations, i.e. providing false or nonsensical responses to queries
- Bias in the data: the model is only as good as the information it is given, which often carries an intrinsic bias
- IP is still a thorny issue: who owns the outputs, and are you happy providing your information to OpenAI?
- It is frozen in time: the model was trained on data up to 2021, so it isn’t regularly updated
How does a business implement Generative AI?
- The first step is working out what the right use case is for your business, and whether it’s worth investing. Faculty’s advice is that the most valuable areas to focus on are those that are not sensitive to small errors – like sentiment analysis or knowledge management – and things that are language heavy. There are better tools out there for solving numerical problems.
- Once you have identified a use case there would typically be a discovery phase to understand data availability and refine the requirements for the LLM. This is generally followed by a proof-of-concept phase where the feasibility is tested, different models explored, and the likely impact/opportunity understood. If this progresses well then next would come an implementation phase to productionise the LLM and integrate it into business systems and processes, including establishing approaches for ongoing monitoring, maintenance, and improvements to the model.
- Due to the risks around hallucination and bias, having a human in the loop is seen as best practice at this point in time to minimise business risks. For example, if legal firms are using it to generate contracts, it is important that they are reviewed by a human and not automatically routed directly to a client.
- When it comes to system integration, there are four ways LLMs are usually integrated:
  - Interactive: users interface directly with pre-trained models.
  - Integrated: generative capabilities are put into existing tools, e.g. the Slack plugin.
  - Customised: models are refined and optimised for the organisation.
  - Fully integrated: models are deeply integrated into existing ML systems.
What are some common areas that businesses are starting to use it for?
- Customer service is a key area, in particular triaging, chatbots, sentiment analysis and call summarisation.
- Knowledge management – the tool can summarise unstructured data, turn it into easy-to-digest content and help surface hidden insights.
- E-commerce – personalised search, product categorisation and translation.
How much does generative AI cost?
Unsurprisingly it does depend, but there are three main components to the costs:
- You are charged both for sending the prompt and for the response you receive, with the charge being based on the volume of data inputted/outputted.
- The cost is based on tokens, and as a rule of thumb, a token is around 4 characters or roughly ¾ of a word. If you have a 30-minute discussion to input, that will likely be 6.5-9.5k tokens.
- Token cost depends on what model you are using, so for example, if you were using GPT-3 the cost would be about $0.002 per 1k tokens, but if you wanted to use GPT-4, that increases to $0.12 per 1k tokens. You don’t need the latest model for many use cases, so it’s worth considering the balance between price and performance on a case-by-case basis, as the differences in cost can be material.
- The third cost for many, but not all, LLM deployments is for fine-tuning the model, where you use information specific to your business to optimise the LLM. The costs here vary significantly by use case and the level of fine-tuning required.
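The arithmetic above can be made concrete with a small sketch. It uses the rules of thumb given (1 token is roughly 4 characters) and the example prices quoted in this piece; prices change often, so treat these figures as assumptions rather than live rates:

```python
# Rough cost estimator using the rules of thumb above. The per-1k-token
# prices are the illustrative figures quoted here, not current rates.
PRICE_PER_1K_TOKENS = {"gpt-3": 0.002, "gpt-4": 0.12}

def estimate_tokens(text):
    """Approximate token count: 1 token is roughly 4 characters."""
    return len(text) / 4

def estimate_cost(prompt_tokens, response_tokens, model):
    """You pay for both the prompt you send and the response you receive."""
    total = prompt_tokens + response_tokens
    return total / 1000 * PRICE_PER_1K_TOKENS[model]

# A 30-minute discussion of ~8,000 tokens (mid-range of the 6.5-9.5k
# estimate above) sent as a prompt, with a 500-token summary coming back:
for model in PRICE_PER_1K_TOKENS:
    print(f"{model}: ${estimate_cost(8000, 500, model):.2f}")
```

Even at these small sums per call, the roughly 60x gap between models compounds quickly at scale, which is why the price/performance trade-off is worth assessing per use case.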
How much data do you need to fine-tune an LLM?
- While the more data you use the better the results will be, the relevance of the tuning data is likely to matter more than volume. Relevant data might be things such as documents that capture your business’s tone of voice, company policies, etc. It is also worth holding some data back as a test set, to see how well the model is performing as you go through the fine-tuning process.
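Holding data back for testing can be as simple as the sketch below; the document names are hypothetical stand-ins for a real tuning corpus:

```python
import random

def split_holdout(documents, test_fraction=0.2, seed=42):
    """Shuffle the tuning documents and hold back a fraction as a test
    set, so fine-tuning progress is measured on unseen data."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)  # fixed seed for a repeatable split
    cut = int(len(docs) * (1 - test_fraction))
    return docs[:cut], docs[cut:]

# Hypothetical tuning corpus: company policies, tone-of-voice docs, etc.
corpus = [f"policy_doc_{i}.txt" for i in range(10)]
train_docs, test_docs = split_holdout(corpus)
print(len(train_docs), len(test_docs))  # 8 2
```

Evaluating each fine-tuning round against the same held-back set gives a consistent yardstick for whether the extra tuning is actually improving the model.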
Is it worth deploying your own LLM, or will other software products soon have their own plug-ins for customers?
- There is little doubt that longer term you will see many software platforms release generative AI plugins for customers to use. These will likely cost less than developing and implementing your own fine-tuned model. So, the question all depends on your use case, what the expected impact is on efficiency or service quality, how bespoke your problem is, and how beneficial the first-mover advantage will be. If the use case is broad and the ROI is low, it may be worth waiting.
Are OpenAI’s models the best ones to use?
- Not always; it’s very dependent on the use case. As an example, when creating embeddings (where you take text and translate it into a numerical representation to feed into downstream processes) it is worth exploring open-source models such as BERT alongside the OpenAI embedding models, as they may offer sufficient performance at a lower cost.
- You may also opt for a model that has been developed specifically for your sector, to reduce the amount of fine-tuning required: for example FinBERT, which is aimed at the finance sector and so trained on financial documents.
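To make the embedding idea concrete, the sketch below compares texts by the cosine similarity of their vectors, which is how embeddings typically feed downstream search or matching. The four-dimensional vectors are invented for illustration; a real model such as BERT or an OpenAI embedder would produce much higher-dimensional vectors from the text itself:

```python
import math

def cosine_similarity(a, b):
    """Compare two embedding vectors; values near 1.0 mean very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical 4-dimensional embeddings, hand-written for illustration.
embeddings = {
    "refund policy": [0.9, 0.1, 0.0, 0.2],
    "returns and refunds": [0.8, 0.2, 0.1, 0.3],
    "quarterly revenue": [0.1, 0.9, 0.7, 0.0],
}

query = embeddings["refund policy"]
for text, vec in embeddings.items():
    print(f"{text}: {cosine_similarity(query, vec):.2f}")
```

Because similar meanings land close together in the vector space, "refund policy" scores far higher against "returns and refunds" than against "quarterly revenue", regardless of exact wording.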
How do I make sure implementation is safe?
- It is important to build in safety from the outset, asking yourself questions such as: will the outputs be explainable and monitored so that they can be trusted? Will the outputs avoid unfair bias? Are measures in place to protect privacy so that sensitive data cannot be extracted? And is the system robust, with the risk of errors mitigated?