What is generative AI?
It is a type of artificial intelligence (AI) that can create new content in response to user instructions (prompts).
It uses a large language model (LLM), a type of computer programme that can analyse large amounts of information, learning from its training data to recognise patterns in text, images or code. It finds patterns in the data to learn its underlying structures, styles, relationships and rules. For text, it learns grammar, facts and writing style. For images, it learns shapes, colours, textures and how objects look.
In response to the user's prompt, it creates new content using the patterns it has learnt from the training data. It predicts which word, image or sound is most likely to come next in the output it generates in response to the prompt.
For example, asking a generative AI tool “what words do you associate with granny smith apples” may produce words like ‘green’, ‘fruit’, ‘tart’, ‘crunchy’ and ‘crisp’, because those words are commonly used when describing granny smith apples, and the tool will likely have been trained on lots of material using those words. It may provide additional information on recipes, cultivation or gardening advice.
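To illustrate the idea of next-word prediction, here is a minimal sketch in Python. The word counts and the `predict_next` helper are invented for illustration; a real LLM learns probabilities over tens of thousands of tokens from billions of examples.

```python
import random

# Toy "model": counts of which word followed "granny smith apples"
# in some imagined training text. Real LLMs learn these statistics
# over huge vocabularies from billions of documents.
next_word_counts = {
    "are": 40, "green": 25, "taste": 15, "crunch": 10, "grow": 10,
}

def predict_next(counts):
    """Pick the next word in proportion to how often it was seen."""
    words = list(counts)
    weights = list(counts.values())
    return random.choices(words, weights=weights, k=1)[0]

print(predict_next(next_word_counts))  # most often prints "are" or "green"
```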
Generative AI tools specialise in different tasks, so it is important to use the right tool for the job.
What data does generative AI use?
LLMs are often trained on data from the open web. This excludes non-public data held on secure networks, for example paywalled content or content which needs authentication to access. More specialist LLMs may be trained on licensed content or content specific to the job, e.g. publisher websites or internal guidelines. Although LLMs are primarily trained on unstructured data, they can also learn from spreadsheets and metadata. This is called structured data: it is organised into a predefined format to make it easier for computers to search and analyse, e.g. the records in an OPAC.
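As a rough illustration of the difference, the sketch below contrasts an unstructured sentence with the same information held as a structured bibliographic record; the field names and record are invented for the example.

```python
# Unstructured data: free text with no predefined fields.
unstructured = "Smith, J. wrote 'AI in health libraries', published in 2024."

# Structured data: the same information in predefined fields,
# which a computer can search and filter directly.
structured = {
    "author": "Smith, J.",
    "title": "AI in health libraries",
    "year": 2024,
    "keywords": ["artificial intelligence", "libraries"],
}

# Filtering structured records is straightforward:
records = [structured]
print([r["title"] for r in records if r["year"] == 2024])
```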
When AI outputs are factually incorrect, misleading or entirely fabricated, this is sometimes called a hallucination. Hallucinations occur when the LLM is missing information, or has been trained on incorrect or disputed information. Answers are also based on probability, so incorrect returns may happen if something doesn’t follow the likely predicted pattern. To minimise hallucinations, some LLMs use synthetic data: data generated by the LLM itself that tries to mimic real data. Giving more clarity to your prompt, by expanding on instructions, trying synonyms or pointing the LLM to the source of the information, may also help.
Retrieval Augmented Generation (RAG) also helps to reduce hallucinations by supplying the model with additional information when it predicts what should come next in an output. This additional information may come from a company's internal data, such as documents, emails and datasets.
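A minimal sketch of the RAG idea, assuming a hypothetical `generate` function standing in for a real LLM call: the point is that retrieved passages are pasted into the prompt, so the model answers from supplied text rather than from its training data alone.

```python
# Minimal RAG sketch: retrieve relevant passages, then prepend them
# to the prompt. The scoring here is naive word overlap; production
# systems use vector embeddings and a vector database instead.
documents = [
    "Interlibrary loans are requested via the online form.",
    "The library opens at 9am on weekdays.",
    "Training sessions run every Tuesday afternoon.",
]

def retrieve(question, docs, top_k=1):
    """Rank documents by how many question words they share."""
    q_words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def generate(prompt):
    # Hypothetical stand-in for a call to a real LLM API.
    return f"[LLM answer based on prompt: {prompt!r}]"

question = "When does the library open?"
context = "\n".join(retrieve(question, documents))
print(generate(f"Using only this context:\n{context}\nAnswer: {question}"))
```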
How does it learn?
Learning can be supervised or unsupervised. In supervised learning you provide examples alongside accurate answers (labels). The model uses these to compare and learn the differences. This method improves accuracy, as you are providing clear guidelines for it to follow. It is useful for tasks like classification, image recognition, spam detection and regression (predicting trends). However, it takes time to train and label the data, has limited flexibility and can struggle if presented with something it hasn’t seen before.
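A minimal sketch of supervised learning using the scikit-learn library, with a tiny invented set of labelled messages; a real spam filter would be trained on many thousands of labelled examples.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Labelled examples: each message comes with an accurate answer.
messages = [
    "win a free prize now", "claim your cash reward",
    "minutes from today's team meeting", "agenda for the journal club",
]
labels = ["spam", "spam", "ham", "ham"]

# Turn text into word counts, then learn which words signal which label.
vectoriser = CountVectorizer()
features = vectoriser.fit_transform(messages)
model = MultinomialNB()
model.fit(features, labels)

# Classify a new, unseen message.
print(model.predict(vectoriser.transform(["free cash prize"])))  # ['spam']
```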
Unsupervised models look for similarities, such as clusters or structures in the data, and try to make sense of them. They can be used to simplify data while keeping important patterns, or to detect anomalies in data that don’t fit a pattern. Generative AI uses this model. It is less predictable and harder to evaluate, and often needs human input to check and make sense of the data.
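A minimal sketch of unsupervised clustering with scikit-learn, using invented two-dimensional points: no labels are provided, and the algorithm groups the points by similarity on its own.

```python
from sklearn.cluster import KMeans

# Unlabelled data points: no "right answers" are supplied.
points = [[1, 2], [1, 1], [2, 2],     # one natural group
          [9, 9], [10, 8], [9, 10]]   # another natural group

# Ask the algorithm to find two clusters by itself.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)  # e.g. [0 0 0 1 1 1]: each point's discovered group
```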
How does copyright affect my use of generative AI tools?
All content created by humans is automatically copyrighted. UK copyright law includes a text and data mining exception for non-commercial purposes. This exception can be used for automated analytical techniques to analyse text and data for patterns, trends and other useful information. Text and data mining usually requires copying of the work to be analysed.
This exception does not cover the alteration or manipulation of original content which happens as a result of using generative AI. Some models take a more ethical approach and have obtained permission from, or paid licence fees to, the copyright owners for content included in their LLM.
Some LLMs will absorb content uploaded to them as part of their training data, which is potentially another breach. Some procured models will analyse the content but not absorb it into the LLM, e.g. the NHS Copilot licence.
How does copyright affect my own AI generated output?
Check the licensing for the tool to see whether copyright remains with the creator or the tool developer. If you are working jointly, check any service level agreement for a statement about copyright and ownership. Check whether the generated content is the same as or very similar to original content; if it is the same or substantially similar, it may infringe copyright.
Can I upload an article/document to a genAI tool to summarise it for me?
No, unless you have obtained permission from the content owner. Some Creative Commons content may be permissible to use, depending on the licence used.
Some developers may only use content licensed for training their large language model. Please check the terms and conditions of use. If you are using a generative AI tool on your organisational data, you are likely to be able to upload and summarise an internal document, but IT may restrict access to sensitive data.
Can I upload bibliographic data (article titles, journal titles, publication dates, keywords/MeSH) for tools to analyse for me?
No, unless you have obtained permission from the content owner.
Can I upload a document I have written myself to a genAI tool?
Yes, you own the copyright. You should note that your content may be used for future training unless you are using a tool that expressly does not do so.
Can I upload a Trust document I have written myself to a genAI tool?
Yes, but ask Information Governance first. You should note that your content may be used for future training unless you are using a tool that expressly does not do so.
How can I use generative AI tools in a more sustainable way?
Generative AI tools consume energy in several ways. Training and fine-tuning the large language model consumes a lot of energy: up to 33 times the amount consumed by computers running task-specific software (2023). For outputs, the amount of energy consumed will vary depending on the task being performed.
LLMs, both during training and in operation, require a lot of processing on servers housed in data centres. The servers are cooled with water, which may have an impact on the local environment. Many AI data centres are aiming to use 100% renewable energy in the near future; this could be a factor to consider during procurement. To save energy, action can be taken both by the developers of the large language model and by you when using tools built on it.
By applying a method called quantisation (rounding the numbers used in calculations to fewer decimal places), the energy usage of one model dropped by up to 44% while maintaining at least 97% accuracy compared to the baseline (2025). This is because it is easier to get to the answer, in much the same way as most people could calculate two plus two much more quickly than 2.34 plus 2.17.
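A rough numerical sketch of the idea: rounding the numbers in a calculation to fewer decimal places changes the result only slightly while making the arithmetic simpler. The weights and inputs below are invented; real quantisation typically converts 32-bit floating-point numbers to 8-bit (or smaller) integers inside the model.

```python
# Toy illustration of quantisation: round the "weights" used in a
# calculation to one decimal place and compare the results.
weights = [2.34217, -1.17893, 0.56421]
inputs = [1.0, 2.0, 3.0]

full = sum(w * x for w, x in zip(weights, inputs))
quantised = sum(round(w, 1) * x for w, x in zip(weights, inputs))

print(f"full precision: {full:.5f}")       # 1.67694
print(f"quantised:      {quantised:.5f}")  # 1.70000: close, but cheaper
```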
Quantisation, combined with cutting user prompt and AI response lengths from 300 to 150 words, could reduce energy consumption by 75%. The biggest gains in energy efficiency can be achieved by switching from large models to smaller, specialised models for certain tasks such as translation or knowledge retrieval (2025).
Newer AI models such as DeepSeek use a Mixture-of-Experts (MoE) architecture, which activates only the relevant sub-models for each specific task, thereby reducing the computing power required. DeepSeek reportedly used one-tenth of the GPU hours (the time that GPUs operate at full capacity during model training) used by Meta's model, resulting in a reduced carbon footprint, decreased server usage and lower water requirements for cooling systems. There are cautions that this benefit may be short-lived as global use continues to rise.
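A minimal sketch of the Mixture-of-Experts routing idea: a gating function picks one specialised expert per request, so only a fraction of the total model runs. The experts and gate below are toy stand-ins, not DeepSeek's actual architecture.

```python
# Toy Mixture-of-Experts: only the chosen expert runs for each input,
# so most of the "model" stays idle and consumes no compute.
def translation_expert(text):
    return f"[translation of: {text}]"

def retrieval_expert(text):
    return f"[facts retrieved for: {text}]"

experts = {"translate": translation_expert, "retrieve": retrieval_expert}

def gate(text):
    """Toy router: pick an expert from keywords in the request."""
    return "translate" if "translate" in text.lower() else "retrieve"

request = "Translate this sentence into French"
print(experts[gate(request)](request))  # only translation_expert runs
```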
Do you need to use generative AI tools for the task you're planning to use it for?
Why can generative AI tools be inconsistent in their responses to the same prompt?
You can ask the same model the exact same question and get a slightly differently worded response. The AI may interpret the prompt in different ways, especially if the prompt is open-ended or vague. Most tools also have settings to make them more spontaneous and human-like; this degree of randomness is sometimes called the ‘temperature’ of the output.
Some AI tools apply a deterministic model to their learning and so will predict a more consistent response to the same prompt. Some may take the recent conversation history into account to provide context for the response. Other tools apply a stochastic model, which adds randomness and uncertainty: a range of possible outcomes may be predicted, leading to a different response to the same prompt. Creating specific prompts with custom instructions and fine-tuning will help to reduce variation.
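A minimal sketch of temperature-based sampling with numpy, using invented next-word scores: at a low temperature the most likely word is chosen almost every time (near-deterministic), while a higher temperature spreads probability across the alternatives (stochastic).

```python
import numpy as np

# Invented scores for candidate next words.
words = ["green", "fruit", "tart"]
scores = np.array([2.0, 1.0, 0.5])

def sample(scores, temperature):
    """Sample a word; low temperature -> near-deterministic choice."""
    probs = np.exp(scores / temperature)
    probs /= probs.sum()
    return np.random.choice(words, p=probs)

print(sample(scores, temperature=0.1))  # almost always "green"
print(sample(scores, temperature=2.0))  # any of the three words
```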
Ultimately, LLMs mathematically generate likely responses on a token-by-token basis, so a model that always generated exactly the same response would be the noteworthy situation.
If it’s important for a generative AI tool to be consistent in its output for the same prompt, test the same prompt several times with different tools, and choose a tool that provides the consistency you’re looking for.
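A sketch of such a consistency test, assuming a hypothetical `ask_tool` function standing in for whichever tool's interface you use: run the same prompt several times and count the distinct responses.

```python
from collections import Counter
import random

def ask_tool(prompt):
    # Hypothetical stand-in for calling a real generative AI tool;
    # here it just simulates a tool that usually gives the same answer.
    return random.choice(["Answer A", "Answer A", "Answer B"])

prompt = "What words do you associate with granny smith apples?"
responses = [ask_tool(prompt) for _ in range(10)]

# Fewer distinct responses = a more consistent tool for this prompt.
print(Counter(responses))
```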