The Carbon Impact of Large Language Models: AI's Growing Environmental Cost

A Guide to the Energy Demands and CO2 Emissions of Leading LLMs in a Sustainability-Conscious Era

Dec 10, 2023

Introduction

The rise of large language models (LLMs) has brought an essential yet often overlooked issue to the forefront: their environmental impact. As these models grow in size and usage, they consume vast computational resources, which translates into significant energy use and corresponding carbon emissions. This matters especially in an era when sustainability and climate change rank among the top global concerns.

Basics of Carbon Footprint and Electrical Power

Climate change, a critical global issue, is primarily driven by the emission of greenhouse gases, notably carbon dioxide (CO2). This gas and others like methane and ozone play a significant role in the greenhouse effect. The result is a gradual increase in global temperatures, leading to various environmental impacts.

A considerable share of CO2 emissions comes from human activities such as burning coal and gas to produce electricity. As our reliance on technology grows, so does the importance of the data centers that house the computing hardware behind it. Machine learning tasks like training and inference, increasingly performed on cloud computing instances, consume substantial energy. Measuring and reducing the emissions from these workloads is therefore crucial for minimizing the environmental impact of the AI field.

Understanding the carbon footprint of machine learning is essential here. Carbon footprint refers to the total amount of greenhouse gases, including CO2, emitted directly or indirectly by an individual, organization, event, or product. In the context of ML and data centers, it covers not only the emissions from the energy used to run ML hardware like GPUs, but also the energy needed to operate the data center and cool its servers, along with indirect emissions from manufacturing and disposing of the hardware.

Electricity consumption is commonly measured in kilowatt-hours (kWh); one megawatt-hour (MWh) equals one thousand kilowatt-hours. However, not all electricity is generated the same way. Power can come from clean sources like wind and solar, or from burning coal, which produces significant carbon emissions. Often, a single grid draws power from multiple sources depending on their availability and the system's load.

Map of countries where models were trained, the main energy source and the carbon intensity. Taken from “Counting Carbon: A Survey of Factors Influencing the Emissions of Machine Learning” by Alexandra Sasha Luccioni et al.

This brings us to the concept of "location-based marginal carbon emissions," which measures the amount of CO2 emitted per unit of electricity consumed. Because this value varies significantly with the energy mix of the local grid, the same amount of power has a different carbon footprint depending on where the data center is located. This is also why many data centers are trying to shift to renewable energy sources.
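To make the idea concrete, here is a minimal Python sketch of the location-based calculation: emissions are simply energy consumed multiplied by the grid's carbon intensity. The function name and the three grid intensity values are hypothetical placeholders chosen for illustration, not measurements of any real grid or region.

```python
def emissions_kg(energy_kwh: float, intensity_kg_per_kwh: float) -> float:
    """Location-based emissions: energy consumed times the grid's carbon intensity."""
    if energy_kwh < 0 or intensity_kg_per_kwh < 0:
        raise ValueError("energy and intensity must be non-negative")
    return energy_kwh * intensity_kg_per_kwh


# Hypothetical grid intensities in kg CO2eq per kWh (illustrative only, not real data).
HYPOTHETICAL_GRIDS = {
    "mostly_hydro": 0.02,
    "mixed": 0.30,
    "mostly_coal": 0.80,
}

# The same job consumes the same energy wherever it runs.
job_energy_kwh = 1_000.0
for grid, intensity in HYPOTHETICAL_GRIDS.items():
    print(f"{grid:>12}: {emissions_kg(job_energy_kwh, intensity):,.0f} kg CO2eq")
```

Only the intensity changes between rows, which is exactly why data center location matters as much as hardware efficiency.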

The Factors Affecting Carbon Footprint of ML Models

Five key elements affect the carbon footprint of ML workloads: hardware, training data, model architecture, training duration, and the location of data centers.

Hardware

The type and efficiency of hardware used to run ML models, particularly GPUs, play a significant role in energy consumption. While newer hardware is generally more energy-efficient, offering more computations per watt of power consumed, ML workloads, particularly LLM training, often require thousands of state-of-the-art GPUs running for months, thus negating much of the efficiency gains.

In addition to powering the servers, data centers also need electricity for infrastructure like cooling and lighting. The carbon footprint of hardware also includes the energy consumed during its manufacturing and disposal. However, this embodied footprint is difficult to measure, because the same hardware is often shared by multiple organizations over several years.

Selecting state-of-the-art, energy-efficient hardware is crucial for reducing the carbon footprint of AI operations, especially for LLM applications.

Training data

The size and complexity of the datasets used for training directly drive the energy required to process them and, consequently, the carbon footprint of the training process. LLM training datasets are colossal, and processing data at this scale demands significant energy. Optimizing dataset size and complexity without compromising model quality is therefore a balancing act essential for sustainability.

Model architecture

The architecture of a neural network significantly affects its energy consumption. Larger networks like LLMs generally perform better, but at the cost of higher computational requirements for both training and inference. Simplifying the architecture without significant loss in performance can lead to substantial energy savings. In the case of Llama 2, for instance, the carbon emitted during training was roughly proportional to the size of the network: while the smallest Llama 2 model, with 7B parameters, emitted 31.22 tCO2eq, the largest, with 70B parameters, emitted nearly ten times as much.

Training duration

Training duration, measured in epochs or iterations, directly affects energy usage. More epochs or iterations usually improve model performance but increase energy consumption. The compute spent on training is commonly measured in GPU hours, where one GPU hour corresponds to running a single GPU for one hour; training on 10 GPUs for one hour therefore costs 10 GPU hours. Larger models require more GPU hours: Llama 2 7B required 184,320 GPU hours for training, while Llama 2 70B required 1,720,320 GPU hours, nearly ten times more. Optimizing the number of epochs to reach the desired accuracy without extra cycles is therefore vital for energy efficiency.
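The GPU-hour bookkeeping above can be sketched in a few lines of Python. The helper names are hypothetical, and the energy function is an upper-bound approximation that assumes every GPU draws its full rated power (TDP) for the whole run; the 400 W figure is the A100 TDP used later in this post, not a measured draw.

```python
def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """GPU hours = number of GPUs x wall-clock hours they run."""
    return num_gpus * wall_clock_hours


def energy_kwh(total_gpu_hours: float, gpu_power_watts: float) -> float:
    """Upper-bound GPU energy, assuming each GPU draws its full TDP the whole time."""
    return total_gpu_hours * gpu_power_watts / 1_000  # watt-hours -> kWh


# GPU-hour figures quoted in the text for Llama 2.
LLAMA2_7B_GPU_HOURS = 184_320
LLAMA2_70B_GPU_HOURS = 1_720_320

print(gpu_hours(10, 1))  # 10 GPUs running for 1 hour
ratio = LLAMA2_70B_GPU_HOURS / LLAMA2_7B_GPU_HOURS
print(f"70B / 7B GPU-hour ratio: {ratio:.2f}")
print(f"7B energy at an assumed 400 W per GPU: {energy_kwh(LLAMA2_7B_GPU_HOURS, 400):,.0f} kWh")
```

The ratio of roughly 9.3 is what "nearly ten times more" refers to; note that it closely tracks the 10x difference in parameter count.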

Location of data centers

Finally, the location of the data centers hosting AI workloads significantly influences their carbon footprint, primarily because of the carbon intensity of the electricity powering them. Data centers powered by renewable sources like wind or solar have a much lower carbon footprint than those relying on non-renewable sources like coal or natural gas. Placing data centers in regions with abundant renewable energy is therefore a strategic decision for reducing LLMs' environmental impact.

Estimated carbon emissions for training a BERT model on 8x V100s for 36 hours in different regions. Central US and Australia have some of the highest emissions, while Norway and France have the lowest. Taken from “Measuring the Carbon Intensity of AI in Cloud Instances” by Jesse Dodge et al.

While these factors apply to all ML models, the scale at which LLMs operate makes them a significant contributor to the field's carbon footprint. Let's look at the carbon footprint of a few popular models.

Case Studies: Carbon Footprint and Power Requirements of Popular Foundation Models

GPT-4:

GPT-4's training reportedly used approximately 25,000 Nvidia A100 GPUs [1]. With a Thermal Design Power (TDP) of 400 watts each, A100s form the backbone of the compute used to train LLMs today.

Multiplying the TDP of a single A100 GPU by the number of GPUs gives a combined power draw of 10,000 kW (10 MW) for all 25,000 GPUs. Over GPT-4's reported 100-day training period (2,400 hours), this adds up to a colossal 24,000,000 kWh of energy.

However, these figures are only the tip of the iceberg. Real-world data centers are not perfectly efficient: supporting hardware like CPUs and memory, as well as lighting and cooling infrastructure, draws additional power. To account for this, we add 20% to our initial estimate [2], which brings the total energy consumption to an even more astounding 28,800,000 kWh.

The environmental implications of this energy consumption are significant. GPT-4 was likely trained in Azure West, given OpenAI's partnership with Microsoft, and the carbon emission factor for that region is 0.24 kg CO2eq per kWh [3]. Multiplying 28,800,000 kWh by this factor gives approximately 6,912,000 kilograms (6,912 metric tons) of CO2-equivalent emissions for training GPT-4.
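The whole GPT-4 estimate can be reproduced as a short Python sketch. All inputs are the figures stated in this post (the GPU count and training duration are leaked or estimated values, not numbers confirmed by OpenAI), and the function name is a hypothetical helper. The sketch also makes the unit distinction explicit: the GPUs' combined draw is a power in kW, and only multiplying it by time yields energy in kWh.

```python
# Inputs as stated in the text; GPU count and duration are reported estimates.
NUM_GPUS = 25_000
GPU_TDP_W = 400               # Nvidia A100 TDP in watts
TRAINING_DAYS = 100
OVERHEAD = 0.20               # extra 20% for CPUs, memory, cooling, lighting
INTENSITY_KG_PER_KWH = 0.24   # Azure West emission factor cited in the text


def estimate_training_footprint(num_gpus, tdp_w, days, overhead, intensity):
    """Return (power_kw, gpu_energy_kwh, total_energy_kwh, emissions_tonnes)."""
    power_kw = num_gpus * tdp_w / 1_000                  # combined draw: a rate, in kW
    gpu_energy_kwh = power_kw * days * 24                # energy = power x hours
    total_energy_kwh = gpu_energy_kwh * (1 + overhead)   # add facility/support overhead
    emissions_t = total_energy_kwh * intensity / 1_000   # kg -> metric tons
    return power_kw, gpu_energy_kwh, total_energy_kwh, emissions_t


power_kw, gpu_kwh, total_kwh, tonnes = estimate_training_footprint(
    NUM_GPUS, GPU_TDP_W, TRAINING_DAYS, OVERHEAD, INTENSITY_KG_PER_KWH
)
print(f"GPU power draw:        {power_kw:,.0f} kW")
print(f"GPU energy (100 days): {gpu_kwh:,.0f} kWh")
print(f"With 20% overhead:     {total_kwh:,.0f} kWh")
print(f"Emissions:             {tonnes:,.0f} t CO2eq")
```

Because every step is a multiplication, the result scales linearly with each input: halving the grid intensity or the overhead factor's contribution changes the final tonnage by exactly that proportion.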

To put this into perspective, it is equivalent to driving a gasoline car for nearly 18 million miles, or to powering more than 1,300 homes for one year [4].

Llama 2:

Llama 2's training covered a series of models ranging from 7 billion to 70 billion parameters. In total, 3,311,616 GPU hours were spent training these models. Summing the emissions across all model sizes, Llama 2's total carbon footprint amounts to approximately 539 tCO2eq. It is important to note that this figure covers the training of every model in the series, not just a single final model.

LLaMA Training Carbon Emissions. Taken from “LLaMA: Open and Efficient Foundation Language Models” by Hugo Touvron et al.

LLaMA 1 also included a range of models, but with a considerably lower carbon footprint: around 300 tCO2eq in total for the series. Llama 2's approximately 539 tCO2eq is therefore almost 80% higher. This increase can be attributed to the more extensive and diverse training runs in the Llama 2 series, reflecting the growing complexity and scale of LLM training.
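As a quick arithmetic check on the "almost 80%" figure, the percent increase can be computed from the two series totals quoted above. This is a trivial sketch; the helper name is hypothetical.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100


LLAMA1_TOTAL_T = 300  # approximate LLaMA 1 series total, tCO2eq (from the text)
LLAMA2_TOTAL_T = 539  # approximate Llama 2 series total, tCO2eq (from the text)

print(f"Increase: {percent_increase(LLAMA1_TOTAL_T, LLAMA2_TOTAL_T):.1f}%")
print(f"Combined: {LLAMA1_TOTAL_T + LLAMA2_TOTAL_T} tCO2eq")
```

The combined total of about 839 tCO2eq is the figure behind the household and forest equivalences for both Llama series.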

Llama 2 Training Carbon Emissions. Taken from “Llama 2: Open Foundation and Fine-Tuned Chat Models” by Hugo Touvron et al.

While significantly less than GPT-4, the combined carbon emitted by training both Llama series (roughly 839 tCO2eq) is equivalent to powering 163 homes for a year, and sequestering it would require over 1,000 acres of U.S. forest land for a year [4].

It is crucial to mention that for GPT-4, comprehensive data on total carbon emissions, including experimental phases, is not available. Figures for GPT-4 rely on speculation and estimates based on leaked hardware configurations and training durations. This differs from LLaMA 1 and Llama 2, whose published papers report their environmental impact.

Stable Diffusion v1:

Stable Diffusion v1 also reported its environmental impact [5]. Training used 200,000 GPU hours on Nvidia A100 PCIe 40GB GPUs, run on AWS in the US-East region. The estimated carbon emissions for training and development were about 15 metric tons of CO2 equivalent, comparable to driving nearly 40,000 miles in a gasoline-powered car or powering three homes for a year [4].
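The Stable Diffusion figures can be sanity-checked with the same energy-times-intensity approach. This sketch assumes the estimate was based on the A100 PCIe 40GB's rated 250 W TDP with no extra data center overhead; under that assumption, the reported 15 t implies a grid intensity of about 0.3 kg CO2eq per kWh. Both the assumption and the implied intensity are our reconstruction, not values stated in the text.

```python
A100_PCIE_40GB_TDP_W = 250       # rated TDP of the A100 PCIe 40GB card
GPU_HOURS = 200_000              # from the text
REPORTED_EMISSIONS_KG = 15_000   # "about 15 metric tons" from the text

# Assumption: every GPU draws its full TDP and no facility overhead is added.
energy_kwh = GPU_HOURS * A100_PCIE_40GB_TDP_W / 1_000
implied_intensity = REPORTED_EMISSIONS_KG / energy_kwh

print(f"Estimated energy:         {energy_kwh:,.0f} kWh")
print(f"Implied carbon intensity: {implied_intensity:.2f} kg CO2eq/kWh")
```

Backing out the intensity this way is a useful habit when a report gives hardware hours and total emissions but not the emission factor it used.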

Conclusion

LLMs have opened new technological horizons, offering unprecedented advances in natural language processing, creativity, and problem solving. However, their development and operation come with a substantial carbon footprint, as shown by the extensive resources required to train and maintain these models, which raises serious environmental concerns.

This situation calls for a careful balancing act. On one side, LLMs have the potential to drive innovation, enhance efficiency, and even help solve complex global issues. On the other, there is a pressing need to address their environmental impact. The goal is to harness the benefits of these AI models while actively working to minimize their carbon footprint.

Appendix

[1] https://archive.md/2RQ8X

[2] How Microsoft measures datacenter water and energy use to improve Azure Cloud sustainability

[3] MLCO2 Impact Calculator

[4] EPA Greenhouse Gas Equivalencies Calculator

[5] High-Resolution Image Synthesis With Latent Diffusion Models by Rombach et al.

ScaleDown
