US Military Embraces Large Language Models for Military Tasks

US Air Force Colonel Matthew Strohmeyer is buzzing with excitement as he ventures into new territory. Strohmeyer has been conducting data-based exercises within the US Defense Department for years. Now, for the first time, he has tested several large language models (LLMs) on a military task, and the results have been impressive.

Results

Strohmeyer shares the results a few hours after providing the initial prompts to the model.

“It was highly successful. It was very fast. We are learning that this is possible for us to do.”

What Are Large Language Models (LLMs)?

LLMs are AI models trained on extensive internet data to predict text and generate human-like responses. They power generative AI tools like OpenAI’s ChatGPT and Google’s Bard.

The Complete Story

As part of a series of experiments aimed at developing data integration and digital platforms, the US Department of Defense is putting five LLMs through rigorous testing. These exercises, led by the Pentagon’s digital and AI office, involve military top brass and participation from US allies. While the Pentagon remains tight-lipped about the specific LLMs in use, San Francisco-based startup Scale AI has revealed that its new Donovan product is among the platforms being tested.

The potential use of LLMs would mark a major shift for the military, which has traditionally lagged in digitization and connectivity. Currently, a request for information from a specific military division can take hours or even days to fulfill, with numerous staff members making phone calls or preparing slide decks, explains Strohmeyer.

In one test, an AI tool fulfilled a request in just 10 minutes. Strohmeyer cautions that the technology is not yet ready for widespread implementation, but notes that the experiment successfully used secret-level data and that the tool could be deployed in the near future.

What’s More?

Strohmeyer reveals that the models have been fed classified operational information to address sensitive questions. The ultimate goal of these exercises is to enable the US military to use AI-enabled data in decision-making, sensors, and even firepower.

Numerous companies, including Palantir Technologies Inc. and Anduril Industries Inc., are actively developing AI-based decision platforms for the Pentagon. Microsoft Corp. recently announced that users of its Azure Government cloud computing service, whose customers include the Defense Department, would have access to AI models from OpenAI.

The military exercise, which runs until July 26, will also test whether LLMs can generate options that officials have never considered.

Concerns about Large Language Models

Concerns about bias and incorrect information remain, and Strohmeyer emphasizes that the Pentagon is conducting these experiments in part to address them. Officials are seeking a strong understanding of the models’ information sources and are collaborating with tech security companies to evaluate the trustworthiness of AI-enabled systems.

The Demonstration

In a demonstration using 60,000 pages of open-source data, including US and Chinese military documents, Bloomberg News asked Scale AI’s Donovan whether the US could deter a conflict involving Taiwan and who would prevail if war broke out.

Within seconds, the system provided a series of bullet points with explanations. It stated that direct US intervention with ground, air, and naval forces would likely be necessary, but warned that the US would struggle to swiftly incapacitate China’s military. The system concluded by noting the lack of consensus in military circles regarding the outcome of a potential conflict between the US and China over Taiwan.

The US military’s exploration of large language models marks a significant step toward leveraging AI for military decision-making and operational readiness.