Harnessing the Power of Small Language Models in Industrial IoT

Posted on Tuesday, May 14, 2024

The Industrial Internet of Things (IIoT) is revolutionizing how industries operate, bringing connectivity and data-driven insights to every corner of the manufacturing process. As IIoT devices proliferate, the need for robust, efficient, and intelligent data processing becomes increasingly critical. This is where small language models come into play, particularly in the context of intelligent edge scenarios where offline capabilities are paramount.

The Edge of Innovation: Intelligent Edge Scenarios

In the realm of IIoT, the “edge” refers to the physical location where data is generated and processed, close to the source. This could be a manufacturing plant, an oil rig, or any other industrial site. Edge computing reduces latency, enhances data privacy, and ensures real-time processing capabilities by minimizing the reliance on cloud-based data centers.

However, deploying sophisticated AI models at the edge presents unique challenges. These environments often have limited computational resources and require models that can operate efficiently offline. This is where small language models come into their own, providing powerful capabilities without the heavy resource demands of their larger counterparts.

The Role of Small Language Models

Small language models, as their name suggests, are designed to be compact yet capable. They can perform a wide range of natural language processing (NLP) tasks, including text classification, translation, and sentiment analysis, without the need for extensive computational power. This makes them ideal for deployment in edge scenarios.

Offline Capabilities

One of the most significant advantages of small language models is their ability to function offline. In industrial settings, connectivity can be intermittent or non-existent. Small language models can process data locally, ensuring that critical operations continue uninterrupted even when network connectivity is lost. This capability is crucial for maintaining the reliability and efficiency of industrial processes.

Efficiency and Performance

Despite their smaller size, these models are optimized for performance. They can process large volumes of data quickly and efficiently, providing real-time insights and enabling rapid decision-making. This efficiency is particularly valuable in industrial environments where delays can result in significant downtime and financial loss.

The ONNX Runtime: A Game Changer

ONNX Runtime is a key enabler for deploying small language models at the edge. ONNX (the Open Neural Network Exchange) is an open-source format for AI models that provides interoperability between different frameworks and hardware platforms, and ONNX Runtime, the inference engine for that format, is designed to be highly performant and scalable, making it an excellent choice for edge deployments.

Benefits of the ONNX Runtime

  1. Interoperability: ONNX models can be trained in one framework (e.g., PyTorch), exported to the ONNX format, and then deployed with ONNX Runtime, providing flexibility and choice (a minimal sketch follows this list).
  2. Performance Optimization: ONNX Runtime is optimized for various hardware platforms, ensuring that models run efficiently on edge devices with limited resources.
  3. Scalability: The runtime supports a wide range of devices, from small sensors to powerful edge servers, allowing for scalable deployment across different industrial environments.
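
To make the interoperability point concrete, here is a minimal sketch of running an exported ONNX model on an edge device with the Microsoft.ML.OnnxRuntime package. The model file name, input name, and tensor shape are placeholders and depend entirely on the model you actually export.

```csharp
// Minimal sketch: running an exported ONNX model locally with Microsoft.ML.OnnxRuntime.
// "anomaly_detector.onnx" and the input name/shape are placeholders for your own model.
using System;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

class OnnxEdgeInference
{
    static void Main()
    {
        // Load a model that was trained in another framework (e.g. PyTorch)
        // and exported to the ONNX format.
        using var session = new InferenceSession("anomaly_detector.onnx");

        // Example sensor readings; the input name and shape must match the exported model.
        var input = new DenseTensor<float>(new float[] { 0.93f, 0.87f, 1.02f, 0.95f }, new[] { 1, 4 });
        var inputs = new[] { NamedOnnxValue.CreateFromTensor("input", input) };

        // Run inference locally - no cloud connectivity required.
        using var results = session.Run(inputs);
        foreach (var result in results)
            Console.WriteLine($"{result.Name}: {string.Join(", ", result.AsEnumerable<float>())}");
    }
}
```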

Real-World Application

Consider a manufacturing plant where equipment generates vast amounts of data. Using a small language model (SLM) such as Microsoft's Phi-3 deployed via ONNX Runtime, this data can be processed locally to detect anomalies, predict maintenance needs, and optimize operations. Even if the plant loses internet connectivity, the Phi-3 model continues to analyze data and provide actionable insights, ensuring smooth and efficient operation.
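
Purely as an illustration of the pattern, a local maintenance check can be as simple as formatting the latest sensor readings into a prompt and handing it to the on-device model. The readings, the machine description, and the RunLocalModel helper below are hypothetical; the helper stands in for the actual ONNX Runtime GenAI calls shown in the next section.

```csharp
// Illustrative sketch only: turning local sensor readings into a prompt for an on-device SLM.
// RunLocalModel is a hypothetical placeholder for the ONNX Runtime GenAI calls shown below.
using System;

class MaintenanceAdvisor
{
    static string BuildPrompt(double vibrationMm, double bearingTempC, double motorCurrentA) =>
        "You are a maintenance assistant for a CNC milling machine.\n" +
        $"Latest readings: vibration {vibrationMm:F2} mm/s, bearing temperature {bearingTempC:F1} °C, motor current {motorCurrentA:F1} A.\n" +
        "Flag any anomaly and suggest a maintenance action in one short sentence.";

    static void Main()
    {
        string prompt = BuildPrompt(vibrationMm: 7.8, bearingTempC: 92.5, motorCurrentA: 14.2);
        string advice = RunLocalModel(prompt);   // hypothetical local Phi-3 call, works offline
        Console.WriteLine(advice);
    }

    // Placeholder: in practice this would call the Phi-3 model through
    // Microsoft.ML.OnnxRuntimeGenAI (see "How to Start" below).
    static string RunLocalModel(string prompt) => "<model output>";
}
```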

How to Start

Microsoft provides the Microsoft.ML.OnnxRuntimeGenAI NuGet package, which makes using ONNX models very easy. The current version, 0.2.0-rc6, can load the Microsoft Phi-3 model and generates really impressive results. To run the model on the CPU only, we just add the Microsoft.ML.OnnxRuntimeGenAI.Managed NuGet package; alternatively, we can use Microsoft.ML.OnnxRuntimeGenAI.Cuda or Microsoft.ML.OnnxRuntimeGenAI.DirectML, depending on the hardware capabilities of the target device. If more fine-grained control is needed, the Microsoft.ML.OnnxRuntime and Microsoft.ML.OnnxRuntime.Extensions NuGet packages come into play.

In the ONNX-HuggingFace-Wrapper repository I used these packages to demonstrate the use of local models and wrapped them in a Hugging Face compatible API for text generation, chat completion, and embeddings generation. With the Hugging Face abstraction it is easy to write your own application with Semantic Kernel, which makes it straightforward to build complete AI-powered applications (e.g., implementing a Retrieval-Augmented Generation architecture).
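
As a starting point, here is a minimal sketch of loading a Phi-3 ONNX model and generating text with Microsoft.ML.OnnxRuntimeGenAI. The calls reflect the 0.2.x preview API as I understand it and may differ in newer releases; the model folder path and the prompt are placeholders.

```csharp
// Minimal sketch (0.2.x preview API of Microsoft.ML.OnnxRuntimeGenAI; newer versions may differ).
// The model folder path is a placeholder for a downloaded Phi-3 ONNX model.
using System;
using Microsoft.ML.OnnxRuntimeGenAI;

class Phi3Demo
{
    static void Main()
    {
        // Folder containing the Phi-3 ONNX model files (e.g. the CPU int4 variant).
        using var model = new Model(@"models\phi-3-mini-4k-instruct-onnx");
        using var tokenizer = new Tokenizer(model);

        var prompt = "<|user|>\nSummarize why edge devices benefit from small language models.<|end|>\n<|assistant|>";
        var sequences = tokenizer.Encode(prompt);

        using var generatorParams = new GeneratorParams(model);
        generatorParams.SetSearchOption("max_length", 256);
        generatorParams.SetInputSequences(sequences);

        // Generate token by token and stream the decoded answer to the console.
        using var generator = new Generator(model, generatorParams);
        using var tokenizerStream = tokenizer.CreateStream();
        while (!generator.IsDone())
        {
            generator.ComputeLogits();
            generator.GenerateNextToken();
            Console.Write(tokenizerStream.Decode(generator.GetSequence(0)[^1]));
        }
        Console.WriteLine();
    }
}
```

The same generation loop should work largely unchanged whether the Managed (CPU), Cuda, or DirectML execution package is referenced, which is what makes this approach practical across heterogeneous edge hardware.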


Small language models are transforming the landscape of industrial IoT by enabling intelligent edge solutions. With their offline capabilities and efficiency, these models are perfectly suited for the unique demands of industrial environments. Coupled with the ONNX runtime, they provide a powerful and flexible solution for bringing AI to the edge, ensuring that industries can harness the full potential of their data, regardless of connectivity constraints.

As IIoT continues to evolve, the integration of small language models at the edge will play a pivotal role in driving innovation, enhancing operational efficiency, and delivering real-time insights across a wide range of industrial applications.