Embedded Neural Networks

Analyzing Data Locally Within Devices

With the increasing performance and power efficiency of embedded processors, it is now possible to embed sophisticated machine learning algorithms in deeply embedded systems, bringing computation and machine learning closer to where sensors gather data.
 
Reasons to move data analysis into the device (local processing) include:

  • Transmission bandwidth and link budget improvement

  • Power consumption balancing

  • Data privacy and localization

  • Real-time applications

Embedding the machine intelligence, or a portion of it, in the sensor reduces the amount of information that must be communicated over the wireless network, permitting the use of low-data-rate, long-range wide area networks. This reduction also allows sensors to be embedded in locations that compromise the link budget, such as near the ground, in basements, in tunnels, or inside buried or enclosed cabinetry.
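
As a rough illustration of the scale of this reduction, compare streaming raw audio with transmitting only classification results. All figures below are illustrative assumptions, not measurements:

```python
# Hypothetical payload comparison: raw audio stream vs. classification reports.
SAMPLE_RATE_HZ = 16_000      # assumed microphone sample rate
BITS_PER_SAMPLE = 16
raw_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE        # 256,000 bit/s of raw audio

CLASS_PAYLOAD_BITS = 16      # assumed class ID plus confidence byte
REPORTS_PER_MINUTE = 1       # one classification per minute
classified_bps = CLASS_PAYLOAD_BITS * REPORTS_PER_MINUTE / 60.0

print(f"raw stream:      {raw_bps} bit/s")
print(f"classifications: {classified_bps:.3f} bit/s")
print(f"reduction:       {raw_bps / classified_bps:,.0f}x")
```

A payload of a fraction of a bit per second fits comfortably within the data rates and duty-cycle limits of long-range wide area networks; a raw audio stream does not.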

Furthermore, keeping data local increases security: access to potentially sensitive data remains confined to the device, and only information that will not compromise local security is transmitted. The user can maintain full control over sensitive data, such as sound or image data, while sending only anonymized data or classifications to the cloud, where man-in-the-middle attacks might otherwise view or compromise the raw data.
 
The reduction in network traffic also reduces latency and potential network costs. With sophisticated intelligence implemented locally, the device remains resilient against network slowdowns or connection failures.
 
Power and energy requirements, and the power sources available to meet them, are the primary considerations when moving intelligence to the sensors.
 
The energy requirements of physical sensors such as microphones, cameras, LIDAR, and inertial measurement units continue to drop, leaving more of the energy budget available for computation. Many physical sensors now include sophisticated power management modes that lower overall energy requirements. In some instances the machine learning implementation itself can save power, for example by building longer sleep states into the architecture and lowering the data acquisition rate. The balance of architecture versus energy is an important consideration in local or distributed machine learning.
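
A minimal sketch of that balance, using assumed rather than measured power figures, shows how heavily the sleep state dominates the energy budget:

```python
# Duty-cycle power budget; all figures are assumptions for illustration only.
ACTIVE_MW = 30.0    # core plus sensor while acquiring and running inference
SLEEP_MW  = 0.05    # deep-sleep power
WINDOW_S  = 10.0    # one acquisition/inference cycle every 10 s
ACTIVE_S  = 0.2     # time awake per cycle

duty = ACTIVE_S / WINDOW_S
avg_mw = ACTIVE_MW * duty + SLEEP_MW * (1.0 - duty)
print(f"duty cycle: {duty:.1%}, average power: {avg_mw:.3f} mW")

battery_mwh = 3000.0   # e.g. a 1000 mAh cell at 3.0 V
print(f"estimated lifetime: {battery_mwh / avg_mw / 24:.0f} days")
```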
 
Most feature-detection neural networks use a convolutional layer followed by a number of pooling and dense layers, producing a classification of the signal. The input to the convolutional layer may be a one-dimensional time series, or the time series converted into another multidimensional representation, such as the power spectral density. The latter requires a power spectral or time-frequency transform before the data enters the network; many embedded libraries already include optimized versions of these operations.
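
A minimal sketch of such a pipeline in Keras follows; the frame sizes, layer widths, and four-class output are illustrative assumptions, not prescribed values:

```python
import numpy as np
import tensorflow as tf

FRAME_LEN, FRAME_STEP = 256, 128                       # STFT framing (assumed)
NUM_FRAMES = 1 + (16_000 - FRAME_LEN) // FRAME_STEP    # one second at 16 kHz
NUM_BINS = FRAME_LEN // 2 + 1
NUM_CLASSES = 4                                        # illustrative class count

def to_log_spectrogram(waveform: np.ndarray) -> np.ndarray:
    """Time-frequency front end: 1-D time series -> (frames, bins) log power."""
    stft = tf.signal.stft(waveform.astype("float32"),
                          frame_length=FRAME_LEN, frame_step=FRAME_STEP)
    return tf.math.log(tf.abs(stft) ** 2 + 1e-6).numpy()

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_FRAMES, NUM_BINS)),
    tf.keras.layers.Conv1D(16, 3, activation="relu"),  # convolutional feature detector
    tf.keras.layers.MaxPooling1D(2),                   # pooling
    tf.keras.layers.Conv1D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # dense classifier
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Example: classify one second of (random placeholder) audio.
probs = model(to_log_spectrogram(np.random.randn(16_000))[np.newaxis, ...])
```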
 
Frameworks already exist for analyzing, architecting, developing, training, and testing machine learning algorithms, such as TensorFlow, Keras, and Caffe. Embedded frameworks such as CMSIS-NN from ARM and the libraries from Fraunhofer IIS allow machine learning architectures to be implemented on standard ARM cores using tested and optimized libraries. Many embedded processors now include multiple cores, among them GPU-style vector and matrix units, further improving computational performance versus energy consumed.
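
One common deployment path, sketched here assuming the Keras `model` from the previous example, is to convert the trained network to a TensorFlow Lite flatbuffer and compile it into the firmware:

```python
import tensorflow as tf

# Convert the trained Keras model to a TensorFlow Lite flatbuffer.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("classifier.tflite", "wb") as f:
    f.write(tflite_model)

# On the target, the flatbuffer is typically embedded as a C array
# (e.g. via `xxd -i classifier.tflite`) and run by an on-device interpreter.
```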
 
Numerical representation must be taken into account when implementing machine learning on embedded cores. While floating-point operations are available on many current cores, fixed-point or fixed-fractional operations remain the most power-efficient at this time. Continued reductions in the energy cost of FPU operations will allow power-efficient floating-point machine learning in the future.
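
The Q15 fixed-fractional format used by libraries such as CMSIS-NN illustrates why: a 16-bit integer multiply followed by a shift stands in for a floating-point operation. The toy sketch below demonstrates the arithmetic:

```python
# Toy Q15 fixed-fractional arithmetic: values in [-1, 1) stored in 16 bits.
Q = 15

def to_q15(x: float) -> int:
    """Quantize a float in [-1, 1) to a saturated 16-bit Q15 integer."""
    return max(-32768, min(32767, round(x * (1 << Q))))

def q15_mul(a: int, b: int) -> int:
    """16x16 -> 32-bit multiply, shifted back down to Q15."""
    return (a * b) >> Q

a, b = to_q15(0.5), to_q15(-0.25)
print(q15_mul(a, b) / (1 << Q))   # approximately -0.125
```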
 

The ability to implement both traditional digital signal processing and machine learning operations within the sensor is going to transform the way devices interact with the world.