Research Area

Digital Machine Learning Accelerator Design based on Neural Networks within 3D Stacked DRAM

  • Processor-in-memory architecture
    Because machine learning and neuromorphic algorithms have low operation density (ops/byte), memory bandwidth is the bottleneck of the entire system. Placing a computing engine below the DRAM dies and communicating through high-speed TSVs yields high memory bandwidth and low access energy. During the design, area and temperature must be tightly constrained to guarantee correct DRAM operation. A back-of-the-envelope estimate of the bandwidth pressure is sketched after this list.
  • Memory Centric Neural Computing
    In neuromorphic algorithms, the main operation of a single neuron is the same across most neural network types; the only difference is the set of connections feeding each neuron. Because the connections of a neuron are pre-determined by the layer type, the sequence of operands during computation is known a priori. Thanks to this property, the memory can prepare and push data to the computing units without any memory access requests; when data from memory arrive at a computing unit, they trigger its operation (event-driven). Since the memory controls the entire system, we call this operating scheme Memory Centric Neural Computing. A behavioral sketch follows this list.
  • Programmable neuromorphic architecture
    Since different neural network layers are defined by different connections across the layers, programming the data sequence issued from memory can cover various layer types. A dedicated address generator (Programmable Neurosequence Generator, PNG) is attached on the memory side to deliver the data. While previous literature covers accelerators for a fixed neural network connection (mostly 2D convolutional layers), this architecture covers fully connected, 2D convolutional, and recurrent layers. An address-generation sketch follows this list.
  • Presented at The Next Platform
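
To make the bandwidth argument concrete, here is a back-of-the-envelope Python sketch; the layer size, 2-byte weights, and the 1024-MAC, 1 GHz array are hypothetical numbers, not figures from this work.

    # Operation density (ops/byte) of a fully connected layer; all
    # sizes and hardware numbers below are illustrative assumptions.

    def fc_operation_density(n_in, n_out, bytes_per_weight=2):
        """Ops per byte when every weight is fetched once per inference."""
        macs = n_in * n_out                  # one MAC per synaptic weight
        ops = 2 * macs                       # count multiply and add
        weight_bytes = macs * bytes_per_weight
        return ops / weight_bytes

    density = fc_operation_density(n_in=4096, n_out=4096)
    print(f"operation density: {density:.1f} ops/byte")          # 1.0

    # Weight bandwidth needed to keep a modest MAC array busy:
    peak_ops_per_s = 1024 * 2 * 1e9          # 1024 MACs at 1 GHz
    required_bw = peak_ops_per_s / density   # bytes/s of weight traffic
    print(f"required bandwidth: {required_bw / 1e12:.1f} TB/s")  # 2.0

At roughly 1 ops/byte, even a modest MAC array demands terabytes per second of weight traffic, well beyond what conventional off-chip DRAM interfaces provide, which is why the compute is moved under the DRAM dies.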
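
Below is a minimal behavioral sketch of the push-based, event-driven scheme in Python; the class and function names (MacUnit, memory_push) are illustrative and not taken from the actual design.

    # Memory Centric Neural Computing in miniature: the memory side
    # streams pre-sequenced operands, and the compute unit fires on
    # data arrival; the compute side never issues a read request.

    class MacUnit:
        """Compute unit with no address logic: it only reacts to pushes."""
        def __init__(self):
            self.acc = 0.0

        def on_data(self, weight, activation):
            # Data arrival is the event that triggers the operation.
            self.acc += weight * activation

    def memory_push(weights, activations, unit):
        """The memory controls the flow, streaming operands in the
        pre-determined order dictated by the layer's connectivity."""
        for w, x in zip(weights, activations):
            unit.on_data(w, x)

    unit = MacUnit()
    memory_push([0.5, -1.0, 2.0], [1.0, 2.0, 0.5], unit)
    print(unit.acc)   # 0.5 - 2.0 + 1.0 = -0.5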
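
The sketch below illustrates what such an address generator computes, assuming simple loop-nest parameterizations for fully connected and 2D convolutional layers; the real PNG's programming interface is not reproduced here.

    # One generator skeleton per layer type: because the connection
    # pattern is known a priori, the operand address sequence can be
    # produced on the memory side and streamed to the compute units.

    def fc_sequence(n_in, out_idx):
        """Weight/input address pairs for one fully connected neuron."""
        for i in range(n_in):
            yield ("w", out_idx * n_in + i), ("x", i)

    def conv2d_sequence(width, k, out_y, out_x):
        """Weight/input address pairs for one 2D convolution output
        pixel (single channel, stride 1, no padding)."""
        for ky in range(k):
            for kx in range(k):
                yield ("w", ky * k + kx), ("x", (out_y + ky) * width + (out_x + kx))

    print(list(fc_sequence(n_in=4, out_idx=1)))
    print(list(conv2d_sequence(width=8, k=3, out_y=0, out_x=0)))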

Approximate Computing in Digital Machine Learning Accelerator

  • Truncating bit precision of synaptic weights
    Reducing the bit precision of the fixed-point representation of synaptic weights lowers switching activity in the MAC (multiply-accumulate) units, saving dynamic power. A truncation sketch follows this list.
  • Replacing multipliers with inexact multipliers
    An inexact multiplier, designed to tolerate some bit errors in the LSBs, consumes less dynamic power and has a smaller footprint. To save further dynamic power, some of the multipliers are replaced with inexact multipliers. Coupled with bit-precision truncation, the LSB errors from the inexact multipliers are hidden. A behavioral sketch follows this list.
  • Allocating approximation levels to synaptic weights based on gradients
    Gradients, pre-computed during training, represent the error sensitivity of each synaptic weight. After training, weights with smaller gradients are approximated more aggressively, since they have less impact on the output layer of the network. An allocation sketch follows this list.
  • Impact of training conditions on approximation during inference
    For different training conditions (number of iterations, bit precision during training, number of layers in the network), we study how much power the approximation can save under a target accuracy degradation margin.
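
A minimal sketch of the truncation itself, assuming an 8-fractional-bit fixed-point format; the actual bit widths in the study may differ.

    # Zeroing the LSBs of a fixed-point weight: fewer toggling bits
    # means less switching activity in the downstream MAC units.

    def truncate_weight(w, frac_bits=8, truncated_bits=4):
        """Quantize w to fixed point with frac_bits fractional bits,
        then clear the truncated_bits least significant bits."""
        q = int(round(w * (1 << frac_bits)))    # float -> fixed point
        q &= ~((1 << truncated_bits) - 1)       # clear the LSBs
        return q / (1 << frac_bits)             # back to float

    print(truncate_weight(0.7071))   # 0.6875 (4 of 8 LSBs cleared)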
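
The following is a behavioral stand-in for an inexact multiplier that forces the product's low bits to zero, mimicking designs that omit low partial products; the actual circuit used in this work is not specified here.

    # Exact product with the low bits discarded: the error is confined
    # to the LSBs, which is the tolerance inexact multipliers exploit.

    def inexact_multiply(a, b, dropped_bits=4):
        return (a * b) & ~((1 << dropped_bits) - 1)

    a, b = 181, 77                        # 8-bit fixed-point operands
    print(a * b, inexact_multiply(a, b))  # 13937 vs 13936

Note that if the weight operand has already had its LSBs truncated, the exact product ends in zeros anyway, so an error confined to those bits is largely hidden.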
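
One plausible allocation policy is sketched below: weights are split into groups by accumulated |gradient| and the least sensitive group gets the deepest truncation. The group count and truncation levels are illustrative assumptions.

    # Gradient-guided approximation allocation: small |gradient| means
    # low error sensitivity, so those weights tolerate more truncation.
    import numpy as np

    def allocate_truncation(grads, levels=(4, 2, 0)):
        """Return truncated-LSB counts per weight: roughly a third of
        the weights, those with the smallest |gradient|, get levels[0]
        truncated bits, the next third levels[1], and so on."""
        order = np.argsort(np.abs(grads))        # least sensitive first
        trunc = np.empty(len(grads), dtype=int)
        for level, chunk in zip(levels, np.array_split(order, len(levels))):
            trunc[chunk] = level
        return trunc

    g = np.array([0.01, 0.30, 0.02, 0.50])       # per-weight |gradients|
    print(allocate_truncation(g))                # [4 2 4 0]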