A hardware architecture for an accelerated artificial intelligence processor includes: a main engine, a frontal lobe engine, a parietal lobe engine, a renderer engine, an occipital lobe engine, a temporal lobe engine, and a memory. The frontal lobe engine obtains a 5D tensor from the host, divides it into several sets of tensors, and sends these sets of tensors to the parietal lobe engine. The parietal lobe engine acquires a set of tensors, divides it into a plurality of tensor waves, sends the tensor waves to the renderer engine to execute an input feature renderer, and outputs partial tensors to the occipital lobe engine. The occipital lobe engine accumulates the partial tensors and executes an output feature renderer to obtain a final tensor, which is sent to the temporal lobe engine. The temporal lobe engine compresses the data and writes the final tensor to memory.
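The dataflow through the engines can be modeled in software as a minimal sketch; the function names, the flat-list stand-in for a 5D tensor, the doubling kernel, and the summing accumulator are all illustrative assumptions, not the patented implementation.

```python
# Illustrative software model of the described engine pipeline.
# All names and operations below are assumptions for this sketch.

def frontal_lobe_engine(tensor_5d, num_sets):
    """Divide the host's 5D tensor (modeled as a flat list) into tensor sets."""
    step = (len(tensor_5d) + num_sets - 1) // num_sets
    return [tensor_5d[i:i + step] for i in range(0, len(tensor_5d), step)]

def parietal_lobe_engine(tensor_set, wave_size):
    """Divide one tensor set into tensor waves for the renderer engine."""
    return [tensor_set[i:i + wave_size] for i in range(0, len(tensor_set), wave_size)]

def renderer_engine(wave):
    """Execute the input feature renderer on one wave (stand-in operation)."""
    return [x * 2 for x in wave]  # placeholder for the real kernel

def occipital_lobe_engine(partial_tensors):
    """Accumulate partial tensors and run the output feature renderer."""
    return sum(sum(p) for p in partial_tensors)  # stand-in for the final tensor

def temporal_lobe_engine(final_tensor, memory):
    """Compress (stubbed out here) and write the final tensor to memory."""
    memory.append(final_tensor)

memory = []
tensor_5d = list(range(32))  # toy stand-in for a 5D tensor
for tensor_set in frontal_lobe_engine(tensor_5d, num_sets=4):
    waves = parietal_lobe_engine(tensor_set, wave_size=2)
    partials = [renderer_engine(w) for w in waves]
    temporal_lobe_engine(occipital_lobe_engine(partials), memory)
```

Each iteration of the loop models one tensor set flowing through the parietal, renderer, occipital, and temporal stages, so `memory` ends up holding one final tensor per set.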
The artificial intelligence workload in the invention is divided into a plurality of highly parallel parts, and each part is allocated to an engine for processing. The number of engines is configurable, which improves scalability, and all work partitioning and distribution are realized within the architecture, thereby achieving high performance efficiency.
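The configurable engine count can be sketched as a simple round-robin distribution of the parallel parts; this helper and its names are assumptions for illustration only, not the architecture's actual scheduling logic.

```python
# Illustrative sketch: distribute highly parallel parts across a
# configurable number of engines (round-robin assignment, assumed policy).
def distribute(parts, num_engines):
    assignment = {engine: [] for engine in range(num_engines)}
    for i, part in enumerate(parts):
        assignment[i % num_engines].append(part)
    return assignment

# Six parts spread over three engines: each engine receives two parts.
print(distribute(list(range(6)), num_engines=3))
```

Because the partitioning is computed from `num_engines`, the same workload redistributes automatically when engines are added or removed, which is the scalability property the abstract describes.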