Disclosed embodiments relate to a variable format, variable sparsity matrix multiplication (VFVSMM) instruction. In one example, a processor includes fetch and decode circuitry to fetch and decode a VFVSMM instruction specifying locations of A, B, and C matrices having (M * K), (K * N), and (M * N) elements, respectively, and execution circuitry, responsive to the decoded VFVSMM instruction, to: route each row of the specified A matrix, staggering subsequent rows, into corresponding rows of an (M * N) processing array, and route each column of the specified B matrix, staggering subsequent columns, into corresponding columns of the processing array, wherein each processing unit of the processing array is to generate K products of A-matrix elements and matching B-matrix elements having the same row address as a column address of the A-matrix element, and to accumulate each generated product with a corresponding C-matrix element.
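The data flow described above can be illustrated with a small software model. The following Python sketch is an illustration only, not the disclosed circuitry: the function name `vfvsmm_reference` is hypothetical, and the timing model (row m of A delayed by m cycles, column n of B delayed by n cycles, one hop between neighboring processing units per cycle) is an assumption chosen to show why staggering makes matching elements meet. It handles dense inputs only and does not model the variable format or variable sparsity aspects.

```python
def vfvsmm_reference(A, B, C):
    """Software model of the staggered (M * N) processing array (assumed timing).

    A is M x K, B is K x N, C is M x N, all given as lists of lists.
    Returns a new M x N matrix equal to C + A * B.
    """
    M, K = len(A), len(A[0])
    N = len(B[0])
    out = [row[:] for row in C]  # each unit starts from its C-matrix element

    # Assumed timing: A[m][k] enters row m at cycle k + m and moves one
    # column per cycle; B[k][n] enters column n at cycle k + n and moves one
    # row per cycle. Both therefore reach unit (m, n) at cycle k + m + n.
    last_cycle = (K - 1) + (M - 1) + (N - 1)
    for t in range(last_cycle + 1):
        for m in range(M):
            for n in range(N):
                k = t - m - n  # index of the element pair arriving this cycle
                if 0 <= k < K:
                    # The column address of the A element equals the row
                    # address of the B element (both are k), so they match.
                    out[m][n] += A[m][k] * B[k][n]
    return out


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    C = [[1, 0], [0, 1]]
    print(vfvsmm_reference(A, B, C))
```

Under these assumptions, each processing unit (m, n) receives exactly K matching pairs over the course of the computation, one per cycle from cycle m + n through cycle m + n + K - 1, and accumulates their products onto the corresponding C-matrix element.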