A data processing system having a central processing unit (CPU) with address generation circuitry for accessing a circular buffer region in a non-aligned manner is provided. The CPU has an instruction set architecture that is optimized for intensive numeric algorithm processing. The CPU has dual load/store units connected to dual memory ports of a memory controller. By executing two load/store instructions, the CPU can perform two aligned data transfers in parallel, each having a length of one, two, four, or eight bytes. The CPU can also perform a single non-aligned data transfer having a length of four or eight bytes by executing a non-aligned load/store instruction that utilizes both memory ports. A data transfer address for each load/store instruction is formed by fetching the instruction (600) and decoding the instruction (610) to determine the instruction type, transfer data size, addressing mode, and scaling selection. For a non-aligned instruction, an offset provided by the instruction is selectively scaled (620) and combined with a base address value, and the resultant address is then augmented (640) by a line size associated with the instruction. For circular addressing mode, both the resultant address and the augmented address are bounded (650, 651) to stay within the circular buffer region. Two aligned data items are then accessed in parallel (652, 653), and a non-aligned data item is extracted (654) from the two aligned data items, such that the non-aligned data item wraps around the boundary of the circular buffer region.
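The non-aligned circular address path described above can be illustrated with a minimal C sketch. Everything concrete in it is an assumption made for illustration rather than a detail from the text: the hypothetical names `load_nonaligned_circ`, `circ_buf`, `bound`, and `read_aligned`; a simulated little-endian byte memory; combining the offset with the base by addition; treating "scaling" as multiplication by the transfer size; and requiring the buffer size to be a power of two, at least one line long, with the buffer start aligned on its own size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simulated memory size, for illustration only. */
#define MEM_SIZE 64u

/* Simulated byte-addressable memory, assembled little-endian. */
static uint8_t mem[MEM_SIZE];

/* Hypothetical circular buffer descriptor. Assumption: size is a power of
 * two, at least the line size, and start is aligned to size. */
typedef struct {
    uint32_t start;
    uint32_t size;
} circ_buf;

/* Steps 650/651: fold an address back into the circular buffer region. */
static uint32_t bound(uint32_t addr, circ_buf cb) {
    return cb.start + ((addr - cb.start) & (cb.size - 1u));
}

/* One port's aligned access (steps 652/653): read `line` bytes from the
 * line-aligned address containing `addr`. */
static uint64_t read_aligned(uint32_t addr, uint32_t line) {
    uint32_t a = addr & ~(line - 1u);   /* align down to a line boundary */
    uint64_t v = 0;
    for (uint32_t i = 0; i < line; i++)
        v |= (uint64_t)mem[a + i] << (8u * i);
    return v;
}

/* Hypothetical model of a non-aligned circular load of `line` (4 or 8) bytes. */
static uint64_t load_nonaligned_circ(uint32_t base, int32_t offset, int scale,
                                     uint32_t line, circ_buf cb) {
    assert(line == 4u || line == 8u);
    assert(cb.size >= line && (cb.size & (cb.size - 1u)) == 0u);
    assert((cb.start & (cb.size - 1u)) == 0u);

    /* Step 620 (assumption): scaling multiplies the offset by the transfer size. */
    int32_t off = scale ? offset * (int32_t)line : offset;
    /* Combine the selectively scaled offset with the base address. */
    uint32_t addr = base + (uint32_t)off;
    /* Step 640: augment by the line size to reach the following line. */
    uint32_t next = addr + line;

    /* Steps 650/651: bound both addresses to the circular buffer region. */
    addr = bound(addr, cb);
    next = bound(next, cb);

    /* Steps 652/653: two aligned accesses (issued in parallel in hardware). */
    uint64_t lo = read_aligned(addr, line);
    uint64_t hi = read_aligned(next, line);

    /* Step 654: extract `line` bytes starting at addr's byte offset in its line. */
    uint32_t shift = 8u * (addr & (line - 1u));
    if (shift == 0u)
        return lo;                      /* already aligned: lo holds the item */
    if (line == 8u)
        return (lo >> shift) | (hi << (64u - shift));
    return ((lo | (hi << 32)) >> shift) & 0xFFFFFFFFu;
}
```

For example, with `mem[i] = i`, a 16-byte buffer starting at address 16, a base of 16, and an unscaled offset of 12, an eight-byte load reads bytes 28 to 31 and then wraps to bytes 16 to 19, returning `0x131211101F1E1D1C`. Bounding both the resultant and the augmented address, rather than only the first, is what lets the second aligned access land at the start of the buffer instead of past its end.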