A memory cell array is physically divided into an even number of sectors, with each pair of sectors sharing read circuitry. The outputs of the shared read circuitry are commonly connected to form data lines spanning the height of the array, which are input to global sense amplifiers. A two-stage sensing scheme is employed, with first stage and global sense amplifiers. The driving capability of the first stage sense amplifier can be used to decrease the time to charge or discharge the data lines, which reduces the total signal development time and consequently improves read performance. Granularity of the array can be adjusted by dividing groups and sub-groups of memory cells within a sector accordingly. In a read operation, the bit line in the opposite sector at the same column location is used as reference bit line, which greatly improves matching of bit line loading for the sensing.