The invention provides a high-speed batch processing method and system for electricity consumption data based on distributed offline technology, and relates to data batch processing methods and systems. At present, a high-speed model for mass data storage and calculation is lacking, and existing approaches cannot satisfy all types and forms of mass data storage and calculation. The method comprises the following steps: (1) an
electricity consumption front-end sampler sending the sampled electricity consumption data to a Kafka data queue for buffering in real time; (2) a Storm cluster reading the electricity consumption information from the Kafka queue and storing it in HBase in real time; (3) Spark extracting the electricity consumption information to be processed from HBase and loading it into a Hive data table; and (4) Spark performing offline operations on the relevant Hive data tables to obtain the electricity consumption of the current period and to handle related abnormalities. The method combines the advantages of both the
Storm and Spark frameworks and thereby enhances the overall processing capacity. Based on dynamic migration technology for transaction security nodes, a transaction security protocol for complete task node migration is proposed. While guaranteeing that no packets are lost or repeated during migration, the protocol also improves the execution efficiency of the migration itself and increases system stability.
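
Step (4) can be illustrated with a minimal sketch. It is written in plain Python as a stand-in for the Spark job over Hive tables, and it rests on assumptions that the abstract does not state: that each record carries a cumulative register value, that the consumption of a period is the difference between two consecutive register values, and that a missing or decreasing register counts as an abnormality. The names Reading, period_consumption, meter_id, period and kwh_total are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Reading:
    meter_id: str      # hypothetical field names, not taken from the source
    period: int        # index of the metering period
    kwh_total: float   # assumed cumulative register value at the end of the period


def period_consumption(
    readings: List[Reading], period: int
) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Return (consumption per meter, abnormality per meter) for `period`."""
    # Index readings by (meter, period) so both boundary values can be looked up.
    by_key = {(r.meter_id, r.period): r.kwh_total for r in readings}
    meters = sorted({r.meter_id for r in readings})

    consumption: Dict[str, float] = {}
    abnormal: Dict[str, str] = {}
    for meter in meters:
        start = by_key.get((meter, period - 1))
        end = by_key.get((meter, period))
        if start is None or end is None:
            # Assumed rule: a missing boundary reading is reported, not guessed.
            abnormal[meter] = "missing reading"
        elif end < start:
            # Assumed rule: a cumulative register should never decrease.
            abnormal[meter] = "register decreased"
        else:
            consumption[meter] = end - start
    return consumption, abnormal
```

Calling period_consumption(readings, period=1) returns two dictionaries: the consumption of each meter whose readings are consistent, and a reason for each meter that was set aside. In an actual deployment the same grouping and difference would presumably be expressed as a Spark operation over the Hive table rather than as a Python loop.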
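
The guarantee of no loss and no repetition during task node migration can be sketched in a generic way. The abstract does not disclose the protocol itself, so the following is not the claimed protocol but a common technique shown under stated assumptions: every message carries an increasing sequence number, messages reach a node in order (replays of earlier messages are allowed), and the migrating node hands over both its progress watermark and its unprocessed backlog. TaskNode, migrate, last_done and pending are hypothetical names.

```python
from typing import Callable, List, Tuple

Message = Tuple[int, str]  # (sequence number, payload); this layout is an assumption


class TaskNode:
    """Hypothetical task node that processes each sequence number at most once."""

    def __init__(self, handler: Callable[[str], None], last_done: int = -1):
        self.handler = handler
        self.last_done = last_done        # highest sequence number already processed
        self.pending: List[Message] = []  # received but not yet processed

    def receive(self, msg: Message) -> None:
        # Anything at or below the watermark was already processed: drop it.
        if msg[0] > self.last_done:
            self.pending.append(msg)

    def process_one(self) -> bool:
        """Process the oldest pending message; return False when idle."""
        if not self.pending:
            return False
        self.pending.sort()
        seq, payload = self.pending.pop(0)
        if seq > self.last_done:          # skip duplicates that were queued twice
            self.handler(payload)
            self.last_done = seq
        return True

    def process_all(self) -> None:
        while self.process_one():
            pass


def migrate(old: TaskNode, handler: Callable[[str], None]) -> TaskNode:
    """Move the task to a new node, carrying the watermark and the backlog."""
    new = TaskNode(handler, last_done=old.last_done)
    for msg in old.pending:               # backlog travels with the task: no loss
        new.receive(msg)
    old.pending = []                      # the old node stops owning the work
    return new
```

Because the watermark travels with the task, an upstream replay after migration is harmless: messages already handled by the old node are dropped by the new one, and messages that were only queued are handled exactly once.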