The invention discloses a method for online implementation of a dialog strategy based on multi-task learning. According to the method, corpus information of a man-machine dialog is acquired in real time,
the current user state features and user action features are extracted, and the training input is constructed from them; then the single accumulated reward value used in the dialog strategy learning process is split into a dialog round number reward value and a dialog success reward value, which serve as training annotations, and two different value models are optimized simultaneously through multi-task learning during online training; and finally the two reward values are merged and the dialog strategy is updated. The method adopts a reinforcement learning framework and optimizes the dialog strategy through online learning, so rules and strategies need not be designed manually for each domain, and the method can adapt to domain information structures of different degrees of complexity and to data of different scales. Because the original task of optimizing a single accumulated reward value is split into two tasks that are optimized simultaneously through multi-task learning, a better network structure is learned and the variance in the training process is lowered.
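
The reward split and merge described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the per-round penalty, the success bonus, the discount factor, the merge by summation, and all function names are assumptions introduced here for illustration, and the shared network trained by multi-task learning is left out. The sketch only shows how one accumulated reward could be separated into a round-number target and a success target, and how two value estimates could be recombined to choose an action.

```python
from typing import List, Tuple

# Assumed reward constants; the source does not give numeric values.
TURN_PENALTY = -1.0    # hypothetical reward for each dialog round
SUCCESS_BONUS = 20.0   # hypothetical reward granted once if the dialog succeeds


def split_rewards(num_turns: int, success: bool) -> Tuple[List[float], List[float]]:
    """Return per-round reward sequences for the round-number and success tasks."""
    turn_rewards = [TURN_PENALTY] * num_turns
    success_rewards = [0.0] * num_turns
    if success and num_turns > 0:
        success_rewards[-1] = SUCCESS_BONUS  # paid only on the final round
    return turn_rewards, success_rewards


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Compute the discounted return from each round to the end of the dialog."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def merge_and_select(q_turn: List[float], q_success: List[float]) -> int:
    """Merge two value estimates by summation (an assumption) and pick the best action."""
    merged = [a + b for a, b in zip(q_turn, q_success)]
    return max(range(len(merged)), key=merged.__getitem__)


if __name__ == "__main__":
    turn_r, succ_r = split_rewards(num_turns=4, success=True)
    # Each sequence of returns would serve as the training annotation for one value model.
    print(discounted_returns(turn_r))
    print(discounted_returns(succ_r))
    # Hypothetical value estimates for three candidate actions.
    print(merge_and_select([-3.0, -2.0, -5.0], [10.0, 4.0, 15.0]))
```

Because the discounted return is linear in the rewards, the two targets in this sketch add up to the return of the original single reward, so merging by summation recovers the original objective while each value model is trained on a simpler signal.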