
Method for constructing deep visual Q&A system for visually impaired persons

Keywords: visual impairment, deep visual question answering; applied to the construction of deep visual Q&A systems

Active Publication Date: 2017-07-14
ZHEJIANG UNIV
Cites: 6; Cited by: 18

AI Technical Summary

Problems solved by technology

An existing product, Third Eye, is mainly able to photograph objects, identify them, and report the result by voice output. It skips the stage of interacting with the user entirely and is therefore very limited.



Examples


Embodiment approach

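[0119] 1. The application Deep Ask was developed on top of the above algorithm for Android 5.0 and above. (The original names this release "Jelly Bean"; Android 5.0 is in fact Lollipop, while Jelly Bean covers versions 4.1 to 4.3.)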

[0120] 2. Taking into account the operational limitations of visually impaired users, we designed a simple and practical interaction method. The details are as follows:

[0121] a. Since blind users cannot accurately locate individual parts of the phone screen for precise operations, we treat the entire screen as the touch-responsive area.

[0122] b. When a blind user taps anywhere on the screen, the application's shooting routine starts. This operation uses the system's underlying camera module to call the phone's camera directly, takes a picture, and stores it in the phone's file system cache as an ordinary JPEG file. At the right time, the image file will be transferred to the server through the RESTful API so that it can ...
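The upload step in [0122] can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it is written in Python instead of Android code, and the endpoint URL, the form field names `question` and `image`, and the helper names `build_multipart` and `make_request` are all assumptions. It only builds the HTTP request for a cached JPEG and a question text; sending it is left as a comment.

```python
import urllib.request
import uuid


def build_multipart(question, jpeg_bytes, filename="photo.jpg"):
    """Encode a question string and a JPEG as a multipart/form-data body.

    The field names "question" and "image" are hypothetical.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="question"\r\n\r\n'
        f"{question}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
        "Content-Type: image/jpeg\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    content_type = f"multipart/form-data; boundary={boundary}"
    return head + jpeg_bytes + tail, content_type


def make_request(url, question, jpeg_bytes):
    """Build (but do not send) a POST request carrying the photo and question."""
    body, content_type = build_multipart(question, jpeg_bytes)
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


# Placeholder bytes stand in for the JPEG read from the phone's cache.
fake_jpeg = b"\xff\xd8\xff\xe0fake-jpeg\xff\xd9"
# Hypothetical endpoint; the patent does not name one.
request = make_request("http://example.invalid/vqa", "What is in front of me?", fake_jpeg)
# urllib.request.urlopen(request) would transmit it to a real server.
```

Building the request separately from sending it keeps the encoding logic testable without a network connection.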


Abstract

The present invention discloses a method for constructing a deep visual Q&A system for visually impaired persons. In the training phase, the collected pictures and their corresponding Q&A texts constitute a training set. Picture features are extracted with a convolutional neural network. Each question text is converted into a list of word vectors using word-vector techniques, and that list is fed to an LSTM to extract question features. The picture features and question features are then combined by an element-wise product, and the fused features are classified to obtain an answer prediction value. The prediction is compared with the answer label in the training set, the loss is calculated, and the model is optimized with the back-propagation algorithm.

In the running phase, a client obtains the photo taken by the user together with the question text and uploads both to a server. The server feeds the uploaded photo and question text into the trained model, extracts features in the same manner, outputs the corresponding answer prediction value through the classifier, and returns it to the client. The client then delivers the answer prediction value to the user as voice output.
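The fusion and classification steps of the training phase can be sketched in a few lines. This is a toy illustration under stated assumptions, not the patented model: the image and question feature vectors are hard-coded stand-ins for the CNN and LSTM outputs, the classifier is a single linear layer with softmax, the loss is cross-entropy, and the function names `fuse`, `predict`, and `train_step` are hypothetical. Only the classifier weights are updated here, whereas the abstract describes optimizing the whole model.

```python
import math


def fuse(image_feat, question_feat):
    """Element-wise product of two equal-length feature vectors."""
    if len(image_feat) != len(question_feat):
        raise ValueError("feature vectors must have the same length")
    return [a * b for a, b in zip(image_feat, question_feat)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, bias, fused):
    """Linear classifier followed by softmax: one probability per answer class."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


def train_step(weights, bias, fused, label, lr=0.5):
    """One gradient-descent step on cross-entropy loss; returns the loss before the update."""
    probs = predict(weights, bias, fused)
    loss = -math.log(probs[label])
    for k, row in enumerate(weights):
        # Gradient of cross-entropy with respect to logit k is p_k - y_k.
        grad_logit = probs[k] - (1.0 if k == label else 0.0)
        for j in range(len(row)):
            row[j] -= lr * grad_logit * fused[j]
        bias[k] -= lr * grad_logit
    return loss


# Stand-ins for CNN picture features and LSTM question features (assumed values).
image_feat = [0.5, 1.0, -0.5, 2.0]
question_feat = [1.0, 0.5, 2.0, 0.25]
fused = fuse(image_feat, question_feat)

# Three candidate answers; the training label says answer 2 is correct.
weights = [[0.0] * 4 for _ in range(3)]
bias = [0.0] * 3
losses = [train_step(weights, bias, fused, label=2) for _ in range(20)]
probs = predict(weights, bias, fused)
```

At run time only `fuse` and `predict` would be needed: the server would compute the features for the uploaded photo and question, and return the highest-probability answer to the client.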

Description

Technical field

[0001] The present invention relates to visual question answering (VQA), an interdisciplinary field spanning natural language processing (NLP) and computer vision (CV), and in particular to a method for constructing a deep visual Q&A system.

Background technique

[0002] Visually impaired people make up a large share of the world's population. According to the National Bureau of Statistics, in 2014 China had about 6 to 7 million blind people and another 12 million people with low vision in both eyes. These people encounter many problems in daily life. Take travel as an example: although tactile paving exists, many blind people dare not walk on it at all because it is often occupied, not to mention the danger of walking on the road itself. They therefore really need assistive equipment to help them "restore the light". The current voice assistants generally only ...


Application Information

IPC(8): G06F17/30, G06K9/00, G06N3/04, G06N3/08
CPC: G06F16/3329, G06F16/3343, G06F16/3344, G06F16/583, G06N3/049, G06N3/084, G06V20/10
Inventors: 潘浩杰, 刘洋, 周君沛, 陆家林
Owner: ZHEJIANG UNIV