
706 results about "Feature engineering" patented technology

Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. Feature engineering is fundamental to the application of machine learning, and is both difficult and expensive. The need for manual feature engineering can be obviated by automated feature learning.
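
As a minimal illustration of the definition above, the sketch below turns one raw transaction record into numeric features using simple domain knowledge (time of day, weekend shopping, unit price). The record layout and the function name `engineer_features` are assumptions made for this example and are not taken from any patent listed here.

```python
from datetime import datetime

def engineer_features(record):
    """Turn one raw event record into model-ready numeric features."""
    ts = datetime.fromisoformat(record["timestamp"])
    return {
        "hour": ts.hour,                        # time-of-day effect
        "is_weekend": int(ts.weekday() >= 5),   # Saturday=5, Sunday=6
        "price_per_item": record["total"] / record["quantity"],
    }

# Hypothetical raw record; field names are illustrative only.
raw = {"timestamp": "2024-03-09T21:15:00", "total": 30.0, "quantity": 4}
features = engineer_features(raw)
print(features)
```

Each derived field encodes something a model could not easily infer from the raw string and totals alone, which is the manual effort that automated feature learning tries to remove.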

Deep learning-based question and answer matching method

The invention relates to a deep learning-based question and answer matching method. The method comprises the following steps: 1) thoroughly learning the word order and local sentence features of a question text and an answer text using two underlying deep neural networks, a long short-term memory network (LSTM) and a convolutional neural network (CNN); and 2) selecting the keyword with the best semantic match through pooling based on an attention mechanism (AM). Compared with existing methods, the method requires little feature engineering, performs well across domains and achieves relatively high accuracy. It can be applied to commercial intelligent customer service robots, automatic driving, internet medical treatment, and online forum and community question answering.
Owner:TONGJI UNIV

Commodity recommendation method based on mobile electronic commerce of big data

The invention claims a commodity recommendation method for mobile e-commerce based on big data. The method comprises the following steps: 101: preprocessing the user's historical behavior data; 102: dividing the user's historical data according to behavior time; 103: labeling the user's historical behavior data; 104: constructing features from the user's historical data through feature engineering; 105: building a plurality of machine learning models and fusing them; and 106: using the resulting model to predict, from the user's behavior data, whether the user will purchase a given commodity on a given future day. With this method, the user's historical data is preprocessed and analyzed to extract features, and a plurality of machine learning models are built to predict the probability that the user will purchase the commodity on that day, which improves the accuracy of the merchant's recommendations to the user.
Owner:CHONGQING UNIV OF POSTS & TELECOMM

Wind turbine generator bearing fault diagnosis method for multi-channel deep convolutional neural network

The invention relates to a wind turbine generator bearing fault diagnosis method for a multi-channel deep convolutional neural network. The method comprises the steps of: simultaneously acquiring high-frequency vibration acceleration signals of a drive end and a non-drive end of a test bearing in various states by using a vibration acceleration sensor; analyzing the acquired vibration signals by using a time frequency analysis technology to obtain corresponding time frequency spectra; establishing a deep convolutional neural network diagnosis model, and training the diagnosis model by using the time frequency spectra and the states of the bearing as a training sample; evaluating the diagnosis model, and applying the diagnosis model to a bearing to be monitored. The method can realize automatic feature learning, avoids feature engineering, effectively utilizes multi-channel vibration signals, and has good universality and extensibility.
Owner:NORTH CHINA ELECTRIC POWER UNIV (BAODING)

High potential user buying intention prediction method based on big data user behavior analysis

The invention provides a high potential user buying intention prediction method based on big data user behavior analysis. The method comprises the following steps: 101, data preprocessing: the historical behavior data set of the e-commerce user is preprocessed; 102, sample definition and labeling: samples are constructed from the user's historical consumption behavior, with each interacted user-product pair acting as the key; 103, division into a training set and a test set: the historical data is divided into the training set and the test set using a time window division method; 104, feature construction: features are constructed from the user's historical behavior data through feature engineering; and 105, algorithm design and implementation: feature selection is applied to the feature groups, the unbalanced data set is processed, and the final prediction is produced by a two-layer iterative learning model. The prediction model is built on the e-commerce user's historical behavior data over a time span of 45 days, so that it can predict whether the user will order a commodity in the candidate commodity set P within the following 5 days.
Owner:上海普瑾特信息技术服务股份有限公司
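
The time window division and user-product pair labeling described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the event tuple layout `(user, item, day, action)`, the `"buy"` action name and the function `time_window_split` are hypothetical, and only the 45-day history window and 5-day prediction window come from the abstract.

```python
from datetime import date, timedelta

def time_window_split(events, cutoff, history_days=45, label_days=5):
    """Split (user, item, day, action) events around a cutoff date.

    Events in the `history_days` before the cutoff feed feature construction;
    purchases in the `label_days` starting at the cutoff become positive labels.
    """
    start = cutoff - timedelta(days=history_days)
    end = cutoff + timedelta(days=label_days)
    history = [e for e in events if start <= e[2] < cutoff]
    positives = {(u, i) for u, i, d, a in events
                 if cutoff <= d < end and a == "buy"}
    # One sample per interacted user-item pair, labeled 1 if bought in the label window.
    pairs = sorted({(u, i) for u, i, _, _ in history})
    labels = {pair: int(pair in positives) for pair in pairs}
    return history, labels

# Hypothetical events for illustration.
events = [
    ("u1", "p1", date(2024, 1, 10), "click"),
    ("u1", "p1", date(2024, 2, 2), "buy"),
    ("u2", "p2", date(2024, 1, 20), "cart"),
    ("u2", "p3", date(2023, 11, 1), "click"),  # older than 45 days, ignored
]
history, labels = time_window_split(events, cutoff=date(2024, 2, 1))
print(labels)
```

Splitting by time rather than at random keeps future purchases out of the features, which is the usual reason for choosing a time window division in purchase prediction.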

Similarity analysis method and system for patients suffering from cardio-cerebral vascular diseases

The invention provides a similarity analysis method and system for patients suffering from cardio-cerebral vascular diseases. The method comprises the following steps: 1, problem definition, in which the problem is defined for patients suffering from cardio-cerebral vascular diseases; 2, data collection, in which health care data of these patients is collected; 3, data preprocessing, which includes data integration, data cleaning, missing value processing, feature deletion and outlier removal; 4, feature engineering, which includes feature construction, feature selection and feature processing; 5, patient clustering modeling; and 6, diagnosis and treatment scheme recommendation. In this way an effective similarity analysis model for patients suffering from cardio-cerebral vascular diseases is built. A clinician can find the population similar to a given patient from the patient's features and then recommend a personalized treatment plan for precision medicine. Population grouping based on similarity analysis can be carried out for patients with cardio-cerebral vascular diseases in the Chinese population, and targeted, personalized rehabilitation therapy can be given to different risk populations as early as possible.
Owner:中电科数字科技(集团)有限公司

Fine-grained word representation model-based sequence labeling model

Active | CN108460013A | Boundary Judgment Improvement | Improve entity recognition | Semantic analysis | Character and pattern recognition | Data set | Algorithm
The invention provides a sequence labeling model based on a fine-grained word representation model. It is used for sequence labeling tasks and belongs to the field of computer applications and natural language processing. The model consists of three main parts: a feature representation layer, a BiLSTM layer and a CRF layer. To perform sequence labeling with the model, an attention-based character-level word representation model, Finger, is first proposed to fuse the morphological information and character information of words; Finger and a BiLSTM-CRF model then complete the sequence labeling task jointly; finally, the method obtains an F1 of 91.09% on the CoNLL 2003 data set, end to end and without any feature engineering. An experiment shows that the Finger model markedly improves the recall of the sequence labeling system, so that the model's recognition capability is markedly improved.
Owner:DALIAN UNIV OF TECH

A disk failure detection method using multi-model prediction

The invention discloses a disk fault detection method using multi-model prediction, which extracts multiple features from disk SMART indicators by means of sequential data processing and builds a classification model to predict disk state. Step 1, data input: acquire a data set composed of monitoring data from a plurality of disks over a period of time. Step 2, SMART screening: select SMART indicators using mutation-point (change-point) detection. Step 3, feature engineering: feed the SMART indicators into a user-defined feature extraction module to extract their features, then extract the corresponding parameter configuration and pass it to the feature extraction module as a parameter, so as to extract the feature sets of the training set and the test set. Step 4, data set balancing: downsample the negative samples, which make up the large majority, using dimension-reduction clustering. Step 5, algorithm selection and modeling: on the basis of step 4, train the classification model and test whether the current disk is in the normal state or in the fault state that requires replacement.
Owner:南京群顶科技股份有限公司

Method and system for detecting and locating network anomaly

The invention discloses a method and system for detecting and locating network anomalies, which relates to the fields of Internet security, deep learning and neural networks. The method comprises the following steps: first, splitting the URL at special characters; second, encoding the split URL as word vectors with word2vec; feeding the word vectors into a convolution layer for automatic feature processing; combining the convolution layer output with an attention layer that has a sequential attention mechanism; and finally, applying max pooling and a fully connected layer to the attention layer output to obtain the final anomaly detection result, while using the output of the attention layer to locate the malicious code in the URL. The invention detects anomalies effectively: the detection rate is high, and the malicious code fragment in the URL can be located and visualized, which avoids the drawbacks of manual feature engineering and expert-knowledge methods.
Owner:CHONGQING UNIV OF POSTS & TELECOMM
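
The first step of this abstract, splitting a URL at special characters before word-vector encoding, might look like the sketch below. The abstract does not say which characters count as special, so treating every non-alphanumeric character as a delimiter is an assumption, and `tokenize_url` is a hypothetical helper name.

```python
import re

def tokenize_url(url):
    """Split a URL on runs of non-alphanumeric characters and drop empty tokens."""
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", url) if tok]

tokens = tokenize_url("http://example.com/login.php?user=admin&id=1")
print(tokens)
# ['http', 'example', 'com', 'login', 'php', 'user', 'admin', 'id', '1']
```

The resulting token list is what a word2vec-style model would consume; a production tokenizer might instead keep the delimiters as tokens, since characters such as quotes or angle brackets can themselves signal an attack.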

Text classification method and device, equipment and medium

The invention discloses a text classification method and device, equipment and a medium, and relates to the technical field of natural language processing. According to the specific implementation scheme, to-be-classified texts are obtained; the word sequence of the text to be classified is input into a word vector coding model to determine a word vector sequence of the word sequence; the entity sequence of the text to be classified is input into an entity vector model to determine an entity vector sequence corresponding to the entity sequence; wherein the entity vector model determines an entity vector based on an entity vector encoding model, and the entity vector encoding model is formed by text training based on an entity knowledge graph database; and classification identification is performed on the to-be-classified text according to the word vector sequence and the entity vector sequence. According to the embodiment of the invention, the construction of feature engineering and training samples is avoided, and the construction difficulty of a text classification model is reduced; text classification is comprehensively carried out through the word vector sequence and the entity vector sequence, the semantic sensitivity of the text classification model is improved, and then the accuracy of the classification result of the to-be-classified text is improved.
Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD

EMR data drive based GDM forecasting method

The invention discloses a GDM (Gestational Diabetes Mellitus) forecasting method driven by EMR (Electronic Medical Record) data, which plays an increasingly important role in smart medical services. The invention proposes a machine learning-based GDM forecasting framework and constructs three forecasting models, namely a full-domain data forecasting model, a staging data forecasting model and a weekly data forecasting model, according to different time window division methods for the collected data. After a forecasting item is identified, high-dimensional EMR data mining is carried out in seven steps: input and ETL data cleaning, correlation of medical record codes with feature data, EMR data preprocessing, secondary data processing, feature engineering, machine learning, and forecasting application. A labeled data set related to definite diagnoses is constructed from clinical data and divided into two subsets for model training and testing. The method performs forecasting with a support vector machine, a Bayesian network, a decision tree and an ensemble-based hybrid model, and thereby classifies GDM patterns.
Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA +1

Global average pooling convolutional neural network-based Chinese emotion tendency classification method

Active | CN108614875A | With automatic feature extraction | Enhanced automatic feature extraction | Neural architectures | Special data processing applications | Feature extraction | Classification methods
The invention provides a global average pooling convolutional neural network-based Chinese emotion tendency classification method, a technique for using a computer to analyze Chinese text collected from the network. The method comprises the steps of building a global average pooling convolutional neural network-based Chinese emotion tendency classification model, which extracts semantic emotion features using three channel transformation convolution layers; applying a global average pooling layer to the features extracted by the convolution layers to obtain confidence values for the output classes; and outputting emotion classification labels through Softmax. The model parameters are set for multiple rounds of training, and the model with the highest classification accuracy is selected for Chinese emotion tendency classification. The method avoids the complex feature engineering of conventional emotion analysis, strengthens the model's ability to extract semantic emotion features, effectively avoids over-fitting, and improves the model's emotion tendency classification performance.
Owner:BEIJING UNIV OF POSTS & TELECOMM
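
The last two stages named in this abstract, global average pooling followed by Softmax, can be illustrated in a few lines. This is a generic sketch of the two operations, not the patented network: it assumes the final convolution layer emits one feature map per output class, and the function names and sample values are hypothetical.

```python
import math

def global_average_pool(feature_maps):
    """Average each feature map (one per class) down to a single confidence value."""
    return [sum(fm) / len(fm) for fm in feature_maps]

def softmax(scores):
    """Convert raw scores to probabilities; subtracting the max keeps exp() stable."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical flattened feature maps for two classes (positive, negative).
feature_maps = [[1.0, 3.0], [0.0, 0.0]]
pooled = global_average_pool(feature_maps)
probs = softmax(pooled)
print(pooled, probs)
```

Because global average pooling has no trainable weights, it replaces the parameter-heavy fully connected layer, which is one common explanation for the reduced over-fitting the abstract mentions.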

User portrait establishing method and system based on big data

The invention discloses a user portrait establishing method and system based on big data, which belongs to the technical field of big data application. The user portrait establishing method based on big data comprises the steps of S1 user portrait label system construction, S2 data preprocessing, S3 automatic sample annotation, S4 user data sampling and sample imbalance processing, S5 feature engineering, S6 model training through a combination of a multi-class model and a binary classification model, and S7 model optimization. The method and system can improve the accuracy of a user portrait, can be used to build a personalized intelligent recommendation system for precise marketing and precise advertising, and have good promotion and application value.
Owner:INSPUR SOFTWARE CO LTD

O2O coupon usage big data prediction method

Inactive | CN107301562A | Portrayal accuracy | Improve write-off rate | Marketing | Data set | Data prediction
The invention protects an O2O coupon usage big data prediction method. The method comprises the steps that 101, a historical consumption dataset of a user is subjected to preprocessing operation; 102, the historical consumption dataset of the user is marked, and a training set and a prediction set are divided and constructed; 103, the historical consumption dataset of the user is subjected to feature engineering construction; 104, feature selection and processing of unbalanced data are performed; 105, the data is subjected to multi-classifier integrated learning; and 106, coupon usage of the user is predicted through an established model according to historical consumption data of the user, and serving of an O2O coupon is optimized. According to the method, a prediction model is established mainly by processing the user consumption data and performing multi-classifier integrated learning on the data, therefore, the coupon usage of the user in the future is predicted, and serving of the O2O coupon is optimized.
Owner:CHONGQING UNIV OF POSTS & TELECOMM

Active machine learning system for hazardous host detection

Active | CN106790256A | Realize detection | Realize the alarm | Transmission | Security information and event management | Deep belief network
The invention discloses an active machine learning system for hazardous host detection. Hosts with a high probability of having been compromised are labeled with the aid of SIEM (security information and event management) warning information, various security logs and analysts' research notes. The system comprises data collection, feature engineering, label generation, machine learning, feedback algorithms that fold analyst insights into active learning, real-time warning and other parts. Natural language processing, text mining and graph-based methods are used to generate targets and create features for machine learning. Machine learning mechanisms including deep belief networks, multilayer deep neural networks, random forests, support vector machines and logistic regression are used. Hazardous hosts in the network can be detected accurately and the false alarm rate is greatly reduced. The system takes into account both host security detection requirements and actual SOC (security operation center) investigation, so that important security events can be handled in time and labor cost is reduced while network security monitoring capability is improved.
Owner:浙江航芯科技有限公司

A phishing website URL detection method based on depth learning

The invention discloses a phishing website URL detection method based on deep learning, which can detect phishing websites on the Internet in real time from the website URL alone. First, the URL string sequence is encoded as a one-hot two-dimensional sparse matrix, which is then transformed into a dense character embedding matrix and fed into a convolutional neural network to extract local deep features. The input and the output of the convolutional neural network are then fed into a long short-term memory (LSTM) network to capture correlations within the URL sequence, and finally a softmax model classifies the URL. The invention avoids redundant feature engineering, extracts local deep correlation features through the convolutional neural network, learns long-range dependencies in the URL through the LSTM network, and detects phishing website URLs quickly and accurately.
Owner:SOUTHEAST UNIV
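
The one-hot encoding step described in this abstract can be sketched as follows. The alphabet, the fixed length `max_len` and the handling of unknown characters are all assumptions made for illustration; the abstract does not specify them, and `one_hot_url` is a hypothetical name.

```python
import string

# Assumed character set; a real system would choose its own alphabet.
ALPHABET = string.ascii_lowercase + string.digits + ":/.?=&-_%"
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}

def one_hot_url(url, max_len=16):
    """Encode a URL as a max_len x len(ALPHABET) 0/1 matrix.

    Rows are character positions and columns are alphabet entries.
    Unknown characters and padding positions stay as all-zero rows.
    """
    matrix = [[0] * len(ALPHABET) for _ in range(max_len)]
    for pos, ch in enumerate(url.lower()[:max_len]):
        idx = CHAR_INDEX.get(ch)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix

m = one_hot_url("a.b", max_len=4)
print(len(m), len(m[0]))  # rows, columns
```

Each row holds at most a single 1, which is why the matrix is sparse and why the method maps it to a dense character embedding before the convolution layers.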

Intelligent operation and maintenance alarm filtering method based on multiplatform autonomous prediction and system thereof

The invention relates to the maintenance field of operation and maintenance equipment, in particular to an intelligent operation and maintenance alarm filtering method and system based on multi-platform autonomous prediction. The method comprises the following steps: (1) data acquisition and integration; (2) data quality inspection; (3) data cleaning; (4) feature engineering; (5) sample collection; (6) model training and parameter optimization; (7) model release; (8) model use; (9) model feedback and optimization. The system includes a data acquisition and integration module, a data quality inspection module, a data cleaning module, a feature engineering module, a sample sampling module, a model training and parameter optimization module, a model release module, a model alarm filtering module, and a model feedback and optimization module. The invention ensures that low-level alarm events can be processed in real time, and avoids the risk that potentially serious, sudden alarm events are missed because of experts' subjective judgment errors or their inability to work 24 hours a day.
Owner:北京至信普林科技有限公司

Bearing fault prediction method and device based on equal division

The invention discloses a bearing fault prediction method and device based on equal division. The bearing fault prediction method comprises the following steps: one-dimensional or multi-dimensional vibration signals of a bearing are detected, and sample signals are obtained from them; the sample signals are equally divided to obtain equally divided time sequence segments; the segments are input into a fault prediction model in order of collection time, and a prediction result is obtained for each segment; and, according to an attention mechanism, weights are assigned to the model's hidden state at each time step according to its contribution to the final output, so that the fault prediction result of the bearing is generated by a weighted summation of these contributions. The method omits complicated feature engineering, achieves an end-to-end fault diagnosis system, is suitable for multi-channel sensing scenarios, effectively improves the prediction accuracy and time efficiency of the prediction model, is widely applicable, and is simple and easy to implement.
Owner:BEIJING JIAOTONG UNIV +1
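
The two ideas in this abstract, equal division of the signal and attention-weighted summation over hidden states, can be sketched with plain Python lists. This is an illustrative simplification rather than the patented model: the hidden states and attention scores would normally come from a trained recurrent network, whereas here they are hypothetical inputs, and dropping the remainder samples is an assumption.

```python
import math

def equal_segments(signal, n_segments):
    """Divide a 1-D signal into n equal-length segments, dropping any remainder."""
    seg_len = len(signal) // n_segments
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]

def attention_pool(hidden_states, scores):
    """Softmax the scores into weights and return the weighted sum of hidden states."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states))
              for d in range(dim)]
    return weights, pooled

segments = equal_segments(list(range(10)), 3)
# Hypothetical hidden states (one per segment) and raw attention scores.
weights, pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(segments, weights, pooled)
```

With equal scores every time step contributes equally; a trained model would instead give higher scores to the segments that carry fault signatures.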

A drug-disease relationship classification method based on a neural network

The present invention relates to a drug-disease relationship classification method based on a neural network, which belongs to the technical field of biomedical text mining and data mining. It addresses the problem of classifying drug-induced disease relationships more accurately and effectively for the drug and disease entities labeled in biomedical literature. The method comprises: S1, constructing a candidate instance set of drug-induced relationships; S2, performing text processing on the biomedical literature; S3, constructing the domain knowledge; S4, constructing an input vector; S5, constructing a text-level semantic information sub-network model; S6, adopting an attention mechanism to form the final representation of knowledge; S7, constructing a drug-disease relationship classification model; and S8, predicting drug-disease relationships in the biomedical literature. The method can automatically and effectively identify drug-disease entity relationships both across sentences and within sentences, and overcomes the reliance of most existing systems on extensive feature engineering built on traditional machine learning methods.
Owner:DALIAN JIAOTONG UNIVERSITY

Online public opinion text information sentiment polarity classification processing system and method

The invention belongs to the technical field of computer science, and discloses an online public opinion text information sentiment polarity classification processing system and method. Sentiment polarity classification of online public opinion text is widely used in public opinion monitoring systems; however, the feature engineering extraction module of traditional machine learning methods loses a large amount of text information, and the accuracy of the classification model is not high enough. The method comprises the following steps: preprocessing the data; and constructing word vectors by fine-tuning a pre-trained BERT model. The BERT model calculates the correlation between each character in a sentence and every other character, so the constructed word vectors better handle polysemy and synonymy in Chinese and greatly reduce the information lost in the word vector representation. In the classification model, a Bi-LSTM is first used to learn context information effectively, then attention is used to capture the main semantic information and filter out valuable public opinion information, and finally softmax classification is applied. The resulting sentiment polarity classification outperforms current mainstream algorithms.
Owner:XIDIAN UNIV

Automated supervised learning method with multi-source data supported

The invention discloses an automated supervised learning method with multi-source data support, which comprises the steps of (1) data preprocessing, (2) feature engineering, (3) model and hyperparameter tuning and (4) Bayesian pipeline optimization. The traditional data analysis process is automated, and the process of manually tuning a machine learning pipeline is fundamentally improved. Given the tight coupling between hyperparameter tuning and pipeline optimization, the extensibility of the system to new supervised learning algorithms is greatly improved. A genetic algorithm is proposed for tuning the hyperparameters of the machine learning pipeline, which greatly improves the efficiency of automated parameter tuning. In addition, a Bayesian optimizer is used to optimize the combination of pipeline algorithms, which largely mitigates the explosion of the combination space, and ultimately improves both the accuracy and the efficiency of the automated supervised learning method.
Owner:ZHEJIANG UNIV

An internet financial user loan overdue prediction method based on big data

Active | CN109255506A | Achieving Loan Overdue Prediction | High precision | Finance | Forecasting | Countermeasure | Predictive methods
The invention claims an Internet financial user loan overdue prediction method based on big data, which comprises the following steps: 101, preprocessing the user behavior, user basic information and credit scoring data; 102, dividing the data according to a 70-fold cross-validation method; 103, augmenting the training set with an adversarial network; 104, constructing features from the user behavior, user basic information and credit scoring data through feature engineering; 105, building four machine learning models and fusing them with a linear regression model; and 106, using the resulting models, together with manually set threshold rules based on the user's basic information, to predict whether the user will default. The invention uses big data to help Internet financial risk organizations move from the traditional "after-action interception" approach to "pre-identification" of high-default users.
Owner:上海孚厘科技有限公司

Systems and methods for real-time neural text-to-speech

Embodiments of a production-quality text-to-speech (TTS) system constructed from deep neural networks are described. System embodiments comprise five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For embodiments of the segmentation model, phoneme boundary detection was performed with deep neural networks using Connectionist Temporal Classification (CTC) loss. For embodiments of the audio synthesis model, a variant of WaveNet was created that requires fewer parameters and trains faster than the original. By using a neural network for each component, system embodiments are simpler and more flexible than traditional TTS systems, wherein each component requires laborious feature engineering and extensive domain expertise. Inference with system embodiments may be performed faster than real time.
Owner:BAIDU USA LLC

Data processing model construction method and device, server and client

A data processing model construction method and device, a server and a client are provided in the present specification. The method comprises the following steps: acquiring model description parameters and sample data of the target data processing model; determining a basic model according to the model description parameters and the sample data; and training the basic model on the sample data to obtain the target data processing model. In embodiments of the present specification, the model description parameters are obtained in a targeted manner. Based on these parameters, the server can accurately determine the user's modeling requirements, automatically match an appropriate model as the basic model, and then construct the corresponding data processing model. Users therefore do not have to carry out tedious testing, feature engineering, model selection, model parameter search and other operations to find an appropriate basic model, which reduces the difficulty of operation and improves processing efficiency.
Owner:ADVANCED NEW TECH CO LTD

Geological disaster prediction method, device and equipment

The invention discloses a geological disaster prediction method, device and equipment. The method comprises the steps of obtaining the monitoring data of a monitoring region, wherein the monitoring data comprises spatial data and attribute information describing geological disasters; performing preprocessing and feature engineering on the spatial data and the attribute information to determine a feature subset for the geological disaster; and building an artificial intelligence model on the feature subset to determine the predicted occurrence probability of a geological disaster in the monitoring area. The technical scheme disclosed in the embodiment includes obtaining monitoring data, processing the data and applying feature engineering, extracting the feature subsets related to geological disasters, upsampling or downsampling the unbalanced data, and determining the occurrence probability of geological disasters with the artificial intelligence model. This enables real-time monitoring and automatic prediction of disaster probability and improves the comprehensiveness and accuracy of prediction.
Owner:杭州鲁尔物联科技有限公司

Safe feature engineering method and device

The invention provides a safe feature engineering method and device. The first device may transmit a first data set to the second device, the first data set including a ciphertext of tag information of the plurality of data objects. The second device may perform feature engineering processing on a second data set including feature data of the plurality of data objects to generate a subset of the second data set. The second device may generate a ciphertext subset of the tag information of the corresponding data object in the first data set according to the subset of the second data set, and send the ciphertext of the ciphertext subset to the first device. The first device can decrypt the ciphertext subset and generate statistical information, and send the statistical information to the second device. The second device may use the statistical information to calculate a characteristic engineering index. The invention further provides a corresponding feature engineering device.
Owner:ADVANCED NEW TECH CO LTD

Text sentiment analysis method and system based on deep learning

The invention particularly relates to a text sentiment analysis method and system based on deep learning. The method includes: normalizing initial text data to generate preprocessed text data, and clustering the preprocessed text data into preset fields; manually labeling part of the data in the different fields, training a sentiment analysis model based on deep learning, and building the special depth of each preset field; and using the resulting classifier and the special depths to perform sentiment classification on the to-be-classified text. The method reduces manpower cost, avoids the influence of feature engineering on classification results, and at the same time reduces the workload of special engineering. In addition, the field to which the text belongs is taken into account, so that the accuracy of text sentiment analysis is increased.
Owner:北京牡丹电子集团有限责任公司数字科技中心