Data clustering method and device

A data clustering and cluster-center technology, applied in fields such as instruments, character and pattern recognition, and computer components. It addresses the problems that existing methods cannot cluster data of arbitrary shape, that this affects the classification results, and that they are not applicable to concept extraction from sensor data.

Inactive Publication Date: 2018-05-25
CHINA MOBILE COMM LTD RES INST +1

AI Technical Summary

Problems solved by technology

The rule-based method abstracts rules or templates from manually identified domain concepts and then finds the domain concepts in the text that match the rules or conform to the templates. This method usually relies on natural language processing tools and is affected by differences in language and field; constructing new rules for each new environment is cumbersome, so the method lacks generality. The statistical method uses machine learning to find textual features in a corpus, labels and trains on the corpus, and obtains a domain concept extraction model. Commonly used techniques include hidden Markov models, decision trees, and neural networks. Although this method is not affected by language or domain, it requires labeled training data, and the set of domain concepts must be collected manually beforehand. All documents have to be examined; otherwise the set of candidate domain concept words will be incomplete, which affects the classification results. The method combining rules and statistics uses both linguistic and mathematical-statistical techniques to obtain domain concepts: the rule-based part focuses on obtaining candidate domain concepts, while the statistical part improves the accuracy and efficiency of domain concept extraction. At present, most domain concept extraction uses this combined method.
[0005] However, although the method combining rules and statistics overcomes the shortcomings of the purely rule-based and purely statistical methods, it has its own limitations. First, current research targets domain concept extraction from text information and is not suitable for concept extraction from sensor data. Second, existing methods require a corpus, an open text collection, or a rule base as training samples, which places high demands on the preparation of the training set; in many cases, training data that meets the requirements is not easy to obtain, which seriously affects extraction accuracy. Third, domain concept extraction mostly uses the traditional k-means clustering method, which is constrained by the choice of initial cluster centers and by the number of cluster centers and cannot cluster data of arbitrary shape, which also affects extraction accuracy.

Embodiment Construction

[0079] Specific implementations of the data clustering method and device provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0080] An embodiment of the present invention provides a data clustering method which, as shown in Figure 1, may include the following steps:

[0081] S101. Generate a multidimensional character string sequence according to the time series collected by multiple sensors;

[0082] S102. According to each character string sequence in the multidimensional character string sequence, and the time and place when each sen...

Abstract

The invention discloses a data clustering method and device. The invention adopts a local density clustering method. It first determines the distance between nodes, the local density of each node, and the shortest distance between each node and any node with a higher local density, which lays the foundation for clustering sensor data. It then uses the local density of each node and its shortest distance to a higher-density node to determine the cluster center nodes and the category to which every other node belongs. In this way the nodes are clustered automatically and data concepts are extracted automatically, which overcomes the defects of the traditional k-means clustering method and allows data of arbitrary shape to be clustered. In addition, the data clustering method and device lay a foundation for collaborative analysis of heterogeneous equipment, interoperation between devices, and similar tasks; they ensure the reliability and complementarity of the information and improve the accuracy of data concept extraction.
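The procedure summarized in the abstract resembles what is commonly called density-peak clustering. The following is only a minimal sketch under stated assumptions, not the patented implementation: the function name `density_peak_clustering`, the `cutoff` neighbourhood threshold, the use of Euclidean distance, and the rule of picking the `n_centers` nodes with the largest product of density and distance are all hypothetical choices made for illustration.

```python
import math


def density_peak_clustering(points, cutoff, n_centers):
    """Illustrative local-density clustering (assumed details, see lead-in).

    points    -- list of coordinate tuples, one per node
    cutoff    -- hypothetical distance threshold used to count neighbours
    n_centers -- hypothetical number of cluster centers to select (>= 1)
    """
    if n_centers < 1:
        raise ValueError("n_centers must be at least 1")
    n = len(points)

    # Distance between every pair of nodes (Euclidean is an assumption).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: how many other nodes lie closer than the cutoff.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]

    # Nodes ordered from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta[i]: shortest distance from node i to a node ranked denser.
    # parent[i]: index of that nearest denser node.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest node has no denser neighbour; by convention it
            # receives its largest distance to any other node.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j

    # Cluster centers: nodes that are both dense and far from denser nodes.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], i))[:n_centers]

    labels = [-1] * n
    for cluster_id, i in enumerate(centers):
        labels[i] = cluster_id

    # Every remaining node takes the category of its nearest denser node.
    # Visiting nodes in density order guarantees the parent is labelled first.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


if __name__ == "__main__":
    # Two small, well-separated groups of made-up 2-D readings.
    readings = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
                (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    labels, centers = density_peak_clustering(readings, cutoff=1.0, n_centers=2)
    print(labels, centers)
```

Because each non-center node simply follows its nearest denser neighbour, clusters are not forced into convex shapes around a centroid, which is the property the abstract contrasts with k-means. In practice the number of centers is often chosen by inspecting the density and distance values rather than fixed in advance; fixing it here keeps the sketch short.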

Description

Technical field

[0001] The invention relates to the technical field of data mining, and in particular to a data clustering method and device.

Background technique

[0002] With the rapid development of the Internet of Things, the types of smart devices in it keep growing, and more and more devices generate massive, heterogeneous perception data. This poses great challenges for resource interaction, data association, and reasoning within and between systems. How to shield the heterogeneity and isolation of perception data and achieve data interconnection and fusion has become a hot issue in Internet of Things research.

[0003] Therefore, semantic technology is introduced into the Internet of Things. Ontology, as a modeling tool capable of describing concepts at the semantic and knowledge levels, has conceptual characteristics and is considered to be the core and key of information semantic representation; in or...

Application Information

IPC(8): G06K9/62
CPC: G06F18/232
Inventor: 鲍媛媛
Owner: CHINA MOBILE COMM LTD RES INST