192 results about "Map reduce" patented technology

Map-Reduce Ready Distributed File System

A map-reduce compatible distributed file system that consists of successive component layers that each provide the basis on which the next layer is built provides transactional read-write-update semantics with file chunk replication and huge file-create rates. A primitive storage layer (storage pools) knits together raw block stores and provides a storage mechanism for containers and transaction logs. Storage pools are manipulated by individual file servers. Containers provide the fundamental basis for data replication, relocation, and transactional updates. A container location database allows containers to be found among all file servers, as well as defining precedence among replicas of containers to organize transactional updates of container contents. Volumes facilitate control of data placement, creation of snapshots and mirrors, and retention of a variety of control and policy information. Key-value stores relate keys to data for such purposes as directories, container location maps, and offset maps in compressed files.
Owner:HEWLETT-PACKARD ENTERPRISE DEV LP

System and method for data warehousing and analytics on a distributed file system

Active; US20090055370A1; Significant performance bottleneck; Process is time-consuming and expensive; Digital data information retrieval; Digital data processing details; Data warehouse; Map reduce
A computer implemented method for executing an ANSI SQL expression belonging to the SELECT-WHERE-equi-JOIN class on data residing in a distributed file system, said method comprising the steps of entering the ANSI SQL expression into a user interface; converting the ANSI SQL expression into a map-reduce program; running the map-reduce program on the distributed file system; storing the result set of the program in the distributed file system; and presenting the result set through a user interface.
Owner:THRYV INC
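
The conversion of a SELECT-WHERE-equi-JOIN query into a map-reduce program can be illustrated with a reduce-side join. The following is a minimal single-process sketch, not the patented method: the function names (`map_phase`, `shuffle`, `reduce_phase`, `equi_join`) and the row-as-dictionary representation are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(table_name, rows, join_key, predicate=lambda row: True):
    """Emit (join-key value, (table tag, row)) for rows passing the WHERE filter."""
    for row in rows:
        if predicate(row):
            yield row[join_key], (table_name, row)


def shuffle(pairs):
    """Group mapper output by key, standing in for the framework's shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(tagged_rows, left_name, right_name):
    """Join every left row with every right row that shares the key."""
    left_rows = [row for tag, row in tagged_rows if tag == left_name]
    right_rows = [row for tag, row in tagged_rows if tag == right_name]
    for left in left_rows:
        for right in right_rows:
            yield {**left, **right}


def equi_join(left_name, left_rows, right_name, right_rows, join_key,
              predicate=lambda row: True):
    """Hypothetical driver: the WHERE predicate is applied to the left table only."""
    pairs = list(map_phase(left_name, left_rows, join_key, predicate))
    pairs += list(map_phase(right_name, right_rows, join_key))
    result = []
    for _, tagged_rows in sorted(shuffle(pairs).items()):
        result.extend(reduce_phase(tagged_rows, left_name, right_name))
    return result
```

Tagging each row with its table name lets a single reducer see both sides of the join for one key, which is why the equi-join class maps naturally onto the shuffle step.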

Transparent efficiency for in-memory execution of map reduce job sequences

Executing a map reduce sequence may comprise executing all jobs in the sequence by a collection of a plurality of processes with each process running zero or more mappers, combiners, partitioners and reducers for each job, and transparently sharing heap state between the jobs to improve metrics associated with the job. Processes may communicate among themselves to coordinate completion of map, shuffle and reduce phases, and completion of said all jobs in the sequence.
Owner:IBM CORP

Scaling event processing using distributed flows and map-reduce operations

Some event ordering requirements can be determined based on continuous event processing queries. Other event ordering requirements can be determined based on distribution flow types being used to distribute events from event streams to nodes executing the queries. Events from event streams can be ordered according to ordering semantics that are based on a combination of all of these event ordering requirements. Additionally, virtual computing nodes can be associated with constraints, and computing processors can be associated with capabilities. Virtual computing nodes for processing event streams can be assigned to execute on various computing processors based on both these constraints and capabilities. Additionally, for each of several events in an event stream, a ratio between a total latency and a communication latency can be determined. Based on an average of these ratios, a quantity of reducing nodes that will be involved in a map-reduce operation can be selected.
Owner:ORACLE INT CORP

Apparatus and method for integrating map-reduce into a distributed relational database

A computer readable storage medium includes executable instructions to define a map-reduce document that coordinates processing of data in a distributed database. The map-reduce document complies with a map-reduce specification that integrates map-reduce functions with queries in a query language. The operations specified by the map-reduce document are executed in the distributed database.
Owner:GOPIVOTAL

Map-Reduce Ready Distributed File System

A map-reduce compatible distributed file system that consists of successive component layers that each provide the basis on which the next layer is built provides transactional read-write-update semantics with file chunk replication and huge file-create rates. Containers provide the fundamental basis for data replication, relocation, and transactional updates. A container location database allows containers to be found among all file servers, as well as defining precedence among replicas of containers to organize transactional updates of container contents. Volumes facilitate control of data placement, creation of snapshots and mirrors, and retention of a variety of control and policy information. Also addressed is the use of distributed transactions in a map-reduce system; the use of local and distributed snapshots; replication, including techniques for reconciling the divergence of replicated data after a crash; and mirroring.
Owner:HEWLETT-PACKARD ENTERPRISE DEV LP

Double map reduce distributed computing framework

A method, apparatus, system, article of manufacture, and data structure provide the ability to perform a sorted map-reduce job on a cluster. A cluster of two or more computers is defined by installing a map-reduce framework onto each computer and formatting the cluster by identifying the cluster computers, establishing communication between them, and enabling the cluster to function as a unit. Data is placed into the cluster where it is distributed so that each computer contains a portion of the data. A first map function is performed where each computer sorts its respective data and creates an abstraction that is a representation of the data. The abstractions are exchanged and merged to create a complete abstraction. A second map function searches the complete abstraction to redistribute and exchange the data across the computers in the cluster. A reduce function is performed in parallel to produce a result.
Owner:MYSPACE LLC
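
The two map steps described above resemble a sample sort. The sketch below simulates that idea in one process, assuming the "abstraction" is a small sample of each node's sorted data; that interpretation, the function names and the `samples_per_node` parameter are all illustrative assumptions rather than details from the abstract.

```python
import bisect


def first_map(partition, samples_per_node=2):
    """Sort local data and return it with a small sample (the assumed 'abstraction')."""
    local = sorted(partition)
    if not local:
        return local, []
    step = max(1, len(local) // samples_per_node)
    return local, local[::step]


def merge_abstractions(samples, num_nodes):
    """Merge all samples and pick num_nodes - 1 splitter values."""
    merged = sorted(value for sample in samples for value in sample)
    if not merged:
        return []
    return [merged[(i * len(merged)) // num_nodes] for i in range(1, num_nodes)]


def second_map(local, splitters):
    """Route each record to the node that owns its key range."""
    for value in local:
        yield bisect.bisect_right(splitters, value), value


def sorted_map_reduce(partitions):
    """Return one sorted bucket per node; concatenated, they are globally sorted."""
    num_nodes = len(partitions)
    firsts = [first_map(p) for p in partitions]
    splitters = merge_abstractions([sample for _, sample in firsts], num_nodes)
    buckets = [[] for _ in range(num_nodes)]
    for local, _ in firsts:
        for node, value in second_map(local, splitters):
            buckets[node].append(value)
    # Reduce step: each node sorts only the key range it received.
    return [sorted(bucket) for bucket in buckets]
```

Because every bucket holds a disjoint, ordered key range, the final reduce can run on all nodes in parallel without a global merge.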

Table format for map reduce system

A key-value store provides column-oriented access to data in a distributed and fault tolerant manner. Data can be inserted into the data store and data can be retrieved either randomly or sequentially from the data store at high rates. Keys for a table are ordered and the entire table is divided into key ranges. Each key range is handled by a tablet, which itself is divided into key ranges called partitions. Partitions are also divided into segments. Such recursive division into smaller and smaller key ranges provides parallelism. At the highest level, operations on tablets can be distributed to different nodes. At lower levels, different threads can handle operations on individual segments. Large-scale restructuring operations can be decomposed into operations on individual segments so that a global lock on larger objects does not need to be kept across the entire operation.
Owner:HEWLETT-PACKARD ENTERPRISE DEV LP
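
The recursive division of an ordered key space can be pictured as a small tree of key ranges. The following sketch is illustrative only: the `KeyRangeNode` class, its `locate` method and the node names are hypothetical and are not taken from the patent.

```python
import bisect


class KeyRangeNode:
    """A node covering a key range; children split it into smaller ranges."""

    def __init__(self, name, split_keys=(), children=()):
        # Child i covers keys below split_keys[i]; the last child covers the rest.
        if children and len(children) != len(split_keys) + 1:
            raise ValueError("need exactly one more child than split keys")
        self.name = name
        self.split_keys = list(split_keys)
        self.children = list(children)

    def locate(self, key):
        """Return the path of node names from this node down to the owning leaf."""
        path = [self.name]
        node = self
        while node.children:
            index = bisect.bisect_right(node.split_keys, key)
            node = node.children[index]
            path.append(node.name)
        return path


# Hypothetical layout: a table with two tablets, the first split into two partitions.
table = KeyRangeNode("table", ["m"], [
    KeyRangeNode("tablet-0", ["g"], [
        KeyRangeNode("partition-0a"),
        KeyRangeNode("partition-0b"),
    ]),
    KeyRangeNode("tablet-1"),
])
```

Each level is resolved with a binary search over its split keys, so an operation touches only the one leaf that owns the key and leaves the other ranges free for other nodes or threads.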

Locality-aware resource allocation for cloud computing

Computing resource allocation for map-reduce job execution comprises determining the volume of input data to the map-phase and the reduce-phase of a map-reduce job prior to execution. Based on said determination, data blocks and virtual machines (VMs) are selectively placed for locality aware map-reduce job execution on a cluster of computing nodes in a network. Selectively placing data blocks and VMs comprises integrally placing the data and the VMs at selected nodes to lower data transfer network hops for a map-phase and a shuffle-phase of the map-reduce job upon execution by the VMs.
Owner:IBM CORP

Enriching events with dynamically typed big data for event processing

Some event ordering requirements can be determined based on continuous event processing queries. Other event ordering requirements can be determined based on distribution flow types being used to distribute events from event streams to nodes executing the queries. Events from event streams can be ordered according to ordering semantics that are based on a combination of all of these event ordering requirements. Additionally, virtual computing nodes can be associated with constraints, and computing processors can be associated with capabilities. Virtual computing nodes for processing event streams can be assigned to execute on various computing processors based on both these constraints and capabilities. Additionally, for each of several events in an event stream, a ratio between a total latency and a communication latency can be determined. Based on an average of these ratios, a quantity of reducing nodes that will be involved in a map-reduce operation can be selected.
Owner:ORACLE INT CORP

Integrating map-reduce into a distributed relational database

A computer readable storage medium includes executable instructions to define a map-reduce document that coordinates processing of data in a distributed database. The map-reduce document complies with a map-reduce specification that integrates map-reduce functions with queries in a query language. The operations specified by the map-reduce document are executed in the distributed database.
Owner:GOPIVOTAL

System and method for a task management library to execute map-reduce applications in a map-reduce framework

An improved system and method for a task management library to execute map-reduce applications is provided. A map-reduce application may be operably coupled to a task manager library and a map-reduce library on a client device. The task manager library may include a wrapper application programming interface that provides application programming interfaces invoked by a wrapper to parse data input values of the map-reduce application. The task manager library may also include a configurator that extracts data and parameters of the map-reduce application from a configuration file to configure the map-reduce application for execution, a scheduler that determines an execution plan based on input and output data dependencies of mappers and reducers, a launcher that iteratively launches the mappers and reducers according to the execution plan, and a task executor that requests the map-reduce library to invoke execution of mappers on mapper servers and reducers on reducer servers.
Owner:R2 SOLUTIONS

Methods and systems for processing large graphs using density-based processes using map-reduce

Embodiments are directed to a density-based clustering algorithm that decomposes and reformulates the DBSCAN algorithm to facilitate its performance on the Map-Reduce model. The DBSCAN algorithm is reformulated into a connectivity problem using a density filter method and a partial connectivity detector. The density-based clustering algorithm uses message passing and edge adding to increase the speed of result merging. It also uses message mining techniques to further decrease the number of iterations needed to process the input graph. The algorithm is scalable, and can be accelerated by using more machines in a distributed computer network implementing the Map-Reduce program.
Owner:SALESFORCE COM INC

Management of Intermediate Data Spills during the Shuffle Phase of a Map-Reduce Job

A system and a method for spill management during the shuffle phase of a map-reduce job performed in a distributed computer system on distributed files. A spilling protocol is provided for handling the spilling of intermediate data based on at least one popularity attribute of key-value pairs of the input data on which the map-reduce job is performed. The spilling protocol includes an assignment order to storage resources belonging to the computer system based on the at least one popularity attribute. The protocol can be deployed in computer systems with heterogeneous storage resources. Additionally, pointers or tags can be assigned to improve shuffle phase performance. The distributed file systems that are most suitable are ones usable by Hadoop, e.g., Hadoop Distributed File System (HDFS).
Owner:ZETTASET

Distributed log analysis based operation state monitoring method of power system

The invention discloses a method for monitoring the operation state of a power system based on distributed log analysis. The method comprises the steps of: S1, acquiring log information from the power system and combining it into a log file; S2, segmenting the log file, processing it to obtain log information in a uniform format, and writing the log information sequentially into a distributed storage system; S3, extracting the log information from the distributed storage system, classifying it under the Map-Reduce mechanism with a log analysis algorithm based on state noise removal and clustering, and analyzing the classified log information to monitor the system operation state. With this method, abnormalities in the operation state of the power system can be found and handled promptly, meeting the requirement for timely and efficient operation of the power system.
Owner:BEIJING KEDONG ELECTRIC POWER CONTROL SYST +2

Intelligent urban construction examining and approving method based on case-based reasoning technology

The invention discloses an intelligent method for examining and approving urban construction applications based on case-based reasoning. The method comprises the following steps: constructing an examination and approval case library; inputting a new examination and approval case and model parameter information; submitting jobs to a Hadoop cluster to run a KNN (k-nearest neighbor) MapReduce case search; statistically analyzing the search results with a 'weighted integral model'; evaluating and correcting the cases; and performing distributed full-text search over the examination and approval data. The method replaces the current manual handling of applications, improves work efficiency, broadens the basis for approval decisions, and makes the approval process intelligent. Distributed search is carried out on the Hadoop and MapReduce frameworks through a cloud computing center, and a distributed case search model based on case-based reasoning is established. The 'weighted integral model' is proposed to statistically analyze the similar cases found and to derive guidance for new examination and approval cases.
Owner:ZHEJIANG UNIV CITY COLLEGE

Massive web log data query and analysis method

The invention discloses a method for querying and analyzing massive web log data based on Hadoop and Hive, relying on the high reliability, scalability, efficiency and fault tolerance of the Hadoop and Hive distributed computing platform. The method includes the following steps: the data of each data source are analyzed; the data are loaded into a database; HiveQL statements are received; the received statements are optimized to obtain a primary map result; the received statements are converted into a MapReduce task, the task is executed, and the query result is stored; the data are segmented; the data are analyzed and mined; the data are loaded into a MySQL database. The method provides precise query and analysis of massive web log data, makes storage, query and analysis of the data scalable and effective, and avoids the loss of overall performance caused by uneven job distribution under data skew.
Owner:北京智融时代信息技术有限公司

Wind turbine malfunction early warning method based on cloud platform

Inactive; CN105787584A; Realize large-scale data distributed storage; Fast read; Forecasting; Resources; Electricity; Map reduce
The invention discloses a wind turbine malfunction early warning method based on a cloud platform, aimed at the problems of traditional malfunction early warning for wind turbines, such as limited data storage and transmission, insufficient computing capability and unbalanced computing loads. The method involves a distributed data storage center, a malfunction early warning center, a remote monitoring center, a Map-Reduce-based malfunction early warning algorithm database and a central monitoring room. The method can thoroughly mine the large volume of multi-faceted wind turbine monitoring data while providing early-stage malfunction warning services to multiple wind farms. It realizes large-scale distributed data storage and fast remote reading, performs trend analysis, service life estimation and data mining on comprehensive state monitoring data of the wind turbines, and realizes automatic early-stage malfunction warning. The method is characterized by automatic identification, smart control, convenience, speed, high efficiency and low cost.
Owner:NORTH CHINA ELECTRIC POWER UNIV (BAODING)

Dynamic user behavior-based cloud forensics method and dynamic user behavior-based cloud forensics system

The invention discloses a cloud forensics method and system based on dynamic user behavior, built on a formal definition of dynamic user behavior. The method comprises the steps of: collecting dynamic user behavior and behavior data, and storing the behavior data as primary evidence data; carrying out data integration, data cleaning and data mining to form forensic analysis data, and storing the forensic analysis data in a data analysis library (a key-value database); using MapReduce to carry out correlation analysis, sequence pattern analysis and outlier analysis on the forensic analysis data, mining potential user behavior patterns and possible attack behavior to form forensic evidence; and displaying the forensic evidence visually. The high-performance computing power of cloud computing and a large-scale distributed-memory environment are applied to computer forensic analysis, so that various problems in cloud computing forensics can be addressed.
Owner:INSPUR GROUP CO LTD

Similar trajectory mining method and device on basis of massive license plate identification data

The invention discloses a similar trajectory mining method and device based on massive license plate identification data. The method comprises three main steps: trajectory organization and screening, point escort relationship calculation, and trajectory similarity judgment. The method and device solve the problem of slow response times when computing over massive data sets; calculation accuracy is improved through analysis of the license plate identification data; calculation efficiency is improved by using the Hadoop MapReduce distributed processing mode; and similar trajectories are mined efficiently and rapidly. The method and device can be used to find escort vehicles in the field of traffic services.
Owner:NORTH CHINA UNIVERSITY OF TECHNOLOGY

Power-efficient nested map-reduce execution on a cloud of heterogeneous accelerated processing units

An approach and a method for efficient execution of nested map-reduce framework workloads to take advantage of the combined execution of central processing units (CPUs) and graphics processing units (GPUs) and lower latency of data access in accelerated processing units (APUs) is described. In embodiments, metrics are generated to determine whether a map or reduce function is more efficiently processed on a CPU or a GPU. A first metric is based on a ratio of a number of branch instructions to a number of non-branch instructions, and a second metric is based on a comparison of execution times on each of the CPU and the GPU. Selecting execution of map and reduce functions based on the first and second metrics results in accelerated computations. Some embodiments include scheduling pipelined executions of functions on the CPU and functions on the GPU concurrently to achieve power-efficient nested map-reduce framework execution.
Owner:ADVANCED MICRO DEVICES INC
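
The two metrics lend themselves to a small decision function. The sketch below is only one possible reading: the abstract does not say how the metrics are combined, so the rule used here (a branch-ratio cutoff followed by a timing comparison) and the `max_gpu_branch_ratio` threshold of 0.1 are assumptions chosen purely for illustration.

```python
def branch_ratio(branch_instructions, non_branch_instructions):
    """First metric: branch instructions per non-branch instruction."""
    if non_branch_instructions == 0:
        return float("inf")
    return branch_instructions / non_branch_instructions


def choose_device(branch_instructions, non_branch_instructions,
                  cpu_seconds, gpu_seconds, max_gpu_branch_ratio=0.1):
    """Return "cpu" or "gpu" for a map or reduce function.

    Assumed rule: branch-heavy functions stay on the CPU; otherwise the
    device with the shorter measured execution time wins (second metric).
    """
    ratio = branch_ratio(branch_instructions, non_branch_instructions)
    if ratio > max_gpu_branch_ratio:
        return "cpu"
    return "gpu" if gpu_seconds < cpu_seconds else "cpu"
```

For example, under these assumptions a function with 5 branches per 100 other instructions that runs four times faster on the GPU would be dispatched to the GPU, while one with 50 branches per 100 would stay on the CPU regardless of timing.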

Vehicle detection method based on GPU (graphics processing unit) multi-core parallel acceleration

The invention discloses a vehicle detection method based on GPU (graphics processing unit) multi-core parallel acceleration. The method comprises the following steps: computer vision and feature extraction, target similarity detection, and a Map-Reduce parallel computing framework. Its beneficial effects are that Map-Reduce GPU parallel computation improves the efficiency of the HOG (histogram of oriented gradients) feature extraction algorithm and markedly shortens the time required for vehicle detection; the method can be used in the fields of intelligent traffic and urban management.
Owner:XIDIAN UNIV

RDF data storage and query method combined with star figure coding

Inactive; CN104462609A; Reduce the number of query tasks; Reduce the number of intermediate results; Semi-structured data indexing; Special data processing applications; Relevant information; Map reduce
The invention relates to an RDF data storage and query method combined with star figure coding. The method comprises the following steps: S1, RDF data are preprocessed and represented as an RDF data graph; S2, an input SPARQL query statement is represented as a SPARQL query graph, and query decomposition is carried out; S3, the SPARQL query statement is preprocessed to obtain the number of tasks in the whole query, the join order of the query star sub-nodes, and relevant information about the query star sub-nodes; S4, the SPARQL query statement is executed: query join planning is carried out on the Hadoop MapReduce parallel computing framework, and the number of times a query job is started is decided according to the relevance of the SPARQL query statement; S5, subgraph query is carried out using a Map function; S6, a result join algorithm is carried out using a Reduce function. Because a hash-coded index query strategy based on the star structure is adopted, stored data redundancy and the number of query tasks are reduced, and query efficiency is improved.
Owner:FUZHOU UNIV

Debugging a map reduce application on a cluster

A method, apparatus, system, article of manufacture, and data structure provide the ability to debug a map-reduce application on a cluster. A cluster of two or more computers is defined by installing a map-reduce framework (that includes an integrated development environment [IDE]) onto each computer. The cluster is formatted by identifying and establishing communication between each computer so that the cluster functions as a unit. Data is placed into the cluster. A function to be executed by the framework on the cluster is obtained, debugged, and executed directly on the cluster using the IDE and the data in the cluster.
Owner:MYSPACE LLC

System and method for finding connected components in a large-scale graph

An improved system and method for finding connected components in a large-scale graph is provided. In a map-reduce framework, subsets of a collection of edges for unique vertices may be distributed to several mappers. Connected components of subgraphs represented by each subset of edges may be computed by each mapper. Then the sets of edges for connected components of subgraphs may be sorted by vertex. The sets of edges representing connected components of subgraphs may be distributed to one or more reducers to find maximal sets of weakly connected components of the large-scale graph. The sorted sets of edges for each vertex representing the maximal sets of connected components for subgraphs may be merged by a reducer to identify maximal sets of connected components of a graph, and the maximal sets of connected components of a graph may be output.
Owner:OATH INC
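
The mapper and reducer roles described above can be sketched with a union-find structure. This is a simplified single-process illustration, not the patented procedure: the helper names are hypothetical, and each mapper's output is modelled as (vertex, representative) pairs, which is an assumption about the intermediate format.

```python
def find(parent, vertex):
    """Return the representative of a vertex, compressing the path as it goes."""
    while parent[vertex] != vertex:
        parent[vertex] = parent[parent[vertex]]
        vertex = parent[vertex]
    return vertex


def union(parent, a, b):
    """Merge the components of a and b, keeping the smaller vertex as representative."""
    for vertex in (a, b):
        parent.setdefault(vertex, vertex)
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        parent[max(root_a, root_b)] = min(root_a, root_b)


def mapper(edges):
    """Compute components of one edge subset; emit sorted (vertex, representative) pairs."""
    parent = {}
    for a, b in edges:
        union(parent, a, b)
    return sorted((vertex, find(parent, vertex)) for vertex in parent)


def reducer(mapper_outputs):
    """Merge the partial components from all mappers into maximal components."""
    parent = {}
    for output in mapper_outputs:
        for vertex, representative in output:
            union(parent, vertex, representative)
    components = {}
    for vertex in parent:
        components.setdefault(find(parent, vertex), set()).add(vertex)
    return sorted(sorted(component) for component in components.values())
```

Each mapper shrinks its edge subset to at most one pair per vertex, so the reducer merges far fewer pairs than the original number of edges.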

Adjacent sorting repetition-reducing method based on Map-Reduce and segmentation

Active; CN102163226A; Efficient field matching method; Efficient deduplication method; Special data processing applications; Map reduce; Matching methods
The present invention discloses an adjacent sorting repetition-reducing (deduplication) method based on Map-Reduce and segmentation. By applying the SNM (sorted neighborhood method) under Hadoop's Map-Reduce distributed framework, the method addresses the large amount of duplicate data produced when information is extracted with information extraction technology. The data are processed in a distributed way, and the similarity between records is calculated by a field matching method to judge whether the records are duplicates, thereby increasing overall deduplication efficiency.
Owner:ZHEJIANG UNIV
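
The core of the sorted neighborhood method (SNM) can be shown on a single node: sort the records, then compare each record only with its neighbours inside a sliding window. The sketch below is a plain illustration rather than the patented method; it omits the Map-Reduce distribution and segmentation, and the `window` and `threshold` defaults and the use of `difflib.SequenceMatcher` as the field matcher are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Field matching score in [0, 1] between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def sorted_neighborhood(records, key=lambda record: record,
                        window=3, threshold=0.85):
    """Return pairs of records judged to be duplicates.

    Records are sorted by key, and each record is compared only with the
    next window - 1 records in sorted order.
    """
    ordered = sorted(records, key=key)
    duplicates = []
    for index, record in enumerate(ordered):
        for other in ordered[index + 1:index + window]:
            if similarity(key(record), key(other)) >= threshold:
                duplicates.append((record, other))
    return duplicates
```

Sorting brings likely duplicates next to each other, so the number of comparisons grows with the window size rather than with the square of the number of records.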