数据清洗 meaning in English
data cleaning
Examples
- 3. The concept of an equivalence matrix, which expresses the equivalence relation in a rough set information system, is introduced, and the relations between the equivalence matrix and equivalence classes are discussed. Algorithms for data cleaning and rule extraction in a knowledge system, based on matrix computation, are proposed, and their computational complexity is analyzed.
3、在等价矩阵概念的基础上,分析了粗糙集知识系统中等价划分与等价摘要矩阵的关系,采用等价矩阵来表示粗糙集的等价关系,提出了一种对数据库知识系统进行数据清洗以及从中提取决策规则的矩阵算法,分析了该算法的计算复杂性。
- To solve some existing problems in data mining, the thesis offers several solutions using this new mathematical tool. Information theory and multivariate statistics are introduced into rough analysis together with rough set theory and other techniques, and new results are given for knowledge discovery, association rule mining, pattern classification, data cleaning, etc. After a brief summary of data mining and rough set theory, the research work in the thesis can be described as follows: 1
Rough集理论是一种新型的处理不确定性知识的数学工具,围绕着数据挖掘领域存在的问题,本文利用rough集理论与rough分析工具,提出若干解决方案,同时在具体处理问题过程中引入了信息理论、因子分析等方法,与rough分析结合使用,讨论了rough集技术在知识发现、关联规则挖掘、模式分类以及数据清洗等问题中的应用。
- The thesis then analyses several core technologies, including the database system, the data warehouse, and data mining, and presents the functional framework of a bank CRM system. It places its emphasis on data preprocessing for the data warehouse, including data copying, data cleansing, data integration, and quality verification. Finally, the thesis discusses the key data warehouse technology in bank CRM, the cleansing of customer data, and presents cleansing methods aimed at noisy values, missing values, conflicting values, and duplicated values.
本文在充分分析银行crm的需求的基础上,提出了基于数据仓库的银行crm系统的体系结构,并进一步分析了该体系结构中客户数据库系统、数据仓库、数据挖掘等核心技术组件的内涵,给出了银行crm系统的功能构架;重点研究了银行业务系统多年积累的客户数据向数据仓库迁移的预处理方法和过程,其过程包括数据复制、数据清洗转换、数据集成、质量检验和数据装载;最后讨论了银行crm系统应用数据仓库的关键技术:客户数据清洗,给出了针对噪声数据、空缺数据、不一致数据和重复数据的清洗方法。
- Based on an analysis of current problems in data cleaning, and in particular after extensive research on detecting and eliminating approximately duplicated records, this paper puts forward an RDBMS-based record matching method and a method for eliminating approximately duplicated records, with the aim of eliminating such records from the data warehouse.
本文在对当前的数据清洗问题,特别是探测和消除重复记录方面,做了充分的研究后,提出了基于rdbms的记录匹配方法和消除数据仓库中相似重复记录的方法,以期消除数据仓库中的相似重复记录。
- This thesis consists of four parts in which the technologies of web usage mining are systematically studied. In the first part we summarize the techniques of data mining and web usage mining, and present the significance of research on web usage mining, the state of that research, and the problems web usage mining faces. In the second part we discuss web usage mining following the process of web mining: in the data preparation and preprocessing stage we discuss in detail the algorithms for data cleaning and for user and session identification; in the pattern discovery stage we present a data model of association rules and sequential patterns; and in the last stage we discuss useful methods of pattern analysis. A synthesis clustering algorithm, CPPC, is proposed in the third part of this thesis.
本文分主要从以下四个方面对web使用挖掘进行了系统的分析和研究。第一是对数据挖掘和web挖掘进行了概述,阐述了web挖掘的意义、研究的现状、面临的问题。第二是讨论了web使用挖掘的三个阶段:在数据准备和预处理阶段重点讨论了数据清洗及用户和会话识别算法;在模式发现阶段定义了关联规则和序列模式的数据模型;模式分析阶段则讨论了现行的几种分析方法。
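As a rough illustration of what the term covers, the sketch below handles two of the problem types named in the examples above: missing values and duplicated records. It is not the method of any of the quoted theses. The function name `clean_records`, the field names, and the sample rows are all hypothetical, and the choice to treat records as duplicates when they match after trimming and lowercasing is an assumption made only for this example.

```python
def clean_records(records, required=("name", "city")):
    """Return records that have all required fields, normalized and de-duplicated.

    Illustrative only: 'required' and the normalization rule are assumptions.
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Normalize string values: strip surrounding whitespace and lowercase.
        norm = {
            k: v.strip().lower() if isinstance(v, str) else v
            for k, v in rec.items()
        }
        # Missing values: skip records where a required field is None or empty.
        if any(norm.get(field) in (None, "") for field in required):
            continue
        # Duplicated records: keep only the first record for each normalized key.
        key = tuple(norm[field] for field in required)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(norm)
    return cleaned


# Hypothetical sample data.
rows = [
    {"name": "Alice ", "city": "Beijing"},
    {"name": "alice", "city": "beijing"},   # duplicate after normalization
    {"name": "Bob", "city": ""},            # missing city
    {"name": "Carol", "city": None},        # missing city
    {"name": "Dave", "city": "Shanghai"},
]
result = clean_records(rows)
print(result)
```

Here records with missing fields are simply dropped. A real cleaning pipeline might instead fill them in or flag them for review, and approximate duplicate detection would need a similarity measure rather than exact matching on normalized keys.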