A Chinese language model integrating bigram and topic dependency features
2.
In statistical attacks on ciphers, the known frequencies of the various bigrams in a given language can be used in attempts to break the cipher.
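As a rough illustration of the idea, the sketch below scores every Caesar-shift candidate of a ciphertext by the log-probabilities of its letter bigrams and keeps the best one. The Caesar cipher, the function names, the `floor` penalty for unseen bigrams, and the tiny reference text are all assumptions chosen for the example; a real attack would use bigram frequencies estimated from a large corpus.

```python
from collections import Counter
import math

def letter_bigrams(text):
    """Return adjacent letter pairs inside each word, lowercased."""
    pairs = []
    for word in text.lower().split():
        letters = [c for c in word if c.isalpha()]
        pairs.extend(a + b for a, b in zip(letters, letters[1:]))
    return pairs

def bigram_log_probs(reference):
    """Estimate log relative frequencies of bigrams from a reference text."""
    counts = Counter(letter_bigrams(reference))
    total = sum(counts.values())
    return {bg: math.log(n / total) for bg, n in counts.items()}

def score(text, log_probs, floor=-10.0):
    """Sum bigram log-probabilities; unseen bigrams get the (assumed) floor."""
    return sum(log_probs.get(bg, floor) for bg in letter_bigrams(text))

def caesar_shift(text, k):
    """Shift lowercase ASCII letters by k positions, leaving other characters."""
    out = []
    for c in text:
        if "a" <= c <= "z":
            out.append(chr((ord(c) - ord("a") + k) % 26 + ord("a")))
        else:
            out.append(c)
    return "".join(out)

def break_caesar(ciphertext, log_probs):
    """Try all 26 shifts and return the candidate with the best bigram score."""
    ciphertext = ciphertext.lower()
    candidates = [caesar_shift(ciphertext, k) for k in range(26)]
    return max(candidates, key=lambda t: score(t, log_probs))
```

With log-probabilities built from any reasonably sized English sample, `break_caesar(caesar_shift("the other thing", 3), log_probs)` should recover the plaintext, because only the correct shift produces mostly familiar bigrams.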
3.
Following the idea of EM, a language model is built incrementally by collecting the fractional counts of patterns (such as bigram pairs) from all the segmentation candidates of a sentence. All segmentation results of each sentence (or those within a certain range) form the training set, and a new language model is estimated from this training set and the initial language model.
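One way to picture a single EM iteration of this kind is sketched below: every segmentation of a sentence is weighted by its probability under the current bigram model, and each word bigram receives that weight as a fractional count. The exhaustive enumeration, the `max_len` limit, the `floor` probability for unseen bigrams, and the `<s>` start symbol are assumptions made for the example, not details given in the sentence above.

```python
from collections import defaultdict

def segmentations(chars, max_len=4):
    """Yield every split of chars into words of at most max_len characters."""
    if not chars:
        yield []
        return
    for i in range(1, min(max_len, len(chars)) + 1):
        for rest in segmentations(chars[i:], max_len):
            yield [chars[:i]] + rest

def seg_prob(words, bigram_prob, floor=1e-6):
    """Probability of one segmentation under the current bigram model."""
    p, prev = 1.0, "<s>"
    for w in words:
        p *= bigram_prob.get((prev, w), floor)  # floor is an assumed smoothing value
        prev = w
    return p

def fractional_bigram_counts(sentences, bigram_prob, max_len=4):
    """E-step: add each segmentation's posterior weight to its bigram counts."""
    counts = defaultdict(float)
    for s in sentences:
        cands = list(segmentations(s, max_len))
        probs = [seg_prob(c, bigram_prob) for c in cands]
        z = sum(probs)
        for c, p in zip(cands, probs):
            prev = "<s>"
            for w in c:
                counts[(prev, w)] += p / z
                prev = w
    return counts

def reestimate(counts):
    """M-step: normalise fractional counts into conditional probabilities."""
    totals = defaultdict(float)
    for (prev, _), c in counts.items():
        totals[prev] += c
    return {(prev, w): c / totals[prev] for (prev, w), c in counts.items()}
```

Calling `reestimate(fractional_bigram_counts(sentences, model))` repeatedly, feeding each result back in as `model`, gives the iterative refinement; enumerating all segmentations is exponential in sentence length, so a practical version would restrict the candidates to a limited range.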
5.
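14 Tan C M, Wang Y F, Lee C D. The use of bigrams to enhance text categorization. Journal of Information Processing and Management, July 2002, 38: 529-546. 15 Ruiz M, Srinivasan P. Neural networks for text categorization. Using the proposed weighting method, each test document and the frequent feature itemsets in each topic template are scored; based on these weights, the similarity between the test document and the topic template is computed in the vector space model, and the text is classified accordingly.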
6.
Based on the distribution of single-character words in correctly segmented Chinese text and on the concept of the "non-multi-character word error", we propose a set of rules for finding errors in text. These rules are combined with character bigram and trigram models built on the scattered single-character strings left after segmentation, and with part-of-speech bigram and trigram models, to construct an automatic error-detection model and implement its algorithm.
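The character-bigram part of such a detector might look like the following minimal sketch, which finds runs of consecutive single-character words in a segmented sentence and flags adjacent character pairs that are rare in a reference bigram table. The function names, the `threshold`, the `min_run` length, and the toy counts are assumptions; the rules, trigram models, and part-of-speech models mentioned above are not covered.

```python
def single_char_runs(words, min_run=2):
    """Return (start, end) spans of consecutive single-character words."""
    runs, start = [], None
    for i, w in enumerate(list(words) + [""]):  # empty sentinel closes a trailing run
        if len(w) == 1:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    return runs

def suspicious_pairs(words, bigram_counts, threshold=1):
    """Flag adjacent characters in a run whose bigram count is below threshold."""
    flagged = []
    for start, end in single_char_runs(words):
        for i in range(start, end - 1):
            pair = words[i] + words[i + 1]
            if bigram_counts.get(pair, 0) < threshold:
                flagged.append((i, pair))
    return flagged

# Toy example: "学习" mistyped as "学刁" is split into two single characters.
words = ["我们", "学", "刁", "语言"]
bigram_counts = {"学习": 50}  # hypothetical counts from a clean corpus
flagged = suspicious_pairs(words, bigram_counts)
```

Here `flagged` holds the position and text of the one rare pair in the scattered run, which a fuller system would then pass to its other rules and models before reporting an error.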