Tuesday, January 30, 2007

Talking to myself while grinding through papers

CHARM: closed frequent itemsets
PrefixSpan: frequent sequences
CloSpan: closed frequent sequences
BIDE: closed frequent sequences, with a more efficient pruning method

Mining frequent sequences vs. mining frequent itemsets
Frequent sequence mining has to take the ordering of items into account, so it is more complicated than frequent itemset mining: {a, b} and {b, a} are the same itemset, but <a, b> and <b, a> are two different sequences.
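To make the difference concrete, here is a small Python sketch that counts support both ways on a toy database. The names `is_subsequence`, `itemset_support`, `sequence_support` and the database `db` are made up for illustration, and each sequence is simplified to a string of single items.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` consumes the iterator up to the match, which enforces the ordering.
    return all(item in it for item in pattern)


def itemset_support(itemset, transactions):
    """Number of transactions containing every item, in any order."""
    return sum(1 for t in transactions if set(itemset) <= set(t))


def sequence_support(pattern, sequences):
    """Number of sequences containing the pattern as an ordered subsequence."""
    return sum(1 for s in sequences if is_subsequence(pattern, s))


# Toy database (hypothetical): each string is one sequence of single items.
db = ["abc", "acb", "bca"]

print(itemset_support("ab", db))   # 3: every record contains both a and b
print(sequence_support("ab", db))  # 2: in "bca" the b comes before the a
print(sequence_support("ba", db))  # 1: only "bca" has b followed by a
```

The same two items give one itemset but two distinct sequential patterns with different supports, which is why the sequence search space is so much larger.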

PrefixSpan vs. CloSpan
Both use a similar data structure, a prefix search tree, and grow frequent sequences along that tree.
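A rough sketch of how growing patterns along the prefix tree might look, assuming the simplified case where every element of a sequence is a single item (the real algorithms also handle itemsets inside a sequence). The function name `prefixspan` and the toy database are my own, not taken from the papers.

```python
def prefixspan(db, min_support):
    """Return {pattern tuple: support} for sequences made of single items."""
    results = {}

    def grow(prefix, projected):
        # Count each item at most once per projected suffix.
        counts = {}
        for suffix in projected:
            for item in set(suffix):
                counts[item] = counts.get(item, 0) + 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Projected database of the new prefix: whatever follows the
            # first occurrence of `item` in each suffix that contains it.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            grow(pattern, new_projected)  # descend one level in the prefix tree

    grow((), [list(s) for s in db])
    return results


# Toy database (hypothetical).
db = ["abc", "abc", "ab"]
patterns = prefixspan(db, min_support=2)
for pattern, support in sorted(patterns.items()):
    print("".join(pattern), support)
```

Each recursive call corresponds to one node of the prefix tree, and its `projected` list is the projected database for that node.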
BUT! CloSpan mines only "closed" sequences, i.e. sequences that have no super-sequence with the same support. Thanks to this property, CloSpan can prune the search tree more efficiently than PrefixSpan. (E.g. if s1 ⊆ s2 and their projected databases have the same size, then the subtrees under s1 and s2 are exactly the same, so only one of them needs to be explored.)
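The projected-database test in the parenthesis can be sketched like this. It is only an illustration of the condition, not CloSpan itself: `project`, `projected_size` and `same_subtree` are hypothetical helpers, sequences are strings of single items, and "size" is taken to mean the total number of items left in the projected database.

```python
def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def project(db, pattern):
    """Suffix after the earliest match of pattern, for each sequence containing it."""
    projected = []
    for seq in db:
        pos = 0
        for item in pattern:
            try:
                pos = seq.index(item, pos) + 1
            except ValueError:
                break  # pattern does not occur in this sequence
        else:
            projected.append(seq[pos:])
    return projected


def projected_size(db, pattern):
    """Total number of items remaining in the projected database."""
    return sum(len(suffix) for suffix in project(db, pattern))


def same_subtree(db, s1, s2):
    """The pruning condition from the note: s1 inside s2 and equal projected size."""
    return is_subsequence(s1, s2) and projected_size(db, s1) == projected_size(db, s2)


# Toy database (hypothetical).
db = ["abc", "abc", "ab"]
print(same_subtree(db, "b", "ab"))  # True: both project to ["c", "c", ""]
print(same_subtree(db, "a", "ab"))  # False: projecting on "a" leaves more items
```

When the check succeeds, everything that can be grown under "b" can equally be grown under "ab" with the same support, so the subtree under "b" does not need a second traversal.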

CloSpan vs. BIDE
Both use a prefix search tree. But BIDE needs no candidate maintenance and filtering step. It uses forward-extension event checking and backward-extension event checking to test whether a sequence is closed, and it uses BackScan, a more aggressive pruning method than CloSpan's, to prune the search tree.
Why do we say BackScan is more aggressive? Because it does not need to examine previously mined sequences. Instead, it examines only the current projected database.
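A brute-force sketch of the bi-directional idea, assuming single-item elements again. It is not BIDE's actual scan (the real algorithm looks at specific regions of each sequence instead of recounting support); it only shows that closedness can be decided from the database alone, by trying one extra item at the end (forward extension) or anywhere earlier (backward extension). The names `support` and `is_closed` are made up.

```python
def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, db):
    return sum(1 for seq in db if is_subsequence(pattern, seq))


def is_closed(pattern, db):
    """Closed = no one-item extension in either direction keeps the same support."""
    pattern = list(pattern)
    base = support(pattern, db)
    alphabet = {item for seq in db for item in seq}
    # pos == len(pattern) appends at the end (forward extension);
    # every smaller pos inserts before or between items (backward extension).
    for pos in range(len(pattern) + 1):
        for item in alphabet:
            extended = pattern[:pos] + [item] + pattern[pos:]
            if support(extended, db) == base:
                return False  # a super-sequence with the same support exists
    return True


# Toy database (hypothetical).
db = ["abc", "abc", "ab"]
print(is_closed("ab", db))   # True: support 3, every extension has lower support
print(is_closed("b", db))    # False: "ab" also has support 3 (backward extension)
print(is_closed("ac", db))   # False: "abc" also has support 2 (backward extension)
print(is_closed("abc", db))  # True
```

Checking one-item extensions is enough because support can only drop as a pattern grows: if some longer super-sequence keeps the same support, so does every pattern in between. Note that no list of previously found closed patterns is consulted anywhere.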
