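    邵俊明 (Junming Shao)

    • Professor, Doctoral Supervisor
    • Gender: Male
    • Alma mater: University of Munich
    • Education: Doctoral graduate
    • Degree: Doctor of Science
    • Employment status: Active
    • Affiliation: School of Computer Science and Engineering (School of Cyberspace Security)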

    Feedback of SyncTree

      
    Published: 2017-07-21

    Feedback (KDD’2015, paper ID: 268)

    Title: Synchronization-based Clustering on Evolving Data Streams

    First, we thank the reviewers for their positive comments and constructive suggestions. The main criticism was that we could have benchmarked our proposed algorithm (SyncTree) more thoroughly. We therefore provide supplementary material here in support of our response to the reviews. Thank you in advance for taking the time to look it over.

    Below, we report two experiments that address the reviewers' questions.

    Exp. 1. The ability to handle concept drift.

    Experimental setup: We generate a simple synthetic data set in which the first 1000 points form a single Gaussian cluster (Fig. 1a) and the following 200 points form three newly emerging clusters with different shapes (Fig. 1b, T2). We check whether each algorithm can handle the emerging concepts. Fig. 1c-1f plot the micro-clusters stored for the last 200 points. The CF-based algorithms have difficulty handling the evolving clusters, as most instances are wrongly grouped into micro-clusters. In contrast, SyncTree appears to handle them well.

    Fig. 1. (a) DS (T1); (b) DS (T2); (c) CluStream (micro-clusters); (d) DenStream (micro-clusters); (e) SyncTree (micro-clusters); (f) ClusTree (micro-clusters).
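
    A data stream of the kind used in Exp. 1 could be generated roughly as follows. This is only a minimal sketch under stated assumptions: the feedback gives the point counts (1000, then 200) and says the three new clusters have different shapes, but it does not give the cluster centres, spreads, or actual shapes. The function name `make_drift_stream`, the three shapes chosen here (elongated Gaussian, ring, uniform square), and all numeric parameters are hypothetical and are not taken from the SyncTree code.

```python
import math
import random


def make_drift_stream(seed=0):
    """Return (points, labels) for a toy 2-D stream with concept drift.

    T1: 1000 points from one Gaussian cluster (label 0).
    T2: 200 points from three new clusters with different shapes (labels 1-3).
    All centres, spreads and shapes are illustrative assumptions.
    """
    rng = random.Random(seed)
    points, labels = [], []

    # T1: a single isotropic Gaussian cluster around the origin (assumed).
    for _ in range(1000):
        points.append((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)))
        labels.append(0)

    # T2: three emerging clusters, 67 + 67 + 66 = 200 points.
    t2 = []
    for _ in range(67):  # elongated (anisotropic) Gaussian
        t2.append(((rng.gauss(6.0, 1.5), rng.gauss(6.0, 0.2)), 1))
    for _ in range(67):  # thin ring
        angle = rng.uniform(0.0, 2.0 * math.pi)
        radius = rng.gauss(1.5, 0.05)
        x = -6.0 + radius * math.cos(angle)
        y = 6.0 + radius * math.sin(angle)
        t2.append(((x, y), 2))
    for _ in range(66):  # uniform square
        t2.append(((rng.uniform(5.0, 7.0), rng.uniform(-7.0, -5.0)), 3))

    # Interleave the new clusters, as they would arrive mixed in a stream.
    rng.shuffle(t2)
    for point, label in t2:
        points.append(point)
        labels.append(label)
    return points, labels


if __name__ == "__main__":
    pts, labs = make_drift_stream(seed=1)
    print(len(pts), "points;", len(set(labs)), "distinct concepts")
```

    Feeding the points to a stream clusterer in order, and then inspecting the micro-clusters kept for the last 200 points, would mirror the comparison shown in Fig. 1c-1f.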

    Exp. 2. Memory costs of SyncTree and the compared algorithms.


    Dataset        SyncTree    ClusTree    CluStream   DenStream
    Spam           70.897m     70.983m     398.820m    185.303m
    Electricity    34.696m     38.563m     33.423m     65.571m
    NWeather       89.889m     33.423m     34.708m     28.283m
    Covtype        57.484m     537.556m    455.347m    136.142m
    Sensor         265.824m    511.837m    214.144m    67.619m

    Source code of SyncTree and the compared algorithms: Code