datasets

data1 data2 data3

[data1] Using mobilegt, we collected the traffic generated by mobile apps on monitored smartphones. The collected mobile traffic data (MTD) include active MTD (AMTD) and passive MTD (PMTD).

The shared data have the following characteristics. 1) The traffic is grouped into flows, and commonly used flow statistical features are extracted from the raw traffic, so the data can be used directly to evaluate machine learning techniques for classifying traffic from multiple apps. 2) Passive traffic (background traffic) was collected separately from active traffic (foreground traffic) to support research on passive versus active traffic classification.

Data download URL (hosted on GitHub)

Data name      Data description
activedata     biFeatureData.arff (10.8 MB) & uniFeatureData.arff (19 MB)
passsivedata   bipassivedata.arff (887 KB) & unipassivedata.arff (1.8 MB)
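
Because the flow features are shipped as ARFF files, they can be read without Weka. The following is a minimal sketch of a reader, assuming the files use the common dense ARFF layout (`@attribute` declarations followed by comma-separated `@data` rows). The sample text and its attribute names (`pkt_count`, `mean_pkt_size`) are hypothetical and are not taken from the shared files.

```python
import csv
import io


def parse_arff(text):
    """Parse a dense ARFF document into (attribute_names, rows).

    Handles only the common subset: % comments, @relation, @attribute,
    and comma-separated @data rows. Sparse rows and quoted attribute
    names containing spaces are not supported in this sketch.
    """
    attributes = []
    data_lines = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            data_lines.append(line)
            continue
        lowered = line.lower()
        if lowered.startswith("@attribute"):
            # "@attribute <name> <type>": keep only the name
            attributes.append(line.split(None, 2)[1])
        elif lowered.startswith("@data"):
            in_data = True
    rows = []
    for fields in csv.reader(io.StringIO("\n".join(data_lines))):
        row = {}
        for name, value in zip(attributes, fields):
            value = value.strip()
            try:
                row[name] = float(value)
            except ValueError:
                row[name] = value  # nominal value (e.g. class label) or "?"
        rows.append(row)
    return attributes, rows


# Hypothetical sample; the attribute names in the real files may differ.
SAMPLE = """\
% toy example, not taken from the shared files
@relation flows
@attribute pkt_count numeric
@attribute mean_pkt_size numeric
@attribute class {QQ,WeChat}
@data
12,340.5,QQ
7,128.0,WeChat
"""

attributes, rows = parse_arff(SAMPLE)
print(attributes)
print(rows[0])
```

To read one of the downloaded files, pass its contents to the same function, for example `parse_arff(open("biFeatureData.arff").read())`. If SciPy is installed, `scipy.io.arff.loadarff` is a fuller alternative to this hand-written reader.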

Data download URL (hosted on Baidu)

Data name      Extraction code   Data description
activedata     4rvh              biFeatureData.arff (10.8 MB) & uniFeatureData.arff (19 MB)
passsivedata   rrfq              bipassivedata.arff (887 KB) & unipassivedata.arff (1.8 MB)

Table 1: Distribution of mobile traffic data
Apps Flows Packets Bytes
QQ 17,104 3,213k 2.30GB
WeChat 13,631 807k 0.57GB
Facebook 1,400 440k 0.29GB
Weibo 25,407 6,086k 4.75GB
Youku 5,825 1,544k 1.09GB
TencentVideo 1,593 382k 0.29GB
MgTV 14,046 5,021k 3.57GB
Browser 25,512 2,068k 1.30GB
JdShop 4,008 323k 0.20GB
VipShop 4,577 503k 0.28GB
QQMail 1,432 25k 0.01GB
YahooMail 3,485 143k 0.07GB
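
The flow counts in Table 1 are uneven across apps, which matters when the data are used to train a classifier. As a small worked example, the sketch below copies the Flows column from Table 1 and computes each app's share of the total. The function name `flow_shares` is only illustrative.

```python
# Flow counts per app, copied from Table 1.
FLOWS = {
    "QQ": 17104, "WeChat": 13631, "Facebook": 1400, "Weibo": 25407,
    "Youku": 5825, "TencentVideo": 1593, "MgTV": 14046, "Browser": 25512,
    "JdShop": 4008, "VipShop": 4577, "QQMail": 1432, "YahooMail": 3485,
}


def flow_shares(flows):
    """Return each app's fraction of the total flow count."""
    total = sum(flows.values())
    return {app: count / total for app, count in flows.items()}


total_flows = sum(FLOWS.values())
shares = flow_shares(FLOWS)

# Print apps from the largest to the smallest share of flows.
for app, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{app:<13}{FLOWS[app]:>7}{share:>8.1%}")
print("total flows:", total_flows)
```

The table sums to 118,020 flows. Browser and Weibo each contribute roughly a fifth of them, while Facebook and QQMail each contribute a little over 1%, so per-class metrics are likely more informative than overall accuracy on this data.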

[data2] Mobile network traffic engineering requires benchmark traffic data. We share a set of labeled mobile traffic data collected with mobilegt. The traffic was generated by volunteers' mobile devices running mobilegt. Each flow can be labeled with 100% accuracy using the TCP/UDP session information that mobilegt collects for it. On this traffic, we found that 1) most traffic uses HTTP as its application protocol, and such traffic is generally labeled as "web" by traditional traffic labeling methods; 2) social and web traffic accounts for a large share of the flows, reflecting how heavily social and web apps are used; 3) among the several machine learning techniques we applied to the data, random forest performed best.
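
The idea of labeling by session information can be illustrated with a short sketch: if the device reports which app owns each TCP/UDP socket, a captured flow can be labeled by looking up its 5-tuple. The record layouts (`SocketRecord`, `Flow`), the function names, and the addresses below are hypothetical. This is not mobilegt's actual log format or implementation.

```python
from collections import namedtuple

# Hypothetical record layouts; mobilegt's real log format is not shown here.
SocketRecord = namedtuple(
    "SocketRecord", "proto local_ip local_port remote_ip remote_port app"
)
Flow = namedtuple("Flow", "proto src_ip src_port dst_ip dst_port")


def build_session_index(records):
    """Map each (proto, local endpoint, remote endpoint) session to its app."""
    return {
        (r.proto, r.local_ip, r.local_port, r.remote_ip, r.remote_port): r.app
        for r in records
    }


def label_flow(flow, index):
    """Return the app owning a flow, trying both directions; None if unknown."""
    outbound = (flow.proto, flow.src_ip, flow.src_port, flow.dst_ip, flow.dst_port)
    inbound = (flow.proto, flow.dst_ip, flow.dst_port, flow.src_ip, flow.src_port)
    return index.get(outbound) or index.get(inbound)


# Toy session records, as a device-side agent might report them.
records = [
    SocketRecord("tcp", "10.0.0.2", 40001, "203.0.113.10", 443, "WeChat"),
    SocketRecord("udp", "10.0.0.2", 40002, "203.0.113.20", 8000, "QQ"),
]
index = build_session_index(records)

uplink = Flow("tcp", "10.0.0.2", 40001, "203.0.113.10", 443)
downlink = Flow("udp", "203.0.113.20", 8000, "10.0.0.2", 40002)
unknown = Flow("tcp", "10.0.0.2", 50000, "203.0.113.30", 80)

for flow in (uplink, downlink, unknown):
    print(flow, "->", label_flow(flow, index))
```

Because the label comes from an exact match on the session rather than from payload or port heuristics, HTTP flows from different apps stay distinguishable instead of all being labeled as web.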

Data download URL (hosted on Baidu)

Data name     Extraction code   Data description
day01-day05   bg76              day01-day05.zip
day06-day10   wai6              day06-day10.zip
day11-day15   aqji              day11-day15.zip
day16-day20   pkuq              day16-day20.zip
day21-day25   sb2c              day21-day25.zip
day26-day30   2f6w              day26-day30.zip

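[data3] App-level and function-level traffic data. Two different feature sets (unidirectional flow statistical features and bidirectional flow statistical features) are used to describe the network flow datasets.

For the App-level granularity, the data were collected as follows. The Mobilegt client program was installed on the client device and started so that it connected to the server. While it was connected, each App was run for more than 20 minutes, and multiple functions of the App were used freely during collection. The Mobilegt server labels the data according to socket information, so the App label of each network flow can be taken directly from Mobilegt's output to build the experimental dataset.

For the function-level granularity, note that a single application may offer several functions. WeChat, for example, supports text chat, video chat, and voice calls. The data generated by different functions may exhibit different behavioral characteristics, which may weaken the performance of machine learning algorithms on such data. To compare the classification performance of machine learning algorithms on function-level traffic, we additionally collected traffic labeled at function granularity. Each function can be implemented by different Apps, so for each function we collected traffic generated by different Apps. The collection procedure was as follows: using the Mobilegt system, with the client program connected to the server, a specific function of an App was exercised for a period of time (more than 20 minutes), and the data collected during that period were manually labeled with that function. Because manual labeling is required, relatively few applications were selected. Automatically collecting function-level mobile traffic data is left as future work.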
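
The difference between the two feature sets comes down to how packets are grouped into flows. A minimal sketch, assuming the usual 5-tuple definition of a flow: a unidirectional flow keeps the two directions of a connection apart, while a bidirectional flow merges them. The `Packet` layout and function names are hypothetical, and the two statistics computed here (packet count and byte count) only stand in for the richer feature sets in the ARFF files.

```python
from collections import defaultdict, namedtuple

# Hypothetical packet record; real captures carry many more fields.
Packet = namedtuple("Packet", "src_ip src_port dst_ip dst_port proto size")


def uni_key(p):
    """Unidirectional flow key: direction matters."""
    return (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.proto)


def bi_key(p):
    """Bidirectional flow key: both directions map to the same key."""
    a = (p.src_ip, p.src_port)
    b = (p.dst_ip, p.dst_port)
    return (min(a, b), max(a, b), p.proto)


def flow_stats(packets, key_fn):
    """Aggregate packet and byte counts per flow under the given key."""
    stats = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        entry = stats[key_fn(p)]
        entry["packets"] += 1
        entry["bytes"] += p.size
    return dict(stats)


# One toy TCP connection: request, response, acknowledgement.
packets = [
    Packet("10.0.0.2", 40001, "203.0.113.10", 443, "tcp", 100),
    Packet("203.0.113.10", 443, "10.0.0.2", 40001, "tcp", 1500),
    Packet("10.0.0.2", 40001, "203.0.113.10", 443, "tcp", 60),
]

uni = flow_stats(packets, uni_key)
bi = flow_stats(packets, bi_key)
print(len(uni), "unidirectional flows")
print(len(bi), "bidirectional flow(s)")
```

In this toy capture the single connection yields two unidirectional flows but one bidirectional flow, which is consistent with the unidirectional feature files being larger than their bidirectional counterparts.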

Data download URL (hosted on GitHub)

Data name   Data description
app         App-BiFlowFeatures.arff (2.6 MB) & App-UniFlowFeatures.arff (4.6 MB)
behavior    behavior-BiFlowFeatures.arff (445 KB) & behavior-UniFlowFeatures.arff (806 KB)

Papers:
Zhen Liu, Ruoyu Wang, Deyu Tang, Peng Yang. A System for Linking Ground Truth to Mobile Network Traffic. MOBIQUITOUS 2016.
Zhen Liu, Ruoyu Wang, Deyu Tang. Extending labeled mobile network traffic data by three levels traffic identification fusion. Future Generation Computer Systems, 2018.
Zhen Liu, Ruoyu Wang, Yongming Cai, Deyu Tang, Jin Yang, Zhao Yang. Benchmark Data for Mobile App Traffic Research. MOBIQUITOUS 2018.