datasets
data1 data2 data3
[data1] Using mobilegt, we collected the traffic generated by mobile apps on monitored smartphones. The collected mobile traffic data (MTD) comprise active MTD (AMTD) and passive MTD (PMTD).
- AMTD
The AMTD were collected from October 2016 to March 2017 by deploying mgtClient on 10 volunteers' smartphones and mgtServer on a remote server. Volunteers launched mgtClient whenever they were willing to share their data. During data collection, they used popular apps such as WeChat and Weibo as usual. In total, the dataset consists of 4050 traffic traces, each covering about 16 minutes on average.
- PMTD
The PMTD were collected from May 7 to May 24, 2018 by deploying mgtClient on a smartphone. We installed the apps shown in Table 1 on the smartphone and launched them. We then started mgtClient to collect the session information and mgtServer to collect the raw traffic data. After that, we left the smartphone unused. The apps ran in the background, which allowed us to collect passive traffic data. In total, the dataset consists of 677 traffic traces, each covering about 27 minutes on average, and yields 9617 passive flows. These flows can be combined with the active flows above to evaluate how well methods identify passive traffic.
The shared data have the following characteristics. 1) The traffic is grouped into flows, and popular statistical flow features are extracted from the raw traffic, so the data can be used directly to evaluate machine learning techniques for classifying multiple apps. 2) Passive (background) traffic data were collected separately to support research on classifying passive versus active (foreground) traffic.
Data download URL (hosted on GitHub)
data name | data description |
activedata | biFeatureData.arff (10.8MB) & uniFeatureData.arff (19MB) |
passsivedata | bipassivedata.arff (887KB) & unipassivedata.arff (1.8MB) |
Data download URL (hosted on Baidu)
data name | password (extraction code) | data description |
activedata | 4rvh | biFeatureData.arff (10.8MB) & uniFeatureData.arff (19MB) |
passsivedata | rrfq | bipassivedata.arff (887KB) & unipassivedata.arff (1.8MB) |
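The `.arff` files listed above use the Weka ARFF text format. As a minimal sketch of loading them with only the Python standard library, the reader below handles simple dense ARFF files. It assumes that attribute names contain no spaces, that values contain no quoted commas, that all features are numeric, and that the last attribute is the class label. None of this is stated in the dataset description, so check it against the file header first. `read_arff` and `split_features_labels` are hypothetical helper names, and the sample data is made up for illustration.

```python
import io


def read_arff(stream):
    """Parse a simple dense ARFF stream into (attribute_names, rows of strings)."""
    names, rows, in_data = [], [], False
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                # "@attribute <name> <type>": keep only the name
                names.append(line.split(None, 2)[1].strip("'\""))
            elif lower.startswith("@data"):
                in_data = True
            continue
        rows.append([value.strip().strip("'\"") for value in line.split(",")])
    return names, rows


def split_features_labels(rows):
    """Assumption: the last column is the class label, the rest are numeric."""
    features, labels = [], []
    for row in rows:
        # "?" marks a missing value in ARFF; map it to NaN
        features.append([float("nan") if v == "?" else float(v) for v in row[:-1]])
        labels.append(row[-1])
    return features, labels


# Tiny made-up sample in ARFF syntax (not taken from the real files)
SAMPLE = """\
@relation demo
@attribute duration numeric
@attribute pkt_count numeric
@attribute class {WeChat,Weibo}
@data
1.5,10,WeChat
0.2,3,Weibo
"""

names, rows = read_arff(io.StringIO(SAMPLE))
X, y = split_features_labels(rows)
print(names, X, y)
```

For a downloaded file, the same functions can be called on an open file handle, for example `with open("biFeatureData.arff") as f: names, rows = read_arff(f)`. A full-featured parser such as `scipy.io.arff.loadarff` is a more robust choice when SciPy is available.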
Table 1: Distribution of mobile traffic data
Apps | Flows | Packets | Bytes |
QQ | 17,104 | 3,213k | 2.30GB |
WeChat | 13,631 | 807k | 0.57GB |
Facebook | 1,400 | 440k | 0.29GB |
Weibo | 25,407 | 6,086k | 4.75GB |
Youku | 5,825 | 1,544k | 1.09GB |
TencentVideo | 1,593 | 382k | 0.29GB |
MgTV | 14,046 | 5,021k | 3.57GB |
Browser | 25,512 | 2,068k | 1.30GB |
JdShop | 4,008 | 323k | 0.20GB |
VipShop | 4,577 | 503k | 0.28GB |
QQMail | 1,432 | 25k | 0.01GB |
YahooMail | 3,485 | 143k | 0.07GB |
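To compare apps in Table 1 on a common scale, the following sketch derives each app's share of all flows and its average number of packets per flow. The flow and packet figures are copied from the table; because packet counts are rounded to thousands there, the per-flow averages are approximate. `summarize` is a hypothetical helper name.

```python
# (app, flows, packets in thousands) copied from Table 1
TABLE_1 = [
    ("QQ", 17104, 3213),
    ("WeChat", 13631, 807),
    ("Facebook", 1400, 440),
    ("Weibo", 25407, 6086),
    ("Youku", 5825, 1544),
    ("TencentVideo", 1593, 382),
    ("MgTV", 14046, 5021),
    ("Browser", 25512, 2068),
    ("JdShop", 4008, 323),
    ("VipShop", 4577, 503),
    ("QQMail", 1432, 25),
    ("YahooMail", 3485, 143),
]


def summarize(table):
    """Return per-app flow share and approximate packets per flow."""
    total = sum(flows for _, flows, _ in table)
    return {
        app: {
            "flow_share": flows / total,
            "packets_per_flow": packets_k * 1000 / flows,
        }
        for app, flows, packets_k in table
    }


total_flows = sum(flows for _, flows, _ in TABLE_1)
summary = summarize(TABLE_1)

# Print apps ordered by their share of flows, largest first
for app, stats in sorted(summary.items(), key=lambda kv: -kv[1]["flow_share"]):
    print(f"{app:13s} {stats['flow_share']:6.1%} {stats['packets_per_flow']:8.1f}")
```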
[data2] Mobile network traffic engineering requires benchmark traffic data. We share a set of labeled mobile traffic data collected with our mobilegt system. The traffic was generated by volunteers' mobile devices running mobilegt. It can be labeled with 100% accuracy based on the TCP/UDP session information that mobilegt collects for each traffic flow. From this traffic, we found that 1) most traffic uses HTTP as its application protocol, and such traffic is generally labeled as web by traditional traffic labeling methods; 2) social and web traffic account for a large share of flows, reflecting the popularity of social and web apps; 3) among the several machine learning techniques we applied to the data, random forest performed best.
Data download URL (hosted on Baidu)
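The idea of labeling flows from TCP/UDP session information can be sketched as a lookup keyed on the flow 5-tuple. The record layout used by mobilegt is not described here, so `FlowKey`, `label_flows`, and the sample addresses are hypothetical; the sketch only assumes that each session record ties a 5-tuple to the app that owns the socket.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical 5-tuple identifying one direction of a flow."""
    proto: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def reverse(key):
    """Return the same flow seen from the opposite direction."""
    return FlowKey(key.proto, key.dst_ip, key.dst_port, key.src_ip, key.src_port)


def label_flows(flows, sessions):
    """Map each captured flow to the app recorded for its session, or None."""
    labels = {}
    for key in flows:
        # Try the flow as captured, then its reverse direction
        labels[key] = sessions.get(key, sessions.get(reverse(key)))
    return labels


# Session info as reported by the client (made-up example values)
client_side = FlowKey("tcp", "10.0.0.2", 40001, "203.0.113.5", 443)
sessions = {client_side: "WeChat"}

# Flows seen in the raw traffic on the server side
unknown = FlowKey("udp", "10.0.0.2", 5353, "198.51.100.7", 53)
flows = [client_side, reverse(client_side), unknown]

labels = label_flows(flows, sessions)
for key, app in labels.items():
    print(key, "->", app)
```

Because the label comes from the socket owner rather than from payload inspection or port heuristics, flows without a matching session record stay unlabeled instead of being guessed.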
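[data3] App-level and function-level traffic data. Two different feature sets (unidirectional flow statistical features and bidirectional flow statistical features) are used to describe the network flow datasets.
At the app level, the data were collected as follows. The Mobilegt client program was installed on the client device and started so that it connected to the server. While connected, each app was run for more than 20 minutes, and various functions of the app were used freely during collection. The Mobilegt server labels the data based on socket information, so the app label of each network flow can be obtained directly from Mobilegt's output, which yields the experimental dataset.
At the function level, a single app may offer different functions; WeChat, for example, supports text chat, video chat, and voice calls. The data generated by different functions may carry different behavioral characteristics, which may weaken the performance of machine learning algorithms on such data. To compare the classification performance of machine learning algorithms on function-level traffic data, we further collected traffic data labeled at function granularity. Each function can be implemented by different apps, so for each function we collected traffic generated by different apps. The collection procedure was as follows: using the Mobilegt system, with the client program connected to the server, a specific function of an app was used for a period of time (more than 20 minutes), and the data collected during that period were manually labeled with that function. Because manual labeling is required, relatively few apps were selected. How to collect function-level mobile traffic data automatically is left for future work.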
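The manual function-level labeling described above amounts to assigning each flow the function that was in use during its collection window. A minimal sketch follows, assuming each flow has a start timestamp and that the recording windows do not overlap. The window boundaries, label names, and `label_by_window` are hypothetical and not part of Mobilegt.

```python
from bisect import bisect_right


def label_by_window(flow_start_times, windows):
    """Label each flow with the function whose window contains its start time.

    windows: iterable of (start_s, end_s, label) with end_s exclusive.
    Flows that fall outside every window are labeled None.
    """
    ordered = sorted(windows)
    starts = [w[0] for w in ordered]
    labels = []
    for t in flow_start_times:
        i = bisect_right(starts, t) - 1  # last window starting at or before t
        if i >= 0 and t < ordered[i][1]:
            labels.append(ordered[i][2])
        else:
            labels.append(None)
    return labels


# Made-up recording sessions: 20 minutes per function, with a gap in between
windows = [
    (0, 1200, "text_chat"),
    (1500, 2700, "video_call"),
]
flow_start_times = [30, 1300, 1600, 2700]

function_labels = label_by_window(flow_start_times, windows)
print(function_labels)
```

Leaving a gap between windows and discarding flows labeled `None` is one way to avoid mislabeling traffic that lingers from the previous function.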
Data download URL (hosted on GitHub)
data name | data description |
app | App-BiFlowFeatures.arff (2.6MB) & App-UniFlowFeatures.arff (4.6MB) |
behavior | behavior-BiFlowFeatures.arff (445KB) & behavior-UniFlowFeatures.arff (806KB) |