Title:

A comprehensive dataset of website traffic

Document type:
Research data
Publication date:
19.09.2023
Responsible:
Patrick Krämer
Authors:
Krämer, Patrick; Baier, Benedikt; Landerer, Niklas; Griessel, Alexander; Hohlfeld, Oliver; Blenk, Andreas; Mieth, Martin; Kellerer, Wolfgang
Author affiliation:
Technical University of Munich: Krämer, Patrick; Baier, Benedikt; Landerer, Niklas; Griessel, Alexander; Kellerer, Wolfgang
University of Kassel: Hohlfeld, Oliver
Ipoque GmbH - A Rohde&Schwarz Company: Mieth, Martin
Siemens AG: Blenk, Andreas
Publisher:
TUM
Identifier:
doi:10.14459/2023mp1700647
End date of data production:
24.06.2022
Subject area:
DAT Data processing, computer science; KOM Communications; NAT Natural sciences (general)
Resource type:
Experiments and observations
Data type:
Databases
Other data type:
PCAP (Packet CAPture files). Note: The dataset consists primarily of PCAP files. The files are organized via an SQL database. The dataset will contain both the PCAPs and the database, as well as example code showing how to use the database.
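For readers who want to inspect a capture without extra tooling, the following is a minimal sketch of a PCAP reader built on the Python standard library. It assumes the files use the classic libpcap format with microsecond timestamps; the record does not state the exact variant, and the function name `iter_pcap_records` is hypothetical and not part of the dataset's own example code.

```python
import struct

# Magic number of the classic pcap format with microsecond timestamps.
PCAP_MAGIC = 0xA1B2C3D4
PCAP_MAGIC_SWAPPED = 0xD4C3B2A1


def iter_pcap_records(stream):
    """Yield (timestamp_seconds, captured_bytes) for each packet record.

    `stream` is a binary file-like object positioned at the start of a
    classic pcap file (assumption: not pcapng, not nanosecond pcap).
    """
    global_header = stream.read(24)  # fixed-size global header
    if len(global_header) < 24:
        raise ValueError("truncated pcap global header")

    # The magic number tells us the byte order the file was written in.
    magic = struct.unpack("<I", global_header[:4])[0]
    if magic == PCAP_MAGIC:
        endian = "<"
    elif magic == PCAP_MAGIC_SWAPPED:
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")

    while True:
        record_header = stream.read(16)  # ts_sec, ts_usec, incl_len, orig_len
        if len(record_header) < 16:
            return  # end of file
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", record_header
        )
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return  # truncated final record
        yield ts_sec + ts_usec / 1e6, data
```

A file would be read with `open(path, "rb")` and passed to `iter_pcap_records`; decoding the link-layer and TLS contents of each packet is left to a dedicated library.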
Description:
The dataset contains traffic collected for 96 websites located in three popular CDNs. For each CDN, the top 30 websites based on the Alexa Top 1000 ranking were selected. For each website, a random subset of 50 sub-pages was retrieved using a JavaScript-enabled web crawler. Thus, the dataset consists of 4 800 webpages. Traces of every webpage were collected daily for 70 days starting in April 2021. Every website was accessed with the Chromium and Firefox browsers in headless mode. This r...
Method of data assessment:
To obtain the dataset, Docker containers were executed on three separate physical machines running Ubuntu 20.04. The containers were used to isolate the traffic of each webpage access and to maintain equal conditions for all pages. A browser was started inside the Docker container in headless mode. Traffic was traced for 7 s, after which the Docker container was terminated. Traffic was collected inside the Docker container with the tcpdump utility.
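The fixed-duration capture described above can be sketched as follows. This is an illustration only, not the authors' tooling: the helper names `tcpdump_command` and `capture_for`, the interface `any`, and the output path are assumptions, and the browser launch and Docker orchestration are omitted.

```python
import subprocess
import time


def tcpdump_command(pcap_path, interface="any"):
    """Build a tcpdump invocation that writes raw packets to `pcap_path`."""
    # -i selects the capture interface, -w writes packets to a file.
    return ["tcpdump", "-i", interface, "-w", pcap_path]


def capture_for(pcap_path, duration_s=7.0, interface="any"):
    """Run tcpdump for `duration_s` seconds, then stop it.

    Requires tcpdump to be installed and sufficient privileges to capture.
    """
    proc = subprocess.Popen(tcpdump_command(pcap_path, interface))
    try:
        time.sleep(duration_s)  # the record states a 7 s tracing window
    finally:
        proc.terminate()  # SIGTERM lets tcpdump close the capture file
        proc.wait()
    return proc.returncode
```

In a setup like the one described, `capture_for("trace.pcap")` would run inside the container while the headless browser loads the page.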
Key words:
TLS, Website Fingerprinting, Webpage Fingerprinting, Security
Technical remarks:
View and download (1.76 TB total, 7075 files)
The data server also offers downloads with FTP
The data server also offers downloads with rsync (password m1700647):
rsync rsync://m1700647@dataserv.ub.tum.de/m1700647/
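The rsync command above only lists the module contents. A minimal sketch of building a download command from it, assuming a local destination directory chosen by the reader (the name `rsync_command` and the default dry-run behaviour are illustrative choices, not part of the record):

```python
RSYNC_URL = "rsync://m1700647@dataserv.ub.tum.de/m1700647/"


def rsync_command(destination, dry_run=True):
    """Build an rsync invocation that mirrors the dataset into `destination`.

    With dry_run=True rsync only reports what it would transfer, which is
    a sensible first step given the 1.76 TB total size.
    """
    command = ["rsync", "-av"]  # -a archive mode, -v verbose listing
    if dry_run:
        command.append("--dry-run")
    command += [RSYNC_URL, destination]
    return command


# Example (not executed here): the password from the record can be supplied
# through the RSYNC_PASSWORD environment variable instead of the prompt.
# import os, subprocess
# subprocess.run(rsync_command("data/"),
#                env={**os.environ, "RSYNC_PASSWORD": "m1700647"}, check=True)
```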
Language:
en
Rights:
CC BY 4.0, http://creativecommons.org/licenses/by/4.0