PANDAcap – SSH Honeypot Dataset
Stamatogiannakis, Manolis and Bos, Herbert and Groth, Paul

folder eurosec2020-pandacap-dataset (194 files)
pcap/pandahoney.0003.pcap 75.48kB
README-vm.md 3.86kB
README.md 4.97kB
checksums.sha256 17.54kB
pcap/pandahoney.0000.pcap 26.15MB
pcap/pandahoney.0002.pcap 13.76kB
ubuntu16-planb-kernelinfo.conf 1.57kB
rr/pandahoney.0064.tar.gz 217.95MB
rr/pandahoney.0063.tar.gz 244.17MB
rr/pandahoney.0062.tar.gz 247.46MB
rr/pandahoney.0060.tar.gz 230.80MB
rr/pandahoney.0059.tar.gz 234.19MB
rr/pandahoney.0058.tar.gz 198.92MB
rr/pandahoney.0057.tar.gz 196.20MB
rr/pandahoney.0056.tar.gz 195.06MB
rr/pandahoney.0055.tar.gz 257.75MB
rr/pandahoney.0054.tar.gz 181.30MB
rr/pandahoney.0053.tar.gz 181.62MB
rr/pandahoney.0052.tar.gz 221.74MB
rr/pandahoney.0051.tar.gz 194.42MB
rr/pandahoney.0050.tar.gz 171.45MB
rr/pandahoney.0049.tar.gz 546.20MB
rr/pandahoney.0048.tar.gz 1.15GB
rr/pandahoney.0047.tar.gz 194.67MB
rr/pandahoney.0046.tar.gz 257.06MB
rr/pandahoney.0045.tar.gz 194.11MB
rr/pandahoney.0044.tar.gz 202.97MB
rr/pandahoney.0043.tar.gz 217.82MB
rr/pandahoney.0042.tar.gz 194.48MB
rr/pandahoney.0041.tar.gz 193.60MB
rr/pandahoney.0040.tar.gz 216.21MB
rr/pandahoney.0039.tar.gz 181.40MB
rr/pandahoney.0038.tar.gz 253.55MB
rr/pandahoney.0037.tar.gz 197.47MB
rr/pandahoney.0036.tar.gz 189.33MB
rr/pandahoney.0035.tar.gz 209.45MB
rr/pandahoney.0034.tar.gz 256.70MB
rr/pandahoney.0033.tar.gz 202.34MB
rr/pandahoney.0032.tar.gz 235.45MB
rr/pandahoney.0031.tar.gz 239.23MB
rr/pandahoney.0030.tar.gz 267.30MB
rr/pandahoney.0029.tar.gz 194.83MB
rr/pandahoney.0028.tar.gz 249.62MB
rr/pandahoney.0027.tar.gz 234.95MB
rr/pandahoney.0026.tar.gz 201.20MB
rr/pandahoney.0025.tar.gz 253.91MB
rr/pandahoney.0024.tar.gz 179.79MB
rr/pandahoney.0023.tar.gz 245.91MB
rr/pandahoney.0022.tar.gz 190.09MB
Type: Dataset
Tags: Dataset, PANDA, record and replay, docker, honeypot

Bibtex:
@article{,
title= {PANDAcap – SSH Honeypot Dataset},
journal= {},
author= {Stamatogiannakis, Manolis and Bos, Herbert and Groth, Paul},
year= {},
url= {https://github.com/vusec/pandacap},
abstract= {# PANDAcap – SSH Honeypot Dataset

## Overview
This is a dataset of **63 [PANDA][panda] traces**, collected using the
[PANDAcap][pandacap] framework.
The dataset aims to offer a starting point for the analysis of *ssh
brute force attacks*.
The traces were collected over the course of approximately 3 days,
from 21 to 23 February 2020.
A VM was configured using PANDAcap so that it accepts all passwords for
user `root`. When an ssh session starts for the user, PANDA is signaled
by the [recctrl plugin][recctrl] to start recording for 30 minutes.

You can read more details about the experimental setup, along with an
overview of the dataset, in our **EuroSec 2020** publication [1].

---------------------------------------------------------------------

[1] Manolis Stamatogiannakis, Herbert Bos, and Paul Groth.
PANDAcap: A Framework for Streamlining Collection of Full-System Traces.
In *Proceedings of the 13th European Workshop on Systems Security*,
EuroSec '20, Heraklion, Greece, April 2020.
doi: [10.1145/3380786.3391396][eurosec20-doi],
preprint: [vusec.net][eurosec20-preprint]

---------------------------------------------------------------------

## Dataset layout
The dataset is split into 3 zip files/directories:
* **rr**: Contains the 63 PANDA traces of the dataset. The traces are in the
  upcoming RRArchive format. Note that PANDA support for the format is still
  a work in progress at the time of writing (April 2020). If you need to downgrade to the
  traditional PANDA trace format, you can use the snippet we provide below.
* **qcow**: Contains the QCOW base image (`ubuntu16-planb.qcow2`) used to create
  the dataset, as well as the disk deltas for the 63 traces. These can be mounted
  to inspect the contents of the filesystem before and after each session.
  Quick instructions on how to mount and inspect a QCOW image can be found below.
* **pcap**: Contains the pcap network traces for the sessions in the PANDA traces.
  These have been extracted using the PANDA [network plugin][network]. We decided
  to also include them in the dataset as standalone files for convenience.
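
As a rough illustration of the layout above, the following sketch lists each
trace archive under `rr` and reports whether a matching capture exists under
`pcap`. It assumes the naming pattern seen in the file listing
(`rr/pandahoney.NNNN.tar.gz` and `pcap/pandahoney.NNNN.pcap`);
`list_trace_artifacts` is a hypothetical helper name, not part of PANDAcap.

```bash
# Hypothetical helper: for every rr archive, report whether a pcap with the
# same trace id is present. Assumes rr/<id>.tar.gz and pcap/<id>.pcap naming.
list_trace_artifacts() {
    root=${1:-.}
    for archive in "$root"/rr/pandahoney.*.tar.gz; do
        [ -e "$archive" ] || continue          # no archives found
        id=$(basename "$archive" .tar.gz)      # e.g. pandahoney.0022
        if [ -f "$root/pcap/$id.pcap" ]; then
            echo "$id rr+pcap"
        else
            echo "$id rr-only"
        fi
    done
}

# Example: list_trace_artifacts ./eurosec2020-pandacap-dataset
```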

Additionally, we provide the PANDA linux kernel profile `ubuntu16-planb-kernelinfo.conf`,
which can be used to analyze the traces using the PANDA [osi_linux plugin][osi_linux].
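
A minimal sketch of how the profile might be wired into a replay follows. It
assumes the profile file uses `[group-name]` section headers and that
`osi_linux` takes `kconf_file` and `kconf_group` arguments. The PANDA binary
name (`panda-system-ARCH`) and the guest memory size (`MEM`) are placeholders
that must match the recording and are not stated here. Depending on the PANDA
version, an `-os` option may also be needed. Both function names are
hypothetical, and the second one only prints a command line rather than
running it.

```bash
# Hypothetical helper: list the profile (group) names defined in a
# kernelinfo.conf file. Assumes "[group-name]" section headers.
list_kconf_groups() {
    sed -n 's/^\[\(.*\)][[:space:]]*$/\1/p' "$1"
}

# Hypothetical helper: print (not run) a replay command line for one trace.
# ARCH and MEM are placeholders; fill them in to match the recording.
print_osi_replay_cmd() {
    trace=$1; conf=$2; group=$3
    echo "panda-system-ARCH -m MEM -replay $trace" \
         "-panda osi -panda osi_linux:kconf_file=$conf,kconf_group=$group"
}

# Example:
#   list_kconf_groups ubuntu16-planb-kernelinfo.conf
#   print_osi_replay_cmd pandahoney.0022 ubuntu16-planb-kernelinfo.conf GROUP
```

An analysis plugin of your choice would typically be appended to the printed
command with a further `-panda` option.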

If you wish to reuse the VM image in your project, it is also available as a standalone
download through [academictorrents.com][at-vm-url], along with more detailed information
on its contents.

## Handy snippets

### Convert traces to traditional PANDA format
From inside the `rr` directory, run:

```bash
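for f in *.tar.gz; do
    # Unpack everything except the PANDArr entry, renaming members on the fly.
    # The archive is deleted only if extraction succeeded.
    tar -zxvf "$f" --exclude=PANDArr --xform='s%/%-%' --xform='s%-metadata%%' &&
        rm -f "$f"
done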
```
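
Since the loop above deletes each archive after unpacking it, it may be worth
previewing the resulting file names first. The sketch below assumes GNU tar,
whose `--show-transformed-names` option prints member names after the
`--xform` rules have been applied. `preview_converted_names` is a hypothetical
helper name.

```bash
# Hypothetical helper: list the file names the conversion would produce for
# one archive, without extracting anything. Assumes GNU tar.
preview_converted_names() {
    tar -tzf "$1" --exclude=PANDArr \
        --xform='s%/%-%' --xform='s%-metadata%%' \
        --show-transformed-names
}

# Example: preview_converted_names pandahoney.0022.tar.gz
```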

### Mounting a QCOW image
Run the following as root:
```bash
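modprobe nbd max_part=69                      # load the NBD kernel module
qemu-nbd -c /dev/nbd0 ./ubuntu16-planb.qcow2  # expose the image as a block device
mkdir -p ./mnt                                # make sure the mount point exists
mount /dev/nbd0p1 ./mnt                       # mount the first partition
# ...do some work...
umount ./mnt
qemu-nbd -d /dev/nbd0                         # detach the image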
```
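
When the goal is only to inspect an image or a disk delta, attaching it
read-only avoids accidental modification. The following is a sketch under
assumptions: it relies on the `--read-only` flag of `qemu-nbd` and on the root
filesystem being on the first partition, as in the snippet above.
`mount_qcow_ro` and `run` are hypothetical helper names. A delta presumably
needs its base image to be reachable before it can be opened.

```bash
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: attach a QCOW image read-only and mount its first
# partition read-only. Arguments: image, mount point, optional NBD device.
mount_qcow_ro() {
    image=$1; mountpoint=$2; dev=${3:-/dev/nbd0}
    run mkdir -p "$mountpoint" &&
    run qemu-nbd --read-only -c "$dev" "$image" &&
    run mount -o ro "${dev}p1" "$mountpoint"
}

# Example (as root, after loading the nbd module):
#   mount_qcow_ro ./ubuntu16-planb.qcow2 ./mnt
#   umount ./mnt && qemu-nbd -d /dev/nbd0
```

Setting `DRY_RUN=1` prints the three commands without touching the system,
which is handy for checking device and path names first.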

[a-trace-convert]: #convert-traces-to-traditional-panda-format
[a-qcow-mount]: #mounting-a-qcow-image
[at-vm-url]: https://academictorrents.com/details/39df3904460e909e175434cbd87764b8c487891d
[eurosec20-doi]: https://doi.org/10.1145/3380786.3391396
[eurosec20-preprint]: https://www.vusec.net/publications/#stamatogiannakis-bos-groth-pandacapaframeworkforstreamliningcollectionoffullsystemtraces-2020
[eurosec20-www]: https://www.concordia-h2020.eu/eurosec-2020/
[osi_linux]: https://github.com/panda-re/panda/tree/master/panda/plugins/osi_linux
[panda]: https://github.com/panda-re/panda
[pandacap]: https://github.com/vusec/pandacap
[qcow]: https://en.wikipedia.org/wiki/Qcow
[qcow-cheat]: https://github.com/vusec/pandacap/blob/master/docs/cheatsheet.md#working-with-qcow2raw-images
[network]: https://github.com/panda-re/panda/tree/master/panda/plugins/network
[recctrl]: https://github.com/panda-re/panda/tree/master/panda/plugins/recctrl
[recctrlu]: https://github.com/panda-re/panda/tree/master/panda/plugins/recctrl/utils},
keywords= {Dataset, PANDA, record and replay, docker, honeypot},
terms= {Data are shared in accordance with the Creative Commons Attribution 4.0 International license.

For the included VM IMAGE, the following applies. The VM IMAGE is a COLLECTION of various open-source components, shared for research purposes. The VM IMAGE is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors of the VM IMAGE or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the VM IMAGE or the use or other dealings in the VM IMAGE. NO ASSERTIONS are made on the copyright and licensing terms of the open-source components included in the VM IMAGE.},
license= {Creative Commons Attribution 4.0 International},
superseded= {}
}

