<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:academictorrents="http://academictorrents.com/" version="2.0">
<channel>
<title>MLNet-alpha - Academic Torrents</title>
<description>collection curated by iago27</description>
<link>https://academictorrents.com/collection/mlnet-alpha</link>
<item>
<title>Reddit comments/submissions 2023-02 (Dataset)</title>
<description>@article{,
title= {Reddit comments/submissions 2023-02},
journal= {},
author= {stuck_in_the_matrix and Watchful1},
year= {},
url= {},
abstract= {Reddit comments and submissions from 2023-02, collected by pushshift, which can be found here: https://files.pushshift.io/reddit/

These are zstandard-compressed ndjson files. Example Python scripts for parsing the data can be found here: https://github.com/Watchful1/PushshiftDumps},
keywords= {reddit},
terms= {},
license= {},
superseded= {https://academictorrents.com/details/ba051999301b109eab37d16f027b3f49ade2de13}
}

</description>
<link>https://academictorrents.com/download/9971c68d2909843a100ae955c6ab6de3e09c04a1</link>
</item>
<item>
<title>Reddit comments/submissions 2023-01 (Dataset)</title>
<description>@article{,
title= {Reddit comments/submissions 2023-01},
journal= {},
author= {stuck_in_the_matrix and Watchful1},
year= {},
url= {},
abstract= {Reddit comments and submissions from 2023-01, collected by pushshift, which can be found here: https://files.pushshift.io/reddit/

These are zstandard-compressed ndjson files. Example Python scripts for parsing the data can be found here: https://github.com/Watchful1/PushshiftDumps},
keywords= {reddit},
terms= {},
license= {},
superseded= {https://academictorrents.com/details/ba051999301b109eab37d16f027b3f49ade2de13}
}

</description>
<link>https://academictorrents.com/download/c861d265525c488a9439fb874bd9c3fc38dcdfa5</link>
</item>
<item>
<title>Subreddit comments/submissions 2005-06 to 2022-12 (Dataset)</title>
<description>@article{,
title= {Subreddit comments/submissions 2005-06 to 2022-12},
journal= {},
author= {Watchful1},
year= {},
url= {https://www.reddit.com/r/pushshift/comments/11ef9if/separate_dump_files_for_the_top_20k_subreddits/},
abstract= {This dataset contains the top 20,000 subreddits from reddit's history, each in a separate file, so you can use your torrent client to download only the subreddits you're interested in.

These were extracted from the pushshift dumps covering 2005-06 to 2022-12, which can be found here: https://academictorrents.com/details/7c0645c94321311bb05bd879ddee4d0eba08aaee

These are zstandard-compressed ndjson files. Example Python scripts for parsing the data can be found here: https://github.com/Watchful1/PushshiftDumps},
keywords= {reddit},
terms= {},
license= {},
superseded= {https://academictorrents.com/details/56aa49f9653ba545f48df2e33679f014d2829c10}
}

</description>
<link>https://academictorrents.com/download/c398a571976c78d346c325bd75c47b82edf6124e</link>
</item>
<item>
<title>Wallstreetbets submissions/comments (Dataset)</title>
<description>@article{,
title= {Wallstreetbets submissions/comments},
journal= {},
author= {Watchful1},
year= {},
url= {},
abstract= {All submissions and comments in r/wallstreetbets from the creation of the subreddit through June 2021. Extracted from the pushshift dump files: https://academictorrents.com/details/90e7a746b1c24e45af0940b37cffcec7c96c8096

An example Python script for iterating over the lines in these dumps is here: https://github.com/Watchful1/PushshiftDumps/blob/master/scripts/single_file.py

If you are interested in a similar file for another subreddit, feel free to DM u/Watchful1 on reddit},
keywords= {reddit, wallstreetbets},
terms= {},
license= {},
superseded= {https://academictorrents.com/details/cd25c332d18ad7cc6d1ef4e84eab151d4d6c1f4d}
}

</description>
<link>https://academictorrents.com/download/098cbcf9712a8747b89f7e235dae41431fd57f7e</link>
</item>
<item>
<title>Wikipedia Training Data for Megatron-LM (Dataset)</title>
<description>@article{,
title= {Wikipedia Training Data for Megatron-LM},
journal= {},
author= {},
year= {},
url= {},
abstract= {A preprocessed dataset for training with https://github.com/NVIDIA/Megatron-LM. See the instructions in https://github.com/Lyken17/ML-Datasets for how to use it.

Note: the author does not own any copyright in the data.},
keywords= {BERT, NLP},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/b6215a898a2a08b6061d23f2e4e1094121fb7082</link>
</item>
<item>
<title>US Stock Market End of Day dataset (Dataset)</title>
<description>@article{,
title= {US Stock Market End of Day dataset},
keywords= {Stock Market, High, Low, Close, Open, Volume, Ticker, Date},
journal= {},
author= {Atreyuroc},
year= {},
url= {},
license= {},
abstract= {End-of-day data for 4,974 stock symbols, including open, high, low, close, volume, and date. Data was collected from Google Finance public data.

```
+----------+------------+
| Table    | Size in MB |
+----------+------------+
| surf_eod |    1109.00 |
+----------+------------+
1 row in set (0.00 sec)

mysql&gt; SELECT COUNT(DISTINCT(`ticker`)) FROM surf_eod;
+---------------------------+
| COUNT(DISTINCT(`ticker`)) |
+---------------------------+
|                      4974 |
+---------------------------+
1 row in set (6.31 sec)

mysql&gt; describe surf_eod;
+--------+-------------+------+-----+-------------------+-----------------------------+
| Field  | Type        | Null | Key | Default           | Extra                       |
+--------+-------------+------+-----+-------------------+-----------------------------+
| ticker | varchar(10) | YES  | MUL | NULL              |                             |
| date   | date        | YES  |     | NULL              |                             |
| close  | varchar(20) | YES  |     | NULL              |                             |
| high   | varchar(20) | YES  |     | NULL              |                             |
| low    | varchar(20) | YES  |     | NULL              |                             |
| open   | varchar(20) | YES  |     | NULL              |                             |
| volume | varchar(20) | YES  |     | NULL              |                             |
| time   | timestamp   | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+--------+-------------+------+-----+-------------------+-----------------------------+
8 rows in set (0.04 sec)

mysql&gt; SELECT COUNT(*) FROM surf_eod;
+----------+
| COUNT(*) |
+----------+
| 17726722 |
+----------+
1 row in set (25.18 sec)
```},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/c5a49e46249fef6a3219919fef96fd0265da4d3a</link>
</item>
<item>
<title>Arizona State University Twitter Data Set (Dataset)</title>
<description>@article{,
title= {Arizona State University Twitter Data Set },
journal= {},
author= {R. Zafarani and H. Liu},
year= {2009},
institution= {Arizona State University, School of Computing, Informatics and Decision Systems Engineering},
url= {http://socialcomputing.asu.edu/datasets/Twitter},
abstract= {Twitter is a social news website. It can be viewed as a hybrid of email, instant messaging, and SMS, all rolled into one neat and simple package, and an easy way to discover the latest news on subjects you care about.

|Attribute|Value|
|-|-|
|Number of nodes|11,316,811|
|Number of edges|85,331,846|
|Missing values?|no|
|Source|N/A|

## Data Set Information

1. nodes.csv
-- the file listing all the users. It works as a dictionary of all the users in this data set, is useful for fast reference, and contains all the node ids used in the dataset.

2. edges.csv
-- the friendship/followership network among the users. Friend/follower relationships are represented as directed edges.

Here is an example:

1,2

This means the user with id "1" is following the user with id "2".
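A minimal Python sketch for loading edges.csv into an adjacency mapping (illustrative only; with over 85 million edges the full file needs several GB of RAM, so a streaming or graph-library approach may be preferable):

```python
import csv
from collections import defaultdict


def load_edges(path):
    """Read the directed edge list into an adjacency mapping.

    Each row "a,b" in edges.csv means the user with id a follows
    the user with id b.
    """
    following = defaultdict(list)
    with open(path, newline="") as fh:
        for src, dst in csv.reader(fh):
            following[int(src)].append(int(dst))
    return following
```

After loading, `following[1]` is the list of user ids that user 1 follows.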


},
keywords= {ASU, Twitter, Social, Graph},
terms= {}
}

</description>
<link>https://academictorrents.com/download/2399616d26eeb4ae9ac3d05c7fdd98958299efa9</link>
</item>
</channel>
</rss>
