<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:academictorrents="http://academictorrents.com/" version="2.0">
<channel>
<title>Social Networking - Academic Torrents</title>
<description>collection curated by joecohen</description>
<link>https://academictorrents.com/collection/social-networking</link>
<item>
<title>Phishing corpus (Dataset)</title>
<description>@article{,
title= {Phishing corpus},
journal= {},
author= {Vit Listik},
year= {},
url= {},
abstract= {Downloaded at http://monkey.org/~jose/wiki/doku.php?id=phishingcorpus (2015-02-01)},
keywords= {email, phishing, eml, emails},
terms= {},
license= {},
superseded= {}
}

</description>
<link>https://academictorrents.com/download/a77cda9a9d89a60dbdfbe581adf6e2df9197995a</link>
</item>
<item>
<title>Facebook Names Dataset (Dataset)</title>
<description>@article{,
title= {Facebook Names Dataset},
keywords= {},
journal= {},
author= {Ron Bowes (Skull Security)},
year= {2010},
url= {https://blog.skullsecurity.org/2010/return-of-the-facebook-snatchers},
license= {},
abstract= {171 million names (100 million unique)

This torrent contains:

The URL of every searchable Facebook user's profile
The name of every searchable Facebook user, both unique and by count (perfect for post-processing, datamining, etc)
Processed lists, including first names with count, last names with count, potential usernames with count, etc
The programs I used to generate everything
So, there you have it: lots of awesome data from Facebook. Now, I just have to find one more problem with Facebook so I can write "Revenge of the Facebook Snatchers" and complete the trilogy. Any suggestions? &gt;:-)

Limitations
So far, I have only indexed the searchable users, not their friends. Getting their friends will be significantly more data to process, and I don't have those capabilities right now. I'd like to tackle that in the future, though, so if anybody has any bandwidth they'd like to donate, all I need is an ssh account and Nmap installed.

An additional limitation is that these are only users whose first characters are from the Latin character set. I plan to add non-Latin names in future releases.},
tos= {},
superseded= {},
terms= {}
}

</description>
<link>https://academictorrents.com/download/e54c73099d291605e7579b90838c2cd86a8e9575</link>
</item>
<item>
<title>Lerman Digg 2009 Dataset (Dataset)</title>
<description>@article{,
title= {Lerman Digg 2009 Dataset},
journal= {},
author= {Kristina Lerman },
year= {},
license= {This data is made available to the community for research purposes only},
url= {http://www.isi.edu/~lerman/downloads/digg2009.html},
abstract= {The Digg2009 data set contains data about stories promoted to Digg's front page over a period of a month in 2009. For each story, we collected the list of all Digg users who have voted for the story up to the time of data collection, and the time stamp of each vote. We also retrieved the voters' friendship links. The semantics of the friendship links are as follows:
user_id --&gt; friend_id 
means that user_id is watching the activities of (is a fan of) friend_id. User ids have been anonymized, but are unique in the data set: a user with a specific id in the friendship links table and a user with the same id in the votes table correspond to the same actual user.
The data is in zipped csv files that are password protected. The password is digg2009_user.

##Votes

Table digg_votes contains 3,018,197 votes on 3,553 popular stories made by 139,409 distinct users. The first vote on each story is from the story's submitter.

##Schema of the table
|Attribute|Value|
|-|-|
|vote_date: |Unix time stamp of the vote|
|voter_id: |anonymized unique id of the voter|
|story_id: |anonymized unique id of the story|
 
![](http://www.isi.edu/~lerman/downloads/diggs_distribution.jpg)
![](http://www.isi.edu/~lerman/downloads/voting_distribution.png)

(left) Distribution of votes (diggs) per story. An outlier with more than 24,000 votes is not shown. (right) Distribution of the number of votes (diggs) made by users.
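
The digg_votes schema above can be explored with a short Python sketch. It assumes the unzipped CSV has no header row and lists its columns in the order vote_date, voter_id, story_id; neither is confirmed by the description, so check both against the actual file. The sample rows and the function name summarize_votes are invented for illustration.

```python
import csv
import io
from collections import Counter

# Invented sample rows; assumed column order: vote_date, voter_id, story_id.
SAMPLE = """1246000000,11,714
1246000050,12,714
1246000100,13,715
1246000200,11,715
"""

def summarize_votes(handle):
    """Return (votes per story, voter of the earliest vote per story)."""
    votes_per_story = Counter()
    earliest = {}  # story_id -> (vote_date, voter_id) of the earliest vote seen
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        vote_date, voter_id, story_id = row
        votes_per_story[story_id] += 1
        candidate = (int(vote_date), voter_id)
        earliest[story_id] = min(earliest.get(story_id, candidate), candidate)
    # The description states that the first vote comes from the submitter.
    submitters = {story: voter for story, (_stamp, voter) in earliest.items()}
    return votes_per_story, submitters

votes, submitters = summarize_votes(io.StringIO(SAMPLE))
print(votes["714"], submitters["715"])
```

To run this on the real data, pass an open file handle for the extracted CSV instead of the in-memory sample.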

##Friendship links

Table digg_friends contains 1,731,658 friendship links of 71,367 distinct users. Voters who do not appear in the table did not specify any friends at the time data was collected. 

##Schema of the digg_friends table

|Attribute|Value|
|-|-|
|mutual: |indicates whether the link represents a mutual friend relation (1) or not (0)|
|friend_date: |Unix time stamp of when the friendship link was created|
|user_id: |anonymized unique id of a user|
|friend_id: |anonymized unique id of a user|
 
![](http://www.isi.edu/~lerman/downloads/fans_distribution.png)

Distribution of the number of fans per user.
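
The fan counts behind such a distribution could be computed roughly as follows. This is a sketch, not the authors' method: it assumes no header row and the column order mutual, friend_date, user_id, friend_id, and it counts each row exactly as listed, because the description does not say whether mutual links also appear in the reverse direction. The sample rows and the function name count_fans are invented.

```python
import csv
import io
from collections import Counter

# Invented sample rows; assumed column order: mutual, friend_date, user_id, friend_id.
SAMPLE = """0,1240000000,1,2
0,1240000500,3,2
1,1240001000,2,3
"""

def count_fans(handle):
    """Count fans per user: each row makes user_id a fan of friend_id."""
    fans = Counter()
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        _mutual, _friend_date, user_id, friend_id = row
        fans[friend_id] += 1
    return fans

fans = count_fans(io.StringIO(SAMPLE))
print(fans.most_common())
```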

Empirical characterization of this data is described in 

Lerman, K., and Ghosh, R. (2010) "Information Contagion: an Empirical Study of Spread of News on Digg and Twitter Social Networks." In Proceedings of 4th International Conference on Weblogs and Social Media (ICWSM). This data is made available to the community for research purposes only. If you use the data in a publication, please cite the above paper.},
keywords= {digg2009},
terms= {}
}

</description>
<link>https://academictorrents.com/download/d98540da6d34fb6a0150fd88b41580a377cb454d</link>
</item>
<item>
<title>Lerman Twitter 2010 Dataset (Dataset)</title>
<description>@article{,
title= {Lerman Twitter 2010 Dataset},
journal= {},
author= {Kristina Lerman },
year= {2010},
license= {This data is made available to the community for research purposes only},
url= {http://www.isi.edu/~lerman/downloads/twitter/twitter2010.html},
abstract= {The Twitter_2010 data set contains tweets containing URLs that were posted on Twitter during October 2010. In addition to tweets, we also collected the followee links of tweeting users, allowing us to reconstruct the follower graph of active (tweeting) users.

URLs: 66,059
tweets: 2,859,764
users: 736,930
links: 36,743,448

Tweets

Table (in csv format) link_status_search_with_ordering_real_csv contains tweets with the following information

link: URL within the text of the tweet
id: tweet id
create_at: date added to the db
create_at_long
inreplyto_screen_name: screen name of user this tweet is replying to
inreplyto_user_id: user id of user this tweet is replying to
source: device from which the tweet originated
bad_user_id: alternate user id
user_screen_name: tweeting user screen name
order_of_users: tweet's index within sequence of tweets of the same URL
user_id: user id
Table (in csv format) distinct_users_from_search_table_real_map contains names of tweeting users, and the following information for each user:

user_id: user id
user_screen_name: user name
indegree: number of followers
outdegree: number of friends/followees
bad_user_id: alternate user id
Follower graph

File active_follower_real_sql contains zipped SQL dump of links between tweeting users in the form:

user_id: user id
follower_id: user id of the follower
Empirical characterization of this data is described in 
Kristina Lerman, Rumi Ghosh, Tawan Surachawala (2012) "Social Contagion: An Empirical Study of Information Spread on Digg and Twitter Follower Graphs." This data is made available to the community for research purposes only. If you use the data in a publication, please cite the above paper.},
keywords= {twitter},
terms= {}
}

</description>
<link>https://academictorrents.com/download/d8b3a315172c8d804528762f37fa67db14577cdb</link>
</item>
<item>
<title>UMN Sarwat Foursquare Dataset (September 2013) (Dataset)</title>
<description>@article{,
title= {UMN Sarwat Foursquare Dataset (September 2013)},
journal= {},
author= {Mohamed Sarwat and Justin J. Levandoski and Ahmed Eldawy and Mohamed F. Mokbel},
year= {2013},
url= {http://www-users.cs.umn.edu/~sarwat/foursquaredata/},
license= {},
abstract= {This data set contains 2,153,471 users, 1,143,092 venues, 1,021,970 check-ins, 27,098,490 social connections, and 2,809,581 ratings that users assigned to venues, all extracted from the Foursquare application through the public API. All user information has been anonymized, and users' geolocations are also anonymized. Each user is represented by an id and a geospatial location; the same holds for venues. The data are contained in five files: users.dat, venues.dat, checkins.dat, socialgraph.dat, and ratings.dat. More details about the contents and use of these files follow.

Content of Files
* users.dat: consists of a set of users such that each user has a unique id and a geospatial location (latitude and longitude) that represents the user's hometown location.
* venues.dat: consists of a set of venues (e.g., restaurants) such that each venue has a unique id and a geospatial location (latitude and longitude).
* checkins.dat: marks the check-ins (visits) of users at venues. Each check-in has a unique id as well as the user id and the venue id.
* socialgraph.dat: contains the social graph edges (connections) that exist between users. Each social connection consists of two users (friends) represented by two unique ids (first_user_id and second_user_id).
* ratings.dat: consists of implicit ratings that quantify how much a user likes a specific venue.

Credits

The user must acknowledge the use of the data set in publications resulting from the use of the data set by citing the following papers:

* Mohamed Sarwat, Justin J. Levandoski, Ahmed Eldawy, and Mohamed F. Mokbel. LARS*: A Scalable and Efficient Location-Aware Recommender System. In IEEE Transactions on Knowledge and Data Engineering (TKDE).
* Justin J. Levandoski, Mohamed Sarwat, Ahmed Eldawy, and Mohamed F. Mokbel. LARS: A Location-Aware Recommender System. In ICDE 2012.},
keywords= {foursquare},
terms= {}
}

</description>
<link>https://academictorrents.com/download/b24c73949308b3f6bdd8fea1a485534392eef338</link>
</item>
<item>
<title>Twitter Data - NIPS 2012 (Dataset)</title>
<description>@article{,
title= {Twitter Data - NIPS 2012},
journal= {},
author= {J. McAuley and J. Leskovec},
year= {},
url= {http://snap.stanford.edu/data/egonets-Twitter.html},
license= {},
abstract= {This dataset consists of 'circles' (or 'lists') from Twitter. Twitter data was crawled from public sources. The dataset includes node features (profiles), circles, and ego networks.


##Dataset statistics

|Attribute|Value|
|---------|-------|
|Nodes|81306|
|Edges|1768149|
|Nodes in largest WCC|81306 (1.000)|
|Edges in largest WCC|1768149 (1.000)|
|Nodes in largest SCC|68413 (0.841)|
|Edges in largest SCC|1685163 (0.953)|
|Average clustering coefficient|0.5653|
|Number of triangles|13082506|
|Fraction of closed triangles|0.06415|
|Diameter (longest shortest path)|7|
|90-percentile effective diameter|4.5|

##Source (citation)

J. McAuley and J. Leskovec. Learning to Discover Social Circles in Ego Networks. NIPS, 2012.

##Files:

|Attribute|Value|
|---------|-------|
|nodeId.edges |The edges in the ego network for the node 'nodeId'. Edges are undirected for facebook, and directed (a follows b) for twitter and gplus. The 'ego' node does not appear, but it is assumed that they follow every node id that appears in this file.|
|nodeId.circles |The set of circles for the ego node. Each line contains one circle, consisting of a series of node ids. The first entry in each line is the name of the circle.|
|nodeId.feat |The features for each of the nodes that appears in the edge file.|
|nodeId.egofeat |The features for the ego user.|
|nodeId.featnames |The names of each of the feature dimensions. Features are '1' if the user has this property in their profile, and '0' otherwise. This file has been anonymized for facebook users, since the names of the features would reveal private data.|},
keywords= {twitter, social networks, NIPS},
terms= {}
}

</description>
<link>https://academictorrents.com/download/046cf7a75db2a530b1505a4ce125fbe0031f4661</link>
</item>
<item>
<title>Arizona State University Flixster Data Set (Dataset)</title>
<description>@article{,
title = {Arizona State University Flixster Data Set},
journal = {},
author = {Flixster},
year = {},
url = {http://socialcomputing.asu.edu/datasets/Flixster},
abstract = {Flixster is a social movie site allowing users to share movie ratings, discover new movies and meet others with similar movie taste.

Number of Nodes: 2523386
Number of Edges: 9197338
Missing Values? no
Source: N/A

Data Set Information:

2 files are included:

1. nodes.csv
-- the file of all the users. It works as a dictionary of all the users in this data set and is useful for fast reference. It contains all the node ids used in the dataset.

2. edges.csv
-- this is the friendship network among the users. The friends are represented using edges. 
Here is an example. 

1,2

This means the user with id "1" is friends with the user with id "2".


Attribute Information:

This contains the friendship network crawled in December 2010 by Javier Parra (Javier.Parra@asu.edu). For easier understanding, all the contents are organized in CSV file format.}
}</description>
<link>https://academictorrents.com/download/4960373ea6dec89153639b0975ea92f9e3d3c914</link>
</item>
<item>
<title>Arizona State University Twitter Data Set  (Dataset)</title>
<description>@article{,
title= {Arizona State University Twitter Data Set },
journal= {},
author= {R. Zafarani and H. Liu},
year= {2009},
institution= {Arizona State University, School of Computing, Informatics and Decision Systems Engineering},
url= {http://socialcomputing.asu.edu/datasets/Twitter},
abstract= {Twitter is a social news website. It can be viewed as a hybrid of email, instant messaging and sms messaging all rolled into one neat and simple package. It's a new and easy way to discover the latest news related to subjects you care about.

|Attribute|Value|
|-|-|
|Number of Nodes: |11316811|
|Number of Edges: |85331846|
|Missing Values? |no|
|Source:| N/A|

##Data Set Information:

1. nodes.csv
-- the file of all the users. It works as a dictionary of all the users in this data set and is useful for fast reference. It contains all the node ids used in the dataset.

2. edges.csv
-- this is the friendship/followership network among the users. The friends/followers are represented using edges. Edges are directed. 

Here is an example. 

1,2

This means the user with id "1" is following the user with id "2".
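
The directed edge format above can be loaded with a short Python sketch. It assumes edges.csv has no header row and that each row is follower,followee as in the example; the sample rows and the function name load_follow_graph are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Invented sample in the edges.csv layout described above: follower,followee.
SAMPLE_EDGES = """1,2
3,2
2,3
"""

def load_follow_graph(handle):
    """Return (followees, followers) adjacency maps built from directed edge rows."""
    followees = defaultdict(set)  # user -> users they follow
    followers = defaultdict(set)  # user -> users following them
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        source, target = int(row[0]), int(row[1])
        followees[source].add(target)
        followers[target].add(source)
    return followees, followers

followees, followers = load_follow_graph(io.StringIO(SAMPLE_EDGES))
print(sorted(followers[2]))
```

In-memory sets like these are convenient for small samples; the full file has 85,331,846 edges, so a streaming count or a dedicated graph library may be a better fit there.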


},
keywords= {ASU, Twitter, Social, Graph},
terms= {}
}

</description>
<link>https://academictorrents.com/download/2399616d26eeb4ae9ac3d05c7fdd98958299efa9</link>
</item>
</channel>
</rss>
