Lerman Digg 2009 Dataset
Kristina Lerman

downloads (2 files)
Type: Dataset
Tags: digg2009

title= {Lerman Digg 2009 Dataset},
journal= {},
author= {Kristina Lerman },
year= {},
license= {This data is made available to the community for research purposes only},
url= {http://www.isi.edu/~lerman/downloads/digg2009.html},
abstract= {The Digg2009 data set contains data about stories promoted to Digg's front page over a period of a month in 2009. For each story, we collected the list of all Digg users who had voted for the story up to the time of data collection, along with the time stamp of each vote. We also retrieved the voters' friendship links. The semantics of the friendship links are as follows:
user_id --> friend_id 
means that user_id is watching the activities of (is a fan of) friend_id. User ids have been anonymized, but are unique in the data set: a user with a specific id in the friendship links table and a user with the same id in the votes table correspond to the same actual user.
The data is distributed as password-protected zipped CSV files. The password is digg2009_user.
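
As a rough illustration of unpacking the archives, the sketch below uses Python's standard `zipfile` module with the password given above. The helper name `extract_archive` and the archive file name in the usage note are hypothetical. `zipfile` can only decrypt the legacy ZipCrypto scheme, so this assumes the archives use it; archives encrypted another way would need a different tool.

```python
import zipfile
from pathlib import Path

# Password stated in the dataset description; zipfile expects bytes.
PASSWORD = b"digg2009_user"


def extract_archive(zip_path, dest_dir, password=PASSWORD):
    """Extract every member of zip_path into dest_dir; return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # pwd is only used for encrypted members and ignored otherwise.
        archive.extractall(path=dest, pwd=password)
        return archive.namelist()
```

A call such as `extract_archive("digg_votes.zip", "data")` (the file name is a placeholder, not taken from the dataset page) would return the names of the extracted CSV files.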


Table digg_votes contains 3,018,197 votes cast by 139,409 distinct users on 3,553 popular stories. The first vote on each story is from the story's submitter.

##Schema of the digg_votes table
|vote_date: |Unix time stamp of the vote|
|voter_id: |anonymized unique id of the voter|
|story_id: |anonymized unique id of the story|
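
A minimal sketch of reading the votes table with the Python standard library follows. It assumes the CSV has no header row and that its columns appear in the order listed in the schema; neither is confirmed by this description. The function names and the sample rows are invented for illustration. The `submitters` helper relies on the statement that the first vote on a story comes from its submitter.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

# Assumed column order, taken from the schema listing; the real file may differ.
VOTE_COLUMNS = ["vote_date", "voter_id", "story_id"]


def read_votes(handle):
    """Yield (vote_time_utc, voter_id, story_id) tuples from a header-less CSV."""
    for row in csv.DictReader(handle, fieldnames=VOTE_COLUMNS):
        # vote_date is a Unix time stamp, i.e. seconds since 1970-01-01 UTC.
        when = datetime.fromtimestamp(int(row["vote_date"]), tz=timezone.utc)
        yield when, int(row["voter_id"]), int(row["story_id"])


def votes_per_story(votes):
    """Count how many votes each story received."""
    return Counter(story_id for _, _, story_id in votes)


def submitters(votes):
    """Map each story_id to the voter_id of its earliest vote."""
    first = {}
    for when, voter_id, story_id in votes:
        if story_id not in first or when < first[story_id][0]:
            first[story_id] = (when, voter_id)
    return {story: voter for story, (_, voter) in first.items()}


# Made-up rows for illustration only; these are not real dataset values.
SAMPLE = (
    '"1246046293","7","100"\n'
    '"1246046300","8","100"\n'
    '"1246046200","9","200"\n'
)
votes = list(read_votes(io.StringIO(SAMPLE)))
counts = votes_per_story(votes)
first_voters = submitters(votes)
```

For the real data, `io.StringIO(SAMPLE)` would be replaced by an open file handle on the extracted votes CSV.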

Figure: (left) Distribution of votes (diggs) per story; an outlier with more than 24,000 votes is not shown. (right) Distribution of the number of votes (diggs) made by users.

##Friendship links

Table digg_friends contains 1,731,658 friendship links of 71,367 distinct users. Voters who do not appear in the table did not specify any friends at the time data was collected. 

Schema of the digg_friends table

|mutual: |indicates whether the link represents a mutual friend relation (1) or not (0)|
|friend_date: |Unix time stamp of when the friendship link was created|
|user_id: |anonymized unique id of a user|
|friend_id: |anonymized unique id of a user|
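
Because a link `user_id --> friend_id` means that user_id is a fan of friend_id, the fans of a user are the user_id values of all rows in which that user appears as friend_id. The sketch below builds this mapping. It assumes a header-less CSV with columns in the schema order above. Whether a mutual link is stored once or in both directions is not stated, so the `expand_mutual` option (a hypothetical name, as are the function name and the sample rows) adds the reverse link only when asked, and sets keep any duplicates from being counted twice.

```python
import csv
import io
from collections import defaultdict

# Assumed column order, taken from the schema listing; the real file may differ.
FRIEND_COLUMNS = ["mutual", "friend_date", "user_id", "friend_id"]


def build_fans(handle, expand_mutual=False):
    """Return a mapping friend_id -> set of user_ids who are fans of friend_id."""
    fans = defaultdict(set)
    for row in csv.DictReader(handle, fieldnames=FRIEND_COLUMNS):
        user_id = int(row["user_id"])
        friend_id = int(row["friend_id"])
        # user_id watches friend_id, so user_id is a fan of friend_id.
        fans[friend_id].add(user_id)
        # Optional assumption: a mutual link also implies the reverse direction.
        if expand_mutual and row["mutual"].strip() == "1":
            fans[user_id].add(friend_id)
    return fans


# Made-up rows for illustration only; these are not real dataset values.
SAMPLE = (
    '"0","1246000000","1","2"\n'
    '"1","1246000100","3","2"\n'
    '"0","1246000200","1","3"\n'
)
fans = build_fans(io.StringIO(SAMPLE))
fan_counts = {user: len(followers) for user, followers in fans.items()}
fans_mutual = build_fans(io.StringIO(SAMPLE), expand_mutual=True)
```

Since user ids are shared between the two tables, the keys of this mapping can be matched directly against the voter_id values in digg_votes.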

Figure: Distribution of the number of fans per user.

An empirical characterization of this data is described in:

	Lerman, K., and Ghosh, R. (2010) "Information Contagion: an Empirical Study of Spread of News on Digg and Twitter Social Networks." In Proceedings of the 4th International Conference on Weblogs and Social Media (ICWSM). This data is made available to the community for research purposes only. If you use the data in a publication, please cite the above paper.},
keywords= {digg2009},
terms= {}