Info hash | 4b9b7e449aa732842aea1a7d4e6413f4507aea99 |
Last mirror activity | 2:42 ago |
Size | 6.36GB (6,362,542,606 bytes) |
Added | 2019-12-05 21:09:15 |
Views | 773 |
Hits | 1636 |
ID | 4361 |
Type | multi |
Downloaded | 302 time(s) |
Uploaded by | djfisher |
Folder | illinois_doc_dataset |
Num files | 8 files |
Mirrors | 5 complete, 0 downloading = 5 mirror(s) total |
illinois_doc_dataset (8 files)
csv/marks.csv | 8.61MB |
csv/person.csv | 9.97MB |
csv/sentencing.csv | 22.74MB |
front.7z | 3.43GB |
htmltocsv.py | 13.70kB |
inmates.7z | 12.81MB |
readme | 3.19kB |
side.7z | 2.88GB |
Type: Dataset
Tags: machine learning, Dataset, images, prisoners
Bibtex:
@article{,
title= {Illinois DOC labeled faces dataset},
journal= {},
author= {Illinois DOC},
year= {},
url= {},
abstract= {This is a dataset of prisoner mugshots and associated data (height, weight, etc.). The copyright status is public domain, since it's produced by the government, the photographs do not have sufficient artistic merit, and a mere collection of facts isn't copyrightable. The source is the Illinois Dept. of Corrections. In total, there are 68149 entries, of which a few hundred have shoddy data.

It's useful for neural network training, since it has pictures from both front and side, and they're (manually) labeled with date of birth, name (useful for clustering), weight, height, hair color, eye color, sex, race, and some various goodies such as sentence duration and whether they're sex offenders.

Here is the readme file:

---BEGIN README---
Scraped from the Illinois DOC.

https://www.idoc.state.il.us/subsections/search/inms_print.asp?idoc=
https://www.idoc.state.il.us/subsections/search/pub_showfront.asp?idoc=
https://www.idoc.state.il.us/subsections/search/pub_showside.asp?idoc=

paste <(cat ids.txt | sed 's/^/http:\/\/www.idoc.state.il.us\/subsections\/search\/pub_showside.asp\?idoc\=/g') <(cat ids.txt| sed 's/^/ out=/g' | sed 's/$/.jpg/g') -d '\n' > showside.txt
paste <(cat ids.txt | sed 's/^/http:\/\/www.idoc.state.il.us\/subsections\/search\/pub_showfront.asp\?idoc\=/g') <(cat ids.txt| sed 's/^/ out=/g' | sed 's/$/.jpg/g') -d '\n' > showfront.txt
paste <(cat ids.txt | sed 's/^/http:\/\/www.idoc.state.il.us\/subsections\/search\/inms_print.asp\?idoc\=/g') <(cat ids.txt| sed 's/^/ out=/g' | sed 's/$/.html/g') -d '\n' > inmates_print.txt

aria2c -i ../inmates_print.txt -j4 -x4 -l ../log-$(pwd|rev|cut -d/ -f 1|rev)-$(date +%s).txt

Then use htmltocsv.py to get the csv. Note that the script is very poorly written and may have errors. It also doesn't do anything with the warrant-related info, although there are some commented-out lines which may be relevant. Also note that it assumes all the HTML files are located in the inmates directory, and overwrites any csv files in csv if there are any.

front.7z contains mugshots from the front
side.7z contains mugshots from the side
inmates.7z contains all the html files
csv contains the html files converted to CSV

The reason for packaging the images is that many torrent clients would otherwise crash if attempting to load the torrent.

All CSV files contain headers describing the nature of the columns. For person.csv, the id is unique. For marks.csv and sentencing.csv, it is not.
Note that the CSV files use semicolons as delimiters and also end with a trailing semicolon. If this is unsuitable, edit the arr2csvR function in htmltocsv.py.

There are 68149 inmates in total, although some (a few hundred) are marked as "Unknown"/"N/A"/"" in one or more fields.

The "height" column has been processed to contain the height in inches, rather than the height in feet and inches expressed as "X ft YY in."
Some inmates were marked "Not Available"; this has been replaced with "N/A".

Likewise, the "weight" column has been altered "XXX lbs." -> "XXX". Again, some are marked "N/A".

The "date of birth" column has some inmates marked as "Not Available" and others as "". There doesn't appear to be any pattern. It may be related to the institution they are kept in. Otherwise, the format is MM/DD/YYYY.

The "weight" column is often rounded to the nearest 5 lbs.

Statistics for hair:
43305 Black
17371 Brown
2887 Blonde or Strawberry
2539 Gray or Partially Gray
740 Red or Auburn
624 Bald
396 Not Available
209 Salt and Pepper
70 White
7 Sandy
1 Unknown

Statistics for sex:
63409 Male
4740 Female

Statistics for race:
37991 Black
20992 White
8637 Hispanic
235 Asian
104 Amer Indian
94 Unknown
92 Bi-Racial
4 ""

Statistics for eyes:
51714 Brown
7808 Blue
4259 Hazel
2469 Green
1382 Black
420 Not Available
87 Gray
9 Maroon
1 Unknown
---END README---

Here is a formal summary:

---BEGIN SUMMARY---
Documentation:

1. Title: Illinois DOC dataset

2. Source Information
   -- Creators: Illinois DOC
      -- Illinois Department of Corrections
         1301 Concordia Court
         P.O. Box 19277
         Springfield, IL 62794-9277
         (217) 558-2200 x 2008
   -- Donor: Anonymous
   -- Date: 2019

3. Past Usage:
   -- None

4. Relevant Information:
   -- All CSV files contain headers describing the nature of the columns. For person.csv, the id is unique. For marks.csv and sentencing.csv, it is not.
   -- Note that the CSV files use semicolons as delimiters and also end with a trailing semicolon. If this is unsuitable, edit the arr2csvR function in htmltocsv.py.
   -- The "height" column has been processed to contain the height in inches, rather than the height in feet and inches expressed as "X ft YY in."
   -- Some inmates were marked "Not Available"; this has been replaced with "N/A".
   -- Likewise, the "weight" column has been altered "XXX lbs." -> "XXX". Again, some are marked "N/A".
   -- The "date of birth" column has some inmates marked as "Not Available" and others as "". There doesn't appear to be any pattern. It may be related to the institution they are kept in. Otherwise, the format is MM/DD/YYYY.
   -- The "weight" column is often rounded to the nearest 5 lbs.

5. Number of Instances: 68149

6. Number of Attributes: 30 (in some instances, information is missing. If so, it should be treated as unknown or undefined information)

7. Attribute Information:
   1. ID: Alphanumeric internal ID (string)
   2. mark: Human-readable string describing marks and scars. May have zero, one, or multiple entries for one ID. (string)
   3. name: First and last name in format "SURNAME, GIVEN" - upper case. Redacted in the provided copy; the script must be executed to regenerate the column. (string/void)
   4. date_of_birth: Date of birth in format MM/DD/YYYY. Some inmates are marked as "Not Available" and some inmates are marked as "". There doesn't appear to be any pattern. It may be related to the institution they are kept in. (date OR enumeration)
   5. weight: Physical weight in pounds OR "N/A". Often rounded to 5 lb increments. It may be related to the institution they are kept in. (integer OR void)
   6. hair: Hair color. One of ("Black", "Brown", "Blonde or Strawberry", "Gray or Partially Gray", "Red or Auburn", "Bald", "Not Available", "Salt and Pepper", "White", "Sandy", "Unknown") (enumeration)
   7. sex: Sex. One of ("Male", "Female") (enumeration)
   8. height: Height in inches. (integer)
   9. race: Race. One of ("Black", "White", "Hispanic", "Asian", "Amer Indian", "Unknown", "Bi-Racial", "") (enumeration)
   10. eyes: Eye color. One of ("Brown", "Blue", "Hazel", "Green", "Black", "Not Available", "Gray", "Maroon", "Unknown") (enumeration)
   11. admission_date: Date of admission in format MM/DD/YYYY. (date)
   12. projected_parole_date: Projected parole date in format MM/DD/YYYY OR one of ("TO BE DETERMINED", "Sexually D", "3yrs---Lif", "3yrs---Lif", "TO BE DETERMINED BY COMMITTING COURT") OR "" (if none projected) (date OR enumeration OR void)
   13. last_paroled_date: Last paroled date in format MM/DD/YYYY OR "" (if not paroled). (date OR void)
   14. projected_discharge_date: Projected discharge date in format MM/DD/YYYY OR one of ("TO BE DETERMINED", "3 YRS TO LIFE - TO BE DETERMINED", "INELIGIBLE", "SEXUALLY D", "TO BE DETERMINED BY COMMITTING COURT", "PENDING", "3 YRS TO L") OR "". (date OR enumeration OR void)
   15. parole_date: Parole date in format MM/DD/YYYY OR "". (date OR void)
   16. electronic_detention_date: Electronic detention date in format MM/DD/YYYY OR "". (date OR void)
   17. discharge_date: Date of discharge from institution. Always "", since discharged offenders are not included in the data set. (void)
   18. parent_institution: Institution at which offender is kept, or "PAROLE" if on parole. One of ("STATEVILLE CORRECTIONAL CENTER", "SHERIDAN CORRECTIONAL CENTER", "PINCKNEYVILLE CORRECTIONAL CENTER", "MENARD CORRECTIONAL CENTER", "LOGAN CORRECTIONAL CENTER", "ILLINOIS RIVER CORRECTIONAL CENTER", "DIXON CORRECTIONAL CENTER", "VANDALIA CORRECTIONAL CENTER", "GRAHAM CORRECTIONAL CENTER", "LAWRENCE CORRECTIONAL CENTER", "EAST MOLINE CORRECTIONAL CENTER", "SHAWNEE CORRECTIONAL CENTER", "JACKSONVILLE CORRECTIONAL CENTER", "DANVILLE CORRECTIONAL CENTER", "VIENNA CORRECTIONAL CENTER", "HILL CORRECTIONAL CENTER", "BIG MUDDY CORRECTIONAL CENTER", "CENTRALIA CORRECTIONAL CENTER", "ROBINSON CORRECTIONAL CENTER", "WESTERN ILLINOIS CORRECTIONAL CENTER", "LINCOLN CORRECTIONAL CENTER", "TAYLORVILLE CORRECTIONAL CENTER", "SOUTHWESTERN CORRECTIONAL CENTER", "PONTIAC CORRECTIONAL CENTER", "CONCORDIA", "DECATUR CORRECTIONAL CENTER", "KEWANEE LIFE SKILLS RE-ENTRY CENTER", "JOLIET TREATMENT CENTER", "PAROLE") (enumeration)
   19. offender_status: Status of offender. One of ("CUSTODY", "PAROLE", "ABSCONDER", "RECEPTION", "WORK RELEASE CUSTODY", "TEMP RESIDENT", "NON-IDOC CUSTODY", "WRIT", "BOND", "HOME CUSTODY", "DETAINER", "MEDICAL FURLOUGH", "ESCAPE") (enumeration)
   20. location: Location. One of ("PAROLE DISTRICT 1", "PAROLE DISTRICT 2", "PAROLE DISTRICT 3", "MENARD", "INTERSTATE COMPACT", "PINCKNEYVILLE", "LAWRENCE CORRECTIONAL CENTER", "PAROLE DISTRICT 4", "ILLINOIS RIVER", "DANVILLE", "HILL", "SHAWNEE", "DIXON", "SHERIDAN", "BIG MUDDY RIVER", "LOGAN", "PAROLE", "GRAHAM", "CENTRALIA", "EAST MOLINE", "NORTHERN RECEPTION CENTER", "VANDALIA", "ROBINSON", "STATEVILLE", "WESTERN ILLINOIS", "VIENNA", "TAYLORVILLE", "LINCOLN", "JACKSONVILLE", "PAROLE DISTRICT 5", "PONTIAC", "DIXON CORRECTIONAL CENTER", "SOUTHWESTERN ILLINOIS", "DECATUR", "", "MENARD MEDIUM SECURITY UNIT", "PONTIAC MEDIUM SECURITY", "GRAHAM R&C", "CROSSROADS CCC", "KEWANEE", "ILL/OTH STATE/FED CONCURR", "PEORIA CCC", "NORTH LAWNDALE ADULT TRANSITI", "STATEVILLE FARM", "GREENE COUNTY WORK CAMP", "COURT", "PITTSFIELD WORK CAMP", "FOX VALLEY CCC", "BOND", "SOUTHWESTERN IL WORK CAMP", "MENARD R&C", "ELECTRONIC DETENTION", "CLAYTON WORK CAMP", "DIXON SPRINGS BOOT", "DUQUOIN IMPACT INCARCERATION P", "DETAINER", "PAROLE DISTRICTS", "FURLOUGH", "ESCAPE", "DEPT. OF HUMAN SERVICES", "FED/STATE/TRANSFER OTH ST", "WOMENS TREATMENT CENTER", "JAIL", "CONCORDIA") (enumeration)
   21. sex_offender_registry_required: Whether the offender is required to register as a sex offender. One of ("true", "") (boolean)
   22. alias: Aliases, separated by pipe sign OR one of ("", "None Reported") (string OR enumeration)
   23. mittimus: Mittimus ID (string)
   24. class: Class of offender. One of ("4", "2", "3", "X", "1", "M", "U", "A", "B", "C") (enumeration)
   25. count: Count of offenses (?) (integer)
   26. offense: Offense. One of 1576 values. Appears to have been keyed in by hand. (enumeration/string)
   27. custody_date: Date at which offender was taken into custody. (date)
   28. sentence: Duration of sentence in format "X Years Y Months Z Days", where Y and Z may exceed 12 and 31 respectively, OR one of ("DEATH", "LIFE", "SDP") (int[3] OR enumeration)
   29. county: County or "out-of-state". One of ("COOK", "WILL", "WINNEBAGO", "KANE", "DUPAGE", "MADISON", "MACON", "LAKE", "PEORIA", "ST-CLAIR", "CHAMPAIGN", "MCLEAN", "SANGAMON", "KANKAKEE", "VERMILION", "LA SALLE", "TAZEWELL", "ADAMS", "LIVINGSTON", "STEPHENSON", "MCHENRY", "COLES", "WHITESIDE", "JEFFERSON", "MARION", "KENDALL", "ROCK-ISLAND", "KNOX", "HENRY", "DEKALB", "BOONE", "JACKSON", "MONTGOMERY", "MACOUPIN", "SALINE", "FRANKLIN", "LOGAN", "ROCK ISLAND", "CHRISTIAN", "FAYETTE", "CLINTON", "MORGAN", "WILLIAMSON", "JERSEY", "WHITE", "LEE", "MASON", "PIKE", "EDGAR", "RANDOLPH", "WOODFORD", "OGLE", "EFFINGHAM", "FULTON", "GRUNDY", "BOND", "IROQUOIS", "SHELBY", "UNION", "CRAWFORD", "LAWRENCE", "BUREAU", "CLAY", "MCDONOUGH", "DEWITT", "JOHNSON", "PERRY", "WAYNE", "MASSAC", "RICHLAND", "CLARK", "CASS", "HANCOCK", "ALEXANDER", "DOUGLAS", "WABASH", "HAMILTON", "GREENE", "WARREN", "FORD", "EDWARDS", "MONROE", "WASHINGTON", "MOULTRIE", "CUMBERLAND", "MERCER", "MENARD", "CARROLL", "GALLATIN", "SCHUYLER", "JASPER", "BROWN", "CALHOUN", "PIATT", "JO-DAVIESS", "POPE", "HARDIN", "PULASKI", "MARSHALL", "HENDERSON", "ST CLAIR", "PUTNAM", "SCOTT", "STARK", "OUT-OF-STATE", "OUT OF STATE", "JO DAVIESS") OR "" (enumeration or void)
   30. sentence_discharged: Whether the sentence has been discharged. One of ("YES", "NO") (boolean)

8. Missing Attribute Values: See values marked "void" above.

9. Class Distribution:

Statistics for hair:
43305 Black
17371 Brown
2887 Blonde or Strawberry
2539 Gray or Partially Gray
740 Red or Auburn
624 Bald
396 Not Available
209 Salt and Pepper
70 White
7 Sandy
1 Unknown

Statistics for sex:
63409 Male
4740 Female

Statistics for race:
37991 Black
20992 White
8637 Hispanic
235 Asian
104 Amer Indian
94 Unknown
92 Bi-Racial
4 ""

Statistics for eyes:
51714 Brown
7808 Blue
4259 Hazel
2469 Green
1382 Black
420 Not Available
87 Gray
9 Maroon
1 Unknown

Summary Statistics:
median
weight: 185
height: 69
---END SUMMARY---

Image:
![](https://i.postimg.cc/D7pbKD0g/montage-0.jpg)},
keywords= {machine learning, Dataset, images, prisoners},
terms= {},
license= {Public Domain},
superseded= {}
}
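The readme notes that the CSV files are semicolon-delimited, end every line with a trailing semicolon, and mix numeric values with markers such as "N/A". The following is a minimal Python sketch of one way to read such files. It is not part of the dataset or of htmltocsv.py. The header names and rows in `SAMPLE` are made up for illustration (the real headers are in the CSV files themselves), and the sketch assumes that fields contain no unescaped semicolons and that sentences match the "X Years Y Months Z Days" format exactly as described above.

```python
import csv
import io
import re

# Markers the readme lists for missing values.
MISSING = {"", "N/A", "Not Available", "Unknown"}

# Assumed sentence layout, taken from the attribute description.
SENTENCE_RE = re.compile(r"(\d+) Years (\d+) Months (\d+) Days")


def read_rows(text):
    """Parse semicolon-delimited text whose lines end with a trailing semicolon."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = next(reader)
    # The trailing semicolon produces one empty final field; drop it from the header.
    if header and header[-1] == "":
        header = header[:-1]
    # zip() stops at len(header), which also discards the trailing empty field per row.
    return [dict(zip(header, fields)) for fields in reader if fields]


def to_int_or_none(value):
    """Convert a height or weight field to int, or None if it is a missing marker."""
    value = value.strip()
    return None if value in MISSING else int(value)


def parse_sentence(value):
    """Return (years, months, days) or the raw marker such as "LIFE"."""
    value = value.strip()
    match = SENTENCE_RE.fullmatch(value)
    if match:
        return tuple(int(group) for group in match.groups())
    return value


# Hypothetical header names and made-up rows, for illustration only.
SAMPLE = (
    "id;height;weight;sentence;\n"
    "X00001;70;185;3 Years 18 Months 0 Days;\n"
    "X00002;N/A;N/A;LIFE;\n"
)

rows = read_rows(SAMPLE)
for row in rows:
    print(row["id"], to_int_or_none(row["height"]), parse_sentence(row["sentence"]))
```

For a real file, the same function can be applied to the file contents, for example `read_rows(open("csv/person.csv", encoding="utf-8").read())`; the encoding is an assumption. Since months and days may exceed 12 and 31, the tuple is left unnormalized rather than converted to a single duration.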