How I Collected The Data For Our Latest Story On Credit Acceptance

Published October 15, 2018


Illustration: Sam Woolley/GMG

Hi, I’m Ishaan Jhaveri. I’m a data journalist who works with Gizmodo Media Group’s Special Projects Desk. I worked with Ryan Felton on the latest story about subprime lender Credit Acceptance, and the effect its collections cases have had on the people of Detroit. In case you’re a fellow data nerd, here’s how I did it.

I first read PlainSite’s reports on Credit Acceptance Corporation and drew inspiration from their methods (Appendix B), particularly their idea of scraping court records from Detroit’s 36th District Court as a way of obtaining information about Credit Acceptance Corporation. Though they made their code available, I wanted to collect more data on each case than they had in their analysis, so I decided to write fresh code and collect all my data from scratch.

I then:

1. Scraped Detroit’s 36th District Court’s public website to obtain ALL the available cases from 1999-2017 and a majority of the available cases from 1995-1998 as HTML files.
2. Parsed each of the HTML files I obtained in step 1 to collect specific data for each case.
3. Output this data to this spreadsheet. You can download it here.
4. Used the spreadsheet to make the two graphs that you can see in the story.

Step 1

I knew from the PlainSite report that “[case] numbers in [Detroit’s] 36th District are of the form XXYYYYYY, where XX represents the two-digit year (for example, 99 for 1999 or 17 for 2017) and YYYYYY represents a six-digit serial number starting at 100,000 for each year.”
Like them, I requested every single case number for each year from 1999-2017 until I reached cases filed on or around the last business day in December and began to observe invalid case number errors. For 1995-1998 the data is incomplete: there are publicly available cases, filed after the last case I collected for each of those years, that I did not request and that are not in the dataset. The reason is that the 36th District Court’s website is considerably slower when you search for cases filed before 1999, so the time I had allotted wasn’t enough to collect the complete data for these years.
For these requests, I used Puppeteer, a Node.js library that drives a headless Chrome browser. My code is available here.
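
For readers who want to see the enumeration idea in code, here is a minimal Python sketch (not the Puppeteer code linked above). It builds case numbers in the XXYYYYYY form described in the PlainSite quote and walks the serial numbers upward. The `is_valid` callback is a hypothetical stand-in for the actual request to the court website, and stopping after `max_misses` consecutive invalid numbers is an assumption made for illustration, not a rule taken from the original scraper.

```python
from typing import Callable, Iterator, List


def case_numbers(year: int, start_serial: int = 100_000) -> Iterator[str]:
    """Yield case numbers of the form XXYYYYYY for one filing year."""
    prefix = f"{year % 100:02d}"  # two-digit year, e.g. 99 or 17
    serial = start_serial         # serials start at 100,000 each year
    while serial <= 999_999:      # keep the serial at six digits
        yield f"{prefix}{serial:06d}"
        serial += 1


def collect_year(
    year: int,
    is_valid: Callable[[str], bool],  # hypothetical: True if the site returns a case
    max_misses: int = 25,             # assumed stopping threshold
) -> List[str]:
    """Try case numbers in order; stop after max_misses invalid ones in a row."""
    found: List[str] = []
    misses = 0
    for number in case_numbers(year):
        if is_valid(number):
            found.append(number)
            misses = 0  # a valid case resets the run of invalid numbers
        else:
            misses += 1
            if misses >= max_misses:
                break
    return found
```

Requiring a run of invalid numbers, rather than stopping at the first one, guards against a single gap in the serial sequence ending the crawl early.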

Step 2

I used this code to parse each of the files.
I made the following design decisions:

If two defendants had the same first name, last name and middle initial, I assumed they were the same person. Therefore in the published data I report a “total number of defendants” in any category and “number of unique defendants” in any category. Total number may count a given defendant twice if there is more than one case against this defendant. Unique number will only count a given defendant once, no matter how many cases there are against this person.
“Judgment satisfied cases” (referred to in the code as closed cases) are cases in which the defendant in question has satisfied the judgments against them. “Open cases” are cases in which the defendant in question has not satisfied the judgments against them.
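
The difference between the "total" and "unique" defendant counts can be sketched in a few lines of Python. This is an illustration of the matching rule described above (same first name, last name and middle initial), not the parsing code itself. The `Defendant` type is hypothetical, and normalizing case and whitespace before comparing is an added assumption.

```python
from typing import Iterable, NamedTuple, Tuple


class Defendant(NamedTuple):
    first: str
    middle: str  # full middle name, an initial, or an empty string
    last: str


def identity_key(d: Defendant) -> Tuple[str, str, str]:
    """Key used to decide whether two records are the same person."""
    middle_initial = d.middle.strip()[:1].upper()
    # Assumption: comparison ignores letter case and surrounding whitespace.
    return (d.first.strip().upper(), middle_initial, d.last.strip().upper())


def count_defendants(defendants: Iterable[Defendant]) -> Tuple[int, int]:
    """Return (total, unique): total counts every record, unique counts each person once."""
    rows = list(defendants)
    unique_keys = {identity_key(d) for d in rows}
    return len(rows), len(unique_keys)
```

With this rule, a person sued in two separate cases adds two to the total and one to the unique count.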

Step 3

Here I will explain what each sheet in the released spreadsheet represents and what the various columns denote:

Sheet 1, Metadata. The rows in this sheet represent data for cases filed in a given year.

1. “Year” is the year the given cases were filed.
2. “Date of First Case” is the date of the first case filed that year.
3. “ID of First Case” is the ID of the first case filed that year. It is the ID of the case whose date is “Date of First Case”.
4. “Date of Last Case” is the date of the last case filed that year among the cases included in the dataset. For the years 1999-2017 this is the date of the actual last publicly available case filed that year. For the years 1995-1998 the data is incomplete as discussed above, so there are publicly available cases that were filed after this date; they just weren’t included in the dataset.
5. “ID of Last Case” is the ID of the last case filed that year among the cases included in the dataset. Since it is the ID of the case whose date is “Date of Last Case”, the same restrictions on the data as above apply.
6. “Number of Overall Cases” is the total number of cases filed in Detroit’s 36th District Court that year.
7. “Number of Cases in which Credit Acceptance Corporation was the Plaintiff” is the total number of cases filed in Detroit’s 36th District Court that year in which Credit Acceptance Corporation was the plaintiff.
8. “Number of Defendants in these cases” is the total number of defendants across all the cases in statistic 7. Note that this statistic is valuable because many cases brought by Credit Acceptance Corporation have more than one defendant.
9. “% of Cases in which Credit Acceptance Corporation was the Plaintiff” is statistic 7’s percentage of statistic 6.
10. “Number of Defendants who were able to satisfy the Judgments against them” is the number of defendants across the cases Credit Acceptance Corporation brought in Detroit’s 36th District Court that year who were able to satisfy the judgments against them.
11. “% of Defendants who were able to satisfy the Judgments against them” is statistic 10’s percentage of statistic 8.
12. “Number of Defendants who have still not been able to satisfy the Judgments against them” is the number of defendants across the cases Credit Acceptance Corporation brought in Detroit’s 36th District Court that year who have still not been able to satisfy the judgments against them (as of September 2018).
13. “Average Number of Years Taken for defendants to satisfy the Judgments against them” is the average amount of time taken for defendants to satisfy the judgments against them across all cases filed this year. Naturally, it only applies to those defendants who were able to satisfy the judgments against them.
14. “Discounted Cases” are the IDs of those case files that were included in the data but discounted in the data analysis. Usually a case is discounted because its ID suggests it should have been filed in a given year (since case IDs encode the year of filing), but the actual date of filing reported in the body of the case shows it to be filed in a different year. There are very few of these cases, so discounting mismatches like this does not have any significant bearing on the data analysis.
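
The "Discounted Cases" check amounts to comparing the year encoded in the case ID with the year of the filing date in the case body. A rough Python sketch follows. The function names are hypothetical, and expanding the two-digit prefix with a pivot at 95 is an assumption that only suits this dataset's 1995-2017 range.

```python
from datetime import date


def id_year(case_id: str) -> int:
    """Expand the two-digit year prefix of a case ID to four digits."""
    yy = int(case_id[:2])
    # Assumption: prefixes 95-99 mean 1995-1999, everything else means 20XX.
    return 1900 + yy if yy >= 95 else 2000 + yy


def is_discounted(case_id: str, date_filed: date) -> bool:
    """True when the ID's year disagrees with the filing date in the case body."""
    return id_year(case_id) != date_filed.year
```

A case with ID 17100005 whose body reports a filing date in December 2016 would be flagged by `is_discounted`, while the same ID with a January 2017 filing date would not.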

Sheets 2 and 3, Data about individual cases. Sheet 2 represents data about cases in which defendants were able to satisfy the judgments against them. Sheet 3 represents data about cases in which defendants have not yet been able to satisfy the judgments against them (as of September 2018).

“Case ID” is the ID of this case. Note that a given case ID will occur as many times as the number of defendants in that case.
“Date Filed” is the date this case was filed in Detroit’s 36th district court.
“Date Judgment was Satisfied” (for closed cases) is the date Credit Acceptance Corporation’s case against this defendant closed because they were able to satisfy the judgments against them. “Date of last update to case” (for open cases) is the last time an update to this case was logged. As of the logging of that update, the case was still not closed.
“How Many Years it Took for Judgment to be Satisfied” is the time between “Date Filed” and “Date Judgment was Satisfied”. “Years case has been open” is the time between “Date Filed” and “Date of last update to case”.
“Defendant” is the name of the defendant. The names have been anonymized and replaced with numbers to protect the privacy of the defendants.
“Defendant’s Attorney” is the name of the defendant’s attorney. Very few defendants had attorneys so this field is blank for most of the entries.
“Default Judgment Dates”. This field denotes the date a “default judgment” (cases when the car buyer didn’t show up to defend themselves) was filed against the defendant. There can be multiple of these per case. If there are multiple, they are printed on successive lines and the information about the next case will not start till the line after all the default judgment dates have been printed.
“How Many Years After beginning of case/previous default judgment this default judgment was” is the time between “Date Filed” and a given “Default Judgment Date”. If there are multiple default judgments, this column denotes the time between the date of the current default judgment (in the column to the left of the same row) and the date of the previous default judgment (in the column to the left of the row above).
“Default Judgment Amounts” is the amount this default judgment against the defendant was for. In other words, it is the debt that this default judgment places upon this defendant.
“Total Default Judgment Amount for this Defendant” is the sum of all the “Default Judgment Amounts” for this defendant. In most cases there is only one default judgment, so for most cases the total will just be equal to the only “Default Judgment Amount”.
“Dates Bankruptcy was Filed for”. This field denotes the date this defendant filed for bankruptcy. If there are multiple dates, they are printed on successive lines and the information about the next case will not start till the line after all the bankruptcy filing dates have been printed. The data for this statistic is not totally precise, so there may be defendants that filed for bankruptcy that are not reported here.
“How Many Years After beginning of case/previous bankruptcy filing bankruptcy was filed for” is the time between “Date Filed” and a given “Date Bankruptcy was Filed for”. If this defendant filed for bankruptcy more than once, this column instead denotes the time between the date of the current bankruptcy filing (in the column to the left of the same row) and the date of the previous bankruptcy filing (in the column to the left of the row above).
“Dates Bankruptcy Stays were granted”. This field denotes the date the court put a stay on the case because the defendant had indicated bankruptcy (it is unclear whether stays were granted only to defendants who FILED for bankruptcy). If there are multiple stays, they are printed on successive lines and the information about the next case will not start till the line after all the bankruptcy stay dates have been printed.
“How Many Years After beginning of case/previous bankruptcy stay bankruptcy stays were granted” is the time between “Date Filed” and a given “Date Bankruptcy Stay was granted”. If this case had multiple bankruptcy stays at different times, this column instead denotes the time between the date of the current bankruptcy stay (in the column to the left of the same row) and the date of the previous bankruptcy stay (in the column to the left of the row above).
“Judges” denotes all the judges that adjudicated this case through its lifetime.
“Plaintiff’s Name” is always either “Credit Acceptance Corporation” or “Credit Acceptance Corp.”. This column was included as a check, in case any cases filed by plaintiffs other than Credit Acceptance Corporation somehow got into the data.
“Plaintiff’s Attorney” is the name of Credit Acceptance Corporation’s attorney in the given case.
“Dates Possible Bankruptcy was Filed for”. As discussed above, the data for bankruptcy filing is not totally precise. In some rare cases with multiple defendants, the text “Notice of Bankruptcy Filed” is in the case notes, but it is unclear which defendant this notice has been filed for. In these cases, I added a “possible” bankruptcy filing for each of the defendants in this case with the adjoining date. So, when a defendant has a “Date Possible Bankruptcy was Filed for” it means it is possible but not certain that they filed for bankruptcy on that date.
“How Many Years After beginning of case/previous possible bankruptcy possible bankruptcy was filed for” is the time between “Date Filed” and a given “Date Possible Bankruptcy was Filed for”. If this defendant (“possibly”) filed for bankruptcy more than once, this column instead denotes the time between the date of the current “possible” bankruptcy filing (in the column to the left of the same row) and the date of the previous “possible” bankruptcy filing (in the column to the left of the row above).
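
The "How Many Years After..." columns for default judgments, bankruptcy filings and bankruptcy stays all follow the same pattern: the first interval is measured from “Date Filed”, and each later one from the previous event of the same kind. Here is a small Python sketch of that pattern. The 365.25-day year, the rounding to two decimals and the sorting of the dates are assumptions for illustration; the spreadsheet may compute years differently.

```python
from datetime import date
from typing import List

DAYS_PER_YEAR = 365.25  # assumed conversion from days to years


def years_between(start: date, end: date) -> float:
    """Elapsed time from start to end, expressed in years."""
    return (end - start).days / DAYS_PER_YEAR


def interval_column(date_filed: date, event_dates: List[date]) -> List[float]:
    """First interval runs from the filing date, later ones from the previous event."""
    intervals: List[float] = []
    previous = date_filed
    for current in sorted(event_dates):  # assumption: events are taken in date order
        intervals.append(round(years_between(previous, current), 2))
        previous = current
    return intervals
```

The same `years_between` helper also covers “How Many Years it Took for Judgment to be Satisfied” and “Years case has been open”, which are single spans starting at “Date Filed”.
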
Both sheets 2 and 3 have aggregate and average statistics at the bottom. Most of these are obvious from the name, but I explain the following in case they are confusing:

“Average Amount Per Default Judgment” is the average amount that Credit Acceptance Corporation secures in judgments against defendants. This is the average across all default judgments regardless of whether they are secured against the same defendant or not. This average only takes into account cases in which there were default judgments.
“Average Amount of Default Judgments Against Single Defendant” is the average amount that Credit Acceptance Corporation secures in judgments against a given defendant. This is the average across all defendants in this category. This average takes all cases in this category into account, whether or not there were default judgments in them. When a case doesn’t have a default judgment, it is treated as a case with $0 in default judgments against the given defendant.
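
The two averages differ only in what they divide by: the number of default judgments, or the number of defendants (including those with no default judgment, who count as $0). A short Python sketch makes the distinction concrete. The dictionary layout, keyed by anonymized defendant number, is a hypothetical structure chosen for the example.

```python
from typing import Dict, List


def average_per_judgment(judgments: Dict[str, List[float]]) -> float:
    """Average over every individual default judgment; defendants with none are skipped."""
    amounts = [amount for per_defendant in judgments.values() for amount in per_defendant]
    return sum(amounts) / len(amounts) if amounts else 0.0


def average_per_defendant(judgments: Dict[str, List[float]]) -> float:
    """Average of each defendant's total; a defendant with no judgments counts as $0."""
    totals = [sum(per_defendant) for per_defendant in judgments.values()]
    return sum(totals) / len(totals) if totals else 0.0
```

For example, with four defendants whose judgments are `[1000, 2000]`, `[3000]`, `[]` and `[]`, the per-judgment average divides $6,000 by three judgments, while the per-defendant average divides the same $6,000 by four defendants.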

Step 4

Sheet 4 has the data that went into the two graphs for this story. The graph that plots number and percentage of Credit Acceptance Corporation cases from 2007-2017 was made using Microsoft Excel and the graph that plots information about the open cases was made using this code.

Feel free to drop any questions in the comments, and thanks for reading.

Update: This post has been updated to prominently credit PlainSite for their idea of scraping court records from Detroit’s 36th district court as a way of obtaining information about Credit Acceptance Corporation.
