From Black Box to Tax Audit Memo: Feature Importance in XGBoost

Posted on: June 1, 2015

I came to gradient‑boosted trees as a tax lawyer working on audits, not as a data‑science purist. At Deloitte’s tax audit department, I help very large companies manage complicated tax audits: explaining anomalies in their accounting and reconciling them with tax declarations. Finding anomalies is easy; explaining them is harder, because you need a pattern and a story to back them up. So model accuracy wasn’t the roadblock; explainability was. Controllers and inspectors don’t accept “because the model said so.” They want an audit trail: which variables pushed this entry into the suspect bucket, and why now?

To make that conversation possible, I built and shipped feature‑importance into the open‑source XGBoost project and made it easy to visualize. Overnight, a forest became a ranked, defensible list of signals—something you can paste into a memo, discuss in committee, and act on.

Want the longer version I used with finance and audit teams? Here’s the deck on SlideShare: Feature Importance Analysis with XGBoost in Tax Audit.


The contribution that changed the conversation

What it gives you. After training, you press one button and get a sorted table and bar chart of what drove the model. No spelunking through dumps, no bespoke scripts—just a ranking you can take to an audit. Because it’s open source, the logic is visible, reproducible, and reviewable by anyone on the engagement.
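To make the shape of that output concrete, here is a minimal sketch in plain Python. It does not call XGBoost at all: the feature names and raw scores are hypothetical placeholders standing in for whatever a trained model reports, and `rank_features` and `render_report` are illustrative helpers, not library functions. It only shows the idea of turning per-feature scores into a sorted table with a text bar chart that can be pasted into a memo.

```python
def rank_features(scores):
    """Normalize raw importance scores to shares and sort them, largest first."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must sum to a positive number")
    shares = {name: value / total for name, value in scores.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


def render_report(ranked, width=40):
    """Render the ranked list as a plain-text table with one bar per feature."""
    lines = []
    for position, (name, share) in enumerate(ranked, start=1):
        bar = "#" * round(share * width)
        lines.append(f"{position:>2}. {name:<24} {share:6.1%} {bar}")
    return "\n".join(lines)


# Hypothetical raw scores (assumed here to be total gain per feature).
raw_scores = {
    "vat_rate_mismatch": 120.0,
    "days_to_period_end": 60.0,
    "counterparty_country": 20.0,
}

ranked = rank_features(raw_scores)
print(render_report(ranked))
```

Normalizing to shares keeps the table readable for non-technical reviewers: each row answers "how much of the model's total signal came from this variable?"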

How to read it like an auditor.

Why it mattered in tax audit.
This turned a black‑box model into a short, actionable list:


Why I insisted on doing it open source


The companions I used, kept short and pragmatic

All of these were built and polished in the open—raised as issues, reviewed as PRs, and improved by maintainers and users who cared about turning accuracy into accountable decisions.


A checklist I gave finance teams

  1. Lead with Gain; use Frequency/Cover for corroboration.
  2. Think like a lawyer: a strong signal without a plausible tax mechanism is a prompt to investigate data quality or interactions—not a conclusion.
  3. Cross‑check structure: top‑ranked features should appear early (near the root) or often in the trees; if not, look for interactions.
  4. Close the loop: convert the top drivers into sample plans, document requests, and control improvements.
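The first and third checklist items can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the XGBoost implementation: it assumes Gain, Cover, and Frequency are each reported as a feature's share of the total across all splits, the split records are made‑up values, and the thresholds in `flag_for_interaction_check` (a hypothetical helper) are arbitrary choices you would tune for your own engagement.

```python
from collections import defaultdict

# One record per split in the forest: (feature, gain, cover, depth).
# All names and numbers are hypothetical, invented for illustration.
splits = [
    ("vat_rate_mismatch", 50.0, 1000.0, 0),
    ("vat_rate_mismatch", 30.0, 400.0, 1),
    ("intercompany_flag", 40.0, 50.0, 4),
    ("days_to_period_end", 15.0, 600.0, 1),
    ("counterparty_country", 5.0, 1000.0, 0),
]


def importance_table(splits):
    """Aggregate splits per feature and normalize each metric to a share."""
    gain, cover, freq = defaultdict(float), defaultdict(float), defaultdict(int)
    min_depth = {}
    for feature, g, c, depth in splits:
        gain[feature] += g
        cover[feature] += c
        freq[feature] += 1
        min_depth[feature] = min(depth, min_depth.get(feature, depth))
    total_gain = sum(gain.values())
    total_cover = sum(cover.values())
    total_freq = sum(freq.values())
    rows = [
        {
            "feature": f,
            "gain": gain[f] / total_gain,
            "cover": cover[f] / total_cover,
            "frequency": freq[f] / total_freq,
            "min_depth": min_depth[f],
        }
        for f in gain
    ]
    # Checklist item 1: lead with Gain.
    return sorted(rows, key=lambda row: row["gain"], reverse=True)


def flag_for_interaction_check(rows, top_n=2, max_depth=1, min_frequency=0.25):
    """Checklist item 3: flag top-gain features that appear neither early nor often."""
    flagged = []
    for row in rows[:top_n]:
        appears_early = row["min_depth"] <= max_depth
        appears_often = row["frequency"] >= min_frequency
        if not (appears_early or appears_often):
            flagged.append(row["feature"])
    return flagged


table = importance_table(splits)
for row in table:
    print(row)

flagged = flag_for_interaction_check(table)
print("Check for interactions:", flagged)
```

In this toy data, `intercompany_flag` ranks second by Gain but is used in only one split, deep in a tree, and covers few observations. That is the pattern worth a second look: its signal may only exist in combination with the features split on above it.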