Using the missingno package to visualize missing data


Once your data is safely localized, one of the first and most important things to do at the start of any data analytics project is to get the lay of the land. Data is fundamentally messy: full of oddities, noise, and incomplete entries. Getting a handle on this weirdness is an essential first step toward getting anything actually done with it, and as much as 80% of project time might end up sunk into it.

To help with that process I built missingno, a missing data visualization tool and the subject of this post. The package (still can't believe the name was never taken) exposes a series of top-level functions that take a pandas DataFrame as input and give tuned-up data nullity visualizations as output:

Data. Data. Data.
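
To give a feel for the workflow, here is a minimal sketch. The sample DataFrame and the helper names `make_sample_frame` and `plot_nullity` are made up for illustration and are not part of the package. The one missingno call used, `msno.matrix`, draws the nullity matrix for a DataFrame. The imports for missingno and matplotlib sit inside the plotting helper, so the data-building part runs without them installed.

```python
import pandas as pd


def make_sample_frame(n_rows=100):
    """Build a small, made-up DataFrame with holes punched in two columns."""
    df = pd.DataFrame({
        "id": range(n_rows),
        "age": [float(20 + i % 50) for i in range(n_rows)],
        "city": ["NYC"] * n_rows,
    })
    # Blank out every 5th age and every 4th city to simulate missing entries.
    df.loc[::5, "age"] = float("nan")
    df.loc[::4, "city"] = None
    return df


def plot_nullity(df):
    """Draw the missingno nullity matrix for df (needs missingno and matplotlib)."""
    import matplotlib.pyplot as plt
    import missingno as msno

    msno.matrix(df)  # one column per field; gaps mark missing values
    plt.show()


df = make_sample_frame()
# Plain pandas count of nulls per column, for comparison with the plot.
print(df.isnull().sum())
```

Calling `plot_nullity(df)` then shows the same gaps visually, which tends to be easier to scan than a table of counts once the frame has more than a handful of columns.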

Head over to the GitHub repository to learn more.

— Aleksey