Data comes in from an increasing number of sources these days.

For decades, data stores were primarily built by good old data entry.

But today we cull data from a range of sources outside traditional database platforms, including IoT devices, social media feeds, and email.

That means the data isn’t always free of errors, blank spaces, or junk characters, and it may be inconsistently formatted from source to source.

If you work with data, at some point you will have the unenviable task of cleaning it.

Data cleaning, or cleansing, is the process of fixing or removing syntax errors, typographical errors, and broken or fragmented records; removing duplicate records; and reformatting data so it’s easier to work with.
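
To make those steps concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not a prescribed method: the record layout (dictionaries with `name` and `email` fields), the helper names `clean_value` and `clean_records`, and the choice of email as the duplicate key are all assumptions made for the example.

```python
import re
import unicodedata


def clean_value(value: str) -> str:
    """Normalize a single text field (hypothetical helper)."""
    # Normalize Unicode so visually identical strings compare equal.
    value = unicodedata.normalize("NFKC", value)
    # Drop junk characters: anything non-printable that is not whitespace.
    value = "".join(ch for ch in value if ch.isprintable() or ch.isspace())
    # Collapse runs of whitespace into one space and trim both ends.
    return re.sub(r"\s+", " ", value).strip()


def clean_records(records):
    """Clean a list of dict records and drop broken or duplicate ones."""
    seen_emails = set()
    cleaned = []
    for record in records:
        name = clean_value(record.get("name") or "")
        # Assumption: emails are compared case-insensitively.
        email = clean_value(record.get("email") or "").lower()
        # Treat a record with a missing required field as broken.
        if not name or not email:
            continue
        # Assumption: the email address identifies a duplicate record.
        if email in seen_emails:
            continue
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


raw = [
    {"name": "  Ada   Lovelace ", "email": "ADA@Example.com"},
    {"name": "Ada Lovelace", "email": "ada@example.com "},
    {"name": "Grace\x00 Hopper", "email": "grace@example.com"},
    {"name": "", "email": "nobody@example.com"},
    {"name": "Alan Turing"},
]
print(clean_records(raw))
```

With this sample input, the second record is dropped as a duplicate of the first, and the last two are dropped as incomplete. Real data usually needs rules tailored to each source, so the checks here are a starting point rather than a complete solution.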
