Nechama Brodie is a South African journalist and researcher. She is the author of six books, including two critically acclaimed urban histories of Johannesburg and Cape Town. She works as the head of training and research at TRI Facts, part of independent fact-checking organisation Africa Check, and is completing a PhD in data methodology and media studies at the University of the Witwatersrand.
Even great programmers will tell you how tricky it can be to find a problem buried in lines and lines and lines of code. But as students, users and consumers of 'data' in the bigger, broader sense, the problems we encounter sometimes exist, or are introduced, before the data even enters the 'machine'. And this is one of the biggest challenges facing the development of computer-based problem solving right now: the quality of the data we are using in our programmes.
Algorithms are, simply, a set of instructions (given to a 'computer') in a specific language. Machine learning (ML) takes this an exponential step further: as part of the algorithm, the 'computer' is instructed to learn from itself and its data. To enable this process, machines need starting data to 'learn' from. Typically, the training data are sets of information that have already been examined, filtered and checked by humans. Computers use these to learn how to process and filter different inputs, and what counts as a 'correct' output, and they are then able to replicate those outputs at a later stage, on new data. Think of those annoying 'captcha' screens you encounter when you have to verify you are not a robot while logging in to a website: those are actually training data, and you are providing input while verifying yourself!
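To make that concrete, here is a minimal sketch (not from the article; the tiny 'spam' dataset and its labels are invented purely for illustration) of how a model learns from human-labelled training data, in Python with scikit-learn:

```python
# A minimal sketch of learning from human-labelled training data.
# The tiny 'spam' dataset below is invented purely for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Training data: examples a human has already examined, filtered and labelled.
messages = [
    "win a free prize now",
    "limited offer, claim your reward",
    "meeting moved to 3pm",
    "can you review the draft report?",
]
labels = ["spam", "spam", "not spam", "not spam"]  # the 'correct' outputs

# The machine learns which word patterns map to which label ...
vectorizer = CountVectorizer()
model = MultinomialNB()
model.fit(vectorizer.fit_transform(messages), labels)

# ... and then replicates that judgement on new, unseen data.
print(model.predict(vectorizer.transform(["claim your free reward now"])))
# -> ['spam']
```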
We know bad data produces bad results. But what if the training data that gets put into an ML programme is flawed, or dirty, or over-cleaned, or biased? Well, as this article in the Harvard Business Review correctly argues, it compromises everything the 'machine' learns, and because of the speed and frequency with which some programmes perform and repeat their functions, even small errors can cascade into large deviations: a bad-data snowball effect.
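A rough sketch of that snowball effect (again my own illustration, not from the HBR piece): train the same classifier on clean labels and on labels where a growing fraction have been flipped at random, then test both on new data. In a typical run, accuracy falls as more of the training labels are corrupted:

```python
# A rough sketch of how noisy training labels degrade what a model learns.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic data standing in for a real, human-labelled dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
for noise in [0.0, 0.1, 0.3]:          # fraction of training labels flipped
    y_dirty = y_train.copy()
    flip = rng.random(len(y_dirty)) < noise
    y_dirty[flip] = 1 - y_dirty[flip]  # corrupt the 'correct' outputs

    model = LogisticRegression(max_iter=1000).fit(X_train, y_dirty)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    print(f"label noise {noise:.0%}: test accuracy {accuracy:.2f}")
```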
This article clearly explains the problems and challenges with the quality of data being used to train machines, and suggests several simple, effective, and essential solutions.