Tuesday, February 17, 2009

Understanding the value of Data Mining

Artificial Intelligence is quite a large area, academically speaking. Truly accessible business applications, however, are not so common. Data mining may be one of them, but as a manager you need to approach it correctly. Some people will tell you that a data warehouse is a prerequisite for taking even the first steps in data mining. Other managers will tell you they have "SQL in their fingers" and can come up with some interesting discoveries given time to experiment. Some lucky miners have access to a large data warehouse and just run a few queries to confirm suspicions or test new theories.

The thing is, data mining is not a hobby; it's a true profession. You'll know the difference between a hobbyist and a professional when you ask them how they think about data mining. Find out whether they think it's about developing a hypothesis and then testing it against the data, or about discovering new truths in the data. True data mining is guided by a business perspective. Let someone from the business tell you where it hurts or where they want to improve, and then go off to find the answers. Are customers leaving? Does the business want to become more efficient in one area? What is the knowledge they lack, rather than the knowledge they want to confirm?

To be efficient, data mining needs a focus. It's easy to tell a company that the only way to even consider starting with data mining is to have a data warehouse. Experience has shown that building such a data warehouse (DWH) is time-consuming and often doesn't deliver the necessary payback, because half of it never gets used.

Weigh these requirements against your business goals. What do you want to achieve? Let an expert go over these goals and work out which data flows are needed. Then simply get those data flows from wherever you can, load them into a database (general ETL) without necessarily modeling them out, especially if you need them only once, derive your conclusions, and move on if it isn't giving you the payback you require.
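To make the "just load it and query it" approach concrete, here is a minimal sketch in Python using only the standard library's csv and sqlite3 modules. Everything specific in it is hypothetical and made up for illustration: the tiny extract, the customers table and its columns, and the helper names load_extract and churn_by_region. The idea it shows is that a one-off business question ("are customers leaving, and where?") can be answered from a throwaway, unmodeled table without first designing a warehouse schema.

```python
import csv
import io
import sqlite3

# Hypothetical extract: in practice this would be whatever file export or
# dump holds the data flow you need. These rows are invented toy data.
RAW_EXTRACT = """customer_id,region,cancelled
1,north,0
2,north,1
3,south,0
4,south,0
5,south,1
6,north,1
"""


def load_extract(conn, text, table):
    """Load a CSV extract as-is into a throwaway table (no schema design).

    The table name is interpolated into SQL, so it must be trusted input.
    Returns the number of rows loaded.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys())
    # Untyped columns are allowed in SQLite, which suits a quick one-off load.
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table} VALUES ({placeholders})",
        [tuple(r[c] for c in columns) for r in rows],
    )
    return len(rows)


def churn_by_region(conn):
    """Answer one business question: which regions lose the most customers?"""
    query = """
        SELECT region,
               AVG(CAST(cancelled AS INTEGER)) AS churn_rate,
               COUNT(*) AS customers
        FROM customers
        GROUP BY region
        ORDER BY churn_rate DESC
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")  # nothing persisted: use it, then discard
load_extract(conn, RAW_EXTRACT, "customers")
for region, rate, n in churn_by_region(conn):
    print(f"{region}: {rate:.0%} of {n} customers cancelled")
conn.close()
```

With the toy data above this prints a churn rate per region, highest first. If the answer turns out not to be worth anything, the in-memory database is simply thrown away, which is exactly the cheap "move on" option described in the text.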

Especially in these times, managers need to focus more on evaluating what pays back and what doesn't. Don't linger: find out which things are promising and eliminate those that just cost money.

Data mining can be very strategic for less established companies, too. Once you have gathered sufficient volumes of data, you can start considering it, but you definitely need expert guidance here; it's not a job the average software engineer can pull off.

Above all, don't rely on anyone offering "neural networks" to tell you things. Make sure to use proper algorithms to smooth and massage your data so that it becomes more interpretable for human beings. Graph it, since that's the best way to visualize complex data sets: one picture is worth more than 5,000 numbers.
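As one illustration of "smooth it, then graph it", here is a minimal Python sketch using a trailing moving average, one of the simplest smoothing techniques (chosen here for illustration, not the only or necessarily the best choice). The function names moving_average and text_chart and the weekly_orders series are hypothetical. To stay dependency-free it draws a crude text bar chart; in practice you would hand the smoothed series to a plotting library such as matplotlib.

```python
from collections import deque


def moving_average(values, window):
    """Trailing moving average: each point is the mean of the last `window` values.

    The first few points average over fewer values (a warm-up period).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    buf = deque()
    total = 0.0
    smoothed = []
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the value that left the window
        smoothed.append(total / len(buf))
    return smoothed


def text_chart(values, width=40):
    """Render values as horizontal bars scaled to the largest value."""
    if not values:
        return ""
    top = max(values)
    return "\n".join(
        f"{i:>3} | {'#' * round(width * v / top)}" for i, v in enumerate(values)
    )


# Hypothetical, noisy weekly order counts (invented for illustration).
weekly_orders = [12, 30, 9, 25, 14, 33, 11, 28, 16, 35]
print(text_chart(moving_average(weekly_orders, window=3)))
```

The raw series jumps up and down every week; the smoothed bars make the gentle upward drift easier to see, which is the kind of human-interpretable view the paragraph above argues for.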

And finally, it's very, very unlikely that any data mining algorithm will tell you: "Please do A in order to achieve B". In general, data mining results require interpretation and understanding, especially an understanding of their limitations.

2 comments:

Andrés Parra said...

Good article. It's worth mentioning that a data warehouse can hurt data mining possibilities if it's designed without data mining in mind.