How to be competitive in the modern workplace? Part V


The Internet is great, because you can find a lot of resources on it to help with your work. But if you rely only on Google to do your work, then you must be careful.

Last year, I sat in on a Big Data seminar hosted by IBM in Sydney. It was a great seminar with a few workshops, and the products offered by IBM were great. I bumped into my PhD student at that function. He works full time as a data analyst while trying to finish his PhD thesis on a part-time basis.

After a talk about using big data to develop a smart model, I had a chat with my PhD student over coffee during a break. I asked him: what do you think of the talk (more specifically, the model)? He said the concept was good but the model was a bit ‘dodgy’. Trying to ‘test’ him a bit, I asked: why? He said that the model was just a regression model, but it had 115 variables in it. Either that guy had over-fitted the model or he didn’t know how to use a regression model properly.

My student was right (sigh... at least he knows what’s going on); there was something ‘fishy’ there.

First of all, if you need 115 variables to fit a regression model, then you probably shouldn’t be using this model. Second, judged by model performance alone, it was indeed a good model. However, that’s not a ‘correct’ use of a regression model: with that many variables, a regression can fit the noise in the data it was built on, so a good fit by itself doesn’t tell you the model will hold up on new data. Third, no one pointed out this flaw during the talk. I guess people were either trying to be polite or they didn’t understand it either! Probably a bit of both.
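
If you want to see why a nice-looking fit can be misleading, here is a minimal sketch in Python. It is not the speaker's model or data: everything in it is simulated, and the sample sizes, coefficients and noise level are my own assumptions. It fits an ordinary least-squares regression with 115 variables, of which only the first 3 actually matter, and compares the R-squared on the data used for fitting with the R-squared on fresh data.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_ols(X, y):
    """Ordinary least squares with an intercept column prepended."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def simulate(n_train=150, n_test=1000, n_vars=115, seed=0):
    """Return {model name: (train R^2, test R^2)} on simulated data.

    Assumption: only the first 3 of the n_vars variables affect y;
    the remaining ones are pure noise.
    """
    rng = np.random.default_rng(seed)
    n = n_train + n_test
    X = rng.normal(size=(n, n_vars))
    y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=2.0, size=n)

    X_tr, X_te = X[:n_train], X[n_train:]
    y_tr, y_te = y[:n_train], y[n_train:]

    results = {}
    for name, cols in (("all_vars", slice(None)), ("first_3", slice(0, 3))):
        coef = fit_ols(X_tr[:, cols], y_tr)
        results[name] = (
            r_squared(y_tr, predict(coef, X_tr[:, cols])),
            r_squared(y_te, predict(coef, X_te[:, cols])),
        )
    return results


if __name__ == "__main__":
    for name, (train_r2, test_r2) in simulate().items():
        print(f"{name}: train R^2 = {train_r2:.2f}, test R^2 = {test_r2:.2f}")
```

On the fitting data, the model with all the variables can never score worse than the 3-variable one, because the smaller model is nested inside it. The comparison that matters is the one on the held-out data, and that is the number a talk should show.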

To be competitive, you must know what you are doing. Not because this is what the Google results tell you, but because you truly understand what you are producing. A good data scientist knows that choosing the right model matters, because that is how you avoid a ‘garbage-in-garbage-out’ situation. An ordinary data scientist only pays attention to whether the results look nice and whether the model has a fancy name. I’d suggest that you become the first type; then you will be truly competitive in your workplace.

