1. Serve your model quickly.
The ultimate measure of success is that the insights generated by your model are useful to someone. The sooner that someone looks at the model output, the sooner they can provide valuable feedback. I wouldn’t be surprised if the vast majority of advanced analytics projects are currently shelved or abandoned because the predictions cannot be integrated into business processes in a meaningful way, or because they predict something that was already known by other means. A quickly-served model will also help you answer your questions around performance, visualization, and the production environment. As a result, you’re less likely to optimize prematurely or over-deliver, and you can ensure that the model continues to exist past the end of the project.
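To make "serving quickly" concrete, here is a minimal sketch of what a first served model could look like, using only the Python standard library. Everything in it is an assumption for illustration: `predict` is a hypothetical placeholder (it just averages its inputs), and the `features` field, the handler name, and the port are made up. The point is only that a model can be put behind an endpoint within minutes, long before it is any good.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical placeholder model: the mean of the inputs stands in for a real prediction."""
    return sum(features) / len(features)


def handle_request(body: bytes):
    """Turn a JSON request body into an (HTTP status, JSON response body) pair."""
    try:
        features = json.loads(body)["features"]  # "features" is an assumed field name
        result = {"prediction": predict(features)}
        status = 200
    except (ValueError, KeyError, TypeError, ZeroDivisionError) as err:
        # Malformed JSON, a missing field, non-numeric values, or an empty list.
        result = {"error": f"bad request: {err!r}"}
        status = 400
    return status, json.dumps(result).encode("utf-8")


class PredictionHandler(BaseHTTPRequestHandler):
    """Answers POST requests with the model output as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_request(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Blocks and serves predictions on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictionHandler).serve_forever()
```

Calling `serve()` and then running `curl -X POST -d '{"features": [1, 2, 3]}' http://127.0.0.1:8000` returns `{"prediction": 2.0}`. Swapping the placeholder for a real model later does not change what the people giving feedback see, which is the reason to get something like this in front of them early.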
2. Data Science projects are inherently iterative.
Data quality usually cannot be determined before modelling, and model experimentation might in turn affect data acquisition and pre-processing. Going through these iterations is often more efficient than trying to ascertain truths about data quality and model choice in advance. Many clients are worried about the quality and quantity of their data and might want to delay the start of a data science project until they’re sure that success is guaranteed. However, the most efficient way to determine data quality is by modelling.
3. Data Science projects are people projects.
Data Science projects are similar to software projects. They benefit from team spirit, improved communication, and a culture that encourages experimentation and learning. Clients are often concerned that they’re not getting the ‘best’ model and the ‘most accurate’ prediction, yet they often forget that the speed and quality of delivery could easily be improved with a few simple measures. In the end, data science projects are people projects. Things like a distraction-free work environment, easy access to an adequate development environment, ergonomic workplaces and hardware, and easy access to data have a bigger impact on success than any particular model choice.
Bonus Rule: Choose technology and tooling to support all of the above.
Three quick rules for successful Data Science projects. Interested readers might want to learn about a structured Data Science approach like CRISP-DM [1] next. [2]
[1] CRoss Industry Standard Process for Data Mining (CRISP-DM)
[2] The original version of this text was published 2018-02-08.