Robust Models and Analyses – Defining Good (or Meh!), Better, Best

Python is everywhere these days. Universities crank out Data Science majors and minors. Middle school kids are learning the language. There is a groundswell of coding as a generic skill for non-software professionals in engineering, scientific, business, journalism and even liberal arts disciplines. That’s great, but when it comes to building models and doing important analyses with the language, I observe three distinct levels of robustness. I group projects into these levels based on how they are organized (aka “architected”) and whether they come with re-runnable testing, which is standard in the software industry for good reason. If you are learning Python, you can use my distinctions as a roadmap and ensure you have the top two levels in your repertoire. If you are a manager of people who create data products and models for decision making (or if you hire consultants!), it should be a priority for these analyses to be curated, validated and maintainable/extensible. In practice, the quality on these three counts varies wildly and, honestly, tends towards the low end of the scale.

I recently created a workshop training called Practical Python for Modelers for an audience of R&D engineers and scientists ranging from Python beginners to computer modeling experts. I am reapplying it in collaboration with Dr. Will Hartt and colleagues at University of Delaware Chemical Engineering. We seek to educate highly technical undergrad and graduate students for whom data science is outside their immediate domain. The slides and training data are available at the link.

A follow-up Tech Notes blog post goes into more detail about what distinguishes the three levels of Python styles/architectures. Here are definitions of the three robustness objectives:

  • Robust models and analyses are curated. Another knowledgeable human can look at the files and not only repeat the analysis but extend it without the author needing to be present.
  • Robust models and analyses are validated. They come with re-runnable test cases consisting of mocked up data proving correctness of calculations, data cleaning steps and output generation.
  • Robust models are maintainable and extensible. They are written in a modular way, so that they are easy to build on and easy to reapply.
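
To make the "validated" and "maintainable" objectives a bit more concrete, here is a minimal sketch of what they can look like in Python. It is not taken from the workshop materials; the function names (clean_readings, mean_reading) and the mocked-up readings are hypothetical, chosen only for illustration. The idea is that each step lives in its own small function, and each function has a test that can be re-run at any time against made-up data with a known answer.

```python
"""Illustrative sketch: modular steps plus re-runnable tests on mocked-up data.

All names below are hypothetical examples, not from any real project.
"""
import math


def clean_readings(readings):
    """Data cleaning step: drop missing (None) and non-finite values."""
    return [r for r in readings if r is not None and math.isfinite(r)]


def mean_reading(readings):
    """Calculation step: average the cleaned readings."""
    cleaned = clean_readings(readings)
    if not cleaned:
        raise ValueError("no valid readings to average")
    return sum(cleaned) / len(cleaned)


def test_clean_readings_drops_bad_values():
    # Mocked-up data with one missing value and one NaN
    mocked = [1.0, None, float("nan"), 3.0]
    assert clean_readings(mocked) == [1.0, 3.0]


def test_mean_reading_on_mocked_data():
    # Known answer: the two valid readings, 1.0 and 3.0, average to 2.0
    mocked = [1.0, None, float("nan"), 3.0]
    assert math.isclose(mean_reading(mocked), 2.0)


if __name__ == "__main__":
    # Re-run the checks directly, without any test framework
    test_clean_readings_drops_bad_values()
    test_mean_reading_on_mocked_data()
    print("all checks passed")
```

If pytest is installed, running pytest on this file should pick up the test_ functions as well. Because cleaning and calculation are separate functions, someone extending the analysis can swap in a different cleaning rule and re-run the same tests to see what changed.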

Look for these things in what you generate. If the work is done externally for you, ask consultants to write them into proposal deliverables along with the direct, decision-making models and analyses! You will not be sorry.