Good, Better, Best For Python Projects

I blogged about my observation that Python-coded models and analyses can be grouped into three robustness levels by architecture and macroscopic coding arrangement. Here is a more detailed discussion of the three levels, which are also taught in a Practical Python for Modelers short course.

The 3-level scale is based on how a project does in three areas: First, is the model or analysis “curated” for the developer and others to understand, inspect and review? Second, are the data transformations and calculations validated with re-runnable, coded tests to build confidence in correctness over the useful life of the model? Finally, are solutions maintainable and extensible by the original developer and colleagues or successors?

Level 1 is what I call the “ad hoc Jupyter notebook approach.” It will garner the highest grade in Data Science coursework and is the “I claim Python expertise on LinkedIn” threshold. Level 1 notebooks will not be questioned for models built for personal use at a company. It is the go-to for short, linear, one-time analyses and demos. It leaves a lot to be desired beyond that, though, and one-time analyses have a funny habit of morphing into permanence.

Level 1 code is rarely well curated and is difficult for anyone else to re-use. From personal experience, its utility breaks down as data sources and analyses grow more complex. Level 1 analyses cannot practically be validated with a re-runnable test suite. Usually, Level 1 “validation” consists of printing out a result and inspecting it visually, or performing quick hand calculations: “Yep. That looks plausible. Move on.” That leaves a high risk of errors, both initially and especially if the code is re-used later.

What I call Level 2 is my go-to approach for client-requested, one-time analyses and for exploratory work. Level 2 work is still [usually] based in a Jupyter notebook for convenience, but use cases get organized into what I call procedures: groupings of single-action functions that, when executed in order, perform some task or data transformation.

I use the diagram to illustrate a Level 2 and 3 “architecture.” The left block represents a model’s user entering some inputs and executing a use case. The right-most functions are single-action: easy to debug and, at least at Level 3, validated with re-runnable pytest tests. The “procedures” in the middle are procedural functions that call the elemental ones sequentially to do the work as shown in example code below. It is arbitrary what defines a “procedure,” but they should generally be single-topic and independent (e.g. “Transform the ingredients DataFrame into MKS units and add calculated columns” or “Create a rows/columns summary of business unit sales and revenue by calendar month”). Delivering the user-facing use case consists of performing one or more of these more abstract “procedures.” While there are other architectural frameworks out there, I find that this one works well for more than 90% of user-facing models and analyses.
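As a minimal sketch of this arrangement, the snippet below follows the “ingredients” example with one procedure and two elemental functions. The function names, column names and data are illustrative assumptions, and plain lists of dictionaries stand in for a DataFrame to keep the sketch dependency-free.

```python
# Hypothetical example: names, columns and data are illustrative assumptions.
LB_TO_KG = 0.45359237  # kilograms per avoirdupois pound (exact by definition)


# --- Elemental, single-action functions (right side of the diagram) ---
def convert_mass_to_kg(rows):
    """Return new rows with 'mass_lb' replaced by 'mass_kg'."""
    converted = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != "mass_lb"}
        new_row["mass_kg"] = row["mass_lb"] * LB_TO_KG
        converted.append(new_row)
    return converted


def add_total_cost(rows):
    """Return new rows with a calculated 'total_cost' column."""
    return [{**row, "total_cost": row["mass_kg"] * row["cost_per_kg"]} for row in rows]


# --- Procedure (middle of the diagram): calls elemental functions in order ---
def ingredients_procedure(rows):
    """Transform ingredient rows into MKS units and add calculated columns."""
    rows = convert_mass_to_kg(rows)
    rows = add_total_cost(rows)
    return rows


# --- Use case (left side): the user supplies inputs and runs the procedure ---
ingredients = [
    {"name": "flour", "mass_lb": 10.0, "cost_per_kg": 2.0},
    {"name": "sugar", "mass_lb": 2.0, "cost_per_kg": 3.0},
]
result = ingredients_procedure(ingredients)
```

Because each elemental function does one thing and returns new rows rather than modifying its input, any of them can be called and inspected on its own in a notebook cell.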


The structure makes troubleshooting easy, and it makes it possible to maintain and extend the model by refactoring individual functions, splicing new functions into a procedure, or adding new procedure functions and their right-side collections of elemental ones. At Level 3, every such upgrade can be followed by re-running the test suite to confirm that nothing got broken.

Level 3 is what I call Tuhdoop, referring to a combination of Test-Driven Design (TDD) and Object Oriented Programming (OOP). I coined the term and wrote about Tuhdoop and its synergistic benefits in this GitHub tutorial. I notice that I can create Level 3 code as efficiently as Level 2 code, and as efficiently as Level 1 code once I factor in rework and debugging time. I am gradually moving toward a hybrid work process of developing at Level 2 and then porting the code to Level 3 as appropriate. A distinction between Level 2 and 3 is that Level 3 is object-oriented, often consisting of a single class per use case or sometimes a class for the use case that relies on other helper classes.
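A single class per use case might look roughly like the sketch below, which follows the “sales by calendar month” example. The class name, method names and data layout are hypothetical and are not taken from the Tuhdoop tutorial.

```python
from collections import defaultdict
from datetime import date


class SalesSummary:
    """Hypothetical use-case class: revenue by business unit and calendar month."""

    def __init__(self, sales):
        self.sales = sales   # list of (business_unit, date, revenue) tuples
        self.keyed = []      # filled by add_month_keys
        self.summary = {}    # filled by sum_by_unit_and_month

    # --- Procedure: calls the elemental methods in order ---
    def summary_procedure(self):
        self.add_month_keys()
        self.sum_by_unit_and_month()
        return self.summary

    # --- Elemental, single-action methods ---
    def add_month_keys(self):
        """Tag each sale with a 'YYYY-MM' calendar-month key."""
        self.keyed = [
            (unit, f"{day.year}-{day.month:02d}", revenue)
            for unit, day, revenue in self.sales
        ]

    def sum_by_unit_and_month(self):
        """Total the revenue for each (business_unit, month) pair."""
        totals = defaultdict(float)
        for unit, month, revenue in self.keyed:
            totals[(unit, month)] += revenue
        self.summary = dict(totals)


model = SalesSummary([
    ("East", date(2023, 1, 5), 100.0),
    ("East", date(2023, 1, 20), 50.0),
    ("West", date(2023, 2, 1), 75.0),
])
summary = model.summary_procedure()
```

Holding intermediate results as attributes is one possible design choice; it lets a test set up a known state and exercise a single method in isolation.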

To go with the architecture diagram, this screenshot shows typical Level 3 code where the application code (left side) and test code (right side) mirror each other and are organized by procedure. Well-organized code is self-documenting. The testing also forms part of the documentation by making it easy to probe and learn about what individual functions are doing.
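The mirroring can be sketched in a single listing, with the application code first and the test code second. Everything here is a hypothetical illustration: in practice the two halves would likely live in separate files, and the test functions use plain assert statements so that pytest can discover and run them by name.

```python
# --- Application code (would live in, e.g., a hypothetical model.py) ---
class IngredientsModel:
    LB_TO_KG = 0.45359237  # kilograms per pound

    def __init__(self, masses_lb):
        self.masses_lb = list(masses_lb)
        self.masses_kg = []
        self.total_kg = None

    # Procedure: mass_procedure
    def mass_procedure(self):
        self.convert_to_kg()
        self.sum_masses()

    def convert_to_kg(self):
        self.masses_kg = [mass * self.LB_TO_KG for mass in self.masses_lb]

    def sum_masses(self):
        self.total_kg = sum(self.masses_kg)


# --- Test code (would live in, e.g., a hypothetical test_model.py) ---
def make_model():
    """Build a small, hand-checkable model instance for the tests."""
    return IngredientsModel([1.0, 2.0])


# Tests for mass_procedure, in the same order as the application code
def test_mass_procedure():
    model = make_model()
    model.mass_procedure()
    assert abs(model.total_kg - 3 * 0.45359237) < 1e-9


def test_convert_to_kg():
    model = make_model()
    model.convert_to_kg()
    assert model.masses_kg == [0.45359237, 2 * 0.45359237]


def test_sum_masses():
    model = make_model()
    model.masses_kg = [1.0, 2.5]  # set state directly to isolate this method
    model.sum_masses()
    assert model.total_kg == 3.5
```

Reading a test next to the method it covers shows the expected inputs and outputs at a glance, which is the sense in which the tests double as documentation.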

As a closing aside, I found that the TDD theme in Level 3 and using ChatGPT to spit out function skeletons have influenced me personally to improve at planning the procedures and elemental functions and at prewriting the tests. I am currently experimenting with ChatGPT prompts like, “For the above rows and columns table listing Python functions in Column A, write the skeleton of the Python function with its arguments (Column B) and write a Pytest test function to verify that it generates the expected output.” I am still undecided whether this will ultimately be advantageous versus writing my own Python helper script to do this systematically. A script seems like it might be easier than trying to cajole ChatGPT into doing the same thing repeatably. Regardless of how that turns out, credit to GPT-4 for the inspiration.