Manufacturing companies develop information systems with Machine Learning (ML) to support work in use cases throughout the product lifecycle. They do so through development projects that produce software systems with ML components delivering a predictive capability. We refer to these systems as ML solutions. Currently, ML development projects face two difficulties that reduce their effectiveness in delivering suitable ML solutions.
The first difficulty concerns the understandability of ML solutions. ML solutions are composed of multiple software components, learning algorithms and hardware resources. Data scientists or ML engineers determine a useful configuration and combination of components through trial and error, guided by previous experience or personal preference. In the worst case, this leads to testing all possible combinations, which extends development times. Moreover, once a useful combination is found, it is difficult to document the composition of the ML solution properly, i.e., in enough detail to reproduce the development and to understand any performance trade-offs.
The second difficulty concerns the explainability of ML solutions. Developed ML solutions need to be selected and approved by domain experts before deployment in the productive use case. This is difficult to achieve because of the inherent complexity of the ML solution components as well as the complexity of the explanations provided. ML solutions are typically validated using ML metrics, e.g., F-score, RMSE or R². These metrics are difficult for non-experts to understand and say little about the impact of the ML solution on the use case.
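To make the three metrics named above concrete, the following minimal sketch computes them in plain Python. The function names and the binary-classification setting for the F-score (F1) are assumptions made for illustration; they are not taken from the GUACAMOLE prototypes.

```python
import math


def f1_score(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(y_true, y_pred):
    """Root mean squared error of a regression prediction."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

All three return a single number with no reference to the use case, which illustrates why a domain expert cannot read the practical impact of an ML solution from them directly.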
The GUACAMOLE project implemented prototypical tools to address these two difficulties. Over the course of 29 months, we developed four prototypes, a metadata repository and annotation formats to support the ML solution development process. We discussed their applicability with our industry partner at TRUMPF. The concepts underlying the prototypes were published in several scientific publications.
The first prototype, the ML Solution Designer, offers an interface to design ML solution specifications based on Axiomatic Design for Machine Learning (AD4ML). The resulting ML solution specifications can be assessed by the tool before the corresponding ML solution is implemented. It also allows development teams to build a repository of reusable specification components, which could be used in the future to automatically recommend components when new ML solutions are specified.
The second prototype, the ML Solution Viewer, displays the annotated metadata of ML solution components. The metadata describe the data features used to train an ML model, the technical configuration, version numbers and parameter values of the software components used, the hardware resources required to deploy the ML solution and the performance that can be expected across multiple metrics. These metadata summarize all the information necessary to reproduce ML results. They also serve as input data for the reuse recommendations provided by AssistML.
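A metadata record of this kind could be sketched as follows. This is only an illustration of the categories listed above (data features, software components with versions and parameters, hardware resources, expected performance). The class names, field names and JSON layout are assumptions, not the annotation format defined in the project.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SoftwareComponent:
    """One software component of an ML solution (hypothetical structure)."""
    name: str
    version: str
    parameters: dict = field(default_factory=dict)


@dataclass
class SolutionMetadata:
    """Annotated metadata of one ML solution (hypothetical structure)."""
    solution_id: str
    data_features: list   # names of the features used to train the model
    components: list      # SoftwareComponent entries
    hardware: dict        # resources required for deployment
    performance: dict     # metric name -> expected value


def to_json(metadata: SolutionMetadata) -> str:
    """Serialize a record so it can be stored in a metadata repository."""
    # asdict() converts nested dataclasses inside lists recursively.
    return json.dumps(asdict(metadata), indent=2, sort_keys=True)
```

A record would then be built from placeholder values such as `SolutionMetadata("example-1", ["feature_a"], [SoftwareComponent("learner", "1.0", {"depth": 3})], {"cpu_cores": 2}, {"accuracy": 0.9})` and passed to `to_json`. Keeping versions and parameter values next to the performance figures is what allows a result to be reproduced later.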
The third prototype, the ML Solution Tester, serves as a validation tool for data scientists and software developers to ensure that the metadata they produce can be integrated into a common metadata repository. The prototype lets different teams know which ML solutions have been contributed to the metadata repository and provides them with a code to submit their own development.
The fourth prototype, AssistML, recommends existing ML solutions to be reused in new use cases. From the solutions in the metadata repository, the prototype finds those that best suit the performance preferences of the new use case. AssistML then presents the selected solutions to decision makers in simple and intuitive reports. This reduces the development time for new projects.
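The idea of matching repository entries against performance preferences can be sketched with a simple threshold-and-rank procedure. This is not AssistML's actual algorithm, which is not described here. The function name, the dictionary layout and the assumption that every metric is "higher is better" are all hypothetical.

```python
def recommend(repository, preferences, top_k=3):
    """Return IDs of solutions that meet every preference, best margin first.

    repository:  list of dicts with "solution_id" and "performance" (metric -> value)
    preferences: dict mapping metric name -> minimum acceptable value
    """
    candidates = []
    for solution in repository:
        performance = solution["performance"]
        # A solution qualifies only if it reports and meets every preferred metric.
        meets_all = all(
            metric in performance and performance[metric] >= minimum
            for metric, minimum in preferences.items()
        )
        if meets_all:
            # Rank by how far the solution exceeds the thresholds in total.
            margin = sum(performance[m] - minimum for m, minimum in preferences.items())
            candidates.append((margin, solution["solution_id"]))
    # Largest margin first; ties are broken by ID for a deterministic order.
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [solution_id for _, solution_id in candidates[:top_k]]
```

Solutions that miss any threshold are dropped rather than ranked low, so the resulting list contains only options that are acceptable for the new use case.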
Together, the four prototypes provide a standard development approach to build ML solutions for different use cases using different technology stacks. This enables manufacturing companies to cover many development projects at the same time, to quickly identify and reuse ML solution components and to provide meaningful explanations for experts and non-experts alike. These advantages result in shorter development times, reduced resource usage per development project and better-informed decisions when selecting and using ML solutions.
This project started on February 1st, 2018 and ended on June 30th, 2021.