How to best capture all relevant information when starting a new project on SDLs

willigo09 · May 2, 2025, 1:27pm

While use cases that can be carried out by SDLs are highly domain- and problem-specific, they also have plenty of things in common. I am wondering how people typically align project teams (e.g. business, data scientists, lab technicians, scientists,…) before a project starts so that it can be executed as efficiently as possible. Below I post a few thoughts as well as an intake form / questionnaire which might help to align a cross-functional team. The idea is that the form is filled in by the entire team. Is any important information missing? Are the questions understandable, or would additional refinements make them clearer? All feedback is welcome, and hopefully this is of use to others as well.

Project intake

Motivation

A comprehensive and well-executed intake process is the cornerstone of successfully setting up a Self-Driving Laboratory (SDL). This process serves as the critical first step in aligning technological solutions with the overarching objectives of the SDL initiative. Setting up an SDL is most efficient when started with the end in mind: this document provides guidance on how to do the intake as efficiently as possible. The intake involves clarifying the following key elements:

overall objectives: Clearly articulate the goals of building an SDL, such as decreasing time-to-product/market/money, minimizing the number of required samples or minimizing cost of development.
Key Performance Indicators (KPIs): Identify and define all relevant KPIs that will measure success. These should be quantifiable, actionable and tied directly to the intended outcomes of the product or process being developed.
associated constraints: Understand any constraints that might influence the design and execution of experiments, including budgetary, operational, regulatory, or technical limitations.
levers and variables: Map out all actionable inputs and levers that can be adjusted to optimize the target KPIs, such as material properties, process conditions or formulation parameters.

This knowledge will subsequently be used to make informed decisions about the most suitable:

design methods: Select optimization strategies (e.g. Bayesian Optimization, Design of Experiments, Reinforcement Learning,…) tailored to the problem structure.
data infrastructure: Define the requirements for capturing, storing and processing data to ensure compatibility with AI/ML frameworks and FAIR principles.
experimental set-up: Design the physical and digital workflows that will enable seamless integration of robotic platforms and data capture.
workflow orchestration: Ensure efficient coordination of all components, from experimental execution to iterative learning cycles.

Such thorough intake minimizes the risk of misaligned goals, inefficient workflows and wasted resources.

A generic intake form

1. Project Overview

project title:
- (Provide a short and descriptive title for the SDL use case.)
brief description:
- (Summarize the purpose and goals of the project, including the specific application area.)
objectives:
- (Define the main goals of the SDL project, e.g. decrease time-to-product/market/money, minimize sample numbers to reach the desired targets, minimize costs of product development.)

2. Key Performance Indicators (KPIs)

For each KPI relevant to the product/process to be developed, i.e. each KPI used to assess the quality of your product/process, provide the following information:

name: (Name the KPI used to evaluate performance, e.g. binding affinity, OER, viscosity)
unit: (Specify the units of measurement, e.g., mA/cm², %, g/mL.)
target: (Specify the desired value (e.g. 10, 3.5) or range (e.g. >10, [3, 5]).)
evaluation method: (Outline the method or assay used to measure the KPI.)
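
As a rough illustration of how one KPI entry from this section could be captured in a machine-readable way, here is a minimal Python sketch. The class name `KPI`, its fields `lower` and `upper`, and the example values (a viscosity target range of [3, 5] in mPa·s, measured with a rheometer) are assumptions made for illustration only and are not part of the form itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KPI:
    """One KPI entry: name, unit, evaluation method and target range."""
    name: str
    unit: str
    evaluation_method: str
    lower: Optional[float] = None  # None means "no lower bound"
    upper: Optional[float] = None  # None means "no upper bound"

    def meets_target(self, value: float) -> bool:
        """Return True if value lies inside the (inclusive) target range."""
        if self.lower is not None and value < self.lower:
            return False
        if self.upper is not None and value > self.upper:
            return False
        return True


# Hypothetical example entry: a range target of [3, 5].
viscosity = KPI(
    name="viscosity",
    unit="mPa·s",
    evaluation_method="rheometer measurement (hypothetical)",
    lower=3.0,
    upper=5.0,
)

print(viscosity.meets_target(4.0))
print(viscosity.meets_target(5.5))
```

A single-value target can be expressed by setting `lower` and `upper` to the same number, and a one-sided target such as ">10" by leaving `upper` unset. Note that the bounds in this sketch are inclusive, so a strict inequality would need a small adjustment.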

3. Inputs

Parameter name	Type	Range	Description
Parameter A	Continuous	0–1	Fraction of Component A
Parameter B	Discrete	10, 20, 30	Possible values for Parameter B
Parameter C	Categorical	Type 1, Type 2	Category options for Parameter C

(Include as many rows as needed based on use case.)
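
The example table above could be encoded along the following lines, so that proposed designs can be checked against the declared search space. This is only a sketch: the class names `Continuous` and `Choice`, the list `SPACE` and the helper `is_in_space` are hypothetical names chosen for this example, and discrete and categorical parameters are treated the same way (as a finite set of allowed values).

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Continuous:
    """A parameter that can take any value in [low, high]."""
    name: str
    low: float
    high: float

    def contains(self, value: Any) -> bool:
        return self.low <= value <= self.high


@dataclass(frozen=True)
class Choice:
    """A discrete or categorical parameter with a finite set of values."""
    name: str
    values: Tuple[Any, ...]

    def contains(self, value: Any) -> bool:
        return value in self.values


# The three rows of the example table.
SPACE = [
    Continuous("Parameter A", 0.0, 1.0),
    Choice("Parameter B", (10, 20, 30)),
    Choice("Parameter C", ("Type 1", "Type 2")),
]


def is_in_space(design: Dict[str, Any]) -> bool:
    """Check that a design sets every parameter to an allowed value."""
    if set(design) != {p.name for p in SPACE}:
        return False
    return all(p.contains(design[p.name]) for p in SPACE)


print(is_in_space({"Parameter A": 0.4, "Parameter B": 20, "Parameter C": "Type 1"}))
print(is_in_space({"Parameter A": 0.4, "Parameter B": 15, "Parameter C": "Type 1"}))
```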

Physical and Operating Parameters:

(Include parameters like temperature, pressure, voltage, etc., with ranges and descriptions if they are NOT part of the inputs that should be optimized.)

4. Constraints

Input constraints:
- (List input-related constraints, e.g. sum of fractions equals 1, minimum/maximum number of components.)
- (List conditional constraints, e.g. Components A and B must not coexist; at least 0.2 of Component A and 0.3 of Component B must be present.)
Operating Constraints:
- (List constraints related to experimental conditions, e.g. temperature must not exceed 60°C.)
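
To show how constraints like the examples above might be checked before a design is sent to the lab, here is a small Python sketch. It covers three of the example constraints (fractions sum to 1, minimum amounts of Components A and B, temperature limit of 60 °C). The function name `violated_constraints`, the dictionary layout and the component keys "A", "B", "C" are assumptions for illustration.

```python
import math
from typing import Dict, List


def violated_constraints(fractions: Dict[str, float], temperature_c: float) -> List[str]:
    """Return a list of human-readable constraint violations (empty if none)."""
    problems = []

    # Input constraint: component fractions must sum to 1.
    if not math.isclose(sum(fractions.values()), 1.0, abs_tol=1e-9):
        problems.append("fractions must sum to 1")

    # Input constraints: minimum amounts of Components A and B.
    if fractions.get("A", 0.0) < 0.2:
        problems.append("Component A below 0.2")
    if fractions.get("B", 0.0) < 0.3:
        problems.append("Component B below 0.3")

    # Operating constraint: temperature must not exceed 60 °C.
    if temperature_c > 60.0:
        problems.append("temperature above 60 °C")

    return problems


print(violated_constraints({"A": 0.5, "B": 0.3, "C": 0.2}, temperature_c=45.0))
print(violated_constraints({"A": 0.1, "B": 0.9}, temperature_c=70.0))
```

Returning the list of violated constraints, rather than a single True/False, makes it easier for the team to see which entry in the form a rejected design conflicts with.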

5. Experimental Design

Controls:
- (Specify the controls to be included in each experiment, e.g. standard samples, baseline formulations.)
Throughput:
- Number of unique designs per iteration (excluding controls):
- Maximum number of cycles:
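
The two throughput numbers together bound the total experimental budget, which in turn influences which design method is realistic. A back-of-the-envelope sketch, assuming a fixed number of controls per iteration (the function name `experiment_budget` and the numbers 8, 2 and 10 are made up for illustration):

```python
from typing import Dict


def experiment_budget(designs_per_iteration: int,
                      controls_per_iteration: int,
                      max_cycles: int) -> Dict[str, int]:
    """Upper bounds on unique designs and total runs over the whole campaign."""
    runs_per_cycle = designs_per_iteration + controls_per_iteration
    return {
        "unique_designs": designs_per_iteration * max_cycles,
        "total_runs": runs_per_cycle * max_cycles,
    }


# Hypothetical campaign: 8 designs plus 2 controls per iteration, 10 cycles.
print(experiment_budget(8, 2, 10))
```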

6. Workflow Considerations

cycle initialization:
- (Describe how new designs are initialized for each round of experiments, e.g., novel designs, iterative modifications.)
scaling strategy (optional, only relevant if several scales are being used):
- Describe how results from lab-scale experiments will be validated and scaled to pilot and production scales.
- Include metrics for evaluating successful scaling.

7. Historical data

historical data availability:

Scale	# unique samples available
lab
pilot
production

8. Metadata requirements

Define what metadata is critical for the SDL to function effectively (e.g. temperature, matrix information).

sgbaird · May 4, 2025, 2:21am

Hi Willi, thanks for posting on the AC forum! This seems like an important topic to the community.

While use cases that can be carried out by SDLs are highly domain- and problem-specific, they also have plenty of things in common

Agreed! Over the years, I found myself asking nearly the same set of questions in conversations. In 2022, I made and refined a survey that captured some of the questions and clarifications I kept coming back to. I received a dozen or so responses (early career, mixed fields/topics, mostly low-programming backgrounds, mostly materials science applications), mostly in connection with a materials informatics course being taught at the time. I found that the form helped reduce some of the repeat Q&A and streamline the conversation, so we could dive into the more interesting and nuanced aspects.

I think what you show here captures quite a bit. Having objectives (referred to as KPIs above), parameters (“inputs” section), and constraints as top-level components makes sense to me. Some of the things I think are strengths and should likely be kept:

“name”, “unit”, “target”, and “evaluation method” (or general comments) for each of the objectives
example table with columns “parameter name”, “type”, etc.
content related to noise, throughput, historical data, scaling

Some other notes:

Perhaps “physical and operating parameters” section could be expanded with a similar table.
I notice that the form is text-only (i.e., no visualizations or external links)
Perhaps consider how the intended platform for hosting may affect the design choices of the form

Excited to see where this goes!
