Database best practices for SDLs and database types (e.g., document, relational, knowledge graphs)

There are various database paradigms, some popular ones being document-based (e.g., MongoDB), relational (e.g., PostgreSQL), and knowledge graph-based (e.g., Neo4j).

Generally, a document-based database is the easiest to get started with (especially given its parallels with Python dictionaries, which most users already know), but it tends to adapt poorly as needs change and schemas evolve.
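For concreteness, here is a minimal sketch of the document style, assuming a local MongoDB instance and pymongo; the database, collection, and field names are just made-up examples:

```python
# A minimal sketch of the document-store style, assuming a local MongoDB
# instance and pymongo; the database/collection/field names are hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
samples = client["sdl_demo"]["samples"]  # hypothetical database and collection

# A record is just a Python dict; nested metadata needs no schema up front.
record = {
    "sample_id": "S-0001",
    "composition": {"Ni": 0.6, "Fe": 0.4},
    "processing": {"anneal_C": 450, "time_min": 30},
    "xrd": {"two_theta": [20.0, 20.1, 20.2], "intensity": [12, 15, 11]},
}
samples.insert_one(record)

# Querying on nested fields is straightforward, but renaming or restructuring
# fields later means migrating every document yourself.
for doc in samples.find({"processing.anneal_C": {"$gte": 400}}):
    print(doc["sample_id"])
```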

SQL and other relational databases have a long history, and while the barrier to entry is generally higher, they seem to adapt relatively well as needs change. Data for the physical sciences (characterization results, compositions and processing conditions, and other data and metadata) are inherently relational.
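As a minimal sketch of what “inherently relational” looks like in practice (using Python’s built-in sqlite3 as a stand-in for PostgreSQL; the tables and columns are hypothetical):

```python
# A minimal sketch of the relational idea, with sqlite3 (stdlib) standing in
# for PostgreSQL; the table and column names are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE samples (
    sample_id TEXT PRIMARY KEY,
    composition TEXT,
    anneal_c REAL
);
CREATE TABLE measurements (
    measurement_id INTEGER PRIMARY KEY,
    sample_id TEXT REFERENCES samples(sample_id),
    technique TEXT,
    result_value REAL
);
""")
con.execute("INSERT INTO samples VALUES ('S-0001', 'Ni0.6Fe0.4', 450)")
con.execute(
    "INSERT INTO measurements (sample_id, technique, result_value) "
    "VALUES ('S-0001', 'conductivity', 1.2e6)"
)

# The relationships (sample -> measurements) live in the schema, so joins
# answer questions across tables without restructuring the data.
rows = con.execute("""
    SELECT s.sample_id, s.anneal_c, m.technique, m.result_value
    FROM samples s JOIN measurements m ON m.sample_id = s.sample_id
""").fetchall()
print(rows)
```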

Graph-based databases tend to come packaged with terms that sound nebulous at first, such as ontologies and knowledge graphs. Generally, people who come to understand them agree that they are the “long-term dream”, in some sense taking the relational approach a step further. However, they also seem to have one of the highest learning curves, and the design choices (especially around ontologies) involve a fair bit of ambiguity and careful thought.
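For a flavor of the graph style, here is a small sketch assuming a local Neo4j instance and the official neo4j Python driver; the node labels, relationship types, and credentials are made up on the spot, which is exactly the kind of design choice an ontology is meant to pin down:

```python
# A minimal sketch of recording experiment provenance as a graph, assuming a
# local Neo4j instance and the official neo4j Python driver; labels,
# relationship types, and credentials are hypothetical.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

cypher = """
MERGE (s:Sample {sample_id: $sample_id})
MERGE (p:Process {name: $process, anneal_c: $anneal_c})
MERGE (m:Measurement {technique: $technique, value: $value})
MERGE (s)-[:PROCESSED_BY]->(p)
MERGE (s)-[:MEASURED_BY]->(m)
"""

with driver.session() as session:
    session.run(cypher, sample_id="S-0001", process="anneal",
                anneal_c=450, technique="conductivity", value=1.2e6)
driver.close()
```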

Depending on the discipline, the data types and data formats can vary a lot. It gets more complex when you start thinking about integrations with electronic lab notebooks (ELNs), laboratory information management systems (LIMS), or direct upload of data by devices. How are you approaching this in your labs? What is working well, or not working so well? Even with highly custom implementations, it would help to be aware of the thinking that went into the design decisions.

Cc @willigo09. Also, Sergio and Matthias Popp.


EDIT: At the knowledge graph workshop at Accelerate ‘25, @benjimaruyama answered a question about the value proposition of a knowledge graph for a company that already has a good relational database: “What questions do you want to ask that you haven’t been able to ask already?”


We use a MySQL database. We didn’t implement it from scratch, though; we’re using eLabFTW as both an ELN and a LIMS. It’s a little clunky at times, but I appreciate its flexibility.

I haven’t thought about it too much yet, but I assume the data within our db could be ported to a knowledge graph relatively easily.
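As a rough sketch of what that port might look like (sqlite3 standing in for MySQL here; the database path, tables, and columns are all hypothetical):

```python
# A rough sketch of one common porting route: dump relational rows into node
# and edge CSV files that a graph database can bulk-import. sqlite3 stands in
# for MySQL, and the database path, tables, and columns are hypothetical.
import csv
import sqlite3

con = sqlite3.connect("lab.db")  # hypothetical copy/export of the existing db

with open("nodes.csv", "w", newline="") as nf, open("edges.csv", "w", newline="") as ef:
    nodes = csv.writer(nf)
    edges = csv.writer(ef)
    nodes.writerow(["id", "label", "title"])
    edges.writerow(["source", "target", "type"])

    # Each experiment row becomes a node; each experiment-to-sample link becomes an edge.
    for exp_id, title in con.execute("SELECT id, title FROM experiments"):
        nodes.writerow([f"exp-{exp_id}", "Experiment", title])
    for exp_id, sample_id in con.execute(
        "SELECT experiment_id, sample_id FROM experiment_samples"
    ):
        edges.writerow([f"exp-{exp_id}", f"sample-{sample_id}", "USES_SAMPLE"])
```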


Coming to this thread late, but thought I might contribute.

We’ve been coming to the idea that different tools are useful in different contexts.

  • We have to handle a lot of long time-series data where looking at one datapoint isn’t very helpful, but retrieving the whole time-series is. We’re finding that a document-based approach works well there, because pulling back the whole series as one record is much more performant.
    • I’ve had some early success with Delta Lake, which is a little heavy-duty, but also gives you a full history of your data changes for free. Delta Lake also handles schema changes really effectively (there’s a small sketch of the append/time-travel pattern after this list).
  • We also do a lot of “dimensional” data (e.g. “What project does this belong to?”) and aggregated data (e.g. “What was the overall best yield achieved in this experiment?”), which an RDBMS handles really well, since it’s so well-supported and flexible (see the query sketch after this list).
    • Postgres has been our choice here, again, because it’s been around for a long time and has a lot of support.
  • Finally, we’ve found that as you cross domains the joins get super complicated and performance really slows down in a SQL database, so we’re toying with the idea of a semantic layer on top of everything, built in a graph database, that gets to that “long-term dream” without having to build a perfect ontology of the universe. The idea is that the semantic layer describes the important relationships and otherwise mostly points down to where the actual data lives (see the last sketch after this list), so it’s fast both to find things and to load and analyze them.
    • This is super early, but so far Neo4j has been easy enough to use for prototyping, and seems like it has a lot of community support.
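Here is roughly what the append-and-time-travel pattern from the first bullet looks like, as a minimal sketch assuming the deltalake (delta-rs) and pandas packages; the table path and columns are made up:

```python
# A minimal sketch of appending time-series batches to a Delta table and
# reading back earlier versions; the path and column names are hypothetical.
import pandas as pd
from deltalake import DeltaTable, write_deltalake

path = "./reactor_temperature"  # hypothetical local table path

# Append each new batch of sensor readings; every write becomes a new version.
batch = pd.DataFrame({
    "run_id": ["R-0042"] * 3,
    "timestamp": pd.to_datetime(["2025-01-01 00:00", "2025-01-01 00:01",
                                 "2025-01-01 00:02"]),
    "temperature_c": [25.0, 25.4, 25.9],
})
write_deltalake(path, batch, mode="append")

# The transaction log gives you history and time travel for free.
dt = DeltaTable(path)
print(dt.history())                              # who wrote what, and when
df_now = dt.to_pandas()                          # current state of the table
df_v0 = DeltaTable(path, version=0).to_pandas()  # table as of the first write
```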
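And the kind of dimensional/aggregated query from the second bullet, sketched with Python’s built-in sqlite3 standing in for Postgres (hypothetical table and columns):

```python
# A quick sketch of the aggregated-query case, with sqlite3 (stdlib) standing
# in for Postgres; the yields table and its columns are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yields (experiment_id TEXT, run_id TEXT, yield_pct REAL)")
con.executemany(
    "INSERT INTO yields VALUES (?, ?, ?)",
    [("E-0042", "R-001", 61.2), ("E-0042", "R-002", 73.5), ("E-0043", "R-001", 55.0)],
)

# "What was the overall best yield achieved in this experiment?"
best = con.execute(
    "SELECT experiment_id, MAX(yield_pct) FROM yields GROUP BY experiment_id"
).fetchall()
print(best)  # e.g. [('E-0042', 73.5), ('E-0043', 55.0)]
```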
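Finally, a rough sketch of the semantic-layer idea from the last bullet, assuming Neo4j and the official Python driver; the labels, relationship types, and data URIs are all hypothetical. The point is that the graph mostly stores pointers to where the heavy data lives rather than the data itself:

```python
# A rough sketch of a semantic layer: graph nodes hold the important
# relationships plus pointers (URIs/paths) to where the actual data lives.
# Labels, relationship types, credentials, and URIs are hypothetical.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

cypher = """
MERGE (p:Project {name: $project})
MERGE (e:Experiment {experiment_id: $experiment_id})-[:PART_OF]->(p)
MERGE (t:TimeSeries {uri: $delta_path})       // points at a Delta table
MERGE (r:ResultTable {uri: $results_table})   // points at a Postgres table
MERGE (e)-[:HAS_RAW_DATA]->(t)
MERGE (e)-[:HAS_RESULTS]->(r)
"""

with driver.session() as session:
    session.run(
        cypher,
        project="electrolyte-screening",
        experiment_id="E-0042",
        delta_path="s3://lab-data/reactor_temperature",
        results_table="postgres://lab/results.yields",
    )
driver.close()
```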

Really important caveat - my work involves building data systems that allow collaboration across research organizations, and this kind of multi-layer data platform has a ton of overhead. Probably crazy overkill for a single lab.

Still, I think the philosophy of not trying to make one tool do everything is a useful one when you don’t have a lot of time.
