Development / Summer of Code / 2025 / MusicBrainz

This page has not been revised or reviewed by our documentation team.

MusicBrainz is a community-maintained open source music encyclopaedia that collects music metadata and makes it available to the public. Try it out.

Getting Started

(see also: Getting started with GSoC)

Modernize search storage format for the MusicBrainz database

Proposed mentor: lucifer
Proposed co-mentors: bitmap, reosarevok, yvanzo
Languages/skills: Solr, Python, Java
Forum for discussion
Estimated Project Length: 350 hours
Difficulty: medium



The MusicBrainz (MB) database has a Solr search engine that is used for both website search and the search API. It stores the data in search fields. Two output formats are supported: MB XML (returned directly to API clients) and MB JSON (returned both to API clients and to the website server). The MB JSON format is automatically generated from the MB XML format. A RELAX NG schema is used to generate bindings and check the MB XML output. However, storing this MB XML format in addition to the search fields is redundant and inefficient (in disk usage and indexing time). The current implementation has not been thoroughly revisited since the early versions of Solr.

It would be helpful to modernize the format used to store data for search in Solr. Minimum goals:

  • Upgrade the Solr schema version from 1.5 to 1.7
  • Complete the fields (in configsets and indexer) so that they store all the data to be returned
  • Create two response writers that convert data from fields into the MB XML and MB JSON formats (with automated validation tests)
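
The response-writer goal can be illustrated with a minimal sketch, written in Python for brevity (a real Solr response writer would be a Java plugin). The stored field names (`mbid`, `name`, `sort_name`), the output element and key names, and the sample values are hypothetical placeholders, not the actual MB schema. The idea shown is that both formats are built directly from the stored fields, and a validation check confirms that they carry the same values.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical stored fields for one artist document, as a search
# backend might return them. The names and values are placeholders.
doc = {
    "mbid": "00000000-0000-0000-0000-000000000001",
    "name": "Example Artist",
    "sort_name": "Artist, Example",
}


def to_json(doc):
    """Build a JSON string directly from the stored fields."""
    payload = {
        "id": doc["mbid"],
        "name": doc["name"],
        "sort-name": doc["sort_name"],
    }
    return json.dumps(payload, sort_keys=True)


def to_xml(doc):
    """Build an XML string directly from the same stored fields."""
    artist = ET.Element("artist", {"id": doc["mbid"]})
    ET.SubElement(artist, "name").text = doc["name"]
    ET.SubElement(artist, "sort-name").text = doc["sort_name"]
    return ET.tostring(artist, encoding="unicode")


def same_content(doc):
    """Validation check: both outputs must carry the same values."""
    from_json = json.loads(to_json(doc))
    root = ET.fromstring(to_xml(doc))
    from_xml = {
        "id": root.get("id"),
        "name": root.findtext("name"),
        "sort-name": root.findtext("sort-name"),
    }
    return from_json == from_xml
```

In the real project, the XML output would additionally be validated against the RELAX NG schema mentioned above; that step is omitted here.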

The candidate can add many extra goals if desired and if time permits. See tickets.


Extend the mail service with template API

Proposed mentor: bitmap
Languages/skills: Rust
Forum for discussion
Estimated Project Length: 350 hours
Difficulty: medium

Since last summer, MusicBrainz has gradually been adopting a new mail rendering service to replace Template Toolkit, which has had its day. The new service is written in Rust and is based on the MJML markup language (for proper rendering in mail clients) and MessageFormat 1 (for internationalization). However, the mail templates currently have to be written in the same repository as the mail service. This makes adding a new template more difficult, since both the project and the mail service have to be updated, released, and deployed, and it also complicates sharing translation resources. While the current implementation is suitable for MusicBrainz, the intention has always been to make the mail service available to the other MetaBrainz projects afterwards.

Allowing templates to be loaded through an API would be a great extension to the mail service: templates could be maintained in the repositories of their respective projects and loaded or updated on demand, without redeploying the mail service.
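
As a rough illustration of what loading and updating templates on demand could involve, here is a minimal in-memory sketch. It is written in Python for brevity, whereas the mail service itself is written in Rust. The class, its method names, and the use of a content digest to detect changes are all assumptions made for this example, not part of the existing service.

```python
import hashlib


class TemplateRegistry:
    """Hypothetical in-memory store of templates, keyed by (project, name)."""

    def __init__(self):
        self._templates = {}

    def upload(self, project, name, source):
        """Store or update a template; return True if the content changed."""
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        key = (project, name)
        current = self._templates.get(key)
        if current is not None and current["digest"] == digest:
            # Identical content was already loaded: nothing to do.
            return False
        self._templates[key] = {"source": source, "digest": digest}
        return True

    def get(self, project, name):
        """Return the template source; raises KeyError if it was never loaded."""
        return self._templates[(project, name)]["source"]


registry = TemplateRegistry()
first = registry.upload("musicbrainz", "welcome", "Hello, {name}!")
repeat = registry.upload("musicbrainz", "welcome", "Hello, {name}!")
update = registry.upload("musicbrainz", "welcome", "Welcome, {name}!")
```

Keying templates by project keeps each project's templates separate, and the digest comparison lets a project resubmit its templates on every deployment without triggering needless updates.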

The candidate can add many extra goals if desired and if time permits. There are several mail templates in different projects to be adapted or reworked. There are also some cron jobs that send daily mails and would greatly benefit from a full rewrite in Rust.


Implement a daemon that corrects out-of-sync cover art and event art metadata on archive.org

Proposed mentor: bitmap
Proposed co-mentors: reosarevok, yvanzo
Languages/skills: Python, SQL
Forum for discussion
Estimated Project Length: 175 hours
Difficulty: medium

The Cover Art Archive and Event Art Archive store both metadata about the entity in question and metadata about the available images.

Historically there have been service issues that have introduced inconsistencies in these metadata files:

  • Outdated entity metadata (incorrect titles, artists, dates, etc.)
  • Outdated image metadata (types, comments, thumbnails, etc.)
  • Images that exist on archive.org but are missing from index.json (or the MusicBrainz database)
  • Images that are listed in index.json (or the MusicBrainz database) but are missing from archive.org
  • Malformed JSON (strings being used instead of integers, encoding issues, etc.)

Many such issues have been described as part of ROpdebee's excellent auditing work in IMG-129.

Recently a new artwork-indexer service has been deployed which manages the metadata in question. The task of this project would be to extend the artwork-indexer to monitor entities in the MusicBrainz database having images, and automatically check and repair the types of issues listed above.
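
A simplified sketch of one such check is shown below. It compares the image IDs listed in a parsed index.json document with the image IDs known to the database, and flags IDs stored as strings instead of integers. The function name, the issue labels, and the assumed document shape (an `images` list whose entries have an `id` key) are illustrative assumptions, not the artwork-indexer's actual code.

```python
def audit_index(index, db_image_ids):
    """Compare a parsed index.json-like dict with a set of database image IDs.

    Returns a list of (issue, image_id) tuples. The issue labels are
    hypothetical names chosen for this sketch.
    """
    issues = []
    listed = set()
    for image in index.get("images", []):
        image_id = image.get("id")
        if isinstance(image_id, str) and image_id.isdecimal():
            # Malformed JSON: the ID was stored as a string.
            issues.append(("string_id", int(image_id)))
            image_id = int(image_id)
        if not isinstance(image_id, int):
            # Missing or unparseable ID: report it and skip the comparison.
            issues.append(("invalid_id", repr(image_id)))
            continue
        listed.add(image_id)
    for missing in sorted(db_image_ids - listed):
        issues.append(("missing_from_index", missing))
    for extra in sorted(listed - db_image_ids):
        issues.append(("missing_from_database", extra))
    return issues


sample_index = {"images": [{"id": "1"}, {"id": 2}, {"id": 5}]}
sample_issues = audit_index(sample_index, {1, 2, 3})
```

A real implementation would also need to compare against the files actually present on archive.org and to check the entity metadata, both of which are left out here.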

Ideally we can use the auditing results in IMG-129 to generate queued tasks that prioritize checking the entities contained in the audit.
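
One possible way to express this prioritization is a priority queue in which entities found in the audit are checked first. The sketch below uses Python's `heapq`; the function names and the two-level priority scheme are assumptions for illustration only.

```python
import heapq
import itertools


def build_queue(entity_ids, audited_ids):
    """Build a heap of check tasks, putting audited entities first.

    Each heap entry is (priority, sequence number, entity ID). The
    sequence number keeps the original order within a priority level.
    """
    counter = itertools.count()
    heap = []
    for entity_id in entity_ids:
        priority = 0 if entity_id in audited_ids else 1
        heapq.heappush(heap, (priority, next(counter), entity_id))
    return heap


def drain(heap):
    """Yield entity IDs in the order they should be checked."""
    while heap:
        yield heapq.heappop(heap)[2]


order = list(drain(build_queue(["a", "b", "c", "d"], {"c", "d"})))
```

Here `order` starts with the audited entities `c` and `d`, followed by the remaining entities in their original order.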

Note that some initial work on this idea was started by bitmap.