I’ve built a feed that surfaces Bluesky posts containing DOIs. It uses regex to detect DOI links, but I’m not sure my patterns are robust enough to catch all valid variations.

Can you help?

The Feed

Latest Academic DOIs
A feed of the newest Bluesky posts containing DOIs—helping researchers quickly spot fresh academic references.
Bluesky
https://bsky.app/profile/did:plc:s2rczyxit2v5vzedxqs326ri/feed/aaajnmvcrx3k6

Regex Strings

And here are the two regex strings in use:

https://doi.org/10.\d{4,9}/[-._;()/:A-Za-z0-9]+

10.\d{4,9}/[-._;()/:A-Za-z0-9]+

They work for most common DOI formats, but DOIs can be tricky, and I may be missing edge cases. If you’re familiar with regex and DOI formats, I’d really appreciate your feedback or suggestions for improvements.
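
One edge case is visible in the strings themselves: the dots in `10.` and `doi.org` are unescaped, so each one matches any character, not only a literal dot. Below is a minimal Python sketch for testing that. It assumes the feed's regex engine behaves like Python's `re` module, which may not be the case. `CANDIDATE` and `find_dois` are hypothetical names, and the candidate pattern is only one possible tightening, not the pattern the feed uses. It escapes the dots, requires a word boundary before `10`, and optionally accepts `http://`, `dx.doi.org` and `doi:` prefixes.

```python
import re

# The two patterns from the post, copied unchanged.
ORIGINAL_URL = r"https://doi.org/10.\d{4,9}/[-._;()/:A-Za-z0-9]+"
ORIGINAL_BARE = r"10.\d{4,9}/[-._;()/:A-Za-z0-9]+"

# Hypothetical tightened variant (an assumption, not the feed's pattern):
# - literal dots are escaped
# - an optional resolver URL or "doi:" prefix is allowed before the DOI
# - \b stops a match from starting inside a longer number such as "210.1000/..."
CANDIDATE = re.compile(
    r"(?:https?://(?:dx\.)?doi\.org/|doi:\s*)?"
    r"\b(10\.\d{4,9}/[-._;()/:A-Za-z0-9]+)",
    re.IGNORECASE,
)


def find_dois(text):
    """Return the DOIs found in text, minus trailing sentence punctuation."""
    # The suffix character class includes '.', ';' and ':', so a DOI at the
    # end of a sentence would otherwise pick up the closing punctuation.
    return [m.group(1).rstrip(".,;:") for m in CANDIDATE.finditer(text)]


# The unescaped dot lets the original bare pattern match a non-DOI string.
print(bool(re.search(ORIGINAL_BARE, "1024567/abc")))        # True
print(find_dois("1024567/abc"))                              # []
print(find_dois("See https://doi.org/10.1000/xyz123."))      # ['10.1000/xyz123']
```

The suffix character class is kept exactly as in the post, so this sketch still misses DOIs containing characters outside it. Some older DOIs, for example, include `<`, `>` or `#`. Widening the class catches more of those at the cost of more false positives in ordinary post text.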


More Info

Here are the two articles that inspired these patterns:

Finding a DOI in a document or page
The DOI system places basically no useful limitations on what constitutes a reasonable identifier. However, being able to pull DOIs out of PDFs, web pages, etc. is quite useful for citation informa...
https://stackoverflow.com/questions/27910/finding-a-doi-in-a-document-or-page

Finding and Validating DOIs with Regex
To find a DOI (Digital Object Identifier) in a document or webpage using regex (regular expressions), you can create a pattern that matches the standard format of DOIs. A typical DOI looks like this: `10.1000/xyz123`.
https://stackhub.net/manuals/finding-and-validating-dois-with-regex