I’ve built a feed that surfaces Bluesky posts containing DOIs. It uses regex to detect DOI links, but I’m not sure my patterns are robust enough to catch all valid variations.
Can you help?
The Feed
Latest Academic DOIs
A feed of the newest Bluesky posts containing DOIs—helping researchers quickly spot fresh academic references.
https://bsky.app/profile/did:plc:s2rczyxit2v5vzedxqs326ri/feed/aaajnmvcrx3k6
Regex Strings
And here are the two regex strings in use:
https://doi.org/10.\d{4,9}/[-._;()/:A-Za-z0-9]+
10.\d{4,9}/[-._;()/:A-Za-z0-9]+
They work for the most common DOI formats, but DOIs can be tricky and I may be missing edge cases. If you're familiar with regex and DOI syntax, I'd really appreciate your feedback or suggestions for improvement.
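Here is a minimal Python sketch of one way to tighten the patterns. It assumes the feed can run Python, since the post doesn't say what the feed is built with. Two observations drive it. First, the unescaped `.` after `10` matches any character, so `10x1234/abc` counts as a DOI. Second, the bare pattern already matches the DOI inside a `https://doi.org/...` link, so one pattern is enough. The names `DOI_RE`, `_trim`, and `extract_dois` are hypothetical, and the sample strings in the usage note are illustrative only.

```python
import re

# Hypothetical combined pattern (a sketch, not the feed's current code).
# "10\." escapes the dot; \b stops matches that start inside a longer token.
# The bare DOI also matches inside https://doi.org/... links, so no URL variant is needed.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def _trim(doi: str) -> str:
    """Strip punctuation that most likely belongs to the surrounding sentence."""
    while doi:
        if doi[-1] in ".,;:":
            doi = doi[:-1]
        elif doi[-1] == ")" and doi.count(")") > doi.count("("):
            # An unbalanced ")" is probably a closing parenthesis in the prose.
            doi = doi[:-1]
        else:
            break
    return doi


def extract_dois(text: str) -> list[str]:
    """Return unique DOIs in order of first appearance, lowercased for comparison."""
    found: list[str] = []
    for match in DOI_RE.finditer(text):
        doi = _trim(match.group(0)).lower()  # DOI names are case-insensitive
        if doi and doi not in found:
            found.append(doi)
    return found
```

For example, `extract_dois("New paper (https://doi.org/10.1038/s41586-020-2649-2). Also doi:10.1016/S0140-6736(20)30183-5")` yields two DOIs. The first one loses the trailing `).`, and the balanced parentheses inside the second are kept. The trimming is a heuristic. Some older Wiley DOIs built from SICI codes contain `<` and `>`, and the character class still cuts those short. Widening the class to `\S` would catch them, but more trailing punctuation would then need cleaning up. The `\b` also means something like `DOI10.1234/...` with no separator is skipped.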
More Info
Here are the two articles that inspired these strings:
The DOI system places basically no useful limitations on what constitutes a reasonable identifier. However, being able to pull DOIs out of PDFs, web pages, etc. is quite useful for citation informa...
https://stackoverflow.com/questions/27910/finding-a-doi-in-a-document-or-page
To find a DOI (Digital Object Identifier) in a document or webpage using regex (regular expressions), you can create a pattern that matches the standard format of DOIs. A typical DOI looks like this: `10.1000/xyz123`.
https://stackhub.net/manuals/finding-and-validating-dois-with-regex
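One edge case sits outside the regex itself. As far as I know, the Bluesky app can shorten long links in the visible post text (e.g. `doi.org/10.1038/s4158...`) while keeping the full URL in the post record's link facets. A regex run only on `text` might then capture a truncated DOI. The sketch below reads DOIs from `app.bsky.richtext.facet#link` features instead. It assumes the post record arrives as a plain dict, and `dois_from_facets` is a hypothetical helper.

```python
import re
from urllib.parse import unquote

# Matches a doi.org resolver URL and captures the DOI after the host (sketch).
DOI_IN_URL = re.compile(r"^https?://(?:dx\.)?doi\.org/(10\.\d{4,9}/\S+)$", re.IGNORECASE)


def dois_from_facets(record: dict) -> list[str]:
    """Collect DOIs from the link facets of an app.bsky.feed.post record."""
    dois: list[str] = []
    for facet in record.get("facets", []):
        for feature in facet.get("features", []):
            if feature.get("$type") != "app.bsky.richtext.facet#link":
                continue  # skip mentions, hashtags, etc.
            # Decode percent-escapes such as %2F before matching.
            match = DOI_IN_URL.match(unquote(feature.get("uri", "")))
            if match:
                dois.append(match.group(1))
    return dois
```

Running both this and the text regex, then merging the results, would cover posts whose links were shortened as well as DOIs typed as plain text.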