I would like to synchronize a spoken recording against a known text. Is there a speech-to-text / natural language processing library that would facilitate this? I imagine I'd want to detect word boundaries and compute candidate matches from a dictionary. Most of the questions I've found on SO concern written language.
Desired, but not required:
Edit: I realize this is a very broad, even naive, question, so thanks in advance for your guidance.
What I've found so far: