I'm trying to parse a plain text document in PHP but have no idea how to do it correctly. I want to separate each word, assign them an ID and save the result in JSON format.
Sample text:
"Hello, how are you (today)"
This is what im doing at the moment:
$document_array = explode(' ', $document_text);
json_encode($document_array);
The resulting JSON is
[["Hello,"],["how"],["are"],["you"],["(today)"]]
How do I ensure that spaces are kept in-place and that symbols are not included along with the words...
[["Hello"],[", "],["how"],[" "],["are"],[" "],["you"],[" ("],["today"],[")"]]
I’m sure some sort of regex is required... but have no idea what kind of pattern to apply to deal with all cases... Any suggestions guys?