String pattern matching problem

Imagine we have a long string containing the substrings 'cat' and 'dog' as well as other random characters, e.g.

cat x dog cat x cat x dog x dog x cat x dog x cat

Here 'x' represents any random sequence of characters that contains neither 'cat' nor 'dog'.

What I want to do is find every 'cat' that is followed by another 'cat' with no 'dog' anywhere in between, and remove the first 'cat' of each such pair.

In this case, I would want to remove the bracketed [cat] because no 'dog' occurs between it and the next 'cat':

cat x dog [cat] x cat x dog x dog x cat x dog x cat

To end up with:

cat x dog x cat x dog x dog x cat x dog x cat
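To make the rule concrete, here is a rough sketch of it as a plain left-to-right scan without any regex. The class name CatScan and the method removeCatBeforeCat are made up for illustration, and the sketch assumes the keywords are the literal strings 'cat' and 'dog'. It removes only the three letters of 'cat', so the whitespace around a removed word is left as it was.

```java
final class CatScan {
    // Hypothetical helper: drops every "cat" whose next keyword is another "cat".
    static String removeCatBeforeCat(String s) {
        StringBuilder out = new StringBuilder();
        int pos = 0; // everything before pos has already been handled
        while (true) {
            int cat = s.indexOf("cat", pos);
            if (cat < 0) {
                // No more "cat": copy the remainder and stop.
                out.append(s, pos, s.length());
                break;
            }
            int afterCat = cat + 3;
            int nextCat = s.indexOf("cat", afterCat);
            int nextDog = s.indexOf("dog", afterCat);
            // Remove this "cat" only if another "cat" comes before any "dog".
            boolean remove = nextCat >= 0 && (nextDog < 0 || nextCat < nextDog);
            // Copy the text up to this "cat", and the "cat" itself unless removed.
            out.append(s, pos, remove ? cat : afterCat);
            pos = afterCat;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "cat x dog cat x cat x dog x dog x cat x dog x cat";
        System.out.println(removeCatBeforeCat(input));
    }
}
```

The repeated indexOf calls make this quadratic in the worst case, which is probably fine for a sketch but may matter for very long strings.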

How can this be done?

I thought of somehow using a lookahead regular expression like (n)(?=(n)), as VonC recommended here

(cat)(?=(.*cat))

to match all of the pairs of 'cat' in the string. But I am still not sure how I could use this to remove only each 'cat' that reaches the next 'cat' without a 'dog' in between.
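One possible way to extend the lookahead idea, offered as a sketch rather than a definitive answer, is to restrict what the lookahead may skip over: instead of .* it steps one character at a time and refuses to step onto the start of 'dog'. The class and method names below are hypothetical, and DOTALL is an assumption so that 'x' may span line breaks.

```java
import java.util.regex.Pattern;

final class CatRegex {
    // "cat", then a lookahead that walks forward one character at a time,
    // never starting a "dog", until it reaches another "cat".
    private static final Pattern CAT_BEFORE_CAT =
            Pattern.compile("cat(?=(?:(?!dog).)*?cat)", Pattern.DOTALL);

    // Hypothetical helper: removes each matched "cat"; the lookahead consumes
    // nothing, so the following "cat" is still examined on its own.
    static String removeCatBeforeCat(String s) {
        return CAT_BEFORE_CAT.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "cat x dog cat x cat x dog x dog x cat x dog x cat";
        System.out.println(removeCatBeforeCat(input));
    }
}
```

Only the word itself is removed, so the spaces on either side of it remain; adding \s* after the first cat in the pattern would swallow the trailing whitespace as well.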


The real problem I am tackling is in Java, but I am really just looking for a general pseudocode or regex solution.

asked by Community on 23 May 2017 at 12:04