Regex: How to match a sequence of key-value pairs at the end of a string

I am trying to match key-value pairs that appear at the end of (long) strings. The strings look like this (I replaced each "\n" with an actual line break):

my_str = "lots of blah
          key1: val1-words
          key2: val2-words
          key3: val3-words"

so I expect matches "key1: val1-words", "key2: val2-words" and "key3: val3-words".

  • The set of possible key names is known.
  • Not all possible keys appear in every string.
  • At least two keys appear in every string (if that makes it easier to match).
  • val-words can be several words.
  • key-value pairs should only be matched at the end of string.
  • I am using Python's re module.

I was thinking

re.compile(r'(?:key1|key2|key3):')

plus some look-ahead assertion would be a solution, but I can't get it right. How do I do this?
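
For illustration, here is one way the look-ahead idea might be sketched. It is a minimal sketch, not a definitive answer: the names KEYS, TRAILING_PAIR and trailing_pairs are made up for this example, and it assumes the string has no trailing newline and that every value stays on one line. Each match is a single "key: value" line, and the look-ahead only succeeds if everything after that line, up to the end of the string, consists of further key-value lines.

```python
import re

# Assumed key names; the question only says the set is known in advance.
KEYS = ["key1", "key2", "key3"]
KEY_ALT = "|".join(re.escape(k) for k in KEYS)

# One "key: value" line, followed (checked by the look-ahead) only by
# more such lines up to the very end of the string.
TRAILING_PAIR = re.compile(
    r"^(?:{keys}):[ ].*$(?=(?:\n(?:{keys}):[ ].*)*\Z)".format(keys=KEY_ALT),
    re.MULTILINE,
)


def trailing_pairs(text):
    """Return the key-value lines that form the tail of `text`."""
    return TRAILING_PAIR.findall(text)


sample = "lots of blah\nkey1: val1-words\nkey2: val2-words\nkey3: val3-words"
print(trailing_pairs(sample))
# ['key1: val1-words', 'key2: val2-words', 'key3: val3-words']

# A key-like line that is followed by ordinary text is not part of the tail.
mixed = "key1: early\nmore blah\nkey2: b\nkey3: c d e"
print(trailing_pairs(mixed))
# ['key2: b', 'key3: c d e']
```

Because "." does not match a newline by default, each value stops at the end of its line, and \Z ties the whole run of pairs to the end of the string.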

Thank you.

/David

Real example string:

my_str = u'ucourt métrage pour kino session volume 18\nThème: O sombres héros\nContraintes: sous titrés\nAuthor: nicoalabdou\nTags: wakatanka productions court métrage kino session humour cantat bertrand noir désir sombres héros mer medine marie trintignant femme droit des femmes nicoalabdou pute soumise\nPosted: 06 June 2009\nRating: 1.3\nVotes: 3'

EDIT:

Based on Mikel's solution I am now using the following:


import re

my_tags = [r'\S+']                     # gets all tags
my_tags = ['Tags', 'Author', 'Posted'] # selected tags
regex = re.compile(r'''
    \n                     # all key-value pairs are on separate lines
    (                      # start group to return
       (?:{0}):            # alternation of tags to detect; '\S+' matches any tag
        \s                 # the space between ':' and value
       .*                  # the value
    )                      # end group to return
    '''.format('|'.join(my_tags)), re.VERBOSE)

regex.sub('', my_str)   # return my_str without the matching key-value lines
regex.findall(my_str)   # return the matched key-value lines
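
To make the two calls concrete, here is a small self-contained run. The string `sample` is a shortened, made-up stand-in for the real example, and the names `selected` and `pattern` are only for this sketch; the pattern itself is the same as the verbose one, written on a single line.

```python
import re

# Shortened, made-up stand-in for the real example string.
sample = (u"court métrage\nThème: O sombres héros\nAuthor: nicoalabdou\n"
          u"Tags: kino humour\nPosted: 06 June 2009\nRating: 1.3")

selected = ["Tags", "Author", "Posted"]
pattern = re.compile(r"\n((?:{0}):\s.*)".format("|".join(selected)))

# findall returns the contents of the single capturing group,
# i.e. each line without its leading newline.
print(pattern.findall(sample))
# ['Author: nicoalabdou', 'Tags: kino humour', 'Posted: 06 June 2009']

# sub removes each matched line together with its leading newline.
print(pattern.sub("", sample))
```

Note that this pattern is not anchored to the end of the string: a line starting with one of the selected tags is matched wherever it occurs, as long as it is preceded by a newline.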

asked by OG Dude, 16 March 2011 at 14:25