Event correlation and filtering - how and where to start?

I have an asynchronous stream of events, where each event carries information such as:

  • Agency (one of many Agencies possible to be served by my solution)
  • Agent (one of many Agents in an Agency)
  • Served-Entity (a person/organization served by 1 or more agencies)
  • Date+Time
  • Class-Data (tags from a fixed but large set of tags)
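
To make the fields above concrete, this is roughly how I picture a single event in code. It is only a sketch: the `Event` class, the `make_event` helper and the field names are my own placeholders, and I am assuming the Date+Time strings are day-month-year.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumption: timestamps are day-month-year, e.g. '12-03-2011/11:03:37'.
TIMESTAMP_FORMAT = "%d-%m-%Y/%H:%M:%S"


@dataclass(frozen=True)
class Event:
    """One incoming event (hypothetical field names)."""
    agency: str
    agent: str
    served_entity: str
    timestamp: datetime
    tags: frozenset  # Class-Data tags, stored as a set for easy matching


def make_event(agency, agent, served_entity, date_time, class_data):
    """Build an Event from the raw string fields (hypothetical helper)."""
    tags = frozenset(t.strip() for t in class_data.split(",") if t.strip())
    timestamp = datetime.strptime(date_time, TIMESTAMP_FORMAT)
    return Event(agency, agent, served_entity, timestamp, tags)


example = make_event("XYZ", "ABC", "MMN", "12-03-2011/11:03:37",
                     "missed-delivery,no-repeat,untracable,orphan")
```

Keeping the tags as a set rather than a fixed enumeration means a tag I have never seen before would still be accepted.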

What I need to do is:

  1. Correlate events based on Served-Entity, Date+Time and Class-Data, and create a new consolidated Event. Example:

    Event #0021: { Agency='XYZ', Agent='ABC', Served-Entity='MMN', Date+Time='12-03-2011/11:03:37', Class-Data='missed-delivery,no-repeat,untracable,orphan' }

    Event #0193: { Agency='KLM', Agent='DAY', Served-Entity='MMN', Date+Time='12-03-2011/12:32:21', Class-Data='missed-delivery,orphan,lost' }

    Event #1217: { Agency='KLM', Agent='CARE', Served-Entity='MMN', Date+Time='12-03-2011/18:50:45', Class-Data='escalated' }

    Here I find 3 events that are spread out in time (more than 7 hours between the first and the last), are for the same Served-Entity (MMN), occur within a certain time window (say 24 hours), and have matching or related Class-Data.

  2. Finally create a consolidated (new) event which could represent an inference drawn.

  3. Be able to create reports on a per Agency, per Agent, per Served-Entity basis, based on things like specific Class-Data tags (e.g. missed-delivery) over a certain period of time. This could be done using the original/input events, or the synthesized (inference) events.

  4. This is not a requirement today, but it is quite likely to appear in future: the set of "tags" that appear in Class-Data could grow without any human intervention. So I am not sure whether this should then be treated as unstructured data.

  5. Also not an immediate requirement, but in future there may be a need to identify trends / patterns of event occurrences (i.e. Event1 led to Event2 led to Event3).
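
To show what I mean by items 1 to 3, here is a naive in-memory sketch. Everything in it is an assumption on my part: the `Event` tuple, the function names, the 24-hour window, and the day-month-year date format. It clusters only on Served-Entity and time; deciding which tags count as "related" would need rules I have not defined, so the sketch just reports which tags repeat inside a cluster.

```python
from collections import Counter, defaultdict, namedtuple
from datetime import datetime, timedelta

Event = namedtuple("Event", "event_id agency agent served_entity timestamp tags")

WINDOW = timedelta(hours=24)   # assumed correlation window
FMT = "%d-%m-%Y/%H:%M:%S"      # assumed day-month-year timestamps


def correlate(events, window=WINDOW):
    """Group events per Served-Entity into clusters spanning at most `window`."""
    by_entity = defaultdict(list)
    for ev in events:
        by_entity[ev.served_entity].append(ev)

    clusters = []
    for entity_events in by_entity.values():
        entity_events.sort(key=lambda ev: ev.timestamp)
        current = [entity_events[0]]
        for ev in entity_events[1:]:
            # Measure from the first event of the cluster, not the previous one.
            if ev.timestamp - current[0].timestamp <= window:
                current.append(ev)
            else:
                clusters.append(current)
                current = [ev]
        clusters.append(current)
    return clusters


def consolidate(cluster):
    """Item 2: build a synthesized event (a plain dict) from one cluster."""
    tag_counts = Counter(tag for ev in cluster for tag in ev.tags)
    return {
        "served_entity": cluster[0].served_entity,
        "start": cluster[0].timestamp,
        "end": cluster[-1].timestamp,
        "source_ids": [ev.event_id for ev in cluster],
        "agencies": sorted({ev.agency for ev in cluster}),
        "shared_tags": sorted(t for t, n in tag_counts.items() if n > 1),
    }


def tag_report(events, tag):
    """Item 3: count events carrying `tag`, per Agency."""
    return Counter(ev.agency for ev in events if tag in ev.tags)


def ev(event_id, agency, agent, entity, when, tags):
    """Hypothetical helper to build an Event from raw strings."""
    return Event(event_id, agency, agent, entity,
                 datetime.strptime(when, FMT), frozenset(tags.split(",")))


events = [
    ev("0021", "XYZ", "ABC", "MMN", "12-03-2011/11:03:37",
       "missed-delivery,no-repeat,untracable,orphan"),
    ev("0193", "KLM", "DAY", "MMN", "12-03-2011/12:32:21",
       "missed-delivery,orphan,lost"),
    ev("1217", "KLM", "CARE", "MMN", "12-03-2011/18:50:45", "escalated"),
]
clusters = correlate(events)
summary = consolidate(clusters[0])
missed = tag_report(events, "missed-delivery")
```

This works on a list that is already in memory, so it says nothing about how to do the same on a live stream at the arrival rate I expect, which is the part I am unsure about.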

The event arrival rate could be quite high: possibly thousands of events per minute, maybe more. I also need to archive the original/synthesized events for a period of time (a month or so).

My solution needs to be based on FOSS components (preferably). The research I have done so far points in the direction of CEP (Complex Event Processing), Bayesian networks/classification, and predictive analytics.

I'm looking for suggestions on which approach to take. I'd prefer the path that meets most of my goals with minimum difficulty/time; to put it another way, "learning AI" or "formal statistical methods" isn't my short-term goal :-)

asked by Stompchicken on 1 April 2011 at 12:56