Unoffical empeg BBS

Quick Links: Empeg FAQ | RioCar.Org | Hijack | BigDisk Builder | jEmplode | emphatic
Repairs: Repairs

Topic Options
#335938 - 09/08/2010 20:11 Math using time as variables in Perl
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31602
Loc: Seattle, WA
Coworker is working on a Perl script that validates some pieces of collected data. The nature of the data isn't important but we're having trouble wrapping our head around the math needed to validate this elegantly.

We have a set of timestamps within some files we're creating. These timestamps are a kind of a "marker" within the file, indicating the start point of a certain kind of data activity. These markers can be created at any time (very random times) depending on conditions. But we also added code to make sure to, at the very least, create the markers at fifteen minute intervals, four times per hour at the 00 15 30 45 points in the hour based on the time of day.

So a file might have timestamps like this:
- File start time (at <hour>:07)
- Marker
- Marker
- Marker
- Marker (at <hour>:15)
- Marker
- Marker
- File end time (at <hour>:24)


Or the file might look like this:
- File start time (at <hour>:22)
- Marker (at <hour:30>)
- Marker (at <hour:45>)
- Marker (at <hour+1:00>)
- File end time (at <hour+1>:07)

Or the file might look like this:
- File start time (at <hour>:32)
- Marker
- File end time (at <hour>:34)


The length of the file is variable, as are all of the markers and the file's start and end times. But there should at least be markers at those 00 15 30 45 points within each hour.

We're creating a script to validate that the 15-minute interval bits actually got done in our code in all possible cases. We need to run this script against a huge set of data files of varying length that were previously collected.

All timestamps are already converted to Epoch time (number of seconds since the Epoch) so theoretically we should be able to validate this with integer math. We're just trying to figure out how to do it without resorting to brute force. We're looking for an elegant math-based solution for this. Given a list like the ones above, how do we elegantly validate that the list does not contain a gap at any of the quarter hour points?

Any ideas?
_________________________
Tony Fabris

Top
#335943 - 09/08/2010 23:13 Re: Math using time as variables in Perl [Re: tfabris]
Shonky
pooh-bah

Registered: 12/01/2002
Posts: 2009
Loc: Brisbane, Australia
I don't think you can. Unless the data is massively large, surely just bruteforcing is just the way.

1) Read first time stamp
2) Calculate next 15 minute boundary time stamp
3) Search for it from there - if a later time stamp is found first, file is borked. Success if end of file is reached.
4) Calculate next time stamp boundary (add 15 minutes), goto 3

Alternately, filter out any markers that are not on the exact 15 minute boundary first and then do the same run through. Need to be careful in case there are non regular markers exactly on the 15 boundary since you won't filter them out (unless you can identify them and filter them out too). Then it's just a case of making sure each entry is 15 minutes after the previous one.


Edited by Shonky (09/08/2010 23:14)
Edit Reason: Changed success detection in step 3
_________________________
Christian
#40104192 120Gb (no longer in my E36 M3, won't fit the E46 M3)

Top
#335944 - 10/08/2010 00:00 Re: Math using time as variables in Perl [Re: Shonky]
tonyc
carpal tunnel

Registered: 27/06/1999
Posts: 7058
Loc: Pittsburgh, PA
If you're looking for an elegant solution using Perl, you're doing it wrong. smile

If I'm understanding you right, I think you just want to do something like this:

1. Grab first timestamp in the file, let's call it "t0"
2 The first quarter-hour timestamp (t1) is going to be:
t1 = t0 + 900 - (t0 % 900)
3. Then you just loop, looking for t1 + n*900, until it fails or you hit EOF.

That's what you mean by a "math based" solution, right?
_________________________
- Tony C
my empeg stuff

Top
#335952 - 10/08/2010 13:03 Re: Math using time as variables in Perl [Re: tonyc]
siberia37
old hand

Registered: 09/01/2002
Posts: 702
Loc: Tacoma,WA
Grab the first timestamp, then grab all the markers until the end of the timestamp, put them in a hash list keyed by date. Now you've got a nice list of easily accessed timestamps for a time period and you also know the starting and ending point of that list of timestamps.
Next, take the start hour (i.e. strip off the minutes) of the first timestamp and assign it to an incrementing date variable you use to loop. Loop until the incrementing date variable < end date, increment the date by 15 minutes each time. At each iteration look for the date in the hash list. If you find it you are good, if you don't find it there is a problem. It should be very fast this way because of the use of a hash list. I have no idea if Perl can do a hash list as easily as OO languages though (Java, .NET etc..)

Top
#335953 - 10/08/2010 13:08 Re: Math using time as variables in Perl [Re: siberia37]
peter
carpal tunnel

Registered: 13/07/2000
Posts: 4180
Loc: Cambridge, England
Originally Posted By: siberia37
put them in a hash list keyed by date

That takes O(N) space, plus it can't check for out-of-order timestamps. Shonky/tonyc's algorithm is the way to go.

Peter

Top
#335958 - 10/08/2010 14:05 Re: Math using time as variables in Perl [Re: tonyc]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31602
Loc: Seattle, WA
Originally Posted By: tonyc
That's what you mean by a "math based" solution, right?


No, that's what I consider a brute-force solution. That sort of thing would work, though, and yours and Shonky's solutions are viable if we don't have a math solution.

By math solution, I'm talking about something that simply needs a math equation to solve the problem. Something along the lines of "IF (T1-T0) MOD 900" or something like that. Something that doesn't require iterating through a list for each time stamp. I don't know if that's possible.
_________________________
Tony Fabris

Top
#335960 - 10/08/2010 14:20 Re: Math using time as variables in Perl [Re: tfabris]
peter
carpal tunnel

Registered: 13/07/2000
Posts: 4180
Loc: Cambridge, England
Originally Posted By: tfabris
By math solution, I'm talking about something that simply needs a math equation to solve the problem. Something along the lines of "IF (T1-T0) MOD 900" or something like that. Something that doesn't require iterating through a list for each time stamp. I don't know if that's possible.

Unless I'm misunderstanding either your problem or their solution, the Shonky/tonyc algorithm makes just one pass through the file and uses only scalars (not arrays or lists) as local variables. You aren't going to improve on that.

Peter

Top
#335962 - 10/08/2010 14:52 Re: Math using time as variables in Perl [Re: tfabris]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
Originally Posted By: tfabris
Something that doesn't require iterating through a list for each time stamp.

I have the feeling you don't know what you're asking for. I'm not poking fun; I've been in the same situation of feeling like there's a better solution but not even really knowing how to ask for it.

Ultimately, you have a list of numbers. You have to check each one. There's no way around dealing with a list, because that's your input. You can avoid dealing with additional lists, and that's what Shonky and Tony have suggested. There are potentially other ways to decorate it, but they're all going to be basically the same.
_________________________
Bitt Faulk

Top
#335966 - 10/08/2010 15:30 Re: Math using time as variables in Perl [Re: wfaulk]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
I have the feeling that you're looking for a more stateless solution. Let me redecorate the existing solution in that style:

Code:
// entry action; can start at any point in list
test(initial_stamp)

// recursive transit of list
// will either error, or exit due to EndOfList
function test(current)
  next := nextmarker(current)
  if next - current > FifteenMinutes then
    error
  else
    test(next)
  endif
endfunc

// get next 15m marker
// returns marker or exits due to EndOfList
function nextmarker(stamp)
  check_stamp := stamp.next
  if check_stamp == EndOfList
    exit
  if check_stamp MOD FifteenMinutes <> 0 then
    return nextmarker(check_stamp)
  else
    return check_stamp
  endif
endfunc


TonyC got after me about wasting RAM on the recursion, so:
Code:
function test(current)
  next := nextmarker(current)
  while next - current <= FifteenMinutes do
    current := next
    next := nextmarker(current)
  done
  error
endfunc


Edited by wfaulk (10/08/2010 16:02)
Edit Reason: iterative solution
_________________________
Bitt Faulk

Top
#335967 - 10/08/2010 15:45 Re: Math using time as variables in Perl [Re: wfaulk]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
Here's a slightly different decoration:

Code:
// find first marker stamp
marker := nextmarker(initial_stamp)

// make sure there's no missing marker between the onset and the first marker
if (marker - initial_stamp) > FifteenMinutes then
  error
endif

// test starting at first marker
test(marker, marker DIV FifteenMinutes)

// recursively test for each marker by counting units of FifteenMinutes
// stops either with error or EndOfList
function test(marker, expected_quotient)
  assert(marker MOD FifteenMinutes == 0)
  if (marker DIV FifteenMinutes) != expected_quotient then
    error
  endif
  test(nextmarker(marker, expected_quotient + 1))
endfunc


Again: an iterative version:

Code:
function test(marker, expected_quotient)
  while (marker DIV FifteenMinutes) == expected_quotient do
    marker := nextmarker(marker)
    expected_quotient++
  done
  error
endfunc



Edited by wfaulk (10/08/2010 16:07)
Edit Reason: iterative procedure
_________________________
Bitt Faulk

Top
#335968 - 10/08/2010 16:08 Re: Math using time as variables in Perl [Re: wfaulk]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
Ultimately, though, both of those solutions are likely to be less optimal than the initial decorations. I can't imagine an architecture where (MOD 900) costs less than (+ 900).
_________________________
Bitt Faulk

Top
#335976 - 10/08/2010 18:12 Re: Math using time as variables in Perl [Re: wfaulk]
tfabris
carpal tunnel

Registered: 20/12/1999
Posts: 31602
Loc: Seattle, WA
Originally Posted By: wfaulk
Ultimately, though, both of those solutions are likely to be less optimal than the initial decorations. I can't imagine an architecture where (MOD 900) costs less than (+ 900).


Thanks very much, Bitt, Peter, TonyC, Shonky, and everyeone else. Your clarity with describing the issues and the tradeoffs is extremely helpful.
_________________________
Tony Fabris

Top