A C++ Library for Reading
Amtrak Performance History

Version 0

Bill Seymour
2018-02-03

Copyright Bill Seymour 2018.
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)




Abstract

This paper describes an open-source C++ library that provides a mechanism for reading historical performance of Amtrak trains. There are several overloads of a read() function template to do the deed; and as extra added attractions, the library supplies functions for adding hyphens to "yyyymmdd" dates, for generating "hh:mm" strings from minutes after midnight, and for converting weekdays back and forth between enum and string representations.

The author has also translated this library into a Java class for no better reason than that he could.

The source files are atkhist.hpp and atkhist.cpp. The sources are also supplied in Appendix A so that you can peruse the code before deciding whether you want to use it.

Appendix B contains a sample program that uses the library.

This paper, and the code it describes, are distributed under the Boost Software License, which isn’t viral as the GPL and others are said to be. (This software isn’t part of Boost; I just like the license.)


Some manual work required

In the current version, the data are generated at https://juckins.net/amtrak_status/archive/html/history.php. The user needs to manually copy and paste the detail lines in the HTML table produced by that Web page into plain text files. It’s those files that get read. It’s hoped that a future version will require less manual intervention.


Conformance

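This library is deliberately very forgiving regarding the quality of your C++ implementation. Indeed, any implementation that conforms to C++11 will do. The library uses no constexpr function nor any other feature of C++14 or beyond. If your implementation doesn’t support constexpr at all, define ATKHIST_NO_CONSTEXPR when compiling both atkhist.cpp and your own code; the few constexpr variables will then be declared const instead.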

The library reserves identifiers beginning with “ATKHIST_” for use as macros.


Synopsis

using std::string; // for brevity in this paper only
namespace atkhist {

constexpr int one_day = 24 * 60; // minutes per day

enum weekday
{
    bad_day = 0,
    mo, tu, we, th, fr, sa, su,
    all_days
};

weekday make_wday(char, char);
weekday make_wday(const char*);
weekday make_wday(const string&, std::size_t = 0);

extern const char* const weekday_abbrv[9];
extern const char* const weekday_names[9];

const char* wday_abbrv(weekday);
const char* wday_name(weekday);

class traintime
{
public:
    const string& date() const;
    weekday wday() const;
    int time() const;
    int late() const;

    bool operator==(const traintime&) const;
    bool operator!=(const traintime&) const;
    bool operator< (const traintime&) const;
    bool operator> (const traintime&) const;
    bool operator<=(const traintime&) const;
    bool operator>=(const traintime&) const;

    void back_one_day();
};

template<typename Out1, typename Out2 = implementation-detail>
string read(const string& filename,
            Out1 results,
            Out2 weekdays = Out2());

template<typename Out1, typename Out2 = implementation-detail>
string read(bool departure,
            const string& train_number,
            const string& station_code,
            Out1 results,
            Out2 weekdays = Out2());

template<typename Out1, typename Out2 = implementation-detail>
string read(bool departure,
            int train_number,
            const string& station_code,
            Out1 results,
            Out2 weekdays = Out2());

string readable_date(const string&);

string readable_time(int minutes, const char* minus_sign = "-");

inline string html_time(int minutes)
{
    return readable_time(minutes, "&minus;");
}

} // namespace atkhist


Detailed Descriptions

The data are read from plain text files and, presumably, will be inserted into some container of traintimes.

For trains that run less than daily, the input files may optionally contain a list of weekdays on which the train is scheduled to run; and if the file actually contains such a list, the weekdays will be inserted into some container of weekdays.

Given valid arguments and valid data read from the input files, nothing in this library will throw an exception other than std::bad_alloc from string handling (or whatever the caller’s own iterators might throw); but the read() overloads technically provide only the basic guarantee because, if something happens that causes read() to return a non-empty error message, that template might have already inserted objects into the caller’s containers.

Note:  in this version, since the input files are created manually, invalid data in input files is considered a contract violation. Debug builds of atkhist.cpp (those compiled without NDEBUG) assert on invalid arguments and on some malformed input lines; no other check is made. This might change in future releases, and that change might involve the read() template throwing an exception.
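If a caller needs all-or-nothing behavior instead of the basic guarantee, one option is to read into a scratch container and copy into the real one only when the returned message is empty. The following is a minimal sketch of that pattern, assuming a reader callable that behaves like read() (writes through an output iterator, returns an error message); read_all_or_nothing is a hypothetical helper, not part of atkhist.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper, not part of atkhist. Reader is any callable that
// writes through a std::back_insert_iterator<std::vector<T> > and returns
// an error message (empty on success), the way read() is documented to.
template<typename T, typename Reader>
std::string read_all_or_nothing(std::vector<T>& dest, Reader reader)
{
    std::vector<T> scratch;                  // partial results land here
    std::string err = reader(std::back_inserter(scratch));
    if (err.empty())                         // commit only on success
        dest.insert(dest.end(), scratch.begin(), scratch.end());
    return err;
}
```

With the real library, the reader would presumably be a lambda that forwards its iterator, together with a filename, to one of the read() overloads; that wiring is an assumption and isn't shown here.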

All code described herein is in namespace atkhist.


The input files

When rows from the HTML table are copied and pasted into a plain text file, they look like:
12/31/2016 (Sa) 01/01/2017 9:45 AM (Su) 11:06AM Ar: 1 hr, 21 min late.
12/11/2017 (Mo) 12/12/2017 1:12 AM (Tu) 1:05AM Ar: 7 min early. SD
Breaking the above examples down into fields:

    Origin            Scheduled                    Actual
    Date        Day   Date        Time     Day     Time      Comments
    12/31/2016  (Sa)  01/01/2017  9:45 AM  (Su)    11:06AM   Ar: 1 hr, 21 min late.
    12/11/2017  (Mo)  12/12/2017  1:12 AM  (Tu)    1:05AM    Ar: 7 min early. SD

This library doesn’t care about the date of origin; but the rest of the fields are interesting.

If the scheduled time is late at night, it’s possible that the actual time will be in the wee hours of the following morning. Similarly, if the scheduled time is early in the morning, it’s possible that the actual time will be late the previous day. It’s also possible that the actual time doesn’t appear at all. This can happen if the train was annulled for some reason, or if the actual time just wasn’t reported because somebody at Amtrak forgot.

For trains that run less often than daily, users will probably want to add an extra first line listing the weekdays on which the train runs. For example, train 50 departs Chicago only on Tuesday, Thursday and Saturday, so the first line of the file for that train and station would be “TuThSa”. The weekdays may appear in any order and in any combination of upper and lower case, but they must be the usual two-character abbreviations (the first two characters of the English name of the day) with nothing else on the line.

Files for daily trains may begin with “MoTuWeThFrSaSu”, but that silliness isn’t required: they can begin with a blank line or with a regular data line (no additional first line at all).
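The first-line rules above come down to a three-way test, which read() performs inline. The sketch below shows that test standalone; line_kind, is_abbrv() and classify_first_line() are hypothetical names, and the abbreviation check is a simplified stand-in for make_wday().

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical classification of an input file's first line.
enum line_kind { blank_line, data_line, weekday_line, invalid_line };

// Hypothetical case-insensitive test for one of the seven abbreviations.
inline bool is_abbrv(char c0, char c1)
{
    static const char* const days[] = { "mo", "tu", "we", "th", "fr", "sa", "su" };
    c0 = char(std::tolower(static_cast<unsigned char>(c0)));
    c1 = char(std::tolower(static_cast<unsigned char>(c1)));
    for (int i = 0; i < 7; ++i)
        if (days[i][0] == c0 && days[i][1] == c1)
            return true;
    return false;
}

inline line_kind classify_first_line(const std::string& in)
{
    if (in.empty())
        return blank_line;            // daily train, no weekday list
    if (std::isdigit(static_cast<unsigned char>(in[0])))
        return data_line;             // daily train, first record already
    if (in.size() % 2 != 0)
        return invalid_line;          // abbreviations come in pairs
    for (std::string::size_type i = 0; i < in.size(); i += 2)
        if (!is_abbrv(in[i], in[i + 1]))
            return invalid_line;
    return weekday_line;              // e.g. "TuThSa"
}
```

Note that a stray trailing space makes the length odd, so "TuThSa " is rejected just as the rule "nothing else on the line" implies.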


The traintime class

class traintime
{
public:
    const string& date() const;
    weekday wday() const;
    int time() const;
    int late() const;

    bool operator==(const traintime&) const;
    bool operator!=(const traintime&) const;
    bool operator< (const traintime&) const;
    bool operator> (const traintime&) const;
    bool operator<=(const traintime&) const;
    bool operator>=(const traintime&) const;

    void back_one_day();
};
A traintime will be constructed for each data line read from the input file. These objects are constructed only by one of the read() templates, so there’s no publicly visible constructor and no default constructor. You still get the compiler-supplied copy and move operations and destructor, so traintimes can be stored in the standard containers.

Note that there’s no train number or station code. read() reads only data for a particular train and station. Programs that read data for more than one train and/or more than one station will presumably have separate traintime containers for each train/station pair anyway.

date() returns the scheduled date in "yyyymmdd" form.

wday() returns the scheduled day of the week.

time() returns the actual time as minutes after midnight. This value can be one_day or more if the actual arrival or departure is on the day after the scheduled day. If the actual time is unknown for some reason (for example, the train didn’t actually run on the scheduled day, or maybe the time just never got reported), this function will return INT_MAX.

<warning>
time() never returns a value less than zero. In the unlikely event that a train is sufficiently early that it arrives on the day before it’s scheduled, date() and wday() will return the actual arrival day. This should affect arrivals only:  although there are cases when Amtrak trains depart before the scheduled time, the author is unaware of any examples of that happening around midnight.
</warning>

late() returns the number of minutes that the train is late (that is, the actual time minus the scheduled time). This will be negative if the train is early. If time() returns INT_MAX, this function will, too.
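Because the input lines give only times of day, the constructor in Appendix A relies on the comment field to decide which side of midnight an actual time falls on. That arithmetic can be sketched on its own as follows; adjusted and adjust() are hypothetical names, and the early flag stands in for the library’s parsing of the comment field.

```cpp
#include <cassert>

const int minutes_per_day = 24 * 60;   // same value as atkhist::one_day

// Hypothetical result type for this sketch.
struct adjusted
{
    int time;       // actual time, minutes after midnight of the reported date
    int late;       // actual minus scheduled; negative if early
    bool prev_day;  // true if the reported date must move back one day
};

// Hypothetical sketch: sched and actual are known times of day (0 .. 1439)
// from one input line; early says whether the comment reports "early".
// (Unknown actual times are handled separately and never get here.)
inline adjusted adjust(int sched, int actual, bool early)
{
    adjusted r = { actual, actual - sched, false };
    if (r.late < 0 && !early)          // late, but past midnight
    {
        r.time += minutes_per_day;
        r.late += minutes_per_day;
    }
    else if (r.late > 0 && early)      // early, before midnight the day before
    {
        r.prev_day = true;             // time is already right for that day
        r.late -= minutes_per_day;
    }
    return r;
}
```

For the sample lines shown earlier, 9:45 AM scheduled and 11:06AM actual gives a late value of 81 (the "1 hr, 21 min late" in the comment), and 1:12 AM scheduled with 1:05AM actual gives -7.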

<warning>
If a train is a whole day late or more, and if the time is actually reported, this library will do the wrong thing: because it sees only times of day, time() and late() will be off by a whole number of days. The author is unaware of any example of a train that late not having a “service disruption” and thus no time being reported.
</warning>

The usual comparison operators are provided as member functions. Since a given train calls on a given station only once per day, the date is the only datum compared. (One exception to that general rule is that the Silver Star calls on Lakeland, FL twice in each direction; but Amtrak uses different three-character station codes for the two stops; and it’s the station code, not the physical place, that matters.)
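Because the comparisons look only at the "yyyymmdd" date, and strings in that form sort lexicographically in calendar order, sorting works chronologically and a std::set treats two records with the same date as equivalent. Since traintime can’t be constructed outside read(), the sketch below uses a hypothetical stand-in struct, fake_traintime, that compares the same way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for traintime: only the date takes part in
// comparisons, just as the member operators in Appendix A do.
struct fake_traintime
{
    std::string dt;   // "yyyymmdd"
    int tm;           // minutes after midnight; ignored by operator<
    bool operator<(const fake_traintime& rhs) const { return dt < rhs.dt; }
};

// Sorting by the date string sorts chronologically.
inline std::vector<fake_traintime> sorted_by_date(std::vector<fake_traintime> v)
{
    std::sort(v.begin(), v.end());
    return v;
}

// A std::set keeps only the first of several records with equal dates.
inline std::size_t distinct_dates(const std::vector<fake_traintime>& v)
{
    std::set<fake_traintime> s(v.begin(), v.end());
    return s.size();
}
```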

On rare occasions, we might want an arrival time to be on the day after the day before (yes, really). For example, Pacific Surfliner train 796 arrives in San Diego at 01:06 on the day after it departs from Los Angeles; and the date that we care about might be the departure date. The back_one_day() member function is provided to subtract one day from the date and the weekday, and to add one_day to the time. (Note that this adjustment had better be made before comparing the objects for any reason, including sorting them, storing them in sets, or using them as map keys.) Don’t call back_one_day() on an object whose time() is INT_MAX: adding one_day to that value would overflow. It’s entirely up to the caller to determine whether to use this feature. Nothing in this library will ever call back_one_day().
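The date half of that adjustment is the fiddly part, since stepping back from the first of a month needs the previous month’s length, leap years included, and the weekday has to wrap from Monday to Sunday. The library does this by manipulating the digits of the "yyyymmdd" string in place (traintime::prev_date() in Appendix A). The following hypothetical sketch does the same job with plain integer arithmetic instead; prev_yyyymmdd() and prev_wday() are illustrative names only.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

inline bool is_leap(int yr)
{
    return yr % 4 == 0 && (yr % 100 != 0 || yr % 400 == 0);
}

// Hypothetical: the "yyyymmdd" date one day earlier, via integer arithmetic.
inline std::string prev_yyyymmdd(const std::string& d)
{
    assert(d.size() == 8);
    static const int mdays[13] = { 0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    int yr = std::atoi(d.substr(0, 4).c_str());
    int mo = std::atoi(d.substr(4, 2).c_str());
    int dy = std::atoi(d.substr(6, 2).c_str());

    if (--dy == 0)                       // was the 1st of the month
    {
        if (--mo == 0)                   // was January: back to December
        {
            mo = 12;
            --yr;
        }
        dy = mdays[mo] + (mo == 2 && is_leap(yr) ? 1 : 0);
    }
    char buf[32];
    std::snprintf(buf, sizeof buf, "%04d%02d%02d", yr, mo, dy);
    return buf;
}

// Hypothetical: previous weekday in 1 (Monday) .. 7 (Sunday) numbering.
inline int prev_wday(int wd)
{
    assert(wd >= 1 && wd <= 7);
    return wd == 1 ? 7 : wd - 1;
}
```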


The read() templates

The library’s principal reason for being is:
    template<typename Out1, typename Out2 = implementation-detail>
    string read(const string& filename,
                Out1 results,
                Out2 weekdays = Out2());
This template reads the file named filename, constructs a traintime for each regular input line, and writes them through results.

Out1 and Out2 are output iterator types. The library’s only uses are:

    *results++ = /* a traintime rvalue */;
    *weekdays++ = /* a weekday, lvalue or rvalue */;
    std::iterator_traits<Out2>::iterator_category()

The template returns an error message if the file can’t be opened, is empty, has an invalid first line, or can’t be read to the end; otherwise it returns an empty string.

In future releases, read() might throw an exception instead of returning an error message.
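The iterator_traits use listed above is what lets the weekday iterator be omitted: the default Out2 can be a "null" type whose iterator category is the type itself, so tag dispatch routes it to an overload that writes nothing. The following standalone sketch shows that idiom; null_out, write_days() and collect() are hypothetical names. Its general overload takes the category as a template parameter, so any iterator category, including a plain pointer’s, is accepted.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical "null iterator": its iterator_category is the type itself,
// so an object of the type can serve as its own dispatch tag.
struct null_out
{
    typedef null_out iterator_category;
    typedef void value_type;
    typedef void difference_type;
    typedef void pointer;
    typedef void reference;
};

namespace sketch {

// General case: Tag may be any real iterator category.
template<typename Out, typename Tag>
void write_days(Out out, int n, Tag)
{
    for (int d = 1; d <= n; ++d)
        *out++ = d;
}

// Null case: more specialized, so partial ordering prefers it for null_out.
template<typename Out>
void write_days(Out, int, null_out) { }

// Dispatch on the iterator's category.
template<typename Out>
void write_days(Out out, int n)
{
    sketch::write_days(out, n,
        typename std::iterator_traits<Out>::iterator_category());
}

// A C++11 default template argument makes the iterator optional,
// much as read()'s Out2 is.
template<typename Out = null_out>
void collect(int n, Out out = Out())
{
    sketch::write_days(out, n);
}

} // namespace sketch
```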

There are overloads that generate filenames for you:

    template<typename Out1, typename Out2 = implementation-detail>
    string read(bool departure,
                const string& train_number,
                const string& station_code,
                Out1 results,
                Out2 weekdays = Out2());

    template<typename Out1, typename Out2 = implementation-detail>
    string read(bool departure,
                int train_number,
                const string& station_code,
                Out1 results,
                Out2 weekdays = Out2());
The first argument indicates whether we’re looking for arrivals (false) or departures (true); the second argument, which can be either a string or an int, is the train number; and the third argument is Amtrak’s three-character station code.

These templates generate filenames that begin with "ar" for arrivals or "dp" for departures followed by the train number, the station code, and the extension, ".txt". For example, data for train 50 departing Chicago would be expected to be in a file named "dp50chi.txt". The station code may be in any combination of upper or lower case. The generated filename will always be lower case.
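The naming rule is simple enough to sketch directly. data_filename() below is a hypothetical standalone function, not the library’s detail::makefn(); it just assembles the same pieces in the same order.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical stand-in for the filename rule: "dp" or "ar", the train
// number, the station code in lower case, then ".txt".
inline std::string data_filename(bool departure, int train, const std::string& station)
{
    assert(station.size() == 3);   // Amtrak station codes are three characters
    std::ostringstream os;
    os << (departure ? "dp" : "ar") << train;
    for (std::string::size_type i = 0; i < station.size(); ++i)
        os << char(std::tolower(static_cast<unsigned char>(station[i])));
    os << ".txt";
    return os.str();
}
```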


Days of the week

enum weekday
{
    bad_day = 0,
    mo, tu, we, th, fr, sa, su,
    all_days
};

const char* const weekday_abbrv[] =
{
    "??", "Mo", "Tu", "We", "Th", "Fr", "Sa", "Su", "??"
};
const char* const weekday_names[] =
{
    "No such day",
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
    "All Days" // extra added attraction
};

inline const char* wday_abbrv(weekday wd)
{
    return weekday_abbrv[int(wd)];
}
inline const char* wday_name(weekday wd)
{
    return weekday_names[int(wd)];
}
Note that the integer values are 1 for Monday through 7 for Sunday. This is common practice in the travel industry.

weekday is deliberately an old-fashioned unscoped enum. The enumerators are in the scope of the atkhist namespace.
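Other conventions number the days differently. In particular, C’s struct tm counts tm_wday from 0 for Sunday, so mapping it onto the 1-for-Monday-through-7-for-Sunday numbering takes one special case. The sketch below uses hypothetical helpers, not library functions: std::mktime finds the weekday of a date, and the result is converted.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper, not part of atkhist: converts struct tm's
// tm_wday (0 = Sunday ... 6 = Saturday) to 1 = Monday ... 7 = Sunday.
inline int monday_first(int tm_wday)
{
    return tm_wday == 0 ? 7 : tm_wday;
}

// Hypothetical helper: weekday of a calendar date in the same numbering.
// std::mktime normalizes the fields and fills in tm_wday.
inline int weekday_number(int year, int month, int day)
{
    std::tm t = std::tm();   // zero-initialize every field
    t.tm_year = year - 1900;
    t.tm_mon = month - 1;
    t.tm_mday = day;
    t.tm_hour = 12;          // midday, well away from DST changeovers
    t.tm_isdst = -1;         // let mktime decide whether DST applies
    std::time_t r = std::mktime(&t);
    assert(r != std::time_t(-1));
    (void)r;
    return monday_first(t.tm_wday);
}
```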


Make weekday values from character representations

    weekday make_wday(char first_char, char second_char);
    weekday make_wday(const char* sp)
    {
        // as if:
        return make_wday(sp[0], sp[1]);
    }
    weekday make_wday(const string& str, std::size_t pos = 0)
    {
        // as if:
        return make_wday(str[pos], str[pos + 1]);
    }
These functions return one of atkhist::mo through atkhist::su given the first two characters of the weekday name; or they return bad_day if either or both of the arguments are invalid. The arguments may be upper or lower case.
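A lookup table is one alternative to the switch statement that make_wday() uses in Appendix A. The following sketch is hypothetical (day_number() is not a library function) and returns plain ints, 1 through 7 or 0, rather than weekday values so that it stays self-contained.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical table-driven alternative to a switch: returns 1 (Monday)
// through 7 (Sunday), or 0 if the pair isn't a valid abbreviation.
inline int day_number(char c0, char c1)
{
    static const char* const abbrv[] = { "mo", "tu", "we", "th", "fr", "sa", "su" };
    const char lc0 = char(std::tolower(static_cast<unsigned char>(c0)));
    const char lc1 = char(std::tolower(static_cast<unsigned char>(c1)));
    for (int i = 0; i < 7; ++i)
        if (abbrv[i][0] == lc0 && abbrv[i][1] == lc1)
            return i + 1;
    return 0;
}

// String overload: examines the two characters starting at pos.
inline int day_number(const std::string& s, std::string::size_type pos = 0)
{
    assert(s.size() > pos + 1);   // same precondition as make_wday(str, pos)
    return day_number(s[pos], s[pos + 1]);
}
```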


Readable dates

    string readable_date(const string&);
The traintime class stores dates internally as "yyyymmdd" for lexicographical comparisons. This function simply adds hyphens for easier reading by H. sapiens. (The dates remain in ISO 8601 format. Localization is not anticipated at this time.)
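The input lines carry dates as "mm/dd/yyyy", which doesn’t sort chronologically across years; the constructor reorders them into "yyyymmdd", and readable_date() adds the hyphens. Both steps are plain substring shuffling, sketched here with hypothetical function names.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: "mm/dd/yyyy" from an input line becomes "yyyymmdd",
// which sorts lexicographically in calendar order.
inline std::string to_yyyymmdd(const std::string& mdy)
{
    assert(mdy.size() >= 10);
    return mdy.substr(6, 4) + mdy.substr(0, 2) + mdy.substr(3, 2);
}

// Hypothetical helper: "yyyymmdd" becomes ISO 8601 "yyyy-mm-dd".
inline std::string hyphenate(const std::string& ymd)
{
    assert(ymd.size() == 8);
    return ymd.substr(0, 4) + '-' + ymd.substr(4, 2) + '-' + ymd.substr(6, 2);
}
```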


Readable times

    string readable_time(int minutes, const char* minus_sign = "-");

    inline string html_time(int minutes)
    {
        return readable_time(minutes, "&minus;");
    }
The traintime class stores times of day internally as ints containing minutes after midnight. These functions return "hh:mm" strings for H. sapiens’ benefit. (The separator is the colon as specified in ISO 8601. Localization is not anticipated at this time.)

If minutes is INT_MAX, these functions just return "?".

The first argument is allowed to be one_day or more. If it is, the returned value will still be in the range "00:00" to "23:59", but with "+n" appended, where n is the number of days. This is common practice in the travel industry.

The first argument might represent the difference between two times, for example, the amount of time that a train is early or late. If minutes is less than zero, the returned string will begin with the minus sign passed as the second argument; and the "hh:mm" part of the string will represent the absolute value. (Note that this is not “a time of day on the previous day”…that would be one_day minus the absolute value, not the absolute value itself.)
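Taken together, those rules can be checked against a small standalone re-sketch. hhmm() below is hypothetical and formats with std::snprintf rather than the hand-rolled digit conversion in Appendix A, but it follows the rules as described: "?" for INT_MAX, a caller-supplied minus sign for negative values, and a "+n" day suffix.

```cpp
#include <cassert>
#include <climits>
#include <cstdio>
#include <string>

// Hypothetical re-sketch of the readable_time() rules, using snprintf.
// Precondition: minutes > INT_MIN (its absolute value must fit in an int).
inline std::string hhmm(int minutes, const char* minus_sign = "-")
{
    if (minutes == INT_MAX)
        return "?";                        // unknown time
    std::string out;
    if (minutes < 0)                       // a difference, e.g. minutes early
    {
        out = minus_sign;
        minutes = -minutes;                // format the absolute value
    }
    const int days = minutes / (24 * 60);  // days past the reference day
    minutes %= 24 * 60;
    char buf[32];
    std::snprintf(buf, sizeof buf, "%02d:%02d", minutes / 60, minutes % 60);
    out += buf;
    if (days != 0)
    {
        std::snprintf(buf, sizeof buf, "+%d", days);
        out += buf;
    }
    return out;
}
```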


Appendix A:  the actual source code

//
// atkhist.hpp
//
// Bill Seymour, 2018-02-03
//
// Copyright Bill Seymour 2018.
// Distributed under the Boost Software License, Version 1.0.
// (See accompanying file LICENSE_1_0.txt or copy at
// http://www.boost.org/LICENSE_1_0.txt)
//
// This header declares a library for reading Amtrak departure
// and arrival times from files created from data received at
// https://juckins.net/amtrak_status/archive/html/history.php.
// See atkhist.html for user documentation.
//

#ifndef ATKHIST_HPP_INCLUDED
#define ATKHIST_HPP_INCLUDED

#include <iostream>
#include <fstream>
#include <string>
#include <iterator>

#ifdef ATKHIST_NO_CONSTEXPR
  #define ATKHIST_CONSTEXPR const
  // (never used for functions herein)
#else
  #define ATKHIST_CONSTEXPR constexpr
#endif

namespace atkhist {

ATKHIST_CONSTEXPR int one_day = 24 * 60;

std::string readable_date(const std::string&);

std::string readable_time(int, const char* = "-");

inline std::string html_time(int mins)
{
    return atkhist::readable_time(mins, "&minus;");
}

enum weekday
{
    bad_day = 0,
    mo, tu, we, th, fr, sa, su,
    all_days
};

weekday make_wday(char, char);
weekday make_wday(const char*);
weekday make_wday(const std::string&, std::size_t = 0);

extern const char* const weekday_abbrv[9];
extern const char* const weekday_names[9];

inline const char* wday_abbrv(weekday wd)
{
    return weekday_abbrv[int(wd)];
}
inline const char* wday_name(weekday wd)
{
    return weekday_names[int(wd)];
}

template<typename Out1, typename Out2>
std::string read(const std::string&, Out1, Out2);

class traintime
{
    std::string dt;
    weekday wd;
    int tm, lt;

    explicit traintime(const std::string&);

    template<typename Out1, typename Out2>
    friend std::string atkhist::read(const std::string&, Out1, Out2);

    void prev_date();

public:
    const std::string& date() const { return dt; }
    weekday wday() const { return wd; }
    int time() const { return tm; }
    int late() const { return lt; }

    bool operator==(const traintime& rhs) const { return dt == rhs.dt; }
    bool operator!=(const traintime& rhs) const { return dt != rhs.dt; }
    bool operator< (const traintime& rhs) const { return dt <  rhs.dt; }
    bool operator> (const traintime& rhs) const { return dt >  rhs.dt; }
    bool operator<=(const traintime& rhs) const { return dt <= rhs.dt; }
    bool operator>=(const traintime& rhs) const { return dt >= rhs.dt; }

    void back_one_day()
    {
        prev_date();
        tm += one_day;
        // late() stays the same: the scheduled time, measured from
        // the earlier date, also grows by one_day.
    }
};

namespace detail { // read() helpers

std::string train_nbr(int);
std::string makefn(bool, const std::string&, const std::string&);
std::string error(const char*, const std::string&);

//
// When the caller doesn't care about scheduled weekdays,
// read()'s final argument can be a kind of "null iterator".
// We'll use the tag dispatch idiom to do that:
//
struct nulliter : std::iterator<nulliter,void,void,void,void> { };

//
// Writing all seven weekdays:
//

template<typename Out, typename Tag>
void include_all(Out wdys, Tag) // any real iterator category
{
    for (int i = mo; i <= su; ++i)
        *wdys++ = weekday(i);
}

template<typename Out> void include_all(Out, nulliter) { }

template<typename Out>
void include_all(Out wdys)
{
    detail::include_all
        (wdys, typename std::iterator_traits<Out>::iterator_category());
}

//
// Writing particular weekdays from the input file:
//

template<typename Out, typename Tag>
bool include(const std::string& s, Out wdys, Tag) // any real iterator category
{
    // assert s.size() is even (statically provable)
    for (std::string::size_type i = 0; i < s.size(); i += 2)
    {
        weekday wday = make_wday(s[i], s[i + 1]);
        if (wday == bad_day)
            return false;
        *wdys++ = wday;
    }
    return true;
}

template<typename Out> bool include(const std::string&, Out, nulliter)
{
    return true;
}

template<typename Out>
bool include(const std::string& s, Out wdys)
{
    return detail::include
        (s, wdys, typename std::iterator_traits<Out>::iterator_category());
}

} // namespace detail

template<typename Out1, typename Out2 = detail::nulliter>
std::string read(const std::string& fn, Out1 dest, Out2 wdys = Out2())
{
    std::ifstream is(fn);
    if (!is)
        return detail::error("Can't open", fn);

    std::string in;

    if (!std::getline(is, in))
    {
        is.close();
        return detail::error("Empty input file", fn);
    }

    if (in.empty()) // OK...train runs daily
    {
        detail::include_all(wdys);
    }
    else if (in[0] >= '0' && in[0] <= '9') // normal input line
    {
        detail::include_all(wdys);
        *dest++ = traintime(in);
    }
    else if ((in.size() & 1) != 0 || !detail::include(in, wdys))
    {
        // first line should have been scheduled weekdays
        is.close();
        return detail::error("Invalid first line in", fn);
    }

    while (std::getline(is, in))
    {
        *dest++ = traintime(in);
    }

    bool ok = !is.bad() && is.eof();
    is.close();

    return ok ? std::string() : detail::error("Error reading", fn);
}

template<typename Out1, typename Out2 = detail::nulliter>
std::string read(bool dep,
                 const std::string& tn,
                 const std::string& sta,
                 Out1 dest,
                 Out2 wdys = Out2())
{
    return atkhist::read(detail::makefn(dep, tn, sta), dest, wdys);
}

template<typename Out1, typename Out2 = detail::nulliter>
std::string read(bool dep,
                 int tn,
                 const std::string& sta,
                 Out1 dest,
                 Out2 wdys = Out2())
{
    return atkhist::read(dep, detail::train_nbr(tn), sta, dest, wdys);
}

} // namespace atkhist

#endif // ATKHIST_HPP_INCLUDED

//
// atkhist.cpp
//
// Bill Seymour, 2018-02-03
//
// Copyright Bill Seymour 2018.
// Distributed under the Boost Software License, Version 1.0.
// (See accompanying file LICENSE_1_0.txt or copy at
// http://www.boost.org/LICENSE_1_0.txt)
//

#include "atkhist.hpp"

#include <cstddef> // size_t
#include <climits> // INT_MAX
#include <cstdlib> // atoi
#include <cctype>  // tolower, isdigit

#include <assert.h>

using std::size_t;
using std::string;

namespace {

void int_to_string(string& s, int val, int len = 0)
{
    // assert val non-negative (statically provable)
    if (val >= 10 || len > 1)
        int_to_string(s, val / 10, len - 1);
    s.append(1, char(val % 10 + '0'));
}

} // anonymous namespace

namespace atkhist {

string readable_date(const string& s)
{
    string val;
    val.reserve(10);
    val.assign(s, 0, 4);
    val.append(1, '-');
    val.append(s, 4, 2);
    val.append(1, '-');
    val.append(s, 6, 2);
    return val;
}

string readable_time(int mins, const char* minus)
{
    string val;

    if (mins != INT_MAX)
    {
        static const char maxval[] = "&minus;00:00";
        static ATKHIST_CONSTEXPR size_t maxlen = sizeof maxval - 1;

        val.reserve(maxlen);

        if (mins < 0)
        {
            mins = -mins;
            val.assign(minus);
        }

        int hrs = mins / 60;
        mins %= 60;

        int days = 0;
        while (hrs >= 24) // probably no more than once,
        {                 // but just in case ...
            hrs -= 24;
            ++days;
        }

        int_to_string(val, hrs, 2);
        val.append(1, ':');
        int_to_string(val, mins, 2);

        if (days != 0)
        {
            val.append(1, '+');
            int_to_string(val, days);
        }
    }
    else
    {
        val.assign(1, '?');
    }
    return val;
}

char const * const weekday_abbrv[] =
{
    "??", "Mo", "Tu", "We", "Th", "Fr", "Sa", "Su", "??"
};

char const * const weekday_names[] =
{
    "No such day", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday", "Sunday", "All Days"
};

weekday make_wday(char c0, char c1)
{
    c1 = char(std::tolower(c1));

    switch (std::tolower(c0))
    {
        case 'm':
            if (c1 == 'o')
                return mo;
            break;
        case 't':
            if (c1 == 'u')
                return tu;
            if (c1 == 'h')
                return th;
            break;
        case 'w':
            if (c1 == 'e')
                return we;
            break;
        case 'f':
            if (c1 == 'r')
                return fr;
            break;
        case 's':
            if (c1 == 'a')
                return sa;
            if (c1 == 'u')
                return su;
            break;
    }
    return bad_day;
}

weekday make_wday(const char* s)
{
    assert(s != 0 && s[0] != '\0' && s[1] != '\0');
    return make_wday(s[0], s[1]);
}

weekday make_wday(const std::string& s, std::size_t pos)
{
    assert(s.size() > pos + 1);
    return make_wday(s[pos], s[pos + 1]);
}

//
// Private helper for traintime::back_one_day(),
// also used by ctor in some corner cases:
//
void traintime::prev_date()
{
    if ((wd = weekday(int(wd) - 1)) == bad_day)
        wd = su;

    //
    // if day > "01", same month:
    //
    char& dy1 = dt[7];
    if (dy1 > '1')
    {
        --dy1;
        return;
    }
    //
    // else dy1 is '0' or '1':
    //
    char& dy10 = dt[6];
    if (dy10 != '0')
    {
        if (--dy1 < '0')
        {
            dy1 = '9';
            --dy10;
        }
        return;
    }

    //
    // else day == "01", so change to last day of previous month:
    //

    static const int mday[][13] =
    {
        { 0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 },
        { 0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 },
    };

    int yr = std::atoi(dt.substr(0, 4).c_str());
    int mo = std::atoi(dt.substr(4, 2).c_str());

    if (--mo == 0)
    {
        mo = 12;
        --yr;
    }
    bool leap = yr % 4 == 0 && (yr % 100 != 0 || yr % 400 == 0);
    int dy = mday[leap][mo];

    dt.clear();
    int_to_string(dt, yr);
    int_to_string(dt, mo, 2);
    int_to_string(dt, dy, 2);
}

namespace detail { // read() helpers

string train_nbr(int tn)
{
    assert(tn >= 0 && tn <= 9999);
    string val;
    val.reserve(4);
    int_to_string(val, tn);
    return val;
}

string makefn(bool dep, const string& tn, const string& sta)
{
    assert(sta.size() == 3);

    static const char maxval[] = "ar2250nyp.txt";
    static ATKHIST_CONSTEXPR size_t maxlen = sizeof maxval - 1;

    string fn;
    fn.reserve(maxlen);

    fn.assign(dep ? "dp" : "ar");
    fn.append(tn);
    for (char c : sta)
        fn.append(1, char(std::tolower(c)));
    fn.append(".txt");
    return fn;
}

string error(const char* why, const string& what)
{
    string msg(why);
    msg.append(1, ' ');
    msg.append(what);
    return msg;
}

} // namespace detail
} // namespace atkhist

namespace { // helpers for traintime ctor

/*
 * Input line examples:
 *
 * origin date |   expected day/time   | actual time
12/31/2016 (Sa) 01/01/2017 9:45 AM (Su) 11:06AM Ar: 1 hr, 21 min late.
12/11/2017 (Mo) 12/12/2017 1:12 AM (Tu) 1:05AM Ar: 7 min early. SD
 *
 * Note variations in time representation:  dy_pos and tm_pos below aren't
 * fixed character positions, but rather positions where we start searching.
 */
ATKHIST_CONSTEXPR size_t dt_pos = 16;
ATKHIST_CONSTEXPR size_t ex_pos = 27; // expected time
ATKHIST_CONSTEXPR size_t dy_pos = 35;
ATKHIST_CONSTEXPR size_t tm_pos = 40; // actual time

atkhist::weekday make_day(const string& in)
{
    size_t pos = in.find('(', dy_pos);
    assert(pos != string::npos);
    return atkhist::make_wday(in, pos + 1);
}

int parse_time(const string& in, size_t hrs_pos)
{
    // precondition:  hrs_pos is the 1st decimal digit's position

    size_t colon = in.find(':', hrs_pos);
    assert(colon <= hrs_pos + 2 && in.size() >= colon + 4);

    const char* sp = in.c_str();
    int hr = std::atoi(sp + hrs_pos);
    int mn = std::atoi(sp + colon + 1);

    size_t pm = in.find("PM", colon + 3);
    if (pm != string::npos && pm <= colon + 4)
    {
        if (hr != 12)
        {
            hr += 12;
        }
    }
    else if (hr == 12)
    {
        hr = 0;
    }

    return hr * 60 + mn;
}

int make_time(const string& in, size_t time_pos)
{
    size_t pos = in.find_first_not_of(" \t", time_pos);
    return pos != string::npos && std::isdigit(in[pos]) ?
        parse_time(in, pos) : INT_MAX;
}

//
// Whether a train is early or late:
//
bool is_early(const string& in, int lt)
{
    // assert not on-time (statically provable)

    size_t bar_pos = in.find('|', tm_pos);
    if (bar_pos != string::npos)
    {
        //
        // If the comment field contains '|', then both arrival and departure
        // statuses are reported; and we don't know /a priori/ which we want.
        // The best we can do is default to the departure status if that
        // matches expectations, otherwise report the arrival status.
        // (Note that "On time" can't be what we're looking for.)
        //
        if (in.find("On", bar_pos) == string::npos) // departure not on-time
        {
            size_t late_pos = in.find("late", bar_pos);
            if (in.find("On", tm_pos) != string::npos) // on-time arrival
            {
                return late_pos == string::npos;
            }

            // neither arrival nor departure on-time
            if (lt > 0 && late_pos != string::npos)
            {
                return false; // late matches expectations
            }
            if (lt < 0 && late_pos == string::npos)
            {
                return true; // early matches expectations
            }
        }
        // no match (including on-time departure)
    }
    // only one status reported or no departure match

    return in.find("late", tm_pos) >= bar_pos;
}

} // anonymous namespace

namespace atkhist {

traintime::traintime(const string& in)
  : dt(), wd(make_day(in)), tm(make_time(in, tm_pos))
{
    // input date is mm/dd/yyyy
    dt.reserve(8);
    dt.assign(in, dt_pos + 6, 4);
    dt.append(in, dt_pos + 0, 2);
    dt.append(in, dt_pos + 3, 2);

    if ((lt = tm) == INT_MAX)
        return; // unknown...nothing more to do

    int expected_time = make_time(in, ex_pos);
    assert(expected_time != INT_MAX);

    if ((lt -= expected_time) == 0)
        return; // on-time...nothing more to do

    bool early = is_early(in, lt);

    if (lt < 0 && !early) // morning of next day
    {
        tm += one_day;
        lt += one_day;
    }
    else if (lt > 0 && early) // late night yesterday
    {
        prev_date();
        // tm correct for yesterday
        lt -= one_day;
    }
}

} // namespace atkhist

// End of atkhist.cpp


Appendix B:  a sample application

//
// san-trip.cpp
//
// Bill Seymour, 2018-02-03
//
// Copyright Bill Seymour 2018.
// Distributed under the Boost Software License, Version 1.0.
// (See accompanying file LICENSE_1_0.txt or copy at
// http://www.boost.org/LICENSE_1_0.txt)
//
// If I take the Texas Eagle to Los Angeles arriving on a Monday,
// then take one of the Pacific Surfliners to San Diego, how much
// of the first day of my meeting will I miss?
//
// This program illustrates the use of the atkhist library.
// Note that I don't care about the likelihood of making
// a connection since there are several Surfliners and I'll
// certainly connect to one of them (unless the Eagle is
// annulled for some reason).  The question is which Surfliner
// I'd connect to and how many times I actually arrive in San Diego
// before some particular point in the meeting.
//
// Observed output for 53 Monday arrivals 2017-01-02 through 2018-01-01:
/*
09:00:   8 times (15%) - a few minutes late
10:30:  37 times (70%) - in time for morning refreshments
12:00:   7 times (13%) - in time for lunch
15:00:   1 time  ( 2%) - in time for afternoon refreshments
 */

#include "atkhist.hpp"
using atkhist::traintime;

#include <iostream>
#include <iomanip>   // setw
#include <string>
#include <vector>
#include <map>
#include <iterator>  // back_inserter
#include <algorithm> // sort

#include <cstddef>   // size_t
#include <climits>   // INT_MAX
#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE
#include <cmath>     // round

using namespace std;

namespace {

//
// The minimum number of minutes to make a connection:
//
const int min_layover = 5;

//
// Since we need to keep Surfliner departure and arrival times together,
// and since atkhist::read() will read only one at a time, we'll also need
// to keep track of the train number so that we can know which departure
// time belongs with which arrival time.
//
struct surf
{
    int tr, dp, ar;

    void assign_time(bool dep, int tm)
    {
        *(dep ? &dp : &ar) = tm;
    }

    surf(bool dep, int t, int tm) : tr(t), dp(INT_MAX), ar(INT_MAX)
    {
        assign_time(dep, tm);
    }

    // for sorting by departure time:
    bool operator<(const surf& rhs) const
    {
        return dp < rhs.dp;
    }
};
typedef vector<surf> surfs;
typedef map<string /*date*/, surfs> surf_trains;

//
// A surf_trains instance at namespace scope:
//
surf_trains all_surfs;

//
// A degenerate output iterator that's just good enough for loading all_surfs:
// ("Don't try this at home, Kids." -- Mr. Wizard)
//
class surf_iter
{
    bool dp;
    int tr;
public:
    surf_iter(bool d, int t) : dp(d), tr(t) { }

    //
    // Since this is just an output iterator, the * and ++ operators
    // can just return *this; we'll overload the assignment operator
    // to actually do the deed; and we have meta-knowledge that we
    // won't need prefix ++.
    //
    surf_iter& operator*() { return *this; }
    surf_iter& operator++(int) { return *this; }

    //
    // Note that atkhist::read() will assign a temporary,
    // so operator='s argument can be an rvalue reference.
    //
    surf_iter& operator=(traintime&&);
};

//
// The number of times that various San Diego arrivals happen:
//
struct counts
{
    int tm;          // minutes after midnight
    int cnt;         // how many times it happens
    const char* stm; // human-readable time
    const char* msg; // what it means for the meeting
}
cnts[] =
{
  //
  // We hope that the venue will be easy walking distance
  // from the Santa Fe depot; and we won't have checked
  // baggage to wait for.
  //
    {     525, 0, "08:45", "make the whole meeting"             },
    {     540, 0, "09:00", "a few minutes late"                 },
    {     630, 0, "10:30", "in time for morning refreshments"   },
    {     720, 0, "12:00", "in time for lunch"                  },
    {     810, 0, "13:30", "in time for afternoon session"      },
    {     900, 0, "15:00", "in time for afternoon refreshments" },
    {    1560, 0, "Later", "miss the whole day"                 },
    { INT_MAX, 0, "Never", "drop fifteen and punt"              },
};
const size_t ncnts = sizeof cnts / sizeof cnts[0];
counts& max_cnt = cnts[ncnts - 1];

bool count_connection(const traintime&, const surfs&);

} // anonymous namespace

int main()
{
    //
    // We're only reading one file of Texas Eagle trains.
    //
    vector<traintime> all_eagles;

    //
    // The Texas Eagle input file will actually be for train 1
    // (the Sunset Limited), not train 421 (the Texas Eagle
    // through cars that were added in San Antonio); and we
    // don't care about the days of the week.
    //
    string err(atkhist::read("ar1lax.txt", back_inserter(all_eagles)));
    if (!err.empty())
    {
        cerr << err << '\n';
        return EXIT_FAILURE;
    }

    //
    // All the Surfliner train numbers that we care about:
    //
    static const int surfnos[] =
    {
        562, 564, 566, 568, 572, 580, 582, 584, 590, 592, // orig. LAX
        768, 774, 782, 784, 790, 792, 796, // orig. north of LAX
        1566, 1588, // a couple of holiday trains
    };

    for (int tr : surfnos)
    {
        err.assign(atkhist::read(true, tr, "lax", surf_iter(true, tr)));

        err.append(1, '\n'); // in case we have two error messages

        err.append(atkhist::read(false, tr, "san", surf_iter(false, tr)));

        if (err.size() > 1) // not just the newline
        {
            cerr << err << '\n';
            return EXIT_FAILURE;
        }
    }

    //
    // Ensure that, for any given date, the Surfliners are sorted
    // by departure time:
    //
    for (auto& node : all_surfs)
    {
        sort(node.second.begin(), node.second.end());
    }

    //
    // Count the arrival times:
    //
    for (const traintime& tt : all_eagles)
    {
        if (tt.time() == INT_MAX)
        {
            ++max_cnt.cnt; // never arrived that day
        }
        else
        {
            surf_trains::const_iterator it = all_surfs.find(tt.date());
            if (it == all_surfs.end() || !count_connection(tt, it->second))
            {
                // no Surfliner at all that day || missed even the last one
                ++max_cnt.cnt;
            }
        }
    }

    //
    // Report the results:
    //
    cout.put('\n');
    size_t tot = all_eagles.size();
    for (const counts& c : cnts)
    {
        if (c.cnt > 0)
        {
            cout << c.stm << ": " << setw(3) << c.cnt
                 << " time" << (c.cnt > 1 ? "s (" : "  (")
                 << setw(2) << int(round(c.cnt * 100.0 / tot))
                 << "%) - " << c.msg << '\n';
        }
    }

    return EXIT_SUCCESS;
}

namespace {

surf_iter& surf_iter::operator=(traintime&& rhs)
{
    if (!dp && tr == 796)
    {
        //
        // We have meta-knowledge that train 796 runs overnight,
        // and that the input files for arrivals will report
        // the arrival date; but it's the departure date that
        // we care about.
        //
        rhs.back_one_day();
    }

    if (rhs.wday() != atkhist::mo)
        return *this; // don't bother if it's not Monday

    surf_trains::iterator it = all_surfs.find(rhs.date());

    if (it == all_surfs.end()) // don't have this date yet
    {
        all_surfs[rhs.date()].push_back(surf(dp, tr, rhs.time()));
        return *this;
    }

    //
    // We've got the date.  Do we have this train yet?
    //
    for (surf& sf : it->second)
    {
        if (tr == sf.tr)
        {
            sf.assign_time(dp, rhs.time());
            return *this;
        }
    }

    //
    // Got the date but not the train yet:
    //
    it->second.push_back(surf(dp, tr, rhs.time()));
    return *this;
}

inline void count_time(int tm)
{
    for (counts& c : cnts)
    {
        if (tm <= c.tm)
        {
            // always found eventually, since the last c.tm is INT_MAX
            ++c.cnt;
            return;
        }
    }
}

bool count_connection(const traintime& tt, const surfs& sfs)
{
    int target = tt.time() + min_layover;
    for (const surf& sf : sfs)
    {
        if (target <= sf.dp && sf.dp != INT_MAX)
        {
            count_time(sf.ar);
            return true;
        }
    }
    return false; // no connection at all
}

} // anonymous namespace

// End of san-trip.cpp
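
The heart of san-trip.cpp is the rule in count_connection() and count_time(): take the first Surfliner that leaves at least min_layover minutes after the Eagle arrives, then file that Surfliner's San Diego arrival under the first threshold it doesn't exceed. What follows is a minimal sketch of that rule without the library, assuming times are plain minutes after midnight and INT_MAX means "never reported," as in the program above. The names run, connecting_arrival() and bucket() are hypothetical and are not part of atkhist.

```cpp
#include <cassert>
#include <climits>   // INT_MAX
#include <cstddef>   // std::size_t
#include <vector>

// Hypothetical: one Surfliner run on a given date, in minutes after
// midnight; INT_MAX marks a time that was never reported.
struct run
{
    int dep;
    int arr;
};

// Returns the San Diego arrival of the first run leaving at least
// `layover` minutes after `eagle_arr`, or INT_MAX if there is none.
// Precondition: runs are sorted by departure (INT_MAX sorts last).
int connecting_arrival(int eagle_arr, int layover, const std::vector<run>& runs)
{
    for (std::size_t i = 1; i < runs.size(); ++i)
        assert(runs[i - 1].dep <= runs[i].dep);

    if (eagle_arr == INT_MAX)
        return INT_MAX; // the Eagle never arrived

    const int target = eagle_arr + layover;
    for (std::size_t i = 0; i < runs.size(); ++i)
    {
        // skip runs whose departure was never reported
        if (runs[i].dep != INT_MAX && target <= runs[i].dep)
            return runs[i].arr;
    }
    return INT_MAX; // missed even the last one
}

// Index of the first threshold that `arr` does not exceed.
// Precondition: the last threshold is INT_MAX, so some bucket matches.
std::size_t bucket(int arr, const std::vector<int>& thresholds)
{
    assert(!thresholds.empty() && thresholds.back() == INT_MAX);
    for (std::size_t i = 0; i < thresholds.size(); ++i)
    {
        if (arr <= thresholds[i])
            return i;
    }
    return thresholds.size() - 1; // not reached, given the precondition
}
```

Folding "no connection" into an INT_MAX arrival lets a single bucket() call handle every case; san-trip.cpp gets the same effect by incrementing max_cnt directly when count_connection() returns false.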

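The comment on surf_iter warns against copying its shortcuts: it lacks the standard iterator typedefs and prefix ++, which is safe only because of what the program knows about atkhist::read(). For comparison, here is a sketch of a more conventional output iterator that files each value under its date. Because it declares the five iterator typedefs and both increment operators, a standard algorithm such as std::copy can write through it. The sample type is a hypothetical stand-in for atkhist::traintime, and date_filer and file_all() are likewise illustrative names only.

```cpp
#include <algorithm> // std::copy
#include <cassert>
#include <cstddef>   // std::ptrdiff_t
#include <iterator>  // std::output_iterator_tag
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for atkhist::traintime.
struct sample
{
    std::string date; // "yyyymmdd"
    int time;         // minutes after midnight
};

typedef std::map<std::string, std::vector<int> > by_date;

// Output iterator that appends each sample's time to the vector
// stored under that sample's date.
class date_filer
{
public:
    typedef std::output_iterator_tag iterator_category;
    typedef void value_type;
    typedef std::ptrdiff_t difference_type;
    typedef void pointer;
    typedef void reference;

    explicit date_filer(by_date& m) : dest(&m) { }

    // As in surf_iter, * and ++ are no-ops; operator= does the work.
    date_filer& operator*() { return *this; }
    date_filer& operator++() { return *this; }
    date_filer operator++(int) { return *this; }

    date_filer& operator=(const sample& s)
    {
        assert(s.date.size() == 8); // expects "yyyymmdd"
        (*dest)[s.date].push_back(s.time);
        return *this;
    }

private:
    by_date* dest; // a pointer, not a reference, so copies stay assignable
};

// Files a whole batch using a standard algorithm.
by_date file_all(const std::vector<sample>& v)
{
    by_date result;
    std::copy(v.begin(), v.end(), date_filer(result));
    return result;
}
```

Storing a pointer rather than a reference mirrors std::back_insert_iterator and keeps the iterator copy-assignable, which some algorithm implementations rely on.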

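The cnts table pairs each threshold in minutes with a hand-written "hh:mm" label, so 525 and "08:45" have to be kept in agreement by eye. The library supplies its own function for generating "hh:mm" strings; since its exact signature isn't shown in this appendix, the sketch below uses a hypothetical stand-in named hhmm() just to check the arithmetic behind the table. Note that it deliberately does not wrap past midnight, so the "Later" threshold of 1560 minutes (26 × 60) comes out as "26:00".

```cpp
#include <cassert>
#include <iomanip>   // std::setw, std::setfill
#include <sstream>
#include <string>

// Hypothetical helper: minutes after midnight as zero-padded "hh:mm".
// Values of 1440 or more are not wrapped to the next day.
std::string hhmm(int minutes)
{
    assert(minutes >= 0);
    std::ostringstream os;
    os << std::setfill('0')
       << std::setw(2) << minutes / 60 << ':'
       << std::setw(2) << minutes % 60;
    return os.str();
}
```

A loop over cnts could then fill in stm from tm instead of hard-coding it, although the "Later" and "Never" rows would still need their own labels.
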
All suggestions and corrections will be welcome; all flames will be amusing.
Mail to was at pobox dot com