Tutorial

2.2.8. Multi-Character Filters

All the Filters you've seen so far, except for those that derive from stdio_filter, process characters one at a time. If you instead process several characters at once, you can often reduce the number of function calls needed to filter a character sequence, resulting in more efficient code. This is what Multi-Character Filters allow you to do.
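A rough way to see the saving is to count calls into the filter. The sketch below is not Boost.Iostreams code. It uses a hypothetical in-memory `counting_source` and two hypothetical filters, `per_char_filter` (called once per character) and `block_filter` (called once per buffer), and drains 1000 characters through each. The per-character filter is invoked 1001 times (1000 characters plus the call that reports EOF), while the buffered one, with a 128-character buffer, is invoked 9 times.

```cpp
#include <cstddef>  // std::size_t
#include <cstdio>   // EOF
#include <ios>      // std::streamsize
#include <string>

// Hypothetical in-memory source of unfiltered characters.
class counting_source {
public:
    explicit counting_source(const std::string& data) : data_(data), pos_(0) { }
    int get() { return pos_ < data_.size() ? static_cast<unsigned char>(data_[pos_++]) : EOF; }
private:
    std::string data_;
    std::string::size_type pos_;
};

// One-character-at-a-time filter: the caller invokes get() per character.
struct per_char_filter {
    long calls = 0;
    int get(counting_source& src) { ++calls; return src.get(); }
};

// Multi-character filter: the caller invokes read() once per buffer.
struct block_filter {
    long calls = 0;
    std::streamsize read(counting_source& src, char* s, std::streamsize n)
    {
        ++calls;
        std::streamsize z = 0;
        for (; z < n; ++z) {
            int c = src.get();
            if (c == EOF)
                break;
            s[z] = static_cast<char>(c);
        }
        return z != 0 ? z : -1;  // -1 signals EOF, as in the tutorial's convention
    }
};

// Number of filter calls needed to drain `text` one character at a time.
long drain_per_char(const std::string& text)
{
    counting_source src(text);
    per_char_filter f;
    while (f.get(src) != EOF) { }
    return f.calls;
}

// Number of filter calls needed to drain `text` in blocks of `buffer_size`.
long drain_block(const std::string& text, std::streamsize buffer_size)
{
    counting_source src(text);
    block_filter f;
    std::string buf(static_cast<std::size_t>(buffer_size), '\0');
    while (f.read(src, &buf[0], buffer_size) != -1) { }
    return f.calls;
}
```

In this sketch the source itself is still read character by character; only the calls into the filter are batched. How much a real program gains also depends on how the library drives the filter, which the sketch does not model.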

Multi-Character InputFilters

A typical narrow-character Multi-Character InputFilter looks like this:

#include <iosfwd>                          // streamsize
#include <boost/iostreams/categories.hpp>  // tags

class my_input_filter {
public:
    typedef char char_type;
    struct category 
        : input_filter_tag,
          multichar_tag
        { };

    template<typename Source>
    std::streamsize read(Source& src, char* s, std::streamsize n)
    {
        // Read up to n filtered characters into the buffer s,
        // returning the number of characters read or -1 for EOF.
        // Use src to access the unfiltered character sequence
    }

    /* Other members */
};

Notice that the member type category is a struct convertible to input_filter_tag and to multichar_tag. This tells the Iostreams library that my_input_filter is a Multi-Character Filter and an InputFilter. You could have achieved the same effect in several ways, e.g.:

    struct category 
        : input,
          filter_tag,
          multichar_tag
        { };

        /* or */

    typedef multichar_input_filter_tag category;

(For details, see Mode Tags and Category Tags.)
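Since what matters is convertibility to the relevant tags, all three spellings describe the same kind of Filter. The following self-contained sketch illustrates the idea using simplified stand-in tag structs declared locally (they reuse the names above but are not the definitions from boost/iostreams/categories.hpp) and a hypothetical `is_multichar_input_filter` check; the library's own dispatch machinery is more elaborate.

```cpp
#include <type_traits>  // std::is_convertible

// Simplified stand-ins for the library's tags, declared here for illustration only.
struct input { };
struct filter_tag { };
struct multichar_tag { };
struct input_filter_tag : input, filter_tag { };
struct multichar_input_filter_tag : input_filter_tag, multichar_tag { };

// Hypothetical check: a category qualifies if it converts to each property tag.
template<typename Category>
constexpr bool is_multichar_input_filter()
{
    return std::is_convertible<Category, input>::value &&
           std::is_convertible<Category, filter_tag>::value &&
           std::is_convertible<Category, multichar_tag>::value;
}

// The three spellings of category shown in the text.
struct category_1 : input_filter_tag, multichar_tag { };
struct category_2 : input, filter_tag, multichar_tag { };
typedef multichar_input_filter_tag category_3;

static_assert(is_multichar_input_filter<category_1>(), "input_filter_tag + multichar_tag");
static_assert(is_multichar_input_filter<category_2>(), "input + filter_tag + multichar_tag");
static_assert(is_multichar_input_filter<category_3>(), "single combined tag");
static_assert(!is_multichar_input_filter<input_filter_tag>(), "plain InputFilter is not multi-character");
```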

You could also write the above example as follows:

#include <iosfwd>                       // streamsize
#include <boost/iostreams/concepts.hpp> // multichar_input_filter

class my_input_filter : public multichar_input_filter {
public:
    template<typename Source>
    std::streamsize read(Source& src, char* s, std::streamsize n);

    /* Other members */
};

Here multichar_input_filter is a convenience base class which provides the member types char_type and category, as well as no-op implementations of member functions close and imbue.
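The pattern behind such a convenience base can be sketched without Boost. In the sketch below, `toy_multichar_input_filter`, its tag, and `pass_through_filter` are hypothetical stand-ins, and the `close`/`imbue` signatures are simplified; the real multichar_input_filter may declare these members differently depending on the Boost version. The `Source` used by `read` is assumed to have an `int get()` member that returns a negative value at end of input.

```cpp
#include <ios>          // std::streamsize
#include <locale>       // std::locale
#include <type_traits>  // std::is_same

// Stand-in category tag (not Boost's multichar_input_filter_tag).
struct toy_multichar_input_filter_tag { };

// Hypothetical convenience base: supplies the member types and no-op close/imbue.
struct toy_multichar_input_filter {
    typedef char char_type;
    typedef toy_multichar_input_filter_tag category;

    template<typename Device>
    void close(Device&) { }               // no state to reset
    void imbue(const std::locale&) { }    // locale is ignored
};

// A derived filter only has to write read(); this one forwards characters unchanged.
struct pass_through_filter : toy_multichar_input_filter {
    template<typename Source>
    std::streamsize read(Source& src, char* s, std::streamsize n)
    {
        std::streamsize z = 0;
        for (int c; z < n && (c = src.get()) >= 0; ++z)
            s[z] = static_cast<char>(c);
        return z != 0 ? z : -1;           // -1 signals EOF
    }
};
```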

shell_comments_multichar_input_filter

You can express a shell comments Filter as a Multi-Character InputFilter as follows:

#include <boost/iostreams/char_traits.hpp> // EOF, WOULD_BLOCK
#include <boost/iostreams/concepts.hpp>    // multichar_input_filter
#include <boost/iostreams/operations.hpp>  // get

namespace boost { namespace iostreams { namespace example {

class shell_comments_multichar_input_filter : public multichar_input_filter {
public:
    explicit shell_comments_multichar_input_filter(char comment_char = '#')
        : comment_char_(comment_char), skip_(false)
        { }

    template<typename Source>
    std::streamsize read(Source& src, char* s, std::streamsize n)
    {
        for (std::streamsize z = 0; z < n; ++z) {
            int c;
            while (true) {
                if ((c = boost::iostreams::get(src)) == EOF)
                    return z != 0 ? z : -1;
                else if (c == WOULD_BLOCK)
                    return z;
                // Enter skipping mode at the comment character; leave it at a newline.
                skip_ = c == comment_char_ ?
                    true :
                    c == '\n' ?
                        false :
                        skip_;
                if (!skip_)
                    break;
            }
            s[z] = c;
        }
        return n;
    }

    template<typename Source>
    void close(Source&) { skip_ = false; }
private:
    char comment_char_;
    bool skip_;
};

} } } // End namespace boost::iostreams::example

Note that the implementation of read is very similar to what you would get if you put the implementation of shell_comments_input_filter::get inside a for loop iterating from 0 to n. InputFilters that call themselves recursively, such as tab_expanding_input_filter, are much harder to transform into Multi-Character Filters.
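To see the read contract in action without Boost, the sketch below ports the same loop to a hypothetical `string_source` whose `get` returns EOF at the end of its data, and pulls the filtered text through a deliberately small buffer. WOULD_BLOCK handling is omitted because the in-memory source never blocks. `string_source`, `comment_stripper` and `filter_all` are illustrative names, not library API.

```cpp
#include <cstddef>  // std::size_t
#include <cstdio>   // EOF
#include <ios>      // std::streamsize
#include <string>

// Hypothetical in-memory source; get() returns EOF when exhausted.
class string_source {
public:
    explicit string_source(const std::string& s) : data_(s), pos_(0) { }
    int get() { return pos_ < data_.size() ? static_cast<unsigned char>(data_[pos_++]) : EOF; }
private:
    std::string data_;
    std::string::size_type pos_;
};

// Same algorithm as shell_comments_multichar_input_filter::read, minus WOULD_BLOCK.
class comment_stripper {
public:
    explicit comment_stripper(char comment_char = '#')
        : comment_char_(comment_char), skip_(false) { }

    template<typename Source>
    std::streamsize read(Source& src, char* s, std::streamsize n)
    {
        for (std::streamsize z = 0; z < n; ++z) {
            int c;
            while (true) {
                if ((c = src.get()) == EOF)
                    return z != 0 ? z : -1;  // partial buffer, or -1 at EOF
                // Enter skipping mode at the comment character; leave it at a newline.
                skip_ = c == comment_char_ ? true : c == '\n' ? false : skip_;
                if (!skip_)
                    break;
            }
            s[z] = static_cast<char>(c);
        }
        return n;
    }
private:
    char comment_char_;
    bool skip_;
};

// Pull everything through the filter, buffer_size characters at a time (buffer_size > 0).
std::string filter_all(const std::string& text, std::streamsize buffer_size = 4)
{
    string_source src(text);
    comment_stripper filter;
    std::string buf(static_cast<std::size_t>(buffer_size), '\0');
    std::string out;
    std::streamsize got;
    while ((got = filter.read(src, &buf[0], buffer_size)) != -1)
        out.append(&buf[0], static_cast<std::size_t>(got));
    return out;
}
```

Because skip_ is a data member rather than a local variable, a comment that straddles two calls to read is still removed: filter_all("#x\ny", 1) yields "\ny" even though each call fills only one character.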

Multi-Character OutputFilters

A typical narrow-character Multi-Character OutputFilter looks like this:

#include <iosfwd>                          // streamsize
#include <boost/iostreams/categories.hpp>  // tags

class my_output_filter {
public:
    typedef char char_type;
    struct category 
        : output_filter_tag,
          multichar_tag
        { };

    template<typename Sink>
    std::streamsize write(Sink& dest, const char* s, std::streamsize n)
    {
        // Consume up to n unfiltered characters from the buffer s,
        // writing filtered characters to dest. Return the number
        // of characters consumed.
    }

    /* Other members */
};

Notice that the member type category is a struct convertible to output_filter_tag and to multichar_tag. This tells the Iostreams library that my_output_filter is a Multi-Character Filter and an OutputFilter. As with Multi-Character InputFilters, you could have achieved the same effect in several ways, e.g.:

    struct category 
        : output,
          filter_tag,
          multichar_tag
        { };

        /* or */

    typedef multichar_output_filter_tag category;

(For details, see Mode Tags and Category Tags.)

You could also write the above example as follows:

#include <iosfwd>                       // streamsize
#include <boost/iostreams/concepts.hpp> // multichar_output_filter

class my_output_filter : public multichar_output_filter {
public:
    template<typename Sink>
    std::streamsize write(Sink& dest, const char* s, std::streamsize n);

    /* Other members */
};

Here multichar_output_filter is a convenience base class which provides the member types char_type and category, as well as no-op implementations of member functions close and imbue.

shell_comments_multichar_output_filter

You can express a shell comments Filter as a Multi-Character OutputFilter as follows:

#include <iosfwd>                          // streamsize
#include <boost/iostreams/concepts.hpp>    // multichar_output_filter
#include <boost/iostreams/operations.hpp>  // put

namespace boost { namespace iostreams { namespace example {

class shell_comments_multichar_output_filter : public multichar_output_filter {
public:
    explicit shell_comments_multichar_output_filter(char comment_char = '#')
        : comment_char_(comment_char), skip_(false)
        { }

    template<typename Sink>
    std::streamsize write(Sink& dest, const char* s, std::streamsize n)
    {
        std::streamsize z;
        for (z = 0; z < n; ++z) {
            int c = s[z];
            // Enter skipping mode at the comment character; leave it at a newline.
            skip_ = c == comment_char_ ?
                true :
                c == '\n' ?
                    false :
                    skip_;
            if (skip_)
                continue;
            if (!iostreams::put(dest, c))
                break;
        }
        return z;
    }

    template<typename Sink>
    void close(Sink&) { skip_ = false; }
private:
    char comment_char_;
    bool skip_;
};

} } } // End namespace boost::iostreams::example

Note that the implementation of write is very similar to what you would get if you put the implementation of shell_comments_output_filter::put inside a for loop iterating from 0 to n. OutputFilters that call themselves recursively, such as unix2dos_output_filter, are much harder to transform into Multi-Character Filters.
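The return value of write counts characters consumed, not characters written. The sketch below makes that visible without Boost: a hypothetical `bounded_sink` accepts only a fixed number of characters and its `put` returns false once full, and `comment_stripping_writer` (also hypothetical) repeats the loop above, calling `dest.put` directly in place of `iostreams::put`. Characters dropped because they lie inside a comment still count as consumed; only a character the sink refuses is excluded from the count.

```cpp
#include <cstddef>  // std::size_t
#include <ios>      // std::streamsize
#include <string>

// Hypothetical sink that accepts at most `capacity` characters.
class bounded_sink {
public:
    explicit bounded_sink(std::size_t capacity) : capacity_(capacity) { }
    bool put(char c)
    {
        if (out_.size() >= capacity_)
            return false;  // full: the character is refused
        out_ += c;
        return true;
    }
    const std::string& str() const { return out_; }
private:
    std::size_t capacity_;
    std::string out_;
};

// Same algorithm as shell_comments_multichar_output_filter::write.
class comment_stripping_writer {
public:
    explicit comment_stripping_writer(char comment_char = '#')
        : comment_char_(comment_char), skip_(false) { }

    template<typename Sink>
    std::streamsize write(Sink& dest, const char* s, std::streamsize n)
    {
        std::streamsize z;
        for (z = 0; z < n; ++z) {
            char c = s[z];
            // Enter skipping mode at the comment character; leave it at a newline.
            skip_ = c == comment_char_ ? true : c == '\n' ? false : skip_;
            if (skip_)
                continue;  // comment characters count as consumed
            if (!dest.put(c))
                break;     // sink refused s[z], so it is not consumed
        }
        return z;
    }
private:
    char comment_char_;
    bool skip_;
};
```

With a sink of capacity 3, writing "abcdef" returns 3 and the sink holds "abc"; the unconsumed tail "def" is left for the caller to present again.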