class offset_separator
The offset_separator class is an implementation of the TokenizerFunction concept that can be used with the tokenizer class to break text up into tokens. The offset_separator breaks a sequence of characters into strings based on a sequence of offsets. For example, given the string "12252001" and the offsets (2,2,4), it breaks the string into 12 25 2001. Here is an example.
// simple_example_3.cpp
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

int main(){
    using namespace std;
    using namespace boost;
    string s = "12252001";
    int offsets[] = {2,2,4};  // widths of the three fields
    offset_separator f(offsets, offsets+3);
    tokenizer<offset_separator> tok(s,f);
    // Prints 12, 25 and 2001, one per line
    for(tokenizer<offset_separator>::iterator beg=tok.begin(); beg!=tok.end();++beg){
        cout << *beg << "\n";
    }
}
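To make the splitting rule concrete without depending on Boost, here is a minimal standard C++ sketch of the behavior described above. It assumes the default settings (offsets wrap around and a short final piece is kept) and strictly positive offsets. split_by_offsets is a hypothetical helper for illustration, not Boost's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): cuts `text` into consecutive
// pieces whose lengths come from `offsets`, cycling through the offsets
// and keeping a shorter final piece. Assumes every offset is positive.
std::vector<std::string> split_by_offsets(const std::string& text,
                                          const std::vector<int>& offsets) {
    std::vector<std::string> tokens;
    if (offsets.empty()) return tokens;
    std::size_t pos = 0;  // start of the next piece
    std::size_t k = 0;    // index of the offset to use next
    while (pos < text.size()) {
        const std::size_t len = static_cast<std::size_t>(offsets[k]);
        tokens.push_back(text.substr(pos, len));  // may be short at the end
        pos += len;
        k = (k + 1) % offsets.size();  // wrap around to the first offset
    }
    return tokens;
}

int main() {
    const std::vector<std::string> tokens = split_by_offsets("12252001", {2, 2, 4});
    assert((tokens == std::vector<std::string>{"12", "25", "2001"}));
    for (const std::string& t : tokens) std::cout << t << "\n";
}
```
<test>
assert((split_by_offsets("12252001", {2, 2, 4}) == std::vector<std::string>{"12", "25", "2001"}));
assert((split_by_offsets("1225200101012002", {2, 2, 4}) == std::vector<std::string>{"12", "25", "2001", "01", "01", "2002"}));
assert((split_by_offsets("122501", {2, 2, 4}) == std::vector<std::string>{"12", "25", "01"}));
assert(split_by_offsets("", {2, 2, 4}).empty());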
The offset_separator has one constructor of interest (the default constructor exists only to keep some compilers happy). Its declaration is:
template<typename Iter> offset_separator(Iter begin, Iter end, bool bwrapoffsets = true, bool breturnpartiallast = true)
Parameter | Description |
begin, end | Specify the sequence of integer offsets. |
bwrapoffsets | Tells whether to wrap around to the first offset once all the offsets have been used. For example, the string "1225200101012002" with offsets (2,2,4) and bwrapoffsets set to true parses to 12 25 2001 01 01 2002. With bwrapoffsets set to false, it parses to 12 25 2001 and then stops because all the offsets have been used. |
breturnpartiallast | Tells what to do when the sequence ends before the number of characters required by the current offset has been read: either create a token from the characters that were read, or discard them. For example, the string "122501" with offsets (2,2,4) and breturnpartiallast set to true parses to 12 25 01. With it set to false, it parses to 12 25 and then stops, because only 2 characters remain instead of the 4 that the third offset requires. |
To use this class, pass an object of it anywhere a TokenizerFunction is required. If you default-construct the object, it simply returns every character in the parsed sequence as a separate token (that is, it defaults to a single offset of 1 with bwrapoffsets set to true).
Revised 25 December, 2006
Copyright © 2001 John R. Bandela
Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)