reflex::AbstractMatcher Class Reference

updated Thu Jan 26 2017
 

The abstract matcher base class template defines an interface for all pattern matcher engines.

#include <absmatcher.h>

Classes

struct  Const
 AbstractMatcher::Const common constants.
 
class  Iterator
 AbstractMatcher::Iterator class for scanning, searching, and splitting input character sequences.
 
class  Operation
 AbstractMatcher::Operation functor to match input to a pattern, also provides a (const) AbstractMatcher::iterator to iterate over matches.
 
struct  Option
 AbstractMatcher::Options for matcher engines.
 

Public Types

typedef AbstractMatcher::Iterator< AbstractMatcher > iterator
 std::input_iterator for scanning, searching, and splitting input character sequences.
 
typedef AbstractMatcher::Iterator< const AbstractMatcher > const_iterator
 

Public Member Functions

virtual void reset (const char *opt=NULL)
 Reset this matcher's state to the initial state and set options (when provided).
 
bool buffer (size_t blk=0)
 Set the buffer block size for reading: use 1 for interactive input, or 0 (or omit the argument) to buffer all input, in which case the method returns true if all the data could be read and false if a read error occurred.
 
void interactive (void)
 Set the buffer block size to 1 for interactive input.
 
void flush (void)
 Flush the buffer's remaining content.
 
virtual AbstractMatcher & input (const Input &inp)
 Set the input character sequence for this matcher and reset the matcher.
 
size_t matches (void)
 Returns true if the entire input matches this matcher's pattern (and internally caches the true/false result for repeat invocations).
 
size_t accept (void) const
 Returns a positive integer (true) indicating the capture index of the matched text in the pattern or zero (false) for a mismatch.
 
const char * text (void) const
 Returns a string with the matched text.
 
size_t size (void) const
 Returns the length of the matched text in number of bytes.
 
size_t wsize (void) const
 Returns the length of the matched text in number of (wide) characters.
 
size_t lineno (void) const
 Returns the line number of the match in the input character sequence.
 
size_t columno (void) const
 Returns the column number of matched text, counting wide characters (unless compiled with WITH_BYTE_COLUMNO).
 
std::pair< size_t, std::string > pair () const
 Returns a pair of size_t accept() and std::string text(), useful for tokenizing input into containers of pairs.
 
size_t first (void) const
 Returns the position of the first character starting the match in the input character sequence.
 
size_t last (void) const
 Returns the position one past the last character of the match in the input character sequence.
 
bool at_bob (void) const
 Returns true if this matcher is at the start of an input character sequence. Use reset() to restart input.
 
bool at_end (void)
 Returns true if this matcher has no more input to read from the input character sequence.
 
bool hit_end (void) const
 Returns true if this matcher hit the end of the input character sequence.
 
void set_end (bool eof)
 Set and force the end of input state.
 
bool at_bol (void) const
 Returns true if this matcher reached the beginning of a new line.
 
void set_bol (bool bol)
 Set the beginning-of-line state.
 
int input (void)
 Returns the next character from the input character sequence while preserving the current text match.
 
void unput (char c)
 Put back one character on the input character sequence for matching, invalidating the current match info and text.
 
const char * rest (void)
 Fetch the rest of the input as text, useful for searching/splitting up to n times after which the rest is needed.
 
void more (void)
 Append the next match to the currently matched text returned by AbstractMatcher::text, when the next match found is adjacent to the current match.
 
void less (size_t n)
 Truncate the matched text returned by AbstractMatcher::text to n characters and reposition for the next match.
 
 operator size_t () const
 Cast this matcher to a positive integer indicating the nonzero capture index of the matched text in the pattern, same as AbstractMatcher::accept.
 
 operator std::string () const
 Cast this matcher to a std::string of the text matched by this matcher.
 
 operator std::pair< size_t, std::string > () const
 Cast this matcher to a pair of size_t accept() and std::string text(), useful for tokenization into containers.
 
bool operator== (const char *rhs) const
 Returns true if matched text is equal to a string, useful for std::algorithm.
 
bool operator== (const std::string &rhs) const
 Returns true if matched text is equal to a string, useful for std::algorithm.
 
bool operator== (size_t rhs) const
 Returns true if capture index is equal to a given size_t value, useful for std::algorithm.
 
bool operator== (int rhs) const
 Returns true if capture index is equal to a given int value, useful for std::algorithm.
 
bool operator!= (const char *rhs) const
 Returns true if matched text is not equal to a string, useful for std::algorithm.
 
bool operator!= (const std::string &rhs) const
 Returns true if matched text is not equal to a string, useful for std::algorithm.
 
bool operator!= (size_t rhs) const
 Returns true if capture index is not equal to a given size_t value, useful for std::algorithm.
 
bool operator!= (int rhs) const
 Returns true if capture index is not equal to a given int value, useful for std::algorithm.
 

Public Attributes

Operation scan
 functor to scan input (to tokenize input)
 
Operation find
 functor to search input
 
Operation split
 functor to split input
 
Input in
 input character sequence being matched by this matcher
 

Protected Types

typedef int Method
 

Protected Member Functions

 AbstractMatcher (const Input &inp, const char *opt)
 Construct a base abstract matcher.
 
 AbstractMatcher (const Input &inp, const Option &opt)
 Construct a base abstract matcher.
 
void init (const char *opt=NULL)
 Initialize the base abstract matcher at construction.
 
virtual size_t get (char *s, size_t n)
 Returns more input (method can be overridden, as by reflex::FlexLexer::get to invoke reflex::FlexLexer::LexerInput).
 
virtual bool wrap (void)
 Returns true if wrapping of input after EOF is supported.
 
virtual size_t match (Method method)=0
 The abstract match operation implemented by pattern matching engines derived from AbstractMatcher.
 
bool grow (size_t need=Const::BLOCK)
 Shift or expand the internal buffer when it is too small to accommodate more input, where the buffer size is doubled when needed.
 
int get (void)
 Returns the next character from the buffered input character sequence.
 
int peek (void)
 Peek at the next character in the buffered input without consuming it.
 
void set_current (size_t loc)
 Set the current position to advance to the next match.
 

Protected Attributes

Option opt_
 options for matcher engines
 
char * buf_
 input character sequence buffer
 
const char * txt_
 points to the matched text in buffer AbstractMatcher::buf_
 
size_t len_
 size of the matched text
 
size_t cap_
 nonzero capture index of an accepted match or zero
 
size_t cur_
 next position in AbstractMatcher::buf_ to assign to AbstractMatcher::txt_
 
size_t pos_
 position in AbstractMatcher::buf_ after AbstractMatcher::txt_
 
size_t end_
 ending position of the input buffered in AbstractMatcher::buf_
 
size_t max_
 total buffer size and max position + 1 to fill
 
size_t ind_
 current indent position
 
size_t blk_
 block size for block-based input reading, as set by AbstractMatcher::buffer
 
int got_
 last unsigned character we looked at (to determine anchors and boundaries)
 
int chr_
 the character located at AbstractMatcher::buf_[AbstractMatcher::pos_]
 
size_t lno_
 line number count (prior to this buffered input)
 
size_t cno_
 column number count (prior to this buffered input)
 
size_t num_
 character count (number of characters flushed prior to this buffered input)
 
bool eof_
 input has reached EOF
 
bool mat_
 true if AbstractMatcher::matches() was successful
 

Private Member Functions

void update (void)
 Update the newline count, column count, and character count when shifting the buffer.
 

Detailed Description

The abstract matcher base class template defines an interface for all pattern matcher engines.

The buffer expands when matches do not fit. The buffer size is initially 2*BLOCK size.

      _________________
     |  |    |    |    |
buf_=|  |text|rest|free|
     |__|____|____|____|
        ^    ^    ^    ^
        cur_ pos_ end_ max_

buf_ // points to buffered input, grows to fit long matches
cur_ // current position in buf_ while matching text, cur_ = pos_ afterwards, changed by more()
pos_ // position in buf_ to start the next match
end_ // position in buf_ that is free to fill with more input
max_ // allocated size of buf_
txt_ // buf_ + cur_ points to the match, \0-terminated
len_ // length of the match
chr_ // buf_[pos_] character after this match, with buf_[pos_] replaced with \0
got_ // buf_[cur_-1] character before this match (assigned before each match)
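
The in-place NUL termination described for txt_ and chr_ can be illustrated with a toy model. This is only a sketch of the bookkeeping, not the reflex implementation: the struct ToyBuffer and its methods select() and restore() are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Toy model of the buffer bookkeeping described above (hypothetical, not reflex code).
struct ToyBuffer {
  std::vector<char> buf_;      // buffered input plus one spare byte for a terminator
  std::size_t cur_ = 0;        // start of the current match
  std::size_t pos_ = 0;        // position to start the next match
  std::size_t end_ = 0;        // one past the last buffered input byte
  const char *txt_ = nullptr;  // points to the match inside buf_
  std::size_t len_ = 0;        // length of the match
  int chr_ = 0;                // character that was at buf_[pos_] before termination

  explicit ToyBuffer(const std::string& input)
    : buf_(input.begin(), input.end()), end_(input.size())
  {
    buf_.push_back('\0');
  }

  // Mark the n bytes starting at cur_ as the match and NUL-terminate them in place.
  const char *select(std::size_t n)
  {
    assert(cur_ + n <= end_);
    txt_ = buf_.data() + cur_;
    len_ = n;
    pos_ = cur_ + n;
    chr_ = static_cast<unsigned char>(buf_[pos_]);
    buf_[pos_] = '\0';
    return txt_;
  }

  // Put the saved character back and move on to the next match position.
  void restore()
  {
    buf_[pos_] = static_cast<char>(chr_);
    cur_ = pos_;
  }
};
```

Calling select(5) on the input "hello world" yields the C string "hello" without copying, and restore() puts the space back before the next match starts.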

More info TODO

Member Typedef Documentation

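typedef AbstractMatcher::Iterator<const AbstractMatcher> reflex::AbstractMatcher::const_iterator

std::input_iterator for scanning, searching, and splitting input character sequences

typedef AbstractMatcher::Iterator<AbstractMatcher> reflex::AbstractMatcher::iterator

std::input_iterator for scanning, searching, and splitting input character sequences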

typedef int reflex::AbstractMatcher::Method
protected

Constructor & Destructor Documentation

reflex::AbstractMatcher::AbstractMatcher ( const Input & inp,
const char *  opt 
)
inline protected

Construct a base abstract matcher.

Parameters
inp  input character sequence for this matcher
opt  option string of the form (A|N|T(=[[:digit:]])?|;)*
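
The documented form of the option string is itself a regular expression, so a candidate string can be checked against it before it is passed on. The sketch below uses std::regex for that check; the function name is_valid_option_string is hypothetical, and the sketch only tests the shape of the string without assuming what the individual letters mean.

```cpp
#include <regex>
#include <string>

// Check a string against the documented form (A|N|T(=[[:digit:]])?|;)*.
// Hypothetical helper for illustration, not part of reflex.
bool is_valid_option_string(const std::string& opt)
{
  static const std::regex form("(A|N|T(=[[:digit:]])?|;)*");
  return std::regex_match(opt, form);
}
```

Under this form "A;N;T=4" is accepted, while "T=12" is rejected because only a single digit may follow "T=".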
reflex::AbstractMatcher::AbstractMatcher ( const Input & inp,
const Option & opt 
)
inline protected

Construct a base abstract matcher.

Parameters
inp  input character sequence for this matcher
opt  options

Member Function Documentation

size_t reflex::AbstractMatcher::accept ( void  ) const
inline

Returns a positive integer (true) indicating the capture index of the matched text in the pattern or zero (false) for a mismatch.

Returns
nonzero capture index of the match in the pattern, which may be matcher dependent, or zero for a mismatch, or Const::EMPTY for the empty last split.
bool reflex::AbstractMatcher::at_bob ( void  ) const
inline

Returns true if this matcher is at the start of an input character sequence. Use reset() to restart input.

Returns
true if at the beginning of an input sequence.
bool reflex::AbstractMatcher::at_bol ( void  ) const
inline

Returns true if this matcher reached the beginning of a new line.

Returns
true if at the beginning of a new line.
bool reflex::AbstractMatcher::at_end ( void  )
inline

Returns true if this matcher has no more input to read from the input character sequence.

Returns
true if at end of input and a read attempt will produce EOF.
bool reflex::AbstractMatcher::buffer ( size_t  blk = 0)
inline

Set the buffer block size for reading: use 1 for interactive input, or 0 (or omit the argument) to buffer all input, in which case the method returns true if all the data could be read and false if a read error occurred.

Returns
true if all input was buffered successfully when blk=0.
Warning
Use this method before any matching is done and before any input is read since the last time input was (re)set.
Parameters
blk  new block size between 1 and Const::BLOCK, or 0 to buffer all
size_t reflex::AbstractMatcher::columno ( void  ) const
inline

Returns the column number of matched text, counting wide characters (unless compiled with WITH_BYTE_COLUMNO).

Returns
column number.
size_t reflex::AbstractMatcher::first ( void  ) const
inline

Returns the position of the first character starting the match in the input character sequence.

Returns
position in the input character sequence.
void reflex::AbstractMatcher::flush ( void  )
inline

Flush the buffer's remaining content.

virtual size_t reflex::AbstractMatcher::get ( char *  s,
size_t  n 
)
inline protected virtual

Returns more input (method can be overridden, as by reflex::FlexLexer::get to invoke reflex::FlexLexer::LexerInput).

Parameters
s  points to the string buffer to fill with input
n  size of buffer pointed to by s
Returns
the nonzero number of (less or equal to n) 8-bit characters added to buffer s from the current input, or zero when EOF.
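
The contract of this method (fill s with at most n bytes, return the count, return zero at EOF) can be sketched with a stand-alone stand-in. The classes ToySource and StringSource and the function read_all below are hypothetical and do not derive from AbstractMatcher; they only mirror the documented signature and return convention.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for a class with an overridable input source.
class ToySource {
 public:
  virtual ~ToySource() = default;
  // Fill s with up to n bytes; return the number of bytes added, or 0 at EOF.
  virtual std::size_t get(char *s, std::size_t n) = 0;
};

// Override that serves input from an in-memory string.
class StringSource : public ToySource {
 public:
  explicit StringSource(std::string data) : data_(std::move(data)) {}
  std::size_t get(char *s, std::size_t n) override
  {
    std::size_t k = std::min(n, data_.size() - off_);
    std::memcpy(s, data_.data() + off_, k);
    off_ += k;
    return k;
  }
 private:
  std::string data_;
  std::size_t off_ = 0;
};

// Drain a source in blocks of blk bytes until get() reports EOF with 0.
std::string read_all(ToySource& src, std::size_t blk)
{
  assert(blk > 0);
  std::string out;
  std::string block(blk, '\0');
  for (std::size_t k; (k = src.get(&block[0], blk)) > 0; )
    out.append(block, 0, k);
  return out;
}
```

A block size of 1 in read_all corresponds to the byte-at-a-time reading that the buffer documentation recommends for interactive input.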
int reflex::AbstractMatcher::get ( void  )
inline protected

Returns the next character from the buffered input character sequence.

Returns
the character read (unsigned char 0..255) or EOF (-1).
bool reflex::AbstractMatcher::grow ( size_t  need = Const::BLOCK)
inline protected

Shift or expand the internal buffer when it is too small to accommodate more input, where the buffer size is doubled when needed.

Returns
true if buffer was shifted or was enlarged
Parameters
need  optional needed space = Const::BLOCK size by default
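
A shift-then-double policy of this kind can be sketched as follows. This is an assumption about how such a policy might look, not the reflex source: the struct ToyGrowBuffer and its members are hypothetical, and the real method also has to keep txt_, pos_, and the line and column counters consistent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Toy buffer that makes room by shifting out consumed bytes first and
// doubling its capacity only when shifting alone is not enough.
struct ToyGrowBuffer {
  std::vector<char> buf;  // storage; buf.size() plays the role of max_
  std::size_t cur = 0;    // first byte that is still needed
  std::size_t end = 0;    // one past the last buffered byte

  explicit ToyGrowBuffer(std::size_t cap) : buf(cap) { assert(cap > 0); }

  // Returns true if the buffer was shifted or enlarged.
  bool grow(std::size_t need)
  {
    if (buf.size() - end >= need)
      return false;  // enough free space already
    bool changed = false;
    if (cur > 0) {
      // Shift the live bytes to the front to reclaim consumed space.
      std::memmove(buf.data(), buf.data() + cur, end - cur);
      end -= cur;
      cur = 0;
      changed = true;
    }
    std::size_t cap = buf.size();
    while (cap - end < need)
      cap *= 2;  // double until the request fits
    if (cap != buf.size()) {
      buf.resize(cap);
      changed = true;
    }
    return changed;
  }
};
```

Shifting first keeps memory use bounded for long inputs with short matches; doubling is only needed when a single match outgrows the buffer.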
bool reflex::AbstractMatcher::hit_end ( void  ) const
inline

Returns true if this matcher hit the end of the input character sequence.

Returns
true if EOF was hit (and possibly more input would have changed the result), false otherwise (but next read attempt may return EOF immediately).
void reflex::AbstractMatcher::init ( const char *  opt = NULL)
inline protected

Initialize the base abstract matcher at construction.

Parameters
opt  options
virtual AbstractMatcher& reflex::AbstractMatcher::input ( const Input & inp)
inline virtual

Set the input character sequence for this matcher and reset the matcher.

Returns
this matcher.
Parameters
inp  input character sequence for this matcher
int reflex::AbstractMatcher::input ( void  )
inline

Returns the next character from the input character sequence while preserving the current text match.

Returns
the character read (unsigned char 0..255) or EOF (-1).
void reflex::AbstractMatcher::interactive ( void  )
inline

Set the buffer block size to 1 for interactive input.

Warning
Use this method before any matching is done and before any input is read since the last time input was (re)set.
size_t reflex::AbstractMatcher::last ( void  ) const
inline

Returns the position one past the last character of the match in the input character sequence.

Returns
position in the input character sequence.
void reflex::AbstractMatcher::less ( size_t  n)
inline

Truncate the matched text returned by AbstractMatcher::text to n characters and reposition for the next match.

Parameters
n  truncated string length
size_t reflex::AbstractMatcher::lineno ( void  ) const
inline

Returns the line number of the match in the input character sequence.

Returns
line number.
virtual size_t reflex::AbstractMatcher::match ( Method  method)
protected pure virtual

The abstract match operation implemented by pattern matching engines derived from AbstractMatcher.

Returns
nonzero when input matched the pattern using method Const::SCAN, Const::FIND, Const::SPLIT, or Const::MATCH.

Implemented in reflex::Matcher, reflex::StdMatcher, and reflex::BoostMatcher.

size_t reflex::AbstractMatcher::matches ( void  )
inline

Returns true if the entire input matches this matcher's pattern (and internally caches the true/false result for repeat invocations).

Returns
true if the entire input matched this matcher's pattern.
void reflex::AbstractMatcher::more ( void  )
inline

Append the next match to the currently matched text returned by AbstractMatcher::text, when the next match found is adjacent to the current match.

reflex::AbstractMatcher::operator size_t ( ) const
inline

Cast this matcher to a positive integer indicating the nonzero capture index of the matched text in the pattern, same as AbstractMatcher::accept.

Returns
nonzero capture index of a match, which may be matcher dependent, or zero for a mismatch.
reflex::AbstractMatcher::operator std::pair< size_t, std::string > ( ) const
inline

Cast this matcher to a pair of size_t accept() and std::string text(), useful for tokenization into containers.

Returns
a std::pair of size_t accept() and std::string text().
reflex::AbstractMatcher::operator std::string ( ) const
inline

Cast this matcher to a std::string of the text matched by this matcher.

Returns
std::string allocated from NUL-terminated matched text.
bool reflex::AbstractMatcher::operator!= ( const char *  rhs) const
inline

Returns true if matched text is not equal to a string, useful for std::algorithm.

Returns
true if matched text is not equal to rhs string.
Parameters
rhs  rhs string to compare to
bool reflex::AbstractMatcher::operator!= ( const std::string &  rhs) const
inline

Returns true if matched text is not equal to a string, useful for std::algorithm.

Returns
true if matched text is not equal to rhs string.
Parameters
rhs  rhs string to compare to
bool reflex::AbstractMatcher::operator!= ( size_t  rhs) const
inline

Returns true if capture index is not equal to a given size_t value, useful for std::algorithm.

Returns
true if capture index is not equal to rhs.
Parameters
rhs  capture index to compare accept() to
bool reflex::AbstractMatcher::operator!= ( int  rhs) const
inline

Returns true if capture index is not equal to a given int value, useful for std::algorithm.

Returns
true if capture index is not equal to rhs.
Parameters
rhs  capture index to compare accept() to
bool reflex::AbstractMatcher::operator== ( const char *  rhs) const
inline

Returns true if matched text is equal to a string, useful for std::algorithm.

Returns
true if matched text is equal to rhs string.
Parameters
rhs  rhs string to compare to
bool reflex::AbstractMatcher::operator== ( const std::string &  rhs) const
inline

Returns true if matched text is equal to a string, useful for std::algorithm.

Returns
true if matched text is equal to rhs string.
Parameters
rhs  rhs string to compare to
bool reflex::AbstractMatcher::operator== ( size_t  rhs) const
inline

Returns true if capture index is equal to a given size_t value, useful for std::algorithm.

Returns
true if capture index is equal to rhs.
Parameters
rhs  capture index to compare accept() to
bool reflex::AbstractMatcher::operator== ( int  rhs) const
inline

Returns true if capture index is equal to a given int value, useful for std::algorithm.

Returns
true if capture index is equal to rhs.
Parameters
rhs  capture index to compare accept() to
std::pair<size_t,std::string> reflex::AbstractMatcher::pair ( ) const
inline

Returns a pair of size_t accept() and std::string text(), useful for tokenizing input into containers of pairs.

Returns
a std::pair of size_t accept() and std::string text().
int reflex::AbstractMatcher::peek ( void  )
inline protected

Peek at the next character in the buffered input without consuming it.

Returns
the character (unsigned char 0..255) or EOF (-1).
virtual void reflex::AbstractMatcher::reset ( const char *  opt = NULL)
inline virtual

Reset this matcher's state to the initial state and set options (when provided).

Reimplemented in reflex::Matcher, reflex::BoostMatcher, and reflex::StdMatcher.

const char* reflex::AbstractMatcher::rest ( void  )
inline

Fetch the rest of the input as text, useful for searching/splitting up to n times after which the rest is needed.

Returns
const char* string of the remaining input (wrapped when AbstractMatcher::wrap is defined).
void reflex::AbstractMatcher::set_bol ( bool  bol)
inline

Set the beginning-of-line state.

void reflex::AbstractMatcher::set_current ( size_t  loc)
inline protected

Set the current position to advance to the next match.

Parameters
loc  new location in buffer
void reflex::AbstractMatcher::set_end ( bool  eof)
inline

Set and force the end of input state.

size_t reflex::AbstractMatcher::size ( void  ) const
inline

Returns the length of the matched text in number of bytes.

Returns
match size in bytes.
const char* reflex::AbstractMatcher::text ( void  ) const
inline

Returns string with the text matched.

Returns
NUL-terminated const char* string.
void reflex::AbstractMatcher::unput ( char  c)
inline

Put back one character on the input character sequence for matching, invalidating the current match info and text.

Parameters
c  character to put back
void reflex::AbstractMatcher::update ( void  )
inline private

Update the newline count, column count, and character count when shifting the buffer.
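
The counting this method has to perform can be sketched on its own. The struct ToyCounts and the free function update below are hypothetical, and the sketch makes three assumptions that the source does not state: line numbers start at 1, columns are counted per UTF-8 character, and num counts bytes.

```cpp
#include <cstddef>

// Hypothetical running counters for input that has already left the buffer.
struct ToyCounts {
  std::size_t lno = 1;  // line number (assumed to start at 1)
  std::size_t cno = 0;  // column within the current line
  std::size_t num = 0;  // bytes flushed so far (assumed to count bytes)
};

// Account for the n bytes at s that are about to be shifted out of the buffer.
void update(ToyCounts& c, const char *s, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i) {
    unsigned char b = static_cast<unsigned char>(s[i]);
    if (b == '\n') {
      ++c.lno;
      c.cno = 0;  // a newline starts a fresh column count
    } else if ((b & 0xC0) != 0x80) {
      ++c.cno;    // count each character once, at its first byte
    }
  }
  c.num += n;
}
```

Keeping these totals lets line and column queries stay correct after the bytes they refer to have been discarded.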

virtual bool reflex::AbstractMatcher::wrap ( void  )
inline protected virtual

Returns true if wrapping of input after EOF is supported.

Returns
true if input was successfully wrapped.
size_t reflex::AbstractMatcher::wsize ( void  ) const
inline

Returns the length of the matched text in number of (wide) characters.

Returns
the length of the match in number of (wide, multibyte UTF-8) characters.
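
The difference between size() and wsize() is the difference between counting bytes and counting UTF-8 encoded characters. One common way to count the latter, shown here as a sketch and not as the reflex implementation, is to skip continuation bytes; the function name utf8_length is hypothetical and the input is assumed to be valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>

// Count UTF-8 encoded characters in the n bytes at s by skipping
// continuation bytes, which have the bit pattern 10xxxxxx.
std::size_t utf8_length(const char *s, std::size_t n)
{
  assert(s != nullptr || n == 0);
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i)
    if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80)
      ++count;
  return count;
}
```

For a match that is plain ASCII the two lengths are equal; they differ as soon as the match contains a multibyte character.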

Member Data Documentation

size_t reflex::AbstractMatcher::blk_
protected

block size for block-based input reading, as set by AbstractMatcher::buffer

char* reflex::AbstractMatcher::buf_
protected

input character sequence buffer

size_t reflex::AbstractMatcher::cap_
protected

nonzero capture index of an accepted match or zero

int reflex::AbstractMatcher::chr_
protected

the character located at AbstractMatcher::buf_[AbstractMatcher::pos_]

size_t reflex::AbstractMatcher::cno_
protected

column number count (prior to this buffered input)

size_t reflex::AbstractMatcher::cur_
protected

next position in AbstractMatcher::buf_ to assign to AbstractMatcher::txt_

size_t reflex::AbstractMatcher::end_
protected

ending position of the input buffered in AbstractMatcher::buf_

bool reflex::AbstractMatcher::eof_
protected

input has reached EOF

Operation reflex::AbstractMatcher::find

functor to search input

int reflex::AbstractMatcher::got_
protected

last unsigned character we looked at (to determine anchors and boundaries)

Input reflex::AbstractMatcher::in

input character sequence being matched by this matcher

size_t reflex::AbstractMatcher::ind_
protected

current indent position

size_t reflex::AbstractMatcher::len_
protected

size of the matched text

size_t reflex::AbstractMatcher::lno_
protected

line number count (prior to this buffered input)

bool reflex::AbstractMatcher::mat_
protected

true if AbstractMatcher::matches() was successful

size_t reflex::AbstractMatcher::max_
protected

total buffer size and max position + 1 to fill

size_t reflex::AbstractMatcher::num_
protected

character count (number of characters flushed prior to this buffered input)

Option reflex::AbstractMatcher::opt_
protected

options for matcher engines

size_t reflex::AbstractMatcher::pos_
protected

position in AbstractMatcher::buf_ after AbstractMatcher::txt_

Operation reflex::AbstractMatcher::scan

functor to scan input (to tokenize input)

Operation reflex::AbstractMatcher::split

functor to split input

const char* reflex::AbstractMatcher::txt_
protected

points to the matched text in buffer AbstractMatcher::buf_

