ostream_iterator(cout, "\n")); } ///:~ This example was suggested by Nathan Myers, who invented the istreambuf_iterator and its relatives. This iterator extracts information character-by-character from a stream. Although the istreambuf_iterator template argument might suggest to you that you could extract, for example, ints instead of char, that’s not the case. The argument must be of some character type – a regular char or a wide character. After the file is open, an istreambuf_iterator called p is attached to the istream so characters can be extracted from it. The set called wordlist will be used to hold the resulting words. The while loop reads words until the end of the input stream is found. This is detected using the default constructor for istreambuf_iterator which produces the past-the-end iterator object end. Thus, if you want to test to make sure you’re not at the end of the stream, you simply say p != end. The second type of iterator that’s used here is the insert_iterator, which creates an iterator that knows how to insert objects into a container. Here, the “container” is the string called word which, for the purposes of insert_iterator, behaves like a container. The constructor for insert_iterator requires the container and an iterator indicating where it should start inserting the characters. You could also use a back_insert_iterator, which requires that the container have a push_back( ) (string does). After the while loop sets everything up, it begins by looking for the first alpha character, incrementing start until that character is found. Then it copies characters from one iterator to the other, stopping when a non-alpha character is found. Each word, assuming it is non-empty, is added to wordlist. StreamTokenizer: a more flexible solution The above program parses its input into strings of words containing only alpha characters, but that’s still a special case compared to the generality of strtok( ). What we’d like now is an actual replacement for strtok( ) so we’re never tempted to use it. WordList2.cpp can be modified to create a class called StreamTokenizer that delivers a new token as a string whenever you call next( ), according to the delimiters you give it upon construction (very similar to strtok( )): //: C04:StreamTokenizer.h // C++ Replacement for Standard C strtok() #ifndef STREAMTOKENIZER_H #define STREAMTOKENIZER_H #include #include Chapter 15: Multiple Inheritance 201 #include class StreamTokenizer { typedef std::istreambuf_iterator It; It p, end; std::string delimiters; bool isDelimiter(char c) { return delimiters.find(c) != std::string::npos; } public: StreamTokenizer(std::istream& is, std::string delim = " \t\n;()\"<>:{}[]+-=&*#" ".,/\\~!0123456789") : p(is), end(It()), delimiters(delim) {} std::string next(); // Get next token }; #endif STREAMTOKENIZER_H ///:~ The default delimiters for the StreamTokenizer constructor extract words with only alpha characters, as before, but now you can choose different delimiters to parse different tokens. The implementation of next( ) looks similar to Wordlist2.cpp: //: C04:StreamTokenizer.cpp {O} #include "StreamTokenizer.h" using namespace std; string StreamTokenizer::next() { string result; if(p != end) { insert_iterator ii(result, result.begin()); while(isDelimiter(*p) && p != end) p++; while (!isDelimiter(*p) && p != end) *ii++ = *p++; } return result; } ///:~ The first non-delimiter is found, then characters are copied until a delimiter is found, and the resulting string is returned. 
Here's a test:

//: C04:TokenizeTest.cpp
//{L} StreamTokenizer
// Test StreamTokenizer
#include "StreamTokenizer.h"
#include "../require.h"
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <set>
using namespace std;

int main(int argc, char* argv[]) {
  requireArgs(argc, 1);
  ifstream in(argv[1]);
  assure(in, argv[1]);
  StreamTokenizer words(in);
  set<string> wordlist;
  string word;
  while((word = words.next()).size() != 0)
    wordlist.insert(word);
  // Output results:
  copy(wordlist.begin(), wordlist.end(),
    ostream_iterator<string>(cout, "\n"));
} ///:~

Now the tool is more reusable than before, but it's still inflexible, because it can only work with an istream. This isn't as bad as it first seems, since a string can be turned into an istream via an istringstream. But in the next section we'll come up with the most general, reusable tokenizing tool, and this should give you a feeling for what "reusable" really means, and the effort necessary to create truly reusable code.

A completely reusable tokenizer

Since the STL containers and algorithms all revolve around iterators, the most flexible solution will itself be an iterator. You could think of the TokenIterator as an iterator that wraps itself around any other iterator that can produce characters. Because it is designed as an input iterator (the most primitive type of iterator), it can be used with any STL algorithm. Not only is it a useful tool in itself, the TokenIterator is also a good example of how you can design your own iterators.[18]

[18] This is another example coached by Nathan Myers.

The TokenIterator is doubly flexible: first, you can choose the type of iterator that will produce the char input. Second, instead of just saying what characters represent the delimiters, TokenIterator uses a predicate, which is a function object whose operator( ) takes a char and decides whether it should be in the token or not. Although the two examples given here have a static concept of what characters belong in a token, you could easily design your own function object to change its state as the characters are read, producing a more sophisticated parser.
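As an aside, here is a minimal sketch of the istringstream point made above (the sample text and delimiter set are invented for illustration): wrapping a string in an istringstream lets StreamTokenizer work on in-memory text, with whatever delimiters you choose.

// Sketch (not from the book's code): tokenizing a string by
// wrapping it in an istringstream. Link with StreamTokenizer.cpp.
#include "StreamTokenizer.h"
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

int main() {
  string text = "alpha, beta; gamma delta";
  istringstream in(text);            // Turn the string into an istream
  StreamTokenizer words(in, " ,;");  // Space, comma and ';' end tokens
  string word;
  while((word = words.next()).size() != 0)
    cout << word << endl;            // Prints alpha, beta, gamma, delta
}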
The following header file contains the two basic predicates Isalpha and Delimiters, along with the template for TokenIterator:

//: C04:TokenIterator.h
#ifndef TOKENITERATOR_H
#define TOKENITERATOR_H
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <iterator>
#include <string>

struct Isalpha {
  bool operator()(char c) {
    using namespace std; //[[For a compiler bug]]
    return isalpha(c);
  }
};

class Delimiters {
  std::string exclude;
public:
  Delimiters() {}
  Delimiters(const std::string& excl)
    : exclude(excl) {}
  bool operator()(char c) {
    return exclude.find(c) == std::string::npos;
  }
};

template <class InputIter, class Pred = Isalpha>
class TokenIterator: public std::iterator<
  std::input_iterator_tag, std::string, std::ptrdiff_t> {
  InputIter first;
  InputIter last;
  std::string word;
  Pred predicate;
public:
  TokenIterator(InputIter begin, InputIter end,
    Pred pred = Pred())
    : first(begin), last(end), predicate(pred) {
    ++*this;
  }
  TokenIterator() {} // End sentinel
  // Prefix increment:
  TokenIterator& operator++() {
    word.resize(0);
    first = std::find_if(first, last, predicate);
    while (first != last && predicate(*first))
      word += *first++;
    return *this;
  }
  // Postfix increment
  class Proxy {
    std::string word;
  public:
    Proxy(const std::string& w) : word(w) {}
    std::string operator*() { return word; }
  };
  Proxy operator++(int) {
    Proxy d(word);
    ++*this;
    return d;
  }
  // Produce the actual value:
  std::string operator*() const { return word; }
  const std::string* operator->() const {
    return &word; // Point at the stored word, not a temporary
  }
  // Compare iterators:
  bool operator==(const TokenIterator&) {
    return word.size() == 0 && first == last;
  }
  bool operator!=(const TokenIterator& rv) {
    return !(*this == rv);
  }
};
#endif // TOKENITERATOR_H ///:~

TokenIterator is derived from the std::iterator template. It might appear that there's some kind of functionality that comes with std::iterator, but it is purely a way of tagging an iterator so that the algorithms that use it know what it's capable of. Here, you can see input_iterator_tag as a template argument – this tells anyone who asks that a TokenIterator only has the capabilities of an input iterator, and cannot be used with algorithms requiring more capable iterators.
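Because TokenIterator is a legitimate input iterator, any algorithm that accepts input iterators can drive it. The following usage sketch (not the book's test program; the input text and delimiter set are arbitrary) wraps a pair of istreambuf_iterators in TokenIterators and hands them to std::copy; the default-constructed TokenIterator serves as the end sentinel.

// Sketch: driving TokenIterator with an ordinary STL algorithm.
#include "TokenIterator.h"
#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

int main() {
  istringstream in("a TokenIterator is just another input iterator");
  typedef istreambuf_iterator<char> Sbuf;
  Sbuf cbegin(in), cend;               // Raw character iterators
  Delimiters d(" \t\n");               // Whitespace separates tokens
  TokenIterator<Sbuf, Delimiters> wordIter(cbegin, cend, d),
                                  wordEnd; // Default-constructed end sentinel
  vector<string> tokens;
  copy(wordIter, wordEnd, back_inserter(tokens)); // Collect the tokens
  copy(tokens.begin(), tokens.end(),
       ostream_iterator<string>(cout, "\n"));     // One token per line
}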