程式師世界 >> 編程語言 >> C語言 >> C++ >> 關於C++ >> 萌新筆記——C++裡創立 Trie字典樹（中文詞典）（二）（拔出、查找、導入、導出）

萌新筆記——C++裡創立 Trie字典樹（中文詞典）（二）（拔出、查找、導入、導出）

編輯：關於C++

萌新筆記——C++裡創立 Trie字典樹（中文詞典）（二）（拔出、查找、導入、導出）。本站提示廣大學習愛好者：（萌新筆記——C++裡創立 Trie字典樹（中文詞典）（二）（拔出、查找、導入、導出））文章只能為提供參考，不一定能成為您想要的結果。以下是萌新筆記——C++裡創立 Trie字典樹（中文詞典）（二）（拔出、查找、導入、導出）正文

　　萌新做詞典第二篇，做得不好，還請指正，謝謝大佬！

　　做好了拔出與遍歷功用之後，我發現最根本的查找功用沒有完成，同時還希望可以把內存的數據存入文件保管上去，並可以從文件中導入詞典。此外，數據的途徑是存在配置文件中的。甚至，還想嘗試相似自動補全的功用。當然了，是做一個比擬low的補全，比方傳入“編程”，可以失掉”軟件“、”學習“、”學習網站“、”入門“四個字符串。但是傳入“編”不會失掉“程”，由於“編程”沒有錄入詞典。於是對原有的類停止了修正，並添加了一些函數。

　　先放上運轉後果吧：

 1 test1
 2 ID : 2    char : 件    word : 編程軟件
 3 ID : 3    char : 習    word : 編程學習
 4 ID : 4    char : 站    word : 編程學習網站
 5 ID : 1    char : 門    word : 編程入門
 6 
 7 test2
 8 ID : 1    char : 門    word : 編程入門
 9 ID : 3    char : 習    word : 編程學習
10 ID : 4    char : 站    word : 編程學習網站
11 ID : 2    char : 件    word : 編程軟件
12 find ID : 3    word : 編程學習

　　test1就是上一步裡，把word的id打印了出來，並導出到文件中。

　　test2是從文件中導入，然後打出後果，並查找“編程學習”，假如存在就打出id。

　　首先，新增了一個配置文件類DictionaryConf，如下：

DictionaryConf.h

 1 #ifndef __DICTIONARYCONF_H__
 2 #define __DICTIONARYCONF_H__
 3 
 4 #include <string>
 5 
 6 namespace ccx{
 7 
 8 using std::string;
 9 
10 class DictionaryConf
11 {
12     public:
13         DictionaryConf();
14         string getDictionaryPath();
15     private:
16         void GetConf();
17         string _DictionaryPath;
18 };
19 
20 }
21 
22 #endif

　　它的功用很復雜，在結構的時分讀取文件配相信息，對外只提供一個接口，就是獲取一個字符串。它的完成如下：

DictionaryConf.cc

 1 #include "DictionaryConf.h"
 2 #include <json/json.h>
 3 #include <stdlib.h>
 4 #include <fstream>
 5 #include <iostream>
 6 
 7 namespace ccx{
 8 
 9 using std::ifstream;
10 using std::cout;
11 using std::endl;
12 
13 DictionaryConf::DictionaryConf()
14 {
15     GetConf();
16 }
17 
18 void DictionaryConf::GetConf()
19 {
20     ifstream ifs;
21     ifs.open("DictionaryConf.json");
22     if(!ifs.good())
23     {
24         cout << "open DictionaryConf.json error" << endl;
25         exit(EXIT_FAILURE); 
26     }
27     
28     Json::Value root;
29     Json::Reader reader;
30 
31     if(!reader.parse(ifs, root, false))
32     {
33         cout << "DictionaryConf json read error" << endl;
34         exit(EXIT_FAILURE);
35     }
36     
37     _DictionaryPath = root["DictionaryPath"].asString();
38     
39     ifs.close();
40 }
41 
42 string DictionaryConf::getDictionaryPath()
43 {
44     return _DictionaryPath;
45 }
46 
47 }

　　配置文件是寫成json格式的，方便讀取：

DictionaryConf.json

1 {"DictionaryPath" : "Dictionary.json"}

　　然後是修正已有的Dictionary類，添加它的功用（總覺得是不是要變胖接口了0。0）

Dictionary.h

 1 #ifndef __DICTIONARY_H__
 2 #define __DICTIONARY_H__
 3 
 4 #include "DictionaryData.h"
 5 #include "DictionaryConf.h"
 6 
 7 #include <memory>
 8 #include <vector>
 9 #include <list>
10 
11 namespace ccx{
12 
13 using std::shared_ptr;
14 using std::vector;
15 using std::list;
16 
17 class Dictionary
18 {
19     typedef unordered_map<string, pDictElem>::iterator WordIt;
20     public:
21         Dictionary();
22         void push(const string & word);
23         void push(vector<string> & words);
24         int search(const string & word);//查找，前往詞的ID，假如沒找到前往-1
25     private:
26         void AddWord(const string & word, int wordId);
27         void splitWord(const string & word, vector<string> & characters);//把詞拆成字
28         pDictElem _dictionary;
29         DictionaryConf _conf;    
30 
31 //遍歷用
32     public:
33         void next();
34         string getCurChar();
35         string getCurWord();
36         int getCurWordId();
37         void resetPoint();
38         bool isEnd();
39     private:
40         void nextWord();
41         pDictElem _pcur;
42         WordIt _itcur;
43 
44 //用list完成棧，遍歷時方便
45         list<WordIt> _stackWord;
46         list<pDictElem> _stackDict;
47 
48 //導入導出
49     public:
50         void leading_in();
51         void leading_out();
52 };
53 
54 }
55 
56 #endif

　　這又是一個最終的類。比之前的多了一個查找的函數，一個Add函數，和導入導出的函數。

　　Add函數是原來的push改個名字，緣由就是從我這裡導出到文件的詞是包括他的ID的，導入的時分的ID是固定的，所以改成在添加的時分要傳入ID。關於提供的接口push，調用了Add函數。

　　改動過的Add和push如下：

 1 void Dictionary::AddWord(const string & word, int wordId)
 2 {
 3     vector<string> characters;
 4     splitWord(word, characters);
 5     
 6     vector<string>::iterator it_char;
 7     it_char = characters.begin();    
 8     pDictElem root;
 9     root = _dictionary;
10     for(; it_char != characters.end(); ++it_char)
11     {
12         WordIt it_word;
13         it_word = root->_words.find(*it_char);
14         
15         if(it_word == root->_words.end())
16         {
17             pair<string, pDictElem> temp;
18             temp.first = *it_char;
19             pDictElem dictemp(new DictElem);
20             dictemp->_word = *it_char;
21             dictemp->_wordId = 0;
22             temp.second = dictemp;
23             root->_words.insert(temp);
24             root = dictemp;
25         }else{
26             root = it_word->second;
27         }
28     }
29     if(!root->_wordId)
30     {
31         root->_wordId = wordId;
32     }
33 }

　　總覺得29行的if可以去掉呢--！但是沒有去試

 1 void Dictionary::push(const string & word)
 2 {
 3     ++(_dictionary->_wordId);
 4     AddWord(word, _dictionary->_wordId);
 5 }
 6 
 7 
 8 void Dictionary::push(vector<string> & words)
 9 {
10     int size = words.size();
11     for(int i = 0; i < size; ++i)
12     {
13         push(words[i]);
14     }
15 }

　　對root的wordID的自增放到了這裡。這樣，導和的時分就可以調用Add函數了。（好吧我就是懶得重新寫一個拔出的函數0。0）

　　接上去是查找的函數，比擬復雜，直接把Add拷過去改了一下：

 1 int Dictionary::search(const string & word)
 2 {
 3     vector<string> characters;
 4     splitWord(word, characters);
 5     
 6     vector<string>::iterator it_char;
 7     it_char = characters.begin();    
 8     pDictElem root;
 9     root = _dictionary;
10     for(; it_char != characters.end(); ++it_char)
11     {
12         WordIt it_word;
13         it_word = root->_words.find(*it_char);
14         
15         if(it_word == root->_words.end())
16         {
17             return -1;
18         }else{
19             root = it_word->second;
20         }
21     }
22     return root->_wordId;
23 }

　　關於以上，有另外一個想法：不論是在拔出還是查找，都是要先“查找”，看看有沒有那個節點。關於查找操作，只需前往有或許沒有就行了；而關於拔出，假如沒有就前往它的地位，從那裡開端添加節點，這樣似乎又可以復用一些代碼了 0。0

　　最後就是導入和導出了。

　　關於這塊，寫之前不斷在糾結，是要遍歷一切詞，把詞完好地存上去。采用json的格式，關於“編程學習”與“編程訓練”：

　　直接存進文件，前面帶上ID的效果：

1 [
2     { "word" : "編程學習" , "id" : 1},
3     { "word" : "編程訓練" , "id" : 2)
4 ]

　　還是把樹按節點存上去，

（後來發現其實這是一個序列化與反序列化的問題——2016.12.4 23:54 注）

 1 [
 2     {"key" : "編" , "flag" : 1, "id" : 0 , "value" : 
 3 　　　　{"key" : "程" , "flag" : 1 , "id" : 0 , "value" : 
 4 　　　　　　{"key" : "學" , "flag" : 1 , "id" : 0, "value" : 
 5 　　　　　　　　{"key" : "習" , "flag" : 0 , "id" : 1, "value" : "NULL"
 6 　　　　　　　　}
 7 　　　　　　},
 8 　　　　　　{"key" : "訓" , "flag" : 1 , "id" : 0, "value" : 
 9 　　　　　　　　{"key" : "練" , "flag" : 0 , "id" : 2, "value" : "NULL"
10 　　　　　　　　}
11 　　　　　　}
12 　　　　}
13 　　}
14 ]

　　大致是這樣一層嵌套一層的構造。雖然這種構造可以用棧來完成，即把Json::Value型數據入棧，但是後來想想這樣雖然一次只存一個結點，仿佛很浪費空間，但是每個節點還要存一些額定的信息，邏輯似乎比擬復雜，於是決議用第一種辦法來完成，比擬復雜粗爆。

　　代碼如下：

 1 void Dictionary::leading_out()
 2 {
 3     Json::Value root;
 4     Json::FastWriter writer;
 5 
 6     resetPoint();
 7     
 8     while(!isEnd())
 9     {
10         Json::Value elem;
11         elem["Word"] = getCurWord();
12         elem["WordId"] = getCurWordId();
13         root.append(elem);
14         next();
15     }
16 
17     string words;
18     words = writer.write(root);
19     
20     ofstream ofs;
21     const char * path = _conf.getDictionaryPath().c_str();
22     ofs.open(path);
23     if(!ofs.good())
24     {
25         cout << "open Dictionary.json error(leading_out)" << endl;
26         ofs.open("Dictionary.tmp");
27         if(!ofs.good())
28         {
29             exit(EXIT_FAILURE);
30         }
31     }
32     
33     ofs << words;
34     ofs.close();
35 }

　　存入文件的效果如下：

1 [{"Word":"編程軟件","WordId":2},{"Word":"編程學習","WordId":3},{"Word":"編程學習網站","WordId":4},{"Word":"編程入門","WordId":1}]

　　只要一行（不要在意id順序）。

　　之前想過一種辦法可以讓它以多行的方式存到文件中，做了如下處置：

 1     .
 2     .
 3     .
 4 ofs << "[" << endl;
 5 int size = root.size();
 6 for(int i = 0; i < size; ++i)
 7 {
 8     string json_file = writer.write(root[i]);
 9     int len = json_file.size();
10     json_file[len - 1] = ' ';
11     ofs << "\t" << json_file;
12     if(i != size -1)
13     {
14         ofs << "," << endl;
15     }
16     }
17 ofs << endl << "]" << endl;
18 ofs.close();
19     .
20     .
21     .

　　一項一項地取出，把最後的'\n'交換成空格，然後自己控制輸入格式。

　　導入格式要與導出格式對應，也運用json，如下：

 1 void Dictionary::leading_in()
 2 {
 3     ifstream ifs;
 4     const char * path = _conf.getDictionaryPath().c_str();
 5     ifs.open(path);
 6     if(!ifs.good())
 7     {
 8         cout << "open Dictionary.json error(leading_in)" << endl;
 9     }else{
10         Json::Value root;
11         Json::Reader reader;
12         
13         if(!reader.parse(ifs, root, false))
14         {
15             cout << "json read Dictionary.json error" << endl;
16         }else{
17             int size = root.size();
18             for(int i = 0; i < size; ++i)
19             {
20                 string word = root[i]["Word"].asString();
21                 int wordId = root[i]["WordId"].asInt();
22                 AddWord(word, wordId);
23                 ++(_dictionary->_wordId);
24             }
25         }
26     }
27 }

　　這裡要闡明一下，導出和導入翻開文件假如失敗了沒有exit的緣由：

　　關於導出，假如翻開一個文件失敗了直接exit，那內存中的東西就全部喪失了，於是就采用在當途徑下創立一個暫時文件來存要存的內容。

　　關於導入，翻開文件失敗只是闡明途徑有問題或許文件自身有問題，修正一下就可以了，或許只是沒有導入詞典而已，順序中假如曾經添加了一些詞，此時加入也會使內容喪失。

　　當然，目前的機制還是有問題的，比方詞ID的設置，假如在內存中已有詞的狀況下導入新詞，能夠會使ID紊亂。關於導出，或答應以設置一個機制：任何地位要加入的時分都把內容存入暫時文件中，能夠會更好。

　　做到這裡，忽然又想完成詞的補全功用：傳入“編程”，能傳出“學習”、“學習網站”等的集合，這個做好了再補上吧！