Leveling Up as a C++ Beginner: A Walkthrough of the cmdline Source

Published: January 9, 2024

Introduction

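Today's subject is cmdline, a simple command-line parser written in C++. It is very easy to use: you only need to include a single header. The source is at https://github.com/tanakh/cmdline , and the repository currently has about 1.3k stars.

What sets cmdline apart is how small and approachable it is. A C++ beginner can parse command-line arguments in a few lines of code. The library is "header-only", which means you just include the header in your project and there is no separate compile or link step to worry about.
The whole library is under 1000 lines of code, which makes it good reading material for people learning C++. Below we walk through how it is implemented.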

Usage

Using cmdline only requires including one header. Here is a simple example:

#include "cmdline.h"

#include <iostream>
#include <string>
using namespace std;

int main(int argc, char *argv[])
{
  cmdline::parser a;
  a.add<string>("host", 'h', "host name", true, "");
  a.add<int>("port", 'p', "port number", false, 80, cmdline::range(1, 65535));
  a.add<string>("type", 't', "protocol type", false, "http", cmdline::oneof<string>("http", "https", "ssh", "ftp"));
  a.add("gzip", '\0', "gzip when transfer");

  a.parse_check(argc, argv);

  cout << a.get<string>("type") << "://"
       << a.get<string>("host") << ":"
       << a.get<int>("port") << endl;

  if (a.exist("gzip")) cout << "gzip" << endl;

  return 0;
}

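In the code above, we first create a cmdline::parser object and then register several options through its add method.
The parameters of add are, in order:
The first parameter is the long name of the option;
The second parameter is the one-character short name;
The third parameter is the description of the option;
The fourth parameter says whether the option is required (the declared default is true, which is why the example passes it explicitly);
The fifth parameter is the default value (only meaningful when the fourth parameter is false);
The sixth parameter is an extra constraint on the value (for example, an allowed range).
Finally, parse_check parses the command line, and get retrieves the value of each option.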

Output

Compile the code above with g++ and run it. The output looks like this:

$ g++ ./test.cpp -o test
$ ./test --host=github.com -p 443 -t https
https://github.com:443

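As you can see, a few lines of code are enough to parse a command line with cmdline. Now let's look at how it works.

How it works

In test.cpp we created a cmdline::parser object, and everything after that went through this object.
Leaving the function bodies aside for now, here is the declaration of cmdline::parser: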

class parser
{
public:
    parser();
    ~parser();

    void add(const std::string &name,
                char short_name = 0,
                const std::string &desc = "");

    template <class T>
    void add(const std::string &name,
                char short_name = 0,
                const std::string &desc = "",
                bool need = true,
                const T def = T());

    template <class T, class F>
    void add(const std::string &name,  
                char short_name = 0,
                const std::string &desc = "",
                bool need = true,
                const T def = T(),
                F reader = F());  


    void footer(const std::string &f);

    void set_program_name(const std::string &name);

    bool exist(const std::string &name) const;
    template <class T>
    const T &get(const std::string &name) const;

    const std::vector<std::string> &rest() const;


    bool parse(const std::string &arg); 
    bool parse(const std::vector<std::string> &args); 
    bool parse(int argc, const char *const argv[]); 


    void parse_check(const std::string &arg);
    void parse_check(const std::vector<std::string> &args);
    void parse_check(int argc, char *argv[]);

    std::string error() const;

    std::string error_full() const;

    std::string usage() const;


private:
    void check(int argc, bool ok);
    void set_option(const std::string &name);

    void set_option(const std::string &name, const std::string &value);

    class option_base;

    class option_without_value; // derives from option_base

    template <class T>
    class option_with_value : public option_base
    {
    public:
        option_with_value(const std::string &name,
                        char short_name,
                        bool need,
                        const T &def,
                        const std::string &desc);
    };

    template <class T, class F>
    class option_with_value_with_reader; // derives from option_with_value<T>


    std::map<std::string, option_base *> options;
    std::vector<option_base *> ordered;
    std::string ftr;

    std::string prog_name;
    std::vector<std::string> others;

    std::vector<std::string> errors;
};

The class has quite a few member functions. They fall roughly into these groups:
1. Configuring the parser: add, footer, set_program_name;
2. Parsing the command line: parse, parse_check;
3. Reading the results: exist, get, rest;
4. Reporting errors: error, error_full;
5. Producing help text: usage;
6. Internal helpers: check, set_option.

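The declaration also shows the data members:
1. options: the registered options, looked up by long name;
2. ordered: the same options in the order they were added;
3. ftr: the footer text set through footer(), shown in the usage message;
4. prog_name: the program name;
5. others: the non-option (positional) arguments found on the command line;
6. errors: the error messages collected during parsing.
In addition, cmdline::parser defines several nested classes that describe an option:
1. option_base: the base class of all options;
2. option_without_value: an option that takes no value (a flag);
3. option_with_value: an option that takes a value;
4. option_with_value_with_reader: an option that takes a value and uses a custom reader function.
option_base is the direct or indirect base of the other three classes, and the options member is a map from option name to option_base*. That is the core of the design: all information about the options lives in options, and add, exist, get and the other methods are operations on that map. Let's now look at the implementation.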

The option_base class and its subclasses

option_base is the base class of the other three classes. It is defined as follows:

class option_base
{
public:
    virtual ~option_base() {}

    virtual bool has_value() const = 0;
    virtual bool set() = 0;
    virtual bool set(const std::string &value) = 0;
    virtual bool has_set() const = 0;
    virtual bool valid() const = 0;
    virtual bool must() const = 0;

    virtual const std::string &name() const = 0;
    virtual char short_name() const = 0;
    virtual const std::string &description() const = 0;
    virtual std::string short_description() const = 0;
};

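option_base consists entirely of pure virtual functions. Most of them query information about an option (has_value, has_set, valid, must, name, short_name, description, short_description), and the two set overloads assign it. It has two direct subclasses, option_without_value and option_with_value, which describe options without and with a value respectively; option_with_value_with_reader in turn derives from option_with_value. We look at each of them below.

The option_without_value class

option_without_value describes an option that takes no value. It is defined as follows: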


class option_without_value : public option_base
{
public:
    option_without_value(const std::string &name,
                        char short_name,
                        const std::string &desc)
        : nam(name), snam(short_name), desc(desc), has(false)
    {
    }
    ~option_without_value() {}

    bool has_value() const { return false; }

    bool set()
    {
        has = true;
        return true;
    }

    bool set(const std::string &) { return false; }

    bool has_set() const { return has;}

    bool valid() const { return true;}

    bool must() const { return false; }
 
    const std::string &name() const { return nam; }

    char short_name() const { return snam; }

    const std::string &description() const { return desc;}

    std::string short_description() const { return "--" + nam;}

private:
    std::string nam;
    char snam;
    std::string desc;
    bool has;
};

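The implementation of option_without_value is very simple. It has only four member variables:
1. nam: the long name of the option;
2. snam: the short name of the option;
3. desc: the description of the option;
4. has: whether the option was set on the command line.
option_without_value corresponds to the first add overload of cmdline::parser, the one that registers a flag without a value.

The option_with_value class

option_with_value describes an option that takes a value. It is defined as follows: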

template <class T>
class option_with_value : public option_base
{
public:
    option_with_value(const std::string &name,
                    char short_name,
                    bool need,
                    const T &def,
                    const std::string &desc)
        : nam(name), snam(short_name), need(need)
        , has(false), def(def), actual(def)
    {
        this->desc = full_description(desc);
    }
    ~option_with_value() {}

    const T &get() const { return actual; }

    bool has_value() const { return true; }

    bool set() { return false; }

    bool set(const std::string &value)
    {
        try
        {
            actual = read(value);
            has = true;
        }
        catch (const std::exception &e)
        {
            return false;
        }
        return true;
    }

    bool has_set() const { return has; }

    bool valid() const
    {
        if (need && !has)
            return false;
        return true;
    }

    bool must() const { return need; }
    const std::string &name() const { return nam; }

    char short_name() const { return snam;}

    const std::string &description() const { return desc;}

    std::string short_description() const
    {
        return "--" + nam + "=" + detail::readable_typename<T>();
    }

protected:
    std::string full_description(const std::string &desc)
    {
        return desc + " (" + detail::readable_typename<T>() +
            (need ? "" : " [=" + detail::default_value<T>(def) + "]") + ")";
    }

    virtual T read(const std::string &s) = 0;

    std::string nam;
    char snam;
    bool need;
    std::string desc;

    bool has;
    T def;
    T actual;
};

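option_with_value is more involved. It has seven member variables:
1. nam: the long name of the option;
2. snam: the short name of the option;
3. need: whether the option is required;
4. desc: the description of the option;
5. has: whether the option was set on the command line;
6. def: the default value of the option;
7. actual: the current value of the option, which starts out equal to def.
option_with_value holds the state shared by all options that take a value. Because read is pure virtual, the class is abstract and cannot be instantiated directly; set(const std::string &) calls read and turns any exception it throws into a false return value. The second and third add overloads both end up creating the subclass described next.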
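
The split between set and read is easier to see in isolation. The following is a minimal sketch, not the library's code: value_option and int_option are hypothetical names, and the integer parsing is an assumption chosen for the example. It shows a base class whose set owns the error handling while a subclass supplies only the conversion.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical, simplified stand-in for option_with_value<T>:
// set() owns the error handling, read() is the customization hook.
template <class T>
class value_option
{
public:
    value_option(bool need, const T &def)
        : need_(need), has_(false), actual_(def) {}
    virtual ~value_option() {}

    // Returns false instead of letting a conversion exception escape.
    bool set(const std::string &text)
    {
        try
        {
            actual_ = read(text);
            has_ = true;
        }
        catch (const std::exception &)
        {
            return false;
        }
        return true;
    }

    // A required option is only valid once it has been set.
    bool valid() const { return !(need_ && !has_); }
    const T &get() const { return actual_; }

protected:
    virtual T read(const std::string &text) = 0;

private:
    bool need_;
    bool has_;
    T actual_;
};

// Hypothetical subclass: accepts only a string that is entirely a decimal integer.
class int_option : public value_option<int>
{
public:
    int_option(bool need, int def) : value_option<int>(need, def) {}

protected:
    int read(const std::string &text)
    {
        std::istringstream ss(text);
        int v = 0;
        if (!(ss >> v) || !ss.eof())
            throw std::invalid_argument("not an integer: " + text);
        return v;
    }
};
```

With this sketch, an optional int_option constructed with default 80 reports 80 until set("443") succeeds, and a failed set("44x") leaves the previous value in place because read throws before the assignment happens.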

The option_with_value_with_reader class

option_with_value_with_reader describes an option whose value is converted by a reader. It is defined as follows:

template <class T, class F>
class option_with_value_with_reader : public option_with_value<T>
{
public:
    option_with_value_with_reader(const std::string &name,
                                char short_name,
                                bool need,
                                const T def,
                                const std::string &desc,
                                F reader)
        : option_with_value<T>(name, short_name, need, def, desc), reader(reader)
    {
    }

private:
    T read(const std::string &s)
    {
        return reader(s);
    }

    F reader;
};

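option_with_value_with_reader is very simple. It derives from option_with_value and implements read by calling the stored reader. It is the class created by the third add overload, and also by the second one, which forwards to the third with a default_reader.

The hierarchy uses inheritance and polymorphism to keep the details of each kind of option inside its own subclass. cmdline::parser stores the options in a map whose key is the long name and whose value is a pointer to the base class, so every kind of option can be handled through the same option_base interface.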
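
To make the "one map of base pointers" idea concrete, here is a small sketch under stated assumptions: opt_base, flag_opt, text_opt, make_options and apply are hypothetical names, and std::unique_ptr is used in place of the raw pointers in the original so that the example needs no manual delete.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical, trimmed-down interface modelled on option_base.
struct opt_base
{
    virtual ~opt_base() {}
    virtual bool has_value() const = 0;
    virtual bool set() = 0;                          // flag form, e.g. --gzip
    virtual bool set(const std::string &value) = 0;  // value form, e.g. --host=x
    virtual bool has_set() const = 0;
};

// An option without a value: only the flag form succeeds.
struct flag_opt : opt_base
{
    bool seen = false;
    bool has_value() const override { return false; }
    bool set() override { seen = true; return true; }
    bool set(const std::string &) override { return false; }
    bool has_set() const override { return seen; }
};

// An option with a string value: only the value form succeeds.
struct text_opt : opt_base
{
    bool seen = false;
    std::string value;
    bool has_value() const override { return true; }
    bool set() override { return false; }
    bool set(const std::string &v) override { value = v; seen = true; return true; }
    bool has_set() const override { return seen; }
};

typedef std::map<std::string, std::unique_ptr<opt_base> > option_map;

// Builds a small table with one flag and one valued option.
inline option_map make_options()
{
    option_map m;
    m["gzip"].reset(new flag_opt());
    m["host"].reset(new text_opt());
    return m;
}

// The caller never needs to know the concrete type: the virtual
// set() overloads decide whether the request is acceptable.
inline bool apply(option_map &m, const std::string &name, const std::string *value)
{
    option_map::iterator it = m.find(name);
    if (it == m.end())
        return false;
    return value ? it->second->set(*value) : it->second->set();
}
```

Calling apply(m, "gzip", nullptr) succeeds while apply(m, "host", nullptr) fails, even though apply itself contains no type checks. The original parser relies on the same effect when it calls set_option.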

The add method

add registers an option. It has three overloads:

    void add(const std::string &name,
             char short_name = 0,
             const std::string &desc = "")
    {
      if (options.count(name))
        throw cmdline_error("multiple definition: " + name);
      options[name] = new option_without_value(name, short_name, desc);
      ordered.push_back(options[name]);
    }

    template <class T>
    void add(const std::string &name,
             char short_name = 0,
             const std::string &desc = "",
             bool need = true,
             const T def = T())
    {
      add(name, short_name, desc, need, def, default_reader<T>());
    }

    template <class T, class F>
    void add(const std::string &name,
             char short_name = 0,
             const std::string &desc = "",
             bool need = true,
             const T def = T(),
             F reader = F())
    {
      if (options.count(name))
        throw cmdline_error("multiple definition: " + name);
      options[name] = new option_with_value_with_reader<T, F>(name, short_name, need, def, desc, reader);
      ordered.push_back(options[name]);
    }

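The implementation of add is short. It rejects a name that is already registered, stores the new option object in options, and appends the same pointer to ordered so that the registration order is remembered. The three overloads cover the three kinds of option: a flag without a value, an option with a value read by default_reader, and an option with a value read by a custom reader.

The parse method

parse does the actual parsing and is one of the most important methods in the class. It has three overloads; the central one is bool parse(int argc, const char *const argv[]), implemented as follows: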
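
The bookkeeping in add can be sketched on its own. The registry class and the add_fails helper below are hypothetical and store a description string rather than an option object; the point is only the pairing of a map for lookup with a vector for order, plus the duplicate check.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical registry that mirrors the bookkeeping in add():
// a map for lookup by name plus a vector that remembers insertion order.
class registry
{
public:
    // Throws on a duplicate name, like the "multiple definition" check.
    void add(const std::string &name, const std::string &desc)
    {
        if (by_name_.count(name))
            throw std::runtime_error("multiple definition: " + name);
        by_name_[name] = desc;
        ordered_.push_back(name);
    }

    bool exist(const std::string &name) const { return by_name_.count(name) > 0; }

    // Names in the order they were added (std::map alone would sort them).
    const std::vector<std::string> &names() const { return ordered_; }

private:
    std::map<std::string, std::string> by_name_;
    std::vector<std::string> ordered_;
};

// Helper for the example: reports whether add() rejected the name.
inline bool add_fails(registry &r, const std::string &name)
{
    try
    {
        r.add(name, "");
    }
    catch (const std::runtime_error &)
    {
        return true;
    }
    return false;
}
```

Adding "port" and then "host" yields names() in exactly that order, whereas iterating the map would give "host" first. That difference is presumably why the original keeps the separate ordered vector.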

bool parse(int argc, const char *const argv[])
{
    errors.clear();  // clear previous error messages
    others.clear(); // clear previous positional arguments

    if (argc < 1) // fewer than one argument: fail
    {
        errors.push_back("argument number must be longer than 0");
        return false;
    }

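    if (prog_name == "")  // no program name set yet: use argv[0]
        prog_name = argv[0];

    std::map<char, std::string> lookup;  // short option character -> long option name

    // For every option with a non-zero short name, check whether that short name is already in lookup.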
    for (std::map<std::string, option_base *>::iterator p = options.begin();
        p != options.end(); p++)
    {
        if (p->first.length() == 0)  // empty option name: skip
            continue;
        char initial = p->second->short_name();
        if (initial)  // the option has a short name
        {
            if (lookup.count(initial) > 0)  // short name already taken
            {
                lookup[initial] = "";
                errors.push_back(std::string("short option '") + initial + "' is ambiguous");
                return false;
            }
            else // short name not seen before
                lookup[initial] = p->first;
        }
    }

    // Walk over the command-line arguments, starting at index 1
    for (int i = 1; i < argc; i++)
    {
        if (strncmp(argv[i], "--", 2) == 0) // long option
        {
            const char *p = strchr(argv[i] + 2, '=');
            if (p) // value attached with '='
            {
                std::string name(argv[i] + 2, p);
                std::string val(p + 1);
                set_option(name, val);
            }
            else // no '=' in the argument
            {
                std::string name(argv[i] + 2);
                if (options.count(name) == 0)
                {
                    errors.push_back("undefined option: --" + name);
                    continue;
                }
                if (options[name]->has_value()) 
                {
                    if (i + 1 >= argc)
                    {
                        errors.push_back("option needs value: --" + name);
                        continue;
                    }
                    else
                    {
                        i++;
                        set_option(name, argv[i]);
                    }
                }
                else
                {
                    set_option(name);
                }
            }
        }
        else if (strncmp(argv[i], "-", 1) == 0)  // short option(s)
        {
            if (!argv[i][1])
                continue;
            char last = argv[i][1];
            for (int j = 2; argv[i][j]; j++)
            {
                last = argv[i][j];
                if (lookup.count(argv[i][j - 1]) == 0)
                {
                    errors.push_back(std::string("undefined short option: -") + argv[i][j - 1]);
                    continue;
                }
                if (lookup[argv[i][j - 1]] == "")
                {
                    errors.push_back(std::string("ambiguous short option: -") + argv[i][j - 1]);
                    continue;
                }
                set_option(lookup[argv[i][j - 1]]);
            }

            if (lookup.count(last) == 0)
            {
                errors.push_back(std::string("undefined short option: -") + last);
                continue;
            }
            if (lookup[last] == "") 
            {
                errors.push_back(std::string("ambiguous short option: -") + last);
                continue;
            }

        
            if (i + 1 < argc && options[lookup[last]]->has_value())
            {
                set_option(lookup[last], argv[i + 1]);
                i++;
            }
            else
            {
                set_option(lookup[last]);
            }
        }
        else
        {
            others.push_back(argv[i]); // positional argument
        }
    }

    for (std::map<std::string, option_base *>::iterator p = options.begin();
        p != options.end(); p++)
        if (!p->second->valid())
            errors.push_back("need option: --" + std::string(p->first));

    return errors.size() == 0;
}

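parse is the most involved method in the class. Its logic breaks down as follows:
1. Initialization:
1.1. Clear the errors and others vectors so that both start out empty;
1.2. Check the argument count argc; if it is less than 1, append an error message to errors and return false. If no program name has been set, take it from argv[0];
1.3. Build the short-option lookup table:
1.3.1. Create a map named lookup from short option character to long option name;
1.3.2. Walk over all options. For each one with a non-zero short name, check whether that character is already in lookup. If it is, append an "ambiguous short option" error to errors and return false immediately; otherwise add the character to lookup;
2. Parsing the arguments:
2.1. Loop over the command-line arguments, starting at index 1;
2.2. If an argument starts with "--" (a long option), check whether it carries a value after an equals sign. With --name=value the pair is passed straight to set_option. Without an equals sign the name is looked up first: an undefined option adds an error, an option that takes a value consumes the next argument as its value (or adds an error if there is none), and a flag is simply set;
2.3. If an argument starts with a single "-" (short options), each character after the dash is treated as a short option. Every character except the last is set as a flag, and an undefined or ambiguous character adds an error to errors;
2.4. The last character may take a value: if its option needs one and a next argument exists, that argument becomes the value. Otherwise set_option is called without a value, which fails for an option that takes a value because its set() returns false;
2.5. An argument that starts with neither "--" nor "-" is a positional argument and is appended to others;
3. Validation and final check:
3.1. After all arguments are parsed, a final loop checks for missing required options. For every entry in options whose valid method returns false, an error message is appended to errors;
3.2. If there are no errors (errors.size() == 0), the function returns true; otherwise it returns false.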
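
Two of these steps can be tried out in isolation. The sketch below is an illustration, not the library's code: long_arg, split_long and build_lookup are hypothetical names. split_long mirrors the strchr logic of step 2.2 and assumes its argument already starts with "--"; build_lookup mirrors step 1.3 and reports a clash through its return value instead of an error list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Result of splitting the text after "--" at the first '=' sign.
struct long_arg
{
    std::string name;
    std::string value;
    bool has_value;
};

// Precondition: arg starts with "--" (the caller has already checked this).
inline long_arg split_long(const char *arg)
{
    long_arg out;
    const char *body = arg + 2;              // skip the leading "--"
    const char *eq = std::strchr(body, '=');
    out.has_value = (eq != 0);
    if (eq)
    {
        out.name.assign(body, eq);           // characters before '='
        out.value.assign(eq + 1);            // everything after the first '='
    }
    else
    {
        out.name.assign(body);
    }
    return out;
}

// Builds the short-name table from (long name, short char) pairs;
// returns false when two options share the same short character.
inline bool build_lookup(const std::vector<std::pair<std::string, char> > &opts,
                         std::map<char, std::string> &lookup)
{
    for (std::size_t i = 0; i < opts.size(); ++i)
    {
        char c = opts[i].second;
        if (!c)
            continue;                        // no short name registered
        if (lookup.count(c))
            return false;                    // ambiguous short option
        lookup[c] = opts[i].first;
    }
    return true;
}
```

Because the split happens at the first equals sign, an argument such as --expr=a=b keeps "a=b" as its value. Registering both "host" and "help" with the short name 'h' makes build_lookup fail, which corresponds to the early return false in parse.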

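That completes the tour of cmdline's core. Next, let's look at its helper types.
cmdline also defines a few helpers that convert and validate option values, such as default_reader, range, and oneof. Their implementation is shown below.

The reader classes

default_reader converts the option's string into a value of type T; range_reader and oneof_reader add validation on top of that conversion. They are defined as follows: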

template <class T>
struct default_reader
{
    T operator()(const std::string &str)
    {
        return detail::lexical_cast<T>(str);
    }
};

template <class T>
struct range_reader
{
    range_reader(const T &low, const T &high) : low(low), high(high) {}
    T operator()(const std::string &s) const
    {
        T ret = default_reader<T>()(s);
        if (!(ret >= low && ret <= high))
            throw cmdline::cmdline_error("range_error");
        return ret;
    }

private:
    T low, high;
};
template <class T>
range_reader<T> range(const T &low, const T &high)
{
    return range_reader<T>(low, high);
}


template <class T>
struct oneof_reader
{
    T operator()(const std::string &s)
    {
        T ret = default_reader<T>()(s);
        if (std::find(alt.begin(), alt.end(), ret) == alt.end())
            throw cmdline_error("");
        return ret;
    }
    void add(const T &v) { alt.push_back(v); }

private:
    std::vector<T> alt;
};
template <class T>
oneof_reader<T> oneof(T a1)
{
    oneof_reader<T> ret;
    ret.add(a1);
    return ret;
}

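default_reader is very simple: it calls detail::lexical_cast to convert the string to the target type. range_reader additionally checks that the converted value lies within [low, high], and oneof_reader checks that it is one of a fixed set of allowed values; both throw cmdline_error otherwise. All three are class templates, so they work for any value type that lexical_cast can handle. The range and oneof helper functions create range_reader and oneof_reader instances. Only the one-argument overload of oneof is listed here; the four-value call in the usage example relies on further overloads in the header that take more arguments.

The lexical_cast function

lexical_cast converts a string to a value of the requested type. It is defined as follows: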
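
The same reader idea can be sketched without the library. In the following example every name (parse_as, in_range, one_of, make_one_of, accepts) is hypothetical, the standard exceptions stand in for cmdline_error, and make_one_of uses a C++11 initializer list, which is an assumption of this sketch rather than how the header declares oneof.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for default_reader: the whole string must convert.
template <class T>
T parse_as(const std::string &s)
{
    std::istringstream ss(s);
    T v;
    if (!(ss >> v) || !ss.eof())
        throw std::invalid_argument("bad value: " + s);
    return v;
}

// Accepts only values inside the closed interval [low, high].
template <class T>
struct in_range
{
    T low, high;
    T operator()(const std::string &s) const
    {
        T v = parse_as<T>(s);
        if (v < low || high < v)
            throw std::out_of_range("out of range: " + s);
        return v;
    }
};

// Accepts only values from a fixed list.
template <class T>
struct one_of
{
    std::vector<T> allowed;
    T operator()(const std::string &s) const
    {
        T v = parse_as<T>(s);
        if (std::find(allowed.begin(), allowed.end(), v) == allowed.end())
            throw std::invalid_argument("not an allowed value: " + s);
        return v;
    }
};

// Convenience factory taking any number of allowed values.
template <class T>
one_of<T> make_one_of(std::initializer_list<T> values)
{
    one_of<T> r;
    r.allowed.assign(values.begin(), values.end());
    return r;
}

// Small helper so the examples can report acceptance as a bool.
template <class Reader>
bool accepts(const Reader &reader, const std::string &s)
{
    try { reader(s); }
    catch (const std::exception &) { return false; }
    return true;
}
```

A reader here is nothing more than a callable that maps a string to a T or throws. An in_range<int> built as {1, 65535} accepts "443" and rejects "0", "70000" and "abc", and the one_of reader rejects anything outside its list.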

  namespace detail
  {

    template <typename Target, typename Source, bool Same>
    class lexical_cast_t
    {
    public:
      static Target cast(const Source &arg)
      {
        Target ret;
        std::stringstream ss;
        if (!(ss << arg && ss >> ret && ss.eof()))
          throw std::bad_cast();

        return ret;
      }
    };

    template <typename Target, typename Source>
    class lexical_cast_t<Target, Source, true>
    {
    public:
      static Target cast(const Source &arg)
      {
        return arg;
      }
    };

    template <typename Source>
    class lexical_cast_t<std::string, Source, false>
    {
    public:
      static std::string cast(const Source &arg)
      {
        std::ostringstream ss;
        ss << arg;
        return ss.str();
      }
    };

    template <typename Target>
    class lexical_cast_t<Target, std::string, false>
    {
    public:
      static Target cast(const std::string &arg)
      {
        Target ret;
        std::istringstream ss(arg);
        if (!(ss >> ret && ss.eof()))
          throw std::bad_cast();
        return ret;
      }
    };

    template <typename T1, typename T2>
    struct is_same
    {
      static const bool value = false;
    };

    template <typename T>
    struct is_same<T, T>
    {
      static const bool value = true;
    };

    template <typename Target, typename Source>
    Target lexical_cast(const Source &arg)
    {
      return lexical_cast_t<Target, Source, detail::is_same<Target, Source>::value>::cast(arg);
    }

    static inline std::string demangle(const std::string &name)
    {
      int status = 0;
      char *p = abi::__cxa_demangle(name.c_str(), 0, 0, &status);
      std::string ret(p);
      free(p);
      return ret;
    }

    template <class T>
    std::string readable_typename()
    {
      return demangle(typeid(T).name());
    }

    template <class T>
    std::string default_value(T def)
    {
      return detail::lexical_cast<std::string>(def);
    }

    template <>
    inline std::string readable_typename<std::string>()
    {
      return "string";
    }

  }

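lexical_cast returns a value of type Target from an argument of type Source. Target is given explicitly and Source is deduced from the argument, so the same function converts a string to a value and a value to a string. The choice of conversion is made through the partial specializations of lexical_cast_t, selected with the help of is_same:
If Target and Source are the same type, the argument is returned unchanged; if Target is std::string, the value is written to a std::ostringstream; if Source is std::string, the value is read from a std::istringstream; any other pair goes through a std::stringstream. If reading fails or leaves part of the input unconsumed, std::bad_cast is thrown.

When using the library, you can pass the result of oneof or range directly to add, or write your own reader and pass that instead. Here is the oneof case again: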
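
The dispatch can be reduced to two cases to show the mechanism. This is a simplified sketch with hypothetical names (caster, convert, convertible); it uses std::is_same from the C++11 standard library instead of the hand-written is_same, and it leaves out the two string-specific specializations.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <type_traits>
#include <typeinfo>

// Primary template: go through a stringstream and require that the
// whole input is consumed.
template <class Target, class Source, bool Same>
struct caster
{
    static Target cast(const Source &arg)
    {
        Target ret;
        std::stringstream ss;
        if (!(ss << arg && ss >> ret && ss.eof()))
            throw std::bad_cast();
        return ret;
    }
};

// Same type on both sides: return the argument untouched.
template <class Target, class Source>
struct caster<Target, Source, true>
{
    static Target cast(const Source &arg) { return arg; }
};

// Target is written explicitly by the caller, Source is deduced.
template <class Target, class Source>
Target convert(const Source &arg)
{
    return caster<Target, Source, std::is_same<Target, Source>::value>::cast(arg);
}

// Reports whether a string converts cleanly to Target.
template <class Target>
bool convertible(const std::string &text)
{
    try { convert<Target>(text); }
    catch (const std::bad_cast &) { return false; }
    return true;
}
```

The eof check is what rejects input such as "12abc": the stream reads 12 successfully but stops before the end. The same-type specialization matters for strings, because convert<std::string> on "a b" returns the text unchanged instead of stopping at the space.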

    cmdline::parser a;
    a.add<string>("type", 't', "protocol type", false, "http", cmdline::oneof<string>("http", "https", "ssh", "ftp"));

oneof returns an oneof_reader instance, which add forwards to the constructor of option_with_value_with_reader. That is how the option ends up restricted to the listed values.

That covers the core of cmdline.
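
A custom reader only has to be callable with a const std::string & and return the value type, throwing an exception derived from std::exception on bad input so that set can report failure. The size_reader below is a hypothetical example of such a reader; the suffix rules ("k" and "m") are assumptions made up for this sketch.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical reader: accepts sizes such as "512", "4k" or "2m"
// and returns the value in bytes.
struct size_reader
{
    long operator()(const std::string &s) const
    {
        if (s.empty())
            throw std::invalid_argument("empty size");
        char *end = 0;
        long n = std::strtol(s.c_str(), &end, 10);
        if (end == s.c_str() || n < 0)
            throw std::invalid_argument("bad size: " + s);
        std::string unit(end);               // whatever follows the digits
        if (unit.empty())
            return n;
        if (unit == "k")
            return n * 1024L;
        if (unit == "m")
            return n * 1024L * 1024L;
        throw std::invalid_argument("bad unit: " + unit);
    }
};
```

Given the third add overload shown earlier, a call along the lines of a.add<long>("buffer", 'b', "buffer size", false, 4096L, size_reader()) should register an option that uses this reader, with T given explicitly and F deduced in the same way as for cmdline::range.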

Source: https://blog.csdn.net/2301_81179790/article/details/135468000