Parsing Structured Data in C++

Introduction

AngelCode is known for its game development tools, including the widely used BMFont tool for generating bitmap fonts. The AngelCode BMFont format has become a de facto standard for describing bitmap fonts: which glyphs a font contains, where each one sits on the texture, and how it should be positioned when rendering.

Before we dive into writing a layout algorithm for bitmap fonts in a follow-up post, we first need to parse the data in this format. To accomplish this, we’ll implement a utility class called FixedKeyValueParser to handle the structured key-value pairs, which will be a crucial part of the parser we’re developing for the AngelCode BMFont format.

Structured key-value pairs are a common data format, pairing each key with a corresponding value. This format is popular for its readability and ease of parsing due to its consistent structure. Here are some real-world examples:

  • Log Files: In many systems, logs are written with multiple key-value pairs describing an event or transaction.
timestamp=2024-08-31T12:45:00Z level=info message="User logged in" userId=12345 sessionId=abcde
  • Command-Line Arguments: Command-line tools often accept arguments in a key-value format, which can be easily parsed.
mytool --input=file.txt --output=result.txt --mode=fast
  • Assembly Code Annotations: Sometimes in low-level programming, annotations include key-value pairs that describe instructions or data.
MOV R1, R2 ; comment="Move data" opcode=0x89

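These examples highlight the versatility of structured key-value pairs across different domains. Our FixedKeyValueParser targets the core these formats share: whitespace-separated tokens of the form key=value. It does not interpret quotes, so a value containing spaces, such as message="User logged in", is split across several tokens. Handling that case would require a quote-aware tokenizer.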
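To see what a "split on whitespace, then split at the first '='" approach does with the three example lines above, here is a small standalone sketch. SplitPairs is a hypothetical helper written for illustration only. It applies the same rules as the parser built below (skip tokens without '=' or with an empty key or value) but ignores any list of expected keys.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: tokenizes on whitespace (as operator>> does), then
// splits each token at its first '='. Tokens without '=' or with an empty
// key or value are skipped.
std::vector<std::pair<std::string, std::string>> SplitPairs(const std::string& line)
{
    std::istringstream stream(line);
    std::vector<std::pair<std::string, std::string>> pairs;
    std::string token;
    while (stream >> token)
    {
        const auto pos = token.find('=');
        if (pos == std::string::npos)
        {
            continue; // e.g. "mytool" or "MOV"
        }
        std::string key = token.substr(0, pos);
        std::string value = token.substr(pos + 1);
        if (!key.empty() && !value.empty())
        {
            pairs.emplace_back(std::move(key), std::move(value));
        }
    }
    return pairs;
}
```

On the log line this yields five pairs, and the message value is just "User (opening quote included) because the quoted text is broken apart at its spaces. On the command-line example the keys keep their leading dashes (--input, --output, --mode), so a caller would have to list them that way as expected keys.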

Effective software design emphasizes the importance of creating reusable and adaptable components. The FixedKeyValueParser embodies this principle, serving as a building block for more complex systems. This approach not only saves development time but also results in cleaner, more maintainable code.

This tutorial uses C++17 and assumes familiarity with the language. You can explore the complete code for the FixedKeyValueParser here.

Design Overview and Requirements

Before diving into the implementation, it's essential to understand both the design of the FixedKeyValueParser class and the requirements that guided its structure.

The class is designed to meet several key requirements:

  • Fixed Set of Expected Keys: The parser should expect a predefined set of keys, which must all be present in the input string.
  • Key-Value Pair Parsing: The parser should correctly parse key-value pairs, where each key is followed by an '=' sign and a corresponding value, tolerating any amount of whitespace between pairs (the '=' itself must not be surrounded by spaces).
  • Handle Missing or Extra Keys Gracefully: The parser should ignore extra keys that are not in the predefined set and should fail if any expected keys are missing.
  • Return Results Incrementally: The parser should allow for incremental parsing, returning each parsed key-value pair one at a time.
  • Successful Parsing Indicator: After processing, the parser should indicate whether all expected keys were successfully parsed.

To achieve these goals, the class features two public-facing APIs for external usage, alongside three internal helper methods that manage the core parsing logic. This structure ensures a clear separation of concerns, making the class easy to use while encapsulating the complexity of the parsing process. Additionally, the class includes two member variables to store the input data and expected keys, which are essential for the parsing logic.
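As a textual stand-in for a diagram of this layout, the class can be outlined as declarations only: two public methods (plus the constructor), three private helpers, and two data members. The method bodies are the ones developed in the Implementation section. The trailing static_asserts are an illustrative addition, not something the design requires. They record a side effect of holding a std::istringstream: the parser can be moved but not copied.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Declaration-only outline of FixedKeyValueParser's structure.
class FixedKeyValueParser
{
public:
    explicit FixedKeyValueParser(const std::string& line, std::vector<std::string> expectedKeys);

    // Public API: hand out one expected key-value pair per call...
    std::optional<std::pair<std::string, std::string>> ParseNextKeyValuePair();
    // ...and report whether every expected key was found.
    bool IsAllKeysParsed() const;

private:
    // Internal helpers that carry the parsing logic.
    bool ExtractNextToken(std::string& token);
    bool ParseToken(const std::string& token, std::string& parsedKey, std::string& value);
    bool ProcessKeyValuePair(const std::string& key, const std::string& value);

    // State: the remaining input and the keys still expected.
    std::istringstream mStream;
    std::vector<std::string> mExpectedKeys;
};

// std::istringstream is movable but not copyable, and the parser inherits that.
static_assert(!std::is_copy_constructible_v<FixedKeyValueParser>);
static_assert(std::is_move_constructible_v<FixedKeyValueParser>);
```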


The name FixedKeyValueParser was carefully chosen to clearly reflect the class's purpose. 'Fixed' indicates that the parser is tailored to work with a specific set of predefined keys. 'KeyValue' highlights that it handles structured key-value data, while 'Parser' signifies its role in processing and extracting this information. This straightforward name makes the class easy to understand and work with, both now and in the future.

Usage Examples

Before we dive into the implementation of the FixedKeyValueParser, it can be helpful to see how this class will be used in practice. Let’s explore two quick examples that demonstrate the parser's capabilities:

Scenario 1: Parsing All Required Keys

This scenario demonstrates the parser successfully extracting and printing all expected key-value pairs.

#include <iostream>
#include <string>
#include <vector>

// Assumes the FixedKeyValueParser class from the Implementation section is defined above main.
int main()
{
    std::string input = "dingo id=65 x=65 y=30 width=20   height=30 xoffset=2 yoffset=4 apple xadvance=22";  // "dingo" and "apple" are not key-value pairs
    std::vector<std::string> expectedKeys = {"id", "x", "y", "width", "height", "xoffset", "yoffset", "xadvance"};

    FixedKeyValueParser parser(input, expectedKeys);

    while (auto kvPairOpt = parser.ParseNextKeyValuePair()) 
    {
        auto [key, value] = kvPairOpt.value();
        std::cout << key << " = " << value << std::endl;
    }

    if (parser.IsAllKeysParsed()) 
    {
        std::cout << "All keys parsed successfully!" << std::endl;
    }

    return 0;
}

This will produce the following output:

id = 65
x = 65
y = 30
width = 20
height = 30
xoffset = 2
yoffset = 4
xadvance = 22
All keys parsed successfully!

Notice how the parser tolerates the extra spaces after 'width=20' and skips tokens such as 'dingo' and 'apple' that are not key-value pairs.

Scenario 2: Missing Some Keys

This scenario shows the parser detecting that some expected keys were missing, highlighting its ability to handle incomplete input.

#include <iostream>
#include <string>
#include <vector>

// Assumes the FixedKeyValueParser class from the Implementation section is defined above main.
int main()
{
    std::string input = "id=65 x=65 width=20 height=30 xoffset=2 yoffset=4"; // Missing "y" and "xadvance"
    std::vector<std::string> expectedKeys = {"id", "x", "y", "width", "height", "xoffset", "yoffset", "xadvance"};

    FixedKeyValueParser parser(input, expectedKeys);

    while (auto kvPairOpt = parser.ParseNextKeyValuePair()) 
    {
        auto [key, value] = kvPairOpt.value();
        std::cout << key << " = " << value << std::endl;
    }

    if (!parser.IsAllKeysParsed()) 
    {
        std::cout << "Not all keys were parsed." << std::endl;
    }

    return 0;
}

This will produce the following output:

id = 65
x = 65
width = 20
height = 30
xoffset = 2
yoffset = 4
Not all keys were parsed.

Implementation

Class Implementation

The constructor initializes the parser with a line of input and a vector of expected keys. The input string is copied into a std::istringstream for easy token extraction, and the key vector is taken by value and moved into place.

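#include <algorithm>  // std::find
#include <optional>   // std::optional, std::nullopt
#include <sstream>    // std::istringstream
#include <string>
#include <utility>    // std::move, std::pair, std::make_pair
#include <vector>

class FixedKeyValueParser
{
public:
    explicit FixedKeyValueParser(const std::string& line, std::vector<std::string> expectedKeys)
        : mStream(line)                          // copy the line into a stream for token extraction
        , mExpectedKeys(std::move(expectedKeys)) // take ownership of the key list
    { }

    ...

private:
    std::istringstream mStream;
    std::vector<std::string> mExpectedKeys;
};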

Extracting Tokens

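This helper reads the next whitespace-delimited token from the input stream. Because operator>> skips any leading whitespace before reading, runs of spaces between pairs are harmless. If no more tokens are available, it returns false.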

bool ExtractNextToken(std::string& token)
{
    return static_cast<bool>(mStream >> token); // Return false if unable to read token
}
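To see this tokenizing behavior in isolation, the sketch below drains a std::istringstream the same way ExtractNextToken does, one token per extraction. CollectTokens is a hypothetical helper for illustration and is not part of the class.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects every token that repeated `stream >> token`
// calls would produce. operator>> skips leading whitespace (spaces, tabs,
// newlines) and stops at the next whitespace character, so runs of blanks
// never produce empty tokens.
std::vector<std::string> CollectTokens(const std::string& line)
{
    std::istringstream stream(line);
    std::vector<std::string> tokens;
    std::string token;
    while (stream >> token) // the stream converts to false once nothing is left
    {
        tokens.push_back(token);
    }
    return tokens;
}
```

For example, "width=20   height=30\txoffset=2" yields three tokens. Tabs and newlines act as separators just like spaces, and an all-whitespace line yields none.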

Parsing Tokens

This method splits the token into a key and a value at the first '=' character, so anything after that first '=' belongs to the value. It rejects tokens with no '=' at all, as well as tokens whose key or value would be empty.

bool ParseToken(const std::string& token, std::string& parsedKey, std::string& value)
{
    auto delimiterPos = token.find('=');
    if (delimiterPos == std::string::npos)
    {
        return false; // Skip tokens that don't have a '='
    }

    parsedKey = token.substr(0, delimiterPos);
    value = token.substr(delimiterPos + 1);

    return !(parsedKey.empty() || value.empty()); // Return false if key or value is empty
}
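The same splitting rules can also be written as a standalone function that returns std::optional instead of filling out-parameters, which makes the edge cases easy to check. SplitKeyValue is a hypothetical name, and this is an alternative sketch, not the class's own helper.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical free-function variant of ParseToken: split at the first '='
// and reject an empty key or value, returning std::nullopt on failure.
std::optional<std::pair<std::string, std::string>> SplitKeyValue(std::string_view token)
{
    const auto pos = token.find('=');
    if (pos == std::string_view::npos)
    {
        return std::nullopt; // no '=' at all, e.g. "dingo"
    }
    std::string key(token.substr(0, pos));
    std::string value(token.substr(pos + 1)); // may itself contain '='
    if (key.empty() || value.empty())
    {
        return std::nullopt; // e.g. "=5" or "x="
    }
    return std::make_pair(std::move(key), std::move(value));
}
```

Tokens without '=' ("dingo") and tokens with an empty side ("=5", "x=") yield std::nullopt. "a=b=c" yields the key a and the value b=c, because only the first '=' acts as the delimiter.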

Processing Key-Value Pairs

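This method checks whether the parsed key is still in the list of expected keys (std::find comes from <algorithm>). If so, it removes the key from the list, so each key is reported at most once. Unknown keys and repeats of an already-parsed key are ignored. The value parameter is not used here.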

bool ProcessKeyValuePair(const std::string& key, const std::string& value)
{
    auto it = std::find(mExpectedKeys.begin(), mExpectedKeys.end(), key);
    if (it == mExpectedKeys.end())
    {
        return false; // Ignore tokens that don't match the expected keys
    }

    mExpectedKeys.erase(it);
    return true;
}
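ProcessKeyValuePair does a linear std::find over a std::vector, which is perfectly reasonable for the eight or so keys in the examples above. If the expected-key list were much larger, a hash set would likely pay off, since lookup and removal become O(1) on average. PendingKeys is a hypothetical sketch of that alternative, not part of the article's class.

```cpp
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical alternative bookkeeping: the keys still expected live in a
// hash set instead of a std::vector.
class PendingKeys
{
public:
    explicit PendingKeys(std::unordered_set<std::string> keys)
        : mKeys(std::move(keys))
    { }

    // True the first time an expected key is seen. False for unknown keys and
    // for repeats, because the key was already erased.
    bool Consume(const std::string& key)
    {
        return mKeys.erase(key) == 1;
    }

    // Counterpart of IsAllKeysParsed.
    bool Empty() const
    {
        return mKeys.empty();
    }

private:
    std::unordered_set<std::string> mKeys;
};
```

unordered_set::erase(key) returns the number of elements removed, so one call both checks membership and consumes the key. A second Consume("id") returns false, just as the vector version ignores a repeated key.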

Parsing All Key-Value Pairs

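This method drives the parsing process: it extracts tokens, parses them into key-value pairs, and processes them. It returns the next expected key-value pair, or std::nullopt once the input is exhausted. Because the loop stops as soon as every expected key has been found, any remaining tokens are left unread.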

std::optional<std::pair<std::string, std::string>> ParseNextKeyValuePair()
{
    while (!mExpectedKeys.empty())
    {
        std::string token;
        if (!ExtractNextToken(token))
        {
            return std::nullopt;
        }

        std::string parsedKey, value;
        if (!ParseToken(token, parsedKey, value))
        {
            continue; // Skip invalid tokens
        }

        if (ProcessKeyValuePair(parsedKey, value))
        {
            return std::make_pair(parsedKey, value);
        }
    }

    return std::nullopt;
}

Verifying Completion of Parsing

This method checks if all expected keys have been successfully parsed.

bool IsAllKeysParsed() const
{
    return mExpectedKeys.empty(); // Return true if all expected keys have been parsed and removed.
}
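For convenience, here is one way the snippets above fit together into a single compilable class. This is an assembled sketch, not the author's published file (linked in the introduction). Apart from the standard headers and comments, the method bodies are the ones shown in this section.

```cpp
#include <algorithm>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

class FixedKeyValueParser
{
public:
    explicit FixedKeyValueParser(const std::string& line, std::vector<std::string> expectedKeys)
        : mStream(line)
        , mExpectedKeys(std::move(expectedKeys))
    { }

    // Next expected pair, or std::nullopt when input runs out or all keys are found.
    std::optional<std::pair<std::string, std::string>> ParseNextKeyValuePair()
    {
        while (!mExpectedKeys.empty())
        {
            std::string token;
            if (!ExtractNextToken(token))
            {
                return std::nullopt; // input exhausted
            }

            std::string parsedKey, value;
            if (!ParseToken(token, parsedKey, value))
            {
                continue; // not a key=value token, e.g. "dingo"
            }

            if (ProcessKeyValuePair(parsedKey, value))
            {
                return std::make_pair(parsedKey, value);
            }
        }
        return std::nullopt;
    }

    // True once every expected key has been seen and removed.
    bool IsAllKeysParsed() const
    {
        return mExpectedKeys.empty();
    }

private:
    // Reads the next whitespace-delimited token; false when none is left.
    bool ExtractNextToken(std::string& token)
    {
        return static_cast<bool>(mStream >> token);
    }

    // Splits at the first '='; rejects tokens without '=' or with an empty side.
    bool ParseToken(const std::string& token, std::string& parsedKey, std::string& value)
    {
        auto delimiterPos = token.find('=');
        if (delimiterPos == std::string::npos)
        {
            return false;
        }
        parsedKey = token.substr(0, delimiterPos);
        value = token.substr(delimiterPos + 1);
        return !(parsedKey.empty() || value.empty());
    }

    // Accepts a key only if it is still expected, then removes it so repeats are ignored.
    bool ProcessKeyValuePair(const std::string& key, const std::string& value)
    {
        auto it = std::find(mExpectedKeys.begin(), mExpectedKeys.end(), key);
        if (it == mExpectedKeys.end())
        {
            return false;
        }
        mExpectedKeys.erase(it);
        return true;
    }

    std::istringstream mStream;
    std::vector<std::string> mExpectedKeys;
};
```

Placing this class above either scenario's main (with <iostream> included for the output) should be enough to run the usage examples. With an input such as "id=1 id=2 x=3" and expected keys {"id", "x"}, the second id is skipped and the parser returns id = 1 followed by x = 3.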

The FixedKeyValueParser class simplifies parsing structured data with predefined keys. It's a reusable tool that can be easily integrated into more complex systems, helping to keep your code maintainable.

That wraps up this article. Stay tuned for the next installment, where we'll put this parser to work in a real-world scenario as we continue this Epic Quest in software development!
