CPP学习笔记—priority_queue

1. `priority_queue` 是什么？

std::priority_queue 是 C++ 标准模板库（STL）中的一个容器适配器，它提供了一种特殊的队列：优先级队列。

核心概念：优先出队

与普通的队列（std::queue）遵循“先进先出”（FIFO）的原则不同，优先级队列中的元素并非根据其进入队列的顺序出队，而是根据其优先级。每次从队列中取出的元素（通过 top() 访问，pop() 移除），都是当前队列中优先级最高的那个。

默认情况下，对于数字，“大”的元素优先级更高；对于其他类型，使用 std::less 作为比较函数，即通过 operator< 来判断优先级。

底层数据结构：堆 (Heap)

priority_queue 的高效实现得益于其底层的堆数据结构。通常是最大堆 (Max-Heap)。

堆：一个特殊的完全二叉树，满足“堆属性”：父节点的值总是大于或等于（最大堆）或小于或等于（最小堆）其所有子节点的值。
最大堆 (Max-Heap)：根节点是整个堆中最大的元素。std::priority_queue 默认就是最大堆。
最小堆 (Min-Heap)：根节点是整个堆中最小的元素。

由于这个特性，访问优先级最高的元素（即堆顶元素）的操作非常快。

与 `std::queue` 的区别

特性	`std::priority_queue`	`std::queue`
出队规则	优先级最高（默认最大）的元素先出	先进来的元素先出 (FIFO)
访问队首	`top()`	`front()`
底层实现	堆 (通常是 `std::vector`)	双端队列 (通常是 `std::deque`)
元素顺序	内部维持堆有序，非完全排序	保持插入顺序

容器适配器 (Container Adapter)

priority_queue 本身不是一个独立的容器，它是一个容器适配器。它包装了另一个序列容器（如 std::vector 或 std::deque），并利用该容器来存储元素，同时对外提供优先级队列的接口。默认情况下，它使用 std::vector 作为其底层容器。

2. 何时使用 `priority_queue`？

当你需要处理一个集合，并且总是需要快速访问和移除其中“最重要”的元素时，就应该使用 priority_queue。常见应用场景包括：

Top K 问题：从大量数据中找出最大或最小的 K 个元素。
图算法：
- Dijkstra 算法中用于找到下一个距离最短的未访问顶点。
- Prim 算法中用于找到连接到当前树的最小权重的边。
任务调度：操作系统中，用于管理待处理的任务，总是先执行优先级最高的任务。
事件驱动模拟：按时间顺序处理事件队列，时间最早的事件优先级最高。
合并 K 个有序列表。

3. 基本用法

包含头文件和声明

#include <iostream>
#include <queue> // priority_queue 在 <queue> 头文件中
#include <vector>

// 声明一个存储 int 的默认优先级队列（最大堆）
std::priority_queue<int> pq;

核心操作 (`push`, `pop`, `top`)

push(const T& value): 将元素 value 压入队列，并保持堆的属性。
top() const: 返回对队首（优先级最高）元素的常量引用。注意，它只返回，不移除。
pop(): 移除队首元素。注意，它没有返回值。

重要：获取并移除队首元素的标准操作是两步：

用 top() 获取元素值。
用 pop() 将其从队列中移除。

辅助操作 (`size`, `empty`)

size() const: 返回队列中元素的数量。
empty() const: 检查队列是否为空。

一个简单的例子

#include <iostream>
#include <queue>

int main() {
    // 默认是最大堆
    std::priority_queue<int> max_heap;

    max_heap.push(30);
    max_heap.push(100);
    max_heap.push(25);
    max_heap.push(40);

    std::cout << "Max-heap contents (from highest to lowest priority):" << std::endl;
    while (!max_heap.empty()) {
        std::cout << max_heap.top() << " "; // 访问最高优先级的元素
        max_heap.pop();                     // 移除最高优先级的元素
    }
    std::cout << std::endl;

    return 0;
}

输出:

1 2	Max-heap contents (from highest to lowest priority): 100 40 30 25

4. 定制 `priority_queue`

模板参数详解

priority_queue 的完整模板声明如下：

template<
    class T,
    class Container = std::vector<T>,
    class Compare = std::less<typename Container::value_type>
> class priority_queue;

T: 存储的元素类型。
Container: 用于存储元素的底层容器。必须是支持 front(), push_back(), pop_back() 等操作的序列容器。std::vector (默认) 和 std::deque 是常用选择。
Compare: 一个比较函数对象（Functor），用于确定元素的优先级。它必须提供一个 bool operator()(const T& a, const T& b)。如果 operator()(a, b) 返回 true，则认为 b 的优先级比 a 高。
- 默认的 std::less<T> 使用 operator<，a < b 为 true 时 b 优先，这构成了最大堆。
- 要实现最小堆，我们需要一个比较器，当 a > b 为 true 时 b 优先，这就是 std::greater<T>。

如何实现最小堆 (Min-Heap)

只需在声明时提供后两个模板参数即可。

#include <iostream>
#include <queue>
#include <vector>
#include <functional> // for std::greater

int main() {
    // 声明一个最小堆
    std::priority_queue<int, std::vector<int>, std::greater<int>> min_heap;

    min_heap.push(30);
    min_heap.push(100);
    min_heap.push(25);
    min_heap.push(40);

    std::cout << "Min-heap contents (from lowest to highest priority):" << std::endl;
    while (!min_heap.empty()) {
        std::cout << min_heap.top() << " ";
        min_heap.pop();
    }
    std::cout << std::endl;

    return 0;
}

输出:

1 2	Min-heap contents (from lowest to highest priority): 25 30 40 100

如何用于自定义数据类型 (Custom Data Types)

假设我们有一个 Task 结构，我们希望优先级高的 Task 先被处理。

struct Task {
    std::string name;
    int priority;
};

有三种主流方法可以实现：

方法一：重载 operator< (最简单)
这是最直接的方法。因为默认比较器 std::less 使用 operator<，我们只需为 Task 重载它即可。

struct Task {
    std::string name;
    int priority;

    // 重载 < 操作符，priority 小的反而 "小于"（优先级更高）
    // 为了构建一个以 priority 值为标准的最大堆
    bool operator<(const Task& other) const {
        // 如果 this->priority < other.priority，返回 true
        // 这意味着 other 的优先级更高，符合最大堆的定义
        return this->priority < other.priority;
    }
};

int main() {
    std::priority_queue<Task> task_queue; // 使用默认模板参数

    task_queue.push({"Fix bug", 100});
    task_queue.push({"Write documentation", 20});
    task_queue.push({"Release new version", 999});

    while (!task_queue.empty()) {
        std::cout << task_queue.top().name << " (Priority: " << task_queue.top().priority << ")" << std::endl;
        task_queue.pop();
    }
    return 0;
}

输出 (priority 最高的先出):

1
2
3

Release new version (Priority: 999)
Fix bug (Priority: 100)
Write documentation (Priority: 20)

方法二：提供自定义比较器 (Functor) (最灵活)
当你不能或不想修改 Task 结构体，或者需要根据不同场景使用不同的排序规则时，此方法是最佳选择。

struct Task {
    std::string name;
    int priority;
};

// 自定义比较器结构体
struct CompareTask {
    bool operator()(const Task& a, const Task& b) {
        // 返回 true 表示 a 的优先级比 b 低
        // 我们希望 priority 大的优先级高，所以当 a.priority < b.priority 时返回 true
        return a.priority < b.priority;
    }
};

int main() {
    std::priority_queue<Task, std::vector<Task>, CompareTask> task_queue;

    task_queue.push({"Fix bug", 100});
    task_queue.push({"Write documentation", 20});
    task_queue.push({"Release new version", 999});
    
    // ... (输出同上)
}

如果你想要一个最小堆（priority 值最小的优先级最高），只需改变比较器的逻辑：
return a.priority > b.priority;

方法三：使用 Lambda 表达式 (C++11)
这是一种现代且简洁的方式，尤其适用于局部定义的比较逻辑。

struct Task {
    std::string name;
    int priority;
};

int main() {
    auto compare = [](const Task& a, const Task& b) {
        return a.priority < b.priority;
    };

    // 使用 decltype 获取 lambda 的类型
    std::priority_queue<Task, std::vector<Task>, decltype(compare)> task_queue(compare);

    task_queue.push({"Fix bug", 100});
    task_queue.push({"Write documentation", 20});
    task_queue.push({"Release new version", 999});
    
    // ... (输出同上)
}

注意：使用 Lambda 时，通常需要将 Lambda 对象实例传递给 priority_queue 的构造函数。

5. 性能分析

priority_queue 的性能非常出色，这得益于其底层的堆结构。

操作	时间复杂度
`push()`	O(log N)
`pop()`	O(log N)
`top()`	O(1)

其中 N 是队列中元素的数量。

push()：插入新元素到末尾，然后向上调整以维持堆属性，路径长度为 O(log N)。
pop()：将末尾元素移到堆顶，然后向下调整以恢复堆属性，路径长度为 O(log N)。
top()：访问堆顶元素是常数时间操作。

6. 综合示例：解决 Top K 问题

问题：在一个非常大的整数数组中，找到 K 个最大的元素。

如果对整个数组排序，时间复杂度是 O(N log N)。使用 priority_queue 可以更高效地解决，时间复杂度为 O(N log K)。

思路：

维护一个大小为 K 的最小堆。
遍历数组，依次处理每个元素：
- 如果堆的大小小于 K，直接将元素压入堆。
- 如果堆的大小等于 K，将当前元素与堆顶元素（即堆中最小的元素）比较。
- 如果当前元素比堆顶元素大，则弹出堆顶，并将当前元素压入。
遍历结束后，堆中剩下的 K 个元素就是整个数组中最大的 K 个元素。

代码实现：

#include <iostream>
#include <vector>
#include <queue>
#include <functional>

std::vector<int> findTopK(const std::vector<int>& nums, int k) {
    if (k <= 0 || k > nums.size()) return {};

    std::priority_queue<int, std::vector<int>, std::greater<int>> min_heap;

    for (int num : nums) {
        if (min_heap.size() < k) {
            min_heap.push(num);
        } else if (num > min_heap.top()) {
            min_heap.pop();
            min_heap.push(num);
        }
    }

    std::vector<int> result;
    while (!min_heap.empty()) {
        result.push_back(min_heap.top());
        min_heap.pop();
    }
    // 结果默认从小到大，如果需要从大到小，可以反转
    std::reverse(result.begin(), result.end());
    return result;
}

int main() {
    std::vector<int> data = {3, 2, 1, 5, 6, 4, 9, 8, 7};
    int k = 3;
    std::vector<int> topK = findTopK(data, k);

    std::cout << "The " << k << " largest elements are: ";
    for (int n : topK) {
        std::cout << n << " ";
    }
    std::cout << std::endl;

    return 0;
}

输出:

1	The 3 largest elements are: 9 8 7

1. priority_queue 是什么？

核心概念：优先出队

底层数据结构：堆 (Heap)

与 std::queue 的区别

容器适配器 (Container Adapter)

2. 何时使用 priority_queue？

3. 基本用法

包含头文件和声明

核心操作 (push, pop, top)

辅助操作 (size, empty)

一个简单的例子

4. 定制 priority_queue

模板参数详解

如何实现最小堆 (Min-Heap)

如何用于自定义数据类型 (Custom Data Types)

5. 性能分析

6. 综合示例：解决 Top K 问题

1. `priority_queue` 是什么？

与 `std::queue` 的区别

2. 何时使用 `priority_queue`？

核心操作 (`push`, `pop`, `top`)

辅助操作 (`size`, `empty`)

4. 定制 `priority_queue`