哈夫曼编码是一种用于数据压缩的算法,它使用变长编码表对源符号(如文件中的一个字母)进行编码,其中变长编码表是通过一种评估源符号出现概率的方法得到的。源符号的出现概率越高,其对应的哈夫曼编码越短。
以下是一个简单的Python实现:
```python
import heapq
from collections import defaultdict
def calculate_frequency(s):
frequency = defaultdict(int)
for c in s:
frequency[c] += 1
return frequency
def huffman_encode(frequency):
heap = [[weight, [symbol, ""]] for symbol, weight in frequency.items()]
heapq.heapify(heap)
while len(heap) > 1:
lo = heapq.heappop(heap)
hi = heapq.heappop(heap)
for pair in lo[1:]:
pair[1] = '0' + pair[1]
for pair in hi[1:]:
pair[1] = '1' + pair[1]
heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
return sorted(heapq.heappop(heap)[1:], key=lambda p: (len(p[-1]), p))
s = "huffman encoding"
frequency = calculate_frequency(s)
huff = huffman_encode(frequency)
print("Symbol".ljust(10) + "Weight".ljust(10) + "Huffman Code")
for p in huff:
print(p[0].ljust(10) + str(frequency[p[0]]).ljust(10) + p[1])
```
在这个例子中,我们首先计算输入字符串中每个字符的频率,然后使用这些频率创建一个最小堆。然后,我们从堆中取出两个最小的元素,合并它们,并将结果放回堆中。我们重复这个过程,直到堆中只剩下一个元素。最后,我们打印出每个字符及其对应的哈夫曼编码。
以下是一个简单的Python实现:
```python
import heapq
from collections import defaultdict
def calculate_frequency(s):
frequency = defaultdict(int)
for c in s:
frequency[c] += 1
return frequency
def huffman_encode(frequency):
heap = [[weight, [symbol, ""]] for symbol, weight in frequency.items()]
heapq.heapify(heap)
while len(heap) > 1:
lo = heapq.heappop(heap)
hi = heapq.heappop(heap)
for pair in lo[1:]:
pair[1] = '0' + pair[1]
for pair in hi[1:]:
pair[1] = '1' + pair[1]
heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
return sorted(heapq.heappop(heap)[1:], key=lambda p: (len(p[-1]), p))
s = "huffman encoding"
frequency = calculate_frequency(s)
huff = huffman_encode(frequency)
print("Symbol".ljust(10) + "Weight".ljust(10) + "Huffman Code")
for p in huff:
print(p[0].ljust(10) + str(frequency[p[0]]).ljust(10) + p[1])
```
在这个例子中,我们首先计算输入字符串中每个字符的频率,然后使用这些频率创建一个最小堆。然后,我们从堆中取出两个最小的元素,合并它们,并将结果放回堆中。我们重复这个过程,直到堆中只剩下一个元素。最后,我们打印出每个字符及其对应的哈夫曼编码。