Python | Strengthening Data Structures: collections.Counter

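Overview (Counter is a specialized dictionary that counts element occurrences in one step)
Basic usage (this is the important part)
Practical know-how (a deeper look at the key points)
Common patterns for file and text tallies
Learning by example (from the standard cases to one step beyond)
Summary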

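Overview (Counter is a specialized dictionary that counts element occurrences in one step)

collections.Counter is a class that counts how many times each element occurs in an iterable such as a list or a string. It is a subclass of dict, so the keys are the elements and the values are their counts, and it comes with methods that make tallying and ranking convenient. Compared with hand-written for/if counting, the code is shorter, less error-prone, and easier to read.

from collections import Counter

data = ["apple", "banana", "apple", "orange", "banana", "apple"]
c = Counter(data)
print(c)                # Counter({'apple': 3, 'banana': 2, 'orange': 1})
print(c["apple"])       # 3 (dictionary-style access)
print(c.most_common())  # all entries, most frequent first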

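To make the "shorter than hand-written for/if" claim concrete, here is a minimal sketch that builds the same tally both ways. The variable names manual and counted are only illustrative.

```python
from collections import Counter

data = ["apple", "banana", "apple", "orange", "banana", "apple"]

# Hand-written counting with a plain dict: every new key needs special handling.
manual = {}
for item in data:
    if item in manual:
        manual[item] += 1
    else:
        manual[item] = 1

# The same tally with Counter is a single call.
counted = Counter(data)

print(manual)                   # {'apple': 3, 'banana': 2, 'orange': 1}
print(dict(counted) == manual)  # True
```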

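Basic usage (this is the important part)

Creating and accessing (just pass an iterable)

Counter tallies any iterable (list, tuple, string, and so on) that you pass to it. As with a dict, c[key] gives the count. A missing key returns 0 instead of raising KeyError, so lookups are safe.

from collections import Counter

c = Counter("abracadabra")  # strings work too
print(c["a"])               # 5
print(c["z"])               # 0 (missing key → 0)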

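A detail worth seeing once: looking up a missing key returns 0 but does not add that key to the Counter, whereas the same lookup on a plain dict raises KeyError. A small sketch (plain is just an illustrative copy):

```python
from collections import Counter

c = Counter("abracadabra")
plain = dict(c)   # ordinary dict with the same contents

print(c["z"])     # 0
print("z" in c)   # False: the lookup did not create the key
print(len(c))     # 5 distinct letters: a, b, r, c, d

try:
    plain["z"]
except KeyError:
    print("a plain dict raises KeyError for a missing key")
```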

Adding and updating (update and direct increments)

To add more data after the initial tally, use update. You can also increment directly with c[key] += 1.

from collections import Counter

c = Counter(["tea", "coffee", "tea"])
c.update(["tea", "sugar"])  # tally additional items
c["coffee"] += 2            # direct increment
print(c)  # Counter({'tea': 3, 'coffee': 3, 'sugar': 1})

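update also accepts a mapping or keyword arguments, and in every form it adds to the existing counts rather than overwriting them as dict.update does. The sketch below contrasts the two; the numbers are arbitrary.

```python
from collections import Counter

c = Counter(["tea", "coffee", "tea"])  # tea: 2, coffee: 1

c.update({"tea": 10})      # a mapping adds its values to the existing counts
c.update(sugar=2)          # keyword arguments work the same way
print(c["tea"])            # 12 (2 + 10, not replaced by 10)
print(c["sugar"])          # 2

plain = {"tea": 2}
plain.update({"tea": 10})  # dict.update overwrites instead
print(plain["tea"])        # 10
```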

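Frequently used outputs (most_common and elements)

most_common returns a list of (element, count) tuples ordered from most to least frequent. elements returns an iterator that repeats each element as many times as its count (elements with a count of zero or less are skipped).

from collections import Counter

c = Counter(["A", "B", "A", "C", "B", "A"])
print(c.most_common(2))       # [('A', 3), ('B', 2)]
print(list(c.elements()))     # ['A', 'A', 'A', 'B', 'B', 'C']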

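Two behaviors of these methods are easy to overlook: elements with equal counts come back from most_common in the order they were first seen, and elements really does skip zero and negative counts. The sketch below also shows a common slicing idiom for the least frequent entries.

```python
from collections import Counter

c = Counter(["A", "B", "A", "C", "B", "A", "D"])  # A: 3, B: 2, C: 1, D: 1

print(c.most_common())          # [('A', 3), ('B', 2), ('C', 1), ('D', 1)]
print(c.most_common()[:-3:-1])  # two least common: [('D', 1), ('C', 1)]

c["B"] = 0
c["C"] = -1
print(list(c.elements()))       # ['A', 'A', 'A', 'D']
```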

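Practical know-how (a deeper look at the key points)

When to choose Counter over dict

Frequency tallies: count error codes in logs, product names in a CSV, words in a text, and so on in one step.
Top-N extraction: most_common(n) returns the top n entries in a single call.
Zero-safe: keys that never appeared return 0, so no KeyError handling is needed.

from collections import Counter

words = "coffee tea tea sugar coffee coffee".split()
c = Counter(words)
for w, cnt in c.most_common(3):
    print(w, cnt)  # top 3 words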

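Filtering, merging, and differences (strong at arithmetic)

Counters support addition, subtraction, intersection (minimum), and union (maximum) with each other. This is useful for combining tallies from several files or extracting only the difference.

from collections import Counter

c1 = Counter("aabbbc")   # a:2 b:3 c:1
c2 = Counter("abbcdd")   # a:1 b:2 c:1 d:2
print(c1 + c2)           # sum: Counter({'b': 5, 'a': 3, 'c': 2, 'd': 2})
print(c1 - c2)           # results of 0 or less are dropped: Counter({'a': 1, 'b': 1})
print(c1 & c2)           # minimum per key (intersection): Counter({'b': 2, 'a': 1, 'c': 1})
print(c1 | c2)           # maximum per key (union): Counter({'b': 3, 'a': 2, 'd': 2, 'c': 1})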

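One distinction that is easy to miss: the binary - operator returns a new Counter and drops every result of zero or less, while the subtract method changes the Counter in place and keeps those entries. A short sketch using the same two Counters (diff and kept are illustrative names):

```python
from collections import Counter

c1 = Counter("aabbbc")   # a: 2, b: 3, c: 1
c2 = Counter("abbcdd")   # a: 1, b: 2, c: 1, d: 2

diff = c1 - c2           # new Counter; results <= 0 are dropped
print(dict(diff))        # {'a': 1, 'b': 1}

kept = Counter(c1)       # copy first, because subtract() mutates in place
kept.subtract(c2)
print(dict(kept))        # {'a': 1, 'b': 1, 'c': 0, 'd': -2}
```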

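Output formatting and handling of zero and negative counts

subtract can leave zero or negative values, while elements skips such entries and most_common reads most naturally with positive counts. Adding a preprocessing step such as "drop counts of zero or less" before reporting keeps the output tidy.

from collections import Counter

c = Counter({"A": 3, "B": 1})
c.subtract({"A": 5, "B": 1, "C": 2})
# c is now Counter({'B': 0, 'A': -2, 'C': -2})
clean = Counter({k: v for k, v in c.items() if v > 0})
print(clean)  # positive counts only; none remain here, so this prints Counter()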

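For the "drop counts of zero or less" step, Counter also offers a built-in shortcut: the unary + operator returns a new Counter containing only the positive counts, and unary - returns the negative ones with their sign flipped. A minimal sketch, with an extra key "D" added so that something positive remains:

```python
from collections import Counter

c = Counter({"A": 3, "B": 1, "D": 4})
c.subtract({"A": 5, "B": 1, "C": 2})  # A: -2, B: 0, D: 4, C: -2

positive = +c                         # keeps only counts > 0
print(positive)                       # Counter({'D': 4})

negative = -c                         # keeps counts < 0, with the sign flipped
print(dict(negative))                 # {'A': 2, 'C': 2}
```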

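Common patterns for file and text tallies

Tallying one column of a CSV and listing the top entries

import csv
from collections import Counter

def top_items(path: str, col: int, n: int = 5) -> list[tuple[str, int]]:
    """Count the values in column `col` of a CSV file and return the top n."""
    c = Counter()
    with open(path, "r", encoding="utf-8", newline="") as f:
        for row in csv.reader(f):
            if len(row) > col:  # skip rows too short to have this column
                c[row[col]] += 1
    return c.most_common(n)

print(top_items("sales.csv", col=0))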

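If the file has a header row, the positional version above counts that header as data. One possible variant, sketched here under the assumption that the first row holds column names, uses csv.DictReader and selects the column by name. The function name top_by_column, the sample data, and the column names item and qty are all hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for a real sales file.
SAMPLE = """item,qty
coffee,2
tea,1
coffee,5
sugar,1
tea,3
"""

def top_by_column(f, column: str, n: int = 5) -> list[tuple[str, int]]:
    """Count the values of one named column and return the n most common."""
    reader = csv.DictReader(f)  # the first row is consumed as the header
    counts = Counter(row[column] for row in reader if row.get(column))
    return counts.most_common(n)

print(top_by_column(io.StringIO(SAMPLE), "item"))
# [('coffee', 2), ('tea', 2), ('sugar', 1)]
```

Passing an open file object instead of a path keeps the function easy to try with in-memory text, as done here with io.StringIO.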

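Word-frequency tally (simple tokenizer)

from collections import Counter
import re

def word_freq(text: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most common ASCII words in text, ignoring case."""
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    c = Counter(tokens)
    return c.most_common(n)

print(word_freq("Coffee coffee tea; sugar, TEA!"))
# [('coffee', 2), ('tea', 2), ('sugar', 1)]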

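In real text, very common words tend to dominate such a ranking. As one possible refinement, the sketch below filters tokens against a stop-word set before counting. The STOP_WORDS set and the function name word_freq_filtered are hypothetical, and a real list would be much longer.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for illustration.
STOP_WORDS = {"the", "a", "and", "of"}

def word_freq_filtered(text: str, n: int = 10) -> list[tuple[str, int]]:
    """Count lowercase ASCII words, skipping anything in STOP_WORDS."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)

print(word_freq_filtered("The tea and the coffee: a cup of tea, a cup of coffee, tea!"))
# [('tea', 3), ('coffee', 2), ('cup', 2)]
```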

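Counting files per extension (with pathlib)

from pathlib import Path
from collections import Counter

def ext_counts(root: str) -> Counter:
    """Count files under root (recursively) by lowercased extension."""
    c = Counter()
    for p in Path(root).rglob("*"):
        if p.is_file():
            c[p.suffix.lower()] += 1  # files without an extension are counted under ''
    return c

print(ext_counts("downloads").most_common())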

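Combining tallies from multiple directories

from collections import Counter

def merge_counters(*cs: Counter) -> Counter:
    """Add any number of Counters into one combined Counter."""
    total = Counter()
    for c in cs:
        total += c  # in-place addition; entries of zero or less are dropped
    return total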

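As a usage sketch, the same merge can also be written with the built-in sum, passing an empty Counter as the start value. The three per-directory Counters below are hypothetical stand-ins for results that a function like ext_counts might return.

```python
from collections import Counter

# Hypothetical per-directory results.
downloads = Counter({".pdf": 4, ".png": 2})
documents = Counter({".pdf": 1, ".txt": 7})
desktop = Counter({".png": 3})

total = sum([downloads, documents, desktop], Counter())
print(total.most_common())  # [('.txt', 7), ('.pdf', 5), ('.png', 5)]
```

sum builds a new Counter at every step, so for a large number of Counters the in-place loop is likely the cheaper choice.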

Learning by example (from the standard cases to one step beyond)

Example 1: List the top N products (by number of occurrences)

from collections import Counter

items = ["coffee", "tea", "coffee", "sugar", "coffee", "tea"]
c = Counter(items)
for name, cnt in c.most_common(2):
    print(f"{name}: {cnt}")  # prints "coffee: 3", then "tea: 2"

Example 2: Format a log-level frequency tally as a table

from collections import Counter

levels = ["INFO", "WARN", "INFO", "ERROR", "WARN", "INFO"]
c = Counter(levels)
print(f"|{'LEVEL':<6}|{'COUNT':>5}|")
for lv in ["INFO", "WARN", "ERROR", "DEBUG"]:
    print(f"|{lv:<6}|{c[lv]:>5}|")  # levels that never appeared safely print 0

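A natural extension of this table is a share column. The sketch below divides each count by the overall total; the SHARE column and its formatting are additions for illustration only.

```python
from collections import Counter

levels = ["INFO", "WARN", "INFO", "ERROR", "WARN", "INFO"]
c = Counter(levels)
total = sum(c.values())  # 6; Counter.total() does the same on Python 3.10+

print(f"|{'LEVEL':<6}|{'COUNT':>5}|{'SHARE':>7}|")
for lv in ["INFO", "WARN", "ERROR", "DEBUG"]:
    share = c[lv] / total if total else 0.0  # guard against an empty tally
    print(f"|{lv:<6}|{c[lv]:>5}|{share:>7.1%}|")
```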

Example 3: Extracting differences (items that increased or decreased this period)

from collections import Counter

prev = Counter({"coffee": 10, "tea": 5})
cur  = Counter({"coffee": 8,  "tea": 9, "sugar": 3})

# Keep only the keys whose count actually moved in the given direction.
inc = Counter({k: cur[k] - prev[k] for k in set(prev) | set(cur) if cur[k] - prev[k] > 0})
dec = Counter({k: prev[k] - cur[k] for k in set(prev) | set(cur) if prev[k] - cur[k] > 0})
print("Increased:", inc.most_common())
print("Decreased:", dec.most_common())

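Because the binary - operator already drops results of zero or less, the same increase and decrease tallies can be sketched more compactly as plain subtractions in each direction:

```python
from collections import Counter

prev = Counter({"coffee": 10, "tea": 5})
cur = Counter({"coffee": 8, "tea": 9, "sugar": 3})

inc = cur - prev  # only items whose count went up
dec = prev - cur  # only items whose count went down
print("Increased:", inc.most_common())  # [('tea', 4), ('sugar', 3)]
print("Decreased:", dec.most_common())  # [('coffee', 2)]
```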

Example 4: Rebuilding the data from counts with elements (weight expansion)

from collections import Counter

weights = Counter({"A": 2, "B": 3})
expanded = list(weights.elements())  # ['A', 'A', 'B', 'B', 'B']
print(expanded)

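Note that the expansion recovers the counts but not the original ordering: elements groups equal items together, in the order each key was first seen. A small round-trip sketch (the list original is made up for illustration):

```python
from collections import Counter

original = ["B", "A", "B", "A", "B"]
weights = Counter(original)          # B: 3, A: 2

expanded = list(weights.elements())
print(expanded)                      # ['B', 'B', 'B', 'A', 'A']
print(expanded == original)          # False: the original order is gone
print(Counter(expanded) == weights)  # True: the counts survive the round trip
```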

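Summary

Counter is a tool for implementing frequency tallies in the shortest, safest, and most readable way. Pass it an iterable and you get an element-to-count mapping that behaves like a dict, and because unseen keys are treated as 0 it is robust against exceptions. most_common extracts the top entries, update, elements, and subtract allow flexible manipulation, and Counters are well suited to combining several tallies or taking differences. It pairs very well with file and text tallies, so graduate from hand-written dictionary counting and make Counter your standard tool.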