ねらい:重複を「見つける・残す・消す・まとめる」を型にして、速く安全に処理する
重複処理は現場で最も事故が多い領域です。VBAでは「配列I/O+Dictionary+正規化」で、検出・マーキング・除去・集約までを“超再利用部品”として標準化できます。初心者でも貼って動くテンプレを、単一キー・複合キー・最初/最後採用・全件保持・近似(あいまい)判定までかみ砕いて解説します。
基盤部品:配列I/O・キー正規化・安全な区切り
配列一括読み書きとキー生成
' ModDup_Base.bas
Option Explicit
Private Const SEP As String = Chr$(30) ' 複合キーの安全な区切り
Public Function ReadRegion(ByVal ws As Worksheet, Optional ByVal topLeft As String = "A1") As Variant
ReadRegion = ws.Range(topLeft).CurrentRegion.Value
End Function
Public Sub WriteBlock(ByVal ws As Worksheet, ByVal a As Variant, ByVal topLeft As String)
ws.Range(topLeft).Resize(UBound(a, 1), UBound(a, 2)).Value = a
End Sub
Public Function NormKey(ByVal v As Variant) As String
NormKey = LCase$(Trim$(CStr(v))) ' 大小・余分スペースを除去
End Function
Public Function MakeKey2(ByVal v1 As Variant, ByVal v2 As Variant) As String
MakeKey2 = NormKey(v1) & SEP & NormKey(v2)
End Function
VB重要ポイントの深掘り
- ヘッダは1行目固定、データは2行目から扱うと事故が激減します。
- キーは両側で同じ正規化(Trim/LCase)を通すのが鉄則。片側だけだとヒット漏れが起きます。
- 複合キーは“ありえない文字”で束ね、誤連結(例:ab|c と a|bc が同じになる)を根絶します。
重複検出とマーキング:どの行が重複かを見える化
重複フラグ列を付与(単一キー)
' ModDup_Flag.bas
Option Explicit
' srcSheet: A列=キー、他列=データ、outColLetter=フラグ出力列(例 "Z")
Public Sub FlagDuplicates(ByVal srcSheet As String, ByVal outColLetter As String)
Dim a As Variant: a = ReadRegion(Worksheets(srcSheet))
Dim d As Object: Set d = CreateObject("Scripting.Dictionary"): d.CompareMode = 1
Dim out() As Variant: ReDim out(1 To UBound(a, 1), 1 To 1)
out(1, 1) = "DupFlag"
Dim r As Long
For r = 2 To UBound(a, 1)
Dim k As String: k = NormKey(a(r, 1))
If d.Exists(k) Then
out(r, 1) = "DUP" ' 2回目以降を重複とする
Else
d(k) = True
out(r, 1) = "" ' 初回はユニーク
End If
Next
Worksheets(srcSheet).Range(outColLetter & "1").Resize(UBound(out, 1), 1).Value = out
End Sub
VB重要ポイントの深掘り
- “初回ユニーク・2回目以降が重複”という運用が最も無難です。後続の「最初/最後採用」の基準にもなります。
- フラグ列を色分け(DUPを薄赤)すれば、目視確認が一気に楽になります。
重複除去:最初/最後採用、完全ユニーク化
最初採用(重複は捨てて1行だけ残す)
' ModDup_DistinctFirst.bas
Option Explicit
Public Sub DistinctKeepFirst(ByVal srcSheet As String, ByVal outStart As String)
Dim a As Variant: a = ReadRegion(Worksheets(srcSheet))
Dim d As Object: Set d = CreateObject("Scripting.Dictionary"): d.CompareMode = 1
Dim out() As Variant: ReDim out(1 To 1, 1 To UBound(a, 2))
' ヘッダコピー
Dim c As Long: For c = 1 To UBound(a, 2): out(1, c) = a(1, c): Next
Dim rowsOut As Long: rowsOut = 1
Dim r As Long
For r = 2 To UBound(a, 1)
Dim k As String: k = NormKey(a(r, 1))
If Not d.Exists(k) Then
d(k) = True
rowsOut = rowsOut + 1: ReDim Preserve out(1 To rowsOut, 1 To UBound(a, 2))
For c = 1 To UBound(a, 2): out(rowsOut, c) = a(r, c): Next
End If
Next
WriteBlock Worksheets(srcSheet), out, outStart
End Sub
VB最後採用(同じキーの“末尾”を残す)
' ModDup_DistinctLast.bas
Option Explicit
Public Sub DistinctKeepLast(ByVal srcSheet As String, ByVal outStart As String)
Dim a As Variant: a = ReadRegion(Worksheets(srcSheet))
Dim lastRow As Object: Set lastRow = CreateObject("Scripting.Dictionary"): lastRow.CompareMode = 1
Dim r As Long
For r = 2 To UBound(a, 1) ' 最終行位置を記録(上書き=末尾優先)
lastRow(NormKey(a(r, 1))) = r
Next
Dim out() As Variant: ReDim out(1 To lastRow.Count + 1, 1 To UBound(a, 2))
Dim c As Long: For c = 1 To UBound(a, 2): out(1, c) = a(1, c): Next
Dim i As Long: i = 2
Dim k As Variant
For Each k In lastRow.Keys
Dim rr As Long: rr = lastRow(k)
For c = 1 To UBound(a, 2): out(i, c) = a(rr, c): Next
i = i + 1
Next
WriteBlock Worksheets(srcSheet), out, outStart
End Sub
VB重要ポイントの深掘り
- 「最初採用」「最後採用」は結果が変わります。必ず仕様として先に決めてください。
- 最後採用は「上書き辞書」で簡潔に実装できます。最初採用は「未登録時のみ代入」にします。
複合キーの重複処理:A+Bで一意化する
2列キーで最初採用のユニーク化
' ModDup_Composite.bas
Option Explicit
Private Const SEP As String = Chr$(30)
Public Sub Distinct2Keys_KeepFirst(ByVal srcSheet As String, ByVal outStart As String)
Dim a As Variant: a = ReadRegion(Worksheets(srcSheet))
Dim d As Object: Set d = CreateObject("Scripting.Dictionary"): d.CompareMode = 1
Dim out() As Variant: ReDim out(1 To 1, 1 To UBound(a, 2))
Dim c As Long: For c = 1 To UBound(a, 2): out(1, c) = a(1, c): Next
Dim rowsOut As Long: rowsOut = 1
Dim r As Long
For r = 2 To UBound(a, 1)
Dim k As String: k = NormKey(a(r, 1)) & SEP & NormKey(a(r, 2))
If Not d.Exists(k) Then
d(k) = True
rowsOut = rowsOut + 1: ReDim Preserve out(1 To rowsOut, 1 To UBound(a, 2))
For c = 1 To UBound(a, 2): out(rowsOut, c) = a(r, c): Next
End If
Next
WriteBlock Worksheets(srcSheet), out, outStart
End Sub
VB重要ポイントの深掘り
- 複合キーは安全な区切りを必ず入れること。単純連結は衝突の温床です。
- 正規化(Trim/LCase)をキーごとに適用し、表記揺れを根絶します。
重複グループの一覧化:同じキーに属する行をまとめて監査
同キーの行番号リストを出す
' ModDup_Groups.bas
Option Explicit
Public Sub ListDupGroups(ByVal srcSheet As String, ByVal outStart As String)
Dim a As Variant: a = ReadRegion(Worksheets(srcSheet))
Dim idx As Object: Set idx = CreateObject("Scripting.Dictionary"): idx.CompareMode = 1
Dim r As Long
For r = 2 To UBound(a, 1)
Dim k As String: k = NormKey(a(r, 1))
If Not idx.Exists(k) Then
Dim col As Collection: Set col = New Collection
col.Add r: Set idx(k) = col
Else
idx(k).Add r
End If
Next
Dim out() As Variant: ReDim out(1 To 1, 1 To 3)
out(1, 1) = "Key": out(1, 2) = "Count": out(1, 3) = "Rows"
Dim rowsOut As Long: rowsOut = 1
Dim k As Variant
For Each k In idx.Keys
Dim col As Collection: Set col = idx(k)
If col.Count > 1 Then
rowsOut = rowsOut + 1: ReDim Preserve out(1 To rowsOut, 1 To 3)
out(rowsOut, 1) = k
out(rowsOut, 2) = col.Count
out(rowsOut, 3) = Join(CollectionToArray(col), ",")
End If
Next
WriteBlock Worksheets(srcSheet), out, outStart
End Sub
Private Function CollectionToArray(ByVal col As Collection) As String()
Dim i As Long: Dim arr() As String: ReDim arr(0 To col.Count - 1)
For i = 1 To col.Count: arr(i - 1) = CStr(col(i)): Next
CollectionToArray = arr
End Function
VB重要ポイントの深掘り
- “重複のまとまり”を一覧すると、誤入力の癖やデータ品質の問題が一目で分かります。
- 行番号を提示すると、現場で原本の修正がしやすくなります。
重複の集約・マージ:同じキーの行をまとめて1行にする
金額・数量を合算し最新日付を採用する例
' ModDup_Merge.bas
Option Explicit
' 入力:A=キー, B=日付, C=金額, D=数量
' 出力:キーごとに 金額合計・数量合計・最新日付
Public Sub MergeDuplicates(ByVal srcSheet As String, ByVal outStart As String)
Dim a As Variant: a = ReadRegion(Worksheets(srcSheet))
Dim sumAmt As Object: Set sumAmt = CreateObject("Scripting.Dictionary"): sumAmt.CompareMode = 1
Dim sumQty As Object: Set sumQty = CreateObject("Scripting.Dictionary"): sumQty.CompareMode = 1
Dim latest As Object: Set latest = CreateObject("Scripting.Dictionary"): latest.CompareMode = 1
Dim r As Long
For r = 2 To UBound(a, 1)
Dim k As String: k = NormKey(a(r, 1))
Dim dt As Date: dt = CDate(a(r, 2))
Dim amt As Double: amt = Val(CStr(a(r, 3)))
Dim qty As Double: qty = Val(CStr(a(r, 4)))
sumAmt(k) = IIf(sumAmt.Exists(k), sumAmt(k) + amt, amt)
sumQty(k) = IIf(sumQty.Exists(k), sumQty(k) + qty, qty)
latest(k) = IIf(latest.Exists(k), IIf(dt > latest(k), dt, latest(k)), dt)
Next
Dim out() As Variant: ReDim out(1 To sumAmt.Count + 1, 1 To 4)
out(1, 1) = a(1, 1): out(1, 2) = "SumAmount": out(1, 3) = "SumQty": out(1, 4) = "LatestDate"
Dim i As Long: i = 2
Dim k As Variant
For Each k In sumAmt.Keys
out(i, 1) = k
out(i, 2) = sumAmt(k)
out(i, 3) = sumQty(k)
out(i, 4) = latest(k)
i = i + 1
Next
WriteBlock Worksheets(srcSheet), out, outStart
End Sub
VB重要ポイントの深掘り
- “集約ポリシー”を明確に(合計・最大・最小・最新・最古など)。曖昧だと結果が揺れます。
- 数値は必ず数値化、日付は CDate で比較。文字のままでは誤判定になります。
あいまい重複(近似・類似):完全一致では拾えない重複候補を出す
しきい値付きの簡易類似判定(前方一致・距離判定の導入枠)
' ModDup_Fuzzy.bas
Option Explicit
' A=キー文字列、B以降は任意。前方一致(3文字)で重複候補を抽出
Public Sub FuzzyDuplicateCandidates(ByVal srcSheet As String, ByVal outStart As String, Optional ByVal prefixLen As Long = 3)
Dim a As Variant: a = ReadRegion(Worksheets(srcSheet))
Dim idx As Object: Set idx = CreateObject("Scripting.Dictionary"): idx.CompareMode = 1
Dim r As Long
For r = 2 To UBound(a, 1)
Dim s As String: s = NormKey(a(r, 1))
Dim pref As String: pref = Left$(s, prefixLen)
If Not idx.Exists(pref) Then
Dim col As Collection: Set col = New Collection
col.Add r: Set idx(pref) = col
Else
idx(pref).Add r
End If
Next
' 出力:prefix, count, rows
Dim out() As Variant: ReDim out(1 To 1, 1 To 3)
out(1, 1) = "Prefix": out(1, 2) = "Count": out(1, 3) = "Rows"
Dim rowsOut As Long: rowsOut = 1
Dim k As Variant
For Each k In idx.Keys
Dim col As Collection: Set col = idx(k)
If col.Count > 1 Then
rowsOut = rowsOut + 1: ReDim Preserve out(1 To rowsOut, 1 To 3)
out(rowsOut, 1) = k
out(rowsOut, 2) = col.Count
out(rowsOut, 3) = Join(CollectionToArray(col), ",")
End If
Next
WriteBlock Worksheets(srcSheet), out, outStart
End Sub
VB重要ポイントの深掘り
- 類似重複は“候補提示”に留め、最終判断は人が行うのが安全です。誤検知・取りこぼしのリスクを理解した上で使いましょう。
- 距離(レーベンシュタインなど)を実装して精度を上げる余地あり。まずは前方一致や包含で十分な現場が多いです。
例題の通し方:顧客IDの重複検出→最初採用ユニーク化→集約
手順例
' ModDup_Example.bas
Option Explicit
Public Sub Demo_DuplicatePipeline()
' 1) 重複フラグを付与して見える化
FlagDuplicates "Detail", "Z"
' 2) 最初採用でユニーク化(複合キーが必要なら Distinct2Keys_KeepFirst)
DistinctKeepFirst "Detail", "AA1"
' 3) 同キーの集約(合計・数量・最新日付)
MergeDuplicates "Detail", "AC1"
MsgBox "重複処理テンプレのパイプラインが完了しました。", vbInformation
End Sub
VB期待動作の確認ポイント
- フラグ列に重複が表示され、目視で確認可能。
- ユニーク化はヘッダ維持・順序安定。
- 集約結果が仕様どおり(合計・数量・最新日付)で算出される。
落とし穴と対策(深掘り)
正規化不足で見逃し・誤判定
Trim/LCaseで揺らぎを潰す。必要なら全半角統一や記号除去の前処理を追加。
「最初/最後」の仕様不一致
採用方針を最初に決め、コード名にも明示(KeepFirst/KeepLast)して混乱を防ぐ。
セル逐次書きで遅い・固まる
配列で結果を作り、一括書き戻す。10万行級でも一瞬で処理できます。
数値・日付の型扱いミス
Val/CDbl/CDateで明示的に変換。文字のまま比較・加算しない。
まとめ:重複は「見える化→方針決定→安全な実装」の型で、速く堅牢に処理する
- 辞書+正規化で重複検出・ユニーク化・集約を高速に。
- 最初/最後採用、複合キー、グループ一覧、合算・最新採用まで“部品化”して再利用。
- まずは見える化で事故を減らし、運用仕様(欠損値・採用方針)を固定してから自動化へ。
