C# Tips | String Processing: Regex Extraction

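Introduction: "regex extraction" is the technique of scooping just the part you want out of a string
Pulling out "the first match" with Regex.Match
1. The simplest extraction
2. Use a group when you want "only the digits inside"
Making "what was extracted" clear with named groups
1. Groups[1] is hard to read later
2. Use named groups
Pulling out "everything" with Regex.Matches
1. Example: extracting every email address in a text
2. Combining with groups to extract "just a part" of every match
Building a null-safe "extraction utility"
A design point for real-world code: separate the regex from its meaning
1. Don't hardcode patterns inline
2. Wrap the pattern in a method with a meaningful name
Summary: a "regex extraction utility" is the entry point for pulling meaningful data out of text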

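Introduction: "regex extraction" is the technique of scooping just the part you want out of a string

A match test (IsMatch) only answers "does this string follow the rule?",
and in real-world work that is often not enough.

You want just the timestamp from one log line.
You want just the user ID from a message.
You want every email address that appears in a text.

This is where "extraction with regular expressions" comes in.
In C#, Regex.Match gives you "the first match" and Regex.Matches gives you "all of them".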
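
To make the difference concrete, here is a minimal sketch that runs the same pattern through all three methods. The sample text and the variable names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative input: two numbers embedded in ordinary text.
string text = "10 apples, 20 oranges";
string pattern = @"\d+";

bool found = Regex.IsMatch(text, pattern);       // only answers "is there a match?"
string first = Regex.Match(text, pattern).Value; // the first match
int total = Regex.Matches(text, pattern).Count;  // number of matches in the whole text

Console.WriteLine(found); // True
Console.WriteLine(first); // 10
Console.WriteLine(total); // 2
```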

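Here, for beginners, we will walk through the following step by step, with worked examples.

The basics of Regex.Match / Regex.Matches
Using groups (the part enclosed in ()) to extract "just a part"
A null-safe extraction utility
Clean patterns for "I want just one" and "I want all of them"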

Pulling out "the first match" with Regex.Match

The simplest extraction

Let's start with the pattern for taking "the first part of the string that matches".

using System;
using System.Text.RegularExpressions;

string text = "UserId=12345, Name=Taro";

Match m = Regex.Match(text, @"UserId=(\d+)");

if (m.Success)
{
    Console.WriteLine(m.Value); // UserId=12345
}

There are two points to note here.

Regex.Match returns only "the first match"
m.Success tells you "whether anything matched"

m.Value holds "the string matched by the whole pattern".
In this example that is UserId=12345.
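
As a small supplementary sketch, the following compares a successful match with a failed one. It also prints Match.Index and Match.Length, which the article itself does not use. The OrderId pattern is a made-up example that never matches this input.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "UserId=12345, Name=Taro";

Match hit = Regex.Match(text, @"UserId=\d+");
Match miss = Regex.Match(text, @"OrderId=\d+"); // hypothetical pattern with no match

Console.WriteLine(hit.Success);      // True
Console.WriteLine(hit.Value);        // UserId=12345
Console.WriteLine(hit.Index);        // 0  (position where the match starts)
Console.WriteLine(hit.Length);       // 12 (number of matched characters)

Console.WriteLine(miss.Success);     // False
Console.WriteLine(miss.Value == ""); // True: a failed match has an empty Value, not null
```

Because a failed match still returns a Match object with an empty Value, checking Success first is what separates "matched an empty string" from "did not match".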

Use a group when you want "only the digits inside"

More often than not you want "just 12345", not "the whole UserId=12345".
That is what a "capture group" is for.

string text = "UserId=12345, Name=Taro";

Match m = Regex.Match(text, @"UserId=(\d+)");

if (m.Success)
{
    string idText = m.Groups[1].Value;
    Console.WriteLine(idText); // 12345
}

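A part enclosed in (), such as (\d+), is a "group".

Groups[0]: the whole pattern (UserId=12345)
Groups[1]: the first group (\d+ → 12345)

That is how the indexes map.

This is the single most important point in regex extraction.
When you want "part of the pattern" rather than "the whole pattern", enclose that part in () to make it a group.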
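
The same index mapping holds with several groups. As a sketch, assuming a made-up date string, the groups are numbered from left to right starting at 1, and index 0 is always the whole match.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative input; the pattern has three capture groups.
string text = "Date: 2025-02-01";

Match m = Regex.Match(text, @"(\d{4})-(\d{2})-(\d{2})");

if (m.Success)
{
    Console.WriteLine(m.Groups[0].Value); // 2025-02-01 (whole match, same as m.Value)
    Console.WriteLine(m.Groups[1].Value); // 2025
    Console.WriteLine(m.Groups[2].Value); // 02
    Console.WriteLine(m.Groups[3].Value); // 01
    Console.WriteLine(m.Groups.Count);    // 4 (group 0 plus the three capture groups)
}
```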

Making "what was extracted" clear with named groups

Groups[1] is hard to read later

As you add more groups, the code ends up like this.

string text = "UserId=12345, Name=Taro";

Match m = Regex.Match(text, @"UserId=(\d+), Name=([A-Za-z]+)");

if (m.Success)
{
    string idText = m.Groups[1].Value;
    string name   = m.Groups[2].Value;
}

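Groups[1] is the ID and Groups[2] is the name.
That is obvious to the person who just wrote it, but not to you a few weeks later or to anyone else.

Use named groups

This is where "named groups" are recommended.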

string text = "UserId=12345, Name=Taro";

Match m = Regex.Match(
    text,
    @"UserId=(?<id>\d+), Name=(?<name>[A-Za-z]+)");

if (m.Success)
{
    string idText = m.Groups["id"].Value;
    string name   = m.Groups["name"].Value;

    Console.WriteLine(idText); // 12345
    Console.WriteLine(name);   // Taro
}

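Writing (?<id>...) gives that group the name id.

Groups["id"]
Groups["name"]

Because you can access the groups this way, the code shows at a glance "what is being extracted".

If you use regular expressions at work, named groups are a technique worth making a habit from the very start.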
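
One detail worth seeing early: a named group can exist in the pattern and still not take part in a match. The sketch below assumes a variant of the pattern where the Name part is optional (wrapped in a non-capturing (?:...)? group). That variant is an illustration, not something from the examples above.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed variant: ", Name=..." is optional, so the name group may be absent.
string pattern = @"UserId=(?<id>\d+)(?:, Name=(?<name>[A-Za-z]+))?";

Match full = Regex.Match("UserId=12345, Name=Taro", pattern);
Match idOnly = Regex.Match("UserId=777", pattern);

Console.WriteLine(full.Groups["name"].Value);        // Taro
Console.WriteLine(idOnly.Groups["id"].Value);        // 777
Console.WriteLine(idOnly.Groups["name"].Success);    // False: the group did not participate
Console.WriteLine(idOnly.Groups["name"].Value == ""); // True: Value is empty, not null
```

This is why checking Group.Success, and not only Match.Success, is useful when a group is optional.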

Pulling out "everything" with Regex.Matches

Example: extracting every email address in a text

Regex.Match gave us "the first match",
but when you want, say, "every email address in the text", use Regex.Matches.

using System;
using System.Text.RegularExpressions;

string text = @"
Contact support@example.com for inquiries.
Send technical questions to dev@sample.org please.
";

string pattern = @"[0-9A-Za-z._%+-]+@[0-9A-Za-z.-]+\.[A-Za-z]{2,}";

MatchCollection matches = Regex.Matches(text, pattern);

foreach (Match m in matches)
{
    Console.WriteLine(m.Value);
}

The output looks like this.

support@example.com
dev@sample.org

Regex.Matches returns "everything that matched" as a MatchCollection.
After that, you just loop over it with foreach.
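
Besides foreach, a MatchCollection can be counted or converted to an array. The following is a minimal sketch with made-up input. Cast<Match>() is used so that the LINQ call compiles on runtimes where MatchCollection only exposes the non-generic enumerator.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "a1 b22 c333";

MatchCollection matches = Regex.Matches(text, @"\d+");
Console.WriteLine(matches.Count); // 3

// Project each Match to its Value and collect the results into an array.
string[] values = matches.Cast<Match>().Select(m => m.Value).ToArray();
Console.WriteLine(string.Join(",", values)); // 1,22,333

// When nothing matches, the collection is empty rather than null.
MatchCollection none = Regex.Matches(text, @"\d{5}");
Console.WriteLine(none.Count); // 0
```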

Combining with groups to extract "just a part" of every match

Say you want to pull "just the timestamps" out of a log.

string text = @"
[2025-02-01 10:00:00] INFO Start
[2025-02-01 10:01:00] INFO End
";

string pattern = @"\[(?<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]";

MatchCollection matches = Regex.Matches(text, pattern);

foreach (Match m in matches)
{
    string ts = m.Groups["ts"].Value;
    Console.WriteLine(ts);
}

The output looks like this.

2025-02-01 10:00:00
2025-02-01 10:01:00

Here too, giving the named group the name ts (timestamp)
makes it easy to see "what is being extracted" when you read the code later.
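
The same approach extends to more than one named group per match. As a sketch, assuming each log line has the form "[timestamp] LEVEL message" (the level group and the ERROR line are illustrative additions), each match yields a pair of values.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Illustrative log text; the second line is made up to show a different level.
string log = "[2025-02-01 10:00:00] INFO Start\n[2025-02-01 10:01:00] ERROR Failed\n";

// ts: everything between the brackets, level: the uppercase word that follows.
string pattern = @"\[(?<ts>[^\]]+)\] (?<level>[A-Z]+)";

var entries = new List<(string Ts, string Level)>();

foreach (Match m in Regex.Matches(log, pattern))
{
    entries.Add((m.Groups["ts"].Value, m.Groups["level"].Value));
}

foreach (var entry in entries)
{
    Console.WriteLine($"{entry.Ts} => {entry.Level}");
}
// 2025-02-01 10:00:00 => INFO
// 2025-02-01 10:01:00 => ERROR
```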

Building a null-safe "extraction utility"

Writing a null check plus a Success check every time is tedious

With raw Regex.Match, you end up writing this every time.

if (value is null)
{
    return null;
}

Match m = Regex.Match(value, pattern);

if (!m.Success)
{
    return null;
}

return m.Groups[1].Value;

Writing this every time is painful,
so it pays to prepare utilities for "I want just one" and "I want all of them".

A utility for extracting just one match

using System;
using System.Text.RegularExpressions;

public static class RegexExtractUtil
{
    public static string? ExtractFirstGroup(
        string? value,
        string pattern,
        string groupName = "0",
        RegexOptions options = RegexOptions.None)
    {
        if (string.IsNullOrEmpty(value))
        {
            return null;
        }

        Match m = Regex.Match(value, pattern, options);

        if (!m.Success)
        {
            return null;
        }

        if (groupName == "0")
        {
            return m.Value;
        }

        return m.Groups[groupName].Success
            ? m.Groups[groupName].Value
            : null;
    }
}

Here is a usage example.

string text = "UserId=12345, Name=Taro";

string? id = RegexExtractUtil.ExtractFirstGroup(
    text,
    @"UserId=(?<id>\d+)",
    groupName: "id");

Console.WriteLine(id); // 12345

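The behavior here is as follows.

Return null if the input is null or an empty string
Return null if nothing matches
Return the value of the specified group if it took part in the match, otherwise null

The caller can treat null as "nothing could be extracted".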
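
To check these rules one by one, here is a sketch that exercises each case. So that it runs on its own, it uses a condensed local copy of ExtractFirstGroup without the options parameter. That copy is a simplification for this snippet, not a replacement for the utility.

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

// Condensed local copy of ExtractFirstGroup (options parameter omitted for brevity).
static string? ExtractFirstGroup(string? value, string pattern, string groupName = "0")
{
    if (string.IsNullOrEmpty(value)) return null;

    Match m = Regex.Match(value, pattern);
    if (!m.Success) return null;
    if (groupName == "0") return m.Value;

    Group g = m.Groups[groupName];
    return g.Success ? g.Value : null;
}

string pattern = @"UserId=(?<id>\d+)";

string? fromNull = ExtractFirstGroup(null, pattern, "id");             // null input
string? noMatch = ExtractFirstGroup("Name=Taro", pattern, "id");       // pattern does not match
string? noGroup = ExtractFirstGroup("UserId=12345", pattern, "name");  // group name not in pattern
string? id = ExtractFirstGroup("UserId=12345", pattern, "id");         // normal case

Console.WriteLine(fromNull ?? "(none)"); // (none)
Console.WriteLine(noMatch ?? "(none)");  // (none)
Console.WriteLine(noGroup ?? "(none)");  // (none)
Console.WriteLine(id ?? "(none)");       // 12345
```

The ?? operator on the calling side is one way to supply a default when nothing could be extracted.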

A utility for extracting all matches

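using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Add this method to the same RegexExtractUtil class as ExtractFirstGroup.
// A static class cannot be declared twice unless both declarations are marked partial.
public static class RegexExtractUtil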
{
    public static string[] ExtractAllGroups(
        string? value,
        string pattern,
        string groupName = "0",
        RegexOptions options = RegexOptions.None)
    {
        if (string.IsNullOrEmpty(value))
        {
            return Array.Empty<string>();
        }

        MatchCollection matches = Regex.Matches(value, pattern, options);

        var list = new List<string>();

        foreach (Match m in matches)
        {
            if (!m.Success)
            {
                continue;
            }

            if (groupName == "0")
            {
                list.Add(m.Value);
            }
            else
            {
                Group g = m.Groups[groupName];
                if (g.Success)
                {
                    list.Add(g.Value);
                }
            }
        }

        return list.ToArray();
    }
}

Here is a usage example.

string text = @"
[2025-02-01 10:00:00] INFO Start
[2025-02-01 10:01:00] INFO End
";

string[] timestamps = RegexExtractUtil.ExtractAllGroups(
    text,
    @"\[(?<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]",
    groupName: "ts");

foreach (var ts in timestamps)
{
    Console.WriteLine(ts);
}

Because null (or an empty string) yields an empty array,
the caller can loop over the result with foreach as is.
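
The benefit of returning an empty array shows up on the calling side. The sketch below uses a condensed local copy of ExtractAllGroups (options parameter and the "0" special case omitted, which is a simplification for this snippet). The inputs are made up.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Condensed local copy of ExtractAllGroups for a named group only.
static string[] ExtractAllGroups(string? value, string pattern, string groupName)
{
    if (string.IsNullOrEmpty(value)) return Array.Empty<string>();

    var list = new List<string>();
    foreach (Match m in Regex.Matches(value, pattern))
    {
        Group g = m.Groups[groupName];
        if (g.Success) list.Add(g.Value);
    }
    return list.ToArray();
}

string pattern = @"id=(?<id>\d+)";

string[] fromNull = ExtractAllGroups(null, pattern, "id");
string[] noMatch = ExtractAllGroups("no ids here", pattern, "id");
string[] ids = ExtractAllGroups("id=1 id=22", pattern, "id");

Console.WriteLine(fromNull.Length); // 0
Console.WriteLine(noMatch.Length);  // 0
Console.WriteLine(ids.Length);      // 2

// No null check is needed before looping, even for the null input.
foreach (string s in fromNull) Console.WriteLine(s); // prints nothing
foreach (string s in ids) Console.WriteLine(s);      // 1, then 22
```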

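A design point for real-world code: separate the regex from its meaning

Don't hardcode patterns inline

Extraction patterns, too, turn into chaos when they are hardcoded all over the codebase.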

var m1 = Regex.Match(text, @"UserId=(\d+)");
var m2 = Regex.Match(text, @"UserId=(\d+)");

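Copy-pasting the same pattern causes two problems:

Fixes get missed in some of the copies
It becomes hard to trace where each value is extracted

Wrap the pattern in a method with a meaningful name

For example, for a task like "extract the user ID from a log line",
it is a good idea to turn it into a method like this.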

using System.Text.RegularExpressions;

public static class LogParseUtil
{
    private static readonly Regex UserIdRegex =
        new Regex(@"UserId=(?<id>\d+)", RegexOptions.Compiled);

    public static int? TryExtractUserId(string? logLine)
    {
        if (string.IsNullOrEmpty(logLine))
        {
            return null;
        }

        Match m = UserIdRegex.Match(logLine);

        if (!m.Success)
        {
            return null;
        }

        string idText = m.Groups["id"].Value;

        if (int.TryParse(idText, out int id))
        {
            return id;
        }

        return null;
    }
}

The calling side can then be written like this.

int? userId = LogParseUtil.TryExtractUserId(line);

if (userId is not null)
{
    // use userId.Value here
}

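Keep "how the regex extracts it" inside the utility,
and let the caller state only "what it wants to do".
That is how code that stays maintainable over the long term gets written.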
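
The same idea applies to other values. As a sketch only, a hypothetical TryExtractTimestamp helper (the name and the timestamp format are assumptions based on the log examples in this article) could hide both the pattern and the date parsing behind one meaningful name.

```csharp
#nullable enable
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// One shared Regex instance for the "[yyyy-MM-dd HH:mm:ss]" prefix (assumed format).
Regex timestampRegex = new Regex(@"\[(?<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]");

// Hypothetical helper: returns null when the line has no timestamp
// or when the captured text is not a valid date.
DateTime? TryExtractTimestamp(string? logLine)
{
    if (string.IsNullOrEmpty(logLine)) return null;

    Match m = timestampRegex.Match(logLine);
    if (!m.Success) return null;

    if (DateTime.TryParseExact(
            m.Groups["ts"].Value,
            "yyyy-MM-dd HH:mm:ss",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out DateTime parsed))
    {
        return parsed;
    }

    return null;
}

DateTime? ok = TryExtractTimestamp("[2025-02-01 10:00:00] INFO Start");
DateTime? missing = TryExtractTimestamp("INFO Start");
DateTime? invalid = TryExtractTimestamp("[2025-02-31 10:00:00] INFO Start"); // matches the regex, but February 31 is not a date

Console.WriteLine(ok.HasValue);      // True
Console.WriteLine(missing.HasValue); // False
Console.WriteLine(invalid.HasValue); // False
```

The caller only ever sees a DateTime? and never touches the pattern or the format string.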

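Summary: a "regex extraction utility" is the entry point for pulling meaningful data out of text

Regex extraction is a powerful tool, not for "did it match or not",
but for "taking only the meaningful part out of a text".

The key points to remember are as follows.

Regex.Match returns "the first match"; Regex.Matches returns "all of them"
Group the part you want with (), and prefer named groups
Decide in a utility how null input and failed matches are handled (return null / return an empty array)
For frequently used extraction patterns, keep the Regex in a field so it is compiled once and reused
Keep "the regex itself" inside a utility so the caller sees only a meaningfully named method

With these in place, you move one step beyond "just call Match and look at Value",
and can start designing and implementing regex extraction utilities
as "the entry point for pulling structured information out of text".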
