我有一份人们的身份证和名字的名单,还有一份人们的身份证和姓氏的名单。有些人没有名字,有些人没有姓;我想在这两个列表上做一个完整的外部连接。

下面列出:

ID  FirstName
--  ---------
 1  John
 2  Sue

ID  LastName
--  --------
 1  Doe
 3  Smith

应该生产:

ID  FirstName  LastName
--  ---------  --------
 1  John       Doe
 2  Sue
 3             Smith

我已经发现了相当多的解决方案的“LINQ外部连接”,它们看起来都很相似,但似乎真的是离开外部连接。

到目前为止我的尝试是这样的:

private void OuterJoinTest()
{
    List<FirstName> firstNames = new List<FirstName>();
    firstNames.Add(new FirstName { ID = 1, Name = "John" });
    firstNames.Add(new FirstName { ID = 2, Name = "Sue" });

    List<LastName> lastNames = new List<LastName>();
    lastNames.Add(new LastName { ID = 1, Name = "Doe" });
    lastNames.Add(new LastName { ID = 3, Name = "Smith" });

    var outerJoin = from first in firstNames
        join last in lastNames
        on first.ID equals last.ID
        into temp
        from last in temp.DefaultIfEmpty()
        select new
        {
            id = first != null ? first.ID : last.ID,
            firstname = first != null ? first.Name : string.Empty,
            surname = last != null ? last.Name : string.Empty
        };
    }
}

public class FirstName
{
    public int ID;
    
    public string Name;
}
    
public class LastName
{
    public int ID;
    
    public string Name;
}

但结果是:

ID  FirstName  LastName
--  ---------  --------
 1  John       Doe
 2  Sue

我做错了什么?


当前回答

我认为LINQ join子句并不是这个问题的正确解决方案,因为join子句的目的并不是按照这个任务解决方案所需的方式来积累数据。合并创建的独立集合的代码变得太复杂了,也许这对于学习目的来说是可以的,但对于真正的应用程序来说不是。解决这个问题的方法之一是下面的代码:

class Program
{
    static void Main(string[] args)
    {
        List<FirstName> firstNames = new List<FirstName>();
        firstNames.Add(new FirstName { ID = 1, Name = "John" });
        firstNames.Add(new FirstName { ID = 2, Name = "Sue" });

        List<LastName> lastNames = new List<LastName>();
        lastNames.Add(new LastName { ID = 1, Name = "Doe" });
        lastNames.Add(new LastName { ID = 3, Name = "Smith" });

        HashSet<int> ids = new HashSet<int>();
        foreach (var name in firstNames)
        {
            ids.Add(name.ID);
        }
        foreach (var name in lastNames)
        {
            ids.Add(name.ID);
        }
        List<FullName> fullNames = new List<FullName>();
        foreach (int id in ids)
        {
            FullName fullName = new FullName();
            fullName.ID = id;
            FirstName firstName = firstNames.Find(f => f.ID == id);
            fullName.FirstName = firstName != null ? firstName.Name : string.Empty;
            LastName lastName = lastNames.Find(l => l.ID == id);
            fullName.LastName = lastName != null ? lastName.Name : string.Empty;
            fullNames.Add(fullName);
        }
    }
}
public class FirstName
{
    public int ID;

    public string Name;
}

public class LastName
{
    public int ID;

    public string Name;
}
class FullName
{
    public int ID;

    public string FirstName;

    public string LastName;
}

如果真正的集合对于HashSet的形成很大,而不是foreach循环可以使用下面的代码:

List<int> firstIds = firstNames.Select(f => f.ID).ToList();
List<int> LastIds = lastNames.Select(l => l.ID).ToList();
HashSet<int> ids = new HashSet<int>(firstIds.Union(LastIds));//Only unique IDs will be included in HashSet

其他回答

对于键在两个枚举对象中都是唯一的情况,我的干净解决方案:

 private static IEnumerable<TResult> FullOuterJoin<Ta, Tb, TKey, TResult>(
            IEnumerable<Ta> a, IEnumerable<Tb> b,
            Func<Ta, TKey> key_a, Func<Tb, TKey> key_b,
            Func<Ta, Tb, TResult> selector)
        {
            var alookup = a.ToLookup(key_a);
            var blookup = b.ToLookup(key_b);
            var keys = new HashSet<TKey>(alookup.Select(p => p.Key));
            keys.UnionWith(blookup.Select(p => p.Key));
            return keys.Select(key => selector(alookup[key].FirstOrDefault(), blookup[key].FirstOrDefault()));
        }

so

    var ax = new[] {
        new { id = 1, first_name = "ali" },
        new { id = 2, first_name = "mohammad" } };
    var bx = new[] {
        new { id = 1, last_name = "rezaei" },
        new { id = 3, last_name = "kazemi" } };

    var list = FullOuterJoin(ax, bx, a => a.id, b => b.id, (a, b) => "f: " + a?.first_name + " l: " + b?.last_name).ToArray();

输出:

f: ali l: rezaei
f: mohammad l:
f:  l: kazemi

正如您所发现的,Linq没有“外部连接”结构。您所能得到的最接近的是使用您所声明的查询的左外连接。为此,你可以添加任何没有在join中表示的姓氏列表元素:

outerJoin = outerJoin.Concat(lastNames.Select(l=>new
                            {
                                id = l.ID,
                                firstname = String.Empty,
                                surname = l.Name
                            }).Where(l=>!outerJoin.Any(o=>o.id == l.id)));

我不知道这是否适用于所有情况,但从逻辑上讲,这似乎是正确的。其思想是取一个左外连接和一个右外连接,然后取结果的并集。

var firstNames = new[]
{
    new { ID = 1, Name = "John" },
    new { ID = 2, Name = "Sue" },
};
var lastNames = new[]
{
    new { ID = 1, Name = "Doe" },
    new { ID = 3, Name = "Smith" },
};
var leftOuterJoin =
    from first in firstNames
    join last in lastNames on first.ID equals last.ID into temp
    from last in temp.DefaultIfEmpty()
    select new
    {
        first.ID,
        FirstName = first.Name,
        LastName = last?.Name,
    };
var rightOuterJoin =
    from last in lastNames
    join first in firstNames on last.ID equals first.ID into temp
    from first in temp.DefaultIfEmpty()
    select new
    {
        last.ID,
        FirstName = first?.Name,
        LastName = last.Name,
    };
var fullOuterJoin = leftOuterJoin.Union(rightOuterJoin);

这就像写的一样,因为它是在LINQ to Objects中。如果LINQ to SQL或其他,查询处理器可能不支持安全导航或其他操作。你必须使用条件操作符有条件地获取值。

也就是说,

var leftOuterJoin =
    from first in firstNames
    join last in lastNames on first.ID equals last.ID into temp
    from last in temp.DefaultIfEmpty()
    select new
    {
        first.ID,
        FirstName = first.Name,
        LastName = last != null ? last.Name : default,
    };

更新1:提供一个真正通用的扩展方法FullOuterJoin 更新2:可选地接受键类型的自定义IEqualityComparer 更新3:这个实现最近已经成为MoreLinq的一部分-谢谢大家!

编辑新增FullOuterGroupJoin (ideone)。我重用了GetOuter<>实现,使它的性能比它可能的要低一些,但我现在的目标是“高级”代码,而不是前沿优化。

请登录http://ideone.com/O36nWc观看直播

static void Main(string[] args)
{
    var ax = new[] { 
        new { id = 1, name = "John" },
        new { id = 2, name = "Sue" } };
    var bx = new[] { 
        new { id = 1, surname = "Doe" },
        new { id = 3, surname = "Smith" } };

    ax.FullOuterJoin(bx, a => a.id, b => b.id, (a, b, id) => new {a, b})
        .ToList().ForEach(Console.WriteLine);
}

打印输出:

{ a = { id = 1, name = John }, b = { id = 1, surname = Doe } }
{ a = { id = 2, name = Sue }, b =  }
{ a = , b = { id = 3, surname = Smith } }

您还可以提供默认值:http://ideone.com/kG4kqO

    ax.FullOuterJoin(
            bx, a => a.id, b => b.id, 
            (a, b, id) => new { a.name, b.surname },
            new { id = -1, name    = "(no firstname)" },
            new { id = -2, surname = "(no surname)" }
        )

印刷:

{ name = John, surname = Doe }
{ name = Sue, surname = (no surname) }
{ name = (no firstname), surname = Smith }

使用术语的解释:

连接是一个借用自关系数据库设计的术语:

只要b中有对应键的元素,连接就会重复A中的元素(即:如果b为空,则什么都没有)。数据库行话称之为内部连接(equi)。 外部连接包括没有对应的元素 元素存在于b中(即:如果b为空则结果为偶数)。这通常称为左连接。 一个完整的外部连接包括来自A和b的记录,如果另一个中不存在相应的元素。(即,如果a为空,则结果为偶数)

在RDBMS中不常见的是组连接[1]:

组连接与上面描述的相同,但它不是为多个对应的b重复A中的元素,而是用对应的键对记录进行分组。当您希望根据公共键枚举“已连接”记录时,这通常更方便。

另请参阅GroupJoin,其中还包含一些一般的背景说明。


[1](我相信Oracle和MSSQL对此有专有扩展)

完整代码

一个通用的“drop-in”扩展类

internal static class MyExtensions
{
    internal static IEnumerable<TResult> FullOuterGroupJoin<TA, TB, TKey, TResult>(
        this IEnumerable<TA> a,
        IEnumerable<TB> b,
        Func<TA, TKey> selectKeyA, 
        Func<TB, TKey> selectKeyB,
        Func<IEnumerable<TA>, IEnumerable<TB>, TKey, TResult> projection,
        IEqualityComparer<TKey> cmp = null)
    {
        cmp = cmp?? EqualityComparer<TKey>.Default;
        var alookup = a.ToLookup(selectKeyA, cmp);
        var blookup = b.ToLookup(selectKeyB, cmp);

        var keys = new HashSet<TKey>(alookup.Select(p => p.Key), cmp);
        keys.UnionWith(blookup.Select(p => p.Key));

        var join = from key in keys
                   let xa = alookup[key]
                   let xb = blookup[key]
                   select projection(xa, xb, key);

        return join;
    }

    internal static IEnumerable<TResult> FullOuterJoin<TA, TB, TKey, TResult>(
        this IEnumerable<TA> a,
        IEnumerable<TB> b,
        Func<TA, TKey> selectKeyA, 
        Func<TB, TKey> selectKeyB,
        Func<TA, TB, TKey, TResult> projection,
        TA defaultA = default(TA), 
        TB defaultB = default(TB),
        IEqualityComparer<TKey> cmp = null)
    {
        cmp = cmp?? EqualityComparer<TKey>.Default;
        var alookup = a.ToLookup(selectKeyA, cmp);
        var blookup = b.ToLookup(selectKeyB, cmp);

        var keys = new HashSet<TKey>(alookup.Select(p => p.Key), cmp);
        keys.UnionWith(blookup.Select(p => p.Key));

        var join = from key in keys
                   from xa in alookup[key].DefaultIfEmpty(defaultA)
                   from xb in blookup[key].DefaultIfEmpty(defaultB)
                   select projection(xa, xb, key);

        return join;
    }
}

我认为LINQ join子句并不是这个问题的正确解决方案,因为join子句的目的并不是按照这个任务解决方案所需的方式来积累数据。合并创建的独立集合的代码变得太复杂了,也许这对于学习目的来说是可以的,但对于真正的应用程序来说不是。解决这个问题的方法之一是下面的代码:

class Program
{
    static void Main(string[] args)
    {
        List<FirstName> firstNames = new List<FirstName>();
        firstNames.Add(new FirstName { ID = 1, Name = "John" });
        firstNames.Add(new FirstName { ID = 2, Name = "Sue" });

        List<LastName> lastNames = new List<LastName>();
        lastNames.Add(new LastName { ID = 1, Name = "Doe" });
        lastNames.Add(new LastName { ID = 3, Name = "Smith" });

        HashSet<int> ids = new HashSet<int>();
        foreach (var name in firstNames)
        {
            ids.Add(name.ID);
        }
        foreach (var name in lastNames)
        {
            ids.Add(name.ID);
        }
        List<FullName> fullNames = new List<FullName>();
        foreach (int id in ids)
        {
            FullName fullName = new FullName();
            fullName.ID = id;
            FirstName firstName = firstNames.Find(f => f.ID == id);
            fullName.FirstName = firstName != null ? firstName.Name : string.Empty;
            LastName lastName = lastNames.Find(l => l.ID == id);
            fullName.LastName = lastName != null ? lastName.Name : string.Empty;
            fullNames.Add(fullName);
        }
    }
}
public class FirstName
{
    public int ID;

    public string Name;
}

public class LastName
{
    public int ID;

    public string Name;
}
class FullName
{
    public int ID;

    public string FirstName;

    public string LastName;
}

如果真正的集合对于HashSet的形成很大,而不是foreach循环可以使用下面的代码:

List<int> firstIds = firstNames.Select(f => f.ID).ToList();
List<int> LastIds = lastNames.Select(l => l.ID).ToList();
HashSet<int> ids = new HashSet<int>(firstIds.Union(LastIds));//Only unique IDs will be included in HashSet