我有一份人们的身份证和名字的名单,还有一份人们的身份证和姓氏的名单。有些人没有名字,有些人没有姓;我想在这两个列表上做一个完整的外部连接。
下面列出:
ID FirstName
-- ---------
1 John
2 Sue
ID LastName
-- --------
1 Doe
3 Smith
应该生产:
ID FirstName LastName
-- --------- --------
1 John Doe
2 Sue
3 Smith
我已经发现了相当多的解决方案的“LINQ外部连接”,它们看起来都很相似,但似乎真的是离开外部连接。
到目前为止我的尝试是这样的:
private void OuterJoinTest()
{
List<FirstName> firstNames = new List<FirstName>();
firstNames.Add(new FirstName { ID = 1, Name = "John" });
firstNames.Add(new FirstName { ID = 2, Name = "Sue" });
List<LastName> lastNames = new List<LastName>();
lastNames.Add(new LastName { ID = 1, Name = "Doe" });
lastNames.Add(new LastName { ID = 3, Name = "Smith" });
var outerJoin = from first in firstNames
join last in lastNames
on first.ID equals last.ID
into temp
from last in temp.DefaultIfEmpty()
select new
{
id = first != null ? first.ID : last.ID,
firstname = first != null ? first.Name : string.Empty,
surname = last != null ? last.Name : string.Empty
};
}
}
public class FirstName
{
public int ID;
public string Name;
}
public class LastName
{
public int ID;
public string Name;
}
但结果是:
ID FirstName LastName
-- --------- --------
1 John Doe
2 Sue
我做错了什么?
对于键在两个枚举对象中都是唯一的情况,我的干净解决方案:
private static IEnumerable<TResult> FullOuterJoin<Ta, Tb, TKey, TResult>(
IEnumerable<Ta> a, IEnumerable<Tb> b,
Func<Ta, TKey> key_a, Func<Tb, TKey> key_b,
Func<Ta, Tb, TResult> selector)
{
var alookup = a.ToLookup(key_a);
var blookup = b.ToLookup(key_b);
var keys = new HashSet<TKey>(alookup.Select(p => p.Key));
keys.UnionWith(blookup.Select(p => p.Key));
return keys.Select(key => selector(alookup[key].FirstOrDefault(), blookup[key].FirstOrDefault()));
}
so
var ax = new[] {
new { id = 1, first_name = "ali" },
new { id = 2, first_name = "mohammad" } };
var bx = new[] {
new { id = 1, last_name = "rezaei" },
new { id = 3, last_name = "kazemi" } };
var list = FullOuterJoin(ax, bx, a => a.id, b => b.id, (a, b) => "f: " + a?.first_name + " l: " + b?.last_name).ToArray();
输出:
f: ali l: rezaei
f: mohammad l:
f: l: kazemi
我认为LINQ join子句并不是这个问题的正确解决方案,因为join子句的目的并不是按照这个任务解决方案所需的方式来积累数据。合并创建的独立集合的代码变得太复杂了,也许这对于学习目的来说是可以的,但对于真正的应用程序来说不是。解决这个问题的方法之一是下面的代码:
class Program
{
static void Main(string[] args)
{
List<FirstName> firstNames = new List<FirstName>();
firstNames.Add(new FirstName { ID = 1, Name = "John" });
firstNames.Add(new FirstName { ID = 2, Name = "Sue" });
List<LastName> lastNames = new List<LastName>();
lastNames.Add(new LastName { ID = 1, Name = "Doe" });
lastNames.Add(new LastName { ID = 3, Name = "Smith" });
HashSet<int> ids = new HashSet<int>();
foreach (var name in firstNames)
{
ids.Add(name.ID);
}
foreach (var name in lastNames)
{
ids.Add(name.ID);
}
List<FullName> fullNames = new List<FullName>();
foreach (int id in ids)
{
FullName fullName = new FullName();
fullName.ID = id;
FirstName firstName = firstNames.Find(f => f.ID == id);
fullName.FirstName = firstName != null ? firstName.Name : string.Empty;
LastName lastName = lastNames.Find(l => l.ID == id);
fullName.LastName = lastName != null ? lastName.Name : string.Empty;
fullNames.Add(fullName);
}
}
}
public class FirstName
{
public int ID;
public string Name;
}
public class LastName
{
public int ID;
public string Name;
}
class FullName
{
public int ID;
public string FirstName;
public string LastName;
}
如果真正的集合对于HashSet的形成很大,而不是foreach循环可以使用下面的代码:
List<int> firstIds = firstNames.Select(f => f.ID).ToList();
List<int> LastIds = lastNames.Select(l => l.ID).ToList();
HashSet<int> ids = new HashSet<int>(firstIds.Union(LastIds));//Only unique IDs will be included in HashSet
我不知道这是否适用于所有情况,但从逻辑上讲,这似乎是正确的。其思想是取一个左外连接和一个右外连接,然后取结果的并集。
var firstNames = new[]
{
new { ID = 1, Name = "John" },
new { ID = 2, Name = "Sue" },
};
var lastNames = new[]
{
new { ID = 1, Name = "Doe" },
new { ID = 3, Name = "Smith" },
};
var leftOuterJoin =
from first in firstNames
join last in lastNames on first.ID equals last.ID into temp
from last in temp.DefaultIfEmpty()
select new
{
first.ID,
FirstName = first.Name,
LastName = last?.Name,
};
var rightOuterJoin =
from last in lastNames
join first in firstNames on last.ID equals first.ID into temp
from first in temp.DefaultIfEmpty()
select new
{
last.ID,
FirstName = first?.Name,
LastName = last.Name,
};
var fullOuterJoin = leftOuterJoin.Union(rightOuterJoin);
这就像写的一样,因为它是在LINQ to Objects中。如果LINQ to SQL或其他,查询处理器可能不支持安全导航或其他操作。你必须使用条件操作符有条件地获取值。
也就是说,
var leftOuterJoin =
from first in firstNames
join last in lastNames on first.ID equals last.ID into temp
from last in temp.DefaultIfEmpty()
select new
{
first.ID,
FirstName = first.Name,
LastName = last != null ? last.Name : default,
};
下面是一个扩展方法:
public static IEnumerable<KeyValuePair<TLeft, TRight>> FullOuterJoin<TLeft, TRight>(this IEnumerable<TLeft> leftItems, Func<TLeft, object> leftIdSelector, IEnumerable<TRight> rightItems, Func<TRight, object> rightIdSelector)
{
var leftOuterJoin = from left in leftItems
join right in rightItems on leftIdSelector(left) equals rightIdSelector(right) into temp
from right in temp.DefaultIfEmpty()
select new { left, right };
var rightOuterJoin = from right in rightItems
join left in leftItems on rightIdSelector(right) equals leftIdSelector(left) into temp
from left in temp.DefaultIfEmpty()
select new { left, right };
var fullOuterJoin = leftOuterJoin.Union(rightOuterJoin);
return fullOuterJoin.Select(x => new KeyValuePair<TLeft, TRight>(x.left, x.right));
}
我猜@sehe的方法更强大,但在我更好地理解它之前,我发现自己跳过了@MichaelSander的扩展。我修改了它,以匹配这里描述的内置Enumerable.Join()方法的语法和返回类型。我在@JeffMercado的解决方案下为@cadrell0的注释添加了“distinct”后缀。
public static class MyExtensions {
public static IEnumerable<TResult> FullJoinDistinct<TLeft, TRight, TKey, TResult> (
this IEnumerable<TLeft> leftItems,
IEnumerable<TRight> rightItems,
Func<TLeft, TKey> leftKeySelector,
Func<TRight, TKey> rightKeySelector,
Func<TLeft, TRight, TResult> resultSelector
) {
var leftJoin =
from left in leftItems
join right in rightItems
on leftKeySelector(left) equals rightKeySelector(right) into temp
from right in temp.DefaultIfEmpty()
select resultSelector(left, right);
var rightJoin =
from right in rightItems
join left in leftItems
on rightKeySelector(right) equals leftKeySelector(left) into temp
from left in temp.DefaultIfEmpty()
select resultSelector(left, right);
return leftJoin.Union(rightJoin);
}
}
在这个例子中,你可以这样使用它:
var test =
firstNames
.FullJoinDistinct(
lastNames,
f=> f.ID,
j=> j.ID,
(f,j)=> new {
ID = f == null ? j.ID : f.ID,
leftName = f == null ? null : f.Name,
rightName = j == null ? null : j.Name
}
);
在未来,随着我了解更多,我有一种感觉,我会迁移到@sehe的逻辑,因为它很受欢迎。但即使这样,我也必须小心,因为我觉得如果可行的话,至少有一个重载与现有的“.Join()”方法的语法匹配是很重要的,原因有两个:
方法的一致性有助于节省时间、避免错误和避免意外行为。
如果将来有一个开箱即用的“. fulljoin()”方法,我想它会尽量保持当前存在的“. join()”方法的语法。如果是这样,那么如果您想迁移到它,您可以简单地重命名函数,而不需要更改参数或担心不同的返回类型破坏您的代码。
对于泛型、扩展、Func语句和其他特性,我还是新手,所以当然欢迎反馈。
编辑:我没花很长时间就意识到我的代码有问题。我在LINQPad中做了一个. dump(),并查看返回类型。它只是IEnumerable,所以我试着匹配它。但是当我实际上在我的扩展上做了. where()或. select()时,我得到了一个错误:“系统集合。IEnumerable'不包含'Select'和…"的定义。因此,最后我能够匹配. join()的输入语法,但不能匹配返回行为。
编辑:将“TResult”添加到函数的返回类型中。在阅读微软的文章时错过了这一点,当然这是有道理的。有了这个修复,现在看来返回行为是符合我的目标毕竟。