sql题目
现需要请向所有用户推荐其朋友收藏但是用户自己未收藏的商品,请从好友关系表(friendship_info)和收藏表(favor_info)中查询出应向哪位用户推荐哪些商品。
friendship_info
user1_id(用户1 id) | user2_id(用户2 id) |
---|---|
101 | 1010 |
101 | 108 |
101 | 106 |
?favor_info
????????????????
user_id(用户id) | sku_id(商品id) | create_date(收藏日期) |
---|---|---|
101 | 3 | 2021-09-23 |
101 | 12 | 2021-09-23 |
101 | 6 | 2021-09-25 |
期望效果
? ? ??
user_id <string> (用户id) | sku_id <string> (应向该用户推荐的商品id) |
---|---|
101 | 2 |
101 | 4 |
101 | 7 |
101 | 9 |
101 | 8 |
101 | 11 |
101 | 1 |
题目讨论
? ? 本题的思路分析是,首先找到朋友收藏的商品,然后将朋友的商品剔除掉包含自己收藏的商品
? ? ? ?
select
fi.user1_id as user_id,
sku_id
from
friendship_info fi
join favor_info fo on fi.user2_id = fo.user_id
group by
fi.user1_id,
sku_id
UNION ALL
select
user_id,
sku_id
from
favor_info
select
user_id,
sku_id
from t1
group by
user_id,
sku_id
having
count(*) = 1
select
user_id,
sku_id
from
(
select
user_id,
sku_id
from
(
select
fi.user1_id as user_id,
sku_id
from
friendship_info fi
join favor_info fo on fi.user2_id = fo.user_id
group by
fi.user1_id,
sku_id
) t1
UNION ALL
select
user_id,
sku_id
from
favor_info
) t
group by
user_id,
sku_id
having
count(*) = 1
friendship_info t1
JOIN favor_info t2
on t1.user2_id=t2.user_id
concat(t1.user1_id,t2.sku_id)
where concat(t1.user1_id,t2.sku_id) not in (
SELECT concat(user_id,sku_id)
from favor_info
)
SELECT t1.user1_id user_id,t2.sku_id
from friendship_info t1
JOIN favor_info t2
on t1.user2_id=t2.user_id
where concat(t1.user1_id,t2.sku_id) not in (
SELECT concat(user_id,sku_id)
from favor_info
)
group by t1.user1_id,t2.sku_id
? ?使用join进行关联我认为是最符合码农习惯的。关联获取数据然后根据左关联右关联,将控制空值过滤,该方法主要是熟悉join后数据集的生成结果
from friendship_info fi
join favor_info fi1
on fi.user2_id = fi1.user_id
left join favor_info fi2
on fi.user1_id = fi2.user_id and fi1.sku_id = fi2.sku_id
select distinct user1_id user_id,
fi1.sku_id sku_id
from t1
where fi2.sku_id is null;
select distinct user1_id user_id,
fi1.sku_id sku_id
from friendship_info fi
join favor_info fi1
on fi.user2_id = fi1.user_id
left join favor_info fi2
on fi.user1_id = fi2.user_id and fi1.sku_id = fi2.sku_id
where fi2.sku_id is null;
? ? except关键字有去重复数据效果,最新的hive语法支持except关键字。
可以使用 where ... is not exists替代except
select
friendship_info.user1_id as user_id,
sku_id
from
friendship_info
join favor_info on friendship_info.user2_id = favor_info.user_id
except
select
user_id,
sku_id
from
favor_info
? 本题最主要的还是要懂join后的结果集,才能做到对重复数据,无效数据去重。