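Logistics Real-Time Data Warehouse: Building the Collection Channel
Logistics Real-Time Data Warehouse: Building the Warehouse
Logistics Real-Time Data Warehouse: Building the Warehouse (DIM)
Logistics Real-Time Data Warehouse: Building the Warehouse (DWD), Part 1
Logistics Real-Time Data Warehouse: Building the Warehouse (DWD), Part 2
Logistics Real-Time Data Warehouse: Building the Warehouse (DWS), Part 1
Logistics Real-Time Data Warehouse: Building the Warehouse (DWS), Part 2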
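In the previous post we wrote most of the reusable utility classes, so the remaining jobs are simpler to build.
While writing and testing the later code, I found a bug in code from an earlier post.
In the DWD-layer file DwdTransTransFinish, the computed TransportTime came out negative. The two operands of the subtraction were in the wrong order, so they need to be swapped.
The statement is at roughly line 67 of that file.
Original code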
finishBean.setTransportTime(Long.parseLong(finishBean.getActualStartTime()) - Long.parseLong(finishBean.getActualEndTime()));
After the fix
finishBean.setTransportTime(Long.parseLong(finishBean.getActualEndTime()) - Long.parseLong(finishBean.getActualStartTime()));
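To make the direction of the subtraction explicit, here is a minimal sketch. It assumes, as the Long.parseLong calls suggest, that both times are epoch-millisecond strings. The class and method names are hypothetical and not part of the project; the guard against a negative result is an extra safety check, not something the original job does.

```java
// Hypothetical helper, not part of the project: computes the transport time
// from epoch-millisecond strings and rejects a reversed interval early.
final class TransportTimeSketch {

    // Returns actualEndTime - actualStartTime in milliseconds.
    static long transportTime(String actualStartTime, String actualEndTime) {
        long start = Long.parseLong(actualStartTime);
        long end = Long.parseLong(actualEndTime);
        long duration = end - start;
        if (duration < 0) {
            throw new IllegalArgumentException(
                    "actualEndTime is earlier than actualStartTime: " + duration);
        }
        return duration;
    }

    public static void main(String[] args) {
        // Start at t = 1000 ms, end at t = 4000 ms.
        System.out.println(transportTime("1000", "4000"));
    }
}
```

Failing fast like this would have surfaced the swapped operands before any record reached Kafka.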
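Because of this bug, incorrect records have already been written to Kafka, so we need to delete the old topic and let the corrected job regenerate it. The first command deletes the topic; the second consumes it so the regenerated data can be checked.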
kafka-topics.sh --bootstrap-server hadoop102:9092 --delete --topic tms_dwd_trans_trans_finish
kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic tms_dwd_trans_trans_finish
Requirement: count the number of orders and the order amount per cargo type for the current day.
DwsTradeCargoTypeOrderDayBean.java
package com.atguigu.tms.realtime.beans;
import lombok.Builder;
import lombok.Data;
import java.math.BigDecimal;
/**
* Entity class for trade-domain order aggregation by cargo type
*/
@Data
@Builder
public class DwsTradeCargoTypeOrderDayBean {
// Statistic date
String curDate;
// Cargo type ID
String cargoType;
// Cargo type name
String cargoTypeName;
// Order amount
BigDecimal orderAmountBase;
// Order count
Long orderCountBase;
// Timestamp
Long ts;
}
DwsTradeCargoTypeOrderDay.java
package com.atguigu.tms.realtime.app.dws;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.atguigu.tms.realtime.app.func.DimAsyncFunction;
import com.atguigu.tms.realtime.app.func.MyAggregationFunction;
import com.atguigu.tms.realtime.app.func.MyTriggerFunction;
import com.atguigu.tms.realtime.beans.DwdTradeOrderDetailBean;
import com.atguigu.tms.realtime.beans.DwsTradeCargoTypeOrderDayBean;
import com.atguigu.tms.realtime.utils.ClickHouseUtil;
import com.atguigu.tms.realtime.utils.CreateEnvUtil;
import com.atguigu.tms.realtime.utils.DateFormatUtil;
import com.atguigu.tms.realtime.utils.KafkaUtil;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import java.util.concurrent.TimeUnit;
// Trade domain: aggregate order count and order amount by cargo type
public class DwsTradeCargoTypeOrderDay {
public static void main(String[] args) throws Exception {
// Prepare the environment
StreamExecutionEnvironment env = CreateEnvUtil.getStreamEnv(args);
env.setParallelism(4);
// Read data from Kafka
String topic = "tms_dwd_trade_order_detail";
String groupId = "dws_trade_cargo_type_order_group";
KafkaSource<String> kafkaSource = KafkaUtil.getKafkaSource(topic, groupId, args);
SingleOutputStreamOperator<String> kafkaStrDS = env.fromSource(kafkaSource, WatermarkStrategy.noWatermarks(), "kafka_source")
.uid("kafka_source");
// Convert each record in the stream: JSON string -> entity object
SingleOutputStreamOperator<DwsTradeCargoTypeOrderDayBean> mapDS = kafkaStrDS.map(
new MapFunction<String, DwsTradeCargoTypeOrderDayBean>() {
@Override
public DwsTradeCargoTypeOrderDayBean map(String JsonStr) throws Exception {
DwdTradeOrderDetailBean dwdTradeOrderDetailBean = JSON.parseObject(JsonStr, DwdTradeOrderDetailBean.class);
DwsTradeCargoTypeOrderDayBean bean = DwsTradeCargoTypeOrderDayBean.builder()
.cargoType(dwdTradeOrderDetailBean.getCargoType())
.orderAmountBase(dwdTradeOrderDetailBean.getAmount())
.orderCountBase(1L)
.ts(dwdTradeOrderDetailBean.getTs() + 8 * 60 * 60 * 1000)
.build();
return bean;
}
}
);
// Assign watermarks and extract the event-time field
SingleOutputStreamOperator<DwsTradeCargoTypeOrderDayBean> withWatermarkDS = mapDS.assignTimestampsAndWatermarks(
WatermarkStrategy
.<DwsTradeCargoTypeOrderDayBean>forMonotonousTimestamps()
.withTimestampAssigner(
new SerializableTimestampAssigner<DwsTradeCargoTypeOrderDayBean>() {
@Override
public long extractTimestamp(DwsTradeCargoTypeOrderDayBean element, long l) {
return element.getTs();
}
}
)
);
// Key the stream by cargo type
KeyedStream<DwsTradeCargoTypeOrderDayBean, String> keyedDS = withWatermarkDS.keyBy(DwsTradeCargoTypeOrderDayBean::getCargoType);
// Open a one-day tumbling event-time window
WindowedStream<DwsTradeCargoTypeOrderDayBean, String, TimeWindow> windowDS = keyedDS.window(TumblingEventTimeWindows.of(Time.days(1)));
// Attach the custom trigger
WindowedStream<DwsTradeCargoTypeOrderDayBean, String, TimeWindow> triggerDS = windowDS.trigger(new MyTriggerFunction<DwsTradeCargoTypeOrderDayBean>());
// Aggregate
SingleOutputStreamOperator<DwsTradeCargoTypeOrderDayBean> aggregateDS = triggerDS.aggregate(
new MyAggregationFunction<DwsTradeCargoTypeOrderDayBean>() {
@Override
public DwsTradeCargoTypeOrderDayBean add(DwsTradeCargoTypeOrderDayBean value, DwsTradeCargoTypeOrderDayBean accumulator) {
if (accumulator == null) {
return value;
}
accumulator.setOrderAmountBase(value.getOrderAmountBase().add(accumulator.getOrderAmountBase()));
accumulator.setOrderCountBase(value.getOrderCountBase() + accumulator.getOrderCountBase());
return accumulator;
}
},
new ProcessWindowFunction<DwsTradeCargoTypeOrderDayBean, DwsTradeCargoTypeOrderDayBean, String, TimeWindow>() {
@Override
public void process(String s, ProcessWindowFunction<DwsTradeCargoTypeOrderDayBean, DwsTradeCargoTypeOrderDayBean, String, TimeWindow>.Context context, Iterable<DwsTradeCargoTypeOrderDayBean> elements, Collector<DwsTradeCargoTypeOrderDayBean> out) throws Exception {
long sst = context.window().getStart() - 8 * 60 * 60 * 1000L;
for (DwsTradeCargoTypeOrderDayBean bean : elements) {
String curDate = DateFormatUtil.toDate(sst);
bean.setCurDate(curDate);
bean.setTs(System.currentTimeMillis());
out.collect(bean);
}
}
}
);
// Join the cargo-type dimension
SingleOutputStreamOperator<DwsTradeCargoTypeOrderDayBean> withCargoTypeDS = AsyncDataStream.unorderedWait(
aggregateDS,
new DimAsyncFunction<DwsTradeCargoTypeOrderDayBean>("dim_base_dic") {
@Override
public void join(DwsTradeCargoTypeOrderDayBean bean, JSONObject dimInfoJsonObj) {
bean.setCargoTypeName(dimInfoJsonObj.getString("name"));
}
@Override
public Tuple2<String, String> getCondition(DwsTradeCargoTypeOrderDayBean bean) {
return Tuple2.of("id",bean.getCargoType());
}
},
60,
TimeUnit.SECONDS
);
// Write the result to ClickHouse
withCargoTypeDS.print(">>>>");
withCargoTypeDS.addSink(
ClickHouseUtil.getJdbcSink("insert into dws_trade_cargo_type_order_day_base values(?,?,?,?,?,?)")
);
env.execute();
}
}
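The job above adds 8 hours to ts before windowing and subtracts 8 hours from the window start afterwards. The sketch below reproduces that arithmetic without Flink to show why: a one-day tumbling window is aligned to the epoch (UTC midnight), so without the shift an order placed at 00:30 Beijing time would land in the previous day's window. It assumes DateFormatUtil.toDate formats in UTC+8; all names in the sketch are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Sketch of the +8h / -8h window alignment; no Flink dependency.
final class DayWindowOffsetSketch {
    static final long DAY_MS = 24 * 60 * 60 * 1000L;
    static final long EIGHT_HOURS_MS = 8 * 60 * 60 * 1000L;
    static final ZoneId SHANGHAI = ZoneId.of("Asia/Shanghai");

    // Start of the epoch-aligned one-day tumbling window containing ts (ts >= 0).
    static long windowStart(long ts) {
        return ts - (ts % DAY_MS);
    }

    // Statistic date as the job derives it: shift, window, shift back, format in UTC+8.
    static LocalDate statDateWithShift(long eventTs) {
        long start = windowStart(eventTs + EIGHT_HOURS_MS) - EIGHT_HOURS_MS;
        return Instant.ofEpochMilli(start).atZone(SHANGHAI).toLocalDate();
    }

    // Same derivation without the shift: the window follows UTC days instead.
    static LocalDate statDateWithoutShift(long eventTs) {
        long start = windowStart(eventTs);
        return Instant.ofEpochMilli(start).atZone(SHANGHAI).toLocalDate();
    }

    public static void main(String[] args) {
        // An order placed at 00:30 on 2023-01-01, Beijing time.
        long ts = ZonedDateTime.of(2023, 1, 1, 0, 30, 0, 0, SHANGHAI)
                .toInstant().toEpochMilli();
        System.out.println(statDateWithShift(ts));
        System.out.println(statDateWithoutShift(ts));
    }
}
```

Flink also lets the assigner carry the offset directly, for example TumblingEventTimeWindows.of(Time.days(1), Time.hours(-8)), which would avoid modifying ts; the posts keep the manual shift, so the sketch follows that.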
Create the table and the materialized view in the tms_realtime database we created earlier.
CREATE TABLE IF NOT EXISTS dws_trade_cargo_type_order_day_base
(
`cur_date` Date COMMENT 'Statistic date',
`cargo_type` String COMMENT 'Cargo type ID',
`cargo_type_name` String COMMENT 'Cargo type name',
`order_amount_base` Decimal(38, 20) COMMENT 'Order amount',
`order_count_base` UInt64 COMMENT 'Order count',
`ts` UInt64 COMMENT 'Timestamp'
)
ENGINE = MergeTree
ORDER BY (cur_date, cargo_type, cargo_type_name);
CREATE MATERIALIZED VIEW IF NOT EXISTS dws_trade_cargo_type_order_day
ENGINE = AggregatingMergeTree()
ORDER BY (cur_date, cargo_type, cargo_type_name) AS
SELECT
cur_date,
cargo_type,
cargo_type_name,
argMaxState(order_amount_base, ts) AS order_amount,
argMaxState(order_count_base, ts) AS order_count
FROM dws_trade_cargo_type_order_day_base
GROUP BY
cur_date,
cargo_type,
cargo_type_name;
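The materialized view stores argMaxState(value, ts) per key. If, as the custom trigger suggests, a window emits its running total more than once, the base table holds several cumulative rows for the same key, and argMax keeps the value from the row with the largest ts. The plain-Java imitation below illustrates that selection; it is not ClickHouse code, and the class, the key format, and the sample rows are made up for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java imitation of argMax(order_count_base, ts) grouped by key.
final class ArgMaxSketch {

    // One base-table row, reduced to the columns that matter here.
    static final class Row {
        final String key;
        final long orderCount;
        final long ts;

        Row(String key, long orderCount, long ts) {
            this.key = key;
            this.orderCount = orderCount;
            this.ts = ts;
        }
    }

    // For every key, keep the order count of the row with the largest ts.
    static Map<String, Long> latestCountPerKey(Iterable<Row> rows) {
        Map<String, Row> latest = new HashMap<>();
        for (Row row : rows) {
            Row seen = latest.get(row.key);
            if (seen == null || row.ts > seen.ts) {
                latest.put(row.key, row);
            }
        }
        Map<String, Long> result = new HashMap<>();
        latest.forEach((key, row) -> result.put(key, row.orderCount));
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("2023-01-01|A", 1L, 100L),
                new Row("2023-01-01|A", 3L, 300L),
                new Row("2023-01-01|A", 2L, 200L));
        System.out.println(latestCountPerKey(rows));
    }
}
```

On the ClickHouse side the stored state is not a plain number: reading the view requires finalizing it with the matching -Merge combinator (argMaxMerge) in a GROUP BY query.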
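Requirement: count the number of orders and the order amount per organization for the current day, and enrich the result with city dimension information.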
DwsTradeOrgOrderDayBean.java
package com.atguigu.tms.realtime.beans;
import lombok.Builder;
import lombok.Data;
import java.math.BigDecimal;
/**
* Entity class for trade-domain order aggregation by organization
*/
@Data
@Builder
public class DwsTradeOrgOrderDayBean {
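// Statistic date
String curDate;
// Organization ID
String orgId;
// Organization name
String orgName;
// City ID
String cityId;
// City name
String cityName;
// Sender district ID (used only for the dimension lookup, not written to ClickHouse)
@TransientSink
String senderDistrictId;
// Order amount
BigDecimal orderAmountBase;
// Order count
Long orderCountBase;
// Timestamp
Long ts;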
}
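The bean has nine fields, yet the insert statement for its table uses only eight placeholders, because senderDistrictId carries @TransientSink. The sketch below shows one way such a marker annotation can be honored with reflection. It is an assumption about how the sink utility from the previous post treats the annotation, not a copy of it; SkipInSink, OrgOrderRow, and sinkValues are hypothetical names, and the sample values are made up.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Sketch: skip annotated fields when collecting values for a JDBC insert.
final class TransientSinkSketch {

    // Hypothetical marker annotation playing the role of @TransientSink.
    @Target(ElementType.FIELD)
    @Retention(RetentionPolicy.RUNTIME)
    @interface SkipInSink {
    }

    // Reduced bean with made-up values; one field is lookup-only.
    static final class OrgOrderRow {
        String curDate = "2023-01-01";
        String orgId = "1";
        @SkipInSink
        String senderDistrictId = "110101";
        Long orderCountBase = 2L;
    }

    // Collects the values of all fields that are not marked as sink-transient.
    static List<Object> sinkValues(Object bean) throws IllegalAccessException {
        List<Object> values = new ArrayList<>();
        for (Field field : bean.getClass().getDeclaredFields()) {
            if (field.isSynthetic() || field.isAnnotationPresent(SkipInSink.class)) {
                continue;
            }
            field.setAccessible(true);
            values.add(field.get(bean));
        }
        return values;
    }

    public static void main(String[] args) throws IllegalAccessException {
        // Each collected value would be bound to one "?" of the insert statement.
        System.out.println(sinkValues(new OrgOrderRow()).size());
    }
}
```

With this approach the number of placeholders has to equal the number of unmarked fields, and the field order has to match the column order of the table.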
DwsTradeOrgOrderDay.java
package com.atguigu.tms.realtime.app.dws;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.atguigu.tms.realtime.app.func.DimAsyncFunction;
import com.atguigu.tms.realtime.app.func.MyAggregationFunction;
import com.atguigu.tms.realtime.app.func.MyTriggerFunction;
import com.atguigu.tms.realtime.beans.DwdTradeOrderDetailBean;
import com.atguigu.tms.realtime.beans.DwsTradeOrgOrderDayBean;
import com.atguigu.tms.realtime.utils.ClickHouseUtil;
import com.atguigu.tms.realtime.utils.CreateEnvUtil;
import com.atguigu.tms.realtime.utils.DateFormatUtil;
import com.atguigu.tms.realtime.utils.KafkaUtil;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import java.util.concurrent.TimeUnit;
// Trade domain: order aggregation at organization granularity
public class DwsTradeOrgOrderDay {
public static void main(String[] args) throws Exception {
// Prepare the environment
StreamExecutionEnvironment env = CreateEnvUtil.getStreamEnv(args);
env.setParallelism(4);
// Read data from the Kafka order-detail topic
String topic = "tms_dwd_trade_order_detail";
String groupId = "dws_trade_org_order_group";
KafkaSource<String> kafkaSource = KafkaUtil.getKafkaSource(topic, groupId, args);
SingleOutputStreamOperator<String> kafkaStrDS = env.fromSource(kafkaSource, WatermarkStrategy.noWatermarks(), "kafka_source")
.uid("kafka_source");
// Convert the records that were read
SingleOutputStreamOperator<DwsTradeOrgOrderDayBean> mapDS = kafkaStrDS.map(
new MapFunction<String, DwsTradeOrgOrderDayBean>() {
@Override
public DwsTradeOrgOrderDayBean map(String jsonStr) throws Exception {
DwdTradeOrderDetailBean dwdTradeOrderDetailBean = JSON.parseObject(jsonStr, DwdTradeOrderDetailBean.class);
DwsTradeOrgOrderDayBean bean = DwsTradeOrgOrderDayBean.builder()
.senderDistrictId(dwdTradeOrderDetailBean.getSenderDistrictId())
.cityId(dwdTradeOrderDetailBean.getSenderCityId())
.orderAmountBase(dwdTradeOrderDetailBean.getAmount())
.orderCountBase(1L)
.ts(dwdTradeOrderDetailBean.getTs() + 8 * 60 * 60 * 1000L)
.build();
return bean;
}
}
);
// Join the organization dimension
SingleOutputStreamOperator<DwsTradeOrgOrderDayBean> withOrgDS = AsyncDataStream.unorderedWait(
mapDS,
new DimAsyncFunction<DwsTradeOrgOrderDayBean>("dim_base_organ") {
@Override
public void join(DwsTradeOrgOrderDayBean bean, JSONObject dimInfoJsonObj) {
bean.setOrgId(dimInfoJsonObj.getString("id"));
bean.setOrgName(dimInfoJsonObj.getString("org_name"));
}
@Override
public Tuple2<String, String> getCondition(DwsTradeOrgOrderDayBean bean) {
return Tuple2.of("region_id", bean.getSenderDistrictId());
}
},
60, TimeUnit.SECONDS
);
// Assign watermarks
SingleOutputStreamOperator<DwsTradeOrgOrderDayBean> withWatermarkDS = withOrgDS.assignTimestampsAndWatermarks(
WatermarkStrategy
.<DwsTradeOrgOrderDayBean>forMonotonousTimestamps()
.withTimestampAssigner(
new SerializableTimestampAssigner<DwsTradeOrgOrderDayBean>() {
@Override
public long extractTimestamp(DwsTradeOrgOrderDayBean element, long l) {
return element.getTs();
}
}
)
);
// Key the stream by organization ID
KeyedStream<DwsTradeOrgOrderDayBean, String> keyedDS = withWatermarkDS.keyBy(DwsTradeOrgOrderDayBean::getOrgId);
// Open a one-day tumbling event-time window
WindowedStream<DwsTradeOrgOrderDayBean, String, TimeWindow> windowDS = keyedDS.window(TumblingEventTimeWindows.of(Time.days(1)));
// Attach the custom trigger
WindowedStream<DwsTradeOrgOrderDayBean, String, TimeWindow> triggerDS = windowDS.trigger(new MyTriggerFunction<DwsTradeOrgOrderDayBean>());
// Aggregate
SingleOutputStreamOperator<DwsTradeOrgOrderDayBean> aggregateDS = triggerDS.aggregate(
new MyAggregationFunction<DwsTradeOrgOrderDayBean>() {
@Override
public DwsTradeOrgOrderDayBean add(DwsTradeOrgOrderDayBean value, DwsTradeOrgOrderDayBean accumulator) {
if (accumulator == null) {
return value;
}
accumulator.setOrderAmountBase(value.getOrderAmountBase().add(accumulator.getOrderAmountBase()));
accumulator.setOrderCountBase(value.getOrderCountBase() + accumulator.getOrderCountBase());
return accumulator;
}
},
new ProcessWindowFunction<DwsTradeOrgOrderDayBean, DwsTradeOrgOrderDayBean, String, TimeWindow>() {
@Override
public void process(String s, ProcessWindowFunction<DwsTradeOrgOrderDayBean, DwsTradeOrgOrderDayBean, String, TimeWindow>.Context context, Iterable<DwsTradeOrgOrderDayBean> elements, Collector<DwsTradeOrgOrderDayBean> out) throws Exception {
long stt = context.window().getStart() - 8 * 60 * 60 * 1000L;
String curDate = DateFormatUtil.toDate(stt);
for (DwsTradeOrgOrderDayBean bean : elements) {
bean.setCurDate(curDate);
bean.setTs(System.currentTimeMillis());
out.collect(bean);
}
}
}
);
// Enrich with city dimension information
SingleOutputStreamOperator<DwsTradeOrgOrderDayBean> withCityDS = AsyncDataStream.unorderedWait(
aggregateDS,
new DimAsyncFunction<DwsTradeOrgOrderDayBean>("dim_base_region_info") {
@Override
public void join(DwsTradeOrgOrderDayBean bean, JSONObject dimInfoJsonObj) {
bean.setCityName(dimInfoJsonObj.getString("name"));
}
@Override
public Tuple2<String, String> getCondition(DwsTradeOrgOrderDayBean bean) {
return Tuple2.of("id", bean.getCityId());
}
},
60, TimeUnit.SECONDS
);
// Write the result to ClickHouse
withCityDS.print(">>>");
withCityDS.addSink(
ClickHouseUtil.getJdbcSink("insert into dws_trade_org_order_day_base values(?,?,?,?,?,?,?,?)")
);
env.execute();
}
}
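The add method in the job returns value when the accumulator is null, which suggests that MyAggregationFunction (from the previous post) starts each window with a null accumulator; that is an assumption here. The sketch below replays the same fold in plain Java with a hypothetical Acc class standing in for the bean.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

// Sketch of the add(value, accumulator) fold used by the window aggregation.
final class OrderFoldSketch {

    // Hypothetical stand-in for the bean: only the two summed fields.
    static final class Acc {
        BigDecimal amount;
        long count;

        Acc(BigDecimal amount, long count) {
            this.amount = amount;
            this.count = count;
        }
    }

    // Mirrors the job's add(): the first element becomes the accumulator,
    // later elements are summed into it.
    static Acc add(Acc value, Acc accumulator) {
        if (accumulator == null) {
            return value;
        }
        accumulator.amount = accumulator.amount.add(value.amount);
        accumulator.count += value.count;
        return accumulator;
    }

    // Applies add() to every element, starting from a null accumulator.
    static Acc fold(List<Acc> values) {
        Acc accumulator = null;
        for (Acc value : values) {
            accumulator = add(value, accumulator);
        }
        return accumulator;
    }

    public static void main(String[] args) {
        Acc total = fold(Arrays.asList(
                new Acc(new BigDecimal("10.50"), 1),
                new Acc(new BigDecimal("4.25"), 1)));
        System.out.println(total.amount + " / " + total.count);
    }
}
```

Note that the first element is reused as the accumulator and then mutated, so it should not be referenced elsewhere after the fold.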
CREATE TABLE IF NOT EXISTS dws_trade_org_order_day_base
(
`cur_date` Date COMMENT 'Statistic date',
`org_id` String COMMENT 'Organization ID',
`org_name` String COMMENT 'Organization name',
`city_id` String COMMENT 'City ID',
`city_name` String COMMENT 'City name',
`order_amount_base` Decimal(38, 20) COMMENT 'Order amount',
`order_count_base` UInt64 COMMENT 'Order count',
`ts` UInt64 COMMENT 'Timestamp'
)
ENGINE = MergeTree
ORDER BY (cur_date, org_id, org_name, city_id, city_name);
CREATE MATERIALIZED VIEW IF NOT EXISTS dws_trade_org_order_day
(
`cur_date` Date,
`org_id` String,
`org_name` String,
`city_id` String,
`city_name` String,
`order_amount` AggregateFunction(argMax, Decimal(38, 20), UInt64),
`order_count` AggregateFunction(argMax, UInt64, UInt64)
)
ENGINE = AggregatingMergeTree()
ORDER BY (cur_date, org_id, org_name, city_id, city_name)
SETTINGS index_granularity = 8192 AS
SELECT
cur_date,
org_id,
org_name,
city_id,
city_name,
argMaxState(order_amount_base, ts) AS order_amount,
argMaxState(order_count_base, ts) AS order_count
FROM dws_trade_org_order_day_base
GROUP BY
cur_date,
org_id,
org_name,
city_id,
city_name;
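Requirement: count the waybills that completed transfer during the current day and write the result to the corresponding ClickHouse table.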
DwsTransBoundFinishDayBean.java
package com.atguigu.tms.realtime.beans;
import lombok.Builder;
import lombok.Data;
/**
* Entity class for logistics-domain transfer completion
*/
@Data
@Builder
public class DwsTransBoundFinishDayBean {
// Statistic date
String curDate;
// Number of completed transfers
Long boundFinishOrderCountBase;
// Timestamp
Long ts;
}
DwsTransBoundFinishDay.java
package com.atguigu.tms.realtime.app.dws;
import com.alibaba.fastjson.JSON;
import com.atguigu.tms.realtime.app.func.MyAggregationFunction;
import com.atguigu.tms.realtime.app.func.MyTriggerFunction;
import com.atguigu.tms.realtime.beans.DwdTransDispatchDetailBean;
import com.atguigu.tms.realtime.beans.DwsTransBoundFinishDayBean;
import com.atguigu.tms.realtime.utils.ClickHouseUtil;
import com.atguigu.tms.realtime.utils.CreateEnvUtil;
import com.atguigu.tms.realtime.utils.DateFormatUtil;
import com.atguigu.tms.realtime.utils.KafkaUtil;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.streaming.api.datastream.AllWindowedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessAllWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import java.time.Duration;
/**
* Logistics domain: transfer-completion statistics
*/
public class DwsTransBoundFinishDay {
public static void main(String[] args) throws Exception {
// TODO 1. Prepare the environment
StreamExecutionEnvironment env = CreateEnvUtil.getStreamEnv(args);
// Set parallelism; comment this out for deployment and pass the global parallelism via args
env.setParallelism(4);
// TODO 2. Read data from the Kafka topic tms_dwd_trans_bound_finish_detail
String topic = "tms_dwd_trans_bound_finish_detail";
String groupId = "dws_trans_bound_finish_day";
KafkaSource<String> kafkaConsumer = KafkaUtil.getKafkaSource(topic, groupId, args);
SingleOutputStreamOperator<String> source = env
.fromSource(kafkaConsumer, WatermarkStrategy.noWatermarks(), "kafka_source")
.uid("kafka_source");
// TODO 3. Convert the data structure
SingleOutputStreamOperator<DwsTransBoundFinishDayBean> mappedStream = source.map(jsonStr -> {
DwdTransDispatchDetailBean dispatchDetailBean = JSON.parseObject(jsonStr, DwdTransDispatchDetailBean.class);
return DwsTransBoundFinishDayBean.builder()
.boundFinishOrderCountBase(1L)
.ts(dispatchDetailBean.getTs() + 8 * 60 * 60 * 1000L)
.build();
});
// TODO 4. Assign watermarks
SingleOutputStreamOperator<DwsTransBoundFinishDayBean> withWatermarkStream = mappedStream.assignTimestampsAndWatermarks(
WatermarkStrategy.<DwsTransBoundFinishDayBean>forBoundedOutOfOrderness(Duration.ofSeconds(5L))
.withTimestampAssigner(new SerializableTimestampAssigner<DwsTransBoundFinishDayBean>() {
@Override
public long extractTimestamp(DwsTransBoundFinishDayBean element, long recordTimestamp) {
return element.getTs();
}
})
).uid("watermark_stream");
// TODO 5. Open a one-day window
AllWindowedStream<DwsTransBoundFinishDayBean, TimeWindow> windowedStream =
withWatermarkStream.windowAll(TumblingEventTimeWindows.of(
org.apache.flink.streaming.api.windowing.time.Time.days(1L)));
// TODO 6. Attach the trigger
AllWindowedStream<DwsTransBoundFinishDayBean, TimeWindow> triggerStream = windowedStream.trigger(
new MyTriggerFunction<DwsTransBoundFinishDayBean>()
);
// TODO 7. Aggregate
SingleOutputStreamOperator<DwsTransBoundFinishDayBean> aggregatedStream = triggerStream.aggregate(
new MyAggregationFunction<DwsTransBoundFinishDayBean>() {
@Override
public DwsTransBoundFinishDayBean add(DwsTransBoundFinishDayBean value, DwsTransBoundFinishDayBean accumulator) {
if (accumulator == null) {
return value;
}
accumulator.setBoundFinishOrderCountBase(
accumulator.getBoundFinishOrderCountBase() + value.getBoundFinishOrderCountBase()
);
return accumulator;
}
},
new ProcessAllWindowFunction<DwsTransBoundFinishDayBean, DwsTransBoundFinishDayBean, TimeWindow>() {
@Override
public void process(Context context, Iterable<DwsTransBoundFinishDayBean> elements, Collector<DwsTransBoundFinishDayBean> out) throws Exception {
for (DwsTransBoundFinishDayBean element : elements) {
String curDate = DateFormatUtil.toDate(context.window().getStart() - 8 * 60 * 60 * 1000L);
// Fill in the statistic date
element.setCurDate(curDate);
// Fill in the timestamp
element.setTs(System.currentTimeMillis());
out.collect(element);
}
}
}
).uid("aggregate_stream");
// TODO 8. Write to ClickHouse
aggregatedStream.print(">>>>");
aggregatedStream.addSink(
ClickHouseUtil.getJdbcSink("insert into dws_trans_bound_finish_day_base values(?,?,?)")
).uid("clickhouse_sink");
env.execute();
}
}
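This job uses forBoundedOutOfOrderness(Duration.ofSeconds(5L)), while other jobs in this post use forMonotonousTimestamps(). The sketch below mimics how Flink's built-in bounded-out-of-orderness generator derives the watermark: the largest timestamp seen so far, minus the allowed delay, minus 1 ms. The monotonous strategy behaves like the same generator with a delay of 0. The class is a hypothetical stand-in, not the Flink class itself.

```java
// Sketch of the watermark arithmetic behind forBoundedOutOfOrderness.
final class BoundedWatermarkSketch {
    private final long outOfOrdernessMs;
    private long maxTimestamp;

    BoundedWatermarkSketch(long outOfOrdernessMs) {
        this.outOfOrdernessMs = outOfOrdernessMs;
        // Start low enough that the first watermark does not underflow.
        this.maxTimestamp = Long.MIN_VALUE + outOfOrdernessMs + 1;
    }

    // Tracks the largest event timestamp observed so far.
    void onEvent(long eventTimestamp) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestamp);
    }

    // Watermark = max timestamp - allowed delay - 1 ms.
    long currentWatermark() {
        return maxTimestamp - outOfOrdernessMs - 1;
    }

    public static void main(String[] args) {
        BoundedWatermarkSketch bounded = new BoundedWatermarkSketch(5_000L);
        bounded.onEvent(10_000L);
        bounded.onEvent(8_000L); // arrives out of order, still ahead of the watermark
        System.out.println(bounded.currentWatermark());
    }
}
```

With a 5-second bound, an event up to 5 seconds behind the newest one is still ahead of the watermark; with the monotonous strategy any event older than the newest one is already behind it.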
CREATE TABLE IF NOT EXISTS dws_trans_bound_finish_day_base
(
`cur_date` Date COMMENT 'Statistic date',
`bound_finish_order_count_base` UInt64 COMMENT 'Number of completed transfers',
`ts` UInt64 COMMENT 'Timestamp'
)
ENGINE = MergeTree
ORDER BY cur_date;
CREATE MATERIALIZED VIEW IF NOT EXISTS dws_trans_bound_finish_day
(
`cur_date` Date,
`bound_finish_order_count` AggregateFunction(argMax, UInt64, UInt64)
)
ENGINE = AggregatingMergeTree()
ORDER BY cur_date
SETTINGS index_granularity = 8192 AS
SELECT
cur_date,
argMaxState(bound_finish_order_count_base, ts) AS bound_finish_order_count
FROM dws_trans_bound_finish_day_base
GROUP BY cur_date;
Requirement: count the dispatched orders for the current day and write the result to ClickHouse.
DwsTransDispatchDayBean.java
package com.atguigu.tms.realtime.beans;
import lombok.Builder;
import lombok.Data;
/**
* Entity class for logistics-domain dispatch statistics
*/
@Data
@Builder
public class DwsTransDispatchDayBean {
// Statistic date
String curDate;
// Number of dispatched orders
Long dispatchOrderCountBase;
// Timestamp
Long ts;
}
DwsTransDispatchDay.java
package com.atguigu.tms.realtime.app.dws;
import com.alibaba.fastjson.JSON;
import com.atguigu.tms.realtime.app.func.MyAggregationFunction;
import com.atguigu.tms.realtime.app.func.MyTriggerFunction;
import com.atguigu.tms.realtime.beans.DwdTransDispatchDetailBean;
import com.atguigu.tms.realtime.beans.DwsTransDispatchDayBean;
import com.atguigu.tms.realtime.utils.ClickHouseUtil;
import com.atguigu.tms.realtime.utils.CreateEnvUtil;
import com.atguigu.tms.realtime.utils.DateFormatUtil;
import com.atguigu.tms.realtime.utils.KafkaUtil;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.streaming.api.datastream.AllWindowedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessAllWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
/**
* Logistics domain: dispatch aggregation
*/
public class DwsTransDispatchDay {
public static void main(String[] args) throws Exception {
// TODO 1. Prepare the environment
StreamExecutionEnvironment env = CreateEnvUtil.getStreamEnv(args);
// Set parallelism; comment this out for deployment and pass the global parallelism via args
env.setParallelism(4);
// TODO 2. Read data from the Kafka topic tms_dwd_trans_dispatch_detail
String topic = "tms_dwd_trans_dispatch_detail";
String groupId = "dws_trans_dispatch_day";
KafkaSource<String> kafkaConsumer = KafkaUtil.getKafkaSource(topic, groupId, args);
SingleOutputStreamOperator<String> source = env
.fromSource(kafkaConsumer, WatermarkStrategy.noWatermarks(), "kafka_source")
.uid("kafka_source");
// TODO 3. Convert the data structure
SingleOutputStreamOperator<DwsTransDispatchDayBean> mappedStream = source.map(jsonStr -> {
DwdTransDispatchDetailBean dispatchDetailBean = JSON.parseObject(jsonStr, DwdTransDispatchDetailBean.class);
return DwsTransDispatchDayBean.builder()
.dispatchOrderCountBase(1L)
.ts(dispatchDetailBean.getTs() + 8 * 60 * 60 * 1000L)
.build();
});
// TODO 4. Assign watermarks
SingleOutputStreamOperator<DwsTransDispatchDayBean> withWatermarkStream = mappedStream.assignTimestampsAndWatermarks(
// WatermarkStrategy.<DwsTransDispatchDayBean>forBoundedOutOfOrderness(Duration.ofSeconds(5L))
WatermarkStrategy.<DwsTransDispatchDayBean>forMonotonousTimestamps()
.withTimestampAssigner(new SerializableTimestampAssigner<DwsTransDispatchDayBean>() {
@Override
public long extractTimestamp(DwsTransDispatchDayBean element, long recordTimestamp) {
return element.getTs();
}
})
).uid("watermark_stream");
// TODO 5. Open a one-day window
AllWindowedStream<DwsTransDispatchDayBean, TimeWindow> windowedStream =
withWatermarkStream.windowAll(TumblingEventTimeWindows.of(
org.apache.flink.streaming.api.windowing.time.Time.days(1L)));
// TODO 6. Attach the trigger
AllWindowedStream<DwsTransDispatchDayBean, TimeWindow> triggerStream = windowedStream.trigger(
new MyTriggerFunction<DwsTransDispatchDayBean>()
);
// TODO 7. Aggregate
SingleOutputStreamOperator<DwsTransDispatchDayBean> aggregatedStream = triggerStream.aggregate(
new MyAggregationFunction<DwsTransDispatchDayBean>() {
@Override
public DwsTransDispatchDayBean add(DwsTransDispatchDayBean value, DwsTransDispatchDayBean accumulator) {
if (accumulator == null) {
return value;
}
accumulator.setDispatchOrderCountBase(
accumulator.getDispatchOrderCountBase() + value.getDispatchOrderCountBase()
);
return accumulator;
}
},
new ProcessAllWindowFunction<DwsTransDispatchDayBean, DwsTransDispatchDayBean, TimeWindow>() {
@Override
public void process(Context context, Iterable<DwsTransDispatchDayBean> elements, Collector<DwsTransDispatchDayBean> out) throws Exception {
for (DwsTransDispatchDayBean element : elements) {
String curDate = DateFormatUtil.toDate(context.window().getStart() - 8 * 60 * 60 * 1000L);
// Fill in the statistic date
element.setCurDate(curDate);
// Fill in the timestamp
element.setTs(System.currentTimeMillis());
out.collect(element);
}
}
}
).uid("aggregate_stream");
// TODO 8. Write to ClickHouse
aggregatedStream.print(">>>>");
aggregatedStream.addSink(
ClickHouseUtil.getJdbcSink("insert into dws_trans_dispatch_day_base values(?,?,?)")
).uid("clickhouse_stream");
env.execute();
}
}
CREATE TABLE IF NOT EXISTS dws_trans_dispatch_day_base
(
`cur_date` Date COMMENT 'Statistic date',
`dispatch_order_count_base` UInt64 COMMENT 'Number of dispatched orders',
`ts` UInt64 COMMENT 'Timestamp'
)
ENGINE = MergeTree
ORDER BY cur_date;
CREATE MATERIALIZED VIEW IF NOT EXISTS dws_trans_dispatch_day
(
`cur_date` Date,
`dispatch_order_count` AggregateFunction(argMax, UInt64, UInt64)
)
ENGINE = AggregatingMergeTree()
ORDER BY cur_date
SETTINGS index_granularity = 8192 AS
SELECT
cur_date,
argMaxState(dispatch_order_count_base, ts) AS dispatch_order_count
FROM dws_trans_dispatch_day_base
GROUP BY cur_date;
Requirement: count the successful deliveries (waybills) per organization for the current day and write the result to ClickHouse.
DwsTransOrgDeliverSucDayBean.java
package com.atguigu.tms.realtime.beans;
import lombok.Builder;
import lombok.Data;
/**
* Entity class for logistics-domain successful deliveries by organization
*/
@Data
@Builder
public class DwsTransOrgDeliverSucDayBean {
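// Statistic date
String curDate;
// Organization ID
String orgId;
// Organization name
String orgName;
// District ID (used only for the dimension lookup, not written to ClickHouse)
@TransientSink
String districtId;
// City ID
String cityId;
// City name
String cityName;
// Province ID
String provinceId;
// Province name
String provinceName;
// Number of successful deliveries
Long deliverSucCountBase;
// Timestamp
Long ts;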
}
DwsTransOrgDeliverSucDay.java
package com.atguigu.tms.realtime.app.dws;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.atguigu.tms.realtime.app.func.DimAsyncFunction;
import com.atguigu.tms.realtime.app.func.MyAggregationFunction;
import com.atguigu.tms.realtime.app.func.MyTriggerFunction;
import com.atguigu.tms.realtime.beans.DwdTransDeliverSucDetailBean;
import com.atguigu.tms.realtime.beans.DwsTransOrgDeliverSucDayBean;
import com.atguigu.tms.realtime.utils.ClickHouseUtil;
import com.atguigu.tms.realtime.utils.CreateEnvUtil;
import com.atguigu.tms.realtime.utils.DateFormatUtil;
import com.atguigu.tms.realtime.utils.KafkaUtil;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import java.time.Duration;
import java.util.concurrent.TimeUnit;
/**
* Logistics domain: successful-delivery statistics by organization
*/
public class DwsTransOrgDeliverSucDay {
public static void main(String[] args) throws Exception {
// TODO 1. Prepare the environment
StreamExecutionEnvironment env = CreateEnvUtil.getStreamEnv(args);
// Set parallelism; comment this out for deployment and pass the global parallelism via args
env.setParallelism(4);
// TODO 2. Read data from the Kafka topic tms_dwd_trans_deliver_detail
String topic = "tms_dwd_trans_deliver_detail";
String groupId = "dws_trans_org_deliver_suc_day";
KafkaSource<String> kafkaConsumer = KafkaUtil.getKafkaSource(topic, groupId, args);
SingleOutputStreamOperator<String> source = env
.fromSource(kafkaConsumer, WatermarkStrategy.noWatermarks(), "kafka_source")
.uid("kafka_source");
// TODO 3. Convert the data structure
SingleOutputStreamOperator<DwsTransOrgDeliverSucDayBean> mappedStream = source.map(jsonStr -> {
DwdTransDeliverSucDetailBean dwdTransDeliverSucDetailBean = JSON.parseObject(jsonStr, DwdTransDeliverSucDetailBean.class);
return DwsTransOrgDeliverSucDayBean.builder()
.districtId(dwdTransDeliverSucDetailBean.getReceiverDistrictId())
.cityId(dwdTransDeliverSucDetailBean.getReceiverCityId())
.provinceId(dwdTransDeliverSucDetailBean.getReceiverProvinceId())
.deliverSucCountBase(1L)
.ts(dwdTransDeliverSucDetailBean.getTs() + 8 * 60 * 60 * 1000L)
.build();
});
// TODO 4. Look up dimension information
// Get the organization ID
SingleOutputStreamOperator<DwsTransOrgDeliverSucDayBean> withOrgIdStream = AsyncDataStream.unorderedWait(
mappedStream,
new DimAsyncFunction<DwsTransOrgDeliverSucDayBean>("dim_base_organ") {
@Override
public void join(DwsTransOrgDeliverSucDayBean bean, JSONObject dimJsonObj) {
bean.setOrgId(dimJsonObj.getString("id"));
}
@Override
public Tuple2<String, String> getCondition(DwsTransOrgDeliverSucDayBean bean) {
return Tuple2.of("region_id", bean.getDistrictId());
}
},
60, TimeUnit.SECONDS
).uid("with_org_id_stream");
// TODO 5. Assign watermarks
SingleOutputStreamOperator<DwsTransOrgDeliverSucDayBean> withWatermarkStream = withOrgIdStream.assignTimestampsAndWatermarks(
WatermarkStrategy.<DwsTransOrgDeliverSucDayBean>forBoundedOutOfOrderness(Duration.ofSeconds(5L))
.withTimestampAssigner(new SerializableTimestampAssigner<DwsTransOrgDeliverSucDayBean>() {
@Override
public long extractTimestamp(DwsTransOrgDeliverSucDayBean element, long recordTimestamp) {
return element.getTs();
}
})
.withIdleness(Duration.ofSeconds(20))
).uid("watermark_stream");
// TODO 6. Key the stream by organization ID
KeyedStream<DwsTransOrgDeliverSucDayBean, String> keyedStream = withWatermarkStream.keyBy(DwsTransOrgDeliverSucDayBean::getOrgId);
// TODO 7. Open a one-day window
WindowedStream<DwsTransOrgDeliverSucDayBean, String, TimeWindow> windowStream =
keyedStream.window(TumblingEventTimeWindows.of(org.apache.flink.streaming.api.windowing.time.Time.days(1L)));
// TODO 8. Attach the trigger
WindowedStream<DwsTransOrgDeliverSucDayBean, String, TimeWindow> triggerStream = windowStream.trigger(new MyTriggerFunction<>());
// TODO 9. Aggregate
SingleOutputStreamOperator<DwsTransOrgDeliverSucDayBean> aggregatedStream = triggerStream.aggregate(
new MyAggregationFunction<DwsTransOrgDeliverSucDayBean>() {
@Override
public DwsTransOrgDeliverSucDayBean add(DwsTransOrgDeliverSucDayBean value, DwsTransOrgDeliverSucDayBean accumulator) {
if (accumulator == null) {
return value;
}
accumulator.setDeliverSucCountBase(
accumulator.getDeliverSucCountBase() + value.getDeliverSucCountBase()
);
return accumulator;
}
},
new ProcessWindowFunction<DwsTransOrgDeliverSucDayBean, DwsTransOrgDeliverSucDayBean, String, TimeWindow>() {
@Override
public void process(String key, Context context, Iterable<DwsTransOrgDeliverSucDayBean> elements, Collector<DwsTransOrgDeliverSucDayBean> out) throws Exception {
for (DwsTransOrgDeliverSucDayBean element : elements) {
long stt = context.window().getStart();
element.setCurDate(DateFormatUtil.toDate(stt - 8 * 60 * 60 * 1000L));
element.setTs(System.currentTimeMillis());
out.collect(element);
}
}
}
).uid("aggregate_stream");
// TODO 10. Complete the dimension information
// 10.1 Add the organization name
SingleOutputStreamOperator<DwsTransOrgDeliverSucDayBean> withOrgNameAndRegionIdStream = AsyncDataStream.unorderedWait(
aggregatedStream,
new DimAsyncFunction<DwsTransOrgDeliverSucDayBean>("dim_base_organ") {
@Override
public void join(DwsTransOrgDeliverSucDayBean bean, JSONObject dimJsonObj){
bean.setOrgName(dimJsonObj.getString("org_name"));
}
@Override
public Tuple2<String,String> getCondition(DwsTransOrgDeliverSucDayBean bean) {
return Tuple2.of("id",bean.getOrgId());
}
},
60, TimeUnit.SECONDS
).uid("with_org_name_and_region_id_stream");
// 10.2 Add the city name
SingleOutputStreamOperator<DwsTransOrgDeliverSucDayBean> withCityNameStream = AsyncDataStream.unorderedWait(
withOrgNameAndRegionIdStream,
new DimAsyncFunction<DwsTransOrgDeliverSucDayBean>("dim_base_region_info") {
@Override
public void join(DwsTransOrgDeliverSucDayBean bean, JSONObject dimJsonObj) {
bean.setCityName(dimJsonObj.getString("name"));
}
@Override
public Tuple2<String,String> getCondition(DwsTransOrgDeliverSucDayBean bean) {
return Tuple2.of("id",bean.getCityId());
}
},
60, TimeUnit.SECONDS
).uid("with_city_name_stream");
// 10.3 Add the province name
SingleOutputStreamOperator<DwsTransOrgDeliverSucDayBean> fullStream = AsyncDataStream.unorderedWait(
withCityNameStream,
new DimAsyncFunction<DwsTransOrgDeliverSucDayBean>("dim_base_region_info") {
@Override
public void join(DwsTransOrgDeliverSucDayBean bean, JSONObject dimJsonObj) {
bean.setProvinceName(dimJsonObj.getString("name"));
}
@Override
public Tuple2<String,String> getCondition(DwsTransOrgDeliverSucDayBean bean) {
return Tuple2.of("id",bean.getProvinceId());
}
},
60, TimeUnit.SECONDS
).uid("with_province_name_stream");
// TODO 11. Write to ClickHouse
fullStream.print(">>>");
fullStream.addSink(
ClickHouseUtil.getJdbcSink("insert into dws_trans_org_deliver_suc_day_base values(?,?,?,?,?,?,?,?,?)")
).uid("clickhouse_stream");
env.execute();
}
}
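The `add` override in step 9 treats a `null` accumulator as "no element seen yet", which suggests that `MyAggregationFunction` (written in an earlier post and not shown here) starts from `null`. The sketch below replays that folding logic outside Flink under that assumption; `CountBean` and `NullSeedAggregation` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the DWS bean; only the counter matters here.
class CountBean {
    final String orgId;
    long count;

    CountBean(String orgId, long count) {
        this.orgId = orgId;
        this.count = count;
    }
}

class NullSeedAggregation {
    // Same shape as the add() override above: the first element becomes the accumulator.
    static CountBean add(CountBean value, CountBean accumulator) {
        if (accumulator == null) {
            return value;
        }
        accumulator.count += value.count;
        return accumulator;
    }

    // Folds the elements of one window, starting from a null accumulator (an assumption).
    static CountBean fold(List<CountBean> elements) {
        CountBean acc = null;
        for (CountBean e : elements) {
            acc = add(e, acc);
        }
        return acc;
    }

    public static void main(String[] args) {
        CountBean result = fold(Arrays.asList(
                new CountBean("org-1", 1L),
                new CountBean("org-1", 1L),
                new CountBean("org-1", 1L)));
        System.out.println(result.orgId + " -> " + result.count);
    }
}
```

Because the first bean is reused as the accumulator, its other fields (IDs, names) are carried through to the window output unchanged.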
CREATE TABLE IF NOT EXISTS dws_trans_org_deliver_suc_day_base
(
`cur_date` Date COMMENT 'Statistics date',
`org_id` String COMMENT 'Organization ID',
`org_name` String COMMENT 'Organization name',
`city_id` String COMMENT 'City ID',
`city_name` String COMMENT 'City name',
`province_id` String COMMENT 'Province ID',
`province_name` String COMMENT 'Province name',
`deliver_suc_count_base` UInt64 COMMENT 'Number of successful deliveries',
`ts` UInt64 COMMENT 'Timestamp'
)
ENGINE = MergeTree
ORDER BY (cur_date, org_id, org_name, city_id, city_name, province_id, province_name);
CREATE MATERIALIZED VIEW IF NOT EXISTS dws_trans_org_deliver_suc_day
(
`cur_date` Date,
`org_id` String,
`org_name` String,
`city_id` String,
`city_name` String,
`province_id` String,
`province_name` String,
`deliver_suc_count` AggregateFunction(argMax, UInt64, UInt64)
)
ENGINE = AggregatingMergeTree()
ORDER BY (cur_date, org_id, org_name, city_id, city_name, province_id, province_name)
SETTINGS index_granularity = 8192 AS
SELECT
cur_date,
org_id,
org_name,
city_id,
city_name,
province_id,
province_name,
argMaxState(deliver_suc_count_base, ts) AS deliver_suc_count
FROM dws_trans_org_deliver_suc_day_base
GROUP BY
cur_date,
org_id,
org_name,
city_id,
city_name,
province_id,
province_name;
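The materialized view stores `argMaxState(deliver_suc_count_base, ts)`. Since the job stamps every emitted row with `System.currentTimeMillis()`, this keeps, for each group, the count from the most recently written row. Assuming `MyTriggerFunction` fires more than once per window and each firing emits a running total (the trigger is not shown in this post), the effect can be sketched in plain Java as follows; `ArgMaxSketch` and `Row` are hypothetical names, and the sample values are made up.

```java
import java.util.HashMap;
import java.util.Map;

class ArgMaxSketch {
    // One emitted row: a grouping key, a cumulative count, and the emit timestamp.
    static final class Row {
        final String key;
        final long count;
        final long ts;

        Row(String key, long count, long ts) {
            this.key = key;
            this.count = count;
            this.ts = ts;
        }
    }

    // For each key, keep the count that belongs to the largest ts.
    static Map<String, Long> latestCountPerKey(Row[] rows) {
        Map<String, Row> latest = new HashMap<>();
        for (Row r : rows) {
            Row seen = latest.get(r.key);
            if (seen == null || r.ts > seen.ts) {
                latest.put(r.key, r);
            }
        }
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, Row> e : latest.entrySet()) {
            result.put(e.getKey(), e.getValue().count);
        }
        return result;
    }

    public static void main(String[] args) {
        String key = "2024-01-02|org-1";
        Row[] rows = {
            new Row(key, 1L, 100L),
            new Row(key, 3L, 300L), // latest firing, arrives before an older one
            new Row(key, 2L, 200L)
        };
        System.out.println(latestCountPerKey(rows).get(key));
    }
}
```

Summing the rows instead would double count, because each firing already contains everything the earlier firings reported.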
Requirement: count the number of pickups per organization for the current day and write the result to ClickHouse.
DwsTransOrgReceiveDayBean.java
package com.atguigu.tms.realtime.beans;
import lombok.Builder;
import lombok.Data;
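/**
* Entity class for organization-level pickup statistics in the logistics domain
*/
@Data
@Builder
public class DwsTransOrgReceiveDayBean {
// Statistics date
String curDate;
// Transfer station ID
String orgId;
// Transfer station name
String orgName;
// District ID (used only for the dimension lookup, not written to ClickHouse)
@TransientSink
String districtId;
// City ID
String cityId;
// City name
String cityName;
// Province ID
String provinceId;
// Province name
String provinceName;
// Number of pickups (one order counts once)
Long receiveOrderCountBase;
// Timestamp
Long ts;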
}
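The bean declares ten fields, but the insert statement used by the job has only nine `?` placeholders, because `districtId` is marked `@TransientSink`. The annotation and `ClickHouseUtil` come from an earlier post and are not shown here, so the sketch below is only one plausible way such a marker could be honored with reflection. `SkipInSink`, `ReceiveDayRow`, and `SinkFieldSketch` are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker annotation, standing in for the project's @TransientSink.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SkipInSink {}

// Trimmed copy of the bean's field layout, for illustration only.
class ReceiveDayRow {
    String curDate;
    String orgId;
    String orgName;
    @SkipInSink
    String districtId;
    String cityId;
    String cityName;
    String provinceId;
    String provinceName;
    Long receiveOrderCountBase;
    Long ts;
}

class SinkFieldSketch {
    // Collects the names of all fields that are not marked to be skipped.
    static List<String> sinkColumns(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (f.isSynthetic() || f.isAnnotationPresent(SkipInSink.class)) {
                continue;
            }
            names.add(f.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> columns = sinkColumns(ReceiveDayRow.class);
        System.out.println(columns.size() + " columns: " + columns);
    }
}
```

Whatever the real mechanism is, the number of fields that survive the filter has to match both the placeholder count in the insert statement and the column count of the `_base` table.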
DwsTransOrgReceiveDay.java
package com.atguigu.tms.realtime.app.dws;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.atguigu.tms.realtime.app.func.DimAsyncFunction;
import com.atguigu.tms.realtime.app.func.MyAggregationFunction;
import com.atguigu.tms.realtime.app.func.MyTriggerFunction;
import com.atguigu.tms.realtime.beans.DwdTransReceiveDetailBean;
import com.atguigu.tms.realtime.beans.DwsTransOrgReceiveDayBean;
import com.atguigu.tms.realtime.utils.ClickHouseUtil;
import com.atguigu.tms.realtime.utils.CreateEnvUtil;
import com.atguigu.tms.realtime.utils.DateFormatUtil;
import com.atguigu.tms.realtime.utils.KafkaUtil;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import java.time.Duration;
import java.util.concurrent.TimeUnit;
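/**
* Organization-level pickup aggregation for the logistics domain
*/
public class DwsTransOrgReceiveDay {
public static void main(String[] args) throws Exception {
// TODO 1. Prepare the environment
StreamExecutionEnvironment env = CreateEnvUtil.getStreamEnv(args);
// Parallelism setting; comment this out when deploying and pass the global parallelism through args
env.setParallelism(4);
// TODO 2. Read data from the given topic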
String topic = "tms_dwd_trans_receive_detail";
String groupId = "dws_trans_org_receive_day";
KafkaSource<String> kafkaConsumer = KafkaUtil.getKafkaSource(topic, groupId, args);
SingleOutputStreamOperator<String> source = env
.fromSource(kafkaConsumer, WatermarkStrategy.noWatermarks(), "kafka_source")
.uid("kafka_source");
// TODO 3. Convert the data structure
SingleOutputStreamOperator<DwsTransOrgReceiveDayBean> mappedStream = source.map(
jsonStr -> {
DwdTransReceiveDetailBean dwdTransReceiveDetailBean = JSON.parseObject(jsonStr, DwdTransReceiveDetailBean.class);
return DwsTransOrgReceiveDayBean.builder()
.districtId(dwdTransReceiveDetailBean.getSenderDistrictId())
.provinceId(dwdTransReceiveDetailBean.getSenderProvinceId())
.cityId(dwdTransReceiveDetailBean.getSenderCityId())
.receiveOrderCountBase(1L)
.ts(dwdTransReceiveDetailBean.getTs() + 8 * 60 * 60 * 1000L)
.build();
}
);
// TODO 4. Join dimension information
// Look up the organization ID by district
SingleOutputStreamOperator<DwsTransOrgReceiveDayBean> withOrgIdStream = AsyncDataStream.unorderedWait(
mappedStream,
new DimAsyncFunction<DwsTransOrgReceiveDayBean>("dim_base_organ") {
@Override
public void join(DwsTransOrgReceiveDayBean bean, JSONObject dimJsonObj) {
bean.setOrgId(dimJsonObj.getString("id"));
}
@Override
public Tuple2<String, String> getCondition(DwsTransOrgReceiveDayBean bean) {
return Tuple2.of("region_id", bean.getDistrictId());
}
}, 5 * 60,
TimeUnit.SECONDS
).uid("with_org_id_stream");
// TODO 5. Assign timestamps and watermarks
SingleOutputStreamOperator<DwsTransOrgReceiveDayBean> withWatermarkStream = withOrgIdStream.assignTimestampsAndWatermarks(
WatermarkStrategy.<DwsTransOrgReceiveDayBean>forBoundedOutOfOrderness(Duration.ofSeconds(5L))
.withTimestampAssigner(
new SerializableTimestampAssigner<DwsTransOrgReceiveDayBean>() {
@Override
public long extractTimestamp(DwsTransOrgReceiveDayBean bean, long recordTimestamp) {
return bean.getTs();
}
}
)
).uid("watermark_stream");
// TODO 7. Key by orgId
KeyedStream<DwsTransOrgReceiveDayBean, String> keyedStream = withWatermarkStream.keyBy(DwsTransOrgReceiveDayBean::getOrgId);
// TODO 8. Open a one-day tumbling window
WindowedStream<DwsTransOrgReceiveDayBean, String, TimeWindow> windowStream =
keyedStream.window(TumblingEventTimeWindows.of(
org.apache.flink.streaming.api.windowing.time.Time.days(1L)));
// TODO 9. Attach the custom trigger
WindowedStream<DwsTransOrgReceiveDayBean, String, TimeWindow> triggerStream = windowStream.trigger(
new MyTriggerFunction<DwsTransOrgReceiveDayBean>()
);
// TODO 10. Aggregate
SingleOutputStreamOperator<DwsTransOrgReceiveDayBean> aggregatedStream = triggerStream.aggregate(
new MyAggregationFunction<DwsTransOrgReceiveDayBean>() {
@Override
public DwsTransOrgReceiveDayBean add(DwsTransOrgReceiveDayBean value, DwsTransOrgReceiveDayBean accumulator) {
if (accumulator == null) {
return value;
}
accumulator.setReceiveOrderCountBase(
accumulator.getReceiveOrderCountBase() + value.getReceiveOrderCountBase());
return accumulator;
}
},
new ProcessWindowFunction<DwsTransOrgReceiveDayBean, DwsTransOrgReceiveDayBean, String, TimeWindow>() {
@Override
public void process(String key, Context context, Iterable<DwsTransOrgReceiveDayBean> elements, Collector<DwsTransOrgReceiveDayBean> out) throws Exception {
for (DwsTransOrgReceiveDayBean element : elements) {
// Fill in the statistics date
String curDate = DateFormatUtil.toDate(context.window().getStart() - 8 * 60 * 60 * 1000L);
element.setCurDate(curDate);
// Set the timestamp to the emit time
element.setTs(System.currentTimeMillis());
out.collect(element);
}
}
}
).uid("aggregate_stream");
// TODO 11. Fill in dimension information
// 11.1 Add the transfer station name
SingleOutputStreamOperator<DwsTransOrgReceiveDayBean> withOrgNameStream = AsyncDataStream.unorderedWait(
aggregatedStream,
new DimAsyncFunction<DwsTransOrgReceiveDayBean>("dim_base_organ") {
@Override
public void join(DwsTransOrgReceiveDayBean bean, JSONObject dimJsonObj) {
bean.setOrgName(dimJsonObj.getString("org_name"));
}
@Override
public Tuple2<String,String> getCondition(DwsTransOrgReceiveDayBean bean) {
return Tuple2.of("id",bean.getOrgId());
}
},
60, TimeUnit.SECONDS
).uid("with_org_name_stream");
// 11.2 Add the city name
SingleOutputStreamOperator<DwsTransOrgReceiveDayBean> withCityNameStream = AsyncDataStream.unorderedWait(
withOrgNameStream,
new DimAsyncFunction<DwsTransOrgReceiveDayBean>("dim_base_region_info") {
@Override
public void join(DwsTransOrgReceiveDayBean bean, JSONObject dimJsonObj) {
bean.setCityName(dimJsonObj.getString("name"));
}
@Override
public Tuple2<String,String> getCondition(DwsTransOrgReceiveDayBean bean) {
return Tuple2.of("id",bean.getCityId());
}
},
60, TimeUnit.SECONDS
).uid("with_city_name_stream");
// 11.3 Add the province name
SingleOutputStreamOperator<DwsTransOrgReceiveDayBean> fullStream = AsyncDataStream.unorderedWait(
withCityNameStream,
new DimAsyncFunction<DwsTransOrgReceiveDayBean>("dim_base_region_info") {
@Override
public void join(DwsTransOrgReceiveDayBean bean, JSONObject dimJsonObj) {
bean.setProvinceName(dimJsonObj.getString("name"));
}
@Override
public Tuple2<String,String> getCondition(DwsTransOrgReceiveDayBean bean) {
return Tuple2.of("id",bean.getProvinceId());
}
},
60, TimeUnit.SECONDS
).uid("with_province_name_stream");
// TODO 12. Write to ClickHouse
fullStream.print(">>>");
fullStream.addSink(
ClickHouseUtil.getJdbcSink("insert into dws_trans_org_receive_day_base values(?,?,?,?,?,?,?,?,?)")
).uid("clickhouse_stream");
env.execute();
}
}
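The job adds eight hours to `ts` before windowing and subtracts eight hours from the window start before formatting the date. One-day tumbling windows in Flink are aligned to the epoch, that is, to UTC midnight, so the shift makes each window cover a calendar day in UTC+8 instead. The sketch below reproduces the arithmetic with `java.time`. It assumes that `DateFormatUtil.toDate` formats in UTC+8 (the utility is not shown in this post); `DayWindowShiftSketch` and its methods are hypothetical names.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

class DayWindowShiftSketch {
    static final long DAY_MS = 24 * 60 * 60 * 1000L;
    static final long SHIFT_MS = 8 * 60 * 60 * 1000L;

    // Start of the epoch-aligned one-day bucket that contains the timestamp.
    static long windowStart(long ts) {
        return ts - Math.floorMod(ts, DAY_MS);
    }

    // Shift forward, pick the bucket, shift the bucket start back, then format in UTC+8.
    static LocalDate statDate(long eventTs) {
        long shifted = eventTs + SHIFT_MS;
        long localMidnight = windowStart(shifted) - SHIFT_MS;
        return Instant.ofEpochMilli(localMidnight)
                .atZone(ZoneId.of("Asia/Shanghai")) // assumed zone of DateFormatUtil
                .toLocalDate();
    }

    public static void main(String[] args) {
        // 23:30 UTC on January 1 is already 07:30 on January 2 in UTC+8.
        long ts = Instant.parse("2024-01-01T23:30:00Z").toEpochMilli();
        System.out.println(statDate(ts));
    }
}
```

Flink also offers `TumblingEventTimeWindows.of(Time.days(1), Time.hours(-8))`, which applies the offset inside the window assigner; the post shifts the timestamps by hand instead, and both approaches aim at the same day boundaries.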
CREATE TABLE IF NOT EXISTS dws_trans_org_receive_day_base
(
`cur_date` Date COMMENT 'Statistics date',
`org_id` String COMMENT 'Transfer station ID',
`org_name` String COMMENT 'Transfer station name',
`city_id` String COMMENT 'City ID',
`city_name` String COMMENT 'City name',
`province_id` String COMMENT 'Province ID',
`province_name` String COMMENT 'Province name',
`receive_order_count_base` UInt64 COMMENT 'Number of pickups',
`ts` UInt64 COMMENT 'Timestamp'
)
ENGINE = MergeTree
ORDER BY (cur_date, org_id, org_name, city_id, city_name, province_id, province_name);
CREATE MATERIALIZED VIEW IF NOT EXISTS dws_trans_org_receive_day
(
`cur_date` Date,
`org_id` String,
`org_name` String,
`city_id` String,
`city_name` String,
`province_id` String,
`province_name` String,
`receive_order_count` AggregateFunction(argMax, UInt64, UInt64)
)
ENGINE = AggregatingMergeTree()
ORDER BY (cur_date, org_id, org_name, city_id, city_name, province_id, province_name)
SETTINGS index_granularity = 8192 AS
SELECT
cur_date,
org_id,
org_name,
city_id,
city_name,
province_id,
province_name,
argMaxState(receive_order_count_base, ts) AS receive_order_count
FROM dws_trans_org_receive_day_base
GROUP BY
cur_date,
org_id,
org_name,
city_id,
city_name,
province_id,
province_name;
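Requirement: for each organization and truck model, compute the number of completed transports, the distance covered, and the elapsed time for the current day, and write the result to ClickHouse.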
DwsTransOrgTruckModelTransFinishDayBean.java
package com.atguigu.tms.realtime.beans;
import lombok.Builder;
import lombok.Data;
import java.math.BigDecimal;
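/*
* Entity class for organization and truck-model level statistics in the logistics domain
*/
@Data
@Builder
public class DwsTransOrgTruckModelTransFinishDayBean {
// Statistics date
String curDate;
// Organization ID
String orgId;
// Organization name
String orgName;
// Truck ID (used only for the dimension lookup)
@TransientSink
String truckId;
// Truck model ID
String truckModelId;
// Truck model name
String truckModelName;
// First-level organization ID used to look up city information
@TransientSink
String joinOrgId;
// City ID
String cityId;
// City name
String cityName;
// Number of completed transports
Long transFinishCountBase;
// Distance of completed transports
BigDecimal transFinishDistanceBase;
// Elapsed time of completed transports
Long transFinishDurTimeBase;
// Timestamp
Long ts;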
}
DwsTransOrgTruckModelTransFinishDay.java
package com.atguigu.tms.realtime.app.dws;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.atguigu.tms.realtime.app.func.DimAsyncFunction;
import com.atguigu.tms.realtime.app.func.MyAggregationFunction;
import com.atguigu.tms.realtime.app.func.MyTriggerFunction;
import com.atguigu.tms.realtime.beans.DwdTransTransFinishBean;
import com.atguigu.tms.realtime.beans.DwsTransOrgTruckModelTransFinishDayBean;
import com.atguigu.tms.realtime.utils.ClickHouseUtil;
import com.atguigu.tms.realtime.utils.CreateEnvUtil;
import com.atguigu.tms.realtime.utils.DateFormatUtil;
import com.atguigu.tms.realtime.utils.KafkaUtil;
import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import java.util.concurrent.TimeUnit;
// Organization and truck-model level aggregation for the logistics domain
public class DwsTransOrgTruckModelTransFinishDay {
public static void main(String[] args) throws Exception {
// Prepare the environment
StreamExecutionEnvironment env = CreateEnvUtil.getStreamEnv(args);
env.setParallelism(4);
// Read from the transport-finish fact topic in Kafka
String topic = "tms_dwd_trans_trans_finish";
String groupId = "dws_trans_org_truck_model_group";
KafkaSource<String> kafkaConsumer = KafkaUtil.getKafkaSource(topic, groupId, args);
SingleOutputStreamOperator<String> kafkaDS = env
.fromSource(kafkaConsumer, WatermarkStrategy.noWatermarks(), "kafka_source")
.uid("kafka_source");
// Convert each record from a JSON string to the entity class (the truck dimension is joined in the next step)
SingleOutputStreamOperator<DwsTransOrgTruckModelTransFinishDayBean> mapDS = kafkaDS.map(
new MapFunction<String, DwsTransOrgTruckModelTransFinishDayBean>() {
@Override
public DwsTransOrgTruckModelTransFinishDayBean map(String jsonStr) throws Exception {
DwdTransTransFinishBean finishBean = JSON.parseObject(jsonStr, DwdTransTransFinishBean.class);
DwsTransOrgTruckModelTransFinishDayBean bean = DwsTransOrgTruckModelTransFinishDayBean.builder()
.orgId(finishBean.getStartOrgId())
.orgName(finishBean.getStartOrgName())
.truckId(finishBean.getTruckId())
.transFinishCountBase(1L)
.transFinishDistanceBase(finishBean.getActualDistance())
.transFinishDurTimeBase(finishBean.getTransportTime())
.ts(finishBean.getTs() + 8 * 60 * 60 * 1000L)
.build();
return bean;
}
}
);
// Join the truck dimension to get the truck model ID
SingleOutputStreamOperator<DwsTransOrgTruckModelTransFinishDayBean> withTruckDS = AsyncDataStream.unorderedWait(
mapDS,
new DimAsyncFunction<DwsTransOrgTruckModelTransFinishDayBean>("dim_truck_info") {
@Override
public void join(DwsTransOrgTruckModelTransFinishDayBean bean, JSONObject dimInfoJsonObj) {
bean.setTruckModelId(dimInfoJsonObj.getString("truck_model_id"));
}
@Override
public Tuple2<String, String> getCondition(DwsTransOrgTruckModelTransFinishDayBean bean) {
return Tuple2.of("id", bean.getTruckId());
}
},
60, TimeUnit.SECONDS
);
// Specify the watermark strategy and extract the event-time field
SingleOutputStreamOperator<DwsTransOrgTruckModelTransFinishDayBean> withWatermarkDS = withTruckDS.assignTimestampsAndWatermarks(
WatermarkStrategy.<DwsTransOrgTruckModelTransFinishDayBean>forMonotonousTimestamps()
.withTimestampAssigner(
new SerializableTimestampAssigner<DwsTransOrgTruckModelTransFinishDayBean>() {
@Override
public long extractTimestamp(DwsTransOrgTruckModelTransFinishDayBean element, long l) {
return element.getTs();
}
}
)
);
// Key by organization ID + truck model ID
KeyedStream<DwsTransOrgTruckModelTransFinishDayBean, String> keyDS = withWatermarkDS.keyBy(
new KeySelector<DwsTransOrgTruckModelTransFinishDayBean, String>() {
@Override
public String getKey(DwsTransOrgTruckModelTransFinishDayBean bean) throws Exception {
return bean.getOrgId() + "+" + bean.getTruckModelId();
}
}
);
// Open a one-day tumbling window
WindowedStream<DwsTransOrgTruckModelTransFinishDayBean, String, TimeWindow> windowDS = keyDS.window(TumblingEventTimeWindows.of(Time.days(1)));
// Attach the custom trigger
WindowedStream<DwsTransOrgTruckModelTransFinishDayBean, String, TimeWindow> triggerDS = windowDS.trigger(new MyTriggerFunction<DwsTransOrgTruckModelTransFinishDayBean>());
// Aggregate
SingleOutputStreamOperator<DwsTransOrgTruckModelTransFinishDayBean> aggregateDS = triggerDS.aggregate(
new MyAggregationFunction<DwsTransOrgTruckModelTransFinishDayBean>() {
@Override
public DwsTransOrgTruckModelTransFinishDayBean add(DwsTransOrgTruckModelTransFinishDayBean value, DwsTransOrgTruckModelTransFinishDayBean accumulator) {
if (accumulator == null) {
return value;
}
accumulator.setTransFinishCountBase(value.getTransFinishCountBase() + accumulator.getTransFinishCountBase());
accumulator.setTransFinishDistanceBase(value.getTransFinishDistanceBase().add(accumulator.getTransFinishDistanceBase()));
accumulator.setTransFinishDurTimeBase(value.getTransFinishDurTimeBase() + accumulator.getTransFinishDurTimeBase());
return accumulator;
}
},
new ProcessWindowFunction<DwsTransOrgTruckModelTransFinishDayBean, DwsTransOrgTruckModelTransFinishDayBean, String, TimeWindow>() {
@Override
public void process(String s, Context context, Iterable<DwsTransOrgTruckModelTransFinishDayBean> elements, Collector<DwsTransOrgTruckModelTransFinishDayBean> out) throws Exception {
Long stt = context.window().getStart() - 8 * 60 * 60 * 1000L;
String curDate = DateFormatUtil.toDate(stt);
for (DwsTransOrgTruckModelTransFinishDayBean element : elements) {
element.setCurDate(curDate);
element.setTs(System.currentTimeMillis());
out.collect(element);
}
}
}
);
// Join dimension information
// Get the truck model name
SingleOutputStreamOperator<DwsTransOrgTruckModelTransFinishDayBean> withTruckModelDS = AsyncDataStream.unorderedWait(
aggregateDS,
new DimAsyncFunction<DwsTransOrgTruckModelTransFinishDayBean>("dim_truck_model") {
@Override
public void join(DwsTransOrgTruckModelTransFinishDayBean bean, JSONObject dimInfoJsonObj) {
bean.setTruckModelName(dimInfoJsonObj.getString("model_name"));
}
@Override
public Tuple2<String, String> getCondition(DwsTransOrgTruckModelTransFinishDayBean bean) {
return Tuple2.of("id", bean.getTruckModelId());
}
},
60, TimeUnit.SECONDS
);
// Get the ID of the transfer center that the organization belongs to
SingleOutputStreamOperator<DwsTransOrgTruckModelTransFinishDayBean> withJoinOrgIdDS = AsyncDataStream.unorderedWait(
withTruckModelDS,
new DimAsyncFunction<DwsTransOrgTruckModelTransFinishDayBean>("dim_base_organ") {
@Override
public void join(DwsTransOrgTruckModelTransFinishDayBean bean, JSONObject dimInfoJsonObj) {
String orgParentId = dimInfoJsonObj.getString("org_parent_id");
bean.setJoinOrgId(orgParentId != null ? orgParentId : bean.getOrgId());
}
@Override
public Tuple2<String, String> getCondition(DwsTransOrgTruckModelTransFinishDayBean bean) {
return Tuple2.of("id", bean.getOrgId());
}
},
60, TimeUnit.SECONDS
);
// Use the transfer center ID to look up the city ID in the organization table
SingleOutputStreamOperator<DwsTransOrgTruckModelTransFinishDayBean> withCityIdDS = AsyncDataStream.unorderedWait(
withJoinOrgIdDS,
new DimAsyncFunction<DwsTransOrgTruckModelTransFinishDayBean>("dim_base_organ") {
@Override
public void join(DwsTransOrgTruckModelTransFinishDayBean bean, JSONObject dimInfoJsonObj) {
bean.setCityId(dimInfoJsonObj.getString("region_id"));
}
@Override
public Tuple2<String, String> getCondition(DwsTransOrgTruckModelTransFinishDayBean bean) {
return Tuple2.of("id", bean.getJoinOrgId());
}
},
60, TimeUnit.SECONDS
);
// Use the city ID to look up the city name in the region table
SingleOutputStreamOperator<DwsTransOrgTruckModelTransFinishDayBean> withCityNameDS = AsyncDataStream.unorderedWait(
withCityIdDS,
new DimAsyncFunction<DwsTransOrgTruckModelTransFinishDayBean>("dim_base_region_info") {
@Override
public void join(DwsTransOrgTruckModelTransFinishDayBean bean, JSONObject dimInfoJsonObj) {
bean.setCityName(dimInfoJsonObj.getString("name"));
}
@Override
public Tuple2<String, String> getCondition(DwsTransOrgTruckModelTransFinishDayBean bean) {
return Tuple2.of("id", bean.getCityId());
}
},
60, TimeUnit.SECONDS
);
// Write the result to ClickHouse
withCityNameDS.print(">>>");
withCityNameDS.addSink(
ClickHouseUtil.getJdbcSink("insert into dws_trans_org_truck_model_trans_finish_day_base values(?,?,?,?,?,?,?,?,?,?,?)")
);
env.execute();
}
}
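The city lookup in this job takes two hops through `dim_base_organ`: first resolve the organization to its parent (or to itself when `org_parent_id` is null), then read that organization's `region_id` as the city ID, and finally look up the city name in `dim_base_region_info`. The sketch below walks the same path over in-memory maps. The class names and all sample rows are made up for illustration; the real lookups go through `DimAsyncFunction`.

```java
import java.util.HashMap;
import java.util.Map;

class OrgCityLookupSketch {
    // Hypothetical in-memory stand-in for one dim_base_organ row.
    static final class Organ {
        final String orgParentId;
        final String regionId;

        Organ(String orgParentId, String regionId) {
            this.orgParentId = orgParentId;
            this.regionId = regionId;
        }
    }

    // Hop 1: a child organization joins through its parent, a first-level one through itself.
    static String joinOrgId(Map<String, Organ> organs, String orgId) {
        String parent = organs.get(orgId).orgParentId;
        return parent != null ? parent : orgId;
    }

    // Hop 2 and 3: region_id of the join organization is the city ID; then fetch its name.
    static String cityName(Map<String, Organ> organs, Map<String, String> regions, String orgId) {
        String cityId = organs.get(joinOrgId(organs, orgId)).regionId;
        return regions.get(cityId);
    }

    public static void main(String[] args) {
        Map<String, Organ> organs = new HashMap<>();
        organs.put("10", new Organ(null, "110100")); // first-level organization, region is a city
        organs.put("25", new Organ("10", "110105")); // child organization, region is a district
        Map<String, String> regions = new HashMap<>();
        regions.put("110100", "Beijing");

        System.out.println(cityName(organs, regions, "25"));
        System.out.println(cityName(organs, regions, "10"));
    }
}
```

Going through the parent matters because a child organization's own `region_id` points at a district, which would not match a city row in the region table.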
CREATE TABLE IF NOT EXISTS dws_trans_org_truck_model_trans_finish_day_base
(
`cur_date` Date COMMENT 'Statistics date',
`org_id` String COMMENT 'Organization ID',
`org_name` String COMMENT 'Organization name',
`truck_model_id` String COMMENT 'Truck model ID',
`truch_model_name` String COMMENT 'Truck model name',
`city_id` String COMMENT 'City ID',
`city_name` String COMMENT 'City name',
`trans_finish_count_base` UInt64 COMMENT 'Number of completed transports',
`trans_finish_distance_base` Decimal(38, 20) COMMENT 'Distance of completed transports',
`trans_finish_dur_time_base` UInt64 COMMENT 'Elapsed time of completed transports',
`ts` UInt64 COMMENT 'Timestamp'
)
ENGINE = MergeTree
ORDER BY (cur_date, org_id, org_name, truck_model_id, truch_model_name, city_id, city_name);
CREATE MATERIALIZED VIEW IF NOT EXISTS dws_trans_org_truck_model_trans_finish_day
(
`cur_date` Date,
`org_id` String,
`org_name` String,
`truck_model_id` String,
`truch_model_name` String,
`city_id` String,
`city_name` String,
`trans_finish_count` AggregateFunction(argMax, UInt64, UInt64),
`trans_finish_distance` AggregateFunction(argMax, Decimal(38, 20), UInt64),
`trans_finish_dur_time` AggregateFunction(argMax, UInt64, UInt64)
)
ENGINE = AggregatingMergeTree()
ORDER BY (cur_date, org_id, org_name, truck_model_id, truch_model_name, city_id, city_name)
SETTINGS index_granularity = 8192 AS
SELECT
cur_date,
org_id,
org_name,
truck_model_id,
truch_model_name,
city_id,
city_name,
argMaxState(trans_finish_count_base, ts) AS trans_finish_count,
argMaxState(trans_finish_distance_base, ts) AS trans_finish_distance,
argMaxState(trans_finish_dur_time_base, ts) AS trans_finish_dur_time
FROM dws_trans_org_truck_model_trans_finish_day_base
GROUP BY
cur_date,
org_id,
org_name,
truck_model_id,
truch_model_name,
city_id,
city_name;
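Each Kafka topic read by these jobs must have the same number of partitions as the Flink parallelism set in the code, which is 4, so that every source subtask receives data.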
kafka-topics.sh --bootstrap-server hadoop102:9092 --alter --topic tms_dwd_trade_order_detail --partitions 4
kafka-topics.sh --bootstrap-server hadoop102:9092 --alter --topic tms_dwd_trans_bound_finish_detail --partitions 4
kafka-topics.sh --bootstrap-server hadoop102:9092 --alter --topic tms_dwd_trans_dispatch_detail --partitions 4
kafka-topics.sh --bootstrap-server hadoop102:9092 --alter --topic tms_dwd_trans_deliver_detail --partitions 4
kafka-topics.sh --bootstrap-server hadoop102:9092 --alter --topic tms_dwd_trans_trans_finish --partitions 4
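The reason for matching the partition count to the parallelism can be illustrated with a deliberately simplified model, in which partition `p` is read by source subtask `p % parallelism`. This is an assumption made for the sketch, not Flink's actual assignment code. A subtask that owns no partition emits no records, and a downstream window operator only advances its event time as far as its slowest input, unless the watermark strategy marks idle inputs with `withIdleness`. `PartitionSpreadSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionSpreadSketch {
    // Simplified model (an assumption): partition p is read by subtask p % parallelism.
    static List<Integer> idleSubtasks(int partitions, int parallelism) {
        boolean[] busy = new boolean[parallelism];
        for (int p = 0; p < partitions; p++) {
            busy[p % parallelism] = true;
        }
        List<Integer> idle = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            if (!busy[i]) {
                idle.add(i);
            }
        }
        return idle;
    }

    public static void main(String[] args) {
        System.out.println("1 partition, parallelism 4: idle subtasks " + idleSubtasks(1, 4));
        System.out.println("4 partitions, parallelism 4: idle subtasks " + idleSubtasks(4, 4));
    }
}
```

Note that `DwsTransOrgReceiveDay` reads `tms_dwd_trans_receive_detail` and its watermark strategy has no `withIdleness`, so that topic likely needs the same partition check even though it is not in the command list above.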
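Start HDFS, ZooKeeper, Kafka, HBase, Redis, and ClickHouse.
Because we test all seven tables at once, the ODS-layer and DWD-layer applications, four in total, must all be running:
OdsApp, DwdBoundRelevantApp, DwdOrderRelevantApp, and DwdTransTransFinish.
You can start one DWS application at a time, generate data, and check ClickHouse, or start all seven and generate data just once. Some kinds of records are rare, so you may need to generate data several times.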
Test query (dws_trade_cargo_type_order_day):
select
cur_date,
cargo_type,
cargo_type_name,
argMaxMerge(order_amount) as order_amount,
argMaxMerge(order_count) as order_count
from dws_trade_cargo_type_order_day
group by cur_date,
cargo_type,
cargo_type_name
LIMIT 10;
Test query (dws_trade_org_order_day):
select
cur_date,
org_id,
org_name,
city_id,
city_name,
argMaxMerge(order_amount) as order_amount,
argMaxMerge(order_count) as order_count
from dws_trade_org_order_day
group by cur_date,
org_id,
org_name,
city_id,
city_name
LIMIT 10;
Test query (dws_trans_bound_finish_day):
select
cur_date,
argMaxMerge(bound_finish_order_count) as bound_finish_order_count
from dws_trans_bound_finish_day
group by cur_date
LIMIT 10;
Test query (dws_trans_dispatch_day):
select
cur_date,
argMaxMerge(dispatch_order_count) as dispatch_order_count
from dws_trans_dispatch_day
group by cur_date
LIMIT 10;
Test query (dws_trans_org_deliver_suc_day):
SELECT
cur_date,
org_id,
org_name,
city_id,
city_name,
province_id,
province_name,
argMaxMerge(deliver_suc_count) AS deliver_suc_count
FROM dws_trans_org_deliver_suc_day
GROUP BY
cur_date,
org_id,
org_name,
city_id,
city_name,
province_id,
province_name
LIMIT 10;
Test query (dws_trans_org_receive_day):
SELECT
cur_date,
org_id,
org_name,
city_id,
city_name,
province_id,
province_name,
argMaxMerge(receive_order_count) AS receive_order_count
FROM dws_trans_org_receive_day
GROUP BY
cur_date,
org_id,
org_name,
city_id,
city_name,
province_id,
province_name
LIMIT 10;
Test query (dws_trans_org_truck_model_trans_finish_day):
SELECT
cur_date,
org_id,
org_name,
truck_model_id,
truch_model_name,
city_id,
city_name,
argMaxMerge(trans_finish_count) AS trans_finish_count,
argMaxMerge(trans_finish_distance) AS trans_finish_distance,
argMaxMerge(trans_finish_dur_time) AS trans_finish_dur_time
FROM dws_trans_org_truck_model_trans_finish_day
GROUP BY
cur_date,
org_id,
org_name,
truck_model_id,
truch_model_name,
city_id,
city_name
LIMIT 10;
That completes the DWS layer of the real-time data warehouse. All of the code has been pushed to GitHub.
URL: https://github.com/lcc-666/tms-parent