自定义

BI集成
我们感动!为了改善客户体验,Collibra数据质量用户指南已转移到新万博移动客户端 新万博移动客户端Collibra文档中心 作为Collibra数据质量2新万博移动客户端022.11版本的一部分。为了确保平稳过渡, dq-docs.新万博移动客户端collibra.com 仍然可以访问,但DQ用户指南现在只在文档中心进行维护。
可以通过将您最喜欢的BI工具连接到底层报表集市来利用自定义报表。下面是一些可以作为构建自己的报告灵感的查询。详情请参阅 ERD图 获取更大的表列表。
长期运行的作业
从dataset_activity中选择dataset,run_id,total_time,其中total_time不为空
提交的工作
从owlcheck_q中选择*
按用户分类的工作
选择count(*) as owlchecks,用户名from owlcheck_q where updt_ts < now() group by username order by owlchecks desc
按用户、数据集划分的作业
在dev.public中选择count(*), user_nm, dataset。Owl_check_history组按user_nm排列,数据集按计数顺序排列
按行数最大
Select dataset,rc as row_count from dataset_scan order by rc desc
按月分类的工作岗位
用grp as (select date_trunc('MONTH', run_id) as by_month from dataset_scan where run_id < now()) select count(*) as owlchecks, by_month from grp group by by_month order by by_month desc
用户划分规则
选择count(*)作为规则,从owl_rule组中选择user_nm by user_nm order by rules desc
按Spark(Cluster)使用率
Select * from opt_spark order by num_executors desc
来自代理的job id
从agent_q中选择remote_job_id,其中remote_job_id不为空
数据集的活动
从dataset_activity中选择dataset,run_id,total_time,其中total_time不为空
具有丰富指标的工作
select A.dataset, A.run_id, total_time from dataset_activity where total_time is not null order by total_time desc limit 100),扫描为(select * from dataset_scan where dataset in (select dataset from activity)),配置为(select * from opt_spark where dataset in (select dataset from activity)),模式为(select count(*) as col_cnt, dataset from dataset_schema where dataset in (select dataset from activity) group by dataset) select A.dataset, A.run_id, C.total_time, A.rc,D.col_cnt, B.driver_memory, B.num_executor,B. col_cnt, B.driver_memoryA. executor_cores, B.executor_memory, B.master FROM扫描A. INNER JOIN配置B ON A.dataset = B.dataset INNER JOIN活动C ON A.dataset = C.dataset and A.run_id = C.run_id INNER JOIN模式D ON A.dataset = D.dataset ORDER BY C.total_time
就业机会。加载时间和资源狗万新闻c
活动为(select dataset,run_id,total_time from public.)Dataset_activity where total_time不是null order by total_time),扫描为(select * from public)。Dataset_scan where dataset in (select dataset from activity)),配置为(select * from public)。Opt_spark where dataset in (select dataset from activity)), schema as (select count(*) as col_cnt, dataset from public。select A.dataset, A.run_id, A.updt_ts, C.total_time, A.rc, D.col_cnt, B.driver_memory, B.num_executors,B. datet_schema where dataset in (select dataset from activity) group by dataset)A. executor_cores, B.executor_memory, B.master FROM扫描A. INNER JOIN配置B ON A.dataset = B.dataset INNER JOIN活动C ON A.dataset = C.dataset and A.run_id = C.run_id INNER JOIN模式D ON A.dataset = D.dataset ORDER BY A.updt_ts desc limit
数据集扫描和评分模式
从public中选择*。Dataset_scan where dataset like 'public.%';
数据集扫描和按名称评分
从public中选择*。Dataset_scan where dataset ='public.atm_customer';
按模式按月扫描-“公共”
select dataset, DATE_TRUNC('MONTH', run_id)作为run_id, count(*)作为Total_Scans from dataset_scan where dataset like 'public%' group by dataset, run_id order by run_id asc
规则超过30天
select * from rule_output where run_id < NOW() - INTERVAL '30天';
计划作业队列
从public.owlcheck_q中选择job_id,agent_id,dataset,run_id,status,activity,start_time;
数据集架构中的列计数
Select dataset, count(*) from dataset_schema group by dataset;
分析统计数据
select dataset, run_id, field_nm, (null_ratio * 100) as null_percent, (empty_ratio * 100) as empty_percent, ROUND(CAST((100 - (null_ratio * 100) + (empty_ratio * 100)) as numeric), 3) as完整性from public。dataset_field where updt_ts > '2020-06-01'和dataset = 'ProcessOrder'和run_id > '2021-03-17 00:00:00+00'顺序根据完整性描述
元数据/模式/数据类型
Select * from public.dataset_schema;
配置文件数据
Select * from public.dataset_field;
查找相似的列
select不同的数据集,field_nm, max_abs from dataset_field where max_abs = '无线电信'
相同的列名
从dataset_field中选择不同的数据集,field_nm = 'authenticated_user'
类似的列名
从dataset_field中选择不同的数据集field_nm,如“%id%”
行为研究
从数据集='esg_data'的行为中选择*
Postgres Stats中Schema的所有列
SELECT table_name FROM information_schematable WHERE table_schema = 'public' ORDER BY table_name