I'm looking for a way to find the row count for all of my tables in Postgres. I know I can do this one table at a time with:

SELECT count(*) FROM table_name;

But I'd like to see the row count for all the tables and then order by that, to get an idea of how big all my tables are.


Current answer

Here is a simpler approach.

tables="$(echo '\dt' | psql -U "${PGUSER}" | tail -n +4 | head -n-2 | tr -d ' ' | cut -d '|' -f2)"
for table in $tables; do
    printf "%s: %s\n" "$table" "$(echo "SELECT COUNT(*) FROM $table;" | psql -U "${PGUSER}" | tail -n +3 | head -n-2 | tr -d ' ')"
done

The output should look like this:

auth_group: 0
auth_group_permissions: 0
auth_permission: 36
auth_user: 2
auth_user_groups: 0
auth_user_user_permissions: 0
authtoken_token: 2
django_admin_log: 0
django_content_type: 9
django_migrations: 22
django_session: 0
mydata_table1: 9011
mydata_table2: 3499

Update the psql -U "${PGUSER}" part as needed to reach your database.

Note that the head -n-2 syntax (drop the last two lines) may not work on macOS; you may need a different implementation there.
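One way around the head -n-2 portability issue is to filter the \dt output by content instead of by line position. The following is a minimal sketch, not part of the original answer: dt_table_names is a hypothetical helper name, and it assumes the default aligned \dt layout with "table" in the Type column (so partitioned tables, whose type reads differently, are skipped).

```shell
# Hypothetical helper: read aligned `\dt` output on stdin and print one table
# name per line. Only rows whose third column is exactly "table" are kept, so
# the title, header, separator and "(N rows)" footer need no head/tail trimming.
dt_table_names() {
    awk -F'|' 'NF >= 4 && $3 ~ /^ *table *$/ { gsub(/ /, "", $2); print $2 }'
}

# Assumed usage (needs a reachable database, so it is left commented out):
# tables="$(echo '\dt' | psql -U "${PGUSER}" | dt_table_names)"
```

Because the filter relies only on POSIX awk, it should behave the same with GNU and BSD userlands.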

Tested on psql (PostgreSQL) 11.2 under CentOS 7.


If you want the output sorted by row count, wrap the loop with sort:

for table in $tables; do
    printf "%s: %s\n" "$table" "$(echo "SELECT COUNT(*) FROM $table;" | psql -U "${PGUSER}" | tail -n +3 | head -n-2 | tr -d ' ')"
done | sort -k 2,2nr

Output:

mydata_table1: 9011
mydata_table2: 3499
auth_permission: 36
django_migrations: 22
django_content_type: 9
authtoken_token: 2
auth_user: 2
auth_group: 0
auth_group_permissions: 0
auth_user_groups: 0
auth_user_user_permissions: 0
django_admin_log: 0
django_session: 0

Other answers

Two simple steps (note: there is no need to change anything, just copy and paste):

1. Create the function

create function cnt_rows(schema text, tablename text) returns bigint
as
$body$
declare
  result bigint;  -- count() returns bigint, so integer could overflow
  query text;
begin
  -- %I quotes each identifier, so mixed-case or unusual names still work
  query := format('SELECT count(1) FROM %I.%I', schema, tablename);
  execute query into result;
  return result;
end;
$body$
language plpgsql;

2. Run this query to get the total number of rows across all tables

select sum(cnt_rows) as total_no_of_rows from (select 
  cnt_rows(table_schema, table_name)
from information_schema.tables
where 
  table_schema not in ('pg_catalog', 'information_schema') 
  and table_type='BASE TABLE') as subq;

Or, to get the row count per table:

select
  table_schema,
  table_name, 
  cnt_rows(table_schema, table_name)
from information_schema.tables
where 
  table_schema not in ('pg_catalog', 'information_schema') 
  and table_type='BASE TABLE'
order by 3 desc;

There are three ways to get this sort of count, each with its own tradeoffs.

If you want a true count, you have to execute the SELECT statement like the one you used against each table. This is because PostgreSQL keeps row visibility information in the row itself, not anywhere else, so any accurate count can only be relative to some transaction. You're getting a count of what that transaction sees at the point in time when it executes. You could automate this to run against every table in the database, but you probably don't need that level of accuracy or want to wait that long.

WITH tbl AS
  (SELECT table_schema,
          TABLE_NAME
   FROM information_schema.tables
   WHERE TABLE_NAME not like 'pg_%'
     AND table_schema in ('public'))
SELECT table_schema,
       TABLE_NAME,
       (xpath('/row/c/text()', query_to_xml(format('select count(*) as c from %I.%I', table_schema, TABLE_NAME), FALSE, TRUE, '')))[1]::text::int AS rows_n
FROM tbl
ORDER BY rows_n DESC;

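The second approach notes that the statistics collector tracks roughly how many rows are "live" (not deleted or obsoleted by later updates) at any time. This value can be off by a bit under heavy activity, but is generally a good estimate: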

SELECT schemaname,relname,n_live_tup 
  FROM pg_stat_user_tables 
ORDER BY n_live_tup DESC;

That can also show you how many rows are dead, which is itself an interesting number to monitor.
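To look at dead rows next to live ones, the same view can be queried with its n_dead_tup column. A minimal sketch, assuming you feed the SQL to psql from a shell; live_dead_sql is a hypothetical helper name, and the connection options in the usage line are placeholders:

```shell
# Hypothetical helper: print a query that lists live and dead row estimates
# per table, ordered with the most dead rows first.
live_dead_sql() {
    cat <<'SQL'
SELECT schemaname, relname, n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
 ORDER BY n_dead_tup DESC;
SQL
}

# Assumed usage (needs a reachable database, so it is left commented out):
# live_dead_sql | psql -U "${PGUSER}"
```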

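The third approach is to note that the system ANALYZE command, which has been executed regularly by the autovacuum process since PostgreSQL 8.3 to update table statistics, also computes a row estimate. You can grab that one like this: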

SELECT 
  nspname AS schemaname,relname,reltuples
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE 
  nspname NOT IN ('pg_catalog', 'information_schema') AND
  relkind='r' 
ORDER BY reltuples DESC;

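Which of these two estimate queries is better to use is hard to say. Normally I make that decision based on whether there is more useful information I also want to use inside pg_class or inside pg_stat_user_tables. For basic counting purposes, just to see how big things are in general, either should be accurate enough.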

A simple, practical answer for people trying to evaluate which Heroku plan they need who can't wait for Heroku's slow row counter to refresh:

Basically, you want to run \dt in psql and copy the results into your favorite text editor (it will look like this:

 public | auth_group                     | table | axrsosvelhutvw
 public | auth_group_permissions         | table | axrsosvelhutvw
 public | auth_permission                | table | axrsosvelhutvw
 public | auth_user                      | table | axrsosvelhutvw
 public | auth_user_groups               | table | axrsosvelhutvw
 public | auth_user_user_permissions     | table | axrsosvelhutvw
 public | background_task                | table | axrsosvelhutvw
 public | django_admin_log               | table | axrsosvelhutvw
 public | django_content_type            | table | axrsosvelhutvw
 public | django_migrations              | table | axrsosvelhutvw
 public | django_session                 | table | axrsosvelhutvw
 public | exercises_assignment           | table | axrsosvelhutvw

), then run a regex search and replace like this:

^[^|]*\|\s+([^|]*?)\s+\| table \|.*$

to:

select '\1', count(*) from \1 union/g

This will yield something very similar to this:

select 'auth_group', count(*) from auth_group union
select 'auth_group_permissions', count(*) from auth_group_permissions union
select 'auth_permission', count(*) from auth_permission union
select 'auth_user', count(*) from auth_user union
select 'auth_user_groups', count(*) from auth_user_groups union
select 'auth_user_user_permissions', count(*) from auth_user_user_permissions union
select 'background_task', count(*) from background_task union
select 'django_admin_log', count(*) from django_admin_log union
select 'django_content_type', count(*) from django_content_type union
select 'django_migrations', count(*) from django_migrations union
select 'django_session', count(*) from django_session
;

(You'll need to remove the last union and add the semicolon at the end manually.)

Run it in psql and you're done.

            ?column?            | count
--------------------------------+-------
 auth_group_permissions         |     0
 auth_user_user_permissions     |     0
 django_session                 |  1306
 django_content_type            |    17
 auth_user_groups               |   162
 django_admin_log               |  9106
 django_migrations              |    19
[..]
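The search-and-replace step can also be scripted instead of done by hand in an editor. The following is a minimal sketch, not part of the original answer: dt_to_count_sql is a hypothetical helper name, it assumes the default aligned \dt layout shown above, and it joins the statements with union all so that no trailing union has to be removed manually. Table names are emitted unquoted, so names with upper-case letters or special characters would need extra quoting.

```shell
# Hypothetical helper: turn aligned `\dt` output (stdin) into one
# "select '<name>', count(*) from <name>" line per table, joined with
# "union all" and terminated with a semicolon.
dt_to_count_sql() {
    awk -F'|' '
        NF >= 4 && $3 ~ /^ *table *$/ {
            gsub(/ /, "", $2)
            if (n++) print " union all"    # finish the previous statement line
            printf "select '\''%s'\'', count(*) from %s", $2, $2
        }
        END { if (n) print ";" }
    '
}

# Assumed usage (needs a reachable database, so it is left commented out):
# psql -c '\dt' | dt_to_count_sql | psql
```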

If you are in the psql shell, \gexec lets you execute the syntax described in syed's answer and Aur's answer without manual edits in an external text editor.

with x (y) as (
    select
        'select count(*), '''||
        tablename||
        ''' as "tablename" from '||
        tablename||' '
    from pg_tables
    where schemaname='public'
)
select
    string_agg(y,' union all '||chr(10)) || ' order by tablename'
from x \gexec

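Note that string_agg() is used both to place union all between the statements and to collapse the separate data rows into a single unit to be passed into the buffer.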

\gexec sends the current query buffer to the server, then treats each column of each row of the query's output (if any) as an SQL statement to be executed.
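Since \gexec runs every output row as its own statement, a variant without string_agg() is also possible; it produces one small result set per table rather than a single sorted one. A minimal sketch under that assumption, with the SQL wrapped in a hypothetical shell helper (gexec_counts_sql) so it can be piped into psql; format() with %I and %L is used to quote the names:

```shell
# Hypothetical helper: print a psql script that generates one count(*)
# statement per table in the public schema and runs each via \gexec.
gexec_counts_sql() {
    cat <<'SQL'
select format('select %L as tablename, count(*) from %I.%I',
              tablename, schemaname, tablename)
from pg_tables
where schemaname = 'public'
order by tablename \gexec
SQL
}

# Assumed usage (needs a reachable database, so it is left commented out):
# gexec_counts_sql | psql -U "${PGUSER}"
```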

Not sure if an answer in bash is acceptable to you, but FWIW...

PGCOMMAND=" psql -h localhost -U fred -d mydb -At -c \"
            SELECT   table_name
            FROM     information_schema.tables
            WHERE    table_type='BASE TABLE'
            AND      table_schema='public'
            \""
TABLENAMES=$(export PGPASSWORD=test; eval "$PGCOMMAND")

for TABLENAME in $TABLENAMES; do
    PGCOMMAND=" psql -h localhost -U fred -d mydb -At -c \"
                SELECT   '$TABLENAME',
                         count(*) 
                FROM     $TABLENAME
                \""
    eval "$PGCOMMAND"
done
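The same loop can be written without eval by passing the connection options straight through to psql. This is a minimal sketch rather than the original author's script: count_public_tables is a hypothetical function name, and it assumes table names contain no double quotes or newlines.

```shell
# Hypothetical helper: print "<table> <count>" for every base table in the
# public schema. All arguments are forwarded to psql as connection options.
count_public_tables() {
    psql -At "$@" -c "SELECT table_name FROM information_schema.tables
                      WHERE table_type = 'BASE TABLE' AND table_schema = 'public'" |
    while IFS= read -r table; do
        # Double quotes keep mixed-case names intact; stdin is redirected so
        # the inner psql cannot consume the list of remaining table names.
        count="$(psql -At "$@" -c "SELECT count(*) FROM public.\"$table\"" </dev/null)"
        printf '%s %s\n' "$table" "$count"
    done
}

# Assumed usage, reusing the placeholder connection values from above:
# PGPASSWORD=test count_public_tables -h localhost -U fred -d mydb
```

Piping the result through sort -k 2,2nr orders the tables by row count, largest first.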