Consider the following four percentages, represented as floating-point numbers:

    13.626332%
    47.989636%
     9.596008%
    28.788024%
   -----------
   100.000000%

I need to represent these percentages as whole numbers. If I simply use Math.round(), I end up with a total of 101%.

14 + 48 + 10 + 29 = 101

If I use parseInt(), I end up with a total of 97%.

13 + 47 + 9 + 28 = 97

What is a good algorithm to represent any set of percentages as whole numbers while still keeping the total at 100%?
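A quick Python check that reproduces the two totals above. `math.trunc` stands in for parseInt() (they agree for positive numbers), and Python's `round()` agrees with Math.round() here only because none of these values sits exactly on .5; Python breaks such ties toward the even integer.

```python
import math

values = [13.626332, 47.989636, 9.596008, 28.788024]

# Round each value on its own (no value here sits exactly on .5,
# so Python's round() agrees with Math.round())
rounded = [round(v) for v in values]
# Drop the fractional part, like parseInt() does for positive numbers
truncated = [math.trunc(v) for v in values]

print(rounded, sum(rounded))      # [14, 48, 10, 29] 101
print(truncated, sum(truncated))  # [13, 47, 9, 28] 97
```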


Edit: After reading some of the comments and answers, there are clearly many ways to go about solving this.

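In my mind, to stay true to the numbers, the "right" result is the one that minimizes the overall error, defined as how much error rounding would introduce relative to the actual value: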

        value  rounded     error               decision
   ----------------------------------------------------
    13.626332       14      2.7%          round up (14)
    47.989636       48      0.0%          round up (48)
     9.596008       10      4.2%    don't round up  (9)
    28.788024       29      0.7%          round up (29)

In the case of a tie (3.33, 3.33, 3.33), an arbitrary decision can be made (e.g. 3, 4, 3).
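The error column can be recomputed, and one way to turn it into the decision column is sketched below. `round_by_relative_error` is a hypothetical heuristic, not something stated in the question: round every value to the nearest integer, then, while the total is off, move back the values whose rounding introduced the largest relative error. It assumes all values are positive (the relative error divides by the value).

```python
values = [13.626332, 47.989636, 9.596008, 28.788024]


def relative_errors(values):
    """Relative error, in percent, of rounding each (positive) value to the nearest integer."""
    return [abs(round(v) - v) / v * 100 for v in values]


def round_by_relative_error(values, total=100):
    """Hypothetical heuristic: round to nearest, then fix the total by undoing
    the roundings that introduced the largest relative error."""
    rounded = [round(v) for v in values]
    errors = relative_errors(values)
    excess = sum(rounded) - total
    step = 1 if excess > 0 else -1
    # Only values rounded in the direction of the excess can be moved back:
    # rounded up when the total is too high, rounded down when it is too low.
    candidates = [i for i, v in enumerate(values) if (rounded[i] - v) * step > 0]
    # Worst relative error first; the sort is stable, so ties keep input order
    candidates.sort(key=lambda i: errors[i], reverse=True)
    for i in candidates[:abs(excess)]:
        rounded[i] -= step
    return rounded


print([f"{e:.1f}%" for e in relative_errors(values)])  # ['2.7%', '0.0%', '4.2%', '0.7%']
print(round_by_relative_error(values))                  # [14, 48, 9, 29]
```

For the tie case, `round_by_relative_error([3.33, 3.33, 3.33], total=10)` gives 4 to the first value simply because the stable sort keeps input order; any other choice would be equally valid.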


Current answer

I've implemented the method from Varun Vohra's answer here, for both lists and dictionaries.

import math
import numbers
import operator
import itertools


def round_list_percentages(number_list):
    """
    Takes a list where all values are numbers that add up to 100,
    and rounds them off to integers while still retaining a sum of 100.

    A total value sum that rounds to 100.00 with two decimals is acceptable.
    This ensures that all input where the values are calculated with [fraction]/[total]
    and the sum of all fractions equal the total, should pass.
    """
    # Check input
    if not all(isinstance(i, numbers.Number) for i in number_list):
        raise ValueError('All values of the list must be a number')

    # Generate a key for each value
    key_generator = itertools.count()
    value_dict = {next(key_generator): value for value in number_list}
    # Dicts keep insertion order (Python 3.7+), so the values come back in list order
    return list(round_dictionary_percentages(value_dict).values())


def round_dictionary_percentages(dictionary):
    """
    Takes a dictionary where all values are numbers that add up to 100,
    and rounds them off to integers while still retaining a sum of 100.

    A total value sum that rounds to 100.00 with two decimals is acceptable.
    This ensures that all input where the values are calculated with [fraction]/[total]
    and the sum of all fractions equal the total, should pass.
    """
    # Check input
    # Only allow numbers
    if not all(isinstance(i, numbers.Number) for i in dictionary.values()):
        raise ValueError('All values of the dictionary must be a number')
    # Make sure the sum is close enough to 100
    # Round value_sum to 2 decimals to avoid floating point representation errors
    value_sum = round(sum(dictionary.values()), 2)
    if not value_sum == 100:
        raise ValueError('The sum of the values must be 100')

    # Initial floored results
    # Does not add up to 100, so we need to add something
    result = {key: int(math.floor(value)) for key, value in dictionary.items()}

    # Remainders for each key
    result_remainders = {key: value % 1 for key, value in dictionary.items()}
    # Keys sorted by remainder (biggest first)
    sorted_keys = [key for key, value in sorted(result_remainders.items(), key=operator.itemgetter(1), reverse=True)]

    # Add 1 to the keys with the largest remainders until the total reaches 100.
    # One pass is enough: flooring removes less than 1 per item, so the shortfall
    # is never larger than the number of items.
    for key in sorted_keys:
        if sum(result.values()) == 100:
            break
        result[key] += 1

    # Return
    return result

Other answers

Here is a Ruby gem that implements the largest remainder method: https://github.com/jethroo/lare_round

Usage:

require 'bigdecimal'
require 'lare_round'

a = Array.new(3) { BigDecimal('0.3334') }
# => [#<BigDecimal:887b6c8,'0.3334E0',9(18)>, #<BigDecimal:887b600,'0.3334E0',9(18)>, #<BigDecimal:887b4c0,'0.3334E0',9(18)>]
a = LareRound.round(a,2)
# => [#<BigDecimal:8867330,'0.34E0',9(36)>, #<BigDecimal:8867290,'0.33E0',9(36)>, #<BigDecimal:88671f0,'0.33E0',9(36)>]
a.reduce(:+).to_f
# => 1.0
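For readers not using Ruby, here is a rough Python analogue of the call above built on the standard `decimal` module. It is an assumption about what the gem does (largest remainder at a given number of decimal places), not its actual code, and `largest_remainder_round` is a hypothetical name.

```python
from decimal import Decimal, ROUND_FLOOR


def largest_remainder_round(values, places):
    """Round Decimal values to `places` decimals so that they still add up to
    their exact total rounded to `places` decimals (largest remainder method)."""
    quantum = Decimal(1).scaleb(-places)  # one unit in the last place, 10**-places
    floored = [v.quantize(quantum, rounding=ROUND_FLOOR) for v in values]
    target = sum(values).quantize(quantum)
    missing = int((target - sum(floored)) / quantum)
    # Hand one quantum to each of the values with the largest remainders
    order = sorted(range(len(values)), key=lambda i: values[i] - floored[i], reverse=True)
    for i in order[:missing]:
        floored[i] += quantum
    return floored


a = [Decimal("0.3334")] * 3
print(largest_remainder_round(a, 2))  # [Decimal('0.34'), Decimal('0.33'), Decimal('0.33')]
```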

There are many ways to do just this, provided you are not concerned about reliance on the original decimal data.

The first and perhaps most popular method is the Largest Remainder Method,

which is basically:

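1. Round everything down
2. Get the difference between the sum and 100
3. Distribute the difference by adding 1 to items in decreasing order of their decimal parts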

In your case, it would go like this:

13.626332%
47.989636%
 9.596008%
28.788024%

If you take the integer parts, you get

13
47
 9
28

which add up to 97, so you need to add 3 more. Now, look at the decimal parts, which are

.626332%
.989636%
.596008%
.788024%

and take the largest ones until the total reaches 100. So you would get:

14
48
 9
29
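The three steps above can be sketched in Python as follows. `largest_remainder` is a hypothetical helper name, and the sketch assumes the inputs already add up to `total`.

```python
import math


def largest_remainder(values, total=100):
    """Largest remainder method: floor, then distribute the shortfall."""
    # 1. Round everything down
    floored = [math.floor(v) for v in values]
    # 2. How far the floored sum falls short of the total
    missing = total - sum(floored)
    # 3. Add 1 to items in decreasing order of their fractional parts
    by_remainder = sorted(range(len(values)), key=lambda i: values[i] - floored[i], reverse=True)
    for i in by_remainder[:missing]:
        floored[i] += 1
    return floored


print(largest_remainder([13.626332, 47.989636, 9.596008, 28.788024]))  # [14, 48, 9, 29]
```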

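Alternatively, you could simply choose to show one decimal place instead of integers, so the numbers would be 13.6, 48.0, 9.6 and 28.8. This greatly reduces the deviation from 100, since each displayed value is now off by at most 0.05 instead of 0.5.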

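Probably the "best" way to do this (quoted because "best" is a subjective term) is to keep a running (non-integral) tally of where you are, and round that value.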

Then use that along with the history to work out what value should be used. For example, using the values you gave:

Value      CumulValue  CumulRounded  PrevBaseline  Need
---------  ----------  ------------  ------------  ----
                                  0
13.626332   13.626332            14             0    14 ( 14 -  0)
47.989636   61.615968            62            14    48 ( 62 - 14)
 9.596008   71.211976            71            62     9 ( 71 - 62)
28.788024  100.000000           100            71    29 (100 - 71)
                                                    ---
                                                    100

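At each stage, you don't round the number itself. Instead, you round the accumulated value and work out the best integer that reaches that value from the previous baseline, where the baseline is the (rounded) cumulative value of the previous row.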

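This works because you're not losing information at each stage but rather using it more intelligently. The "correct" rounded values are in the final column, and you can see that they sum to 100.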

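You can see the difference between this and blindly rounding each value in the third value above. While 9.596008 would normally round up to 10, the accumulated 71.211976 correctly rounds down to 71, which means only 9 is needed to get from the previous baseline of 62 to 71.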
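Why the Need column always adds up to 100 can be written out. With v_k the k-th value, C_k its running total and r_k the integer in the Need column (symbols introduced here for illustration), the sum telescopes to round(C_n), which is 100 as long as the final running total is within 0.5 of 100; and since v_k = C_k - C_{k-1}, each r_k stays within 1 of v_k:

```latex
\begin{aligned}
C_k &= \sum_{j=1}^{k} v_j, \qquad C_0 = 0, \qquad r_k = \operatorname{round}(C_k) - \operatorname{round}(C_{k-1}) \\
\sum_{k=1}^{n} r_k &= \operatorname{round}(C_n) - \operatorname{round}(C_0) = \operatorname{round}(C_n) = 100 \\
\lvert r_k - v_k \rvert &\le \lvert \operatorname{round}(C_k) - C_k \rvert + \lvert \operatorname{round}(C_{k-1}) - C_{k-1} \rvert \le \tfrac{1}{2} + \tfrac{1}{2} = 1
\end{aligned}
```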


This also works for "problematic" sequences such as three values of roughly 1/3 each, where exactly one of them should be rounded up:

Value      CumulValue  CumulRounded  PrevBaseline  Need
---------  ----------  ------------  ------------  ----
                                  0
33.333333   33.333333            33             0    33 ( 33 -  0)
33.333333   66.666666            67            33    34 ( 67 - 33)
33.333333   99.999999           100            67    33 (100 - 67)
                                                    ---
                                                    100
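The running-tally approach can be sketched in Python like this; `cumulative_round` is a hypothetical name. It relies on Python's built-in `round()`, which sends exact .5 ties to the even integer, so a running total landing exactly on .5 may be resolved differently than with round-half-up.

```python
def cumulative_round(values):
    """Round running totals instead of individual values.

    Each output is the rounded running total minus the previous rounded
    running total (the baseline), so the outputs add up to round(sum(values)).
    """
    result = []
    cumulative = 0.0
    baseline = 0
    for v in values:
        cumulative += v
        rounded = round(cumulative)
        result.append(rounded - baseline)  # the "Need" column
        baseline = rounded
    return result


print(cumulative_round([13.626332, 47.989636, 9.596008, 28.788024]))  # [14, 48, 9, 29]
print(cumulative_round([33.333333, 33.333333, 33.333333]))            # [33, 34, 33]
```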

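For those who have the percentages in a pandas Series, here is my implementation of the largest remainder method (as in Varun Vohra's answer), where you can even choose the number of decimals to round to.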

import numpy as np
import pandas as pd

def largestRemainderMethod(pd_series, decimals=1):
    scale = 10**decimals

    # Round everything down, working in units of 10**-decimals
    floor_series = np.floor(pd_series * scale)
    # Units still missing to reach 100; int(round(...)) absorbs float noise
    diff = int(round(100 * scale - floor_series.sum()))
    # Fractional parts, largest first
    series_decimals = pd_series * scale - floor_series
    series_sorted_by_decimals = series_decimals.sort_values(ascending=False)

    # Give one extra unit to each of the `diff` largest fractional parts
    extra = pd.Series(0, index=pd_series.index)
    extra.loc[series_sorted_by_decimals.index[:max(diff, 0)]] = 1

    # Scale back; the result keeps the order and index of pd_series
    out_series = (floor_series + extra) / scale

    return out_series

My JS implementation of Varun Vohra's highly voted answer:

const set1 = [13.626332, 47.989636, 9.596008, 28.788024];
// const set2 = [24.25, 23.25, 27.25, 25.25];

const values = set1;

console.log('Total: ', values.reduce((accum, each) => accum + each));
console.log('Incorrectly Rounded: ', 
  values.reduce((accum, each) => accum + Math.round(each), 0));

const adjustValues = (values) => {
  // 1. Separate integer and decimal part
  // 2. Store both in a new array of objects sorted by decimal part descending
  // 3. Add in original position to "put back" at the end
  const flooredAndSortedByDecimal = values.map((value, position) => (
    {
        floored: Math.floor(value),
        decimal: value - Math.floor(value),
        position
    }
  )).sort(({decimal}, {decimal: otherDecimal}) => otherDecimal - decimal);

  const flooredTotal = values.reduce((total, value) => total + Math.floor(value), 0);
  let availableForDistribution = 100 - flooredTotal;

  // Add 1 to each value from what's available
  const adjustedValues = flooredAndSortedByDecimal.map(value => {
    const { floored, ...rest } = value;
    let finalPercentage = floored;
    if(availableForDistribution > 0){
        finalPercentage = floored + 1;
        availableForDistribution--;
    }

    return {
        finalPercentage,
        ...rest
    }
  });

  // Put back and return the new values
  return adjustedValues
    .sort(({position}, {position: otherPosition}) => position - otherPosition)
    .map(({finalPercentage}) => finalPercentage);
}

const finalPercentages = adjustValues(values);
console.log({finalPercentages})

// { finalPercentages: [ 14, 48, 9, 29 ] }