Machine Learning and the Theory of Constraints

[Figure: online/offline]

Should we sacrifice accuracy to simplify our model? No. A large number of features may produce unstable predictions and overfitting, so the dimension can often be reduced while accuracy increases. A complex system can be described by a small number of parameters because its parts are nonlinearly tied together: nonlinearity means simplification, not complication. This idea was noted by Eliyahu Goldratt, the founder of the Theory of Constraints (TOC). If acquisition and conversion are closely connected, then a linear decomposition is not possible, and we may select a single parameter (acquisition or conversion) as the optimum of efficiency/cost.

Machine learning (ML) models open a window for realistic dimension reduction and suggest how to find the most effective strategy. An unambiguous {strategy <=> metric} mapping simplifies the task. ML combines feature weighting with realistically nonlinear outputs such as a logistic function or a neural network. The approach is based on a practical question: what set of metrics/strategies is enough to predict the business goal with acceptable accuracy, say 90%? The efficiency of a strategy (S1) is assumed to be a function of the predictive power of its metric (M1).
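
As a rough illustration of "predictive power of a metric set", here is a minimal sketch using scikit-learn's logistic regression on synthetic data; the metrics, the binarized goal, and the 5-fold scoring are illustrative assumptions, not the author's original code:

```python
# Minimal sketch (synthetic data): score how well a set of metrics
# predicts a binarized business goal with one fixed nonlinear model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                 # 5 metrics: acquisition, conversion, ...
y = (X[:, 0] + 2 * X[:, 1] > 0).astype(int)   # binarized goal, e.g. ROI above a threshold

def prediction_power(X, y, columns):
    """Out-of-sample accuracy of the same ML model on a subset of metrics."""
    model = LogisticRegression()
    return cross_val_score(model, X[:, columns], y, cv=5).mean()

print(prediction_power(X, y, [0, 1]))         # strategy efficiency ~ metric power
```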

[Figure: strategy]

The process is iterative. First, analyze the influence of the full group of N metrics. Each following iteration throws away the one metric whose removal affects prediction power least, so N candidate subsets have to be tested to find the optimum in the first iteration. Part of the shuffled historical data is used for training, and the prediction is evaluated on the out-of-sample part. The iterations continue until the acceptable accuracy threshold is reached, and the same ML model is used in every iteration. According to TOC, at any moment there should be a single constraint that limits the business most, so the iterative process should stop at N = 1. With the N = 1 stop condition the algorithm needs N + (N−1) + … + 2 ≈ N²/2 subset evaluations in total.
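
A minimal sketch of this greedy elimination loop, reusing `X`, `y`, and `prediction_power` from the sketch above; the 90% threshold and the exact stopping rule are illustrative assumptions:

```python
def find_constraints(X, y, score, threshold=0.90):
    """Greedy backward elimination: keep dropping the least informative
    metric while out-of-sample accuracy stays above the threshold."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > 1:
        # Test the N candidate subsets obtained by throwing away one metric.
        candidates = [(score(X, y, [c for c in remaining if c != drop]), drop)
                      for drop in remaining]
        best_accuracy, drop = max(candidates)
        if best_accuracy < threshold:
            break              # removing anything more costs too much accuracy
        remaining.remove(drop)
    return remaining           # ideally the single constraint (N = 1) of TOC

print(find_constraints(X, y, prediction_power))
```

The greedy search is what keeps the cost near N²/2 fits; an exhaustive search over all elimination orders would be far more expensive.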

It may seem that ML requires extra Big Data. However, we can divide the target range into prediction intervals: ROI = (10%-20%), (20%-30%), etc. The fewer the intervals, the fewer records are needed to apply ML. If the accuracy threshold is reached before N = 1, there are two ways to proceed. First: keep the weights of the N remaining constraints/metrics. Second: use fewer intervals and a rougher binarization. An example of web strategy evaluation is given here, and a piece of Jupyter code is given here. If the dimension is sufficiently reduced, a simple and stable online/offline separation is possible: we know the online weight and the target.
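
A small sketch of this interval trick, assuming NumPy and hypothetical ROI values; the bin edges are illustrative:

```python
# Coarse-grain the target into prediction intervals so less data is needed.
import numpy as np

rng = np.random.default_rng(1)
roi = rng.uniform(0.0, 0.5, size=500)        # hypothetical ROI per record
edges = np.array([0.10, 0.20, 0.30, 0.40])   # ROI = (10%-20%), (20%-30%), ...
labels = np.digitize(roi, edges)             # class label = interval index
print(np.bincount(labels))                   # records available per interval
```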

In this case the target (profit margin, offers, ROI) can be represented in the following way:

target = weight × online_metric + const

Averaging both sides (denoted by < >) gives the required relation:

online/offline = weight × <online_metric> / const
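
A toy numeric check of this relation, with hypothetical numbers and an ordinary least-squares fit via `np.polyfit`:

```python
# Once a single online metric remains, fit target = weight * metric + const
# and estimate the online/offline contribution ratio.
import numpy as np

metric = np.array([1.0, 2.0, 3.0, 4.0])        # online metric per period
target = 0.5 * metric + 2.0                    # target with offline const = 2.0
weight, const = np.polyfit(metric, target, 1)  # linear fit recovers weight, const
ratio = weight * metric.mean() / const         # online/offline = weight*<metric>/const
print(round(ratio, 3))                         # 0.625 for these numbers
```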

Thanks for Karma

Translated from: https://habr.com/en/post/463367/
