`@HaomingJiang`

`2018-07-12T16:46:10.000000Z`

This paper proposes a new algorithm, with a convergence guarantee, for solving generalized parametric additive models. The algorithm handles large-scale problems by adopting doubly stochastic optimization over a non-orthogonal basis, and a convergence analysis is provided. Overall, I feel the contribution is not substantial enough.

- They investigate the use of a non-orthogonal basis and achieve strong empirical results.
- They use doubly stochastic gradients to cope with both a massive number of parameters and a massive number of samples.

- The relaxation from (2) to (3) does not seem well justified to me.
- Wrong template: the header reads "Submitted to 31st Conference on Neural Information Processing Systems (NIPS 2017)".
- The batch size used for DSG is not provided.
- The procedure for generating irrelevant features does not seem well justified to me. For example, when the features are correlated, does the algorithm still perform well? Should the experiments not also consider real data that is naturally high-dimensional?
- The convergence analysis rests on strong convexity and Lipschitz smoothness assumptions; under these assumptions the result follows directly from Zhao et al. [24], so it cannot be considered an important contribution of this paper.
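To make the "doubly stochastic" point above concrete, here is a minimal sketch of a scheme that is stochastic both in the samples and in the basis coordinates. This is not the paper's algorithm: the toy data, the random cosine basis, and all step-size and batch-size choices below are my own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = sin(3x) + noise (illustrative, not the paper's setup)
n = 2000
X = rng.uniform(-1.0, 1.0, n)
y = np.sin(3.0 * X) + 0.1 * rng.normal(size=n)

# A non-orthogonal basis: random cosine features (an assumed, illustrative choice)
d = 200
freqs = rng.normal(scale=3.0, size=d)
phases = rng.uniform(0.0, 2.0 * np.pi, d)

def phi(x, idx):
    # Evaluate the basis functions indexed by idx at the points x
    return np.cos(np.outer(x, freqs[idx]) + phases[idx])

w = np.zeros(d)
batch, coords, lr = 32, 20, 0.1
for _ in range(3000):
    i = rng.integers(0, n, batch)             # stochastic in the samples
    j = rng.choice(d, coords, replace=False)  # stochastic in the coordinates
    resid = phi(X[i], np.arange(d)) @ w - y[i]
    # Gradient of the minibatch least-squares loss, restricted to the sampled coords
    w[j] -= lr * phi(X[i], j).T @ resid / batch

mse = np.mean((phi(X, np.arange(d)) @ w - y) ** 2)
```

Each iteration touches only a minibatch of samples and a random block of coordinates, which is what makes the per-step cost independent of both n and d on the update side; this is the property a reviewer would want the batch size reported for.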

This paper proposes a new screening method for sparse conditional random fields (CRFs). The authors introduce a dynamic screening rule, based on a dual optimum estimation technique, to accelerate training.

- The authors carefully exploit the structure of the dual problem and propose an estimate of the dual optimum.
- Introducing a dynamic screening method to sparse CRFs is novel.

- In Lemma 3 (iii), what is the meaning of theta_2?
- There is no comparison with other sparse CRF algorithms or with other screening methods (e.g., static screening).
- The numbers in Table 1 are not clearly explained.
- The meaning of the rejection ratio is not clear. Should it not compare the irrelevant features identified by the algorithm with screening against those identified by the algorithm without screening?
- What are "the general training algorithms"? A reference should be given. Does the framework also work with other algorithms?
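For context on screening from a dual optimum estimate, here is a minimal sketch of a gap-safe sphere test, illustrated on the Lasso rather than a CRF (the CRF dual is more involved); the problem sizes, regularization level, and solver below are all my own illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 500

# Sparse ground truth with many irrelevant features (illustrative data)
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = rng.normal(size=5)
y = X @ beta_true + 0.01 * rng.normal(size=n)
lam = 0.5 * np.max(np.abs(X.T @ y))  # half of lambda_max

def gap_safe_screen(beta):
    """Sphere test from a dual-feasible point: any feature j with
    |x_j^T theta| + r * ||x_j|| < 1 is provably inactive at the optimum."""
    resid = y - X @ beta
    theta = resid / max(lam, np.max(np.abs(X.T @ resid)))  # dual-feasible point
    primal = 0.5 * resid @ resid + lam * np.abs(beta).sum()
    dual = 0.5 * y @ y - 0.5 * lam ** 2 * np.sum((theta - y / lam) ** 2)
    r = np.sqrt(2.0 * max(primal - dual, 0.0)) / lam  # radius from the duality gap
    return np.abs(X.T @ theta) + r * np.linalg.norm(X, axis=0) < 1.0

# "Dynamic" screening: re-run the test as the solver shrinks the duality gap
beta = np.zeros(p)
L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the smooth part
keep = np.ones(p, dtype=bool)
for it in range(2000):
    z = beta - X.T @ (X @ beta - y) / L  # ISTA step
    beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    if it % 200 == 199:
        keep &= ~gap_safe_screen(beta)
```

The dynamic aspect is the repeated test inside the solver loop: as the duality gap shrinks, the safe ball tightens and more irrelevant features are discarded. A natural definition of the rejection ratio in this setting is the fraction of truly inactive features that the test has discarded, which is the kind of clarification the review above asks for.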
