What is the formula for projection onto spectraplex?












A spectraplex (a special case of a spectrahedron) is the set of all positive semi-definite matrices whose trace is equal to one. Formally, let
$$
S=\{\textbf{W} \in \mathbb{R}^{d \times d} \mid \textbf{W} \succeq 0,\ \text{Tr}(\textbf{W})=1\}.
$$



On page 7, Lemma 2 of On the Regret Minimization of Nonconvex Online Gradient Ascent for Online PCA, the paper says "it is well known that the projection of $\textbf{W}$ onto $S$ is given by"
$$
\Pi_S[\textbf{W}]=(\lambda_1(\textbf{W})-\lambda)w'w'^T+ \sum_{i=2}^{d}\max\{0,\lambda_i(\textbf{W})-\lambda\}y_iy_i^T
$$



where $(w',y_2,y_3,\cdots,y_d)$ are eigenvectors of $\textbf{W}$ corresponding to descending eigenvalues such that



$$
\lambda_1(\textbf{W})-\lambda+ \sum_{i=2}^{d}\max\{0,\lambda_i(\textbf{W})-\lambda\}=1
$$

for some $\lambda \geq 0$.
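
For a concrete sanity check of how I read the formula: with $\textbf{W}=\mathrm{diag}(2,\,0.5)$, so $\lambda_1(\textbf{W})=2$ and $\lambda_2(\textbf{W})=0.5$, the side condition becomes $(2-\lambda)+\max\{0,\,0.5-\lambda\}=1$, which is solved by $\lambda=1$, and the formula then gives $\Pi_S[\textbf{W}]=\mathrm{diag}(1,\,0)$.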



Where does the above come from?



Is there any reference that explains projections like this?







eigenvalues-eigenvectors convex-optimization positive-semidefinite projection
















asked Dec 27 '18 at 3:46









Saeed













  • I believe that you can reduce this problem to the problem of projecting onto the unit simplex in $R^{n}_{+}$ by using the orthogonality of the $y_{i}y_{i}^{T}$ with respect to the trace inner product. The equation for $\lambda$ is the same equation that appears in the commonly used algorithm for computing the projection on the unit simplex in $R^{n}_{+}$.
    – Brian Borchers
    Jan 5 at 16:15


















1 Answer
First, note that no such $\lambda$ exists if



$\sum_{i=1}^{d} \lambda_{i} < 1$.



For example, the method fails if $W=0$ or if $W=(1/(2d))I$: in the latter case every $\lambda_{i}=1/(2d)$, so the equation $(1/(2d)-\lambda)+\sum_{i=2}^{d}\max\{0,1/(2d)-\lambda\}=1$ would force $\lambda=-1/(2d)<0$. It appears that, in the context where this is presented in the paper that you cited, we can be sure that the sum of the $\lambda_{i}$ is large enough.



I believe that if you allow for negative values of $\lambda$ the method will actually produce correct results for matrices $W$ that have $\sum_{i=1}^{d} \lambda_{i}<1$.



In the following, we'll assume that



$\sum_{i=1}^{d} \lambda_{i} \geq 1$.



Since $W$ is symmetric, we can write it in terms of eigenvalues and normalized eigenvectors as



$W=\sum_{i=1}^{d} \lambda_{i}\, y^{(i)}{y^{(i)}}^{T}$



or



$W=\sum_{i=1}^{d} \lambda_{i} Y^{(i)}$



where



$Y^{(i)}=y^{(i)}{y^{(i)}}^{T}$.



We need to be specific about the matrix inner product and norm that we're projecting with respect to. Since $W$ lives in the space of symmetric matrices, the appropriate inner product is



$\langle A, B \rangle = \mbox{tr}(AB)$



and the norm is



$\| A \|_{F}=\sqrt{\langle A, A \rangle}$.



Note that with respect to this inner product, the $Y^{(i)}$ matrices are orthogonal because the $y^{(i)}$ eigenvectors are orthogonal: for $i \neq j$,



$\langle Y^{(i)}, Y^{(j)} \rangle=\mbox{tr}(y^{(i)}{y^{(i)}}^{T}y^{(j)}{y^{(j)}}^{T})$



$\langle Y^{(i)}, Y^{(j)} \rangle=\mbox{tr}(y^{(i)}\,0\,{y^{(j)}}^{T})=0$,



since ${y^{(i)}}^{T}y^{(j)}=0$.



Also,



$\| Y^{(i)} \|_{F}^{2}=\mbox{tr}(Y^{(i)}Y^{(i)})=\mbox{tr}(y^{(i)}{y^{(i)}}^{T} y^{(i)}{y^{(i)}}^{T})=\mbox{tr}(y^{(i)}{y^{(i)}}^{T})=\mbox{tr}({y^{(i)}}^{T}y^{(i)})=1$.
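
Here is a quick numerical illustration of this orthonormality (a NumPy sketch, not needed for the argument):

```python
import numpy as np

# For orthonormal vectors y^(i), the rank-one matrices Y^(i) = y^(i) y^(i)^T
# are orthonormal under the trace inner product <A, B> = tr(AB).
rng = np.random.default_rng(0)
d = 5
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # columns = orthonormal y^(i)
Y = [np.outer(Q[:, i], Q[:, i]) for i in range(d)]
G = np.array([[np.trace(Y[i] @ Y[j]) for j in range(d)] for i in range(d)])
print(np.allclose(G, np.eye(d)))                   # True: <Y^(i), Y^(j)> = delta_ij
```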



The desired projection is



$\Pi_{S}(W)=\arg\min_{Y} \| W-Y \|_{F}^{2}$



subject to the constraints



$\mbox{tr}(Y)=1$



$Y \succeq 0$.



Write $Y$ as a linear combination of the $Y^{(i)}$ matrices plus an additional component that is in the orthogonal complement of the span of the $Y^{(i)}$.



$Y=Y^{(0)}+\sum_{i=1}^{d} \alpha_{i} Y^{(i)}$



Now, $W-Y=\left(W-\sum_{i=1}^{d}\alpha_{i}Y^{(i)}\right)-Y^{(0)}$, where the first term lies in the span of the $Y^{(i)}$ and $Y^{(0)}$ is orthogonal to that span (and to $W$), so by the Pythagorean theorem,



$\| W-Y \|_{F}^{2}=\left\| W- \sum_{i=1}^{d} \alpha_{i} Y^{(i)} \right\|_{F}^{2}+\left\| Y^{(0)} \right\|_{F}^{2}$



$\| W-Y \|_{F}^{2}=\left\| \sum_{i=1}^{d} (\lambda_{i}-\alpha_{i}) Y^{(i)} \right\|_{F}^{2}+\left\| Y^{(0)} \right\|_{F}^{2}$.



Using the orthogonality of the $Y^{(i)}$ matrices, we get that



$\| W-Y \|_{F}^{2}=\sum_{i=1}^{d} (\lambda_{i}-\alpha_{i})^{2}+\left\| Y^{(0)} \right\|_{F}^{2}$.



Since



$\mbox{tr}(Y)=\mbox{tr}(Y^{(0)})+\sum_{i=1}^{d} \alpha_{i}$,



and



$\sum_{i=1}^{d} \alpha_{i} \leq \sum_{i=1}^{d} \lambda_{i}$,



using a non-zero value of $Y^{(0)}$ can only increase this objective. Thus it's optimal to set $Y^{(0)}=0$.



We've now reduced the problem to



$\min \sum_{i=1}^{d} (\lambda_{i}-\alpha_{i})^{2}$



subject to



$\sum_{i=1}^{d} \alpha_{i}=1$



$\alpha \geq 0$.
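
This is exactly the problem of projecting the eigenvalue vector $(\lambda_{1},\ldots,\lambda_{d})$ onto the unit simplex. A minimal sketch of the commonly used sorting-based routine for this subproblem (my own NumPy code and variable names, not from the paper; the threshold $\theta$ is the $\mu/2$ derived below):

```python
import numpy as np

def project_simplex(lam):
    """Project lam onto {a : a >= 0, sum(a) = 1}: sort, find the largest k whose
    top-k entries stay positive after the common shift, then clip at zero."""
    u = np.sort(lam)[::-1]                      # entries in descending order
    css = np.cumsum(u) - 1.0                    # cumulative sums minus the target 1
    k = np.arange(1, len(lam) + 1)
    rho = np.max(np.nonzero(u - css / k > 0)[0]) + 1   # number of positive entries
    theta = css[rho - 1] / rho                  # optimal shift (the mu/2 below)
    return np.maximum(lam - theta, 0.0)

print(project_simplex(np.array([0.9, 0.5, 0.3, -0.2])))  # nonnegative, sums to 1
```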



It's easy to see that the equation given in the paper results from applying the KKT conditions to this problem.



If we let $\mu$ be the Lagrange multiplier for the constraint $\sum_{i=1}^{d}\alpha_{i}=1$ and $\nu_{i}$ be the Lagrange multiplier for the constraint $\alpha_{i} \geq 0$, then the $i$th component of the KKT conditions depends on whether $\alpha_{i}>0$, in which case $\nu_{i}=0$, or $\alpha_{i}=0$, in which case $\nu_{i}$ is free to be nonzero.



In the case $\alpha_{i}>0$ and $\nu_{i}=0$, the stationarity condition is



$-2(\lambda_{i}-\alpha_{i})+\mu=0$



or



$\lambda_{i}-\alpha_{i}=\mu/2$.



In the case where $\alpha_{i}=0$ and $\nu_{i}$ is positive, the equation can be satisfied by adjusting $\nu_{i}$. Next, we use the constraint



$\sum_{\alpha_{i}>0} \alpha_{i}=1$



to get



$\sum_{\alpha_{i}>0} (\lambda_{i}-\mu/2)=1$.



Adding in the terms for $\alpha_{i}=0$, we get



$\sum_{i=1}^{d} \max(0,\lambda_{i}-\mu/2)=1$.



Once $\mu$ has been determined, let



$\alpha_{i}=\max(0,\lambda_{i}-\mu/2)$ for $i=1, 2, \ldots, d$.
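
Putting the pieces together, here is a minimal sketch of the whole procedure (my own NumPy code, with $\mu/2$ found by bisection; allowing it to be negative also covers the $\sum_{i}\lambda_{i}<1$ case mentioned at the start):

```python
import numpy as np

def spectraplex_projection(W, tol=1e-12):
    """Eigendecompose W, find the shift t = mu/2 solving sum_i max(0, lam_i - t) = 1
    by bisection, clip the eigenvalues, and reassemble Pi_S(W) = sum_i alpha_i y_i y_i^T."""
    lam, V = np.linalg.eigh(W)                       # W assumed symmetric
    lo, hi = lam.min() - 1.0 / len(lam), lam.max()   # bracket: g(lo) >= 1, g(hi) = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.maximum(lam - mid, 0.0).sum() > 1.0:
            lo = mid
        else:
            hi = mid
    alpha = np.maximum(lam - 0.5 * (lo + hi), 0.0)   # clipped eigenvalues
    return (V * alpha) @ V.T                         # V diag(alpha) V^T

# Sanity check: the result should have unit trace and be positive semidefinite.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5)); W = (A + A.T) / 2
P = spectraplex_projection(W)
print(round(np.trace(P), 6), np.linalg.eigvalsh(P).min() >= -1e-9)
```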






answered Jan 6 at 17:05 (edited Jan 6 at 18:20)
– Brian Borchers



  • Wow, it is a very rigorous proof.
    – Sepide
    Jan 7 at 19:01












