What is the formula for projection onto spectraplex?
A spectraplex (a special case of a spectrahedron) is the set of all positive semi-definite matrices whose trace is equal to one. Formally, let
$$
S=\{\textbf{W} \in \mathbb{R}^{d \times d} \mid \textbf{W} \succeq 0,\ \text{Tr}(\textbf{W})=1\}.
$$
In Lemma 2 on page 7 of "On the Regret Minimization of Nonconvex Online Gradient Ascent for Online PCA", the paper says "it is well known that the projection of $\textbf{W}$ onto $S$ is given by"
$$
\Pi_S[\textbf{W}]=(\lambda_1(\textbf{W})-\lambda)\,w'w'^T+ \sum_{i=2}^{d}\max\{0,\lambda_i(\textbf{W})-\lambda\}\,y_iy_i^T
$$
where $(w',y_2,y_3,\cdots,y_d)$ are eigenvectors of $\textbf{W}$ corresponding to its eigenvalues in descending order, such that
$$
\lambda_1(\textbf{W})-\lambda+ \sum_{i=2}^{d}\max\{0,\lambda_i(\textbf{W})-\lambda\}=1
$$
for some $\lambda \geq 0$.
Where does the above come from?
Is there a reference that explains projections like this?
eigenvalues-eigenvectors convex-optimization positive-semidefinite projection
asked Dec 27 '18 at 3:46 – Saeed
I believe that you can reduce this problem to the problem of projecting onto the unit simplex in $\mathbb{R}^{n}_{+}$ by using the orthogonality of the $y_{i}y_{i}^{T}$ with respect to the trace inner product. The equation for $\lambda$ is the same equation that appears in the commonly used algorithm for computing the projection onto the unit simplex in $\mathbb{R}^{n}_{+}$.
– Brian Borchers
Jan 5 at 16:15
1 Answer
First, note that no such $\lambda$ exists if
$\sum_{i=1}^{d} \lambda_{i} < 1$.
For example, the method fails if $W=0$ or if $W=(1/(2d))I$. In the context in which this appears in the paper you cited, it seems we can be sure that the sum of the $\lambda_{i}$ is large enough.
I believe that if you allow negative values of $\lambda$, the method will actually produce correct results for matrices $W$ with $\sum_{i=1}^{d} \lambda_{i}<1$.
In the following, we'll assume that
$\sum_{i=1}^{d} \lambda_{i} \geq 1$.
Since $W$ is symmetric, we can write it in terms of eigenvalues and normalized eigenvectors as
$W=\sum_{i=1}^{d} \lambda_{i} y^{(i)}{y^{(i)}}^{T}$
or
$W=\sum_{i=1}^{d} \lambda_{i} Y^{(i)}$
where
$Y^{(i)}=y^{(i)}{y^{(i)}}^{T}$.
We need to be specific about the matrix inner product and norm that we're projecting with respect to. Since $W$ lives in the space of symmetric matrices, the appropriate inner product is
$\langle A, B \rangle = \mbox{tr}(AB)$
and the norm is
$\| A \|_{F}=\sqrt{\langle A, A \rangle}$.
Note that with respect to this inner product, the $Y^{(i)}$ matrices are orthogonal, because the eigenvectors $y^{(i)}$ are orthogonal: for $i \neq j$,
$\langle Y^{(i)}, Y^{(j)} \rangle=\mbox{tr}(y^{(i)}{y^{(i)}}^{T}y^{(j)}{y^{(j)}}^{T})=\mbox{tr}(y^{(i)} \cdot 0 \cdot {y^{(j)}}^{T})=0$,
since ${y^{(i)}}^{T}y^{(j)}=0$.
Also,
$\| Y^{(i)} \|_{F}^{2}=\mbox{tr}(Y^{(i)}Y^{(i)})=\mbox{tr}(y^{(i)}{y^{(i)}}^{T} y^{(i)}{y^{(i)}}^{T})=\mbox{tr}(y^{(i)}{y^{(i)}}^{T})=\mbox{tr}({y^{(i)}}^{T}y^{(i)})=1$,
since ${y^{(i)}}^{T}y^{(i)}=1$.
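As a quick numerical sanity check of this orthonormality claim (a minimal sketch of my own; the random matrix, names, and sizes are not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Random symmetric matrix and its orthonormal eigenvectors.
A = rng.standard_normal((d, d))
W = (A + A.T) / 2
eigvals, eigvecs = np.linalg.eigh(W)  # columns of eigvecs are orthonormal

# Form the rank-one matrices Y^(i) = y^(i) y^(i)^T.
Y = [np.outer(eigvecs[:, i], eigvecs[:, i]) for i in range(d)]

# Gram matrix of trace inner products <Y^(i), Y^(j)> = tr(Y^(i) Y^(j)).
G = np.array([[np.trace(Yi @ Yj) for Yj in Y] for Yi in Y])
print(np.allclose(G, np.eye(d)))  # True: the Y^(i) are orthonormal in the trace inner product
```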
The desired projection is
$\Pi_{S}(W)=\mbox{arg}\min_{Y} \| W-Y \|_{F}^{2}$
subject to the constraints
$\mbox{tr}(Y)=1$,
$Y \succeq 0$.
Write $Y$ as a linear combination of the $Y^{(i)}$ matrices plus an additional component $Y^{(0)}$ that lies in the orthogonal complement of the span of the $Y^{(i)}$:
$Y=Y^{(0)}+\sum_{i=1}^{d} \alpha_{i} Y^{(i)}$.
Now, by the Pythagorean theorem (the component $W-\sum_{i=1}^{d} \alpha_{i} Y^{(i)}$ of $W-Y$ lies in the span of the $Y^{(i)}$, while $-Y^{(0)}$ lies in its orthogonal complement),
$\| W-Y \|_{F}^{2}=\left\| Y^{(0)} \right\|_{F}^{2}+\left\| W- \sum_{i=1}^{d} \alpha_{i} Y^{(i)} \right\|_{F}^{2}=\left\| Y^{(0)} \right\|_{F}^{2}+\left\| \sum_{i=1}^{d} (\lambda_{i}-\alpha_{i}) Y^{(i)} \right\|_{F}^{2}$.
Using the orthonormality of the $Y^{(i)}$ matrices, we get
$\| W-Y \|_{F}^{2}=\left\| Y^{(0)} \right\|_{F}^{2}+\sum_{i=1}^{d} (\lambda_{i}-\alpha_{i})^{2}$.
Since
$\mbox{tr}(Y)=\mbox{tr}(Y^{(0)})+\sum_{i=1}^{d} \alpha_{i}$
and $\sum_{i=1}^{d} \lambda_{i} \geq 1$, the trace constraint can be satisfied using the $\alpha_{i}$ alone, and a non-zero $Y^{(0)}$ can only increase the objective. Thus it's optimal to set $Y^{(0)}=0$; the constraint $Y \succeq 0$ then reduces to $\alpha \geq 0$.
We've now reduced the problem to
$\min \sum_{i=1}^{d} (\lambda_{i}-\alpha_{i})^{2}$
subject to
$\sum_{i=1}^{d} \alpha_{i}=1$,
$\alpha \geq 0$.
This is exactly the problem of projecting the vector of eigenvalues onto the unit simplex, and it's easy to see that the equation given in the paper results from applying the KKT conditions to this problem.
If we let $\mu$ be the Lagrange multiplier for the constraint $\sum_{i=1}^{d}\alpha_{i}=1$ and $\nu_{i}$ be the Lagrange multiplier for the constraint $\alpha_{i} \geq 0$, then the $i$th component of the KKT conditions depends on whether $\alpha_{i}>0$, in which case $\nu_{i}=0$, or $\alpha_{i}=0$, in which case $\nu_{i}$ is free to be nonzero.
In the case $\alpha_{i}>0$ and $\nu_{i}=0$, the stationarity condition gives
$-2(\lambda_{i}-\alpha_{i})+\mu=0$
or
$\lambda_{i}-\alpha_{i}=\mu/2$.
In the case where $\alpha_{i}=0$, the equation can be satisfied by adjusting $\nu_{i}$. Next, we use the constraint
$\sum_{\alpha_{i}>0} \alpha_{i}=1$
to get
$\sum_{\alpha_{i}>0} (\lambda_{i}-\mu/2)=1$.
Adding in the (zero) terms for $\alpha_{i}=0$, we get
$\sum_{i=1}^{d} \max(0,\lambda_{i}-\mu/2)=1$.
Once $\mu$ has been determined, let
$\alpha_{i}=\max(0,\lambda_{i}-\mu/2)$ for $i=1, 2, \ldots, d$.
Note that $\mu/2$ plays the role of the $\lambda$ in the formula from the paper, and the last displayed equation is exactly the paper's condition determining it.
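To make the recipe concrete, here is a minimal numerical sketch of my own (function names and test data are mine, not from the paper): eigendecompose $W$, project the eigenvalues onto the unit simplex with the standard sort-based threshold (the threshold `t` below is the $\mu/2$ above), and rebuild the matrix.

```python
import numpy as np

def project_simplex(lam):
    """Project a vector onto the unit simplex {a : a >= 0, sum(a) = 1}."""
    # Sort descending and find the threshold t such that sum_i max(0, lam_i - t) = 1.
    u = np.sort(lam)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(lam) + 1)
    k = ks[u - (css - 1) / ks > 0].max()   # largest k with u_k - (sum of top k - 1)/k > 0
    t = (css[k - 1] - 1) / k
    return np.maximum(lam - t, 0.0)

def project_spectraplex(W):
    """Project a symmetric matrix W onto {Y : Y PSD, tr(Y) = 1} in Frobenius norm."""
    eigvals, eigvecs = np.linalg.eigh(W)
    alpha = project_simplex(eigvals)       # threshold the eigenvalues
    return (eigvecs * alpha) @ eigvecs.T   # sum_i alpha_i y^(i) y^(i)^T

# Small check on a random symmetric matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
W = (A + A.T) / 2
P = project_spectraplex(W)
print(np.isclose(np.trace(P), 1.0), np.all(np.linalg.eigvalsh(P) >= -1e-10))
```

Because the sort-based routine also allows a negative threshold, this sketch should behave sensibly even when $\sum_i \lambda_i < 1$, in line with the remark above about negative values of $\lambda$.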
answered Jan 6 at 17:05, edited Jan 6 at 18:20 – Brian Borchers
Wow, it is a very rigorous proof.
– Sepide
Jan 7 at 19:01