Is the average of betas from Y ~ X and X ~ Y valid?












I am interested in the relationship between two time series variables, $Y$ and $X$. The two variables are related to each other, and it's not clear from theory which one causes the other.

Given this, I have no good reason to prefer the linear regression $Y = \alpha + \beta X$ over $X = \kappa + \gamma Y$.

Clearly there is some relationship between $\beta$ and $\gamma$, though I recall enough statistics to understand that $\beta = 1/\gamma$ does not hold. Or perhaps it's not even close? I'm a bit hazy.

The problem is to decide how much of $X$ one ought to hold against $Y$.

I'm considering taking the average of $\beta$ and $1/\gamma$ and using that as the hedge ratio.

Is the average of $\beta$ and $1/\gamma$ a meaningful concept? And what is the appropriate way to deal with the fact that the two variables are related to each other -- meaning that there really isn't an independent and a dependent variable?
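For concreteness, a small illustrative simulation (invented data, not part of the question itself) shows how the two OLS slopes relate: they are not reciprocals, but their product equals the squared sample correlation.

```python
import numpy as np

# Illustrative simulation (invented data): fit Y ~ X and X ~ Y by OLS
# and compare the two slopes.
rng = np.random.default_rng(42)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(scale=0.8, size=1000)

beta = np.polyfit(x, y, 1)[0]    # slope of Y ~ X
gamma = np.polyfit(y, x, 1)[0]   # slope of X ~ Y
r = np.corrcoef(x, y)[0, 1]

# The slopes are not reciprocals; instead beta * gamma = r^2,
# so 1/gamma overstates beta whenever |r| < 1.
print(beta, 1 / gamma, beta * gamma, r ** 2)
```

The gap between $\beta$ and $1/\gamma$ grows as the correlation weakens, which is why averaging the two is hard to interpret.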


































  • The problem is not causality but the errors of measurement (it is just that the dependent variable Y is often the one with the large measurement error, making "Y = a + B x + error" the common expression). Do you have an idea about the errors in the measurement of X and Y?
    – Martijn Weterings
    yesterday












  • To determine causality you need a controlled experiment: one where you are able to change some variable independently of the others (or a very particular situation where two populations can be considered/assumed equal except for one or more variables treated as "independent").
    – Martijn Weterings
    yesterday








  • The exact values of $\beta$ and $\gamma$ can be found in this answer of mine to Effect of switching responses and explanatory variables..., and, as you suspect, $\beta$ is not the reciprocal of $\gamma$, and averaging $\beta$ and $1/\gamma$ is not the right way to go. A pictorial view of what $\beta$ and $\gamma$ are minimizing is given in Elvis's answer to the same question, and he introduces a "least rectangles" regression that you might want .....
    – Dilip Sarwate
    yesterday








  • You are in the ideal scenario where the choice of technique has a direct, physically measurable impact: you can simply measure the out-of-sample hedging error for each estimate and compare them. Also, optimal hedging is typically better handled with a VECM (see, for example, Gatarek & Johansen, 2014, Optimal hedging with the cointegrated vector autoregressive model), which does not require choosing to model Y as a function of X or vice versa.
    – Chris Haug
    23 hours ago






  • You might want to look at the geometric mean $\sqrt{\dfrac{\beta}{\gamma}}$ as a possibility (if they are both negative you might take the negative square root). Then look at $\dfrac{s_y}{s_x}$, which should be very similar.
    – Henry
    21 hours ago
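Henry's suggestion can be checked numerically (illustrative data, not from the thread). In fact, for in-sample OLS estimates the agreement is exact, since $\beta/\gamma = \operatorname{var}(Y)/\operatorname{var}(X)$:

```python
import numpy as np

# Quick check (illustrative data): sqrt(beta/gamma) for the two OLS
# slopes reduces to s_y/s_x, because beta/gamma = var(Y)/var(X).
rng = np.random.default_rng(7)
x = rng.normal(scale=2.0, size=500)
y = 1.5 * x + rng.normal(size=500)

beta = np.polyfit(x, y, 1)[0]     # slope of Y ~ X
gamma = np.polyfit(y, x, 1)[0]    # slope of X ~ Y
gm = np.sqrt(beta / gamma)        # geometric-mean slope
ratio = np.std(y, ddof=1) / np.std(x, ddof=1)
print(gm, ratio)  # the two agree
```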


















regression regression-coefficients






edited yesterday

asked yesterday by ricardo











4 Answers




















Converted from a comment.....

The exact values of $\beta$ and $\gamma$ can be found in this answer of mine to Effect of switching responses and explanatory variables in simple linear regression, and, as you suspect, $\beta$ is not the reciprocal of $\gamma$; averaging $\beta$ and $\gamma$ (or averaging $\beta$ and $1/\gamma$) is not the right way to go. A pictorial view of what $\beta$ and $\gamma$ are minimizing is given in Elvis's answer to the same question, where he introduces a "least rectangles" regression that might be what you are looking for. The comments following Elvis's answer should not be neglected; they relate this "least rectangles" regression to other, previously studied, techniques. In particular, note that moderator chl points out that this method is of interest when it is not clear which is the predictor variable and which the response variable.




















    To see the connection between the two representations, take a bivariate normal vector:
    $$
    \begin{pmatrix}
    X_1 \\
    X_2
    \end{pmatrix} \sim \mathcal{N} \left( \begin{pmatrix}
    \mu_1 \\
    \mu_2
    \end{pmatrix} , \begin{pmatrix}
    \sigma^2_1 & \rho \sigma_1 \sigma_2 \\
    \rho \sigma_1 \sigma_2 & \sigma^2_2
    \end{pmatrix} \right)
    $$

    with conditionals
    $$X_1 \mid X_2=x_2 \sim \mathcal{N} \left( \mu_1 + \rho \frac{\sigma_1}{\sigma_2}(x_2 - \mu_2),\,(1-\rho^2)\sigma^2_1 \right)$$
    and
    $$X_2 \mid X_1=x_1 \sim \mathcal{N} \left( \mu_2 + \rho \frac{\sigma_2}{\sigma_1}(x_1 - \mu_1),\,(1-\rho^2)\sigma^2_2 \right)$$
    This means that
    $$X_1=\underbrace{\left(\mu_1-\rho \frac{\sigma_1}{\sigma_2}\mu_2\right)}_\alpha+\underbrace{\rho \frac{\sigma_1}{\sigma_2}}_\beta X_2+\sqrt{1-\rho^2}\,\sigma_1\epsilon_1$$
    and
    $$X_2=\underbrace{\left(\mu_2-\rho \frac{\sigma_2}{\sigma_1}\mu_1\right)}_\kappa+\underbrace{\rho \frac{\sigma_2}{\sigma_1}}_\gamma X_1+\sqrt{1-\rho^2}\,\sigma_2\epsilon_2$$
    which means (a) $\gamma$ is not $1/\beta$ and (b) the connection between the two regressions depends on the joint distribution of $(X_1,X_2)$.
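A numerical sanity check of this derivation (parameter values invented for illustration): the fitted OLS slopes approach $\rho\sigma_1/\sigma_2$ and $\rho\sigma_2/\sigma_1$, whose product is $\rho^2$, not 1.

```python
import numpy as np

# Draw from the bivariate normal above (illustrative parameters) and
# compare fitted OLS slopes with rho*sigma1/sigma2 and rho*sigma2/sigma1.
rng = np.random.default_rng(0)
mu = [1.0, -2.0]
s1, s2, rho = 2.0, 0.5, 0.6
cov = [[s1 ** 2, rho * s1 * s2], [rho * s1 * s2, s2 ** 2]]
x1, x2 = rng.multivariate_normal(mu, cov, size=200_000).T

beta_hat = np.polyfit(x2, x1, 1)[0]    # slope of X1 on X2
gamma_hat = np.polyfit(x1, x2, 1)[0]   # slope of X2 on X1
print(beta_hat, rho * s1 / s2)         # close to 2.4
print(gamma_hat, rho * s2 / s1)        # close to 0.15
print(beta_hat * gamma_hat, rho ** 2)  # close to 0.36, not 1
```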





























    • How would I decide if the average of the two betas is a better measure of the hedge ratio than one or the other?
      – ricardo
      yesterday






    • I have no idea.
      – Xi'an
      yesterday










    • @ricardo By measuring the out-of-sample hedging error under each estimate, which is ultimately what you are trying to minimize.
      – Chris Haug
      23 hours ago




































    $\beta$ and $\gamma$

    As Xi'an noted in his answer, $\beta$ and $\gamma$ relate to the conditional means $X \mid Y$ and $Y \mid X$ (which in turn derive from a single joint distribution), and these are not symmetric in the sense that $\beta = 1/\gamma$. That would not be the case even if you knew the true $\sigma$ and $\rho$ instead of using estimates. You have $$\beta = \rho_{XY} \frac{\sigma_Y}{\sigma_X}$$ and $$\gamma = \rho_{XY} \frac{\sigma_X}{\sigma_Y}$$

    See also simple linear regression on Wikipedia for the computation of $\beta$ and $\gamma$.

    It is this correlation term that disturbs the symmetry. If $\beta$ and $\gamma$ were simply the ratios of standard deviations, $\sigma_Y/\sigma_X$ and $\sigma_X/\sigma_Y$, then they would indeed be each other's inverse. The $\rho_{XY}$ term can be seen as modifying this as a sort of regression to the mean. With perfect correlation, $\rho_{XY} = 1$, you can fully predict $X$ based on $Y$ or vice versa. But with $\rho_{XY} < 1$ you cannot make those perfect predictions, and the conditional mean will be somewhat closer to the unconditional mean than a simple scaling by $\sigma_Y/\sigma_X$ or $\sigma_X/\sigma_Y$ would suggest.





    Is a regression line the right method?

    You may wonder whether these conditional probabilities and regression lines are what you need to determine your ratio of $X$ and $Y$. It is unclear to me how you would wish to use a regression line in the computation of an optimal ratio.

    Below is an alternative way to compute the ratio. This method is symmetric: if you switch $X$ and $Y$, you get the same ratio.





    Alternative

    Say the yields of bonds $X$ and $Y$ are distributed according to a multivariate normal distribution$^\dagger$ with correlation $\rho_{XY}$ and standard deviations $\sigma_X$ and $\sigma_Y$. Then the yield of a hedge that is a weighted sum of $X$ and $Y$ will be normally distributed:

    $$H = \alpha X + (1-\alpha) Y \sim N(\mu_H,\sigma_H^2)$$

    where $0 \leq \alpha \leq 1$ and

    $$\begin{array}{rcl}
    \mu_H &=& \alpha \mu_X+(1-\alpha) \mu_Y \\
    \sigma_H^2 &=& \alpha^2 \sigma_X^2 + (1-\alpha)^2 \sigma_Y^2 + 2 \alpha (1-\alpha) \rho_{XY} \sigma_X \sigma_Y \\
    &=& \alpha^2(\sigma_X^2+\sigma_Y^2 -2 \rho_{XY} \sigma_X\sigma_Y) + \alpha (-2 \sigma_Y^2+2\rho_{XY}\sigma_X\sigma_Y) +\sigma_Y^2
    \end{array}$$

    The maximum of the mean $\mu_H$ will be at $$\alpha = 0 \text{ or } \alpha=1$$ or not exist when $\mu_X=\mu_Y$.

    The minimum of the variance $\sigma_H^2$ will be at $$\alpha = 1 - \frac{\sigma_X^2 -\rho_{XY}\sigma_X\sigma_Y}{\sigma_X^2 +\sigma_Y^2 -2 \rho_{XY} \sigma_X\sigma_Y} = \frac{\sigma_Y^2-\rho_{XY}\sigma_X\sigma_Y}{\sigma_X^2+\sigma_Y^2 -2 \rho_{XY} \sigma_X\sigma_Y}$$

    The optimum will be somewhere in between those two extremes and depends on how you wish to compare losses and gains.

    Note that there is now a symmetry between $\alpha$ and $1-\alpha$. It does not matter whether you use the hedge $H=\alpha_1 X+(1-\alpha_1)Y$ or the hedge $H=\alpha_2 Y + (1-\alpha_2) X$. You will get the same ratios in terms of $\alpha_1 = 1-\alpha_2$.



    Minimal variance case and relation to principal components

    In the minimal variance case (here you do not actually need to assume a multivariate normal distribution) you get the following hedge ratio as the optimum: $$\frac{\alpha}{1-\alpha} = \frac{\operatorname{var}(Y) - \operatorname{cov}(X,Y)}{\operatorname{var}(X)-\operatorname{cov}(X,Y)}$$ which can be expressed in terms of the regression coefficients $\beta = \operatorname{cov}(X,Y)/\operatorname{var}(X)$ and $\gamma = \operatorname{cov}(X,Y)/\operatorname{var}(Y)$ as $$\frac{\alpha}{1-\alpha} = \frac{\sigma_Y^2\,(1-\gamma)}{\sigma_X^2\,(1-\beta)}$$

    In a situation with more than two variables/stocks/bonds you might generalize this to the last (smallest-eigenvalue) principal component.
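This minimum-variance split can be sketched numerically (invented data; `min_var_alpha` is a made-up helper name), confirming both the symmetry under swapping $X$ and $Y$ and that the closed form really minimizes the hedge variance:

```python
import numpy as np

# alpha = (var(Y) - cov) / (var(X) + var(Y) - 2 cov), per the formula above
rng = np.random.default_rng(1)
x = rng.normal(scale=1.5, size=5000)
y = 0.7 * x + rng.normal(size=5000)

def min_var_alpha(a, b):
    c = np.cov(a, b)
    return (c[1, 1] - c[0, 1]) / (c[0, 0] + c[1, 1] - 2 * c[0, 1])

a_xy = min_var_alpha(x, y)
a_yx = min_var_alpha(y, x)
print(a_xy, 1 - a_yx)  # symmetry: the same split either way round

# brute-force check that a_xy minimizes var(alpha*X + (1-alpha)*Y)
grid = np.linspace(0, 1, 1001)
best = grid[np.argmin([np.var(g * x + (1 - g) * y) for g in grid])]
print(best)
```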





    Variants

    Improvements of the model can be made by using distributions other than the multivariate normal. You could also incorporate time in a more sophisticated model to make better predictions of future values of the pair $X,Y$.

    $^\dagger$ This is a simplification, but it suits the purpose of explaining how one can, and should, perform the analysis to find an optimal ratio without a regression line.

























    • I am sorry, but as a physicist, I know too little about the language (long, short, holdings, etc.) of stocks, bonds and finance. If you could cast it in simpler language I might be able to understand it and work with it. My answer is just a very simple expression that is unaware of the details and possibilities of how to express hedging and stocks, but it shows the basic principle of how you can get away from the use of a regression line (go back to first principles and express the model for profit, which is at the core, instead of using regression lines whose relevance is not directly clear).
      – Martijn Weterings
      4 hours ago












    • I think I understand. The problem is that $1/\rho_{XY} \ne \rho_{XY}$; indeed, that factor changes quite a bit when we take the inverse. Your alternative is close to the case I am thinking about, but I do want to check one thing: does this allow negative holdings? Adopting your terminology, I'd have a unit holding of bond X and a negative holding of Y. Say long one unit of bond X and short (say) 1.2 units of bond Y ... but it could be 0.2 units or 5 units, depending on the math.
      – ricardo
      4 hours ago










    • Long means that I make ~1% on a bond if the price increases by ~1%; short means that I lose ~1% on a bond if the price increases by ~1%. So the idea is that I am long one unit of one bond (so I benefit from an appreciation) and short some amount of the other bond (so I lose from an appreciation).
      – ricardo
      3 hours ago










    • "The problem is to decide how much of X one ought to hold against Y." My problem with this is that there is no explanation/model/expression how you decide about this. How do you define losses and gains and how much do you value them?
      – Martijn Weterings
      3 hours ago












    • Are there costs associated with being short and long? I imagine that you have a given amount to invest and this limits how much you can be short/long in those bonds. Then based on your previous knowledge you can estimate/determine the distribution of losses/gains for whatever combination on that limit. Finally, based on some function that determines how you value losses and gains (this expresses why/how you hedge) you can decide which combination to choose.
      – Martijn Weterings
      3 hours ago

































    Perhaps the approach of "Granger causality" might help. It would let you assess whether X is a better predictor of Y, or Y a better predictor of X. In other words, it tells you whether $\beta$ or $\gamma$ is the one to take more seriously. Also, since you are dealing with time series data, it tells you how much of the history of X counts towards the prediction of Y (or vice versa).



    Wikipedia gives a simple explanation:
    A time series X is said to Granger-cause Y if it can be shown, usually through a series of t-tests and F-tests on lagged values of X (and with lagged values of Y also included), that those X values provide statistically significant information about future values of Y.



    What you do is the following:

    • regress Y(t) on X(t-1) and Y(t-1)

    • regress Y(t) on X(t-1), X(t-2), Y(t-1), Y(t-2)

    • regress Y(t) on X(t-1), X(t-2), X(t-3), Y(t-1), Y(t-2), Y(t-3)

    Continue for whatever history length might be reasonable. Check the significance of the F-statistic for each regression.
    Then do the same in reverse (now regress X(t) on the past values of X and Y) and see which regressions have significant F-values.
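The lag-augmented regressions above can be sketched without any special package. The following is a minimal illustration (my own, not from the answer; `granger_f` is a made-up helper) of the F-statistic comparing the restricted and unrestricted models:

```python
import numpy as np

def granger_f(y, x, p):
    """F-statistic: do lags 1..p of x improve the prediction of y
    beyond lags 1..p of y alone?"""
    n = len(y)
    rows = n - p
    # restricted design: constant + p lags of y
    Xr = np.column_stack([np.ones(rows)] + [y[p - i:n - i] for i in range(1, p + 1)])
    # unrestricted design: additionally p lags of x
    Xu = np.column_stack([Xr] + [x[p - i:n - i] for i in range(1, p + 1)])
    Y = y[p:]
    rss = lambda M: np.sum((Y - M @ np.linalg.lstsq(M, Y, rcond=None)[0]) ** 2)
    df2 = rows - Xu.shape[1]
    return ((rss(Xr) - rss(Xu)) / p) / (rss(Xu) / df2)

# toy series in which X leads Y by one step
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()

f_x_to_y = granger_f(y, x, 1)  # large: lagged X helps predict Y
f_y_to_x = granger_f(x, y, 1)  # small: lagged Y does not help predict X
print(f_x_to_y, f_y_to_x)
```

Comparing the two F-statistics (against the appropriate F distribution) is exactly the "which direction predicts better" comparison the answer describes.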



    A very straightforward example, with R code, is found here.
    Granger causality has been critiqued for not actually establishing causality (in some cases), but your application seems to be really about "predictive causality," which is exactly what the Granger causality approach is meant for.

    The point is that the approach will tell you whether X predicts Y or whether Y predicts X (so you would no longer be tempted to artificially -- and incorrectly -- compound the two regression coefficients), and it gives you a better prediction (as you will know how much history of X and Y you need to predict Y), which is useful for hedging purposes, right?



























    • I have a strong theoretical reason to believe that neither is truly a cause, and that even if one became a cause it would not remain so over time. So I don't think Granger causality is the answer in this case. I've upvoted the answer anyway, as it is useful -- esp. the R code.
      – ricardo
      12 hours ago




















    • That is why I explicitly mention that "Granger causality has been critiqued for not actually establishing causality (in some cases)." It seems to me that your question is more about establishing "predictive causality," which is what Granger causality is meant for. In addition, Granger's approach uses the information in your time series data, which are a waste not to use if you have them. Of course, you can (should?) re-estimate the effects over time. I expect that the Granger effects are more stable than cross-sectional OLS (you can test this beforehand, using historical data). HTH
      – Steve G. Jones
      8 hours ago













    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    2














    Converted from a comment.....



    The exact values of $beta$ and $gamma$
    can be found in this answer of mine to Effect of switching responses and explanatory variables in simple linear regression, and, as you suspect,
    $beta$ is not the reciprocal of $gamma$, and averaging $beta$ and $gamma$
    (or averaging $beta$ and $1/gamma$) is not the right way to go. A pictorial view of what $beta$ and $gamma$
    are minimizing is given in Elvis's answer to the same question, and in the answer, he introduces a "least rectangles" regression that might be what you are looking for. The comments following Elvis's answer should not be neglected; they relate this "least rectangles" regression to other, previously studied, techniques. In particular, note that Moderator chl points out that this method is of interest when it is not clear which is the predictor variable and which the response variable.






– Dilip Sarwate (answered 16 hours ago)

























To see the connection between both representations, take a bivariate Normal vector:
$$
\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},\ \begin{pmatrix} \sigma^2_1 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma^2_2 \end{pmatrix} \right)
$$

with conditionals
$$X_1 \mid X_2 = x_2 \sim \mathcal{N}\left( \mu_1 + \rho \frac{\sigma_1}{\sigma_2}(x_2 - \mu_2),\ (1-\rho^2)\sigma^2_1 \right)$$
and
$$X_2 \mid X_1 = x_1 \sim \mathcal{N}\left( \mu_2 + \rho \frac{\sigma_2}{\sigma_1}(x_1 - \mu_1),\ (1-\rho^2)\sigma^2_2 \right)$$
This means that
$$X_1 = \underbrace{\left(\mu_1 - \rho \frac{\sigma_1}{\sigma_2}\mu_2\right)}_{\alpha} + \underbrace{\rho \frac{\sigma_1}{\sigma_2}}_{\beta} X_2 + \sqrt{1-\rho^2}\,\sigma_1 \epsilon_1$$
and
$$X_2 = \underbrace{\left(\mu_2 - \rho \frac{\sigma_2}{\sigma_1}\mu_1\right)}_{\kappa} + \underbrace{\rho \frac{\sigma_2}{\sigma_1}}_{\gamma} X_1 + \sqrt{1-\rho^2}\,\sigma_2 \epsilon_2$$
which means (a) $\gamma$ is not $1/\beta$ and (b) the connection between the two regressions depends on the joint distribution of $(X_1, X_2)$.
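A quick simulation confirms the algebra above: the product of the two fitted slopes is $\rho^2$, not $1$, so $\gamma \ne 1/\beta$ unless the correlation is perfect. The parameter values and seed here are made up for illustration:

```python
import numpy as np

# Simulate a bivariate Normal pair with known (made-up) parameters.
rng = np.random.default_rng(1)
mu = [1.0, -2.0]
sigma1, sigma2, rho = 2.0, 0.5, 0.7
cov = [[sigma1**2, rho * sigma1 * sigma2],
       [rho * sigma1 * sigma2, sigma2**2]]
x1, x2 = rng.multivariate_normal(mu, cov, size=200_000).T

beta = np.cov(x1, x2, ddof=0)[0, 1] / np.var(x2)   # slope of X1 ~ X2
gamma = np.cov(x1, x2, ddof=0)[0, 1] / np.var(x1)  # slope of X2 ~ X1

print(beta * gamma)  # close to rho^2 = 0.49, far from 1
```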






Comments:

– ricardo (yesterday): How would I decide if the average of the two betas is a better measure of the hedge ratio than one or the other?

– Xi'an (yesterday): I have no idea.

– Chris Haug (23 hours ago): @ricardo By measuring the out-of-sample hedging error under each estimate, which is ultimately what you are trying to minimize.


















– Xi'an (answered yesterday; edited 5 hours ago by Martijn Weterings)
$\beta$ and $\gamma$

As Xi'an noted in his answer, $\beta$ and $\gamma$ correspond to the conditional means of $Y \mid X$ and $X \mid Y$ (which in turn derive from a single joint distribution), and they are not symmetric in the sense that $\beta = 1/\gamma$. This remains true even if you 'knew' the true $\sigma$ and $\rho$ instead of using estimates. You have $$\beta = \rho_{XY} \frac{\sigma_Y}{\sigma_X}$$ and $$\gamma = \rho_{XY} \frac{\sigma_X}{\sigma_Y}$$

See also simple linear regression on Wikipedia for the computation of $\beta$ and $\gamma$.

It is this correlation term that disturbs the symmetry. If $\beta$ and $\gamma$ were simply the ratios of the standard deviations, $\sigma_Y/\sigma_X$ and $\sigma_X/\sigma_Y$, then they would indeed be each other's inverse. The $\rho_{XY}$ term can be seen as modifying this, a kind of regression to the mean. With perfect correlation, $\rho_{XY} = 1$, you can fully predict $X$ from $Y$ or vice versa. But with $\rho_{XY} < 1$ you cannot make those perfect predictions, and the conditional mean will be somewhat closer to the unconditional mean than a simple scaling by $\sigma_Y/\sigma_X$ or $\sigma_X/\sigma_Y$ would give.
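A tiny numeric illustration of this shrinkage, with made-up values of $\sigma_X$, $\sigma_Y$, and $\rho_{XY}$:

```python
# Made-up population values: compare the two slopes with the pure sd ratio.
sigma_x, sigma_y = 1.5, 3.0

for rho in (1.0, 0.6):
    beta = rho * sigma_y / sigma_x   # slope of Y ~ X
    gamma = rho * sigma_x / sigma_y  # slope of X ~ Y
    # With rho = 1: beta = 1/gamma = sd ratio (2.0).
    # With rho < 1: beta < sd ratio < 1/gamma, and beta * gamma = rho^2.
    print(rho, beta, 1 / gamma, beta * gamma)
```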





Is a regression line the right method?

You may wonder whether these conditional probabilities and regression lines are what you need to determine your ratio of $X$ and $Y$. It is unclear to me how you would wish to use a regression line in the computation of an optimal ratio.

Below is an alternative way to compute the ratio. This method does have symmetry (i.e., if you switch $X$ and $Y$ you get the same ratio).





Alternative

Say the yields of bonds $X$ and $Y$ are distributed according to a multivariate normal distribution$^\dagger$ with correlation $\rho_{XY}$ and standard deviations $\sigma_X$ and $\sigma_Y$; then the yield of a hedge that is a sum of $X$ and $Y$ will be normally distributed:

$$H = \alpha X + (1-\alpha) Y \sim N(\mu_H, \sigma_H^2)$$

where $0 \leq \alpha \leq 1$ and with

$$\begin{array}{rcl}
\mu_H &=& \alpha \mu_X + (1-\alpha) \mu_Y \\
\sigma_H^2 &=& \alpha^2 \sigma_X^2 + (1-\alpha)^2 \sigma_Y^2 + 2 \alpha (1-\alpha) \rho_{XY} \sigma_X \sigma_Y \\
&=& \alpha^2(\sigma_X^2 + \sigma_Y^2 - 2 \rho_{XY} \sigma_X \sigma_Y) + \alpha(-2 \sigma_Y^2 + 2 \rho_{XY} \sigma_X \sigma_Y) + \sigma_Y^2
\end{array}$$

The maximum of the mean $\mu_H$ will be at $$\alpha = 0 \text{ or } \alpha = 1$$ or will not exist when $\mu_X = \mu_Y$.

The minimum of the variance $\sigma_H^2$ will be at $$\alpha = 1 - \frac{\sigma_X^2 - \rho_{XY}\sigma_X\sigma_Y}{\sigma_X^2 + \sigma_Y^2 - 2 \rho_{XY} \sigma_X\sigma_Y} = \frac{\sigma_Y^2 - \rho_{XY}\sigma_X\sigma_Y}{\sigma_X^2 + \sigma_Y^2 - 2 \rho_{XY} \sigma_X\sigma_Y}$$

The optimum will be somewhere between those two extremes and depends on how you wish to weigh losses against gains.

Note that there is now a symmetry between $\alpha$ and $1-\alpha$. It does not matter whether you use the hedge $H = \alpha_1 X + (1-\alpha_1) Y$ or the hedge $H = \alpha_2 Y + (1-\alpha_2) X$; you will get the same ratios, with $\alpha_1 = 1 - \alpha_2$.



Minimal variance case and relation with principal components

In the minimal-variance case (here you actually do not need to assume a multivariate normal distribution) you get the following hedge ratio as the optimum: $$\frac{\alpha}{1-\alpha} = \frac{\operatorname{var}(Y) - \operatorname{cov}(X,Y)}{\operatorname{var}(X) - \operatorname{cov}(X,Y)}$$ which can be expressed in terms of the regression coefficients $\beta = \operatorname{cov}(X,Y)/\operatorname{var}(X)$ and $\gamma = \operatorname{cov}(X,Y)/\operatorname{var}(Y)$ as $$\frac{\alpha}{1-\alpha} = \frac{\operatorname{var}(Y)\,(1-\gamma)}{\operatorname{var}(X)\,(1-\beta)}$$

In a situation with more than two variables/stocks/bonds you might generalize this to the last (smallest-eigenvalue) principal component.
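The minimum-variance weight above can be sketched as follows. The simulated series and parameters are made up for illustration; the point is that the closed-form $\alpha$ really does minimize the variance of the combination:

```python
import numpy as np

# Simulate two correlated series (illustrative, made-up parameters).
rng = np.random.default_rng(2)
n = 100_000
x = rng.normal(scale=1.0, size=n)
y = 0.5 * x + rng.normal(scale=0.8, size=n)

var_x, var_y = np.var(x), np.var(y)
cov_xy = np.cov(x, y, ddof=0)[0, 1]

# Closed-form minimum-variance weight: alpha = (var(Y) - cov) / (var(X) + var(Y) - 2 cov)
alpha = (var_y - cov_xy) / (var_x + var_y - 2 * cov_xy)
hedge = alpha * x + (1 - alpha) * y

# Nearby weights give strictly higher variance.
for a in (alpha - 0.05, alpha, alpha + 0.05):
    print(a, np.var(a * x + (1 - a) * y))
```

Because `alpha` is computed from the sample moments with the same normalization (`ddof=0`), it minimizes the sample variance of the combination exactly.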





Variants

Improvements to the model can be made by using distributions other than the multivariate normal. You could also incorporate time in a more sophisticated model to make better predictions of future values of the pair $X, Y$.

$\dagger$ This is a simplification, but it suits the purpose of explaining how one can, and should, perform the analysis to find an optimal ratio without a regression line.






Comments:

– Martijn Weterings (4 hours ago): I am sorry, but as a physicist, I know too little about the language (long, short, holdings, etc.) related to stocks, bonds and finance. If you could cast it in simpler language I might be able to understand it and work with it. My answer is just a very simple expression that is unaware of the details and possibilities of how to express hedging and stocks, but it shows the basic principle of how you can get away from the use of a regression line (go back to first principles and express the model for profit, which is at the core, instead of using regression lines whose relevance is not directly clear).

– ricardo (4 hours ago): I think I understand. The problem is that $1/\rho_{XY} \ne \rho_{XY}$; indeed, $\rho_{XY}$ often changes quite a bit when we take the inverse. Your alternative is close to the case I am thinking about, but I do want to check one thing: does this allow negative holdings? Adopting your terminology, I'd have a unit holding of bond X, and a negative holding of Y. Say long one unit of bond X and short (say) 1.2 units of bond Y ... but it could be 0.2 units or 5 units, depending on the math.

– ricardo (3 hours ago): Long means that I make ~1% on a bond if the price increases by ~1%; short means that I lose ~1% on a bond if the price increases by ~1%. So the idea is that I am long one unit of one bond (so I benefit from an appreciation) and am short some amount of the other bond (so I lose from an appreciation).

– Martijn Weterings (3 hours ago): "The problem is to decide how much of X one ought to hold against Y." My problem with this is that there is no explanation/model/expression of how you decide about this. How do you define losses and gains, and how much do you value them?

– Martijn Weterings (3 hours ago): Are there costs associated with being short and long? I imagine that you have a given amount to invest and this limits how much you can be short/long in those bonds. Then, based on your previous knowledge, you can estimate/determine the distribution of losses/gains for whatever combination on that limit. Finally, based on some function that determines how you value losses and gains (this expresses why/how you hedge), you can decide which combination to choose.
















            2















            $beta$ and $gamma$



            As Xi'an noted in his answer the $beta$ and $gamma$ are related to each other by relating to the conditional means $X|Y$ and $Y|X$ (which in their turn relate to a single joint distribution) these are not symmetric in the sense that $beta = 1/gamma$. This is neither the case if you would 'know' the true $sigma$ and $rho$ instead of using estimates. You have $$beta = rho_{XY} frac{sigma_Y}{sigma_X}$$ and $$gamma = rho_{XY} frac{sigma_X}{sigma_Y}$$



            See also simple linear regression on wikipedia for computation of the $beta$ and $gamma$.



            It is this correlation term which sort of disturbs the symmetry. When the $beta$ and $gamma$ would be simply the ratio of the standard deviation $sigma_Y/sigma_X$ and $sigma_X/sigma_Y$ then they would indeed be each others inverse. The $rho_{XY}$ term can be seen as modifying this as a sort of regression to the mean. With perfect correlation $rho_{XY} = 1$ then you can fully predict $X$ based on $Y$ or vice versa. But with $rho_{XY} < 1$ you can not make those perfect predictions and the conditional mean will be somewhat closer to the unconditional mean, in comparison to a simple scaling by $sigma_Y/sigma_X$ or $sigma_X/sigma_Y$.





            Is a regression line the right method?



            You may wonder whether these conditional probabilities and regression lines is what you need to determine your ratios of $X$ and $Y$. It is unclear to me how you would wish to use a regression line in the computation of an optimal ratio.



            Below is an alternative way to compute the ratio. This method does have symmetry (ie if you switch X and Y then you will get the same ratio).





            Alternative



            Say, the yields of bonds $X$ and $Y$ are distributed according to a multivariate normal distribution$^dagger$ with correlation $rho_{XY}$ and standard deviations $sigma_X$ and $sigma_Y$ then the yield of a hedge that is sum of $X$ and $Y$ will be normal distributed:



            $$H = alpha X + (1-alpha) Y sim N(mu_H,sigma_H^2)$$



            were $0 leq alpha leq 1$ and with



            $$begin{array}{rcl}
            mu_H &=& alpha mu_X+(1-alpha) mu_Y \
            sigma_H^2 &=& alpha^2 sigma_X^2 + (1-alpha)^2 sigma_Y^2 + 2 alpha (1-alpha) rho_{XY} sigma_X sigma_Y \
            & =& alpha^2(sigma_X^2+sigma_Y^2 -2 rho_{XY} sigma_Xsigma_Y) + alpha (-2 sigma_Y^2+2rho_{XY}sigma_Xsigma_Y) +sigma_Y^2
            end{array} $$



            The maximum of the mean $mu_H$ will be at $$alpha = 0 text{ or } alpha=1$$ or not existing when $mu_X=mu_Y$.



            The minimum of the variance $sigma_H^2$ will be at $$alpha = 1 - frac{sigma_X^2 -rho_{XY}sigma_Xsigma_Y}{sigma_X^2 +sigma_Y^2 -2 rho_{XY} sigma_Xsigma_Y} = frac{sigma_Y^2-rho_{XY}sigma_Xsigma_Y}{sigma_X^2+sigma_Y^2 -2 rho_{XY} sigma_Xsigma_Y} $$



            The optimum will be somewhere in between those two extremes and depends on how you wish to compare losses and gains



            Note that now there is a symmetry between $alpha$ and $1-alpha$. It does not matter whether you use the hedge $H=alpha_1 X+(1-alpha_1)Y$ or the hedge $H=alpha_2 Y + (1-alpha_2) X$. You will get the same ratios in terms of $alpha_1 = 1-alpha_2$.



            Minimal variance case and relation with principle components



            In the minimal variance case (here you actually do not need to assume a multivariate Normal distribution) you get the following hedge ratio as optimum $$frac{alpha}{1-alpha} = frac{var(Y) - cov(X,Y)}{var(X)-cov(X,Y)}$$ which can be expressed in terms of the regression coefficients $beta = cov(X,Y)/var(X)$ and $gamma = cov(X,Y)/var(Y)$ and is as following $$frac{alpha}{1-alpha} = frac{1-beta}{1-gamma}$$



            In a situation with more than two variables/stocks/bonds you might generalize this to the last (smallest eigenvalue) principle component.





            Variants



            Improvements of the model can be made by using different distributions than multivariate normal. Also you could incorporate the time in a more sophisticated model to make better predictions of future values for the pair $X,Y$.





            $dagger$ This is a simplification but it suits the purpose of explaining how one can, and should, perform the analysis to find an optimal ratio without a regression line.






            share|cite|improve this answer



















            • 1




              I am sorry, but as a physicist, I know too little about the language (long, short, holdings, etc.) related to stocks, bonds and finance. If you could cast it in simpler language I might be able to understand it and work with it. My answer is just a very simple expression that is unaware of the details and possibilities how to express hedging and stocks, but it shows the basic principle how you can get away from the use of a regression line (go back to first principles, express the model for profit which is at the core instead of using regression lines whose relevance is not directly clear).
              – Martijn Weterings
              4 hours ago












            • I think i understand. The problem is that 1/ρ_{XY} ne p_{XY}$. indeed, $p_{XY}$ often changes quite and bit when we take the inverse. Your alternative is close to the case I am thinking about, but i do want to check one thing: does this allow non-negative holdings? Adopting your terminology, i'd have a unit holding of bond X, and a negative holding of Y. Say long one unit of bond X and short (say) 1.2 units of bond Y ... but it could be 0.2 units or 5 units, depending on the math.
              – ricardo
              4 hours ago










            • long means that i make 1% on a bond if the price increases by ~1%; short means that i lose ~1% on a bond if the price increases by ~1%. So the idea is that i am long one unit of one bond (so i benefit from an appreciation) and am short some amount of the other bond (so i lose from an appreciation).
              – ricardo
              3 hours ago










            • "The problem is to decide how much of X one ought to hold against Y." My problem with this is that there is no explanation/model/expression how you decide about this. How do you define losses and gains and how much do you value them?
              – Martijn Weterings
              3 hours ago












            • Are there costs associated with being short and long? I imagine that you have a given amount to invest and this limits how much you can be short/long in those bonds. Then based on your previous knowledge you can estimate/determine the distribution of losses/gains for whatever combination on that limit. Finally, based on some function that determines how you value losses and gains (this expresses why/how you hedge) you can decide which combination to choose.
              – Martijn Weterings
              3 hours ago














            2












            2








            2







            $beta$ and $gamma$



            As Xi'an noted in his answer the $beta$ and $gamma$ are related to each other by relating to the conditional means $X|Y$ and $Y|X$ (which in their turn relate to a single joint distribution) these are not symmetric in the sense that $beta = 1/gamma$. This is neither the case if you would 'know' the true $sigma$ and $rho$ instead of using estimates. You have $$beta = rho_{XY} frac{sigma_Y}{sigma_X}$$ and $$gamma = rho_{XY} frac{sigma_X}{sigma_Y}$$



            See also simple linear regression on wikipedia for computation of the $beta$ and $gamma$.



            It is this correlation term which sort of disturbs the symmetry. When the $beta$ and $gamma$ would be simply the ratio of the standard deviation $sigma_Y/sigma_X$ and $sigma_X/sigma_Y$ then they would indeed be each others inverse. The $rho_{XY}$ term can be seen as modifying this as a sort of regression to the mean. With perfect correlation $rho_{XY} = 1$ then you can fully predict $X$ based on $Y$ or vice versa. But with $rho_{XY} < 1$ you can not make those perfect predictions and the conditional mean will be somewhat closer to the unconditional mean, in comparison to a simple scaling by $sigma_Y/sigma_X$ or $sigma_X/sigma_Y$.





            Is a regression line the right method?



            You may wonder whether these conditional probabilities and regression lines is what you need to determine your ratios of $X$ and $Y$. It is unclear to me how you would wish to use a regression line in the computation of an optimal ratio.



            Below is an alternative way to compute the ratio. This method does have symmetry (ie if you switch X and Y then you will get the same ratio).





            Alternative



            Say, the yields of bonds $X$ and $Y$ are distributed according to a multivariate normal distribution$^dagger$ with correlation $rho_{XY}$ and standard deviations $sigma_X$ and $sigma_Y$ then the yield of a hedge that is sum of $X$ and $Y$ will be normal distributed:



            $$H = alpha X + (1-alpha) Y sim N(mu_H,sigma_H^2)$$



            were $0 leq alpha leq 1$ and with



            $$begin{array}{rcl}
            mu_H &=& alpha mu_X+(1-alpha) mu_Y \
            sigma_H^2 &=& alpha^2 sigma_X^2 + (1-alpha)^2 sigma_Y^2 + 2 alpha (1-alpha) rho_{XY} sigma_X sigma_Y \
            & =& alpha^2(sigma_X^2+sigma_Y^2 -2 rho_{XY} sigma_Xsigma_Y) + alpha (-2 sigma_Y^2+2rho_{XY}sigma_Xsigma_Y) +sigma_Y^2
            end{array} $$



            The maximum of the mean $mu_H$ will be at $$alpha = 0 text{ or } alpha=1$$ or not existing when $mu_X=mu_Y$.



            The minimum of the variance $sigma_H^2$ will be at $$alpha = 1 - frac{sigma_X^2 -rho_{XY}sigma_Xsigma_Y}{sigma_X^2 +sigma_Y^2 -2 rho_{XY} sigma_Xsigma_Y} = frac{sigma_Y^2-rho_{XY}sigma_Xsigma_Y}{sigma_X^2+sigma_Y^2 -2 rho_{XY} sigma_Xsigma_Y} $$



            The optimum will be somewhere in between those two extremes and depends on how you wish to compare losses and gains



            Note that now there is a symmetry between $alpha$ and $1-alpha$. It does not matter whether you use the hedge $H=alpha_1 X+(1-alpha_1)Y$ or the hedge $H=alpha_2 Y + (1-alpha_2) X$. You will get the same ratios in terms of $alpha_1 = 1-alpha_2$.



            Minimal variance case and relation with principle components



            In the minimal variance case (here you actually do not need to assume a multivariate Normal distribution) you get the following hedge ratio as optimum $$frac{alpha}{1-alpha} = frac{var(Y) - cov(X,Y)}{var(X)-cov(X,Y)}$$ which can be expressed in terms of the regression coefficients $beta = cov(X,Y)/var(X)$ and $gamma = cov(X,Y)/var(Y)$ and is as following $$frac{alpha}{1-alpha} = frac{1-beta}{1-gamma}$$



            In a situation with more than two variables/stocks/bonds you might generalize this to the last (smallest eigenvalue) principle component.





            Variants



            Improvements of the model can be made by using different distributions than multivariate normal. Also you could incorporate the time in a more sophisticated model to make better predictions of future values for the pair $X,Y$.





            $dagger$ This is a simplification but it suits the purpose of explaining how one can, and should, perform the analysis to find an optimal ratio without a regression line.






            share|cite|improve this answer















            $beta$ and $gamma$



            As Xi'an noted in his answer the $beta$ and $gamma$ are related to each other by relating to the conditional means $X|Y$ and $Y|X$ (which in their turn relate to a single joint distribution) these are not symmetric in the sense that $beta = 1/gamma$. This is neither the case if you would 'know' the true $sigma$ and $rho$ instead of using estimates. You have $$beta = rho_{XY} frac{sigma_Y}{sigma_X}$$ and $$gamma = rho_{XY} frac{sigma_X}{sigma_Y}$$



            See also simple linear regression on wikipedia for computation of the $beta$ and $gamma$.



            It is this correlation term which sort of disturbs the symmetry. When the $beta$ and $gamma$ would be simply the ratio of the standard deviation $sigma_Y/sigma_X$ and $sigma_X/sigma_Y$ then they would indeed be each others inverse. The $rho_{XY}$ term can be seen as modifying this as a sort of regression to the mean. With perfect correlation $rho_{XY} = 1$ then you can fully predict $X$ based on $Y$ or vice versa. But with $rho_{XY} < 1$ you can not make those perfect predictions and the conditional mean will be somewhat closer to the unconditional mean, in comparison to a simple scaling by $sigma_Y/sigma_X$ or $sigma_X/sigma_Y$.





            Is a regression line the right method?



            You may wonder whether these conditional probabilities and regression lines is what you need to determine your ratios of $X$ and $Y$. It is unclear to me how you would wish to use a regression line in the computation of an optimal ratio.



            Below is an alternative way to compute the ratio. This method is symmetric (i.e. if you switch $X$ and $Y$ you will get the same ratio).





            Alternative



            Say the yields of bonds $X$ and $Y$ are distributed according to a multivariate normal distribution$^\dagger$ with correlation $\rho_{XY}$ and standard deviations $\sigma_X$ and $\sigma_Y$. Then the yield of a hedge that is a weighted sum of $X$ and $Y$ will be normally distributed:



            $$H = \alpha X + (1-\alpha) Y \sim N(\mu_H,\sigma_H^2)$$



            where $0 \leq \alpha \leq 1$ and with



            $$\begin{array}{rcl}
            \mu_H &=& \alpha \mu_X+(1-\alpha) \mu_Y \\
            \sigma_H^2 &=& \alpha^2 \sigma_X^2 + (1-\alpha)^2 \sigma_Y^2 + 2 \alpha (1-\alpha) \rho_{XY} \sigma_X \sigma_Y \\
            &=& \alpha^2(\sigma_X^2+\sigma_Y^2 -2 \rho_{XY} \sigma_X\sigma_Y) + \alpha (-2 \sigma_Y^2+2\rho_{XY}\sigma_X\sigma_Y) +\sigma_Y^2
            \end{array}$$



            The maximum of the mean $\mu_H$ will be at $$\alpha = 0 \text{ or } \alpha=1$$ or will not exist (the mean is constant in $\alpha$) when $\mu_X=\mu_Y$.



            The minimum of the variance $\sigma_H^2$ will be at $$\alpha = 1 - \frac{\sigma_X^2 -\rho_{XY}\sigma_X\sigma_Y}{\sigma_X^2 +\sigma_Y^2 -2 \rho_{XY} \sigma_X\sigma_Y} = \frac{\sigma_Y^2-\rho_{XY}\sigma_X\sigma_Y}{\sigma_X^2+\sigma_Y^2 -2 \rho_{XY} \sigma_X\sigma_Y}$$



            The optimum will be somewhere between those two extremes and depends on how you wish to weigh losses against gains.



            Note that there is now a symmetry between $\alpha$ and $1-\alpha$. It does not matter whether you use the hedge $H=\alpha_1 X+(1-\alpha_1)Y$ or the hedge $H=\alpha_2 Y + (1-\alpha_2) X$; you will get the same ratio, with $\alpha_1 = 1-\alpha_2$.



            Minimal variance case and relation with principal components



            In the minimal variance case (here you do not actually need to assume a multivariate normal distribution) you get the following hedge ratio as the optimum $$\frac{\alpha}{1-\alpha} = \frac{\operatorname{var}(Y) - \operatorname{cov}(X,Y)}{\operatorname{var}(X)-\operatorname{cov}(X,Y)}$$ which can be expressed in terms of the regression coefficients $\beta = \operatorname{cov}(X,Y)/\operatorname{var}(X)$ and $\gamma = \operatorname{cov}(X,Y)/\operatorname{var}(Y)$ as $$\frac{\alpha}{1-\alpha} = \frac{\beta(1-\gamma)}{\gamma(1-\beta)}$$
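            A small numerical sketch of the minimum-variance weight (numpy; the standard deviations and correlation below are made-up illustrative values): the closed form matches a brute-force grid search over $\alpha$, and swapping $X$ and $Y$ returns $1-\alpha$ as claimed:

```python
import numpy as np

def min_var_alpha(sx, sy, rho):
    """Weight alpha in H = alpha*X + (1-alpha)*Y that minimizes var(H)."""
    cov = rho * sx * sy
    return (sy**2 - cov) / (sx**2 + sy**2 - 2 * cov)

sx, sy, rho = 1.0, 2.0, 0.3   # illustrative values
a = min_var_alpha(sx, sy, rho)

# Brute-force check: scan a grid of weights and minimize the variance directly.
grid = np.linspace(0, 1, 100_001)
var_h = grid**2 * sx**2 + (1 - grid)**2 * sy**2 \
        + 2 * grid * (1 - grid) * rho * sx * sy
a_grid = grid[np.argmin(var_h)]

print(a, a_grid)                    # closed form agrees with the grid search
print(min_var_alpha(sy, sx, rho))   # swapping X and Y gives 1 - a
```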



            In a situation with more than two variables/stocks/bonds you might generalize this to the last (smallest-eigenvalue) principal component.





            Variants



            Improvements to the model can be made by using distributions other than the multivariate normal. You could also incorporate time in a more sophisticated model to make better predictions of future values of the pair $X,Y$.





            $^\dagger$ This is a simplification, but it suits the purpose of explaining how one can, and should, perform the analysis to find an optimal ratio without a regression line.







            edited 1 hour ago
            answered 6 hours ago
            Martijn Weterings
            • I am sorry, but as a physicist, I know too little about the language (long, short, holdings, etc.) related to stocks, bonds and finance. If you could cast it in simpler language I might be able to understand it and work with it. My answer is just a very simple expression that is unaware of the details and possibilities of how to express hedging and stocks, but it shows the basic principle of how you can get away from using a regression line (go back to first principles and express the model for profit, which is at the core, instead of using regression lines whose relevance is not directly clear).
              – Martijn Weterings
              4 hours ago












            • I think I understand. The problem is that $1/\beta \ne \gamma$; indeed, the ratio often changes quite a bit when we take the inverse. Your alternative is close to the case I am thinking about, but I do want to check one thing: does this allow negative holdings? Adopting your terminology, I'd have a unit holding of bond X and a negative holding of Y. Say long one unit of bond X and short (say) 1.2 units of bond Y ... but it could be 0.2 units or 5 units, depending on the math.
              – ricardo
              4 hours ago










            • Long means that I make ~1% on a bond if the price increases by ~1%; short means that I lose ~1% on a bond if the price increases by ~1%. So the idea is that I am long one unit of one bond (so I benefit from an appreciation) and short some amount of the other bond (so I lose from an appreciation).
              – ricardo
              3 hours ago










            • "The problem is to decide how much of X one ought to hold against Y." My problem with this is that there is no explanation/model/expression of how you decide this. How do you define losses and gains, and how much do you value them?
              – Martijn Weterings
              3 hours ago












            • Are there costs associated with being short and long? I imagine that you have a given amount to invest and this limits how much you can be short/long in those bonds. Then based on your previous knowledge you can estimate/determine the distribution of losses/gains for whatever combination on that limit. Finally, based on some function that determines how you value losses and gains (this expresses why/how you hedge) you can decide which combination to choose.
              – Martijn Weterings
              3 hours ago















            Perhaps the approach of "Granger causality" might help. It would help you assess whether X is a good predictor of Y or whether Y is a better predictor of X. In other words, it tells you whether beta or gamma is the one to take more seriously. Also, since you are dealing with time series data, it tells you how much of the history of X counts towards the prediction of Y (or vice versa).



            Wikipedia gives a simple explanation: a time series X is said to Granger-cause Y if it can be shown, usually through a series of t-tests and F-tests on lagged values of X (and with lagged values of Y also included), that those X values provide statistically significant information about future values of Y.



            What you do is the following:




            • regress Y(t) on X(t-1) and Y(t-1)

            • regress Y(t) on X(t-1), X(t-2), Y(t-1), Y(t-2)

            • regress Y(t) on X(t-1), X(t-2), X(t-3), Y(t-1), Y(t-2), Y(t-3)


            Continue for whatever history length might be reasonable, and check the significance of the F-statistic for each regression.
            Then do the same in reverse (so, now regress X(t) on the past values of X and Y) and see which regressions have significant F-values.
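            This procedure can be sketched with plain numpy least squares (a toy illustration on simulated data in which X leads Y; the `rss` helper, the lag length of one, and all coefficients are my own choices, not from the answer):

```python
import numpy as np

# Simulate series where X Granger-causes Y (coefficients are illustrative).
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

def rss(predictors, z):
    """Residual sum of squares of an OLS fit of z on predictors (plus intercept)."""
    X = np.column_stack([np.ones(len(z)), predictors])
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    return np.sum((z - X @ coef) ** 2)

# Restricted model: Y(t) on Y(t-1).  Unrestricted: Y(t) on Y(t-1) and X(t-1).
z = y[1:]
r_restricted = rss(y[:-1], z)
r_full = rss(np.column_stack([y[:-1], x[:-1]]), z)

q, k = 1, 3  # one restriction tested; three parameters in the full model
F = (r_restricted - r_full) / q / (r_full / (len(z) - k))
print(F)     # a large F: lagged X adds significant information about Y
```

            Running the same comparison with the roles of X and Y swapped would give the F-statistic for the reverse direction.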



            A very straightforward example, with R code, is found here.
            Granger causality has been critiqued for not actually establishing causality (in some cases). But it seems that your application is really about "predictive causality," which is exactly what the Granger causality approach is meant for.



            The point is that the approach will tell you whether X predicts Y or whether Y predicts X (so you would no longer be tempted to artificially, and incorrectly, compound the two regression coefficients), and it gives you a better prediction (as you will know how much history of X and Y you need to predict Y), which is useful for hedging purposes, right?






            • I have a strong theoretical reason to believe that neither is truly a cause, and that even if one became a cause it would not remain true over time. So I don't think that Granger causality is the answer in this case. I've upvoted the answer in any case, as it is useful -- esp. the R code.
              – ricardo
              12 hours ago










            • That is why I mention that "Granger causality has been critiqued for not actually establishing causality (in some cases)." It seems to me that the question here is more about "predictive causality," which is what Granger causality is meant for.
              – Steve G. Jones
              8 hours ago










            • That is why I explicitly mention that "Granger causality has been critiqued for not actually establishing causality (in some cases)." It seems to me that your question is more about establishing "predictive causality," which is what Granger causality is meant for. In addition, Granger's approach uses the information in your time series data, which are a waste not to use if you have them. Of course, you can (should?) re-estimate the effects over time. I expect that the Granger effects are more stable than cross-sectional OLS (you can test this beforehand, using historical data). HTH
              – Steve G. Jones
              8 hours ago
















            answered yesterday









            Steve G. Jones

            1485



