Spurious Independence: is it real?

Reading Time: 14 minutes

First things first: Spurious Dependence

Depending on your background, you have probably already heard of spurious dependence in one way or another. It goes by the names of spurious association, spurious correlation, the famous quote “correlation does not imply causation”, and other versions of the same idea: you cannot say that X necessarily causes Y (or vice versa) solely because X and Y are associated, that is, because they tend to occur together. Even if one of the events always happens before the other, say X preceding Y, you still cannot say that X causes Y. In economics, there is a very famous statistical test built around this kind of precedence, known as Granger causality.

The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another, first proposed in 1969.[1] Ordinarily, regressions reflect “mere” correlations, but Clive Granger argued that causality in economics could be tested for by measuring the ability to predict the future values of a time series using prior values of another time series. Since the question of “true causality” is deeply philosophical, and because of the post hoc ergo propter hoc fallacy of assuming that one thing preceding another can be used as a proof of causation, econometricians assert that the Granger test finds only “predictive causality”.

Granger Causality at Wikipedia.
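
To make the idea of “predictive causality” concrete, here is a minimal sketch in Python, assuming NumPy is available. It is not a full Granger test: it uses a single lag, simulated data, and hypothetical helper names (rss, granger_f, simulate) chosen only for this illustration. It compares a regression of a series on its own past with one that also includes the past of the other series, and summarizes the improvement as an F statistic.

```python
import numpy as np


def rss(design, target):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def granger_f(cause, effect):
    """F statistic for 'cause Granger-causes effect', using one lag only."""
    y = effect[1:]            # values to be predicted
    own_lag = effect[:-1]     # the series' own past
    other_lag = cause[:-1]    # the past of the candidate cause
    ones = np.ones_like(y)
    restricted = np.column_stack([ones, own_lag])
    unrestricted = np.column_stack([ones, own_lag, other_lag])
    rss_r = rss(restricted, y)
    rss_u = rss(unrestricted, y)
    dof = len(y) - unrestricted.shape[1]
    # One restriction is being tested (the coefficient on other_lag).
    return (rss_r - rss_u) / (rss_u / dof)


def simulate(n=2000, seed=0):
    """Toy data (an assumption of this sketch): past x helps predict y."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    y = np.zeros(n)
    ex = rng.normal(size=n)
    ey = rng.normal(size=n)
    for t in range(1, n):
        x[t] = 0.5 * x[t - 1] + ex[t]
        y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + ey[t]
    return x, y


x, y = simulate()
print("F statistic, x -> y:", granger_f(x, y))
print("F statistic, y -> x:", granger_f(y, x))
```

A large F statistic in one direction only says that the past of one series improves the forecast of the other. As the quote stresses, that is a statement about prediction, not about true causation.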

The post hoc ergo propter hoc fallacy is also known as “after this, therefore because of this”. It is widely accepted today that Granger causality is not an adequate tool for inferring causal relationships, which is one of the reasons why, when X and Y are submitted to the Granger causality test and an association is found, we say that X Granger-causes Y instead of saying that X causes Y. Maybe it is not clear to you why an association between two variables, plus the fact that one always precedes the other, is not enough to say that one causes the other. One possible explanation is a third, lurking variable C, also known as a confounder, that causes both events, a phenomenon known as confounding. By ignoring the existence of C (in some contexts its absence is assumed by design, a strong assumption called unconfoundedness), you fail to realize that the events X and Y are actually independent once this third variable C is taken into account. Since you ignored it, they seem dependent, associated. A very famous and straightforward example is the positive correlation between (a) ice cream sales and deaths by drowning or (b) ice cream sales and the homicide rate.
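
The confounding story can be checked with a small simulation. The sketch below, in Python with NumPy, rests on several assumptions: the data are simulated, the confounder C is taken to be temperature (a common reading of the ice cream example), the effects are linear, and the helper names are hypothetical. X and Y never influence each other, yet they are strongly correlated until C is accounted for.

```python
import numpy as np


def residualize(values, control):
    """Remove the linear effect of `control` (plus an intercept) from `values`."""
    design = np.column_stack([np.ones_like(control), control])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coef


def marginal_and_partial_corr(x, y, c):
    """Correlation of x and y, ignoring c and after adjusting for c."""
    marginal = np.corrcoef(x, y)[0, 1]
    partial = np.corrcoef(residualize(x, c), residualize(y, c))[0, 1]
    return float(marginal), float(partial)


def simulate_confounding(n=5000, seed=0):
    """C causes both X and Y; there is no arrow between X and Y."""
    rng = np.random.default_rng(seed)
    c = rng.normal(size=n)              # assumed confounder, e.g. temperature
    x = 2.0 * c + rng.normal(size=n)    # e.g. ice cream sales
    y = 3.0 * c + rng.normal(size=n)    # e.g. deaths by drowning
    return x, y, c


x, y, c = simulate_confounding()
marginal, partial = marginal_and_partial_corr(x, y, c)
print("correlation ignoring C:     ", marginal)
print("correlation adjusting for C:", partial)
```

Regressing C out of both variables works here only because the simulated relationships are linear. With non-linear effects, a near-zero partial correlation would not be enough to conclude conditional independence.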


Best links of the week #47

Reading Time: 3 minutes

Best links of the week from 25th November to 1st December



  1. Researchers Have Successfully Tricked A.I. Into Seeing The Wrong Things at PopSci.
  2. Fooling the machine at PopSci.
  3. Why isn’t confounding a statistical concept? at Judea Pearl’s discussion with readers.
  4. The impossibility of asymmetric causation at Judea Pearl’s discussion with readers.
  5. d-SEPARATION WITHOUT TEARS at Judea Pearl’s discussion with readers. There is also an interactive adaptation of it at dagitty’s website.
  6. An Illustration of Pearl’s Simpson Machine at dagitty.
  7. Do you think you know DAG terminology? This game can help you try your skills. There is also another game here for testing your knowledge on covariate roles and another one about Table 2 Fallacy. All this at dagitty.
  8. On causality and decision trees at Judea Pearl’s discussion with readers.
  9. On causality and decision trees (cont.) at Judea Pearl’s discussion with readers.
  10. Back-door criterion and epidemiology at Judea Pearl’s discussion with readers.
  11. Indirect Effects at Judea Pearl’s discussion with readers.
  12. The meaning of counterfactuals at Judea Pearl’s discussion with readers.
  13. Has causality been defined? at Judea Pearl’s discussion with readers.
  14. The tidyverse for Machine Learning presentation by Bruna Wundervald at satRday São Paulo.
  15. Centrality measures as a proxy for causal influence? at Fabian Dablander’s website.
  16. Garoto de 12 anos já trabalha como cientista de dados (12-year-old boy already works as a data scientist) at Olhar Digital.
  17. CGU lança novo Painel Correição em Dados (CGU launches its new “Correição em Dados” dashboard) at CGU.


  1. Causality in Machine Learning 101 for Dummies like Me by Sangeet Moy Das at Towards Data Science.
  2. An introduction to Causal inference at Fabian Dablander’s Blog.
  3. Spurious correlations and random walks at Fabian Dablander’s Blog.
  4. Curve fitting and the Gaussian distribution at Fabian Dablander’s Blog.
  5. In Review: Ten Great Ideas About Chance at Fabian Dablander’s Blog.
  6. Using causal graphs to understand missingness and how to deal with it at Cookie Scientist.


  1. A network of science: 150 years of Nature papers at nature video’s YouTube channel.
  2. ViennaR Meetup March 2019 | Hadley Wickham Tidy Data at Quantargo’s YouTube channel.
  3. Causal Graphs by Julian Schüssler at MZES Methods Bites’s YouTube channel.

Positions available

  1. Lecturer/Senior Lecturer/Reader in Media & Data Science at the University of Glasgow.
  2. Ph.D. fellowship in Machine Learning for Robot Manipulation at Bosch.
  3. Fully Funded Ph.D. position in AI and Machine Learning for mental well being at Örebro University.
  4. Research Assistant in Computer Vision and Deep Learning at Edge Hill University.
  5. Tenure Track ML Teaching Professor Position at UCSD.
  6. Post-doctoral fellowship (Genomics) at Instituto Tecnológico Vale.
  7. Data Science Vice President at Big Cloud.
  8. Director of Data Science at Ideal Team Consulting.
  9. Gerente de Governança e Arquitetura de Dados (Data Governance and Architecture Manager) at Wiz.
  10. Senior Business Intelligence Analyst at SumUp.
  11. Data Architect – Restaurant Product at iFood.
  12. Lead Data Engineer at QuintoAndar.
  13. Software Engineer at Google.
  14. Senior SQL Server/ETL Developer at Cognizant.
  15. Data Architect D2- Lunch DFN at iFood.

Best links of the week #31

Best links of the week #29

Best links of the week #26

Best links of the week #18

Best links of the week #17

Reading Time: 2 minutes

Best links of the week from 29th April to 5th May

Source: Dilbert.


  1. This Will Be The Biggest Disruption In Higher Education at Forbes.
  2. Dead Facebook users could outnumber living ones within 50 years at MIT Technology Review.
  3. To Build Truly Intelligent Machines, Teach Them Cause and Effect at QuantaMagazine.
  4. The Worlds largest listings of AI Conferences, Events and Meetups with the biggest collection of conference discount codes.
  5. 2nd International Summer School on Artificial Intelligence: From Deep Learning to Data Analytics.
  6. Microsoft launches a drag-and-drop machine learning tool at TechCrunch.
  7. Actively curated list of awesome BI tools at Jan Kyri’s GitHub.
  8. How much of human height is genetic and how much is due to nutrition at Scientific American.
  9. Announcing JupyterHub 1.0!
  10. I hate it that sometimes Jupyter notebooks don’t render properly (or take a long time to render) at GitHub. If you’ve faced similar situations, your solution is here!
  11. Cryptography That Can’t Be Hacked at QuantaMagazine.
  12. Hacker-Proof Code Confirmed at QuantaMagazine.
  13. “PUT DOWN THE DEEP LEARNING: When not to use neural networks (and what to do instead)”, a talk by Rachael Tatman. Code here.
  14. Socially-Stratified Validation for ML Fairness, another talk by Rachael Tatman.
  15. Google Books Ngram Viewer is a tool that displays a graph showing how phrases specified by you have occurred in a corpus of books through time.
  16. Why Generation Y Yuppies Are Unhappy at Wait But Why.
  17. Looking for data? You mean data? DATA? Yes, data, data and data!!
  18. Becoming a Data Scientist – Curriculum via Metromap at nirvacana.
  19. Demystifying Artificial Intelligence. What is Artificial Intelligence & explaining it from different dimensions at nirvacana.
  20. The weakening relationship between the Impact Factor and papers’ citations in the digital age at arXiv.org.
  21. Visualização GeoEspacial com R (Geospatial Visualization with R) at Gabriel Sartori’s GitHub.

Best links of the week #14

Best links of the week #5

Reading Time: < 1 minute

Best links of the week from 4th February to 10th February.


  1. Como controlar o braço de outra pessoa com o poder da sua mente? (How to control another person’s arm with the power of your mind?) at UOL.
  2. vidente is an R package I am currently writing to parse and analyze data from the Surveillance, Epidemiology and End Results (SEER) Program, which covers over 1/3 of the US population on cancer incidence and survival.
  3. Ciência de Dados com R is a book on Data Science using R at Instituto Brasileiro de Pesquisa e Análise de Dados.
  4. Data Science & Machine Learning Course at Ivanovitch Silva’s GitHub repository.
  5. A receita dos candidatos a deputado federal em 2018 (The campaign revenue of the candidates for federal deputy in 2018) at Nexo Jornal.
  6. AI 100: The Artificial Intelligence Startups Redefining Industries at CB Insights.
  7. The open-source and crowd sourced conference website.
  8. Ranking of IT conferences.

Best links of the week #3

Reading Time: 2 minutes

Best links of the week from 21st January to 27th January.


  1. The bioinformatics chat is a podcast about computational biology, bioinformatics, and next generation sequencing.
  2. Trilha de estudos de Data Science (Data Science study track) at fcqueiroz’s GitHub.
  3. Jonas Salk: Good at Virology, Bad at Economics at Slate.
  4. Why is Data Preprocessing required? at Ques10.
  5. What bioRxiv’s first 30,000 preprints reveal about biologists at Nature.
  6. What are some recent examples of Simpson’s Paradox in the media? at Quora.
  7. More funny examples of correlations at Bloomberg.
  8. If correlation doesn’t imply causation, then what does? at Data-Driven Intelligence.
  9. Berkson’s Paradox at Brilliant.
  10. Causal Inference & Paradoxes at IRT SystemX.
  11. 10 Biggest Struggles of PhD Students at INOMICS.
  12. Interesting Correlation does not imply causation article at Wikipedia.
  13. If two variables are independent, they are also uncorrelated, so their Pearson product-moment correlation coefficient (the correlation, for short) is 0. The converse is not necessarily true: two variables can be uncorrelated, that is, have no linear relationship, and still be non-linearly dependent. It is sometimes mistakenly thought that zero correlation does imply independence when the two random variables are normally distributed. This is false, though, because normally distributed and uncorrelated does not imply independent at Wikipedia.
  14. What is a Monotonic Relationship? at Statistics How To.
  15. One-Tailed and Two-Tailed Hypothesis Tests Explained at Statistics by Jim.
  16. What are the differences between one-tailed and two-tailed tests? (and many more interesting statistics questions here) at Institute for Digital Research and Education.
  17. Probability Tutorials at Probability.NET
  18. Google DataSet Search
  19. What Is An Intuitive Way To Understand Entropy? at Forbes.
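
The claim in item 13 of the list above, that two normally distributed variables can be uncorrelated and still dependent, can be illustrated with a short simulation. This is a minimal sketch in Python with NumPy using a standard construction: X is standard normal, W is an independent random sign, and Y = W * X. The function name and parameters are hypothetical, chosen only for this example.

```python
import numpy as np


def simulate_uncorrelated_dependent(n=10000, seed=0):
    """Return x, y that are both standard normal, uncorrelated, yet dependent."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    w = rng.choice([-1.0, 1.0], size=n)  # random sign, independent of x
    y = w * x                            # still standard normal, by symmetry
    return x, y


x, y = simulate_uncorrelated_dependent()
corr_xy = float(np.corrcoef(x, y)[0, 1])
corr_abs = float(np.corrcoef(np.abs(x), np.abs(y))[0, 1])
print("correlation of x and y:    ", corr_xy)   # close to 0
print("correlation of |x| and |y|:", corr_abs)  # |y| equals |x|
```

Knowing x pins down the magnitude of y exactly, so the two variables are far from independent even though their linear correlation is essentially zero.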