# Tag: causality

## Continuous Machine Learning – Part II

This is a 3-part series about Continuous Machine Learning. You can check Part I here and Part III here. This post continues the previous one, in which we started automating Data Science in GitHub with CML. Here we will use Docker to reduce the computation time of our GitHub Actions checks.

You can think of a Docker image as a snapshot of a project's software environment that can then be set up on any other computer. When GitHub Actions is triggered, it loads your Docker image on its infrastructure and then runs your code. That's why it's quicker: with a Docker container that already has your dependencies installed, you don't have to spend time installing them again on your GitHub Actions runner every time it is triggered, which is what we did in the first part of this series.

## Continuous Machine Learning – Part I

This is a 3-part series about Continuous Machine Learning. You can check Part II here and Part III here.

## What is it?

Continuous Machine Learning (CML) applies the concepts of Continuous Integration and Continuous Delivery (CI/CD), well known in Software Engineering / DevOps, to Machine Learning and Data Science projects.

## What is this post about?

I will cover a set of tools that can make your life as a Data Scientist much more interesting. We will use MIIC, a network inference algorithm, to infer the network of a famous dataset (alarm, from bnlearn). We will then use (1) git to track our code, (2) DVC to track our dataset, outputs and pipeline, (3) GitHub as a git remote and (4) Google Drive as a DVC remote. I've written a tutorial on managing Data Science projects with DVC, so if you're interested in it, open it here in a new tab to check it later.

## Spurious Independence: is it real?

### First things first: Spurious Dependence

Depending on your background, you have probably already heard of spurious dependence in one way or another. It goes by the names of spurious association, spurious dependence, the famous quote “correlation does not imply causation”, and other versions of the same idea: you cannot say that $X$ necessarily causes $Y$ (or vice versa) solely because $X$ and $Y$ are associated, that is, because they tend to occur together. Even if one of the events always happens before the other, say $X$ preceding $Y$, you still cannot say that $X$ causes $Y$. There is a statistical test, very famous in economics, known as Granger causality.

The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another, first proposed in 1969.[1] Ordinarily, regressions reflect “mere” correlations, but Clive Granger argued that causality in economics could be tested for by measuring the ability to predict the future values of a time series using prior values of another time series. Since the question of “true causality” is deeply philosophical, and because of the post hoc ergo propter hoc fallacy of assuming that one thing preceding another can be used as a proof of causation, econometricians assert that the Granger test finds only “predictive causality”.

Granger Causality at Wikipedia.
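To make the idea in the quote concrete, here is a minimal pure-Python sketch, assuming a single lag and the usual F test between two nested regressions: a restricted one that predicts $Y_t$ from its own past only, and an unrestricted one that also uses the past of $X$. The function names (`solve`, `ols_rss`, `granger_f`) and the simulated data are illustrative assumptions, not taken from any library.

```python
import random


def solve(a, b):
    """Solve the small linear system a @ beta = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta


def ols_rss(rows, target):
    """Fit ordinary least squares via the normal equations; return the residual sum of squares."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, target))


def granger_f(x, y, lag=1):
    """F statistic for H0: past values of x do not improve the forecast of y."""
    restricted, unrestricted, target = [], [], []
    for t in range(lag, len(y)):
        past_y = [y[t - k] for k in range(1, lag + 1)]
        past_x = [x[t - k] for k in range(1, lag + 1)]
        restricted.append([1.0] + past_y)             # intercept + own past
        unrestricted.append([1.0] + past_y + past_x)  # ... plus the other series' past
        target.append(y[t])
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    df_den = len(target) - (2 * lag + 1)  # observations minus unrestricted parameters
    return ((rss_r - rss_u) / lag) / (rss_u / df_den)


# Hypothetical simulated data in which x really does drive y one step later.
rng = random.Random(42)
n = 500
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.0] * n
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0, 1)

print(f"x -> y: F = {granger_f(x, y):.1f}")  # does past x help predict y?
print(f"y -> x: F = {granger_f(y, x):.1f}")  # does past y help predict x?
```

A large F in only one direction says that $X$ is useful for forecasting $Y$, i.e. that $X$ Granger-causes $Y$, nothing more. The sketch stops at the statistic; converting it into a p-value requires the F distribution with `lag` and `n - 2*lag - 1` degrees of freedom, and in practice a library routine such as statsmodels' `grangercausalitytests` is the safer choice.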

The post hoc ergo propter hoc fallacy is also known as “after this, therefore because of this”. Today it's fairly clear that Granger causality is not an adequate tool for inferring causal relationships, and this is one of the reasons why, when $X$ and $Y$ are evaluated with the Granger causality test and an association is found, we say that $X$ Granger-causes $Y$ instead of saying that $X$ causes $Y$.

Maybe it's not clear to you why an association between two variables, plus the fact that one always precedes the other, is not enough to say that one is causing the other. One possible explanation is a third, lurking variable $C$, also known as a confounder, that causes both events, a phenomenon known as confounding. By ignoring the existence of $C$ (which in some contexts is done by design, through a strong assumption called unconfoundedness), you fail to realize that $X$ and $Y$ are actually independent once this third variable $C$, the confounder, is taken into consideration. Since you ignored it, they seem dependent, associated. A very famous and straightforward example is the positive correlation between (a) ice cream sales and deaths by drowning or (b) ice cream sales and the homicide rate.
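A small simulation helps to see confounding at work. It assumes, as the example is usually told, that hot weather is the confounder $C$ behind both ice cream sales ($X$) and drownings ($Y$), with no direct link between $X$ and $Y$; every number below is made up. Assuming linear relationships, “taking $C$ into consideration” can be approximated by a partial correlation: regress $X$ and $Y$ on $C$ separately and correlate the residuals. The helper names `pearson` and `residuals` are hypothetical.

```python
import random
import statistics


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def residuals(target, covariate):
    """Residuals of a simple least-squares regression of target on covariate."""
    mc, mt = statistics.fmean(covariate), statistics.fmean(target)
    slope = sum((c - mc) * (t - mt) for c, t in zip(covariate, target)) / sum(
        (c - mc) ** 2 for c in covariate
    )
    intercept = mt - slope * mc
    return [t - (intercept + slope * c) for c, t in zip(covariate, target)]


rng = random.Random(0)
n = 2000
# Hypothetical data: temperature (C) drives both series; X and Y never affect each other.
temperature = [rng.gauss(25, 5) for _ in range(n)]
ice_cream = [10 * c + rng.gauss(0, 20) for c in temperature]  # X
drownings = [0.3 * c + rng.gauss(0, 1.5) for c in temperature]  # Y

marginal = pearson(ice_cream, drownings)
partial = pearson(residuals(ice_cream, temperature), residuals(drownings, temperature))
print(f"corr(X, Y)     = {marginal:.2f}")  # clearly positive: X and Y look associated
print(f"corr(X, Y | C) = {partial:.2f}")   # close to zero once C is accounted for
```

The residual trick only removes linear dependence on $C$; with nonlinear relationships a proper conditional independence test would be needed.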

## Best links of the week #47

### Best links of the week from 25th November to 1st December

1. Researchers Have Successfully Tricked A.I. Into Seeing The Wrong Things at PopSci.
2. Fooling the machine at PopSci.
3. Why isn’t confounding a statistical concept? at Judea Pearl’s discussion with readers.
4. The impossibility of asymmetric causation at Judea Pearl’s discussion with readers.
5. d-SEPARATION WITHOUT TEARS at Judea Pearl’s discussion with readers. There is an interactive adaptation of this at dagitty’s website here.
6. An Illustration of Pearl’s Simpson Machine at dagitty.
7. Do you think you know DAG terminology? This game can help you test your skills. There is also another game here for testing your knowledge of covariate roles and another one about the Table 2 Fallacy. All of these are at dagitty.
8. On causality and decision trees at Judea Pearl’s discussion with readers.
9. On causality and decision trees (cont.) at Judea Pearl’s discussion with readers.
10. Back-door criterion and epidemiology at Judea Pearl’s discussion with readers.
11. Indirect Effects at Judea Pearl’s discussion with readers.
12. The meaning of counterfactuals at Judea Pearl’s discussion with readers.
13. Has causality been defined? at Judea Pearl’s discussion with readers.
14. The tidyverse for Machine Learning presentation by Bruna Wundervald at satRday São Paulo.
15. Centrality measures as a proxy for causal influence? at Fabian Dablander’s website.
16. 12-year-old boy already works as a data scientist at Olhar Digital.
17. CGU launches the new Correição em Dados dashboard at CGU.

#### Videos

1. A network of science: 150 years of Nature papers at nature video’s YouTube channel.
2. ViennaR Meetup March 2019 | Hadley Wickham Tidy Data at Quantargo’s YouTube channel.
3. Causal Graphs by Julian Schüssler at MZES Methods Bites’s YouTube channel.

#### Positions available

1. Lecturer/Senior Lecturer/Reader in Media & Data Science at the University of Glasgow.
2. Ph.D. fellowship in Machine Learning for Robot Manipulation at Bosch.
3. Fully Funded Ph.D. position in AI and Machine Learning for mental well being at Örebro University.
4. Research Assistant in Computer Vision and Deep Learning at Edge Hill University.
5. Tenure Track ML Teaching Professor Position at UCSD.
6. Post-doctoral fellowship (Genomics) at Instituto Tecnológico Vale.
7. Data Science Vice President at Big Cloud.
8. Director of Data Science at Ideal Team Consulting.
9. Data Governance and Architecture Manager at Wiz.
10. Senior Business Intelligence Analyst at SumUp.
11. Data Architect – Restaurant Product at iFood.
12. Lead Data Engineer at QuintoAndar.
13. Senior SQL Server/ETL Developer at Cognizant.
14. Data Architect D2- Lunch DFN at iFood.

## Best links of the week #17

### Best links of the week from 29th April to 5th May

1. This Will Be The Biggest Disruption In Higher Education at Forbes.
2. Dead Facebook users could outnumber living ones within 50 years at MIT Technology Review.
3. To Build Truly Intelligent Machines, Teach Them Cause and Effect at QuantaMagazine.
4. The World’s largest listings of AI Conferences, Events and Meetups, with the biggest collection of conference discount codes.
5. 2nd International Summer School on Artificial Intelligence: From Deep Learning to Data Analytics.
6. Microsoft launches a drag-and-drop machine learning tool at TechCrunch.
7. Actively curated list of awesome BI tools at Jan Kyri’s GitHub.
8. How much of human height is genetic and how much is due to nutrition at Scientific American.
9. Announcing JupyterHub 1.0!
10. I hate it when Jupyter notebooks don’t render properly (or take a long time to render) on GitHub. If you’ve faced similar situations, your solution is here!
11. Cryptography That Can’t Be Hacked at QuantaMagazine.
12. Hacker-Proof Code Confirmed at QuantaMagazine.
13. “PUT DOWN THE DEEP LEARNING: When not to use neural networks (and what to do instead)”, a talk by Rachael Tatman. Code here.
14. Socially-Stratified Validation for ML Fairness, another talk by Rachael Tatman.
15. Google Books Ngram Viewer is a tool that displays a graph showing how often phrases you specify have occurred in a corpus of books over time.
16. Why Generation Y Yuppies Are Unhappy at Wait But Why.
17. Looking for data? You mean data? DATA? Yes, data, data and data!!
18. Becoming a Data Scientist – Curriculum via Metromap at nirvacana.
19. Demystifying Artificial Intelligence. What is Artificial Intelligence & explaining it from different dimensions at nirvacana.
20. The weakening relationship between the Impact Factor and papers’ citations in the digital age at arXiv.org.
21. Geospatial Visualization with R at Gabriel Sartori’s GitHub.