Best links of the week #62

Reading time: 2 minutes

Best links of the week from 23rd March to 29th March


Links

  1. Share Code from any Device.
  2. How Fast Does a Virus Spread? Let’s Do the Math at WIRED.
  3. The Promising Math Behind ‘Flattening the Curve’ at WIRED.
  4. SAS releases more than 100 online data analysis courses at CanalTech.
  5. COVID-19 dashboard from the Ministério da Saúde (Brazilian Ministry of Health).
  6. French government COVID-19 dashboard.
  7. How many tests for COVID-19 are being performed around the world? at Our World in Data.
  8. Corona Data Scraper.
  9. Why the ‘gold standard’ of medical research is no longer enough at STAT.
  10. Coughona: Identifying Coronavirus based on cough noise.
  11. Graph theory suggests COVID-19 might be a ‘small world’ after all at ZDNet.
  12. Case-control study by Timiresh Das at SlideShare.

Best links of the week #61

Reading time: 2 minutes

Best links of the week from 16th March to 22nd March


Links

  1. Microsoft COVID-19 Tracker.
  2. Coronavirus (COVID-19) self-assessment.
  3. Comparing Corona Trajectories.
  4. Modeling COVID-19 Spread vs Healthcare Capacity.
  5. Current covid-19 situation.
  6. A Crash Course in Good and Bad Controls.
  7. An Intuitive (and Short) Explanation of Bayes’ Theorem at Better Explained.
  8. False Positives and False Negatives at Math Is Fun.
  9. Matthews correlation coefficient at Wikipedia.
  10. Demythifying Matthew Correlation Coefficients (MCC) at Kaggle.

Best links of the week #60

Reading time: 3 minutes

Best links of the week from 9th March to 15th March


Links

  1. Diagnostic Tests (Biostatistics) at Prof. Dr. Enrico Colosimo’s personal website.
  2. Bayes’ Rule for Clinicians: An Introduction published in Frontiers in Psychology.
  3. Zencastr: High fidelity podcasting.
  4. Do you (want to) use RAIS, the Brazilian matched employer-employee dataset? Let me save you some months of painful data cleaning work.
  5. Historical coronavirus data extractor for Brazil.
  6. Open data projects on GitHub.
  7. Want to access INEP data but don’t know how?
  8. Visualizing the History of Pandemics.
  9. COVID-19 Open Research Dataset (CORD-19).
  10. Modeling COVID-19 Spread vs Healthcare Capacity.
  11. Tech4Covid19: in 48 hours, 600 Portuguese people came together to create 12 tools against the coronavirus.
  12. Slide transitions for xaringan.
  13. What Great Data Analysts Do — and Why Every Organization Needs Them at Harvard Business Review.

Best links of the week #59

Reading time: 2 minutes

Best links of the week from 2nd March to 8th March


Links

  1. Understanding Bayesian Networks with Examples in R at bnlearn.
  2. Rockefeller, Mastercard team up to leverage data science for social impact at devex.
  3. Introduction to ggridges.
  4. Job market data for Data Scientists.
  5. Demystifying artificial intelligence: where are we and where are we going? (1/5) at Deviante.

Manage your Data Science Project in R

Reading time: 9 minutes

A simple project tutorial with R/RMarkdown, Packrat, Git, and DVC.


The pain of managing a Data Science project

Something has been bothering me for a while: reproducibility and data tracking in data science projects. I had read about some technologies but had never really tried any of them until recently, when I could no longer stand the feeling of losing track of my analyses. At some point, I decided to give DVC a try after some friends, mostly Flávio Clésio, suggested it to me. In this post, I will talk about Git, DVC, R, RMarkdown, and Packrat, which is everything I think you may need to manage your Data Science project, but the focus is definitely on DVC.

Best links of the week #58

Reading time: 2 minutes

Best links of the week from 24th February to 1st March


Links

  1. Architects of Intelligence: The truth about AI from the people building it (Interview with Judea Pearl).
  2. Git Command Explorer.
  3. Git Cheat Sheet by Git Tower.
  4. Asymptotic normality and Central Limit Theorem at Stack Exchange (Mathematics).
  5. Asymptotic theory (statistics) at Wikipedia.
  6. Get Started with DVC at DVC.
  7. Introduction to Data Version Control (DVC) by Kurian Benoy at Kaggle.
  8. The most important formula in data science was first used to prove the existence of God at Quartz.
  9. Difference between R MarkDown and R NoteBook at Stack Overflow.
  10. A group of ex-NSA and Amazon engineers are building a ‘GitHub for data’ at TechCrunch.
  11. The hero’s saga… or rather, the software’s! by Bruna Diirr at Portal Deviante.
  12. The hero’s saga… Or rather, the software’s! – Chapter 1: Requirements gathering by Bruna Diirr at Portal Deviante.
  13. The hero’s saga… Or rather, the software’s! – Chapter 2: Analysis by Bruna Diirr at Portal Deviante.
  14. Software processes by Bruna Diirr at Portal Deviante.
  15. Are we all conscious? The dawn of consciousness with the breakdown of the bicameral mind by Felipe Novaes at Portal Deviante.
  16. Attacking science, for profit or for fun: the most common weapons at Revista Questão de Ciência.

Best links of the week #57

Reading time: 2 minutes

Best links of the week from 17th February to 23rd February


Links

  1. Luna. A WYSIWYG language for data processing.
  2. 40 powerful concepts for understanding the world at Gurwinder Bhogal’s Twitter.
  3. Mathematics for Machine Learning (Book).
  4. Fairness and machine learning (Book).
  5. Mathematicians propose new way of using neural networks to work with noisy, high-dimensional data at Phys.org.
  6. Neuroscience opens the black box of artificial intelligence at TechXplore.
  7. What are the differences between Factor Analysis and Principal Component Analysis? at Stack Exchange.
  8. How does Factor Analysis explain the covariance while PCA explains the variance? at Stack Exchange.
  9. Overview of Lord’s, Simpson’s and Birth Weight paradoxes at Michael Clark’s website.
  10. In an unprecedented decision, ANVISA approves alternative therapies in Brazil.
  11. “Nephew syndrome” in science communication at Revista Questão de Ciência.

Best links of the week #56

Reading time: 2 minutes

Best links of the week from 10th February to 16th February


Links

  1. Samsung expected to announce AI-powered “invisible keyboard” this week at InfoMoney.
  2. Poster with all the RNA sequencing methods at Illumina.
  3. Poster with all the sequencing methods at Illumina.
  4. Hidden Computational Power Found in the Arms of Neurons at Quanta Magazine.
  5. Petition to allow remote paper & poster presentations at scientific conferences.
  6. Why People Quit Their Jobs at Harvard Business Review.
  7. ICML Introduction to Bandits: Algorithms and Theory.
  8. 3,000 free medical images to illustrate your publications and PowerPoint presentations at Servier Medical Art (SMART).
  9. Genomics Education Programme (Pictures on Genomics).
  10. Create Professional Science Figures in Minutes at BioRender.
  11. Interview with Prof. Clark Glymour at 3AM.

Best links of the week #55

Reading time: 3 minutes

Best links of the week from 3rd February to 9th February


Links

  1. Collection of 79 attributes from Brazilian Cities at Giovanna Badaró’s GitHub.
  2. Do androids dream of big data domination? at Times Higher Education.
  3. Robot “Scientist” Helps Discover New Ingredient for Antimalarial Drug at Futurism.
  4. Is your Research Software Correct? at Mike Croucher’s GitHub page.
  5. The long road to fairer algorithms at Nature.
  6. Someone Used Neural Networks To Upscale An 1895 Film To 4K 60 FPS, And The Result Is Really Quite Astounding at Digg.
  7. List Of 20 Common Thesis Defense Questions You Should Be Prepared For at ARTISTSWITHAVISION.
  8. People will not trust unkind science by Gail Cardew at Nature.
  9. Scientific method: Statistical errors by Regina Nuzzo at Nature.
  10. Psychology journal bans P values by Chris Woolston at Nature.
  11. Top tips for getting your science out there by Craig Cormick at Nature.

Spurious Independence: is it real?

Reading time: 14 minutes

First things first: Spurious Dependence

Depending on your background, you have probably already heard of spurious dependence in one way or another. It goes by several names: spurious association, spurious correlation, the famous quote “correlation does not imply causation”, and other versions of the same idea, namely that you cannot say that X necessarily causes Y (or vice versa) solely because X and Y are associated, that is, because they tend to occur together. Even if one of the events always happens before the other, say X preceding Y, you still cannot say that X causes Y. A statistical test that is very famous in economics, known as Granger causality, is built precisely on this kind of temporal precedence.

The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another, first proposed in 1969.[1] Ordinarily, regressions reflect “mere” correlations, but Clive Granger argued that causality in economics could be tested for by measuring the ability to predict the future values of a time series using prior values of another time series. Since the question of “true causality” is deeply philosophical, and because of the post hoc ergo propter hoc fallacy of assuming that one thing preceding another can be used as a proof of causation, econometricians assert that the Granger test finds only “predictive causality”.

Granger Causality at Wikipedia.
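To make “predictive causality” concrete, here is a minimal sketch of the idea behind the test, assuming a single lag and ordinary least squares: fit Y on its own past (restricted model), then on its own past plus the past of X (unrestricted model), and compare the residual sums of squares with an F-statistic. The function name `granger_f_stat` and the toy data are hypothetical, made up for illustration; `statsmodels` ships a complete version as `statsmodels.tsa.stattools.grangercausalitytests`. In the toy data a hidden driver C reaches X one step before it reaches Y, and X never enters Y’s equation, so X does not cause Y.

```python
import numpy as np


def granger_f_stat(x, y, lag=1):
    """F-statistic for H0: the past of x adds nothing to predicting y
    beyond what the past of y already provides (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    target = y[lag:]
    # Column k holds the series shifted back by k + 1 steps, aligned with target
    y_lags = np.column_stack([y[lag - k - 1 : n - k - 1] for k in range(lag)])
    x_lags = np.column_stack([x[lag - k - 1 : n - k - 1] for k in range(lag)])
    ones = np.ones((n - lag, 1))
    restricted = np.hstack([ones, y_lags])            # y's own past only
    unrestricted = np.hstack([ones, y_lags, x_lags])  # y's past + x's past

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    df_den = (n - lag) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lag) / (rss_u / df_den)


# Toy data (made up): hidden driver C hits X one step before it hits Y.
rng = np.random.default_rng(0)
T = 2000
c = rng.normal(size=T + 1)
x = c[1:] + 0.5 * rng.normal(size=T)   # x[t] = c[t+1] + noise
y = c[:-1] + 0.5 * rng.normal(size=T)  # y[t] = c[t] + noise, so y[t] tracks x[t-1]

f_xy = granger_f_stat(x, y)  # does the past of X help predict Y?
f_yx = granger_f_stat(y, x)  # does the past of Y help predict X?
print(f"X -> Y: F = {f_xy:.1f}")
print(f"Y -> X: F = {f_yx:.1f}")
```

The X → Y statistic comes out very large while Y → X stays small, so X “Granger-causes” Y even though, by construction, X has no effect on Y: the past of X simply carries information about C before Y does.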

The post hoc ergo propter hoc fallacy is also known as “after this, therefore because of this”. Today it is fairly clear that Granger causality is not an adequate tool for inferring causal relationships. This is one of the reasons why, when X and Y are put through the Granger causality test and an association is found, we say that X Granger-causes Y instead of saying that X causes Y.

Maybe it is not yet clear to you why an association between two variables, plus the fact that one always precedes the other, is not enough to say that one causes the other. One possible explanation is a third, lurking variable C, also known as a confounder, that causes both events, a phenomenon known as confounding. By ignoring the existence of C (which in some contexts happens by design, through a strong assumption called unconfoundedness), you fail to realize that X and Y are actually independent once you take this third variable C into account, that is, they are conditionally independent given C. Because you ignored it, they look dependent, associated. Very famous and straightforward examples are the positive correlation between (a) ice cream sales and deaths by drowning and (b) ice cream sales and the homicide rate; in both cases the usual explanation is a confounder such as hot weather, which drives both quantities up at the same time.
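The ice cream example can be simulated in a few lines. This is a toy sketch under loudly stated assumptions: the numbers and effect sizes are invented, temperature plays the role of the confounder C, and every relationship is linear with Gaussian noise. It compares the plain correlation between ice cream sales and drownings with the partial correlation given temperature, computed by correlating what is left of each variable after regressing it on temperature. The helper `residuals` is hypothetical. Note that a zero partial correlation implies conditional independence only under linear-Gaussian assumptions like these.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Hypothetical confounder C: daily temperature (made-up units and effect sizes)
temperature = rng.normal(25, 5, size=n)
# Both outcomes depend on temperature; neither appears in the other's equation
ice_cream_sales = 10 * temperature + rng.normal(0, 20, size=n)
drownings = 0.3 * temperature + rng.normal(0, 2, size=n)


def residuals(v, c):
    """What is left of v after a least-squares fit on c (with intercept)."""
    design = np.column_stack([np.ones_like(c), c])
    beta, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ beta


# Marginal correlation: looks like a real association
raw_corr = np.corrcoef(ice_cream_sales, drownings)[0, 1]
# Partial correlation given temperature: the association vanishes
partial_corr = np.corrcoef(residuals(ice_cream_sales, temperature),
                           residuals(drownings, temperature))[0, 1]

print(f"corr(ice cream, drownings)               = {raw_corr:.2f}")
print(f"corr(ice cream, drownings | temperature) = {partial_corr:.2f}")
```

The first number is clearly positive, while the second sits close to zero: once C is taken into account, the apparent dependence between X and Y disappears.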