Tag: statistics

BestLinks

Best links of the week #79

Reading time: 2 minutes

Best links of the week from 14th September to 27th September

Links

  1. Dashboard analyzing excess mortality from natural causes in Brazil in 2020 at CONASS.
  2. iDiscover.
  3. Several dashboards related to COVID19 in Brazil by IBGE.
  4. Divulgação de Candidaturas e Contas Eleitorais.
  5. The Cognitive Bias Index at WikiMedia.
  6. Loki’s Wager and The Merchant of Venice at Wikipedia.
  7. GAN School by Junior Koch.
  8. Fato ou Fake COVID-19.
BestLinks, Causality, Data Science, R

Best links of the week #76

Reading time: 2 minutes

Best links of the week from 24th August to 30th August

Links

  1. Difference-in-Difference Estimation at Columbia Public Health.
  2. Cluster Analysis Using K-Means at Columbia Public Health.
  3. Discrete Choice Analysis at Columbia Public Health.
  4. Exploratory Factor Analysis at Columbia Public Health.
  5. Instrumental Variables at Columbia Public Health.
  6. Spline Regression at Columbia Public Health.
  7. Principal Components Analysis at Columbia Public Health.
  8. Propensity Score at Columbia Public Health.
  9. Inverse Probability Weighting at Columbia Public Health.
  10. Path Analysis at Columbia Public Health.
  11. Probabilistic Sensitivity Analysis of Misclassification at Columbia Public Health.
  12. Markov Chain Monte Carlo at Columbia Public Health.
  13. Raio X dos Municípios.
  14. Correlation or causation? Mathematics can finally give us an answer at NewScientist.
  15. A new kind of logic: How to upgrade the way we think at NewScientist.
  16. An Unpredictable Universe: A Deep Dive Into Chaos Theory at Space.
  17. Introduction to Causal Inference Course by Brady Neal.

Blog posts

  1. Data versus Science: Contesting the Soul of Data-Science at Causal Analysis in Theory and Practice.
  2. Race, COVID Mortality, and Simpson’s Paradox at Causal Analysis in Theory and Practice.
  3. What Statisticians Want to Know about Causal Inference and The Book of Why at Causal Analysis in Theory and Practice.

Videos

  1. The Science Behind the Butterfly Effect at Veritasium.
BestLinks

Best links of the week #73

Reading time: 2 minutes

Best links of the week from 6th July to 19th July

Links

  1. Left-hand & right-hand side nomenclature in regression models at Cross Validated.
  2. Scientists invite 4,000 music fans to a live concert to assess spread of coronavirus at Classic FM.
  3. Requesting data through the Access to Information Law at Escola de Dados.
  4. Doctor Penguin: Catch the Latest AI+Healthcare Research.
  5. R Weekly.

Blog posts

  1. The Difference between Linear and Nonlinear Regression Models at Statistics By Jim.
  2. Multicollinearity in Regression Analysis: Problems, Detection, and Solutions at Statistics By Jim.
  3. How To Interpret R-squared in Regression Analysis at Statistics By Jim.
  4. Check Your Residual Plots to Ensure Trustworthy Regression Results! at Statistics By Jim.
  5. Standard Error of the Regression vs. R-squared at Statistics By Jim.
  6. R-squared Is Not Valid for Nonlinear Regression at Statistics By Jim.
  7. How to Choose Between Linear and Nonlinear Regression at Statistics By Jim.
  8. Heteroscedasticity in Regression Analysis at Statistics By Jim.
BestLinks

Best links of the week #70

Reading time: 2 minutes

Best links of the week from 18th May to 7th June

Links

  1. Here are 450 Ivy League courses you can take online right now for free at FreeCodeCamp.
  2. Our weird behavior during the pandemic is messing with AI models at MIT Technology Review.
  3. microdatasus Python Package.
  4. Meet xaringan: Making slides in R Markdown by Alison Hill at Advanced R Markdown Workshop.
  5. How to Make Slides in R by Zhi Yang.
  6. Recall bias at Wikipedia.
  7. Confidence Interval cartoon at xkcd.

Blog posts

  1. Machine Learning is too easy at John Langford’s Blog.
  2. Naive Bayes for Dummies; A Simple Explanation at Data Science Central.
  3. Support Vector Machines for dummies; A Simple Explanation at Aylien’s Blog.
  4. Everything You Wanted to Know about the Kernel Trick (But Were Too Afraid to Ask) at Eric Kim’s Blog.
  5. Machina Machinae Lupus est? at Portal Deviante.

Videos

  1. Ten Craziest Things Cells Do by Wallace Marshall at iBiology’s YouTube channel.
  2. The world after the coronavirus, ed. 09 | Computational models and social isolation at Academia Brasileira de Ciências.
BestLinks

Best links of the week #62

Reading time: 2 minutes

Best links of the week from 23rd March to 29th March

Links

  1. Share Code from any Device.
  2. How Fast Does a Virus Spread? Let’s Do the Math at WIRED.
  3. The Promising Math Behind ‘Flattening the Curve’ at WIRED.
  4. SAS releases more than 100 online data analysis courses at CanalTech.
  5. Ministry of Health (Ministério da Saúde) dashboard for COVID-19.
  6. French government dashboard for COVID-19.
  7. How many tests for COVID-19 are being performed around the world? at Our World in Data.
  8. Corona Data Scraper.
  9. Why the ‘gold standard’ of medical research is no longer enough at statnews.
  10. Coughona: Identifying Coronavirus based on cough noise.
  11. Graph theory suggests COVID-19 might be a ‘small world’ after all at ZDNet.
  12. Case control study by Timiresh Das at SlideShare.
Causality, Data Science, PhD, R

Spurious Independence: is it real?

Reading time: 14 minutes

First things first: Spurious Dependence

Depending on your background, you have probably already heard of spurious dependence in one way or another. It goes by the names of spurious association, spurious dependence, the famous quote “correlation does not imply causation”, and other versions of the same idea: you cannot say that X necessarily causes Y (or vice versa) solely because X and Y are associated, that is, because they tend to occur together. Even if one of the events always happens before the other, say X preceding Y, you still cannot say that X causes Y. A statistical test that is very well known in economics, Granger causality, illustrates the point.

The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another, first proposed in 1969.[1] Ordinarily, regressions reflect “mere” correlations, but Clive Granger argued that causality in economics could be tested for by measuring the ability to predict the future values of a time series using prior values of another time series. Since the question of “true causality” is deeply philosophical, and because of the post hoc ergo propter hoc fallacy of assuming that one thing preceding another can be used as a proof of causation, econometricians assert that the Granger test finds only “predictive causality”.

Granger Causality at Wikipedia.
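
As a rough illustration of the “predictive causality” idea in the quote above, here is a minimal Python sketch. The original post contains no code, so the function names, the coefficients, and the single-lag setup are all assumptions made for illustration. It fits an autoregressive model of one series with and without one lag of the other series and summarizes the improvement as an F statistic. It is a simplified stand-in, not the full Granger test as implemented in statistical packages.

```python
import numpy as np


def rss(design, target):
    """Residual sum of squares of an ordinary least squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)


def granger_f(cause, effect):
    """F statistic for adding one lag of `cause` to an AR(1) model of `effect`."""
    target = effect[1:]
    ones = np.ones_like(target)
    # Restricted model: effect_t ~ 1 + effect_{t-1}
    restricted = np.column_stack([ones, effect[:-1]])
    # Unrestricted model: effect_t ~ 1 + effect_{t-1} + cause_{t-1}
    unrestricted = np.column_stack([ones, effect[:-1], cause[:-1]])
    rss_r = rss(restricted, target)
    rss_u = rss(unrestricted, target)
    dof = len(target) - unrestricted.shape[1]
    # One restriction is being tested, so the numerator has 1 degree of freedom.
    return (rss_r - rss_u) / (rss_u / dof)


def simulate_series(n=2000, seed=1):
    """Hypothetical data: x is white noise and y depends on the previous x."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()
    return x, y


x, y = simulate_series()
f_xy = granger_f(x, y)  # does lagged x help predict y?
f_yx = granger_f(y, x)  # does lagged y help predict x?
print(f"F (x -> y): {f_xy:.1f}")
print(f"F (y -> x): {f_yx:.1f}")
```

A large statistic in one direction only says that past values of x improve forecasts of y in this simulated setup. It says nothing about whether x causes y in any deeper sense.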

The post hoc ergo propter hoc fallacy is also known as “after this, therefore because of this”. It is widely accepted today that Granger causality is not an adequate tool for inferring causal relationships. This is one of the reasons why, when X and Y are tested with the Granger causality test and an association is found, X is said to Granger-cause Y rather than to cause Y.

Maybe it is not clear to you why an association between two variables, together with the fact that one always precedes the other, is not enough to say that one causes the other. One possible explanation is a third, lurking variable C, also known as a confounder, that causes both events; this phenomenon is known as confounding. By ignoring the existence of C (which in some contexts happens by design, through a strong assumption called unconfoundedness), you fail to realize that the events X and Y are actually independent once this third variable C is taken into account. Because you ignored it, they seem dependent, associated. A very famous and straightforward example is the positive correlation between (a) ice cream sales and deaths by drowning or (b) ice cream sales and the homicide rate.
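
The confounding argument can be sketched with a small simulation. This is a minimal Python sketch under stated assumptions: the data are synthetic, the confounder C is a hypothetical stand-in (think of something like hot weather, which the text above does not name), and the coefficients and function names are made up for illustration. X and Y are each generated from C plus independent noise, so they are associated marginally but not once C is accounted for.

```python
import numpy as np


def residualize(values, confounder):
    """Return the part of `values` not explained by a linear fit on `confounder`."""
    design = np.column_stack([np.ones_like(confounder), confounder])
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ coef


def simulate(n=20000, seed=0):
    """Hypothetical data in which C causes both X and Y, and X never affects Y."""
    rng = np.random.default_rng(seed)
    c = rng.normal(size=n)            # confounder C (assumed, e.g. hot weather)
    x = 2.0 * c + rng.normal(size=n)  # X, e.g. ice cream sales
    y = 1.5 * c + rng.normal(size=n)  # Y, e.g. deaths by drowning
    # Correlation between X and Y when C is ignored.
    marginal = np.corrcoef(x, y)[0, 1]
    # Partial correlation: correlate what is left of X and Y after removing C.
    partial = np.corrcoef(residualize(x, c), residualize(y, c))[0, 1]
    return marginal, partial


marginal, partial = simulate()
print(f"correlation ignoring C:    {marginal:.3f}")
print(f"correlation adjusting for C: {partial:.3f}")
```

In this setup the first number is clearly positive while the second is close to zero, which is the pattern the paragraph above describes: the dependence between X and Y disappears once the confounder is taken into account.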

BestLinks

Best links of the week #43

Reading time: 3 minutes

Best links of the week from 28th October to 3rd November

Links

  1. Several links related to governmental data in Brazil at Colaboradados.
  2. What They Forgot to Teach You About R by Jennifer Bryan and Jim Hester.
  3. Joplin, an open-source note taking and to-do application with synchronization capabilities.
  4. Typora, a truly minimal markdown editor (goes hand in hand with Joplin).
  5. How to Turn Vim Into an IDE for R at Kade Killary’s Medium.
  6. Otter.ai: 600 minutes of free transcription per month!
  7. Free Online Image Editor.
  8. How Millennial Parents Are Embracing Health and Wellness Technologies for Their Generation Alpha Kids at IEEE Transmitter.
  9. Giving algorithms a sense of uncertainty could make them more ethical at MIT Technology Review.
  10. What would a PhD graduate advise a new PhD student? at Quora.
  11. Why the new ‘Terminator’ movie is annoying artificial intelligence researchers at BBC News Brasil.
  12. Interview: Why homeopathy is a placebo – and should not be paid for by SUS at Super Interessante.
BestLinks

Best links of the week #31

Reading time: 2 minutes

Best links of the week from 5th August to 11th August

Links

  1. randomizr: R Package for randomized experiments.
  2. Bayes’ rule: Guide (course with several different levels).
  3. Extracting Brazilian schools census data with R at Fernando Barbalho’s gists.
  4. Download all data from DATASUS (several Brazilian health-related datasets) with R at Fernando Barbalho’s gists.
BestLinks

Best links of the week #30

Reading time: 2 minutes

Best links of the week from 29th July to 4th August

Links

  1. Some interesting shiny apps at Tychobra.
  2. Learn git branching!
  3. Learn vim.
  4. rThreeJS R Package.
  5. Difference Between Covariance and Correlation at Key Differences.
  6. Variance vs. Covariance: What’s the Difference? at Investopedia.
  7. Difference Between Correlation and Regression at Key Differences.
  8. Difference Between Parametric and Nonparametric Test at Key Differences.
  9. Preferential attachment at Wikipedia.
  10. Voice automated shiny app (example here) at Yihui Xie’s GitHub.
  11. Webcam (face) automated shiny app (example here) at Yihui Xie’s GitHub.
  12. Xaringan (presentation on xaringan here) at Yihui Xie’s GitHub.
  13. Learn R fast with fasteR!
  14. We’re told that too much screen time hurts our kids. Where’s the evidence? at The Guardian.
  15. pagedown: Creating beautiful PDFs with R Markdown and CSS at rstudio::conf 2019 website.
  16. Why scientists also need to be good communicators at NEXO Jornal.
  17. Portugal creates a special visa to attract Brazilian IT professionals at Folha de São Paulo.
BestLinks

Best links of the week #29

Reading time: 3 minutes

Best links of the week from 22nd July to 28th July

Links

  1. Listen to people all over the world pronouncing the name of countries and capitals.
  2. Write a letter to the future!
  3. A Personal Journey into Bayesian Networks by Judea Pearl.
  4. An innovative way to publish at Nature.
  5. Here’s What Fruits And Vegetables Looked Like Before We Domesticated Them at Science Alert.
  6. Regression Sensitivity Analysis: the Robustness Value and the partial R², a shiny app by Carlos Cinelli.
  7. Do you need to normalize your input data for Random Forests and Neural Networks? (More on Random Forests here) at Data Science (Stack Exchange).
  8. Cumulative Variable Importance for Random Forest (RF) Models at Rich Pauloo’s Gists.
  9. Contributing to the R ecosystem by Colin Fay at SpeakerDeck.
  10. Interview: Why homeopathy is a placebo – and should not be paid for by SUS at Super Interessante.