Tag: data science

Causality, Data Science, R, tools, Uncategorized

Continuous Machine Learning – Part I

Reading time: 9 minutes
Image by Taras Tymoshchuck from here.

This is a three-part series about Continuous Machine Learning. You can check Part II here and Part III here.

What is it?

Continuous Machine Learning (CML) applies the ideas of Continuous Integration and Continuous Delivery (CI/CD), well-known practices in Software Engineering and DevOps, to Machine Learning and Data Science projects.

What is this post about?

I will cover a set of tools that can make your life as a Data Scientist much more interesting. We will use MIIC, a network inference algorithm, to infer the network of a famous dataset (alarm, from bnlearn). We will then use (1) git to track our code, (2) DVC to track our dataset, outputs and pipeline, (3) GitHub as a git remote and (4) Google Drive as a DVC remote. I’ve written a tutorial on managing Data Science projects with DVC, so if you’re interested in it, open a tab here to check it later.
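
To give a rough idea of how the four pieces fit together, here is a minimal sketch of the setup. It is not the exact sequence used in the series: the function name setup_cml_project, the dataset path data/alarm.csv and the remote name gdrive_remote are assumptions made for illustration, and the GitHub URL and Google Drive folder ID are placeholders you would supply yourself. It also assumes that DVC is installed with Google Drive support.

```shell
#!/usr/bin/env bash
# Hypothetical helper that wires up git + DVC for a new project.
# Arguments (both are placeholders you must supply):
#   $1 = URL of an empty GitHub repository
#   $2 = ID of a Google Drive folder to use as DVC storage
setup_cml_project() {
  local github_url="$1"
  local drive_folder_id="$2"

  git init                          # (1) git tracks the code
  dvc init                          # (2) DVC tracks dataset, outputs and pipeline

  # Assumed dataset location; this creates data/alarm.csv.dvc,
  # a small pointer file that git can track instead of the data itself.
  dvc add data/alarm.csv
  git add data/alarm.csv.dvc data/.gitignore

  # (3) GitHub as the git remote
  git remote add origin "$github_url"

  # (4) Google Drive as the default (-d) DVC remote
  dvc remote add -d gdrive_remote "gdrive://$drive_folder_id"
  git add .dvc/config

  git commit -m "Track dataset with DVC"
  dvc push                          # upload the data to Google Drive
}
```

After sourcing the file, the function would be called from the project folder as setup_cml_project followed by the repository URL and the folder ID. The code is then published with a regular git push, while the data travels separately through dvc push.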

BestLinks

Best links of the week #73

Reading time: 2 minutes

Best links of the week from 6th July to 19th July

Links

  1. Left-hand & right-hand side nomenclature in regression models at Cross Validated.
  2. Scientists invite 4,000 music fans to a live concert to assess spread of coronavirus at Classic FM.
  3. Solicitando dados via lei de acesso a informação at Escola de Dados.
  4. Doctor Penguin: Catch the Latest AI+Healthcare Research.
  5. R Weekly.

Blog posts

  1. The Difference between Linear and Nonlinear Regression Models at Statistics By Jim.
  2. Multicollinearity in Regression Analysis: Problems, Detection, and Solutions at Statistics By Jim.
  3. How To Interpret R-squared in Regression Analysis at Statistics By Jim.
  4. Check Your Residual Plots to Ensure Trustworthy Regression Results! at Statistics By Jim.
  5. Standard Error of the Regression vs. R-squared at Statistics By Jim.
  6. R-squared Is Not Valid for Nonlinear Regression at Statistics By Jim.
  7. How to Choose Between Linear and Nonlinear Regression at Statistics By Jim.
  8. Heteroscedasticity in Regression Analysis at Statistics By Jim.
BestLinks

Best links of the week #72

Reading time: 2 minutes

Best links of the week from 15th June to 5th July

Links

  1. Os 11 melhores canais de Data Science no Telegram at Insight.
  2. Prove Your Grit in our Competitions at bitgrit.
  3. Confounding in epidemiological studies at Health Knowledge.
  4. Cochran–Mantel–Haenszel statistics at Wikipedia.
  5. University and college students, learn for free with Coursera!
  6. Determine the most significant overlap between subsets of two or three sorted lists with DynaVenn.
  7. Scholarly Community Encyclopedia.
  8. Category mistake at Wikipedia.
  9. Black swan theory at Wikipedia.
  10. Hindsight bias at Wikipedia.
  11. Pareidolia and Apophenia at Wikipedia.
  12. Levenshtein distance at Wikipedia.
  13. A periodic table of visualization methods at Visual Literacy.
  14. Why It’s Hard to Evaluate State Policies in the Pandemic at Penn LDI.

Blog posts

  1. A Gentle Introduction to Concept Drift in Machine Learning at Machine Learning Mastery.
  2. What is the difference between Bagging and Boosting? at QuantDare.
  3. A Primer to Ensemble Learning – Bagging and Boosting at Analytics India Mag.
  4. O modelo de #SquadGoals do Spotify falhou. at Flavio Clesio’s Blog.
  5. That one weird third variable problem nobody ever mentions: Conditioning on a collider at The 100% CI.
  6. Why Statistics Don’t Capture The Full Extent Of The Systemic Bias In Policing at FiveThirtyEight.
  7. Why Is the Average Human Body Temperature Decreasing? at Science and Philosophy’s Medium.
  8. Citação de citação segundo as regras ABNT: acabe com suas dúvidas! at Blog PPEC.
  9. FAPESP cria repositório de informações clínicas para subsidiar pesquisas sobre COVID-19 at Agência FAPESP.

Videos

  1. The Super Mario Effect – Tricking Your Brain into Learning More by Mark Rober at TEDx Talks’ YouTube channel.
  2. Por que a concorrência abre suas lojas perto das outras? at TED-Ed’s YouTube channel.

Podcast

  1. Causalidade na saúde at Dados e Saúde.
BestLinks, Data Science, R

Best links of the week #63

Reading time: 2 minutes

Best links of the week from 30th March to 5th April

Links

  1. MonitoraCovid-19 at Big Data Fiocruz.
  2. Livros Gratuitos da Springer at Marcus Nunes’ Blog.
  3. Painel de Leitos e Insumos dos estados brasileiros.
  4. See how your community is moving around differently due to COVID-19.
  5. Data extraction of Google’s COVID-19 Mobility Reports at vitorbaptista’s GitHub.
BestLinks

Best links of the week #52

Reading time: 3 minutes

Best links of the week from 13th January to 19th January

Links

  1. Mapping China’s Global Development Footprint at aiddata.
  2. The Professional Data Science Manifesto.
  3. From Physics to Data Science by Martina Pugliese.
  4. “You and Your Research” by Richard Hamming at Gabriel Robins’s website.
  5. Liverpool are using incredible data science during matches, and effects are extraordinary.
  6. Koch’s postulates at Wikipedia.
  7. Dados relativos ao pagamento do Bolsa Família.
  8. sensemakr R package.
BestLinks

Best links of the week #31

Reading time: 2 minutes

Best links of the week from 5th August to 11th August

Source: here.

Links

  1. randomizr: R Package for randomized experiments.
  2. Bayes’ rule: Guide (course with several different levels).
  3. Extracting Brazilian schools census data with R at Fernando Barbalho’s gists.
  4. Download all data from DATASUS (several Brazilian health-related datasets) with R at Fernando Barbalho’s gists.
BestLinks

Best links of the week #29

Reading time: 3 minutes

Best links of the week from 22nd July to 28th July

Links

  1. Listen to people all over the world pronouncing the name of countries and capitals.
  2. Write a letter to the future!
  3. A Personal Journey into Bayesian Networks by Judea Pearl.
  4. An innovative way to publish at Nature.
  5. Here’s What Fruits And Vegetables Looked Like Before We Domesticated Them at Science Alert.
  6. Regression Sensitivity Analysis: the Robustness Value and the partial R², a shiny app by Carlos Cinelli.
  7. Do you need to normalize your input data for Random Forests and Neural Networks? (More on Random Forests here) at Data Science (Stack Exchange).
  8. Cumulative Variable Importance for Random Forest (RF) Models at Rich Pauloo’s Gists.
  9. Contributing to the R ecosystem by Colin Fay at SpeakerDeck.
  10. Entrevista: Por que homeopatia é placebo – e não deve ser paga pelo SUS at Super Interessante.
BestLinks

Best links of the week #27

Reading time: 2 minutes

Best links of the week from 8th July to 14th July

Source: here.

Links

  1. The “Rmd first” method: when projects start with documentation at Sébastien Rochette’s GitHub repository.
  2. goodpractice: Advice on R Package Building.
  3. Sampling (statistics) at Wikipedia.
  4. Bootstrapping (statistics) at Wikipedia.
  5. Jackknife resampling at Wikipedia.
  6. Bootstrap in R by Łukasz Deryło at DataCamp Tutorials.
  7. How can I generate Bootstrap statistics in R? at the FAQ of the Institute for Digital Research & Education (UCLA).
  8. How does R handle missing values? at the FAQ of the Institute for Digital Research & Education (UCLA).
  9. How does R handle overlapping object names? at the FAQ of the Institute for Digital Research & Education (UCLA).
  10. How can I test for contrasts in R? at the FAQ of the Institute for Digital Research & Education (UCLA).
  11. Explaining to laypeople why bootstrapping works at Cross Validated.
BestLinks

Best links of the week #21

Reading time: 2 minutes

Best links of the week from 27th May to 2nd June

Source: Geek Hero Comic.

Links

  1. Samsung AI Can Turn a Single Portrait Into a Realistic Talking Head at PetaPixel.
  2. Let’s Encrypt (Free Certification Authority) at MLAIT.
  3. Public data from the French government.
  4. Paris opens a data center to control its digital infrastructure.
  5. genderBR is an R package that predicts gender from Brazilian first names using data from the Instituto Brasileiro de Geografia e Estatistica’s 2010 Census.
  6. Git Cherry Pick at Atlassian Git Tutorials.
  7. Refs and the Reflog at Atlassian Git Tutorials.
  8. Advanced Git log at Atlassian Git Tutorials.
  9. Merging vs. Rebasing at Atlassian Git Tutorials.
  10. Intro to Cherry Picking with Git at PreviousNext.
  11. O que faz o cientista de dados ser o profissional mais procurado pelos RHs? at StartSe.
  12. 8 habilidades indispensáveis para cientistas de dados at CIO.
BestLinks, Data Science, R

Best links of the week #20

Reading time: < 1 minute

Best links of the week from 20th May to 26th May

Links

  1. UN (United Nations) data.
  2. A curated list of 200+ blogs related to Data Science at CybrHome.
  3. 25 Excellent Machine Learning Open Datasets.
  4. Group Chats Are Making the Internet Fun Again at Intelligencer.
  5. Do anything with dplyr.
  6. Starting out with R at Credibly Curious.