Tag: machine learning

Causality, Data Science, R, tools, Uncategorized

Continuous Machine Learning – Part I

Reading time: 9 minutes
Image by Taras Tymoshchuck from here.

This is a three-part series about Continuous Machine Learning. You can read Part II here and Part III here.

What is it?

Continuous Machine Learning (CML) applies the concepts of Continuous Integration and Continuous Delivery (CI/CD), well known in Software Engineering and DevOps, to Machine Learning and Data Science projects.

What is this post about?

I will cover a set of tools that can make your life as a Data Scientist much more interesting. We will use MIIC, a network inference algorithm, to infer the network of a famous dataset (alarm, from bnlearn). We will then use (1) git to track our code, (2) DVC to track our dataset, outputs and pipeline, (3) GitHub as a git remote and (4) Google Drive as a DVC remote. I’ve written a tutorial on managing Data Science projects with DVC, so if you’re interested in it, open it in a tab here to check later.
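
As a rough sketch of how the four pieces above fit together, the snippet below lists the git and DVC commands for such a setup and, by default, only prints them. The file path, the remote name `gdrive_remote`, the Google Drive folder ID and the GitHub URL are placeholders I made up for illustration, not values from the series. It also assumes git and DVC (with Google Drive support) are installed.

```python
import shlex
import subprocess
from pathlib import PurePosixPath


def build_setup_commands(data_file, gdrive_folder_id, github_url):
    """Return, in order, the commands for a git + DVC project setup."""
    # `dvc add` writes a .gitignore next to the tracked file
    gitignore = str(PurePosixPath(data_file).parent / ".gitignore")
    return [
        ["git", "init"],
        ["dvc", "init"],
        # Google Drive as the default DVC remote; the remote name is arbitrary
        ["dvc", "remote", "add", "-d", "gdrive_remote",
         f"gdrive://{gdrive_folder_id}"],
        # DVC tracks the dataset and writes a small .dvc pointer file
        ["dvc", "add", data_file],
        # git tracks the pointer file and the DVC config, not the data itself
        ["git", "add", f"{data_file}.dvc", gitignore, ".dvc/config"],
        ["git", "commit", "-m", "Track dataset with DVC"],
        # GitHub as the git remote
        ["git", "remote", "add", "origin", github_url],
        ["dvc", "push"],
        ["git", "push", "-u", "origin", "HEAD"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical values, for illustration only
    run(build_setup_commands(
        "data/alarm.csv",
        "YOUR_FOLDER_ID",
        "https://github.com/your-user/your-repo.git",
    ))
```

Calling `run(..., dry_run=False)` inside an empty project folder would execute the commands for real. The first `dvc push` to Google Drive normally asks you to authenticate in a browser.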


Best links of the week #72

Reading time: 2 minutes

Best links of the week from 15th June to 5th July



  1. The 11 best Data Science channels on Telegram (in Portuguese) at Insight.
  2. Prove Your Grit in our Competitions at bitgrit.
  3. Confounding in epidemiological studies at Health Knowledge.
  4. Cochran–Mantel–Haenszel statistics at Wikipedia.
  5. University and college students, learn for free with Coursera!
  6. Determine the most significant overlap between subsets of two or three sorted lists with DynaVenn.
  7. Scholarly Community Encyclopedia.
  8. Category mistake at Wikipedia.
  9. Black swan theory at Wikipedia.
  10. Hindsight bias at Wikipedia.
  11. Pareidolia and Apophenia at Wikipedia.
  12. Levenshtein distance at Wikipedia.
  13. A periodic table of visualization methods at Visual Literacy.
  14. Why It’s Hard to Evaluate State Policies in the Pandemic at Penn LDI.

Blog posts

  1. A Gentle Introduction to Concept Drift in Machine Learning at Machine Learning Mastery.
  2. What is the difference between Bagging and Boosting? at QuantDare.
  4. Spotify’s #SquadGoals model failed (in Portuguese) at Flavio Clesio’s Blog.
  5. That one weird third variable problem nobody ever mentions: Conditioning on a collider at the 100 CI.
  6. Why Statistics Don’t Capture The Full Extent Of The Systemic Bias In Policing at Five Thirty Eight.
  7. Why Is the Average Human Body Temperature Decreasing? at Science and Philosophy’s Medium.
  9. FAPESP creates a repository of clinical information to support research on COVID-19 (in Portuguese) at Agência FAPESP.


  1. The Super Mario Effect – Tricking Your Brain into Learning More by Mark Rober at TEDx Talks’ YouTube channel.
  2. Why do competitors open their stores next to one another? (Portuguese version) at TED-Ed’s YouTube channel.


  1. Causality in health (in Portuguese) at Dados e Saúde.

Best links of the week #31

Reading time: 2 minutes

Best links of the week from 5th August to 11th August

Source: here.


  1. randomizr: R Package for randomized experiments.
  2. Bayes’ rule: Guide (course with several different levels).
  3. Extracting Brazilian schools census data with R at Fernando Barbalho’s gists.
  4. Download all data from DATASUS (several Brazilian health-related datasets) with R at Fernando Barbalho’s gists.

Best links of the week #26

Reading time: 2 minutes

Best links of the week from 1st July to 7th July

Source: here.


  1. Open Source Guides
  2. Accuracy paradox at Wikipedia.
  3. Lucas critique, Goodhart’s law and Campbell’s law at Wikipedia.
  4. 40 Artificial Intelligence Interview Questions & Answers at Vipul Patel’s LinkedIn.
  5. State of AI Report 2019 at state.ai.
  6. Gramr add-in for RStudio at rOpenSci Labs’ GitHub repository.
  7. École 42, the French school that teaches programming free of charge, arrives in São Paulo (in Portuguese) at Época Negócios.
  8. Selection of challenges for the Hackathon em Saúde 2019 (in Portuguese) at ICICT.
  9. Crime indicators officially released by the Secretaria de Segurança Pública e Defesa Social (SSPDS) of Ceará (in Portuguese). Data!!!
  10. Drone with a projector manages to fool a car’s AI (in Portuguese) at Olhar Digital.
  11. Data Science course in Portuguese at sn3fu’s GitHub.

Best links of the week #24

Reading time: 2 minutes

Best links of the week from 17th June to 23rd June

Source: here.


  1. Python examples of popular machine learning algorithms with interactive Jupyter demos and math being explained at Oleksii Trekhleb’s GitHub.
  2. TOP 100 R tutorials for beginners at Listen Data.
  3. Best ZSH theme (powerlevel9k) 🙂 at Ben Hilburn’s GitHub.
  4. Discover and install useful RStudio addins at Dean Attali’s GitHub.
  5. Reticulate: R Interface to Python. RMarkdown with Python and R together? Voilà!
  6. Create interactive timeline visualizations in R at Dean Attali’s GitHub.
  7. Building Shiny apps – an interactive tutorial at Dean Attali’s Blog.
  8. Globo recruits for data science training with a chance of employment (in Portuguese) at EXAME.

Best links of the week #22

Reading time: 2 minutes

Best links of the week from 3rd June to 9th June


  1. Visual Genome.
  2. Voice Changer.
  3. Royalty-free music here, here and here.
  4. Online machine learning at Wikipedia.
  5. MLflow.
  6. Campaign seeks 30 IT professionals in Brazil to work in Portugal (in Portuguese).
  7. France bans prediction based on its case law, with a penalty of up to 5 years in prison (in Portuguese) at Jusbrasil.
  8. Podcast name generator (in Portuguese).
  9. Salesforce buys big data company for US$15 billion (in Portuguese) at Terra.

Best links of the week #21

Reading time: 2 minutes

Best links of the week from 27th May to 2nd June

Source: Geek Hero Comic.


  1. Samsung AI Can Turn a Single Portrait Into a Realistic Talking Head at PetaPixel.
  2. Let’s Encrypt (Free Certificate Authority) at MLAIT.
  3. Public data from the French government.
  4. Paris opens a data center to control its digital infrastructure.
  5. genderBR is an R package that predicts gender from Brazilian first names using data from the Instituto Brasileiro de Geografia e Estatistica’s 2010 Census.
  6. Git Cherry Pick at Atlassian Git Tutorials.
  7. Refs and the Reflog at Atlassian Git Tutorials.
  8. Advanced Git log at Atlassian Git Tutorials.
  9. Merging vs. Rebasing at Atlassian Git Tutorials.
  10. Intro to Cherry Picking with Git at PreviousNext.
  11. What makes the data scientist the professional most sought after by HR departments? (in Portuguese) at StartSe.
  12. 8 indispensable skills for data scientists (in Portuguese) at CIO.

Best links of the week #18

Reading time: 3 minutes

Best links of the week from 6th May to 12th May

Source: xkcd.


  1. NextJournal, Seamless Data Science for Teams.
  2. An executive’s guide to AI.
  3. What should I use to serve R applications over the internet? at Brian Caffo’s YouTube channel. He talks about PlumbeR (PlumbeR book here).
  4. Will AI eat statistics? at Brian Caffo’s YouTube channel.
  5. A radical new neural network design could overcome big challenges in AI at MIT Technology Review.
  6. Urgent need for a government-led big data system, say industry experts at The Edge Markets.
  7. Top 10 Cities Across The Globe With The Highest Pay Packages For Data Scientists at Analytics India.
  8. What Nobody Tells You About Machine Learning at Forbes.
  9. How the data mining of failure could teach us the secrets of success at MIT Technology Review.
  10. How to hide from the AI surveillance state with a color printout at MIT Technology Review.
  11. Boosting (machine learning) at Wikipedia.
  12. Weak Learning, Boosting, and the AdaBoost algorithm at Jeremy Kun’s Blog.
  13. Weak vs. Strong Learning and the Adaboost Algorithm at Jenn Wortman Vaughan’s Website.
  14. What is a weak learner? at StackOverflow.
  15. AI is ready to radically transform software development (in Portuguese) at CIO.
  16. The budget of federal universities and institutes since 2000 (in Portuguese) at NexoJornal.
  17. The government against the universities, in data and analyses (in Portuguese) at NexoJornal.
  18. Is there any documented microevolution in humans in the last two hundred years? (in Portuguese) at Quora.