Tag: r

Causality, Data Science, R, tools, Uncategorized

Continuous Machine Learning – Part I

Reading time: 9 minutes
Image by Taras Tymoshchuck from here.

This is a three-part series about Continuous Machine Learning. You can check Part II here and Part III here.

What is it?

Continuous Machine Learning (CML) applies the ideas of Continuous Integration and Continuous Delivery (CI/CD), well-known practices in Software Engineering and DevOps, to Machine Learning and Data Science projects.

What is this post about?

I will cover a set of tools that can make your life as a Data Scientist much more interesting. We will use MIIC, a network inference algorithm, to infer the network of a famous dataset (alarm from bnlearn). We will then use (1) git to track our code, (2) DVC to track our dataset, outputs and pipeline, (3) GitHub as a git remote, and (4) Google Drive as a DVC remote. I’ve written a tutorial on managing Data Science projects with DVC, so if you’re interested in it, open a tab here to check it later.

Data Science, PhD, R

Manage your Data Science Project in R

Reading time: 9 minutes

A simple project tutorial with R/RMarkdown, Packrat, Git, and DVC.


The pain of managing a Data Science project

Something has been bothering me for a while: reproducibility and data tracking in data science projects. I had read about some technologies but never really tried any of them until recently, when I couldn’t stand the feeling of losing track of my analyses anymore. At some point, I decided to give DVC a try after some friends, mostly Flávio Clésio, suggested it to me. In this post, I will talk about Git, DVC, R, RMarkdown and Packrat, everything I think you may need to manage your Data Science project, but the focus is definitely on DVC.


Best links of the week #30

Reading time: 2 minutes

Best links of the week from 29th July to 4th August



  1. Some interesting shiny apps at Tychobra.
  2. Learn git branching!
  3. Learn vim.
  4. rThreeJS R Package.
  5. Difference Between Covariance and Correlation at Key Differences.
  6. Variance vs. Covariance: What’s the Difference? at Investopedia.
  7. Difference Between Correlation and Regression at Key Differences.
  8. Difference Between Parametric and Nonparametric Test at Key Differences.
  9. Preferential attachment at Wikipedia.
  10. Voice automated shiny app (example here) at Yihui Xie’s GitHub.
  11. Webcam (face) automated shiny app (example here) at Yihui Xie’s GitHub.
  12. Xaringan (presentation on xaringan here) at Yihui Xie’s GitHub.
  13. Learn R fast with fasteR!
  14. We’re told that too much screen time hurts our kids. Where’s the evidence? at The Guardian.
  15. pagedown: Creating beautiful PDFs with R Markdown and CSS at rstudio::conf 2019 website.
  16. Por que cientistas precisam ser também bons comunicadores at NEXO Jornal.
  17. Portugal cria visto especial para atrair profissionais de TI brasileiros at Folha de São Paulo.

Best links of the week #27

Reading time: 2 minutes

Best links of the week from 8th July to 14th July



  1. The “Rmd first” method: when projects start with documentation at Sébastien Rochette’s GitHub repository.
  2. goodpractice: Advice on R Package Building.
  3. Sampling (statistics) at Wikipedia.
  4. Bootstrapping (statistics) at Wikipedia.
  5. Jackknife resampling at Wikipedia.
  6. Bootstrap in R by Łukasz Deryło at DataCamp Tutorials.
  7. How can I generate Bootstrap statistics in R? at the FAQ of the Institute for Digital Research & Education (UCLA).
  8. How does R handle missing values? at the FAQ of the Institute for Digital Research & Education (UCLA).
  9. How does R handle overlapping object names? at the FAQ of the Institute for Digital Research & Education (UCLA).
  10. How can I test for contrasts in R? at the FAQ of the Institute for Digital Research & Education (UCLA).
  11. Explaining to laypeople why bootstrapping works at Cross Validated.

Best links of the week #25

Reading time: 2 minutes

Best links of the week from 24th June to 30th June



  1. I know what you did last session – clustering users with ml presented at PAPIs LATAM 2019.
  2. You can also see all the talks that happened during PAPIs LATAM 2018 here on their YouTube channel.
  3. Data Science Cheatsheet by Maverick Lin at his GitHub.
  4. Announcing the YouTube-8M Segments Dataset at Google AI Blog.
  5. BDMEP – Banco de Dados Meteorológicos para Ensino e Pesquisa.

Best links of the week #24

Reading time: 2 minutes

Best links of the week from 17th June to 23rd June



  1. Python examples of popular machine learning algorithms with interactive Jupyter demos and math being explained at Oleksii Trekhleb’s GitHub.
  2. TOP 100 R tutorials for beginners at Listen Data.
  3. Best ZSH theme (powerlevel9k) 🙂 at Ben Hilburn’s GitHub.
  4. Discover and install useful RStudio addins at Dean Attali’s GitHub.
  5. Reticulate: R Interface to Python. RMarkdown with Python and R together? Voilà!
  6. Create interactive timeline visualizations in R at Dean Attali’s GitHub.
  7. Building Shiny apps – an interactive tutorial at Dean Attali’s Blog.
  8. Globo recruta para treinamento em ciência de dados com chance de emprego at EXAME.

Best links of the week #23

Reading time: 2 minutes

Best links of the week from 10th June to 16th June



  1. Dataset with 81 variables on Brazilian municipalities at Kaggle.
  2. Cientista russo pretende criar mais bebês modificados geneticamente at El País.
  3. A última coca-cola do deserto at Nossa Ciência.
  4. dplyr::case_when at R Documentation.
  5. Drawing Survival Curves Using ggplot2 at survminer references.

BestLinks, Data Science, R

Best links of the week #20

Reading time: < 1 minute

Best links of the week from 20th May to 26th May


  1. UN (United Nations) data.
  2. A curated list of 200+ blogs related to Data Science at CybrHome.
  3. 25 Excellent Machine Learning Open Datasets.
  4. Group Chats Are Making the Internet Fun Again at Intelligencer.
  5. Do anything with dplyr.
  6. Starting out with R at Credibly Curious.

Best links of the week #18

Reading time: 3 minutes

Best links of the week from 6th May to 12th May

Source: xkcd.


  1. NextJournal, Seamless Data Science for Teams.
  2. An executive’s guide to AI.
  3. What should I use to serve R applications over the internet? at Brian Caffo’s YouTube channel. He talks about PlumbeR (PlumbeR book here).
  4. Will AI eat statistics? at Brian Caffo’s YouTube channel.
  5. A radical new neural network design could overcome big challenges in AI at MIT Technology Review.
  6. Urgent need for a government-led big data system, say industry experts at The Edge Markets.
  7. Top 10 Cities Across The Globe With The Highest Pay Packages For Data Scientists at Analytics India.
  8. What Nobody Tells You About Machine Learning at Forbes.
  9. How the data mining of failure could teach us the secrets of success at MIT Technology Review.
  10. How to hide from the AI surveillance state with a color printout at MIT Technology Review.
  11. Boosting (machine learning) at Wikipedia.
  12. Weak Learning, Boosting, and the AdaBoost algorithm at Jeremy Kun’s Blog.
  13. Weak vs. Strong Learning and the Adaboost Algorithm at Jenn Wortman Vaughan’s Website.
  14. What is a weak learner? at StackOverflow.
  15. AI está pronta para transformar radicalmente o desenvolvimento de software at CIO.
  16. O orçamento das universidades e institutos federais desde 2000 at NexoJornal.
  17. O governo contra as universidades, em dados e análises at NexoJornal.
  18. Existe alguma microevolução documentada nos humanos nos últimos duzentos anos? at Quora.

Best links of the week #16

Reading time: 2 minutes

Best links of the week from 22nd April to 28th April



  1. Do more with R: drag-and-drop ggplot at InfoWorld.
  2. Apart from esquisse, the package mentioned in the link above, there is another one that allows you to drag-and-drop and plot your data: ggplotAssist.
  3. DreamRs is a French R consulting firm. On their website, they have made publicly available some shiny apps on real data, such as RATP traffic and a GitHub dashboard.
  4. VCs just invested $8 million into this startup that gave away its software for free because they noticed how much people loved it!
  5. Cheat Sheets for several software tools and concepts related to Data Science at Asif Bhat’s GitHub.
  6. Data Science must-read articles, tutorials and useful links at Asif Bhat’s GitHub.
  7. Math required for Data Science at Asif Bhat’s GitHub.
  8. Quick overview of Statistics for Biologists (it’s useful for pretty much everybody, you don’t say no to an offer of knowledge :-).
  9. How can I show the intermediate steps of a long routine in R? at StackOverflow.
  10. ‘Friendly’ reviewers rate grant applications more highly at Nature.
  11. Calm down, everyone. Keeping dead pig cells alive is not ‘brain resuscitation’ at Los Angeles Times.
  12. Uber is publicly sharing some data!
  13. Need help on choosing the right visualization method? From data-to-viz can help you!
  14. IBM releases Diversity in Faces, a dataset with over 1 million annotated images to help fight bias at Turing Tribe.
  15. Até 2030, AI contribuirá em mais de US$ 15,7 trilhões para economia global at Computer World.
  16. A extraordinária cientista que estudou o cérebro de Einstein e revolucionou a neurociência moderna at Época Negócios.
  17. TerraBrasilis, an open-access public geographical data platform for environmental monitoring.