Category: Uncategorized

Causality, Data Science, R, tools, Uncategorized

Continuous Machine Learning – Part II

Reading time: 3 minutes

This is a 3-part series about Continuous Machine Learning. You can check Part I here and Part III here. This post continues the previous one, in which we started automating Data Science on GitHub with CML. Here we will use Docker to reduce the computation time of our GitHub Actions checks.

You can think of a Docker image as a snapshot of a project’s software environment that you can set up on any other computer. When a GitHub Actions workflow is triggered, GitHub loads your Docker image on its infrastructure and then runs your code inside it. That’s why it’s quicker: when you use a Docker container with your dependencies already installed, you don’t have to spend time setting them up again on the GitHub Actions runner every time it is triggered, which is what we did in the first part of this series.
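
As a rough sketch of what this looks like in practice, a GitHub Actions workflow can ask for a job to run inside a prebuilt image through the `container` key. The file name `.github/workflows/cml.yaml`, the image name `your-user/cml-r-env:latest` and the script `run_analysis.R` below are hypothetical placeholders, not names taken from this series, and the sketch assumes the image already ships R and the project's packages.

```yaml
# .github/workflows/cml.yaml (hypothetical file name)
name: cml-with-docker
on: [push]
jobs:
  run:
    runs-on: ubuntu-latest
    # Hypothetical image, assumed to have R and the project's packages preinstalled
    container: your-user/cml-r-env:latest
    steps:
      - uses: actions/checkout@v2
      # No dependency installation step is needed at this point
      - name: Run the analysis
        run: Rscript run_analysis.R  # hypothetical script name
```

Because the dependencies live in the image, the job only has to pull the image and run the script, instead of installing every package on each trigger.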

Creating a Docker image

Image from “Build a Docker Image just like how you would configure a VM”.
Uncategorized

Best links of the week #74

Reading time: 3 minutes

Best links of the week from 20th July to 17th August

Links

  1. Match your manuscript to a potential journal (Clarivate).
  2. Match your manuscript to a potential journal (Elsevier).
  3. Match your manuscript to a potential journal (Springer).
  4. Cohen’s d at Wikiversity.
  5. Why, When and How to Adjust Your P Values? at Cell Journal.
  6. GitHub CLI.

Blog posts

  1. Entenda de uma vez por todas o que são testes unitários, para que servem e como fazê-los at Dayvson Lima’s Medium.
  2. Unit Testing in R at Towards Data Science.
  3. Independent and Identically Distributed Data (IID) at Statistics by Jim.
  4. Ser cético não implica ser cínico! at Portal Deviante.
  5. Probabilistic Graphical Models Tutorial — Part 1 by Prasoon Goyal at Cube Dev’s Medium.
  6. Probabilistic Graphical Models Tutorial — Part 2 by Prasoon Goyal at Cube Dev’s Medium.
  7. Learning GitHub Actions: Creating Beautiful PR Comments by Ivan Shcheklein.
  8. Continuous Machine Learning at The Dataist Storyteller.
  9. Using Continuous Machine Learning to Run Your ML Pipeline at Vaithy Narayanan’s Medium.
  10. Improve your workflow by managing your machine learning experiments using Sacred at Déborah Mesquita’s blog.
  11. A gentle introduction to D3: how to build a reusable bubble chart at Déborah Mesquita’s blog.
  12. The Rise of DataOps (from the ashes of Data Governance) by Ryan Gross at Towards Data Science.

Videos

  1. The Great Debate: THE STORYTELLING OF SCIENCE (Part 1/2).
  2. The Great Debate: THE STORYTELLING OF SCIENCE (Part 2/2).
  3. What Is And How To Calculate Cohen’s d? at Top Tip Bio’s YouTube channel.
  4. What are degrees of freedom? at James Gilbert’s YouTube channel.
  5. Data Analysis: Why do we test the null hypothesis? at James Gilbert’s YouTube channel.
  6. Testing For Normality – Clearly Explained at Top Tip Bio’s YouTube channel.
  7. Pearson Correlation Explained (Inc. Test Assumptions) at Top Tip Bio’s YouTube channel.
  8. The Shape of Data: Distributions: Crash Course Statistics #7 at CrashCourse’s YouTube channel.
  9. Regression: Crash Course Statistics #32 at CrashCourse’s YouTube channel.
  10. The Multiple Comparisons Problem at Sprightly Pedagogue’s YouTube channel.
  11. MLOps Tutorial #1: Intro to Continuous Integration for ML at DVCorg’s YouTube channel.
  12. MLOps Tutorial #2: When data is too big for Git at DVCorg’s YouTube channel.
  13. MLOps Tutorial #3: Track ML models with Git & GitHub Actions at DVCorg’s YouTube channel.
  14. Introduction to Bayesian Networks | Implement Bayesian Networks In Python at edureka!’s YouTube channel.
  15. Bayesian Network – Exact Inference Example (With Numbers, FULL Walk-Through) at John McVickar’s YouTube channel.

Podcast

  1. Iniciativa monitora o distanciamento social no Brasil (#999) at Spin de Notícias.
Causality, Data Science, R, tools, Uncategorized

Continuous Machine Learning – Part I

Reading time: 9 minutes
Image by Taras Tymoshchuck from here.

This is a 3-part series about Continuous Machine Learning. You can check Part II here and Part III here.

What is it?

Continuous Machine Learning (CML) follows the same idea as Continuous Integration and Continuous Delivery (CI/CD), well-known practices in Software Engineering / DevOps, but applied to Machine Learning and Data Science projects.

What is this post about?

I will cover a set of tools that can make your life as a Data Scientist much more interesting. We will use MIIC, a network inference algorithm, to infer the network of a famous dataset (alarm from bnlearn). We will then use (1) git to track our code, (2) DVC to track our dataset, outputs and pipeline, (3) GitHub as a git remote and (4) Google Drive as a DVC remote. I’ve written a tutorial on managing Data Science projects with DVC, so if you’re interested in it, open a tab here to check it later.
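
To give an idea of what tracking a pipeline with DVC can look like, a single stage may be described in a `dvc.yaml` file. This is only a sketch under assumptions: the stage name, the script and every path below are hypothetical and are not the ones used in this series.

```yaml
# dvc.yaml (all names and paths below are hypothetical)
stages:
  infer_network:
    # Command that DVC runs for this stage
    cmd: Rscript infer_network.R
    # Files the stage depends on: the script itself and the input dataset
    deps:
      - infer_network.R
      - data/alarm.csv
    # Files the stage produces, which DVC tracks instead of git
    outs:
      - output/network.csv
```

With a file like this in place, `dvc repro` re-runs the stage only when one of its dependencies has changed, while git keeps tracking the code and the small DVC metadata files.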

Uncategorized

Best links of the week #71

Reading time: 2 minutes

Best links of the week from 8th June to 14th June

Links

  1. Hipótese de Sapir-Whorf at Wikipedia.
  2. Live 2020 Max Planck Lecture by Geoffrey Hinton (June 23rd).
  3. Um livro ilustrado de maus argumentos.
  4. Your logical fallacy is.
  5. Brazilian Symposium on Bioinformatics 2020.
  6. When 511 Epidemiologists Expect to Fly, Hug and Do 18 Other Everyday Activities Again at the New York Times.
  7. 7 Reasons Why Studying a Bachelor’s Degree Abroad Is Better than in Your Home Town at Study Portals Masters.
  8. 7 Decisive Reasons to Study Abroad in 2020 – Why You Won’t Regret It at Study Portals Bachelors.

Blog posts

  1. “Depois disso, logo, causado por isso”… Será? at Portal Deviante.
  2. Why is Linear Algebra Taught So Badly? by Callum Ballard at Towards Data Science.
  3. Why is Data Science Losing Its Charm? by Harshit Ahuja at Towards Data Science.

Videos

  1. Dividing by zero? at Eddie Woo’s YouTube channel.
  2. Why is 0! = 1? at Eddie Woo’s YouTube channel.
  3. What is 0 to the power of 0? at Eddie Woo’s YouTube channel.

Podcast

  1. A Ciência e a COVID-19 (SciCast #380) at Portal Deviante.
Uncategorized

Best links of the week #69

Reading time: < 1 minute

Best links of the week from 11th May to 17th May

Links

  1. Changes in new release of R (4.0.0).

Blog posts

  1. May ’20 DVC❤️Heartbeat at DVC Blog.
  2. Isotonic Regression is THE Coolest Machine-Learning Model You Might Not Have Heard Of by Emmett Boudreau at Towards Data Science.
  3. Econometrics 101 for data scientists by Mahbubul Alam at Towards Data Science.
  4. Panel data regression: a powerful time series modeling technique by Mahbubul Alam at Towards Data Science.
  5. Detecting stationarity in time series data by Shay Palachy at KDnuggets.
Uncategorized

Best links of the week #68

Reading time: 2 minutes

Best links of the week from 4th May to 10th May

Links

  1. COVID-19 dashboard by NatalNet lab (UFRN).
  2. COVID-19 dashboard by Brain Institute (UFRN).
  3. Rt Covid-19.
  4. Monitor COVID-19.
  5. Estimativas de R(t) por Estados do Brasil at Flavio Figueiredo’s website.
  6. Many other COVID-19 dashboards.
  7. COVID-19 Projections Using Machine Learning.
  8. OBSERVATÓRIO DA CIÊNCIA.
  9. Join the DVC Ambassador Program! at DVC Blog.
  10. Resultados da pesquisa de mercado de Data Science feita pelo Data Hackers at Kaggle.
Uncategorized

Best links of the week #65

Reading time: 2 minutes

Best links of the week from 13th April to 19th April

Links

  1. Jobs in Science & Technology from Science Careers.
  2. Find jobs (and create job alerts) at Academic Positions.
  3. Find a Postdoc.
  4. Post doc jobs.
  5. Causas de mortes constantes nos registros de óbitos lavrados pelos Cartórios do Brasil.
  6. Brasil.IO especial COVID19.
Uncategorized

Best links of the week #5

Reading time: < 1 minute

Best links of the week from 4th February to 10th February.

Links

  1. Como controlar o braço de outra pessoa com o poder da sua mente? at UOL.
  2. vidente is an R package I am currently writing to parse and analyze data from the Surveillance, Epidemiology and End Results (SEER) Program, which collects cancer incidence and survival data covering over 1/3 of the US population.
  3. Ciência de Dados com R is a book on Data Science using R at Instituto Brasileiro de Pesquisa e Análise de Dados.
  4. Data Science & Machine Learning Course at Ivanovitch Silva’s GitHub repository.
  5. A receita dos candidatos a deputado federal em 2018 at Nexo Jornal.
  6. AI 100: The Artificial Intelligence Startups Redefining Industries at CB Insights.
  7. The open-source and crowd sourced conference website.
  8. Ranking of IT conferences.