Best links of the week #15

Reading time: 2 minutes

Best links of the week from 15th April to 21st April.

Links

  1. In clustering, choosing an appropriate k (the number of clusters) can be hard. Some algorithms do not require it, but for those that do, such as k-means, have a look at the elbow method or at the silhouette of the objects with respect to their clusters.
  2. Dunder Data is a professional training company dedicated to teaching data science and machine learning. There is paid and free online material.
  3. Software Carpentry, teaching basic lab skills for research computing.
  4. ROpenSci, transforming science through open data and software.
  5. mlmaisleve, quick and light concepts about Machine Learning.
  6. kite, Code Faster in Python with Line-of-Code Completions.
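
For the elbow method mentioned in item 1, here is a minimal sketch in R. The built-in iris data set, the candidate range 1 to 8, the seed and nstart = 20 are all assumptions chosen for illustration; none of them come from the linked material.

```r
# Elbow method sketch with base R's kmeans().
# Assumptions: iris as example data, k from 1 to 8, a fixed seed.
set.seed(42)                 # k-means starts from random centers
x <- scale(iris[, 1:4])      # standardize the four numeric columns

ks <- 1:8
# Total within-cluster sum of squares for each candidate k
wss <- sapply(ks, function(k) {
  kmeans(x, centers = k, nstart = 20)$tot.withinss
})

# The "elbow" is the k after which the curve flattens out
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

Reading the elbow off the plot is a judgment call, so treat the result as a starting point and cross-check it with another criterion such as the silhouette.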

Best links of the week #12

Reading time: < 1 minute

Best links of the week from 25th March to 31st March.

Links

  1. Harvard Dataverse is a repository of data currently hosting over 82 thousand datasets.
  2. The origins of the job title “data scientist” at Quartz at work.
  3. The Data Incubator offers [paid] courses and bootcamps in Data Analysis.
  4. Teenagers are better behaved and less hedonistic nowadays at The Economist.
  5. Why You Procrastinate (It Has Nothing to Do With Self-Control) at The New York Times.
  6. RATP, Régie Autonome des Transports Parisiens (English: Autonomous Operator of Parisian Transports) is data friendly!
  7. Data Science Meetups: A list of Data Science Meetups from around the world!
  8. A list of R conferences, groups and meetings at Jumping Rivers GitHub page.

Best links of the week #9

Reading time: < 1 minute

Best links of the week from 4th March to 10th March.

Links

  1. What are some of your favorite, but less well-known, packages for R? [1] [2] at Statistics and Data Science sub Reddits.
  2. Why is it wrong to stop an A/B test before optimal sample size is reached? at Cross Validated (Stack Exchange).
  3. How do I calculate statistical power? at Effect Size FAQs.
  4. Personal website generator.
  5. From hard drive to over-heard drive: Boffins convert spinning rust into eavesdropping mic at The Register.
  6. List of Machine Learning / Deep Learning conferences in 2019 at Tryo Labs.
  7. We Use Less Information to Make Decisions Than We Think at Harvard Business Review.
  8. Apple CEO Tim Cook explains why you don’t need a college degree to be successful at Business Insider.
  9. Jordan Peterson’s 10-step process for stronger writing at Big Think.
  10. R package primer at Karl Broman’s website.
  11. Researchers Can Now Cheaply Turn Atmospheric CO2 Back Into Coal at IFLScience.
  12. Machine learning study plan with content in Portuguese at Italo José’s GitHub.
  13. Brazil in liberated data.
  14. Facial recognition helps arrest a criminal at the Salvador Carnival at Canal Tech.
  15. Knowing your own genome involves surprises and disappointments at Folha de São Paulo.
  16. What is the logic behind the lie detector? at Revista Questão de Ciência.
  17. Research that looks like medicine but is not at Revista Questão de Ciência.
  18. The distribution of people with doctorates across Brazil at Nexo Jornal.
  19. Programmers will make the path easier for attackers, researchers say at Mundo Hacker.

Best links of the week #7

Reading time: 2 minutes

Best links of the week from 18th February to 24th February.

Links

  1. AI Generates Hilarious “Inspirational Quotes” at Truth Theory.
  2. Creating New Variables in R with mutate() and ifelse().
  3. Best slides on ggplot2 I have ever seen: the ggplot flipbook at Gina Reynolds’ GitHub. Files here.
  4. The Tidyverse in Action at Gina Reynolds’ GitHub. Files here.
  5. A workflow template for using topic R markdown documents that can be used as inputs to xaringan slides or lecture notes at Gina Reynolds’ GitHub.
  6. The CRISPR twins had their brains altered at MIT Technology Review.
  7. Elon Musk, Tesla and a textbook example of irresponsible and sensationalist journalism at MeioBit.
  8. This is how AI bias really happens—and why it’s so hard to fix at MIT Technology Review.
  9. How can a logarithm be called natural? at Deviante.
  10. Replicability in the Social Sciences: around 35% of studies published in renowned journals were not replicated in a recent study at Deviante.
  11. Replicability Crisis at Deviante.
  12. The End of Science in Brazil: My story with public research funding at Deviante.
  13. Braess’s paradox: can building one more street… make things worse? at Deviante.

Best links of the week #5

Reading time: < 1 minute

Best links of the week from 4th February to 10th February.

Links

  1. How to control another person’s arm with the power of your mind? at UOL.
  2. vidente is an R package I am currently writing to parse and analyze data from the Surveillance, Epidemiology and End Results (SEER) Program, which covers over 1/3 of the US population on cancer incidence and survival.
  3. Ciência de Dados com R is a book on Data Science using R at Instituto Brasileiro de Pesquisa e Análise de Dados.
  4. Data Science & Machine Learning Course at Ivanovitch Silva’s GitHub repository.
  5. The campaign revenue of candidates for federal deputy in 2018 at Nexo Jornal.
  6. AI 100: The Artificial Intelligence Startups Redefining Industries at CB Insights.
  7. The open-source and crowdsourced conference website.
  8. Ranking of IT conferences.

The unintended trap in bracket subsetting in R

Reading time: 3 minutes
The silent [and maybe deadly?] trap in bracket subsetting.

Dear reader,

It should be clear to you that, like several other programming languages, R provides different ways to tackle the same problem. One common task in data analysis is subsetting a data frame and, as Google can show you, there are plenty of blog posts and articles teaching different ways to do it in R. Let’s do a quick review here:

Before starting to subset a data frame, we must first create one. I will create a data frame of patients named var_example with two columns, one for vital status (is_alive) and one for birth year (birthyear). Birth year values are 4-digit numbers representing the year of birth. The is_alive column can have one of three values:

  • TRUE: The person is alive;
  • FALSE: The person is dead;
  • NA: We do not know whether this person is alive or dead.
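
> # Build the example data frame: vital status (with some NAs) and birth year
> var_example <- cbind(as.data.frame(sample(c(NA, TRUE, FALSE),
                                          size = 100,
                                          replace = TRUE,
                                          prob = c(0.1, 0.5, 0.4))),
                     as.data.frame(sample(1980:1995,
                                          size = 100,
                                          replace = TRUE)))
> # Give the two columns readable names
> colnames(var_example) <- c("is_alive", "birthyear")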
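
Because sample() draws at random, every run produces a different data frame. As a hedged sketch, the same structure can be built in one data.frame() call and then sanity-checked before any subsetting. The seed value is an assumption added here for repeatability and is not part of the original code.

```r
# Assumption: a fixed seed (not in the original) so the draw is repeatable
set.seed(2019)

# Same two columns as above, named directly in data.frame()
var_example <- data.frame(
  is_alive  = sample(c(NA, TRUE, FALSE), size = 100, replace = TRUE,
                     prob = c(0.1, 0.5, 0.4)),
  birthyear = sample(1980:1995, size = 100, replace = TRUE)
)

str(var_example)                              # column types and dimensions
table(var_example$is_alive, useNA = "ifany")  # counts of FALSE, TRUE and NA
range(var_example$birthyear)                  # should lie within 1980 to 1995
```

The useNA = "ifany" argument matters here: by default table() silently drops NA values, which would hide the unknown vital statuses.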