Photo by Jamie Street on Unsplash

But let’s get back to our article. Well, the answer to this question is easy: study! But I cannot write an article with just the one word “study”. You’ll need a bit more.

A few days ago I passed the AWS Data Analytics Specialty certification exam, and while my memory is fresh I would like to share the preparation process that helped me pass.



The AWS Data Analytics Specialty exam focuses mostly on AWS services related to data and analytics (kinda obvious, isn’t it?) and covers the following topics, or domains:

  • Collection
  • Storage and Data Management

Photo by Nelly Volkovich on Unsplash

When you want to start a project with data, the first concern is getting the data. Once you have it you can build fancy graphs, pie charts and ML models, but before that, please be a nice guy/gal and collect the data.

In one of my previous articles I explained the process of scraping data from a website: what libraries to use, how to browse through the tags, etc. That article is here.

In this one I would like to take a step forward and improve the scraping process by automating it and moving it to…

Thanks for the photo to Zetong Li from Unsplash

It’s been a long time since my last article, but finally I have something to share.

The last two months were real hell for me: a short-term project that was supposed to be easy and quick ended up as always, with the famous quote about the programmer’s credo: “We do these things not because they are easy, but because we thought they were going to be easy”.

The project was to build a data lake from scratch, with complete freedom of action in order to find the best solution. …

Photo by SpaceX on Unsplash

When I got an understanding of what ML is (more or less), I also realized that all those little projects with small datasets are cute and useful, and the topic seems to be easy, but there was one “but”. I came to the conclusion that one thing may work for local development and may even be a perfect solution, but Google or Facebook or Amazon don’t use a 987KB CSV file to give some recommendation (or read your mind). They use tons of data, from different sources, with different structures and different levels of cleansing needed.

But let’s forget for a…

Photo by Luca Baggio on Unsplash

Data is everywhere. We generate data every second. Even now, while I am typing this I generate 1 byte of data with every symbol typed. And in the background I send to Spotify the fact that I’m listening to music. My smart bulbs generate light and data that the light is turned on. And if I wore a wearable of some kind I would be generating data about my steps, fitness routines etc. And there are millions of dudes like me.

It is said that more than 100 million spam emails are sent every minute. Netflix users stream more than…

Photo by Robynne Hu on Unsplash

I am pretty sure that on your data journey you have come across some courses, videos, articles, maybe use cases where someone takes some data, builds a classification/regression model and shows you great results. You learn how that model works and why it works that way and not another, and everything seems to be fine. You think you just learned a new thing (and you did), you are happy about that (yes, you are! I am not kidding around here, you’re doing great!) and you continue to the next piece of content.

But later on you start to ask additional questions…

Data is tricky.

Photo by Franki Chamaki on Unsplash

Data, statistics, math, numbers: these are exact things, we may think, but I have to disappoint you. There are a lot of paradoxes in this science, and we have to be aware of them to do our work well and to make better decisions in our day-to-day life. The more of them you know, the easier it is to spot them in the modern world, overwhelmed with information.

Here I present 7 different paradoxes from science, logic, statistics and math to show that things are not always as clear as they seem.

1. Prosecutor’s fallacy

Photo by Aron Van de Pol on Unsplash

DISCLAIMER: this is an absolutely subjective point of view; for the official definition check a dictionary or Wikipedia. And come on, you wouldn’t read an entire article just to get the definition.

Well, we analyse data every day, every hour, every minute. Our brain, using our 5 senses, reads the input, processes it and draws some conclusions. …

Taken from the FC Barcelona web page

Yesterday’s game against Betis was difficult, but the team got 3 points. We saw the things typical of this Barcelona again: mistakes in defense, nobody to score, and crazy assists by Lionel Messi, though no goals by the GOAT. He hasn’t scored in 4 games now. Statistically, it means someone will be punished in the future :D.

Let’s look at this game again with some numbers. We will start with the line-ups. FCB started in 4–3–3, but during the game it was more like 4–1–2–1–2, with Busquets playing closer to the defenders, Frenkie and Roberto on the wings…

Photo by Vienna Reyes on Unsplash


In this notebook we will explore modern metrics in football (xG, xGA and xPTS) and their influence on sports analytics.

  • Expected Goals (xG): measures the quality of a shot based on several variables such as assist type, shot angle and distance from goal, whether it was a headed shot and whether it was defined as a big chance.
  • Expected Assists (xGA): measures the likelihood that a given pass will become a goal assist. It considers several factors including the type of pass, pass end-point and length of the pass.
  • Expected Points (xPTS): measures the likelihood of a…

Sergi Lehkyi

Data and Cloud Developer, love technology in general, maybe too much humor and never too serious, based in amazing Barcelona
