{"id":639,"date":"2025-02-01T18:50:00","date_gmt":"2025-02-01T17:50:00","guid":{"rendered":"https:\/\/noiseonthenet.space\/noise\/?p=639"},"modified":"2025-02-02T22:14:25","modified_gmt":"2025-02-02T21:14:25","slug":"data-the-final-frontier","status":"publish","type":"post","link":"https:\/\/noiseonthenet.space\/noise\/2025\/02\/data-the-final-frontier\/","title":{"rendered":"Data: the final frontier"},"content":{"rendered":"<div id=\"org4349562\" class=\"figure\"> <p><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/philipp-dusel-Mbfhs0u4YQ-unsplash.jpg?ssl=1\" alt=\"philipp-dusel--Mbfhs0u4YQ-unsplash.jpg\" \/> <\/p> <\/div>\n\n<p> Photo by <a href=\"https:\/\/unsplash.com\/@philipp_dice?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash\">Philipp D\u00fcsel<\/a> on <a href=\"https:\/\/unsplash.com\/photos\/the-night-sky-is-filled-with-stars-above-a-mountain-range--Mbfhs0u4YQ?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash\">Unsplash<\/a> <\/p>\n\n<p> After <a href=\"https:\/\/noiseonthenet.space\/noise\/2025\/01\/a-trip-to-jupyter-lab\/\">heading onto Jupyter<\/a> and <a href=\"https:\/\/noiseonthenet.space\/noise\/2025\/01\/meet-the-pandas\/\">meeting the Pandas<\/a> let&rsquo;s boldly go where no one has gone before! 
<\/p>\n\n<p> Here are some powerful tools to explore our data and discover new lifeforms in it. <\/p>\n\n<p> <a id=\"org825f3a8\"><\/a> <\/p>\n<div id=\"outline-container-introduction-to-exploratory-data-analysis-with-matplotlib-and-seaborn\" class=\"outline-2\">\n<h2 id=\"introduction-to-exploratory-data-analysis-with-matplotlib-and-seaborn\">Introduction to Exploratory Data Analysis with Matplotlib and Seaborn<\/h2>\n<div class=\"outline-text-2\" id=\"text-introduction-to-exploratory-data-analysis-with-matplotlib-and-seaborn\">\n<p> In this part we are going to focus on a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Exploratory_data_analysis\">quick exploration<\/a> of the data, choosing the tools according to the type and number of variables involved. <\/p>\n\n<p> For simplicity we will talk about two main kinds of data: <\/p>\n\n<ul class=\"org-ul\">\n<li>categorical: a finite set of discrete values, which may or may not have a specific order, e.g. <code>yellow<\/code>, <code>red<\/code>, <code>blue<\/code><\/li>\n<li>continuous: numerical values (most often real numbers), usually represented with a <code>float<\/code> type<\/li>\n<\/ul>\n\n<p> Jupyter and pandas allow you to easily interact with the data, perform operations on it and visualize it. 
<\/p>\n\n<p> <a id=\"org353f7cc\"><\/a> <\/p>\n<\/div>\n<div id=\"outline-container-installing-basic-libraries\" class=\"outline-4\">\n<h4 id=\"installing-basic-libraries\">Installing basic libraries<\/h4>\n<div class=\"outline-text-4\" id=\"text-installing-basic-libraries\">\n<p> Execute the following cell only if you need to install or upgrade the matplotlib and seaborn libraries <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-bash\" id=\"nil\"><span style=\"color: #89dceb;\">!<\/span>pip install --upgrade matplotlib seaborn\n<\/pre>\n<\/div>\n\n<p> <a id=\"org67d82ee\"><\/a> The following libraries are the foundation tools: <\/p>\n\n<ul class=\"org-ul\">\n<li><b>pandas<\/b> is an in-memory dataframe library<\/li>\n<li><b>matplotlib<\/b> is a plotting library inspired by the MATLAB plotting API<\/li>\n<li><b>seaborn<\/b> is a charting library built on top of matplotlib, with more features and themes<\/li>\n<li><b>numpy<\/b> is a numerical computing library providing fast C-backed arrays and scientific functions<\/li>\n<\/ul>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\"><span style=\"color: #cba6f7;\">import<\/span> pandas <span style=\"color: #cba6f7;\">as<\/span> pd\n<span style=\"color: #cba6f7;\">import<\/span> matplotlib.pyplot <span style=\"color: #cba6f7;\">as<\/span> plt\n<span style=\"color: #cba6f7;\">import<\/span> seaborn <span style=\"color: #cba6f7;\">as<\/span> sns\n<span style=\"color: #cba6f7;\">import<\/span> numpy <span style=\"color: #cba6f7;\">as<\/span> np\n<span style=\"color: #cba6f7;\">from<\/span> tabulate <span style=\"color: #cba6f7;\">import<\/span> tabulate\n<\/pre>\n<\/div>\n\n<p> <a id=\"org3a90bdd\"><\/a> <\/p>\n<\/div>\n<\/div>\n<div id=\"outline-container-birds-eye-view-of-a-dataset-with-describe\" class=\"outline-3\">\n<h3 id=\"birds-eye-view-of-a-dataset-with-describe\">Bird&rsquo;s eye view of a dataset with 
Describe<\/h3>\n<div class=\"outline-text-3\" id=\"text-birds-eye-view-of-a-dataset-with-describe\">\n<p> Let&rsquo;s start with a classic dataset listing the passengers of the Titanic. <\/p>\n\n<p> The <code>read_csv<\/code> function loads a CSV file into a pandas <code>DataFrame<\/code>, which is a table of rows and named columns (a relation) <\/p>\n\n<p> Note: the Titanic dataset was downloaded at the beginning of Part 2; in case you are missing it, execute the code at the beginning of the lesson <\/p>\n\n<p> The <code>.head()<\/code> method returns the first rows of your data frame (five by default) so you can quickly inspect it <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\"><span style=\"color: #cdd6f4;\">titanic<\/span> <span style=\"color: #89dceb;\">=<\/span> pd.read_csv(<span style=\"color: #a6e3a1;\">\"datasets\/titanic.csv\"<\/span>)\n<span style=\"color: #cdd6f4;\">df<\/span> <span style=\"color: #89dceb;\">=<\/span> titanic.head()[[<span style=\"color: #a6e3a1;\">\"Survived\"<\/span>,<span style=\"color: #a6e3a1;\">\"Pclass\"<\/span>,<span style=\"color: #a6e3a1;\">\"Age\"<\/span>,<span style=\"color: #a6e3a1;\">\"Sex\"<\/span>]]\n<\/pre>\n<\/div>\n\n<table border=\"2\" cellspacing=\"0\" cellpadding=\"6\" rules=\"groups\" frame=\"hsides\">\n\n\n<colgroup>\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-left\" \/>\n<\/colgroup>\n<thead>\n<tr>\n<th scope=\"col\" class=\"org-right\">&#xa0;<\/th>\n<th scope=\"col\" class=\"org-right\">Survived<\/th>\n<th scope=\"col\" class=\"org-right\">Pclass<\/th>\n<th scope=\"col\" class=\"org-right\">Age<\/th>\n<th scope=\"col\" class=\"org-left\">Sex<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td class=\"org-right\">0<\/td>\n<td class=\"org-right\">0<\/td>\n<td class=\"org-right\">3<\/td>\n<td class=\"org-right\">22<\/td>\n<td class=\"org-left\">male<\/td>\n<\/tr>\n\n<tr>\n<td 
class=\"org-right\">1<\/td>\n<td class=\"org-right\">1<\/td>\n<td class=\"org-right\">1<\/td>\n<td class=\"org-right\">38<\/td>\n<td class=\"org-left\">female<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-right\">2<\/td>\n<td class=\"org-right\">1<\/td>\n<td class=\"org-right\">3<\/td>\n<td class=\"org-right\">26<\/td>\n<td class=\"org-left\">female<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-right\">3<\/td>\n<td class=\"org-right\">1<\/td>\n<td class=\"org-right\">1<\/td>\n<td class=\"org-right\">35<\/td>\n<td class=\"org-left\">female<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-right\">4<\/td>\n<td class=\"org-right\">0<\/td>\n<td class=\"org-right\">3<\/td>\n<td class=\"org-right\">35<\/td>\n<td class=\"org-left\">male<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n\n<p> <a id=\"orgd516c3d\"><\/a> The <code>.describe()<\/code> method returns basic statistics for all numerical columns: <\/p>\n\n<ul class=\"org-ul\">\n<li>min<\/li>\n<li>max<\/li>\n<li>median<\/li>\n<li>mean<\/li>\n<li>standard deviation<\/li>\n<li>quartiles<\/li>\n<li>count of non-missing elements<\/li>\n<\/ul>\n\n<p> With the <code>.describe(include=\"all\")<\/code> option, categorical columns are also shown, with some other statistics: <\/p>\n\n<ul class=\"org-ul\">\n<li>number of unique discrete values<\/li>\n<li>the most common one<\/li>\n<li>its frequency<\/li>\n<\/ul>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\"><span style=\"color: #cdd6f4;\">df<\/span> <span style=\"color: #89dceb;\">=<\/span> titanic.describe(include<span style=\"color: #89dceb;\">=<\/span><span style=\"color: #a6e3a1;\">\"all\"<\/span>)[[<span style=\"color: #a6e3a1;\">\"Survived\"<\/span>,<span style=\"color: #a6e3a1;\">\"Pclass\"<\/span>,<span style=\"color: #a6e3a1;\">\"Age\"<\/span>,<span style=\"color: #a6e3a1;\">\"Sex\"<\/span>]]\n<\/pre>\n<\/div>\n\n<table border=\"2\" cellspacing=\"0\" cellpadding=\"6\" rules=\"groups\" frame=\"hsides\">\n\n\n<colgroup>\n<col  class=\"org-left\" 
\/>\n\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-right\" \/>\n<\/colgroup>\n<thead>\n<tr>\n<th scope=\"col\" class=\"org-left\">&#xa0;<\/th>\n<th scope=\"col\" class=\"org-right\">Survived<\/th>\n<th scope=\"col\" class=\"org-right\">Pclass<\/th>\n<th scope=\"col\" class=\"org-right\">Age<\/th>\n<th scope=\"col\" class=\"org-right\">Sex<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td class=\"org-left\">count<\/td>\n<td class=\"org-right\">891<\/td>\n<td class=\"org-right\">891<\/td>\n<td class=\"org-right\">714<\/td>\n<td class=\"org-right\">891<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">unique<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">2<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">top<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">male<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">freq<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">577<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">mean<\/td>\n<td class=\"org-right\">0.3838<\/td>\n<td class=\"org-right\">2.309<\/td>\n<td class=\"org-right\">29.7<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">std<\/td>\n<td class=\"org-right\">0.4866<\/td>\n<td class=\"org-right\">0.8361<\/td>\n<td class=\"org-right\">14.53<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">min<\/td>\n<td class=\"org-right\">0<\/td>\n<td class=\"org-right\">1<\/td>\n<td class=\"org-right\">0.42<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">25%<\/td>\n<td class=\"org-right\">0<\/td>\n<td class=\"org-right\">2<\/td>\n<td class=\"org-right\">20.12<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n\n<tr>\n<td 
class=\"org-left\">50%<\/td>\n<td class=\"org-right\">0<\/td>\n<td class=\"org-right\">3<\/td>\n<td class=\"org-right\">28<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">75%<\/td>\n<td class=\"org-right\">1<\/td>\n<td class=\"org-right\">3<\/td>\n<td class=\"org-right\">38<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">max<\/td>\n<td class=\"org-right\">1<\/td>\n<td class=\"org-right\">3<\/td>\n<td class=\"org-right\">80<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n\n\n<p> <a id=\"org6d5ef3c\"><\/a> It is possible to access columns (called <code>Series<\/code> in pandas jargon) using the square bracket operator <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\">titanic[<span style=\"color: #a6e3a1;\">\"Pclass\"<\/span>]\n<\/pre>\n<\/div>\n\n<p> Columns whose name is a valid Python identifier (i.e. it starts with a letter or an underscore and contains only letters, digits and underscores) and does not clash with an existing <code>DataFrame<\/code> attribute or method can also be accessed using the dot notation, e.g. <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\">titanic.Pclass\n<\/pre>\n<\/div>\n\n<p> Each column has a data type. As <code>csv<\/code> files do not carry any type information, the type is inferred when loading; binary data formats, on the other hand, usually store the data type as well. 
The data type of a column is stored in its <code>.dtype<\/code> attribute <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\"><span style=\"color: #cdd6f4;\">pclass<\/span> <span style=\"color: #89dceb;\">=<\/span> titanic.Pclass\n<span style=\"color: #f38ba8;\">print<\/span>(pclass.dtype)\n<\/pre>\n<\/div>\n\n<p> int64 <\/p>\n\n<p> <a id=\"orgc49f1a0\"><\/a> We know this column represents the class of the ticket, so we expect it to take only a few distinct values: we can check this with the <code>.unique()<\/code> method <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\"><span style=\"color: #cdd6f4;\">df<\/span> <span style=\"color: #89dceb;\">=<\/span> pclass.unique()\n<\/pre>\n<\/div>\n\n<p> <a id=\"org239a7ef\"><\/a> We see this is a discrete-valued column, so we can convert its type with the <code>.astype()<\/code> method <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\"><span style=\"color: #cdd6f4;\">pclass<\/span> <span style=\"color: #89dceb;\">=<\/span> pclass.astype(<span style=\"color: #a6e3a1;\">'category'<\/span>)\n<span style=\"color: #cdd6f4;\">df<\/span> <span style=\"color: #89dceb;\">=<\/span> pclass.dtype\n<\/pre>\n<\/div>\n\n<p> <a id=\"org431e905\"><\/a> Once the converted series is written back to the <code>Pclass<\/code> column, its statistics are the categorical ones (<code>unique<\/code>, <code>top<\/code>, <code>freq<\/code>) instead of the numerical ones <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\"><span style=\"color: #cdd6f4;\">titanic<\/span>[<span style=\"color: #a6e3a1;\">\"Pclass\"<\/span>] <span style=\"color: #89dceb;\">=<\/span> pclass\n<span style=\"color: #cdd6f4;\">df<\/span> <span style=\"color: #89dceb;\">=<\/span> titanic.describe(include<span style=\"color: #89dceb;\">=<\/span><span style=\"color: #a6e3a1;\">\"all\"<\/span>)[[<span style=\"color: #a6e3a1;\">\"Survived\"<\/span>,<span style=\"color: #a6e3a1;\">\"Pclass\"<\/span>,<span style=\"color: #a6e3a1;\">\"Age\"<\/span>,<span style=\"color: #a6e3a1;\">\"Sex\"<\/span>]]\n<\/pre>\n<\/div>\n\n
<table border=\"2\" cellspacing=\"0\" cellpadding=\"6\" rules=\"groups\" frame=\"hsides\">\n\n\n<colgroup>\n<col  class=\"org-left\" \/>\n\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-right\" \/>\n<\/colgroup>\n<thead>\n<tr>\n<th scope=\"col\" class=\"org-left\">&#xa0;<\/th>\n<th scope=\"col\" class=\"org-right\">Survived<\/th>\n<th scope=\"col\" class=\"org-right\">Pclass<\/th>\n<th scope=\"col\" class=\"org-right\">Age<\/th>\n<th scope=\"col\" class=\"org-right\">Sex<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td class=\"org-left\">count<\/td>\n<td class=\"org-right\">891<\/td>\n<td class=\"org-right\">891<\/td>\n<td class=\"org-right\">714<\/td>\n<td class=\"org-right\">891<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">unique<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">3<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">2<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">top<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">3<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">male<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">freq<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">491<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">577<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">mean<\/td>\n<td class=\"org-right\">0.3838<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">29.7<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">std<\/td>\n<td class=\"org-right\">0.4866<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">14.53<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n\n
<tr>\n<td class=\"org-left\">min<\/td>\n<td class=\"org-right\">0<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">0.42<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">25%<\/td>\n<td class=\"org-right\">0<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">20.12<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">50%<\/td>\n<td class=\"org-right\">0<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">28<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">75%<\/td>\n<td class=\"org-right\">1<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">38<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">max<\/td>\n<td class=\"org-right\">1<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">80<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n\n<p> <a id=\"orgc57fb80\"><\/a> If we know the type of a column in advance, we can pass a hint to the CSV reader through the <code>dtype<\/code> argument <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\"><span style=\"color: #cdd6f4;\">titanic<\/span> <span style=\"color: #89dceb;\">=<\/span> pd.read_csv(\n    <span style=\"color: #a6e3a1;\">\"datasets\/titanic.csv\"<\/span>,\n    dtype<span style=\"color: #89dceb;\">=<\/span>{\n        <span style=\"color: #a6e3a1;\">\"Survived\"<\/span>:<span style=\"color: #a6e3a1;\">\"category\"<\/span>,\n        <span style=\"color: #a6e3a1;\">\"Pclass\"<\/span>:<span style=\"color: #a6e3a1;\">\"category\"<\/span>,\n        <span style=\"color: #a6e3a1;\">\"Sex\"<\/span>:<span style=\"color: #a6e3a1;\">\"category\"<\/span>,\n    }\n)\n<\/pre>\n<\/div>\n\n<p> <a id=\"orgd19e243\"><\/a> <\/p>\n<\/div>\n<\/div>\n<div id=\"outline-container-monovariate-categorical\" class=\"outline-3\">\n<h3 
id=\"monovariate-categorical\">Monovariate Categorical<\/h3>\n<div class=\"outline-text-3\" id=\"text-monovariate-categorical\">\n<p> When we have a categorical series we can list all of its possible values using the <code>.cat.categories<\/code> attribute <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\"><span style=\"color: #f38ba8;\">print<\/span>(pclass.cat.categories)\n<\/pre>\n<\/div>\n\n<p> Index([1, 2, 3], dtype='int64') <\/p>\n\n<p> <a id=\"org273d9ab\"><\/a> The <code>sns.countplot()<\/code> function shows a bar plot with the number of occurrences of each categorical value <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\">sns.countplot(x<span style=\"color: #89dceb;\">=<\/span>pclass)\n<\/pre>\n<\/div>\n\n<div id=\"orga525dac\" class=\"figure\"> <p><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/04c5f7ee20b7c943d81ff65e17f36eaf85fead2b.png?ssl=1\" alt=\"04c5f7ee20b7c943d81ff65e17f36eaf85fead2b.png\" \/> <\/p> <\/div>\n\n<p> <a id=\"org5f7994f\"><\/a> <\/p>\n<\/div>\n<\/div>\n<div id=\"outline-container-monovariate-continuous\" class=\"outline-3\">\n<h3 id=\"monovariate-continuous\">Monovariate Continuous<\/h3>\n<div class=\"outline-text-3\" id=\"text-monovariate-continuous\">\n<p> <a id=\"org2a92a8e\"><\/a> This dataframe collects daily measurements of lead (Pb) concentration in California <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\"><span style=\"color: #cdd6f4;\">california<\/span> <span style=\"color: #89dceb;\">=<\/span> pd.read_csv(<span style=\"color: #a6e3a1;\">\"california_pb_2023.csv\"<\/span>)\n<span style=\"color: #cdd6f4;\">df<\/span> <span style=\"color: #89dceb;\">=<\/span>california.describe(include<span style=\"color: #89dceb;\">=<\/span><span style=\"color: #a6e3a1;\">\"all\"<\/span>)[[<span 
style=\"color: #a6e3a1;\">'Daily Mean Pb Concentration'<\/span>, <span style=\"color: #a6e3a1;\">'County'<\/span>]]\n<\/pre>\n<\/div>\n\n<table border=\"2\" cellspacing=\"0\" cellpadding=\"6\" rules=\"groups\" frame=\"hsides\">\n\n\n<colgroup>\n<col  class=\"org-left\" \/>\n\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-right\" \/>\n<\/colgroup>\n<thead>\n<tr>\n<th scope=\"col\" class=\"org-left\">&#xa0;<\/th>\n<th scope=\"col\" class=\"org-right\">Daily Mean Pb Concentration<\/th>\n<th scope=\"col\" class=\"org-right\">County<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td class=\"org-left\">count<\/td>\n<td class=\"org-right\">1110<\/td>\n<td class=\"org-right\">1110<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">unique<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">13<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">top<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">Los Angeles<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">freq<\/td>\n<td class=\"org-right\">nan<\/td>\n<td class=\"org-right\">458<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">mean<\/td>\n<td class=\"org-right\">0.00699<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">std<\/td>\n<td class=\"org-right\">0.008124<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">min<\/td>\n<td class=\"org-right\">0<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">25%<\/td>\n<td class=\"org-right\">0.002863<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">50%<\/td>\n<td class=\"org-right\">0.00444<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">75%<\/td>\n<td class=\"org-right\">0.008<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-left\">max<\/td>\n<td class=\"org-right\">0.101<\/td>\n<td class=\"org-right\">nan<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n\n<p> <a id=\"orgc1f3ac2\"><\/a> 
<code>sns.histplot<\/code> shows a histogram <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\">sns.histplot(california,x<span style=\"color: #89dceb;\">=<\/span><span style=\"color: #a6e3a1;\">\"Daily Mean Pb Concentration\"<\/span>)\n<\/pre>\n<\/div>\n\n<div id=\"org1fa465d\" class=\"figure\"> <p><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/1e179c2227cfbdf703d241d0bb9385b826510526.png?ssl=1\" alt=\"1e179c2227cfbdf703d241d0bb9385b826510526.png\" \/> <\/p> <\/div>\n\n<p> <a id=\"org8ae1fce\"><\/a> This looks like a lognormal distribution: let&rsquo;s compute the empirical cumulative distribution and plot it with a logarithmic x axis <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\"><span style=\"color: #cdd6f4;\">sorted_pb<\/span> <span style=\"color: #89dceb;\">=<\/span> np.sort(california[<span style=\"color: #a6e3a1;\">\"Daily Mean Pb Concentration\"<\/span>])\n<span style=\"color: #cdd6f4;\">prob_pb<\/span> <span style=\"color: #89dceb;\">=<\/span> (np.arange(<span style=\"color: #f38ba8;\">len<\/span>(sorted_pb)) <span style=\"color: #89dceb;\">+<\/span> <span style=\"color: #fab387;\">1<\/span>)<span style=\"color: #89dceb;\">\/<\/span><span style=\"color: #f38ba8;\">len<\/span>(sorted_pb)\n<span style=\"color: #cdd6f4;\">ax<\/span><span style=\"color: #89dceb;\">=<\/span>sns.lineplot(x<span style=\"color: #89dceb;\">=<\/span>sorted_pb, y<span style=\"color: #89dceb;\">=<\/span>prob_pb)\nax.set_xscale(<span style=\"color: #a6e3a1;\">\"log\"<\/span>, base<span style=\"color: #89dceb;\">=<\/span><span style=\"color: #fab387;\">10<\/span>)\n<\/pre>\n<\/div>\n\n<div id=\"org02b7cfe\" class=\"figure\"> <p><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/98650be7328261cabcd95fd83a1dc52ecb101acd.png?ssl=1\" alt=\"98650be7328261cabcd95fd83a1dc52ecb101acd.png\" \/> <\/p> <\/div>\n\n<p> 
<a id=\"org6a53eab\"><\/a> This looks promising, so we can check it with a quantile (probability) plot <\/p>\n\n<p> First we try the quantiles of a normal distribution: we expect the points to drift away from the straight line in the tails <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\"><span style=\"color: #cba6f7;\">from<\/span> scipy <span style=\"color: #cba6f7;\">import<\/span> stats\nstats.probplot(california[<span style=\"color: #a6e3a1;\">\"Daily Mean Pb Concentration\"<\/span>], plot<span style=\"color: #89dceb;\">=<\/span>plt)\n<\/pre>\n<\/div>\n\n<div id=\"orgda7988b\" class=\"figure\"> <p><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/07b8d558d22557e09c33cc108a169772832e1531.png?ssl=1\" alt=\"07b8d558d22557e09c33cc108a169772832e1531.png\" \/> <\/p> <\/div>\n\n<p> <a id=\"org66f5c34\"><\/a> We can compare the data against a different distribution, so we choose a lognormal (here with its shape parameter fixed at <code>s=1<\/code>) <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\">stats.probplot(california[<span style=\"color: #a6e3a1;\">\"Daily Mean Pb Concentration\"<\/span>], plot<span style=\"color: #89dceb;\">=<\/span>plt, dist<span style=\"color: #89dceb;\">=<\/span>stats.distributions.lognorm(s<span style=\"color: #89dceb;\">=<\/span><span style=\"color: #fab387;\">1<\/span>))\n<\/pre>\n<\/div>\n\n<div id=\"orgf45b1c3\" class=\"figure\"> <p><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/9cfc294ea181926dd8c3f3a056d94b07f48e2909.png?ssl=1\" alt=\"9cfc294ea181926dd8c3f3a056d94b07f48e2909.png\" \/> <\/p> <\/div>\n\n<p> <a id=\"org66f32ac\"><\/a> This looks much better: the points follow the straight line more closely <\/p>\n\n<p> <a id=\"org3876b9b\"><\/a> <\/p>\n<\/div>\n<\/div>\n<div id=\"outline-container-multivariate-categorical\" class=\"outline-3\">\n<h3 
id=\"multivariate-categorical\">Multivariate Categorical<\/h3>\n<div class=\"outline-text-3\" id=\"text-multivariate-categorical\">\n<p> Let&rsquo;s consider a group of categorical variables and explore their interaction. The <code>pd.crosstab()<\/code> function creates a contingency table, i.e. a table which counts the occurrences of every combination of the considered factors <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\"><span style=\"color: #cdd6f4;\">titanic<\/span>[<span style=\"color: #a6e3a1;\">'survived'<\/span>] <span style=\"color: #89dceb;\">=<\/span> titanic.Survived.astype(<span style=\"color: #a6e3a1;\">'category'<\/span>)\n<span style=\"color: #cdd6f4;\">titanic<\/span>[<span style=\"color: #a6e3a1;\">'sex'<\/span>] <span style=\"color: #89dceb;\">=<\/span> titanic.Sex.astype(<span style=\"color: #a6e3a1;\">'category'<\/span>)\n<span style=\"color: #cdd6f4;\">titanic<\/span>[<span style=\"color: #a6e3a1;\">'pclass'<\/span>] <span style=\"color: #89dceb;\">=<\/span> titanic.Pclass.astype(<span style=\"color: #a6e3a1;\">'category'<\/span>)\n<span style=\"color: #cdd6f4;\">ct<\/span> <span style=\"color: #89dceb;\">=<\/span> pd.crosstab(titanic[<span style=\"color: #a6e3a1;\">'survived'<\/span>],columns<span style=\"color: #89dceb;\">=<\/span>[titanic[<span style=\"color: #a6e3a1;\">'sex'<\/span>],titanic[<span style=\"color: #a6e3a1;\">'pclass'<\/span>]])\n<span style=\"color: #cdd6f4;\">df<\/span> <span style=\"color: #89dceb;\">=<\/span>ct\n<\/pre>\n<\/div>\n\n<table border=\"2\" cellspacing=\"0\" cellpadding=\"6\" rules=\"groups\" frame=\"hsides\">\n\n\n<colgroup>\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-right\" \/>\n<\/colgroup>\n<thead>\n<tr>\n<th scope=\"col\" 
class=\"org-right\">&#xa0;<\/th>\n<th scope=\"col\" class=\"org-right\">('female', '1')<\/th>\n<th scope=\"col\" class=\"org-right\">('female', '2')<\/th>\n<th scope=\"col\" class=\"org-right\">('female', '3')<\/th>\n<th scope=\"col\" class=\"org-right\">('male', '1')<\/th>\n<th scope=\"col\" class=\"org-right\">('male', '2')<\/th>\n<th scope=\"col\" class=\"org-right\">('male', '3')<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td class=\"org-right\">0<\/td>\n<td class=\"org-right\">3<\/td>\n<td class=\"org-right\">6<\/td>\n<td class=\"org-right\">72<\/td>\n<td class=\"org-right\">77<\/td>\n<td class=\"org-right\">91<\/td>\n<td class=\"org-right\">300<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-right\">1<\/td>\n<td class=\"org-right\">91<\/td>\n<td class=\"org-right\">70<\/td>\n<td class=\"org-right\">72<\/td>\n<td class=\"org-right\">45<\/td>\n<td class=\"org-right\">17<\/td>\n<td class=\"org-right\">47<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n\n<p> <a id=\"org208b35a\"><\/a> The <code>.plot.bar()<\/code> method provides a quick way to display this information as a grouped bar plot <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\">ct.plot.bar()\n<\/pre>\n<\/div>\n\n<div id=\"org8278b19\" class=\"figure\"> <p><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/1d24d9251539adaf31f212d0a90dae8f08c90c42.png?ssl=1\" alt=\"1d24d9251539adaf31f212d0a90dae8f08c90c42.png\" \/> <\/p> <\/div>\n\n<p> With <code>stacked=True<\/code> the bars of each group are stacked on top of each other instead of being placed side by side <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\">ct.plot.bar(stacked<span style=\"color: #89dceb;\">=<\/span><span style=\"color: #fab387;\">True<\/span>)\n<\/pre>\n<\/div>\n\n<div id=\"org2cb8f38\" class=\"figure\"> 
<p><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/6a211b5060db8a1a6052f4e092b775e7791ac988.png?ssl=1\" alt=\"6a211b5060db8a1a6052f4e092b775e7791ac988.png\" \/> <\/p> <\/div>\n\n<p> <a id=\"org07ae179\"><\/a> <\/p>\n<\/div>\n<\/div>\n<div id=\"outline-container-multivariate-continuous\" class=\"outline-3\">\n<h3 id=\"multivariate-continuous\">Multivariate Continuous<\/h3>\n<div class=\"outline-text-3\" id=\"text-multivariate-continuous\">\n<p> the <code>iris<\/code> dataset is a collection of measurements of this flower&rsquo;s features (sepal and petal length and width) across different varieties. <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\"><span style=\"color: #cdd6f4;\">iris<\/span> <span style=\"color: #89dceb;\">=<\/span> pd.read_csv(<span style=\"color: #a6e3a1;\">\"iris.csv\"<\/span>)\n<span style=\"color: #cdd6f4;\">df<\/span> <span style=\"color: #89dceb;\">=<\/span>iris.head()\n<\/pre>\n<\/div>\n\n<table border=\"2\" cellspacing=\"0\" cellpadding=\"6\" rules=\"groups\" frame=\"hsides\">\n\n\n<colgroup>\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-right\" \/>\n\n<col  class=\"org-left\" \/>\n<\/colgroup>\n<thead>\n<tr>\n<th scope=\"col\" class=\"org-right\">&#xa0;<\/th>\n<th scope=\"col\" class=\"org-right\">sepal_length<\/th>\n<th scope=\"col\" class=\"org-right\">sepal_width<\/th>\n<th scope=\"col\" class=\"org-right\">petal_length<\/th>\n<th scope=\"col\" class=\"org-right\">petal_width<\/th>\n<th scope=\"col\" class=\"org-left\">variety<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td class=\"org-right\">0<\/td>\n<td class=\"org-right\">5.1<\/td>\n<td class=\"org-right\">3.5<\/td>\n<td class=\"org-right\">1.4<\/td>\n<td class=\"org-right\">0.2<\/td>\n<td 
class=\"org-left\">Setosa<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-right\">1<\/td>\n<td class=\"org-right\">4.9<\/td>\n<td class=\"org-right\">3<\/td>\n<td class=\"org-right\">1.4<\/td>\n<td class=\"org-right\">0.2<\/td>\n<td class=\"org-left\">Setosa<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-right\">2<\/td>\n<td class=\"org-right\">4.7<\/td>\n<td class=\"org-right\">3.2<\/td>\n<td class=\"org-right\">1.3<\/td>\n<td class=\"org-right\">0.2<\/td>\n<td class=\"org-left\">Setosa<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-right\">3<\/td>\n<td class=\"org-right\">4.6<\/td>\n<td class=\"org-right\">3.1<\/td>\n<td class=\"org-right\">1.5<\/td>\n<td class=\"org-right\">0.2<\/td>\n<td class=\"org-left\">Setosa<\/td>\n<\/tr>\n\n<tr>\n<td class=\"org-right\">4<\/td>\n<td class=\"org-right\">5<\/td>\n<td class=\"org-right\">3.6<\/td>\n<td class=\"org-right\">1.4<\/td>\n<td class=\"org-right\">0.2<\/td>\n<td class=\"org-left\">Setosa<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n\n<p> <a id=\"orgc90a1e5\"><\/a> <\/p>\n<\/div>\n<div id=\"outline-container-two-variables\" class=\"outline-4\">\n<h4 id=\"two-variables\">Two variables<\/h4>\n<div class=\"outline-text-4\" id=\"text-two-variables\">\n<p> The simplest way to look at the relationship between two of these features is a scatter plot <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\">sns.scatterplot(iris,x<span style=\"color: #89dceb;\">=<\/span><span style=\"color: #a6e3a1;\">\"sepal_length\"<\/span>,y<span style=\"color: #89dceb;\">=<\/span><span style=\"color: #a6e3a1;\">\"sepal_width\"<\/span>)\n<\/pre>\n<\/div>\n\n<div id=\"orgeb3e32f\" class=\"figure\"> <p><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/4c419a0ca12a4e26ba41985fdfac20af73b56257.png?ssl=1\" alt=\"4c419a0ca12a4e26ba41985fdfac20af73b56257.png\" \/> <\/p> <\/div>\n\n<p> <a id=\"org46b54a6\"><\/a> 
<\/p>\n<\/div>\n<\/div>\n<div id=\"outline-container-many-variables\" class=\"outline-4\">\n<h4 id=\"many-variables\">Many variables<\/h4>\n<div class=\"outline-text-4\" id=\"text-many-variables\">\n<p> The same can be done for all pairs of features, arranged in a large symmetric matrix. <\/p>\n\n<p> The diagonal shows a histogram of the corresponding feature <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\">sns.pairplot(iris)\n<\/pre>\n<\/div>\n\n<div id=\"org2a518d7\" class=\"figure\"> <p><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/40e4e0e7a7353c852c5d91fb906062bb585cae19.png?ssl=1\" alt=\"40e4e0e7a7353c852c5d91fb906062bb585cae19.png\" \/> <\/p> <\/div>\n\n<p> <a id=\"org88decce\"><\/a> <\/p>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"outline-container-multivariate-mixed\" class=\"outline-3\">\n<h3 id=\"multivariate-mixed\">Multivariate Mixed<\/h3>\n<div class=\"outline-text-3\" id=\"text-multivariate-mixed\">\n<p> <a id=\"orgd927736\"><\/a> <\/p>\n<\/div>\n<div id=\"outline-container-one-continuous-variable-against-a-one-categorical-variable\" class=\"outline-4\">\n<h4 id=\"one-continuous-variable-against-a-one-categorical-variable\">One continuous variable against one categorical variable<\/h4>\n<div class=\"outline-text-4\" id=\"text-one-continuous-variable-against-a-one-categorical-variable\">\n<p> Box plots present a graphical synopsis of distributions grouped by a category <\/p>\n\n<ul class=\"org-ul\">\n<li>the middle line represents the median<\/li>\n<li>the bottom and top lines of the box represent the 25th and 75th percentiles of the distribution<\/li>\n<li>the top and bottom whiskers are usually calculated in this way:\n\n<ol class=\"org-ol\">\n<li>select the most extreme sample value<\/li>\n<li>calculate the interquartile range i.e. 
the distance between the 25th and 75th percentiles<\/li>\n<li>multiply the interquartile range by 1.5 and add it to the 75th percentile (or, respectively, subtract it from the 25th percentile)<\/li>\n<li>the whisker ends at the most extreme sample that does not lie beyond the value calculated at point 3<\/li>\n<\/ol><\/li>\n<li>all samples lying beyond the whiskers are plotted as dots and may be interpreted as outliers<\/li>\n<\/ul>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\">sns.boxplot(titanic,x<span style=\"color: #89dceb;\">=<\/span><span style=\"color: #a6e3a1;\">\"pclass\"<\/span>,y<span style=\"color: #89dceb;\">=<\/span><span style=\"color: #a6e3a1;\">\"Age\"<\/span>)\n<\/pre>\n<\/div>\n\n<div id=\"orgab1e05e\" class=\"figure\"> <p><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/ec07ee2fc870feb9b837a8e21ba0fac0069235ca.png?ssl=1\" alt=\"ec07ee2fc870feb9b837a8e21ba0fac0069235ca.png\" \/> <\/p> <\/div>\n\n<p> <a id=\"org409d266\"><\/a> Violin plots also show a smooth curve representing the continuous distribution, estimated with kernel smoothing. 
<\/p>\n\n<p> This provides more visual information than a box plot but can be used effectively only when the number of groups is limited <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\">sns.violinplot(titanic,x<span style=\"color: #89dceb;\">=<\/span><span style=\"color: #a6e3a1;\">\"pclass\"<\/span>,y<span style=\"color: #89dceb;\">=<\/span><span style=\"color: #a6e3a1;\">\"Age\"<\/span>)\n<\/pre>\n<\/div>\n\n<div id=\"org46a4f74\" class=\"figure\"> <p><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/ffdee419bb3798d2d38b21cf42559025e5b59f8e.png?ssl=1\" alt=\"ffdee419bb3798d2d38b21cf42559025e5b59f8e.png\" \/> <\/p> <\/div>\n\n<p> <a id=\"orgb96fd00\"><\/a> <\/p>\n<\/div>\n<\/div>\n<div id=\"outline-container-many-continuous-variables-against-one-categorical-variable\" class=\"outline-4\">\n<h4 id=\"many-continuous-variables-against-one-categorical-variable\">Many continuous variables against one categorical variable<\/h4>\n<div class=\"outline-text-4\" id=\"text-many-continuous-variables-against-one-categorical-variable\">\n<p> The scatter matrix can distinguish the groups of a single categorical variable using colors <\/p>\n\n<p> The seaborn version also shows kernel density distributions on the diagonal <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\">sns.pairplot(iris,hue<span style=\"color: #89dceb;\">=<\/span><span style=\"color: #a6e3a1;\">\"variety\"<\/span>)\n<\/pre>\n<\/div>\n\n<div id=\"org13cd8ec\" class=\"figure\"> <p><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/d434a97f3c99b61d058aae62790fa73ea533b7b4.png?ssl=1\" alt=\"d434a97f3c99b61d058aae62790fa73ea533b7b4.png\" \/> <\/p> <\/div>\n\n<p> <a id=\"org16342fd\"><\/a> <\/p>\n<\/div>\n<\/div>\n<div 
id=\"outline-container-many-categorical-variables-against-one-or-more-continuous-variables\" class=\"outline-4\">\n<h4 id=\"many-categorical-variables-against-one-or-more-continuous-variables\">Many categorical variables against one or more continuous variables<\/h4>\n<div class=\"outline-text-4\" id=\"text-many-categorical-variables-against-one-or-more-continuous-variables\">\n<p> When dealing with multiple categorical variables, it is also possible to define a two-dimensional grid. <\/p>\n\n<p> A plotting function can then be applied to the subset of data represented by each grid cell <\/p>\n\n<div class=\"org-src-container\">\n<label class=\"org-src-name\"><em><\/em><\/label>\n<pre class=\"src src-python\" id=\"nil\"><span style=\"color: #cdd6f4;\">g<\/span> <span style=\"color: #89dceb;\">=<\/span> sns.FacetGrid(titanic, col<span style=\"color: #89dceb;\">=<\/span><span style=\"color: #a6e3a1;\">\"sex\"<\/span>, row<span style=\"color: #89dceb;\">=<\/span><span style=\"color: #a6e3a1;\">'pclass'<\/span>)\ng.<span style=\"color: #f38ba8;\">map<\/span>(sns.histplot, <span style=\"color: #a6e3a1;\">\"Age\"<\/span>)\n<\/pre>\n<\/div>\n\n<div id=\"orgbb94fd5\" class=\"figure\"> <p><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/8a2ea5e8dd1009fa17e54573f5038253f725ff8d.png?ssl=1\" alt=\"8a2ea5e8dd1009fa17e54573f5038253f725ff8d.png\" \/> <\/p> <\/div>\n\n<p> <a id=\"orge95ca10\"><\/a> Interestingly, this representation shows how the age distribution differs by passenger gender and class <\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"After heading onto Jupyter and meeting the Pandas let's boldly go where no one has gone before!\n\nHere are some powerful tools to explore and discover new lifeforms into our 
data\n","protected":false},"author":1,"featured_media":647,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","inline_featured_image":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[4],"tags":[7],"class_list":["post-639","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-language-learning","tag-python"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Data: the final frontier - Noise On The Net<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/noiseonthenet.space\/noise\/2025\/02\/data-the-final-frontier\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data: the final frontier - Noise On The Net\" \/>\n<meta property=\"og:description\" content=\"After heading onto Jupyter and meeting the Pandas let&#039;s boldly go where no one has gone before! 
Here are some powerful tools to explore and discover new lifeforms into our data\" \/>\n<meta property=\"og:url\" content=\"https:\/\/noiseonthenet.space\/noise\/2025\/02\/data-the-final-frontier\/\" \/>\n<meta property=\"og:site_name\" content=\"Noise On The Net\" \/>\n<meta property=\"article:published_time\" content=\"2025-02-01T17:50:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-02-02T21:14:25+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/philipp-dusel-Mbfhs0u4YQ-unsplash.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"800\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"marco.p.v.vezzoli\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"marco.p.v.vezzoli\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/2025\\\/02\\\/data-the-final-frontier\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/2025\\\/02\\\/data-the-final-frontier\\\/\"},\"author\":{\"name\":\"marco.p.v.vezzoli\",\"@id\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/#\\\/schema\\\/person\\\/88c3a70f2b23480197bc61d6e1e2e982\"},\"headline\":\"Data: the final frontier\",\"datePublished\":\"2025-02-01T17:50:00+00:00\",\"dateModified\":\"2025-02-02T21:14:25+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/2025\\\/02\\\/data-the-final-frontier\\\/\"},\"wordCount\":1127,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/#\\\/schema\\\/person\\\/88c3a70f2b23480197bc61d6e1e2e982\"},\"image\":{\"@id\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/2025\\\/02\\\/data-the-final-frontier\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/i0.wp.com\\\/noiseonthenet.space\\\/noise\\\/wp-content\\\/uploads\\\/2025\\\/02\\\/philipp-dusel-Mbfhs0u4YQ-unsplash.jpg?fit=1200%2C800&ssl=1\",\"keywords\":[\"Python\"],\"articleSection\":[\"Language learning\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/2025\\\/02\\\/data-the-final-frontier\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/2025\\\/02\\\/data-the-final-frontier\\\/\",\"url\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/2025\\\/02\\\/data-the-final-frontier\\\/\",\"name\":\"Data: the final frontier - Noise On The 
Net\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/2025\\\/02\\\/data-the-final-frontier\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/2025\\\/02\\\/data-the-final-frontier\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/i0.wp.com\\\/noiseonthenet.space\\\/noise\\\/wp-content\\\/uploads\\\/2025\\\/02\\\/philipp-dusel-Mbfhs0u4YQ-unsplash.jpg?fit=1200%2C800&ssl=1\",\"datePublished\":\"2025-02-01T17:50:00+00:00\",\"dateModified\":\"2025-02-02T21:14:25+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/2025\\\/02\\\/data-the-final-frontier\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/2025\\\/02\\\/data-the-final-frontier\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/2025\\\/02\\\/data-the-final-frontier\\\/#primaryimage\",\"url\":\"https:\\\/\\\/i0.wp.com\\\/noiseonthenet.space\\\/noise\\\/wp-content\\\/uploads\\\/2025\\\/02\\\/philipp-dusel-Mbfhs0u4YQ-unsplash.jpg?fit=1200%2C800&ssl=1\",\"contentUrl\":\"https:\\\/\\\/i0.wp.com\\\/noiseonthenet.space\\\/noise\\\/wp-content\\\/uploads\\\/2025\\\/02\\\/philipp-dusel-Mbfhs0u4YQ-unsplash.jpg?fit=1200%2C800&ssl=1\",\"width\":1200,\"height\":800},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/2025\\\/02\\\/data-the-final-frontier\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Data: the final frontier\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/#website\",\"url\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/\",\"name\":\"Noise 
On The Net\",\"description\":\"Sharing adventures in code\",\"publisher\":{\"@id\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/#\\\/schema\\\/person\\\/88c3a70f2b23480197bc61d6e1e2e982\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/#\\\/schema\\\/person\\\/88c3a70f2b23480197bc61d6e1e2e982\",\"name\":\"marco.p.v.vezzoli\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/b9d9aab1df560bc14d73b0b442198f196ce39e7c7a38df1dc22fec0b97f17da9?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/b9d9aab1df560bc14d73b0b442198f196ce39e7c7a38df1dc22fec0b97f17da9?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/b9d9aab1df560bc14d73b0b442198f196ce39e7c7a38df1dc22fec0b97f17da9?s=96&d=mm&r=g\",\"caption\":\"marco.p.v.vezzoli\"},\"logo\":{\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/b9d9aab1df560bc14d73b0b442198f196ce39e7c7a38df1dc22fec0b97f17da9?s=96&d=mm&r=g\"},\"description\":\"Self taught assembler programming at 11 on my C64 (1983). Never stopped since then -- always looking up for curious things in the software development, data science and AI. Linux and FOSS user since 1994. MSc in physics in 1996. 
Working in large semiconductor companies since 1997 (STM, Micron) developing analytics and full stack web infrastructures, microservices, ML solutions\",\"sameAs\":[\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/in\\\/marco-paolo-valerio-vezzoli-0663835\\\/\"],\"url\":\"https:\\\/\\\/noiseonthenet.space\\\/noise\\\/author\\\/marco-p-v-vezzoli\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Data: the final frontier - Noise On The Net","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/noiseonthenet.space\/noise\/2025\/02\/data-the-final-frontier\/","og_locale":"en_US","og_type":"article","og_title":"Data: the final frontier - Noise On The Net","og_description":"After heading onto Jupyter and meeting the Pandas let's boldly go where no one has gone before! Here are some powerful tools to explore and discover new lifeforms into our data","og_url":"https:\/\/noiseonthenet.space\/noise\/2025\/02\/data-the-final-frontier\/","og_site_name":"Noise On The Net","article_published_time":"2025-02-01T17:50:00+00:00","article_modified_time":"2025-02-02T21:14:25+00:00","og_image":[{"width":1200,"height":800,"url":"https:\/\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/philipp-dusel-Mbfhs0u4YQ-unsplash.jpg","type":"image\/jpeg"}],"author":"marco.p.v.vezzoli","twitter_card":"summary_large_image","twitter_misc":{"Written by":"marco.p.v.vezzoli","Est. 
reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/noiseonthenet.space\/noise\/2025\/02\/data-the-final-frontier\/#article","isPartOf":{"@id":"https:\/\/noiseonthenet.space\/noise\/2025\/02\/data-the-final-frontier\/"},"author":{"name":"marco.p.v.vezzoli","@id":"https:\/\/noiseonthenet.space\/noise\/#\/schema\/person\/88c3a70f2b23480197bc61d6e1e2e982"},"headline":"Data: the final frontier","datePublished":"2025-02-01T17:50:00+00:00","dateModified":"2025-02-02T21:14:25+00:00","mainEntityOfPage":{"@id":"https:\/\/noiseonthenet.space\/noise\/2025\/02\/data-the-final-frontier\/"},"wordCount":1127,"commentCount":0,"publisher":{"@id":"https:\/\/noiseonthenet.space\/noise\/#\/schema\/person\/88c3a70f2b23480197bc61d6e1e2e982"},"image":{"@id":"https:\/\/noiseonthenet.space\/noise\/2025\/02\/data-the-final-frontier\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/philipp-dusel-Mbfhs0u4YQ-unsplash.jpg?fit=1200%2C800&ssl=1","keywords":["Python"],"articleSection":["Language learning"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/noiseonthenet.space\/noise\/2025\/02\/data-the-final-frontier\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/noiseonthenet.space\/noise\/2025\/02\/data-the-final-frontier\/","url":"https:\/\/noiseonthenet.space\/noise\/2025\/02\/data-the-final-frontier\/","name":"Data: the final frontier - Noise On The 
Net","isPartOf":{"@id":"https:\/\/noiseonthenet.space\/noise\/#website"},"primaryImageOfPage":{"@id":"https:\/\/noiseonthenet.space\/noise\/2025\/02\/data-the-final-frontier\/#primaryimage"},"image":{"@id":"https:\/\/noiseonthenet.space\/noise\/2025\/02\/data-the-final-frontier\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/philipp-dusel-Mbfhs0u4YQ-unsplash.jpg?fit=1200%2C800&ssl=1","datePublished":"2025-02-01T17:50:00+00:00","dateModified":"2025-02-02T21:14:25+00:00","breadcrumb":{"@id":"https:\/\/noiseonthenet.space\/noise\/2025\/02\/data-the-final-frontier\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/noiseonthenet.space\/noise\/2025\/02\/data-the-final-frontier\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/noiseonthenet.space\/noise\/2025\/02\/data-the-final-frontier\/#primaryimage","url":"https:\/\/i0.wp.com\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/philipp-dusel-Mbfhs0u4YQ-unsplash.jpg?fit=1200%2C800&ssl=1","contentUrl":"https:\/\/i0.wp.com\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/philipp-dusel-Mbfhs0u4YQ-unsplash.jpg?fit=1200%2C800&ssl=1","width":1200,"height":800},{"@type":"BreadcrumbList","@id":"https:\/\/noiseonthenet.space\/noise\/2025\/02\/data-the-final-frontier\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/noiseonthenet.space\/noise\/"},{"@type":"ListItem","position":2,"name":"Data: the final frontier"}]},{"@type":"WebSite","@id":"https:\/\/noiseonthenet.space\/noise\/#website","url":"https:\/\/noiseonthenet.space\/noise\/","name":"Noise On The Net","description":"Sharing adventures in 
code","publisher":{"@id":"https:\/\/noiseonthenet.space\/noise\/#\/schema\/person\/88c3a70f2b23480197bc61d6e1e2e982"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/noiseonthenet.space\/noise\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/noiseonthenet.space\/noise\/#\/schema\/person\/88c3a70f2b23480197bc61d6e1e2e982","name":"marco.p.v.vezzoli","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/b9d9aab1df560bc14d73b0b442198f196ce39e7c7a38df1dc22fec0b97f17da9?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/b9d9aab1df560bc14d73b0b442198f196ce39e7c7a38df1dc22fec0b97f17da9?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/b9d9aab1df560bc14d73b0b442198f196ce39e7c7a38df1dc22fec0b97f17da9?s=96&d=mm&r=g","caption":"marco.p.v.vezzoli"},"logo":{"@id":"https:\/\/secure.gravatar.com\/avatar\/b9d9aab1df560bc14d73b0b442198f196ce39e7c7a38df1dc22fec0b97f17da9?s=96&d=mm&r=g"},"description":"Self taught assembler programming at 11 on my C64 (1983). Never stopped since then -- always looking up for curious things in the software development, data science and AI. Linux and FOSS user since 1994. MSc in physics in 1996. 
Working in large semiconductor companies since 1997 (STM, Micron) developing analytics and full stack web infrastructures, microservices, ML solutions","sameAs":["https:\/\/noiseonthenet.space\/noise\/","https:\/\/www.linkedin.com\/in\/marco-paolo-valerio-vezzoli-0663835\/"],"url":"https:\/\/noiseonthenet.space\/noise\/author\/marco-p-v-vezzoli\/"}]}},"jetpack_featured_media_url":"https:\/\/i0.wp.com\/noiseonthenet.space\/noise\/wp-content\/uploads\/2025\/02\/philipp-dusel-Mbfhs0u4YQ-unsplash.jpg?fit=1200%2C800&ssl=1","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pdDUZ5-aj","jetpack-related-posts":[],"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/noiseonthenet.space\/noise\/wp-json\/wp\/v2\/posts\/639","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/noiseonthenet.space\/noise\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/noiseonthenet.space\/noise\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/noiseonthenet.space\/noise\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/noiseonthenet.space\/noise\/wp-json\/wp\/v2\/comments?post=639"}],"version-history":[{"count":6,"href":"https:\/\/noiseonthenet.space\/noise\/wp-json\/wp\/v2\/posts\/639\/revisions"}],"predecessor-version":[{"id":650,"href":"https:\/\/noiseonthenet.space\/noise\/wp-json\/wp\/v2\/posts\/639\/revisions\/650"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/noiseonthenet.space\/noise\/wp-json\/wp\/v2\/media\/647"}],"wp:attachment":[{"href":"https:\/\/noiseonthenet.space\/noise\/wp-json\/wp\/v2\/media?parent=639"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/noiseonthenet.space\/noise\/wp-json\/wp\/v2\/categories?post=639"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/noiseonthenet.space\/noise\/wp-json\/wp\/v2\/tags?post=639"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}