Posts

Showing posts from October, 2025

Time Series Visualization: Why It Matters

Time series data is everywhere in our world - stock prices, weather patterns, economic indicators, and even competitive eating records. But raw numbers in a spreadsheet don't tell us much. Visualization transforms these numbers into stories we can actually understand.

Working with Real Data

I recently explored time series visualization using R and ggplot2, working with two interesting datasets that show how powerful visual analysis can be.

The Hot Dog Eating Contest

The first dataset tracked Nathan's Famous Hot Dog Eating Contest from 1980 to 2010. At first glance, it's just a list of years and numbers. But when visualized as a bar chart with color coding for record-breaking years, patterns jump out immediately. You can see the competition was relatively stable for decades, then suddenly exploded in the 2000s when competitive eating became more serious and professional.

Here's the same visualization using ggplot2: The visual instantly shows what would take paragraphs to...
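A minimal sketch of the record-highlighting idea described in this excerpt. The numbers are toy values made up for illustration, not the real contest results, and the names `contest`, `year_index`, and `record` are assumptions rather than the post's own code.

```r
library(ggplot2)

# Toy values for illustration only; NOT the real contest results.
contest <- data.frame(
  year_index = 1:8,
  hot_dogs   = c(10, 12, 11, 15, 14, 20, 19, 25)
)

# A year counts as a record when it beats the best of all earlier years.
previous_best  <- c(-Inf, head(cummax(contest$hot_dogs), -1))
contest$record <- contest$hot_dogs > previous_best

# Bar chart with record years in a contrasting fill colour.
p_contest <- ggplot(contest, aes(x = year_index, y = hot_dogs, fill = record)) +
  geom_col() +
  scale_fill_manual(
    values = c("FALSE" = "grey70", "TRUE" = "firebrick"),
    labels = c("No record", "New record"),
    name   = NULL
  ) +
  labs(x = "Year (toy index)", y = "Hot dogs eaten",
       title = "Toy example: record years highlighted")

print(p_contest)
```

Computing the record flag from the data with `cummax()` avoids hand-labelling years, so the colouring stays correct if rows are added later.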

Building My First R Package: hamad

I recently created my first R package called hamad, and it's been a solid learning experience. The package focuses on streamlining exploratory data analysis with practical utility functions.

What the Package Does

The hamad package includes three core functions designed to speed up initial data assessment:

summary_stats() - Automatically identifies and summarizes all numeric variables in a dataset. This eliminates the need to manually select columns every time you want a quick overview.

missing_report() - Generates a comprehensive report showing which variables contain missing values, along with counts and percentages. This helps identify data quality issues upfront.

quick_hist() - Creates histogram visualizations for any variable, making it easy to quickly examine distributions during exploratory analysis.

These functions address common tasks that come up repeatedly in data analysis workflows.

Package Design Decisions

The DESCRIPTION file required several key decisions: Dependencie...
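To make the three descriptions concrete, here is a rough base R sketch of what functions like these might look like. The `_sketch` names, arguments, and return shapes are all hypothetical stand-ins; the actual hamad implementations are not shown in this excerpt and may differ.

```r
# Hypothetical stand-ins; the real hamad functions may differ.

# Summarise every numeric column without selecting columns by hand.
summary_stats_sketch <- function(df) {
  numeric_cols <- df[vapply(df, is.numeric, logical(1))]
  data.frame(
    variable = names(numeric_cols),
    mean     = vapply(numeric_cols, mean, numeric(1), na.rm = TRUE),
    sd       = vapply(numeric_cols, sd, numeric(1), na.rm = TRUE),
    min      = vapply(numeric_cols, min, numeric(1), na.rm = TRUE),
    max      = vapply(numeric_cols, max, numeric(1), na.rm = TRUE),
    row.names = NULL
  )
}

# Report count and percentage of missing values, only for affected columns.
missing_report_sketch <- function(df) {
  n_missing <- vapply(df, function(col) sum(is.na(col)), integer(1))
  report <- data.frame(
    variable    = names(df),
    n_missing   = n_missing,
    pct_missing = round(100 * n_missing / nrow(df), 1),
    row.names   = NULL
  )
  report[report$n_missing > 0, ]
}

# Draw a histogram of one column, chosen by name.
quick_hist_sketch <- function(df, var) {
  hist(df[[var]], main = paste("Distribution of", var), xlab = var)
}

# Usage on built-in datasets
summary_stats_sketch(iris)
missing_report_sketch(airquality)
quick_hist_sketch(airquality, "Temp")
```

This sketch sticks to base R so that it needs no entries under Imports; whether the real package made the same choice is not visible here.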

Assignment9

Comparing Base R, Lattice, and ggplot2

I used the Guns dataset to explore how concealed carry laws relate to crime rates across US states from 1977 to 1999.

Syntax Differences

Base R uses simple commands like plot() and boxplot() but requires multiple steps to customize visualizations. Lattice introduced a formula syntax that made creating panel plots much easier, especially with the | operator for conditioning. ggplot2 uses a layered grammar with + to build plots, which felt more logical once I got used to it.

Which System Works Best

ggplot2 produced the cleanest output with less code. The default styling looked professional and adding features like smoothing lines was straightforward. Lattice worked well for conditioned plots but felt limited outside that use case. Base R gave me direct control but required more manual work for decent results.

Challenges

Switching between systems was harder than expected. Each has different logic for t...
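The syntax differences are easier to see side by side. The sketch below uses the built-in mtcars data instead of Guns so that it runs without installing extra data packages; the choice of variables is an assumption for illustration, not the plots from the assignment.

```r
library(lattice)
library(ggplot2)

# Built-in data as a stand-in for Guns (assumption for illustration).
mt <- transform(mtcars, cyl = factor(cyl))

# Base R: one call draws immediately; customization means extra arguments or calls.
boxplot(mpg ~ cyl, data = mt,
        main = "Base R", xlab = "Cylinders", ylab = "Miles per gallon")

# Lattice: formula interface, with | conditioning to create one panel per group.
p_lattice <- xyplot(mpg ~ wt | cyl, data = mt,
                    layout = c(3, 1), main = "Lattice")

# ggplot2: layers added with +, including a smoothing line per panel.
p_ggplot <- ggplot(mt, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ cyl) +
  labs(title = "ggplot2")

print(p_lattice)
print(p_ggplot)
```

Lattice and ggplot2 both return plot objects that are drawn when printed, while base R draws straight to the device. That difference is one reason moving between the systems takes adjustment.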

Weight, Engine Size, and Fuel Efficiency in the mtcars Dataset

I used the mtcars dataset to examine how vehicle weight and engine configuration affect fuel economy. The visualization displays three variables: weight (x-axis), miles per gallon (y-axis), and cylinder count shown through color-coded regression lines.

The data reveals a clear pattern: heavier cars consume more fuel across all engine types, and 8-cylinder vehicles consistently perform worse than 4-cylinder cars at any given weight. The roughly parallel slopes suggest that weight and engine size each affect efficiency on their own. Multivariate visualization uncovered relationships that single-variable plots would miss.

I applied three design principles: contrast using intuitive colors (green for efficient engines, red for inefficient ones), hierarchy through a descriptive title that presents the main finding upfront, and simplicity by removing distracting elements like horsepower encoding. The result is a clear, focused visualization that tells a complete story about how ...
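A plot along these lines could be sketched as follows. The exact colours, the title wording, and the orange used for 6-cylinder cars are assumptions; the excerpt only mentions green and red.

```r
library(ggplot2)

mt <- transform(mtcars, cyl = factor(cyl))

# Weight vs. mpg, with one colour-coded regression line per cylinder count.
p_mpg <- ggplot(mt, aes(x = wt, y = mpg, colour = cyl)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  scale_colour_manual(
    values = c("4" = "forestgreen", "6" = "orange", "8" = "red"),  # "6" colour is an assumption
    name   = "Cylinders"
  ) +
  labs(title = "Heavier cars and bigger engines get fewer miles per gallon",
       x = "Weight (1000 lbs)", y = "Miles per gallon") +
  theme_minimal()

print(p_mpg)

# geom_smooth() fits a separate slope per group. An additive model instead
# forces one shared weight slope with a separate intercept per cylinder count,
# which is what "parallel slopes" means in the strict sense.
parallel_fit <- lm(mpg ~ wt + cyl, data = mt)
coef(parallel_fit)
```

Comparing the per-group lines in the plot with the shared slope from `parallel_fit` is a quick way to check how close to parallel the lines really are.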

Module 8

This assignment shows how to load data, calculate averages by groups, find specific patterns in text, and save results to files.

Part 1: Load the Data

> library(plyr)
> student6 = read.table(file.choose(), header = TRUE, stringsAsFactors = FALSE)
> str(student6)
'data.frame': 20 obs. of  1 variable:
 $ Name.Age.Sex.Grade: chr  "Raul,25,Male,80" "Booker,18,Male,83" "Lauri,21,Female,90" "Leonie,21,Female,91" ...
> head(student6)
    Name.Age.Sex.Grade
1      Raul,25,Male,80
2    Booker,18,Male,83
3   Lauri,21,Female,90
4  Leonie,21,Female,91
5 Sherlyn,22,Female,85
6 Mikaela,20,Female,69

What it does: Loads the student data file into R and checks the import. All 20 students are there, but because no separator was given, each comma-separated record lands in a single column.

Part 2: Calculate Average Grades

#Step 2
> student6 = read.table(file.choose(), header = TRUE, stringsAsFactors = FALSE, sep = ",")
> str(student6)
'data.frame': 20 obs. of  4 variab...
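The effect of the sep argument can be reproduced without a file picker. This sketch uses only the six rows printed by head() above, passed as inline text. It swaps in base R's aggregate() for the group averages rather than plyr, and grouping by Sex is an assumption because the rest of the post is cut off.

```r
# The six rows shown by head() above, as inline text instead of file.choose().
raw_text <- "Name,Age,Sex,Grade
Raul,25,Male,80
Booker,18,Male,83
Lauri,21,Female,90
Leonie,21,Female,91
Sherlyn,22,Female,85
Mikaela,20,Female,69"

# Without sep, read.table splits on whitespace, so each line is one field.
no_sep <- read.table(text = raw_text, header = TRUE, stringsAsFactors = FALSE)
ncol(no_sep)    # 1

# With sep = ",", the four fields become four columns.
with_sep <- read.table(text = raw_text, header = TRUE,
                       stringsAsFactors = FALSE, sep = ",")
ncol(with_sep)  # 4

# Average grade per group (base R stand-in for a plyr summary).
avg_by_sex <- aggregate(Grade ~ Sex, data = with_sep, FUN = mean)
avg_by_sex
#      Sex Grade
# 1 Female 83.75
# 2   Male 81.50
```

The averages here cover only these six rows, so they will not match results computed on the full 20-student file.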

Analyzing Fuel Efficiency: Relationships in the mtcars Dataset

What patterns or relationships did you observe?

The mtcars dataset revealed clear patterns about fuel efficiency. Weight had the strongest negative correlation with mpg at -0.87, meaning heavier cars consistently get worse gas mileage. Cylinders (-0.85), displacement (-0.85), and horsepower (-0.78) showed similar negative relationships.

The regression analysis confirmed these patterns. A simple model using just weight explained 75% of the MPG variation, while adding horsepower and displacement increased this to 83%. Interestingly, displacement became statistically insignificant in the multiple regression model, suggesting these variables overlap in measuring vehicle "bigness."

How did your use of grid layout enhance interpretation?

The 2x2 grid layout made comparing relationships much easier than viewing plots separately. I could immediately see that all four variables showed downward-sloping patterns, confirming negative relationships with mpg. The grid also revealed diffe...
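The figures quoted above can be reproduced with a few lines of base R. Drawing the grid with par(mfrow = ...) is an assumption; the excerpt does not say which tool produced the original 2x2 layout.

```r
predictors <- c("wt", "cyl", "disp", "hp")

# Correlation of each predictor with mpg, rounded to two decimals.
cors <- round(sapply(mtcars[predictors], function(x) cor(mtcars$mpg, x)), 2)
cors
#    wt   cyl  disp    hp
# -0.87 -0.85 -0.85 -0.78

# Simple model (weight only) vs. multiple model (weight, horsepower, displacement).
fit_wt    <- lm(mpg ~ wt, data = mtcars)
fit_multi <- lm(mpg ~ wt + hp + disp, data = mtcars)

r2 <- c(simple   = summary(fit_wt)$r.squared,
        multiple = summary(fit_multi)$r.squared)
round(r2, 2)
#   simple multiple
#     0.75     0.83

# p-value for displacement in the multiple model (large, so not significant).
disp_p <- coef(summary(fit_multi))["disp", "Pr(>|t|)"]

# 2x2 grid of scatterplots with fitted lines (base graphics; an assumption).
old_par <- par(mfrow = c(2, 2))
for (v in predictors) {
  plot(mtcars[[v]], mtcars$mpg, xlab = v, ylab = "mpg", main = paste("mpg vs", v))
  abline(lm(reformulate(v, response = "mpg"), data = mtcars))
}
par(old_par)
```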

Generic Functions in R

What I Did

This assignment helped me learn how R's generic functions work and the difference between S3 and S4 objects. I used the mtcars dataset to see how functions change behavior based on what kind of object they're working with.

Loading the Dataset

First, I loaded mtcars:

data("mtcars")
head(mtcars)
str(mtcars)

This dataset has 11 columns with things like miles per gallon, horsepower, etc.

How Generic Functions Work

I tested some basic R functions to see how they adapt to different objects:

summary(mtcars)
plot(mtcars)

lm_model = lm(mpg ~ wt + hp, data = mtcars)
summary(lm_model)

What I noticed: when you run summary() on a data frame, you get min/max/mean values. But when you run it on a linear model, you get totally different outputs, like coefficients and p-values. Same function name, different behavior!

Creating S3 Objects

S3 is the simpler object system. Here's how I made one:

s3...
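A short sketch of S3 dispatch, since the original example is cut off. The `car_record` class and its print method are hypothetical names made up for illustration, not the object from the assignment.

```r
# summary() is generic: it picks a method from the class of its argument.
class(mtcars)                      # "data.frame" -> summary.data.frame()
lm_model <- lm(mpg ~ wt + hp, data = mtcars)
class(lm_model)                    # "lm"         -> summary.lm()
class(summary(lm_model))           # "summary.lm"

# Hypothetical S3 object: a plain list with a class attribute attached.
car_record <- structure(
  list(name = "Mazda RX4", mpg = mtcars["Mazda RX4", "mpg"]),
  class = "car_record"
)

# Defining print.<class> is all it takes to hook into the print() generic.
print.car_record <- function(x, ...) {
  cat(sprintf("<car_record> %s: %.1f mpg\n", x$name, x$mpg))
  invisible(x)
}

print(car_record)   # dispatches to print.car_record()
car_record          # auto-printing at the console uses the same method
```

S3 does no validation: any list can be given the class "car_record", which is part of what makes it simpler and less strict than S4.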