2023 Stack Overflow Developer Survey Analysis

Overview

Stack Overflow is a question-and-answer website for programmers whose purpose is to empower the world to develop technology through collective knowledge. The annual developer survey covers a wide range of information, from basics such as age and education level to how developers learn, level up, and which tools they use.

Load the libraries

library(tidyverse)
library(ggrepel)

Import the dataset

survey <- read_csv("F:\\Tutorials\\R tutorials\\R MARKDOWN\\surveyResults.csv")
survey

## # A tibble: 89,184 x 84
##    ResponseId Q120    MainBranch    Age   Employment RemoteWork CodingActivities
##         <dbl> <chr>   <chr>         <chr> <chr>      <chr>      <chr>           
##  1          1 I agree None of these 18-2~ <NA>       <NA>       <NA>            
##  2          2 I agree I am a devel~ 25-3~ Employed,~ Remote     Hobby;Contribut~
##  3          3 I agree I am a devel~ 45-5~ Employed,~ Hybrid (s~ Hobby;Professio~
##  4          4 I agree I am a devel~ 25-3~ Employed,~ Hybrid (s~ Hobby           
##  5          5 I agree I am a devel~ 25-3~ Employed,~ Remote     Hobby;Contribut~
##  6          6 I agree I am a devel~ 35-4~ Employed,~ Remote     Hobby;Professio~
##  7          7 I agree I am a devel~ 35-4~ Employed,~ Remote     Hobby;Contribut~
##  8          8 I agree I am a devel~ 25-3~ Employed,~ Remote     Hobby           
##  9          9 I agree I am not pri~ 45-5~ Employed,~ Hybrid (s~ Hobby;Contribut~
## 10         10 I agree I am a devel~ 25-3~ Not emplo~ <NA>       <NA>            
## # i 89,174 more rows
## # i 77 more variables: EdLevel <chr>, LearnCode <chr>, LearnCodeOnline <chr>,
## #   LearnCodeCoursesCert <chr>, YearsCode <chr>, YearsCodePro <chr>,
## #   DevType <chr>, OrgSize <chr>, PurchaseInfluence <chr>, TechList <chr>,
## #   BuyNewTool <chr>, Country <chr>, Currency <chr>, CompTotal <dbl>,
## #   LanguageHaveWorkedWith <chr>, LanguageWantToWorkWith <chr>,
## #   DatabaseHaveWorkedWith <chr>, DatabaseWantToWorkWith <chr>, ...

View(survey)
glimpse(survey)

## Rows: 89,184
## Columns: 84
## $ ResponseId                            <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1~
## $ Q120                                  <chr> "I agree", "I agree", "I agree",~
## $ MainBranch                            <chr> "None of these", "I am a develop~
## $ Age                                   <chr> "18-24 years old", "25-34 years ~
## $ Employment                            <chr> NA, "Employed, full-time", "Empl~
## $ RemoteWork                            <chr> NA, "Remote", "Hybrid (some remo~
## $ CodingActivities                      <chr> NA, "Hobby;Contribute to open-so~
## $ EdLevel                               <chr> NA, "Bachelor’s degree (B.A., B.~
## $ LearnCode                             <chr> NA, "Books / Physical media;Coll~
## $ LearnCodeOnline                       <chr> NA, "Formal documentation provid~
## $ LearnCodeCoursesCert                  <chr> NA, "Other", NA, NA, "Other;Code~
## $ YearsCode                             <chr> NA, "18", "27", "12", "6", "21",~
## $ YearsCodePro                          <chr> NA, "9", "23", "7", "4", "21", "~
## $ DevType                               <chr> NA, "Senior Executive (C-Suite, ~
## $ OrgSize                               <chr> NA, "2 to 9 employees", "5,000 t~
## $ PurchaseInfluence                     <chr> NA, "I have a great deal of infl~
## $ TechList                              <chr> NA, "Investigate", "Given a list~
## $ BuyNewTool                            <chr> NA, "Start a free trial;Ask deve~
## $ Country                               <chr> NA, "United States of America", ~
## $ Currency                              <chr> NA, "USD\tUnited States dollar",~
## $ CompTotal                             <dbl> NA, 285000, 250000, 156000, 1320~
## $ LanguageHaveWorkedWith                <chr> NA, "HTML/CSS;JavaScript;Python"~
## $ LanguageWantToWorkWith                <chr> NA, "Bash/Shell (all shells);C#;~
## $ DatabaseHaveWorkedWith                <chr> NA, "Supabase", NA, "PostgreSQL;~
## $ DatabaseWantToWorkWith                <chr> NA, "Firebase Realtime Database;~
## $ PlatformHaveWorkedWith                <chr> NA, "Amazon Web Services (AWS);N~
## $ PlatformWantToWorkWith                <chr> NA, "Fly.io;Netlify;Render", NA,~
## $ WebframeHaveWorkedWith                <chr> NA, "Next.js;React;Remix;Vue.js"~
## $ WebframeWantToWorkWith                <chr> NA, "Deno;Elm;Nuxt.js;React;Svel~
## $ MiscTechHaveWorkedWith                <chr> NA, "Electron;React Native;Tauri~
## $ MiscTechWantToWorkWith                <chr> NA, "Capacitor;Electron;Tauri;Un~
## $ ToolsTechHaveWorkedWith               <chr> NA, "Docker;Kubernetes;npm;Pip;V~
## $ ToolsTechWantToWorkWith               <chr> NA, "Godot;npm;pnpm;Unity 3D;Unr~
## $ NEWCollabToolsHaveWorkedWith          <chr> NA, "Vim;Visual Studio Code", "E~
## $ NEWCollabToolsWantToWorkWith          <chr> NA, "Vim;Visual Studio Code", "E~
## $ `OpSysPersonal use`                   <chr> NA, "iOS;iPadOS;MacOS;Windows;Wi~
## $ `OpSysProfessional use`               <chr> NA, "MacOS;Windows;Windows Subsy~
## $ OfficeStackAsyncHaveWorkedWith        <chr> NA, "Asana;Basecamp;GitHub Discu~
## $ OfficeStackAsyncWantToWorkWith        <chr> NA, "GitHub Discussions;Linear;N~
## $ OfficeStackSyncHaveWorkedWith         <chr> NA, "Cisco Webex Teams;Discord;G~
## $ OfficeStackSyncWantToWorkWith         <chr> NA, "Discord;Signal;Slack;Zoom",~
## $ AISearchHaveWorkedWith                <chr> NA, "ChatGPT", NA, NA, "ChatGPT"~
## $ AISearchWantToWorkWith                <chr> NA, "ChatGPT;Neeva AI", NA, NA, ~
## $ AIDevHaveWorkedWith                   <chr> NA, "GitHub Copilot", NA, NA, NA~
## $ AIDevWantToWorkWith                   <chr> NA, "GitHub Copilot", NA, NA, NA~
## $ NEWSOSites                            <chr> NA, "Stack Overflow;Stack Exchan~
## $ SOVisitFreq                           <chr> NA, "Daily or almost daily", "A ~
## $ SOAccount                             <chr> NA, "Yes", "Yes", "Yes", "No", "~
## $ SOPartFreq                            <chr> NA, "A few times per month or we~
## $ SOComm                                <chr> NA, "Yes, definitely", "Neutral"~
## $ SOAI                                  <chr> NA, "I don't think it's super ne~
## $ AISelect                              <chr> NA, "Yes", "No, and I don't plan~
## $ AISent                                <chr> NA, "Indifferent", NA, NA, "Very~
## $ AIAcc                                 <chr> NA, "Other (please explain)", NA~
## $ AIBen                                 <chr> NA, "Somewhat distrust", NA, NA,~
## $ `AIToolInterested in Using`           <chr> NA, "Learning about a codebase;W~
## $ `AIToolCurrently Using`               <chr> NA, "Writing code;Committing and~
## $ `AIToolNot interested in Using`       <chr> NA, NA, NA, NA, NA, "Project pla~
## $ `AINextVery different`                <chr> NA, NA, NA, NA, NA, NA, NA, NA, ~
## $ `AINextNeither different nor similar` <chr> NA, NA, NA, NA, NA, NA, NA, NA, ~
## $ `AINextSomewhat similar`              <chr> NA, NA, NA, NA, NA, NA, NA, NA, ~
## $ `AINextVery similar`                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, ~
## $ `AINextSomewhat different`            <chr> NA, NA, NA, NA, NA, NA, NA, NA, ~
## $ TBranch                               <chr> NA, "Yes", "Yes", "Yes", "Yes", ~
## $ ICorPM                                <chr> NA, "People manager", "Individua~
## $ WorkExp                               <dbl> NA, 10, 23, 7, 6, 22, 4, 5, NA, ~
## $ Knowledge_1                           <chr> NA, "Strongly agree", "Strongly ~
## $ Knowledge_2                           <chr> NA, "Agree", "Neither agree nor ~
## $ Knowledge_3                           <chr> NA, "Strongly agree", "Agree", "~
## $ Knowledge_4                           <chr> NA, "Agree", "Agree", "Strongly ~
## $ Knowledge_5                           <chr> NA, "Agree", "Agree", "Agree", "~
## $ Knowledge_6                           <chr> NA, "Agree", "Agree", "Neither a~
## $ Knowledge_7                           <chr> NA, "Agree", "Agree", "Agree", "~
## $ Knowledge_8                           <chr> NA, "Strongly agree", "Agree", "~
## $ Frequency_1                           <chr> NA, "1-2 times a week", "6-10 ti~
## $ Frequency_2                           <chr> NA, "10+ times a week", "6-10 ti~
## $ Frequency_3                           <chr> NA, "Never", "3-5 times a week",~
## $ TimeSearching                         <chr> NA, "15-30 minutes a day", "30-6~
## $ TimeAnswering                         <chr> NA, "15-30 minutes a day", "30-6~
## $ ProfessionalTech                      <chr> NA, "DevOps function;Microservic~
## $ Industry                              <chr> NA, "Information Services, IT, S~
## $ SurveyLength                          <chr> NA, "Appropriate in length", "Ap~
## $ SurveyEase                            <chr> NA, "Easy", "Easy", "Easy", "Nei~
## $ ConvertedCompYearly                   <dbl> NA, 285000, 250000, 156000, 2345~

Data Cleaning and Preparation Tasks

Select columns to be used in the analysis
Check for duplicates and null values
Recode the long labels in the EdLevel (education level) column
Fix the YearsCodePro column and convert it to numeric form

1. Create new dataframe

survey_new <- survey %>% 
  select(-c(Q120, TechList, BuyNewTool, DatabaseWantToWorkWith, WebframeWantToWorkWith, MiscTechHaveWorkedWith, MiscTechWantToWorkWith, ToolsTechHaveWorkedWith, ToolsTechWantToWorkWith, NEWCollabToolsHaveWorkedWith, NEWCollabToolsWantToWorkWith, `OpSysPersonal use`, contains("OfficeStack"), contains("AIDev"), AISearchWantToWorkWith, SOAI, contains("AINext"), TBranch, contains("Knowledge"), contains("Frequency"), TimeAnswering, TimeSearching, SurveyLength, SurveyEase))

View(survey_new)

2. Check for null values and duplicates in the data set

# Check for duplicates
survey_new %>% 
  summarize(dups = sum(duplicated(.)))

#Use skimr to check null values
skimr::skim(survey_new)

There were no duplicates in this data set, and the majority of the columns contain null values.
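As a lighter alternative to skimr, the two checks above can also be sketched with dplyr alone. The toy tibble below is made up for illustration and only stands in for survey_new; the names toy, n_dups, and na_counts are hypothetical.

```r
library(dplyr)

# Toy data standing in for survey_new (values are made up for illustration)
toy <- tibble(
  ResponseId = c(1, 2, 3, 3),
  Age        = c("18-24 years old", NA, "25-34 years old", "25-34 years old"),
  Country    = c(NA, NA, "Kenya", "Kenya")
)

# Number of fully duplicated rows (row 4 repeats row 3)
n_dups <- sum(duplicated(toy))

# Number of missing values in each column
na_counts <- toy %>%
  summarize(across(everything(), ~ sum(is.na(.x))))

n_dups
na_counts
```

Running the same two expressions on survey_new instead of toy would give one duplicate count and one row of per-column NA counts.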

3. Fix the Education level column

survey_new %>% 
  count(EdLevel)

## # A tibble: 9 x 2
##   EdLevel                                                                      n
##   <chr>                                                                    <int>
## 1 Associate degree (A.A., A.S., etc.)                                       2807
## 2 Bachelor’s degree (B.A., B.S., B.Eng., etc.)                             36706
## 3 Master’s degree (M.A., M.S., M.Eng., MBA, etc.)                          20543
## 4 Primary/elementary school                                                 1905
## 5 Professional degree (JD, MD, Ph.D, Ed.D, etc.)                            3887
## 6 Secondary school (e.g. American high school, German Realschule or Gymna~  8897
## 7 Some college/university study without earning a degree                   11753
## 8 Something else                                                            1475
## 9 <NA>                                                                      1211

survey_new <- survey_new %>% 
  mutate(EdLevel = fct_recode(EdLevel, 
                              "Associate degree" = "Associate degree (A.A., A.S., etc.)",
                              "Bachelor’s degree" = "Bachelor’s degree (B.A., B.S., B.Eng., etc.)",
                              "Master’s degree" = "Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",
                              "Professional degree" = "Professional degree (JD, MD, Ph.D, Ed.D, etc.)",
                              "Secondary school" = "Secondary school (e.g. American high school, German Realschule or Gymnasium, etc.)",
                              "Some college / University" = "Some college/university study without earning a degree"))

4. Fix the YearsCodePro column, convert it to numeric

unique(survey_new$YearsCodePro)

##  [1] NA                   "9"                  "23"                
##  [4] "7"                  "4"                  "21"                
##  [7] "3"                  "15"                 "Less than 1 year"  
## [10] "10"                 "2"                  "6"                 
## [13] "14"                 "5"                  "19"                
## [16] "13"                 "16"                 "28"                
## [19] "1"                  "30"                 "11"                
## [22] "8"                  "25"                 "32"                
## [25] "24"                 "40"                 "17"                
## [28] "45"                 "29"                 "12"                
## [31] "31"                 "20"                 "18"                
## [34] "50"                 "27"                 "43"                
## [37] "22"                 "26"                 "38"                
## [40] "33"                 "44"                 "35"                
## [43] "34"                 "37"                 "42"                
## [46] "41"                 "More than 50 years" "47"                
## [49] "36"                 "39"                 "48"                
## [52] "46"                 "49"

#There are two text labels in this column: "Less than 1 year" and "More than 50 years". I replaced them with 1 and 51 respectively before converting to numeric

survey_new <- survey_new %>% 
  mutate(
    across(YearsCodePro,
           .fns = ~as.numeric(str_replace_all(.x, c("Less than 1 year" = "1", "More than 50 years" = "51"))))
  )
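A conversion like this turns any label that is not matched exactly into NA, so it can be worth confirming that no previously non-missing value became NA. The sketch below uses a small made-up vector in place of survey_new$YearsCodePro; years_raw, years_num, and n_lost are illustrative names.

```r
library(stringr)

# Toy stand-in for survey_new$YearsCodePro (values are illustrative)
years_raw <- c("9", "Less than 1 year", "More than 50 years", NA, "23")

# Replace the two text labels, then convert to numeric
years_num <- as.numeric(
  str_replace_all(years_raw, c("Less than 1 year" = "1", "More than 50 years" = "51"))
)

# Values that were present before conversion but are NA afterwards
n_lost <- sum(is.na(years_num) & !is.na(years_raw))

years_num
n_lost
```

A non-zero n_lost on the real column would point to a label that the replacement patterns did not cover.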

Data Analysis / EDA

1. What is the age distribution of developers?

age_distrn <- survey_new %>% 
  group_by(Age) %>% 
  summarize(n = n())

class(age_distrn$Age)

## [1] "character"

age_distrn$Age <- factor(age_distrn$Age, levels = c(
"Under 18 years old", "18-24 years old", "25-34 years old", "35-44 years old", "45-54 years old", "55-64 years old", "65 years or older", "Prefer not to say"))

ggplot(age_distrn, aes(Age, n)) +
  geom_col(aes(fill = fct_reorder(Age, n))) +
   geom_text(aes(label = n), vjust = -0.05, position = position_stack(vjust = 0.5)) +
  labs(
    title = "Age Distribution of developers",
    subtitle = "(Most developers are aged between 25 - 34 years)",
    x = "Age Group", 
    y = "Count"
  ) +
  theme_minimal() +
  theme(legend.position = "none",
        axis.text = element_text(angle = 90),
        plot.subtitle = element_text(size = 10, face = "bold"),
        plot.title = element_text(size = 14, face = "bold"),
        axis.title = element_text(size = 12, face = "bold")) +
  scale_fill_brewer(palette = "Paired")

Insights

More than 33,000 developers are aged 25 - 34 years, making this the largest age group, while only 1171 are aged 65 years or older.
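To express the size of the 25 - 34 group as a share of all respondents, the short calculation below divides the group count by the total. Both figures (33,247 and 89,184) are the ones reported in this analysis; the variable names are illustrative.

```r
# Counts taken from this analysis
n_25_34 <- 33247   # respondents aged 25-34
n_total <- 89184   # all survey respondents

# Share of respondents in the 25-34 group, in percent
share_25_34 <- round(100 * n_25_34 / n_total, 1)
share_25_34
```

The result is a little over 37%, so the group is the largest single age bracket without being an outright majority.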

2. What are the top 10 countries with the most developers?

library(colourpicker)
library(RColorBrewer)

#First recode countries with long names to abbreviated forms, then use slice_max() to obtain the top 10 countries with most developers

Top_Countries <- survey_new %>% 
  mutate(Country = fct_recode(Country, "UK" = "United Kingdom of Great Britain and Northern Ireland",
                              "US" = "United States of America")) %>% 
  group_by(Country) %>%
  summarize(top10 = n()) %>% 
  arrange(desc(top10)) %>% 
  slice_max(top10, n = 10)

#Plot a bar plot showing the distribution of developers

Top_Countries %>% 
  ggplot(aes(fct_rev(fct_reorder(Country, top10)), top10)) +
  geom_col(aes(fill = fct_reorder(Country, top10))) +
  geom_text(aes(label = top10), vjust = -0.1, position = position_stack(vjust = 0.5)) +
  labs(
    title = "Distribution of Developers Across the World",
    subtitle = "(The United States of America has the most survey respondents)",
    x = "Country",
    y = "Count"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90),
        plot.title = element_text(size = 14, face = "bold"),
        plot.subtitle = element_text(size = 10, face = "bold", colour = ("#333333")),
        axis.title = element_text(size = 12, face = "bold"),
        legend.position = "none") +
  scale_fill_brewer(palette = "Paired")

Insights

The United States of America has the largest number of developers in the survey, followed by Germany. Notably, the US has double the number of developers that Germany has.

3. What are the most used programming languages among developers? (Top 10)

library(knitr)
library(kableExtra)

#Use separate_rows() to get the correct number of languages each developer has used

Most_used <- survey_new %>% 
  separate_rows(LanguageHaveWorkedWith, sep = ";") %>% 
  group_by(LanguageHaveWorkedWith) %>% 
  summarize(Language_count = n()) %>% 
  arrange(desc(Language_count)) %>% 
  slice_max(Language_count, n = 10) 

#Plot a table for the top 10 most used languages

Most_used %>% 
  kable(digits = 0, format = "html", caption = "JavaScript is the most widely used language among developers") %>% 
  kable_classic("striped", "bordered", full_width = FALSE, html_font = "cambria",
                position = "left",
                fixed_thead = T) %>%
  row_spec(0, bold = TRUE) %>%
  column_spec(1, bold = TRUE, color = "black", background = ("#F0E68C")) %>% 
  column_spec(2, color = "black", background = "lightblue")

JavaScript is the most widely used language among developers
LanguageHaveWorkedWith	Language_count
JavaScript	55711
HTML/CSS	46396
Python	43158
SQL	42623
TypeScript	34041
Bash/Shell (all shells)	28351
Java	26757
C#	24193
C++	19634
C	16940
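The counting step relies on splitting a semicolon-delimited column into one row per language. A minimal sketch on made-up data shows the mechanics; the toy tibble and lang_counts are hypothetical names, and dropping NA responses first is an assumption rather than part of the original pipeline.

```r
library(dplyr)
library(tidyr)

# Toy responses: each cell holds one or more languages separated by ";"
toy <- tibble(
  ResponseId = 1:3,
  LanguageHaveWorkedWith = c("Python;SQL", "Python", NA)
)

lang_counts <- toy %>%
  filter(!is.na(LanguageHaveWorkedWith)) %>%            # drop respondents who skipped the question
  separate_rows(LanguageHaveWorkedWith, sep = ";") %>%  # one row per respondent-language pair
  count(LanguageHaveWorkedWith, sort = TRUE, name = "Language_count")

lang_counts
```

Here count() with sort = TRUE replaces the group_by(), summarize(), and arrange() steps in a single call.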

4. What is the most popular method of learning to code?

#Use separate_rows() to separate the different learning methods, then recode the long labels in Learn code column to enable easy plotting 

method_code <- survey_new %>% 
  filter(!is.na(LearnCode)) %>% 
  separate_rows(LearnCode, sep = ";") %>% 
  group_by(LearnCode) %>% 
  summarize(n = n()) %>% 
  ungroup() %>% 
  mutate(LearnCode = fct_recode(LearnCode, 
                                "Other online resources\n(e.g., videos, blogs, forum)" = "Other online resources (e.g., videos, blogs, forum)",
                                "School\n(i.e., University, College, etc)" = "School (i.e., University, College, etc)",
                                "Hackathons\n(virtual or in-person)" = "Hackathons (virtual or in-person)"),
         percent = round(n / sum(n), 2)) %>%
  arrange(desc(percent))

method_code

## # A tibble: 10 x 3
##    LearnCode                                                  n percent
##    <fct>                                                  <int>   <dbl>
##  1 "Other online resources\n(e.g., videos, blogs, forum)" 70244    0.24
##  2 "Books / Physical media"                               45406    0.15
##  3 "Online Courses or Certification"                      43201    0.15
##  4 "School\n(i.e., University, College, etc)"             43957    0.15
##  5 "On the job training"                                  40380    0.14
##  6 "Colleague"                                            20523    0.07
##  7 "Coding Bootcamp"                                       8602    0.03
##  8 "Friend or family member"                               9936    0.03
##  9 "Hackathons\n(virtual or in-person)"                    7033    0.02
## 10 "Other (please specify):"                               5451    0.02

#Use a bar plot to visualize the results
ggplot(method_code, aes(fct_reorder(LearnCode, percent), percent)) +
  geom_col(aes(fill = LearnCode)) +
  coord_flip() +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(label = percent), vjust = 0.5, position = position_stack(vjust = 0.5)) + 
  labs(
    title = "Most Popular methods of Learning to Code",
    subtitle = "(Other online resources, e.g. videos and blogs, are the most popular method)",
    x = "Method of Learning",
    y = "Percent"
  ) +
  theme_minimal() +
  theme(legend.position = "none",
        plot.title = element_text(size = 14, face = "bold"),
        axis.title = element_text(size = 12, face = "bold"),
        plot.subtitle = element_text(size = 10, face = "bold")) +
  scale_fill_brewer(palette = "Paired")

Insights

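“Other online resources (e.g., videos, blogs, forum)” accounted for 24% of all learning-method selections, making it the most popular method of learning to code. Because respondents could select several methods, this percentage is a share of selections rather than a share of developers.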
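LearnCode is a multi-select column, so dividing each count by the sum of all counts gives a share of selections. The sketch below, on made-up data, computes that share next to a share of respondents for comparison; toy, answered, n_resp, and shares are illustrative names.

```r
library(dplyr)
library(tidyr)

# Toy responses: several learning methods per respondent, separated by ";"
toy <- tibble(
  ResponseId = 1:4,
  LearnCode = c("Books;School", "Books", "School;Colleague;Books", NA)
)

answered <- toy %>% filter(!is.na(LearnCode))
n_resp <- nrow(answered)  # respondents who answered the question

shares <- answered %>%
  separate_rows(LearnCode, sep = ";") %>%
  count(LearnCode, name = "n") %>%
  mutate(
    share_of_selections  = n / sum(n),  # what n / sum(n) measures
    share_of_respondents = n / n_resp   # fraction of respondents who picked the method
  )

shares
```

The respondent-based shares can add up to more than 1, because one person may pick several methods.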

5. Is there a significant difference between salaries earned by data scientists and data analysts?

#First calculate the average and median salary across all DevType groups

survey_new %>% 
  group_by(DevType) %>%
  summarize(avg_sal = mean(ConvertedCompYearly, na.rm = TRUE),
            median_sal = median(ConvertedCompYearly, na.rm = TRUE)) %>% 
  arrange(desc(avg_sal)) %>% 
  view()

#Use a density plot to check if data for Data scientist and Data/Business analyst follows a normal distribution

survey_new %>% 
  filter(DevType %in% c("Data scientist or machine learning specialist", "Data or business analyst")) %>%
  ggplot(aes(ConvertedCompYearly)) +
  geom_density() +
  facet_wrap(~DevType)

#Since the data is not normally distributed, perform a Mann-Whitney U test to check if there is a significant difference in salaries 

#Null Hypothesis : The rank sums of salary of data scientists and data analysts do not differ significantly
#Alternative Hypothesis : The rank sums of salary of data scientists and data analysts do differ significantly
#Significance level (alpha) = 0.05

survey_new %>%
  filter(DevType %in% c("Data scientist or machine learning specialist", "Data or business analyst")) %>%
  wilcox.test(ConvertedCompYearly ~ DevType, data = ., 
              alternative = "two.sided", paired = FALSE)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  ConvertedCompYearly by DevType
## W = 174417, p-value = 2.769e-11
## alternative hypothesis: true location shift is not equal to 0

From the test above, the p-value is less than 0.05, so we reject the null hypothesis and conclude that there is a statistically significant difference in the yearly compensation of the two groups.
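A significant p-value says the two groups differ but not by how much. As a sketch under stated assumptions, the example below runs the same kind of test on simulated log-normal pay for two hypothetical groups ("A" and "B") and also requests an estimate of the location shift through the conf.int argument of wilcox.test. The data are simulated and do not come from the survey.

```r
set.seed(42)

# Simulated right-skewed pay for two hypothetical groups (not survey data)
toy <- data.frame(
  DevType = rep(c("A", "B"), each = 50),
  pay = c(rlnorm(50, meanlog = 11.0, sdlog = 0.5),
          rlnorm(50, meanlog = 11.4, sdlog = 0.5))
)

# Mann-Whitney U (Wilcoxon rank sum) test with an estimate of the location shift
res <- wilcox.test(pay ~ DevType, data = toy,
                   alternative = "two.sided", conf.int = TRUE)

res$p.value    # compare against the 0.05 significance level
res$estimate   # estimated difference in location between the groups
res$conf.int   # confidence interval for that difference

# Group medians give the direction of the difference
aggregate(pay ~ DevType, data = toy, FUN = median)
```

On the real data, the same call with ConvertedCompYearly ~ DevType would report the estimated shift in yearly compensation alongside the p-value.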

6. At what companies do developers get paid the most?

#Filter out null values
#I focused on the top 5 most paying companies and used median to compare the Yearly compensation across the companies
#Used median because the data contains a few large and very small Yearly compensations, which may skew the results if mean is used

paid_most<- survey_new %>% 
  filter(!is.na(PlatformHaveWorkedWith) & !is.na(ConvertedCompYearly) & ConvertedCompYearly > 1000 & ConvertedCompYearly < 1000000) %>% 
  select(PlatformHaveWorkedWith, ConvertedCompYearly) %>% 
  separate_rows(PlatformHaveWorkedWith, sep = ";") %>% 
  group_by(PlatformHaveWorkedWith) %>%
  mutate(median_pay = median(ConvertedCompYearly)) %>% 
  filter(median_pay >= 83000)

View(paid_most)

#Plot a boxplot to visualize the salary distribution

ggplot(paid_most, aes(fct_reorder(PlatformHaveWorkedWith, ConvertedCompYearly), ConvertedCompYearly)) +
  geom_boxplot(aes(fill = PlatformHaveWorkedWith)) +
  coord_flip() +
  labs(
    title = "Distribution of Yearly Compensation of Developers\n Under Different Companies",
    x = "Company",
    y = "Yearly Compensation"
  ) + 
  theme_minimal() +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    axis.title = element_text(size = 12, face = "bold")
  )

Insights

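Colocation has the highest median yearly compensation, approximately 105,000, followed closely by Fly.io and Amazon Web Services. Note that PlatformHaveWorkedWith lists the cloud platforms respondents have worked with rather than their employers, so "company" here refers to the platform a developer uses.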
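The filter above keeps groups whose median pay is at least 83,000, a threshold chosen by hand. A possible alternative is to summarize one median per group and let slice_max() pick the top entries. The sketch below uses made-up platform names ("A", "B", "C") and pay values; top_platforms is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Toy data: platforms per respondent and a made-up pay value
toy <- tibble(
  PlatformHaveWorkedWith = c("A;B", "A", "B;C", "C", "A;C"),
  ConvertedCompYearly    = c(100, 80, 60, 40, 120)
)

top_platforms <- toy %>%
  separate_rows(PlatformHaveWorkedWith, sep = ";") %>%
  group_by(PlatformHaveWorkedWith) %>%
  summarize(median_pay = median(ConvertedCompYearly), n = n()) %>%
  slice_max(median_pay, n = 2)  # top 2 here; n = 5 would mirror the analysis

top_platforms
```

The resulting names could then be used to filter the row-level data before drawing the boxplot, which avoids tuning the cutoff whenever the data change.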

7. Are you more likely to get a job as a developer if you have a master's degree?

#First select only developers with a master's degree, then collapse employment status into 3 levels: Employed, UnemployedORFreelancers, and Other

Employement_masters <- survey_new %>% 
  filter(EdLevel == "Master’s degree" & !is.na(Employment)) %>% 
  separate_rows(Employment, sep = ";") %>% 
  mutate(Employment = fct_collapse(Employment, 
                                   Employed = c("Employed, full-time", "Employed, part-time"),
                                   UnemployedORFreelancers = c("I prefer not to say", "Independent contractor, freelancer, or self-employed", "Not employed, but looking for work"),
                                   Other = c("Not employed, and not looking for work", "Retired", "Student, full-time", "Student, part-time"))) %>% 
  group_by(Employment) %>% 
  summarize(n = n()) %>% 
  ungroup() %>% 
  mutate(prop = round(n / sum(n), 2))

View(Employement_masters)

#Plot a table using kable Extra package

Employement_masters %>% 
    kable(digits = 2, format = "html", caption = "Likelihood of getting a job\n if you have a masters degree") %>% 
  kable_classic("striped", "bordered", full_width = FALSE, html_font = "cambria",
                position = "left",
                fixed_thead = T) %>%
  row_spec(0, bold = TRUE) %>%
  column_spec(1, bold = TRUE, color = "black", background = ("#F0E68C")) %>% 
  column_spec(2, color = "black", background = "lightblue")

Likelihood of getting a job if you have a masters degree
Employment	n	prop
Employed	17390	0.76
UnemployedORFreelancers	4081	0.18
Other	1481	0.06

Insights

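76% of developers who have a master’s degree are employed either full time or part time. This suggests good employment prospects for master’s degree holders, but on its own it does not show that a master’s degree raises the chance of getting a job; that would require comparing this rate with the rates for other education levels.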
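One way to put the 76% figure in context is to compute the same employment share for every education level. The sketch below does this on a few made-up rows; the toy data, the employed flag, and the rates table are illustrative, and treating only "Employed, full-time" and "Employed, part-time" as employed is an assumption that mirrors the grouping used above.

```r
library(dplyr)
library(stringr)

# Toy rows standing in for survey_new (values are made up)
toy <- tibble(
  EdLevel = c("Master's degree", "Master's degree",
              "Bachelor's degree", "Bachelor's degree", "Bachelor's degree"),
  Employment = c("Employed, full-time", "Student, full-time",
                 "Employed, full-time", "Employed, part-time;Student, part-time", "Retired")
)

rates <- toy %>%
  # Flag respondents whose multi-select answer includes full- or part-time employment
  mutate(employed = str_detect(Employment, "Employed, (full|part)-time")) %>%
  group_by(EdLevel) %>%
  summarize(n = n(), employed_rate = mean(employed))

rates
```

Comparing employed_rate across rows of the real data would show whether master's degree holders stand out from other groups.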

8. How does coding experience affect the level of pay?

survey_new %>% 
  filter(!is.na(YearsCodePro) & !is.na(ConvertedCompYearly) & ConvertedCompYearly <= 1000000) %>% 
  ggplot(aes(YearsCodePro, ConvertedCompYearly)) +
  geom_point(position = "jitter", aes(color = Age)) +
  geom_smooth(se = FALSE) +
  scale_y_log10() +
  labs(
    title = "Relationship between Coding Experience and Yearly\n Compensation for Developers",
    x = "Years Professionally Coded",
    y = "Yearly Compensation"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    axis.title = element_text(size = 12, face = "bold")
  )

Insights

Looking at the figure above, there appears to be a nonlinear relationship between the yearly compensation of developers and their years of professional coding experience.
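A rank-based correlation is one way to quantify a monotonic but possibly nonlinear association like the one in the plot. The sketch below applies Spearman's correlation to simulated data; the years and pay vectors are generated for illustration only and the coefficients in the simulation are arbitrary assumptions.

```r
set.seed(1)

# Simulated experience (years) and right-skewed pay (not survey data)
years <- sample(0:40, 200, replace = TRUE)
pay   <- exp(10.5 + 0.08 * years - 0.0015 * years^2 + rnorm(200, sd = 0.4))

# Spearman's rank correlation; exact = FALSE because the data contain ties
res <- cor.test(years, pay, method = "spearman", exact = FALSE)

res$estimate  # Spearman's rho, between -1 and 1
res$p.value
```

On the survey data, the same call could use YearsCodePro and ConvertedCompYearly after removing missing values.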

Conclusion

Based on our dataset, the largest group of developers (33,247) is aged 25 - 34 years, and as expected, developers aged 65 years or older were the fewest. Out of the 89,184 developers who participated in the survey, 18,647 were from the United States of America. Germany came second, followed closely by India and the United Kingdom.
JavaScript is the most widely used programming language among developers, closely followed by HTML/CSS. Interestingly, Python has overtaken SQL in popularity since the 2022 Developer Survey.
Other online resources (e.g., videos, blogs, and forums) accounted for 24% of all learning-method selections, making them the most popular way to learn to code. School (i.e., universities/colleges), Books / Physical media, and Online Courses or Certification each accounted for about 15% of selections.
The Mann-Whitney U test indicated that there is a statistically significant difference between the yearly compensation of data scientists and data/business analysts.
Among the cloud platforms developers reported working with, Colocation, Fly.io, Amazon Web Services, Linode, and Cloudflare were associated with the highest pay, with Colocation having the highest median annual pay.

Data Source Link

The dataset used in this analysis was obtained from Kaggle.com and can be found here: https://www.kaggle.com/datasets/stackoverflow/stack-overflow-2023-developers-survey

StackOverflow Survey

Benard Omido

2023-12-04
