Project 2: Shiny Dashboard app - Data Scientist Salary Comparator
Contributed by Sung Pil Moon. He is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, taking place from January 11th to April 1st, 2016. This post is based on his second class project, R Shiny (due in the 4th week of the program).
1. Introduction
Shiny apps provide efficient ways to manipulate and visualize data. They let users with or without much R expertise explore data and draw insights from it. I implemented the Data Scientist Salary Comparator with the Shiny package to explore salary data for 8 professions, based on prevailing wage data of foreign employers in the United States. The 8 professions are:
- Data Scientist
- Software Engineer
- Data Analyst
- Business Analyst
- Management Consultant
- Assistant Professor
- Attorney
- Teacher
Data set:
- It contains the prevailing wage data of foreign employers seeking to file applications in the Permanent Labor Certification Program (PERM), the H-1B, H-1B1, and E-3 Professional and Specialty Occupation Programs, and the H-2B Non-agricultural Temporary Labor Certification Program.
- The dataset is from the United States Department of Labor, Employment & Training Administration. (Source)
- The prevailing wage data of US natives is not included.
- The filtered data for this application contains a total of 167,278 cases (in 19 columns) from 2015.
2. Structures
The Data Scientist Salary Comparator Shiny application has 5 main components:
- Salary Scatter Plot
- Salary Data Explorer
- Salary Comparison Map
- Top Recruiters
- External Info
2.1. DS Salary Scatter Plot
The 'Salary Scatter Plot' panel shows the salary distribution for the 8 professions. It comprises three sections: an option input section, a plot area section, and an aggregate summary box section.
The plot area shows two types of visualization: a scatter plot of all the salary data for the 8 professions, and a box plot showing the minimum, 25% quantile, median, 75% quantile, and maximum. Users can toggle the 'showing data points' option above the plot to hide the points and see the box plot alone. Users can also interactively change the target state (All, or one of the 50 states) and the target salary range. The plot area and the aggregate summary boxes update as soon as users make a change.
The professions are color-coded in the plot:
- Assistant professor: red
- Attorney: orange
- Business analyst: light green
- Data analyst: green
- Data scientist: teal
- Management consultant: turquoise blue
- Software engineer: purple
- Teacher: red violet
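The five values shown by the box plot can be computed per profession in a few lines. The following is a minimal sketch in base R, not the app's own code: the toy salaries are made up for illustration, and `five_number_summary` is a hypothetical helper. Only the column names `JOB_TITLE_SUBGROUP` and `PAID_WAGE_PER_YEAR` come from the app's code.

```r
# Toy salary data; the values are invented for illustration only.
salaries <- data.frame(
  JOB_TITLE_SUBGROUP = c(rep("data scientist", 5), rep("teacher", 5)),
  PAID_WAGE_PER_YEAR = c(90000, 100000, 110000, 120000, 130000,
                         40000, 45000, 50000, 55000, 60000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: minimum, 25% quantile, median, 75% quantile, maximum.
five_number_summary <- function(wages) {
  stats <- quantile(wages, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  names(stats) <- c("min", "q25", "median", "q75", "max")
  stats
}

# One column per profession, one row per statistic.
summary_by_job <- sapply(
  split(salaries$PAID_WAGE_PER_YEAR, salaries$JOB_TITLE_SUBGROUP),
  five_number_summary
)
print(summary_by_job)
```

Restricting `salaries` to one state or salary range before calling `split()` would mirror what the panel's options do.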
2.2. DS Salary Data Explorer
The 'Salary Data Explorer' is a data table with filtering, pagination, search, and sorting features for exploring the data users are interested in.
As in the scatter plot panel, users can interactively choose the options, and the table shows the updated result. The table data can be filtered by profession (multiple choices), state, salary range, and name (of city and employer).
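The filtering described above could be sketched as a plain function. This is an illustration in base R under stated assumptions, not the app's implementation: `filter_salaries` and its arguments are hypothetical names, the toy rows are invented, and the name search (city and employer) is left out.

```r
# Toy data; the column names follow the app's code, the values are invented.
salaries <- data.frame(
  JOB_TITLE_SUBGROUP = c("data scientist", "data analyst",
                         "teacher", "data scientist"),
  WORK_STATE = c("NY", "NY", "CA", "CA"),
  PAID_WAGE_PER_YEAR = c(120000, 70000, 50000, 135000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows matching the chosen professions, state,
# and salary range. NULL professions or state "All" means "no filter".
filter_salaries <- function(data, professions = NULL, state = "All",
                            salary_range = c(0, Inf)) {
  keep <- data$PAID_WAGE_PER_YEAR >= salary_range[1] &
    data$PAID_WAGE_PER_YEAR <= salary_range[2]
  if (!is.null(professions)) {
    keep <- keep & data$JOB_TITLE_SUBGROUP %in% professions
  }
  if (state != "All") {
    keep <- keep & data$WORK_STATE == state
  }
  data[keep, , drop = FALSE]
}

result <- filter_salaries(salaries,
                          professions = c("data scientist", "data analyst"),
                          state = "NY",
                          salary_range = c(60000, 125000))
print(result)
```

In a Shiny app, a function like this would typically be called inside a reactive expression, with the arguments taken from the input widgets.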
2.3. DS Salary Comparison Map
The 'Salary Comparison Map' provides a way to compare the salary distributions of two professions across the United States. Users choose the professions (job titles), and the distribution map and data table show the updated result. Users can also sort the results in the table by state, average salary, and number of jobs. (Note that when the panel is initialized, it shows all the data, unfiltered by state, profession, average salary, or number of jobs.)
2.4. Top Recruiter Tables
The 'Top Recruiter Tables' panel comprises 5 salary data tables showing who the top recruiters are for each profession. Each table contains the employer name, the number of jobs, the average salary, the minimum salary, the 25% quantile salary, the median salary, the 75% quantile salary, and the maximum salary.
The first table intentionally shows the salary data without distinguishing by profession, to give an overall idea of who the top recruiters are across the United States regardless of profession. The remaining four tables are summary tables with filtering options by state, each for a specific profession: data scientist, software engineer, data analyst, and other professions. (The tables are sorted by the number of jobs and the average salary in descending order.)
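A summary table of this shape can be sketched in base R as follows. This is only an illustration: the employer column name `EMPLOYER_NAME`, the helper `top_recruiters`, and the toy rows are assumptions, not taken from the app.

```r
# Toy data; EMPLOYER_NAME is an assumed column name and the values are invented.
salaries <- data.frame(
  EMPLOYER_NAME = c("A Corp", "A Corp", "A Corp", "B Inc", "B Inc", "C LLC"),
  PAID_WAGE_PER_YEAR = c(100000, 120000, 140000, 90000, 110000, 150000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per employer with the job count, average salary,
# and the five quantile columns, sorted by jobs and then average salary
# (both descending).
top_recruiters <- function(data) {
  wages_by_employer <- split(data$PAID_WAGE_PER_YEAR, data$EMPLOYER_NAME)
  rows <- lapply(names(wages_by_employer), function(employer) {
    wages <- wages_by_employer[[employer]]
    q <- quantile(wages, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
    data.frame(EMPLOYER = employer,
               JOBS = length(wages),
               AVG_SALARY = round(mean(wages), 2),
               MIN = q[1], Q25 = q[2], MEDIAN = q[3], Q75 = q[4], MAX = q[5],
               stringsAsFactors = FALSE)
  })
  ranking <- do.call(rbind, rows)
  ranking <- ranking[order(-ranking$JOBS, -ranking$AVG_SALARY), ]
  rownames(ranking) <- NULL
  ranking
}

ranking <- top_recruiters(salaries)
print(ranking)
```

Filtering `salaries` by state or profession before calling the helper would give the per-profession tables.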
2.5. External Resources
The 'External Resources' panel shows a collection of valuable and meaningful information from external sources. The external resources are embedded or regenerated for better readability and interactivity. (All sources and author names of the external resources are included.)
3. Code
Since the code for this Shiny application is long and contains some duplication, only the essential snippets are described here. You can access the full code on my GitHub page (here).
a. UI part
This code snippet shows the basic structure of the UI part of the Shiny dashboard, which has three components (header, sidebar, and body). The sidebar has 5 'menuItem' entries, and the body has 5 corresponding 'tabItem' components under one 'tabItems' component. (Duplicates are intentionally omitted.)
library(shiny)
library(shinydashboard)

header <- dashboardHeader(title = "DS Salary Explorer")

sidebar <- dashboardSidebar(
  sidebarMenu(
    menuItem("Salary Scatter Plot", tabName = "myTabForScatterPlot", icon = icon("bar-chart-o")),
    menuItem("Salary Data Explorer", tabName = "myTabForDataTable", icon = icon("table")),
    menuItem("Salary Comparison Map", tabName = "myTabForGvisMap", icon = icon("map-marker")),
    menuItem("Top Recruiters", tabName = "myTabForRecruitRanking", icon = icon("list-ol")),
    menuItem("External Info", tabName = "myTabForExternalInfo", icon = icon("external-link"))
  )
)

body <- dashboardBody(
  tabItems(
    tabItem("myTabForScatterPlot",
      h2("Salary Data Scatter Plot")
      # ... more sub components in this tabItem
    ),
    tabItem("myTabForDataTable",
      h2("DS Salary Data Explorer")
      # ... more sub components in this tabItem
    ),
    tabItem("myTabForGvisMap",
      h2("Salary Comparison Map"),
      # Two maps side by side
      fluidRow(
        box(title = "Map 1", solidHeader = TRUE, collapsible = TRUE,
            htmlOutput("myGvisMap1")),
        box(title = "Map 2", solidHeader = TRUE, collapsible = TRUE,
            htmlOutput("myGvisMap2"))
      ),
      # One data table under each map
      fluidRow(
        box(title = "DataTable for Map 1", solidHeader = TRUE, collapsible = TRUE,
            DT::dataTableOutput("myComparisonTableByJobTitle1")),
        box(title = "DataTable for Map 2", solidHeader = TRUE, collapsible = TRUE,
            DT::dataTableOutput("myComparisonTableByJobTitle2"))
      )
    ),
    tabItem("myTabForRecruitRanking",
      h2("Top Recruiters")
      # ... more sub components in this tabItem
    ),
    tabItem("myTabForExternalInfo",
      h2("External sources")
      # ... more sub components in this tabItem
    )
  ) # end of tabItems
) # end of body
b. Server part
The code snippet below shows the basic structure of the server part, which manipulates the data based on user input. It shows how the comparison map and the comparison data table react to user input. In brief, both the map and the data table call the updateInputDataForMapByJobTitle1() reactive function, which returns the filtered data, so the map and the data table show the updated results in sync.
library(dplyr)
library(googleVis)
library(DT)

server <- function(input, output) {
  # ...
  # ... Other functions are intentionally omitted for brevity ...
  # ...

  #///////////////////////////////////////////////////////////////////////////
  # reactive function for comparison Map 1 and comparison table 1
  #///////////////////////////////////////////////////////////////////////////
  updateInputDataForMapByJobTitle1 <- reactive({
    # Data filtering from the original data 'salary_refined'
    dataFilteredForMapByJobTitle1 <- salary_refined
    # Average salary and number of positions per state and job title
    dataFilteredForMapByJobTitle1 <- dataFilteredForMapByJobTitle1 %>%
      group_by(WORK_STATE, JOB_TITLE_SUBGROUP) %>%
      summarise(AVG_SALARY = round(mean(PAID_WAGE_PER_YEAR), 2), NUM_POS = n())
    dataFilteredForMapByJobTitle1 # return the filtered data
  })

  #///////////////////////////////////////////////////////////////////////////
  # comparison Map 1 (googleVis)
  #///////////////////////////////////////////////////////////////////////////
  output$myGvisMap1 <- renderGvis({
    # Calling updateInputDataForMapByJobTitle1() returns the filtered data and
    # makes the map re-render whenever the reactive's result changes.
    mapData <- updateInputDataForMapByJobTitle1()
    # Render the map using the filtered data
    gvisGeoChart(mapData, locationvar = "WORK_STATE", colorvar = "AVG_SALARY",
                 options = list(region = "US", displayMode = "regions",
                                resolution = "provinces", width = "100%",
                                backgroundColor = "gray"))
  })

  #///////////////////////////////////////////////////////////////////////////
  # Comparison Table 1
  #///////////////////////////////////////////////////////////////////////////
  output$myComparisonTableByJobTitle1 <- DT::renderDataTable(
    DT::datatable({
      # Same reactive as the map, so the table stays in sync with it.
      dataForDTable1 <- updateInputDataForMapByJobTitle1()
      # Change the column names
      colnames(dataForDTable1) <- c("STATE", "JOB_TITLE", "AVG_SALARY", "JOBS")
      dataForDTable1 # filtered data for the dataTable
    },
    rownames = FALSE,
    extensions = c('ColVis', 'ColReorder', 'Scroller'),
    options = list(
      deferRender = TRUE,
      searching = TRUE,
      dom = 'RClfrtip',
      colVis = list(activate = 'mouseover'),
      lengthMenu = list(c(10, 5, 15, 20, 25, 50, 100),
                        c('10', '5', '15', '20', '25', '50', '100'))
    )) %>% formatCurrency(c('AVG_SALARY'), "$")
  )

  # ...
  # ... Other functions are intentionally omitted for brevity ...
  # ...
}
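The filter-then-summarise step that the reactive performs can also be tried outside Shiny. Below is a rough sketch in base R (the app itself uses dplyr) of a plain function producing the same columns. The `job_title` argument is a hypothetical stand-in for whatever input the app reads, `summarise_by_state` is an invented name, and the toy rows are made up.

```r
# Toy stand-in for 'salary_refined'; the values are invented for illustration.
salary_refined <- data.frame(
  WORK_STATE = c("NY", "NY", "CA", "CA", "CA"),
  JOB_TITLE_SUBGROUP = c("data scientist", "data scientist",
                         "data scientist", "teacher", "teacher"),
  PAID_WAGE_PER_YEAR = c(120000, 130000, 140000, 50000, 60000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: optionally filter by job title, then compute the
# average salary and number of positions per state and job title.
summarise_by_state <- function(data, job_title = "All") {
  if (job_title != "All") {
    data <- data[data$JOB_TITLE_SUBGROUP == job_title, , drop = FALSE]
  }
  avg <- aggregate(PAID_WAGE_PER_YEAR ~ WORK_STATE + JOB_TITLE_SUBGROUP,
                   data = data, FUN = function(w) round(mean(w), 2))
  n <- aggregate(PAID_WAGE_PER_YEAR ~ WORK_STATE + JOB_TITLE_SUBGROUP,
                 data = data, FUN = length)
  out <- merge(avg, n, by = c("WORK_STATE", "JOB_TITLE_SUBGROUP"))
  names(out) <- c("WORK_STATE", "JOB_TITLE_SUBGROUP", "AVG_SALARY", "NUM_POS")
  out
}

map_data <- summarise_by_state(salary_refined, job_title = "data scientist")
print(map_data)
```

Wrapping a call like this in `reactive({ ... })` and reading the job title from `input` is what lets both the map and the table pick up the same updated data.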
c. Calling the Shiny Dashboard app
Then you can launch the Shiny dashboard app as follows:
shinyApp(
  ui = dashboardPage(header, sidebar, body, skin = "black"),
  server = server
)
- If you have any suggestions, questions, or feedback about my Shiny Dashboard app, please leave a comment. Also, if any of the information above is incorrect or needs to be updated, please send an email to monspo1@gmail.com.