Here is the schema and an explanation of each variable in the files. We start with portfolio.json and observe what it looks like. Let's recap the columns for better understanding. We can plot what percentage of the distributed offers were BOGO, discount, and informational, and then find out what percentage of the offers were received, viewed, and completed. However, for each type of offer, the offer duration, difficulty, or promotional channels may vary. The profile data has the same mean age distribution among genders. Starbucks has more than 14 million people signed up for its Starbucks Rewards loyalty program. Four types of events are registered: transaction, offer received, offer viewed, and offer completed. Take everything with a grain of salt. The PCA and k-means analyses are similar. I will rearrange the data files and work through a few sub-questions to answer question 1. Initially, the company was known as "Starbucks Coffee, Tea, and Spice" before it was renamed Starbucks Coffee Company. Promote the offer via at least three channels to increase exposure. I picked the confusion matrix as the second evaluation metric, which is as important as the cross-validation accuracy. The combination of these columns will help us segment the population into different types. The profile.json file holds information on 17,000 unique people.
We see that there are 306534 person and offer_id records. This is the sort of information we were looking for. I wanted to analyse the data based on calorie and caffeine content. For example, if I used the codes 0: 2017, 1: 2018, 2: 2015, 3: 2016, 4: 2013. Here we can notice that women in this dataset have higher incomes than men do.
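The income comparison by gender can be reproduced with a simple group-and-summarize step. The following is a minimal sketch using only the Python standard library; the sample records and the helper name `median_income_by_gender` are invented for illustration and are not taken from the original analysis, which most likely used pandas.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records shaped like rows of profile.json; all values are invented.
profile = [
    {"id": "a1", "gender": "F", "age": 55, "income": 112000},
    {"id": "b2", "gender": "M", "age": 40, "income": 60000},
    {"id": "c3", "gender": "F", "age": 61, "income": 72000},
    {"id": "d4", "gender": "M", "age": 33, "income": 48000},
    {"id": "e5", "gender": None, "age": 118, "income": None},
]


def median_income_by_gender(records):
    """Group incomes by gender, skipping records with a missing gender or income."""
    incomes = defaultdict(list)
    for rec in records:
        if rec["gender"] is None or rec["income"] is None:
            continue
        incomes[rec["gender"]].append(rec["income"])
    return {gender: median(values) for gender, values in incomes.items()}


result = median_income_by_gender(profile)
print(result)
```

The median is used here instead of the mean because income is described as right-skewed, so a few large values would otherwise dominate the comparison.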
Here we can see that women have higher spending tendencies at Starbucks than any other gender. So, we have failed to significantly improve the information model. k-means performance improves as the number of clusters increases. As we can see, the age data is nearly Gaussian (slightly right-skewed) with 118 as an outlier, whereas the income data is right-skewed. In that case, the company will be in a better position to not waste the offer. Being able to answer those questions means I could clearly identify the group of users who behave this way and make some educated guesses about why. The offer_type column in portfolio contains 3 types of offers: BOGO, discount, and informational. From the transaction data, let's try to find out how gender, age, and income relate to the average transaction amount. However, I used the other approach. Customers spent 3% more on transactions on average. Other factors are not significant for PC3. Rewards represented 36% of U.S. company-operated sales last year and mobile payment was 29% of transactions. This goes against our intuition. Data visualization: visualizing the data is an important part of the whole data analysis process, and here, along with seaborn, we will also discuss the Plotly library. Getting BOGO and discount offers is also not very difficult.
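The two observations about the distributions (118 acting as an outlier in age, and income being right-skewed) can be checked with a few lines. This is a rough sketch under stated assumptions: the sample values are invented, treating 118 as a placeholder for a missing age is an assumption, and comparing mean against median is only a coarse heuristic for skew direction, not a formal test.

```python
from statistics import mean, median

PLACEHOLDER_AGE = 118  # assumption: 118 marks a missing age, not a real customer age

# Invented sample values for illustration only.
ages = [22, 30, 38, 45, 50, 55, 62, 75, 90, 118, 118]
incomes = [30000, 35000, 40000, 45000, 50000, 60000, 75000, 95000, 120000]

# Drop the placeholder before computing any age statistics.
valid_ages = [age for age in ages if age != PLACEHOLDER_AGE]


def skew_direction(values):
    """Coarse heuristic: a mean above the median hints at a right skew."""
    avg, mid = mean(values), median(values)
    if avg > mid:
        return "right"
    if avg < mid:
        return "left"
    return "none"


print(len(ages) - len(valid_ages), "placeholder ages removed")
print("income skew:", skew_direction(incomes))
```

Filtering the placeholder first matters because even a handful of 118 values would pull the mean age upward and distort any age-based segmentation.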
Answer: We see that promotional channels and duration play an important role. In the Udacity Data Science capstone, we are given a dataset that contains simulated data mimicking customer behavior on the Starbucks rewards mobile app. Every dataset tells a story! In both graphs, red (N) marks offers that were not completed (only viewed or received) and green (Yes) marks completed offers. If the chance is high, we can calculate the business cost and reconsider the decision. Age also seems to be similarly distributed, and membership tenure doesn't seem to be too different either. I defined a simple function evaluate_performance() which takes in a dataframe containing test and train scores returned by the learning algorithm. The data was collected via the Starbucks rewards mobile app, and the offers were sent out once every few days to the users of the app. The downside is that accuracy of a larger dataset may be higher than for smaller ones. Therefore, the higher the accuracy, the better. I picked out the customer ids whose first event for an offer was offer received, followed by offer completed as the second event. This is the primary distinction represented by PC0. With over 35 thousand Starbucks stores worldwide in 2022, the company has established itself as one of the world's leading coffeehouse chains. Third attempt: I made another attempt at doing the same but with amount_invalid removed from the dataframe. I performed an exploratory data analysis on the datasets. I finally picked logistic regression because it is more robust. However, for other variables, like gender and event, the order of the numbers does not matter.
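The remark that the order of the numbers does not matter for variables such as gender and event is the usual distinction between ordinal codes and one-hot indicators. The sketch below illustrates both with the standard library; the helper names `ordinal_codes` and `one_hot`, the sample years, and the gender labels are assumptions made for the example and do not come from the original code.

```python
def ordinal_codes(values):
    """Map each distinct value to its rank in sorted order, so codes keep the ordering."""
    ranks = {value: rank for rank, value in enumerate(sorted(set(values)))}
    return [ranks[value] for value in values]


def one_hot(values):
    """Give each category its own 0/1 indicator, so no ordering is implied."""
    categories = sorted(set(values))
    return [{cat: int(value == cat) for cat in categories} for value in values]


# Invented sample data: years have a natural order, gender labels do not.
years = [2017, 2018, 2015, 2016, 2013]
genders = ["F", "M", "O", "M"]

year_codes = ordinal_codes(years)
gender_flags = one_hot(genders)

print(year_codes)
print(gender_flags[0])
```

For a linear model such as logistic regression, feeding gender as 0, 1, 2 would imply that one category is "between" the others, which is why indicator columns are the safer choice for unordered variables.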
The goal of this project is to analyze the dataset provided and determine the drivers of a successful campaign. The channel column was tricky because each cell was a list of objects. I successfully answered all the business questions that I asked. In this capstone project, I was free to analyze the data in my own way. Starbucks Sales Analysis Part 1, by Zeyang Gong, was originally published in Towards AI on Medium. So they should be comparable. Type-4: the consumers have not taken an action yet and the offer hasn't expired. The data begins at time t=0. value (dict of strings): either an offer id or a transaction amount, depending on the record. After the investigation, though, it seems it was wrong to ask: who were the customers that used our offers without viewing them? Mean squared error was also considered, and it followed the expected pattern for both BOGO and discount types. Today, with stores around the globe, the Company is the premier roaster and retailer of specialty coffee in the world. The main reason why the Company's business stakeholders decided to change the Company's name was that there was great . I talked about how I used EDA to answer the business questions I asked at the beginning of the article. Age and income seem to be significant factors. Here is how I handled it all. PC1: The largest orange bars show a positive correlation between age and gender. In addition, that column was a dictionary object.
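Both awkward columns mentioned here (the channel list and the value dictionary) can be flattened into plain fields before analysis. The following is one possible sketch, not the author's actual handling: the channel names in `ALL_CHANNELS`, the dictionary keys `"offer id"`, `"offer_id"` and `"amount"`, and the helper names are all assumptions for illustration.

```python
# Assumed set of promotional channels; adjust to whatever the data actually contains.
ALL_CHANNELS = ("email", "mobile", "social", "web")

# Invented sample rows shaped like portfolio and transcript records.
offer = {"id": "offer_a", "offer_type": "bogo", "channels": ["email", "mobile", "social"]}
events = [
    {"person": "p1", "event": "offer received", "time": 0, "value": {"offer id": "offer_a"}},
    {"person": "p1", "event": "transaction", "time": 6, "value": {"amount": 12.5}},
]


def expand_channels(offer_row):
    """Replace the list-valued channels cell with one 0/1 field per channel."""
    flat = {key: val for key, val in offer_row.items() if key != "channels"}
    for channel in ALL_CHANNELS:
        flat[f"channel_{channel}"] = int(channel in offer_row["channels"])
    return flat


def expand_value(event_row):
    """Split the value dict into offer_id and amount fields (None when absent)."""
    flat = {key: val for key, val in event_row.items() if key != "value"}
    value = event_row["value"]
    # Assumed key spellings; both variants are checked in case the data mixes them.
    flat["offer_id"] = value.get("offer id", value.get("offer_id"))
    flat["amount"] = value.get("amount")
    return flat


flat_offer = expand_channels(offer)
flat_events = [expand_value(row) for row in events]
print(flat_offer)
print(flat_events)
```

Keeping offer_id and amount as separate nullable fields makes it straightforward to split the records into offer events and transactions later on.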
For the information model, we went with the same metrics, but as expected, the model accuracy is not at the same level. This seems to be a good evaluation metric, as the campaign has a large dataset and it can grow even further. The assumption is that this may slightly improve the models. Here is the information about the offers, sorted by how many times they were used without being noticed. Rather, the question should be: why were our offers being used without being viewed? Americans rank 25th for coffee consumption per capita, with an average consumption of 4.2 kg per person per year. In our data analysis, we answered the three questions that we set out to explore with the Starbucks transactions dataset. Answer: For both offers, men have a significantly lower chance of completing them. Since there is no offer completion for an informational offer, we can ignore the rows containing informational offers when looking for the relation between offer viewed and offer completion. The GitHub repository of this project can be found here. The discount offer type also has a greater chance of being used without being seen compared to BOGO. Due to the different business logic, I would like to limit the scope of this analysis to answering one question: who are the users that wasted our offers, and how can we avoid it? Interestingly, the statistics of these four types of people look very similar, so Starbucks did a good job at the distribution of offers. After I played around with the data a bit, I also decided to focus only on the BOGO and discount offers for this analysis, for 2 main reasons.
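Counting how often each offer was completed without ever being viewed can be sketched as a single pass over the time-ordered events. This is a simplified illustration rather than the method used in the article: the event records are invented, the field names are assumptions, and the logic ignores offer validity windows, so a view that happens after completion is treated as "not viewed".

```python
from collections import Counter

# Invented sample events; field names and offer ids are assumptions.
events = [
    {"person": "p1", "offer_id": "disc_1", "event": "offer received", "time": 0},
    {"person": "p1", "offer_id": "disc_1", "event": "offer completed", "time": 12},
    {"person": "p2", "offer_id": "disc_1", "event": "offer received", "time": 0},
    {"person": "p2", "offer_id": "disc_1", "event": "offer completed", "time": 30},
    {"person": "p3", "offer_id": "bogo_1", "event": "offer received", "time": 0},
    {"person": "p3", "offer_id": "bogo_1", "event": "offer viewed", "time": 6},
    {"person": "p3", "offer_id": "bogo_1", "event": "offer completed", "time": 18},
    {"person": "p4", "offer_id": "bogo_1", "event": "offer received", "time": 0},
    {"person": "p4", "offer_id": "bogo_1", "event": "offer completed", "time": 24},
    {"person": "p4", "offer_id": "bogo_1", "event": "offer viewed", "time": 36},
]


def completed_without_view(rows):
    """Return (offer_id, count) pairs, most frequent first, for unseen completions."""
    viewed = set()
    unseen = Counter()
    for row in sorted(rows, key=lambda r: r["time"]):
        key = (row["person"], row["offer_id"])
        if row["event"] == "offer received":
            viewed.discard(key)  # a fresh delivery starts as not yet viewed
        elif row["event"] == "offer viewed":
            viewed.add(key)
        elif row["event"] == "offer completed" and key not in viewed:
            unseen[row["offer_id"]] += 1
    return unseen.most_common()


ranking = completed_without_view(events)
print(ranking)
```

Informational offers never produce a completion event, so they drop out of this ranking on their own, which matches the decision to focus on BOGO and discount offers.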