Our customers gain competitive advantage from our commitment to technology and long-term partnerships.
Easysize’s AI-driven technology analyses each shopper’s purchase and returns behaviour, unique product features, and the shopper’s style and fit preferences to predict and recommend the best possible size and fit. This creates an opportunity to benefit from historical data as well as from a network effect.
It all starts with the data. Over the years we have built a sophisticated pipeline designed around a data lake concept. All data from our clients, whether it arrives as CSV files or directly through the API, is picked up by our first line of data processing routines, where it is structured, cleaned and parsed. This allows us to handle any kind of data, regardless of its format and structure.
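As a rough illustration of that first line of processing, here is a minimal sketch of mapping heterogeneous client exports onto one canonical record shape. The field names and aliases are invented for the example, not our actual schema:

```python
import csv
import io

# Hypothetical alias map: different clients name the same column differently.
CANONICAL_FIELDS = {
    "order_id": ["order_id", "OrderID", "order"],
    "sku": ["sku", "SKU", "item_code"],
    "size": ["size", "Size", "size_label"],
}

def normalize_record(raw: dict) -> dict:
    """Map a raw client record onto the canonical field names."""
    out = {}
    for canonical, aliases in CANONICAL_FIELDS.items():
        for alias in aliases:
            if alias in raw and raw[alias] not in ("", None):
                out[canonical] = str(raw[alias]).strip()
                break
    return out

def ingest_csv(text: str) -> list[dict]:
    """Parse a CSV export and normalize every row."""
    return [normalize_record(row) for row in csv.DictReader(io.StringIO(text))]
```

The same `normalize_record` step can sit behind the API endpoint as well, which is what makes the pipeline format-agnostic.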
In addition to transactions (purchases, returns, cancellations, exchanges, etc.), the catalog, user identifiers and all on-site tracking events, we also receive rich feedback data from shoppers as well as their size & fit profiles. That allows us to calibrate every feature to each user’s unique preferences.
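To make the event types concrete, here is an illustrative sketch of the kind of records involved and how returns and cancellations can be netted against purchases. The field names and the `kept_purchases` rule are assumptions for the example:

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    user_id: str
    sku: str
    size: str
    kind: str  # "purchase", "return", "cancellation" or "exchange"

def kept_purchases(events: list[Transaction]) -> list[Transaction]:
    """Purchases that were not later returned or cancelled for the same SKU."""
    undone = {(e.user_id, e.sku)
              for e in events if e.kind in ("return", "cancellation")}
    return [e for e in events
            if e.kind == "purchase" and (e.user_id, e.sku) not in undone]
```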
Here we identify all available information about an item, such as category, sub-category, fit, cut, fabric, color, brand, sub-brand, brand collection, model, SKU, EAN, and much more. We also use extensive NLP techniques to derive information from unstructured data. Around 70% of all data requires no human interaction during processing; the remaining 30% helps us improve the pipeline and teach it to handle more cases without human help.
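The real extraction uses proper NLP models, but a toy keyword-matching version conveys the idea of pulling structured attributes out of free-text descriptions. The term lists are illustrative:

```python
import re

# Toy vocabularies; the production pipeline learns these, it does not hard-code them.
FIT_TERMS = {"slim", "regular", "relaxed", "oversized"}
FABRIC_TERMS = {"cotton", "wool", "polyester", "elastane", "viscose"}

def extract_attributes(description: str) -> dict:
    """Pull fit and fabric attributes out of an unstructured item description."""
    tokens = set(re.findall(r"[a-z]+", description.lower()))
    return {
        "fit": sorted(tokens & FIT_TERMS),
        "fabrics": sorted(tokens & FABRIC_TERMS),
    }
```

An item whose description cannot be resolved this way is exactly the kind of case that gets routed to the human-assisted 30%.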
Once all of that is done, the data goes through pattern recognition, outlier detection and post-processing analysis. We identify purchases that came from family accounts and gifts that people bought for someone else, and we check whether a user’s sizing patterns have changed over their last few purchases.
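Two of those checks can be sketched with simple heuristics. The size ordering, spread threshold and drift window below are assumed values for illustration, not our tuned parameters:

```python
from statistics import median

SIZE_ORDER = {"XS": 0, "S": 1, "M": 2, "L": 3, "XL": 4}

def looks_like_family_account(sizes: list[str], max_spread: int = 2) -> bool:
    """A single shopper rarely buys sizes more than a couple of steps apart."""
    idx = [SIZE_ORDER[s] for s in sizes]
    return max(idx) - min(idx) > max_spread

def size_drift(sizes: list[str], window: int = 3) -> int:
    """Difference between the recent median size and the historical median."""
    idx = [SIZE_ORDER[s] for s in sizes]
    return round(median(idx[-window:]) - median(idx[:-window] or idx))
```

A positive `size_drift` suggests the shopper has recently moved up a size, so older purchases should count for less.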
After that, the data pipeline triggers post-loading processes. There we recalculate all affected weights and features for the algorithms and remove data that has passed its decay point. We aim to use the freshest data possible, so we apply degradation criteria to all previously collected data. We also merge newly arrived data with on-site behavioural tracking to compile a final feature set for the models.
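One common way to express such degradation criteria is an exponential decay with a cutoff, which is how the sketch below works. The half-life and cutoff are assumed values, not our production settings:

```python
HALF_LIFE_DAYS = 180   # assumed: an observation loses half its weight every 180 days
DECAY_CUTOFF = 0.1     # assumed: observations below this weight are removed

def freshness_weight(age_days: float) -> float:
    """Exponential decay of an observation's weight with age."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def retained(ages_days: list[float]) -> list[float]:
    """Ages of observations that survive the decay cutoff."""
    return [a for a in ages_days if freshness_weight(a) >= DECAY_CUTOFF]
```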
We have an extensive feature engineering process that incorporates the lessons learned from testing and evaluating new approaches over four years of development. To give an example of a domain-driven feature, consider how we look at fabric. Instead of using fabric as a categorical feature, we researched how fabric affects size and tested several hypotheses; one of the best performing is what we call a stretchiness/elasticity factor. It accounts for the change an item undergoes over time, something experienced stylists and shoppers always take into account, so we do as well.
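A stretchiness factor of this kind can be sketched as a weighted score over the fabric blend. The per-fibre elasticity scores and the one-size-down rule below are invented for illustration, not our calibrated values:

```python
# Assumed elasticity scores per fibre (0 = rigid, 1 = very stretchy).
FIBRE_ELASTICITY = {"elastane": 1.0, "wool": 0.5, "polyester": 0.3, "cotton": 0.2}

def stretchiness(composition: dict[str, float]) -> float:
    """Weighted elasticity of a blend, e.g. {'cotton': 0.95, 'elastane': 0.05}."""
    return sum(FIBRE_ELASTICITY.get(f, 0.0) * share
               for f, share in composition.items())

def size_adjustment(stretch: float, threshold: float = 0.25) -> int:
    """Toy rule: very stretchy garments can be recommended one size down."""
    return -1 if stretch >= threshold else 0
```

Turning fabric into a continuous factor like this lets the models reason about it, rather than treating "95% cotton, 5% elastane" as just another category label.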
We run a multi-model setup: each model contributes features to the final algorithm, as well as actionable insights about a user’s behaviour and a brand’s or item’s performance, and raises alerts when something deviates from the average statistics by a significant margin. Once the models are trained and the final trained sets are prepared, we run evaluations and push everything to production.
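The deviation alerts can be illustrated with a simple z-score check against a metric’s history; the three-sigma threshold here is an assumed default:

```python
from statistics import mean, stdev

def deviates(history: list[float], latest: float, k: float = 3.0) -> bool:
    """Flag a metric that sits more than k standard deviations from its history."""
    mu, sigma = mean(history), stdev(history)
    return abs(latest - mu) > k * sigma
```

For example, a sudden jump in an item’s return rate relative to its history would trip this check and surface the item for review.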
In some ways, the problem is a classic collaborative filtering problem: given different users’ feedback on different items, we must fill in the gaps in a sparse matrix to predict the outcome of recommending a size of an item to a client who has not yet received it. As such, we do use some standard collaborative filtering algorithms ("those who have liked what you have liked have also liked ...").
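A minimal user-based collaborative filtering sketch on a tiny sparse matrix of fit outcomes (+1 = the size fitted, -1 = it did not) shows the idea; the users, items and scores are invented:

```python
import math

# Invented sparse feedback matrix: user -> {item_size: outcome}.
ratings = {
    "anna":  {"dress_M": 1, "jeans_S": 1},
    "berta": {"dress_M": 1, "jeans_S": 1, "coat_M": 1},
    "clara": {"dress_M": -1, "coat_M": -1},
}

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity over the items two users have both rated."""
    shared = set(u) & set(v)
    num = sum(u[i] * v[i] for i in shared)
    den = (math.sqrt(sum(x * x for x in u.values()))
           * math.sqrt(sum(x * x for x in v.values())))
    return num / den if den else 0.0

def predict(user: str, item: str) -> float:
    """Similarity-weighted average of other users' feedback on the item."""
    num = den = 0.0
    for other, r in ratings.items():
        if other != user and item in r:
            s = cosine(ratings[user], ratings[other])
            num += s * r[item]
            den += abs(s)
    return num / den if den else 0.0
```

For `anna`, who has never bought `coat_M`, the prediction leans on similar user `berta` (it fitted her) and dissimilar user `clara` (it did not fit her, which for a dissimilar user is weak evidence it will fit anna), so the gap in the matrix gets filled with a positive score.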
However, we have a lot of explicit data, both from users' feedback and from clothing attributes. This helps with the cold-start problem and also allows for greater accuracy when we employ algorithms that consider this data. So we combine latent features with extensive statistical modeling.
For example, a new client from Germany wants to buy an item. She may tell us that she usually wears medium-sized tops but sometimes has better luck buying large, and that she often struggles with chest fit. We then look at the item’s performance across other shoppers, apply all statistical features linked to her answers in the size quiz, and take into account possible deviations based on the country of origin of both the user and the item. In that way we ensure that we use every bit of information available to recommend the size that best fits her preferences.
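A heavily simplified version of combining her quiz answers with item-level statistics could look like this; the size ladder, thresholds and shift rules are all invented for the example:

```python
SIZES = ["XS", "S", "M", "L", "XL"]

def recommend(usual_size: str, runs_small_score: float,
              struggles_with_chest: bool) -> str:
    """Shift the user's usual size by item behaviour and stated fit issues."""
    shift = 0
    if runs_small_score > 0.5:    # item stats: most shoppers had to size up
        shift += 1
    if struggles_with_chest:      # quiz answer: chest fit is a recurring problem
        shift += 1
    i = min(max(SIZES.index(usual_size) + shift, 0), len(SIZES) - 1)
    return SIZES[i]
```

In production the adjustment is of course statistical rather than rule-based, and country-of-origin deviations would enter as further terms, but the principle of layering every available signal on top of the stated usual size is the same.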
Once we make a recommendation and a user has purchased and received the item, we reach out with a request for feedback. That not only helps us improve this particular user’s future recommendations, but also lets us update all affected weights and in turn improve all future recommendations. That closes the loop and allows us to accompany a user at every step of their experience.
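The closing of the loop can be sketched as an online update: each piece of feedback nudges a per-item "runs small / runs large" weight. The feedback encoding and learning rate are assumed values for illustration:

```python
# Assumed encoding of shopper feedback into a sizing signal.
FEEDBACK_SIGNAL = {"too_small": 1.0, "fits": 0.0, "too_big": -1.0}

def update_item_bias(bias: float, verdict: str, lr: float = 0.1) -> float:
    """Move the item's sizing bias toward the latest feedback signal."""
    return bias + lr * (FEEDBACK_SIGNAL[verdict] - bias)

bias = 0.0
for verdict in ["too_small", "too_small", "fits"]:
    bias = update_item_bias(bias, verdict)
# bias is now slightly positive: this item tends to run small.
```

Because the updated bias feeds every future recommendation for that item, one shopper’s feedback improves everyone’s recommendations, which is the network effect mentioned above.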