
Industry Engagement

Year 2.1 FP2

I attended two talks offered as part of the Shopee Code League 2020 competition: Natural Language Processing (NLP) by Smartcademy, and Product Detection (Data Science) by Shopee.

Industry Engagement.png

NLP Talk

I will first share my insights from the NLP talk. NLP enables machines to interpret human language. It is driven by the data and knowledge the machine has collected, from which it generates likely outcomes.


Applications of NLP include sentiment analysis, chatbots, speech recognition, language translation, information retrieval and extraction, and advertisement matching. Sentiment analysis extracts deeper insight from text. For instance, Nordstrom used sentiment analysis to dig into 5-star customer reviews and found a shipping problem. Another example is the sentiment analysis of different songs based on the language used in their lyrics (Figure 4).
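
To make the idea concrete, here is a minimal lexicon-based sentiment scorer. It is a sketch under stated assumptions, not the method used by Nordstrom or by the speaker: the words and scores in the hypothetical `LEXICON` are made up, and production systems typically use much larger lexicons or trained models. It shows how a review with a positive overall score can still contain complaint words.

```python
import re

# Hypothetical mini-lexicon: word -> sentiment score. Real systems use far
# larger curated lexicons or trained models.
LEXICON = {
    "love": 2, "great": 2, "good": 1, "fast": 1,
    "slow": -1, "late": -1, "bad": -2, "damaged": -2,
}


def words(text: str) -> list[str]:
    """Lower-case the text and return its alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text: str) -> int:
    """Sum the lexicon scores of all words; unknown words score 0."""
    return sum(LEXICON.get(w, 0) for w in words(text))


def sentiment_label(text: str) -> str:
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def negative_terms(text: str) -> list[str]:
    """Words with a negative score, even inside an overall positive review."""
    return [w for w in words(text) if LEXICON.get(w, 0) < 0]


review = "Love the dress, great fit, but shipping was slow and late"
print(sentiment_score(review), sentiment_label(review))  # 2 positive
print(negative_terms(review))                             # ['slow', 'late']
```

Scanning the negative terms across many highly rated reviews is one simple way such a hidden complaint could surface.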


Data cleaning is a necessary step in NLP. First, text normalization is carried out. This includes converting all letters to the same case and converting numbers into words (or, alternatively, removing numbers). It also includes removing diacritics, extra white space, and stop words (e.g. she, the, an).
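
The normalization steps above can be sketched as follows. This is an illustration, not the speaker's code: the stop-word list is a small hypothetical sample, numbers are removed rather than spelled out (spelling them out would need an extra library), and dropping punctuation is a common extra step that the talk did not list.

```python
import re
import unicodedata

# Small hypothetical stop-word list; libraries such as NLTK ship fuller ones.
STOP_WORDS = {"she", "he", "the", "a", "an", "at", "is", "and", "of", "to"}


def strip_diacritics(text: str) -> str:
    # NFD splits "é" into "e" plus a combining accent; drop the accents.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize(text: str) -> str:
    text = text.lower()                    # convert to the same case
    text = strip_diacritics(text)          # café -> cafe
    text = re.sub(r"\d+", " ", text)       # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)   # drop punctuation (extra step)
    tokens = text.split()                  # split() also collapses white space
    return " ".join(t for t in tokens if t not in STOP_WORDS)


print(normalize("She ordered 2 crème brûlées at the   Café!"))
# ordered creme brulees cafe
```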


Next, pre-processing is carried out. It consists of five parts: tokenisation, N-grams, stemming, part-of-speech (POS) tagging, and named entity recognition.

Tokenisation is the process of breaking a text, or a set of texts, into individual tokens (sentences, words or characters).
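
A minimal sketch of the three token granularities, using only Python's `re` module. The splitting rules are simplified assumptions: they would mis-split cases such as abbreviations ("e.g.") that dedicated tokenisers handle.

```python
import re


def sentence_tokens(text: str) -> list[str]:
    # Naive rule: a sentence ends at ".", "!" or "?" followed by white space.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokens(text: str) -> list[str]:
    # Runs of letters, digits and apostrophes; punctuation is dropped.
    return re.findall(r"[A-Za-z0-9']+", text)


def char_tokens(text: str) -> list[str]:
    return list(text)


text = "Shipping was fast. I love it!"
print(sentence_tokens(text))  # ['Shipping was fast.', 'I love it!']
print(word_tokens(text))      # ['Shipping', 'was', 'fast', 'I', 'love', 'it']
print(char_tokens("love"))    # ['l', 'o', 'v', 'e']
```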

N-grams are contiguous sequences of N words. Counting them helps identify popular keywords and phrases in a sentence or document.
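
A short sketch of extracting N-grams and counting the most frequent ones. The `ngrams` helper and the sample reviews are hypothetical, chosen only to show how a popular phrase rises to the top.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous sequences of n tokens (empty if n > len(tokens))."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Made-up reviews for illustration.
reviews = [
    "fast shipping and good price",
    "good price but slow shipping",
    "fast shipping good price",
]

bigram_counts = Counter()
for review in reviews:
    bigram_counts.update(ngrams(review.split(), 2))

print(bigram_counts.most_common(2))
# [(('good', 'price'), 3), (('fast', 'shipping'), 2)]
```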

Stemming and lemmatisation both bring variant forms of a word together. Stemming strips affixes using heuristic rules (e.g. playing is reduced to play), whereas lemmatisation converts a word to its common dictionary base form (e.g. bought is converted to buy).
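
The difference between the two techniques can be sketched with toy rules. Both `SUFFIXES` and `LEMMAS` are hypothetical and tiny: a real stemmer (such as the Porter stemmer) applies many more conditions, and a real lemmatiser consults a full lexicon. Here `lemmatise` falls back to the stemmer when a word is not in its dictionary, which is a simplification.

```python
# Hypothetical suffix rules, tried in order; real stemmers such as the Porter
# stemmer use many more rules and conditions.
SUFFIXES = ["ing", "ed", "es", "s"]

# Hypothetical dictionary of irregular forms; real lemmatisers consult a full
# lexicon (e.g. WordNet) and usually the word's POS tag.
LEMMAS = {"bought": "buy", "went": "go", "mice": "mouse"}


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatise(word: str) -> str:
    """Look up irregular forms first; otherwise fall back to the stemmer."""
    return LEMMAS.get(word, stem(word))


for w in ["playing", "played", "plays", "bought"]:
    print(w, stem(w), lemmatise(w))
# playing play play
# played play play
# plays play play
# bought bought buy
```

The last line shows why lemmatisation is needed for irregular forms: no suffix rule turns bought into buy.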

POS tagging is the process of marking up each word in a corpus with its corresponding part-of-speech tag, based on both its definition and its context. For instance, the word ‘left’ can be an adjective (‘my left hand’) or a verb (‘she left early’), and the two uses carry different meanings. POS tagging uses the surrounding context to identify the correct one.
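
As an illustration only, the toy tagger below decides the tag of ‘left’ from the word before it. The word sets and rules are hypothetical; practical taggers learn these contextual patterns from an annotated corpus rather than hard-coding them.

```python
# Hypothetical word sets and rules for the ambiguous word "left". Practical
# taggers (e.g. hidden Markov model or neural taggers) learn such contextual
# patterns from an annotated corpus instead of hard-coding them.
PRONOUNS = {"i", "you", "he", "she", "we", "they"}
POSSESSIVES = {"my", "your", "his", "her", "our", "their"}


def tag_left(prev_word: str) -> str:
    """Tag one occurrence of "left" using only the word before it."""
    if prev_word in POSSESSIVES:
        return "ADJ"      # "my left hand": describes the following noun
    if prev_word in PRONOUNS:
        return "VERB"     # "she left early": past tense of "leave"
    return "UNKNOWN"      # not covered by these toy rules


def tag_sentence(sentence: str) -> list[tuple[str, str]]:
    tokens = sentence.lower().split()
    return [
        (tok, tag_left(tokens[i - 1] if i > 0 else ""))
        for i, tok in enumerate(tokens)
        if tok == "left"
    ]


print(tag_sentence("She left her bag"))      # [('left', 'VERB')]
print(tag_sentence("Raise your left hand"))  # [('left', 'ADJ')]
```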

Named entity recognition (NER) is the process of identifying all textual mentions of named entities and classifying them into predefined categories, such as person names, organizations, locations, times, percentages and monetary values.
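
A rule-based sketch of NER that combines a gazetteer (a lookup list of known names) with regular expressions for entities that have a predictable form, such as money and percentages. The gazetteer entries, patterns and example sentence are made up for illustration; modern NER systems are usually trained statistical or neural models. Person names are left out because they do not fit a simple pattern.

```python
import re

# Hypothetical gazetteer (lookup list); real NER systems use trained models.
GAZETTEER = {
    "Shopee": "ORGANIZATION",
    "Smartcademy": "ORGANIZATION",
    "Singapore": "LOCATION",
}

# Hypothetical regex patterns for entity types with a predictable form.
PATTERNS = [
    ("MONEY", r"\$\d+(?:\.\d{2})?"),
    ("PERCENT", r"\d+(?:\.\d+)?%"),
    ("TIME", r"\b\d{1,2}(?::\d{2})?\s?(?:am|pm)\b"),
]


def recognise_entities(text: str) -> list[tuple[str, str]]:
    found = []  # (start offset, matched text, label)
    for name, label in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((m.start(), m.group(), label))
    for label, pattern in PATTERNS:
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((m.start(), m.group(), label))
    # Sort by start offset so entities come out in reading order.
    return [(span, label) for _, span, label in sorted(found)]


# A made-up sentence for illustration.
sentence = "Shopee offered 20% off in Singapore from 9pm, so the $15.90 item cost less."
for span, label in recognise_entities(sentence):
    print(span, label)
```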

Industry Engagement 2.png
Industry Engagement 2.1.png

Product Detection Talk

Next, I will share my insights from the Product Detection talk. At Shopee, product detection aims to list and categorize products correctly for its customers.


Using a Convolutional Neural Network (CNN), we can identify objects in images and group them into categories. First, convolution is applied to modify the spatial frequency characteristics of an image, for example to smooth, sharpen, intensify or enhance it. Each output value is computed from the input values that fall within the kernel's receptive field. Next, normalization, pooling and activation are applied. Applications of CNNs include image search, product classification, face detection and recognition, and image segmentation.
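
The convolution, activation and pooling steps can be sketched in NumPy. This is an illustrative forward pass, not Shopee's model: the 6x6 image is made up, and the vertical-edge kernel is hand-picked, whereas a CNN learns its kernel weights during training. Like most deep learning libraries, `conv2d` here computes cross-correlation (the kernel is not flipped). Normalization is omitted, and ReLU is applied before max pooling, which is one common ordering.

```python
import numpy as np


def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid 2-D cross-correlation: each output value is the weighted sum of
    the input values inside the kernel's receptive field."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0)


def max_pool(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Keep the largest value in each non-overlapping size x size window."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[: h * size, : w * size].reshape(h, size, w, size).max(axis=(1, 3))


# Made-up 6x6 image: dark left half, bright right half (a vertical edge).
image = np.tile([0, 0, 0, 1, 1, 1], (6, 1)).astype(float)

# Hand-picked vertical-edge kernel; in a CNN these weights are learned.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = relu(conv2d(image, kernel))
pooled = max_pool(feature_map)
print(feature_map.shape, pooled.shape)  # (4, 4) (2, 2)
```

The edge between the dark and bright columns produces large values in the middle columns of the feature map, and pooling halves each spatial dimension while keeping the strongest response in each window.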


A common data science pipeline is broken down into five parts. First, we fetch the data and extract it into a usable format, such as CSV, JSON or XML files. Second, we pre-process the data through denoising and long-tail processing (handling categories that have very few samples). Figure 8 shows an anomaly in the data: a green bean image is found in the yellow millet category. This anomaly is noisy data and has to be removed. Third, we explore the data through visualizations. Fourth, we model the data with machine learning to create predictive models. Lastly, we interpret the results and identify business insights.
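
The two pre-processing steps can be sketched on made-up tabular data. The per-image mean hue feature, the records and both thresholds are hypothetical; in practice such a feature might come from image statistics or CNN embeddings. Label noise is flagged by how far an item's value sits from its category median, measured in median absolute deviations (MAD), and long-tail categories are those with too few samples, which a later step might oversample or augment (not shown).

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical records: (image_id, labelled_category, mean_hue_in_degrees).
records = [
    ("img1", "yellow millet", 52), ("img2", "yellow millet", 55),
    ("img3", "yellow millet", 50), ("img4", "yellow millet", 54),
    ("img5", "yellow millet", 120),  # green bean image filed under millet
    ("img6", "green bean", 118), ("img7", "green bean", 122),
    ("img8", "black rice", 270),
]


def remove_label_noise(records, threshold=3.5):
    """Drop items whose feature lies more than `threshold` MADs from the
    median of their labelled category."""
    values = defaultdict(list)
    for _, category, value in records:
        values[category].append(value)

    stats = {}
    for category, vals in values.items():
        med = median(vals)
        stats[category] = (med, median(abs(v - med) for v in vals))

    kept = []
    for record in records:
        _, category, value = record
        med, mad = stats[category]
        # When MAD is 0 (e.g. a one-sample category) the spread cannot be
        # judged, so the item is kept.
        if mad == 0 or abs(value - med) / mad <= threshold:
            kept.append(record)
    return kept


def long_tail_categories(records, min_count=2):
    """Categories with fewer than `min_count` samples."""
    counts = Counter(category for _, category, _ in records)
    return sorted(c for c, n in counts.items() if n < min_count)


cleaned = remove_label_noise(records)
print([image_id for image_id, _, _ in cleaned])
# ['img1', 'img2', 'img3', 'img4', 'img6', 'img7', 'img8']
print(long_tail_categories(cleaned))  # ['black rice']
```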

Nature of the webinars attended

NLP is an interdisciplinary field combining computer science and linguistics, while Product Detection falls under Data Science.

Reasons for choosing the webinars attended

I am interested in NLP as it allows us to use machines to dive deeper into data obtained from surveys, reviews and feedback. I am also interested in chatbot applications and wanted to learn more about the algorithms behind them.


I am interested in Data Science as it allows us to spot and predict trends in data, which helps organizations better understand their customers’ requirements.

Relevance of the webinars to the course of study

I intend to take Machine Learning in Year 3, where I will learn how to implement and train models. Through the NLP talk, I gained an understanding of one of the fundamentals of Machine Learning, as NLP is one of its major application areas.


I also intend to take Descriptive or Predictive Analysis in Year 3, where I will learn how to analyse and gain insights from captured data. Hence, these modules are relevant to the Data Science webinar I attended.


Impact of the webinars on future goals

I am interested in becoming a data analyst in the future. The Data Science webinar was useful as it gave me insight into how to interpret patterns and trends in data. The NLP webinar was also useful as it taught me how to interpret text data and measure its sentiment. Further studies will allow me to specialise and become a data science analyst.
