Using Natural Language Processing (NLP) to enhance commerce: An African Lens

What are our Future FinTech Champions learning at the moment? And what are their thoughts on the current development of FinTech? Read through this submission from one of our FFCs, Margaret Maina, currently studying at African Leadership University.

Communication between human beings and computers is one of the most prominent problems in the evolution of Artificial Intelligence (AI). AI is intelligence demonstrated by software or machinery that imitates the working of the human mind. Natural Language Processing (NLP) is an area of AI that allows machines to comprehend, interpret, and manipulate human language in text and speech. Humans are often imprecise and ambiguous, whereas computers require clear and unambiguous communication; NLP seeks to bridge this gap. The finance sector produces large volumes of text documents to communicate a wide variety of messages. NLP applications are used to mine these documents to obtain insights, make inferences, and create additional methodologies and artefacts that advance knowledge across different finance sectors.
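As a toy illustration of mining finance text for insights, the sketch below scores the tone of a sentence by counting hits against hand-picked positive and negative word lists. The word lists and scoring rule are illustrative assumptions, not a real financial lexicon; production systems use far richer models.

```python
# Minimal sketch: keyword-based tone scoring of finance text.
# POSITIVE and NEGATIVE are illustrative word lists, not a real lexicon.
POSITIVE = {"growth", "profit", "gain", "surplus"}
NEGATIVE = {"loss", "default", "decline", "deficit"}

def tone_score(text: str) -> int:
    """Return (# positive keyword hits) - (# negative keyword hits)."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(tone_score("The bank reported strong profit growth despite a small decline in deposits"))
# profit (+1) + growth (+1) - decline (-1) = 1
```

Even this crude counting approach hints at how unstructured text can be turned into a signal a downstream system can act on.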

The growth in digital and social media usage by businesses has increased the volume of unstructured text documents. Given that the dissemination of these texts is primarily automated by and through computers, NLP research and applications have considerable potential to enhance communication. NLP has been integrated into various aspects of finance in Africa, especially in automating banking-related procedures, from data processing and analysis to regulatory compliance to instant customer service. Using NLP to assign metadata to every uploaded document could optimize document search by enriching it with broader notions of what each document contains. This, in turn, improves business processes and makes them more efficient. The use of chatbots in the FinTech industry has saved organizations a significant number of working hours; chatbots have also enabled seamless communication with clients and facilitated 24/7 customer service.
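A minimal sketch of the metadata idea, assuming a simple frequency-based tagger (real systems would use richer NLP such as named-entity recognition): each uploaded document is tagged with its most frequent content words, which a search index could then match against.

```python
from collections import Counter

# Illustrative stopword list; a real system would use a much fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "for", "is", "on"}

def extract_metadata(document: str, top_n: int = 3) -> list[str]:
    """Tag a document with its most frequent non-stopword terms."""
    words = [w.strip(".,;:").lower() for w in document.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]

doc = ("The loan application requires a loan agreement and proof of income. "
       "Income statements support the loan decision.")
print(extract_metadata(doc))  # → ['loan', 'income', 'application']
```

Attaching tags like these as metadata lets a search engine surface documents by topic rather than by exact filename matches.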

Is the current implementation of NLP sustainable in Africa without including African dialects in the NLP corpus?

A corpus is a collection of authentic text or audio data that can be used to train AI and machine learning systems in natural language processing. For natural language annotation (the process of enriching a corpus with higher-level information) to provide enough data and a precise account of the language, the corpus must be vast. At the same time, the annotation must be correct and relevant to the task the algorithms are expected to accomplish for them to learn effectively. Therefore, a good corpus must be large enough and contain high-quality, clean, accurate data relevant to the task at hand. However, it is essential to remember that too much data can slow the algorithm down and lead it to produce erroneous findings.
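One way to picture such an annotated corpus entry is sketched below. The Swahili example sentence, the label names (`intent`, `entities`), and the validation rule are illustrative assumptions, not a standard annotation scheme; the point is that annotation layers higher-level labels on top of the raw text, and that even simple quality checks help keep those labels accurate.

```python
# Sketch of one annotated corpus entry: raw utterance plus
# higher-level labels. Labels and the Swahili example are illustrative.
corpus = [
    {
        "text": "Nataka kutuma pesa kwa mama",  # "I want to send money to mum"
        "language": "swa",                      # ISO 639-3 code for Swahili
        "intent": "send_money",
        "entities": [{"span": "mama", "role": "recipient"}],
    },
]

def validate(entry: dict) -> bool:
    """Basic quality check: every annotated span must occur in the text."""
    return all(e["span"] in entry["text"] for e in entry["entities"])

print(all(validate(e) for e in corpus))  # → True
```

Checks like this one are a small instance of the "clean, accurate data" requirement described above: annotations that do not line up with the underlying text teach the model the wrong thing.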

Given that digital and social media usage in businesses has significantly increased to almost all parts of Africa, is the current implementation of NLP sustainable in Africa without including African dialects in the NLP corpus?

Africa, the world’s second-most populous continent with over one billion people, also has the world’s greatest linguistic diversity, with over 1,500 languages. People across Africa should be able to use their own languages to comfortably access and use digital business services. This requires a variety of applications, including local-language spell-checkers, word processors, machine translation systems and search engines.
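To make the spell-checker case concrete, the snippet below suggests corrections for a misspelled word by fuzzy-matching it against a tiny, hypothetical Swahili word list using Python's standard `difflib`. The lexicon here is a handful of illustrative words; a usable spell-checker would need a large, corpus-derived word list for the target language, which is exactly the kind of resource the article argues is missing.

```python
from difflib import get_close_matches

# Toy lexicon of Swahili finance words; a real spell-checker would load
# a much larger word list built from a language corpus.
LEXICON = ["pesa", "benki", "akaunti", "malipo", "salio"]

def suggest(word: str) -> list[str]:
    """Suggest lexicon words closest to a possibly misspelled input."""
    return get_close_matches(word.lower(), LEXICON, n=2, cutoff=0.6)

print(suggest("beki"))  # → ['benki']
```

The similarity cutoff (0.6 here) trades precision against recall: a lower cutoff returns more, but noisier, suggestions.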

Some of the major challenges to the growth of NLP

Some of the major challenges to the growth of NLP for African languages could be attributed to the fact that many African societies see little hope of African languages being accepted as primary means of communication. As a result, there have been few initiatives to fund NLP or translation for African languages, despite the potential benefit. This lack of focus has had a ripple effect over time, stifling growth. The limited materials that do exist are difficult to find, are published in closed journals and non-indexed local conferences, or are undigitized and only exist in private collections. This opacity hampers researchers’ capacity to reproduce and build on current discoveries and to develop and improve the various corpora.

Another challenge is that African researchers are disproportionately affected by socioeconomic factors and are often hindered by visa issues and the high cost of flights from and within Africa. They are dispersed across the continent, with few opportunities to meet, collaborate, or share their work.

Furthermore, African languages have high linguistic complexity and variety, with a wide range of morphologies and phonologies and of lexical and grammatical tonal patterns. Many are spoken in multilingual cultures where code-switching (alternating between languages within a single conversation) is common. Due to this complexity, cross-lingual generalization from success in languages like English is not guaranteed.

On a concluding note

To improve the corpora of African languages, there must be significant investment in data collection, curation and annotation for NLP across the African continent. Such investments would not only grow the available data but also help preserve the languages themselves. Collection will require content creators to understand how to make their content accessible to machines, not only to humans. This data then needs to be curated and shared in data archives for future use and expansion.