Difficulties and Implications of Chinese Dialect Variation

Jordan Crowley

University of California, Davis




This paper offers insight into the wide variety of dialects within the Chinese language. The goal is to examine what effect this variety has on people learning Chinese as a second language. To that end, emphasis is placed on the phonetics of these dialects and on how non-Chinese-speaking individuals judged the difficulty of spoken Chinese. Two surveys were used to collect data. The first (direct) was designed to gather respondents’ reasons for thinking Chinese is a difficult language. The second (indirect) was designed for non-Chinese-speaking individuals, to test my hypothesis about perceived audible difficulty. The work here is mostly preliminary, and most of the data are inconclusive. Responses to the direct survey centered on disparate difficulties. The indirect survey, interestingly, showed clear standouts in what was perceived as ‘difficult sounding.’ In future work, stating the expected results more specifically would make the findings more useful.



Mandarin Chinese has more native speakers than any other language in the world. In the United States, it appears to be growing as a second language in certain demographics (Shao, 2015). However, the group experiencing the most growth tends to be the younger generation. According to Grace Shao, the growth figures do not include college and university students or individual learners who are not students. Estimates of the number of Chinese dialects exceed 200. While some of these dialects are spoken nationwide, many are specific to individual cities, rural areas, and even villages. Mandarin is the most common variety of spoken Chinese.

I present here the questions that this paper will consider. One major topic is dialect variation and how it may affect the number of people interested in learning Chinese as a second language. Stated separately, my research questions are as follows. First, what factors of the Chinese language make it so difficult for non-native speakers to grasp? To address this, I will explore a few distinct phonetic features of Chinese dialects. Second, does the sheer vastness of the language and its myriad varieties play a role in its difficulty? And finally, do these things limit the number of new speakers of Chinese? This third question will rely heavily on previous scholarship, whereas the first two will draw more on my personal experience and experimentation.

I am not ethnically Chinese, but I have close ties to the country and the language. I have travelled to various parts of China and met many individuals from all walks of life. Furthermore, I used to teach English to children in Beijing and have a decent working proficiency in Chinese. Finally, my interests and future career goals align with potential embassy work in China or other Asian countries. My motivation for writing on this topic came from a combination of these factors, in addition to my background and interest in phonetics and bilingualism.

From a Western perspective, Chinese is viewed as an incredibly difficult language to delve into. In this study, I will attempt to connect the vast amount of dialect variation to the perception that the language is so difficult.

Literature Review:

A few key articles set the stage for my research, and each of my research questions can be tied directly to one of them. Likely the first thought to come to mind when the words ‘Chinese’ and ‘difficulty’ are mentioned together is the use of characters instead of an alphabet. Two researchers, Marcus Taft and Kevin Chung, addressed this difficulty by teaching a group of non-Chinese adults a list of twenty-four Chinese characters using the radical method. Many Chinese characters are composed of smaller components (radicals). When a smaller character is placed inside a larger one, the latter’s meaning may be influenced by the former’s. For example, the character 口 (kou – mouth) appears in the characters 吃 (chi), 喝 (he), and 唱 (chang), meaning ‘eat,’ ‘drink,’ and ‘sing’ respectively. The researchers compared how participants performed when the radical method was explained before the experiment, at the beginning of the experiment, at the end of the experiment, or not at all. The results showed that explaining the radical method at the very beginning yielded the highest character retention after one week. This publication suggests that, when broken down and explained properly, Chinese characters are not as mysterious and elusive as once thought.

The second key article used in my research is a large survey of the dialects of Chinese. Though this publication is relatively old, the material presented by Victor H. Mair is not time-sensitive. In it, Mair summarizes what exactly makes a dialect of Chinese a dialect. He takes care to describe the differences between 普通话 (putonghua – standard Mandarin) and smaller, less frequently spoken tongues. The key takeaway is that the number of languages still spoken in China is vast and that the government, though it has tried, has been unable to unify every person in China under a single language. The article breaks down many dialects across China, analyzing the demographics of their speakers, some of their phonetic and syntactic characteristics, a brief history of their speakers, and the current administration’s stance. One point made throughout the piece is how extensive the variety within Chinese is, to the extent that even native speakers are unable to distinguish all of the dialects. It is therefore even more daunting for non-native speakers who are trying to learn.

The third key article that I will summarize is a recent publication by Grace Shao. Though it was not published in an academic journal, Shao is a long-time Beijing-based reporter for China Global Television Network (CGTN) with years of experience on the topic and an emphasis on current affairs and the cultures of fast-paced cities throughout the world. Her article, Chinese as a Second Language Growing in Popularity, explains just that. An increasing number of people are choosing to learn Chinese, but the increase in popularity is much more prominent in the younger generation. She explains that Chinese is available in up to 550 elementary and middle schools, which is an increase of over 100% in the last decade. The number of Chinese classes offered at universities also increased, but not nearly to the same extent.


My data collection was broken down into two halves. I believe it is important to present this methodology in a particular way in order to properly convey my intentions. 

My first research question, before any data collection, was slightly different from what it is now. Originally, I had intended to focus entirely on why Chinese is often not chosen as a second language. To help answer this question, I created a simple survey, from this point forward referred to as the ‘direct survey.’ This survey had only two questions: “Are you ethnically (wholly or partially) Chinese?” and “What makes Chinese so hard and why do you think people choose not to learn Chinese as a second language?” The survey was shared on my personal Facebook account and in a few private groups, and then posted publicly on three subreddits (/r/samplesize, /r/chineselanguage, and /r/languagelearning).

The second half of my research was significantly more involved. This portion focused on the sounds of dialects and their audible complexity. I created a list of sixteen words and five sentences in Chinese (hereinafter referred to as the ‘script’). These words and sentences were chosen specifically to cover a wide range of phonetic utterances as well as points where dialects are likely to diverge. Once the script was complete, I found nine individuals who could fluently speak eight different dialects of Chinese. Some of these individuals I know personally, some are students at UC Davis, and some were recruited via subreddits similar to those where the direct survey was posted. These nine individuals recorded their voices (or, in some cases, I recorded them) via WeChat, LINE messenger, or iMessage. The recordings were posted to a private YouTube channel so that I could export them as MP3 audio files. I then spliced all the audio so that it was succinct and evenly spaced: I removed long gaps, took out breaths, and reordered the recruits’ words when they deviated from the original script. The standard Mandarin script that I wrote is provided in the appendix. The altered scripts that some of the dialect speakers used are not included.

Once the audio splicing was complete, I took each recruit’s sentences and embedded them in another survey (hereinafter the ‘indirect survey’). This survey was again posted online in various places, but this time with the intent of finding non-Chinese-speaking participants. I created the indirect survey to gather non-Chinese speakers’ opinions on which dialects of Chinese sound most difficult. Their task was to rank each audio file from 1st (hardest) to 9th (easiest) in relation to the other eight.



Let us now lay out the results from both halves of my experiment. It is worthwhile to analyze them separately and then together. Not every response to the direct survey was useful in my data analysis. Because the survey was open-ended, respondents could write anything they wished. One individual submitted the survey after stating only that they were ethnically Chinese. Twenty-five responses were counted as valid. Of these, sixteen respondents identified as not ethnically Chinese, while the other nine identified as ethnically (partially or wholly) Chinese. Nine of the twenty-five mentioned that learning characters seemed especially difficult. Six mentioned that the use of tones in the language was off-putting. A further five did not use the word ‘tone’ but still suggested that the language sounded difficult. Eight mentioned that there may not be much interest in the language, or that it is not needed given where people live and their potential travel plans. Some of these responses went on to say that more people in the United States may be interested in Western languages such as French or Spanish. No respondent specifically used the word ‘dialect’ or expressed difficulty based on the variety of languages encompassed by the umbrella term ‘Chinese.’
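To put the direct-survey tallies on a common scale, the short Python sketch below converts the counts reported above into shares of the twenty-five valid responses. The theme labels are my own shorthand, not wording from the survey, and the only inputs are the counts stated in the text. Because the counts sum to more than twenty-five, at least some respondents must have raised more than one theme.

```python
# Counts as reported in the paragraph above; labels are illustrative shorthand.
N_VALID = 25

theme_counts = {
    "characters": 9,
    "tones": 6,
    "sounds difficult (no mention of tones)": 5,
    "low interest or need": 8,
    "dialect variety": 0,
}

# Share of the 25 valid responses that mentioned each theme, in percent.
shares = {theme: round(100 * count / N_VALID, 1)
          for theme, count in theme_counts.items()}

# Themes are not mutually exclusive, so mentions can exceed respondents.
total_mentions = sum(theme_counts.values())

for theme, share in shares.items():
    print(f"{theme}: {share}%")
print(f"total mentions: {total_mentions} across {N_VALID} responses")
```

Read this way, characters are the most frequently cited difficulty, while dialect variety is not cited at all.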

The dialects present in my study were as follows: Beijing, Nanjing, Chongqing, Wenzhou, Taiwanese Hokkien (hereinafter simply ‘Hokkien’), Quanzhang, Henan, and Cantonese. It is important to note that, from a Western view, some of these would be considered accents and others separate languages. However, the umbrella term that I am using in my study is ‘Chinese,’ which is all-encompassing. Furthermore, the Chinese government considers every language spoken within China and Taiwan to be a ‘dialect,’ with general Chinese as the parent tongue.

Because my study is focused on the variety of dialects and their phonetic features, I pursued the indirect survey further. Every dialect of Chinese (excluding indigenous population languages and non-Chinese regions of Inner Mongolia and far West China) shares the same script; the only differences are in the spoken components. My indirect survey was aimed at non-Chinese speakers; my goal was to test perceived audible difficulty among people who did not speak or understand any Chinese. Even though the participants in this survey had no knowledge of the Chinese language, some interesting patterns still emerged.

I have two different ways of presenting my data. The raw data from the indirect survey are unwieldy and hard to interpret: having each person rank nine different choices produces a wide array of potentially incoherent graphs. To remedy this and create more readable graphs, I grouped the rankings into three categories. Each option ranked 1st, 2nd, or 3rd was given the new category ‘1st.’ Each option ranked 4th, 5th, or 6th was given the new category ‘2nd.’ And finally, each option ranked 7th, 8th, or 9th was given the new category ‘3rd.’ Both resulting datasets are attached in the appendix.
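The grouping described above can be sketched in a few lines of Python. This is only an illustration of the rule (ranks 1 to 3, 4 to 6, and 7 to 9 collapse into ‘1st,’ ‘2nd,’ and ‘3rd’), not the procedure actually used to build the appendix datasets. The function names and the example ranks are hypothetical and are not survey data.

```python
from collections import Counter

BIN_LABELS = ("1st", "2nd", "3rd")

def bin_rank(rank: int) -> str:
    """Map a raw rank (1 = hardest ... 9 = easiest) to one of three categories."""
    if not 1 <= rank <= 9:
        raise ValueError(f"rank must be between 1 and 9, got {rank}")
    # Ranks 1-3 -> '1st', 4-6 -> '2nd', 7-9 -> '3rd'
    return BIN_LABELS[(rank - 1) // 3]

def bin_counts(ranks):
    """Count how many raw ranks for one recording fall into each category."""
    counts = Counter(bin_rank(r) for r in ranks)
    return {label: counts.get(label, 0) for label in BIN_LABELS}

# Hypothetical raw ranks given to one recording by six respondents.
example_ranks = [1, 3, 4, 6, 8, 9]
example_binned = bin_counts(example_ranks)
print(example_binned)
```

Grouping this way trades detail for readability: a recording ranked 1st and one ranked 3rd become indistinguishable, which is why the raw data are kept alongside the grouped data.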

Of the eight different dialects tested (with Chongqing repeated once), it was clear that the Beijing accent came across as the easiest to ‘understand,’ and the Wenzhou dialect as the hardest. This trend was clear not only in the grouped data but also in the raw data. Nearly half of all respondents ranked these two dialects as either 1st or 9th and nowhere in between. The rankings for both Hokkien and Cantonese were evenly distributed, meaning there was no real discernible difference in the difficulty ranking of these two dialects. Furthermore, the dialects from Henan and Chongqing shared similar patterns. Both had 9.7% of responses in the ‘1st’ category. The percentages in the ‘2nd’ category were 51.6% and 48.4% respectively, and those in the ‘3rd’ category were 38.7% and 41.9% respectively.
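As a rough consistency check, the percentages above can be converted back into respondent counts. The sketch below assumes that each percentage is a share of the 31 responses reported per recording in the appendix chart titles; that assumption is mine, and the text does not say which of the two Chongqing recordings the figures refer to.

```python
# Assumption: every percentage is a share of the 31 responses per recording.
N_RESPONSES = 31

reported = {
    "Henan": {"1st": 9.7, "2nd": 51.6, "3rd": 38.7},
    "Chongqing": {"1st": 9.7, "2nd": 48.4, "3rd": 41.9},
}

def implied_counts(percentages, n=N_RESPONSES):
    """Recover whole-number respondent counts from rounded percentages."""
    return {label: round(p / 100 * n) for label, p in percentages.items()}

counts = {dialect: implied_counts(p) for dialect, p in reported.items()}

for dialect, c in counts.items():
    # The three categories should account for every respondent.
    assert sum(c.values()) == N_RESPONSES
    print(dialect, c)
```

Under that assumption, the two dialects differ by a single respondent moving between the ‘2nd’ and ‘3rd’ categories, which is worth keeping in mind when judging how similar the patterns are with a sample of this size.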

This was an interesting trend, though it could simply be a coincidence. Henan province and Chongqing are not near each other geographically. The Henan dialect is influenced largely by the neighboring province of Hebei (which surrounds Beijing), so it is only slightly different from Standard Mandarin. The Chongqing dialect is highly distinctive; it makes heavy use of slurred speech patterns. For both, only a very small percentage of respondents believed the dialect sounded the most difficult. The data collected in this study are difficult to relate to any current scholarship. The majority of what I did was original research; what was not original was used as background to support my rationale for pursuing this project in the first place. I appear to have missed the mark in connecting the variation among these dialects to perceived difficulty and to a lack of interest among non-natives in learning the language, which was my original goal. I discuss a potential implication of this in the next section.


I believe it is important to be transparent about data collection and to mention not only the positive results but also any errors that occurred in the process. A few key things happened during my research that need to be mentioned, as the validity of some of the data may come under scrutiny. I will go through them step by step, in roughly chronological order.

When I was analyzing the results of the direct survey, I found that some participants did not elaborate on their opinions. Four responses were so brief that I did not include them in the analysis. Examples include “Because it’s difficult.” and “It’s really hard.” The full list of responses to the direct survey is attached in the appendix.

Next, as I touched on earlier, some of the dialects in the indirect survey are colloquially viewed as entirely separate languages. Their phonetics and syntax vary so significantly from Standard Mandarin that they are mutually unintelligible with it. A respondent pointed out to me that people may be confused about why both dialects and languages are included in my survey. Again, I treat ‘Chinese’ as an umbrella term and wanted to discuss the overall language family, because this paper is all about diversity.

Next, and probably most influential for my data, the names of all the dialects were left on the survey. This was not intentional; it simply had not crossed my mind, and it was pointed out to me by other individuals later. Because the indirect survey was directed at non-Chinese-speaking individuals, I was under the impression that leaving the names on the audio files would not be a problem. My line of thinking was that the average non-Chinese-speaking individual does not know enough about the regions of China or the names of the dialects for that to influence how they reported ‘difficulty.’ I realized after the survey was done and the data were all recorded that even foreigners are likely to know the name ‘Beijing’ and that it is the capital of the country, though that does not mean they know that the Beijing accent is the basis of Standard Mandarin.

There is not a large amount of previous research on Chinese language acquisition by foreigners, and even less on the influence of dialect variety on it. From the work that I have done and the data that I have collected thus far, it is difficult to conclude anything definitive about the effect that dialect variation has on Chinese as an L2. Because no clear connection between the two emerged, the findings here are not significant enough to call this research a success. I discuss potential future directions of this research in the next section.


I thoroughly enjoyed putting together this research, conducting my surveys, and seeing the results. This is a project that I would love to continue and refine in the future, potentially in graduate school. When this study reaches the next chapter of its life, I want the research question to be narrower. The framing of this research was somewhat too broad, and the research questions that I asked were difficult to answer concretely. Future research in this field will hopefully establish a more definitive connection between Chinese dialect variety and the difficulty of the language. The current study relied heavily on phonetic variables. Perhaps it would have been better to focus on either phonetics or dialect variety and its effects on Chinese as an L2 and its perceived difficulty. Combining both in one paper became somewhat cumbersome and hard to track.


Standard Mandarin Characters and Pronunciation

Chinese Characters














Ten is ten





























61078.05 块钱


61078.05 (monetary amount)



Milk tea

Standard Mandarin Characters 




How many people in this world do you think can speak Chinese?


That classroom has approximately 302 people.

遗憾的是, 我不能参加你的婚礼。

Regrettably, I am unable to attend your wedding. 

这书虽然旧得很, 但它仍然包含有用的信息。

Although this book is old, it still has useful information. 

我想问你一个问题 —  我可以吻你吗?

I want to ask you a question — can I kiss you? 

Response charts from the indirect survey, in the order shown: Dialects 6, 3, 5, 8, 7, 1, 4, 2, and 9 (31 responses each).

Responses From Direct Survey


In America, Spanish seems the be the second most relevant language. With the increase in Chinese foreigners, this could be changing.


Chinese is symbols, which seems like it would be harder to learn than just a language itself. However, I think most people don’t have the option of Chinese in schools.


Learning new characters is hard


It has a reputation for being hard for Americans because they have to learn the characters. 


It is much more difficult to speak than other languages, and is not as useful.


it's really hard


For me, I see myself more likely to travel to parts of Europe versus China which is why I have never tried to learn Chinese. 


It sounds super hard 


It seems very hard to start learning due to different characters and letters, as well as not knowing many people that speak Chinese in my personal life. 


I think they believe they won't be able to learn to read and write the characters and will struggle with the tones


The characters and the fact that there is no similarities between any of the words


The use of characters instead of an alphabet is a very foreign concept


I think the language and culture is wonderful, but l guess that it’s due to it being unfamiliar for foreigners first getting exposed. It just isn’t accommodating enough for foreigners.


Because of lack of education/opportunity. 


Maybe because is complex and completely different from English, it can discourage people   


It sounds very difficult and looks very difficult so it's challenging to wrap your head around assigning english meaning to Chinese words


I am Chinese, but Mandarin isn't my native language, and I don't know a lot of it. In my experience, Mandarin is hard. It's much different from English. It has tones which is absent in English. I can't hear the difference in the tones, so a lot of words sound similar to me, and I have to rely on context or ask to understand sometimes. I suppose foreigners would rather choose languages closer to their own, such as an English native speaker selecting Spanish. 


Chinese is a very difficult language for westerners. As someone who is partially chinese and have a parent that speaks Chinese it can still be hard if your native language is English. I also think that there is not a general trend for westerners to learn an East Asian language unless they are interested in the culture or something. I feel that more westerners will learn Spanish, French or some language like this because it is easier and more relatable while more Asian people might focus on their own region.


Because it's difficult.


It seems complicated and the alphabet is huge 


Proximity to China and its use in daily life makes it less attractive to learn (ex. Spanish is convenient in the US because of the Hispanic/Chicanx community). Chinese is also known to be really hard. 


The intonations are especially hard 


tonal languages are extremely difficult for non-native speakers to learn fluently. Also, western political propaganda has reduced the prestige of China internationally. It is for this reason that sinophiles with interests in chinese cinema and literature are far less common than weeaboos and Kpop stans.


Cuz english is more common and useful duh


They are most likely scared off by the structure of the language with the pinyin and how the characters look like 

Order of Dialect Numbers








Chongqing 1


Chongqing 2









