
Deep Learning in Korean Studies / Wayne de Fremery

Posted by 한국연구원

Deep Learning in Korean Studies[1]


 

What counts as deep learning in Korean studies? This is a usefully tricky question, especially when one wishes to consider the field’s future. The phrase deep learning has become an important double entendre in our time, suggesting both artificial forms of “intelligence” and deeply engaged forms of human knowing. What counts is similarly plural, entailing processes associated with counting (who or what does it) and its consequences, especially who and what are made to count (i.e. matter). The meaning of Korean studies is as usefully amorphous as ever.

What follows is meditative rather than expository. It is an essay in the root sense of the term: an experiment. It touches only briefly on a range of complex topics: how we might think about Korean studies as a field, the ways that the field can be distinguished from others, the infrastructure that supports it, artificial intelligence and research, and the kinds of deep learning we might wish to facilitate as a community. A central hypothesis will hold my attention. It is uncomfortably simple: copies and practices related to copying shape scholarly infrastructure in Korean studies, as they do in other fields.

A corollary to this hypothesis is that we as a community might more explicitly recognize bibliography, that old art and science of accounting for copies through enumeration, description, critique, and additional careful copying,[2] as a resource as the field moves into the future. As I describe briefly in my conclusion, bibliography can help us to attend to the material objects and processes that shape the scholarly infrastructure supporting Korean studies. As I hope to suggest, a future in which we more explicitly attend to our philological commitments through bibliographical investigation will likely facilitate more powerful forms of artificial intelligence relevant to the study of Korea and, most importantly, more deeply human ways to know Korea. More intentionally utilizing tools from bibliography, we are likely to be able to facilitate deep learning in both of its contemporary senses. 

 

Korean Studies

Benedict Anderson has made the case that nations can, at least in part, be understood as opportunities for individuals to imagine themselves as part of a community.[3] He identifies a material mechanism that facilitates this kind of imaginative process: print capitalism, especially the production of newspapers. Implicit in Anderson’s analysis is the idea that engagements with copies created with fidelity at regular intervals and at industrial scale can enable individuals to collectively imagine a national community. Korean studies, I’ve come to think, can be understood in a similar way, as an imagined community. Rather than daily newspapers, copies of publications like this one allow us to imagine a community of people who share an interest in the contested intellectual geographies that formulate Korea.

The deep learning displayed in copies of our academic journals is another example of how Korean studies is supported by shared practices of copying and considering copies. Take the 2023 issue of Korean Studies, which includes a special section devoted to the digital humanities (as well as a copy of this essay). The articles in that special section are disciplinarily diverse, in a way not dissimilar to Korean studies as a field and to the eclectic news items of Anderson’s community-building mechanism. Despite this diversity, each one is made possible by the collection, creation, and consideration of digital representations of historical phenomena, i.e. digital copies. The digital materiality of these copies and their similarity to the phenomena they copy facilitate the arguments about Korea made in the special section. It is research that could not be done without digital representations of Korean documents and digital simulacra of Korean places and people. That simple fact helps to reveal how copies serve as infrastructure for the kind of learning these articles make possible. The digital copies and the specifics of their materiality, together with the creativity and insight of the authors, help to formulate what can be asserted about Korea and what we, as readers, can learn. And we learn so much!

Digital copies now contribute powerfully to formulating what can be known and learned about Korea. In the same way, copies produced with clay, bamboo, stone, or paper have powerfully shaped (and continue to shape) what can be formulated as knowledge in Korean studies. The special section of Korean Studies is just one of many indications that knowledge practices associated with the study of Korea are changing. By foregrounding what can be learned by producing and manipulating digital copies, the section also implicitly emphasizes the important ways that other material forms of mediation have shaped (and continue to shape) the ways that we can know and debate Korea.

 

Our Diverse Disciplinary Community

Considering copies can also help us to consider the plurality of our community and the ways it intersects with others. Since graduate school, a great deal of my research has concerned early twentieth-century Korean poetry. If I can claim any expertise, any “deep learning” in the subject, it will be derived largely from the hours and years I’ve spent exploring, considering, and comparing a diverse variety of copies of Korean poems. My interest in poets and poiesis shapes the ways that I have become close with the many copies I have studied.

Depending on our lived experience, central interests, and disciplinary training, we will each have become intimate in distinct ways with the copies that inform our learning, as distinct as the diverse types of copies we study. But however tangential the association may be, the representations with which we have undertaken our research will have had some relationship to Korea, even if only a fleeting or imagined one. We may be considering early twentieth-century books of poetry. We may be considering the revival of classical learning in Renaissance Europe as it might relate to the revival and reformulation of Confucianism in Song China and Chosŏn Korea, to say nothing of how these revivals and reformations have informed the various literary, historical, and socio-political revolutions of the early twentieth century or the ones we are currently living through. Either way, the assessment and production of reproductions will have been central to what we have learned and how we learned it. We know Confucius and Aristotle as we do because what they said or wrote, or what we imagine they said or wrote, has been copied and recopied. The depth of our knowledge of them (mine not so deep) is a function of how many different copies of each we have encountered and how we have become intimate with each.

Made up of historians and literary specialists, sociologists and linguists, and curators and archivists of every kind, our Korean studies community is as disciplinarily diverse as any nation is socio-politically diverse. Its members are as diversely intimate with what represents the objects of our interest. This variegated intimacy grounds our learning and formulates the ways that we know our objects of study and ourselves as Koreanists. It also shapes how we see ourselves in relation to others.

 

Information Science

Copies are, of course, central to other imagined communities of study, ones that we encounter and inhabit, if only marginally and intermittently. Together with my colleague Michael Buckland, I have recently argued that copies and copying are central to the sprawling community associated with information science.[4] Claude Shannon’s seminal 1948 paper, “A Mathematical Theory of Communication,” gives us his definition of information as entropy. It is also, for example, essentially a theory of how to copy messages. Shannon’s theory of communication concerns the copying of information at a source so that it is available at a destination (see Fig. 1).

 

Fig. 1. “Schematic diagram of a general communication system,” redrawn by the author based on Shannon, “A Mathematical Theory of Communication,” Bell System Technical Journal (July 1948): 381, accessed 3 January 2020, https://archive.org/details/bellsystemtechni27amerrich/page/n9
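Shannon’s framing of communication as copying can be illustrated with a toy sketch. The following Python is an illustration under stated assumptions, not Shannon’s own formulation. It computes entropy, H = -Σ p(x) log2 p(x), using a message’s observed symbol frequencies as a stand-in for the source’s probabilities. It then passes the message through the stages of the schematic (transmitter, channel, receiver) to the destination. The function names and the bit-flipping `noisy_channel` (a binary symmetric channel with a chosen flip probability) are hypothetical choices made for this example.

```python
import math
import random
from collections import Counter


def entropy_bits(message: str) -> float:
    """Shannon entropy, in bits per symbol, of the message's observed symbol frequencies."""
    if not message:
        return 0.0
    total = len(message)
    return -sum((n / total) * math.log2(n / total) for n in Counter(message).values())


def noisy_channel(bits: str, flip_prob: float, rng: random.Random) -> str:
    """Toy binary symmetric channel: each bit is flipped with probability flip_prob."""
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < flip_prob else b
        for b in bits
    )


def transmit(message: str, flip_prob: float = 0.0, seed: int = 0) -> str:
    """Source -> transmitter -> channel -> receiver -> destination."""
    rng = random.Random(seed)
    signal = "".join(f"{byte:08b}" for byte in message.encode("utf-8"))  # transmitter: encode as bits
    received = noisy_channel(signal, flip_prob, rng)                       # channel: may add noise
    data = bytes(int(received[i:i + 8], 2) for i in range(0, len(received), 8))  # receiver: decode
    return data.decode("utf-8", errors="replace")                          # the destination's copy


print(entropy_bits("abab"))          # two equally frequent symbols: 1.0 bit per symbol
print(transmit("abab") == "abab")    # a noiseless channel delivers a faithful copy: True
```

Raising `flip_prob` above zero makes the destination’s copy diverge from the source. That divergence is the engineering problem Shannon’s theory addresses.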

For Shannon and many who identify as information scientists, meaning and purpose are situated in the engineering problems associated with copying, and with preserving the copies that enable and formulate human knowledge. They are not situated in attempts to establish the significance of particular copies, as we might do in Korean studies when we explain the meaning of a poem or summarize economic data found in a government report. Tefko Saracevic, for example, has written, “The domain of information science is the transmission of the universe of human knowledge in recorded form, centering on manipulation (representation, organization, and retrieval) of information, rather than knowing information.”[5] A distinct relationship with copies shapes information science as a field, just as a similarly distinct relationship with copies shapes Korean studies.

 

Infrastructures

Acknowledging how information scientists consider and produce copies helps us to see how copies and copying, and increasingly digital copies and copying, are embedded in our learning practices as a kind of infrastructure. Indeed, embeddedness is a key element in Susan Leigh Star’s well-known framework for thinking about infrastructure.[6] Copies and copying become transparent: they are difficult to see precisely because they are so obviously important to practice in Korean studies. Copies have reach and scope. They are formulated to be elsewhere and to suggest what they have been made to represent, i.e. what they copy.

The ways that we as Koreanists attend to copies and copying are learned as part of membership in our diverse community. That membership is orchestrated by various university exam systems, language proficiency requirements, scholarly review boards, and academic committees. These ways of attending also have links with our evolving conventions of practice, such as the methodologies we use to formulate our investigations, the academic genres we deploy to present our discoveries, and even our Romanization practices. Methods for considering and producing copies become standards of practice, standards that are fashioned incrementally on established bases of knowledge. As with other infrastructures, previous forms of copying “break down” and new ones are built on top of them. Consider how our academic monographs and journals have been shaped to resemble those of other disciplines and fields. Consider, too, how the monograph in North America and the journal article in South Korea have become essential currency in the community, and how even our newer digital publications resemble the shapes and forms of their analog predecessors.
Print and manuscript copies, along with other non-digital simulacra, function as an established base for much of the research we conduct in Korean studies. They will always support rich and diverse modes of scholarly investigation, even if, as infrastructure, they have “broken down.” On their own, they cannot facilitate the many kinds of inquiry that are only possible with our newer forms of digital reproduction.

 

Deep Learning and Artificial Intelligence

Recognizing the infrastructural role played by copies in human learning in general, and Korean studies specifically, makes it easier to see and conceptualize the infrastructural role that copies and copying play in artificial forms of “intelligence.” To suggest that digital copies are integral to evolving forms of artificial intelligence and deep learning is to state the obvious. It is also to recognize the difficulty of seeing obvious infrastructures that support diverse work in diverse communities of practice, as well as the opportunities and challenges presented by shared infrastructures that can be used differently by different communities.

The terms artificial intelligence (AI), machine learning (ML), and deep learning (DL) are often used interchangeably. In the popular press, AI can mean just about any kind of computerized analysis.[7] But experts will make distinctions between “general,” truly human-like forms of intelligence, and “narrow” artificial intelligence. As sophisticated as AI systems have become, they are still “narrow” “mathematical method[s] of prediction”[8] enabled by, yes, fancy copying. When I use the term artificial intelligence below, I mean mathematical methods of prediction.

The relationship between AI and ML has been an intimate one since AI was formulated as its own field. As the data scientists John Kelleher and Brendan Tierney note, in the early stages of artificial intelligence’s development the term “machine learning” was being used to “describe programs that gave a computer the ability to learn from data.”[9] Machines “learn” by comparing data and keeping a record of their comparisons. A machine has “learned” something when it has created a record of comparisons that usefully describes the relationship between categories of data. These descriptions are often called “models,” which is what I mean when I use the term. In the more formal language of Kelleher and his colleagues, “Machine learning algorithms automate the process of learning a model that captures the relationship between the descriptive features and the target feature in a dataset.”[10]
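As an illustration, here is a minimal Python sketch of what “learning a model” from descriptive features and a target feature can mean. The nearest-centroid method is a deliberately simple stand-in chosen for this example. It is not the method Kelleher and his colleagues present, nor the one used in the National Library of Korea project. The feature values and labels are hypothetical. The “record of comparisons” the machine keeps is simply the average feature vector observed for each target.

```python
from collections import defaultdict
from math import dist


def learn_model(rows):
    """Learn a model from (descriptive_features, target) pairs.

    The 'model' is a record summarizing the data: the mean feature
    vector (centroid) observed for each target category.
    """
    sums = {}
    counts = defaultdict(int)
    for features, target in rows:
        if target not in sums:
            sums[target] = [0.0] * len(features)
        sums[target] = [s + f for s, f in zip(sums[target], features)]
        counts[target] += 1
    return {t: [s / counts[t] for s in sums[t]] for t in sums}


def predict(model, features):
    """Predict the target whose centroid is closest (Euclidean distance) to the features."""
    return min(model, key=lambda t: dist(model[t], features))


# Hypothetical training data: two descriptive features per example
# (say, proportion of dark pixels and width-to-height ratio) and a target label.
training = [
    ([0.80, 1.0], "glyph"),
    ([0.70, 0.9], "glyph"),
    ([0.10, 0.3], "smudge"),
    ([0.20, 0.2], "smudge"),
]
model = learn_model(training)
print(predict(model, [0.75, 0.95]))   # closest to the "glyph" centroid
```

The design choice is pedagogical. Every number in the learned model can be traced back to the examples that produced it, which makes the “record of comparisons” easy to inspect.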

These “descriptive” and “target” features can be anything. Recently my colleague Sanghun Kim and I organized sets of features that enabled us to develop deep learning models capable of automating the transcription of images of rare periodicals in the National Library of Korea.[11] We were tasked with enhancing the descriptive metadata for the library’s periodicals. We were also asked to develop an economical way to transcribe them so that researchers and patrons could have “full text” digital copies.

The first set of “descriptive” and “target” features that Sanghun and I developed was used to identify the regions of periodical page images that contain colophons. We planned to transcribe the colophon information from each periodical issue we received. The aim was to enhance the library’s standard author–publisher–date-of-publication metadata with information about who was responsible for printing a particular issue and where it was printed. The second set of descriptive and target features was used to identify meaningful elements in the colophons, for example han’gŭl glyphs or hancha as opposed to smudges. We also created descriptive and target features for categorizing the meaningful elements of the broader bibliographical systems embodied by the library’s periodicals. That is, we created descriptive and target features for individual han’gŭl syllables and Sino-Korean glyphs, as well as punctuation marks. This allowed an image of a printed glyph to be associated with an appropriate target, such as the character 印.

An obvious but non-trivial point follows. We built these deep learning models using these data and algorithms associated with convolutional neural networks. They were built using digital copies (digital images) with the aim of producing new copies (encoded text).
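The two stages described above can be sketched in Python: locating a colophon region, then assigning each glyph image a target character. This is a hypothetical toy, not the project’s code, and it rests on several assumptions:

- the “page” is a tiny black-and-white bitmap;
- the colophon’s bounding box is given rather than predicted;
- glyph cells have a fixed width;
- template matching (counting agreeing pixels) stands in for the convolutional neural networks the project used;
- the template shapes are invented.

It makes the same point as the paragraph: a digital copy (an image) goes in, and a new copy (encoded text) comes out.

```python
# Hypothetical 3x3 bitmap templates (1 = ink, 0 = paper) mapped to target characters.
TEMPLATES = {
    "印": ((1, 1, 1), (1, 0, 1), (1, 1, 1)),
    "一": ((0, 0, 0), (1, 1, 1), (0, 0, 0)),
}


def crop(page, top, left, height, width):
    """Stage 1 (stand-in): cut the colophon region out of the page image."""
    return [row[left:left + width] for row in page[top:top + height]]


def split_cells(region, cell_width):
    """Split a one-line region into fixed-width glyph cells (an assumption of this sketch)."""
    return [
        tuple(tuple(row[c:c + cell_width]) for row in region)
        for c in range(0, len(region[0]), cell_width)
    ]


def classify(cell):
    """Stage 2 (stand-in): choose the template that agrees with the cell on the most pixels."""
    def agreement(template):
        return sum(a == b for trow, crow in zip(template, cell) for a, b in zip(trow, crow))
    return max(TEMPLATES, key=lambda ch: agreement(TEMPLATES[ch]))


def transcribe(page, box, cell_width=3):
    """Image copy in, text copy out. box = (top, left, height, width)."""
    region = crop(page, *box)
    return "".join(classify(cell) for cell in split_cells(region, cell_width))


# A 5x8 'page' whose colophon region (rows 1-3, columns 1-6) holds two glyphs.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(transcribe(page, (1, 1, 3, 6)))   # prints 印一
```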

A less obvious but also non-trivial point is that the algorithms used to build deep learning models themselves rely on sophisticated kinds of copying to do their work. Machine learning enables computers to automatically identify (i.e. learn) patterns that map descriptive features to targets. Deep learning is a specific kind of machine learning that enables machines to identify for themselves which patterns in the features best distinguish one target from another. The process is called “deep learning” because it is organized in many successive layers. Each representation of a feature is recursively re-represented by simpler representations to identify which part of the descriptive feature is best associated with a “target.” Deep learning automates the process of nesting and networking increasingly simpler representations (copies of less fidelity) inside of more complex representations. In this way it builds complex descriptions that can be associated with particular targets. The process creates copies (rather than turtles) all the way down, copies that facilitate the predictive powers of deep learning.
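The layered re-representation described above can be illustrated with a small forward pass in Python. This is a toy resting on several assumptions. It is not a trained network and not one of the models built for the library:

- the kernel weights are hand-picked rather than learned;
- the input is a single hypothetical row of pixel intensities;
- the layer structure (convolution, ReLU, max-pooling, repeated twice) was chosen only because it is typical of convolutional networks.

Each layer’s output is kept, so the successive, progressively coarser representations can be inspected. In the essay’s terms, these are the “copies of less fidelity.”

```python
def conv1d(signal, kernel):
    """'Valid' 1-D convolution (cross-correlation, as most deep learning libraries compute it)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) for i in range(len(signal) - k + 1)]


def relu(values):
    """Rectified linear unit: keep positive responses, set the rest to zero."""
    return [max(0.0, v) for v in values]


def max_pool(values, size=2):
    """Keep the strongest response in each non-overlapping window (roughly halving the length)."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def forward(signal, kernels):
    """Apply conv -> ReLU -> pool once per kernel; return every intermediate representation."""
    representations = [signal]
    for kernel in kernels:
        signal = max_pool(relu(conv1d(signal, kernel)))
        representations.append(signal)
    return representations


# Hypothetical input: one row of a glyph image (1.0 = ink) and two hand-picked kernels.
row = [0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]
edge_detector = [-1.0, 1.0]     # responds where ink begins
stroke_detector = [1.0, 1.0]    # sums neighboring responses from the previous layer

for depth, rep in enumerate(forward(row, [edge_detector, stroke_detector])):
    print(depth, len(rep), rep)
```

Here the representations shrink from 12 values to 5 to 2. In a real network, the kernels would be learned from labeled examples rather than chosen by hand.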

Finally, it is important to point out that this work with Sanghun did more than enhance the library’s metadata. It did more, too, than create deep learning models capable of locating colophons in collections of images of rare Korean periodicals and transcribing them. It also deepened our understanding of Korean publishing and print culture by identifying hundreds of printing companies and printers responsible for producing the periodicals, many of whom were previously unknown. As I describe in a pair of publications,[12] the work enabled deep learning in both senses of the term.
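Once colophons have been transcribed, part of the bibliographical work of enumeration can be automated. The following is a minimal Python sketch of that tallying step. It assumes that each transcribed colophon has already been parsed into a record with hypothetical fields `printer` and `print_shop`. The names in the example are placeholders, not data from the project.

```python
from collections import Counter
from typing import NamedTuple


class Colophon(NamedTuple):
    """Hypothetical structured record extracted from one transcribed colophon."""
    periodical: str
    issue: str
    printer: str
    print_shop: str


def tally(colophons):
    """Count how many issues each printer and each print shop is credited with."""
    printers = Counter(c.printer for c in colophons)
    shops = Counter(c.print_shop for c in colophons)
    return printers, shops


def new_names(colophons, known_printers):
    """List printers named in the colophons who are absent from an existing list of known names."""
    return sorted({c.printer for c in colophons} - set(known_printers))


# Placeholder records, not project data.
records = [
    Colophon("Periodical A", "no. 1", "Printer 1", "Shop X"),
    Colophon("Periodical A", "no. 2", "Printer 1", "Shop X"),
    Colophon("Periodical B", "no. 1", "Printer 2", "Shop Y"),
]
printers, shops = tally(records)
print(printers.most_common())                             # [('Printer 1', 2), ('Printer 2', 1)]
print(new_names(records, known_printers=["Printer 1"]))   # ['Printer 2']
```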

 

Philology in a New Key with a Bibliographical Bent

When copies and the process of copying are acknowledged as essential infrastructure supporting both human and artificial modes of learning, we are better positioned to consider how deeply entangled the two modes of learning are becoming. We are presented with an opportunity to consider the infrastructural mechanisms that formulate Korean studies as a community, the community’s evolving relationships with other communities, and even what counts as deep learning in Korean studies. We are situated to consider how the infrastructures that support deep learning in both its human and artificial forms are entangled with (and contribute to) formulating what can be known about Korea by counting and accounting for diverse representations of Korea. To take just one of many possible examples, we can learn something about the people who formulated representations of Korea in print by counting printers and the printshops in which they worked.

Tools and methodologies from information and computer science increasingly facilitate our explorations of Korea. We are positioned to see that the reverse should also hold: our human forms of deep learning can (and arguably should) inform the methodologies of information and computer science. Those forms of learning are enabled by our intimate understanding of the people, objects, and events that we have taken to represent Korea. We are situated to undertake our explorations of Korea knowing that our old philological tools will serve us well in an age of algorithms and artificial intelligence. These algorithms and intelligences are formulated by copies and the socio-mechanical processes that produce them. We know these things and processes intimately, even if they are not top of mind, because they serve as infrastructure for our work and for the ways that we know Korea.

Indeed, philology in a “new key” with a bibliographical bent may provide a useful framework for exploring the new scholarly horizons brought into view by our evolving relationship with what represents Korea, i.e. what presents Korea again. Where philology can be thought of as the “multifaceted study of texts, languages and the phenomenon of language itself,”[13] philology in a new key suggests “procedures for investigating the ‘implicate order’ of human memory and its material representations.”[14] The bibliographical bent entails globally diverse and historically informed recursive practices of enumeration, description, analysis, and critique.[15] These practices are brought to bear on human memory and its material representations, i.e. on what has been cared about enough to be made available elsewhere through copies. Bibliography is recursive: it counts and accounts for what counts as a meaningful representation in the contexts provided by Korea. That recursive character can help us to be better informed about what and how we are learning.

Analytical bibliography attends to mechanical processes of reproduction. That attention can be leveraged to enable even deeper engagements with the mechanical processes of digital copying, with artificial intelligence, and with the ways that both shape learning. Critical bibliography provides a richly contentious discourse about what should be copied and how. It is a discourse that makes apparent just how fraught and consequential decisions are when they concern who or what will be made available elsewhere, and to the future, through reproductions. Philology in a new key with a bibliographical bent presents Koreanists with an opportunity to learn more deeply and to understand better how we learn.
I hope my description of the work my colleague and I undertook for the National Library of Korea has intimated what such a future might hold. A future in which Koreanists more self-consciously investigate their philological commitments through bibliographical investigation will likely facilitate the creation of powerful forms of artificial intelligence relevant to the study of Korea. Most importantly, it will likely facilitate more deeply engaging human ways to learn about Korea.


 

 

Annotation


[1] Apropos of one of its central themes, this essay reproduces pieces and parts of previous articles. A previous version of this essay initially appeared as “What Counts as Deep Learning in Korean Studies?” Korean Studies 47 (2023): 300–311. I would like to thank the Editorial Board of Korean Studies for the use of my article “Epilogue: What Counts as Deep Learning in Korean Studies?” published in the journal’s 2023 Special Section on Digital Korean Studies. Versions of pieces and parts of the essay have also appeared in several other publications. Rather than creating numerous citations to previous work, I wish to acknowledge and thank the editors and organizers of the following conferences and publications for the opportunity to present my work in their venues:
Wayne de Fremery, “Teaching Computers to Read Korean: Big Data and Artificial Intelligence at Adan Mun’go,” Muncha wa sasang 3 (2018): 107–115;
Wayne de Fremery, “Twenty-First-Century Pleasures: Some Notes on Form, Media Transformations, and Korean Literary Translation,” Translation Review 108 (2021): 78–103;
Wayne de Fremery, “Opportunities for Deep Learning: Early-to-mid Twentieth-Century Korean Periodicals,” presented at the “New Perspectives on the History of Books and Reading in Korea” conference, Harvard University, December 8, 2022;
Wayne de Fremery, Cats, Carpenters, and Accountants: Bibliographical Foundations of Information Science (Cambridge, Massachusetts and London, England: MIT Press, 2024);
Wayne de Fremery, “Comparative Global—Digital—Humanities,” History of Humanities 9, no. 1 (2024): 115–128;
Wayne de Fremery and Michael Buckland, “Copy Theory,” Journal of the Association for Information Science and Technology 73, no. 3 (2022): 407–418.

[2] For definitions of bibliography, see de Fremery, Cats, Carpenters, and Accountants.

[3] See Benedict Anderson, Imagined Communities: Reflections on the Origins and Spread of Nationalism, 2nd ed. (New York and London: Verso, 2006).

[4] See Wayne de Fremery and Michael Buckland, “Copy Theory” and de Fremery, “Twenty-First-Century Pleasures: Some Notes on Form, Media Transformations, and Korean Literary Translation.”

[5] Tefko Saracevic, “Information Science,” in Encyclopedia of Library and Information Science, 4th ed., eds. John McDonald and Michael Levine-Clark (Boca Raton: CRC Press, 2018), 2216.

[6] Infrastructure, according to Star, is embedded and transparent. It has “reach or scope” and is “learned as part of membership.” It has “links with conventions of practice.” It facilitates standards but is also an “embodiment of standards.” It is built on what Star calls “an installed base” and becomes “visible upon breakdown.” It can be fixed in “modular increments,” but “not all at once or globally.” Geoffrey Bowker and Susan Leigh Star, Sorting Things Out: Classification and Its Consequences (Cambridge, MA and London, England: MIT Press, 1999), loc. 572 of 4690, Kindle, citing Susan Leigh Star and Karen Ruhleder, “Steps Toward an Ecology of Infrastructure: Design and Access for Large Information Spaces,” Information Systems Research 7 (1996): 111–134.

[7] Mariya Yao, Adelyn Zhou, and Marlene Jia, Applied Artificial Intelligence: A Handbook for Business Leaders (N.p.: Topbots, 2018), 8.

[8] Mariya Yao, Adelyn Zhou, and Marlene Jia, Applied Artificial Intelligence, 8.

[9] John D. Kelleher and Brendan Tierney, Data Science (Cambridge, MA and London, England: MIT Press, 2018), 14. Kindle.

[10] John D. Kelleher, Brian Mac Namee, and Aoife D'Arcy, Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies (Cambridge, Massachusetts and London, England: MIT Press, 2015), locs. 475–476 of 13053. Kindle.

[11] See Wayne de Fremery et al., Han’gukhyŏng ingong chinŭng kwanghak muncha insik (AI OCR) palchŏn ŭihan yŏn’gu 한국형 인공지능 광학적문자인식(AI OCR) 발전을 위한 연구 (Toward the development of a Korean AI optical character recognition system)  (Seoul: National Library of Korea, 2021).

[12] de Fremery et al., Han’gukhyŏng ingong chinŭng kwanghak muncha insik (AI OCR) palchŏn ŭihan yŏn’gu and de Fremery, “Opportunities for Deep Learning: Early-to-mid Twentieth-Century Korean Periodicals.”

[13] James Turner, Philology: The Forgotten Origins of the Modern Humanities (Princeton: Princeton University Press, 2014), loc. 115 of 19247, Kindle.

[14] Jerome McGann, A New Republic of Letters: Memory and Scholarship in the Age of Digital Reproduction (Cambridge, MA and London, England: Harvard University Press, 2014), 3. Kindle.

[15] For a description of these key elements of bibliography, see Wayne de Fremery, Cats, Carpenters, and Accountants: Bibliographical Foundations of Information Science (Cambridge, Massachusetts and London, England: MIT Press, 2024).


 

Works Referenced


Anderson, Benedict. Imagined Communities: Reflections on the Origins and Spread of Nationalism, 2nd ed. New York and London: Verso, 2006.

Bowker, Geoffrey, and Susan Leigh Star. Sorting Things Out: Classification and Its Consequences. Cambridge, Massachusetts and London, England: MIT Press, 1999. Kindle.

Broussard, Meredith. Artificial Unintelligence: How Computers Misunderstand the World. Cambridge, Massachusetts and London, England: MIT Press, 2018. Kindle.

de Fremery, Wayne. “Teaching Computers to Read Korean: Big Data and Artificial Intelligence at Adan Mun’go.” Muncha wa Sasang 3 (2018): 107–15.

de Fremery, Wayne. “Twenty-First-Century Pleasures: Some Notes on Form, Media Transformations, and Korean Literary Translation.” Translation Review 108 (2021): 78–103.

de Fremery, Wayne. “Opportunities for Deep Learning: Early-to-mid Twentieth-Century Korean Periodicals.” Paper presented at the “New Perspectives on the History of Books and Reading in Korea” conference, Harvard University, December 8, 2022.

de Fremery, Wayne. “Comparative Global—Digital—Humanities.” History of Humanities 9, no. 1 (2024): 115–128.

de Fremery, Wayne. Cats, Carpenters, and Accountants: Bibliographical Foundations of Information Science. Cambridge, Massachusetts and London, England: MIT Press, 2024.

de Fremery, Wayne, and Michael Buckland. “Copy Theory.” Journal of the Association for Information Science and Technology 73, no. 3 (2022): 407–418.

de Fremery, Wayne, et al. Han’gukhyŏng ingong chinŭng kwanghak muncha insik (AI OCR) palchŏn ŭihan yŏn’gu 한국형 인공지능 광학적문자인식(AI OCR) 발전을 위한 연구 (Toward the development of a Korean AI optical character recognition system). Seoul: National Library of Korea, 2021.

Kelleher, John D., and Brendan Tierney. Data Science. Cambridge, MA and London, England: MIT Press, 2018. Kindle.

Kelleher, John D., Brian Mac Namee, and Aoife D’Arcy. Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies. Cambridge, Massachusetts and London, England: MIT Press, 2015. Kindle.

McGann, Jerome. A New Republic of Letters: Memory and Scholarship in the Age of Digital Reproduction. Cambridge, MA and London, England: Harvard University Press, 2014. Kindle.

Saracevic, Tefko. “Information Science.” In Encyclopedia of Library and Information Science, 4th ed., edited by John McDonald and Michael Levine-Clark, 3105–3116. Boca Raton: CRC Press, 2018.

Shannon, Claude E. “A Mathematical Theory of Communication.” The Bell System Technical Journal 27 (July and October 1948): 379–423, 623–656. https://archive.org/details/bellsystemtechni27amerrich/page/n9

Star, Susan Leigh, and Karen Ruhleder. “Steps Toward an Ecology of Infrastructure: Design and Access for Large Information Spaces.” Information Systems Research 7 (1996): 111–34.

Turner, James. Philology: The Forgotten Origins of the Modern Humanities. Princeton: Princeton University Press, 2014. Kindle.

Yao, Mariya, Adelyn Zhou, and Marlene Jia. Applied Artificial Intelligence: A Handbook for Business Leaders. N.p.: Topbots, 2018.


Professor | Dominican University of California

