Channel: Active questions tagged r - Stack Overflow

Import and transform multiple xml files into DF


I am trying to create a routine for importing numerous XML files from a given directory. I may have to import over a thousand XML files at a time and turn them into a data frame. I have already written the import routine for a single file:

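library(tidyverse)
library(xml2)
setwd("D:/")
# Read one file and collect every <test:billing> node
page <- read_xml("base.xml")
ns <- page %>% xml_find_all(".//test:billing")
# Round-trip through JSON so that jsonlite simplifies the nested list into a data frame
billing <- xml2::as_list(ns) %>% jsonlite::toJSON() %>% jsonlite::fromJSON()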

My example XML:

<?xml version="1.0" encoding="ISO-8859-1" ?>


<test:TASS xmlns="http://www.vvv.com/schemas"  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  xsi:schemaLocation="http://www.vvv.com/schemas http://www.vvv.com/schemas/testV2_02_03.xsd"  xmlns:test="http://www.vvv.com/schemas">
    <test:house>
                <test:billing>
                    <test:proceduresummary>
                        <test:guidenumber>X2030</test:guidenumber>
                            <test:diagnosis>
                                <test:table>ICD-10</test:table>
                                <test:diagnosiscod>J441</test:diagnosiscod>
                                <test:description>CHRONIC OBSTRUCTIVE PULMONARY DISEASE WITH (ACUTE) EXACERBATION</test:description>
                            </test:diagnosis>
                            <test:procedure>
                                <test:procedure>
                                    <test:description>HOSPITAL</test:description>
                                </test:procedure>
                                <test:amount>12</test:amount>
                            </test:procedure>
                    </test:proceduresummary>
                </test:billing>
                    <test:billing>
                    <test:proceduresummary>
                        <test:guidenumber>Y6055</test:guidenumber>
                            <test:diagnosis>
                                <test:table>ICD-10</test:table>
                                <test:diagnosiscod>I21</test:diagnosiscod>
                                <test:description>ACUTE MYOCARDIAL INFARCTION</test:description>
                            </test:diagnosis>
                            <test:procedure>
                                <test:procedure>
                                    <test:description>HOSPITAL</test:description>
                                </test:procedure>
                                <test:amount>8</test:amount>
                            </test:procedure>
                    </test:proceduresummary>
                </test:billing>
                    <test:billing>
                    <test:proceduresummary>
                        <test:guidenumber>Z9088</test:guidenumber>
                            <test:diagnosis>
                                <test:table>ICD-10</test:table>
                                <test:diagnosiscod>F20</test:diagnosiscod>
                                <test:description>SCHIZOPHRENIA</test:description>
                            </test:diagnosis>
                            <test:procedure>
                                <test:procedure>
                                    <test:description>HOSPITAL</test:description>
                                </test:procedure>
                                <test:amount>1</test:amount>
                            </test:procedure>
                    </test:proceduresummary>
                </test:billing>
    </test:house>
</test:TASS>

All files in the directory ("D:/") follow a naming pattern, for example:

20215_ABFF20.xml
35700_38HY9R.xml
38597_40YY9J.xml
99853_99PP1Z.xml
115341_663QQP.xml

However, I do not want to import every file in this directory ("D:/"), only a few of them. Unfortunately, the names I have for the files I want differ from the file names in the directory. For example, the files I am interested in are identified as ABFF20 and 38HY9R. Note that these identifiers match the part of the file name after the underscore.
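The matching rule described above (compare the part of the file name after the underscore with a list of identifiers) can be sketched in base R as follows. This is only an illustration: the `files` vector is typed in by hand from the example names, and `wanted` is a hypothetical vector holding the identifiers of interest.

```r
# File names copied from the example list (normally the result of list.files()).
files <- c("20215_ABFF20.xml", "35700_38HY9R.xml", "38597_40YY9J.xml",
           "99853_99PP1Z.xml", "115341_663QQP.xml")

# Hypothetical vector with the identifiers of the files to import.
wanted <- c("ABFF20", "38HY9R")

# Drop the extension, then drop everything up to and including the first underscore.
ids <- sub("^[^_]*_", "", tools::file_path_sans_ext(files))

# Keep only the files whose identifier is in the wanted list.
selected <- files[ids %in% wanted]
print(selected)
```

Matching on the extracted identifier with `%in%`, rather than searching each name with a pattern such as `grepl("ABFF20", files)`, avoids accidental hits when one identifier happens to be a substring of another file name.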


As a first step, I listed all the XML files in the directory ("D:/"):

library(tidyverse)
library(xml2)
setwd("D:/")
# Escape the dot so that only names ending in ".xml" match
files <- list.files(pattern = "\\.xml$")

My second step would be to select only the files that interest me (ABFF20 and 38HY9R), but I could not write that code. Finally, the third step would be to turn the two files into a data frame, using the code shown above:

ns<- page %>% xml_find_all(".//test:billing")
billing<-xml2::as_list(ns) %>% jsonlite::toJSON() %>% jsonlite::fromJSON()

For this third step I thought about using a for loop, but I did not get that far because I am stuck on step 2.

Is it possible to quickly import multiple XML files and transform them into a data frame?
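One possible way to chain the three steps is sketched below. It is not the JSON round trip from the single-file code: it swaps in `xml_find_first()` and `xml_text()` to pull one value per `<test:billing>` node, which gives a flat data frame. The helper names `read_billing` and `import_billing` and the chosen columns are assumptions made for illustration, and the XPath expressions assume every file uses the `test:` prefix as in the example XML.

```r
library(xml2)

# Hypothetical helper: read one file and return one row per <test:billing> node.
read_billing <- function(path) {
  page  <- read_xml(path)
  nodes <- xml_find_all(page, ".//test:billing")
  # One text value per billing node; a missing element gives NA.
  field <- function(xpath) xml_text(xml_find_first(nodes, xpath))
  data.frame(
    file         = rep(basename(path), length(nodes)),
    guidenumber  = field(".//test:guidenumber"),
    diagnosiscod = field(".//test:diagnosis/test:diagnosiscod"),
    description  = field(".//test:diagnosis/test:description"),
    amount       = as.numeric(field(".//test:procedure/test:amount")),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: list the XML files in `dir`, keep those whose identifier
# (the part after the underscore) is in `wanted`, and stack the results.
import_billing <- function(dir, wanted) {
  files <- list.files(dir, pattern = "\\.xml$", full.names = TRUE)
  ids   <- sub("^[^_]*_", "", tools::file_path_sans_ext(basename(files)))
  # Returns NULL when no file matches.
  do.call(rbind, lapply(files[ids %in% wanted], read_billing))
}
```

With the directory and identifiers from the question, the call would be `billing <- import_billing("D:/", c("ABFF20", "38HY9R"))`. Using `lapply()` followed by a single `rbind` avoids growing a data frame inside a for loop, which tends to matter once the number of files reaches the thousands.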

