In this article, we will see how to download an XML file from a given URL
and then access its elements.
This is useful in many cases, for example, when scraping data from a site that publishes its data in one or more XML files.
The article is short, and most of it is self-explanatory code.
Data scraping is also a common source of freelance work these days, so this might be helpful.
So let's begin with the code:
var fs = require('fs');//Required for writing and reading the file.
var https = require('https');//For an https URL we need this module instead of http.
var xml2js = require('xml2js');//Required for XML parsing (install it with: npm install xml2js).
var file_name = 'data.xml';//This will be the name of the file we generate.
var DOWNLOAD_DIR = __dirname + '/';//The file is saved next to this script.
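//This function reads data from the URL and writes it into a new file
//with the given name in the download directory.
//The optional callback runs once the file is fully written, or with an error.
function download(callback){
  var file_url = 'https://www.w3schools.com/xml/note.xml';
  var file = fs.createWriteStream(DOWNLOAD_DIR + file_name, {'flags': 'w'});
  https.get(file_url, function(response) {
    response.pipe(file);
    //'finish' fires after all data has been flushed to the file.
    file.on('finish', function() {
      if (callback) callback(null);
    });
  }).on('error', function(err) {
    //Network errors (for example, an unreachable host) end up here.
    if (callback) callback(err);
  });
}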
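//This function reads data from the XML file and parses it into a
//JavaScript object so that its elements can be accessed.
//Call it only after the download has finished writing the file.
function read(){
  //Read from the same path the download function writes to.
  var fileData = fs.readFileSync(DOWNLOAD_DIR + file_name, 'utf8');
  var parser = new xml2js.Parser();
  parser.parseString(fileData, function (err, result) {
    if (err) {
      console.error(err);//The file was not valid XML.
      return;
    }
    console.log(result);//Here you will get the data as a JavaScript object.
  });
}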
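The parsed result is a plain JavaScript object, so its elements can be read with ordinary property access. The sketch below assumes xml2js is used with its default options, where the root tag becomes a key and each child element is stored as an array. The object literal stands in for the parser output for note.xml, and firstText is a hypothetical helper written for this example, not part of xml2js.

```javascript
// Assumed shape of the xml2js result for note.xml with default options:
// the root tag is a key, and every child element is an array of values.
const result = {
  note: {
    to: ['Tove'],
    from: ['Jani'],
    heading: ['Reminder'],
    body: ["Don't forget me this weekend!"]
  }
};

// Hypothetical helper: returns the text of the first <tag> under <root>,
// or undefined when the element is missing.
function firstText(parsed, root, tag) {
  const node = parsed && parsed[root];
  const values = node && node[tag];
  return Array.isArray(values) ? values[0] : undefined;
}

console.log(firstText(result, 'note', 'to'));   // prints "Tove"
console.log(result.note.heading[0]);            // prints "Reminder"
```

The helper is only a convenience for missing elements. When the structure is known, direct access such as result.note.heading[0] works just as well.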
Note:
1) In Node.js, __dirname is always the directory in which the currently executing script resides. So if you used __dirname in /A1/A2/script.js, its value would be /A1/A2.
2) The pipe() function reads data from a readable stream as it becomes available and writes it to a destination writable stream. In our code, the variable file is the writable stream and response is the readable stream. This is also a commonly asked interview question for Node.js developer roles.
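To see pipe() on its own, here is a minimal sketch that uses in-memory streams in place of the HTTPS response and the file. The names source, sink and received are made up for illustration. It also shows why reading the file should wait for the 'finish' event: only then has every chunk reached the destination.

```javascript
const { Readable, Writable } = require('stream');

// Stand-in for the HTTPS response: a readable stream that emits the XML in chunks.
const source = Readable.from(['<note>', '<to>Tove</to>', '</note>']);

// Stand-in for the file: a writable stream that collects what it receives.
const received = [];
const sink = new Writable({
  write(chunk, encoding, done) {
    received.push(chunk.toString());
    done();
  }
});

// 'finish' fires once every chunk has been written to the destination.
sink.on('finish', function () {
  console.log(received.join('')); // prints "<note><to>Tove</to></note>"
});

// Data flows from the readable stream to the writable stream as it becomes available.
source.pipe(sink);
```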