Uploading Tab Delimited Text Data Into R

Reading, Wrangling and Writing the Tab delimited Input Data to a Text file in R
Suppose nosotros have a Tab delimited Text Input file in below location
> filepath<-"T:/My_Tech_Lab/R_Lab/Dataframes/SalesByRegion.txt"


If we  observe the information in the input file, we notice that the data has headers , delimiter(separator) is tab("\t") and the Net_Sales column(Vector) having the values with comma(,) decimal.
Now we can read this files using either of the post-obit line of code.
>mydata<-read.table(filepath,header = TRUE,sep ="\t",dec=",",numerals="warn.loss",every bit.is = TRUE,stringsAsFactors =Faux)

OR

>mydata<-read.csv(filepath,header = True,sep ="\t",dec=",",numerals="warn.loss",as.is = True,stringsAsFactors =FALSE)

OR

>mydata<-read.csv2(filepath,header = True,sep ="\t",december=",",numerals="warn.loss",as.is = Truthful,stringsAsFactors =False)


here , as.is=True property allows to read the data as it is from source, and Strings will exist read as strings instead of Factors.

stringsAsFactors =False , allows to read the strings equally strings instead of Factors. This holding will be overridden by as.is=FALSE holding.

The input in the dataframe is as follows :

> mydata

The properties of the information frame is as follows..

In the in a higher place data(Observations), for the fields(Variables) we got the "Zippo", "NA", "na", "" equally values. Nosotros can get to know in which rows they are appearing by checking as follows..

> which(is.na(mydata$Net_Sales))
[1]  3 15 25
> which(mydata$Net_Sales=="na")
[one]  vii 30
> which(mydata$Net_Sales=="Zilch")
[ane] eighteen 22

We can convert all this type of values to NA while reading from the File equally follows..

 mydata<-read.csv(filepath,header = True,sep ="\t",dec=",",numerals="warn.loss",as.is = Truthful,stringsAsFactors =FALSE,na.strings = c("Cipher","NA","na",""))

Now the input dataframe is equally follows :
> mydata

Notes :

Later on reading the data as well, you can replace/encode the values to NA using the ifelse as follows...

> mydata$Prod_Name<- ifelse (mydata$Prod_Name=="NULL"|mydata$Prod_Name=="na"|mydata$Prod_Name=="",

NA ,mydata$Prod_Name)

Nosotros can know the count of NA values by each column in the unabridged dataframe every bit follows..

> colSums(is.na(mydata)) SalesOrder_Id Order_Date Region_Name Cust_Segment Prod_Name Net_Sales
0 0 0 0 3 11

> sum(is.na(mydata))

[1] 14

> sum(is.na(mydata$Net_Sales))
[1] 11

Type Conversion for a Variable(Vector) "Net_Sales" :

Since the variable "Net_Sales" having the values with Comma, they readed as grapheme from the input file.

> typeof(mydata$Net_Sales)
[i] "character"
> class(mydata$Net_Sales)
[1] "character"

At present we will covert this Colum to Numeric, so that nosotros can perform whatsoever calculations on it.

>mydata$Net_Sales<-as.numeric(sub(",", "", mydata$Net_Sales, fixed = TRUE))

If we have 2 Commas in the data.. then apply the below..

> mydata$Net_Sales<- equally.numeric(sub(",","",sub(",", "", mydata$Net_Sales)))

Warning message:

NAs introduced by compulsion

The warning message thrown because some of the missing values are there in the observations, those volition be not converted.

After conversion, the observations of that variables is follows..



Now we tin can calculate the Mean values for that variable, past ignoring the "NA" as follows..

> Avg<- mean(mydata$Net_Sales,na.rm = True)
> Avg
[i] 345912.2


Notes :
--na.rm will ignore the NA values from the hateful calculation.
--R will automatically ignores the Null values but not NA

Encoding NAs with a Hateful values for a Variable(Vector) "Net_Sales" :

If we desire to supersede the NAs with the Mean values for the observations in the Variable "Net_Sales", nosotros can do that encoding as follows..

> mydata$Net_Sales[is.na(mydata$Net_Sales)] <- mean(mydata$Net_Sales, na.rm = TRUE)

At present the dataframe is looks as follows..

Notes :
If you wants to calculate the Mean by Trimming(excluding) some values in the variable, we tin do it as follows.. > TrimAvg<-hateful(mydata$Net_Sales,trim=0.v,na.rm = Truthful)

here the trim=0.v will exclude/ignores(autonomously from NA, if NA exists in that v) the 5 values in beginning and v values at final in the Observations of the variable Net_Sales.
> TrimAvg

[1] 67500

> Avg<-hateful(mydata$Net_Sales,na.rm = Truthful)
> Avg
[one] 345912.ii


If you wants to calculate the Mode value, in that location is no direct function in R but we can summate by defining a function as follows..

>modeval <- office(ten) {
uniqv <- na.omit(unique(x))
uniqv[which.max(tabulate(lucifer(ten, uniqv)))]

} >modeval(mydata$Net_Sales)

Adding a New Variable "Sales_Rating" into the Dataset :

Now we volition add together a new Variable "Sales_Rating" into the Dataset which calculated based on the "Net_Sales" variable as follows..
>mydata<-inside ( mydata,

          {
Sales_Rating <- NA
              Sales_Rating[Net_Sales>100000]<-"Meliorate"
              Sales_Rating[Net_Sales>=50000 & Net_Sales<=100000 ]<-"Skillful"
              Sales_Rating[Net_Sales<50000]<-"Poor"
          }
)


Note :
The within() function is similar to with() function but it allows y'all to modify the dataframe.
Output :

Boosted Notes :
If y'all desire to set only specific no.of rows from the input file, utilize the following code :
> mydata<-read.csv(filepath,header = True,sep ="\t",dec=",",numerals="warn.loss",as.is = Truthful,stringsAsFactors =FALSE,na.strings = c("NULL","NA","na",""),nrows=10)

if yous want to run into the beginning few rows of the data prepare :
>head(mydata)
if you want to see the last few rows of the data set :
>tail(mydata)

if you lot desire to see beginning or middle 10 rows from all the columns of the information set :
>mydata[1:10,]
>mydata[10:twenty,]

if you want to see kickoff or middle 10 rows from 3 the columns of the information ready :

> mydata[1:10,1:3]
> mydata[x:20,1:3]


Writing Dataset to an Output file :

Nosotros can write the information from Dataset "mydata" to various output destinations like .csv, .txt using the write.table() function.

Output path :

>opath<- "T:/My_Tech_Lab/R_Lab/Dataframes/Output/"

#Writing to .csv file with comma(,) every bit delimiter :

>write.table(mydata, file =paste(opath,"comOutput.csv",sep=""), append = FALSE, quote = FALSE, sep = ",", eol = "\n", na = "NA", dec = ".", row.names = FALSE, col.names = Truthful, qmethod = c("escape", "double"),fileEncoding = "")

Output :

Notes :

Next time if we writing the Output to the same destination then don't forget set up the properties as follows..

append = TRUE, col.names = False

#Writing to .csv file with TAB(\t) as delimiter :

>write.table(mydata, file =paste(opath,"tabOutput.csv",sep=""), append = FALSE, quote = FALSE, sep = "\t", eol = "\n", na = "NA", dec = ".", row.names = Imitation, col.names = True, qmethod = c("escape", "double"),fileEncoding = "")

Output:

Note :

In case of tab delimited output, information technology is written to the 1 column in .csv file, with \t tab delimited format. Nosotros can split up the columns using the excel Text to Columns method.

#Writing to .txt file with Comma(,) as delimiter :

>write.table(mydata, file =paste(opath,"com_Output.txt",sep=""), suspend = Imitation, quote = FALSE, sep = ",", eol = "\n", na = "NA", dec = ".", row.names = False, col.names = TRUE, qmethod = c("escape", "double"),fileEncoding = "")

Output :

#Writing to .txt file with TAB(\t) as delimiter :

>write.table(mydata, file =paste(opath,"tab_Output.txt",sep=""), append = FALSE, quote = FALSE, sep = "\t", eol = "\north", na = "NA", dec = ".", row.names = FALSE, col.names = TRUE, qmethod = c("escape", "double"),fileEncoding = "")

Output:



--------------------------------------------------------------------------------------------------------
Cheers, TAMATAM ; Business Intelligence & Analytics Professional
--------------------------------------------------------------------------------------------------------


armstrongsularoat.blogspot.com

Source: https://excelkingdom.blogspot.com/2018/11/how-to-read-tab-delimited-input-data.html

0 Response to "Uploading Tab Delimited Text Data Into R"

Enregistrer un commentaire

Iklan Atas Artikel

Iklan Tengah Artikel 1

Iklan Tengah Artikel 2

Iklan Bawah Artikel