How to Work with Real Data

I am taking a course on data science with Python, and the first thing I learned is how to work with real data. Before we can do any data science work on files or other sets of data, we have to know how to access the data in all its myriad forms.

Data scientists call the columns in a database variables (or features) and the rows cases. Each row represents a collection of variables that you can analyze.
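
To make the terms concrete, here is a minimal sketch using Python's built-in csv module. The column names and values are made up for illustration; they do not come from any real dataset.

```python
import csv
import io

# Hypothetical sample data: two columns (variables) and three rows (cases).
raw = "color,value\nRed,1\nGreen,2\nBlue,3\n"

reader = csv.DictReader(io.StringIO(raw))
variables = reader.fieldnames   # the column headers are the variables/features
cases = list(reader)            # each row becomes one case (a dict of variables)

print("Variables:", variables)
for case in cases:
    print("Case:", case)
```

Each case holds one value per variable, which is the shape most analysis tools expect.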

1) Uploading a small amount of data – To get started we can use sklearn.datasets, which provides small toy datasets for learning. To load a file of our own, we use the open function, which returns a file object. The open function accepts the filename and an access mode. Finally, to read the file, we call the read method of the file object.
Example – the filename is "colors.txt" and the mode is read binary ('rb').

with open("colors.txt", 'rb') as access_file:
    print(access_file.read())  # read the whole file into memory and display it

Read method – If we pass a size argument to read, such as read(15), Python reads only that many bytes (or characters, if the file was opened in text mode), or stops early when it reaches the end of the file (EOF).
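
Here is a small sketch of how the size argument behaves. The file contents are hypothetical and are written to a temporary file first, so the example runs on its own without a real colors.txt.

```python
import os
import tempfile

# Hypothetical contents for illustration; written to a temp file so the sketch is self-contained.
sample = b"Color\tValue\nRed\t1\nOrange\t2\n"
path = os.path.join(tempfile.mkdtemp(), "colors.txt")
with open(path, "wb") as out_file:
    out_file.write(sample)

with open(path, "rb") as access_file:
    first_chunk = access_file.read(15)  # at most 15 bytes
    rest = access_file.read()           # no size: everything that remains
    at_eof = access_file.read(15)       # already at end of file, so nothing is left

print(first_chunk)
print(rest)
print(at_eof)
```

Once the end of the file has been reached, read returns an empty bytes object instead of raising an error, which is how a loop can tell that it is done.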

2) Streaming a large amount of data into memory – When a file is too large to load all at once, we can stream it and work with a little data at a time.
Example –

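with open("colors.txt", 'rb') as access_file:
    for observation in access_file:
        # In 'rb' mode each record is a bytes object, so decode it (assuming UTF-8 text) before joining it to a str.
        print('Reading data: ' + observation.decode('utf-8'))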

As the for loop runs, the file pointer moves to the next record on each iteration. Each record appears one at a time in observation, so the whole file never has to be in memory at once.
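
The point of streaming is that we only keep what we need from each record. As a rough sketch, the loop below keeps running totals instead of storing the records themselves. The sample contents and the names record_count and longest_record are my own, chosen just for this illustration.

```python
import os
import tempfile

# Hypothetical contents for illustration; a real streamed file would be far larger.
path = os.path.join(tempfile.mkdtemp(), "colors.txt")
with open(path, "w", encoding="utf-8") as out_file:
    out_file.write("Red\nOrange\nYellow\n")

record_count = 0
longest_record = ""
with open(path, "r", encoding="utf-8") as access_file:
    for observation in access_file:          # one record per iteration
        record = observation.rstrip("\n")    # drop the trailing newline
        record_count += 1
        if len(record) > len(longest_record):
            longest_record = record          # keep only the summary, not every record

print("Records read:", record_count)
print("Longest record:", longest_record)
```

Memory use stays roughly constant however many records the file holds, because only one record and the running totals exist at any moment.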


